ImageConverter
keras_hub.layers.ImageConverter(
    image_size=None,
    scale=None,
    offset=None,
    crop_to_aspect_ratio=True,
    interpolation="bilinear",
    data_format=None,
    **kwargs
)
Preprocess raw images into model-ready inputs.

This class converts from raw images to model-ready inputs. This conversion proceeds in the following steps:

1. Resize the image to image_size. If image_size is None, this step will be skipped.
2. Rescale the image by multiplying by scale, which can be either global or per channel. If scale is None, this step will be skipped.
3. Offset the image by adding offset, which can be either global or per channel. If offset is None, this step will be skipped.

The layer takes as input a raw image tensor in the channels-last or channels-first format, and outputs a preprocessed image input for modeling. This tensor can be batched (rank 4) or unbatched (rank 3).
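A minimal sketch of these steps, assuming the default image_size=None so the resize step is skipped; the scale and offset values are illustrative, not taken from any preset:

import numpy as np
import keras_hub

images = np.random.randint(0, 256, size=(2, 32, 32, 3)).astype("float32")
converter = keras_hub.layers.ImageConverter(scale=1.0 / 255, offset=-0.5)
# With image_size=None the resize step is skipped, so the output should
# match applying the remaining steps by hand: multiply by scale, add offset.
expected = images * (1.0 / 255) - 0.5
np.allclose(converter(images), expected)  # Expected: True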
This layer can be used with the from_preset()
constructor to load a layer
that will rescale and resize an image for a specific pretrained model.
Using the layer this way allows writing preprocessing code that does not
need updating when switching between model checkpoints.
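For instance, the preset name can be the only thing that changes between checkpoints; the rest of the preprocessing code stays the same:

import numpy as np
import keras_hub

preset = "pali_gemma_3b_224"  # Swap for another checkpoint; nothing below changes.
converter = keras_hub.layers.ImageConverter.from_preset(preset)
images = converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))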
Arguments

image_size: (int, int) tuple or None. The output size of the image, not including the channels axis. If None, the input will not be resized.

scale: float, tuple of floats, or None. The scale to apply to the inputs. If scale is a single float, the entire input will be multiplied by scale. If scale is a tuple, it is assumed to contain per-channel scale values multiplied against each channel of the input images. If scale is None, no scaling is applied.

offset: float, tuple of floats, or None. The offset to apply to the inputs. If offset is a single float, the entire input will be summed with offset. If offset is a tuple, it is assumed to contain per-channel offset values summed against each channel of the input images. If offset is None, no offset is applied.

crop_to_aspect_ratio: If True, resize the images without aspect ratio distortion. When the original aspect ratio differs from the target aspect ratio, the output image will be cropped so as to return the largest possible window in the image (of size (height, width)) that matches the target aspect ratio. If False, aspect ratio may not be preserved. Defaults to True.

interpolation: String, the interpolation method. Supports "bilinear", "nearest", "bicubic", "lanczos3", "lanczos5". Defaults to "bilinear".

data_format: String, either "channels_last" or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, height, width, channels) while "channels_first" corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".

Examples
# Resize raw images and scale them to [0, 1].
converter = keras_hub.layers.ImageConverter(
    image_size=(128, 128),
    scale=1. / 255,
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))
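The scale and offset arguments also accept per-channel tuples; a sketch with illustrative per-channel values, not taken from any particular preset:

# Resize, then normalize each channel separately.
converter = keras_hub.layers.ImageConverter(
    image_size=(128, 128),
    scale=(1.0 / 255, 1.0 / 255, 1.0 / 255),
    offset=(-0.5, -0.5, -0.5),
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))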
# Resize images to the specific size needed for a PaliGemma preset.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_224"
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))
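The converter can also run inside an input pipeline. A sketch assuming the TensorFlow backend and a tf.data pipeline, reusing the converter from the example above; the dataset here is synthetic:

import numpy as np
import tensorflow as tf

images = np.random.randint(0, 256, size=(8, 512, 512, 3))
labels = np.arange(8)
ds = tf.data.Dataset.from_tensor_slices((images, labels))
# Apply the converter per batch inside the pipeline.
ds = ds.batch(2).map(lambda x, y: (converter(x), y))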
from_preset method

ImageConverter.from_preset(preset, **kwargs)
Instantiate a keras_hub.layers.ImageConverter
from a model preset.
A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

1. a built-in preset identifier like 'pali_gemma_3b_224'
2. a Kaggle Models handle like 'kaggle://user/paligemma/keras/pali_gemma_3b_224'
3. a Hugging Face handle like 'hf://user/pali_gemma_3b_224'
4. a path to a local preset directory like './pali_gemma_3b_224'
You can run cls.presets.keys()
to list all built-in presets available
on the class.
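For example:

import keras_hub

# List every built-in preset identifier available for this class.
print(keras_hub.layers.ImageConverter.presets.keys())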
Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.

load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.

Examples
batch = np.random.randint(0, 256, size=(2, 512, 512, 3))

# Resize images for `"pali_gemma_3b_224"`.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_224"
)
converter(batch)  # Output shape (2, 224, 224, 3)
# Resize images for `"pali_gemma_3b_448"` without cropping.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_448",
    crop_to_aspect_ratio=False,
)
converter(batch)  # Output shape (2, 448, 448, 3)
Preset | Parameters | Description |
---|---|---|
clip_vit_base_patch16 | 149.62M | 150 million parameter, 12-layer for vision and 12-layer for text, patch size of 16, CLIP model. |
clip_vit_base_patch32 | 151.28M | 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, CLIP model. |
clip_vit_b_32_laion2b_s34b_b79k | 151.28M | 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, Open CLIP model. |
clip_vit_large_patch14 | 427.62M | 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, CLIP model. |
clip_vit_large_patch14_336 | 427.94M | 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, image size of 336, CLIP model. |
clip_vit_h_14_laion2b_s32b_b79k | 986.11M | 986 million parameter, 32-layer for vision and 24-layer for text, patch size of 14, Open CLIP model. |
clip_vit_g_14_laion2b_s12b_b42k | 1.37B | 1.4 billion parameter, 40-layer for vision and 24-layer for text, patch size of 14, Open CLIP model. |
clip_vit_bigg_14_laion2b_39b_b160k | 2.54B | 2.5 billion parameter, 48-layer for vision and 32-layer for text, patch size of 14, Open CLIP model. |
deeplab_v3_plus_resnet50_pascalvoc | 39.19M | DeepLabV3+ model with a ResNet50 image encoder, trained on the Pascal VOC dataset augmented with the Semantic Boundaries Dataset (SBD); achieves a categorical accuracy of 90.01 and a mean IoU of 0.63. |
densenet_121_imagenet | 7.04M | 121-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
densenet_169_imagenet | 12.64M | 169-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
densenet_201_imagenet | 18.32M | 201-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
efficientnet_lite0_ra_imagenet | 4.65M | EfficientNet-Lite model trained on the ImageNet 1k dataset with RandAugment recipe. |
efficientnet_b0_ra_imagenet | 5.29M | EfficientNet B0 model pre-trained on the ImageNet 1k dataset with RandAugment recipe. |
efficientnet_b0_ra4_e3600_r224_imagenet | 5.29M | EfficientNet B0 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and "ResNet Strikes Back". |
efficientnet_es_ra_imagenet | 5.44M | EfficientNet-EdgeTPU Small model trained on the ImageNet 1k dataset with RandAugment recipe. |
efficientnet_em_ra2_imagenet | 6.90M | EfficientNet-EdgeTPU Medium model trained on the ImageNet 1k dataset with RandAugment2 recipe. |
efficientnet_b1_ft_imagenet | 7.79M | EfficientNet B1 model fine-tuned on the ImageNet 1k dataset. |
efficientnet_b1_ra4_e3600_r240_imagenet | 7.79M | EfficientNet B1 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and "ResNet Strikes Back". |
efficientnet_b2_ra_imagenet | 9.11M | EfficientNet B2 model pre-trained on the ImageNet 1k dataset with RandAugment recipe. |
efficientnet_el_ra_imagenet | 10.59M | EfficientNet-EdgeTPU Large model trained on the ImageNet 1k dataset with RandAugment recipe. |
efficientnet_b3_ra2_imagenet | 12.23M | EfficientNet B3 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe. |
efficientnet_b4_ra2_imagenet | 19.34M | EfficientNet B4 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe. |
efficientnet_b5_sw_imagenet | 30.39M | EfficientNet B5 model pre-trained on the ImageNet 12k dataset by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes). |
efficientnet_b5_sw_ft_imagenet | 30.39M | EfficientNet B5 model pre-trained on the ImageNet 12k dataset and fine-tuned on ImageNet-1k by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes). |
mit_b0_ade20k_512 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b0_cityscapes_1024 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b1_ade20k_512 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b1_cityscapes_1024 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b2_ade20k_512 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
mit_b2_cityscapes_1024 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
mit_b3_ade20k_512 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
mit_b3_cityscapes_1024 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
mit_b4_ade20k_512 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
mit_b4_cityscapes_1024 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
mit_b5_ade20k_640 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
mit_b5_cityscapes_1024 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
pali_gemma_3b_mix_224 | 2.92B | image size 224, mix fine-tuned, text sequence length is 256 |
pali_gemma_3b_224 | 2.92B | image size 224, pre-trained, text sequence length is 128 |
pali_gemma_3b_mix_448 | 2.92B | image size 448, mix fine-tuned, text sequence length is 512 |
pali_gemma_3b_448 | 2.92B | image size 448, pre-trained, text sequence length is 512 |
pali_gemma_3b_896 | 2.93B | image size 896, pre-trained, text sequence length is 512 |
pali_gemma2_pt_3b_224 | 3.03B | 3 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_3b_ft_docci_448 | 3.03B | 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
pali_gemma2_pt_3b_448 | 3.03B | 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_3b_896 | 3.04B | 3 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_10b_224 | 9.66B | 10 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_28b_224 | 9.66B | 28 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_10b_ft_docci_448 | 9.66B | 10 billion parameter, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
pali_gemma2_pt_10b_448 | 9.66B | 10 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_28b_448 | 9.66B | 28 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_10b_896 | 9.67B | 10 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
pali_gemma2_pt_28b_896 | 9.67B | 28 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
resnet_18_imagenet | 11.19M | 18-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_18_imagenet | 11.72M | 18-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_34_imagenet | 21.84M | 34-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_50_imagenet | 23.56M | 50-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_v2_50_imagenet | 23.56M | 50-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_50_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_50_ssld_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
resnet_vd_50_ssld_v2_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation and AutoAugment. |
resnet_vd_50_ssld_v2_fix_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation, AutoAugment and additional fine-tuning of the classification head. |
resnet_101_imagenet | 42.61M | 101-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_v2_101_imagenet | 42.61M | 101-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_101_imagenet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_101_ssld_imagenet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
resnet_152_imagenet | 58.30M | 152-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_152_imagenet | 60.36M | 152-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_200_imagenet | 74.93M | 200-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
retinanet_resnet50_fpn_coco | 34.12M | RetinaNet model with ResNet50 backbone fine-tuned on COCO in 800x800 resolution. |
sam_base_sa1b | 93.74M | The base SAM model trained on the SA1B dataset. |
sam_huge_sa1b | 312.34M | The huge SAM model trained on the SA1B dataset. |
sam_large_sa1b | 641.09M | The large SAM model trained on the SA1B dataset. |
vgg_11_imagenet | 9.22M | 11-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_13_imagenet | 9.40M | 13-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_16_imagenet | 14.71M | 16-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_19_imagenet | 20.02M | 19-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |