ImageConverter
classkeras_hub.layers.ImageConverter(
image_size=None,
scale=None,
offset=None,
crop_to_aspect_ratio=True,
interpolation="bilinear",
data_format=None,
**kwargs
)
Preprocess raw images into model ready inputs.
This class converts from raw images to model ready inputs. This conversion proceeds in the following steps:
image_size
. If image_size
is None
, this
step will be skipped.scale
, which can be either global
or per channel. If scale
is None
, this step will be skipped.offset
, which can be either global
or per channel. If offset
is None
, this step will be skipped.The layer will take as input a raw image tensor in the channels last or channels first format, and output a preprocessed image input for modeling. This tensor can be batched (rank 4), or unbatched (rank 3).
This layer can be used with the from_preset()
constructor to load a layer
that will rescale and resize an image for a specific pretrained model.
Using the layer this way allows writing preprocessing code that does not
need updating when switching between model checkpoints.
Arguments
(int, int)
tuple or None
. The output size of the image,
not including the channels axis. If None
, the input will not be
resized.None
. The scale to apply to the
inputs. If scale
is a single float, the entire input will be
multiplied by scale
. If scale
is a tuple, it's assumed to
contain per-channel scale value multiplied against each channel of
the input images. If scale
is None
, no scaling is applied.None
. The offset to apply to the
inputs. If offset
is a single float, the entire input will be
summed with offset
. If offset
is a tuple, it's assumed to
contain per-channel offset value summed against each channel of the
input images. If offset
is None
, no scaling is applied.True
, resize the images without aspect
ratio distortion. When the original aspect ratio differs
from the target aspect ratio, the output image will be
cropped so as to return the
largest possible window in the image (of size (height, width)
)
that matches the target aspect ratio. By default
(crop_to_aspect_ratio=False
), aspect ratio may not be preserved."bilinear"
, "nearest"
, "bicubic"
,
"lanczos3"
, "lanczos5"
. Defaults to "bilinear"
."channels_last"
or "channels_first"
.
The ordering of the dimensions in the inputs. "channels_last"
corresponds to inputs with shape (batch, height, width, channels)
while "channels_first"
corresponds to inputs with shape
(batch, channels, height, width)
. It defaults to the
image_data_format
value found in your Keras config file at
~/.keras/keras.json
. If you never set it, then it will be
"channels_last"
.Examples
# Resize raw images and scale them to [0, 1].
converter = keras_hub.layers.ImageConverter(
image_size=(128, 128),
scale=1. / 255,
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))
# Resize images to the specific size needed for a PaliGemma preset.
converter = keras_hub.layers.ImageConverter.from_preset(
"pali_gemma_3b_224"
)
converter(np.random.randint(0, 256, size=(2, 512, 512, 3)))
from_preset
methodImageConverter.from_preset(preset, **kwargs)
Instantiate a keras_hub.layers.ImageConverter
from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset
can be passed as
one of:
'pali_gemma_3b_224'
'kaggle://user/paligemma/keras/pali_gemma_3b_224'
'hf://user/pali_gemma_3b_224'
'./pali_gemma_3b_224'
You can run cls.presets.keys()
to list all built-in presets available
on the class.
Arguments
True
, the weights will be loaded into the
model architecture. If False
, the weights will be randomly
initialized.Examples
batch = np.random.randint(0, 256, size=(2, 512, 512, 3))
# Resize images for `"pali_gemma_3b_224"`.
converter = keras_hub.layers.ImageConverter.from_preset(
"pali_gemma_3b_224"
)
converter(batch) # # Output shape (2, 224, 224, 3)
# Resize images for `"pali_gemma_3b_448"` without cropping.
converter = keras_hub.layers.ImageConverter.from_preset(
"pali_gemma_3b_448",
crop_to_aspect_ratio=False,
)
converter(batch) # # Output shape (2, 448, 448, 3)
Preset | Parameters | Description |
---|---|---|
deeplab_v3_plus_resnet50_pascalvoc | 39.19M | DeepLabV3+ model with ResNet50 as image encoder and trained on augmented Pascal VOC dataset by Semantic Boundaries Dataset(SBD)which is having categorical accuracy of 90.01 and 0.63 Mean IoU. |
densenet_121_imagenet | 7.04M | 121-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
densenet_169_imagenet | 12.64M | 169-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
densenet_201_imagenet | 18.32M | 201-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
mit_b0_ade20k_512 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b0_cityscapes_1024 | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b1_ade20k_512 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b1_cityscapes_1024 | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
mit_b2_ade20k_512 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
mit_b2_cityscapes_1024 | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
mit_b3_ade20k_512 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
mit_b3_cityscapes_1024 | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
mit_b4_ade20k_512 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
mit_b4_cityscapes_1024 | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
mit_b5_ade20k_640 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
mit_b5_cityscapes_1024 | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
pali_gemma_3b_mix_224 | 2.92B | image size 224, mix fine tuned, text sequence length is 256 |
pali_gemma_3b_224 | 2.92B | image size 224, pre trained, text sequence length is 128 |
pali_gemma_3b_mix_448 | 2.92B | image size 448, mix fine tuned, text sequence length is 512 |
pali_gemma_3b_448 | 2.92B | image size 448, pre trained, text sequence length is 512 |
pali_gemma_3b_896 | 2.93B | image size 896, pre trained, text sequence length is 512 |
resnet_18_imagenet | 11.19M | 18-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_18_imagenet | 11.72M | 18-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_34_imagenet | 21.84M | 34-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_50_imagenet | 23.56M | 50-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_v2_50_imagenet | 23.56M | 50-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_50_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_50_ssld_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
resnet_vd_50_ssld_v2_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation and AutoAugment. |
resnet_vd_50_ssld_v2_fix_imagenet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation, AutoAugment and additional fine-tuning of the classification head. |
resnet_101_imagenet | 42.61M | 101-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_v2_101_imagenet | 42.61M | 101-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_101_imagenet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_101_ssld_imagenet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
resnet_152_imagenet | 58.30M | 152-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_152_imagenet | 60.36M | 152-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
resnet_vd_200_imagenet | 74.93M | 200-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
sam_base_sa1b | 93.74M | The base SAM model trained on the SA1B dataset. |
sam_huge_sa1b | 312.34M | The huge SAM model trained on the SA1B dataset. |
sam_large_sa1b | 641.09M | The large SAM model trained on the SA1B dataset. |
vgg_11_imagenet | 9.22M | 11-layer vgg model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_13_imagenet | 9.40M | 13-layer vgg model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_16_imagenet | 14.71M | 16-layer vgg model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
vgg_19_imagenet | 20.02M | 19-layer vgg model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |