ViTImageConverter
keras_hub.layers.ViTImageConverter(
    norm_mean=[0.5, 0.5, 0.5], norm_std=[0.5, 0.5, 0.5], **kwargs
)
Converts images to the format expected by a ViT model.
This layer performs image normalization using mean and standard deviation values. By default, it uses the same normalization as the "google/vit-large-patch16-224" model on Hugging Face: norm_mean=[0.5, 0.5, 0.5] and norm_std=[0.5, 0.5, 0.5]. These defaults are suitable for models pretrained using this normalization.
Arguments

norm_mean: list or tuple of floats. Mean values for image normalization. Defaults to [0.5, 0.5, 0.5].
norm_std: list or tuple of floats. Standard deviation values for image normalization. Defaults to [0.5, 0.5, 0.5].
**kwargs: Additional keyword arguments passed to keras_hub.layers.preprocessing.ImageConverter.

Examples
import keras
import numpy as np
from keras_hub.layers import ViTImageConverter

# Example image (replace with your actual image data).
image = np.random.rand(1, 224, 224, 3)  # (B, H, W, C)

# Create a ViTImageConverter instance.
converter = ViTImageConverter(
    image_size=(28, 28),
    scale=1 / 255.,
)

# Preprocess the image.
preprocessed_image = converter(image)
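With the defaults above, inputs that have already been rescaled to [0, 1] (for example with scale=1/255.) are mapped into the [-1, 1] range. The NumPy sketch below illustrates only the mean/std arithmetic; resizing and rescaling come from the base ImageConverter configuration, so this is an illustration of the math rather than the layer's exact implementation.

import numpy as np

norm_mean = np.array([0.5, 0.5, 0.5])
norm_std = np.array([0.5, 0.5, 0.5])

# Stand-in for an image that has already been resized and rescaled to [0, 1].
scaled_image = np.random.rand(1, 224, 224, 3)

# Channel-wise normalization: subtract the mean, then divide by the standard
# deviation. With the 0.5 / 0.5 defaults this maps [0, 1] to [-1, 1].
normalized = (scaled_image - norm_mean) / norm_std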
from_preset
ViTImageConverter.from_preset(preset, **kwargs)
Instantiate a keras_hub.layers.ImageConverter from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:
1. a built-in preset identifier like 'pali_gemma_3b_224'
2. a Kaggle Models handle like 'kaggle://user/paligemma/keras/pali_gemma_3b_224'
3. a Hugging Face handle like 'hf://user/pali_gemma_3b_224'
4. a path to a local preset directory like './pali_gemma_3b_224'
You can run cls.presets.keys() to list all built-in presets available on the class.
Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.

Examples
import numpy as np
import keras_hub

batch = np.random.randint(0, 256, size=(2, 512, 512, 3))

# Resize images for `"pali_gemma_3b_224"`.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_224"
)
converter(batch)  # Output shape (2, 224, 224, 3)

# Resize images for `"pali_gemma_3b_448"` without cropping.
converter = keras_hub.layers.ImageConverter.from_preset(
    "pali_gemma_3b_448",
    crop_to_aspect_ratio=False,
)
converter(batch)  # Output shape (2, 448, 448, 3)
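The same pattern applies to the ViT-specific converter documented on this page. The sketch below is illustrative: it lists the built-in presets on the class and loads one of the ViT presets named in the table that follows; the exact set of available preset names depends on your installed keras_hub version.

import numpy as np
import keras_hub

# List the built-in presets registered for this converter class.
print(keras_hub.layers.ViTImageConverter.presets.keys())

# Load the converter bundled with a ViT preset (name taken from the table
# below) and preprocess a batch of images with the resize, rescale, and
# mean/std normalization the checkpoint was trained with.
batch = np.random.randint(0, 256, size=(2, 512, 512, 3))
converter = keras_hub.layers.ViTImageConverter.from_preset(
    "vit_base_patch16_224_imagenet"
)
converter(batch)  # Expected output shape: (2, 224, 224, 3)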
Preset | Parameters | Description |
---|---|---|
vit_base_patch16_224_imagenet | 85.80M | ViT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
vit_base_patch16_224_imagenet21k | 85.80M | ViT-B16 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
vit_base_patch16_384_imagenet | 86.09M | ViT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
vit_base_patch32_224_imagenet21k | 87.46M | ViT-B32 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
vit_base_patch32_384_imagenet | 87.53M | ViT-B32 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
vit_large_patch16_224_imagenet | 303.30M | ViT-L16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
vit_large_patch16_224_imagenet21k | 303.30M | ViT-L16 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
vit_large_patch16_384_imagenet | 303.69M | ViT-L16 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
vit_large_patch32_224_imagenet21k | 305.51M | ViT-L32 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
vit_large_patch32_384_imagenet | 305.61M | ViT-L32 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
vit_huge_patch14_224_imagenet21k | 630.76M | ViT-H14 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |