Below, we list all presets available in the KerasHub library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.
The following preset names correspond to a config and weights for a pretrained
model. The `from_preset()` constructor on any task, preprocessor, backbone, or
tokenizer class can be used to create a model from the saved preset.
```python
backbone = keras_hub.models.Backbone.from_preset("bert_base_en")
tokenizer = keras_hub.models.Tokenizer.from_preset("bert_base_en")
classifier = keras_hub.models.TextClassifier.from_preset("bert_base_en", num_classes=2)
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset("bert_base_en")
```
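The same `from_preset()` pattern applies to generative tasks. As a minimal sketch (the prompt and `max_length` value below are illustrative choices, not defaults), a causal language model can be loaded from one of the GPT-2 presets in the table and sampled directly:

```python
import keras_hub

# Load a generative task from a preset listed in the table below.
causal_lm = keras_hub.models.CausalLM.from_preset("gpt2_base_en")

# Generate a continuation for a prompt; max_length caps the total token count.
output = causal_lm.generate("KerasHub makes it easy to", max_length=64)
print(output)
```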
| Preset | Model API | Parameters | Description |
|---|---|---|---|
| albert_base_en_uncased | Albert | 11.68M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_large_en_uncased | Albert | 17.68M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_extra_large_en_uncased | Albert | 58.72M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_extra_extra_large_en_uncased | Albert | 222.60M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bart_base_en | Bart | 139.42M | 6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. |
| bart_large_en | Bart | 406.29M | 12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. |
| bart_large_en_cnn | Bart | 406.29M | The bart_large_en backbone model fine-tuned on the CNN+DM summarization dataset. |
| basnet_duts | BASNet | 108.89M | BASNet model with a 34-layer ResNet backbone, pre-trained on the DUTS image dataset at a 288x288 resolution. Model training was performed by Hamid Ali (https://github.com/hamidriasat/BASNet). |
| bert_tiny_en_uncased | Bert | 4.39M | 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_tiny_en_uncased_sst2 | Bert | 4.39M | The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset. |
| bert_small_en_uncased | Bert | 28.76M | 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_medium_en_uncased | Bert | 41.37M | 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_base_zh | Bert | 102.27M | 12-layer BERT model. Trained on Chinese Wikipedia. |
| bert_base_en | Bert | 108.31M | 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. |
| bert_base_en_uncased | Bert | 109.48M | 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_base_multi | Bert | 177.85M | 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages. |
| bert_large_en | Bert | 333.58M | 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. |
| bert_large_en_uncased | Bert | 335.14M | 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bloom_560m_multi | Bloom | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages. |
| bloomz_560m_multi | Bloom | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_1.1b_multi | Bloom | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages. |
| bloomz_1.1b_multi | Bloom | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_1.7b_multi | Bloom | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages. |
| bloomz_1.7b_multi | Bloom | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_3b_multi | Bloom | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages. |
| bloomz_3b_multi | Bloom | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| clip_vit_base_patch16 | CLIP | 149.62M | 150 million parameter, 12-layer for vision and 12-layer for text, patch size of 16, CLIP model. |
| clip_vit_base_patch32 | CLIP | 151.28M | 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, CLIP model. |
| clip_vit_b_32_laion2b_s34b_b79k | CLIP | 151.28M | 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, Open CLIP model. |
| clip_vit_large_patch14 | CLIP | 427.62M | 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, CLIP model. |
| clip_vit_large_patch14_336 | CLIP | 427.94M | 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, image size of 336, CLIP model. |
| clip_vit_h_14_laion2b_s32b_b79k | CLIP | 986.11M | 986 million parameter, 32-layer for vision and 24-layer for text, patch size of 14, Open CLIP model. |
| clip_vit_g_14_laion2b_s12b_b42k | CLIP | 1.37B | 1.4 billion parameter, 40-layer for vision and 24-layer for text, patch size of 14, Open CLIP model. |
| clip_vit_bigg_14_laion2b_39b_b160k | CLIP | 2.54B | 2.5 billion parameter, 48-layer for vision and 32-layer for text, patch size of 14, Open CLIP model. |
| csp_resnext_50_ra_imagenet | CSPNet | 20.57M | A CSP-ResNeXt (Cross-Stage-Partial) image classification model pre-trained on the Randomly Augmented ImageNet 1k dataset at a 256x256 resolution. |
| csp_resnet_50_ra_imagenet | CSPNet | 21.62M | A CSP-ResNet (Cross-Stage-Partial) image classification model pre-trained on the Randomly Augmented ImageNet 1k dataset at a 256x256 resolution. |
| csp_darknet_53_ra_imagenet | CSPNet | 27.64M | A CSP-DarkNet (Cross-Stage-Partial) image classification model pre-trained on the Randomly Augmented ImageNet 1k dataset at a 256x256 resolution. |
| darknet_53_imagenet | CSPNet | 41.61M | A DarkNet image classification model pre-trained on the ImageNet 1k dataset at a 256x256 resolution. |
| dfine_nano_coco | D-FINE | 3.79M | D-FINE Nano model, the smallest variant in the family, pretrained on the COCO dataset. Ideal for applications where computational resources are limited. |
| dfine_small_coco | D-FINE | 10.33M | D-FINE Small model pretrained on the COCO dataset. Offers a balance between performance and computational efficiency. |
| dfine_small_obj2coco | D-FINE | 10.33M | D-FINE Small model first pretrained on Objects365 and then fine-tuned on COCO, combining broad feature learning with benchmark-specific adaptation. |
| dfine_small_obj365 | D-FINE | 10.62M | D-FINE Small model pretrained on the large-scale Objects365 dataset, enhancing its ability to recognize a wider variety of objects. |
| dfine_medium_coco | D-FINE | 19.62M | D-FINE Medium model pretrained on the COCO dataset. A solid baseline with strong performance for general-purpose object detection. |
| dfine_medium_obj2coco | D-FINE | 19.62M | D-FINE Medium model using a two-stage training process: pretraining on Objects365 followed by fine-tuning on COCO. |
| dfine_medium_obj365 | D-FINE | 19.99M | D-FINE Medium model pretrained on the Objects365 dataset. Benefits from a larger and more diverse pretraining corpus. |
| dfine_large_coco | D-FINE | 31.34M | D-FINE Large model pretrained on the COCO dataset. Provides high accuracy and is suitable for more demanding tasks. |
| dfine_large_obj2coco_e25 | D-FINE | 31.34M | D-FINE Large model pretrained on Objects365 and then fine-tuned on COCO for 25 epochs. A high-performance model with specialized tuning. |
| dfine_large_obj365 | D-FINE | 31.86M | D-FINE Large model pretrained on the Objects365 dataset for improved generalization and performance on diverse object categories. |
| dfine_xlarge_coco | D-FINE | 62.83M | D-FINE X-Large model, the largest COCO-pretrained variant, designed for state-of-the-art performance where accuracy is the top priority. |
| dfine_xlarge_obj2coco | D-FINE | 62.83M | D-FINE X-Large model, pretrained on Objects365 and fine-tuned on COCO, representing the most powerful model in this series for COCO-style tasks. |
| dfine_xlarge_obj365 | D-FINE | 63.35M | D-FINE X-Large model pretrained on the Objects365 dataset, offering maximum performance by leveraging a vast number of object categories during pretraining. |
| deberta_v3_extra_small_en | DebertaV3 | 70.68M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_small_en | DebertaV3 | 141.30M | 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_base_en | DebertaV3 | 183.83M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_base_multi | DebertaV3 | 278.22M | 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset. |
| deberta_v3_large_en | DebertaV3 | 434.01M | 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deeplab_v3_plus_resnet50_pascalvoc | DeepLabV3 | 39.19M | DeepLabV3+ model with a ResNet50 image encoder, trained on the Pascal VOC dataset augmented with the Semantic Boundaries Dataset (SBD), achieving a categorical accuracy of 90.01 and a mean IoU of 0.63. |
| deit_tiny_distilled_patch16_224_imagenet | DeiT | 5.52M | DeiT-T16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
| deit_small_distilled_patch16_224_imagenet | DeiT | 21.67M | DeiT-S16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
| deit_base_distilled_patch16_224_imagenet | DeiT | 85.80M | DeiT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
| deit_base_distilled_patch16_384_imagenet | DeiT | 86.09M | DeiT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
| densenet_121_imagenet | DenseNet | 7.04M | 121-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| densenet_169_imagenet | DenseNet | 12.64M | 169-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| densenet_201_imagenet | DenseNet | 18.32M | 201-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| depth_anything_v2_small | - | 25.31M | Small variant of Depth Anything V2 monocular depth estimation (MDE) model trained on synthetic labeled images and real unlabeled images. |
| depth_anything_v2_base | - | 98.52M | Base variant of Depth Anything V2 monocular depth estimation (MDE) model trained on synthetic labeled images and real unlabeled images. |
| depth_anything_v2_large | - | 336.72M | Large variant of Depth Anything V2 monocular depth estimation (MDE) model trained on synthetic labeled images and real unlabeled images. |
| dinov2_small | DINOV2 | 22.58M | Vision Transformer (small-sized model) trained using DINOv2. |
| dinov2_with_registers_small | DINOV2 | 22.58M | Vision Transformer (small-sized model) trained using DINOv2, with registers. |
| dinov2_base | DINOV2 | 87.63M | Vision Transformer (base-sized model) trained using DINOv2. |
| dinov2_with_registers_base | DINOV2 | 87.64M | Vision Transformer (base-sized model) trained using DINOv2, with registers. |
| dinov2_large | DINOV2 | 305.77M | Vision Transformer (large-sized model) trained using DINOv2. |
| dinov2_with_registers_large | DINOV2 | 305.78M | Vision Transformer (large-sized model) trained using DINOv2, with registers. |
| dinov2_giant | DINOV2 | 1.14B | Vision Transformer (giant-sized model) trained using DINOv2. |
| dinov2_with_registers_giant | DINOV2 | 1.14B | Vision Transformer (giant-sized model) trained using DINOv2, with registers. |
| distil_bert_base_en | DistilBert | 65.19M | 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. |
| distil_bert_base_en_uncased | DistilBert | 66.36M | 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. |
| distil_bert_base_multi | DistilBert | 134.73M | 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages. |
| efficientnet_lite0_ra_imagenet | EfficientNet | 4.65M | EfficientNet-Lite model trained on the ImageNet 1k dataset with RandAugment recipe. |
| efficientnet_b0_ra_imagenet | EfficientNet | 5.29M | EfficientNet B0 model pre-trained on the ImageNet 1k dataset with RandAugment recipe. |
| efficientnet_b0_ra4_e3600_r224_imagenet | EfficientNet | 5.29M | EfficientNet B0 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and 'ResNet Strikes Back'. |
| efficientnet_es_ra_imagenet | EfficientNet | 5.44M | EfficientNet-EdgeTPU Small model trained on the ImageNet 1k dataset with RandAugment recipe. |
| efficientnet_em_ra2_imagenet | EfficientNet | 6.90M | EfficientNet-EdgeTPU Medium model trained on the ImageNet 1k dataset with RandAugment2 recipe. |
| efficientnet_b1_ft_imagenet | EfficientNet | 7.79M | EfficientNet B1 model fine-tuned on the ImageNet 1k dataset. |
| efficientnet_b1_ra4_e3600_r240_imagenet | EfficientNet | 7.79M | EfficientNet B1 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and 'ResNet Strikes Back'. |
| efficientnet_b2_ra_imagenet | EfficientNet | 9.11M | EfficientNet B2 model pre-trained on the ImageNet 1k dataset with RandAugment recipe. |
| efficientnet_el_ra_imagenet | EfficientNet | 10.59M | EfficientNet-EdgeTPU Large model trained on the ImageNet 1k dataset with RandAugment recipe. |
| efficientnet_b3_ra2_imagenet | EfficientNet | 12.23M | EfficientNet B3 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe. |
| efficientnet2_rw_t_ra2_imagenet | EfficientNet | 13.65M | EfficientNet-v2 Tiny model trained on the ImageNet 1k dataset with RandAugment2 recipe. |
| efficientnet_b4_ra2_imagenet | EfficientNet | 19.34M | EfficientNet B4 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe. |
| efficientnet2_rw_s_ra2_imagenet | EfficientNet | 23.94M | EfficientNet-v2 Small model trained on the ImageNet 1k dataset with RandAugment2 recipe. |
| efficientnet_b5_sw_imagenet | EfficientNet | 30.39M | EfficientNet B5 model pre-trained on the ImageNet 12k dataset by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes). |
| efficientnet_b5_sw_ft_imagenet | EfficientNet | 30.39M | EfficientNet B5 model pre-trained on the ImageNet 12k dataset and fine-tuned on ImageNet-1k by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes). |
| efficientnet2_rw_m_agc_imagenet | EfficientNet | 53.24M | EfficientNet-v2 Medium model trained on the ImageNet 1k dataset with adaptive gradient clipping. |
| electra_small_discriminator_uncased_en | Electra | 13.55M | 12-layer small ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| electra_small_generator_uncased_en | Electra | 13.55M | 12-layer small ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| electra_base_generator_uncased_en | Electra | 33.58M | 12-layer base ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| electra_large_generator_uncased_en | Electra | 51.07M | 24-layer large ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| electra_base_discriminator_uncased_en | Electra | 109.48M | 12-layer base ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| electra_large_discriminator_uncased_en | Electra | 335.14M | 24-layer large ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus. |
| esm2_t6_8M | ESM | 7.41M | 6 transformer layers version of the ESM-2 protein language model, trained on the UniRef50 clustered protein sequence dataset. |
| esm2_t12_35M | ESM | 33.27M | 12 transformer layers version of the ESM-2 protein language model, trained on the UniRef50 clustered protein sequence dataset. |
| esm2_t30_150M | ESM | 147.73M | 30 transformer layers version of the ESM-2 protein language model, trained on the UniRef50 clustered protein sequence dataset. |
| esm2_t33_650M | ESM | 649.40M | 33 transformer layers version of the ESM-2 protein language model, trained on the UniRef50 clustered protein sequence dataset. |
| f_net_base_en | FNet | 82.86M | 12-layer FNet model where case is maintained. Trained on the C4 dataset. |
| f_net_large_en | FNet | 236.95M | 24-layer FNet model where case is maintained. Trained on the C4 dataset. |
| falcon_refinedweb_1b_en | Falcon | 1.31B | 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of the RefinedWeb dataset. |
| vault_gemma_1b_en | Gemma | 1.04B | 1 billion parameter, 26-layer, VaultGemma model. |
| gemma_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, base Gemma model. |
| gemma_instruct_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. |
| gemma_1.1_instruct_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality. |
| code_gemma_1.1_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality. |
| code_gemma_2b_en | Gemma | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. |
| gemma2_2b_en | Gemma | 2.61B | 2 billion parameter, 26-layer, base Gemma model. |
| gemma2_instruct_2b_en | Gemma | 2.61B | 2 billion parameter, 26-layer, instruction tuned Gemma model. |
| shieldgemma_2b_en | Gemma | 2.61B | 2 billion parameter, 26-layer, ShieldGemma model. |
| c2s_scale_gemma_2_2b_en | Gemma | 2.61B | A 2 billion parameter, single-cell biology-aware model built on the Gemma-2 architecture. |
| gemma_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, base Gemma model. |
| gemma_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. |
| gemma_1.1_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality. |
| code_gemma_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. |
| code_gemma_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. |
| code_gemma_1.1_instruct_7b_en | Gemma | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality. |
| gemma2_9b_en | Gemma | 9.24B | 9 billion parameter, 42-layer, base Gemma model. |
| gemma2_instruct_9b_en | Gemma | 9.24B | 9 billion parameter, 42-layer, instruction tuned Gemma model. |
| shieldgemma_9b_en | Gemma | 9.24B | 9 billion parameter, 42-layer, ShieldGemma model. |
| gemma2_27b_en | Gemma | 27.23B | 27 billion parameter, 42-layer, base Gemma model. |
| gemma2_instruct_27b_en | Gemma | 27.23B | 27 billion parameter, 42-layer, instruction tuned Gemma model. |
| shieldgemma_27b_en | Gemma | 27.23B | 27 billion parameter, 42-layer, ShieldGemma model. |
| c2s_scale_gemma_2_27b_en | Gemma | 27.23B | A 27 billion parameter, single-cell biology-aware model built on the Gemma-2 architecture. |
| gemma3_270m | Gemma3 | 268.10M | 270 million parameter (170M embedding, 100M transformer parameters), 18-layer, text-only Gemma3 model designed for hyper-efficient AI, particularly for task-specific fine-tuning. |
| gemma3_instruct_270m | Gemma3 | 268.10M | 270 million parameter (170M embedding, 100M transformer parameters), 18-layer, text-only, instruction-tuned Gemma3 model designed for hyper-efficient AI, particularly for task-specific fine-tuning. |
| gemma3_1b | Gemma3 | 999.89M | 1 billion parameter, 26-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_1b | Gemma3 | 999.89M | 1 billion parameter, 26-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_4b_text | Gemma3 | 3.88B | 4 billion parameter, 34-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_4b_text | Gemma3 | 3.88B | 4 billion parameter, 34-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_4b | Gemma3 | 4.30B | 4 billion parameter, 34-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_4b | Gemma3 | 4.30B | 4 billion parameter, 34-layer, vision+text instruction-tuned Gemma3 model. |
| gemma3_12b_text | Gemma3 | 11.77B | 12 billion parameter, 48-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_12b_text | Gemma3 | 11.77B | 12 billion parameter, 48-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_12b | Gemma3 | 12.19B | 12 billion parameter, 48-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_12b | Gemma3 | 12.19B | 12 billion parameter, 48-layer, vision+text instruction-tuned Gemma3 model. |
| gemma3_27b_text | Gemma3 | 27.01B | 27 billion parameter, 62-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_27b_text | Gemma3 | 27.01B | 27 billion parameter, 62-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_27b | Gemma3 | 27.43B | 27 billion parameter, 62-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_27b | Gemma3 | 27.43B | 27 billion parameter, 62-layer, vision+text instruction-tuned Gemma3 model. |
| gpt2_base_en | GPT2 | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_base_en_cnn_dailymail | GPT2 | 124.44M | 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset. |
| gpt2_medium_en | GPT2 | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_large_en | GPT2 | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_extra_large_en | GPT2 | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. |
| hgnetv2_b4_ssld_stage2_ft_in1k | HGNetV2 | 13.60M | HGNetV2 B4 model with 2-stage SSLD training, fine-tuned on ImageNet-1K. |
| hgnetv2_b5_ssld_stage1_in22k_in1k | HGNetV2 | 33.42M | HGNetV2 B5 model with 1-stage SSLD training, pre-trained on ImageNet-22K and fine-tuned on ImageNet-1K. |
| hgnetv2_b5_ssld_stage2_ft_in1k | HGNetV2 | 33.42M | HGNetV2 B5 model with 2-stage SSLD training, fine-tuned on ImageNet-1K. |
| hgnetv2_b6_ssld_stage1_in22k_in1k | HGNetV2 | 69.18M | HGNetV2 B6 model with 1-stage SSLD training, pre-trained on ImageNet-22K and fine-tuned on ImageNet-1K. |
| hgnetv2_b6_ssld_stage2_ft_in1k | HGNetV2 | 69.18M | HGNetV2 B6 model with 2-stage SSLD training, fine-tuned on ImageNet-1K. |
| llama2_7b_en | Llama | 6.74B | 7 billion parameter, 32-layer, base LLaMA 2 model. |
| llama2_instruct_7b_en | Llama | 6.74B | 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model. |
| vicuna_1.5_7b_en | Llama | 6.74B | 7 billion parameter, 32-layer, instruction tuned Vicuna v1.5 model. |
| llama2_7b_en_int8 | Llama | 6.74B | 7 billion parameter, 32-layer, base LLaMA 2 model with activation and weights quantized to int8. |
| llama2_instruct_7b_en_int8 | Llama | 6.74B | 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model with activation and weights quantized to int8. |
| llama3.2_1b | Llama3 | 1.50B | 1 billion parameter, 16-layer, base LLaMA 3.2 model. |
| llama3.2_instruct_1b | Llama3 | 1.50B | 1 billion parameter, 16-layer, instruction tuned LLaMA 3.2 model. |
| llama3.2_guard_1b | Llama3 | 1.50B | 1 billion parameter, 16-layer, base LLaMA 3.2 model fine-tuned for content safety classification. |
| llama3.2_3b | Llama3 | 3.61B | 3 billion parameter, 28-layer, base LLaMA 3.2 model. |
| llama3.2_instruct_3b | Llama3 | 3.61B | 3 billion parameter, 28-layer, instruction tuned LLaMA 3.2 model. |
| llama3_8b_en | Llama3 | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3 model. |
| llama3_instruct_8b_en | Llama3 | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model. |
| llama3.1_8b | Llama3 | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3.1 model. |
| llama3.1_instruct_8b | Llama3 | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3.1 model. |
| llama3.1_guard_8b | Llama3 | 8.03B | 8 billion parameter, 32-layer, LLaMA 3.1 model fine-tuned for content safety classification. |
| llama3_8b_en_int8 | Llama3 | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3 model with activation and weights quantized to int8. |
| llama3_instruct_8b_en_int8 | Llama3 | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model with activation and weights quantized to int8. |
| mistral_7b_en | Mistral | 7.24B | Mistral 7B base model |
| mistral_instruct_7b_en | Mistral | 7.24B | Mistral 7B instruct model |
| mistral_0.2_instruct_7b_en | Mistral | 7.24B | Mistral 7B instruct version 0.2 model |
| mistral_0.3_7b_en | Mistral | 7.25B | Mistral 7B base version 0.3 model |
| mistral_0.3_instruct_7b_en | Mistral | 7.25B | Mistral 7B instruct version 0.3 model |
| mit_b0_ade20k_512 | MiT | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
| mit_b0_cityscapes_1024 | MiT | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. |
| mit_b1_ade20k_512 | MiT | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
| mit_b1_cityscapes_1024 | MiT | 13.16M | MiT (MixTransformer) model with 8 transformer blocks. |
| mit_b2_ade20k_512 | MiT | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
| mit_b2_cityscapes_1024 | MiT | 24.20M | MiT (MixTransformer) model with 16 transformer blocks. |
| mit_b3_ade20k_512 | MiT | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
| mit_b3_cityscapes_1024 | MiT | 44.08M | MiT (MixTransformer) model with 28 transformer blocks. |
| mit_b4_ade20k_512 | MiT | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
| mit_b4_cityscapes_1024 | MiT | 60.85M | MiT (MixTransformer) model with 41 transformer blocks. |
| mit_b5_ade20k_640 | MiT | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
| mit_b5_cityscapes_1024 | MiT | 81.45M | MiT (MixTransformer) model with 52 transformer blocks. |
| mixtral_8_7b_en | Mixtral | 46.70B | 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer. |
| mixtral_8_instruct_7b_en | Mixtral | 46.70B | Instruction fine-tuned 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer. |
| mobilenet_v3_small_050_imagenet | MobileNet | 278.78K | Small Mobilenet V3 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. Has half channel multiplier. |
| mobilenet_v3_small_100_imagenet | MobileNet | 939.12K | Small Mobilenet V3 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. Has baseline channel multiplier. |
| mobilenet_v3_large_100_imagenet | MobileNet | 3.00M | Large Mobilenet V3 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. Has baseline channel multiplier. |
| mobilenet_v3_large_100_imagenet_21k | MobileNet | 3.00M | Large Mobilenet V3 model pre-trained on the ImageNet 21k dataset at a 224x224 resolution. Has baseline channel multiplier. |
| mobilenetv5_300m_enc_gemma3n | - | 294.28M | Lightweight 300M-parameter convolutional vision encoder used as the image backbone for Gemma 3n. |
| moonshine_tiny_en | Moonshine | 27.09M | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| moonshine_base_en | Moonshine | 61.51M | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| opt_125m_en | OPT | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_1.3b_en | OPT | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_2.7b_en | OPT | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_6.7b_en | OPT | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| pali_gemma_3b_mix_224 | PaliGemma | 2.92B | Image size 224, mix fine-tuned, text sequence length of 256. |
| pali_gemma_3b_224 | PaliGemma | 2.92B | Image size 224, pre-trained, text sequence length of 128. |
| pali_gemma_3b_mix_448 | PaliGemma | 2.92B | Image size 448, mix fine-tuned, text sequence length of 512. |
| pali_gemma_3b_448 | PaliGemma | 2.92B | Image size 448, pre-trained, text sequence length of 512. |
| pali_gemma_3b_896 | PaliGemma | 2.93B | Image size 896, pre-trained, text sequence length of 512. |
| pali_gemma2_mix_3b_224 | PaliGemma | 3.03B | 3 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_3b_224 | PaliGemma | 3.03B | 3 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma_2_ft_docci_3b_448 | PaliGemma | 3.03B | 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
| pali_gemma2_mix_3b_448 | PaliGemma | 3.03B | 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_3b_448 | PaliGemma | 3.03B | 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_3b_896 | PaliGemma | 3.04B | 3 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_mix_10b_224 | PaliGemma | 9.66B | 10 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_10b_224 | PaliGemma | 9.66B | 10 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_ft_docci_10b_448 | PaliGemma | 9.66B | 10 billion parameter, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
| pali_gemma2_mix_10b_448 | PaliGemma | 9.66B | 10 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_10b_448 | PaliGemma | 9.66B | 10 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_10b_896 | PaliGemma | 9.67B | 10 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_mix_28b_224 | PaliGemma | 27.65B | 28 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_mix_28b_448 | PaliGemma | 27.65B | 28 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_28b_224 | PaliGemma | 27.65B | 28 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_28b_448 | PaliGemma | 27.65B | 28 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_28b_896 | PaliGemma | 27.65B | 28 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| parseq | - | 23.83M | Permuted autoregressive sequence (PARSeq) base model for scene text recognition. |
| phi3_mini_4k_instruct_en | Phi3 | 3.82B | 3.8 billion parameters, 32 layers, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
| phi3_mini_128k_instruct_en | Phi3 | 3.82B | 3.8 billion parameters, 32 layers, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
| qwen2.5_0.5b_en | Qwen | 494.03M | 24-layer Qwen model with 0.5 billion parameters. |
| qwen2.5_instruct_0.5b_en | Qwen | 494.03M | Instruction fine-tuned 24-layer Qwen model with 0.5 billion parameters. |
| qwen2.5_3b_en | Qwen | 3.09B | 36-layer Qwen model with 3.1 billion parameters. |
| qwen2.5_7b_en | Qwen | 6.99B | 48-layer Qwen model with 7 billion parameters. |
| qwen2.5_instruct_32b_en | Qwen | 32.76B | Instruction fine-tuned 64-layer Qwen model with 32 billion parameters. |
| qwen2.5_instruct_72b_en | Qwen | 72.71B | Instruction fine-tuned 80-layer Qwen model with 72 billion parameters. |
| qwen3_0.6b_en | Qwen3 | 596.05M | 28-layer Qwen3 model with 596M parameters, optimized for efficiency and fast inference on resource-constrained devices. |
| qwen3_1.7b_en | Qwen3 | 1.72B | 28-layer Qwen3 model with 1.72B parameters, offering a good balance between performance and resource usage. |
| qwen3_4b_en | Qwen3 | 4.02B | 36-layer Qwen3 model with 4.02B parameters, offering improved reasoning capabilities and better performance than smaller variants. |
| qwen3_8b_en | Qwen3 | 8.19B | 36-layer Qwen3 model with 8.19B parameters, featuring enhanced reasoning, coding, and instruction-following capabilities. |
| qwen3_14b_en | Qwen3 | 14.77B | 40-layer Qwen3 model with 14.77B parameters, featuring advanced reasoning, coding, and multilingual capabilities. |
| qwen3_32b_en | Qwen3 | 32.76B | 64-layer Qwen3 model with 32.76B parameters, featuring state-of-the-art performance across reasoning, coding, and general language tasks. |
| qwen3_moe_30b_a3b_en | - | 30.53B | Mixture-of-Experts (MoE) model with 30.5 billion total parameters (3.3 billion activated), built on 48 layers with 32 query and 4 key/value attention heads and 128 experts (8 active). |
| qwen3_moe_235b_a22b_en | - | 235.09B | Mixture-of-Experts (MoE) model with 235 billion total parameters (22 billion activated), built on 94 layers with 64 query and 4 key/value attention heads and 128 experts (8 active). |
| qwen1.5_moe_2.7b_en | QwenMoe | 14.32B | 24-layer Qwen MoE model with 2.7 billion active parameters and 8 experts per MoE layer. |
| resnet_18_imagenet | ResNet | 11.19M | 18-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_18_imagenet | ResNet | 11.72M | 18-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_34_imagenet | ResNet | 21.84M | 34-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_50_imagenet | ResNet | 23.56M | 50-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_v2_50_imagenet | ResNet | 23.56M | 50-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_50_imagenet | ResNet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_50_ssld_imagenet | ResNet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
| resnet_vd_50_ssld_v2_imagenet | ResNet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation and AutoAugment. |
| resnet_vd_50_ssld_v2_fix_imagenet | ResNet | 25.63M | 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation, AutoAugment and additional fine-tuning of the classification head. |
| resnet_101_imagenet | ResNet | 42.61M | 101-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_v2_101_imagenet | ResNet | 42.61M | 101-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_101_imagenet | ResNet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_101_ssld_imagenet | ResNet | 44.67M | 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation. |
| resnet_152_imagenet | ResNet | 58.30M | 152-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_152_imagenet | ResNet | 60.36M | 152-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| resnet_vd_200_imagenet | ResNet | 74.93M | 200-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| retinanet_resnet50_fpn_v2_coco | RetinaNet | 31.56M | RetinaNet model with ResNet50 backbone fine-tuned on COCO in 800x800 resolution with FPN features created from P5 level. |
| retinanet_resnet50_fpn_coco | RetinaNet | 34.12M | RetinaNet model with ResNet50 backbone fine-tuned on COCO in 800x800 resolution. |
| roberta_base_en | Roberta | 124.05M | 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. |
| roberta_large_en | Roberta | 354.31M | 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. |
| sam_base_sa1b | Segment Anything Model | 93.74M | The base SAM model trained on the SA1B dataset. |
| sam_huge_sa1b | Segment Anything Model | 312.34M | The huge SAM model trained on the SA1B dataset. |
| sam_large_sa1b | Segment Anything Model | 641.09M | The large SAM model trained on the SA1B dataset. |
| siglip_base_patch16_224 | SigLIP | 203.16M | 200 million parameter, image size 224, pre-trained on WebLi. |
| siglip_base_patch16_256 | SigLIP | 203.20M | 200 million parameter, image size 256, pre-trained on WebLi. |
| siglip_base_patch16_384 | SigLIP | 203.45M | 200 million parameter, image size 384, pre-trained on WebLi. |
| siglip_base_patch16_512 | SigLIP | 203.79M | 200 million parameter, image size 512, pre-trained on WebLi. |
| siglip_base_patch16_256_multilingual | SigLIP | 370.63M | 370 million parameter, image size 256, pre-trained on WebLi. |
| siglip2_base_patch16_224 | SigLIP | 375.19M | 375 million parameter, patch size 16, image size 224, pre-trained on WebLi. |
| siglip2_base_patch16_256 | SigLIP | 375.23M | 375 million parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_base_patch32_256 | SigLIP | 376.86M | 376 million parameter, patch size 32, image size 256, pre-trained on WebLi. |
| siglip2_base_patch16_384 | SigLIP | 376.86M | 376 million parameter, patch size 16, image size 384, pre-trained on WebLi. |
| siglip_large_patch16_256 | SigLIP | 652.15M | 652 million parameter, image size 256, pre-trained on WebLi. |
| siglip_large_patch16_384 | SigLIP | 652.48M | 652 million parameter, image size 384, pre-trained on WebLi. |
| siglip_so400m_patch14_224 | SigLIP | 877.36M | 877 million parameter, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip_so400m_patch14_384 | SigLIP | 877.96M | 877 million parameter, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_large_patch16_256 | SigLIP | 881.53M | 881 million parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_large_patch16_384 | SigLIP | 881.86M | 881 million parameter, patch size 16, image size 384, pre-trained on WebLi. |
| siglip2_large_patch16_512 | SigLIP | 882.31M | 882 million parameter, patch size 16, image size 512, pre-trained on WebLi. |
| siglip_so400m_patch16_256_i18n | SigLIP | 1.13B | 1.1 billion parameter, image size 256, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch14_224 | SigLIP | 1.14B | 1.1 billion parameter, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_256 | SigLIP | 1.14B | 1.1 billion parameter, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch14_384 | SigLIP | 1.14B | 1.1 billion parameter, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_384 | SigLIP | 1.14B | 1.1 billion parameter, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi. |
| siglip2_so400m_patch16_512 | SigLIP | 1.14B | 1.1 billion parameter, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_256 | SigLIP | 1.87B | 1.8 billion parameter, patch size 16, image size 256, pre-trained on WebLi. |
| siglip2_giant_opt_patch16_384 | SigLIP | 1.87B | 1.8 billion parameter, patch size 16, image size 384, pre-trained on WebLi. |
| stable_diffusion_3_medium | Stable Diffusion 3 | 2.99B | 3 billion parameter model, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
| stable_diffusion_3.5_medium | Stable Diffusion 3 | 3.37B | 3 billion parameter model, including CLIP L and CLIP G text encoders, MMDiT-X generative model, and VAE autoencoder. Developed by Stability AI. |
| stable_diffusion_3.5_large | Stable Diffusion 3 | 9.05B | 9 billion parameter model, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
| stable_diffusion_3.5_large_turbo | Stable Diffusion 3 | 9.05B | 9 billion parameter model, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
| t5_small_multi | T5 | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_base_multi | T5 | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_large_multi | T5 | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_small_multi | T5 | 0 | 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_base_multi | T5 | 0 | 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| flan_large_multi | T5 | 0 | 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4). |
| t5_1.1_small | T5 | 60.51M | |
| t5_1.1_base | T5 | 247.58M | |
| t5_1.1_large | T5 | 750.25M | |
| t5_1.1_xl | T5 | 2.85B | |
| t5_1.1_xxl | T5 | 11.14B | |
| t5gemma_s_s_ul2 | T5Gemma | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model. |
| t5gemma_s_s_prefixlm | T5Gemma | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model. |
| t5gemma_s_s_ul2_it | T5Gemma | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_s_s_prefixlm_it | T5Gemma | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_b_b_ul2 | T5Gemma | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model. |
| t5gemma_b_b_prefixlm | T5Gemma | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model. |
| t5gemma_b_b_ul2_it | T5Gemma | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_b_b_prefixlm_it | T5Gemma | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_l_l_ul2 | T5Gemma | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model. |
| t5gemma_l_l_prefixlm | T5Gemma | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model. |
| t5gemma_l_l_ul2_it | T5Gemma | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_l_l_prefixlm_it | T5Gemma | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_ml_ml_ul2 | T5Gemma | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model. |
| t5gemma_ml_ml_prefixlm | T5Gemma | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model. |
| t5gemma_ml_ml_ul2_it | T5Gemma | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_ml_ml_prefixlm_it | T5Gemma | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_xl_xl_ul2 | T5Gemma | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model. |
| t5gemma_xl_xl_prefixlm | T5Gemma | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model. |
| t5gemma_xl_xl_ul2_it | T5Gemma | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_xl_xl_prefixlm_it | T5Gemma | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_2b_2b_ul2 | T5Gemma | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_2b_2b_prefixlm | T5Gemma | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_2b_2b_ul2_it | T5Gemma | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_2b_2b_prefixlm_it | T5Gemma | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_2b_ul2 | T5Gemma | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_2b_prefixlm | T5Gemma | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_2b_ul2_it | T5Gemma | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_2b_prefixlm_it | T5Gemma | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_9b_ul2 | T5Gemma | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_9b_prefixlm | T5Gemma | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_9b_ul2_it | T5Gemma | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_9b_prefixlm_it | T5Gemma | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| vgg_11_imagenet | VGG | 9.22M | 11-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| vgg_13_imagenet | VGG | 9.40M | 13-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| vgg_16_imagenet | VGG | 14.71M | 16-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| vgg_19_imagenet | VGG | 20.02M | 19-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution. |
| vit_base_patch16_224_imagenet | ViT | 85.80M | ViT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
| vit_base_patch16_224_imagenet21k | ViT | 85.80M | ViT-B16 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
| vit_base_patch16_384_imagenet | ViT | 86.09M | ViT-B16 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
| vit_base_patch32_224_imagenet21k | ViT | 87.46M | ViT-B32 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
| vit_base_patch32_384_imagenet | ViT | 87.53M | ViT-B32 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
| vit_large_patch16_224_imagenet | ViT | 303.30M | ViT-L16 model pre-trained on the ImageNet 1k dataset with image resolution of 224x224 |
| vit_large_patch16_224_imagenet21k | ViT | 303.30M | ViT-L16 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
| vit_large_patch16_384_imagenet | ViT | 303.69M | ViT-L16 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
| vit_large_patch32_224_imagenet21k | ViT | 305.51M | ViT-L32 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
| vit_large_patch32_384_imagenet | ViT | 305.61M | ViT-L32 model pre-trained on the ImageNet 1k dataset with image resolution of 384x384 |
| vit_huge_patch14_224_imagenet21k | ViT | 630.76M | ViT-H14 backbone pre-trained on the ImageNet 21k dataset with image resolution of 224x224 |
| whisper_tiny_en | Whisper | 37.18M | 4-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_tiny_multi | Whisper | 37.76M | 4-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_base_multi | Whisper | 72.59M | 6-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_base_en | Whisper | 124.44M | 6-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_small_en | Whisper | 241.73M | 12-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_small_multi | Whisper | 241.73M | 12-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_medium_en | Whisper | 763.86M | 24-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_medium_multi | Whisper | 763.86M | 24-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_large_multi | Whisper | 1.54B | 32-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_large_multi_v2 | Whisper | 1.54B | 32-layer Whisper model. Trained for 2.5 epochs on 680,000 hours of labelled multilingual speech data. An improved version of whisper_large_multi. |
| xception_41_imagenet | Xception | 20.86M | 41-layer Xception model pre-trained on ImageNet 1k. |
| xlm_roberta_base_multi | XLMRoberta | 277.45M | 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. |
| xlm_roberta_large_multi | XLMRoberta | 558.84M | 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. |
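Vision presets work the same way. As a minimal sketch (the preset choice, the random input batch, and the 224x224 size are assumptions for illustration), an image classifier can be constructed from one of the ResNet presets above and run on a batch of images:

```python
import numpy as np
import keras_hub

# Load an image classification task from a preset listed above.
classifier = keras_hub.models.ImageClassifier.from_preset("resnet_50_imagenet")

# Classify a dummy batch of 224x224 RGB images (the resolution noted in the table).
images = np.random.uniform(0, 255, size=(2, 224, 224, 3)).astype("float32")
predictions = classifier.predict(images)
print(predictions.shape)  # (2, num_classes)
```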