
KerasHub pretrained models

Below, we list all presets available in the KerasHub library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.

Each preset name below corresponds to a config and weights for a pretrained model. The from_preset() constructor on any task, preprocessor, backbone, or tokenizer class can be used to create a model from a saved preset, as shown below.

backbone = keras_hub.models.Backbone.from_preset("bert_base_en")
tokenizer = keras_hub.models.Tokenizer.from_preset("bert_base_en")
classifier = keras_hub.models.TextClassifier.from_preset("bert_base_en", num_classes=2)
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset("bert_base_en")
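
As a minimal sketch of how these pieces fit together (the preset name is taken from the table below, and the example sentences are made up for illustration), a task model can be loaded with its preprocessor bundled in and run directly on raw strings:

import keras_hub

# Load a sentiment classifier preset; weights are downloaded on first use.
classifier = keras_hub.models.TextClassifier.from_preset("bert_tiny_en_uncased_sst2")
# Task models include their preprocessor, so raw strings can be passed directly.
scores = classifier.predict(["What an amazing movie!", "A total waste of my time."])
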
Preset Model API Parameters Description
albert_base_en_uncased Albert 11.68M 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
albert_large_en_uncased Albert 17.68M 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
albert_extra_large_en_uncased Albert 58.72M 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
albert_extra_extra_large_en_uncased Albert 222.60M 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bart_base_en Bart 139.42M 6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl.
bart_large_en Bart 406.29M 12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl.
bart_large_en_cnn Bart 406.29M The bart_large_en backbone model fine-tuned on the CNN+DM summarization dataset.
bert_tiny_en_uncased Bert 4.39M 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bert_tiny_en_uncased_sst2 Bert 4.39M The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset.
bert_small_en_uncased Bert 28.76M 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bert_medium_en_uncased Bert 41.37M 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bert_base_zh Bert 102.27M 12-layer BERT model. Trained on Chinese Wikipedia.
bert_base_en Bert 108.31M 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus.
bert_base_en_uncased Bert 109.48M 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bert_base_multi Bert 177.85M 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages.
bert_large_en Bert 333.58M 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus.
bert_large_en_uncased Bert 335.14M 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus.
bloom_560m_multi Bloom 559.21M 24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages.
bloomz_560m_multi Bloom 559.21M 24-layer Bloom model with hidden dimension of 1024. Finetuned on the crosslingual task mixture (xP3) dataset.
bloom_1.1b_multi Bloom 1.07B 24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages.
bloomz_1.1b_multi Bloom 1.07B 24-layer Bloom model with hidden dimension of 1536. Finetuned on the crosslingual task mixture (xP3) dataset.
bloom_1.7b_multi Bloom 1.72B 24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages.
bloomz_1.7b_multi Bloom 1.72B 24-layer Bloom model with hidden dimension of 2048. Finetuned on the crosslingual task mixture (xP3) dataset.
bloom_3b_multi Bloom 3.00B 30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages.
bloomz_3b_multi Bloom 3.00B 30-layer Bloom model with hidden dimension of 2560. Finetuned on the crosslingual task mixture (xP3) dataset.
clip_vit_base_patch16 - 149.62M 150 million parameter, 12-layer for vision and 12-layer for text, patch size of 16, CLIP model.
clip_vit_base_patch32 - 151.28M 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, CLIP model.
clip_vit_b_32_laion2b_s34b_b79k - 151.28M 151 million parameter, 12-layer for vision and 12-layer for text, patch size of 32, Open CLIP model.
clip_vit_large_patch14 - 427.62M 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, CLIP model.
clip_vit_large_patch14_336 - 427.94M 428 million parameter, 24-layer for vision and 12-layer for text, patch size of 14, image size of 336, CLIP model.
clip_vit_h_14_laion2b_s32b_b79k - 986.11M 986 million parameter, 32-layer for vision and 24-layer for text, patch size of 14, Open CLIP model.
clip_vit_g_14_laion2b_s12b_b42k - 1.37B 1.4 billion parameter, 40-layer for vision and 24-layer for text, patch size of 14, Open CLIP model.
clip_vit_bigg_14_laion2b_39b_b160k - 2.54B 2.5 billion parameter, 48-layer for vision and 32-layer for text, patch size of 14, Open CLIP model.
deberta_v3_extra_small_en DebertaV3 70.68M 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.
deberta_v3_small_en DebertaV3 141.30M 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.
deberta_v3_base_en DebertaV3 183.83M 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.
deberta_v3_base_multi DebertaV3 278.22M 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset.
deberta_v3_large_en DebertaV3 434.01M 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText.
deeplab_v3_plus_resnet50_pascalvoc DeepLabV3 39.19M DeepLabV3+ model with a ResNet50 image encoder, trained on the Pascal VOC dataset augmented with the Semantic Boundaries Dataset (SBD). Achieves 90.01 categorical accuracy and 0.63 mean IoU.
densenet_121_imagenet DenseNet 7.04M 121-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
densenet_169_imagenet DenseNet 12.64M 169-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
densenet_201_imagenet DenseNet 18.32M 201-layer DenseNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
distil_bert_base_en DistilBert 65.19M 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model.
distil_bert_base_en_uncased DistilBert 66.36M 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model.
distil_bert_base_multi DistilBert 134.73M 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages.
efficientnet_lite0_ra_imagenet - 4.65M EfficientNet-Lite model trained on the ImageNet 1k dataset with RandAugment recipe.
efficientnet_b0_ra_imagenet - 5.29M EfficientNet B0 model pre-trained on the ImageNet 1k dataset with RandAugment recipe.
efficientnet_b0_ra4_e3600_r224_imagenet - 5.29M EfficientNet B0 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and "ResNet Strikes Back".
efficientnet_es_ra_imagenet - 5.44M EfficientNet-EdgeTPU Small model trained on the ImageNet 1k dataset with RandAugment recipe.
efficientnet_em_ra2_imagenet - 6.90M EfficientNet-EdgeTPU Medium model trained on the ImageNet 1k dataset with RandAugment2 recipe.
efficientnet_b1_ft_imagenet - 7.79M EfficientNet B1 model fine-tuned on the ImageNet 1k dataset.
efficientnet_b1_ra4_e3600_r240_imagenet - 7.79M EfficientNet B1 model pre-trained on the ImageNet 1k dataset by Ross Wightman. Trained with timm scripts using hyper-parameters inspired by the MobileNet-V4 small, mixed with go-to hparams from timm and "ResNet Strikes Back".
efficientnet_b2_ra_imagenet - 9.11M EfficientNet B2 model pre-trained on the ImageNet 1k dataset with RandAugment recipe.
efficientnet_el_ra_imagenet - 10.59M EfficientNet-EdgeTPU Large model trained on the ImageNet 1k dataset with RandAugment recipe.
efficientnet_b3_ra2_imagenet - 12.23M EfficientNet B3 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe.
efficientnet_b4_ra2_imagenet - 19.34M EfficientNet B4 model pre-trained on the ImageNet 1k dataset with RandAugment2 recipe.
efficientnet_b5_sw_imagenet - 30.39M EfficientNet B5 model pre-trained on the ImageNet 12k dataset by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes).
efficientnet_b5_sw_ft_imagenet - 30.39M EfficientNet B5 model pre-trained on the ImageNet 12k dataset and fine-tuned on ImageNet-1k by Ross Wightman. Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes).
electra_small_discriminator_uncased_en Electra 13.55M 12-layer small ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
electra_small_generator_uncased_en Electra 13.55M 12-layer small ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
electra_base_generator_uncased_en Electra 33.58M 12-layer base ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
electra_large_generator_uncased_en Electra 51.07M 24-layer large ELECTRA generator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
electra_base_discriminator_uncased_en Electra 109.48M 12-layer base ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
electra_large_discriminator_uncased_en Electra 335.14M 24-layer large ELECTRA discriminator model. All inputs are lowercased. Trained on English Wikipedia + BooksCorpus.
f_net_base_en FNet 82.86M 12-layer FNet model where case is maintained. Trained on the C4 dataset.
f_net_large_en FNet 236.95M 24-layer FNet model where case is maintained. Trained on the C4 dataset.
falcon_refinedweb_1b_en Falcon 1.31B 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of the RefinedWeb dataset.
schnell - 124.44M A 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
gemma_2b_en Gemma 2.51B 2 billion parameter, 18-layer, base Gemma model.
gemma_instruct_2b_en Gemma 2.51B 2 billion parameter, 18-layer, instruction tuned Gemma model.
gemma_1.1_instruct_2b_en Gemma 2.51B 2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality.
code_gemma_1.1_2b_en Gemma 2.51B 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality.
code_gemma_2b_en Gemma 2.51B 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion.
gemma2_2b_en Gemma 2.61B 2 billion parameter, 26-layer, base Gemma model.
gemma2_instruct_2b_en Gemma 2.61B 2 billion parameter, 26-layer, instruction tuned Gemma model.
shieldgemma_2b_en Gemma 2.61B 2 billion parameter, 26-layer, ShieldGemma model.
gemma_7b_en Gemma 8.54B 7 billion parameter, 28-layer, base Gemma model.
gemma_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned Gemma model.
gemma_1.1_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality.
code_gemma_7b_en Gemma 8.54B 7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion.
code_gemma_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code.
code_gemma_1.1_instruct_7b_en Gemma 8.54B 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality.
gemma2_9b_en Gemma 9.24B 9 billion parameter, 42-layer, base Gemma model.
gemma2_instruct_9b_en Gemma 9.24B 9 billion parameter, 42-layer, instruction tuned Gemma model.
shieldgemma_9b_en Gemma 9.24B 9 billion parameter, 42-layer, ShieldGemma model.
gemma2_27b_en Gemma 27.23B 27 billion parameter, 42-layer, base Gemma model.
gemma2_instruct_27b_en Gemma 27.23B 27 billion parameter, 42-layer, instruction tuned Gemma model.
shieldgemma_27b_en Gemma 27.23B 27 billion parameter, 42-layer, ShieldGemma model.
gpt2_base_en GPT2 124.44M 12-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_base_en_cnn_dailymail GPT2 124.44M 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset.
gpt2_medium_en GPT2 354.82M 24-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_large_en GPT2 774.03M 36-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_extra_large_en GPT2 1.56B 48-layer GPT-2 model where case is maintained. Trained on WebText.
llama2_7b_en Llama 6.74B 7 billion parameter, 32-layer, base LLaMA 2 model.
llama2_instruct_7b_en Llama 6.74B 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model.
vicuna_1.5_7b_en Llama 6.74B 7 billion parameter, 32-layer, instruction tuned Vicuna v1.5 model.
llama2_7b_en_int8 Llama 6.74B 7 billion parameter, 32-layer, base LLaMA 2 model with activation and weights quantized to int8.
llama2_instruct_7b_en_int8 Llama 6.74B 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model with activation and weights quantized to int8.
llama3_8b_en Llama3 8.03B 8 billion parameter, 32-layer, base LLaMA 3 model.
llama3_instruct_8b_en Llama3 8.03B 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model.
llama3_8b_en_int8 Llama3 8.03B 8 billion parameter, 32-layer, base LLaMA 3 model with activation and weights quantized to int8.
llama3_instruct_8b_en_int8 Llama3 8.03B 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model with activation and weights quantized to int8.
mistral_7b_en Mistral 7.24B Mistral 7B base model.
mistral_instruct_7b_en Mistral 7.24B Mistral 7B instruct model.
mistral_0.2_instruct_7b_en Mistral 7.24B Mistral 7B instruct version 0.2 model.
mit_b0_ade20k_512 MiT 3.32M MiT (MixTransformer) model with 8 transformer blocks.
mit_b0_cityscapes_1024 MiT 3.32M MiT (MixTransformer) model with 8 transformer blocks.
mit_b1_ade20k_512 MiT 13.16M MiT (MixTransformer) model with 8 transformer blocks.
mit_b1_cityscapes_1024 MiT 13.16M MiT (MixTransformer) model with 8 transformer blocks.
mit_b2_ade20k_512 MiT 24.20M MiT (MixTransformer) model with 16 transformer blocks.
mit_b2_cityscapes_1024 MiT 24.20M MiT (MixTransformer) model with 16 transformer blocks.
mit_b3_ade20k_512 MiT 44.08M MiT (MixTransformer) model with 28 transformer blocks.
mit_b3_cityscapes_1024 MiT 44.08M MiT (MixTransformer) model with 28 transformer blocks.
mit_b4_ade20k_512 MiT 60.85M MiT (MixTransformer) model with 41 transformer blocks.
mit_b4_cityscapes_1024 MiT 60.85M MiT (MixTransformer) model with 41 transformer blocks.
mit_b5_ade20k_640 MiT 81.45M MiT (MixTransformer) model with 52 transformer blocks.
mit_b5_cityscapes_1024 MiT 81.45M MiT (MixTransformer) model with 52 transformer blocks.
opt_125m_en OPT 125.24M 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_1.3b_en OPT 1.32B 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_2.7b_en OPT 2.70B 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_6.7b_en OPT 6.70B 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
pali_gemma_3b_mix_224 PaliGemma 2.92B Image size 224, mix fine-tuned, text sequence length of 256.
pali_gemma_3b_224 PaliGemma 2.92B Image size 224, pre-trained, text sequence length of 128.
pali_gemma_3b_mix_448 PaliGemma 2.92B Image size 448, mix fine-tuned, text sequence length of 512.
pali_gemma_3b_448 PaliGemma 2.92B Image size 448, pre-trained, text sequence length of 512.
pali_gemma_3b_896 PaliGemma 2.93B Image size 896, pre-trained, text sequence length of 512.
pali_gemma2_pt_3b_224 - 3.03B 3 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_3b_ft_docci_448 - 3.03B 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details.
pali_gemma2_pt_3b_448 - 3.03B 3 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_3b_896 - 3.04B 3 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_10b_224 - 9.66B 10 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_28b_224 - 9.66B 28 billion parameter, image size 224, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_10b_ft_docci_448 - 9.66B 10 billion parameter, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details.
pali_gemma2_pt_10b_448 - 9.66B 10 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_28b_448 - 9.66B 28 billion parameter, image size 448, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_10b_896 - 9.67B 10 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_28b_896 - 9.67B 28 billion parameter, image size 896, 27-layer SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
phi3_mini_4k_instruct_en Phi3 3.82B 3.8 billion parameters, 32 layers, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
phi3_mini_128k_instruct_en Phi3 3.82B 3.8 billion parameters, 32 layers, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
resnet_18_imagenet ResNet 11.19M 18-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_18_imagenet ResNet 11.72M 18-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_34_imagenet ResNet 21.84M 34-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_50_imagenet ResNet 23.56M 50-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_v2_50_imagenet ResNet 23.56M 50-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_50_imagenet ResNet 25.63M 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_50_ssld_imagenet ResNet 25.63M 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation.
resnet_vd_50_ssld_v2_imagenet ResNet 25.63M 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation and AutoAugment.
resnet_vd_50_ssld_v2_fix_imagenet ResNet 25.63M 50-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation, AutoAugment and additional fine-tuning of the classification head.
resnet_101_imagenet ResNet 42.61M 101-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_v2_101_imagenet ResNet 42.61M 101-layer ResNetV2 model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_101_imagenet ResNet 44.67M 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_101_ssld_imagenet ResNet 44.67M 101-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution with knowledge distillation.
resnet_152_imagenet ResNet 58.30M 152-layer ResNet model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_152_imagenet ResNet 60.36M 152-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
resnet_vd_200_imagenet ResNet 74.93M 200-layer ResNetVD (ResNet with bag of tricks) model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
retinanet_resnet50_fpn_coco - 34.12M RetinaNet model with ResNet50 backbone fine-tuned on COCO in 800x800 resolution.
roberta_base_en Roberta 124.05M 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText.
roberta_large_en Roberta 354.31M 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText.
sam_base_sa1b Segment Anything Model 93.74M The base SAM model trained on the SA1B dataset.
sam_huge_sa1b Segment Anything Model 312.34M The huge SAM model trained on the SA1B dataset.
sam_large_sa1b Segment Anything Model 641.09M The large SAM model trained on the SA1B dataset.
stable_diffusion_3_medium Stable Diffusion 3 2.99B 3 billion parameter, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI.
stable_diffusion_3.5_large Stable Diffusion 3 9.05B 9 billion parameter, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI.
stable_diffusion_3.5_large_turbo Stable Diffusion 3 9.05B 9 billion parameter, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI.
t5_small_multi T5 0 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
t5_base_multi T5 0 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
t5_large_multi T5 0 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
flan_small_multi T5 0 8-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
flan_base_multi T5 0 12-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
flan_large_multi T5 0 24-layer T5 model. Trained on the Colossal Clean Crawled Corpus (C4).
t5_1.1_small T5 60.51M
t5_1.1_base T5 247.58M
t5_1.1_large T5 750.25M
t5_1.1_xl T5 2.85B
t5_1.1_xxl T5 11.14B
vgg_11_imagenet VGG 9.22M 11-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
vgg_13_imagenet VGG 9.40M 13-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
vgg_16_imagenet VGG 14.71M 16-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
vgg_19_imagenet VGG 20.02M 19-layer VGG model pre-trained on the ImageNet 1k dataset at a 224x224 resolution.
whisper_tiny_en Whisper 37.18M 4-layer Whisper model. Trained on 438,000 hours of labelled English speech data.
whisper_tiny_multi Whisper 37.76M 4-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data.
whisper_base_multi Whisper 72.59M 6-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data.
whisper_base_en Whisper 124.44M 6-layer Whisper model. Trained on 438,000 hours of labelled English speech data.
whisper_small_en Whisper 241.73M 12-layer Whisper model. Trained on 438,000 hours of labelled English speech data.
whisper_small_multi Whisper 241.73M 12-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data.
whisper_medium_en Whisper 763.86M 24-layer Whisper model. Trained on 438,000 hours of labelled English speech data.
whisper_medium_multi Whisper 763.86M 24-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data.
whisper_large_multi Whisper 1.54B 32-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data.
whisper_large_multi_v2 Whisper 1.54B 32-layer Whisper model. Trained for 2.5 epochs on 680,000 hours of labelled multilingual speech data. An improved version of whisper_large_multi.
xlm_roberta_base_multi XLMRoberta 277.45M 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages.
xlm_roberta_large_multi XLMRoberta 558.84M 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages.
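
The same from_preset() pattern applies to generative presets in the table above. As a rough sketch (the prompt and max_length value below are arbitrary), a GPT-2 causal language model can be loaded and sampled from:

import keras_hub

# Resolve the preset to its matching causal LM task (GPT2CausalLM for this preset).
causal_lm = keras_hub.models.CausalLM.from_preset("gpt2_base_en")
# Generate a continuation of the prompt; output length is capped by max_length.
output = causal_lm.generate("The quick brown fox", max_length=64)
print(output)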