
KerasCV Models

KerasCV contains end-to-end implementations of popular model architectures. These models can be created in two ways:

  • Through the from_preset() constructor, which instantiates a model from a preset configuration and, optionally, pretrained weights. Available preset names are listed on this page.
```python
import keras_cv

model = keras_cv.models.RetinaNet.from_preset(
    "resnet50_v2_imagenet",
    num_classes=20,
    bounding_box_format="xywh",
)
```
  • Through a custom configuration. To do this, pass the desired configuration parameters directly to the constructors of the task and backbone classes documented below.
```python
import keras_cv

# Each stackwise_* list has one entry per ResNet stage.
backbone = keras_cv.models.ResNetBackbone(
    stackwise_filters=[64, 128, 256, 512],
    stackwise_blocks=[2, 2, 2, 2],
    stackwise_strides=[1, 2, 2, 2],
    include_rescaling=False,
)
model = keras_cv.models.RetinaNet(
    backbone=backbone,
    num_classes=20,
    bounding_box_format="xywh",
)
```
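The three stackwise_* arguments in the custom configuration describe the same stages, one entry per stage, so they must have equal lengths, and the strides determine how much each stage downsamples. The sketch below is a hypothetical helper (not part of KerasCV) that computes the spatial size of each stage's output. It assumes a standard ResNet stem (a stride-2 convolution followed by a stride-2 max pool, for a stem stride of 4) and "same" padding; treat both as assumptions about the backbone internals.

```python
import math


def resnet_feature_sizes(input_size, stackwise_strides, stem_stride=4):
    """Return the spatial size of each stage's output for a square input.

    Hypothetical helper, not a KerasCV API. Assumes a stem that downsamples
    by stem_stride and "same" padding, so a stride-s layer maps n to ceil(n / s).
    """
    size = math.ceil(input_size / stem_stride)
    sizes = []
    for stride in stackwise_strides:
        size = math.ceil(size / stride)
        sizes.append(size)
    return sizes


stackwise_filters = [64, 128, 256, 512]
stackwise_blocks = [2, 2, 2, 2]
stackwise_strides = [1, 2, 2, 2]

# One entry per stage, so the three lists must line up.
assert len(stackwise_filters) == len(stackwise_blocks) == len(stackwise_strides)

print(resnet_feature_sizes(224, stackwise_strides))  # [56, 28, 14, 7]
# Overall output stride: stem stride times the product of the stage strides.
print(4 * math.prod(stackwise_strides))  # 32
```

Keeping the first stage at stride 1 means the stem alone handles the early downsampling, and each later stage halves the resolution.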

Backbone presets

Each of the following preset names corresponds to a configuration and weights for a backbone model.

The names below can be used with the from_preset() constructor for the corresponding backbone model.

```python
import keras_cv
backbone = keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")
```
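Because from_preset() must be called on the backbone class that matches the preset, it can help to map preset names to classes. The sketch below is a hypothetical lookup (not a KerasCV API or an official registry) built from the preset prefixes in the backbone table on this page. It takes the longest matching prefix so that resnet50_v2_imagenet resolves to ResNetV2Backbone rather than ResNetBackbone.

```python
# Hypothetical mapping from preset-name prefixes to KerasCV backbone class names.
PRESET_PREFIX_TO_BACKBONE = {
    "csp_darknet": "CSPDarkNetBackbone",
    "densenet": "DenseNetBackbone",
    "efficientnetv2": "EfficientNetV2Backbone",
    "mit": "MiTBackbone",
    "mobilenet_v3": "MobileNetV3Backbone",
    "resnet50_v2": "ResNetV2Backbone",
    "resnet": "ResNetBackbone",
    "vitdet": "ViTDetBackbone",
    "yolo_v8": "YOLOV8Backbone",
}


def backbone_class_name(preset):
    """Return the backbone class name for a preset, using the longest matching prefix."""
    matches = [p for p in PRESET_PREFIX_TO_BACKBONE if preset.startswith(p)]
    if not matches:
        raise ValueError(f"Unknown backbone preset: {preset!r}")
    return PRESET_PREFIX_TO_BACKBONE[max(matches, key=len)]


print(backbone_class_name("resnet50_v2_imagenet"))  # ResNetV2Backbone
print(backbone_class_name("resnet50_imagenet"))  # ResNetBackbone
```

With KerasCV installed, the returned name could then be resolved with getattr(keras_cv.models, name).from_preset(preset).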

For brevity, we do not include the presets without pretrained weights in the following table.

Note: All pretrained weights should be used with unnormalized pixel intensities in the range [0, 255] if include_rescaling=True, or with intensities in the range [0, 1] if include_rescaling=False.
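To make the note concrete, here is a minimal sketch of preparing 8-bit images for a pretrained backbone. prepare_pixels is a hypothetical helper, not a KerasCV API. It relies on include_rescaling=True meaning that the model applies the 1/255 rescaling itself, so the caller passes raw intensities; with include_rescaling=False the caller does the rescaling.

```python
import numpy as np


def prepare_pixels(images, include_rescaling):
    """Convert uint8 images to the float pixel range a pretrained backbone expects.

    Hypothetical helper, not a KerasCV API.
    """
    images = np.asarray(images, dtype="float32")
    if include_rescaling:
        # The model rescales internally, so keep raw intensities in [0, 255].
        return images
    # The model does not rescale, so map intensities to [0, 1] here.
    return images / 255.0


batch = np.random.randint(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)
raw = prepare_pixels(batch, include_rescaling=True)
scaled = prepare_pixels(batch, include_rescaling=False)
```

Feeding [0, 1] inputs to a model built with include_rescaling=True (or the reverse) does not raise an error; it silently degrades accuracy, which is why the range is worth checking explicitly.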

| Preset name | Model | Parameters | Description |
|---|---|---|---|
| csp_darknet_tiny_imagenet | CSPDarkNet | 2.38M | CSPDarkNet model with [48, 96, 192, 384] channels and [1, 3, 3, 1] depths, where batch normalization and SiLU activation are applied after the convolution layers. Trained on the ImageNet 2012 classification task. |
| csp_darknet_l_imagenet | CSPDarkNet | 27.11M | CSPDarkNet model with [128, 256, 512, 1024] channels and [3, 9, 9, 3] depths, where batch normalization and SiLU activation are applied after the convolution layers. Trained on the ImageNet 2012 classification task. |
| densenet121_imagenet | DenseNet | Unknown | DenseNet model with 121 layers. Trained on the ImageNet 2012 classification task. |
| densenet169_imagenet | DenseNet | Unknown | DenseNet model with 169 layers. Trained on the ImageNet 2012 classification task. |
| densenet201_imagenet | DenseNet | Unknown | DenseNet model with 201 layers. Trained on the ImageNet 2012 classification task. |
| efficientnetv2_b0_imagenet | EfficientNetV2 | 5.92M | EfficientNetV2 B-style architecture with 6 convolutional blocks, using width_coefficient=1.0 and depth_coefficient=1.0. Initialized with pretrained ImageNet classification weights; the published weights score 77.1% top-1 and 93.3% top-5 accuracy on ImageNet. |
| efficientnetv2_b1_imagenet | EfficientNetV2 | 6.93M | EfficientNetV2 B-style architecture with 6 convolutional blocks, using width_coefficient=1.0 and depth_coefficient=1.1. Initialized with pretrained ImageNet classification weights; the published weights score 79.1% top-1 and 94.4% top-5 accuracy on ImageNet. |
| efficientnetv2_b2_imagenet | EfficientNetV2 | 8.77M | EfficientNetV2 B-style architecture with 6 convolutional blocks, using width_coefficient=1.1 and depth_coefficient=1.2. Initialized with pretrained ImageNet classification weights; the published weights score 80.1% top-1 and 94.9% top-5 accuracy on ImageNet. |
| efficientnetv2_s_imagenet | EfficientNetV2 | 20.33M | EfficientNetV2 architecture with 6 convolutional blocks. Initialized with pretrained ImageNet classification weights; the published weights score 83.9% top-1 and 96.7% top-5 accuracy on ImageNet. |
| mit_b0_imagenet | MiT | 3.32M | MiT (MixTransformer) model with 8 transformer blocks. Pretrained on ImageNet-1K; scores 69% top-1 accuracy on the validation set. |
| mobilenet_v3_large_imagenet | MobileNetV3 | 2.99M | MobileNetV3 model with 28 layers, where batch normalization and hard-swish activation are applied after the convolution layers. Pretrained on the ImageNet 2012 classification task. |
| mobilenet_v3_small_imagenet | MobileNetV3 | 933.50K | MobileNetV3 model with 14 layers, where batch normalization and hard-swish activation are applied after the convolution layers. Pretrained on the ImageNet 2012 classification task. |
| resnet50_imagenet | ResNetV1 | 23.56M | ResNet model with 50 layers, where batch normalization and ReLU activation are applied after the convolution layers (v1 style). Trained on the ImageNet 2012 classification task. |
| resnet50_v2_imagenet | ResNetV2 | 23.56M | ResNet model with 50 layers, where batch normalization and ReLU activation precede the convolution layers (v2 style). Trained on the ImageNet 2012 classification task. |
| vitdet_base_sa1b | ViTDet | 89.67M | A base Detectron2 ViT backbone trained on the SA-1B dataset. |
| vitdet_large_sa1b | ViTDet | 308.28M | A large Detectron2 ViT backbone trained on the SA-1B dataset. |
| vitdet_huge_sa1b | ViTDet | 637.03M | A huge Detectron2 ViT backbone trained on the SA-1B dataset. |
| yolo_v8_xs_backbone_coco | YOLOV8 | 1.28M | An extra-small YOLOV8 backbone pretrained on COCO. |
| yolo_v8_s_backbone_coco | YOLOV8 | 5.09M | A small YOLOV8 backbone pretrained on COCO. |
| yolo_v8_m_backbone_coco | YOLOV8 | 11.87M | A medium YOLOV8 backbone pretrained on COCO. |
| yolo_v8_l_backbone_coco | YOLOV8 | 19.83M | A large YOLOV8 backbone pretrained on COCO. |
| yolo_v8_xl_backbone_coco | YOLOV8 | 30.97M | An extra-large YOLOV8 backbone pretrained on COCO. |
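The B-style EfficientNetV2 presets differ only in their compound-scaling coefficients: width_coefficient multiplies the number of filters in each block, and depth_coefficient multiplies the number of layers (repeats) in each block. The sketch below follows the rounding rules of the original EfficientNet reference implementation; whether KerasCV rounds in exactly this way is an assumption, so treat it as an illustration of the scaling rather than a description of KerasCV internals.

```python
import math


def round_filters(filters, width_coefficient, divisor=8):
    """Scale a block's filter count by width_coefficient, rounded to a multiple of divisor."""
    scaled = filters * width_coefficient
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    # Avoid rounding down by more than 10%.
    if new_filters < 0.9 * scaled:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, depth_coefficient):
    """Scale the number of layers in a block by depth_coefficient, rounding up."""
    return int(math.ceil(depth_coefficient * repeats))


# B2 uses width_coefficient=1.1 and depth_coefficient=1.2.
print(round_filters(64, 1.1))  # 72
print(round_repeats(2, 1.2))  # 3
```

With width_coefficient=1.0 and depth_coefficient=1.0 (B0), both functions return the baseline values unchanged.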

Task presets

Each of the following preset names corresponds to a configuration and weights for a task model. These models are application-ready but can be further fine-tuned if desired.

The names below can be used with the from_preset() constructor for the corresponding task models.

```python
import keras_cv

object_detector = keras_cv.models.RetinaNet.from_preset(
    "retinanet_resnet50_pascalvoc",
    bounding_box_format="xywh",
)
```

Note that all backbone presets can also be used with the task models. For example, you can pass a ResNetBackbone preset directly to RetinaNet.from_preset(). In this case only the backbone is pretrained: the task-specific layers are randomly initialized, so fine-tuning is necessary.

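```python
import keras_cv

# Only the backbone weights come from the preset; the detection head is randomly initialized.
model = keras_cv.models.RetinaNet.from_preset(
    "resnet50_imagenet",
    num_classes=20,
    bounding_box_format="xywh",
)
```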

For brevity, we do not include the backbone presets in the following table.

Note: All pretrained weights should be used with unnormalized pixel intensities in the range [0, 255] if include_rescaling=True, or with intensities in the range [0, 1] if include_rescaling=False.

| Preset name | Model | Parameters | Description |
|---|---|---|---|
| resnet50_v2_imagenet_classifier | ImageClassifier | 25.61M | ResNet classifier with 50 layers, where batch normalization and ReLU activation precede the convolution layers (v2 style). Trained on the ImageNet 2012 classification task. |
| efficientnetv2_s_imagenet_classifier | ImageClassifier | 21.61M | ImageClassifier using the EfficientNetV2 small architecture, which has 6 convolutional blocks. Initialized with pretrained ImageNet classification weights; the published weights score 83.9% top-1 and 96.7% top-5 accuracy on ImageNet. |
| efficientnetv2_b0_imagenet_classifier | ImageClassifier | 7.20M | ImageClassifier using the EfficientNetV2 B0 architecture, which has 6 convolutional blocks. As in all B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.0 and the number of layers by depth_coefficient=1.0. Initialized with pretrained ImageNet classification weights; the published weights score 77.1% top-1 and 93.3% top-5 accuracy on ImageNet. |
| efficientnetv2_b1_imagenet_classifier | ImageClassifier | 8.21M | ImageClassifier using the EfficientNetV2 B1 architecture, which has 6 convolutional blocks. As in all B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.0 and the number of layers by depth_coefficient=1.1. Initialized with pretrained ImageNet classification weights; the published weights score 79.1% top-1 and 94.4% top-5 accuracy on ImageNet. |
| efficientnetv2_b2_imagenet_classifier | ImageClassifier | 10.18M | ImageClassifier using the EfficientNetV2 B2 architecture, which has 6 convolutional blocks. As in all B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.1 and the number of layers by depth_coefficient=1.2. Initialized with pretrained ImageNet classification weights; the published weights score 80.1% top-1 and 94.9% top-5 accuracy on ImageNet. |
| mobilenet_v3_large_imagenet_classifier | ImageClassifier | 3.96M | ImageClassifier using the MobileNetV3Large architecture. This preset uses a Dense layer as the classification head instead of the typical fully convolutional MobileNet head. As a result, it has fewer parameters than the original MobileNetV3Large model, which has 5.4 million parameters. The published weights score 69.4% top-1 and 89.4% top-5 accuracy on ImageNet. |
| retinanet_resnet50_pascalvoc | RetinaNet | 35.60M | RetinaNet with a ResNet50 v1 backbone. Trained on the PascalVOC 2012 object detection task, which consists of 20 classes. This model achieves a final mAP of 0.33 on the evaluation set. |
| yolo_v8_m_pascalvoc | YOLOV8Detector | 25.90M | YOLOV8-M pretrained on the PascalVOC 2012 object detection task, which consists of 20 classes. This model achieves a final mAP of 0.45 on the evaluation set. |
| deeplab_v3_plus_resnet50_pascalvoc | DeepLabV3Plus | 39.19M | DeepLabV3Plus with a ResNet50 v2 backbone. Trained on the PascalVOC 2012 semantic segmentation task, which consists of 20 classes and one background class. This model achieves a final categorical accuracy of 89.34% and an mIoU of 0.6391 on the evaluation dataset. This preset is only compatible with Keras 3. |
| segformer_b0_imagenet | SegFormerB0 | 3.72M | SegFormer model with a pretrained MiTB0 backbone. |
| sam_base_sa1b | SAM | 93.74M | The base SAM model trained on the SA-1B dataset. |
| sam_large_sa1b | SAM | 312.34M | The large SAM model trained on the SA-1B dataset. |
| sam_huge_sa1b | SAM | 641.09M | The huge SAM model trained on the SA-1B dataset. |
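The segmentation entries report mean intersection-over-union (mIoU). As a minimal sketch of how this metric is commonly computed (not necessarily the exact evaluation protocol behind the reported 0.6391), the per-class IoU is the diagonal of the confusion matrix divided by the union of predicted and true pixels for that class, averaged over classes that occur. For a stateful version inside a Keras training loop, keras.metrics.MeanIoU serves the same purpose.

```python
import numpy as np


def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes, computed from a confusion matrix.

    y_true and y_pred are integer class maps of the same shape. Classes that
    appear in neither the labels nor the predictions are skipped.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # confusion[i, j] counts pixels with true class i predicted as class j.
    confusion = np.bincount(
        num_classes * y_true + y_pred, minlength=num_classes**2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    # Union = predicted as class c + labeled as class c - both.
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0
    return float(np.mean(intersection[valid] / union[valid]))


# Class 0: IoU 1/2; class 1: IoU 2/3; mean is about 0.583.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```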

API Documentation

Tasks

Backbones