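KerasCV contains end-to-end implementations of popular model architectures. These models can be created in two ways: through the from_preset() constructor, which instantiates an object with a pre-trained configuration and (optionally) weights, or through a custom configuration, by passing the desired parameters directly to the model's constructor. Available preset names are listed on this page.

```python
import keras_cv

# Build a RetinaNet detector on top of the pretrained "resnet50_v2_imagenet"
# backbone preset; the detection layers for the 20 classes start untrained.
model = keras_cv.models.RetinaNet.from_preset(
    "resnet50_v2_imagenet",
    num_classes=20,
    bounding_box_format="xywh",
)
```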
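The same kind of model can also be built from a custom configuration:

```python
import keras_cv

# A custom backbone configuration: four stages of two blocks each.
# No pretrained weights are loaded here.
backbone = keras_cv.models.ResNetBackbone(
    stackwise_filters=[64, 128, 256, 512],
    stackwise_blocks=[2, 2, 2, 2],
    stackwise_strides=[1, 2, 2, 2],
    include_rescaling=False,
)
model = keras_cv.models.RetinaNet(
    backbone=backbone,
    num_classes=20,
    bounding_box_format="xywh",
)
```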
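Both examples pass bounding_box_format="xywh". In KerasCV's bounding-box conventions, "xywh" means [x, y, width, height] with (x, y) at the top-left corner, whereas "xyxy" stores two opposite corners. The NumPy sketch below converts between the two to make the convention concrete; the helper names are hypothetical and not part of KerasCV, which ships its own converter (keras_cv.bounding_box.convert_format).

```python
import numpy as np


# Hypothetical helpers for illustration only; not KerasCV API.
def xywh_to_xyxy(boxes):
    """Convert [x, y, width, height] boxes (top-left origin) to [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=np.float32)
    x, y, w, h = np.split(boxes, 4, axis=-1)
    return np.concatenate([x, y, x + w, y + h], axis=-1)


def xyxy_to_xywh(boxes):
    """Inverse conversion: corner coordinates back to top-left corner plus size."""
    boxes = np.asarray(boxes, dtype=np.float32)
    x1, y1, x2, y2 = np.split(boxes, 4, axis=-1)
    return np.concatenate([x1, y1, x2 - x1, y2 - y1], axis=-1)


# One 40x20 box whose top-left corner is at (10, 30).
box = [[10.0, 30.0, 40.0, 20.0]]
print(xywh_to_xyxy(box))  # [[10. 30. 50. 50.]]
```

Because the conversion is lossless, boxes can be stored in whichever format a dataset uses, as long as the same string is passed as bounding_box_format.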
Each of the following preset names corresponds to a configuration and weights for a backbone model. The names can be passed to the from_preset() constructor of the corresponding backbone model:

```python
import keras_cv
backbone = keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet")
```
For brevity, we do not include the presets without pretrained weights in the following table.
Note: All pretrained weights expect unnormalized pixel intensities in the range [0, 255] if include_rescaling=True, or intensities already rescaled to the range [0, 1] if include_rescaling=False.
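Following the note above, a minimal sketch of preparing inputs, assuming images arrive as uint8 NumPy arrays. prepare_pixels is a hypothetical helper, not a KerasCV function; it only illustrates which value range each include_rescaling setting expects.

```python
import numpy as np


def prepare_pixels(images_uint8, include_rescaling):
    """Hypothetical helper: scale uint8 images to the range a preset expects.

    include_rescaling=True  -> the model rescales internally, so feed [0, 255].
    include_rescaling=False -> feed values already divided by 255, i.e. [0, 1].
    """
    images = np.asarray(images_uint8, dtype=np.float32)
    if include_rescaling:
        return images  # unnormalized intensities in [0, 255]
    return images / 255.0  # rescaled intensities in [0, 1]


batch = np.array([[[[0, 128, 255]]]], dtype=np.uint8)  # shape (1, 1, 1, 3)
print(prepare_pixels(batch, include_rescaling=True).max())   # 255.0
print(prepare_pixels(batch, include_rescaling=False).max())  # 1.0
```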
Preset name | Model | Parameters | Description |
---|---|---|---|
csp_darknet_tiny_imagenet | CSPDarkNet | 2.38M | CSPDarkNet model with [48, 96, 192, 384] channels and [1, 3, 3, 1] depths where the batch normalization and SiLU activation are applied after the convolution layers. Trained on Imagenet 2012 classification task. |
csp_darknet_l_imagenet | CSPDarkNet | 27.11M | CSPDarkNet model with [128, 256, 512, 1024] channels and [3, 9, 9, 3] depths where the batch normalization and SiLU activation are applied after the convolution layers. Trained on Imagenet 2012 classification task. |
densenet121_imagenet | Unknown | Unknown | DenseNet model with 121 layers. Trained on Imagenet 2012 classification task. |
densenet169_imagenet | Unknown | Unknown | DenseNet model with 169 layers. Trained on Imagenet 2012 classification task. |
densenet201_imagenet | Unknown | Unknown | DenseNet model with 201 layers. Trained on Imagenet 2012 classification task. |
efficientnetv2_b0_imagenet | EfficientNetV2 | 5.92M | EfficientNet B-style architecture with 6 convolutional blocks. This B-style model has width_coefficient=1.0 and depth_coefficient=1.0. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 77.1% top-1 accuracy and 93.3% top-5 accuracy on ImageNet. |
efficientnetv2_b1_imagenet | EfficientNetV2 | 6.93M | EfficientNet B-style architecture with 6 convolutional blocks. This B-style model has width_coefficient=1.0 and depth_coefficient=1.1. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 79.1% top-1 accuracy and 94.4% top-5 accuracy on ImageNet. |
efficientnetv2_b2_imagenet | EfficientNetV2 | 8.77M | EfficientNet B-style architecture with 6 convolutional blocks. This B-style model has width_coefficient=1.1 and depth_coefficient=1.2. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 80.1% top-1 accuracy and 94.9% top-5 accuracy on ImageNet. |
efficientnetv2_s_imagenet | EfficientNetV2 | 20.33M | EfficientNet architecture with 6 convolutional blocks. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 83.9% top-1 accuracy and 96.7% top-5 accuracy on ImageNet. |
mobilenet_v3_large_imagenet | MobileNetV3 | 2.99M | MobileNetV3 model with 28 layers where the batch normalization and hard-swish activation are applied after the convolution layers. Pre-trained on the ImageNet 2012 classification task. |
resnet50_imagenet | ResNetV1 | 23.56M | ResNet model with 50 layers where the batch normalization and ReLU activation are applied after the convolution layers (v1 style). Trained on Imagenet 2012 classification task. |
resnet50_v2_imagenet | ResNetV2 | 23.56M | ResNet model with 50 layers where the batch normalization and ReLU activation precede the convolution layers (v2 style). Trained on Imagenet 2012 classification task. |
yolo_v8_xs_backbone_coco | YOLOV8 | 1.28M | An extra small YOLOV8 backbone pretrained on COCO |
yolo_v8_s_backbone_coco | YOLOV8 | 5.09M | A small YOLOV8 backbone pretrained on COCO |
yolo_v8_m_backbone_coco | YOLOV8 | 11.87M | A medium YOLOV8 backbone pretrained on COCO |
yolo_v8_l_backbone_coco | YOLOV8 | 19.83M | A large YOLOV8 backbone pretrained on COCO |
yolo_v8_xl_backbone_coco | YOLOV8 | 30.97M | An extra large YOLOV8 backbone pretrained on COCO |
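The width_coefficient and depth_coefficient values in the EfficientNetV2 rows above follow EfficientNet-style compound scaling: the width coefficient scales the number of filters in each block, and the depth coefficient scales how many times each block is repeated. The sketch below uses the rounding rules of the reference EfficientNet implementation (filters rounded to a multiple of 8, repeats rounded up); it is an illustration under that assumption, and KerasCV's internal rounding may differ in detail.

```python
import math


def round_filters(filters, width_coefficient, divisor=8):
    """Scale a filter count and round it to the nearest multiple of `divisor`.

    Assumed to mirror the reference EfficientNet rounding rule.
    """
    scaled = filters * width_coefficient
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    # Never round down by more than 10% of the scaled value.
    if new_filters < 0.9 * scaled:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, depth_coefficient):
    """Scale the number of block repeats, rounding up."""
    return int(math.ceil(depth_coefficient * repeats))


# B2 preset: width_coefficient=1.1, depth_coefficient=1.2
print(round_filters(64, 1.1))  # 72
print(round_repeats(2, 1.2))   # 3
# B0 preset: both coefficients are 1.0, so nothing changes
print(round_filters(64, 1.0), round_repeats(2, 1.0))  # 64 2
```

This is why the B1 and B2 presets grow in parameter count even though they share the B0 block layout.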
Each of the following preset names corresponds to a configuration and weights for a task model. These models are application-ready, but can be further fine-tuned if desired. The names can be passed to the from_preset() constructor of the corresponding task model:

```python
import keras_cv

object_detector = keras_cv.models.RetinaNet.from_preset(
    "retinanet_resnet50_pascalvoc",
    bounding_box_format="xywh",
)
```
Note that all backbone presets can also be used with the task models. For example, you can pass a ResNetBackbone preset directly to RetinaNet.from_preset(). In this case fine-tuning is necessary, because the task-specific layers are randomly initialized.

```python
import keras_cv

# The ResNet50 backbone weights are pretrained; the detection layers are not.
model = keras_cv.models.RetinaNet.from_preset(
    "resnet50_imagenet",
    num_classes=20,
    bounding_box_format="xywh",
)
```
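A rough way to gauge how much of such a model starts from scratch is to compare preset sizes from the tables on this page: retinanet_resnet50_pascalvoc (ResNet50 v1 backbone, 20 classes) has 35.60M parameters and resnet50_imagenet has 23.56M, so roughly 12M parameters (feature pyramid and detection heads) are randomly initialized when building a 20-class RetinaNet from the backbone preset. This is back-of-the-envelope arithmetic on rounded table values; the dictionaries and helper names below are hypothetical. For classifiers the new head is a single Dense layer, so its size can be cross-checked exactly.

```python
# Parameter counts (in millions) copied from the preset tables on this page.
BACKBONE_PARAMS = {"resnet50_imagenet": 23.56, "resnet50_v2_imagenet": 23.56}
TASK_PARAMS = {
    "retinanet_resnet50_pascalvoc": 35.60,
    "resnet50_v2_imagenet_classifier": 25.61,
}


def head_params_millions(task_preset, backbone_preset):
    """Hypothetical helper: parameters added on top of the backbone (rounded)."""
    return round(TASK_PARAMS[task_preset] - BACKBONE_PARAMS[backbone_preset], 2)


def dense_head_params(features, classes):
    """Weights plus biases of a single Dense classification layer."""
    return features * classes + classes


print(head_params_millions("retinanet_resnet50_pascalvoc", "resnet50_imagenet"))  # 12.04
print(head_params_millions("resnet50_v2_imagenet_classifier", "resnet50_v2_imagenet"))  # 2.05
# ResNet50 produces 2048 pooled features; ImageNet has 1000 classes.
print(dense_head_params(2048, 1000))  # 2049000
```

The Dense-layer count (about 2.05M) matches the gap between the ResNet50 v2 classifier and backbone presets, which supports reading the RetinaNet gap the same way.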
For brevity, we do not include the backbone presets in the following table.
Note: All pretrained weights expect unnormalized pixel intensities in the range [0, 255] if include_rescaling=True, or intensities already rescaled to the range [0, 1] if include_rescaling=False.
Preset name | Model | Parameters | Description |
---|---|---|---|
resnet50_v2_imagenet_classifier | ImageClassifier | 25.61M | ResNet classifier with 50 layers where the batch normalization and ReLU activation precede the convolution layers (v2 style). Trained on Imagenet 2012 classification task. |
efficientnetv2_s_imagenet_classifier | ImageClassifier | 21.61M | ImageClassifier using the EfficientNetV2-S (small) architecture. In this variant of the EfficientNet architecture, there are 6 convolutional blocks. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 83.9% top-1 accuracy and 96.7% top-5 accuracy on ImageNet. |
efficientnetv2_b0_imagenet_classifier | ImageClassifier | 7.20M | ImageClassifier using the EfficientNetV2-B0 architecture. In this variant of the EfficientNet architecture, there are 6 convolutional blocks. As with all of the B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.0 and the number of block repeats by depth_coefficient=1.0. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 77.1% top-1 accuracy and 93.3% top-5 accuracy on ImageNet. |
efficientnetv2_b1_imagenet_classifier | ImageClassifier | 8.21M | ImageClassifier using the EfficientNetV2-B1 architecture. In this variant of the EfficientNet architecture, there are 6 convolutional blocks. As with all of the B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.0 and the number of block repeats by depth_coefficient=1.1. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 79.1% top-1 accuracy and 94.4% top-5 accuracy on ImageNet. |
efficientnetv2_b2_imagenet_classifier | ImageClassifier | 10.18M | ImageClassifier using the EfficientNetV2-B2 architecture. In this variant of the EfficientNet architecture, there are 6 convolutional blocks. As with all of the B-style EfficientNet variants, the number of filters in each convolutional block is scaled by width_coefficient=1.1 and the number of block repeats by depth_coefficient=1.2. Weights are initialized to pretrained ImageNet classification weights. Published weights are capable of scoring 80.1% top-1 accuracy and 94.9% top-5 accuracy on ImageNet. |
mobilenet_v3_large_imagenet_classifier | ImageClassifier | 3.96M | ImageClassifier using the MobileNetV3Large architecture. This preset uses a Dense layer as a classification head instead of the typical fully-convolutional MobileNet head. As a result, it has fewer parameters than the original MobileNetV3Large model, which has 5.4 million parameters. Published weights are capable of scoring 69.4% top-1 accuracy and 89.4% top-5 accuracy on ImageNet. |
retinanet_resnet50_pascalvoc | RetinaNet | 35.60M | RetinaNet with a ResNet50 v1 backbone. Trained on the PascalVOC 2012 object detection task, which consists of 20 classes. This model achieves a final mAP of 0.33 on the evaluation set. |
yolo_v8_m_pascalvoc | YOLOV8Detector | 25.90M | YOLOV8-M pretrained on the PascalVOC 2012 object detection task, which consists of 20 classes. This model achieves a final mAP of 0.45 on the evaluation set. |
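The mAP figures above are computed by matching predicted boxes to ground-truth boxes, where a prediction only counts as correct when its intersection-over-union (IoU) with a ground-truth box exceeds a threshold. A minimal sketch of IoU for boxes in the "xywh" format used in the examples on this page follows; iou_xywh is a hypothetical helper, and since the exact threshold and matching procedure behind these presets' scores are not stated here, this shows only the building block, not the full metric.

```python
# Hypothetical helper for illustration only; not KerasCV API.
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as [x, y, width, height] with a top-left origin."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))  # 0.3333333333333333
print(iou_xywh([0, 0, 10, 10], [20, 20, 5, 5]))  # 0.0
```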