The RetinaNet model


RetinaNet class


A Keras model implementing the RetinaNet meta-architecture.

Implements the RetinaNet architecture for object detection. The constructor requires num_classes, bounding_box_format, and a backbone. Optionally, a custom label encoder and prediction decoder may be provided.


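import numpy as np
import tensorflow as tf
import keras_cv

images = np.ones((1, 512, 512, 3))
labels = {
    # One image with three boxes, shape (batch, num_boxes, 4).
    "boxes": tf.cast([[
        [0, 0, 100, 100],
        [100, 100, 200, 200],
        [300, 300, 100, 100],
    ]], dtype=tf.float32),
    "classes": tf.cast([[1, 1, 1]], dtype=tf.float32),
}
# Illustrative values: 20 classes and "xywh" boxes are assumptions for this example.
model = keras_cv.models.RetinaNet(
    num_classes=20,
    bounding_box_format="xywh",
    backbone=keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet"),
)

# Evaluate model without box decoding and NMS
model(images)

# Prediction with box decoding and NMS
model.predict(images)

# Train model
model.compile(classification_loss="focal", box_loss="smoothl1", optimizer="sgd")
model.fit(images, labels)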
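
The boxes in the example above are easiest to read as [x, y, width, height]. That reading is an assumption: taken as corner coordinates, the third box, [300, 300, 100, 100], would have a maximum corner smaller than its minimum corner. The following minimal sketch converts them to corner form and checks that they fit inside the 512x512 image. The helper xywh_to_xyxy is hypothetical and written only for this illustration; it is not part of KerasCV.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


# Boxes copied from the example, assumed to be in "xywh" format.
boxes_xywh = [
    [0, 0, 100, 100],
    [100, 100, 200, 200],
    [300, 300, 100, 100],
]

boxes_xyxy = [xywh_to_xyxy(box) for box in boxes_xywh]

# Every corner coordinate should lie inside the 512x512 input image.
image_size = 512
fits = all(0 <= value <= image_size for box in boxes_xyxy for value in box)

print(boxes_xyxy)
print(fits)
```

In real code, the library's own converter, keras_cv.bounding_box.convert_format, is likely the better choice, since it handles batched tensors and the other supported formats.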


  • num_classes: the number of classes in your dataset excluding the background class. Classes should be represented by integers in the range [0, num_classes).
  • bounding_box_format: The format of the bounding boxes in the input dataset. Refer to the docs for more details on supported bounding box formats.
  • backbone: keras.Model. If the default feature_pyramid is used, the backbone must implement the pyramid_level_inputs property with keys "P3", "P4", and "P5" and layer names as values. A sensible backbone in many cases is keras_cv.models.ResNetBackbone.from_preset("resnet50_imagenet").
  • anchor_generator: (Optional) a keras_cv.layers.AnchorGenerator. If provided, the anchor generator will be passed to both the label_encoder and the prediction_decoder. Use this argument only when label_encoder and prediction_decoder are both None. Defaults to an anchor generator with the parameterization: strides=[2**i for i in range(3, 8)], scales=[2**x for x in [0, 1 / 3, 2 / 3]], sizes=[32.0, 64.0, 128.0, 256.0, 512.0], and aspect_ratios=[0.5, 1.0, 2.0].
  • label_encoder: (Optional) a keras.Layer that accepts an image Tensor, a bounding box Tensor and a bounding box class Tensor in its call() method, and returns RetinaNet training targets. By default, a KerasCV standard RetinaNetLabelEncoder is created and used. Results of this object's call() method are passed to the loss objects for box_loss and classification_loss as the y_true argument.
  • prediction_decoder: (Optional) A keras.layers.Layer that is responsible for transforming RetinaNet predictions into usable bounding box Tensors. If not provided, a default keras_cv.layers.MultiClassNonMaxSuppression layer is used, which prunes boxes with non-max suppression.
  • feature_pyramid: (Optional) A keras.layers.Layer that produces a list of 4D feature maps (batch dimension included) when called on the pyramid-level outputs of the backbone. If not provided, the reference implementation from the paper will be used.
  • classification_head: (Optional) A keras.Layer that performs classification of the bounding boxes. If not provided, a simple ConvNet with 3 layers will be used.
  • box_head: (Optional) A keras.Layer that performs regression of the bounding boxes. If not provided, a simple ConvNet with 3 layers will be used.
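
To make the default anchor_generator parameterization more concrete, the sketch below expands the strides, scales, sizes, and aspect ratios quoted above and counts the resulting anchors. It is plain Python, not the KerasCV implementation. It rests on two assumptions: an aspect ratio means width / height with the anchor area held at (size * scale) ** 2, and each pyramid level has ceil(image_size / stride) grid cells per side. The helpers anchor_shapes and total_anchors are hypothetical names.

```python
import math

# Default parameterization quoted in the anchor_generator description.
strides = [2**i for i in range(3, 8)]
scales = [2**x for x in [0, 1 / 3, 2 / 3]]
sizes = [32.0, 64.0, 128.0, 256.0, 512.0]
aspect_ratios = [0.5, 1.0, 2.0]


def anchor_shapes(size, scales, aspect_ratios):
    """Return (width, height) pairs for one pyramid level.

    Assumption: aspect ratio means width / height, and each anchor keeps
    the area (size * scale) ** 2.
    """
    shapes = []
    for scale in scales:
        side = size * scale
        for ratio in aspect_ratios:
            shapes.append((side * math.sqrt(ratio), side / math.sqrt(ratio)))
    return shapes


# One anchor per (scale, aspect ratio) pair at every grid cell.
anchors_per_location = len(scales) * len(aspect_ratios)


def total_anchors(image_size):
    """Count anchors for a square image (assumed grid: ceil(size / stride))."""
    cells = sum(math.ceil(image_size / stride) ** 2 for stride in strides)
    return cells * anchors_per_location


level_shapes = anchor_shapes(sizes[0], scales, aspect_ratios)
print(anchors_per_location)
print(total_anchors(512))
```

Under these assumptions there are 9 anchors per grid cell, and a 512x512 input yields 5456 grid cells across the five levels, or 49104 anchors in total. The label encoder has to match ground-truth boxes against all of them.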


from_preset method


Instantiate RetinaNet model from preset config and weights.


  • preset: string. Must be one of "resnet18", "resnet34", "resnet50", "resnet101", "resnet152", "resnet18_v2", "resnet34_v2", "resnet50_v2", "resnet101_v2", "resnet152_v2", "mobilenet_v3_small", "mobilenet_v3_large", "csp_darknet_tiny", "csp_darknet_s", "csp_darknet_m", "csp_darknet_l", "csp_darknet_xl", "efficientnetv1_b0", "efficientnetv1_b1", "efficientnetv1_b2", "efficientnetv1_b3", "efficientnetv1_b4", "efficientnetv1_b5", "efficientnetv1_b6", "efficientnetv1_b7", "efficientnetv2_s", "efficientnetv2_m", "efficientnetv2_l", "efficientnetv2_b0", "efficientnetv2_b1", "efficientnetv2_b2", "efficientnetv2_b3", "densenet121", "densenet169", "densenet201", "efficientnetlite_b0", "efficientnetlite_b1", "efficientnetlite_b2", "efficientnetlite_b3", "efficientnetlite_b4", "yolo_v8_xs_backbone", "yolo_v8_s_backbone", "yolo_v8_m_backbone", "yolo_v8_l_backbone", "yolo_v8_xl_backbone", "vitdet_base", "vitdet_large", "vitdet_huge", "resnet50_imagenet", "resnet50_v2_imagenet", "mobilenet_v3_large_imagenet", "mobilenet_v3_small_imagenet", "csp_darknet_tiny_imagenet", "csp_darknet_l_imagenet", "efficientnetv2_s_imagenet", "efficientnetv2_b0_imagenet", "efficientnetv2_b1_imagenet", "efficientnetv2_b2_imagenet", "densenet121_imagenet", "densenet169_imagenet", "densenet201_imagenet", "yolo_v8_xs_backbone_coco", "yolo_v8_s_backbone_coco", "yolo_v8_m_backbone_coco", "yolo_v8_l_backbone_coco", "yolo_v8_xl_backbone_coco", "vitdet_base_sa1b", "vitdet_large_sa1b", "vitdet_huge_sa1b", "retinanet_resnet50_pascalvoc". 
If looking for a preset with pretrained weights, choose one of "resnet50_imagenet", "resnet50_v2_imagenet", "mobilenet_v3_large_imagenet", "mobilenet_v3_small_imagenet", "csp_darknet_tiny_imagenet", "csp_darknet_l_imagenet", "efficientnetv2_s_imagenet", "efficientnetv2_b0_imagenet", "efficientnetv2_b1_imagenet", "efficientnetv2_b2_imagenet", "densenet121_imagenet", "densenet169_imagenet", "densenet201_imagenet", "yolo_v8_xs_backbone_coco", "yolo_v8_s_backbone_coco", "yolo_v8_m_backbone_coco", "yolo_v8_l_backbone_coco", "yolo_v8_xl_backbone_coco", "vitdet_base_sa1b", "vitdet_large_sa1b", "vitdet_huge_sa1b", "retinanet_resnet50_pascalvoc".
  • load_weights: Whether to load pre-trained weights into the model. Defaults to None, which follows whether the preset has pretrained weights available.
  • input_shape: Input shape that will be passed to backbone initialization. Defaults to None. If None, the preset value will be used.


# Load architecture and weights from preset
model = keras_cv.models.RetinaNet.from_preset(
    "retinanet_resnet50_pascalvoc", bounding_box_format="xywh"
)

# Load randomly initialized model from preset architecture, without pretrained weights
model = keras_cv.models.RetinaNet.from_preset(
    "retinanet_resnet50_pascalvoc", bounding_box_format="xywh", load_weights=False
)

Preset name | Parameters | Description
retinanet_resnet50_pascalvoc | 35.60M | RetinaNet with a ResNet50 v1 backbone. Trained on the PascalVOC 2012 object detection task, which consists of 20 classes. This model achieves a final mAP of 0.33 on the evaluation set.