SAMMaskDecoder layer

SAMMaskDecoder class

keras_hub.layers.SAMMaskDecoder(
    hidden_size,
    num_layers,
    intermediate_dim,
    num_heads,
    embedding_dim=256,
    num_multimask_outputs=3,
    iou_head_depth=3,
    iou_head_hidden_dim=256,
    activation="gelu",
    **kwargs
)

Mask decoder for the Segment Anything Model (SAM).

This lightweight module efficiently maps the image embedding and a set of prompt embeddings to an output mask. Before running the transformer decoder, the layer inserts a learned output token embedding into the set of prompt embeddings; this token is read out at the decoder's output. For simplicity, these embeddings (not including the image embedding) are collectively referred to as "tokens".
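
As a rough illustration of this token-insertion step, here is a sketch with assumed shapes and token counts (not the layer's actual internals); it assumes SAM's usual one IoU token plus 1 + num_multimask_outputs mask tokens:

```python
import keras

batch_size, num_prompt_tokens, embedding_dim = 1, 2, 256
num_output_tokens = 5  # assumed: 1 IoU token + (1 + num_multimask_outputs) mask tokens

# Learned output token embeddings (random stand-ins for the layer's weights).
output_tokens = keras.random.normal((num_output_tokens, embedding_dim))

# Sparse prompt embeddings as produced by SAM's prompt encoder.
prompt_embeddings = keras.random.normal((batch_size, num_prompt_tokens, embedding_dim))

# Prepend the learned output tokens to the prompt embeddings; the combined set
# is what the docstring calls "tokens".
output_tokens = keras.ops.broadcast_to(
    keras.ops.expand_dims(output_tokens, 0),
    (batch_size, num_output_tokens, embedding_dim),
)
tokens = keras.ops.concatenate([output_tokens, prompt_embeddings], axis=1)
print(tokens.shape)  # (1, 7, 256)
```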

The image embeddings, positional image embeddings, and tokens are passed through a transformer decoder. After running the decoder, the layer upsamples the updated image embedding by 4x with two transposed convolutional layers, so the embedding ends up downscaled only 4x relative to the input image. The tokens then attend once more to the image embedding, and the updated output token embedding is passed to a small 3-layer MLP that outputs a vector matching the channel dimension of the upscaled image embedding.
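
The 4x upscaling can be sketched as two stride-2 transposed convolutions; the channel widths and normalization below are illustrative assumptions rather than the layer's exact configuration:

```python
import keras

embedding_dim, activation = 256, "gelu"

# Two stride-2 transposed convolutions, each doubling the spatial resolution,
# take the image embedding from 1/16 to 1/4 of the input image resolution.
upscaler = keras.Sequential([
    keras.layers.Conv2DTranspose(embedding_dim // 4, kernel_size=2, strides=2),
    keras.layers.LayerNormalization(),
    keras.layers.Activation(activation),
    keras.layers.Conv2DTranspose(embedding_dim // 8, kernel_size=2, strides=2),
    keras.layers.Activation(activation),
])

image_embedding = keras.random.normal((1, 64, 64, embedding_dim))
upscaled = upscaler(image_embedding)
print(upscaled.shape)  # (1, 256, 256, 32): 4x larger spatially, fewer channels
```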

Finally, a mask is predicted with a spatially point-wise product between the upscaled image embedding and the MLP's output.
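
In other words, each output mask logit is the dot product between a mask token's MLP output and the upscaled embedding at that spatial position; a minimal sketch with illustrative shapes:

```python
import keras

batch, height, width, channels = 1, 256, 256, 32
num_masks = 4  # 1 + num_multimask_outputs

upscaled_embedding = keras.random.normal((batch, height, width, channels))
mlp_outputs = keras.random.normal((batch, num_masks, channels))  # one vector per mask token

# Spatially point-wise product: each mask logit is the dot product of a token's
# MLP output with the embedding at that spatial position.
masks = keras.ops.einsum("bhwc,bnc->bnhw", upscaled_embedding, mlp_outputs)
print(masks.shape)  # (1, 4, 256, 256)
```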

Arguments

  • hidden_size: int. The hidden size of the TwoWayTransformer.
  • num_layers: int. The number of layers in the TwoWayTransformer.
  • intermediate_dim: int. The intermediate dimension of the TwoWayTransformer.
  • num_heads: int. The number of heads in the TwoWayTransformer.
  • embedding_dim: int, optional. The number of input features to the transformer decoder. Defaults to 256.
  • num_multimask_outputs: int, optional. Number of multimask outputs. The model will generate this many extra masks; the total number of masks generated is 1 + num_multimask_outputs. Defaults to 3.
  • iou_head_depth: int, optional. The depth of the dense net used to predict the IoU confidence score. Defaults to 3.
  • iou_head_hidden_dim: int, optional. The number of units in the hidden layers used in the dense net to predict the IoU confidence score. Defaults to 256.
  • activation: str, optional. Activation to use in the mask upscaler network. Defaults to "gelu".
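
Example usage. The constructor values below follow SAM's usual two-layer TwoWayTransformer configuration; the call-time input names and shapes are assumptions based on how the rest of the KerasHub SAM model feeds this layer, not part of this reference:

```python
import keras
import keras_hub

mask_decoder = keras_hub.layers.SAMMaskDecoder(
    hidden_size=256,
    num_layers=2,
    intermediate_dim=2048,
    num_heads=8,
    embedding_dim=256,
    num_multimask_outputs=3,
)

# Hypothetical inputs with the shapes SAM's image and prompt encoders typically
# produce for a 1024x1024 image (names and shapes are assumptions).
image_embeddings = keras.random.normal((1, 64, 64, 256))
prompt_dense_positional_embeddings = keras.random.normal((1, 64, 64, 256))
prompt_sparse_embeddings = keras.random.normal((1, 2, 256))
prompt_dense_embeddings = keras.random.normal((1, 64, 64, 256))

outputs = mask_decoder(
    image_embeddings=image_embeddings,
    prompt_dense_positional_embeddings=prompt_dense_positional_embeddings,
    prompt_sparse_embeddings=prompt_sparse_embeddings,
    prompt_dense_embeddings=prompt_dense_embeddings,
)
# Expected outputs: predicted mask logits and IoU confidence scores.
```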