SAMPromptEncoder layer

SAMPromptEncoder class

keras_hub.layers.SAMPromptEncoder(
    hidden_size=256,
    image_embedding_size=(64, 64),
    input_image_size=(1024, 1024),
    mask_in_channels=16,
    activation="gelu",
    **kwargs
)

Prompt Encoder for the Segment Anything Model (SAM).

The prompt encoder generates encodings for three types of prompts:

  • Point prompts: Points on the image along with a label indicating whether the point is in the foreground (part of the mask) or in the background (not a part of the mask).
  • Box prompts: A batch of bounding boxes with format [(x1, y1), (x2, y2)] used to determine the location of the masks in the image.
  • Masks: An input mask can be passed to refine the positional embeddings for the output mask.
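As a rough illustration of the three prompt types, the sketch below builds one of each as NumPy arrays. Only the box format [(x1, y1), (x2, y2)] comes from the description above; the array shapes, the label convention (1 for foreground, 0 for background), and the 256 x 256 mask resolution are assumptions made for the example.

```python
import numpy as np

batch_size = 1

# Two point prompts in pixel coordinates (x, y) on a 1024 x 1024 image.
# Assumed shape: (batch, num_points, 2).
points = np.array([[[512.0, 384.0], [100.0, 900.0]]], dtype="float32")

# One label per point. Assumed convention: 1 = foreground, 0 = background.
# Assumed shape: (batch, num_points).
labels = np.array([[1, 0]], dtype="int32")

# One box in [(x1, y1), (x2, y2)] format (top-left, bottom-right corners).
# Assumed shape: (batch, num_boxes, 2, 2).
boxes = np.array([[[[200.0, 150.0], [800.0, 700.0]]]], dtype="float32")

# One single-channel mask prompt at an assumed 256 x 256 resolution.
# Assumed shape: (batch, num_masks, height, width, 1).
masks = np.zeros((batch_size, 1, 256, 256, 1), dtype="float32")

print(points.shape, labels.shape, boxes.shape, masks.shape)
```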

First, the point prompts and box prompts are concatenated and positional encodings are generated using random spatial frequencies. A point is represented as the sum of a positional encoding of the point's location and one of two learned embeddings that indicate whether the point is in the foreground or the background. A box is represented by an embedding pair: (1) the positional encoding of its top-left corner summed with a learned embedding representing "top-left corner" and (2) the positional encoding of its bottom-right corner summed with a learned embedding representing "bottom-right corner". The box and point encodings are referred to as "sparse encodings".

If a mask prompt is passed, a convolutional neural net is used to downscale it to generate "dense encodings". If no mask prompt is passed, an embedding layer is used instead to generate a "no mask" embedding.
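The point and box encodings described above can be sketched in NumPy as follows. This is a minimal illustration, not the layer's implementation: the learned embeddings are replaced by random stand-ins, all function and variable names are hypothetical, and details such as the coordinate normalization to [-1, 1] and the 2π scaling are assumptions about how random-frequency encodings are commonly computed.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 256
input_image_size = (1024, 1024)  # (height, width)

# Fixed random spatial frequencies, one row per coordinate axis.
frequencies = rng.normal(size=(2, hidden_size // 2))

# Random stand-ins for the learned embeddings (illustration only).
foreground_embed = rng.normal(size=(hidden_size,))
background_embed = rng.normal(size=(hidden_size,))
top_left_embed = rng.normal(size=(hidden_size,))
bottom_right_embed = rng.normal(size=(hidden_size,))


def positional_encoding(xy):
    """Map (..., 2) pixel coordinates to (..., hidden_size) features."""
    height, width = input_image_size
    normalized = xy / np.array([width, height])  # scale x, y to [0, 1]
    projected = 2.0 * np.pi * ((2.0 * normalized - 1.0) @ frequencies)
    return np.concatenate([np.sin(projected), np.cos(projected)], axis=-1)


def encode_points(points, labels):
    """Positional encoding plus a foreground or background embedding."""
    label_embed = np.where(
        labels[..., None] == 1, foreground_embed, background_embed
    )
    return positional_encoding(points) + label_embed


def encode_boxes(boxes):
    """Encode each box as a (top-left, bottom-right) embedding pair."""
    corner_embed = np.stack([top_left_embed, bottom_right_embed])
    encoded = positional_encoding(boxes) + corner_embed
    # Flatten (batch, num_boxes, 2, hidden) to (batch, 2 * num_boxes, hidden).
    return encoded.reshape(boxes.shape[0], -1, hidden_size)


points = np.array([[[512.0, 384.0], [100.0, 900.0]]])
labels = np.array([[1, 0]])
boxes = np.array([[[[200.0, 150.0], [800.0, 700.0]]]])

# Sparse encodings: point encodings followed by box corner encodings.
sparse = np.concatenate(
    [encode_points(points, labels), encode_boxes(boxes)], axis=1
)
print(sparse.shape)  # (1, 4, 256)
```

Two points contribute two rows and one box contributes two more (one per corner), which gives the four sparse encodings in the output.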

Arguments

  • hidden_size: int, optional. The number of features in the output embeddings. Defaults to 256.
  • image_embedding_size: tuple[int], optional. A tuple of the height and width of the image embeddings generated by an image encoder. Defaults to (64, 64).
  • input_image_size: tuple[int], optional. A tuple of the height and width of the image being prompted. Defaults to (1024, 1024).
  • mask_in_channels: int, optional. The number of channels of the mask prompt. Defaults to 16.
  • activation: str, optional. The activation to use in the mask downscaler neural net. Defaults to "gelu".
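
To show how these arguments could fit together when producing dense encodings, here is a hedged NumPy sketch. It assumes the mask prompt is four times the size of the image embedding and is downscaled by two stride-2 convolutions followed by a 1x1 projection; the actual layer may differ (for example, normalization is omitted here). All weights are random stand-ins and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 256
image_embedding_size = (64, 64)
mask_in_channels = 16


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def strided_conv2x2(x, kernel):
    """2x2 convolution with stride 2 on a (batch, H, W, C_in) array."""
    b, h, w, c = x.shape
    patches = x.reshape(b, h // 2, 2, w // 2, 2, c)
    # kernel has shape (2, 2, C_in, C_out).
    return np.einsum("bhiwjc,ijcd->bhwd", patches, kernel)


# Random stand-ins for learned weights (illustration only).
k1 = rng.normal(size=(2, 2, 1, mask_in_channels // 4))
k2 = rng.normal(size=(2, 2, mask_in_channels // 4, mask_in_channels))
k3 = rng.normal(size=(mask_in_channels, hidden_size))  # 1x1 projection
no_mask_embed = rng.normal(size=(hidden_size,))


def dense_from_mask(mask):
    """Downscale a (batch, 4H, 4W, 1) mask to (batch, H, W, hidden_size)."""
    x = gelu(strided_conv2x2(mask, k1))
    x = gelu(strided_conv2x2(x, k2))
    return x @ k3


def dense_without_mask(batch_size):
    """Broadcast the "no mask" embedding over the embedding grid."""
    shape = (batch_size, *image_embedding_size, hidden_size)
    return np.broadcast_to(no_mask_embed, shape)


mask = rng.normal(size=(1, 256, 256, 1))  # assumed 4x the embedding size
print(dense_from_mask(mask).shape)  # (1, 64, 64, 256)
print(dense_without_mask(1).shape)  # (1, 64, 64, 256)
```

In both cases the dense encoding has the spatial size given by image_embedding_size and hidden_size features per location, so it lines up with the image embeddings position by position.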