MaskedLMHead
class keras_hub.layers.MaskedLMHead(
vocabulary_size=None,
token_embedding=None,
intermediate_activation="relu",
activation=None,
layer_norm_epsilon=1e-05,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
**kwargs
)
Masked Language Model (MaskedLM) head.
This layer takes two inputs:
- inputs: a tensor of encoded tokens with shape (batch_size, sequence_length, hidden_dim).
- mask_positions: a tensor of integer positions to predict, with shape (batch_size, masks_per_sequence).
The token encodings should usually be the last output of an encoder model, and the mask positions should be the integer positions you would like to predict for the MaskedLM task.
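The two inputs are related row by row: each entry of mask_positions indexes the sequence axis of the matching row of inputs. A minimal NumPy sketch of that selection follows, assuming every position is a valid index below sequence_length. It uses np.take_along_axis purely for illustration and is not meant to reflect the op the layer uses internally.

```python
import numpy as np

batch_size, sequence_length, hidden_dim = 2, 6, 4

rng = np.random.default_rng(0)
# Encoded tokens, e.g. the last output of an encoder model.
inputs = rng.normal(size=(batch_size, sequence_length, hidden_dim))
# Integer positions to predict in each sequence: (batch_size, masks_per_sequence).
mask_positions = np.array([[0, 2, 5], [1, 1, 3]])

# Add a trailing axis so the positions broadcast over hidden_dim, then
# select the encodings at those positions along the sequence axis.
gathered = np.take_along_axis(inputs, mask_positions[:, :, None], axis=1)

print(gathered.shape)  # (2, 3, 4)
```

Repeated positions (such as the two 1s in the second row) are allowed; they simply select the same encoding twice.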
The layer first gathers the token encodings at the mask positions. These gathered tokens are passed through a dense layer the same size as the encoding dimension, followed by layer normalization, and are then transformed into predictions the same size as the input vocabulary. This layer produces a single output with shape (batch_size, masks_per_sequence, vocabulary_size), which can be used to compute a MaskedLM loss function.
This layer is often paired with keras_hub.layers.MaskedLMMaskGenerator, which will help prepare inputs for the MaskedLM task.
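The head's output is usually scored only at the masked positions. As a hedged sketch, assuming integer target ids and 0/1 weights per mask slot (0 marking padded slots in a sequence with fewer real masks than masks_per_sequence), a weighted sparse categorical cross-entropy over softmax outputs could look like the following. The function masked_lm_loss is hypothetical; in a Keras training setup you would more likely pass targets and weights to a built-in loss such as keras.losses.SparseCategoricalCrossentropy.

```python
import numpy as np


def masked_lm_loss(probs, mask_ids, mask_weights, eps=1e-7):
    """Hypothetical weighted sparse cross-entropy over masked positions.

    probs: (batch, masks_per_sequence, vocab_size) softmax outputs
    mask_ids: (batch, masks_per_sequence) integer target token ids
    mask_weights: (batch, masks_per_sequence) 1.0 for real masks, 0.0 for padding
    """
    # Probability assigned to the correct token at each mask slot.
    target_probs = np.take_along_axis(probs, mask_ids[:, :, None], axis=-1)[..., 0]
    per_mask_loss = -np.log(np.clip(target_probs, eps, 1.0))
    # Average only over real (non-padded) mask slots.
    return (per_mask_loss * mask_weights).sum() / max(mask_weights.sum(), 1.0)


# Two sequences, two mask slots each, vocabulary of three tokens.
probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.3, 0.4]],
])
mask_ids = np.array([[0, 1], [2, 0]])
mask_weights = np.array([[1.0, 1.0], [1.0, 0.0]])  # last slot is padding

loss = masked_lm_loss(probs, mask_ids, mask_weights)
print(round(loss, 4))  # 0.3635
```

Because the padded slot has weight 0, its prediction does not affect the loss, which here is the mean of -log(0.7), -log(0.8) and -log(0.6).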
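Arguments
- vocabulary_size: The total size of the vocabulary for predictions.
- token_embedding: Optional. A keras_hub.layers.ReversibleEmbedding instance. If passed, the layer will be used to project from the hidden_dim of the model to the output vocabulary_size.
- intermediate_activation: The activation function of the intermediate dense layer. Defaults to "relu".
- activation: The activation function for the outputs of the layer. Usually either None (return logits) or "softmax" (return probabilities).
- layer_norm_epsilon: float. The epsilon value in the layer normalization component. Defaults to 1e-5.
- kernel_initializer: string or keras.initializers initializer. The kernel initializer for the dense layers. Defaults to "glorot_uniform".
- bias_initializer: string or keras.initializers initializer. The bias initializer for the dense layers. Defaults to "zeros".
- **kwargs: other keyword arguments passed to keras.layers.Layer, including name, trainable, dtype, etc.
Example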
import numpy as np
import keras_hub

batch_size = 16
vocab_size = 100
hidden_dim = 32
seq_length = 50
# Generate random inputs.
token_ids = np.random.randint(vocab_size, size=(batch_size, seq_length))
# Choose random positions as the masked inputs.
mask_positions = np.random.randint(seq_length, size=(batch_size, 5))
# Embed tokens in a `hidden_dim` feature space.
token_embedding = keras_hub.layers.ReversibleEmbedding(
    vocab_size,
    hidden_dim,
)
hidden_states = token_embedding(token_ids)
# Predict a distribution over the vocabulary at each masked position.
preds = keras_hub.layers.MaskedLMHead(
    vocabulary_size=vocab_size,
    token_embedding=token_embedding,
    activation="softmax",
)(hidden_states, mask_positions)
# preds has shape (batch_size, 5, vocab_size), i.e. (16, 5, 100).
References