MaskedLMMaskGenerator
class keras_hub.layers.MaskedLMMaskGenerator(
vocabulary_size,
mask_selection_rate,
mask_token_id,
mask_selection_length=None,
unselectable_token_ids=[0],
mask_token_rate=0.8,
random_token_rate=0.1,
**kwargs
)
Layer that applies language model masking.
This layer is useful for preparing inputs for masked language modeling (MaskedLM) tasks. It follows the masking strategy described in the original BERT paper. Given tokenized text, it randomly selects a certain number of tokens for masking. Each selected token then has a configurable chance of being replaced by the mask token, replaced by a random token, or left unchanged.
Input data should be passed as tensors, tf.RaggedTensors, or lists. For batched input, inputs should be a list of lists or a rank two tensor. For unbatched inputs, each element should be a list or a rank one tensor. This layer can be used with tf.data to generate dynamic masks on the fly during training.
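The selection-and-replacement strategy described above can be sketched in plain NumPy. This is a simplified, hypothetical illustration of the BERT-style 80/10/10 scheme, not the layer's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def mask_tokens(token_ids, vocabulary_size, mask_token_id,
                mask_selection_rate=0.2, mask_token_rate=0.8,
                random_token_rate=0.1, unselectable_token_ids=(0,)):
    """Simplified BERT-style masking for a single rank-one sequence."""
    token_ids = np.asarray(token_ids)
    # Candidate positions exclude unselectable tokens (e.g. padding).
    selectable = ~np.isin(token_ids, unselectable_token_ids)
    selected = selectable & (rng.random(token_ids.shape) < mask_selection_rate)
    output = token_ids.copy()
    for pos in np.flatnonzero(selected):
        roll = rng.random()
        if roll < mask_token_rate:
            # With probability mask_token_rate: substitute the mask token.
            output[pos] = mask_token_id
        elif roll < mask_token_rate + random_token_rate:
            # With probability random_token_rate: substitute a random token.
            output[pos] = rng.integers(vocabulary_size)
        # Otherwise: leave the original token unchanged.
    return output, np.flatnonzero(selected), token_ids[selected]
```

Unselected positions always keep their original token ids; only the returned positions may differ from the input.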
Arguments

vocabulary_size: int. The size of the vocabulary.
mask_selection_rate: float. The probability that a token is selected for masking.
mask_token_id: int. The id of the mask token.
mask_selection_length: int. Maximum number of tokens selected for masking in each sequence. If set, the outputs mask_positions, mask_ids and mask_weights will be padded to dense tensors of length mask_selection_length; otherwise the output will be a RaggedTensor. Defaults to None.
unselectable_token_ids: A list of token ids that should not be considered eligible for masking. By default, we assume 0 corresponds to a padding token and ignore it. Defaults to [0].
mask_token_rate: float. mask_token_rate must be between 0 and 1, and indicates how often the mask token is substituted for tokens selected for masking. Defaults to 0.8.
random_token_rate: float. random_token_rate must be between 0 and 1, and indicates how often a random token is substituted for tokens selected for masking. Note: mask_token_rate + random_token_rate <= 1; with probability (1 - mask_token_rate - random_token_rate), the token is left unchanged. Defaults to 0.1.

Returns

A dict with four keys:

token_ids: Tensor or RaggedTensor with the same type and shape as the input. The sequence after masking is applied.
mask_positions: Tensor, or RaggedTensor if mask_selection_length is None. The positions in token_ids that were selected for masking.
mask_ids: Tensor, or RaggedTensor if mask_selection_length is None. The original token ids at the masked positions.
mask_weights: Tensor, or RaggedTensor if mask_selection_length is None. mask_weights has the same shape as mask_positions and mask_ids. Each element is 0 or 1: 1 means the corresponding position in mask_positions is an actual mask, 0 means it is a pad.

Examples
Basic usage.
masker = keras_hub.layers.MaskedLMMaskGenerator(
vocabulary_size=10,
mask_selection_rate=0.2,
mask_token_id=0,
mask_selection_length=5
)
# Dense input.
masker([1, 2, 3, 4, 5])
# Ragged input.
masker([[1, 2], [1, 2, 3, 4]])
Masking a batch that contains special tokens.
pad_id, cls_id, sep_id, mask_id = 0, 1, 2, 3
batch = [
[cls_id, 4, 5, 6, sep_id, 7, 8, sep_id, pad_id, pad_id],
[cls_id, 4, 5, sep_id, 6, 7, 8, 9, sep_id, pad_id],
]
masker = keras_hub.layers.MaskedLMMaskGenerator(
    vocabulary_size=10,
    mask_selection_rate=0.2,
    mask_selection_length=5,
    mask_token_id=mask_id,
    unselectable_token_ids=[cls_id, sep_id, pad_id],
)
masker(batch)