TransformerEncoder class

keras_hub.layers.TransformerEncoder(
    intermediate_dim,
    num_heads,
    dropout=0,
    activation="relu",
    layer_norm_epsilon=1e-05,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    normalize_first=False,
    **kwargs
)
Transformer encoder.
This class follows the architecture of the transformer encoder layer in the paper Attention Is All You Need. Users can stack multiple instances of this class to build an encoder.
This layer will compute an attention mask, prioritizing explicitly provided masks (a padding_mask or a custom attention_mask) over an implicit Keras padding mask (for example, one created by passing mask_zero=True to a keras.layers.Embedding layer). If both a padding_mask and an attention_mask are provided, they will be combined to determine the final mask. See the Masking and Padding guide for more details.
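For instance, an implicit padding mask can come from an Embedding layer with mask_zero=True, while an explicit padding_mask passed to the encoder takes priority. The following is a minimal sketch with arbitrary vocabulary, sequence, and embedding sizes:

import keras
import keras_hub

# Implicit mask: mask_zero=True attaches a Keras padding mask to the
# embedding output, which the encoder picks up automatically.
token_ids = keras.Input(shape=(10,), dtype="int32")
x = keras.layers.Embedding(input_dim=1000, output_dim=64, mask_zero=True)(token_ids)
outputs = keras_hub.layers.TransformerEncoder(intermediate_dim=128, num_heads=4)(x)
model = keras.Model(token_ids, outputs)

# Explicit mask: a boolean padding_mask (True for real tokens, following
# Keras mask semantics) passed directly to the layer takes priority over
# the implicit Keras mask.
padding_mask = keras.ops.not_equal(token_ids, 0)
outputs = keras_hub.layers.TransformerEncoder(intermediate_dim=128, num_heads=4)(
    x, padding_mask=padding_mask)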
Arguments

intermediate_dim: int, the hidden size of the feedforward network.
num_heads: int, the number of heads in the keras.layers.MultiHeadAttention layer.
dropout: float. The dropout value, shared by the keras.layers.MultiHeadAttention and feedforward network. Defaults to 0.
activation: string or keras.activations. The activation function of the feedforward network. Defaults to "relu".
layer_norm_epsilon: float. The epsilon value in layer normalization components. Defaults to 1e-5.
kernel_initializer: string or keras.initializers initializer. The kernel initializer for the dense and multiheaded attention layers. Defaults to "glorot_uniform".
bias_initializer: string or keras.initializers initializer. The bias initializer for the dense and multiheaded attention layers. Defaults to "zeros".
normalize_first: bool. If True, the inputs to the attention and feedforward layers are normalized (similar to GPT-2). If False, the outputs of the attention and feedforward layers are normalized (similar to BERT). Defaults to False.
**kwargs: other keyword arguments passed to keras.layers.Layer, including name, trainable, dtype etc.

Example
import numpy as np
import keras
import keras_hub

# Create a single transformer encoder layer.
encoder = keras_hub.layers.TransformerEncoder(
    intermediate_dim=64, num_heads=8)

# Create a simple model containing the encoder.
input = keras.Input(shape=(10, 64))
output = encoder(input)
model = keras.Model(inputs=input, outputs=output)

# Call encoder on the inputs.
input_data = np.random.uniform(size=(2, 10, 64))
output = model(input_data)
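The example above uses the default post-normalization layout. As an additional sketch (illustrative only, with arbitrary sizes), several encoder layers can be stacked with pre-normalization via normalize_first=True:

import keras
import keras_hub

# Stack three pre-normalization (GPT-2 style) encoder layers.
inputs = keras.Input(shape=(10, 64))
x = inputs
for _ in range(3):
    x = keras_hub.layers.TransformerEncoder(
        intermediate_dim=128, num_heads=8, dropout=0.1, normalize_first=True)(x)
model = keras.Model(inputs=inputs, outputs=x)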
References

Attention Is All You Need (Vaswani et al., 2017)
call method

TransformerEncoder.call(
    inputs,
    padding_mask=None,
    attention_mask=None,
    training=None,
    return_attention_scores=False,
)
Forward pass of the TransformerEncoder.
Arguments

inputs: a Tensor. The input data to TransformerEncoder, should be of shape [batch_size, sequence_length, hidden_dim].
padding_mask: a boolean Tensor. It indicates whether a token should be masked because it was introduced by padding. padding_mask should have shape [batch_size, sequence_length].
attention_mask: a boolean Tensor. A customized mask used to mask out certain tokens. attention_mask should have shape [batch_size, sequence_length, sequence_length].
training: a boolean indicating whether the layer should behave in training mode or in inference mode.
return_attention_scores: a boolean indicating whether the output should be (attention_output, attention_scores) if True or attention_output if False. Defaults to False.

Returns

A Tensor of the same shape as the inputs.
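As an illustration of the call signature, the following sketch (with arbitrary batch, sequence, and hidden sizes) passes an explicit boolean padding mask alongside the inputs and optionally requests the attention scores:

import numpy as np
import keras_hub

encoder = keras_hub.layers.TransformerEncoder(intermediate_dim=64, num_heads=4)

# Batch of 2 sequences of length 10 with hidden size 32; the last 4
# positions of each sequence are treated as padding.
x = np.random.uniform(size=(2, 10, 32)).astype("float32")
padding_mask = np.array([[True] * 6 + [False] * 4] * 2)  # shape [batch_size, sequence_length]

outputs = encoder(x, padding_mask=padding_mask)
outputs, scores = encoder(x, padding_mask=padding_mask, return_attention_scores=True)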