T5GemmaBackbone class

```python
keras_hub.models.T5GemmaBackbone(
    vocabulary_size,
    encoder_hidden_dim,
    encoder_intermediate_dim,
    encoder_num_layers,
    encoder_num_attention_heads,
    encoder_num_key_value_heads,
    encoder_head_dim,
    encoder_layer_types,
    decoder_hidden_dim,
    decoder_intermediate_dim,
    decoder_num_layers,
    decoder_num_attention_heads,
    decoder_num_key_value_heads,
    decoder_head_dim,
    decoder_layer_types,
    dropout_rate=0.0,
    rms_norm_eps=1e-06,
    query_pre_attn_scalar=1.0,
    attention_bias=False,
    hidden_activation="gelu_approximate",
    tie_word_embeddings=True,
    initializer_range=0.02,
    attention_dropout=0.0,
    sliding_window=None,
    cross_attention_hidden_size=None,
    attn_logit_softcapping=None,
    final_logit_softcapping=None,
    rope_max_wavelength=10000.0,
    dtype=None,
    **kwargs
)
```
T5Gemma backbone model.
This class implements the encoder-decoder backbone of the T5Gemma model, consisting of an embedding layer, a stack of encoder layers, and a stack of decoder layers.
Arguments

- vocabulary_size: int. The size of the token vocabulary.
- encoder_hidden_dim: int. The hidden dimensionality of the encoder.
- encoder_intermediate_dim: int. The intermediate size of the encoder's feedforward layers.
- encoder_num_layers: int. The number of transformer layers in the encoder.
- encoder_num_attention_heads: int. The number of attention heads in each encoder layer.
- encoder_num_key_value_heads: int. The number of key-value heads in each encoder layer.
- encoder_head_dim: int. The size of each attention head in the encoder.
- encoder_layer_types: list of strings. The attention type of each encoder layer, either "sliding_attention" or "full_attention". For example, ["full_attention", "sliding_attention", ...].
- decoder_hidden_dim: int. The hidden dimensionality of the decoder.
- decoder_intermediate_dim: int. The intermediate size of the decoder's feedforward layers.
- decoder_num_layers: int. The number of transformer layers in the decoder.
- decoder_num_attention_heads: int. The number of attention heads in each decoder layer.
- decoder_num_key_value_heads: int. The number of key-value heads in each decoder layer.
- decoder_head_dim: int. The size of each attention head in the decoder.
- decoder_layer_types: list of strings. The attention type of each decoder layer, either "sliding_attention" or "full_attention". For example, ["full_attention", "sliding_attention", ...].
- dropout_rate: float. The dropout rate applied throughout the model. Defaults to 0.0.
- rms_norm_eps: float. The epsilon value used by the RMS normalization layers. Defaults to 1e-6.
- query_pre_attn_scalar: float. The scalar applied to attention queries before attention. Defaults to 1.0.
- attention_bias: bool. Whether to use a bias term in the attention projections. Defaults to False.
- hidden_activation: string. The activation function used in the feedforward layers. Defaults to "gelu_approximate".
- tie_word_embeddings: bool. Whether to tie the input and output token embeddings. Defaults to True.
- initializer_range: float. The standard deviation of the weight initializer. Defaults to 0.02.
- attention_dropout: float. The dropout rate applied to attention weights. Defaults to 0.0.
- sliding_window: int. The window size for layers whose layer_type is "sliding_attention". Defaults to None.
- cross_attention_hidden_size: int. The hidden size used for cross-attention. If None, it defaults to encoder_hidden_dim. Defaults to None.
- attn_logit_softcapping: float. The soft-capping value applied to attention logits. Defaults to None.
- final_logit_softcapping: float. The soft-capping value applied to the final logits. Defaults to None.
- rope_max_wavelength: float. The maximum wavelength for the rotary position embeddings. Defaults to 10000.0.
- dtype: string or keras.mixed_precision.DTypePolicy. The dtype to use for model computations and weights. Note that some computations, such as softmax and layer normalization, will always be done at float32 precision regardless of dtype. Defaults to None.
- **kwargs: Additional keyword arguments passed to the parent Backbone class.

Examples
```python
import numpy as np
from keras_hub.models import T5GemmaBackbone

input_data = {
    "encoder_token_ids": np.ones(shape=(1, 12), dtype="int32"),
    "encoder_padding_mask": np.array(
        [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]], dtype="int32"
    ),
    "decoder_token_ids": np.ones(shape=(1, 8), dtype="int32"),
    "decoder_padding_mask": np.array(
        [[1, 1, 1, 1, 1, 1, 1, 1]], dtype="int32"
    ),
}

# Randomly initialized T5Gemma backbone with custom config.
model = T5GemmaBackbone(
    vocabulary_size=32000,
    # Encoder parameters.
    encoder_hidden_dim=256,
    encoder_intermediate_dim=512,
    encoder_num_layers=4,
    encoder_num_attention_heads=4,
    encoder_num_key_value_heads=2,
    encoder_head_dim=64,
    encoder_layer_types=["full_attention"] * 4,
    # Decoder parameters.
    decoder_hidden_dim=256,
    decoder_intermediate_dim=512,
    decoder_num_layers=4,
    decoder_num_attention_heads=4,
    decoder_num_key_value_heads=2,
    decoder_head_dim=64,
    decoder_layer_types=["full_attention"] * 4,
    # Common parameters.
    dropout_rate=0.1,
    rms_norm_eps=1e-6,
    query_pre_attn_scalar=1.0,
    attention_bias=False,
    hidden_activation="gelu_approximate",
)
output = model(input_data)
```
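The dtype argument can be combined with the same configuration to run the backbone in lower precision. A minimal sketch, assuming "bfloat16" is supported on your backend and hardware, and reusing input_data from above (the bf16_model name is illustrative only):

```python
# Same custom config as above, but with bfloat16 weights and computations.
bf16_model = T5GemmaBackbone(
    vocabulary_size=32000,
    encoder_hidden_dim=256,
    encoder_intermediate_dim=512,
    encoder_num_layers=4,
    encoder_num_attention_heads=4,
    encoder_num_key_value_heads=2,
    encoder_head_dim=64,
    encoder_layer_types=["full_attention"] * 4,
    decoder_hidden_dim=256,
    decoder_intermediate_dim=512,
    decoder_num_layers=4,
    decoder_num_attention_heads=4,
    decoder_num_key_value_heads=2,
    decoder_head_dim=64,
    decoder_layer_types=["full_attention"] * 4,
    dtype="bfloat16",
)
output = bf16_model(input_data)
```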
from_preset method

```python
T5GemmaBackbone.from_preset(preset, load_weights=True, **kwargs)
```
Instantiate a keras_hub.models.Backbone from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:

1. a built-in preset identifier like 'bert_base_en'
2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
3. a Hugging Face handle like 'hf://user/bert_base_en'
4. a ModelScope handle like 'modelscope://user/bert_base_en'
5. a path to a local preset directory like './bert_base_en'

This constructor can be called in one of two ways. Either from the base
class like keras_hub.models.Backbone.from_preset(), or from
a model class like keras_hub.models.GemmaBackbone.from_preset().
If calling from the base class, the subclass of the returned object
will be inferred from the config in the preset directory.
For any Backbone subclass, you can run cls.presets.keys() to list
all built-in presets available on the class.
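For example (a minimal sketch; the exact set of presets depends on your installed keras_hub version):

```python
import keras_hub

# List every built-in preset registered for this backbone class.
print(keras_hub.models.T5GemmaBackbone.presets.keys())
```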
Arguments
- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, a ModelScope handle, or a path to a local preset directory.
- load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.

Examples
```python
# Load a Gemma backbone with pre-trained weights.
model = keras_hub.models.Backbone.from_preset(
    "gemma_2b_en",
)

# Load a Bert backbone with a pre-trained config and random weights.
model = keras_hub.models.Backbone.from_preset(
    "bert_base_en",
    load_weights=False,
)
```
| Preset | Parameters | Description |
|---|---|---|
| t5gemma_s_s_ul2 | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model. |
| t5gemma_s_s_prefixlm | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model. |
| t5gemma_s_s_ul2_it | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_s_s_prefixlm_it | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_b_b_ul2 | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model. |
| t5gemma_b_b_prefixlm | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model. |
| t5gemma_b_b_ul2_it | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_b_b_prefixlm_it | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_l_l_ul2 | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model. |
| t5gemma_l_l_prefixlm | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model. |
| t5gemma_l_l_ul2_it | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_l_l_prefixlm_it | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_ml_ml_ul2 | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model. |
| t5gemma_ml_ml_prefixlm | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model. |
| t5gemma_ml_ml_ul2_it | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_ml_ml_prefixlm_it | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_xl_xl_ul2 | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model. |
| t5gemma_xl_xl_prefixlm | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model. |
| t5gemma_xl_xl_ul2_it | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_xl_xl_prefixlm_it | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_2b_2b_ul2 | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_2b_2b_prefixlm | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_2b_2b_ul2_it | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_2b_2b_prefixlm_it | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_2b_ul2 | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_2b_prefixlm | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_2b_ul2_it | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_2b_prefixlm_it | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_9b_ul2 | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_9b_prefixlm | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_9b_ul2_it | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_9b_prefixlm_it | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
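As a concrete sketch, any preset name from the table above can be passed to from_preset (downloading a preset requires network access, and the larger presets need substantial memory):

```python
import keras_hub

# Load the base-size, UL2-adapted T5Gemma backbone with pre-trained weights.
model = keras_hub.models.T5GemmaBackbone.from_preset("t5gemma_b_b_ul2")

# Load the same architecture with randomly initialized weights.
model = keras_hub.models.T5GemmaBackbone.from_preset(
    "t5gemma_b_b_ul2",
    load_weights=False,
)
```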
token_embedding property

```python
keras_hub.models.T5GemmaBackbone.token_embedding
```
A keras.layers.Embedding instance for embedding token ids.
This layer embeds integer token ids to the hidden dim of the model.
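For example, the embedding layer can be pulled off a constructed backbone and applied to token ids directly (a minimal sketch, reusing the model and input_data from the example above):

```python
# Embed the decoder token ids with the backbone's token embedding layer.
embedding = model.token_embedding
embedded_tokens = embedding(input_data["decoder_token_ids"])
# embedded_tokens has shape (batch_size, sequence_length, hidden_dim).
```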