
Seq2SeqLM

`Seq2SeqLM` class

keras_hub.models.Seq2SeqLM()

Base class for sequence to sequence language modeling tasks.

Seq2SeqLM tasks wrap a keras_hub.models.Backbone and a keras_hub.models.Preprocessor to create a model that can be used for generation and generative fine-tuning in a sequence-to-sequence setting, where generation is conditioned on an additional input sequence.

Seq2SeqLM tasks provide an additional, high-level generate() function, which can be used to autoregressively sample an output sequence token by token. The compile() method of Seq2SeqLM classes accepts an additional sampler argument, which can be used to pass a keras_hub.samplers.Sampler that controls how the predicted distribution is sampled.
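The generate() loop can be pictured as a repeated single step: score the next token given the full input sequence and the output tokens produced so far, then let the sampler choose one. The following is a minimal plain-Python sketch of that idea, not KerasHub's implementation (which runs batched, compiled tensor code). The toy model `toy_model`, the `greedy_sampler`, and the `start_id`/`end_id` values are hypothetical stand-ins, and in this sketch `max_length` counts the start token.

```python
from typing import Callable, List, Sequence

# A sampler maps a probability distribution over the vocabulary to a token id.
Sampler = Callable[[Sequence[float]], int]


def greedy_sampler(probs: Sequence[float]) -> int:
    """Pick the most likely token (ties go to the lowest id)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def generate(
    encoder_tokens: List[int],
    next_token_probs: Callable[[List[int], List[int]], List[float]],
    sampler: Sampler,
    start_id: int,
    end_id: int,
    max_length: int,
) -> List[int]:
    """Autoregressively sample decoder tokens, conditioned on encoder_tokens."""
    decoder_tokens = [start_id]
    while len(decoder_tokens) < max_length:
        # The model sees the full input sequence and the decoder prefix so far.
        probs = next_token_probs(encoder_tokens, decoder_tokens)
        token = sampler(probs)
        decoder_tokens.append(token)
        if token == end_id:
            break
    return decoder_tokens


def toy_model(encoder_tokens: List[int], decoder_tokens: List[int]) -> List[float]:
    """Hypothetical 5-token 'model' that copies the input, then emits id 1 (end)."""
    step = len(decoder_tokens) - 1  # number of tokens generated so far
    target = encoder_tokens[step] if step < len(encoder_tokens) else 1
    probs = [0.0] * 5
    probs[target] = 1.0
    return probs


tokens = generate([3, 4, 2], toy_model, greedy_sampler, start_id=0, end_id=1, max_length=10)
```

Swapping `greedy_sampler` for a stochastic function changes only how each step picks a token, which is the role the sampler argument to compile() plays.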

When calling fit(), each example should contain an input sequence and an output sequence. The model is trained to predict the output sequence token by token using a causal mask, similar to a keras_hub.models.CausalLM task. Unlike the CausalLM task, an input sequence must be passed, and every token in the output sequence can attend to the full input sequence.
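For a single training example, the two attention patterns described above can be sketched as follows. This plain-Python illustration assumes the common teacher-forcing convention in which the decoder input is the target shifted right by one start token; the token ids, `start_id`, and `pad_id` are hypothetical, and in practice a KerasHub preprocessor builds the real tensors.

```python
from typing import List, Tuple


def make_training_example(
    encoder_ids: List[int], target_ids: List[int], start_id: int, pad_id: int
) -> Tuple[List[int], List[int], List[List[int]], List[List[int]]]:
    """Return (decoder_inputs, labels, causal_mask, cross_attention_mask)."""
    # Teacher forcing: at step i the decoder sees target_ids[:i] and predicts target_ids[i].
    decoder_inputs = [start_id] + target_ids[:-1]
    labels = list(target_ids)
    n_dec = len(decoder_inputs)
    # Causal mask: decoder position i may attend only to decoder positions j <= i.
    causal_mask = [[1 if j <= i else 0 for j in range(n_dec)] for i in range(n_dec)]
    # Cross-attention: every decoder position sees all non-padding encoder tokens.
    encoder_valid = [0 if t == pad_id else 1 for t in encoder_ids]
    cross_mask = [list(encoder_valid) for _ in range(n_dec)]
    return decoder_inputs, labels, causal_mask, cross_mask


dec_in, labels, causal, cross = make_training_example(
    encoder_ids=[7, 8, 9, 0], target_ids=[5, 6, 2], start_id=1, pad_id=0
)
```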

All Seq2SeqLM tasks include a from_preset() constructor which can be used to load a pre-trained config and weights.

Example

import keras_hub

# Load a BART Seq2SeqLM task with pre-trained weights.
seq_2_seq_lm = keras_hub.models.Seq2SeqLM.from_preset(
    "bart_base_en",
)
seq_2_seq_lm.compile(sampler="top_k")
# Generate an output conditioned on "The quick brown fox." as the input sequence.
seq_2_seq_lm.generate("The quick brown fox.", max_length=30)


`from_preset` method

Seq2SeqLM.from_preset(preset, load_weights=True, **kwargs)

Instantiate a keras_hub.models.Task from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'

For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.
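As a rough illustration of the four accepted forms, the hypothetical helper below classifies a preset string by its prefix. It is not KerasHub's loader, and the heuristic for recognizing local paths is an assumption of this sketch.

```python
import os


def classify_preset(preset: str) -> str:
    """Hypothetical helper: guess which of the four preset forms a string uses."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "huggingface"
    # Anything that exists as a directory, or looks like a path, is treated as local.
    if os.path.isdir(preset) or preset.startswith((".", "/")) or os.sep in preset:
        return "local"
    # Otherwise assume a bare built-in identifier such as 'bert_base_en'.
    return "builtin"
```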

This constructor can be called in one of two ways: either from a task-specific base class like keras_hub.models.CausalLM.from_preset(), or from a model class like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.
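The base-class dispatch described above can be pictured as a registry lookup: the preset's config names a concrete class, and the base class instantiates that class, provided it is a subclass of the caller. The sketch below uses hypothetical toy classes and a hypothetical `class_name` config key; it only mirrors the idea and is not how KerasHub reads preset configs.

```python
import json
from typing import Dict, Type


class ToyTask:
    """Hypothetical stand-in for a task base class with a subclass registry."""

    registry: Dict[str, Type["ToyTask"]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself under its class name.
        ToyTask.registry[cls.__name__] = cls

    @classmethod
    def from_config_json(cls, config_json: str) -> "ToyTask":
        config = json.loads(config_json)
        subclass = ToyTask.registry[config["class_name"]]
        # In this sketch, calling from a base class must yield a compatible subclass.
        if not issubclass(subclass, cls):
            raise ValueError(f"{subclass.__name__} is not a {cls.__name__}")
        return subclass()


class ToySeq2SeqLM(ToyTask):
    pass


class ToyBartSeq2SeqLM(ToySeq2SeqLM):
    pass


class ToyBertTextClassifier(ToyTask):
    pass


model = ToySeq2SeqLM.from_config_json('{"class_name": "ToyBartSeq2SeqLM"}')
```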

Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.

Examples

import keras_hub

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)

# Load a BERT classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)

Preset	Parameters	Description
bart_base_en	139.42M	6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl.
bart_large_en	406.29M	12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl.
bart_large_en_cnn	406.29M	The `bart_large_en` backbone model fine-tuned on the CNN+DM summarization dataset.
moonshine_tiny_en	27.09M	Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription.
moonshine_base_en	61.51M	Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription.
t5gemma_s_s_ul2	312.52M	T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model.
t5gemma_s_s_prefixlm	312.52M	T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model.
t5gemma_s_s_ul2_it	312.52M	T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_s_s_prefixlm_it	312.52M	T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_b_b_ul2	591.49M	T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model.
t5gemma_b_b_prefixlm	591.49M	T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model.
t5gemma_b_b_ul2_it	591.49M	T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_b_b_prefixlm_it	591.49M	T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_l_l_ul2	1.24B	T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model.
t5gemma_l_l_prefixlm	1.24B	T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model.
t5gemma_l_l_ul2_it	1.24B	T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_l_l_prefixlm_it	1.24B	T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_ml_ml_ul2	2.20B	T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model.
t5gemma_ml_ml_prefixlm	2.20B	T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model.
t5gemma_ml_ml_ul2_it	2.20B	T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_ml_ml_prefixlm_it	2.20B	T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_xl_xl_ul2	3.77B	T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model.
t5gemma_xl_xl_prefixlm	3.77B	T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model.
t5gemma_xl_xl_ul2_it	3.77B	T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_xl_xl_prefixlm_it	3.77B	T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_2b_2b_ul2	5.60B	T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model.
t5gemma_2b_2b_prefixlm	5.60B	T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model.
t5gemma_2b_2b_ul2_it	5.60B	T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_2b_2b_prefixlm_it	5.60B	T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_9b_2b_ul2	12.29B	T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model.
t5gemma_9b_2b_prefixlm	12.29B	T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model.
t5gemma_9b_2b_ul2_it	12.29B	T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_9b_2b_prefixlm_it	12.29B	T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma_9b_9b_ul2	20.33B	T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model.
t5gemma_9b_9b_prefixlm	20.33B	T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model.
t5gemma_9b_9b_ul2_it	20.33B	T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following.
t5gemma_9b_9b_prefixlm_it	20.33B	T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following.
t5gemma2_270m_270m	953.80M	Encoder-decoder (T5-style) model based on Gemma 3, with a 270M encoder and a 270M decoder, supporting text generation, multilingual tasks and long-context inputs.
t5gemma2_1b_1b	2.42B	Encoder-decoder (T5-style) model based on Gemma 3, with a 1B encoder and a 1B decoder, supporting text generation, multilingual tasks and long-context inputs.
t5gemma2_4b_4b	8.18B	Encoder-decoder (T5-style) model based on Gemma 3, with a 4B encoder and a 4B decoder, supporting text generation, multilingual tasks and long-context inputs.
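The parameter counts in the table give a rough lower bound on the memory needed just to hold a preset's weights: parameters times bytes per parameter (4 for float32, 2 for bfloat16 or float16). The sketch below does that arithmetic for a few rows copied from the table. It ignores activations, optimizer state, and generation caches, and it takes 1 GB as 10^9 bytes; both are simplifying assumptions.

```python
from typing import List

# Parameter counts copied from the preset table above.
PRESET_PARAMS = {
    "bart_base_en": 139.42e6,
    "moonshine_tiny_en": 27.09e6,
    "t5gemma_b_b_ul2": 591.49e6,
    "t5gemma_9b_9b_ul2": 20.33e9,
}

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(preset: str, dtype: str = "float32") -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return PRESET_PARAMS[preset] * BYTES_PER_PARAM[dtype] / 1e9


def presets_that_fit(budget_gb: float, dtype: str = "float32") -> List[str]:
    """Presets whose weights alone fit in budget_gb, smallest first."""
    fits = [p for p in PRESET_PARAMS if weight_memory_gb(p, dtype) <= budget_gb]
    return sorted(fits, key=lambda p: PRESET_PARAMS[p])
```

For example, bart_base_en needs about 0.56 GB in float32, while t5gemma_9b_9b_ul2 needs about 40.7 GB even in bfloat16.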


`compile` method

Seq2SeqLM.compile(
    optimizer="auto", loss="auto", weighted_metrics="auto", sampler="top_k", **kwargs
)

Configures the Seq2SeqLM task for training and generation.

The Seq2SeqLM task extends the default compilation signature of keras.Model.compile with defaults for optimizer, loss, and weighted_metrics. To override these defaults, pass any value to these arguments during compilation.

The Seq2SeqLM task adds a new sampler argument to compile(), which can be used to control the sampling strategy used by the generate() function.

Note that because training inputs include padded tokens which are excluded from the loss, it is almost always a good idea to compile with weighted_metrics and not metrics.
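Keras applies sample weights to weighted_metrics but not to metrics, which is why the distinction matters here. Assuming padded positions arrive with a sample weight of 0, the plain-Python sketch below shows how unweighted token accuracy is inflated by trivially "correct" padding positions; the token ids and the pad id 0 are hypothetical.

```python
from typing import List, Optional


def accuracy(
    y_true: List[int], y_pred: List[int], weights: Optional[List[float]] = None
) -> float:
    """Token accuracy; positions with weight 0 (padding) do not count."""
    if weights is None:
        weights = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


# Hypothetical row: 3 real tokens followed by 3 padding positions (id 0).
y_true = [5, 6, 2, 0, 0, 0]
y_pred = [5, 9, 2, 0, 0, 0]  # padding slots happen to match the padded targets
padding_mask = [1.0 if t != 0 else 0.0 for t in y_true]

unweighted = accuracy(y_true, y_pred)  # 5/6: padding inflates the score
weighted = accuracy(y_true, y_pred, padding_mask)  # 2/3: real tokens only
```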

Arguments

optimizer: "auto", an optimizer name, or a keras.Optimizer instance. Defaults to "auto", which uses the default optimizer for the given model and task. See keras.Model.compile and keras.optimizers for more info on possible optimizer values.
loss: "auto", a loss name, or a keras.losses.Loss instance. Defaults to "auto", where a keras.losses.SparseCategoricalCrossentropy loss will be applied for the token classification task. See keras.Model.compile and keras.losses for more info on possible loss values.
weighted_metrics: "auto", or a list of metrics to be evaluated by the model during training and testing. Defaults to "auto", where a keras.metrics.SparseCategoricalAccuracy will be applied to track the accuracy of the model at predicting the target token values. See keras.Model.compile and keras.metrics for more info on possible weighted_metrics values.
sampler: A sampler name, or a keras_hub.samplers.Sampler instance. Configures the sampling method used during generate() calls. See keras_hub.samplers for a full list of built-in sampling strategies.
**kwargs: See keras.Model.compile for a full list of arguments supported by the compile method.
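The default sampler="top_k" restricts sampling to the k most likely tokens at each step. A plain-Python illustration of top-k sampling follows; it is a sketch of the technique, not the implementation in keras_hub.samplers, and `top_k_sample` is a hypothetical name.

```python
import math
import random
from typing import List, Optional


def top_k_sample(logits: List[float], k: int, rng: Optional[random.Random] = None) -> int:
    """Sample a token id from the k highest-scoring logits (softmax over those k)."""
    rng = rng or random.Random()
    # Keep only the ids of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax weights restricted to the kept ids.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With k=1 this reduces to greedy decoding; larger k trades determinism for diversity.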


`generate` method

Seq2SeqLM.generate(
    inputs, max_length=None, stop_token_ids="auto", strip_prompt=False
)

Generate text given prompt inputs.

This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.

If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.
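For a tf.data.Dataset, the batch-by-batch behavior amounts to the loop below. This is a plain-Python sketch in which a list of lists stands in for the dataset and `generate_batch` is a hypothetical callback standing in for single-batch generation.

```python
from typing import Callable, Iterable, List


def generate_dataset(
    batches: Iterable[List[str]], generate_batch: Callable[[List[str]], List[str]]
) -> List[str]:
    """Run generation one batch at a time and concatenate the results."""
    outputs: List[str] = []
    for batch in batches:
        outputs.extend(generate_batch(batch))
    return outputs


def batched(items: List[str], batch_size: int) -> List[List[str]]:
    """Split a list into consecutive batches (the last one may be smaller)."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


batch_sizes_seen: List[int] = []


def fake_generate_batch(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for the model: record the batch size, upper-case inputs.
    batch_sizes_seen.append(len(batch))
    return [s.upper() for s in batch]


out = generate_dataset(batched(["a", "b", "c", "d", "e"], 2), fake_generate_batch)
```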

If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. The class example above demonstrates the case with a preprocessor attached.

Arguments

inputs: python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
max_length: Optional. int. The maximum length of the generated sequence. Defaults to the maximum configured sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults to "auto", which uses preprocessor.tokenizer.end_token_id; using "auto" without an attached preprocessor will produce an error. None stops generation only after generating max_length tokens. You may also specify a list of token ids the model should stop on. Note that each token in the sequence is interpreted as a separate stop token; multi-token stop sequences are not supported.
strip_prompt: Optional. bool. By default, generate() returns the full prompt followed by its completion generated by the model. If this option is set to True, only the newly generated text is returned.
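The stop_token_ids and strip_prompt semantics can be sketched on already generated token ids. In this hypothetical plain-Python helper each stop id acts independently, so (5, 2) stops at the first 5 or the first 2 rather than at the pair "5 then 2". Keeping the stop token itself in the output is a choice of this sketch.

```python
from typing import List, Optional, Sequence


def postprocess(
    prompt_ids: List[int],
    generated_ids: List[int],
    stop_token_ids: Optional[Sequence[int]],
    strip_prompt: bool = False,
) -> List[int]:
    """Truncate at the first stop token, optionally dropping the prompt."""
    completion = list(generated_ids)
    if stop_token_ids is not None:
        stops = set(stop_token_ids)  # each id is its own single-token stop
        for i, token in enumerate(completion):
            if token in stops:
                completion = completion[: i + 1]  # this sketch keeps the stop token
                break
    return completion if strip_prompt else prompt_ids + completion
```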


`save_to_preset` method

Seq2SeqLM.save_to_preset(preset_dir, max_shard_size=10)

Save task to a preset directory.

Arguments

preset_dir: The path to the local model preset directory.
max_shard_size: int or float. Maximum size in GB for each sharded file. If None, no sharding will be done. Defaults to 10.
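One way to picture max_shard_size is greedy packing: walk the weight tensors in order and start a new file whenever the next tensor would push the current one past the limit. The sketch below is a hypothetical illustration of that idea, not KerasHub's sharding code; the tensor names and sizes are made up, 1 GB is taken as 10^9 bytes, and a single tensor larger than the limit simply gets a shard of its own.

```python
from typing import Dict, List, Optional


def plan_shards(
    tensor_sizes_bytes: Dict[str, int], max_shard_size_gb: Optional[float]
) -> List[List[str]]:
    """Greedily group tensors (in insertion order) into shards of at most the limit."""
    if max_shard_size_gb is None:
        return [list(tensor_sizes_bytes)]  # no sharding: everything in one file
    limit = max_shard_size_gb * 1e9
    shards: List[List[str]] = [[]]
    current = 0
    for name, size in tensor_sizes_bytes.items():
        # Start a new shard if this tensor would overflow a non-empty one.
        if shards[-1] and current + size > limit:
            shards.append([])
            current = 0
        shards[-1].append(name)
        current += size
    return shards


sizes = {"a": 4_000_000_000, "b": 5_000_000_000, "c": 3_000_000_000, "d": 12_000_000_000}
```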