## `T5Gemma2Seq2SeqLMPreprocessor` class

```python
keras_hub.models.T5Gemma2Seq2SeqLMPreprocessor(
    tokenizer,
    encoder_sequence_length=512,
    decoder_sequence_length=512,
    image_converter=None,
    image_size=None,
    num_vision_tokens_per_image=None,
    add_start_token=False,
    add_end_token=True,
    **kwargs
)
```
T5Gemma2 Seq2Seq LM preprocessor.
This preprocessing layer is meant for use with
`keras_hub.models.T5Gemma2Seq2SeqLM`. By default, it will take in batches of
strings, and return outputs in a `(x, y, sample_weight)` format, where the
`y` label is the next token id in the `x` sequence.
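The `(x, y, sample_weight)` packing can be sketched in plain Python; the `pack_for_lm` helper and token ids below are illustrative only, not the library's implementation:

```python
def pack_for_lm(token_ids, sequence_length, pad_id=0):
    """Pad/truncate, then build (x, y, sample_weight) with y = x shifted left."""
    ids = list(token_ids)[: sequence_length + 1]
    ids += [pad_id] * (sequence_length + 1 - len(ids))
    x = ids[:-1]  # model inputs
    y = ids[1:]   # next-token labels
    sample_weight = [1 if t != pad_id else 0 for t in y]  # mask padded labels
    return x, y, sample_weight

x, y, w = pack_for_lm([2, 5, 7, 9, 3], sequence_length=6)
# x = [2, 5, 7, 9, 3, 0], y = [5, 7, 9, 3, 0, 0], w = [1, 1, 1, 1, 0, 0]
```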
For use with generation, the layer also exposes two methods,
`generate_preprocess()` and `generate_postprocess()`. When this preprocessor
is attached to a `keras_hub.models.T5Gemma2Seq2SeqLM` instance, these methods
will be called implicitly in `generate()`.

When an `image_converter` is provided, the preprocessor also supports
multimodal inputs with images. Images are inserted into the encoder sequence
as placeholder tokens that the backbone's vision encoder will replace with
image embeddings.
**Arguments**

- **tokenizer**: A `keras_hub.models.T5Gemma2Tokenizer` instance.
- **encoder_sequence_length**: The length of the packed encoder inputs. Defaults to `512`.
- **decoder_sequence_length**: The length of the packed decoder inputs. Defaults to `512`.
- **image_converter**: A `keras_hub.layers.ImageConverter` instance, or `None` for text-only inputs. Defaults to `None`.
- **image_size**: The size of the input images. Defaults to `None`.
- **num_vision_tokens_per_image**: The number of placeholder vision tokens inserted per image. Defaults to `None`.
- **add_start_token**: If `True`, prepend the start token. Defaults to `False`.
- **add_end_token**: If `True`, append the end token. Defaults to `True`.

### `from_preset` method

```python
T5Gemma2Seq2SeqLMPreprocessor.from_preset(
    preset, config_file="preprocessor.json", **kwargs
)
```
Instantiate a `keras_hub.models.Preprocessor` from a model preset.

A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The `preset` can be passed as
one of:

1. a built-in preset identifier like `'bert_base_en'`
2. a Kaggle Models handle like `'kaggle://user/bert/keras/bert_base_en'`
3. a Hugging Face handle like `'hf://user/bert_base_en'`
4. a path to a local preset directory like `'./bert_base_en'`

For any `Preprocessor` subclass, you can run `cls.presets.keys()` to
list all built-in presets available on the class.

As there are usually multiple preprocessing classes for a given model,
this method should be called on a specific subclass like
`keras_hub.models.BertTextClassifierPreprocessor.from_preset()`.
**Arguments**

- **preset**: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
- **config_file**: string. The config file to load from the preset directory. Defaults to `"preprocessor.json"`.
**Examples**

```python
# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
```
| Preset | Parameters | Description |
|---|---|---|
| t5gemma2_270m_270m | 953.80M | Encoder–decoder (T5-style) model built on Gemma 3, with 270M encoder + 270M decoder parameters, supporting text generation, multilingual tasks, and long-context inputs. |
| t5gemma2_1b_1b | 2.42B | Encoder–decoder (T5-style) model built on Gemma 3, with 1B encoder + 1B decoder parameters, supporting text generation, multilingual tasks, and long-context inputs. |
| t5gemma2_4b_4b | 8.18B | Encoder–decoder (T5-style) model built on Gemma 3, with 4B encoder + 4B decoder parameters, supporting text generation, multilingual tasks, and long-context inputs. |
### `generate_preprocess` method

```python
T5Gemma2Seq2SeqLMPreprocessor.generate_preprocess(
    x, encoder_sequence_length=None, decoder_sequence_length=None, sequence_length=None
)
```
Convert input strings to integer token inputs for generation.

Similar to calling the layer for training, this method takes in a dict
containing `"encoder_text"` and `"decoder_text"`, with strings or tensor
strings for values, tokenizes and packs the input, and computes a padding
mask masking all inputs not filled in with a padded value.

Unlike calling the layer for training, this method does not compute labels
and will never append a `tokenizer.end_token_id` to the end of the decoder
sequence (as generation is expected to continue at the end of the inputted
decoder prompt).
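The generation-time packing can be sketched as follows; the `pack_for_generation` helper and token ids are made up for illustration, not the library's implementation. Note that the prompt is truncated, padded, and masked, but no end token is appended:

```python
def pack_for_generation(token_ids, sequence_length, pad_id=0):
    """Truncate/pad the prompt and compute a padding mask; no end token added."""
    ids = list(token_ids)[:sequence_length]
    padding_mask = [1] * len(ids) + [0] * (sequence_length - len(ids))
    ids = ids + [pad_id] * (sequence_length - len(ids))
    return {"token_ids": ids, "padding_mask": padding_mask}

out = pack_for_generation([2, 5, 7], sequence_length=5)
# {"token_ids": [2, 5, 7, 0, 0], "padding_mask": [1, 1, 1, 0, 0]}
```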
### `generate_postprocess` method

```python
T5Gemma2Seq2SeqLMPreprocessor.generate_postprocess(x)
```
Convert integer token output to strings for generation.

This method reverses `generate_preprocess()` by first removing all padding
and start/end tokens, and then converting the integer sequence back to a
string.
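A minimal sketch of that reversal, using a toy vocabulary and a hypothetical `strip_and_decode` helper rather than the actual detokenizer:

```python
def strip_and_decode(token_ids, padding_mask, id_to_token, special_ids):
    """Drop padded positions and special tokens, then join tokens to a string."""
    kept = [
        t for t, m in zip(token_ids, padding_mask)
        if m and t not in special_ids
    ]
    return " ".join(id_to_token[t] for t in kept)

vocab = {1: "<start>", 2: "<end>", 5: "hello", 7: "world"}
text = strip_and_decode([1, 5, 7, 2, 0], [1, 1, 1, 1, 0], vocab, {0, 1, 2})
# "hello world"
```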
### `tokenizer` property

```python
keras_hub.models.T5Gemma2Seq2SeqLMPreprocessor.tokenizer
```

The tokenizer used to tokenize strings.