Gemma4CausalLMPreprocessor class

```python
keras_hub.models.Gemma4CausalLMPreprocessor(
    tokenizer,
    image_converter=None,
    audio_converter=None,
    sequence_length=1024,
    add_start_token=True,
    add_end_token=True,
    max_images_per_prompt=2,
    num_vision_tokens_per_image=280,
    max_audio_clips_per_prompt=1,
    num_audio_tokens_per_clip=750,
    audio_input_feat_size=0,
    **kwargs
)
```
Gemma4 Causal LM preprocessor.
This preprocessing layer is meant for use with
`keras_hub.models.Gemma4CausalLM`. It can be configured in two ways:
text-only and text + vision, based on whether the passed value of
`image_converter` is `None`. For the former, it takes in batches of strings,
whereas for the latter, it takes in batches of images and strings. It
returns outputs in an `(x, y, sample_weight)` format, where the `y` label is
the next token id in the `x` sequence. `sample_weight` is 0 for "prompt"
tokens and 1 for "response" tokens, so that the loss is computed only on
the "response" tokens.
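The prompt/response weighting can be illustrated with a small standalone sketch (plain Python, no keras_hub required; the token ids and helper name below are made up for illustration, not part of the API):

```python
def make_labels_and_weights(prompt_ids, response_ids):
    """Build (x, y, sample_weight) the way a causal LM preprocessor does:
    y is x shifted left by one token, and sample_weight is 0 for prompt
    positions and 1 for response positions, so the loss is computed only
    on response tokens."""
    full = prompt_ids + response_ids
    x = full[:-1]
    y = full[1:]  # next-token targets
    # Position i predicts token i + 1, so position i gets weight 1
    # exactly when token i + 1 belongs to the response.
    weights = [0] * len(prompt_ids) + [1] * len(response_ids)
    w = weights[1:]
    return x, y, w

x, y, w = make_labels_and_weights([2, 10, 11], [20, 21, 3])
# x = [2, 10, 11, 20, 21], y = [10, 11, 20, 21, 3], w = [0, 0, 1, 1, 1]
```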
For the text + vision case, this layer replaces instances of the
`<|image>` token in the prompt with
`num_vision_tokens_per_image` placeholder tokens. It also returns the
indices at which these vision tokens are present, so that in the model,
image embeddings can be placed in the right positions in the sequence of
text embeddings. Note that if `max_images_per_prompt` is 2, you can pass
either 0, 1, or 2 images per sample. The value 0 corresponds to text-only input.
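The placeholder expansion and index tracking described above can be sketched in plain Python (the token ids and function name here are invented stand-ins, not the layer's real internals):

```python
IMAGE_TOKEN = -1   # stand-in id for the image token (made up)
PLACEHOLDER = -2   # stand-in id for a vision placeholder token (made up)

def expand_image_tokens(token_ids, num_vision_tokens_per_image=4):
    """Replace each image token with a run of placeholder tokens and
    return the indices of all placeholders, so that image embeddings can
    later be scattered into those positions of the text embeddings."""
    out, indices = [], []
    for tok in token_ids:
        if tok == IMAGE_TOKEN:
            start = len(out)
            indices.extend(range(start, start + num_vision_tokens_per_image))
            out.extend([PLACEHOLDER] * num_vision_tokens_per_image)
        else:
            out.append(tok)
    return out, indices

ids, idx = expand_image_tokens([5, IMAGE_TOKEN, 6])
# ids = [5, -2, -2, -2, -2, 6], idx = [1, 2, 3, 4]
```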
For use with generation, the layer also exposes two methods,
`generate_preprocess()` and `generate_postprocess()`. When this preprocessor
is attached to a `keras_hub.models.Gemma4CausalLM` instance, these methods
will be called implicitly in `generate()`. They can also be called
standalone (e.g. to precompute preprocessing inputs for generation in a
separate process).
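Conceptually, generation preprocessing pads token ids to a fixed length and builds a padding mask, and postprocessing strips the padded positions back out before detokenizing. A minimal standalone sketch of that round trip (the pad id and function names are assumptions for illustration, not the actual keras_hub implementation):

```python
PAD_ID = 0  # assumed pad token id for this sketch

def generate_preprocess_sketch(token_ids, sequence_length):
    """Pad (or truncate) token ids to sequence_length and build a
    padding mask, as a generation preprocessor typically does."""
    n = min(len(token_ids), sequence_length)
    padded = token_ids[:n] + [PAD_ID] * (sequence_length - n)
    padding_mask = [True] * n + [False] * (sequence_length - n)
    return {"token_ids": padded, "padding_mask": padding_mask}

def generate_postprocess_sketch(outputs):
    """Drop padded positions, leaving only real tokens to detokenize."""
    return [t for t, m in zip(outputs["token_ids"], outputs["padding_mask"]) if m]

x = generate_preprocess_sketch([7, 8, 9], sequence_length=6)
# x["token_ids"] = [7, 8, 9, 0, 0, 0]; round trip recovers [7, 8, 9]
```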
Arguments

- **tokenizer**: A `keras_hub.models.Gemma4Tokenizer` instance.
- **image_converter**: A `keras_hub.layers.ImageConverter` instance. Defaults to `None`.
- **add_start_token**: If `True`, the preprocessor will prepend the tokenizer start token to each input sequence. Defaults to `True`.
- **add_end_token**: If `True`, the preprocessor will append the tokenizer end token to each input sequence. Defaults to `True`.

from_preset method

```python
Gemma4CausalLMPreprocessor.from_preset(
    preset, config_file="preprocessor.json", **kwargs
)
```
Instantiate a `keras_hub.models.Preprocessor` from a model preset.

A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The `preset` can be passed as
one of:

1. a built-in preset identifier like `'bert_base_en'`
2. a Kaggle Models handle like `'kaggle://user/bert/keras/bert_base_en'`
3. a Hugging Face handle like `'hf://user/bert_base_en'`
4. a path to a local preset directory like `'./bert_base_en'`

For any `Preprocessor` subclass, you can run `cls.presets.keys()` to
list all built-in presets available on the class.

As there are usually multiple preprocessing classes for a given model,
this method should be called on a specific subclass like
`keras_hub.models.BertTextClassifierPreprocessor.from_preset()`.
Arguments

- **preset**: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.

Examples

```python
# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
```
| Preset | Parameters | Description |
|---|---|---|
| gemma4_2b | 5.10B | Gemma 4 E2B base model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_2b | 5.10B | Gemma 4 E2B instruction-tuned model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_4b | 7.90B | Gemma 4 E4B base model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_4b | 7.90B | Gemma 4 E4B instruction-tuned model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_26b_a4b | 26.00B | Gemma 4 26B A4B base model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text pretrained Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_instruct_26b_a4b | 26.00B | Gemma 4 26B A4B instruction-tuned model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text instruction-tuned Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_31b | 31.00B | Gemma 4 31B base model: 31B parameter, 60-layer, dense vision+text pretrained Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
| gemma4_instruct_31b | 31.00B | Gemma 4 31B instruction-tuned model: 31B parameter, 60-layer, dense vision+text instruction-tuned Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
tokenizer property

```python
keras_hub.models.Gemma4CausalLMPreprocessor.tokenizer
```

The tokenizer used to tokenize strings.