Gemma4CausalLMPreprocessor class

```python
keras_hub.models.Gemma4CausalLMPreprocessor(
    tokenizer,
    image_converter=None,
    audio_converter=None,
    video_converter=None,
    sequence_length=1024,
    add_start_token=True,
    add_end_token=True,
    max_images_per_prompt=2,
    num_vision_tokens_per_image=280,
    max_audio_clips_per_prompt=1,
    num_audio_tokens_per_clip=750,
    audio_input_feat_size=128,
    num_frames_per_video=32,
    num_vision_tokens_per_frame=70,
    video_fps=24.0,
    **kwargs
)
```
Gemma4 Causal LM preprocessor.
This preprocessing layer is meant for use with
keras_hub.models.Gemma4CausalLM. It can be configured in three modes:
text-only, text + image/video, and text + audio, based on whether
image_converter, video_converter, or audio_converter are provided.
It returns outputs in a (x, y, sample_weight) format, where the y label
is the next token id in the x sequence. sample_weight is 0 for "prompt"
tokens and 1 for "response" tokens, so that the loss is computed only on
the "response" tokens.
For image inputs, this layer replaces each <|image|> placeholder in the
prompt with num_vision_tokens_per_image soft tokens wrapped in
<|image>...<image|> markers. It also returns indices of where these
vision tokens are present so that image embeddings can be placed at the
correct positions in the sequence.
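A hedged sketch of an image prompt follows, reusing the preprocessor from the sketch above. The "images" input key and the raw frame shape mirror the "videos" key used in the video example further down; both are assumptions, as is the requirement that the preprocessor was built with an image_converter.

```python
import numpy as np

# Assumes the preprocessor was constructed with an image_converter.
# The "images" key and the raw (H, W, 3) frame format are assumptions
# mirroring the "videos" example below.
image = np.zeros((224, 224, 3), dtype="uint8")
output = preprocessor(
    {
        "prompts": ["Describe this image: <|image|>"],
        "responses": [""],
        "images": [image],  # one image for the single sample in the batch
    }
)
# The <|image|> placeholder expands to num_vision_tokens_per_image soft
# tokens, and the returned vision indices mark where image embeddings
# belong in the packed sequence.
```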
For video inputs, each <|video|> placeholder is replaced with a sequence
of per-frame blocks. Each block contains a timestamp and
num_vision_tokens_per_frame soft tokens wrapped in <|image>...<image|>
markers. The actual token count per frame is computed dynamically from the
input frame dimensions.
For audio inputs, each <|audio|> placeholder is expanded to the exact
number of audio tokens required for the clip, computed dynamically from the
mel-spectrogram length.
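A hedged sketch of an audio prompt follows. The "audios" input key and the raw-waveform format are assumptions not stated on this page; the audio_converter is expected to turn the clip into the mel-spectrogram whose length determines the token count.

```python
import numpy as np

# Assumes the preprocessor was built with an audio_converter. The
# "audios" key and the 16 kHz mono waveform format are assumptions;
# check Gemma4AudioConverter for the exact expected input.
waveform = np.zeros((16000,), dtype="float32")  # ~1 second of silence
output = preprocessor(
    {
        "prompts": ["Transcribe this clip: <|audio|>"],
        "responses": [""],
        "audios": [waveform],  # one clip for the single sample
    }
)
# The <|audio|> placeholder expands to exactly as many audio tokens as
# the clip's mel-spectrogram length requires.
```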
By default, per-frame timestamps are computed from sequential indices
[0, 1, ..., N-1] at video_fps. When your video was sampled at
irregular intervals (e.g. every 8th frame of a 30 fps source), set
preprocessor.video_metadata to a list of per-sample dicts before
calling the preprocessor. Each dict accepts a "frames_indices" key
(list[int], the source frame indices that were sampled) and an optional
"fps" key (float, defaults to preprocessor.video_fps). When
video_metadata is None (the default) the preprocessor falls back to
sequential indices at video_fps, so existing code is unaffected.
Examples

Using video_metadata to pass real frame indices and fps.

```python
# One dict per sample in the batch.
preprocessor.video_metadata = [
    {"frames_indices": [0, 8, 16, 24], "fps": 30.0},
]
output = preprocessor({
    "prompts": ["Describe this video: <|video|>"],
    "responses": [""],
    "videos": [my_video_frames],  # shape (N_frames, H, W, 3)
})
preprocessor.video_metadata = None  # reset to default after use
```
For use with generation, the layer also exposes two methods
generate_preprocess() and generate_postprocess(). When this preprocessor
is attached to a keras_hub.models.Gemma4CausalLM instance, these methods
will be called implicitly in generate(). They can also be called
standalone (e.g. to precompute preprocessing inputs for generation in a
separate process).
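A short sketch of standalone use is shown below; the exact keys in the returned dict depend on which converters are configured and are not spelled out here, so treat this as an assumption-laden outline rather than a reference.

```python
# Precompute generation inputs outside of generate(), e.g. in a
# separate data pipeline process.
generation_inputs = preprocessor.generate_preprocess(
    ["Tell me a short story."],
    sequence_length=128,
)
# ... run Gemma4CausalLM generation on the precomputed inputs ...
# Token ids coming back from the model can then be decoded with
# preprocessor.generate_postprocess().
```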
Arguments

- tokenizer: A keras_hub.models.Gemma4Tokenizer instance.
- image_converter: A keras_hub.layers.Gemma4ImageConverter instance. Defaults to None.
- audio_converter: A keras_hub.layers.Gemma4AudioConverter instance. Defaults to None.
- video_converter: A keras_hub.layers.Gemma4VideoConverter instance. Defaults to None.
- sequence_length: Defaults to 1024.
- add_start_token: If True, the preprocessor will prepend the tokenizer start token to each input sequence. Defaults to True.
- add_end_token: If True, the preprocessor will append the tokenizer end token to each input sequence. Defaults to True.
- max_images_per_prompt: Defaults to 2.
- num_vision_tokens_per_image: The number of soft vision tokens each <|image|> placeholder is replaced with. Defaults to 280.
- max_audio_clips_per_prompt: Defaults to 1.
- num_audio_tokens_per_clip: Defaults to 750.
- audio_input_feat_size: Defaults to 128.
- num_frames_per_video: Defaults to 32.
- num_vision_tokens_per_frame: The number of soft vision tokens per video frame. Defaults to 70.
- video_fps: The frame rate used to compute per-frame timestamps when video_metadata is not set. Defaults to 24.0.

from_preset method

```python
Gemma4CausalLMPreprocessor.from_preset(
    preset, config_file="preprocessor.json", **kwargs
)
```
Instantiate a keras_hub.models.Preprocessor from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
1. a built-in preset identifier like 'bert_base_en'
2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
3. a Hugging Face handle like 'hf://user/bert_base_en'
4. a path to a local preset directory like './bert_base_en'

For any Preprocessor subclass, you can run cls.presets.keys() to
list all built-in presets available on the class.
As there are usually multiple preprocessing classes for a given model,
this method should be called on a specific subclass like
keras_hub.models.BertTextClassifierPreprocessor.from_preset().
Arguments

- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local preset directory.
- config_file: string. The name of the config file to load from the preset directory. Defaults to "preprocessor.json".

Examples
```python
# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
```
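The same pattern works on this class directly with any of the Gemma4 presets in the table below; the sequence_length override is optional.

```python
# Load a preprocessor for Gemma4 generation with a custom sequence length.
preprocessor = keras_hub.models.Gemma4CausalLMPreprocessor.from_preset(
    "gemma4_instruct_4b",
    sequence_length=256,
)
```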
| Preset | Parameters | Description |
|---|---|---|
| gemma4_2b | 5.10B | Gemma 4 E2B base model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_2b | 5.10B | Gemma 4 E2B instruction-tuned model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_4b | 7.90B | Gemma 4 E4B base model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_4b | 7.90B | Gemma 4 E4B instruction-tuned model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_26b_a4b | 26.00B | Gemma 4 26B A4B base model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text pretrained Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_instruct_26b_a4b | 26.00B | Gemma 4 26B A4B instruction-tuned model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text instruction-tuned Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_31b | 31.00B | Gemma 4 31B base model: 31B parameter, 60-layer, dense vision+text pretrained Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
| gemma4_instruct_31b | 31.00B | Gemma 4 31B instruction-tuned model: 31B parameter, 60-layer, dense vision+text instruction-tuned Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
tokenizer property

```python
keras_hub.models.Gemma4CausalLMPreprocessor.tokenizer
```

The tokenizer used to tokenize strings.