Gemma4CausalLMPreprocessor layer

[source]

Gemma4CausalLMPreprocessor class

keras_hub.models.Gemma4CausalLMPreprocessor(
    tokenizer,
    image_converter=None,
    audio_converter=None,
    sequence_length=1024,
    add_start_token=True,
    add_end_token=True,
    max_images_per_prompt=2,
    num_vision_tokens_per_image=280,
    max_audio_clips_per_prompt=1,
    num_audio_tokens_per_clip=750,
    audio_input_feat_size=0,
    **kwargs
)

Gemma4 Causal LM preprocessor.

This preprocessing layer is meant for use with keras_hub.models.Gemma4CausalLM. It can be configured in two ways, text-only or text + vision, depending on whether the passed value of image_converter is None. In the former case it takes in batches of strings; in the latter, batches of images and strings. It returns outputs in an (x, y, sample_weight) format, where the y label is the next token id in the x sequence. sample_weight is 0 for "prompt" tokens and 1 for "response" tokens, so that the loss is computed only on the "response" tokens.
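The (x, y, sample_weight) contract above can be illustrated with a small NumPy sketch. The token ids and prompt length below are made up for illustration and are not the real Gemma4 vocabulary:

```python
import numpy as np

# Hypothetical token ids for a packed "<start><prompt><response><end>" sequence.
token_ids = np.array([2, 10, 11, 12, 20, 21, 1])
prompt_len = 4  # start token + 3 prompt tokens

# x drops the last token; y is x shifted left by one,
# so y[i] is the next-token target for x[i].
x = token_ids[:-1]
y = token_ids[1:]

# sample_weight is 0 wherever the *target* is still a prompt token,
# and 1 where the target is a response token, so the loss only
# covers the response.
sample_weight = np.zeros_like(y)
sample_weight[prompt_len - 1:] = 1

print(x.tolist())              # [2, 10, 11, 12, 20, 21]
print(y.tolist())              # [10, 11, 12, 20, 21, 1]
print(sample_weight.tolist())  # [0, 0, 0, 1, 1, 1]
```

Note that the mask starts at prompt_len - 1, not prompt_len, because targets are shifted one position left relative to the inputs.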

For the text + vision case, this layer replaces each instance of the <|image> token in the prompt with num_vision_tokens_per_image placeholder tokens. It also returns the indices at which these vision tokens appear, so that the model can place image embeddings at the right positions in the sequence of text embeddings. Note that if max_images_per_prompt is 2, each sample may contain 0, 1, or 2 images; 0 corresponds to text-only input.
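The expansion step can be sketched in plain Python. The placeholder id, the small num_vision_tokens_per_image, and the string tokens are all illustrative assumptions, not the layer's real internals:

```python
IMAGE_TOKEN = "<|image>"
PLACEHOLDER_ID = -1              # stand-in id for a vision placeholder token
num_vision_tokens_per_image = 4  # the real layer defaults to 280

def expand_image_tokens(tokens):
    """Replace each image token with placeholder ids and record their indices."""
    expanded, vision_indices = [], []
    for tok in tokens:
        if tok == IMAGE_TOKEN:
            start = len(expanded)
            expanded.extend([PLACEHOLDER_ID] * num_vision_tokens_per_image)
            vision_indices.extend(range(start, start + num_vision_tokens_per_image))
        else:
            expanded.append(tok)
    return expanded, vision_indices

tokens = ["Describe", "<|image>", "please"]
expanded, indices = expand_image_tokens(tokens)
print(expanded)  # ['Describe', -1, -1, -1, -1, 'please']
print(indices)   # [1, 2, 3, 4]
```

The recorded indices are what lets the model scatter image embeddings into the text embedding sequence at exactly the placeholder positions.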

For use with generation, the layer also exposes two methods generate_preprocess() and generate_postprocess(). When this preprocessor is attached to a keras_hub.models.Gemma4CausalLM instance, these methods will be called implicitly in generate(). They can also be called standalone (e.g. to precompute preprocessing inputs for generation in a separate process).
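As a rough sketch of the kind of packing generate_preprocess() performs, the toy function below left-aligns prompt ids, pads to sequence_length, and emits a padding mask. The pad id (0), the tiny sequence_length, and the output dict keys are assumptions for illustration, not the layer's actual output spec:

```python
sequence_length = 8  # the real layer defaults to 1024

def pack_for_generation(prompt_ids, pad_id=0):
    """Truncate or right-pad a prompt and build the matching padding mask."""
    n = min(len(prompt_ids), sequence_length)
    token_ids = list(prompt_ids[:n]) + [pad_id] * (sequence_length - n)
    padding_mask = [1] * n + [0] * (sequence_length - n)
    return {"token_ids": token_ids, "padding_mask": padding_mask}

out = pack_for_generation([2, 15, 27, 9])
print(out["token_ids"])     # [2, 15, 27, 9, 0, 0, 0, 0]
print(out["padding_mask"])  # [1, 1, 1, 1, 0, 0, 0, 0]
```

A fixed-length layout like this is what lets generate() run with static shapes while the mask marks which positions hold real tokens.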

Arguments

  • tokenizer: A keras_hub.models.Gemma4Tokenizer instance.
  • image_converter: A keras_hub.layers.ImageConverter instance, used for the text + vision case. Defaults to None.
  • audio_converter: An audio converter layer for the text + audio case, used to turn raw audio into model input features. Defaults to None.
  • sequence_length: The length of the packed inputs. Defaults to 1024.
  • add_start_token: If True, the preprocessor will prepend the tokenizer start token to each input sequence. Defaults to True.
  • add_end_token: If True, the preprocessor will append the tokenizer end token to each input sequence. Defaults to True.
  • max_images_per_prompt: int. Permissible number of images per sample in the batch. Defaults to 2.
  • num_vision_tokens_per_image: int. Number of vision placeholder tokens per image. Defaults to 280.
  • max_audio_clips_per_prompt: int. Permissible number of audio clips per sample in the batch. Defaults to 1.
  • num_audio_tokens_per_clip: int. Number of audio placeholder tokens per audio clip. Defaults to 750.
  • audio_input_feat_size: int. Feature size of the converted audio input. Defaults to 0.

[source]

from_preset method

Gemma4CausalLMPreprocessor.from_preset(
    preset, config_file="preprocessor.json", **kwargs
)

Instantiate a keras_hub.models.Preprocessor from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

  1. a built-in preset identifier like 'bert_base_en'
  2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
  3. a Hugging Face handle like 'hf://user/bert_base_en'
  4. a path to a local preset directory like './bert_base_en'

For any Preprocessor subclass, you can run cls.presets.keys() to list all built-in presets available on the class.

As there are usually multiple preprocessing classes for a given model, this method should be called on a specific subclass like keras_hub.models.BertTextClassifierPreprocessor.from_preset().

Arguments

  • preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.

Examples

# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)

| Preset | Parameters | Description |
| --- | --- | --- |
| gemma4_2b | 5.10B | Gemma 4 E2B base model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters; PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_2b | 5.10B | Gemma 4 E2B instruction-tuned model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters; PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_4b | 7.90B | Gemma 4 E4B base model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters; PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_4b | 7.90B | Gemma 4 E4B instruction-tuned model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters; PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_26b_a4b | 26.00B | Gemma 4 26B A4B base model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text pretrained Gemma4 model. The 'A' denotes active parameters; by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_instruct_26b_a4b | 26.00B | Gemma 4 26B A4B instruction-tuned model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text instruction-tuned Gemma4 model. The 'A' denotes active parameters; by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_31b | 31.00B | Gemma 4 31B base model: 31B parameter, 60-layer, dense vision+text pretrained Gemma4 model. The largest dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
| gemma4_instruct_31b | 31.00B | Gemma 4 31B instruction-tuned model: 31B parameter, 60-layer, dense vision+text instruction-tuned Gemma4 model. The largest dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |

tokenizer property

keras_hub.models.Gemma4CausalLMPreprocessor.tokenizer

The tokenizer used to tokenize strings.