
Gemma4CausalLM model

`Gemma4CausalLM` class

keras_hub.models.Gemma4CausalLM(
    preprocessor, backbone, final_logit_cap=None, **kwargs
)

An end-to-end multimodal Gemma4 model for causal language modeling.

A causal language model (LM) predicts the next token based on previous tokens. This task setup can be used to train the model unsupervised on multimodal inputs (text, images, audio, and video) or to autoregressively generate plain text similar to the training data. Whatever the input modalities, the model's output is always text.
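The next-token loop described above can be sketched in plain Python. This is a toy illustration, not KerasHub code: `next_token` is a hypothetical stand-in for the model, backed by a hand-written lookup table, and `"<end>"` is an invented end marker.

```python
# Toy stand-in for a causal LM. The table and all names here are
# hypothetical; a real model scores every token in its vocabulary.
BIGRAMS = {
    "the": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<end>",
}


def next_token(tokens):
    """Predict the next token from the previous tokens (here: only the last)."""
    return BIGRAMS.get(tokens[-1], "<end>")


def generate(prompt_tokens, max_length):
    """Append one predicted token at a time until <end> or max_length."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        token = next_token(tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens


print(generate(["the"], max_length=8))
# ['the', 'capital', 'of', 'France', 'is', 'Paris']
```

Each new token is fed back in as context for the next prediction, which is what "autoregressive" means here.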

This model has a generate() method, which generates text based on a prompt. The generation strategy is controlled by the sampler argument to compile(). You can recompile the model with different keras_hub.samplers objects to control the generation. By default, "greedy" sampling is used.
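To make the difference between sampling strategies concrete, here is a minimal pure-Python sketch of greedy selection next to top-k sampling over a made-up logits vector. It is an illustration only and does not reproduce the keras_hub.samplers implementations; the function names and the example logits are assumptions.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_sample(logits):
    """Always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Sample among the k highest-scoring token ids, weighted by probability."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top_ids = ranked[:k]
    probs = softmax([logits[i] for i in top_ids])
    return rng.choices(top_ids, weights=probs, k=1)[0]


logits = [0.1, 2.5, 1.9, -1.0]  # hypothetical scores for a 4-token vocabulary
print(greedy_sample(logits))  # 1
print(top_k_sample(logits, k=2, rng=random.Random(0)))  # token id 1 or 2
```

Greedy decoding is deterministic, while top-k trades some determinism for variety. In KerasHub the equivalent choice is made by passing a sampler to compile(), for example a keras_hub.samplers.TopKSampler instance.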

This model can optionally be configured with a preprocessor layer, in which case it will automatically apply preprocessing to string inputs during fit(), predict(), evaluate() and generate(). This is done by default when creating the model with from_preset().

Arguments

preprocessor: A keras_hub.models.Gemma4CausalLMPreprocessor or None. If None, this model will not apply preprocessing, and inputs should be preprocessed before calling the model.
backbone: A keras_hub.models.Gemma4Backbone instance.
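The constructor signature also lists final_logit_cap, which this page does not describe. As a hedged sketch only: earlier Gemma models apply tanh-based soft-capping to the final logits, and the code below assumes final_logit_cap plays the same role here. That assumption is not confirmed by this page, and `soft_cap` is a hypothetical helper, not a KerasHub function.

```python
import math


def soft_cap(logits, cap):
    """Smoothly squash logits into (-cap, cap); leave them as-is if cap is None.

    ASSUMPTION: this mirrors tanh soft-capping as used in earlier Gemma
    models. Whether Gemma4CausalLM applies final_logit_cap this way is
    not stated in the documentation.
    """
    if cap is None:
        return list(logits)
    return [cap * math.tanh(x / cap) for x in logits]


capped = soft_cap([0.5, 10.0, 100.0], cap=30.0)
print(capped)  # small values barely change; large ones approach the cap
```

Under this assumption, the ordering of logits is preserved, so greedy decoding would pick the same token with or without the cap.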

Examples

Text generation from a text prompt.

import keras_hub

# All non-assistant Gemma4 presets support text generation.
gemma4_lm = keras_hub.models.Gemma4CausalLM.from_preset(
    "gemma4_instruct_2b",
)
gemma4_lm.generate("What is the capital of France?")

Image + text generation.

# All Gemma4 presets support image inputs.
gemma4_lm = keras_hub.models.Gemma4CausalLM.from_preset(
    "gemma4_instruct_2b",
)
gemma4_lm.generate({
    "prompts": "Describe this image: <|image|>",
    "images": image_array,  # np.ndarray of shape (H, W, 3)
})

Audio + text generation.

# Only the E2B (2b) and E4B (4b) presets include an audio encoder.
gemma4_lm = keras_hub.models.Gemma4CausalLM.from_preset(
    "gemma4_instruct_2b",
)
gemma4_lm.generate({
    "prompts": "Transcribe this audio: <|audio|>",
    "audio": waveform,  # np.ndarray of shape (num_samples,) at 16 kHz
})

Video + text generation.

# All Gemma4 presets support video inputs (processed as frame sequences).
gemma4_lm = keras_hub.models.Gemma4CausalLM.from_preset(
    "gemma4_instruct_2b",
)
gemma4_lm.generate({
    "prompts": "Describe this video: <|video|>",
    "videos": frames,  # np.ndarray of shape (N_frames, H, W, 3)
})


`from_preset` method

Gemma4CausalLM.from_preset(preset, load_weights=True, **kwargs)

Instantiate a keras_hub.models.Task from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'
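The four forms above can be told apart by their prefix. The helper below is a hypothetical sketch of that distinction, using the example strings from the list; it is not how KerasHub resolves presets internally, which may also check things such as whether a directory exists.

```python
def preset_kind(preset):
    """Classify a preset string by its form (illustrative helper only)."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    if preset.startswith(("./", "../", "/")):
        return "local"
    # Anything else is treated as a built-in identifier in this sketch.
    return "built-in"


for preset in [
    "bert_base_en",
    "kaggle://user/bert/keras/bert_base_en",
    "hf://user/bert_base_en",
    "./bert_base_en",
]:
    print(f"{preset} -> {preset_kind(preset)}")
```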

For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.

This constructor can be called in one of two ways: either from a task-specific base class like keras_hub.models.CausalLM.from_preset(), or from a model class like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.
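One way to picture the base-class behavior is a registry lookup keyed by a class name stored in the preset config. The sketch below is a toy model of that idea under stated assumptions: the `Toy*` classes, the `from_config` method, and the `"class_name"` key are all hypothetical and do not describe KerasHub's actual preset format or loading code.

```python
class ToyTask:
    """Hypothetical base class that records every subclass by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ToyTask.registry[cls.__name__] = cls

    @classmethod
    def from_config(cls, config):
        """Build the class named in the config if it is `cls` or a subclass."""
        target = ToyTask.registry[config["class_name"]]
        if not issubclass(target, cls):
            raise ValueError(f"{target.__name__} is not a {cls.__name__}")
        return target()


class ToyCausalLM(ToyTask):
    pass


class ToyGemmaCausalLM(ToyCausalLM):
    pass


class ToyTextClassifier(ToyTask):
    pass


config = {"class_name": "ToyGemmaCausalLM"}  # hypothetical preset config
model = ToyCausalLM.from_config(config)
print(type(model).__name__)  # ToyGemmaCausalLM
```

Calling through the base class returns the specific subclass named in the config, while calling through an unrelated class (here `ToyTextClassifier`) raises an error.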

Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.

Examples

import keras_hub

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)

# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)

Preset	Parameters	Description
gemma4_instruct_2b_assistant	77.73M	Gemma 4 E2B MTP Assistant model: 4-layer speculative-decoding assistant for the 2B-it model. Uses Multi-Token Prediction to propose candidate tokens and achieve inference speedups. This model must NOT be used standalone. It is designed exclusively as a draft model to be passed to the target model's generate() method via the assistant_model argument.
gemma4_instruct_4b_assistant	77.73M	Gemma 4 E4B MTP Assistant model: 4-layer speculative-decoding assistant for the 4B-it model. Uses Multi-Token Prediction to propose candidate tokens and achieve inference speedups. This model must NOT be used standalone. It is designed exclusively as a draft model to be passed to the target model's generate() method via the assistant_model argument.
gemma4_instruct_26b_a4b_assistant	412.76M	Gemma 4 26B A4B MTP Assistant model: 4-layer speculative-decoding assistant for the 26B MoE model. Uses Multi-Token Prediction and a standard logit head to propose candidates. This model must NOT be used standalone. It is designed exclusively as a draft model to be passed to the target model's generate() method via the assistant_model argument.
gemma4_instruct_31b_assistant	454.71M	Gemma 4 31B MTP Assistant model: 4-layer speculative-decoding assistant for the 31B dense model. Uses Multi-Token Prediction and a standard logit head to propose candidates. This model must NOT be used standalone. It is designed exclusively as a draft model to be passed to the target model's generate() method via the assistant_model argument.
gemma4_2b	5.10B	Gemma 4 E2B base model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment.
gemma4_instruct_2b	5.10B	Gemma 4 E2B instruction-tuned model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment.
gemma4_4b	7.90B	Gemma 4 E4B base model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment.
gemma4_instruct_4b	7.90B	Gemma 4 E4B instruction-tuned model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment.
gemma4_26b_a4b	26.00B	Gemma 4 26B A4B base model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text pretrained Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model.
gemma4_instruct_26b_a4b	26.00B	Gemma 4 26B A4B instruction-tuned model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text instruction-tuned Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model.
gemma4_31b	31.00B	Gemma 4 31B base model: 31B parameter, 60-layer, dense vision+text pretrained Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint.
gemma4_instruct_31b	31.00B	Gemma 4 31B instruction-tuned model: 31B parameter, 60-layer, dense vision+text instruction-tuned Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint.
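The `*_assistant` rows in the table describe draft models for speculative decoding. As a rough illustration of the general draft-and-verify idea, here is a toy greedy version in plain Python. It is not KerasHub's implementation and does not model Multi-Token Prediction; `target_next` and `draft_next` are hypothetical stand-ins for the large target model and the small assistant.

```python
def target_next(tokens):
    """Toy 'large' model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def draft_next(tokens):
    """Toy 'assistant': agrees with the target except right after a 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


def greedy_decode(next_token, prompt, num_new):
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(target, draft, prompt, num_new, k=4):
    """Draft proposes up to k tokens; the target keeps the matching prefix."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    rounds = 0  # verification rounds; a real target scores each in one pass
    while len(tokens) < goal:
        proposal = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        rounds += 1
        accepted = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # the target's token is always kept
            if guess != expected:
                break  # first mismatch: discard the rest of the proposal
        tokens.extend(accepted)
    return tokens, rounds


tokens, rounds = speculative_decode(target_next, draft_next, [0], num_new=8)
print(tokens)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(rounds)  # 3
```

The output matches what the target would produce on its own, but in 3 verification rounds instead of 8 sequential steps. This is why a draft model is only useful alongside its target and, as the table notes, is not meant to be used standalone.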

Guides and examples using from_preset

Multimodal and Agentic Workflows with Gemma 4 in KerasHub


`generate` method

Gemma4CausalLM.generate(
    inputs,
    max_length=None,
    stop_token_ids="auto",
    strip_prompt=False,
    assistant_model=None,
)

Generate text given prompt inputs.

This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.

If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.
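The "batch-by-batch and concatenated" behavior can be pictured with a small pure-Python sketch. `fake_generate_batch` is a hypothetical stand-in for one model call on a single batch, and a plain list stands in for the dataset.

```python
def fake_generate_batch(prompts):
    """Hypothetical stand-in for generating completions for one batch."""
    return [prompt + " ..." for prompt in prompts]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def generate_over_batches(prompts, batch_size):
    """Generate per batch, then concatenate results in the original order."""
    outputs = []
    for batch in batched(prompts, batch_size):
        outputs.extend(fake_generate_batch(batch))
    return outputs


prompts = ["a", "b", "c", "d", "e"]
print(generate_over_batches(prompts, batch_size=2))
# ['a ...', 'b ...', 'c ...', 'd ...', 'e ...']
```

The caller gets one flat list of outputs in input order, regardless of how the inputs were batched.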

If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. See the example usage above for a demonstration of each.

Arguments

inputs: python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
max_length: Optional. int. The max length of the generated sequence. Will default to the max configured sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults to "auto", which uses the preprocessor.tokenizer.end_token_id; using "auto" without a preprocessor will produce an error. None stops generation after generating max_length tokens. You may also specify a list of token ids the model should stop on. Each token id is treated as an independent stop token; multi-token stop sequences are not supported.
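strip_prompt: Optional. By default, generate() returns the full prompt followed by its completion generated by the model. If this option is set to True, only the newly generated text is returned.
assistant_model: Optional. A draft model used for speculative decoding, such as one of the assistant presets listed in the preset table. Defaults to None.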
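The interaction between stop_token_ids and strip_prompt can be sketched on plain lists of token ids. `postprocess` is a hypothetical helper written for illustration; in particular, this sketch drops the stop token itself, which may differ from what the library returns.

```python
def postprocess(prompt_ids, generated_ids, stop_token_ids=None, strip_prompt=False):
    """Cut generated ids at the first stop token; optionally drop the prompt."""
    stop = set(stop_token_ids or ())
    kept = []
    for token_id in generated_ids:
        if token_id in stop:
            break  # any single id in the set ends generation
        kept.append(token_id)
    return kept if strip_prompt else list(prompt_ids) + kept


prompt_ids = [1, 2]
generated_ids = [5, 6, 9, 7]

print(postprocess(prompt_ids, generated_ids, stop_token_ids=(9,)))
# [1, 2, 5, 6]
print(postprocess(prompt_ids, generated_ids, stop_token_ids=(9,), strip_prompt=True))
# [5, 6]
print(postprocess(prompt_ids, generated_ids, stop_token_ids=(6, 9)))
# [1, 2, 5]
```

The last call shows why multi-token stop sequences are not expressible: passing (6, 9) stops at the first 6, not at a 6 followed by a 9.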

`backbone` property

keras_hub.models.Gemma4CausalLM.backbone

A keras_hub.models.Backbone model with the core architecture.

`preprocessor` property

keras_hub.models.Gemma4CausalLM.preprocessor

A keras_hub.models.Preprocessor layer used to preprocess input.
