CausalLM class
keras_hub.models.CausalLM()
Base class for generative language modeling tasks.
CausalLM tasks wrap a keras_hub.models.Backbone and
a keras_hub.models.Preprocessor to create a model that can be used for
generation and generative fine-tuning.
CausalLM tasks provide an additional, high-level generate() function
which can be used to auto-regressively sample a model token by token with a
string in, string out signature. The compile() method of all CausalLM
classes contains an additional sampler argument, which can be used to pass
a keras_hub.samplers.Sampler to control how the predicted distribution
will be sampled.
When calling fit(), the tokenized input will be predicted token-by-token
with a causal mask applied, which gives both a pre-training and supervised
fine-tuning setup for controlling inference-time generation.
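As a rough illustration of what token-by-token prediction under a causal mask means, the sketch below builds next-token training pairs and a lower-triangular mask in plain Python. The helper names `make_training_pair` and `causal_mask` and the token ids are hypothetical; this is not the library's implementation, only a minimal picture of the idea.

```python
def make_training_pair(token_ids):
    """Split one token sequence into next-token inputs and labels."""
    inputs = token_ids[:-1]  # every token except the last
    labels = token_ids[1:]   # the same sequence shifted left by one
    return inputs, labels


def causal_mask(length):
    """Row i is True only for positions j <= i, so no position looks ahead."""
    return [[j <= i for j in range(length)] for i in range(length)]


tokens = [11, 42, 7, 99]  # hypothetical token ids
inputs, labels = make_training_pair(tokens)
mask = causal_mask(len(inputs))
```

Each input position is trained to predict the token that follows it, and the mask keeps a position from attending to anything after itself.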
All CausalLM tasks include a from_preset() constructor which can be used
to load a pre-trained config and weights.
Example
# Load a GPT2 backbone with pre-trained weights.
causal_lm = keras_hub.models.CausalLM.from_preset(
"gpt2_base_en",
)
causal_lm.compile(sampler="top_k")
causal_lm.generate("Keras is a", max_length=64)
# Load a Mistral instruction tuned checkpoint at bfloat16 precision.
causal_lm = keras_hub.models.CausalLM.from_preset(
"mistral_instruct_7b_en",
dtype="bfloat16",
)
causal_lm.compile(sampler="greedy")
causal_lm.generate("Keras is a", max_length=64)
from_preset method
CausalLM.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_hub.models.Task from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
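- a built-in preset identifier like 'bert_base_en'
- a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
- a Hugging Face handle like 'hf://user/bert_base_en'
- a path to a local preset directory like './bert_base_en'

For any Task subclass, you can run cls.presets.keys() to list all
built-in presets available on the class.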
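To make the four preset forms concrete, here is a small sketch that tells them apart by prefix. The `classify_preset` helper is hypothetical and written only for illustration; it is an assumption about how such strings could be distinguished, not the resolution logic keras_hub actually uses.

```python
def classify_preset(preset):
    """Guess which of the four preset forms a string looks like."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "huggingface"
    if preset.startswith(("./", "../", "/")):
        return "local"
    # Anything else is treated as a built-in preset identifier.
    return "builtin"


kinds = [
    classify_preset(p)
    for p in (
        "bert_base_en",
        "kaggle://user/bert/keras/bert_base_en",
        "hf://user/bert_base_en",
        "./bert_base_en",
    )
]
```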
This constructor can be called in one of two ways. Either from a task
specific base class like keras_hub.models.CausalLM.from_preset(), or
from a model class like
keras_hub.models.BertTextClassifier.from_preset().
If calling from a base class, the subclass of the returned object
will be inferred from the config in the preset directory.
Arguments
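- preset: string. A built-in preset identifier, a Kaggle Models handle,
  a Hugging Face handle, or a path to a local directory.
- load_weights: bool. If True, saved weights will be loaded into
  the model architecture. If False, all weights will be
  randomly initialized.

Examples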
# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
"gemma_2b_en",
)
# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
"bert_base_en",
num_classes=2,
)
| Preset | Parameters | Description |
|---|---|---|
| bart_base_en | 139.42M | 6-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. |
| bart_large_en | 406.29M | 12-layer BART model where case is maintained. Trained on BookCorpus, English Wikipedia and CommonCrawl. |
| bart_large_en_cnn | 406.29M | The bart_large_en backbone model fine-tuned on the CNN+DM summarization dataset. |
| bloom_560m_multi | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages. |
| bloomz_560m_multi | 559.21M | 24-layer Bloom model with hidden dimension of 1024. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_1.1b_multi | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages. |
| bloomz_1.1b_multi | 1.07B | 24-layer Bloom model with hidden dimension of 1536. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_1.7b_multi | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages. |
| bloomz_1.7b_multi | 1.72B | 24-layer Bloom model with hidden dimension of 2048. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| bloom_3b_multi | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages. |
| bloomz_3b_multi | 3.00B | 30-layer Bloom model with hidden dimension of 2560. Fine-tuned on the crosslingual task mixture (xP3) dataset. |
| falcon_refinedweb_1b_en | 1.31B | 24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of RefinedWeb dataset. |
| vault_gemma_1b_en | 1.04B | 1 billion parameter, 26-layer, VaultGemma model. |
| gemma_2b_en | 2.51B | 2 billion parameter, 18-layer, base Gemma model. |
| gemma_instruct_2b_en | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. |
| gemma_1.1_instruct_2b_en | 2.51B | 2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality. |
| code_gemma_1.1_2b_en | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality. |
| code_gemma_2b_en | 2.51B | 2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. |
| gemma2_2b_en | 2.61B | 2 billion parameter, 26-layer, base Gemma model. |
| gemma2_instruct_2b_en | 2.61B | 2 billion parameter, 26-layer, instruction tuned Gemma model. |
| shieldgemma_2b_en | 2.61B | 2 billion parameter, 26-layer, ShieldGemma model. |
| c2s_scale_gemma_2_2b_en | 2.61B | A 2 billion parameter, single-cell biology-aware model built on the Gemma-2 architecture. |
| gemma_7b_en | 8.54B | 7 billion parameter, 28-layer, base Gemma model. |
| gemma_instruct_7b_en | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. |
| gemma_1.1_instruct_7b_en | 8.54B | 7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality. |
| code_gemma_7b_en | 8.54B | 7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. |
| code_gemma_instruct_7b_en | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. |
| code_gemma_1.1_instruct_7b_en | 8.54B | 7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality. |
| gemma2_9b_en | 9.24B | 9 billion parameter, 42-layer, base Gemma model. |
| gemma2_instruct_9b_en | 9.24B | 9 billion parameter, 42-layer, instruction tuned Gemma model. |
| shieldgemma_9b_en | 9.24B | 9 billion parameter, 42-layer, ShieldGemma model. |
| gemma2_27b_en | 27.23B | 27 billion parameter, 46-layer, base Gemma model. |
| gemma2_instruct_27b_en | 27.23B | 27 billion parameter, 46-layer, instruction tuned Gemma model. |
| shieldgemma_27b_en | 27.23B | 27 billion parameter, 46-layer, ShieldGemma model. |
| c2s_scale_gemma_2_27b_en | 27.23B | A 27 billion parameter, single-cell biology-aware model built on the Gemma-2 architecture. |
| gemma3_270m | 268.10M | 270-million parameter (170m embedding, 100m transformer params) model, 18-layer, text-only, designed for hyper-efficient AI, particularly for task-specific fine-tuning. |
| gemma3_instruct_270m | 268.10M | 270-million parameter (170m embedding, 100m transformer params) model, 18-layer, text-only, instruction-tuned model designed for hyper-efficient AI, particularly for task-specific fine-tuning. |
| gemma3_1b | 999.89M | 1 billion parameter, 26-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_1b | 999.89M | 1 billion parameter, 26-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_4b_text | 3.88B | 4 billion parameter, 34-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_4b_text | 3.88B | 4 billion parameter, 34-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_4b | 4.30B | 4 billion parameter, 34-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_4b | 4.30B | 4 billion parameter, 34-layer, vision+text instruction-tuned Gemma3 model. |
| gemma3_12b_text | 11.77B | 12 billion parameter, 48-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_12b_text | 11.77B | 12 billion parameter, 48-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_12b | 12.19B | 12 billion parameter, 48-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_12b | 12.19B | 12 billion parameter, 48-layer, vision+text instruction-tuned Gemma3 model. |
| gemma3_27b_text | 27.01B | 27 billion parameter, 62-layer, text-only pretrained Gemma3 model. |
| gemma3_instruct_27b_text | 27.01B | 27 billion parameter, 62-layer, text-only instruction-tuned Gemma3 model. |
| gemma3_27b | 27.43B | 27 billion parameter, 62-layer, vision+text pretrained Gemma3 model. |
| gemma3_instruct_27b | 27.43B | 27 billion parameter, 62-layer, vision+text instruction-tuned Gemma3 model. |
| gpt2_base_en | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_base_en_cnn_dailymail | 124.44M | 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset. |
| gpt2_medium_en | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_large_en | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_extra_large_en | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. |
| llama2_7b_en | 6.74B | 7 billion parameter, 32-layer, base LLaMA 2 model. |
| llama2_instruct_7b_en | 6.74B | 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model. |
| vicuna_1.5_7b_en | 6.74B | 7 billion parameter, 32-layer, instruction tuned Vicuna v1.5 model. |
| llama2_7b_en_int8 | 6.74B | 7 billion parameter, 32-layer, base LLaMA 2 model with activation and weights quantized to int8. |
| llama2_instruct_7b_en_int8 | 6.74B | 7 billion parameter, 32-layer, instruction tuned LLaMA 2 model with activation and weights quantized to int8. |
| llama3.2_1b | 1.50B | 1 billion parameter, 16-layer, base LLaMA 3.2 model. |
| llama3.2_instruct_1b | 1.50B | 1 billion parameter, 16-layer, instruction tuned LLaMA 3.2. |
| llama3.2_guard_1b | 1.50B | 1 billion parameter, 16-layer, base LLaMA 3.2 model fine-tuned for content safety classification. |
| llama3.2_3b | 3.61B | 3 billion parameter, 28-layer, base LLaMA 3.2 model. |
| llama3.2_instruct_3b | 3.61B | 3 billion parameter, 28-layer, instruction tuned LLaMA 3.2. |
| llama3_8b_en | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3 model. |
| llama3_instruct_8b_en | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model. |
| llama3.1_8b | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3.1 model. |
| llama3.1_instruct_8b | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3.1. |
| llama3.1_guard_8b | 8.03B | 8 billion parameter, 32-layer, LLaMA 3.1 fine-tuned for content safety classification. |
| llama3_8b_en_int8 | 8.03B | 8 billion parameter, 32-layer, base LLaMA 3 model with activation and weights quantized to int8. |
| llama3_instruct_8b_en_int8 | 8.03B | 8 billion parameter, 32-layer, instruction tuned LLaMA 3 model with activation and weights quantized to int8. |
| mistral_7b_en | 7.24B | Mistral 7B base model |
| mistral_instruct_7b_en | 7.24B | Mistral 7B instruct model |
| mistral_0.2_instruct_7b_en | 7.24B | Mistral 7B instruct version 0.2 model |
| mistral_0.3_7b_en | 7.25B | Mistral 7B base version 0.3 model |
| mistral_0.3_instruct_7b_en | 7.25B | Mistral 7B instruct version 0.3 model |
| mixtral_8_7b_en | 46.70B | 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer. |
| mixtral_8_instruct_7b_en | 46.70B | Instruction fine-tuned 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer. |
| moonshine_tiny_en | 27.09M | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| moonshine_base_en | 61.51M | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| opt_125m_en | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_1.3b_en | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_2.7b_en | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_6.7b_en | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| pali_gemma_3b_mix_224 | 2.92B | image size 224, mix fine tuned, text sequence length is 256 |
| pali_gemma_3b_224 | 2.92B | image size 224, pre trained, text sequence length is 128 |
| pali_gemma_3b_mix_448 | 2.92B | image size 448, mix fine tuned, text sequence length is 512 |
| pali_gemma_3b_448 | 2.92B | image size 448, pre trained, text sequence length is 512 |
| pali_gemma_3b_896 | 2.93B | image size 896, pre trained, text sequence length is 512 |
| pali_gemma2_mix_3b_224 | 3.03B | 3 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_3b_224 | 3.03B | 3 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma_2_ft_docci_3b_448 | 3.03B | 3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
| pali_gemma2_mix_3b_448 | 3.03B | 3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_3b_448 | 3.03B | 3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_3b_896 | 3.04B | 3 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_mix_10b_224 | 9.66B | 10 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_10b_224 | 9.66B | 10 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_ft_docci_10b_448 | 9.66B | 10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details. |
| pali_gemma2_mix_10b_448 | 9.66B | 10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_10b_448 | 9.66B | 10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_10b_896 | 9.67B | 10 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_mix_28b_224 | 27.65B | 28 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_mix_28b_448 | 27.65B | 28 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains. |
| pali_gemma2_pt_28b_224 | 27.65B | 28 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_28b_448 | 27.65B | 28 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| pali_gemma2_pt_28b_896 | 27.65B | 28 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets. |
| parseq | 23.83M | Permuted autoregressive sequence (PARSeq) base model for scene text recognition |
| phi3_mini_4k_instruct_en | 3.82B | 3.8 billion parameters, 32 layers, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
| phi3_mini_128k_instruct_en | 3.82B | 3.8 billion parameters, 32 layers, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties. |
| qwen2.5_0.5b_en | 494.03M | 24-layer Qwen model with 0.5 billion parameters. |
| qwen2.5_instruct_0.5b_en | 494.03M | Instruction fine-tuned 24-layer Qwen model with 0.5 billion parameters. |
| qwen2.5_3b_en | 3.09B | 36-layer Qwen model with 3.1 billion parameters. |
| qwen2.5_7b_en | 6.99B | 48-layer Qwen model with 7 billion parameters. |
| qwen2.5_instruct_32b_en | 32.76B | Instruction fine-tuned 64-layer Qwen model with 32 billion parameters. |
| qwen2.5_instruct_72b_en | 72.71B | Instruction fine-tuned 80-layer Qwen model with 72 billion parameters. |
| qwen3_0.6b_en | 596.05M | 28-layer Qwen3 model with 596M parameters, optimized for efficiency and fast inference on resource-constrained devices. |
| qwen3_1.7b_en | 1.72B | 28-layer Qwen3 model with 1.72B parameters, offering a good balance between performance and resource usage. |
| qwen3_4b_en | 4.02B | 36-layer Qwen3 model with 4.02B parameters, offering improved reasoning capabilities and better performance than smaller variants. |
| qwen3_8b_en | 8.19B | 36-layer Qwen3 model with 8.19B parameters, featuring enhanced reasoning, coding, and instruction-following capabilities. |
| qwen3_14b_en | 14.77B | 40-layer Qwen3 model with 14.77B parameters, featuring advanced reasoning, coding, and multilingual capabilities. |
| qwen3_32b_en | 32.76B | 64-layer Qwen3 model with 32.76B parameters, featuring state-of-the-art performance across reasoning, coding, and general language tasks. |
| qwen3_moe_30b_a3b_en | 30.53B | Mixture-of-Experts (MoE) model has 30.5 billion total parameters with 3.3 billion activated, built on 48 layers and utilizes 32 query and 4 key/value attention heads with 128 experts (8 active). |
| qwen3_moe_235b_a22b_en | 235.09B | Mixture-of-Experts (MoE) model has 235 billion total parameters with 22 billion activated, built on 94 layers and utilizes 64 query and 4 key/value attention heads with 128 experts (8 active). |
| qwen1.5_moe_2.7b_en | 14.32B | 24-layer Qwen MoE model with 2.7 billion active parameters and 8 experts per MoE layer. |
| t5gemma_s_s_ul2 | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model. |
| t5gemma_s_s_prefixlm | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model. |
| t5gemma_s_s_ul2_it | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_s_s_prefixlm_it | 312.52M | T5Gemma S/S model with a small encoder and small decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_b_b_ul2 | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model. |
| t5gemma_b_b_prefixlm | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model. |
| t5gemma_b_b_ul2_it | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_b_b_prefixlm_it | 591.49M | T5Gemma B/B model with a base encoder and base decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_l_l_ul2 | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model. |
| t5gemma_l_l_prefixlm | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model. |
| t5gemma_l_l_ul2_it | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_l_l_prefixlm_it | 1.24B | T5Gemma L/L model with a large encoder and large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_ml_ml_ul2 | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model. |
| t5gemma_ml_ml_prefixlm | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model. |
| t5gemma_ml_ml_ul2_it | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_ml_ml_prefixlm_it | 2.20B | T5Gemma ML/ML model with a medium-large encoder and medium-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_xl_xl_ul2 | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model. |
| t5gemma_xl_xl_prefixlm | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model. |
| t5gemma_xl_xl_ul2_it | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_xl_xl_prefixlm_it | 3.77B | T5Gemma XL/XL model with an extra-large encoder and extra-large decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_2b_2b_ul2 | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_2b_2b_prefixlm | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_2b_2b_ul2_it | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_2b_2b_prefixlm_it | 5.60B | T5Gemma 2B/2B model with a 2-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_2b_ul2 | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_2b_prefixlm | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_2b_ul2_it | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_2b_prefixlm_it | 12.29B | T5Gemma 9B/2B model with a 9-billion-parameter encoder and 2-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
| t5gemma_9b_9b_ul2 | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model. |
| t5gemma_9b_9b_prefixlm | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model. |
| t5gemma_9b_9b_ul2_it | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a UL2 model and fine-tuned for instruction following. |
| t5gemma_9b_9b_prefixlm_it | 20.33B | T5Gemma 9B/9B model with a 9-billion-parameter encoder and 9-billion-parameter decoder, adapted as a prefix language model and fine-tuned for instruction following. |
compile method
CausalLM.compile(
optimizer="auto", loss="auto", weighted_metrics="auto", sampler="top_k", **kwargs
)
Configures the CausalLM task for training and generation.
The CausalLM task extends the default compilation signature of
keras.Model.compile with defaults for optimizer, loss, and
weighted_metrics. To override these defaults, pass any value
to these arguments during compilation.
The CausalLM task adds a new sampler to compile, which can be used
to control the sampling strategy used with the generate function.
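To give a feel for what a sampling strategy decides, the sketch below contrasts greedy selection with top-k sampling over a single predicted distribution. It uses only the Python standard library; the function names, the logits, and the vocabulary size are hypothetical, and this is not how the keras_hub samplers are implemented.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_sample(logits):
    """Always pick the most likely token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Sample a token id from the k most likely candidates only."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top_ids = ranked[:k]
    probs = softmax([logits[i] for i in top_ids])
    return rng.choices(top_ids, weights=probs, k=1)[0]


logits = [0.1, 2.5, 0.3, 1.9, -1.0]  # hypothetical scores, 5-token vocabulary
rng = random.Random(0)
greedy_id = greedy_sample(logits)
sampled_ids = [top_k_sample(logits, k=2, rng=rng) for _ in range(20)]
```

Greedy selection is deterministic, while top-k trades some determinism for variety by restricting random draws to the highest-scoring candidates.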
Note that because training inputs include padded tokens which are
excluded from the loss, it is almost always a good idea to compile with
weighted_metrics and not metrics.
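The effect of padding on an unweighted metric can be seen in a small, self-contained sketch. The `accuracy` helper, the padding id, and the token values are hypothetical and stand in for what a weighted metric does conceptually; this is not the Keras metric implementation.

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of correct predictions, optionally weighted per token."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


PAD = 0  # hypothetical padding token id
y_true = [5, 8, 3, PAD, PAD, PAD]
y_pred = [5, 8, 9, PAD, PAD, PAD]  # padding positions are trivially "correct"
weights = [0.0 if t == PAD else 1.0 for t in y_true]

unweighted = accuracy(y_true, y_pred)        # counts padding positions
weighted = accuracy(y_true, y_pred, weights)  # ignores padding positions
```

Here the unweighted score is inflated by the three padding positions (5 of 6 correct), while the weighted score reflects only the real tokens (2 of 3 correct).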
Arguments
- optimizer: "auto", an optimizer name, or a keras.Optimizer
  instance. Defaults to "auto", which uses the default optimizer
  for the given model and task. See keras.Model.compile and
  keras.optimizers for more info on possible optimizer values.
- loss: "auto", a loss name, or a keras.losses.Loss instance.
  Defaults to "auto", where a
  keras.losses.SparseCategoricalCrossentropy loss will be
  applied for the token classification CausalLM task. See
  keras.Model.compile and keras.losses for more info on
  possible loss values.
- weighted_metrics: "auto", or a list of metrics to be evaluated by
  the model during training and testing. Defaults to "auto",
  where a keras.metrics.SparseCategoricalAccuracy will be
  applied to track the accuracy of the model at guessing masked
  token values. See keras.Model.compile and keras.metrics for
  more info on possible weighted_metrics values.
- sampler: A sampler name, or a keras_hub.samplers.Sampler instance.
  Configures the sampling method used during generate() calls.
  See keras_hub.samplers for a full list of built-in sampling
  strategies.
- **kwargs: See keras.Model.compile for a full list of arguments
  supported by the compile method.

generate method
CausalLM.generate(inputs, max_length=None, stop_token_ids="auto", strip_prompt=False)
Generate text given prompt inputs.
This method generates text based on given inputs. The sampling method
used for generation can be set via the compile() method.
If inputs are a tf.data.Dataset, outputs will be generated
"batch-by-batch" and concatenated. Otherwise, all inputs will be handled
as a single batch.
If a preprocessor is attached to the model, inputs will be
preprocessed inside the generate() function and should match the
structure expected by the preprocessor layer (usually raw strings).
If a preprocessor is not attached, inputs should match the structure
expected by the backbone. See the example usage above for a
demonstration of each.
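The way max_length, stop_token_ids, and strip_prompt bound and shape the output can be pictured with a toy generation loop. Everything here is a sketch under stated assumptions: `generate_ids` and `count_up` are hypothetical names, max_length is treated as the total length including the prompt, and the stop token is kept in the output for simplicity, which may differ from what the library returns.

```python
def generate_ids(prompt_ids, next_token_fn, max_length,
                 stop_token_ids=(), strip_prompt=False):
    """Append tokens until max_length is reached or a stop token appears."""
    ids = list(prompt_ids)
    while len(ids) < max_length:
        token = next_token_fn(ids)
        ids.append(token)
        if token in stop_token_ids:
            break  # each id is a single-token stop condition
    # Optionally return only the newly generated part.
    return ids[len(prompt_ids):] if strip_prompt else ids


def count_up(ids):
    """Toy stand-in for a model: the next token is the last token plus one."""
    return ids[-1] + 1


full = generate_ids([1, 2], count_up, max_length=10, stop_token_ids=(5,))
capped = generate_ids([1, 2], count_up, max_length=4)
new_only = generate_ids([1, 2], count_up, max_length=10,
                        stop_token_ids=(5,), strip_prompt=True)
```

With a stop token of 5 the loop ends early; without one it runs until the length cap; with strip_prompt only the continuation is returned.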
Arguments
- inputs: python data, tensor data, or a tf.data.Dataset. If a
  preprocessor is attached to the model, inputs should match
  the structure expected by the preprocessor layer. If a
  preprocessor is not attached, inputs should match the
  structure expected by the backbone model.
- max_length: Optional. int. The max length of the generated sequence.
  Will default to the max configured sequence_length of the
  preprocessor. If preprocessor is None, inputs should be
  padded to the desired maximum length and this argument
  will be ignored.
- stop_token_ids: Optional. None, "auto", or tuple of token ids.
  Defaults to "auto", which uses the
  preprocessor.tokenizer.end_token_id. Not specifying a
  preprocessor will produce an error. None stops generation after
  generating max_length tokens. You may also specify a list of
  token ids the model should stop on. Note that each token id
  in a sequence is interpreted as a separate stop token; multi-token
  stop sequences are not supported.
- strip_prompt: Optional. If True, only the newly generated text is
  returned, without the prompt. Defaults to False.

save_to_preset method
CausalLM.save_to_preset(preset_dir, max_shard_size=10)
Save task to a preset directory.
Arguments
- preset_dir: The path to the local model preset directory.
- max_shard_size: int or float. Maximum size in GB for each
  sharded file. If None, no sharding will be done. Defaults to
  10.

preprocessor property
keras_hub.models.CausalLM.preprocessor
A keras_hub.models.Preprocessor layer used to preprocess input.
backbone property
keras_hub.models.CausalLM.backbone
A keras_hub.models.Backbone model with the core architecture.