CausalLMPreprocessor

`CausalLMPreprocessor` class

keras_hub.models.CausalLMPreprocessor(
    tokenizer, sequence_length=1024, add_start_token=True, add_end_token=True, **kwargs
)

Base class for causal language modeling preprocessing layers.

CausalLMPreprocessor tasks wrap a keras_hub.tokenizers.Tokenizer to create a preprocessing layer for causal language modeling tasks. It is intended to be paired with a keras_hub.models.CausalLM task.

Every CausalLMPreprocessor takes a single input, which can be either a single string or a batch of strings. See examples below. These inputs will be tokenized and padded/truncated to a fixed sequence length.

This layer will always output a (x, y, sample_weight) tuple, where x is a dictionary with the tokenized inputs, y contains the tokens from x offset by 1, and sample_weight marks where y contains padded values. The exact contents of x will vary depending on the model being used.
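
The relationship between x, y and sample_weight can be sketched without KerasHub. The following is a minimal illustration in plain Python, not the library's implementation: the helper make_causal_lm_example, the padding id of 0, the dictionary keys and the token ids are all assumptions made for the example, and a real preprocessor may also add start and end tokens.

```python
PAD_ID = 0  # assumed padding id, for illustration only


def make_causal_lm_example(token_ids, sequence_length):
    """Pad/truncate to sequence_length + 1 tokens, then split into x and y."""
    # Keep one extra token so that inputs and labels both have sequence_length.
    ids = list(token_ids)[: sequence_length + 1]
    num_real = len(ids)
    ids += [PAD_ID] * (sequence_length + 1 - num_real)
    mask = [i < num_real for i in range(sequence_length + 1)]

    x = {"token_ids": ids[:-1], "padding_mask": mask[:-1]}
    y = ids[1:]  # labels: the inputs shifted left by one position
    sample_weight = mask[1:]  # False wherever the label is padding
    return x, y, sample_weight


x, y, sample_weight = make_causal_lm_example([5, 8, 13, 21], sequence_length=6)
print(x["token_ids"])  # [5, 8, 13, 21, 0, 0]
print(y)               # [8, 13, 21, 0, 0, 0]
print(sample_weight)   # [True, True, True, False, False, False]
```

Each position in y is the token that follows the same position in x, so a model trained on this pair learns next-token prediction, while sample_weight keeps padded positions out of the loss.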

A CausalLMPreprocessor contains two extra methods, generate_preprocess and generate_postprocess, for use with generation. See examples below.

All CausalLMPreprocessor tasks include a from_preset() constructor which can be used to load a pre-trained config and vocabularies. You can call the from_preset() constructor directly on this base class, in which case the correct class for your model will be automatically instantiated.

Examples.

import keras_hub
import tensorflow as tf

preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
    sequence_length=256, # Optional.
)

# Tokenize, mask and pack a single sentence.
sentence = "The quick brown fox jumped."
x, y, sample_weight = preprocessor(sentence)

# Tokenize and pad/truncate a batch of sentences.
sentences = ["The quick brown fox jumped.", "Call me Ishmael."]
x, y, sample_weight = preprocessor(sentences)

# With a tf.data.Dataset built from the raw strings.
ds = tf.data.Dataset.from_tensor_slices(sentences)
ds = ds.map(preprocessor, num_parallel_calls=tf.data.AUTOTUNE)

# Generate preprocess and postprocess.
x = preprocessor.generate_preprocess(sentences)  # Tokenized numeric inputs.
text = preprocessor.generate_postprocess(x)  # Detokenized string outputs.

`from_preset` method

CausalLMPreprocessor.from_preset(preset, config_file="preprocessor.json", **kwargs)

Instantiate a keras_hub.models.Preprocessor from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'
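
The four forms above can be told apart by their prefix. The sketch below is only an illustration of that distinction; describe_preset is a hypothetical helper, not part of KerasHub, and it does not reflect how the library resolves presets internally.

```python
import os


def describe_preset(preset):
    """Return which of the four documented forms a preset string takes."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    # Treat explicit relative/absolute paths and existing directories as local.
    if preset.startswith(("./", "../", "/")) or os.path.isdir(preset):
        return "local"
    return "built_in"


for preset in [
    "bert_base_en",
    "kaggle://user/bert/keras/bert_base_en",
    "hf://user/bert_base_en",
    "./bert_base_en",
]:
    print(preset, "->", describe_preset(preset))
```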

For any Preprocessor subclass, you can run cls.presets.keys() to list all built-in presets available on the class.

As there are usually multiple preprocessing classes for a given model, this method should be called on a specific subclass like keras_hub.models.BertTextClassifierPreprocessor.from_preset().

Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.

Examples

# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for BERT classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)

Preset	Parameters	Description
bloom_560m_multi	559.21M	24-layer Bloom model with hidden dimension of 1024. Trained on 45 natural languages and 12 programming languages.
bloomz_560m_multi	559.21M	24-layer Bloom model with hidden dimension of 1024. Finetuned on crosslingual task mixture (xP3) dataset.
bloom_1.1b_multi	1.07B	24-layer Bloom model with hidden dimension of 1536. Trained on 45 natural languages and 12 programming languages.
bloomz_1.1b_multi	1.07B	24-layer Bloom model with hidden dimension of 1536. Finetuned on crosslingual task mixture (xP3) dataset.
bloom_1.7b_multi	1.72B	24-layer Bloom model with hidden dimension of 2048. Trained on 45 natural languages and 12 programming languages.
bloomz_1.7b_multi	1.72B	24-layer Bloom model with hidden dimension of 2048. Finetuned on crosslingual task mixture (xP3) dataset.
bloom_3b_multi	3.00B	30-layer Bloom model with hidden dimension of 2560. Trained on 45 natural languages and 12 programming languages.
bloomz_3b_multi	3.00B	30-layer Bloom model with hidden dimension of 2560. Finetuned on crosslingual task mixture (xP3) dataset.
falcon_refinedweb_1b_en	1.31B	24-layer Falcon model (Falcon with 1B parameters), trained on 350B tokens of RefinedWeb dataset.
gemma_2b_en	2.51B	2 billion parameter, 18-layer, base Gemma model.
gemma_instruct_2b_en	2.51B	2 billion parameter, 18-layer, instruction tuned Gemma model.
gemma_1.1_instruct_2b_en	2.51B	2 billion parameter, 18-layer, instruction tuned Gemma model. The 1.1 update improves model quality.
code_gemma_1.1_2b_en	2.51B	2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion. The 1.1 update improves model quality.
code_gemma_2b_en	2.51B	2 billion parameter, 18-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion.
gemma2_2b_en	2.61B	2 billion parameter, 26-layer, base Gemma model.
gemma2_instruct_2b_en	2.61B	2 billion parameter, 26-layer, instruction tuned Gemma model.
shieldgemma_2b_en	2.61B	2 billion parameter, 26-layer, ShieldGemma model.
gemma_7b_en	8.54B	7 billion parameter, 28-layer, base Gemma model.
gemma_instruct_7b_en	8.54B	7 billion parameter, 28-layer, instruction tuned Gemma model.
gemma_1.1_instruct_7b_en	8.54B	7 billion parameter, 28-layer, instruction tuned Gemma model. The 1.1 update improves model quality.
code_gemma_7b_en	8.54B	7 billion parameter, 28-layer, CodeGemma model. This model has been trained on a fill-in-the-middle (FIM) task for code completion.
code_gemma_instruct_7b_en	8.54B	7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code.
code_gemma_1.1_instruct_7b_en	8.54B	7 billion parameter, 28-layer, instruction tuned CodeGemma model. This model has been trained for chat use cases related to code. The 1.1 update improves model quality.
gemma2_9b_en	9.24B	9 billion parameter, 42-layer, base Gemma model.
gemma2_instruct_9b_en	9.24B	9 billion parameter, 42-layer, instruction tuned Gemma model.
shieldgemma_9b_en	9.24B	9 billion parameter, 42-layer, ShieldGemma model.
gemma2_27b_en	27.23B	27 billion parameter, 46-layer, base Gemma model.
gemma2_instruct_27b_en	27.23B	27 billion parameter, 46-layer, instruction tuned Gemma model.
shieldgemma_27b_en	27.23B	27 billion parameter, 46-layer, ShieldGemma model.
gemma3_1b	999.89M	1 billion parameter, 26-layer, text-only pretrained Gemma3 model.
gemma3_instruct_1b	999.89M	1 billion parameter, 26-layer, text-only instruction-tuned Gemma3 model.
gemma3_4b_text	3.88B	4 billion parameter, 34-layer, text-only pretrained Gemma3 model.
gemma3_instruct_4b_text	3.88B	4 billion parameter, 34-layer, text-only instruction-tuned Gemma3 model.
gemma3_4b	4.30B	4 billion parameter, 34-layer, vision+text pretrained Gemma3 model.
gemma3_instruct_4b	4.30B	4 billion parameter, 34-layer, vision+text instruction-tuned Gemma3 model.
gemma3_12b_text	11.77B	12 billion parameter, 48-layer, text-only pretrained Gemma3 model.
gemma3_instruct_12b_text	11.77B	12 billion parameter, 48-layer, text-only instruction-tuned Gemma3 model.
gemma3_12b	12.19B	12 billion parameter, 48-layer, vision+text pretrained Gemma3 model.
gemma3_instruct_12b	12.19B	12 billion parameter, 48-layer, vision+text instruction-tuned Gemma3 model.
gemma3_27b_text	27.01B	27 billion parameter, 62-layer, text-only pretrained Gemma3 model.
gemma3_instruct_27b_text	27.01B	27 billion parameter, 62-layer, text-only instruction-tuned Gemma3 model.
gemma3_27b	27.43B	27 billion parameter, 62-layer, vision+text pretrained Gemma3 model.
gemma3_instruct_27b	27.43B	27 billion parameter, 62-layer, vision+text instruction-tuned Gemma3 model.
gpt2_base_en	124.44M	12-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_base_en_cnn_dailymail	124.44M	12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset.
gpt2_medium_en	354.82M	24-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_large_en	774.03M	36-layer GPT-2 model where case is maintained. Trained on WebText.
gpt2_extra_large_en	1.56B	48-layer GPT-2 model where case is maintained. Trained on WebText.
llama2_7b_en	6.74B	7 billion parameter, 32-layer, base LLaMA 2 model.
llama2_instruct_7b_en	6.74B	7 billion parameter, 32-layer, instruction tuned LLaMA 2 model.
vicuna_1.5_7b_en	6.74B	7 billion parameter, 32-layer, instruction tuned Vicuna v1.5 model.
llama2_7b_en_int8	6.74B	7 billion parameter, 32-layer, base LLaMA 2 model with activation and weights quantized to int8.
llama2_instruct_7b_en_int8	6.74B	7 billion parameter, 32-layer, instruction tuned LLaMA 2 model with activation and weights quantized to int8.
llama3.2_1b	1.50B	1 billion parameter, 16-layer, base LLaMA 3.2 model.
llama3.2_instruct_1b	1.50B	1 billion parameter, 16-layer, instruction tuned LLaMA 3.2.
llama3.2_guard_1b	1.50B	1 billion parameter, 16-layer, base LLaMA 3.2 model fine-tuned for content safety classification.
llama3.2_3b	3.61B	3 billion parameter, 28-layer, base LLaMA 3.2 model.
llama3.2_instruct_3b	3.61B	3 billion parameter, 28-layer, instruction tuned LLaMA 3.2.
llama3_8b_en	8.03B	8 billion parameter, 32-layer, base LLaMA 3 model.
llama3_instruct_8b_en	8.03B	8 billion parameter, 32-layer, instruction tuned LLaMA 3 model.
llama3.1_8b	8.03B	8 billion parameter, 32-layer, base LLaMA 3.1 model.
llama3.1_instruct_8b	8.03B	8 billion parameter, 32-layer, instruction tuned LLaMA 3.1.
llama3.1_guard_8b	8.03B	8 billion parameter, 32-layer, LLaMA 3.1 fine-tuned for content safety classification.
llama3_8b_en_int8	8.03B	8 billion parameter, 32-layer, base LLaMA 3 model with activation and weights quantized to int8.
llama3_instruct_8b_en_int8	8.03B	8 billion parameter, 32-layer, instruction tuned LLaMA 3 model with activation and weights quantized to int8.
mistral_7b_en	7.24B	Mistral 7B base model
mistral_instruct_7b_en	7.24B	Mistral 7B instruct model
mistral_0.2_instruct_7b_en	7.24B	Mistral 7B instruct Version 0.2 model
mixtral_8_7b_en	46.70B	32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer.
mixtral_8_instruct_7b_en	46.70B	Instruction fine-tuned 32-layer Mixtral MoE model with 7 billion active parameters and 8 experts per MoE layer.
opt_125m_en	125.24M	12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_1.3b_en	1.32B	24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_2.7b_en	2.70B	32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
opt_6.7b_en	6.70B	32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora.
pali_gemma_3b_mix_224	2.92B	image size 224, mix fine tuned, text sequence length is 256
pali_gemma_3b_224	2.92B	image size 224, pre trained, text sequence length is 128
pali_gemma_3b_mix_448	2.92B	image size 448, mix fine tuned, text sequence length is 512
pali_gemma_3b_448	2.92B	image size 448, pre trained, text sequence length is 512
pali_gemma_3b_896	2.93B	image size 896, pre trained, text sequence length is 512
pali_gemma2_mix_3b_224	3.03B	3 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_pt_3b_224	3.03B	3 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma_2_ft_docci_3b_448	3.03B	3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details.
pali_gemma2_mix_3b_448	3.03B	3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_pt_3b_448	3.03B	3 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_3b_896	3.04B	3 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 26-layer Gemma2 2B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_mix_10b_224	9.66B	10 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_pt_10b_224	9.66B	10 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_ft_docci_10b_448	9.66B	10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on the DOCCI dataset for improved descriptions with fine-grained details.
pali_gemma2_mix_10b_448	9.66B	10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_pt_10b_448	9.66B	10 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_10b_896	9.67B	10 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 42-layer Gemma2 9B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_mix_28b_224	27.65B	28 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_mix_28b_448	27.65B	28 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been fine-tuned on a wide range of vision-language tasks and domains.
pali_gemma2_pt_28b_224	27.65B	28 billion parameter, image size 224, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_28b_448	27.65B	28 billion parameter, image size 448, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
pali_gemma2_pt_28b_896	27.65B	28 billion parameter, image size 896, 27-layer for SigLIP-So400m vision encoder and 46-layer Gemma2 27B language model. This model has been pre-trained on a mixture of datasets.
phi3_mini_4k_instruct_en	3.82B	3.8 billion parameters, 32 layers, 4k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
phi3_mini_128k_instruct_en	3.82B	3.8 billion parameters, 32 layers, 128k context length, Phi-3 model. The model was trained using the Phi-3 datasets. This dataset includes both synthetic data and filtered publicly available website data, with an emphasis on high-quality and reasoning-dense properties.
qwen2.5_0.5b_en	494.03M	24-layer Qwen model with 0.5 billion parameters.
qwen2.5_instruct_0.5b_en	494.03M	Instruction fine-tuned 24-layer Qwen model with 0.5 billion parameters.
qwen2.5_3b_en	3.09B	36-layer Qwen model with 3.1 billion parameters.
qwen2.5_7b_en	6.99B	48-layer Qwen model with 7 billion parameters.
qwen2.5_instruct_32b_en	32.76B	Instruction fine-tuned 64-layer Qwen model with 32 billion parameters.
qwen2.5_instruct_72b_en	72.71B	Instruction fine-tuned 80-layer Qwen model with 72 billion parameters.
qwen1.5_moe_2.7b_en	14.32B	24-layer Qwen MoE model with 2.7 billion active parameters and 8 experts per MoE layer.
siglip_base_patch16_224	203.16M	200 million parameter, image size 224, pre-trained on WebLi.
siglip_base_patch16_256	203.20M	200 million parameter, image size 256, pre-trained on WebLi.
siglip_base_patch16_384	203.45M	200 million parameter, image size 384, pre-trained on WebLi.
siglip_base_patch16_512	203.79M	200 million parameter, image size 512, pre-trained on WebLi.
siglip_base_patch16_256_multilingual	370.63M	370 million parameter, image size 256, pre-trained on WebLi.
siglip2_base_patch16_224	375.19M	375 million parameter, patch size 16, image size 224, pre-trained on WebLi.
siglip2_base_patch16_256	375.23M	375 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_base_patch32_256	376.86M	376 million parameter, patch size 32, image size 256, pre-trained on WebLi.
siglip2_base_patch16_384	376.86M	376 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip_large_patch16_256	652.15M	652 million parameter, image size 256, pre-trained on WebLi.
siglip_large_patch16_384	652.48M	652 million parameter, image size 384, pre-trained on WebLi.
siglip_so400m_patch14_224	877.36M	877 million parameter, image size 224, shape-optimized version, pre-trained on WebLi.
siglip_so400m_patch14_384	877.96M	877 million parameter, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_large_patch16_256	881.53M	881 million parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_large_patch16_384	881.86M	881 million parameter, patch size 16, image size 384, pre-trained on WebLi.
siglip2_large_patch16_512	882.31M	882 million parameter, patch size 16, image size 512, pre-trained on WebLi.
siglip_so400m_patch16_256_i18n	1.13B	1.1 billion parameter, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_224	1.14B	1.1 billion parameter, patch size 14, image size 224, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_256	1.14B	1.1 billion parameter, patch size 16, image size 256, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch14_384	1.14B	1.1 billion parameter, patch size 14, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_384	1.14B	1.1 billion parameter, patch size 16, image size 384, shape-optimized version, pre-trained on WebLi.
siglip2_so400m_patch16_512	1.14B	1.1 billion parameter, patch size 16, image size 512, shape-optimized version, pre-trained on WebLi.
siglip2_giant_opt_patch16_256	1.87B	1.8 billion parameter, patch size 16, image size 256, pre-trained on WebLi.
siglip2_giant_opt_patch16_384	1.87B	1.8 billion parameter, patch size 16, image size 384, pre-trained on WebLi.

`save_to_preset` method

CausalLMPreprocessor.save_to_preset(preset_dir)

Save preprocessor to a preset directory.

Arguments

preset_dir: The path to the local model preset directory.
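
Because from_preset() accepts a path to a local preset directory, a saved preprocessor can be loaded back from the directory it was written to. The sketch below illustrates that round trip under stated assumptions: the function name save_and_reload, the directory "./my_preprocessor" and the sequence length are hypothetical choices for the example, and actually running it requires keras_hub to be installed and the preset assets to be downloadable.

```python
def save_and_reload(preset="gpt2_base_en", preset_dir="./my_preprocessor"):
    """Save a preprocessor to a local preset directory and load it back."""
    # Imported lazily so that defining this function does not require keras_hub.
    import keras_hub

    preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
        preset,
        sequence_length=128,  # Optional, illustrative value.
    )
    preprocessor.save_to_preset(preset_dir)

    # The local directory is itself a valid preset for from_preset().
    return keras_hub.models.CausalLMPreprocessor.from_preset(preset_dir)
```

Calling save_and_reload() would return a new preprocessor instance built from the files in preset_dir.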

`tokenizer` property

keras_hub.models.CausalLMPreprocessor.tokenizer

The tokenizer used to tokenize strings.
