
Qwen3CausalLM model

`Qwen3CausalLM` class

keras_hub.models.Qwen3CausalLM(backbone, preprocessor=None, **kwargs)

An end-to-end Qwen3 model for causal language modeling.

A causal language model (LM) predicts the next token based on previous tokens. This task setup can be used to train the model unsupervised on plain text input, or to autoregressively generate plain text similar to the data used for training. This task can be used for pre-training or fine-tuning a Qwen3 model, simply by calling fit().

This model has a generate() method, which generates text based on a prompt. The generation strategy used is controlled by an additional sampler argument on compile(). You can recompile the model with different keras_hub.samplers objects to control the generation. By default, "greedy" sampling will be used.
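To make the difference between sampling strategies concrete, here is a minimal plain-Python sketch of greedy and top-k selection over a single vector of logits. It is an illustration only and not the KerasHub sampler implementation; the helper names `greedy_pick` and `top_k_pick` and the example logits are assumptions introduced here.

```python
import math
import random


def greedy_pick(logits):
    """Return the index of the largest logit (ties resolve to the first)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_pick(logits, k, rng):
    """Sample an index from the k largest logits, weighted by softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical logits for a five-token vocabulary.
logits = [0.1, 2.5, -1.0, 1.7, 0.3]
rng = random.Random(0)

greedy_choice = greedy_pick(logits)
top_k_choices = {top_k_pick(logits, k=2, rng=rng) for _ in range(200)}
```

Greedy selection is deterministic, while top-k selection can return any of the k highest-scoring tokens, which is why recompiling with a different sampler changes the generated text.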

This model can optionally be configured with a preprocessor layer, in which case it will automatically apply preprocessing to string inputs during fit(), predict(), evaluate(), and generate(). This is done by default when creating the model with from_preset().

Arguments

backbone: A keras_hub.models.Qwen3Backbone instance.
preprocessor: A keras_hub.models.Qwen3CausalLMPreprocessor or None. If None, this model will not apply preprocessing, and inputs should be preprocessed before calling the model.

Examples

Use generate() to do text generation.

import keras_hub

qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_0.6b_en")
qwen3_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
qwen3_lm.generate(["This is a", "Where are you"], max_length=30)

Compile the generate() function with a custom sampler.

qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_0.6b_en")
qwen3_lm.compile(sampler="top_k")
qwen3_lm.generate("I want to say", max_length=30)

qwen3_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
qwen3_lm.generate("I want to say", max_length=30)

Use generate() without preprocessing.

import numpy as np

prompt = {
    # Token ids for "<bos> Qwen3 is".
    "token_ids": np.array([[2, 12345, 678, 0, 0, 0, 0]] * 2),
    # Use `"padding_mask"` to indicate values that should not be overridden.
    "padding_mask": np.array([[1, 1, 1, 0, 0, 0, 0]] * 2),
}

qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset(
    "qwen3_0.6b_en",
    preprocessor=None,
)
qwen3_lm.generate(prompt)
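The role of `"padding_mask"` in the example above can be sketched in plain Python: positions where the mask is 1 hold the prompt and are preserved, and positions where it is 0 are filled in one at a time. This is a simplified illustration under the assumption that the prompt sits at the left of the sequence; it is not KerasHub's generation loop, and `fill_after_prompt` and `toy_next_token` are hypothetical names.

```python
def fill_after_prompt(token_ids, padding_mask, next_token_fn):
    """Fill every position whose mask is 0, left to right.

    Positions whose mask is 1 hold the prompt and are never overwritten.
    """
    tokens = list(token_ids)
    for i, keep in enumerate(padding_mask):
        if not keep:
            tokens[i] = next_token_fn(tokens[:i])
    return tokens


def toy_next_token(prefix):
    """Hypothetical stand-in for a model: previous token id plus one."""
    return prefix[-1] + 1


prompt_ids = [2, 12345, 678, 0, 0, 0, 0]
prompt_mask = [1, 1, 1, 0, 0, 0, 0]
generated = fill_after_prompt(prompt_ids, prompt_mask, toy_next_token)
```

Because the output has the same length as the input, the amount of padding in `"token_ids"` determines how many tokens can be generated when no preprocessor is attached.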

Call fit() on a single batch.

features = ["The quick brown fox jumped.", "I forgot my homework."]
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_0.6b_en")
qwen3_lm.fit(x=features, batch_size=2)

Call fit() with LoRA fine-tuning enabled.

features = ["The quick brown fox jumped.", "I forgot my homework."]
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_0.6b_en")
qwen3_lm.backbone.enable_lora(rank=4)
qwen3_lm.fit(x=features, batch_size=2)

Call fit() without preprocessing.

import numpy as np

x = {
    # Token ids for "<bos> Qwen3 is a language model<eos>"
    "token_ids": np.array([[2, 12345, 678, 543, 9876, 1, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
y = np.array([[12345, 678, 543, 9876, 1, 0, 0, 0]] * 2)
sw = np.array([[1, 1, 1, 1, 1, 0, 0, 0]] * 2)

qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset(
    "qwen3_0.6b_en",
    preprocessor=None,
)
qwen3_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
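The relationship between `x`, `y`, and `sw` in the example above is a one-position shift: each label is the token that follows the corresponding input position, and the sample weight is zero wherever the label is padding. A minimal sketch of that construction for a single sequence, using a hypothetical helper `make_causal_lm_batch` and assuming a pad id of 0 as in the example:

```python
def make_causal_lm_batch(token_ids, sequence_length, pad_id=0):
    """Build (x, y, sample_weight) for next-token prediction on one sequence."""
    n_pad = sequence_length - len(token_ids)
    if n_pad < 0:
        raise ValueError("token_ids is longer than sequence_length")

    padded = list(token_ids) + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad

    # Labels are the inputs shifted left by one; the last slot has no target.
    y = padded[1:] + [pad_id]
    # Only positions whose label is a real token contribute to the loss.
    sample_weight = mask[1:] + [0]

    x = {"token_ids": padded, "padding_mask": mask}
    return x, y, sample_weight


x_one, y_one, sw_one = make_causal_lm_batch(
    [2, 12345, 678, 543, 9876, 1], sequence_length=8
)
```

The returned lists match one row of the arrays in the example above; stacking rows and wrapping them in `np.array` gives a batch in the same layout.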

Custom backbone and vocabulary.

tokenizer = keras_hub.models.Qwen3MoeTokenizer(
    proto="qwen3_moe_vocab.spm",
)
preprocessor = keras_hub.models.Qwen3MoeCausalLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=128,
)
backbone = keras_hub.models.Qwen3MoeBackbone(
    vocabulary_size=151936,
    num_layers=28,
    num_query_heads=16,
    num_key_value_heads=8,
    hidden_dim=2048,
    intermediate_dim=4096,
    moe_intermediate_dim=128,
    shared_expert_intermediate_dim=4096,
    num_experts=60,
    top_k=4,
    max_sequence_length=4096,
)
qwen3_lm = keras_hub.models.Qwen3MoeCausalLM(
    backbone=backbone,
    preprocessor=preprocessor,
)
qwen3_lm.fit(x=features, batch_size=2)


`from_preset` method

Qwen3CausalLM.from_preset(preset, load_weights=True, **kwargs)

Instantiate a keras_hub.models.Task from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'
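The four forms can be told apart by their prefix. The helper below is a hypothetical sketch for illustration, for example to validate a config value before calling `from_preset()`; it is not how KerasHub itself resolves presets, and it treats only `./`, `../`, and `/` prefixes as local paths.

```python
def describe_preset(preset):
    """Classify a preset string by its prefix (illustrative only)."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    if preset.startswith(("./", "../", "/")):
        return "local"
    # Anything else is assumed to be a built-in preset identifier.
    return "builtin"


kinds = {
    name: describe_preset(name)
    for name in [
        "bert_base_en",
        "kaggle://user/bert/keras/bert_base_en",
        "hf://user/bert_base_en",
        "./bert_base_en",
    ]
}
```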

For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.

This constructor can be called in one of two ways: either from a task-specific base class like keras_hub.models.CausalLM.from_preset(), or from a model class like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.

Arguments

preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.

Examples

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)

# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)

Preset	Parameters	Description
qwen3_embedding_0.6b_en	595.78M	This text embedding model features a 32k context length and offers flexible, user-defined embedding dimensions that can range from 32 to 1024.
qwen3_0.6b_en	596.05M	28-layer Qwen3 model with 596M parameters, optimized for efficiency and fast inference on resource-constrained devices.
qwen3_1.7b_en	1.72B	28-layer Qwen3 model with 1.72B parameters, offering a good balance between performance and resource usage.
qwen3_embedding_4b_en	4.02B	This text embedding model features a 32k context length and offers flexible, user-defined embedding dimensions that can range from 32 to 2560.
qwen3_4b_en	4.02B	36-layer Qwen3 model with 4.02B parameters, offering improved reasoning capabilities and better performance than smaller variants.
qwen3_embedding_8b_en	8.19B	This text embedding model features a 32k context length and offers flexible, user-defined embedding dimensions that can range from 32 to 4096.
qwen3_8b_en	8.19B	36-layer Qwen3 model with 8.19B parameters, featuring enhanced reasoning, coding, and instruction-following capabilities.
qwen3_14b_en	14.77B	40-layer Qwen3 model with 14.77B parameters, featuring advanced reasoning, coding, and multilingual capabilities.
qwen3_32b_en	32.76B	64-layer Qwen3 model with 32.76B parameters, featuring state-of-the-art performance across reasoning, coding, and general language tasks.


`generate` method

Qwen3CausalLM.generate(
    inputs, max_length=None, stop_token_ids="auto", strip_prompt=False
)

Generate text given prompt inputs.

This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.

If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.

If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. See the example usage above for a demonstration of each.

Arguments

inputs: python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
max_length: Optional. int. The max length of the generated sequence. Will default to the max configured sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults to "auto", which uses the preprocessor.tokenizer.end_token_id; with "auto", not attaching a preprocessor will produce an error. None stops generation only after max_length tokens have been generated. You may also specify a list of token ids the model should stop on. Note that each id in the list is interpreted as an individual stop token; multi-token stop sequences are not supported.
strip_prompt: Optional. By default, generate() returns the full prompt followed by its completion generated by the model. If this option is set to True, only the newly generated text is returned.
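The combined effect of stop_token_ids and strip_prompt can be sketched on a plain list of token ids. This is an illustration under stated assumptions, not the library's post-processing code: `postprocess` is a hypothetical helper, the token ids are made up, and the sketch assumes the stop token itself is dropped from the result.

```python
def postprocess(tokens, prompt_length, stop_token_ids=None, strip_prompt=False):
    """Cut generated tokens at the first stop id and optionally drop the prompt."""
    end = len(tokens)
    if stop_token_ids:
        # Only look for stop ids in the generated part, never in the prompt.
        for i in range(prompt_length, len(tokens)):
            if tokens[i] in stop_token_ids:
                end = i
                break
    start = prompt_length if strip_prompt else 0
    return tokens[start:end]


# Hypothetical sequence: a 3-token prompt followed by 4 generated tokens.
tokens = [2, 10, 11, 12, 13, 1, 14]

full_output = postprocess(tokens, prompt_length=3, stop_token_ids=(1,))
completion_only = postprocess(
    tokens, prompt_length=3, stop_token_ids=(1,), strip_prompt=True
)
no_stop = postprocess(tokens, prompt_length=3, stop_token_ids=None)
either_stop = postprocess(tokens, prompt_length=3, stop_token_ids=(13, 1))
```

Passing `(13, 1)` stops at whichever id appears first, which mirrors the note above that each id is treated as its own stop token.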

`backbone` property

keras_hub.models.Qwen3CausalLM.backbone

A keras_hub.models.Backbone model with the core architecture.

`preprocessor` property

keras_hub.models.Qwen3CausalLM.preprocessor

A keras_hub.models.Preprocessor layer used to preprocess input.
