Qwen3_5CausalLM class
keras_hub.models.Qwen3_5CausalLM(backbone, preprocessor=None, **kwargs)
An end-to-end Qwen3.5 model for causal language modeling.
This model predicts the next token based on previous tokens using the
Qwen3.5 hybrid architecture (full attention + GatedDeltaNet linear
attention layers). It optionally supports multimodal (image + text)
inputs when the backbone has a vision_encoder attached.
This model has a generate() method for autoregressive text
generation.
Arguments
- backbone: A [keras_hub.models.Qwen3_5Backbone](/keras_hub/api/models/qwen3_5/qwen3_5_backbone#qwen35backbone-class) instance.
- preprocessor: A [keras_hub.models.Qwen3_5CausalLMPreprocessor](/keras_hub/api/models/qwen3_5/qwen3_5_causal_lm_preprocessor#qwen35causallmpreprocessor-class)
  or None.

from_preset method
Qwen3_5CausalLM.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_hub.models.Task from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
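1. a built-in preset identifier like 'bert_base_en'
2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
3. a Hugging Face handle like 'hf://user/bert_base_en'
4. a path to a local preset directory like './bert_base_en'

For any Task subclass, you can run cls.presets.keys() to list all
built-in presets available on the class.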
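
As a rough illustration of how the four forms differ, the sketch below sorts a preset string by its prefix. `classify_preset` and the `BUILT_IN` set are hypothetical names made up for this example, and this is not how KerasHub resolves presets internally.

```python
# Hypothetical helper: classify a preset string into the four forms listed above.
# BUILT_IN stands in for the keys of `cls.presets`; its contents are assumed.
BUILT_IN = {"bert_base_en", "gemma_2b_en"}


def classify_preset(preset: str) -> str:
    """Return which of the four documented preset forms `preset` looks like."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "hugging_face"
    if preset in BUILT_IN:
        return "built_in"
    # Anything else is treated as a filesystem path in this sketch.
    return "local_path"
```

The order of the checks matters only in that the two URI-style prefixes are tested before the name lookup, so a handle is never mistaken for a built-in identifier.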
This constructor can be called in one of two ways. Either from a task
specific base class like keras_hub.models.CausalLM.from_preset(), or
from a model class like
keras_hub.models.BertTextClassifier.from_preset().
If calling from a base class, the subclass of the returned object
will be inferred from the config in the preset directory.
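Arguments
- preset: string. One of the preset forms listed above: a built-in
  preset identifier, a Kaggle Models handle, a Hugging Face handle, or
  a path to a local directory.
- load_weights: bool. If True, saved weights will be loaded into
  the model architecture. If False, all weights will be
  randomly initialized.

Examples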
```python
import keras_hub

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)

# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
```

| Preset | Parameters | Description |
|---|---|---|
| qwen3_5_0.8b_base | 852.99M | Ultra-lightweight foundation model. Ideal for edge devices and efficient, task-specific fine-tuning. Supports text, multimodal, and video processing tasks. |
| qwen3_5_0.8b | 852.99M | Instruction-tuned ultra-lightweight model. Best for simple chat and basic NLP tasks on resource-constrained devices. Supports text, multimodal, and video processing tasks. |
| qwen3_5_2b_base | 2.21B | Lightweight foundation model. Balances speed and capability; great for mobile deployment and domain-specific fine-tuning. Supports text, multimodal, and video processing tasks. |
| qwen3_5_2b | 2.21B | Instruction-tuned lightweight model. Optimized for fast chat applications and general assistance on consumer hardware. Supports text, multimodal, and video processing tasks. |
| qwen3_5_4b_base | 4.54B | Mid-small foundation model. Offers improved reasoning and context understanding for custom fine-tuning tasks. |
| qwen3_5_4b | 4.54B | Instruction-tuned mid-small model. A capable assistant for general text generation and conversational tasks on standard GPUs. Supports multimodal and video processing tasks. |
| qwen3_5_9b_base | 9.41B | Mid-sized foundation model. Delivers strong reasoning, coding, and math baseline capabilities for advanced fine-tuning. Supports multimodal and video processing tasks. |
| qwen3_5_9b | 9.41B | Instruction-tuned mid-sized model. Highly capable chatbot offering strong logic, coding assistance, and multilingual support. Supports multimodal and video processing tasks. |
| qwen3_5_27b | 27.36B | Instruction-tuned large model. Delivers high-tier performance for complex reasoning, coding, and extensive contextual tasks. Supports multimodal and video processing tasks. |
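
Any preset name from the table can be passed to from_preset(). The sketch below wraps the load-and-generate steps in a function so that nothing is downloaded until it is called. The function name, the default preset, and the max_length value are illustrative choices, and the sketch assumes keras_hub is installed in a release that ships these presets.

```python
def generate_with_preset(prompt, preset="qwen3_5_0.8b", max_length=64):
    """Load a Qwen3.5 preset and generate a completion for `prompt`."""
    # Imported lazily so that defining this function needs no dependencies.
    import keras_hub

    # The first call fetches the preset files (configs and weights).
    causal_lm = keras_hub.models.Qwen3_5CausalLM.from_preset(preset)
    # With the preset's preprocessor attached, raw strings are accepted.
    return causal_lm.generate(prompt, max_length=max_length)
```

Calling generate_with_preset("Keras is a") would return the prompt followed by the model's completion; the smaller presets are the cheaper ones to try first.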

generate method
Qwen3_5CausalLM.generate(
    inputs, max_length=None, stop_token_ids="auto", strip_prompt=False
)
Generate text given prompt inputs.
This method generates text based on given inputs. The sampling method
used for generation can be set via the compile() method.
If inputs are a tf.data.Dataset, outputs will be generated
"batch-by-batch" and concatenated. Otherwise, all inputs will be handled
as a single batch.
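
The "batch-by-batch" behavior can be pictured with plain Python lists standing in for a tf.data.Dataset. Both functions below are hypothetical stand-ins written for this illustration, not KerasHub functions.

```python
def fake_generate_batch(batch):
    """Stand-in for generating on one batch: append a marker to each prompt."""
    return [prompt + " ..." for prompt in batch]


def generate_over_batches(batches):
    """Generate on each batch in turn and concatenate the per-batch outputs."""
    outputs = []
    for batch in batches:
        outputs.extend(fake_generate_batch(batch))
    return outputs


# Two batches of prompts in, one flat list of outputs out.
batches = [["a", "b"], ["c"]]
result = generate_over_batches(batches)
```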
If a preprocessor is attached to the model, inputs will be
preprocessed inside the generate() function and should match the
structure expected by the preprocessor layer (usually raw strings).
If a preprocessor is not attached, inputs should match the structure
expected by the backbone. See the example usage above for a
demonstration of each.
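
A toy decoding loop can make the roles of max_length, stop_token_ids, and strip_prompt concrete. The sketch below is not KerasHub's implementation: `toy_generate` and `count_up` are hypothetical, the loop handles a single sequence of integer ids, it assumes max_length counts the prompt plus the generated tokens, and it keeps the stop token in the output.

```python
def toy_generate(prompt_ids, next_token, max_length, stop_token_ids=None,
                 strip_prompt=False):
    """Toy single-sequence decoding loop over integer token ids.

    `next_token` is any callable mapping the ids so far to the next id.
    """
    ids = list(prompt_ids)
    stop = set(stop_token_ids or ())
    # Assumption: max_length counts prompt tokens plus generated tokens.
    while len(ids) < max_length:
        token = next_token(ids)
        ids.append(token)
        # Each stop id is matched on its own; a tuple such as (7, 8) means
        # "stop on 7 or on 8", never "stop on 7 followed by 8".
        if token in stop:
            break
    # Optionally drop the prompt so only newly generated ids are returned.
    return ids[len(prompt_ids):] if strip_prompt else ids


def count_up(ids):
    """A counting "model": the next id is the previous id plus one."""
    return ids[-1] + 1


full = toy_generate([1, 2], count_up, max_length=10, stop_token_ids=(5,))
new_only = toy_generate([1, 2], count_up, max_length=10, stop_token_ids=(5,),
                        strip_prompt=True)
no_stop = toy_generate([1, 2], count_up, max_length=6, stop_token_ids=None)
```

With a stop id, generation ends as soon as that id is produced; with stop_token_ids=None, the loop runs until max_length is reached.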
Arguments
- inputs: Python data, tensor data, or a tf.data.Dataset. If a
  preprocessor is attached to the model, inputs should match
  the structure expected by the preprocessor layer. If a
  preprocessor is not attached, inputs should match the
  structure expected by the backbone model.
- max_length: Optional int. The maximum length of the generated
  sequence. Defaults to the maximum configured sequence_length of the
  preprocessor. If preprocessor is None, inputs should be
  padded to the desired maximum length and this argument
  will be ignored.
- stop_token_ids: Optional. None, "auto", or a tuple of token ids.
  Defaults to "auto", which uses the
  preprocessor.tokenizer.end_token_id. With "auto", not attaching a
  preprocessor will produce an error. None stops generation after
  generating max_length tokens. You may also specify a list of
  token ids the model should stop on. Note that each token id is
  interpreted as a separate stop token; multi-token
  stop sequences are not supported.
- strip_prompt: Optional. By default, generate() returns the full
  prompt followed by the completion generated by the model. If set to
  True, only the newly generated text is returned.

backbone property
keras_hub.models.Qwen3_5CausalLM.backbone
A keras_hub.models.Backbone model with the core architecture.

preprocessor property
keras_hub.models.Qwen3_5CausalLM.preprocessor
A keras_hub.models.Preprocessor layer used to preprocess input.