Qwen3_5Backbone class

```python
keras_hub.models.Qwen3_5Backbone(
    vocabulary_size,
    num_layers,
    num_query_heads,
    num_key_value_heads,
    head_dim,
    hidden_dim,
    intermediate_dim,
    layer_types=None,
    partial_rotary_factor=0.25,
    rope_max_wavelength=10000,
    rope_scaling_factor=1.0,
    layer_norm_epsilon=1e-06,
    dropout=0.0,
    tie_word_embeddings=False,
    sliding_window_size=32768,
    linear_num_key_heads=16,
    linear_num_value_heads=32,
    linear_key_head_dim=128,
    linear_value_head_dim=128,
    linear_conv_kernel_dim=4,
    vision_encoder=None,
    mrope_section=None,
    dtype=None,
    **kwargs
)
```
The Qwen3.5 Transformer core architecture with hyperparameters.

This network implements a hybrid Transformer-based decoder with two
layer types:

- `full_attention`: standard grouped-query attention with partial
  rotary embeddings and sigmoid output gating.
- `linear_attention`: GatedDeltaNet recurrent linear attention with a
  causal conv1d and delta-rule recurrence.
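To make the delta-rule recurrence in the `linear_attention` layers concrete, here is a minimal NumPy sketch of one common gated delta-rule formulation. This is an illustration only, not the actual Qwen3.5 kernel: the function name, the fixed `alpha`/`beta` gate values, and the single-head shapes are all assumptions for the example.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrence step on state S (value_dim x key_dim):
    decay the state, erase the old value stored under key k
    (the delta rule), write the new value, then read out with q."""
    # S <- alpha * (S - beta * (S k) k^T) + beta * v k^T
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    return S, S @ q  # updated state and per-token output

rng = np.random.default_rng(0)
d_k, d_v, T = 4, 8, 16
S = np.zeros((d_v, d_k))  # recurrent state, carried across tokens
for _ in range(T):
    k = rng.normal(size=d_k)
    k /= np.linalg.norm(k)         # unit-norm key
    q = rng.normal(size=d_k)
    v = rng.normal(size=d_v)
    alpha, beta = 0.9, 0.5         # illustrative; per-token gates in the real layer
    S, o = gated_delta_step(S, q, k, v, alpha, beta)
print(o.shape)  # (8,)
```

In the real layer the decay and write-strength gates are computed per token from the hidden state, the recurrence runs independently per head, and a causal conv1d is applied to the query/key/value projections beforehand, as the bullet above notes.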
The backbone optionally accepts a `vision_encoder` to enable
multimodal (image + text) inputs. When present, visual token embeddings
are interleaved into the text embedding sequence before the transformer
layers. M-RoPE (multi-dimensional RoPE) position encoding is used for
the full-attention layers when `mrope_section` is provided.
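To make the `mrope_section` bookkeeping concrete, here is a small hypothetical helper (not part of keras_hub) showing how the three section sizes partition the rotary sin/cos frequency pairs among temporal, height, and width position indices. The `head_dim=256` value is an assumption chosen for illustration so that, with `partial_rotary_factor=0.25`, the `[11, 11, 10]` sections quoted for the 27B model cover all 32 pairs.

```python
def mrope_layout(head_dim, partial_rotary_factor, mrope_section):
    """Map each rotary sin/cos pair index to a position axis."""
    rotary_dims = int(head_dim * partial_rotary_factor)  # dims that get RoPE
    n_pairs = rotary_dims // 2                           # sin/cos pairs
    assert sum(mrope_section) == n_pairs, "sections must cover every pair"
    layout, start = {}, 0
    for axis, n in zip(["temporal", "height", "width"], mrope_section):
        layout[axis] = range(start, start + n)  # pair indices for this axis
        start += n
    return layout

# 256 * 0.25 = 64 rotary dims = 32 pairs, split 11/11/10 across the axes.
layout = mrope_layout(256, 0.25, [11, 11, 10])
print({k: (v.start, v.stop) for k, v in layout.items()})
# {'temporal': (0, 11), 'height': (11, 22), 'width': (22, 32)}
```

For text-only inputs all three axes share the same 1D position index, so this layout reduces to plain RoPE, which is consistent with `mrope_section=None` defaulting to 1D RoPE.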
Arguments

- vocabulary_size: int. The size of the token vocabulary.
- num_layers: int. The number of transformer decoder layers.
- num_query_heads: int. The number of attention query heads.
- num_key_value_heads: int. The number of attention key/value heads.
- head_dim: int. The size of each attention head.
- hidden_dim: int. The size of the transformer hidden state.
- intermediate_dim: int. The output dimension of the first Dense layer
  in the feedforward network of each transformer layer.
- layer_types: list of strings. The per-layer attention type, each
  entry either "full_attention" or "linear_attention". Defaults to
  None.
- partial_rotary_factor: float. The fraction of each attention head's
  dimensions that receive rotary embeddings. Defaults to 0.25.
- rope_max_wavelength: int. The maximum angular wavelength of the
  sine/cosine curves used in the rotary embedding. Defaults to 10000.
- rope_scaling_factor: float. The scaling factor applied to the rotary
  embedding. Defaults to 1.0.
- layer_norm_epsilon: float. Epsilon for the normalization layers.
  Defaults to 1e-6.
- dropout: float. Dropout probability. Defaults to 0.0.
- tie_word_embeddings: bool. Whether to tie the input token embedding
  and output projection weights. Defaults to False.
- sliding_window_size: int. The sliding window size for the attention
  layers. Defaults to 32768.
- linear_num_key_heads: int. The number of key heads in the linear
  attention layers. Defaults to 16.
- linear_num_value_heads: int. The number of value heads in the linear
  attention layers. Defaults to 32.
- linear_key_head_dim: int. The dimension of each linear attention key
  head. Defaults to 128.
- linear_value_head_dim: int. The dimension of each linear attention
  value head. Defaults to 128.
- linear_conv_kernel_dim: int. The kernel size of the causal conv1d in
  the linear attention layers. Defaults to 4.
- vision_encoder: Qwen3_5VisionEncoder or None. When supplied, the
  backbone accepts pixel_values, image_grid_thw, and vision_indices in
  addition to text inputs. Defaults to None.
- mrope_section: list of three ints [s_t, s_h, s_w], the number of
  pairs of rotary dimensions assigned to the temporal, height, and
  width axes. Required for M-RoPE in multimodal mode, e.g.
  [11, 11, 10] for the 27B model. Defaults to None (plain 1D RoPE).
- dtype: string or keras.mixed_precision.DTypePolicy. The dtype to use
  for model computations and weights.

from_preset method

```python
Qwen3_5Backbone.from_preset(preset, load_weights=True, **kwargs)
```
Instantiate a keras_hub.models.Backbone from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:

- a built-in preset identifier like 'bert_base_en'
- a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
- a Hugging Face handle like 'hf://user/bert_base_en'
- a ModelScope handle like 'modelscope://user/bert_base_en'
- a path to a local preset directory like './bert_base_en'

This constructor can be called in one of two ways. Either from the base
class like keras_hub.models.Backbone.from_preset(), or from
a model class like keras_hub.models.GemmaBackbone.from_preset().
If calling from the base class, the subclass of the returning object
will be inferred from the config in the preset directory.
For any Backbone subclass, you can run cls.presets.keys() to list
all built-in presets available on the class.
Arguments
True, the weights will be loaded into the
model architecture. If False, the weights will be randomly
initialized.Examples
```python
# Load a Gemma backbone with pre-trained weights.
model = keras_hub.models.Backbone.from_preset(
    "gemma_2b_en",
)

# Load a Bert backbone with a pre-trained config and random weights.
model = keras_hub.models.Backbone.from_preset(
    "bert_base_en",
    load_weights=False,
)
```
| Preset | Parameters | Description |
|---|---|---|
| qwen3_5_0.8b_base | 852.99M | Ultra-lightweight foundation model. Ideal for edge devices and efficient, task-specific fine-tuning. Supports text, multimodal, and video processing tasks. |
| qwen3_5_0.8b | 852.99M | Instruction-tuned ultra-lightweight model. Best for simple chat and basic NLP tasks on resource-constrained devices. Supports text, multimodal, and video processing tasks. |
| qwen3_5_2b_base | 2.21B | Lightweight foundation model. Balances speed and capability; great for mobile deployment and domain-specific fine-tuning. Supports text, multimodal, and video processing tasks. |
| qwen3_5_2b | 2.21B | Instruction-tuned lightweight model. Optimized for fast chat applications and general assistance on consumer hardware. Supports text, multimodal, and video processing tasks. |
| qwen3_5_4b_base | 4.54B | Mid-small foundation model. Offers improved reasoning and context understanding for custom fine-tuning tasks. |
| qwen3_5_4b | 4.54B | Instruction-tuned mid-small model. A capable assistant for general text generation and conversational tasks on standard GPUs. Supports multimodal and video processing tasks. |
| qwen3_5_9b_base | 9.41B | Mid-sized foundation model. Delivers strong reasoning, coding, and math baseline capabilities for advanced fine-tuning. Supports multimodal and video processing tasks. |
| qwen3_5_9b | 9.41B | Instruction-tuned mid-sized model. Highly capable chatbot offering strong logic, coding assistance, and multilingual support. Supports multimodal and video processing tasks. |
| qwen3_5_27b | 27.36B | Instruction-tuned large model. Delivers high-tier performance for complex reasoning, coding, and extensive contextual tasks. Supports multimodal and video processing tasks. |