# Gemma4AudioConverter class

```python
keras_hub.layers.Gemma4AudioConverter(
    num_mels=128,
    num_fft_bins=400,
    stride=160,
    sampling_rate=16000,
    max_audio_length=30,
    min_frequency=0.0,
    max_frequency=8000.0,
    mel_floor=1e-05,
    per_bin_mean=None,
    per_bin_stddev=None,
    **kwargs
)
```
Gemma4 audio feature extraction layer.
Converts raw audio waveforms into log-mel spectrogram features for the Gemma4 USM audio encoder. The processing pipeline is:

1. Truncate the waveform to at most `max_audio_length * sampling_rate` samples.
2. Compute a short-time Fourier transform with a window of `num_fft_bins` samples, a hop of `stride` samples, and `center=True` to produce a power spectrogram.
3. Project the power spectrogram onto `num_mels` mel filterbank bins spanning `min_frequency` to `max_frequency` Hz, flooring values at `mel_floor` before taking the logarithm.
4. Optionally normalize each mel bin using `per_bin_mean` and `per_bin_stddev`.

**Arguments**
- **num_mels**: int. Number of mel filterbank bins in the output features. Defaults to `128`.
- **num_fft_bins**: int. Size of the FFT window, in samples. Defaults to `400`.
- **stride**: int. Hop length in samples between successive frames. Defaults to `160`.
- **sampling_rate**: int. Expected sample rate of the input audio, in Hz. Defaults to `16000`.
- **max_audio_length**: int. Maximum audio length, in seconds. Defaults to `30`.
- **min_frequency**: float. Lowest frequency covered by the mel filterbank, in Hz. Defaults to `0.0`.
- **max_frequency**: float. Highest frequency covered by the mel filterbank, in Hz. Defaults to `8000.0`.
- **mel_floor**: float. Minimum value applied to the mel spectrogram before taking the logarithm. Defaults to `1e-5`.
- **per_bin_mean**: Optional per-mel-bin mean used for feature normalization. `None` disables mean subtraction. Defaults to `None`.
- **per_bin_stddev**: Optional per-mel-bin standard deviation used for feature scaling. `None` disables scaling. Defaults to `None`.
- **kwargs**: Additional keyword arguments forwarded to [`keras_hub.layers.AudioConverter`](/keras_hub/api/preprocessing_layers/audio_converter#audioconverter-class).

**Call arguments**
- Tensor of shape `(num_samples,)` or `(batch_size, num_samples)`: raw mono-channel audio waveform(s) sampled at `sampling_rate` Hz.

**Returns**
Log-mel spectrogram of shape `(num_frames, num_mels)` for a 1-D input, or `(batch_size, num_frames, num_mels)` for a 2-D input, where `num_frames = num_samples // stride`.
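As a rough illustration of the pipeline described above, here is a plain-NumPy sketch of log-mel extraction. This is *not* the layer's actual implementation: the Hann window, HTK-style mel scale, and triangular filterbank construction are assumptions chosen for clarity. It only matches the frame-count convention `num_frames = num_samples // stride` from the Returns section.

```python
import numpy as np

def log_mel_sketch(
    waveform,
    num_mels=128,
    num_fft_bins=400,
    stride=160,
    sampling_rate=16000,
    min_frequency=0.0,
    max_frequency=8000.0,
    mel_floor=1e-5,
):
    """Illustrative log-mel extraction; NOT the exact Gemma4 implementation."""
    # Center-pad so that num_frames == num_samples // stride.
    pad = num_fft_bins // 2
    x = np.pad(waveform, (pad, pad), mode="reflect")
    num_frames = len(waveform) // stride
    window = np.hanning(num_fft_bins)  # assumed window function
    frames = np.stack(
        [x[i * stride : i * stride + num_fft_bins] * window for i in range(num_frames)]
    )
    # Power spectrogram: squared magnitude of the real FFT of each frame.
    power = np.abs(np.fft.rfft(frames, n=num_fft_bins)) ** 2

    # Triangular mel filterbank between min_frequency and max_frequency
    # (HTK mel scale, an assumption for this sketch).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = mel_to_hz(
        np.linspace(hz_to_mel(min_frequency), hz_to_mel(max_frequency), num_mels + 2)
    )
    bins = np.floor((num_fft_bins + 1) * mel_points / sampling_rate).astype(int)
    fb = np.zeros((num_mels, num_fft_bins // 2 + 1))
    for m in range(1, num_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    mel = power @ fb.T
    # Log with floor.
    return np.log(np.maximum(mel, mel_floor))

features = log_mel_sketch(np.random.randn(16000).astype("float32"))
print(features.shape)  # (100, 128)
```

For a 1-second, 16 kHz input this yields 100 frames of 128 mel bins, matching the shapes in the examples below.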
**Examples**

```python
import numpy as np
import keras_hub

# Single waveform (1 second at 16 kHz).
waveform = np.random.randn(16000).astype("float32")
converter = keras_hub.layers.Gemma4AudioConverter()
features = converter(waveform)
print(features.shape)  # (100, 128)

# Batched waveforms.
batch = np.random.randn(4, 16000).astype("float32")
features = converter(batch)
print(features.shape)  # (4, 100, 128)
```
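The `per_bin_mean` and `per_bin_stddev` arguments enable per-mel-bin feature normalization. Assuming the usual convention (subtract the mean, then divide by the standard deviation, independently for each of the `num_mels` bins), the effect can be sketched in NumPy:

```python
import numpy as np

# Fake log-mel features: 100 frames x 128 mel bins with an offset and scale.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 128)) * 3.0 + 5.0

# Statistics computed independently per mel bin (axis 0 is time).
per_bin_mean = features.mean(axis=0)    # shape (128,)
per_bin_stddev = features.std(axis=0)   # shape (128,)

# Assumed normalization: standardize each bin to zero mean, unit variance.
normalized = (features - per_bin_mean) / per_bin_stddev
```

In practice such statistics would be precomputed over a training corpus and passed to the converter's constructor, so that inference-time features are standardized consistently.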
## from_preset method

```python
Gemma4AudioConverter.from_preset(preset, **kwargs)
```
Instantiate a `keras_hub.layers.AudioConverter` from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
1. a built-in preset identifier like `'whisper_base_en'`
2. a Kaggle Models handle like `'kaggle://user/whisper/keras/whisper_base_en'`
3. a Hugging Face handle like `'hf://user/whisper_base_en'`
4. a path to a local preset directory like `'./whisper_base_en'`

You can run `cls.presets.keys()` to list all built-in presets available on the class.
This constructor can be called in one of two ways: either from the base class, like `keras_hub.layers.AudioConverter.from_preset()`, or from a model class, like `keras_hub.layers.WhisperAudioConverter.from_preset()`. If calling from the base class, the subclass of the returned object will be inferred from the config in the preset directory.
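The base-class dispatch described above can be illustrated with a toy registry. This is purely illustrative; keras_hub's actual preset loading reads config files from the preset directory and is considerably more involved:

```python
# Toy sketch of base-class preset dispatch: the base class looks up the
# concrete subclass named in the preset's config, then instantiates it.
class AudioConverterBase:
    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself by name.
        AudioConverterBase._registry[cls.__name__] = cls

    @classmethod
    def from_preset(cls, config):
        # Hypothetical config dict standing in for the preset directory.
        subclass = AudioConverterBase._registry[config["class_name"]]
        return subclass()

class ToyWhisperAudioConverter(AudioConverterBase):
    pass

converter = AudioConverterBase.from_preset(
    {"class_name": "ToyWhisperAudioConverter"}
)
print(type(converter).__name__)  # ToyWhisperAudioConverter
```

Calling `from_preset` on the base class returns an instance of the concrete subclass named in the config, which is why both call styles in the docs above produce the same object.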
**Arguments**

- **preset**: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local preset directory.
- **load_weights**: bool. If `True`, the weights will be loaded into the model architecture. If `False`, the weights will be randomly initialized.

**Examples**
```python
import numpy as np
import keras_hub

# Load an audio converter from a preset.
converter = keras_hub.layers.AudioConverter.from_preset(
    "whisper_base_en"
)
# Convert some raw mono-channel audio input.
converter(np.ones((2, 1_000)))
```
| Preset | Parameters | Description |
|---|---|---|
| gemma4_2b | 5.10B | Gemma 4 E2B base model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_2b | 5.10B | Gemma 4 E2B instruction-tuned model: 2.3B effective parameters (5.1B total with Per-Layer Embeddings), 35-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_4b | 7.90B | Gemma 4 E4B base model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text pretrained Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_instruct_4b | 7.90B | Gemma 4 E4B instruction-tuned model: 4.5B effective parameters (7.9B total with Per-Layer Embeddings), 42-layer, audio+vision+text instruction-tuned Gemma4 model. The 'E' denotes effective parameters — PLE gives each decoder layer its own token embedding table, maximizing parameter efficiency for on-device deployment. |
| gemma4_26b_a4b | 26.00B | Gemma 4 26B A4B base model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text pretrained Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_instruct_26b_a4b | 26.00B | Gemma 4 26B A4B instruction-tuned model: Mixture-of-Experts (MoE) model with 26B total parameters and only 4B active parameters per forward pass, 30-layer, vision+text instruction-tuned Gemma4 model. The 'A' denotes active parameters — by activating only a 4B subset during inference, this MoE model runs nearly as fast as a dense 4B model. |
| gemma4_31b | 31.00B | Gemma 4 31B base model: 31B parameter, 60-layer, dense vision+text pretrained Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |
| gemma4_instruct_31b | 31.00B | Gemma 4 31B instruction-tuned model: 31B parameter, 60-layer, dense vision+text instruction-tuned Gemma4 model. The dense model in the Gemma 4 family, offering maximum quality for deployments where inference speed is less of a constraint. |