MoonshineAudioConverter class
keras_hub.layers.MoonshineAudioConverter(
    sampling_rate=16000, padding_value=0.0, do_normalize=False, **kwargs
)
Moonshine audio preprocessing layer.
This layer processes raw audio waveforms for the Moonshine ASR model. Audio is expected as a batched tensor at a 16 kHz sampling rate and is validated for length (0.1 to 64 seconds). The layer handles padding and optional normalization, and contains no trainable weights.
Arguments
- sampling_rate: int. The sampling rate of the audio in Hz. Defaults to 16000.
- padding_value: float. The value used for padding. Defaults to 0.0.
- do_normalize: bool. Whether to normalize the inputs. Defaults to False.
- **kwargs: Additional keyword arguments passed to keras.layers.Layer.
Call arguments
- inputs: The raw audio data to be processed. It should be a tensor of shape (batch_size, time_steps, 1) for mono audio. If the input has shape (batch_size, time_steps), the layer will add the channel dimension.
- sampling_rate: The sampling rate of the audio in Hz. If provided, it must match the expected sampling rate set during initialization (default is 16,000 Hz). If not provided, the expected sampling rate is taken from the initialization arguments.
- padding: The padding strategy to apply. If provided, can be one of:
  - "longest": If pad_to_multiple_of is set, pads the audio to make the time_steps dimension a multiple of pad_to_multiple_of.
  - "max_length": Pads or truncates the audio to max_length time steps. If pad_to_multiple_of is set, the target length will be the smallest multiple of pad_to_multiple_of that is greater than or equal to max_length.
  - None: No padding is applied.
- max_length: The target number of time steps when padding is "max_length". If not provided and padding is "max_length", no padding or truncation is applied.
- pad_to_multiple_of: If set, the padded time_steps will be a multiple of this value for the chosen padding strategy. (The second example below shows the "max_length" strategy in action.)
Examples
import keras
from keras_hub.layers import MoonshineAudioConverter

# Create a dummy audio input (1 second at 16kHz).
dummy_audio = keras.ops.convert_to_tensor(
    [[0.1] * 16000],
    dtype="float32",
)
dummy_audio = keras.ops.expand_dims(dummy_audio, axis=-1)

# Initialize the preprocessor.
preprocessor = MoonshineAudioConverter(do_normalize=True)

# Process the audio.
processed_audio = preprocessor(dummy_audio)

# Output shape.
print(processed_audio.shape)  # Expected: (1, 16000, 1) or padded length
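The sketch below exercises the "max_length" padding strategy from the call arguments above on a half-second clip; it is a minimal illustration, and the padded samples take the configured padding_value.

import keras
from keras_hub.layers import MoonshineAudioConverter

# A half-second clip (8000 samples at 16 kHz).
short_audio = keras.ops.convert_to_tensor([[0.1] * 8000], dtype="float32")
short_audio = keras.ops.expand_dims(short_audio, axis=-1)

preprocessor = MoonshineAudioConverter()
# Pad the clip out to one second of samples.
padded = preprocessor(short_audio, padding="max_length", max_length=16000)
print(padded.shape)  # Expected: (1, 16000, 1)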
from_preset method
MoonshineAudioConverter.from_preset(preset, **kwargs)
Instantiate a keras_hub.layers.AudioConverter from a model preset.
A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:
- a built-in preset identifier like 'whisper_base_en'
- a Kaggle Models handle like 'kaggle://user/whisper/keras/whisper_base_en'
- a Hugging Face handle like 'hf://user/whisper_base_en'
- a path to a local preset directory like './whisper_base_en'
You can run cls.presets.keys() to list all built-in presets available on the class.
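For example, a quick way to inspect the registry (a minimal sketch; assumes keras_hub is installed):

import keras_hub

# List all built-in presets registered on the class.
print(keras_hub.layers.MoonshineAudioConverter.presets.keys())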
This constructor can be called in one of two ways. Either from the base class like keras_hub.layers.AudioConverter.from_preset(), or from a model class like keras_hub.layers.WhisperAudioConverter.from_preset(). If calling from the base class, the subclass of the returning object will be inferred from the config in the preset directory.
Arguments
- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
- load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.
Examples
import numpy as np
import keras_hub

# Load an audio converter from a preset.
converter = keras_hub.layers.AudioConverter.from_preset(
    "whisper_base_en"
)
# Convert some raw mono channel audio input.
converter(np.ones((2, 1_000)))
| Preset | Parameters | Description |
|---|---|---|
| moonshine_tiny_en | 27.09M | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| moonshine_base_en | 61.51M | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
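To tie the table back to from_preset, a minimal sketch loading the Moonshine tiny converter directly from the subclass:

import keras_hub

# Load the tiny English Moonshine converter listed above.
converter = keras_hub.layers.MoonshineAudioConverter.from_preset(
    "moonshine_tiny_en"
)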