# MoonshineAudioToTextPreprocessor class

```python
keras_hub.models.MoonshineAudioToTextPreprocessor(
    audio_converter, tokenizer, decoder_sequence_length=1024, **kwargs
)
```
Moonshine Seq2Seq LM preprocessor for audio-to-text tasks.
This preprocessor converts raw audio and text inputs into a format suitable
for the MoonshineAudioToText model. It processes audio waveforms using
MoonshineAudioConverter for basic preprocessing (padding, normalization)
and tokenizes text using MoonshineTokenizer for the decoder. It supports
training and generation.
### Arguments

- **audio_converter**: A `MoonshineAudioConverter` instance to process the audio input.
- **tokenizer**: A `MoonshineTokenizer` instance to tokenize the text input.
- **decoder_sequence_length**: int. The padded length of decoder token sequences. Defaults to `1024`.

### Examples
```python
import keras
import keras_hub
from keras_hub.layers import MoonshineAudioConverter
from keras_hub.models import MoonshineTokenizer

# Create audio converter and tokenizer instances.
audio_converter = MoonshineAudioConverter()
tokenizer = MoonshineTokenizer.from_preset("moonshine_base_en")

# Initialize the preprocessor.
preprocessor = keras_hub.models.MoonshineAudioToTextPreprocessor(
    audio_converter=audio_converter,
    tokenizer=tokenizer,
    decoder_sequence_length=8,
)

# Prepare input data (audio tensor and text).
inputs = {
    "audio": keras.random.normal((1, 16000)),
    "text": ["the quick brown fox"],
}

# Process the inputs for training.
x, y, sample_weight = preprocessor(inputs)

# Check output keys and shapes (shapes depend on padding/truncation).
print(x.keys())
# dict_keys(['encoder_input_values', 'encoder_padding_mask',
#            'decoder_token_ids', 'decoder_padding_mask'])
print(x["encoder_input_values"].shape)  # e.g., (1, 16000, 1), or padded length
print(x["encoder_padding_mask"].shape)  # e.g., (1, 16000), or padded length
print(x["decoder_token_ids"].shape)     # (1, 8)
print(x["decoder_padding_mask"].shape)  # (1, 8)
print(y.shape)                          # (1, 8) - labels
print(sample_weight.shape)              # (1, 8) - sample weights

# Process inputs for generation.
gen_inputs = preprocessor.generate_preprocess(inputs)
print(gen_inputs.keys())
# dict_keys(['encoder_input_values', 'encoder_padding_mask',
#            'decoder_token_ids', 'decoder_padding_mask'])
```
## from_preset method

```python
MoonshineAudioToTextPreprocessor.from_preset(
    preset, config_file="preprocessor.json", **kwargs
)
```
Instantiate a keras_hub.models.Preprocessor from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
- a built-in preset identifier like `'bert_base_en'`
- a Kaggle Models handle like `'kaggle://user/bert/keras/bert_base_en'`
- a Hugging Face handle like `'hf://user/bert_base_en'`
- a path to a local preset directory like `'./bert_base_en'`

For any `Preprocessor` subclass, you can run `cls.presets.keys()` to
list all built-in presets available on the class.
As there are usually multiple preprocessing classes for a given model,
this method should be called on a specific subclass like
keras_hub.models.BertTextClassifierPreprocessor.from_preset().
### Arguments

- **preset**: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
- **config_file**: string. The path of the config file within the preset directory. Defaults to `"preprocessor.json"`.

### Examples
```python
# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)

# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
```
| Preset | Parameters | Description |
|---|---|---|
| moonshine_tiny_en | 27.09M | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
| moonshine_base_en | 61.51M | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
## tokenizer property

```python
keras_hub.models.MoonshineAudioToTextPreprocessor.tokenizer
```

The tokenizer used to tokenize strings.