MoonshineAudioToText
MoonshineAudioToText class
keras_hub.models.MoonshineAudioToText(backbone, preprocessor=None, **kwargs)
An end-to-end Moonshine model for audio-to-text tasks.
A Seq2Seq LM designed for audio-to-text tasks, such as speech recognition. The
encoder processes audio features, and the decoder generates text
transcriptions. You can finetune MoonshineAudioToText for any audio-to-text
task (e.g., live transcription or voice commands).
This model includes a generate() method for text generation based on audio
inputs and an optional text prompt for the decoder. The generation strategy is
controlled by a sampler argument passed to compile(). By default, "top_k"
sampling is used.
Arguments
backbone: A keras_hub.models.MoonshineBackbone instance.
preprocessor: A keras_hub.models.MoonshineAudioToTextPreprocessor or None. If None, inputs must be preprocessed before calling the model.
Examples
import keras
import keras_hub

# Initialize model from preset.
moonshine_lm = keras_hub.models.MoonshineAudioToText.from_preset(
    "moonshine_base_en"
)
# Generate with single audio input.
audio_tensor = keras.random.normal((1, 16000, 1))
moonshine_lm.generate({"audio": audio_tensor})
# Generate with text prompt.
moonshine_lm.generate({"audio": audio_tensor, "text": "quick"})
# Use different sampling strategy.
moonshine_lm.compile(sampler="greedy")
moonshine_lm.generate({"audio": audio_tensor})
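The generate() call above also handles a batch of clips in one pass, and compile() accepts a keras_hub.samplers.Sampler instance in place of a string name. A minimal sketch of both, assuming the same 16 kHz mono input shape used above:
# Generate for a batch of audio clips in a single call.
audio_batch = keras.random.normal((2, 16000, 1))
moonshine_lm.generate({"audio": audio_batch})
# Pass a sampler instance rather than a string name.
moonshine_lm.compile(sampler=keras_hub.samplers.TopKSampler(k=5))
moonshine_lm.generate({"audio": audio_batch})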
from_preset method
MoonshineAudioToText.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_hub.models.Task from a model preset.
A preset is a directory of configs, weights and other file assets used to save
and load a pre-trained model. The preset can be passed as one of:
a built-in preset identifier like 'bert_base_en'
a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
a Hugging Face handle like 'hf://user/bert_base_en'
a path to a local preset directory like './bert_base_en'
For any Task subclass, you can run cls.presets.keys() to list all built-in
presets available on the class.
This constructor can be called in one of two ways. Either from a task-specific
base class like keras_hub.models.CausalLM.from_preset(), or from a model class
like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base
class, the subclass of the returning object will be inferred from the config
in the preset directory.
Arguments
preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.
Examples
# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)
# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
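For this model in particular, the snippet below is a small sketch of the points above: listing the built-in presets via the class-level presets mapping and loading one of the Moonshine presets from the table below with load_weights=False, so the architecture is built but left randomly initialized:
# List every built-in preset registered on the class.
print(keras_hub.models.MoonshineAudioToText.presets.keys())
# Build the architecture from a preset config without pretrained weights.
moonshine_lm = keras_hub.models.MoonshineAudioToText.from_preset(
    "moonshine_tiny_en",
    load_weights=False,
)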
Preset | Parameters | Description |
---|---|---|
moonshine_tiny_en | 27.09M | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
moonshine_base_en | 61.51M | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription. |
generate method
MoonshineAudioToText.generate(
    inputs, max_length=None, stop_token_ids="auto", strip_prompt=False
)
Generate text given prompt inputs.
This method generates text based on given inputs. The sampling method used for
generation can be set via the compile() method.
If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch"
and concatenated. Otherwise, all inputs will be handled as a single batch.
If a preprocessor is attached to the model, inputs will be preprocessed inside
the generate() function and should match the structure expected by the
preprocessor layer (for this model, a dict with an "audio" key and an optional
"text" prompt). If a preprocessor is not attached, inputs should match the
structure expected by the backbone. See the example usage above for a
demonstration of each.
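The batch-by-batch behavior can be exercised with a tf.data.Dataset built from the same dictionary structure used in the examples above. A minimal sketch, assuming TensorFlow is installed and the attached preprocessor accepts {"audio": ...} elements:
import tensorflow as tf

# A small dataset of raw audio clips, keyed like the dictionary inputs above.
audio_clips = keras.random.normal((4, 16000, 1))
ds = tf.data.Dataset.from_tensor_slices({"audio": audio_clips}).batch(2)
# Outputs are generated batch-by-batch and concatenated.
moonshine_lm.generate(ds)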
Arguments
inputs: Python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
max_length: Optional. int. The maximum length of the generated sequence. Defaults to the maximum configured sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults to "auto", which uses the preprocessor.tokenizer.end_token_id. Not specifying a preprocessor will produce an error. None stops generation only after max_length tokens have been generated. You may also specify a list of token ids the model should stop on. Note that each token id is interpreted as a separate stop token; multi-token stop sequences are not supported.
strip_prompt: Optional. bool. By default, generate() returns the full prompt followed by its completion generated by the model. If this option is set to True, only the newly generated text is returned.
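As a concrete, hedged illustration of max_length and stop_token_ids, continuing from the model and audio tensor in the examples above (the values here are arbitrary, not defaults):
# Cap generation at 64 decoder tokens.
moonshine_lm.generate({"audio": audio_tensor}, max_length=64)
# Disable the end-token check and always generate max_length tokens.
moonshine_lm.generate({"audio": audio_tensor}, stop_token_ids=None)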
backbone property
keras_hub.models.MoonshineAudioToText.backbone
A keras_hub.models.Backbone model with the core architecture.
preprocessor property
keras_hub.models.MoonshineAudioToText.preprocessor
A keras_hub.models.Preprocessor layer used to preprocess input.
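As a small illustration of these two properties, both pieces can be pulled off a task model and inspected directly; a minimal sketch reusing the moonshine_lm instance from the examples above:
# Access the core architecture and the input-preprocessing layer.
backbone = moonshine_lm.backbone
preprocessor = moonshine_lm.preprocessor
# Print a layer-by-layer overview of the backbone.
backbone.summary()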