MoonshineAudioToText model

MoonshineAudioToText class

keras_hub.models.MoonshineAudioToText(backbone, preprocessor=None, **kwargs)

An end-to-end Moonshine model for audio-to-text tasks.

A Seq2Seq LM designed for audio-to-text tasks, such as speech recognition. The encoder processes audio features, and the decoder generates text transcriptions. You can finetune MoonshineAudioToText for any audio-to-text task (e.g., live transcription or voice commands).

This model includes a generate() method for text generation based on audio inputs and an optional text prompt for the decoder. The generation strategy is controlled by a sampler argument passed to compile(). By default, "top_k" sampling is used.

Arguments

  • backbone: A keras_hub.models.MoonshineBackbone instance.
  • preprocessor: A keras_hub.models.MoonshineAudioToTextPreprocessor or None. If None, this model will not apply preprocessing, and inputs should be preprocessed before calling the model.

Examples

# Initialize model from preset.
moonshine_lm = keras_hub.models.MoonshineAudioToText.from_preset(
    "moonshine_base"
)

# Generate with single audio input.
audio_tensor = keras.random.normal((1, 16000, 1))
moonshine_lm.generate({"audio": audio_tensor})

# Generate with text prompt.
moonshine_lm.generate({"audio": audio_tensor, "text": "quick"})

# Use different sampling strategy.
moonshine_lm.compile(sampler="greedy")
moonshine_lm.generate({"audio": audio_tensor})

from_preset method

MoonshineAudioToText.from_preset(preset, load_weights=True, **kwargs)

Instantiate a keras_hub.models.Task from a model preset.

A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

  1. a built-in preset identifier like 'bert_base_en'
  2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
  3. a Hugging Face handle like 'hf://user/bert_base_en'
  4. a path to a local preset directory like './bert_base_en'

For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.

This constructor can be called in one of two ways: either from a task-specific base class like keras_hub.models.CausalLM.from_preset(), or from a model class like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.

Arguments

  • preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
  • load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.

Examples

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)

# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
Preset            | Parameters | Description
moonshine_tiny_en | 27.09M     | Moonshine tiny model for English speech recognition. Developed by Useful Sensors for real-time transcription.
moonshine_base_en | 61.51M     | Moonshine base model for English speech recognition. Developed by Useful Sensors for real-time transcription.

generate method

MoonshineAudioToText.generate(
    inputs, max_length=None, stop_token_ids="auto", strip_prompt=False
)

Generate text given prompt inputs.

This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.

If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.

If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. See the example usage above for a demonstration of each.

Arguments

  • inputs: python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
  • max_length: Optional. int. The max length of the generated sequence. Will default to the max configured sequence_length of the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length and this argument will be ignored.
  • stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults to "auto", which uses the preprocessor.tokenizer.end_token_id. Not specifying a preprocessor will produce an error. None stops generation only after max_length tokens have been generated. You may also specify a list of token ids the model should stop on. Note that each token id is interpreted as an individual stop token; multi-token stop sequences are not supported.
  • strip_prompt: Optional. By default, generate() returns the full prompt followed by its completion generated by the model. If this option is set to True, only the newly generated text is returned.

backbone property

keras_hub.models.MoonshineAudioToText.backbone

A keras_hub.models.Backbone model with the core architecture.


preprocessor property

keras_hub.models.MoonshineAudioToText.preprocessor

A keras_hub.models.Preprocessor layer used to preprocess input.