AudioConverter
class keras_hub.layers.AudioConverter(**kwargs)
Convert raw audio for models that support audio input.
This class converts raw audio tensors of any length into preprocessed audio inputs for pretrained models. It is meant as a convenient way to write custom preprocessing code that is not model specific. This layer should be instantiated via the from_preset() constructor, which will create the correct subclass of this layer for the model preset.
The layer takes as input a raw audio tensor with shape (batch_size, num_samples), and outputs a preprocessed audio input for modeling. The exact structure of the preprocessed input will vary per model. Preprocessing will often include computing a spectrogram of the raw audio signal.
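As a concrete illustration of the spectrogram step, here is a minimal magnitude-spectrogram sketch in plain NumPy. The frame and hop lengths are arbitrary assumptions for illustration, not the actual parameters of any preset; real converters typically add mel filtering and log scaling on top of this.

```python
import numpy as np

def magnitude_spectrogram(audio, frame_length=400, hop_length=160):
    """Compute a magnitude spectrogram of a 1D audio signal.

    The signal is split into overlapping frames, each frame is
    multiplied by a Hann window, and the magnitude of the real FFT
    of each frame is returned. Frame and hop sizes here are
    illustrative defaults, not any model's real preprocessing.
    """
    window = np.hanning(frame_length)
    num_frames = 1 + (len(audio) - frame_length) // hop_length
    frames = np.stack([
        audio[i * hop_length : i * hop_length + frame_length]
        for i in range(num_frames)
    ])
    # Real FFT of each windowed frame; keep magnitudes only.
    # Output shape: (num_frames, frame_length // 2 + 1).
    return np.abs(np.fft.rfft(frames * window, axis=-1))
```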
Examples
# Load an audio converter from a preset.
converter = keras_hub.layers.AudioConverter.from_preset("whisper_base_en")
# Convert some raw audio input.
converter(np.ones((2, 1_000)))
from_preset
method AudioConverter.from_preset(preset, **kwargs)
Instantiate a keras_hub.layers.AudioConverter from a model preset.
A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:

- a built-in preset identifier like 'whisper_base_en'
- a Kaggle Models handle like 'kaggle://user/whisper/keras/whisper_base_en'
- a Hugging Face handle like 'hf://user/whisper_base_en'
- a path to a local preset directory like './whisper_base_en'

You can run cls.presets.keys() to list all built-in presets available on the class.
This constructor can be called in one of two ways: either from the base class, like keras_hub.layers.AudioConverter.from_preset(), or from a model class, like keras_hub.models.WhisperAudioConverter.from_preset(). If calling from the base class, the subclass of the returned object will be inferred from the config in the preset directory.
Arguments

- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
- load_weights: bool. If True, the weights will be loaded into the model architecture. If False, the weights will be randomly initialized.

Examples
# Load an audio converter from a preset.
converter = keras_hub.layers.AudioConverter.from_preset(
    "whisper_base_en"
)
# Convert some raw mono channel audio input.
converter(np.ones((2, 1_000)))
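Whisper-style converters operate on fixed-length audio windows (Whisper typically uses 30-second windows), so a pad-or-trim step usually precedes the spectrogram computation. Below is a minimal sketch of that step in plain NumPy; the function name and signature are illustrative, not part of the keras_hub API.

```python
import numpy as np

def pad_or_trim(audio, target_length):
    """Zero-pad or trim a 1D audio array to exactly target_length samples.

    Audio longer than the target is truncated; shorter audio is
    padded with trailing zeros (silence).
    """
    if len(audio) >= target_length:
        return audio[:target_length]
    return np.pad(audio, (0, target_length - len(audio)))
```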
| Preset | Parameters | Description |
|---|---|---|
| whisper_tiny_en | 37.18M | 4-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_tiny_multi | 37.76M | 4-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_base_multi | 72.59M | 6-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_base_en | 124.44M | 6-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_small_en | 241.73M | 12-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_small_multi | 241.73M | 12-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_medium_en | 763.86M | 24-layer Whisper model. Trained on 438,000 hours of labelled English speech data. |
| whisper_medium_multi | 763.86M | 24-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_large_multi | 1.54B | 32-layer Whisper model. Trained on 680,000 hours of labelled multilingual speech data. |
| whisper_large_multi_v2 | 1.54B | 32-layer Whisper model. Trained for 2.5 epochs on 680,000 hours of labelled multilingual speech data. An improved version of whisper_large_multi. |