CLIPPreprocessor class
keras_hub.models.CLIPPreprocessor(
    tokenizer,
    sequence_length=77,
    add_start_token=True,
    add_end_token=True,
    to_lower=True,
    **kwargs
)
CLIP preprocessing layer which tokenizes and packs inputs.
This preprocessing layer will do 2 things:
1. Tokenize the inputs using the tokenizer.
2. Construct a dictionary with keys "token_ids", "padding_mask".
This layer can be used directly with tf.data.Dataset.map to preprocess string data in the (x, y, sample_weight) format used by keras.Model.fit.
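The packing step can be pictured with a small pure-Python sketch. This is an illustration under stated assumptions, not the KerasHub implementation: the helper name pack_tokens and the ids START_ID, END_ID and PAD_ID are made up here, and the real layer takes its special token ids from the tokenizer.

```python
# Illustrative sketch only: this is NOT keras_hub's implementation.
# START_ID, END_ID and PAD_ID are made-up values for the example.
START_ID = 1
END_ID = 2
PAD_ID = 0


def pack_tokens(token_ids, sequence_length=77,
                add_start_token=True, add_end_token=True):
    """Pack one tokenized segment into fixed-length model inputs."""
    # Reserve room for the special tokens before truncating.
    budget = sequence_length - int(add_start_token) - int(add_end_token)
    packed = list(token_ids[:max(budget, 0)])
    if add_start_token:
        packed = [START_ID] + packed
    if add_end_token:
        packed = packed + [END_ID]
    # Real tokens are marked True; padding positions are marked False.
    pad = sequence_length - len(packed)
    return {
        "token_ids": packed + [PAD_ID] * pad,
        "padding_mask": [True] * len(packed) + [False] * pad,
    }


features = pack_tokens([10, 11, 12], sequence_length=8)
print(features["token_ids"])     # [1, 10, 11, 12, 2, 0, 0, 0]
print(features["padding_mask"])  # [True, True, True, True, True, False, False, False]
```

The sketch truncates before adding the start and end tokens so that both survive for long inputs; whether the library truncates in exactly this way is an assumption of the example.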
The call method of this layer accepts three arguments, x, y, and sample_weight. x can be a Python string or tensor representing a single segment, a list of Python strings representing a batch of single segments, or a list of tensors representing multiple segments to be packed together. y and sample_weight are both optional, can have any format, and will be passed through unaltered.
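A minimal sketch of this pass-through contract, in plain Python. The names preprocess_text and call_sketch are hypothetical stand-ins rather than KerasHub functions, and the rule that sample_weight is dropped when y is absent is an assumption of the sketch.

```python
def preprocess_text(x):
    """Hypothetical stand-in for the tokenize-and-pack step."""
    return {"text": x.lower()}


def call_sketch(x, y=None, sample_weight=None):
    """Transform `x` only; hand back `y` and `sample_weight` untouched."""
    x = preprocess_text(x)
    if y is None:
        # No labels: return just the features.
        return x
    if sample_weight is None:
        return (x, y)
    return (x, y, sample_weight)


print(call_sketch("A Photo"))          # {'text': 'a photo'}
print(call_sketch("A Photo", 1))       # ({'text': 'a photo'}, 1)
print(call_sketch("A Photo", 1, 0.5))  # ({'text': 'a photo'}, 1, 0.5)
```

Because only x is transformed, a function of this shape can be mapped over a dataset of (x, y, sample_weight) tuples without disturbing labels or weights.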
CLIPPreprocessor forces the input to have only one segment, as CLIP is mainly used for generation tasks. For tasks having multi-segment inputs like "glue/mnli", please use a model designed for classification purposes such as BERT or RoBERTa.
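Arguments
tokenizer: A keras_hub.models.CLIPTokenizer instance.
sequence_length: The length of the packed inputs. Defaults to 77.
add_start_token: If True, the preprocessor will prepend the tokenizer start token to each input sequence. Defaults to True.
add_end_token: If True, the preprocessor will append the tokenizer end token to each input sequence. Defaults to True.
to_lower: Whether to lowercase the inputs. Defaults to True.
Call arguments
x: A string, tf.Tensor or list of Python strings.
y: Any label data. Will be passed through unaltered.
sample_weight: Any label weight data. Will be passed through unaltered.
sequence_length: Pass to override the configured sequence_length of the layer.
from_preset method
CLIPPreprocessor.from_preset(preset, config_file="preprocessor.json", **kwargs)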
Instantiate a keras_hub.models.Preprocessor from a model preset.
A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:
1. a built-in preset identifier like 'bert_base_en'
2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
3. a Hugging Face handle like 'hf://user/bert_base_en'
4. a path to a local preset directory like './bert_base_en'
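The four forms can be told apart by their prefix. The sketch below uses a hypothetical helper, describe_preset, written only for illustration; it is not how KerasHub resolves presets, and the real lookup may also consult the built-in registry and the filesystem.

```python
def describe_preset(preset):
    """Classify a preset string by its form (hypothetical helper)."""
    if preset.startswith("kaggle://"):
        return "Kaggle Models handle"
    if preset.startswith("hf://"):
        return "Hugging Face handle"
    if preset.startswith(("./", "../", "/")):
        return "local preset directory"
    # Anything without a scheme or path prefix is treated as a built-in name.
    return "built-in preset identifier"


for name in [
    "bert_base_en",
    "kaggle://user/bert/keras/bert_base_en",
    "hf://user/bert_base_en",
    "./bert_base_en",
]:
    print(name, "->", describe_preset(name))
```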
For any Preprocessor subclass, you can run cls.presets.keys() to list all built-in presets available on the class.
As there are usually multiple preprocessing classes for a given model, this method should be called on a specific subclass like keras_hub.models.BertTextClassifierPreprocessor.from_preset().
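Arguments
preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.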
Examples
import keras_hub

# Load a preprocessor for Gemma generation.
preprocessor = keras_hub.models.CausalLMPreprocessor.from_preset(
    "gemma_2b_en",
)
# Load a preprocessor for Bert classification.
preprocessor = keras_hub.models.TextClassifierPreprocessor.from_preset(
    "bert_base_en",
)
tokenizer property
keras_hub.models.CLIPPreprocessor.tokenizer
The tokenizer used to tokenize strings.