KerasNLP Tokenizers

Tokenizers convert raw string input into integer sequences suitable for a Keras Embedding layer. They can also convert predicted integer sequences back into raw string output.

All tokenizers subclass keras_nlp.tokenizers.Tokenizer, which in turn subclasses keras.layers.Layer. For training, tokenizers should generally be applied inside a tf.data.Dataset.map call; for inference, they can be included inside a keras.Model.
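
The round trip described above (string to integer ids and back) can be sketched in plain Python. The ToyByteTokenizer class below is a hypothetical stand-in written for illustration only. It is not the KerasNLP implementation, it is not a Keras layer, and it works on Python strings and lists instead of tensors. It borrows only the method names tokenize, detokenize and vocabulary_size from the base class listing on this page.

```python
from typing import List


class ToyByteTokenizer:
    """Hypothetical byte-level tokenizer, for illustration only."""

    def tokenize(self, text: str) -> List[int]:
        # Each UTF-8 byte of the input becomes one integer id in [0, 255].
        return list(text.encode("utf-8"))

    def detokenize(self, ids: List[int]) -> str:
        # Reassemble the bytes and decode them back into a string.
        # Invalid byte sequences are replaced instead of raising.
        return bytes(ids).decode("utf-8", errors="replace")

    def vocabulary_size(self) -> int:
        # One id per possible byte value.
        return 256


tokenizer = ToyByteTokenizer()
ids = tokenizer.tokenize("hello")
text = tokenizer.detokenize(ids)
print(ids)
print(text)
```

Ids produced this way fall in the range 0 to vocabulary_size() - 1, which is the property an embedding lookup relies on.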

Tokenizer base class

  • Tokenizer class
  • tokenize method
  • detokenize method
  • get_vocabulary method
  • vocabulary_size method
  • token_to_id method
  • id_to_token method

WordPieceTokenizer

  • WordPieceTokenizer class
  • tokenize method
  • detokenize method
  • get_vocabulary method
  • vocabulary_size method
  • token_to_id method
  • id_to_token method

ByteTokenizer

  • ByteTokenizer class
  • tokenize method
  • detokenize method
  • get_vocabulary method
  • vocabulary_size method
  • token_to_id method
  • id_to_token method

UnicodeCharacterTokenizer

  • UnicodeCharacterTokenizer class
  • tokenize method
  • detokenize method
  • get_vocabulary method
  • vocabulary_size method
  • token_to_id method
  • id_to_token method