
KerasNLP Models

KerasNLP contains end-to-end implementations of popular model architectures. These models can be created in two ways:

  • Through the from_preset() constructor, which instantiates an object with a pre-trained configuration, vocabulary, and (optionally) weights.
  • Through custom configuration controlled by the user (both approaches are sketched below).
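
For example, a BERT backbone can be created either way. The sketch below is illustrative: the preset name and the custom hyperparameter values are examples, not required settings.

import keras_nlp

# Option 1: instantiate from a preset (pre-trained configuration,
# vocabulary, and weights).
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")

# Option 2: instantiate from a custom configuration, with randomly
# initialized weights (example hyperparameters).
backbone = keras_nlp.models.BertBackbone(
    vocabulary_size=30522,
    num_layers=2,
    num_heads=2,
    hidden_dim=128,
    intermediate_dim=512,
    max_sequence_length=512,
)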

Below, we list all presets available in the library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.

Backbone presets

The following preset names correspond to a configuration, weights and vocabulary for a model backbone. These presets are not inference-ready, and must be fine-tuned for a given task!

The names below can be used with any from_preset() constructor for a given model.

import keras_nlp

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased")
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased")
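
Because backbone presets are not fine-tuned, a task model built on top of one must be trained before use. A minimal fine-tuning sketch, assuming a toy in-memory dataset (the features and labels below are placeholders):

import keras_nlp

# Toy placeholder data; substitute a real labeled dataset.
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 1]

# Build a classifier on top of a backbone preset and fine-tune it.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
)
classifier.fit(x=features, y=labels, batch_size=2)
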
| Preset name | Model | Parameters | Description |
|---|---|---|---|
| albert_base_en_uncased | ALBERT | 11.68M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_large_en_uncased | ALBERT | 17.68M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_extra_large_en_uncased | ALBERT | 58.72M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| albert_extra_extra_large_en_uncased | ALBERT | 222.60M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_tiny_en_uncased | BERT | 4.39M | 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_small_en_uncased | BERT | 28.76M | 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_medium_en_uncased | BERT | 41.37M | 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_base_en_uncased | BERT | 109.48M | 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_base_en | BERT | 108.31M | 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. |
| bert_base_zh | BERT | 102.27M | 12-layer BERT model. Trained on Chinese Wikipedia. |
| bert_base_multi | BERT | 177.85M | 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages. |
| bert_large_en_uncased | BERT | 335.14M | 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. |
| bert_large_en | BERT | 333.58M | 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. |
| deberta_v3_extra_small_en | DeBERTaV3 | 70.68M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_small_en | DeBERTaV3 | 141.30M | 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_base_en | DeBERTaV3 | 183.83M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_large_en | DeBERTaV3 | 434.01M | 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. |
| deberta_v3_base_multi | DeBERTaV3 | 278.22M | 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset. |
| distil_bert_base_en_uncased | DistilBERT | 66.36M | 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. |
| distil_bert_base_en | DistilBERT | 65.19M | 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. |
| distil_bert_base_multi | DistilBERT | 134.73M | 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages. |
| f_net_base_en | FNet | 82.86M | 12-layer FNet model where case is maintained. Trained on the C4 dataset. |
| f_net_large_en | FNet | 236.95M | 24-layer FNet model where case is maintained. Trained on the C4 dataset. |
| gpt2_base_en | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_medium_en | GPT-2 | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_large_en | GPT-2 | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_extra_large_en | GPT-2 | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. |
| gpt2_base_en_cnn_dailymail | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Fine-tuned on the CNN/DailyMail summarization dataset. |
| opt_125m_en | OPT | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_1.3b_en | OPT | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_2.7b_en | OPT | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| opt_6.7b_en | OPT | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. |
| roberta_base_en | RoBERTa | 124.05M | 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. |
| roberta_large_en | RoBERTa | 354.31M | 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. |
| xlm_roberta_base_multi | XLM-RoBERTa | 277.45M | 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. |
| xlm_roberta_large_multi | XLM-RoBERTa | 558.84M | 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. |

Classification presets

The following preset names correspond to a configuration, weights and vocabulary for a model classifier. These models are inference-ready, but can be further fine-tuned if desired.

The names below can be used with the from_preset() constructor for classifier models and preprocessing layers.

import keras_nlp

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased_sst2")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased_sst2")
| Preset name | Model | Parameters | Description |
|---|---|---|---|
| bert_tiny_en_uncased_sst2 | BERT | 4.39M | The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset. |
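
Since this preset is inference-ready, it can be used for prediction directly. A minimal sketch (the input sentences are placeholders):

import keras_nlp

classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased_sst2"
)
# predict() accepts raw strings and returns per-class logits.
logits = classifier.predict(["What an amazing movie!", "That was terrible."])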

API Documentation

  • Albert
  • Bert
  • DebertaV3
  • DistilBert
  • GPT2
  • FNet
  • OPT
  • Roberta
  • XLMRoberta