KerasNLP contains end-to-end implementations of popular model architectures. Pre-trained models can be instantiated with the `from_preset()` constructor, which loads a pre-trained configuration, vocabulary, and (optionally) weights.

Below, we list all presets available in the library. For more detailed usage, browse the docstring for a particular class. For an in-depth introduction to our API, see the getting started guide.
The following preset names correspond to a configuration, weights and vocabulary for a model backbone. These presets are not inference-ready, and must be fine-tuned for a given task!
The names below can be used with any `from_preset()` constructor for a given model.
```python
import keras_nlp

# Each object below is created from the same "bert_tiny_en_uncased" preset.
classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased")
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased")
```
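Because backbone presets ship only pre-trained (not task-specific) weights, a task model built from them must be fine-tuned before it can make useful predictions. A minimal sketch, where the example texts and labels are illustrative placeholders rather than part of the library:

```python
import keras_nlp

# Illustrative placeholder data; substitute your own labeled text.
features = ["The quickest brown fox.", "The slowest brown fox."]
labels = [0, 1]

# Build a classification head on top of the pre-trained backbone and fine-tune it.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased",
    num_classes=2,
)
classifier.fit(x=features, y=labels, batch_size=2)
```

Raw strings can be passed directly to `fit()` because the matching preprocessor is attached to the classifier by default.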
Preset name | Model | Parameters | Description |
---|---|---|---|
albert_base_en_uncased | ALBERT | 11.68M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
albert_large_en_uncased | ALBERT | 17.68M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
albert_extra_large_en_uncased | ALBERT | 58.72M | 24-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
albert_extra_extra_large_en_uncased | ALBERT | 222.60M | 12-layer ALBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_tiny_en_uncased | BERT | 4.39M | 2-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_small_en_uncased | BERT | 28.76M | 4-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_medium_en_uncased | BERT | 41.37M | 8-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_base_en_uncased | BERT | 109.48M | 12-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_base_en | BERT | 108.31M | 12-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_base_zh | BERT | 102.27M | 12-layer BERT model. Trained on Chinese Wikipedia. Model Card |
bert_base_multi | BERT | 177.85M | 12-layer BERT model where case is maintained. Trained on Wikipedias of 104 languages. Model Card |
bert_large_en_uncased | BERT | 335.14M | 24-layer BERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus. Model Card |
bert_large_en | BERT | 333.58M | 24-layer BERT model where case is maintained. Trained on English Wikipedia + BooksCorpus. Model Card |
deberta_v3_extra_small_en | DeBERTaV3 | 70.68M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card |
deberta_v3_small_en | DeBERTaV3 | 141.30M | 6-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card |
deberta_v3_base_en | DeBERTaV3 | 183.83M | 12-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card |
deberta_v3_large_en | DeBERTaV3 | 434.01M | 24-layer DeBERTaV3 model where case is maintained. Trained on English Wikipedia, BookCorpus and OpenWebText. Model Card |
deberta_v3_base_multi | DeBERTaV3 | 278.22M | 12-layer DeBERTaV3 model where case is maintained. Trained on the 2.5TB multilingual CC100 dataset. Model Card |
distil_bert_base_en_uncased | DistilBERT | 66.36M | 6-layer DistilBERT model where all input is lowercased. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card |
distil_bert_base_en | DistilBERT | 65.19M | 6-layer DistilBERT model where case is maintained. Trained on English Wikipedia + BooksCorpus using BERT as the teacher model. Model Card |
distil_bert_base_multi | DistilBERT | 134.73M | 6-layer DistilBERT model where case is maintained. Trained on Wikipedias of 104 languages. Model Card |
f_net_base_en | FNet | 82.86M | 12-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card |
f_net_large_en | FNet | 236.95M | 24-layer FNet model where case is maintained. Trained on the C4 dataset. Model Card |
gpt2_base_en | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Trained on WebText. Model Card |
gpt2_medium_en | GPT-2 | 354.82M | 24-layer GPT-2 model where case is maintained. Trained on WebText. Model Card |
gpt2_large_en | GPT-2 | 774.03M | 36-layer GPT-2 model where case is maintained. Trained on WebText. Model Card |
gpt2_extra_large_en | GPT-2 | 1.56B | 48-layer GPT-2 model where case is maintained. Trained on WebText. Model Card |
gpt2_base_en_cnn_dailymail | GPT-2 | 124.44M | 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset. |
opt_125m_en | OPT | 125.24M | 12-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card |
opt_1.3b_en | OPT | 1.32B | 24-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card |
opt_2.7b_en | OPT | 2.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card |
opt_6.7b_en | OPT | 6.70B | 32-layer OPT model where case is maintained. Trained on BookCorpus, CommonCrawl, Pile, and PushShift.io corpora. Model Card |
roberta_base_en | RoBERTa | 124.05M | 12-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card |
roberta_large_en | RoBERTa | 354.31M | 24-layer RoBERTa model where case is maintained. Trained on English Wikipedia, BooksCorpus, CommonCrawl, and OpenWebText. Model Card |
xlm_roberta_base_multi | XLM-RoBERTa | 277.45M | 12-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card |
xlm_roberta_large_multi | XLM-RoBERTa | 558.84M | 24-layer XLM-RoBERTa model where case is maintained. Trained on CommonCrawl in 100 languages. Model Card |
Note: The links provided will lead to the model card or to the official README, if no model card has been provided by the author.
The following preset names correspond to a configuration, weights and vocabulary for a model classifier. These models are inference-ready, but can be further fine-tuned if desired.

The names below can be used with the `from_preset()` constructor for classifier models and preprocessing layers.
```python
import keras_nlp

classifier = keras_nlp.models.BertClassifier.from_preset("bert_tiny_en_uncased_sst2")
tokenizer = keras_nlp.models.BertTokenizer.from_preset("bert_tiny_en_uncased_sst2")
preprocessor = keras_nlp.models.BertPreprocessor.from_preset("bert_tiny_en_uncased_sst2")
```
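Because the fine-tuned task weights are included, the classifier above can be used for prediction directly. A minimal sketch, where the input sentences are illustrative placeholders:

```python
# Raw strings are tokenized by the attached preprocessor; the output
# contains one score per SST-2 class (negative/positive).
classifier.predict(["What an amazing movie!", "A total waste of my time."])
```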
Preset name | Model | Parameters | Description |
---|---|---|---|
bert_tiny_en_uncased_sst2 | BERT | 4.39M | The bert_tiny_en_uncased backbone model fine-tuned on the SST-2 sentiment analysis dataset. |
Note: The links provided will lead to the model card or to the official README, if no model card has been provided by the author.