Keras 3 API documentation / KerasNLP / Models / GPT2 / GPT2CausalLM model

GPT2CausalLM model

[source]

GPT2CausalLM class

keras_nlp.models.GPT2CausalLM(backbone, preprocessor=None, **kwargs)

An end-to-end GPT2 model for causal language modeling.

A causal language model (LM) predicts the next token based on previous tokens. This task setup can be used to train the model unsupervised on plain text input, or to autoregressively generate plain text similar to the data used for training. This task can be used for pre-training or fine-tuning a GPT-2 model, simply by calling fit().

This model has a generate() method, which generates text based on a prompt. The generation strategy used is controlled by an additional sampler argument on compile(). You can recompile the model with different keras_nlp.samplers objects to control the generation. By default, "top_k" sampling will be used.

This model can optionally be configured with a preprocessor layer, in which case it will automatically apply preprocessing to string inputs during fit(), predict(), evaluate() and generate(). This is done by default when creating the model with from_preset().

Disclaimer: Pre-trained models are provided on an "as is" basis, without warranties or conditions of any kind. The underlying model is provided by a third party and subject to a separate license, available here.

Arguments

  • backbone: A keras_nlp.models.GPT2Backbone instance.
  • preprocessor: A keras_nlp.models.GPT2CausalLMPreprocessor or None. If None, this model will not apply preprocessing, and inputs should be preprocessed before calling the model.

Examples

Use generate() to do text generation.

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
gpt2_lm.generate("I want to say", max_length=30)

# Generate with batched prompts.
gpt2_lm.generate(["This is a", "Where are you"], max_length=30)

Compile the generate() function with a custom sampler.

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
gpt2_lm.compile(sampler="greedy")
gpt2_lm.generate("I want to say", max_length=30)

gpt2_lm.compile(sampler=keras_nlp.samplers.BeamSampler(num_beams=2))
gpt2_lm.generate("I want to say", max_length=30)

Use generate() without preprocessing.

import numpy as np

# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
# Use `"padding_mask"` to indicate values that should not be overridden.
prompt = {
    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
}

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en",
    preprocessor=None,
)
gpt2_lm.generate(prompt)

Call fit() on a single batch.

features = ["The quick brown fox jumped.", "I forgot my homework."]
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
gpt2_lm.fit(x=features, batch_size=2)

Call fit() without preprocessing.

import numpy as np

x = {
    "token_ids": np.array([[50256, 1, 2, 3, 4]] * 2),
    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
}
y = np.array([[1, 2, 3, 4, 50256]] * 2)
sw = np.array([[1, 1, 1, 1, 1]] * 2)

gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en",
    preprocessor=None,
)
gpt2_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
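Note that y above is simply token_ids shifted one position to the left (next-token targets), with the end-of-text id 50256 appended at the end. A small sketch of deriving the labels this way, assuming that layout:

```python
import numpy as np

token_ids = np.array([[50256, 1, 2, 3, 4]] * 2)

# Next-token labels: drop the first position, append the end-of-text id.
eot_id = 50256
y = np.concatenate(
    [token_ids[:, 1:], np.full((token_ids.shape[0], 1), eot_id)], axis=1
)
# y -> [[1, 2, 3, 4, 50256]] * 2
```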

Custom backbone and vocabulary.

features = ["a quick fox.", "a fox quick."]
vocab = {"<|endoftext|>": 0, "a": 4, "Ġquick": 5, "Ġfox": 6}
merges = ["Ġ q", "u i", "c k", "ui ck", "Ġq uick"]
merges += ["Ġ f", "o x", "Ġf ox"]

tokenizer = keras_nlp.models.GPT2Tokenizer(
    vocabulary=vocab,
    merges=merges,
)
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=128,
)
backbone = keras_nlp.models.GPT2Backbone(
    vocabulary_size=30552,
    num_layers=4,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    max_sequence_length=128,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM(
    backbone=backbone,
    preprocessor=preprocessor,
)
gpt2_lm.fit(x=features, batch_size=2)
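The merges list above defines the order in which adjacent symbol pairs combine during byte-pair encoding: earlier rules merge first. As an illustrative sketch (not the actual GPT2Tokenizer implementation), a greedy merge loop over that list might look like:

```python
def bpe_apply(word, merges):
    """Greedy BPE: repeatedly merge the adjacent pair with the lowest
    merge rank until no listed pair remains in the word."""
    ranks = {tuple(m.split()): i for i, m in enumerate(merges)}
    symbols = list(word)
    while True:
        pairs = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        rank, i = min(pairs, default=(float("inf"), -1))
        if rank == float("inf"):
            break  # no mergeable pair left
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = ["Ġ q", "u i", "c k", "ui ck", "Ġq uick"]
merges += ["Ġ f", "o x", "Ġf ox"]

print(bpe_apply("Ġquick", merges))  # -> ["Ġquick"]
```

This is why every intermediate pair ("Ġ q", "ui ck", ...) must appear in the merges list for "Ġquick" to collapse into a single token.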

[source]

from_preset method

GPT2CausalLM.from_preset()

Instantiate GPT2CausalLM model from preset architecture and weights.

Arguments

  • preset: string. Must be one of "gpt2_base_en", "gpt2_medium_en", "gpt2_large_en", "gpt2_extra_large_en", "gpt2_base_en_cnn_dailymail".
  • load_weights: Whether to load pre-trained weights into model. Defaults to True.

Examples

# Load architecture and weights from preset
model = GPT2CausalLM.from_preset("gpt2_base_en")

# Load randomly initialized model from preset architecture
model = GPT2CausalLM.from_preset(
    "gpt2_base_en",
    load_weights=False
)
Available presets (name, parameter count, description):

  • gpt2_base_en: 124.44M parameters. 12-layer GPT-2 model where case is maintained. Trained on WebText.
  • gpt2_medium_en: 354.82M parameters. 24-layer GPT-2 model where case is maintained. Trained on WebText.
  • gpt2_large_en: 774.03M parameters. 36-layer GPT-2 model where case is maintained. Trained on WebText.
  • gpt2_extra_large_en: 1.56B parameters. 48-layer GPT-2 model where case is maintained. Trained on WebText.
  • gpt2_base_en_cnn_dailymail: 124.44M parameters. 12-layer GPT-2 model where case is maintained. Finetuned on the CNN/DailyMail summarization dataset.

[source]

generate method

GPT2CausalLM.generate(inputs, max_length=None)

Generate text given prompt inputs.

This method generates text based on given inputs. The sampling method used for generation can be set via the compile() method.

If inputs are a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be handled as a single batch.

If a preprocessor is attached to the model, inputs will be preprocessed inside the generate() function and should match the structure expected by the preprocessor layer (usually raw strings). If a preprocessor is not attached, inputs should match the structure expected by the backbone. See the example usage above for a demonstration of each.

Arguments

  • inputs: python data, tensor data, or a tf.data.Dataset. If a preprocessor is attached to the model, inputs should match the structure expected by the preprocessor layer. If a preprocessor is not attached, inputs should match the structure expected by the backbone model.
  • max_length: Optional. int. The maximum length of the generated sequence. Defaults to the maximum sequence_length configured on the preprocessor. If preprocessor is None, inputs should be padded to the desired maximum length, and this argument is ignored.
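When preprocessor is None, the shapes of "token_ids" and "padding_mask" fix the generation length, so it can help to build prompt batches with a small utility. A minimal NumPy sketch (the make_prompt helper is illustrative, not part of the KerasNLP API):

```python
import numpy as np

def make_prompt(token_ids_list, total_length, pad_id=0):
    """Pad each prompt to `total_length` and build the matching mask.

    Positions with mask 1 hold real prompt tokens and will not be
    overridden during generation; positions with mask 0 are free to
    be filled with generated tokens.
    """
    batch = len(token_ids_list)
    token_ids = np.full((batch, total_length), pad_id, dtype="int32")
    padding_mask = np.zeros((batch, total_length), dtype="int32")
    for i, ids in enumerate(token_ids_list):
        token_ids[i, : len(ids)] = ids
        padding_mask[i, : len(ids)] = 1
    return {"token_ids": token_ids, "padding_mask": padding_mask}

prompt = make_prompt([[5338, 318], [5338, 318]], total_length=5)
# prompt["token_ids"]    -> [[5338, 318, 0, 0, 0]] * 2
# prompt["padding_mask"] -> [[1, 1, 0, 0, 0]] * 2
```

The resulting dict matches the structure passed to generate() in the "without preprocessing" example above.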

backbone property

keras_nlp.models.GPT2CausalLM.backbone

A keras.Model instance providing the backbone sub-model.


preprocessor property

keras_nlp.models.GPT2CausalLM.preprocessor

A keras.layers.Layer instance used to preprocess inputs.