Optimizers


Usage with compile() & fit()

An optimizer is one of the two arguments required for compiling a Keras model:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))

opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt)

You can either instantiate an optimizer before passing it to model.compile(), as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.

# pass optimizer by name: default parameters will be used
model.compile(loss='categorical_crossentropy', optimizer='adam')
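As a sketch of what the string identifier resolves to, keras.optimizers.get can be used to look it up directly (the default Adam learning rate of 0.001 is assumed here):

```python
import tensorflow as tf

# Resolve a string identifier to an optimizer instance with default parameters.
opt = tf.keras.optimizers.get('adam')
print(type(opt).__name__)        # Adam
print(float(opt.learning_rate))  # the default learning rate (0.001 for Adam)
```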

Usage in a custom training loop

When writing a custom training loop, you would retrieve gradients via a tf.GradientTape instance, then call optimizer.apply_gradients() to update your weights:

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
for x, y in dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)

    # Get gradients of loss wrt the weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)

    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))

Note that apply_gradients does not perform gradient clipping: if you want clipped gradients, you must clip them yourself before calling the method.
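For example, clipping by global norm can be done by hand with tf.clip_by_global_norm before the gradients are applied; the single variable and loss below are a toy setup for illustration:

```python
import tensorflow as tf

# Toy setup (illustrative): one trainable variable and a simple loss.
w = tf.Variable([3.0, -4.0])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)

gradients = tape.gradient(loss, [w])

# apply_gradients performs no clipping itself, so clip first.
# Here the raw gradient is 2 * w = [6, -8], with global norm 10.
clipped, global_norm = tf.clip_by_global_norm(gradients, clip_norm=1.0)
optimizer.apply_gradients(zip(clipped, [w]))
```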

Learning rate decay / scheduling

You can use a learning rate schedule to modulate how the learning rate of your optimizer changes over time:

lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=10000,
    decay_rate=0.9)
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule)

Check out the learning rate schedule API documentation for a list of available schedules.
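As a sketch, an ExponentialDecay schedule is a callable mapping the optimizer's step count to a learning rate, so its behaviour can be inspected directly; the parameter values here are illustrative:

```python
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=10000,
    decay_rate=0.9)

# The schedule is called with the current step; during training the
# optimizer does this automatically at each update.
print(float(lr_schedule(0)))      # 0.01 at step 0
print(float(lr_schedule(10000)))  # 0.01 * 0.9 after one decay period
```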

Available optimizers

Core Optimizer API

These methods and attributes are common to all Keras optimizers.


apply_gradients method

Optimizer.apply_gradients(
    grads_and_vars, name=None, skip_gradients_aggregation=False, **kwargs
)

Apply gradients to variables.


Arguments

  • grads_and_vars: List of (gradient, variable) pairs.
  • name: string, defaults to None. The name of the namescope to use when creating variables. If None, self.name will be used.
  • skip_gradients_aggregation: If True, gradient aggregation will not be performed inside the optimizer. Usually this arg is set to True when you write custom code that aggregates gradients outside the optimizer.
  • **kwargs: keyword arguments only used for backward compatibility.


Returns

A tf.Variable, representing the current iteration.


Raises

  • TypeError: If grads_and_vars is malformed.
  • RuntimeError: If called in a cross-replica context.
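A minimal sketch of a single apply_gradients call (the variable, gradient value, and learning rate below are made up for illustration):

```python
import tensorflow as tf

v = tf.Variable(1.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.5)

# Pass (gradient, variable) pairs; the iteration counter advances by one.
opt.apply_gradients([(tf.constant(2.0), v)])
print(float(v))             # 1.0 - 0.5 * 2.0 = 0.0
print(int(opt.iterations))  # 1
```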

variables property


Returns variables of this optimizer.