Usage with compile() & fit()
An optimizer is one of the two arguments required for compiling a Keras model:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Dense(64, kernel_initializer='uniform', input_shape=(10,)))
model.add(layers.Activation('softmax'))
opt = keras.optimizers.Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt)
You can either instantiate an optimizer before passing it to model.compile(), as in the above example, or you can pass it by its string identifier. In the latter case, the default parameters for the optimizer will be used.
# pass optimizer by name: default parameters will be used
model.compile(loss='categorical_crossentropy', optimizer='adam')
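The heading above also mentions fit(); as a minimal sketch of that step (using made-up random NumPy data whose shapes merely match the model defined above, not data from the original), training then proceeds with whichever optimizer was given to compile():

import numpy as np

# Hypothetical stand-in data (shapes are illustrative): 100 samples with 10 features,
# one-hot targets over the 64 softmax outputs of the model defined above.
x_train = np.random.random((100, 10))
y_train = keras.utils.to_categorical(
    np.random.randint(0, 64, size=(100,)), num_classes=64)

# fit() runs the built-in training loop using the compiled optimizer.
model.fit(x_train, y_train, batch_size=32, epochs=2)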
When writing a custom training loop, you would retrieve gradients via a tf.GradientTape instance, then call optimizer.apply_gradients() to update your weights:
import tensorflow as tf

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam()

# Iterate over the batches of a dataset.
# (`dataset`, `model`, and `loss_fn` are assumed to be defined elsewhere.)
for x, y in dataset:
    # Open a GradientTape.
    with tf.GradientTape() as tape:
        # Forward pass.
        logits = model(x)
        # Loss value for this batch.
        loss_value = loss_fn(y, logits)

    # Get gradients of loss wrt the weights.
    gradients = tape.gradient(loss_value, model.trainable_weights)

    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
Note that when you use apply_gradients, the optimizer does not apply gradient clipping to the gradients: if you want gradient clipping, you would have to do it by hand before calling the method.
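For example, clipping by hand could look like the following inside the training loop above (a sketch only: the use of tf.clip_by_global_norm and the threshold of 1.0 are illustrative choices, not prescribed by the docs):

# Inside the training loop above, after computing the gradients:
gradients = tape.gradient(loss_value, model.trainable_weights)

# Clip by global norm before applying (1.0 is an example threshold).
clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=1.0)

optimizer.apply_gradients(zip(clipped_gradients, model.trainable_weights))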
You can use a learning rate schedule to modulate how the learning rate of your optimizer changes over time:
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=10000,
    decay_rate=0.9)
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule)
Check out the learning rate schedule API documentation for a list of available schedules.
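As one illustration of another built-in schedule (the choice of schedule and the boundary/value numbers here are ours, not from the original), PiecewiseConstantDecay keeps the learning rate constant within step ranges and drops it at fixed boundaries:

# Use 0.1 for the first 1000 steps, 0.01 until step 2000, then 0.001.
lr_schedule = keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[1000, 2000], values=[0.1, 0.01, 0.001])
optimizer = keras.optimizers.SGD(learning_rate=lr_schedule)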
These methods and attributes are common to all Keras optimizers.
apply_gradients method

Optimizer.apply_gradients(
    grads_and_vars, name=None, skip_gradients_aggregation=False, **kwargs
)

Apply gradients to variables.

Arguments

grads_and_vars: List of (gradient, variable) pairs.
name: String, defaults to None. The name of the namescope to use when creating variables. If None, self.name will be used.
skip_gradients_aggregation: If True, gradient aggregation will not be performed inside the optimizer; set this when gradients are aggregated outside the optimizer.
**kwargs: Keyword arguments only used for backward compatibility.

Returns

A tf.Variable, representing the current iteration.

Raises

TypeError: If grads_and_vars is malformed.
variables property

tf.keras.optimizers.Optimizer.variables

Returns variables of this optimizer.
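As a small sketch tying these together (continuing the custom training loop above, with optimizer, model, and gradients assumed already defined): the value returned by apply_gradients reflects the current iteration, per the Returns description above, and the variables property exposes the optimizer's own tf.Variable state.

# Apply one update and inspect the returned iteration counter.
iteration = optimizer.apply_gradients(zip(gradients, model.trainable_weights))
print(int(iteration))  # current iteration, per the Returns section above

# The optimizer's own state variables, as a list of tf.Variable objects.
for var in optimizer.variables:
    print(var.name, var.shape)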