Adagrad
class keras.optimizers.Adagrad(
    learning_rate=0.001,
    initial_accumulator_value=0.1,
    epsilon=1e-07,
    weight_decay=None,
    clipnorm=None,
    clipvalue=None,
    global_clipnorm=None,
    use_ema=False,
    ema_momentum=0.99,
    ema_overwrite_frequency=None,
    name="adagrad",
    **kwargs
)
Optimizer that implements the Adagrad algorithm.
Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller its subsequent updates become.
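To make the per-parameter adaptation concrete, here is a minimal pure-Python sketch of one common form of the Adagrad update. It is an illustration rather than the Keras implementation: the helper name adagrad_step, the toy gradients, and the choice to add epsilon inside the square root are all assumptions made for this example.

```python
import math


def adagrad_step(params, grads, accumulators, learning_rate=0.001, epsilon=1e-7):
    """Apply one Adagrad-style update in place (hypothetical helper).

    Each parameter keeps its own running sum of squared gradients, and its
    step is divided by the square root of that sum.
    """
    for i, g in enumerate(grads):
        accumulators[i] += g * g
        # Assumption: epsilon is added inside the square root.
        params[i] -= learning_rate * g / math.sqrt(accumulators[i] + epsilon)
    return params


params = [1.0, 1.0]
accumulators = [0.1, 0.1]  # plays the role of initial_accumulator_value=0.1

# Toy gradients: parameter 0 is updated on every step,
# parameter 1 only on the final step.
toy_grads = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

steps_param0 = []
steps_param1 = []
for grads in toy_grads:
    before = list(params)
    adagrad_step(params, grads, accumulators, learning_rate=0.1)
    steps_param0.append(before[0] - params[0])
    steps_param1.append(before[1] - params[1])

# Parameter 0 takes progressively smaller steps as its accumulator grows,
# while parameter 1 still takes a comparatively large first step.
print(steps_param0)
print(steps_param1)
```

In this sketch both parameters see the same gradient on the last step, yet the frequently updated parameter moves less because its accumulator is already larger.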
Arguments
learning_rate: A float, a keras.optimizers.schedules.LearningRateSchedule instance, or a callable that takes no arguments and returns the actual value to use. The learning rate. Defaults to 0.001. Note that Adagrad tends to benefit from higher initial learning rate values compared to other optimizers. To match the exact form in the original paper, use 1.0.
ema_momentum: Float, defaults to 0.99. Only used if use_ema=True.
This is the momentum to use when computing the EMA of the model's weights: new_average = ema_momentum * old_average + (1 - ema_momentum) * current_variable_value.
ema_overwrite_frequency: Int or None, defaults to None. Only used if use_ema=True. Every ema_overwrite_frequency steps, the model variables are overwritten by their moving averages.
If None, the optimizer does not overwrite model variables in the middle of training, and you need to explicitly overwrite the variables at the end of training by calling optimizer.finalize_variable_values() (which updates the model variables in-place). When using the built-in fit() training loop, this happens automatically after the last epoch, and you don't need to do anything.
loss_scale_factor: Float or None. If a float, the loss is multiplied by the scale factor before computing gradients, and the gradients are multiplied by the inverse of the scale factor before updating variables. Useful for preventing underflow during mixed precision training. Alternately, keras.optimizers.LossScaleOptimizer will automatically set a loss scale factor.
Reference