LSTM
tf_keras.layers.LSTM(
    units,
    activation="tanh",
    recurrent_activation="sigmoid",
    use_bias=True,
    kernel_initializer="glorot_uniform",
    recurrent_initializer="orthogonal",
    bias_initializer="zeros",
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    time_major=False,
    unroll=False,
    **kwargs
)
Long Short-Term Memory layer - Hochreiter & Schmidhuber, 1997.
See the TF-Keras RNN API guide for details about the usage of the RNN API.
Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or pure-TensorFlow) to maximize performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation.
The requirements to use the cuDNN implementation are:
1. activation == tanh
2. recurrent_activation == sigmoid
3. recurrent_dropout == 0
4. unroll is False
5. use_bias is True
For example:
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = tf.keras.layers.LSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)
>>> lstm = tf.keras.layers.LSTM(4, return_sequences=True, return_state=True)
>>> whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
>>> print(whole_seq_output.shape)
(32, 10, 4)
>>> print(final_memory_state.shape)
(32, 4)
>>> print(final_carry_state.shape)
(32, 4)
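If any of these conditions is not met, the layer falls back to the non-cuDNN implementation. As an illustrative sketch (not part of the official examples), the snippet below reuses the inputs tensor from above and builds a layer that cannot use the cuDNN kernel because recurrent_dropout is non-zero:
>>> # Sketch: recurrent_dropout != 0 violates the cuDNN requirements above,
>>> # so this layer runs the generic (pure-TensorFlow) implementation.
>>> lstm_generic = tf.keras.layers.LSTM(4, recurrent_dropout=0.2)
>>> output = lstm_generic(inputs)
>>> print(output.shape)
(32, 4)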
Arguments

- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean (default True), whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
- bias_initializer: Initializer for the bias vector. Default: zeros.
- unit_forget_bias: Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
- kernel_regularizer: Regularizer function applied to the kernel weights matrix. Default: None.
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix. Default: None.
- bias_regularizer: Regularizer function applied to the bias vector. Default: None.
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). Default: None.
- kernel_constraint: Constraint function applied to the kernel weights matrix. Default: None.
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix. Default: None.
- bias_constraint: Constraint function applied to the bias vector. Default: None.
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
- return_state: Boolean. Whether to return the last state in addition to the output. Default: False.
- go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as the initial state for the sample of index i in the following batch.
- time_major: The shape format of the inputs and outputs tensors. If True, the inputs and outputs will be in shape [timesteps, batch, feature], whereas in the False case they will be in shape [batch, timesteps, feature]. Using time_major=True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this layer accepts input and emits output in batch-major form.
- unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.

Call arguments

- inputs: A 3D tensor with shape [batch, timesteps, feature].
- mask: Binary tensor of shape [batch, timesteps] indicating whether a given timestep should be masked (optional). An individual True entry indicates that the corresponding timestep should be utilized, while a False entry indicates that the corresponding timestep should be ignored. Defaults to None.
- training: Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if dropout or recurrent_dropout is used (optional). Defaults to None.
- initial_state: List of initial state tensors to be passed to the first call of the cell (optional; None causes creation of zero-filled initial state tensors). Defaults to None.