LSTM
class keras.layers.LSTM(
    units,
    activation="tanh",
    recurrent_activation="sigmoid",
    use_bias=True,
    kernel_initializer="glorot_uniform",
    recurrent_initializer="orthogonal",
    bias_initializer="zeros",
    unit_forget_bias=True,
    kernel_regularizer=None,
    recurrent_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    recurrent_constraint=None,
    bias_constraint=None,
    dropout=0.0,
    recurrent_dropout=0.0,
    seed=None,
    return_sequences=False,
    return_state=False,
    go_backwards=False,
    stateful=False,
    unroll=False,
    **kwargs
)
Long Short-Term Memory layer - Hochreiter 1997.
Based on available runtime hardware and constraints, this layer will choose different implementations (cuDNN-based or backend-native) to maximize performance. If a GPU is available and all the arguments to the layer meet the requirements of the cuDNN kernel (see below for details), the layer will use a fast cuDNN implementation when using the TensorFlow backend. The requirements to use the cuDNN implementation are:
1. activation == tanh
2. recurrent_activation == sigmoid
3. dropout == 0 and recurrent_dropout == 0
4. unroll is False
5. use_bias is True
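As a minimal sketch of what does and does not satisfy these requirements (the variable names are illustrative, not part of the API):

>>> import keras
>>> # The defaults satisfy every requirement above, so the cuDNN kernel can be
>>> # used on a GPU with the TensorFlow backend:
>>> fast_lstm = keras.layers.LSTM(4)
>>> # A non-zero recurrent_dropout violates requirement 3, so this layer falls
>>> # back to the backend-native implementation:
>>> fallback_lstm = keras.layers.LSTM(4, recurrent_dropout=0.2)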
For example:
>>> import numpy as np
>>> import keras
>>> inputs = np.random.random((32, 10, 8))
>>> lstm = keras.layers.LSTM(4)
>>> output = lstm(inputs)
>>> output.shape
(32, 4)
>>> lstm = keras.layers.LSTM(
... 4, return_sequences=True, return_state=True)
>>> whole_seq_output, final_memory_state, final_carry_state = lstm(inputs)
>>> whole_seq_output.shape
(32, 10, 4)
>>> final_memory_state.shape
(32, 4)
>>> final_carry_state.shape
(32, 4)
Arguments
units: Positive integer, dimensionality of the output space.
activation: Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
recurrent_activation: Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
use_bias: Boolean (default True), whether the layer should use a bias vector.
kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: "glorot_uniform".
recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: "orthogonal".
bias_initializer: Initializer for the bias vector. Default: "zeros".
unit_forget_bias: Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
kernel_regularizer: Regularizer function applied to the kernel weights matrix. Default: None.
recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix. Default: None.
bias_regularizer: Regularizer function applied to the bias vector. Default: None.
activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). Default: None.
kernel_constraint: Constraint function applied to the kernel weights matrix. Default: None.
recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix. Default: None.
bias_constraint: Constraint function applied to the bias vector. Default: None.
dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.
seed: Random seed for dropout.
return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
return_state: Boolean. Whether to return the last state in addition to the output. Default: False.
go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences. A combined usage sketch follows below.
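A hedged sketch of how several of these constructor arguments combine (the values are chosen only for illustration, not recommendations from the Keras docs):

>>> import keras
>>> lstm = keras.layers.LSTM(
...     64,
...     dropout=0.2,
...     recurrent_dropout=0.1,
...     kernel_regularizer=keras.regularizers.L2(1e-4),
...     return_sequences=True,
... )

Note that the non-zero dropout values in this sketch also rule out the cuDNN kernel described above.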
Call arguments
inputs: A 3D tensor, with shape (batch, timesteps, feature).
mask: Binary tensor of shape (samples, timesteps) indicating whether a given timestep should be masked (optional). An individual True entry indicates that the corresponding timestep should be utilized, while a False entry indicates that the corresponding timestep should be ignored. Defaults to None.
training: Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if dropout or recurrent_dropout is used (optional). Defaults to None.
initial_state: List of initial state tensors to be passed to the first call of the cell (optional; None causes creation of zero-filled initial state tensors). Defaults to None.
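A brief sketch of the call arguments in use (the encoder/decoder names and shapes here are assumptions made for the example, not taken from the official docs):

>>> import numpy as np
>>> import keras
>>> inputs = np.random.random((32, 10, 8))
>>> # A boolean mask marks which of the 10 timesteps are valid for each sample.
>>> mask = np.ones((32, 10), dtype=bool)
>>> # Reuse the final states of one LSTM as the initial state of another.
>>> encoder = keras.layers.LSTM(4, return_state=True)
>>> _, h, c = encoder(inputs, mask=mask)
>>> decoder = keras.layers.LSTM(4)
>>> output = decoder(inputs, initial_state=[h, c], mask=mask)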