This wrapper applies a layer to every temporal slice of an input.

The input should be at least 3D. The dimension at index 1 is treated as the temporal dimension (index 0 is the batch dimension).

Consider a batch of 32 samples, where each sample is a sequence of 10 vectors of 16 dimensions. The batch input shape of the layer is then (32, 10, 16), and the input_shape, not including the samples dimension, is (10, 16).

You can then use TimeDistributed to apply a Dense layer to each of the 10 timesteps, independently:

from keras.models import Sequential
from keras.layers import TimeDistributed, Dense

# as the first layer in a model
model = Sequential()
model.add(TimeDistributed(Dense(8), input_shape=(10, 16)))
# now model.output_shape == (None, 10, 8)

The output will then have shape (32, 10, 8).
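To make the per-timestep behaviour concrete, here is a minimal pure-Python sketch of what the wrapper does conceptually. It is not how Keras implements it: plain lists stand in for tensors, and `dense` and `time_distributed` are hypothetical helpers introduced only for illustration. The point is that one shared set of weights is reused at every timestep, so only the last dimension changes.

```python
# Conceptual sketch only: plain Python lists stand in for tensors.
# `dense` and `time_distributed` are hypothetical helpers, not Keras APIs.

def dense(vector, weights, bias):
    """Apply an affine map; weights has shape (in_dim, out_dim)."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(len(bias))
    ]

def time_distributed(batch, layer_fn):
    """Apply layer_fn independently to every timestep of every sample."""
    return [[layer_fn(step) for step in sample] for sample in batch]

# A batch of 32 samples, each a sequence of 10 vectors of 16 dimensions.
batch = [[[1.0] * 16 for _ in range(10)] for _ in range(32)]

# One shared set of weights (16 -> 8), reused at all 10 timesteps.
weights = [[0.5] * 8 for _ in range(16)]
bias = [0.0] * 8

output = time_distributed(batch, lambda v: dense(v, weights, bias))
output_shape = (len(output), len(output[0]), len(output[0][0]))
print(output_shape)  # (32, 10, 8)
```

The batch and time axes pass through unchanged; the wrapped layer only ever sees a single 16-dimensional vector.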

In subsequent layers, there is no need for the input_shape:

model.add(TimeDistributed(Dense(32)))
# now model.output_shape == (None, 10, 32)

The output will then have shape (32, 10, 32).

TimeDistributed can be used with arbitrary layers, not just Dense, for instance with a Conv2D layer:

from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D

model = Sequential()
model.add(TimeDistributed(Conv2D(64, (3, 3)),
                          input_shape=(10, 299, 299, 3)))
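The resulting output shape can be worked out by hand. The sketch below does only the shape arithmetic and assumes the Conv2D defaults of `padding='valid'`, unit strides, and channels-last data; `conv2d_output_shape` is a hypothetical helper, not part of Keras.

```python
# Shape arithmetic only; `conv2d_output_shape` is a hypothetical helper.
# Assumes Conv2D defaults: padding='valid', strides=(1, 1), channels-last data.

def conv2d_output_shape(input_shape, filters, kernel_size, strides=(1, 1)):
    """Return the output shape of a 'valid' 2D convolution on (H, W, C)."""
    height, width, _channels = input_shape
    out_h = (height - kernel_size[0]) // strides[0] + 1
    out_w = (width - kernel_size[1]) // strides[1] + 1
    return (out_h, out_w, filters)

timesteps, frame_shape = 10, (299, 299, 3)
per_frame = conv2d_output_shape(frame_shape, filters=64, kernel_size=(3, 3))

# TimeDistributed keeps the batch and time axes and transforms the rest.
output_shape = (None, timesteps) + per_frame
print(output_shape)  # (None, 10, 297, 297, 64)
```

Under those assumptions, each of the 10 frames is convolved independently with the same 64 filters.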


Arguments

  • layer: a layer instance.



keras.layers.Bidirectional(layer, merge_mode='concat', weights=None)

Bidirectional wrapper for RNNs.


Arguments

  • layer: Recurrent instance.
  • merge_mode: Mode by which the outputs of the forward and backward RNNs are combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs are not combined and are returned as a list.
  • weights: Initial weights to load in the Bidirectional model.


Raises

  • ValueError: In case of invalid merge_mode argument.


Examples

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 10)))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
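
The merge_mode options listed above can be illustrated with a small pure-Python sketch. It is a conceptual illustration rather than the Keras implementation: `merge_outputs` is a hypothetical helper, lists stand in for tensors, and the backward outputs are assumed to be already aligned in time with the forward ones.

```python
# Conceptual sketch only; `merge_outputs` is a hypothetical helper, not a Keras API.
# Assumes forward and backward outputs are already aligned timestep by timestep.

def _elementwise(op, forward, backward):
    """Combine two (timesteps, features) nested lists element by element."""
    return [[op(x, y) for x, y in zip(f, b)] for f, b in zip(forward, backward)]

def merge_outputs(forward, backward, merge_mode="concat"):
    """Combine per-timestep feature vectors from the two directions."""
    if merge_mode is None:
        return [forward, backward]  # not combined, returned as a list
    if merge_mode == "concat":
        return [f + b for f, b in zip(forward, backward)]  # join feature vectors
    if merge_mode == "sum":
        return _elementwise(lambda x, y: x + y, forward, backward)
    if merge_mode == "mul":
        return _elementwise(lambda x, y: x * y, forward, backward)
    if merge_mode == "ave":
        return _elementwise(lambda x, y: (x + y) / 2, forward, backward)
    raise ValueError("Invalid merge mode: %r" % (merge_mode,))

# Two timesteps, two features per direction.
forward = [[1.0, 2.0], [3.0, 4.0]]
backward = [[10.0, 20.0], [30.0, 40.0]]

print(merge_outputs(forward, backward, "concat"))  # [[1.0, 2.0, 10.0, 20.0], [3.0, 4.0, 30.0, 40.0]]
print(merge_outputs(forward, backward, "sum"))     # [[11.0, 22.0], [33.0, 44.0]]
print(merge_outputs(forward, backward, "ave"))     # [[5.5, 11.0], [16.5, 22.0]]
```

Only 'concat' changes the feature dimension: it doubles it. For the model above, wrapping LSTM(10, return_sequences=True) with the default merge_mode on an input of shape (5, 10) gives model.output_shape == (None, 5, 20).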