CategoryEncoding
class tf_keras.layers.CategoryEncoding(
num_tokens=None, output_mode="multi_hot", sparse=False, **kwargs
)
A preprocessing layer which encodes integer features.
This layer provides options for condensing data into a categorical encoding
when the total number of tokens is known in advance. It accepts integer
values as inputs, and it outputs a dense or sparse representation of those
inputs. For integer inputs where the total number of tokens is not known,
use tf.keras.layers.IntegerLookup instead.
For an overview and full list of preprocessing layers, see the preprocessing guide.
Examples
One-hot encoding data
>>> layer = tf.keras.layers.CategoryEncoding(
...     num_tokens=4, output_mode="one_hot")
>>> layer([3, 2, 0, 1])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[0., 0., 0., 1.],
       [0., 0., 1., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.]], dtype=float32)>
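The one-hot example above uses a flat input of shape (4,). As a rough illustration of the "one_hot" rule described under Arguments (a last dimension of size 1 is encoded in place rather than extended), here is a plain-Python sketch. The helper name one_hot is hypothetical and this is not the layer's implementation; it only mirrors the documented behavior for these two input shapes.

```python
def one_hot(values, num_tokens):
    """Plain-Python illustration of "one_hot" mode (hypothetical helper)."""
    rows = []
    for v in values:
        # A trailing dimension of size 1, such as [3], is encoded in place.
        if isinstance(v, (list, tuple)):
            if len(v) != 1:
                raise ValueError("this sketch only handles a last dimension of size 1")
            v = v[0]
        # ValueError is an assumption; the real layer raises its own error.
        if not 0 <= v < num_tokens:
            raise ValueError(f"value {v} is outside [0, {num_tokens})")
        row = [0.0] * num_tokens
        row[v] = 1.0  # a single 1 at the element's index
        rows.append(row)
    return rows


flat = one_hot([3, 2, 0, 1], num_tokens=4)            # input shape (4,)
column = one_hot([[3], [2], [0], [1]], num_tokens=4)  # input shape (4, 1)
print(flat == column)  # both inputs produce the same (4, 4) encoding
```

Under these assumptions, both inputs yield the same 4 x 4 result as the layer output shown above.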
Multi-hot encoding data
>>> layer = tf.keras.layers.CategoryEncoding(
...     num_tokens=4, output_mode="multi_hot")
>>> layer([[0, 1], [0, 0], [1, 2], [3, 1]])
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[1., 1., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 1., 0.],
       [0., 1., 0., 1.]], dtype=float32)>
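To make the multi-hot result above easier to check by hand, the following plain-Python sketch reproduces it one sample at a time and also shows the 0 <= value < num_tokens requirement. The helper name multi_hot and the use of ValueError are assumptions made for illustration; the real layer raises its own error and is implemented differently.

```python
def multi_hot(sample, num_tokens):
    """Plain-Python illustration of "multi_hot" mode (hypothetical helper)."""
    row = [0.0] * num_tokens
    for token in sample:
        # ValueError is an assumption; the real layer raises its own error.
        if not 0 <= token < num_tokens:
            raise ValueError(f"token {token} is outside [0, {num_tokens})")
        row[token] = 1.0  # repeated tokens still yield a single 1
    return row


batch = [[0, 1], [0, 0], [1, 2], [3, 1]]
encoded = [multi_hot(sample, num_tokens=4) for sample in batch]
for row in encoded:
    print(row)

# A token equal to num_tokens is already out of range.
try:
    multi_hot([0, 4], num_tokens=4)
except ValueError as err:
    out_of_range_error = str(err)
    print("rejected:", out_of_range_error)
```

Note that the second sample, [0, 0], contributes only one 1 at index 0: multi-hot records presence, not frequency.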
Using weighted inputs in "count" mode
>>> layer = tf.keras.layers.CategoryEncoding(
...     num_tokens=4, output_mode="count")
>>> count_weights = np.array([[.1, .2], [.1, .1], [.2, .3], [.4, .2]])
>>> layer([[0, 1], [0, 0], [1, 2], [3, 1]], count_weights=count_weights)
<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[0.1, 0.2, 0. , 0. ],
       [0.2, 0. , 0. , 0. ],
       [0. , 0.2, 0.3, 0. ],
       [0. , 0.2, 0. , 0.4]], dtype=float32)>
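The weighted result above is a per-token sum of weights. The sketch below works through that arithmetic in plain Python, with and without weights. The helper name count_encode is hypothetical, and treating missing weights as 1.0 is an assumption that matches the description of "count" mode as a count of occurrences; it is not the layer's actual code.

```python
def count_encode(sample, num_tokens, weights=None):
    """Plain-Python illustration of "count" mode (hypothetical helper)."""
    if weights is None:
        # Assumption: without weights, every occurrence counts as 1.
        weights = [1.0] * len(sample)
    row = [0.0] * num_tokens
    for token, weight in zip(sample, weights):
        row[token] += weight  # accumulate the weight at the token's index
    return row


batch = [[0, 1], [0, 0], [1, 2], [3, 1]]
count_weights = [[0.1, 0.2], [0.1, 0.1], [0.2, 0.3], [0.4, 0.2]]

unweighted = [count_encode(s, 4) for s in batch]
weighted = [count_encode(s, 4, w) for s, w in zip(batch, count_weights)]

print(unweighted[1])  # token 0 appears twice in the second sample
print(weighted[1])    # its two weights, 0.1 and 0.1, are summed
```

In the second sample the token 0 appears twice, so its unweighted count is 2 and its weighted count is 0.1 + 0.1, which is the 0.2 in the second row of the output above.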
Arguments
num_tokens: The total number of tokens the layer should support. All
inputs to the layer must be integers in the range 0 <= value < num_tokens,
or an error will be thrown.
output_mode: Specification for the output of the layer. Values can be
"one_hot", "multi_hot" or "count", configuring the layer as follows:
    "one_hot": Encodes each individual element in the input into an
    array of num_tokens size, containing a 1 at the element index. If
    the last dimension is size 1, will encode on that dimension. If the
    last dimension is not size 1, will append a new dimension for the
    encoded output.
    "multi_hot": Encodes each sample in the input into a single array
    of num_tokens size, containing a 1 for each vocabulary term
    present in the sample. Treats the last dimension as the sample
    dimension: if the input shape is (..., sample_length), the output
    shape will be (..., num_tokens).
    "count": Like "multi_hot", but the array contains a count of
    the number of times the token at that index appeared in the sample.
For all output modes, currently only output up to rank 2 is supported.
Defaults to "multi_hot".
sparse: Boolean. If true, returns a SparseTensor instead of a dense
Tensor. Defaults to False.
Call arguments
inputs: A 1D or 2D tensor of integer inputs.
count_weights: A tensor in the same shape as inputs indicating the
weight for each sample value when summing up in "count" mode. Not used
in "multi_hot" or "one_hot" modes.
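For readers unfamiliar with sparse output, a tf.SparseTensor stores only the non-zero entries as indices, values and a dense_shape. The plain-Python sketch below shows what that coordinate layout looks like for the multi-hot result from the Examples section. The helper name to_coo is hypothetical, and the row-major ordering of entries is an assumption; the sketch does not call TensorFlow and is not how the layer builds its output.

```python
def to_coo(dense):
    """Convert a dense 2D list into (indices, values, dense_shape).

    Hypothetical helper: mirrors the three fields of a sparse tensor
    in coordinate format, listing non-zero entries in row-major order.
    """
    indices, values = [], []
    for r, row in enumerate(dense):
        for c, x in enumerate(row):
            if x != 0:
                indices.append((r, c))
                values.append(x)
    dense_shape = (len(dense), len(dense[0]) if dense else 0)
    return indices, values, dense_shape


# Multi-hot encoding of [[0, 1], [0, 0], [1, 2], [3, 1]] with num_tokens=4.
dense = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
indices, values, dense_shape = to_coo(dense)
print(indices)
print(values)
print(dense_shape)
```

Only 7 of the 16 entries are stored here. The saving grows with num_tokens, since each sample touches at most as many indices as it has tokens.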