
Probabilistic losses


BinaryCrossentropy class

keras.losses.BinaryCrossentropy(
    from_logits=False,
    label_smoothing=0.0,
    axis=-1,
    reduction="sum_over_batch_size",
    name="binary_crossentropy",
)

Computes the cross-entropy loss between true labels and predicted labels.

Use this cross-entropy loss for binary (0 or 1) classification applications. The loss function requires the following inputs:

  • y_true (true label): This is either 0 or 1.
  • y_pred (predicted value): This is the model's prediction, i.e., a single floating-point value which either represents a logit (i.e., a value in [-inf, inf] when from_logits=True) or a probability (i.e., a value in [0., 1.] when from_logits=False).

Arguments

  • from_logits: Whether to interpret y_pred as a tensor of logit values. By default, we assume that y_pred is probabilities (i.e., values in [0, 1]).
  • label_smoothing: Float in range [0, 1]. When 0, no smoothing occurs. When > 0, we compute the loss between the predicted labels and a smoothed version of the true labels, where the smoothing squeezes the labels towards 0.5. Larger values of label_smoothing correspond to heavier smoothing (see the sketch after this argument list).
  • axis: The axis along which to compute crossentropy (the features axis). Defaults to -1.
  • reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size". Supported options are "sum", "sum_over_batch_size" or None.
  • name: Optional name for the loss instance.
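
A minimal sketch of what label_smoothing does, assuming the binary smoothing rule described for the binary_crossentropy function further down (targets move from 0/1 towards 0.5); the array values below are illustrative only:

import numpy as np
import keras

y_true = np.array([0., 1., 0., 0.])
y_pred = np.array([0.6, 0.3, 0.2, 0.8])
label_smoothing = 0.2  # illustrative value

# Loss with built-in smoothing ...
bce = keras.losses.BinaryCrossentropy(label_smoothing=label_smoothing)
smoothed = bce(y_true, y_pred)

# ... matches the plain loss on manually squeezed targets:
# 1 -> 1 - 0.5 * label_smoothing, 0 -> 0.5 * label_smoothing
y_true_squeezed = y_true * (1.0 - label_smoothing) + 0.5 * label_smoothing
plain = keras.losses.BinaryCrossentropy()(y_true_squeezed, y_pred)
assert abs(float(smoothed) - float(plain)) < 1e-6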

Examples

Recommended Usage: (set from_logits=True)

With compile() API:

model.compile(
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    ...
)

As a standalone function:

>>> # Example 1: (batch_size = 1, number of samples = 4)
>>> y_true = [0, 1, 0, 0]
>>> y_pred = [-18.6, 0.51, 2.94, -12.8]
>>> bce = keras.losses.BinaryCrossentropy(from_logits=True)
>>> bce(y_true, y_pred)
0.865
>>> # Example 2: (batch_size = 2, number of samples = 4)
>>> y_true = [[0, 1], [0, 0]]
>>> y_pred = [[-18.6, 0.51], [2.94, -12.8]]
>>> # Using the default 'sum_over_batch_size' reduction type.
>>> bce = keras.losses.BinaryCrossentropy(from_logits=True)
>>> bce(y_true, y_pred)
0.865
>>> # Using 'sample_weight' attribute
>>> bce(y_true, y_pred, sample_weight=[0.8, 0.2])
0.243
>>> # Using 'sum' reduction type.
>>> bce = keras.losses.BinaryCrossentropy(from_logits=True,
...     reduction="sum")
>>> bce(y_true, y_pred)
1.730
>>> # Using None reduction type (per-sample losses).
>>> bce = keras.losses.BinaryCrossentropy(from_logits=True,
...     reduction=None)
>>> bce(y_true, y_pred)
array([0.235, 1.496], dtype=float32)

Default Usage: (set from_logits=False)

>>> # Make the following updates to the above "Recommended Usage" section
>>> # 1. Set `from_logits=False`
>>> keras.losses.BinaryCrossentropy() # OR ...(from_logits=False)
>>> # 2. Update `y_pred` to use probabilities instead of logits
>>> y_pred = [0.6, 0.3, 0.2, 0.8] # OR [[0.6, 0.3], [0.2, 0.8]]
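
Putting those two updates together, a probability-based call looks like the sketch below (the value in the comment is approximate and may differ slightly across backends):

import keras

y_true = [0, 1, 0, 0]
y_pred = [0.6, 0.3, 0.2, 0.8]  # probabilities rather than logits
bce = keras.losses.BinaryCrossentropy()  # from_logits=False is the default
bce(y_true, y_pred)  # roughly 0.988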


CategoricalCrossentropy class

keras.losses.CategoricalCrossentropy(
    from_logits=False,
    label_smoothing=0.0,
    axis=-1,
    reduction="sum_over_batch_size",
    name="categorical_crossentropy",
)

Computes the crossentropy loss between the labels and predictions.

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one-hot representation. If you want to provide labels as integers, please use SparseCategoricalCrossentropy loss. There should be num_classes floating point values per feature, i.e., the shapes of both y_pred and y_true are [batch_size, num_classes].
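
If your labels are integers, you can either use SparseCategoricalCrossentropy directly or convert them to one-hot vectors first, e.g. with keras.utils.to_categorical. A minimal sketch (the labels and class count are illustrative):

import keras

integer_labels = [1, 2]  # shape: [batch_size]
one_hot_labels = keras.utils.to_categorical(integer_labels, num_classes=3)
# [[0., 1., 0.],
#  [0., 0., 1.]]  -> shape: [batch_size, num_classes]

cce = keras.losses.CategoricalCrossentropy()
cce(one_hot_labels, [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])
# Same value (~1.177) as the SparseCategoricalCrossentropy example below,
# which uses the equivalent integer labels.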

Arguments

  • from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
  • label_smoothing: Float in [0, 1]. When > 0, label values are smoothed, meaning the confidence on label values is relaxed. For example, if 0.1, use 0.1 / num_classes for non-target labels and 0.9 + 0.1 / num_classes for target labels (see the sketch after this argument list).
  • axis: The axis along which to compute crossentropy (the features axis). Defaults to -1.
  • reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size". Supported options are "sum", "sum_over_batch_size" or None.
  • name: Optional name for the loss instance.
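
A minimal sketch of the label_smoothing rule above (non-target classes get label_smoothing / num_classes, the target class gets 1 - label_smoothing + label_smoothing / num_classes); values are illustrative:

import numpy as np
import keras

y_true = np.array([[0., 1., 0.], [0., 0., 1.]])
y_pred = np.array([[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]])
label_smoothing = 0.1
num_classes = y_true.shape[-1]

# Loss with built-in smoothing ...
cce = keras.losses.CategoricalCrossentropy(label_smoothing=label_smoothing)
smoothed = cce(y_true, y_pred)

# ... matches the plain loss on manually smoothed targets:
# every class gains label_smoothing / num_classes, while the target class
# keeps 1 - label_smoothing of its original weight.
y_true_smoothed = y_true * (1.0 - label_smoothing) + label_smoothing / num_classes
plain = keras.losses.CategoricalCrossentropy()(y_true_smoothed, y_pred)
assert abs(float(smoothed) - float(plain)) < 1e-6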

Examples

Standalone usage:

>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using the default 'sum_over_batch_size' reduction type.
>>> cce = keras.losses.CategoricalCrossentropy()
>>> cce(y_true, y_pred)
1.177
>>> # Calling with 'sample_weight'.
>>> cce(y_true, y_pred, sample_weight=np.array([0.3, 0.7]))
0.814
>>> # Using 'sum' reduction type.
>>> cce = keras.losses.CategoricalCrossentropy(
...     reduction="sum")
>>> cce(y_true, y_pred)
2.354
>>> # Using None reduction type (per-sample losses).
>>> cce = keras.losses.CategoricalCrossentropy(
...     reduction=None)
>>> cce(y_true, y_pred)
array([0.0513, 2.303], dtype=float32)

Usage with the compile() API:

model.compile(optimizer='sgd',
              loss=keras.losses.CategoricalCrossentropy())


SparseCategoricalCrossentropy class

keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    ignore_class=None,
    reduction="sum_over_batch_size",
    name="sparse_categorical_crossentropy",
)

Computes the crossentropy loss between the labels and predictions.

Use this crossentropy loss function when there are two or more label classes. We expect labels to be provided as integers. If you want to provide labels using one-hot representation, please use CategoricalCrossentropy loss. There should be num_classes floating point values per feature for y_pred and a single floating point value per feature for y_true.

In the snippet below, there is a single floating point value per example for y_true and num_classes floating point values per example for y_pred. The shape of y_true is [batch_size] and the shape of y_pred is [batch_size, num_classes].

Arguments

  • from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
  • ignore_class: Optional integer. The ID of a class to be ignored during loss computation. This is useful, for example, in segmentation problems featuring a "void" class (commonly -1 or 255) in segmentation maps. By default (ignore_class=None), all classes are considered.
  • reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size". Supported options are "sum", "sum_over_batch_size" or None.
  • name: Optional name for the loss instance.

Examples

>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> # Using the default 'sum_over_batch_size' reduction type.
>>> scce = keras.losses.SparseCategoricalCrossentropy()
>>> scce(y_true, y_pred)
1.177
>>> # Calling with 'sample_weight'.
>>> scce(y_true, y_pred, sample_weight=np.array([0.3, 0.7]))
0.814
>>> # Using 'sum' reduction type.
>>> scce = keras.losses.SparseCategoricalCrossentropy(
...     reduction="sum")
>>> scce(y_true, y_pred)
2.354
>>> # Using None reduction type (per-sample losses).
>>> scce = keras.losses.SparseCategoricalCrossentropy(
...     reduction=None)
>>> scce(y_true, y_pred)
array([0.0513, 2.303], dtype=float32)

Usage with the compile() API:

model.compile(optimizer='sgd',
              loss=keras.losses.SparseCategoricalCrossentropy())


Poisson class

keras.losses.Poisson(reduction="sum_over_batch_size", name="poisson")

Computes the Poisson loss between y_true & y_pred.

Formula:

loss = y_pred - y_true * log(y_pred)

Arguments

  • reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size". Supported options are "sum", "sum_over_batch_size" or None.
  • name: Optional name for the loss instance.
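
There is no standalone example above for this class, so here is a minimal usage sketch consistent with the formula; the arrays are illustrative, and the reduced value is the mean over all samples:

import numpy as np
import keras

y_true = np.array([[1., 0.], [2., 3.]])
y_pred = np.array([[0.8, 0.2], [1.5, 2.5]])

poisson = keras.losses.Poisson()
loss = poisson(y_true, y_pred)

# With the default "sum_over_batch_size" reduction this equals the mean of
# y_pred - y_true * log(y_pred) over every element.
manual = np.mean(y_pred - y_true * np.log(y_pred))
assert abs(float(loss) - manual) < 1e-5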


binary_crossentropy function

keras.losses.binary_crossentropy(
    y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=-1
)

Computes the binary crossentropy loss.

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN].
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN].
  • from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
  • label_smoothing: Float in [0, 1]. If > 0 then smooth the labels by squeezing them towards 0.5, that is, using 1. - 0.5 * label_smoothing for the target class and 0.5 * label_smoothing for the non-target class.
  • axis: The axis along which the mean is computed. Defaults to -1.

Returns

Binary crossentropy loss value. shape = [batch_size, d0, .. dN-1].

Example

>>> y_true = [[0, 1], [0, 0]]
>>> y_pred = [[0.6, 0.4], [0.4, 0.6]]
>>> loss = keras.losses.binary_crossentropy(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> loss
array([0.916, 0.714], dtype=float32)


categorical_crossentropy function

keras.losses.categorical_crossentropy(
    y_true, y_pred, from_logits=False, label_smoothing=0.0, axis=-1
)

Computes the categorical crossentropy loss.

Arguments

  • y_true: Tensor of one-hot true targets.
  • y_pred: Tensor of predicted targets.
  • from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
  • label_smoothing: Float in [0, 1]. If > 0 then smooth the labels. For example, if 0.1, use 0.1 / num_classes for non-target labels and 0.9 + 0.1 / num_classes for target labels.
  • axis: The axis along which the entropy is computed. Defaults to -1.

Returns

Categorical crossentropy loss value.

Example

>>> y_true = [[0, 1, 0], [0, 0, 1]]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> loss = keras.losses.categorical_crossentropy(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> loss
array([0.0513, 2.303], dtype=float32)


sparse_categorical_crossentropy function

keras.losses.sparse_categorical_crossentropy(
    y_true, y_pred, from_logits=False, ignore_class=None, axis=-1
)

Computes the sparse categorical crossentropy loss.

Arguments

  • y_true: Ground truth values.
  • y_pred: The predicted values.
  • from_logits: Whether y_pred is expected to be a logits tensor. By default, we assume that y_pred encodes a probability distribution.
  • ignore_class: Optional integer. The ID of a class to be ignored during loss computation. This is useful, for example, in segmentation problems featuring a "void" class (commonly -1 or 255) in segmentation maps. By default (ignore_class=None), all classes are considered; a sketch of this option follows the example below.
  • axis: The axis along which the entropy is computed. Defaults to -1.

Returns

Sparse categorical crossentropy loss value.

Examples

>>> y_true = [1, 2]
>>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
>>> loss = keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> loss
array([0.0513, 2.303], dtype=float32)
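
A minimal sketch of ignore_class, assuming a tiny segmentation-style batch in which the label -1 marks "void" positions; the shapes and numbers are illustrative:

import keras

y_true = [[1, 2], [-1, 2]]  # -1 marks a position to ignore
y_pred = [[[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]],
          [[0.3, 0.3, 0.4], [0.1, 0.8, 0.1]]]
loss = keras.losses.sparse_categorical_crossentropy(
    y_true, y_pred, ignore_class=-1)
# loss has shape (2, 2); the entry where y_true == -1 contributes 0.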


poisson function

keras.losses.poisson(y_true, y_pred)

Computes the Poisson loss between y_true and y_pred.

Formula:

loss = y_pred - y_true * log(y_pred)

Arguments

  • y_true: Ground truth values. shape = [batch_size, d0, .. dN].
  • y_pred: The predicted values. shape = [batch_size, d0, .. dN].

Returns

Poisson loss values with shape = [batch_size, d0, .. dN-1].

Example

>>> y_true = np.random.randint(0, 2, size=(2, 3))
>>> y_pred = np.random.random(size=(2, 3))
>>> loss = keras.losses.poisson(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> y_pred = y_pred + 1e-7
>>> assert np.allclose(
...     loss, np.mean(y_pred - y_true * np.log(y_pred), axis=-1),
...     atol=1e-5)


KLDivergence class

keras.losses.KLDivergence(reduction="sum_over_batch_size", name="kl_divergence")

Computes Kullback-Leibler divergence loss between y_true & y_pred.

Formula:

loss = y_true * log(y_true / y_pred)

Arguments

  • reduction: Type of reduction to apply to the loss. In almost all cases this should be "sum_over_batch_size". Supported options are "sum", "sum_over_batch_size" or None.
  • name: Optional name for the loss instance.
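
As with Poisson, there is no standalone class example above; here is a minimal usage sketch consistent with the formula (the rows are illustrative probability vectors):

import numpy as np
import keras

y_true = np.array([[0.1, 0.9], [0.6, 0.4]])
y_pred = np.array([[0.2, 0.8], [0.5, 0.5]])

kld = keras.losses.KLDivergence()
loss = kld(y_true, y_pred)

# With the default "sum_over_batch_size" reduction this equals the mean over
# the batch of sum(y_true * log(y_true / y_pred), axis=-1).
manual = np.mean(np.sum(y_true * np.log(y_true / y_pred), axis=-1))
assert abs(float(loss) - manual) < 1e-5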


kl_divergence function

keras.losses.kl_divergence(y_true, y_pred)

Computes Kullback-Leibler divergence loss between y_true & y_pred.

Formula:

loss = y_true * log(y_true / y_pred)

Arguments

  • y_true: Tensor of true targets.
  • y_pred: Tensor of predicted targets.

Returns

KL Divergence loss values with shape = [batch_size, d0, .. dN-1].

Example

>>> y_true = np.random.randint(0, 2, size=(2, 3)).astype(np.float32)
>>> y_pred = np.random.random(size=(2, 3))
>>> loss = keras.losses.kl_divergence(y_true, y_pred)
>>> assert loss.shape == (2,)
>>> y_true = np.clip(y_true, 1e-7, 1)
>>> y_pred = np.clip(y_pred, 1e-7, 1)
>>> assert np.allclose(
...     loss, np.sum(y_true * np.log(y_true / y_pred), axis=-1),
...     atol=1e-5)