Bleu
class keras_nlp.metrics.Bleu(
    tokenizer=None, max_order=4, smooth=False, dtype="float32", name="bleu", **kwargs
)
BLEU metric.
This class implements the BLEU metric. BLEU is generally used to evaluate machine translation systems. By default, this implementation replicates SacreBLEU, but user-defined tokenizers can be passed to handle other languages.
To compute the BLEU score, we count the number of matching n-grams between the candidate translation and the reference text. We use the "clipped count" of matching n-grams so as not to give a high score to a (reference, prediction) pair with redundant, repeated tokens. Secondly, since a precision-based score tends to favour shorter predictions, a brevity penalty is applied to penalise predictions that are too short. For more details, see the following article: https://cloud.google.com/translate/automl/docs/evaluate#bleu.
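The two ideas above (clipped n-gram counts and the brevity penalty) can be sketched in plain Python. This is an illustrative, single-reference sentence-level sketch, not the implementation used by this class:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(reference, candidate, n):
    # Count each candidate n-gram, but clip its count by how often it
    # appears in the reference, so repeated tokens are not over-rewarded.
    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

def brevity_penalty(ref_len, cand_len):
    # Penalise candidates shorter than the reference; no penalty otherwise.
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1 - ref_len / cand_len)

reference = "the cat is on the mat".split()
candidate = "the the the the".split()

# Unclipped unigram precision would be 4/4 = 1.0; clipping caps "the"
# at its reference count of 2, giving 2/4.
print(clipped_precision(reference, candidate, 1))  # -> 0.5
```

Without clipping, the degenerate candidate "the the the the" would score a perfect unigram precision against this reference; clipping reduces it to 0.5, and the brevity penalty further discounts it for being shorter than the reference.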
Note on input shapes:
For unbatched inputs, y_pred should be a tensor of shape (), and y_true should be a tensor of shape (num_references,). For batched inputs, y_pred should be a tensor of shape (batch_size,), and y_true should be a tensor of shape (batch_size, num_references). For batched inputs, y_true can also be a ragged tensor of shape (batch_size, None) if different samples have different numbers of references.
Arguments
- tokenizer: callable. A function that takes a string tf.RaggedTensor (of any shape) and tokenizes the strings in the tensor. If the tokenizer is not specified, the default tokenizer is used. The default tokenizer replicates the behaviour of SacreBLEU's "tokenizer_13a" tokenizer (https://github.com/mjpost/sacrebleu/blob/v2.1.0/sacrebleu/tokenizers/tokenizer_13a.py).
- max_order: int. The maximum n-gram order to use. For example, if max_order is set to 3, unigrams, bigrams, and trigrams will be considered. Defaults to 4.
- smooth: bool. Whether to apply smoothing to the BLEU score. Defaults to False.
- dtype: string or tf.dtypes.DType. Precision of metric computation. If not specified, it defaults to "float32".
- name: string. Name of the metric instance.
- **kwargs: Other keyword arguments.
References
- Papineni et al., 2002