EditDistance
class keras_nlp.metrics.EditDistance(
normalize=True, dtype="float32", name="edit_distance", **kwargs
)
Edit Distance metric.
This class implements the edit distance metric, sometimes called
Levenshtein distance, as a keras.metrics.Metric. Edit distance is the
minimum number of operations required to convert one string into another,
where an operation is a substitution, a deletion, or an insertion. By
default, this metric computes the normalized score: the unnormalized edit
distance divided by the number of tokens in the reference text.
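The definition above can be sketched in plain Python. This is an illustrative dynamic-programming implementation, not the code the library runs; the helper names edit_distance and normalized_edit_distance are hypothetical and chosen only for this sketch.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences (hypothetical helper)."""
    # prev[j] is the distance between the reference prefix processed so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # i deletions turn the reference prefix into an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_tok
                curr[j - 1] + 1,     # insert hyp_tok
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(reference, hypothesis):
    # Divide by the number of reference tokens, as described in the text.
    return edit_distance(reference, hypothesis) / len(reference)


y_true = "the tiny little cat was found under the big funny bed".split()
y_pred = "the cat was found under the bed".split()
print(edit_distance(y_true, y_pred))             # 4 deletions
print(normalized_edit_distance(y_true, y_pred))  # 4 / 11
```

Keeping only two rows of the table holds memory proportional to the hypothesis length, while the running time stays proportional to the product of the two lengths.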
This class can be used to compute character error rate (CER) and word error
rate (WER). You simply have to pass the appropriately tokenized text and set
normalize to True.
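The only difference between CER and WER is how the text is tokenized before the distance is computed. The sketch below illustrates this with a hypothetical pure-Python edit_distance helper standing in for the metric; the sample strings are made up for the illustration.

```python
def edit_distance(reference, hypothesis):
    # Compact Levenshtein distance over two sequences (hypothetical helper).
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


reference = "the cat sat"
hypothesis = "the cat sit"

# WER: tokens are whitespace-separated words.
ref_words, hyp_words = reference.split(), hypothesis.split()
wer = edit_distance(ref_words, hyp_words) / len(ref_words)

# CER: tokens are individual characters (spaces are counted in this sketch).
ref_chars, hyp_chars = list(reference), list(hypothesis)
cer = edit_distance(ref_chars, hyp_chars) / len(ref_chars)

print(wer)  # 1 wrong word out of 3
print(cer)  # 1 wrong character out of 11
```

With the metric itself, the same word-level or character-level token lists would be passed as y_true and y_pred.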
Note on input shapes:
y_true and y_pred can either be tensors of rank 1 or ragged tensors of
rank 2. These tensors contain tokenized text.
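Arguments
normalize: bool. If True (the default), the edit distance is divided by the number of tokens in the reference text.
dtype: precision of the metric computation. Defaults to "float32".
name: name of the metric instance. Defaults to "edit_distance".
**kwargs: other keyword arguments.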
Examples
Various Input Types.
Single-level Python list.
>>> edit_distance = keras_nlp.metrics.EditDistance()
>>> y_true = "the tiny little cat was found under the big funny bed".split()
>>> y_pred = "the cat was found under the bed".split()
>>> edit_distance(y_true, y_pred)
<tf.Tensor: shape=(), dtype=float32, numpy=0.36363637>
Nested Python list.
>>> edit_distance = keras_nlp.metrics.EditDistance()
>>> y_true = [
... "the tiny little cat was found under the big funny bed".split(),
... "it is sunny today".split(),
... ]
>>> y_pred = [
... "the cat was found under the bed".split(),
... "it is sunny but with a hint of cloud cover".split(),
... ]
>>> edit_distance(y_true, y_pred)
<tf.Tensor: shape=(), dtype=float32, numpy=0.73333335>
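The batched result above is consistent with pooling over samples: total edits divided by total reference tokens, rather than the mean of per-sample ratios. The sketch below checks that arithmetic with a hypothetical pure-Python edit_distance helper; it reproduces the printed value under that assumption and is not the library's implementation.

```python
def edit_distance(reference, hypothesis):
    # Compact Levenshtein distance over two sequences (hypothetical helper).
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


y_true = [
    "the tiny little cat was found under the big funny bed".split(),
    "it is sunny today".split(),
]
y_pred = [
    "the cat was found under the bed".split(),
    "it is sunny but with a hint of cloud cover".split(),
]

edits = [edit_distance(r, h) for r, h in zip(y_true, y_pred)]  # [4, 7]
ref_lengths = [len(r) for r in y_true]                         # [11, 4]

pooled = sum(edits) / sum(ref_lengths)  # 11 / 15
per_sample_mean = sum(e / n for e, n in zip(edits, ref_lengths)) / len(edits)

print(pooled)           # about 0.7333, matching the value shown above
print(per_sample_mean)  # about 1.0568, which does not match
```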