Automatic Machine Translation Evaluation - COMET Explained

Motivation

While advances in deep learning have dramatically improved machine translation quality, the evaluation of machine translation models has seen comparatively little progress. The most widely used metrics, such as BLEU [Papineni et al., 2002] and METEOR [Lavie and Denkowski, 2009], simply match n-grams between the hypothesis text and the reference text. This is too rigid: it ignores the natural variation among valid translations and fails to differentiate today's strongest machine translation systems. These metrics also correlate poorly with human judgment on individual segments of text.
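The rigidity of n-gram matching is easy to see in practice: a perfectly adequate paraphrase of the reference receives a much lower BLEU score than an exact copy. Below is a minimal sketch, assuming the sacrebleu package is available; the sentence pair is an illustrative example, not taken from the original post.

```python
# Minimal sketch (assumes sacrebleu is installed): a purely n-gram-based
# metric penalizes a valid paraphrase of the reference translation.
import sacrebleu

reference = ["The cat sat on the mat."]

exact_hypothesis = "The cat sat on the mat."
paraphrase_hypothesis = "A cat was sitting on the mat."  # same meaning, different wording

# An exact surface match scores near 100 BLEU.
print(sacrebleu.sentence_bleu(exact_hypothesis, reference).score)

# The paraphrase preserves the meaning but shares few n-grams with the
# reference, so its BLEU score drops sharply.
print(sacrebleu.sentence_bleu(paraphrase_hypothesis, reference).score)
```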


Learned Metrics for Machine Translation

How can we automatically evaluate the quality of a machine translation system? Human evaluation is accurate but expensive, which makes it unsuitable for the rapid iteration required during MT model development.


