While advances in deep learning have dramatically improved machine translation quality, the evaluation of machine translation models has seen comparatively little progress. The most widely used metrics, such as BLEU [Papineni et al., 2002] and METEOR [Lavie and Denkowski, 2009], simply match n-grams between the hypothesis text and the reference text. This is too rigid: it ignores the legitimate variation among ground-truth translations, fails to differentiate today's strongest machine translation models, and does not correlate well with human judgment on individual pieces of text.
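To make the rigidity concrete, below is a minimal sketch of the clipped n-gram precision at the heart of such metrics (not the full BLEU formula, which also combines several n-gram orders and a brevity penalty). A perfectly acceptable paraphrase scores poorly simply because its surface n-grams differ from the reference.

```python
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int) -> float:
    """Clipped n-gram precision: the core matching step of BLEU-style metrics."""
    hyp_tokens, ref_tokens = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    # Count hypothesis n-grams that also appear in the reference, clipped by reference counts.
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in hyp_ngrams.items())
    total = max(sum(hyp_ngrams.values()), 1)
    return overlap / total

ref = "the cat sat on the mat"
print(ngram_precision("the cat sat on the mat", ref, 2))        # 1.0 (exact match)
print(ngram_precision("a cat was sitting on the rug", ref, 2))  # ~0.17 despite similar meaning
```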
In the past couple of years, we have seen the rise of Transformer architectures in Natural Language Processing. Transformers revolutionized the speed and accuracy of machine translation systems and removed the need for Recurrent Neural Networks and LSTMs to derive context and meaning in sequence-to-sequence modeling. Since the Attention Is All You Need paper was published in 2017, many experimental applications and fine-tuning improvements have been built on the original model. The latest such improvement is the Generative Pre-Trained Transformer 3, or GPT-3.
Pre-training techniques such as BERT have achieved great success in natural language processing, especially on natural language understanding tasks. However, how much pre-training can help in text generation, for example machine translation, remains an open question. The CTNMT paper examines this question from three angles:
- What challenges arise when applying pre-trained models such as BERT or GPT to machine translation?
- Given these challenges, how can pre-trained knowledge be exploited to the fullest?
- What further potential is there in integrating pre-training with machine translation?
In 1920, the great philosopher Bertrand Russell visited China, accompanied by Yuen Ren Chao, a Chinese-American linguist. Chao was a naturally gifted polyglot: at that time, he could already speak the Baoding dialect, Wu dialect, Fuzhou dialect, Nanjing dialect, and English. He accompanied Russell by ship from Shanghai to Changsha, and during the trip he learned the Changsha dialect from Yang Ruiliu, an economist on the same ship. By the time the ship docked in Changsha, Yuen Ren Chao was already able to translate Russell's speeches and slang into the Changsha dialect. Can our neural networks become a "Yuen Ren Chao" of machine translation? That is, can we build a unified model with multilingual abilities that, when it encounters a new language, quickly adapts to translating it after training on only a small amount of data?
This blog presents a simple fix for the sentence embeddings learned by pre-trained language models. It is based on the paper On the Sentence Embeddings from Pre-trained Language Models by Li et al., EMNLP 2020.
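As background, the sentence embeddings in question are typically obtained by pooling a pre-trained encoder's token representations. The sketch below, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint, shows one common recipe (mean pooling over non-padding tokens); it illustrates the embeddings being fixed, not the fix itself.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
    mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1), zeros at padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

emb = sentence_embedding("Pre-trained language models produce contextual embeddings.")
print(emb.shape)  # torch.Size([1, 768])
```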