Constructing a vocabulary is the first step for any NLP task. How can we efficiently learn an optimal vocabulary for machine translation? In this blog, I will explain the VOLT algorithm from the paper Vocabulary Learning via Optimal Transport for Neural Machine Translation, which was awarded the Best Paper at ACL 2021.
Hello fellow readers! In this post, I would like to share a recent advance in the field of machine translation. Specifically, I will be presenting the paper Neural Machine Translation with Monolingual Translation Memory by Cai et al., which received one of the six Distinguished Paper Awards at ACL 2021.
Self-training is a prevalent semi-supervised method. Its key idea is to augment the original labeled dataset with unlabeled data paired with the model's own predictions (i.e., pseudo-parallel data). Self-training has been widely used in classification tasks, but will it also work on sequence generation tasks such as machine translation? If so, how does it work? This blog introduces a work [1] that investigates these questions and provides answers.
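To make the idea concrete, here is a minimal sketch of one self-training round for machine translation. The function names (`train_model`, `translate`) are my own placeholders for illustration, not the paper's API:

```python
# Minimal sketch of one self-training round for NMT.
# `train_model` and `translate` are illustrative placeholders, not from the paper.

def self_training_round(labeled_pairs, unlabeled_sources, train_model, translate):
    # 1. Train a teacher model on the original parallel (labeled) data.
    teacher = train_model(labeled_pairs)

    # 2. Label the monolingual source sentences with the teacher's predictions,
    #    producing pseudo-parallel pairs.
    pseudo_pairs = [(src, translate(teacher, src)) for src in unlabeled_sources]

    # 3. Train a student model on the union of real and pseudo-parallel data.
    student = train_model(labeled_pairs + pseudo_pairs)
    return student
```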
While advances in deep learning have dramatically improved machine translation quality, the evaluation of machine translation models has seen comparatively little development. The most widely used metrics, such as BLEU [Papineni et al., 2002] and METEOR [Lavie and Denkowski, 2009], simply match n-grams between the hypothesis and the reference text. This is too rigid: it ignores the natural variation among valid translations, fails to differentiate today's strongest machine translation models, and does not correlate well with human judgment on individual pieces of text.
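As a quick illustration of this rigidity, the snippet below (using NLTK's sentence-level BLEU, chosen here only as a convenient example) gives a valid paraphrase a low score simply because it shares few n-grams with the single reference:

```python
# Illustration of n-gram rigidity: a reasonable paraphrase shares few n-grams
# with the reference and therefore receives a low sentence-level BLEU score.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()
paraphrase = "a cat was sitting on the rug".split()  # similar meaning, different wording

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], paraphrase, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")  # low, despite the hypothesis being an acceptable translation
```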
Since its emergence, the Transformer [1] has dominated sequence-to-sequence tasks; it even outperforms the Google Neural Machine Translation model on specific tasks. In particular, the multi-head attention mechanism, which relies on query-key dot products, is deemed one of the critical building blocks that make things work. But is it really that important?
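For reference, the dot-product attention at the heart of that mechanism can be written in a few lines of NumPy. This is a bare-bones single-head sketch that omits masking and the multi-head projections:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Bare-bones single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise query-key dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of value vectors
```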
Introduction
A typical neural machine translation (NMT) system needs to support translation among various languages, that is, a multilingual (many-to-many) NMT system rather than one that only supports translation between two languages. However, supporting multilingual translation is still a challenge. One direct idea is to use a separate model for each language pair, which is easy to implement but very costly: to support translation among N languages, we need to train N(N-1)/2 separate models (for example, 100 languages would already require 4,950 models). Such a method also does not allow information sharing across languages, which can result in overparameterization and sub-optimal performance. We denote this method as per-language NMT.
Can one build a neural machine translation model without parallel data?
In general, neural machine translation (NMT) requires a large amount of parallel data (e.g., EN->CN). However, it is not easy to collect enough high-quality parallel sentence pairs to train a translation model. On the other hand, we can collect enormous amounts of plain text from Wikipedia or news articles for each language. In this paper, MGNMT tries to make good use of such non-parallel data to boost the performance of NMT.
Machine translation helps people in their daily lives and is also an important research topic, especially in the computer science community. It ranges from translating one language into another to translating speech into text, and more. Today, I'm going to talk about the paper "Generative Imagination Elevates Machine Translation". I'll cover the background, the challenge, and the motivation behind this paper. Then I'll go through some of its technical details as well as an in-depth analysis of the experimental settings and results. Finally, we will discuss potential extensions of this work. Hopefully, this will give you a better understanding of the area and point out a promising research direction.
How can we develop a single unified model that translates from any language to any other language? This work proposes a many-to-many translation system with an emphasis on both English-centric and non-English directions. Many recent works have focused on a single unified model for multilingual translation, since such models are efficient and easy to deploy. However, most of them concentrate on improving English-centric directions, which means that translation between two arbitrary languages may not be well supported. Therefore, this paper proposes a training method called mRASP2, which combines contrastive learning and alignment augmentation (AA) to train a unified multilingual translation system. The authors also contribute a monolingual dataset called MC24. By making use of both monolingual and bilingual corpora, the system learns language-agnostic representations that support non-English directions better than before, and it outperforms a strong Transformer baseline by a large margin.
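To give a flavor of the contrastive idea, here is a generic InfoNCE-style sketch under my own assumptions (not the exact loss from the paper): the encoder representation of a sentence is pulled toward that of its translation, while the other translations in the batch act as negatives.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(src_repr, tgt_repr, temperature=0.1):
    """Generic InfoNCE-style loss over a batch of (source, translation) sentence
    embeddings: each source is pulled toward its own translation and pushed away
    from the other translations in the batch. Illustrative only."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature                 # cosine similarities
    labels = torch.arange(src.size(0), device=src.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```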