Constructing a vocabulary is a first step for any NLP task. How can we efficiently learn an optimal vocabulary for machine translation? In this blog, I will explain the VOLT algorithm from the paper Vocabulary Learning via Optimal Transport for Neural Machine Translation, which was awarded the best paper at ACL 2021.

Reading time: About 8 minutes

5/17/2022 · Multilingual MT · Vocabulary Learning · Optimal Transport

How to develop a model to verify a natural language statement while explaining its rationale?

Reading time: About 10 minutes

4/8/2022 · Fact Verification · Reasoning · Logic-regularized Neural Network · Interpretable NLP

A high-performance open-source library for training and inference of NLP Transformer models.

12/10/2021 · Transformer · GPU Acceleration · CUDA · High Performance Computing
12/8/2021 · Translation Memory

Self-training is a prevalent semi-supervised method. Its key idea is to augment the original labeled dataset with unlabeled data paired with the model's own predictions (i.e., pseudo-parallel data). Self-training has been widely used in classification tasks, but does it work on sequence generation tasks such as machine translation? If so, how? This blog introduces a work [1] that investigates these questions and provides answers.
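The core loop described above (train on labeled data, pseudo-label the unlabeled pool, retrain on the union) can be sketched in a few lines. The `self_train` function and the toy 1-nearest-neighbour "trainer" below are hypothetical stand-ins to make the loop runnable, not the setup used in the paper:

```python
def self_train(train_fn, labeled, unlabeled, rounds=3):
    """Generic self-training loop: train a model on the labeled set,
    label the unlabeled pool with its predictions (pseudo-parallel
    data), then retrain on the union; repeat for a few rounds."""
    model = train_fn(labeled)
    for _ in range(rounds):
        pseudo = [(x, model(x)) for x in unlabeled]  # pseudo-labels
        model = train_fn(labeled + pseudo)
    return model

# Hypothetical toy "trainer": a 1-nearest-neighbour predictor on
# scalar inputs, standing in for a real sequence model.
def train_1nn(pairs):
    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]
    return predict

model = self_train(train_1nn, [(0.0, "neg"), (10.0, "pos")], [1.0, 9.0])
print(model(2.0))  # prints "neg": nearest (pseudo-)labeled point is 1.0
```

In the sequence-generation setting the same skeleton applies, except `train_fn` fits an MT model and the pseudo-labels are beam-search translations of monolingual source sentences.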

12/5/2021 · Self-training
12/1/2021 · MT Evaluation · Pre-trained Language Model

Speech translation (ST) is in increasing demand in our daily life and work. Applications such as travel assistants, simultaneous conference interpretation, and movie subtitling can greatly reduce translation costs. Building an ST system that understands acoustic speech signals and directly translates them into text in a target language is challenging. For one thing, people do not always plan out what they are going to say, so unlike text translation, the input to ST often lacks complete organization. For another, parallel corpora for ST are scarce compared to those for MT, and most ST methods are limited by the amount of parallel data available.

11/30/2021 · Speech Translation · BERT

Upon its emergence, the Transformer neural network [1] came to dominate sequence-to-sequence tasks, even outperforming the Google Neural Machine Translation model on certain tasks. In particular, the multi-head attention mechanism, built on dot products between query and key vectors, is deemed one of the critical building blocks that make it work. But is it really that important?
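As a refresher on the operation in question, the dot-product attention that multi-head attention is built on computes softmax(QKᵀ/√d)·V. Below is a minimal pure-Python sketch on plain lists of vectors, illustrative only; real implementations use batched matrix operations:

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V for lists of query, key, and
    value vectors; returns one output vector per query."""
    d = len(K[0])
    out = []
    for q in Q:
        # dot-product scores between this query and every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # output = convex combination of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two keys: the query matches the first
# key more strongly, so its value gets the larger weight.
out = scaled_dot_product_attention([[1.0, 0.0]],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0], [0.0]])
```

The blog post this teaser introduces asks whether this dot-product scoring can be replaced with something simpler without losing quality.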

11/29/2021 · Transformer · Recurrent Attention

Can one build a neural machine translation model without parallel data?

11/28/2021 · Unsupervised Machine Translation

Lei Li


