Break the Limitation of Training Data — A Better Encoder Enhanced by BERT for Speech Translation

Speech translation (ST) is in increasing demand in our daily life and work. Applications such as travel assistants, simultaneous conference interpretation, and movie subtitling can greatly reduce translation costs. Building an ST system that understands acoustic speech signals and translates them directly into text in a target language is challenging. For one thing, people do not always plan out what they are going to say, so unlike the input to text translation, speech is sometimes poorly organized. For another, parallel corpora for ST are scarce compared to those for MT, and most ST methods are limited by the amount of parallel data available.
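As a hedged illustration of how a pre-trained text model might strengthen an ST encoder when parallel data is scarce, the sketch below pulls a speech encoder's states toward frozen BERT representations of the transcript. The model names, the stand-in speech encoder, and the mean-pooled MSE alignment are all assumptions for illustration, not necessarily the post's exact recipe.

```python
# A minimal sketch (not the post's exact method): align a speech encoder's
# states with frozen BERT representations of the paired transcript.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased")  # frozen text teacher
bert.eval()
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

speech_encoder = nn.TransformerEncoder(  # stand-in for a real speech encoder
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=6,
)

def alignment_loss(speech_feats, transcripts):
    """MSE between mean-pooled speech states and mean-pooled BERT states."""
    with torch.no_grad():
        ids = tokenizer(transcripts, return_tensors="pt", padding=True)
        text_repr = bert(**ids).last_hidden_state.mean(dim=1)   # (B, 768)
    speech_repr = speech_encoder(speech_feats).mean(dim=1)      # (B, 768)
    return nn.functional.mse_loss(speech_repr, text_repr)

# Toy batch: 2 utterances, 100 acoustic frames of dimension 768 each.
feats = torch.randn(2, 100, 768)
loss = alignment_loss(feats, ["hello world", "good morning"])
loss.backward()  # gradients flow into the speech encoder only; BERT stays fixed
```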


Zichen Chen · About 5 min · ST · DL4MT · Speech Translation · BERT
Learned Metrics for Machine Translation

How can we automatically evaluate the quality of a machine translation system? Human evaluation is accurate but expensive, which makes it unsuitable for day-to-day MT model development.
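As a minimal sketch of what a learned metric looks like in practice, the snippet below scores a candidate translation against a reference with BERTScore via the `bert-score` package; the example sentences are made up.

```python
# Automatic MT evaluation with a learned metric: BERTScore.
# Requires: pip install bert-score
from bert_score import score

candidates = ["The cat sits on the mat."]    # MT system output (made-up example)
references = ["There is a cat on the mat."]  # human reference

# Precision/recall/F1 computed from BERT token-embedding similarities.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")
```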



Huake He · About 12 min · MT · DL4MT · MT Evaluation · BERTScore · COMET · BERT
Applications of BERT in Machine Translation

Pre-training techniques such as BERT have achieved great success in natural language processing, especially on natural language understanding tasks. However, how much pre-training can achieve in text generation, for example in machine translation, remains an open question. The CTNMT paper examines this question from three angles (a small fusion sketch follows the list):

  1. What challenges arise when applying pre-training techniques such as BERT or GPT to machine translation?
  2. Given these challenges, how can pre-trained knowledge be exploited to the fullest extent?
  3. What further potential lies in the fusion of pre-training and machine translation?
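One recurring idea in this space, sketched below under stated assumptions, is to keep BERT frozen and fuse its features into the NMT encoder through a learned gate, so the pre-trained knowledge cannot be overwritten during MT training (one way to limit catastrophic forgetting). This is an illustration only, not a faithful reimplementation of CTNMT, and it assumes the NMT encoder shares BERT's tokenization so the two state sequences align position by position.

```python
# A minimal sketch: gated fusion of frozen BERT features into an NMT encoder.
import torch
import torch.nn as nn
from transformers import BertModel

class GatedBertFusion(nn.Module):
    def __init__(self, d_model=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():  # frozen: pre-trained knowledge kept intact
            p.requires_grad = False
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, input_ids, attention_mask, nmt_states):
        # nmt_states: (B, T, d_model), assumed aligned with BERT's tokenization.
        with torch.no_grad():
            bert_states = self.bert(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        # Element-wise gate decides, per position, how much BERT to mix in.
        g = torch.sigmoid(self.gate(torch.cat([bert_states, nmt_states], dim=-1)))
        return g * bert_states + (1 - g) * nmt_states

# Usage sketch: fused = GatedBertFusion()(input_ids, attention_mask, nmt_states)
```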

王明轩 · About 6 min · MT · BERT · Pre-training · Catastrophic Forgetting
What is the problem with BERT embeddings and how to fix them?

This blog presents an easy fix to the sentence embeddings learned by pre-trained language models. It is based on the paper On the Sentence Embeddings from Pre-trained Language Models (Li et al., EMNLP 2020).
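The paper's fix maps BERT sentence embeddings to a standard Gaussian with a normalizing flow. As a lighter-weight illustration of the same underlying idea, the sketch below applies simple whitening to repair an anisotropic embedding space; this is a related but different technique, not the paper's method.

```python
# Post-processing anisotropic sentence embeddings by whitening (illustration;
# the paper itself uses a learned normalizing flow instead).
import numpy as np

def whiten(embeddings):
    """Center and decorrelate sentence embeddings (one per row)."""
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + 1e-8))
    return (embeddings - mu) @ W

# Toy data: 1000 "sentence embeddings" of dim 8 with one dominant direction.
emb = np.random.randn(1000, 8) * np.array([10.0] + [1.0] * 7)
white = whiten(emb)
print(np.allclose(np.cov(white.T), np.eye(8), atol=1e-6))  # True: ~identity covariance
```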


Bohan Li · About 3 min · NLP · Pre-training · BERT · Embedding