Lei Li

I am a research scientist at ByteDance AI Lab. I develop scalable algorithms to learn and mine knowledge from data, with applications in NLP, machine translation, time series analysis, AI drug discovery, and robot learning.

You are welcome to visit our lab in central Mountain View, California, as well as our offices in Beijing and Shanghai.
We have multiple open positions for researchers, software engineers, and interns in machine learning, NLP, AI drug discovery, and robotics (email me at the address below).


  • One paper accepted to InterSpeech 2021, about multi-task progressive pretraining for speech translation, achieving new SOTA results on MuST-C benchmarks.
  • 11 papers to appear at ACL 2021 (6 long, 4 in Findings, and 1 system demo). Strong results in machine translation and speech translation. Other topics include parallel generation, reasoning, summarization, and information extraction.
  • 1 paper accepted to ICML 2021, about long-horizon skill learning.
  • 4 papers (1 main track and 3 industry track) will be presented at NAACL 2021. Check out the long paper on how visual imagination influences machine translation.
  • Four papers on object detection and segmentation are accepted to CVPR 2021: Sparse R-CNN, DenseCL, Locate-Segment, and Auto-Augment. DenseCL is accepted as an Oral.
  • The paper on finding proper molecules for drugs is accepted to ICLR 2021 as a spotlight presentation!
  • Six papers are accepted to AAAI 2021, about end-to-end speech translation, knowledge graph completion, optimization, and text generation.
  • One paper about a new method to generate query-relevant bidwords for search advertising is accepted to WSDM 2021.
  • SOLOv2 is out! One paper about faster object instance segmentation in images is accepted to NeurIPS 2020.
  • Winner of 5 tasks in the WMT20 machine translation contest: Chinese-English, German-English, French-German, English-Khmer, and English-Pashto. Winner of the WMT20 parallel data filtering task on Khmer and Pashto.
  • 5 papers accepted to EMNLP 2020! 3 in the long paper track and 2 in Findings.
  • SOLO paper accepted to ECCV 2020, achieving SOTA results in visual object instance segmentation.
  • 1 paper accepted to ICML 2020, about solving a family of deep latent models (exponential family mixture VAEs).
  • 1 paper and 1 demo accepted to ACL 2020, about tailoring pretrained language models and the robot reporter Xiaomingbot.
  • I am giving a talk at ICLR 2020 about Learning Deep Latent Models for Text Sequences. A recording of the talk is available.
  • 1 paper accepted to AISTATS 2020, about density ratio estimation for text generation.
  • 2 papers accepted at ICLR 2020: one about a mirror generative model that unites language modelling and machine translation, and one about learning data-to-text generation templates with a variational method, even without a parallel corpus.
  • 4 papers accepted at AAAI 2020, about a pretraining method for neural machine translation, text editing, and approximate second-order optimization.
  • 1 paper accepted at NeurIPS 2019, about contextualized embeddings for text generation, using kernels to model the distribution and variance of word embeddings. See you in Vancouver!
  • EMNLP 2019 Tutorial on Discreteness in NLP
  • 1 paper accepted at INLG 2019, about style transfer for text generation.
  • 1 paper accepted at EMNLP 2019, about linear time neural machine translation.
  • 2 papers accepted at ICCV 2019. One is to be presented as an Oral talk.
  • Dr. Hao Zhou and I are going to give a tutorial on deep generative models for text generation at NLPCC-ADL 2019 in Dunhuang, China.

Media Coverage


Email: <the first part of this website> + gmail server address.