Software and Toolbox

  • LightSeq: a high-performance inference library for sequence modelling, generation, and machine translation. With its custom CUDA implementation, it achieves a 10x speed-up over the original TensorFlow seq2seq package and is faster than other implementations.
  • NeurST: a toolbox with readily available models for neural machine translation and speech-to-text translation.
  • BLOG: a probabilistic programming language for machine learning
  • Swift: a compiler for the probabilistic programming language BLOG.
  • DynaMMo: a learning toolbox for multi-dimensional co-evolving time series. [GitHub page]
  • CLDS: complex-valued linear dynamical system
  • PLiF: time-shift-invariant feature extraction for time series
  • BoLeRO: occlusion recovery for human motion capture
  • paralearn: a parallel algorithm for learning Markov models and linear dynamical systems (i.e., Kalman filters)
  • MLDS: learning dynamical models for tensor time series


  • TTNews: a dataset for Chinese document summarization. It contains 50,000 news articles with summaries for training and 4,000 news articles for testing. [Task description] [Training data] [Testing data and evaluation script] [Reports from NLPCC2017 and NLPCC2018]
  • CNewSum: an extended version of TTNews for Chinese document summarization. It includes 304,307 documents with human-written summaries, along with additional adequacy-level and deducibility-level labels. [Project URL]
  • MLGSum: a multilingual text summarization corpus with 1.2 million articles in 12 languages. The average article length is 570 words. [Project URL] [Data]

Please email me if you find bugs or have comments!