Software and Toolbox

  • LightSeq: a high-performance training and inference library for Transformer models. It is widely used for machine translation, text generation, visual recognition, and more. With its custom CUDA implementation, it achieves a 10x speed-up over the original TensorFlow seq2seq package and is faster than other implementations.
  • NeurST: a toolbox with readily available models for neural machine translation and speech-to-text translation.
  • BLOG: a probabilistic programming language for machine learning
  • Swift: a compiler for the probabilistic programming language BLOG.
  • DynaMMo: a learning toolbox for multi-dimensional co-evolving time series. [GitHub page]
  • CLDS: complex-valued linear dynamical system
  • PLiF: time-shift-invariant feature extraction for time series
  • BoLeRO: occlusion recovery for human motion capture
  • paralearn: a parallel algorithm for learning Markov models and linear dynamical systems (i.e., Kalman filters)
  • MLDS: learning dynamical models for tensor time series

Dataset

  • TTNews: a dataset for Chinese document summarization. It contains 50,000 news articles with summaries for training and 4,000 news articles for testing. [Task description] [Training data] [Testing data and evaluation script] [Reports from NLPCC2017 and NLPCC2018]
  • CNewSum: an extended version of TTNews for Chinese document summarization. It includes 304,307 documents with human-written summaries, along with additional adequacy-level and deducibility-level labels. [Project URL]
  • MLGSum: a multilingual text summarization corpus with 1.2 million articles in 12 languages. The average article length is 570 words. [Project URL] [Data]

Please send me an email if you find bugs or have comments!