Tsinghua NLP | 28 Must-Read Papers on Pre-trained Language Models for NLP, with Downloadable Source Code and Papers

This article introduces the must-read paper list on pre-trained language models compiled by the Tsinghua University NLP group in the GitHub project thunlp/PLMpapers, which collects PDF links, source code, and pre-trained models for each paper.

Project address:
  • https://github.com/thunlp/PLMpapers

The list is as follows:

  1. Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. NAACL 2018.

  • Paper: https://arxiv.org/pdf/1802.05365.pdf

  • Project: https://allennlp.org/elmo (ELMo; a usage sketch appears after the list)

  2. Universal Language Model Fine-tuning for Text Classification. Jeremy Howard and Sebastian Ruder. ACL 2018.

  • Paper: https://www.aclweb.org/anthology/P18-1031

  • Project: http://nlp.fast.ai/category/classification.html (ULMFiT)

  3. Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Preprint.

  • Paper: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

  • Project: https://openai.com/blog/language-unsupervised/ (GPT)

  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. NAACL 2019.

  • Paper: https://arxiv.org/pdf/1810.04805.pdf

  • Code + models: https://github.com/google-research/bert

  5. Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Preprint.

  • Paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

  • Code: https://github.com/openai/gpt-2 (GPT-2)

  6. ERNIE: Enhanced Language Representation with Informative Entities. Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ACL 2019.

  • Paper: https://www.aclweb.org/anthology/P19-1139

  • Code + models: https://github.com/thunlp/ERNIE (ERNIE (Tsinghua))

  7. ERNIE: Enhanced Representation through Knowledge Integration. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian and Hua Wu. Preprint.

  • Paper: https://arxiv.org/pdf/1904.09223.pdf

  • Code: https://github.com/PaddlePaddle/ERNIE/tree/develop/ERNIE (ERNIE (Baidu))

  8. Defending Against Neural Fake News. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. NeurIPS 2019.

  • Paper: https://arxiv.org/pdf/1905.12616.pdf

  • Project: https://rowanzellers.com/grover/ (Grover)

  9. Cross-lingual Language Model Pretraining. Guillaume Lample, Alexis Conneau. NeurIPS 2019.

  • Paper: https://arxiv.org/pdf/1901.07291.pdf

  • Code + models: https://github.com/facebookresearch/XLM (XLM)

  10. Multi-Task Deep Neural Networks for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. ACL 2019.

  • Paper: https://www.aclweb.org/anthology/P19-1441

  • Code + models: https://github.com/namisan/mt-dnn (MT-DNN)

  11. MASS: Masked Sequence to Sequence Pre-training for Language Generation. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. ICML 2019.

  • Paper: https://arxiv.org/pdf/1905.02450.pdf

  • Code + models: https://github.com/microsoft/MASS

  12. Unified Language Model Pre-training for Natural Language Understanding and Generation. Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Preprint.

  • Paper: https://arxiv.org/pdf/1905.03197.pdf (UniLM)

  13. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. NeurIPS 2019.

  • Paper: https://arxiv.org/pdf/1906.08237.pdf

  • Code + models: https://github.com/zihangdai/xlnet

  14. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Preprint.

  • Paper: https://arxiv.org/pdf/1907.11692.pdf

  • Code + models: https://github.com/pytorch/fairseq (a RoBERTa usage sketch appears after the list)

  15. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. Preprint.

  • Paper: https://arxiv.org/pdf/1907.10529.pdf

  • Code + models: https://github.com/facebookresearch/SpanBERT

  16. Knowledge Enhanced Contextual Word Representations. Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. EMNLP 2019.

  • Paper: https://arxiv.org/pdf/1909.04164.pdf (KnowBert)

  17. VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. Preprint.

  • Paper: https://arxiv.org/pdf/1908.03557.pdf

  • Code + models: https://github.com/uclanlp/visualbert

  18. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. NeurIPS 2019.

  • Paper: https://arxiv.org/pdf/1908.02265.pdf

  • Code + models: https://github.com/jiasenlu/vilbert_beta

  19. VideoBERT: A Joint Model for Video and Language Representation Learning. Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. ICCV 2019.

  • Paper: https://arxiv.org/pdf/1904.01766.pdf

  20. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Hao Tan, Mohit Bansal. EMNLP 2019.

  • Paper: https://arxiv.org/pdf/1908.07490.pdf

  • Code + models: https://github.com/airsplay/lxmert

  21. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. Preprint.

  • Paper: https://arxiv.org/pdf/1908.08530.pdf

  22. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training. Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou. Preprint.

  • Paper: https://arxiv.org/pdf/1908.06066.pdf

  23. K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. Preprint.

  • Paper: https://arxiv.org/pdf/1909.07606.pdf

  24. Fusion of Detected Objects in Text for Visual Question Answering. Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter. EMNLP 2019.

  • Paper: https://arxiv.org/pdf/1908.05054.pdf (B2T2)

  25. Contrastive Bidirectional Transformer for Temporal Representation Learning. Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Preprint.

  • Paper: https://arxiv.org/pdf/1906.05743.pdf (CBT)

  26. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. Preprint.

  • Paper: https://arxiv.org/pdf/1907.12412v1.pdf

  • Code: https://github.com/PaddlePaddle/ERNIE/blob/develop/README.md

  27. 75 Languages, 1 Model: Parsing Universal Dependencies Universally. Dan Kondratyuk, Milan Straka. EMNLP 2019.

  • Paper: https://arxiv.org/pdf/1904.02099.pdf

  • Code + models: https://github.com/hyperparticle/udify (UDify)

  28. Pre-Training with Whole Word Masking for Chinese BERT. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. Preprint.

  • Paper: https://arxiv.org/pdf/1906.08101.pdf

  • Code + models: https://github.com/ymcui/Chinese-BERT-wwm/blob/master/README_EN.md (Chinese-BERT-wwm)
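
To make the code and model links above a bit more concrete, two short usage sketches follow. First, for ELMo (entry 1): a minimal sketch of computing contextual embeddings with the AllenNLP library linked from https://allennlp.org/elmo. It assumes an AllenNLP release that still ships the allennlp.modules.elmo module, and the options/weights file names below are placeholders for the files downloaded from that page.

    from allennlp.modules.elmo import Elmo, batch_to_ids

    # Placeholder file names: download the options/weights files from https://allennlp.org/elmo
    OPTIONS_FILE = "elmo_options.json"
    WEIGHT_FILE = "elmo_weights.hdf5"

    # One output representation (a learned mix of the biLM layers); no dropout at inference time
    elmo = Elmo(OPTIONS_FILE, WEIGHT_FILE, num_output_representations=1, dropout=0.0)

    sentences = [["The", "quick", "brown", "fox", "."], ["Hello", "world", "!"]]
    character_ids = batch_to_ids(sentences)          # character ids, shape (batch, max_len, 50)
    output = elmo(character_ids)
    embeddings = output["elmo_representations"][0]   # contextual vectors, shape (batch, max_len, dim)
    print(embeddings.shape)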

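Second, for RoBERTa (entry 14): a minimal feature-extraction sketch using the torch.hub entry points documented in the pytorch/fairseq repository linked above. It assumes PyTorch is installed and that the checkpoint can be downloaded on first use; fine-tuning is not shown.

    import torch

    # Load the pre-trained RoBERTa base checkpoint released with fairseq
    # (https://github.com/pytorch/fairseq); downloaded automatically on first use.
    roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
    roberta.eval()  # disable dropout so the extracted features are deterministic

    tokens = roberta.encode('Hello world!')      # BPE-encode the sentence and add special tokens
    features = roberta.extract_features(tokens)  # last-layer features, shape (1, seq_len, 768)
    print(features.shape)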