ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation

04/16/2021
by   Weizhen Qi, et al.
0

Now, the pre-training technique is ubiquitous in natural language processing field. ProphetNet is a pre-training based natural language generation method which shows powerful performance on English text summarization and question generation tasks. In this paper, we extend ProphetNet into other domains and languages, and present the ProphetNet family pre-training models, named ProphetNet-X, where X can be English, Chinese, Multi-lingual, and so on. We pre-train a cross-lingual generation model ProphetNet-Multi, a Chinese generation model ProphetNet-Zh, two open-domain dialog generation models ProphetNet-Dialog-En and ProphetNet-Dialog-Zh. And also, we provide a PLG (Programming Language Generation) model ProphetNet-Code to show the generation performance besides NLG (Natural Language Generation) tasks. In our experiments, ProphetNet-X models achieve new state-of-the-art performance on 10 benchmarks. All the models of ProphetNet-X share the same model structure, which allows users to easily switch between different models. We make the code and models publicly available, and we will keep updating more pre-training models and finetuning scripts. A video to introduce ProphetNet-X usage is also released.

READ FULL TEXT
research
10/18/2020

Mixed-Lingual Pre-training for Cross-lingual Summarization

Cross-lingual Summarization (CLS) aims at producing a summary in the tar...
research
04/03/2020

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

In this paper, we introduce XGLUE, a new benchmark dataset to train larg...
research
12/13/2022

ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

Software engineers working with the same programming language (PL) may s...
research
09/23/2020

Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

Sequence labelling tasks like Dialog Act and Emotion/Sentiment identific...
research
09/14/2023

Automatic Data Visualization Generation from Chinese Natural Language Questions

Data visualization has emerged as an effective tool for getting insights...
research
09/05/2022

Multi-Figurative Language Generation

Figurative language generation is the task of reformulating a given text...
research
07/31/2023

Structural Transfer Learning in NL-to-Bash Semantic Parsers

Large-scale pre-training has made progress in many fields of natural lan...

Please sign up or login with your details

Forgot password? Click here to reset