AnchiBERT: A Pre-Trained Model for Ancient Chinese Language Understanding and Generation

09/24/2020
by Huishuang Tian, et al.

Ancient Chinese is the essence of Chinese culture. Several natural language processing tasks target the ancient Chinese domain, such as ancient-modern Chinese translation, poem generation, and couplet generation. Previous studies usually adopt supervised models that rely heavily on parallel data, but large-scale parallel data for ancient Chinese are difficult to obtain. To make full use of the more easily available monolingual ancient Chinese corpora, we release AnchiBERT, a pre-trained language model based on the BERT architecture and trained on large-scale ancient Chinese corpora. We evaluate AnchiBERT on both language understanding and generation tasks, including poem classification, ancient-modern Chinese translation, poem generation, and couplet generation. The experimental results show that AnchiBERT outperforms BERT as well as non-pretrained models and achieves state-of-the-art results in all cases.
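To make the fine-tuning setup concrete, below is a minimal sketch of adapting a BERT-style checkpoint to one of the understanding tasks (poem classification), using the Hugging Face transformers API. The checkpoint path "path/to/anchibert" and the four-class label count are placeholder assumptions, not details from the paper; substitute the released AnchiBERT weights and the actual label set.

```python
# A minimal fine-tuning sketch, assuming a BERT-compatible AnchiBERT
# checkpoint is available on disk. "path/to/anchibert" is a placeholder.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("path/to/anchibert")
# num_labels=4 is a hypothetical number of poem categories for illustration.
model = BertForSequenceClassification.from_pretrained(
    "path/to/anchibert", num_labels=4
)

# Toy inference example: classify a single line of classical Chinese verse.
inputs = tokenizer("床前明月光，疑是地上霜。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(predicted_class)
```

For the generation tasks (translation, poem and couplet generation), the same pretrained weights would typically initialize the encoder of a sequence-to-sequence model before supervised fine-tuning on the available parallel data.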
