WYWEB: An NLP Evaluation Benchmark For Classical Chinese

05/23/2023
by   Bo Zhou, et al.

To fully evaluate the overall performance of different NLP models in a given domain, many evaluation benchmarks have been proposed, such as GLUE, SuperGLUE, and CLUE. The field of natural language understanding has traditionally focused on benchmarks for various tasks in Chinese, English, and multilingual settings; however, little attention has been given to classical Chinese, also known as "wen yan wen", which has a rich history spanning thousands of years and holds significant cultural and academic value. For the prosperity of the NLP community, in this paper we introduce the WYWEB evaluation benchmark, which consists of nine NLP tasks in classical Chinese, covering sentence classification, sequence labeling, reading comprehension, and machine translation. We evaluate existing pre-trained language models, all of which struggle with this benchmark. We also introduce a number of supplementary datasets and additional tools to facilitate further progress on classical Chinese NLU. The GitHub repository is https://github.com/baudzhou/WYWEB.


