KLUE: Korean Language Understanding Evaluation

05/20/2021
by Sungjoon Park, et al.

We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks: Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, so that anyone can access the benchmark without restrictions. With ethical considerations in mind, we carefully design annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We furthermore release the pretrained language models (PLMs) KLUE-BERT and KLUE-RoBERTa to help reproduce the baseline models on KLUE and thereby facilitate future research. Preliminary experiments with the proposed benchmark suite yield a few interesting observations that already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms the other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective for tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation of creating KLUE will facilitate creating similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com/.
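The tokenization finding lends itself to a short illustration. Below is a minimal sketch of morpheme-level pre-tokenization followed by BPE, assuming MeCab-ko (via konlpy) as the morpheme analyzer and the klue/roberta-base tokenizer published on the Hugging Face Hub; both choices are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch: morpheme-level pre-tokenization followed by BPE.
# Assumptions: MeCab-ko is installed and usable through konlpy, and the
# released KLUE tokenizer is available as "klue/roberta-base" on the
# Hugging Face Hub. Neither is prescribed by the abstract itself.
from konlpy.tag import Mecab
from transformers import AutoTokenizer

mecab = Mecab()
bpe = AutoTokenizer.from_pretrained("klue/roberta-base")

text = "한국어 자연어 이해 벤치마크"  # "Korean natural language understanding benchmark"

# Step 1: split the sentence into morphemes.
morphemes = mecab.morphs(text)

# Step 2: apply BPE to the morpheme-segmented text, so subword merges
# cannot cross morpheme boundaries.
tokens = bpe.tokenize(" ".join(morphemes))

print(morphemes)
print(tokens)
```

For running a baseline on one of the tasks, the data is assumed to be accessible through the Hugging Face datasets hub under the klue name, e.g. load_dataset("klue", "ynat") for Topic Classification; the exact dataset identifiers are an assumption here, not stated in the abstract.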
