
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark

by Liang Xu, et al.

Pretrained Language Models (PLMs) have achieved tremendous success in natural language understanding tasks. While different learning schemes – fine-tuning, zero-shot and few-shot learning – have been widely explored and compared for languages such as English, there is comparatively little work in Chinese that fairly and comprehensively evaluates and compares these methods. This work first introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive small-sample evaluation benchmark in Chinese. It includes nine tasks, ranging from single-sentence and sentence-pair classification tasks to machine reading comprehension tasks. Given the high variance of few-shot learning performance, we provide multiple training/validation sets to facilitate a more accurate and stable evaluation of few-shot modeling. An unlabeled training set with up to 20,000 additional samples per task is provided, allowing researchers to explore better ways of using unlabeled samples. Next, we implement a set of state-of-the-art (SOTA) few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compare their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark. Our results show that: 1) all five few-shot learning methods exhibit better performance than fine-tuning or zero-shot learning; 2) among the five methods, PET is the best-performing few-shot method; 3) few-shot learning performance is highly dependent on the specific task. Our benchmark and code are available at



