ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

05/03/2022
by   Junyi Li, et al.

Nowadays, pretrained language models (PLMs) have dominated the majority of NLP tasks. However, little research has been conducted on systematically evaluating the language abilities of PLMs. In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM). In our study, we design four evaluation dimensions, i.e., memory, comprehension, reasoning, and composition, to measure ten widely-used PLMs within five categories. Our empirical results demonstrate that: (1) PLMs with varying training objectives and strategies are good at different ability tests; (2) fine-tuning PLMs in downstream tasks is usually sensitive to data size and distribution; (3) PLMs have excellent transferability between similar tasks. Moreover, the prediction results of PLMs in our experiments are released as an open resource for deeper and more detailed analysis of the language abilities of PLMs. This paper can guide future work in selecting, applying, and designing PLMs for specific tasks. We have made all the details of our experiments publicly available at https://github.com/RUCAIBox/ElitePLM.
