LMentry: A Language Model Benchmark of Elementary Language Tasks

11/03/2022
by Avia Efrat, et al.

As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI's latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run "unit test", without resorting to large benchmark suites of complex tasks.
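To make the "unit test" idea concrete, here is a minimal sketch of how two of the tasks named above (writing a sentence that contains a given word, and choosing the longer of two words) could be checked automatically. This is an illustration under stated assumptions, not LMentry's actual scoring code: the function names `contains_word`, `longer_word`, and `answer_names_longer_word` are hypothetical, and the matching rules (case-insensitive whole-word matching) are a simplification chosen for the example.

```python
import re


def contains_word(sentence: str, word: str) -> bool:
    """Return True if `word` occurs in `sentence` as a whole word (case-insensitive).

    Hypothetical helper: whole-word matching is an assumption of this sketch.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def longer_word(a: str, b: str) -> str:
    """Return the longer of two words; this sketch assumes their lengths differ."""
    if len(a) == len(b):
        raise ValueError("words have equal length; the task is ill-defined")
    return a if len(a) > len(b) else b


def answer_names_longer_word(answer: str, a: str, b: str) -> bool:
    """Accept a free-text answer only if it mentions the longer word and not the other."""
    expected = longer_word(a, b)
    other = b if expected == a else a
    return contains_word(answer, expected) and not contains_word(answer, other)


if __name__ == "__main__":
    # A model output for "write a sentence containing the word 'apple'".
    print(contains_word("I ate an apple today.", "apple"))
    # A model output for "which word is longer, 'cat' or 'elephant'?".
    print(answer_names_longer_word("The longer word is elephant.", "cat", "elephant"))
```

Checks like these run in microseconds and give a pass/fail verdict per output, which is what makes a benchmark of this kind quick to run and easy to interpret. A real evaluation would likely need more permissive matching to handle the varied phrasings a model can produce.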


