ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

05/23/2023
by Uri Shaham, et al.

We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts. It contains only test sets, with no training or development data. We adapt six tasks from the SCROLLS benchmark and add four new datasets, including two novel information-fusing tasks, one of which asks the model to aggregate the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive evaluation of both open-source and closed large language models, finding that Claude outperforms ChatGPT and that GPT-4 achieves the highest average score. However, there is still room for improvement on multiple open challenges in ZeroSCROLLS, such as aggregation tasks, where models struggle to pass the naive baseline. As the state of the art is a moving target, we invite researchers to evaluate their ideas on the live ZeroSCROLLS leaderboard.

