EditEval: An Instruction-Based Benchmark for Text Improvements

09/27/2022
by Jane Dwivedi-Yu, et al.

Evaluation of text generation to date has primarily focused on content created sequentially, rather than on improvements to an existing piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in different modular skills, such as fixing outdated information or making the style more consistent. Even so, comprehensive evaluation of a model's capacity to perform these skills and to edit remains sparse. This work presents EditEval: an instruction-based benchmark and evaluation suite that leverages high-quality existing and new datasets for automatic evaluation of editing capabilities, such as making text more cohesive and paraphrasing. Our evaluation of several pre-trained models shows that InstructGPT and PEER perform the best, but that most baselines fall below the supervised SOTA, particularly when neutralizing and updating information. Our analysis also shows that commonly used metrics for editing tasks do not always correlate well, and that optimizing for the prompts with the highest performance does not necessarily entail the strongest robustness to different models. Through the release of this benchmark and a publicly available leaderboard challenge, we hope to unlock future research in developing models capable of iterative and more controllable editing.
