XATU: A Fine-grained Instruction-based Benchmark for Explainable Text Updates

09/20/2023
by Haopeng Zhang, et al.

Text editing is a crucial task that involves modifying text to better align with user intents. However, existing text editing benchmark datasets provide only coarse-grained instructions. As a result, although the edited output may seem reasonable, it often deviates from the intended changes specified in the gold reference, resulting in low evaluation scores. To comprehensively investigate the text editing capabilities of large language models, this paper introduces XATU, the first benchmark specifically designed for fine-grained, instruction-based, explainable text editing. XATU covers a wide range of topics and text types, incorporating lexical, syntactic, semantic, and knowledge-intensive edits. To enhance interpretability, we leverage high-quality data sources and human annotation, yielding a benchmark that includes fine-grained instructions and gold-standard edit explanations. By evaluating existing open and closed large language models on our benchmark, we demonstrate the effectiveness of instruction tuning and the impact of the underlying architecture across various editing tasks. Furthermore, extensive experiments reveal the significant role of explanations in fine-tuning language models for text editing tasks. The benchmark will be open-sourced to support reproduction and facilitate future research.
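
To make the task format concrete, below is a minimal sketch of what a fine-grained, explainable editing instance and its prompt might look like. The EditExample schema, the build_prompt helper, and the toy record are illustrative assumptions inferred from the abstract's description (source text, fine-grained instruction, gold edit, gold explanation); they are not the paper's actual data format or code.

```python
from dataclasses import dataclass


@dataclass
class EditExample:
    """One hypothetical XATU-style instance: source text, a fine-grained
    instruction, the gold edited text, and a gold edit explanation.
    (Field names are assumptions, not the released schema.)"""
    source: str
    instruction: str
    reference: str
    explanation: str


def build_prompt(ex: EditExample, with_explanation: bool = True) -> str:
    """Assemble an instruction-following prompt; asking the model to
    explain its edit mirrors the explanation-augmented setting the
    abstract describes."""
    prompt = (
        f"Instruction: {ex.instruction}\n"
        f"Text: {ex.source}\n"
    )
    if with_explanation:
        prompt += "First explain the required change, then output the edited text.\n"
    else:
        prompt += "Output only the edited text.\n"
    return prompt


# Toy record for illustration only; not drawn from the XATU data.
example = EditExample(
    source="The experiment was ran twice.",
    instruction="Fix the verb form error in 'was ran'.",
    reference="The experiment was run twice.",
    explanation="'ran' is the simple past; the passive voice requires "
                "the past participle 'run'.",
)

print(build_prompt(example))
```

Under this framing, the model's edited text would be scored against the reference, and in the explanation-augmented setting its stated rationale could additionally be compared with the gold explanation.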

