Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

Recent model editing techniques promise to mitigate the problem of memorizing false or outdated associations during LLM training. However, we show that these techniques can introduce large unwanted side effects that are not detected by existing specificity benchmarks. We extend the existing CounterFact benchmark to include a dynamic component and dub our benchmark CounterFact+. Additionally, we extend the metrics used for measuring specificity with a principled KL divergence-based metric. We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity. Our findings highlight the need for improved specificity benchmarks that identify and prevent unwanted side effects.
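
A minimal sketch of a KL divergence-based specificity check follows. It assumes the metric compares the unedited and edited model's next-token distributions on neighborhood prompts (prompts about subjects the edit should not affect) and averages KL(P_pre || P_post) over those prompts. The helper names `softmax`, `kl_divergence`, and `neighborhood_kl`, the KL direction, and the `eps` floor are illustrative assumptions, not the benchmark's reference implementation. In practice the distributions would come from the two models' logits over the full vocabulary.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an assumption)
    so a token the edited model drives to zero does not yield infinity.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same vocabulary")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def neighborhood_kl(pre_edit: Sequence[Sequence[float]],
                    post_edit: Sequence[Sequence[float]]) -> float:
    """Hypothetical specificity score: mean KL(pre || post) over neighborhood prompts.

    pre_edit[i] and post_edit[i] are next-token distributions for the i-th
    neighborhood prompt under the unedited and edited model, respectively.
    """
    if len(pre_edit) != len(post_edit) or not pre_edit:
        raise ValueError("need the same, non-zero number of prompts for both models")
    scores = [kl_divergence(p, q) for p, q in zip(pre_edit, post_edit)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 3-token vocabulary; real use would take the models' full-vocabulary logits.
    pre = [softmax([2.0, 1.0, 0.0]), softmax([0.5, 0.5, 0.0])]
    untouched = [list(d) for d in pre]  # edit left both neighbors unchanged
    shifted = [softmax([0.0, 1.0, 2.0]), softmax([0.5, 0.5, 0.0])]  # first neighbor disturbed
    print(neighborhood_kl(pre, untouched))  # → 0.0
    print(neighborhood_kl(pre, shifted))    # positive: the edit leaked into a neighbor
```

A score of 0 means the edit left the neighborhood distributions untouched, and larger values indicate side effects. Unlike an accuracy check on the top-ranked token alone, a KL score also registers shifts in probability mass that do not change the argmax.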
