This is not correct! Negation-aware Evaluation of Language Generation Systems

07/26/2023
by Miriam Anschütz et al.

Large language models underestimate how much a negation changes the meaning of a sentence. As a result, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. To build it, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Using this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluation on existing benchmarks shows that our fine-tuned models outperform existing metrics by a wide margin on negated sentences while preserving their base models' performance on other perturbations.
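To give a rough sense of what a rule-based negation step could look like, here is a minimal Python sketch. It is not the authors' tool: the `AUXILIARIES` list and the `negate` function are hypothetical, and the only rule is to toggle "not" on the first matching auxiliary verb. A real tool would need many more rules, for example for contractions and main verbs without an auxiliary.

```python
import re

# Hypothetical, deliberately tiny rule set (an assumption for illustration,
# not taken from the paper): auxiliaries that can be followed by "not".
AUXILIARIES = (
    "is", "are", "was", "were", "can", "could", "will", "would",
    "should", "must", "has", "have", "had", "does", "do", "did",
)

_AUX = "|".join(AUXILIARIES)
# An auxiliary that is already negated, e.g. "is not".
_NEGATED = re.compile(rf"\b({_AUX}) not\b", re.IGNORECASE)
# A bare auxiliary, e.g. "is".
_PLAIN = re.compile(rf"\b({_AUX})\b", re.IGNORECASE)


def negate(sentence: str) -> str:
    """Toggle negation on one auxiliary verb; return the input unchanged if none is found."""
    if _NEGATED.search(sentence):
        # Already negated: drop the first "not" that follows an auxiliary.
        return _NEGATED.sub(r"\1", sentence, count=1)
    # Otherwise insert "not" after the first auxiliary.
    return _PLAIN.sub(r"\1 not", sentence, count=1)


if __name__ == "__main__":
    print(negate("This is correct."))
    print(negate("This is not correct."))
```

Pairs produced this way (a sentence and its negation) are the kind of input a negation-sensitive metric would be expected to score as dissimilar, even though the two strings differ by a single word.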
