Towards objectively evaluating the quality of generated medical summaries

04/09/2021
by Francesco Moramarco, et al.

We propose a method for evaluating the quality of generated text by asking evaluators to count facts, then computing precision, recall, F-score, and accuracy from the raw counts. We believe this approach leads to a more objective and more easily reproducible evaluation. We apply it to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.
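The abstract does not spell out how the raw fact counts map onto the four metrics, but a common formulation treats correctly reported facts, incorrectly reported facts, and omitted facts analogously to true positives, false positives, and false negatives. A minimal sketch under that assumption (the count names and exact definitions are hypothetical, not taken from the paper):

```python
def fact_metrics(correct, incorrect, omitted):
    """Compute precision, recall, F-score, and accuracy from raw fact counts.

    correct:   facts in the generated summary supported by the source
    incorrect: facts in the generated summary that are wrong or unsupported
    omitted:   facts in the source that the summary fails to mention
    (Assumed definitions; the paper may count facts differently.)
    """
    precision = correct / (correct + incorrect) if (correct + incorrect) else 0.0
    recall = correct / (correct + omitted) if (correct + omitted) else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # One plausible accuracy: correct facts over all facts involved.
    total = correct + incorrect + omitted
    accuracy = correct / total if total else 0.0
    return precision, recall, f_score, accuracy
```

For example, a summary with 8 correct facts, 2 incorrect facts, and 2 omissions would score precision 0.8, recall 0.8, F-score 0.8, and accuracy 8/12 under these definitions.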

Related research

06/22/2020
Shared Task on Evaluating Accuracy in Natural Language Generation
We propose a shared task on methodologies and algorithms for evaluating ...

11/08/2020
A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Most Natural Language Generation systems need to produce accurate texts....

10/12/2022
Perplexity from PLM Is Unreliable for Evaluating Text Quality
Recently, amounts of works utilize perplexity (PPL) to evaluate the qual...

03/15/2023
FactReranker: Fact-guided Reranker for Faithful Radiology Report Summarization
Automatic radiology report summarization is a crucial clinical task, who...

05/23/2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Evaluating the factuality of long-form text generated by large language ...

05/30/2019
Assessing The Factual Accuracy of Generated Text
We propose a model-based metric to estimate the factual accuracy of gene...
