BERTScore: Evaluating Text Generation with BERT

04/21/2019
by   Tianyi Zhang, et al.

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of looking for exact matches, we compute similarity using contextualized BERT embeddings. We evaluate on several machine translation and image captioning benchmarks and show that BERTScore correlates better with human judgments than existing metrics, often significantly outperforming even task-specific supervised metrics.
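
To make the mechanism concrete, below is a minimal sketch of the token-matching idea: embed both sentences with BERT, take pairwise cosine similarities, and greedily match each token to its most similar counterpart to form precision, recall, and F1. This is not the authors' reference implementation (their released package is bert-score); the model choice, the dropping of special tokens, and the omission of importance weighting are simplifying assumptions made here for brevity.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed choice; the paper evaluates several encoders
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(sentence):
    # Contextual embeddings for each wordpiece; [CLS]/[SEP] are dropped so
    # only content tokens participate in matching (a simplification).
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0, 1:-1]
    # Unit-normalize so dot products equal cosine similarities.
    return hidden / hidden.norm(dim=-1, keepdim=True)

def bertscore(candidate, reference):
    cand, ref = embed(candidate), embed(reference)
    sim = cand @ ref.T                               # pairwise cosine similarity matrix
    precision = sim.max(dim=1).values.mean().item()  # best reference match per candidate token
    recall = sim.max(dim=0).values.mean().item()     # best candidate match per reference token
    f1 = 2 * precision * recall / (precision + recall)
    return {"P": precision, "R": recall, "F1": f1}

print(bertscore("the weather is freezing today", "it is very cold today"))

Because matching happens in embedding space, paraphrases such as "freezing" and "cold" can receive high similarity even though an exact-match metric like BLEU would find no overlap between them.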

Related Research

04/30/2020
NUBIA: NeUral Based Interchangeability Assessor for Text Generation
We present NUBIA, a methodology to build automatic evaluation metrics fo...

06/10/2021
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Automatic evaluations for natural language generation (NLG) conventional...

11/01/2019
Kernelized Bayesian Softmax for Text Generation
Neural models for text generation require a softmax layer with proper to...

02/10/2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Since the rise of neural models of code that can generate long expressio...

08/06/2021
Sentence Semantic Regression for Text Generation
Recall the classical text generation works, the generation framework can...

10/13/2020
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance
Recent advances in automatic evaluation metrics for text have shown that...

08/27/2021
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
A new metric to evaluate text generation based on deep contextualized e...
