BERTScore for Language Generation
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of looking for exact matches, we compute token similarity using contextualized BERT embeddings. We evaluate on several machine translation and image captioning benchmarks and show that BERTScore correlates better with human judgments than existing metrics, often significantly outperforming even task-specific supervised metrics.
Automatic evaluation of natural language generation, for example in machine translation and caption generation, requires comparing candidate sentences to annotated references. The goal is to evaluate the semantic equivalence of the candidates and references. However, common methods rely on surface-form similarity only. For example, Bleu (bleu), the most common machine translation metric, simply counts n-gram overlap between the candidate and the annotated reference. While this provides a simple and general measure, it fails to capture much of the lexical and compositional diversity of natural language.
In this paper, we focus on sentence-level generation evaluation and introduce BERTScore, an evaluation metric based on pre-trained BERT contextual embeddings (bert). BERTScore computes the similarity between two sentences as a weighted aggregation of cosine similarities between their tokens.
BERTScore addresses three common pitfalls of n-gram-based methods (meteor). First, n-gram-based methods use exact string matching (e.g., in Bleu) or a cascade of matching heuristics (e.g., in Meteor (meteor)), and fail to robustly match paraphrases. For example, given the reference people like foreign cars, metrics like Bleu or Meteor incorrectly give a higher score to people like visiting places abroad than to consumers prefer imported cars. This is because the metrics fail to correctly identify paraphrased words, which leads to underestimation of performance when semantically-correct phrases are penalized for differing from the surface form of the reference sentence. In contrast to string matching, we compute cosine similarity using contextualized token embeddings, which have been shown to be effective for paraphrase detection (bert). Second, n-gram-based methods do not distinguish between tokens that are important or unimportant to the sentence meaning. For example, given the reference a child is playing, both the child is playing and a child is singing receive the same Bleu score. This often leads to performance overestimation, especially for models with strong language models that correctly generate function words. Instead of treating all tokens equally, we introduce a simple importance weighting scheme to emphasize words of higher significance to the sentence meaning. Finally, n-gram models fail to capture distant dependencies and penalize semantically-critical ordering changes (Isozaki10:autoeval). For example, given a small window, Bleu will only mildly penalize the swapping of cause and effect (e.g., A because B instead of B because A), especially when the arguments A and B are long phrases. In contrast, contextualized embeddings are trained to effectively capture distant dependencies and ordering in all the involved token embeddings.
We experiment with BERTScore on machine translation and image captioning tasks using multiple systems, correlating BERTScore and related metrics with available human judgments. Our experiments demonstrate that BERTScore correlates highly with human evaluations of the quality of machine translation and image captioning systems. In machine translation, BERTScore correlates better with segment-level human judgment than existing metrics on the common WMT17 benchmark (wmt17em), including outperforming metrics learned specifically for this dataset. We also show that BERTScore is well correlated with human annotators for image captioning, surpassing Spice (spice), a popular task-specific metric, on the twelve systems participating in the 2015 COCO Captioning Challenge (coco). Finally, we test the robustness of BERTScore on the adversarial paraphrase dataset PAWS (paws), and show that it is more robust to adversarial examples than other metrics. BERTScore is available at github.com/Tiiiger/bert_score.
Natural language text generation is commonly evaluated against annotated reference sentences. Given a reference sentence x and a candidate sentence x̂, a generation evaluation metric is a function that maps x and x̂ to a real number. The goal is to assign higher scores to sentences that are preferred by human judgments. Existing metrics can be broadly categorized into n-gram matching metrics, embedding-based metrics, and learned metrics.
The most commonly used metrics for text generation count the number of n-grams that occur in both the reference x and the candidate x̂. In general, the higher the n-gram order, the better the metric captures word order, but the metric also becomes more restrictive and constrained to the exact form of the reference.
Formally, let $S_x^n$ and $S_{\hat{x}}^n$ be the lists of $n$-grams in the reference and candidate sentences. The number of $n$-gram matches is
$$\mathrm{Match}_n = \sum_{w \in S_{\hat{x}}^n} \mathbb{I}\left[w \in S_x^n\right],$$
where $\mathbb{I}[\cdot]$ is an indicator function. The exact-match precision and recall are
$$P_n = \frac{\mathrm{Match}_n}{|S_{\hat{x}}^n|}, \qquad R_n = \frac{\mathrm{Match}_n}{|S_x^n|}.$$
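The exact-match counting above can be sketched in a few lines of Python (a minimal illustration; the function names are ours, not from any metric implementation):

```python
def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def exact_match_pr(reference, candidate, n):
    """Exact n-gram precision and recall: count candidate n-grams
    that also occur in the reference, then normalize by the number
    of candidate (precision) or reference (recall) n-grams."""
    ref_set = set(ngrams(reference, n))
    cand_ngrams = ngrams(candidate, n)
    matches = sum(1 for g in cand_ngrams if g in ref_set)
    precision = matches / len(cand_ngrams)
    recall = matches / len(ngrams(reference, n))
    return precision, recall
```

On the running example from the introduction, both the child is playing and a child is singing match three of four reference unigrams, illustrating how exact matching ignores which tokens matter.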
Several popular metrics build upon one or both of these exact matching scores.
The most widely used metric in machine translation is Bleu (bleu), which includes three modifications to the exact n-gram precision. First, each n-gram in the reference can be matched at most once. For example, if the reference is the sooner the better and the candidate is the the the, only two of the candidate's three words are matched for n = 1, instead of all three. Second, Bleu is designed as a corpus-level metric, where a set of reference-candidate pairs is evaluated as a group: the number of exact matches is accumulated over all pairs and divided by the total number of n-grams in all candidate sentences. Finally, Bleu introduces a brevity penalty that applies when this total number of n-grams across all candidate sentences is low. Typically, Bleu is computed for several values of n (e.g., n = 1, ..., 4), which are averaged geometrically. A smoothed variant, SentBleu (moses), is computed at the sentence level. In contrast to Bleu, BERTScore is not restricted to a maximum n-gram length, but instead relies on contextualized embeddings that are able to capture dependencies of unbounded length.
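A minimal sketch of these three modifications (clipped matching, corpus-level accumulation, and the brevity penalty combined with a geometric mean over n), assuming whitespace-tokenized input and omitting the smoothing used by SentBleu:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(pairs, max_n=4):
    """Simplified corpus-level Bleu over (reference, candidate)
    token-list pairs."""
    log_prec_sum = 0.0
    cand_len = sum(len(c) for _, c in pairs)
    ref_len = sum(len(r) for r, _ in pairs)
    for n in range(1, max_n + 1):
        matches, total = 0, 0
        for ref, cand in pairs:
            ref_c, cand_c = ngram_counts(ref, n), ngram_counts(cand, n)
            # Clipping: each reference n-gram is matched at most as many
            # times as it occurs in the reference.
            matches += sum(min(c, ref_c[g]) for g, c in cand_c.items())
            total += sum(cand_c.values())
        if matches == 0:
            return 0.0  # unsmoothed: any zero precision zeroes the score
        log_prec_sum += math.log(matches / total)
    # Brevity penalty for short candidate corpora.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / max(cand_len, 1))
    return bp * math.exp(log_prec_sum / max_n)
```

For the example above, clipping caps the matches of the the the against the sooner the better at two, since the occurs only twice in the reference.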
Meteor (meteor) computes unigram precision and recall while allowing backing-off from exact unigram matching to matching word stems, synonyms, and paraphrases. For example, running may match run if no exact match is possible. These non-exact matches rely on an external stemmer, a synonym lexicon, and a paraphrase table. The computation uses beam search to minimize the number of matched chunks (consecutive unigram matches). Meteor 1.5 (meteor1.5) distinguishes between content and function words and weighs their importance differently. It also applies importance weighting to the different matching types, including exact unigrams, stems, synonyms, and paraphrases. These parameters are tuned to maximize correlation with human judgments. Because Meteor requires external resources, only five languages are supported with the full feature set, including either synonym or paraphrase matching, and eleven are partially supported. Similar to Meteor, BERTScore is designed to allow relaxed matches. But instead of relying on external resources, BERTScore takes advantage of BERT embeddings, which are trained on large amounts of raw text and can easily be created for new languages. BERTScore also incorporates importance weighting, which is estimated from simple corpus statistics. In contrast to Meteor 1.5, BERTScore does not require any tuning to maximize correlation with human judgments.
NIST (nist) is a revised version of Bleu that weighs each n-gram differently and introduces an alternative brevity penalty. chrF (chrF) compares character n-grams in the reference and candidate sentences. chrF++ (chrF++) extends chrF to include word bigram matching. PER (per), WER (wer), CDER (cder), TER (ter), and TERp (TER-Plus) are metrics based on edit distance. CIDEr (cider) is an image captioning metric that computes the cosine similarity between tf-idf-weighted n-grams. Finally, Rouge (rouge) is a commonly used metric for summarization evaluation. Rouge-n (rouge) computes the n-gram recall (usually with n = 2), while Rouge-L is a variant of the unigram recall with the numerator replaced by the length of the longest common subsequence.
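For illustration, the longest common subsequence underlying the Rouge-L variant can be computed with textbook dynamic programming (a sketch, not the official Rouge toolkit):

```python
def lcs_length(ref, cand):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(cand) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, c in enumerate(cand, 1):
            # Extend the subsequence on a match, otherwise carry the best so far.
            dp[i][j] = dp[i-1][j-1] + 1 if r == c else max(dp[i-1][j], dp[i][j-1])
    return dp[len(ref)][len(cand)]

def rouge_l_recall(ref, cand):
    # Rouge-L: unigram recall with the match count replaced by the LCS length.
    return lcs_length(ref, cand) / len(ref)
```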
Word embeddings (word2vec; glove; fasttext; dai2017mixture; athiwaratkun2018probabilistic) are dense representations of tokens learned by optimizing objectives that follow the distributional hypothesis, where similar words are encouraged to be close to one another in the learned space. This property has been studied for generation evaluation. MEANT 2.0 (meant2) uses pre-trained word embeddings to compute lexical similarity and exploits shallow semantic parses to evaluate structural similarity. task-dialog-eval explore using average-pooling and max-pooling over word embeddings to construct sentence-level representations, which are used to compute the cosine similarity between the reference and candidate sentences. rus2012comparison also study greedy word embedding matching. In contrast to these methods, we use contextualized embeddings, which capture the specific use of a token in a sentence and, potentially, sequence information. We do not use external tools to generate linguistic structures, so our approach is relatively easy to apply to new languages. Our token-level computation allows us to visualize the matching and to weigh tokens differently according to their importance.
Learning-based metrics are usually trained to optimize correlation with human judgments. BEER (beer) uses a regression model based on character n-grams and word bigrams. BLEND (blend) employs SVM regression to combine 29 existing metrics for English. RUSE (ruse) uses a multi-layer perceptron regressor on top of three pre-trained sentence embedding models. All of these methods require human judgments as supervision, which are necessary for each dataset and costly to obtain. These models also risk poor generalization to new domains, even within a known language and task. Instead of regressing on human judgment scores, leic train a neural model that takes an image and a caption as inputs and predicts whether the caption is human-generated. One potential risk of this approach is that it is optimized against existing models and may generalize poorly to new models. In contrast, the parameters of the BERT model underlying BERTScore are not optimized for any specific evaluation task. We also do not require access to images, providing an approach that applies to both text-only and multi-modal tasks.
Given a reference sentence x and a candidate sentence x̂, we use contextual embeddings to represent the tokens and compute a weighted matching using cosine similarity and inverse document frequency (idf) scores.
We use BERT contextual embeddings to represent the tokens in the input sentences x and x̂. In contrast to word embeddings (word2vec; glove), contextual embeddings (elmo; bert) give the same word different vector representations in different sentences. BERT uses a Transformer encoder (transformer) trained on masked language modeling and next-sentence prediction tasks. Pre-trained BERT embeddings have been shown to benefit various NLP tasks, including natural language inference, sentiment analysis, paraphrase detection, question answering, and named entity recognition (bert), as well as summarization (bert-summarization), contextual emotion detection (bert-emotion), citation recommendation (bert-citation), and document retrieval (bert-retrieval). The BERT model tokenizes the input text into a sequence of word pieces (google16), where unknown words are split into several commonly observed sequences of characters. The contextualized embedding for each word piece is computed by repeatedly applying self-attention and nonlinear transformations in an alternating fashion. This process generates multiple layers of embedded representations. Following initial experiments, we use the ninth layer of the model. This is consistent with recent findings showing that the intermediate BERT layers may yield more semantically meaningful contextual embeddings than the final layer (alternate; linguistic-context). Appendix A studies the effect of layer choice. Given a reference x tokenized into word pieces, BERT generates a sequence of vectors ⟨x_1, ..., x_k⟩. Similarly, we map the tokenized candidate x̂ to ⟨x̂_1, ..., x̂_l⟩.
These vector representations enable a soft measure of similarity instead of exact string matching (bleu) or heuristic matching (meteor). We measure the quality of matching a reference word piece $x_i$ and a candidate word piece $\hat{x}_j$ using their cosine similarity: $x_i^\top \hat{x}_j / (\lVert x_i \rVert \, \lVert \hat{x}_j \rVert)$.
We use pre-normalized vectors, which reduces this calculation to the inner product $x_i^\top \hat{x}_j$.
While this measure compares word pieces in isolation from the rest of the two sentences, the embeddings themselves inherit their dependence on context from the BERT model.
Previous work on similarity measures demonstrated that rare words can be more indicative of sentence similarity than common words (meteor; cider). We incorporate importance weighting using inverse document frequency (idf) scores computed from the reference sentences in the test corpus. Given $M$ reference sentences $\{x^{(i)}\}_{i=1}^{M}$, the idf of a word piece $w$ is
$$\mathrm{idf}(w) = -\log \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}\left[w \in x^{(i)}\right].$$
We do not use the full tf-idf measure because we process single sentences, where the term frequency (tf) is likely 1. Because we use the reference sentences, the idf scores remain the same for all systems evaluated on a specific test set. For word pieces that do not appear in the reference sentences, we apply plus-one smoothing.
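A sketch of the idf computation; the exact form of the plus-one smoothing is our assumption (adding one to both the document count and the number of sentences), and the released implementation may differ:

```python
import math

def idf_weights(reference_corpus):
    """Build an idf function from a list of tokenized reference sentences:
    idf(w) = -log of the (smoothed) fraction of sentences containing w."""
    M = len(reference_corpus)
    df = {}
    for sent in reference_corpus:
        for w in set(sent):  # count each word piece once per sentence
            df[w] = df.get(w, 0) + 1
    def idf(w):
        # Plus-one smoothing keeps unseen word pieces finite (assumed form).
        return -math.log((df.get(w, 0) + 1) / (M + 1))
    return idf
```

Note that a word piece appearing in every reference sentence receives weight zero, while rare and unseen pieces receive the largest weights.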
The complete score matches each word piece in $x$ to a word piece in $\hat{x}$ to compute recall, and each word piece in $\hat{x}$ to a word piece in $x$ to compute precision; the two are combined into an F1 measure. We use greedy matching to maximize the matching similarity score, where each token is matched to the most similar token in the other sentence. In contrast to Bleu (Section 2), we identify matches using idf-weighted cosine similarity, which allows for approximate matching. For a reference $x$ and candidate $\hat{x}$, the recall, precision, and F1 scores are:
$$R_{\mathrm{BERT}} = \frac{\sum_{x_i \in x} \mathrm{idf}(x_i) \max_{\hat{x}_j \in \hat{x}} x_i^\top \hat{x}_j}{\sum_{x_i \in x} \mathrm{idf}(x_i)}, \qquad P_{\mathrm{BERT}} = \frac{\sum_{\hat{x}_j \in \hat{x}} \mathrm{idf}(\hat{x}_j) \max_{x_i \in x} x_i^\top \hat{x}_j}{\sum_{\hat{x}_j \in \hat{x}} \mathrm{idf}(\hat{x}_j)}, \qquad F_{\mathrm{BERT}} = 2\,\frac{P_{\mathrm{BERT}} \cdot R_{\mathrm{BERT}}}{P_{\mathrm{BERT}} + R_{\mathrm{BERT}}}.$$
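Putting the pieces together, greedy matching with idf weighting can be sketched as follows (a pure-Python illustration that operates on precomputed token vectors; producing the BERT embeddings themselves is outside the scope of the sketch):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def greedy_bertscore(ref_vecs, cand_vecs, ref_idf, cand_idf):
    """Recall, precision, and F1 from token embedding vectors.
    ref_vecs/cand_vecs: lists of vectors; ref_idf/cand_idf: matching
    lists of idf weights."""
    # Recall: match each reference token to its most similar candidate token.
    r = sum(w * max(cosine(x, y) for y in cand_vecs)
            for x, w in zip(ref_vecs, ref_idf)) / sum(ref_idf)
    # Precision: match each candidate token to its most similar reference token.
    p = sum(w * max(cosine(y, x) for x in ref_vecs)
            for y, w in zip(cand_vecs, cand_idf)) / sum(cand_idf)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```

With pre-normalized vectors, `cosine` reduces to a plain inner product, matching the simplification described above.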
We evaluate our approach on machine translation and image captioning. We focus on correlations with human judgments.
We use the uncased English model for English tasks, the Chinese model for Chinese tasks, and the cased multilingual model for other languages. Appendix A shows the effect of the BERT model choice.
We use the WMT17 metric evaluation dataset (wmt17em), which contains translation system outputs, gold reference translations, and two types of human judgment scores. Segment-level human judgments assign a score to each pair of output and reference. System-level human judgments associate each system with a single score based on all output-reference pairs in the test set. Metric quality is evaluated using the absolute Pearson correlation with human judgments. We compute system-level scores by averaging BERTScore over all system outputs. WMT17 includes translations from English to Czech, German, Finnish, Latvian, Russian, and Turkish, and from the same set of languages to English. We compare the performance of several popular metrics: Bleu (bleu), CDER (cder), and TER (ter). We also compare our correlations with state-of-the-art metrics, including METEOR++ (meteor++), chrF++ (chrF++), BEER (beer), BLEND (blend), and RUSE (ruse).
We use the human judgments of twelve submission entries from the COCO 2015 Captioning Challenge. Each participating system generates a caption for each image in the COCO validation set (coco), and each image has approximately five reference captions. Following leic, we compute the Pearson correlation with two system-level metrics: M1, the percentage of captions evaluated as better than or equal to human captions, and M2, the percentage of captions indistinguishable from human captions. We compute BERTScore with multiple references by scoring the candidate against each available reference and returning the highest score. We compare BERTScore to four task-agnostic metrics: BLEU (bleu), METEOR (meteor), ROUGE-L (rouge), and CIDEr (cider). We also compare with two task-specific metrics: SPICE (spice) and LEIC (leic). SPICE is computed using the similarity of the scene graphs parsed from the reference and candidate captions. LEIC uses a critique network that takes an image and a caption and outputs a proxy score, predicting whether the caption was written by a human.
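The multi-reference scheme described above (score against every available reference and keep the maximum) is metric-agnostic and can be sketched as:

```python
def multi_ref_score(score_fn, references, candidate):
    """Score a candidate against several references with any
    single-reference metric score_fn(reference, candidate),
    returning the best score."""
    return max(score_fn(ref, candidate) for ref in references)
```

Any of the single-reference metrics sketched earlier can be plugged in as `score_fn`.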
Tables 1 and 2 show segment-level and system-level correlations on to-English translations, and Table 3 shows system-level correlations on from-English translations. Across most language pairs, BERTScore shows the highest correlation with human judgments at both the segment and system level. While the recall and precision measures alternate as the best measure across languages, their combination into the F1 measure performs reliably across the different settings. BERTScore shows better correlation than RUSE, a supervised metric trained on WMT16 and WMT15 human judgment data. We also observe that idf weighting generally leads to better correlation.
Table 4 shows correlation results for the COCO Captioning Challenge. BERTScore outperforms all task-agnostic baselines by large margins. Image captioning presents a challenging evaluation scenario, and metrics based on strict n-gram matching, including Bleu and Rouge, correlate weakly with human judgments. idf importance weighting shows significant benefits for this task, which suggests people attribute higher importance to content words. Finally, LEIC (leic) remains highly competitive and outperforms all other methods. LEIC stands out from the other metrics in two ways. First, it is trained on the COCO data and is optimized for the task of distinguishing between human and generated captions. Second, it has access to the images, while all other methods observe the text only.
|Case||Sentences (x: reference, x̂: candidate)||Ranks (out of 560)|
|SentBleu||1.||x: According to opinion in Hungary, Serbia is “a safe third country”. x̂: According to Hungarian view, Serbia is a “safe third country.”||Human: 23, BERTScore: 100|
||2.||x: At same time Kingfisher is closing 60 B&Q outlets across the country x̂: At the same time, Kingfisher will close 60 B & Q stores nationwide||Human: 38, BERTScore: 201|
||3.||x: Construction took six months. x̂: Has taken six months of construction.||Human: 243, BERTScore: 230|
||4.||x: Authorities are quickly repairing the fence. x̂: Authorities are about to repair the fence fast.||Human: 205, BERTScore: 193|
||5.||x: Hewlett-Packard to cut up to 30,000 jobs x̂: Hewlett-Packard will reduce jobs up to 30.000||Human: 119, BERTScore: 168|
|SentBleu||6.||x: In their view the human dignity of the man had been violated. x̂: Look at the human dignity of the man injured.||Human: 500, BERTScore: 523|
||7.||x: A good prank is funny, but takes moments to reverse. x̂: A good prank is funny, but it takes only moments before he becomes a boomerang.||Human: –, BERTScore: 487|
||8.||x: For example when he steered a shot from Ideye over the crossbar in the 56th minute. x̂: So, for example, when he steered a shot of Ideye over the latte (56th).||Human: 516, BERTScore: 498|
||9.||x: I will put the pressure on them and onus on them to make a decision. x̂: I will exert the pressure on it and her urge to make a decision.||Human: 507, BERTScore: 460|
||10.||x: Contrary to initial fears, however, the wound was not serious. x̂: Contrary to initial fears, he remained without a serious Blessur.||Human: 462, BERTScore: 481|
We study BERTScore and SentBleu using failure cases of reference and candidate pairs from WMT16 German-to-English (wmt16em). We rank all 560 pairs by their human score, BERTScore, or SentBleu score, from most similar to least similar. Ideally, the ranks assigned by BERTScore and SentBleu should be similar to the rank assigned by the human score.
Table 5 shows examples where BERTScore and SentBleu disagree about the example's ranking by a large margin. We observe that BERTScore effectively captures synonyms and changes in word order. For example, in the first pair, the reference and candidate sentences are almost identical, except that the candidate replaces opinion in Hungary with Hungarian view and swaps the order of the opening quotation mark and the word a. While BERTScore ranks the pair relatively high, SentBleu ranks the pair as dissimilar, possibly because it cannot match the synonyms and is sensitive to the small changes in word order. The fifth pair shows a set of changes that preserve the semantic meaning: replacing to cut with will reduce and swapping the order of 30,000 and jobs. BERTScore ranks the candidate translation similarly to the human judgment, whereas SentBleu ranks it much lower. We also see that SentBleu potentially over-rewards n-gram overlap, even when phrases are used very differently. In the sixth pair, both the candidate and the reference contain the human dignity of the man, yet the two sentences convey very different meanings. BERTScore agrees with the human judgment and assigns a low rank to the pair. In contrast, SentBleu considers the pair relatively similar because the reference and candidate sentences have significant word overlap.
Because BERTScore relies on explicit alignments, it is easy to visualize the word matching to better understand the resulting score. Figure 2 visualizes the BERTScore matching of two pairs from Table 5. The coloring in the figure visualizes the amount each token contributes to the overall score, combining both the idf score and the cosine similarity. In both examples, function words such as are, to, and of contribute less to the overall similarity score.
|Setting||Model||QQP||PAWS_QQP|
|Trained on QQP (supervised)||DecAtt||0.939*||0.263|
|Trained on QQP + PAWS_QQP (supervised)||DecAtt||–||0.511|
We test the robustness of BERTScore using adversarial paraphrase classification. We use the Quora Question Pair corpus (QQP) and the Paraphrase Adversaries from Word Scrambling dataset (PAWS; paws). Both datasets contain pairs of sentences labeled to indicate whether they are paraphrases or not. Positive examples in QQP are real duplicated questions, while negative examples are generated from related, but different, questions. Sentence pairs in PAWS are generated through word swapping. For example, in PAWS, Flights from New York to Florida may be changed to Flights from Florida to New York, and a good classifier should identify that these two sentences are not paraphrases. PAWS consists of two parts: PAWS_QQP, which is based on the QQP data, and PAWS_Wiki. Table 6 shows the area under the ROC curve for existing models and automatic metrics.
We observe that supervised classifiers trained on QQP perform even worse than random guessing on PAWS_QQP, i.e., these models judge the adversarial examples more likely to be paraphrases. When adversarial examples are provided during training, state-of-the-art models like DIIN (diin) and fine-tuned BERT are able to identify the adversarial examples, but their performance still decreases significantly from their performance on QQP.
We study the effectiveness of automatic metrics for paraphrase detection without any training data. We use the PAWS_QQP development set, which contains 667 sentence pairs. For QQP, we use the first 5,000 pairs from the training set, because the test labels are not available. We treat the first sentence as the reference and the second sentence as the candidate, and expect pairs with higher scores to be more likely paraphrases. Most metrics perform decently on QQP but show a significant performance drop on PAWS_QQP, falling almost to chance performance. This suggests these metrics fail to distinguish the harder adversarial examples. In contrast, the performance of BERTScore drops only slightly, demonstrating that it is more robust than the other metrics.
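For reference, the area under the ROC curve used in this evaluation can be computed directly from metric scores and paraphrase labels via the Mann-Whitney statistic (a sketch assuming binary 0/1 labels):

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive pair scores higher than a randomly chosen negative pair,
    with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance performance, and values below 0.5 indicate that a metric systematically ranks adversarial non-paraphrases above true paraphrases.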
We propose BERTScore, a new metric for evaluating generated text against gold standard references. Our experiments on common benchmarks demonstrate that BERTScore achieves better correlation than common metrics, such as Bleu or Meteor. Our analysis illustrates the potential of BERTScore to resolve some of the limitations of these commonly used metrics, especially on challenging adversarial examples. BERTScore is purposely designed to be simple, interpretable, task agnostic, and easy to use. The code for BERTScore is available at github.com/Tiiiger/bert_score.
This research is supported in part by grants from the National Science Foundation (III-1618134, III-1526012, IIS1149882, IIS-1724282, and TRIPODS-1740822), the Office of Naval Research DOD (N00014-17-1-2175), and the Bill and Melinda Gates Foundation, SAP, Zillow and Facebook Research. We thank Graham Neubig, Tianze Shi, Yin Cui, and Guandao Yang for their insightful comments.
In Section 5, we report the human correlation of BERTScore computed with the uncased BERT_BASE model. Here we investigate the potential improvement from using different BERT models on the WMT16 German-to-English data (wmt16em). In Table 7, we report the average segment-level human correlation of BERTScore computed with BERT_BASE and BERT_LARGE. As expected, BERTScore computed with the BERT_LARGE model correlates better with human judgment. However, the improvement is marginal and appears less appealing once we consider the computational overhead of BERT_LARGE. Therefore, in our opinion, BERT_BASE suffices.
Since BERT models are pre-trained on different domains, we hypothesize that using a more domain-specific model would improve correlation with human judgment. On the WMT16 English-to-Chinese translation data, we compute BERTScore with BERT_MULTI, a general-domain multilingual BERT model trained on 104 languages, and with BERT_CHINESE, which is trained solely on Chinese data. The results are presented in Table 8. As hypothesized, BERTScore computed with BERT_CHINESE shows a significant performance increase. We therefore expect further improvement from more domain-specific BERT models and advise practitioners to use domain-specific models when available.
As suggested by previous studies (elmo; alternate), selecting a good layer, or a good combination of layers, of hidden representations is important. In designing BERTScore, we use the WMT16 segment-level human judgment data as a development set to guide our representation choice. In Figure 3, we plot the human correlation of BERTScore across different layers of BERT models on the WMT16 German-to-English translation task. Across three different BERT models, we identify a common trend: BERTScore computed with representations from intermediate layers tends to work better. In practice, we use the 9th layer of the BERT_BASE model.
|Task||Model||Bleu||P_BERT||R_BERT||F_BERT|
|WMT14 En-De||ConvS2S (gehring2017convs2s)||0.266||0.8323||0.8311||0.8312|
|WMT14 En-Fr||ConvS2S (gehring2017convs2s)||0.408||0.8749||0.8693||0.8718|
|IWSLT14 De-En||Transformer-iwslt (ott2019fairseq)||0.347||0.7903||0.7764||0.7820|
Table 9 shows the Bleu scores and BERTScores of pre-trained machine translation models on the WMT14 English-to-German, WMT14 English-to-French, and IWSLT14 German-to-English tasks. We used publicly available pre-trained models from Tensor2Tensor (tensor2tensor) [1] and fairseq (ott2019fairseq) [2]. Since no pre-trained Transformer model on IWSLT has been released, we trained our own using the fairseq library.
[1] Code available at https://github.com/tensorflow/tensor2tensor; pre-trained model available at gs://tensor2tensor-checkpoints/transformer_ende_test.
[2] Code and pre-trained models available at https://github.com/pytorch/fairseq.
We use the multilingual cased model (hash code: bert-base-multilingual-cased_L9_version=0.1.0) for the English-to-German and English-to-French pairs, and the English uncased model (hash code: bert-base-uncased_L9_version=0.1.0) for the German-to-English pair. Interestingly, the gap between a DynamicConv (wu2018pay) trained only on WMT16 and a Transformer (snmt) trained on WMT16 and Paracrawl (about 30 times more training data) becomes larger when evaluated with BERTScore rather than Bleu.