BERTScore: Evaluating Text Generation with BERT

04/21/2019

We propose BERTScore, an automatic evaluation metric for text generation. Analogous to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of looking for exact matches, we compute token similarity using contextualized BERT embeddings. We evaluate on several machine translation and image captioning benchmarks and show that BERTScore correlates better with human judgments than existing metrics, often significantly outperforming even task-specific supervised metrics.
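The token-level comparison described above can be illustrated with a small sketch. It assumes that each token is represented by an embedding vector and that tokens are paired by greedy matching on cosine similarity: every candidate token is matched to its most similar reference token (precision), and every reference token is matched to its most similar candidate token (recall). These two scores are then combined into an F1. The toy vectors stand in for contextual BERT embeddings, and no model is loaded here. The function name `greedy_match_score` is hypothetical, and the sketch leaves out any weighting or rescaling that the full method may apply.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so that dot products equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine_matrix(cand, ref):
    # sim[i][j] = cosine similarity between candidate token i and reference token j.
    c = [_normalize(v) for v in cand]
    r = [_normalize(v) for v in ref]
    return [[sum(a * b for a, b in zip(ci, rj)) for rj in r] for ci in c]


def greedy_match_score(cand_emb, ref_emb):
    """Return (precision, recall, f1) from greedy max-cosine token matching.

    cand_emb, ref_emb: lists of token embedding vectors. These are hypothetical
    stand-ins for contextual BERT embeddings.
    """
    sim = _cosine_matrix(cand_emb, ref_emb)
    n_cand, n_ref = len(sim), len(sim[0])
    # Precision: average best match for each candidate token.
    precision = sum(max(row) for row in sim) / n_cand
    # Recall: average best match for each reference token.
    recall = sum(max(sim[i][j] for i in range(n_cand)) for j in range(n_ref)) / n_ref
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Two candidate tokens, one reference token; only the first candidate matches.
    p, r, f = greedy_match_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
    print(round(p, 4), round(r, 4), round(f, 4))
```

In the usage example, the unmatched second candidate token lowers precision to 0.5, while recall stays at 1.0. This is why the sketch reports both directions rather than a single similarity.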
