GPTScore: Evaluate as You Desire

02/08/2023
by   Jinlan Fu, et al.
0

Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of the generation is an even more arduous task than the generation itself, and this issue has not been given adequate consideration recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities (e.g., zero-shot instruction) of generative pre-trained models to score generated texts. There are 19 pre-trained models explored in this paper, ranging in size from 80M (e.g., FLAN-T5-small) to 175B (e.g., GPT3). Experimental results on four text generation tasks, 22 evaluation aspects, and corresponding 37 datasets demonstrate that this approach can effectively allow us to achieve what one desires to evaluate for texts simply by natural language instructions. This nature helps us overcome several long-standing challenges in text evaluation–how to achieve customized, multi-faceted evaluation without the need for annotated samples. We make our code publicly available at https://github.com/jinlanfu/GPTScore.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/26/2020

iNLTK: Natural Language Toolkit for Indic Languages

We present iNLTK, an open-source NLP library consisting of pre-trained l...
research
07/25/2023

FacTool: Factuality Detection in Generative AI – A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

The emergence of generative pre-trained models has facilitated the synth...
research
05/08/2023

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics...
research
07/15/2021

Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining

Many real-world applications involve the use of Optical Character Recogn...
research
03/05/2020

RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System

Interests in the automatic generation of cooking recipes have been growi...
research
09/27/2022

EditEval: An Instruction-Based Benchmark for Text Improvements

Evaluation of text generation to date has primarily focused on content c...
research
05/08/2023

Learning to Evaluate the Artness of AI-generated Images

Assessing the artness of AI-generated images continues to be a challenge...

Please sign up or login with your details

Forgot password? Click here to reset