Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications

05/13/2022
by Kaitlyn Zhou, et al.

There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints – which inform decisions about what, when, and how to evaluate – are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.

research
12/30/2021

An Empirical Study of Security Practices for Microservices Systems

Despite the numerous benefits of microservices systems, security has bee...
research
10/31/2018

A Process-driven View on Summative Evaluation of Visual Analytics Solutions

Many evaluation methods have been applied to assess the usefulness of vi...
research
08/10/2019

A Mulching Proposal

The ethical implications of algorithmic systems have been much discussed ...
research
08/23/2021

Legitimization of Data Quality Practices in Health Management Information Systems Using DHIS2. Case of Malawi

Medical doctors consider data quality management a secondary priority wh...
research
04/06/2022

"Merging Results Is No Easy Task": An International Survey Study of Collaborative Data Analysis Practices Among UX Practitioners

Analysis is a key part of usability testing where UX practitioners seek ...
research
09/05/2021

How Do Practitioners Interpret Conditionals in Requirements?

Context: Conditional statements like "If A and B then C" are core elemen...
research
03/03/2022

Why Do Machine Learning Practitioners Still Use Manual Tuning? A Qualitative Study

Current advanced hyperparameter optimization (HPO) methods, such as Baye...