The Human Evaluation Datasheet 1.0: A Template for Recording Details of Human Evaluation Experiments in NLP

03/17/2021
by Anastasia Shimorina, et al.

This paper introduces the Human Evaluation Datasheet, a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP). Taking inspiration from seminal papers by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), the Human Evaluation Datasheet is intended to facilitate the recording of the properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility testing.
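To make the idea concrete, a datasheet of this kind can be thought of as a structured record with one field per question about the evaluation experiment. The sketch below is a minimal, hypothetical illustration in Python; the field names and example values are assumptions chosen for exposition, not the actual HEDS 1.0 question set, which is defined in the paper itself.

```python
from dataclasses import dataclass, field
from typing import List

# Minimal, hypothetical sketch of a machine-readable human-evaluation
# datasheet record. Field names are illustrative assumptions, not the
# actual HEDS 1.0 questions.
@dataclass
class HumanEvaluationDatasheet:
    paper_title: str                    # paper reporting the evaluation
    system_description: str             # system(s) whose outputs are evaluated
    evaluated_outputs: str              # what is shown to evaluators
    num_evaluators: int                 # how many human evaluators took part
    evaluator_background: str           # e.g. crowdworkers, domain experts
    quality_criteria: List[str] = field(default_factory=list)  # e.g. fluency
    rating_instrument: str = ""         # e.g. 5-point Likert scale
    ethics_approval: str = ""           # ethics review status, if any

# Example record; contents are invented purely for illustration.
example = HumanEvaluationDatasheet(
    paper_title="Example NLG paper",
    system_description="Neural data-to-text generator",
    evaluated_outputs="200 generated texts, randomly sampled",
    num_evaluators=10,
    evaluator_background="crowdworkers",
    quality_criteria=["fluency", "adequacy"],
    rating_instrument="5-point Likert scale",
    ethics_approval="approved by institutional review board",
)
print(example)
```

Recording such fields in a fixed, typed structure rather than free text is what makes evaluations comparable across papers and amenable to automated meta-evaluation.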


