Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

11/17/2021
by Yaman Kumar Singla, et al.

Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and is being deployed in contexts ranging from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, which harms the reliability of the test, or score every response by both human and machine, which increases costs. We target the spectrum of possible solutions in between, using both humans and machines to provide a higher-quality test while keeping costs reasonable, in order to democratize access to AS. In this work, we propose a combination of the existing paradigms: intelligently sampling the responses to be scored by humans. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% [...]) with a relatively small human budget (30% [...]). The accuracy increases observed using standard random and importance sampling baselines are 8.6% [...] system's model-agnostic nature by measuring its performance on a variety of models currently deployed in an AS setting as well as pseudo models. Finally, we propose an algorithm to estimate the accuracy/QWK with statistical guarantees. Our code is available at https://git.io/J1IOy.
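To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' reward sampling method or their estimation algorithm. It computes QWK between human and machine scores, and estimates machine accuracy from a uniformly random subset of responses sent to human raters, with a Hoeffding-style confidence interval standing in for the "statistical guarantee". The function names, the `ask_human` callback, and the `budget` and `delta` parameters are all hypothetical and chosen for illustration only.

```python
import math
import random


def quadratic_weighted_kappa(human, machine, num_classes):
    """QWK between two lists of integer scores in 0..num_classes-1.

    Assumes at least two distinct scores appear, so the expected
    disagreement (the denominator) is non-zero.
    """
    n = len(human)
    # Confusion matrix: observed[h][m] counts responses scored h by the
    # human rater and m by the machine.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for h, m in zip(human, machine):
        observed[h][m] += 1
    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            num += weight * observed[i][j]
            den += weight * hist_h[i] * hist_m[j] / n
    return 1.0 - num / den


def estimate_accuracy(machine_scores, ask_human, budget=0.3, delta=0.05, seed=0):
    """Estimate machine accuracy from a uniform random human-scored sample.

    ask_human(i) is a hypothetical callback returning the human score for
    response i. Returns (estimate, half_width): by Hoeffding's inequality,
    the accuracy over all responses lies within estimate +/- half_width
    with probability at least 1 - delta.
    """
    rng = random.Random(seed)
    n = len(machine_scores)
    k = max(1, int(budget * n))       # number of responses sent to humans
    sample = rng.sample(range(n), k)  # uniform sampling without replacement
    correct = sum(1 for i in sample if ask_human(i) == machine_scores[i])
    half_width = math.sqrt(math.log(2 / delta) / (2 * k))
    return correct / k, half_width
```

Uniform random sampling corresponds to the simplest baseline mentioned in the abstract. The interval width shrinks only as the inverse square root of the number of human-scored responses, which is one reason a smarter sampling scheme may be attractive when the human budget is small.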


research
07/14/2020

Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing

A significant progress has been made in deep-learning based Automatic Es...
research
03/01/2022

Improving Performance of Automated Essay Scoring by using back-translation essays and adjusted scores

Automated essay scoring plays an important role in judging students' lan...
research
12/21/2020

Get It Scored Using AutoSAS – An Automated System for Scoring Short Answers

In the era of MOOCs, online exams are taken by millions of candidates, w...
research
10/13/2021

Automated Essay Scoring Using Transformer Models

Automated essay scoring (AES) is gaining increasing attention in the edu...
research
08/05/2020

An Interpretable Deep Learning System for Automatically Scoring Request for Proposals

The Managed Care system within Medicaid (US Healthcare) uses Request For...
research
06/16/2022

Balancing Cost and Quality: An Exploration of Human-in-the-loop Frameworks for Automated Short Answer Scoring

Short answer scoring (SAS) is the task of grading short text written by ...
research
03/27/2013

An Empirical Analysis of Likelihood-Weighting Simulation on a Large, Multiply-Connected Belief Network

We analyzed the convergence properties of likelihood-weighting algorith...
