Calling Out Bluff: Attacking the Robustness of Automatic Scoring Systems with Simple Adversarial Testing

by Yaman Kumar, et al.
The University of Texas at Austin
IIIT Delhi

Significant progress has been made in deep-learning based Automatic Essay Scoring (AES) systems over the past two decades. Performance, as commonly measured by standard metrics such as Quadratic Weighted Kappa (QWK) and accuracy, points to the same. However, testing these AES systems on common-sense adversarial examples reveals their lack of natural language understanding capability. Inspired by common student behaviour during examinations, we propose a task-agnostic adversarial evaluation scheme for AES systems to test their natural language understanding capabilities and overall robustness.
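One perturbation in this spirit, which students also exploit, is padding an essay with repeated sentences; a robust scorer should not reward it. The sketch below is illustrative only: `score_essay` is a hypothetical stand-in (a naive length-based baseline known to correlate with AES scores), not any of the systems evaluated in the paper.

```python
# Illustrative common-sense adversarial test for an essay scorer.
# score_essay is a hypothetical length-based baseline, NOT the paper's models.

def score_essay(essay: str) -> float:
    """Toy scorer: rewards longer essays (a well-known AES weakness)."""
    return min(len(essay.split()) / 100.0, 1.0) * 10.0

def pad_with_repetition(essay: str, times: int = 3) -> str:
    """Adversarial perturbation: append the last sentence several times."""
    sentences = [s for s in essay.split(". ") if s]
    return essay + " " + ". ".join([sentences[-1]] * times)

original = ("The water cycle moves water between oceans, air, and land. "
            "Evaporation lifts water into the atmosphere.")
adversarial = pad_with_repetition(original)

base, attacked = score_essay(original), score_essay(adversarial)
# A robust scorer would not raise the score for meaningless repetition;
# this toy scorer does, illustrating the kind of failure the paper probes.
print(base, attacked, attacked > base)
```

A full evaluation suite in this style would apply many such task-agnostic perturbations (repetition, off-topic insertions, shuffling) and measure how much the assigned score shifts.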



