Performance of the Pre-Trained Large Language Model GPT-4 on Automated Short Answer Grading

09/17/2023
by Gerd Kortemeyer, et al.

Automated Short Answer Grading (ASAG) has been an active area of machine-learning research for over a decade. It promises to let educators grade and give feedback on free-form responses in large-enrollment courses despite the limited availability of human graders. Over the years, carefully trained models have achieved increasingly high levels of performance. More recently, pre-trained Large Language Models (LLMs) have emerged as a commodity, and an intriguing question is how a general-purpose tool without additional training compares to specialized models. We studied the performance of GPT-4 on the standard 2-way and 3-way benchmark datasets SciEntsBank and Beetle. In addition to the standard task of grading how well the student answer aligns with a reference answer, we also investigated withholding the reference answer. We found that, overall, the performance of the pre-trained general-purpose GPT-4 LLM is comparable to that of hand-engineered models, but worse than that of pre-trained LLMs that had specialized training.

research
08/04/2019

Exploring Neural Net Augmentation to BERT for Question Answering on SQUAD 2.0

Enhancing machine capabilities to answer questions has been a topic of c...
research
04/09/2020

Injecting Numerical Reasoning Skills into Language Models

Large pre-trained language models (LMs) are known to encode substantial ...
research
05/18/2019

BERTSel: Answer Selection with Pre-trained Models

Recently, pre-trained models have been the dominant paradigm in natural ...
research
05/17/2023

CooK: Empowering General-Purpose Language Models with Modular and Collaborative Knowledge

Large language models (LLMs) are increasingly adopted for knowledge-inte...
research
06/20/2023

Lingua Manga: A Generic Large Language Model Centric System for Data Curation

Data curation is a wide-ranging area which contains many critical but ti...
research
08/16/2023

FootGPT: A Large Language Model Development Experiment on a Minimal Setting

With recent empirical observations, it has been argued that the most sig...
