Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers

04/21/2023
by Felipe Urrutia, et al.

Written answers to open-ended questions can have a greater long-term effect on learning than multiple-choice questions. However, it is critical that teachers review the answers immediately and ask students to redo those that are incoherent. This review can be difficult and time-consuming for teachers. A possible solution is to automate the detection of incoherent answers, and one option is to do so with Large Language Models (LLMs). In this paper, we analyze the responses of fourth graders in mathematics using three LLMs: GPT-3, BLOOM, and YOU. We used them with zero, one, two, three, and four shots, and compared their performance with that of various classifiers trained with Machine Learning (ML). We found that LLMs perform worse than the ML classifiers in detecting incoherent answers. The difficulty seems to reside in recursive questions that contain both questions and answers, and in responses containing misspellings typical of fourth graders. Upon closer examination, we found that the ChatGPT model faces the same challenges.
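As a rough illustration of the zero- to four-shot setup mentioned in the abstract, the sketch below assembles a k-shot prompt for labeling an answer as coherent or incoherent. The instruction wording, the example question/answer pairs, and the names `EXAMPLES` and `build_prompt` are all hypothetical and are not taken from the paper; the actual prompts and data used by the authors may differ.

```python
from typing import List, Tuple

# Hypothetical labeled examples (question, answer, label); NOT from the paper.
EXAMPLES: List[Tuple[str, str, str]] = [
    ("What is 3 + 4?", "7 because 3 plus 4 is 7", "coherent"),
    ("How many apples are left?", "my dog is brown", "incoherent"),
    ("Is 10 greater than 8? Explain.", "yes, 10 is 2 more than 8", "coherent"),
    ("What is half of 12?", "asdf lol", "incoherent"),
]


def build_prompt(question: str, answer: str, shots: int) -> str:
    """Build a k-shot prompt (k = shots) ending with an open 'Label:' slot."""
    if not 0 <= shots <= len(EXAMPLES):
        raise ValueError(f"shots must be between 0 and {len(EXAMPLES)}")
    # Assumed instruction text; the paper's wording is not given in the abstract.
    lines = [
        "Decide whether the student's answer is coherent or incoherent "
        "with respect to the question.",
        "",
    ]
    # Prepend the first `shots` labeled examples as demonstrations.
    for q, a, label in EXAMPLES[:shots]:
        lines += [f"Question: {q}", f"Answer: {a}", f"Label: {label}", ""]
    # The item to classify comes last, with the label left for the model.
    lines += [f"Question: {question}", f"Answer: {answer}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("What is 5 x 2?", "10 becuase 5 two times", shots=2))
```

With `shots=0` the prompt contains only the instruction and the item to classify; raising `shots` adds demonstrations one at a time, which is one simple way to compare the zero- through four-shot conditions on the same input.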
