An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)

02/23/2023
by Paulo Shakarian, et al.

We study the performance of a commercially available large language model (LLM) known as ChatGPT on math word problems (MWPs) from the DRAW-1K dataset. To our knowledge, this is the first independent evaluation of ChatGPT. We found that ChatGPT's performance changes dramatically based on the requirement to show its work, failing 20% of the time when it provides work compared with 84% when it does not. Further, several factors about MWPs relating to the number of unknowns and the number of operations lead to a higher probability of failure when compared with the prior. Specifically, we note (across all experiments) that the probability of failure increases linearly with the number of addition and subtraction operations. We have also released the dataset of ChatGPT's responses to the MWPs to support further work on the characterization of LLM performance, and we present baseline machine learning models to predict whether ChatGPT can correctly answer an MWP.
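As an illustration of what comparing a conditional failure probability "with the prior" can mean, the following minimal sketch groups problems by their number of addition and subtraction operations and compares each group's failure rate to the overall failure rate. The function name `failure_rates`, the record layout, and the toy records are assumptions made for this example; they are synthetic and are not drawn from DRAW-1K or from the released dataset of responses.

```python
from collections import defaultdict


def failure_rates(records):
    """Return (prior, conditional) failure rates.

    records: iterable of (num_add_sub_ops, failed) pairs, where `failed`
    is True when the model's answer to that problem was wrong.
    This layout is hypothetical, chosen only for the illustration.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for n_ops, failed in records:
        totals[n_ops] += 1
        failures[n_ops] += int(failed)

    # Prior: overall probability of failure, ignoring problem features.
    prior = sum(failures.values()) / sum(totals.values())
    # Conditional: probability of failure given the operation count.
    conditional = {n: failures[n] / totals[n] for n in sorted(totals)}
    return prior, conditional


# Synthetic toy records, NOT results from the paper.
records = [
    (0, False), (0, False), (0, False), (0, True),
    (1, False), (1, True), (1, False), (1, True),
    (2, True), (2, True), (2, True), (2, False),
]

prior, conditional = failure_rates(records)
for n_ops, rate in conditional.items():
    print(f"ops={n_ops}: P(fail | ops)={rate:.2f} vs prior={prior:.2f}")
```

In this toy data the conditional failure rate rises with the operation count, which is the kind of trend the abstract describes; the actual magnitudes can only be obtained from the released responses.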
