Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

08/10/2023
by Ernest Davis, et al.

This report describes a test of the large language model GPT-4 with the Wolfram Alpha and Code Interpreter plug-ins on 105 original problems in science and math, at the high school and college levels, carried out in June-August 2023. Our tests suggest that the plug-ins significantly enhance GPT's ability to solve these problems. That said, "interface" failures remain frequent; that is, GPT often has trouble formulating problems in a way that elicits useful answers from the plug-ins. Fixing these interface failures seems like a central challenge in making GPT a reliable tool for college-level calculation problems.


