A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams

06/11/2022
by Sarah Zhang, et al.

Can a machine learn machine learning? We propose to answer this question using the same criteria we use to answer a similar question: can a human learn machine learning? We automatically answer MIT final exams in Introduction to Machine Learning at a human level. The course is a large undergraduate class with around five hundred students each semester. Recently, program synthesis and few-shot learning solved university-level problem set questions in mathematics and STEM courses at a human level. In this work, we solve questions from final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from eight MIT Introduction to Machine Learning final exams between Fall 2017 and Spring 2022, along with code for automatically answering these questions and generating new ones. We perform ablation studies comparing zero-shot learning with few-shot learning, chain-of-thought prompting, GPT-3 pre-trained on text, and Codex fine-tuned on code across a range of machine learning topics, and find that few-shot learning methods perform best. We make our data and code publicly available for the machine learning community.
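To make the prompting styles named in the abstract concrete, here is a minimal Python sketch of how zero-shot, few-shot, and chain-of-thought prompts might be assembled as plain strings. It is an illustration only, not the authors' code: the function names, the "Question:/Answer:" template, the step-by-step cue, and the sample questions are all assumptions invented for this example, and no model is called.

```python
from typing import List, Tuple


def zero_shot_prompt(question: str) -> str:
    """Prompt containing only the target question (hypothetical template)."""
    return f"Question: {question}\nAnswer:"


def few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Prepend solved (question, answer) pairs to the target question."""
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    return "\n\n".join(shots + [zero_shot_prompt(question)])


def chain_of_thought_prompt(question: str) -> str:
    """Zero-shot prompt ending with a cue that asks for step-by-step reasoning."""
    return f"Question: {question}\nAnswer: Let's think step by step."


# Invented sample data, not taken from the exam dataset.
examples = [
    ("What is the derivative of x^2 with respect to x?", "2x"),
]
target = "What is the gradient of (w*x - y)^2 with respect to w?"

print(few_shot_prompt(examples, target))
```

The only difference between the three variants is what surrounds the target question: nothing, previously solved examples, or a reasoning cue. With an empty example list, the few-shot builder reduces to the zero-shot prompt.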

research
11/16/2021

Solving Linear Algebra by Program Synthesis

We solve MIT's Linear Algebra 18.06 course and Columbia University's Com...
research
12/30/2021

Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI

The second edition of Deep Learning Interviews is home to hundreds of fu...
research
08/14/2022

Limits of an AI program for solving college math problems

Drori et al. (2022) report that "A neural network solves, explains, and ...
research
10/22/2020

Zero-Shot Learning from scratch (ZFS): leveraging local compositional representations

Zero-shot classification is a generalization task where no instance from...
research
12/19/2022

Visconde: Multi-document QA with GPT-3 and Neural Reranking

This paper proposes a question-answering system that can answer question...
research
09/04/2021

FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models

The task of learning from only a few examples (called a few-shot setting...
research
04/16/2022

What If: Generating Code to Answer Simulation Questions

Many texts, especially in chemistry and biology, describe complex proces...
