
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

by Peter Clark, et al.

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83% on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern NLP methods can result in mastery on this task. While not a full solution to general question-answering (the questions are multiple choice, and the domain is restricted to 8th Grade science), it represents a significant milestone for the field.



