A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers

06/30/2021
by Shen-Yun Miao, et al.

We present ASDiv (Academia Sinica Diverse MWP Dataset), an English math word problem (MWP) corpus that is diverse in both language patterns and problem types, for evaluating the capability of various MWP solvers. Existing MWP corpora for studying AI progress remain limited in either language usage patterns or problem types. We thus present a new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem types taught in elementary school. Each MWP is annotated with its problem type and its grade level, which indicates the level of difficulty. Furthermore, we propose a metric to measure the lexicon usage diversity of a given MWP corpus, and demonstrate that ASDiv is more diverse than existing corpora. Experiments show that our proposed corpus reflects the true capability of MWP solvers more faithfully.
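
The abstract does not define the proposed diversity metric, so the following is only a rough illustration of what "lexicon usage diversity" of a corpus could mean, not the metric from the paper. As a stand-in, it scores each problem by one minus its highest Jaccard token overlap with any other problem and averages those scores over the corpus. The function names (`tokenize`, `jaccard`, `problem_diversity`, `corpus_diversity`) and the choice of Jaccard overlap are assumptions made for this sketch.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def problem_diversity(index, token_sets):
    """One minus the highest overlap between one problem and any other.

    Assumption for this sketch: a problem is 'diverse' when no other
    problem in the corpus shares most of its vocabulary.
    """
    overlaps = [
        jaccard(token_sets[index], other)
        for j, other in enumerate(token_sets)
        if j != index
    ]
    if not overlaps:  # a single-problem corpus has nothing to compare with
        return 1.0
    return 1.0 - max(overlaps)


def corpus_diversity(problems):
    """Average per-problem diversity over the corpus (0.0 if it is empty)."""
    token_sets = [tokenize(p) for p in problems]
    if not token_sets:
        return 0.0
    scores = [problem_diversity(i, token_sets) for i in range(len(token_sets))]
    return sum(scores) / len(scores)
```

Under this stand-in measure, a corpus built from near-identical templates scores close to 0, while a corpus whose problems share little vocabulary scores close to 1. It ignores word order and problem type, both of which a more careful metric would likely take into account.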
