FERMAT: An Alternative to Accuracy for Numerical Reasoning

05/27/2023
by Jasivan Alex Sivakumar, et al.

While pre-trained language models achieve impressive performance on various NLP benchmarks, they still struggle with tasks that require numerical reasoning. Recent advances in improving numerical reasoning are mostly achieved using very large language models that contain billions of parameters and are not accessible to everyone. In addition, numerical reasoning is measured using a single score on existing datasets. As a result, we do not have a clear understanding of the strengths and shortcomings of existing models on different numerical reasoning aspects and therefore, potential ways to improve them apart from scaling them up. Inspired by CheckList (Ribeiro et al., 2020), we introduce a multi-view evaluation set for numerical reasoning in English, called FERMAT. Instead of reporting a single score on a whole dataset, FERMAT evaluates models on various key numerical reasoning aspects such as number understanding, mathematical operations, and training dependency. Apart from providing a comprehensive evaluation of models on different numerical reasoning aspects, FERMAT enables a systematic and automated generation of an arbitrarily large training or evaluation set for each aspect. The datasets and code are publicly available to generate further multi-view data for other tasks and languages.
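
The abstract describes FERMAT as supporting systematic, automated generation of arbitrarily large training or evaluation sets per aspect. As a rough illustration only, and not the authors' released code, the following Python sketch shows how template-based generation controlled by an operation aspect and a number-understanding view might look; all template texts, aspect names, and function names here are assumptions.

```python
# Hypothetical sketch of aspect-controlled, template-based generation of
# numerical reasoning instances, in the spirit of FERMAT's multi-view sets.
# Templates, aspect names, and number views are illustrative assumptions.
import random

# Each template pairs a question pattern with a function computing the answer.
TEMPLATES = {
    "addition": ("If you have {a} apples and buy {b} more, how many apples do you have?",
                 lambda a, b: a + b),
    "subtraction": ("A shop had {a} items and sold {b}. How many items are left?",
                    lambda a, b: a - b),
    "multiplication": ("Each box holds {a} pens. How many pens are in {b} boxes?",
                       lambda a, b: a * b),
}

# "Number views" control what kinds of numbers appear, so that accuracy can be
# broken down by number understanding instead of reported as a single score.
NUMBER_VIEWS = {
    "small_int": lambda rng: rng.randint(2, 99),
    "large_int": lambda rng: rng.randint(1000, 99999),
    "one_decimal": lambda rng: round(rng.uniform(1, 99), 1),
}

def generate(operation: str, number_view: str, n: int, seed: int = 0):
    """Generate n (question, answer) pairs for one operation/number-view cell."""
    rng = random.Random(seed)
    pattern, solve = TEMPLATES[operation]
    sample_number = NUMBER_VIEWS[number_view]
    instances = []
    for _ in range(n):
        a, b = sample_number(rng), sample_number(rng)
        if operation == "subtraction" and b > a:  # keep answers non-negative
            a, b = b, a
        instances.append({"question": pattern.format(a=a, b=b),
                          "answer": solve(a, b),
                          "operation": operation,
                          "number_view": number_view})
    return instances

if __name__ == "__main__":
    # Example: two instances for the addition / small-integer evaluation cell.
    for example in generate("addition", "small_int", n=2):
        print(example)
```

Because each generated instance carries its operation and number-view labels, per-aspect accuracy can be computed by grouping model predictions on these fields rather than averaging over the whole set.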

