A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models

10/21/2022
by Alessandro Stolfo, et al.

We have recently witnessed a number of impressive results on hard mathematical reasoning problems with language models. At the same time, the robustness of these models has been called into question: recent works have shown that models can rely on shallow patterns in the problem description when predicting a solution. Building on the idea of behavioral testing, we propose a novel framework that pins down the causal effect of various factors in the input (e.g., the surface form of the problem text, the operands, and the math operators) on the output solution. By grounding the behavioral analysis in a causal graph describing an intuitive reasoning process, we study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space. We apply our framework to a test bed of bivariate math word problems. Our analysis shows that robustness does not appear to improve continuously as a function of scale, but that the recent LLM GPT-3-Instruct (175B) achieves a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
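To make the distinction between robustness and sensitivity concrete, here is a toy sketch in Python. It is not the paper's estimator: the abstract does not give the metrics, so the sketch assumes a simple prediction-change rate under paired interventions. The two "models", the templates, and all function names are hypothetical stand-ins. An intervention on the surface form leaves the correct answer unchanged, so a robust model should not change its prediction. An intervention on an operand changes the correct answer, so a sensitive model should change its prediction.

```python
import re

# Two hypothetical surface forms of the same two-operand addition problem.
T1 = "Ann has {a} apples and gets {b} more. How many apples does Ann have?"
T2 = "Ann owns {a} apples. She is given another {b}. What is her total?"


def toy_model(problem: str) -> int:
    """Stand-in model (assumption): adds the two integers found in the text."""
    a, b = (int(t) for t in re.findall(r"\d+", problem)[:2])
    return a + b


def shallow_model(problem: str) -> int:
    """Stand-in for a model relying on a shallow cue: adds only if it sees 'more'."""
    a, b = (int(t) for t in re.findall(r"\d+", problem)[:2])
    return a + b if "more" in problem else a


def change_rate(model, pairs) -> float:
    """Fraction of (original, intervened) pairs on which the prediction changes."""
    changed = sum(model(x) != model(x_new) for x, x_new in pairs)
    return changed / len(pairs)


operands = [(3, 4), (10, 5), (7, 2)]

# Intervention on the surface form only: the correct answer stays the same.
surface_pairs = [(T1.format(a=a, b=b), T2.format(a=a, b=b)) for a, b in operands]

# Intervention on one operand only: the correct answer changes.
operand_pairs = [(T1.format(a=a, b=b), T1.format(a=a, b=b + 1)) for a, b in operands]

for name, model in [("toy", toy_model), ("shallow", shallow_model)]:
    print(
        name,
        "surface change rate:", change_rate(model, surface_pairs),
        "operand change rate:", change_rate(model, operand_pairs),
    )
```

In this sketch a low change rate under surface-form interventions indicates robustness, and a high change rate under operand interventions indicates sensitivity. The shallow stand-in fails the first check because its output depends on the wording rather than on the operands alone.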

