When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

by Peter Hase, et al.
University of North Carolina at Chapel Hill

Many methods now exist for conditioning model outputs on task instructions, retrieved documents, and user-provided explanations and feedback. Rather than relying solely on examples of task inputs and outputs, these approaches use valuable additional data for improving model correctness and aligning learned models with human priors. Meanwhile, a growing body of evidence suggests that some language models can (1) store a large amount of knowledge in their parameters, and (2) perform inference over tasks in textual inputs at test time. These results raise the possibility that, for some tasks, humans cannot explain to a model any more about the task than it already knows or could infer on its own. In this paper, we study the circumstances under which explanations of individual data points can (or cannot) improve modeling performance. In order to carefully control important properties of the data and explanations, we introduce a synthetic dataset for experiments, and we also make use of three existing datasets with explanations: e-SNLI, TACRED, and SemEval. We first give a formal framework for the available modeling approaches, in which explanation data can be used as model inputs, as targets, or as a prior. After arguing that the most promising role for explanation data is as model inputs, we propose to use a retrieval-based method and show that it solves our synthetic task with accuracies upwards of 95%, while baselines without explanation data achieve below 65% accuracy. We then identify properties of datasets for which retrieval-based modeling fails. With the three existing datasets, we find no improvements from explanation retrieval. Drawing on findings from our synthetic task, we suggest that at least one of six preconditions for successful modeling fails to hold with these datasets. Our code is publicly available at
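To make the "explanations as model inputs" role concrete, here is a minimal sketch of a retrieval step that pulls stored explanations relevant to a new input and concatenates them with it before the input is passed to a model. It is not the authors' implementation: it assumes a simple bag-of-words cosine-similarity retriever in place of whatever retriever the paper uses, and the helper names (`retrieve_explanations`, `build_model_input`), the `[SEP]` separator, and the example explanation texts are all hypothetical.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector built from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_explanations(query: str, bank: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k stored explanations most similar to the query."""
    q = bow(query)
    ranked = sorted(bank, key=lambda e: cosine(q, bow(e)), reverse=True)
    return ranked[:k]


def build_model_input(query: str, bank: list[str], k: int = 2) -> str:
    """Condition on explanations by concatenating retrieved ones with the task input.

    The "[SEP]" separator is an assumed convention, not taken from the paper.
    """
    return " [SEP] ".join([query] + retrieve_explanations(query, bank, k))


if __name__ == "__main__":
    # Toy explanation bank (hypothetical examples, not from any of the datasets above).
    bank = [
        "a dog is an animal so a dog running implies an animal moving",
        "sleeping and running cannot happen at the same time",
        "a guitar is a musical instrument",
    ]
    print(build_model_input("a dog is running in the park", bank, k=1))
```

With the toy bank above, the first explanation shares the most tokens with the query and is retrieved first. A learned dense retriever would replace `bow` and `cosine` in practice; the lexical version is used here only so the sketch runs with the standard library.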


CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

Chain-of-thought (CoT) prompting enables large language models (LLMs) to...

Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals

Feature importance (FI) estimates are a popular form of explanation, and...

ExaRanker: Explanation-Augmented Neural Ranker

Recent work has shown that inducing a large language model (LLM) to gene...

Shapley Explanation Networks

Shapley values have become one of the most popular feature attribution e...

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

Data collection for natural language (NL) understanding tasks has increa...

Rule induction for global explanation of trained models

Understanding the behavior of a trained network and finding explanations...

Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

Algorithmic approaches to interpreting machine learning models have prol...

Code Repositories


Code for paper "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data"
