Refining Neural Networks with Compositional Explanations

03/18/2021
by   Huihan Yao, et al.

Neural networks are prone to learning spurious correlations from biased datasets, and are thus vulnerable when making inferences in a new target domain. Prior work reveals spurious patterns via post-hoc model explanations, which compute the importance of input features, and then eliminates the unintended model behaviors by regularizing importance scores with human knowledge. However, such regularization techniques lack flexibility and coverage: only importance scores for a pre-defined list of features are adjusted, while more complex human knowledge, such as feature interaction and pattern generalization, can hardly be incorporated. In this work, we propose to refine a learned model by collecting human-provided compositional explanations of the model's failure cases. Because each explanation describes generalizable rules about spurious patterns, more training examples can be matched and regularized, which addresses the challenge of regularization coverage. We additionally introduce a regularization term for feature interaction to support more complex human rationales in refining the model. We demonstrate the effectiveness of the proposed approach on two text classification tasks by showing improved performance in the target domain after refinement.
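To make the two regularization terms concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which attribution method is used, so occlusion (zeroing a feature and measuring the change in the score) is swapped in here as an assumption. All names (`occlude`, `importance`, `interaction`, `explanation_penalty`, `toy_score`) and the weights `alpha` and `beta` are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# A scoring function maps a feature vector to a scalar class score.
Score = Callable[[Sequence[float]], float]


def occlude(x: Sequence[float], indices: Iterable[int]) -> List[float]:
    """Return a copy of x with the given feature positions set to zero."""
    hidden = set(indices)
    return [0.0 if k in hidden else v for k, v in enumerate(x)]


def importance(f: Score, x: Sequence[float], i: int) -> float:
    """Occlusion importance (an assumed stand-in): score drop when feature i is removed."""
    return f(x) - f(occlude(x, [i]))


def interaction(f: Score, x: Sequence[float], i: int, j: int) -> float:
    """Pairwise interaction: the part of the score not explained by i and j separately."""
    return (
        f(x)
        - f(occlude(x, [i]))
        - f(occlude(x, [j]))
        + f(occlude(x, [i, j]))
    )


def explanation_penalty(
    f: Score,
    x: Sequence[float],
    spurious: Iterable[int],
    spurious_pairs: Iterable[Tuple[int, int]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    """Penalty that pushes flagged importances and interactions toward zero.

    In training, this value would be added to the task loss for every
    example matched by a human-provided explanation.
    """
    imp = sum(importance(f, x, i) ** 2 for i in spurious)
    inter = sum(interaction(f, x, i, j) ** 2 for i, j in spurious_pairs)
    return alpha * imp + beta * inter


def toy_score(x: Sequence[float]) -> float:
    """Hypothetical model: two additive terms plus one product (interaction) term."""
    return 2.0 * x[0] + 0.5 * x[1] + 3.0 * x[1] * x[2]


x = [1.0, 1.0, 1.0]
penalty = explanation_penalty(toy_score, x, spurious=[0], spurious_pairs=[(1, 2)])
```

Here feature 0 is flagged as individually spurious and the pair (1, 2) as a spurious interaction. A purely additive model would have zero pairwise interaction under this definition, which is why the toy model includes a product term.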

research
03/04/2022

Evaluating Local Model-Agnostic Explanations of Learning to Rank Models with Decision Paths

Local explanations of learning-to-rank (LTR) models are thought to extra...
research
04/04/2020

Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection

Generating explanations for neural networks has become crucial for their...
research
11/08/2019

Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

The impressive performance of neural networks on natural language proces...
research
04/06/2021

Shapley Explanation Networks

Shapley values have become one of the most popular feature attribution e...
research
10/23/2022

Unsupervised Non-transferable Text Classification

Training a good deep learning model requires substantial data and comput...
research
03/11/2023

Robust Learning from Explanations

Machine learning from explanations (MLX) is an approach to learning that...
