On the Impact of Temporal Concept Drift on Model Explanations

10/17/2022
by Zhixue Zhao, et al.

Explanation faithfulness of model predictions in natural language processing is typically evaluated on held-out data drawn from the same temporal distribution as the training data (i.e. synchronous settings). While model performance often deteriorates under temporal variation (i.e. temporal concept drift), it is currently unknown how explanation faithfulness is affected when the target data comes from a different time span than the data used to train the model (i.e. asynchronous settings). To address this, we examine the impact of temporal variation on model explanations extracted by eight feature attribution methods and three select-then-predict models across six text classification tasks. Our experiments show that (i) faithfulness is not consistent under temporal variation across feature attribution methods (e.g. it decreases or increases depending on the method), with an attention-based method demonstrating the most robust faithfulness scores across datasets; and (ii) select-then-predict models are largely robust in asynchronous settings, with only small degradation in predictive performance. Finally, feature attribution methods show conflicting behavior when used in FRESH (i.e. a select-then-predict model) and when measuring sufficiency/comprehensiveness (i.e. as post-hoc methods), suggesting that we need more robust metrics to evaluate post-hoc explanation faithfulness.
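The sufficiency and comprehensiveness metrics mentioned above are standard post-hoc faithfulness measures: comprehensiveness asks how much the prediction drops when the rationale (top-attributed tokens) is removed, while sufficiency asks how well the rationale alone preserves the prediction. A minimal sketch, where `predict_proba` is a hypothetical stand-in for any classifier returning the probability of the originally predicted class:

```python
def top_k_rationale(tokens, attributions, k):
    """Indices of the k tokens with the highest attribution scores."""
    order = sorted(range(len(tokens)),
                   key=lambda i: attributions[i], reverse=True)
    return set(order[:k])

def comprehensiveness(predict_proba, tokens, attributions, k):
    # Remove the rationale tokens; a faithful explanation should cause
    # a large probability drop (larger value = more faithful).
    rationale = top_k_rationale(tokens, attributions, k)
    remainder = [t for i, t in enumerate(tokens) if i not in rationale]
    return predict_proba(tokens) - predict_proba(remainder)

def sufficiency(predict_proba, tokens, attributions, k):
    # Keep only the rationale tokens; a faithful explanation should
    # retain the prediction (smaller value = more faithful).
    rationale = top_k_rationale(tokens, attributions, k)
    kept = [t for i, t in enumerate(tokens) if i in rationale]
    return predict_proba(tokens) - predict_proba(kept)
```

For example, with a toy classifier that predicts 0.9 whenever the token "good" is present and 0.2 otherwise, an attribution that correctly ranks "good" highest yields a comprehensiveness of 0.7 and a sufficiency of 0.0 at k=1.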


research
02/28/2022

An Empirical Study on Explanations in Out-of-Domain Settings

Recent work in Natural Language Processing has focused on developing app...
research
03/23/2023

Reckoning with the Disagreement Problem: Explanation Consensus as a Training Objective

As neural networks increasingly make critical decisions in high-stakes s...
research
09/06/2022

Change Detection for Local Explainability in Evolving Data Streams

As complex machine learning models are increasingly used in sensitive ap...
research
12/09/2022

Post hoc Explanations may be Ineffective for Detecting Unknown Spurious Correlation

We investigate whether three types of post hoc model explanations–featur...
research
08/31/2021

Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience

Pretrained transformer-based models such as BERT have demonstrated state...
research
05/29/2017

Contextual Explanation Networks

We introduce contextual explanation networks (CENs)---a class of models ...
research
04/09/2021

Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals

Token-level attributions have been extensively studied to explain model ...
