
ReX: A Framework for Generating Local Explanations to Recurrent Neural Networks

We propose a general framework to adapt various local explanation techniques to recurrent neural networks. In particular, our explanations add temporal information, which expands explanations generated by existing techniques to cover data points whose lengths differ from that of the original input. Our approach is general in that it only modifies the perturbation model and feature representation of existing techniques without touching their core algorithms. We have instantiated our approach on LIME and Anchors. Our empirical evaluation shows that it effectively improves the usefulness of explanations generated by these two techniques on a sentiment analysis network and an anomaly detection network.






As more critical applications employ machine learning systems, how to explain the rationales behind the results produced by these systems has emerged as an important problem. Such explanations allow end users to 1) judge whether the results are trustworthy Ribeiro et al. (2016); Doshi-Velez et al. (2017) and 2) understand knowledge embedded in the systems so they can use it to manipulate future events Poyiadzi et al. (2020); Prosperi et al. (2020); Zhang et al. (2018). This paper focuses on the problem of explaining Recurrent Neural Networks (RNNs), an important class of deep learning systems that is widely applied to processing sequential data of varying lengths.

One dimension along which to classify explanation techniques is whether they are global or local Molnar (2020). The former explain how the model behaves on all inputs, while the latter explain how the model behaves on a particular set of inputs (typically ones that are similar to a given input). The vast majority of explanation techniques for RNNs are global. In particular, they employ deterministic finite automata (DFAs) as global surrogates of target RNNs Omlin and Giles (1996); Jacobsson (2005); Wang et al. (2018); Weiss et al. (2018); Dong et al. (2020). However, due to the complex nature of practical problem domains, such techniques can produce very large DFAs, which are hard for a human to digest. Moreover, they take a long time to generate, making these techniques hard to scale. As a result, they are often limited to toy networks such as ones that learn regular expressions. Even for these simple domains, the explanations can still be complex, as RNNs often fail to internalize the perfect regular expressions and contain noise.

Due to the complexity of the problem domain and the networks, we turn our attention to local methods Ribeiro et al. (2016, 2018); Zhang et al. (2018); Arras et al. (2017); Wachter et al. (2017); Lundberg and Lee (2017). These methods produce more tractable explanations at the cost of covering far fewer inputs. They provide explanations for individual inputs and have a wide range of applications. While there is a rich body of techniques in this category, to our knowledge, few are specialized to RNNs. In particular, they all ignore temporal information in the inputs, making it hard for users to fully understand the explanations.

Input sentence I: He never fails in any exam.
Network output: Positive
(a) {never, fails} Positive
(b) {never, fails; "never" before "fails"} Positive
Input sentence II: He never attends any lecture, so he fails in any exam.
Network output: Negative
(a) {never, fails} Negative
(b) {never, fails; "never" not right before "fails"} Negative
Figure 1: Example explanations generated by Anchors (a) without ReX and (b) with ReX.

Input sentence: The weather is not good.
Network output: Negative
Figure 2: An example explanation generated by LIME (a) without ReX and (b) with ReX.

Consider Anchors Ribeiro et al. (2018), a popular local technique that generates sufficient conditions for producing specific outputs. As Figure 1(a) shows, for a sentiment analysis network, it generates the following explanation for Sentence I: the sentence is positive because it contains both “never” and “fails”. However, it does not give any temporal information about these two words. In particular, would the sentence still be positive if “never” came after “fails”? Consider Sentence II: the generated anchors are exactly the same as those for Sentence I, but now the sentence is judged as negative. The key difference is that in Sentence II, “never” and “fails” do not form a phrase. However, this is not captured by Anchors, which is confusing to a user.

We see similar issues with other existing local explanation techniques. For example, a large group of techniques such as LIME Ribeiro et al. (2016) use linear models as local surrogates. Again, consider a sentiment analysis network. As Figure 2(a) shows, LIME would assign a high negative score to “not” and a high positive score to “good”. Intuitively, this means that “good” makes the sentence positive and the sentence is overall negative because “not” has a stronger effect making it negative. However, this is not the case: “not good” together is a negative phrase.

To resolve this issue, we propose ReX, a general framework for extending various local model-agnostic explanation techniques with temporal information. In particular, ReX adds such information in the form of predicates j − i ≥ d, meaning the distance between the positions of feature x_i and feature x_j is not smaller than d. How such information is presented depends on the specific explanation technique. Consider the sentences in Figure 1 again. After ReX augments Anchors, the explanation for Sentence I becomes: the sentence is positive because it contains both “never” and “fails”, and “never” comes before “fails”. On the other hand, the explanation for Sentence II becomes: the sentence is negative because it contains “never” and “fails”, and “never” is not right before “fails”. Now, consider applying LIME with ReX to the sentence in Figure 2. In the new explanation, the fact that “not” comes before “good” gets the highest negative score. This new information 1) associates the two words, and 2) captures that “not” comes before “good”. As we can see, for both Anchors and LIME, ReX makes the explanations much easier for an end user to understand.

To add such information to various local model-agnostic explanation techniques, we observe that these techniques treat the target model as a black box and generate a surrogate by learning from input-output pairs. In particular, we make two key observations: 1) these techniques use a perturbation model that generates inputs similar to the target input, so they capture the local behavior of the network through these inputs; 2) these techniques generate explanations that are described in terms of features. Based on 1), in order to capture temporal information that the network internalizes, we can modify the perturbation model so that it generates inputs whose lengths vary and in which the order of features can change. Based on 2), we can treat a piece of temporal information such as j − i ≥ d as a feature of the network. Although such features do not affect the result of the original network, they reflect temporal information that is internalized by the network. More importantly, the explanation techniques can now generate explanations that utilize them without changing their core algorithms.

To evaluate the effectiveness of ReX, we have applied it to both LIME and Anchors and then used the augmented versions to explain a sentiment analysis network and an anomaly detection network. Our results show that ReX makes the generated explanations cover 235.2% more inputs on average while the changes in their precision are minimal. Moreover, a user study shows that the augmented explanations make it easier for human users to predict the behaviors of the networks.

In summary, we have made the following contributions:

  • We have proposed to incorporate temporal information in the form of temporal predicates (e.g., j − i ≥ d) in local explanations to RNNs, which makes these explanations easier to understand.

  • We have proposed a general framework ReX to automatically incorporate the above information in popular local explanation techniques.

  • We have demonstrated the effectiveness of ReX by instantiating it on LIME and Anchors and evaluating the augmented explanation techniques on two representative RNNs.


In this section, we describe the necessary background for introducing our approach. Without loss of generality, we assume the RNN is a black-box function from a sequence of real numbers to a real value, N : ℝ* → ℝ. We limit our discussion to networks that perform classification or regression.

Given an input x and its corresponding output o = N(x), a local explanation is an expression E that reflects how the network produces the output given the specific input. Typically, a local explanation is an expression over features. More specifically, E is formed from feature predicates. A feature predicate is x_i ⊙ c, where x_i is a variable representing the i-th feature, ⊙ is a binary operator such as =, ≤, and ≥, and c is a constant. Based on the specific form of the expression, a feature predicate can be interpreted as a Boolean variable or a {0, 1} integer. We call the set of feature predicates the vocabulary of the local explanation. For example, in Anchors, E is a conjunction of feature predicates which must evaluate to true on x; in LIME, E is a linear expression over feature predicates, of the form w_0 + w_1·p_1 + … + w_n·p_n; in a counterfactual explanation Zhang et al. (2018); Wachter et al. (2017); Dandl et al. (2020), E is a conjunction of feature predicates p_1 ∧ … ∧ p_k where all p_i do not hold on x and each p_i has the form x_i = c. Moreover, if an input x′ satisfies a counterfactual explanation E, then |N(x′) − o| ≥ d for a given d (in the case of binary classification, d is 1).

A perturbation model Π describes a set of inputs that are similar to a given input. A local explanation technique is parameterized by a perturbation model and generates a local explanation for a network given an input. Intuitively, the local explanation only reflects the behaviors of the network over the input space given by the perturbation model. The perturbation models in all existing local explanation techniques are implemented as changing the values of given features; i.e., every input in Π(x) has the same length as x.

As we can see, both the vocabularies and the perturbation models of existing local explanations are limited to describing inputs that have the same length as the original input (when applying these techniques to language applications, some perturbation models allow replacing a word with an empty string, which generalizes the explanations to shorter inputs to some extent). This limits their effectiveness on RNNs.
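To make the notions above concrete, here is a minimal sketch (not the paper's code; all names are ours) of feature predicates x_i ⊙ c evaluated on a fixed-length input, as used by the existing techniques described in this section:

```python
# Sketch: feature predicates x_i (op) c over a fixed-length input.
# OPS maps the operator symbols from the definition to Python functions.
import operator

OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}

def eval_predicate(x, i, op, c):
    """Evaluate the feature predicate x_i (op) c on input x."""
    return OPS[op](x[i], c)

x = [0.3, 1.7, 0.9]
assert eval_predicate(x, 1, ">=", 1.0)      # x_1 >= 1.0 holds
assert not eval_predicate(x, 0, ">=", 1.0)  # x_0 >= 1.0 does not
```

Note that such a predicate is tied to a fixed position i, which is exactly why it cannot describe inputs of a different length.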

Our Framework

We now describe ReX. Our goal is to provide a general approach to incorporate temporal information in explanations without heavily modifying the explanation generation technique. We first describe our extended explanations for RNNs, and then describe how to augment existing techniques to generate such explanations.

Local Explanations with Temporal Information

Our key observation is that while the form of explanation expression varies, the expressions are all built from the corresponding vocabulary. If we can add predicates that reflect temporal information to the vocabulary, then naturally the explanations provide temporal information.

Our temporal predicates describe the temporal relationship between features that satisfy basic predicates. We limit the number of features in a temporal predicate to at most two because 1) in most cases, the temporal relationship between two features suffices to cover a large range of inputs of different lengths and therefore provides useful information, 2) humans are bad at understanding high-dimensional information, and 3) covering too many inputs makes the generation process slow, especially for generation techniques based on sampling (e.g., LIME and Anchors). We give their definitions below:

[1-D Temporal Predicate] A 1-D temporal predicate takes the form of

∃i. x_i ⊙ c.

In the above definition, ∃ can also be replaced with ∀.

[2-D Temporal Predicate] A 2-D temporal predicate takes the form of

∃i, j. x_i ⊙1 c1 ∧ x_j ⊙2 c2 ∧ j − i ≥ d.

In a 2-D temporal predicate, when the distance d is negative, feature x_i can come either before or after feature x_j. This implies that the order between these two features does not matter. In the experiments, we constrain d ≥ −1, as d = −1 is enough to indicate that the order does not matter.

As we can see, compared to standard predicates, temporal predicates no longer describe properties of features at fixed positions in an input. Instead, they require the positions of the features being described to satisfy given temporal constraints.

Based on the definitions of temporal predicates, we introduce the definition of local explanation with temporal information:

[Explanation with Temporal Information] A local explanation with temporal information is a local explanation whose vocabulary consists of regular predicates, 1-D temporal predicates, or 2-D temporal predicates.


Consider Input sentence I in Figure 1. A 1-D temporal predicate is

∃i. x_i = “never”.

A 2-D temporal predicate is

∃i, j. x_i = “never” ∧ x_j = “fails” ∧ j − i ≥ 1.

The explanation generated by Anchors augmented with ReX in Figure 1 is a conjunction consisting only of the above 2-D predicate.
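The two definitions above are easy to operationalize. The following sketch (helper names are ours, using = as the base operator for word features) evaluates 1-D and 2-D temporal predicates on the sentences from Figure 1:

```python
# Sketch: evaluating 1-D and 2-D temporal predicates over word sequences.

def holds_1d(x, c):
    """1-D predicate: there exists a position i with x_i = c."""
    return any(t == c for t in x)

def holds_2d(x, c1, c2, d):
    """2-D predicate: positions i, j with x_i = c1, x_j = c2, j - i >= d."""
    return any(x[i] == c1 and x[j] == c2 and j - i >= d
               for i in range(len(x)) for j in range(len(x)))

s1 = "he never fails in any exam".split()
s2 = "he never attends any lecture so he fails in any exam".split()
assert holds_1d(s1, "never")
assert holds_2d(s1, "never", "fails", 1)      # "never" right before "fails"
assert not holds_2d(s1, "fails", "never", 1)  # but not the other way around
assert holds_2d(s2, "never", "fails", 2)      # "never" well before "fails"
```

Both sentences satisfy the 1-D predicate and the d = 1 version of the 2-D predicate, which is why a richer predicate (e.g., whether “never” is right before “fails”) is needed to tell them apart.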

Augmenting Generation Techniques

We aim to provide a general approach to extend existing model-agnostic techniques to generate local explanations with temporal information without heavily modifying their algorithms. Our key insight is that this can be achieved simply by extending the vocabulary and the perturbation model of an explanation technique. The reason is that existing local model-agnostic explanation techniques essentially treat the target model as a black box: they generate surrogate models as explanations, which are described using features in the vocabulary and trained from input-output pairs obtained from the perturbation model. As long as we can change these two components, we can generate explanations augmented with temporal information without changing the core explanation algorithm.

Extending Vocabularies.

In order to add a predicate to a vocabulary, the predicate must be able to serve as a feature of an input. As a result, the work required to add a predicate to a vocabulary is quite light: all that is needed is a way to evaluate the predicate on a given input. Obviously, both our 1-D and 2-D temporal predicates can be easily evaluated on any input of a target RNN. By adding temporal predicates to the vocabulary, an explanation technique now has the language to describe behaviors of the target RNN over inputs of various lengths.

Extending Perturbation Models.

In order to generate useful local explanations with temporal information that cover inputs of different lengths, we next modify the perturbation models of existing techniques. More concretely, we add a preprocessor over a given perturbation model. For a given input x, the preprocessor performs two modifications in sequence to generate more inputs: 1) it can delete certain features from the input, and 2) it can switch the positions of two features. To control the number of generated inputs, we add a parameter k to limit the switching operation to two features that are at most k apart. In our experiment, we set k to match the aforementioned setting of 2-D temporal predicates where the distance d ≥ −1. More formally, given a perturbation model Π and a constant k, the new perturbation model our approach generates is

Π′(x) = { switch_{i,j}(delete*(x′)) | x′ ∈ Π(x), |i − j| ≤ k }.

In the above definition, delete* returns a new input with a random feature removed from the given input and can be applied an arbitrary number of times; switch_{i,j} switches the positions of feature i and feature j of a given input and returns the new input.

Careful readers may have noticed that our augmented perturbation model could be made more complex. For example, instead of only allowing switching the positions of two features, one could completely shuffle the features in an input; besides deleting or switching features, one could add new features. While these methods can generate more inputs, the drawbacks are that 1) the generation techniques can run much slower, as many of them require sampling from the perturbation model, which alone now takes much longer, and 2) a larger input space is harder for a simple explanation to fit. In practice, we find our definition above a good balance among generality, efficiency, and accuracy.
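The preprocessor described above can be sketched as follows. This is an illustrative implementation under our own naming and sampling choices (deletion probability, number of samples, and the stand-in base perturbation model are all assumptions), not the paper's code:

```python
# Sketch of the augmented perturbation model: wrap an existing perturbation
# model with random deletions (delete*) and distance-bounded switches.
import random

def delete_star(x, rng, p=0.2):
    """delete*: remove each feature independently with probability p."""
    kept = [t for t in x if rng.random() > p]
    return kept if kept else list(x)  # keep at least one feature

def switch(x, i, j):
    """switch_{i,j}: return a copy of x with features i and j swapped."""
    y = list(x)
    y[i], y[j] = y[j], y[i]
    return y

def augmented(pi, x, k=1, n=5, seed=0):
    """New model: apply delete*, then switch two positions at most k apart."""
    rng = random.Random(seed)
    out = []
    for xp in pi(x, n):
        xp = delete_star(xp, rng)
        if len(xp) > 1:
            i = rng.randrange(len(xp) - 1)
            j = min(len(xp) - 1, i + rng.randint(1, k))
            xp = switch(xp, i, j)
        out.append(xp)
    return out

identity_pi = lambda x, n: [list(x)] * n   # stand-in base perturbation model
samples = augmented(identity_pi, "the weather is not good".split())
assert all(len(s) <= 5 for s in samples)   # lengths may now vary
```

Unlike the original perturbation models, the outputs can be shorter than the input and can reorder nearby features, which is what lets temporal predicates be learned.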


We next describe how to instantiate our framework on several popular local explanation methods Molnar (2020). In particular, we discuss what the augmented explanation looks like, how the temporal predicates are plugged in, and how the perturbation model interacts with the core generation algorithm. In our experiment, we implement the instances that correspond to Anchors and LIME.

LIME Ribeiro et al. (2016)

As the example in Figure 2 shows, the augmented explanation is now a linear expression whose 0-1 variables can be either regular predicates or temporal predicates. Since LIME trains such a linear expression by fitting it on a given set of input-output pairs, it only needs to be able to evaluate the linear expression and compute the loss after temporal predicates are added, which is clearly doable. As for the perturbation model, it is used to draw the aforementioned inputs and is viewed as a black box by the core algorithm in LIME. As a result, replacing the perturbation model is straightforward.
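The key change on LIME's side is the interpretable representation: each perturbed input, whatever its length, is encoded as a fixed-length 0/1 vector over regular predicates plus temporal predicates, which the linear surrogate is then fit on. A small sketch (predicate choices and helper names are illustrative; for simplicity the regular predicates here test word presence rather than a fixed position):

```python
# Sketch: the extended interpretable representation for LIME*.
# Regular predicates: a word is present; temporal predicates: word a
# appears somewhere before word b. Variable-length inputs all map to
# the same fixed-length 0/1 vector.

def featurize(x, words, pairs):
    regular = [1 if w in x else 0 for w in words]
    temporal = [1 if any(x[i] == a and x[j] == b
                         for i in range(len(x))
                         for j in range(i + 1, len(x)))
                else 0 for (a, b) in pairs]
    return regular + temporal

words = ["not", "good"]
pairs = [("not", "good")]           # "not" somewhere before "good"
v1 = featurize("the weather is not good".split(), words, pairs)
v2 = featurize("good weather is not rare".split(), words, pairs)
assert v1 == [1, 1, 1]
assert v2 == [1, 1, 0]              # both words present, wrong order
```

The last coordinate is the temporal feature: it separates the two sentences even though the regular word-presence features agree, which is what allows LIME's unchanged fitting algorithm to assign it its own weight.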

Anchors Ribeiro et al. (2018)

As the example in Figure 1 shows, the augmented explanation is now a conjunction of regular predicates and temporal predicates. Similar to LIME, Anchors are generated by a sampling-based approach. Therefore, adding the temporal predicates and substituting the perturbation model are straightforward.

Shapley Values Strumbelj and Kononenko (2014); Lundberg and Lee (2017)

Similar to LIME, these methods attribute importance to individual features but apply more computationally expensive yet more precise algorithms. As a result, the way to extend their explanations is similar to that for LIME. In addition, the underlying algorithms are also based on learning from input-output pairs, so the way to extend the vocabulary and the perturbation model is also similar to LIME's case.

Counterfactual Explanations Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018)

The techniques proposed by Wachter et al. and by Dandl et al. return individual inputs as explanations, so they do not fit into our framework. Here, we discuss Polaris Zhang et al. (2018), which returns a linear expression, such as “If your January income was greater than $1,000 and your February income was greater than $2,000, you would have got the loan”. We augment such explanations with temporal predicates so that a subset of the linear expressions comes with existential quantifiers. For example, the aforementioned explanation would become “If your income for Month A was greater than $1,000, your income for Month B was greater than $2,000, and Month B comes after Month A, you would have got the loan”. However, augmenting the generation technique needs more work, because Polaris is not model-agnostic and encodes explaining a given model as linear constraint problems. One direction is to extend the linear problems with our predicates and perturbation models. Another is to make the generation technique model-agnostic by changing it to a sampling-based approach.

Empirical Evaluation

We have instantiated ReX on Anchors and LIME. We refer to the extended versions as Anchors* and LIME*, respectively. To evaluate ReX, we use Anchors* and LIME* to explain a sentiment analysis RNN and an anomaly detection RNN, and compare their effectiveness against that of the original methods. More concretely, we conducted an experiment with simulated users on a relatively large set of inputs and a study with real users on a smaller set of inputs. In particular, we are interested in the answers to two questions:

  1. With an explanation, for an input that is similar to the original input, will the user be able to predict the result produced by the target model?

  2. Is the prediction correct?

For a given explanation and a set of inputs, we refer to the percentage of answering “yes” to the first question as “coverage” of the explanation; we refer to the percentage of answering “yes” to the second question as “precision” of the explanation.
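The bookkeeping behind these two metrics can be sketched as follows. This is our own illustrative rendering of the definitions above, assuming (as is standard for Anchors-style explanations) that precision is computed over the covered inputs:

```python
# Sketch: coverage and precision over a set of perturbed inputs. For each
# input we record whether the explanation yielded a prediction (covered)
# and whether that prediction matched the network (correct).

def coverage_precision(results):
    """results: list of (covered: bool, correct: bool), one per input."""
    covered = [correct for cov, correct in results if cov]
    coverage = len(covered) / len(results) if results else 0.0
    precision = sum(covered) / len(covered) if covered else 0.0
    return coverage, precision

res = [(True, True), (True, False), (False, False), (True, True)]
cov, prec = coverage_precision(res)
assert abs(cov - 0.75) < 1e-9
assert abs(prec - 2 / 3) < 1e-9
```

A wider-coverage explanation answers question 1 for more inputs; precision then measures how often those answers are right.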

The RNNs and the Datasets

Sentiment Analysis

We trained an LSTM with paraphrastic sentence embedding Wieting et al. (2015) as a sentiment predictor on the Stanford Sentiment Treebank Socher et al. (2013) dataset. We followed the train/validate/test split in the original dataset. We use words as the basic building blocks of explanations. As described previously, the explanations produced by the original Anchors and LIME only use the presence of words, while ReX adds the 1-D and 2-D temporal information into the explanations. In the perturbation model, besides deleting and switching words, we apply BERT Devlin et al. (2018) to replace words with other words that can also appear in the context.

Anomaly Detection

We trained an anomaly detection RNN following Park (2018) on an ECG dataset Dau et al. (2018). Similarly, we followed the train/validate/test split in the original dataset. We use whether a data point in a series is fixed to a given value as the basic building block of explanations. While the explanations generated by the original Anchors and LIME only contain such predicates, ReX adds temporal predicates. To make the explanation generation process tractable, we limit the explanations to data points that are at most 20 steps before the detected anomalous point in a time series. In the perturbation model, besides deleting and switching data points, we allow changing each data point by sampling from a Gaussian distribution with its original value as the mean and one as the standard deviation.

Simulated Users

Method      Sentiment Analysis        Anomaly Detection
            coverage   precision      coverage   precision
Anchors      5.0%      99.0%           3.8%      98.7%
Anchors*    21.3%      97.1%           7.2%      97.8%
LIME        10.6%      74.8%           8.1%      87.7%
LIME*       36.5%      74.7%          31.1%      88.4%
Table 1: Average coverage and precision in the simulated user experiment. Anchors* and LIME* indicate Anchors and LIME augmented with ReX.

In this experiment, we evaluated the improvement brought by our approach in coverage and precision, assuming the user precisely follows the explanations. We took the test sets of the two datasets and applied Anchors, Anchors*, LIME, and LIME* to generate explanations for each input in the sets. Since normal points far outnumber anomalous points in the anomaly detection dataset, we only looked at anomalous inputs. There are 2,210 inputs in the test set of the sentiment analysis dataset and 9 (anomalous) inputs in that of the anomaly detection dataset. We then applied our perturbation models that vary input lengths to generate 10,000 similar inputs for each input. For each generated input, we tried to predict the RNN output following each explanation, which in turn was used to calculate the coverage and precision of each approach. In the case of Anchors and Anchors*, as long as the given input satisfies the explanation (a sufficient condition), we gave a “yes” to the coverage question; if the actual classification also matches that of the original input, we gave a “yes” to the precision question. In the case of LIME and LIME*, the explanations are linear expressions that evaluate to real numbers on each input. If the absolute value of an evaluation result is greater than a threshold (0.1 for sentiment analysis, 0.05 for anomaly detection), we gave a “yes” to the coverage question; further, if the sign of the value matches the actual RNN output, we gave a “yes” to the precision question.

Table 1 summarizes the average coverage and precision of the four approaches across all the generated inputs. Overall, ReX improves the coverage significantly while maintaining roughly the same level of high precision as the original approaches. This is in line with our assumption that ReX is able to augment existing techniques to generate explanations that cover more inputs. On sentiment analysis, ReX has improved the average coverage of Anchors and LIME by 322.09% and 245.05% respectively. On some inputs, the improvements are as high as 473.0% and 386.1%. On anomaly detection, ReX has improved the average coverage of Anchors and LIME by 89.5% and 284.2% respectively. On some inputs, the improvements are as high as 1,310.1% and 774.9%.

Input sentence: we never feel anything for these characters.
Network output: Negative
Anchors: {characters} Negative
Anchors*: {never, feel} Negative
Figure 3: Representative explanations produced by Anchors and Anchors*.

Input sentence: It never fails to engage us.
Network output: Positive
Figure 4: Representative explanations produced by LIME and LIME*.

Apart from explanations similar to the ones in the introduction, there are other explanations generated in the experiment that drew our attention. For example, consider the explanations in Figure 3. Anchors judges that the sentence is negative because it contains “characters”. This explanation does not make much sense: either the network or the explanation technique has issues. It turns out that Anchors overfits to sentences that are similar to the original sentence and have the same length. On the other hand, the explanation generated by Anchors* is more robust as it covers more sentences. Concretely, it says the sentence is negative because it contains both “never” and “feel”, and the order between them does not matter. Consider another example in Figure 4. The sentence is classified as positive because it is a double negative. However, LIME has trouble capturing this correlation between the two words. As a result, its explanation does not make much sense: it assigns a high positive score to “never”. On the other hand, LIME* is able to assign a high score to the fact that both “never” and “fails” are present and “never” comes before “fails”. These two examples show that (1) by covering more inputs, ReX can make the underlying technique generate more robust explanations, and (2) by adding 2-D temporal predicates, ReX can help the underlying technique capture correlations between two features.

User Study

Original sentence: pretentious editing ruins a potentially terrific flick.
RNN output: negative.
Explanation 1: the word “ruins” appears in the sentence at the specific position.
Explanation 2: both “ruins” and “terrific” appear in the sentence, “terrific” is behind “ruins”, and there are at least “0” words between them.

Please predict the RNN output on each sentence below according to each explanation. You can answer 0. negative, 1. positive, or 2. I don’t know.
Sentence                                                       Prediction 1   Prediction 2
ruins beneath terrific design.
pretentious editing ruins a potentially terrific methodology.
cult ruins a potentially lucrative planet.

Figure 5: A question in the user study.
Method     Precision                                  Coverage
           Q1      Q2      Q3      Q4      Q5         Q1      Q2      Q3      Q4      Q5
Anchors    70.6%   47.4%   18.1%   47.4%   57.8%      58.0%   44.0%   37.0%   37.0%   43.5%
Anchors*   81.2%   99.4%   73.9%   84.1%   97.7%      61.9%   69.5%   80.5%   71.5%   60.5%
Table 2: Results of the user study.

We asked 19 computer science undergraduates to participate in a study to evaluate how ReX improves Anchors on the sentiment analysis network. Each participant had taken courses in machine learning but had no experience with explanation techniques before the study. The questionnaire contains five questions. Each question first presents a sentence, the network’s output on the sentence, and the explanations produced by Anchors and Anchors*. The sentences are randomly chosen from the test set. The user is then asked to predict the RNN’s output on 10 new sentences. The new sentences are produced using our perturbation model (with BERT Devlin et al. (2018)). They can answer “positive”, “negative”, or “I don’t know”. Figure 5 shows one such question from the questionnaire. If a user did not answer “I don’t know”, we gave a “yes” to the coverage question; further, if their prediction matched the actual network output, we gave a “yes” to the precision question.

Table 2 shows the average coverage and precision across the 19 users and 10 sentences for each question. Anchors* is better than Anchors on all questions in terms of both coverage and precision. Across these questions, Anchors* yields an average precision of 87.3% and an average coverage of 68.8%, while these numbers are only 48.3% and 43.9% for Anchors. The improvements are 80.9% and 56.7% respectively. Compared to the experiment with simulated users, the improvement in precision is much higher while the improvement in coverage is smaller. We looked at the answers closely and found that users sometimes misuse explanations produced by the original Anchors. Strictly speaking, an explanation produced by Anchors only covers inputs with the same length as the original input. Moreover, when it says a word is present, the word must appear at the same position as it does in the original input. However, some users ignore such constraints and apply the explanation when they are supposed to answer “I don’t know”. This happens for 24.8% of all the answers. As a result, they often give incorrect predictions on these sentences while the coverage is increased. On the other hand, such misuses rarely happen with Anchors* because the corresponding explanations highlight temporal information and cover many more inputs.

Similar to the experiment with simulated users, our user study shows that ReX can considerably increase the coverage of existing explanation techniques without sacrificing precision. It also shows the shortcomings of existing techniques when they are applied on RNNs. In particular, their precision or coverage drops when a user tries to apply explanations to inputs whose lengths are different from the original input.

Related Work

Our work is closely related to explanation techniques specific to RNNs and general local model-agnostic explanation techniques.

There is a long line of work on extracting DFAs and their variants from RNNs. For earlier works, we refer to the surveys by Wang et al. (2018) and Jacobsson (2005). The focus of this field has been developing more scalable algorithms. For example, Weiss et al. (2018) propose an approach that adapts Angluin’s algorithm. There are also works that augment the explanations with more powerful variants of DFAs, including weighted automata Ayache et al. (2018), Discrete-Time Markov Chains Du et al. (2019), probabilistic automata Dong et al. (2020), and others. The drawback of these approaches is that they fail to scale to practical RNNs due to their global nature. Their applications are mostly limited to explaining RNNs that learn regular expressions.

There also exist local explanation techniques to RNNs that attribute importance to features Arras et al. (2019). However, they fail to capture temporal information and their explanations are in the same form as those generated by model-agnostic approaches such as LIME. The difference is that by treating RNNs as white boxes they can generate explanations more efficiently and accurately. The techniques include gradient-based sensitivity analysis Li et al. (2016); Denil et al. (2014), layer-wise relevance propagation Ding et al. (2017); Arras et al. (2017); Arjona-Medina et al. (2019), contextual decomposition Murdoch et al. (2018), and others. Extending our approach to these techniques may require modifying the underlying algorithms. This would be interesting to explore in the future.

Local model-agnostic explanation techniques apply to a wide range of machine learning models because they treat models as black boxes. As mentioned earlier, popular forms of explanations include attributing importance to features Ribeiro et al. (2016); Strumbelj and Kononenko (2014); Lundberg and Lee (2017), using a Boolean expression as a sufficient condition for the current model output Ribeiro et al. (2018), and counterfactuals Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018). Besides these approaches, Individual Conditional Expectation (ICE) plots visualize how the model output changes as one feature changes Goldstein et al. (2015). For more details, we refer to Chapter 9 of the popular book by Molnar (2020). These approaches all fail to capture the temporal information that is internalized in RNNs.
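For readers unfamiliar with ICE plots, the underlying computation is simple: sweep one feature over a grid while holding the rest of the instance fixed, and record the model output at each grid point. The toy model and grid below are our illustrative assumptions, not from Goldstein et al. (2015).

```python
# Minimal sketch of one Individual Conditional Expectation (ICE) curve:
# the model's output for a single instance as one feature varies.

def f(x):
    """Toy model: a nonlinear function of two features."""
    return x[0] ** 2 + 0.5 * x[1]

def ice_curve(model, instance, feature_idx, grid):
    """Vary feature `feature_idx` over `grid`, holding the other
    features of `instance` fixed; return the model outputs."""
    curve = []
    for v in grid:
        x = list(instance)
        x[feature_idx] = v
        curve.append(model(x))
    return curve

curve = ice_curve(f, [1.0, 2.0], 0, [0.0, 1.0, 2.0])
# curve traces f as feature 0 sweeps the grid with feature 1 fixed at 2.0
```

Plotting one such curve per instance (rather than averaging them into a partial dependence curve) is what distinguishes ICE plots.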


Conclusion

We have proposed ReX, a general framework that adds temporal information to explanations generated by existing local model-agnostic explanation techniques. This allows these techniques to produce more useful explanations for models that handle inputs of varying lengths, with RNNs as a representative use case. ReX achieves this by extending the vocabularies of explanations with temporal predicates, and by modifying the perturbation models so that they can generate inputs of different lengths. We have instantiated ReX on Anchors and LIME, and demonstrated its effectiveness by applying the two resulting techniques to two representative RNNs.
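The modified perturbation model is the key mechanical change. As a hypothetical sketch of what a length-varying perturbation model might look like, the drop/duplicate scheme and probabilities below are our own illustrative assumptions, not the paper's exact algorithm:

```python
# Sketch of a perturbation model that, unlike plain token masking,
# can also drop or duplicate tokens, yielding perturbed samples
# whose lengths differ from the original input's length.

import random

def perturb(tokens, rng, p_drop=0.2, p_dup=0.2):
    """Return a perturbed copy of `tokens` that may be shorter or longer."""
    out = []
    for t in tokens:
        r = rng.random()
        if r < p_drop:
            continue           # drop the token: sample gets shorter
        out.append(t)
        if r > 1.0 - p_dup:
            out.append(t)      # duplicate the token: sample gets longer
    return out

rng = random.Random(0)
samples = [perturb(["not", "a", "good", "movie"], rng) for _ in range(5)]
```

Feeding such samples to the model lets an explainer observe its behavior across input lengths, which fixed-length masking alone cannot do.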


  • J. A. Arjona-Medina, M. Gillhofer, M. Widrich, T. Unterthiner, J. Brandstetter, and S. Hochreiter (2019) RUDDER: return decomposition for delayed rewards. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett (Eds.), pp. 13544–13555. External Links: Link Cited by: Related Work.
  • L. Arras, G. Montavon, K. Müller, and W. Samek (2017) Explaining recurrent neural network predictions in sentiment analysis. CoRR abs/1706.07206. External Links: Link, 1706.07206 Cited by: Introduction, Related Work.
  • L. Arras, A. Osman, K. Müller, and W. Samek (2019) Evaluating recurrent neural network explanations. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@ACL 2019, Florence, Italy, August 1, 2019, T. Linzen, G. Chrupala, Y. Belinkov, and D. Hupkes (Eds.), pp. 113–126. External Links: Link, Document Cited by: Related Work.
  • S. Ayache, R. Eyraud, and N. Goudian (2018) Explaining black boxes on sequential data using weighted automata. In Proceedings of the 14th International Conference on Grammatical Inference, ICGI 2018, Wrocław, Poland, September 5-7, 2018, O. Unold, W. Dyrka, and W. Wieczorek (Eds.), Proceedings of Machine Learning Research, Vol. 93, pp. 81–103. External Links: Link Cited by: Related Work.
  • S. Dandl, C. Molnar, M. Binder, and B. Bischl (2020) Multi-objective counterfactual explanations. In Parallel Problem Solving from Nature - PPSN XVI - 16th International Conference, PPSN 2020, Leiden, The Netherlands, September 5-9, 2020, Proceedings, Part I, T. Bäck, M. Preuss, A. H. Deutz, H. Wang, C. Doerr, M. T. M. Emmerich, and H. Trautmann (Eds.), Lecture Notes in Computer Science, Vol. 12269, pp. 448–469. External Links: Link, Document Cited by: Preliminaries, Counterfactual Explanations Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018), Related Work.
  • H. A. Dau, E. Keogh, K. Kamgar, C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, Yanping, B. Hu, N. Begum, A. Bagnall, A. Mueen, G. Batista, and Hexagon-ML (2018) The UCR time series classification archive. Cited by: Anomaly Detection.
  • M. Denil, A. Demiraj, and N. de Freitas (2014) Extraction of salient sentences from labelled documents. CoRR abs/1412.6815. External Links: Link, 1412.6815 Cited by: Related Work.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: Sentiment Analysis, User Study.
  • Y. Ding, Y. Liu, H. Luan, and M. Sun (2017) Visualizing and understanding neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, R. Barzilay and M. Kan (Eds.), pp. 1150–1159. External Links: Link, Document Cited by: Related Work.
  • G. Dong, J. Wang, J. Sun, Y. Zhang, X. Wang, T. Dai, J. S. Dong, and X. Wang (2020) Towards interpreting recurrent neural networks through probabilistic abstraction. In 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020, pp. 499–510. External Links: Link, Document Cited by: Introduction, Related Work.
  • F. Doshi-Velez, M. Kortz, R. Budish, C. Bavitz, S. Gershman, D. O’Brien, S. Schieber, J. Waldo, D. Weinberger, and A. Wood (2017) Accountability of AI under the law: the role of explanation. CoRR abs/1711.01134. External Links: Link, 1711.01134 Cited by: Introduction.
  • X. Du, X. Xie, Y. Li, L. Ma, Y. Liu, and J. Zhao (2019) DeepStellar: model-based quantitative analysis of stateful deep learning systems. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019, M. Dumas, D. Pfahl, S. Apel, and A. Russo (Eds.), pp. 477–487. External Links: Link, Document Cited by: Related Work.
  • A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin (2015) Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics 24 (1), pp. 44–65. External Links: Document, Link, Cited by: Related Work.
  • H. Jacobsson (2005) Rule extraction from recurrent neural networks: A taxonomy and review. Neural Comput. 17 (6), pp. 1223–1263. External Links: Link, Document Cited by: Introduction, Related Work.
  • J. Li, X. Chen, E. H. Hovy, and D. Jurafsky (2016) Visualizing and understanding neural models in NLP. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, K. Knight, A. Nenkova, and O. Rambow (Eds.), pp. 681–691. External Links: Link, Document Cited by: Related Work.
  • S. M. Lundberg and S. Lee (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett (Eds.), pp. 4765–4774. External Links: Link Cited by: Introduction, Shapley Values Strumbelj and Kononenko (2014); Lundberg and Lee (2017), Related Work.
  • C. Molnar (2020) Interpretable machine learning. Lulu. com. Cited by: Introduction, Instances, Related Work.
  • W. J. Murdoch, P. J. Liu, and B. Yu (2018) Beyond word importance: contextual decomposition to extract interactions from lstms. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, External Links: Link Cited by: Related Work.
  • C. W. Omlin and C. L. Giles (1996) Extraction of rules from discrete-time recurrent neural networks. Neural Networks 9 (1), pp. 41–52. External Links: Link, Document Cited by: Introduction.
  • J. Park (2018) RNN based Time-series Anomaly Detector Model Implemented in Pytorch. Note: 2022-06-01 Cited by: Anomaly Detection.
  • R. Poyiadzi, K. Sokol, R. Santos-Rodríguez, T. D. Bie, and P. A. Flach (2020) FACE: feasible and actionable counterfactual explanations. In AIES ’20: AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, February 7-8, 2020, A. N. Markham, J. Powles, T. Walsh, and A. L. Washington (Eds.), pp. 344–350. External Links: Link, Document Cited by: Introduction.
  • M. C. F. Prosperi, Y. Guo, M. Sperrin, J. S. Koopman, J. S. Min, X. He, S. N. Rich, M. Wang, I. E. Buchan, and J. Bian (2020) Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat. Mach. Intell. 2 (7), pp. 369–375. External Links: Link, Document Cited by: Introduction.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) ”Why should I trust you?”: explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, B. Krishnapuram, M. Shah, A. J. Smola, C. C. Aggarwal, D. Shen, and R. Rastogi (Eds.), pp. 1135–1144. External Links: Link, Document Cited by: Introduction, Introduction, Introduction, LIME Ribeiro et al. (2016), Related Work.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2018) Anchors: high-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, S. A. McIlraith and K. Q. Weinberger (Eds.), pp. 1527–1535. External Links: Link Cited by: Introduction, Introduction, Anchors Ribeiro et al. (2018), Related Work.
  • R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642. Cited by: Sentiment Analysis.
  • E. Strumbelj and I. Kononenko (2014) Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41 (3), pp. 647–665. External Links: Link, Document Cited by: Shapley Values Strumbelj and Kononenko (2014); Lundberg and Lee (2017), Related Work.
  • S. Wachter, B. D. Mittelstadt, and C. Russell (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. CoRR abs/1711.00399. External Links: Link, 1711.00399 Cited by: Introduction, Preliminaries, Counterfactual Explanations Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018), Related Work.
  • Q. Wang, K. Zhang, A. G. O. II, X. Xing, X. Liu, and C. L. Giles (2018) An empirical evaluation of rule extraction from recurrent neural networks. Neural Comput. 30 (9). External Links: Link, Document Cited by: Introduction, Related Work.
  • G. Weiss, Y. Goldberg, and E. Yahav (2018) Extracting automata from recurrent neural networks using queries and counterexamples. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 5244–5253. External Links: Link Cited by: Introduction, Related Work.
  • J. Wieting, M. Bansal, K. Gimpel, and K. Livescu (2015) Towards universal paraphrastic sentence embeddings. arXiv preprint arXiv:1511.08198. Cited by: Sentiment Analysis.
  • X. Zhang, A. Solar-Lezama, and R. Singh (2018) Interpreting neural network judgments via minimal, stable, and symbolic corrections. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 4879–4890. External Links: Link Cited by: Introduction, Introduction, Preliminaries, Counterfactual Explanations Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018), Counterfactual Explanations Wachter et al. (2017); Dandl et al. (2020); Zhang et al. (2018), Related Work.