Explainable Predictive Process Monitoring

08/04/2020 ∙ by Riccardo Galanti, et al. ∙ Università di Padova ∙ Universitat Politècnica de Catalunya

Predictive Business Process Monitoring is becoming an essential aid for organizations, providing online operational support of their processes. This paper tackles the fundamental problem of equipping predictive business process monitoring with explanation capabilities, so that not only the what but also the why is reported when predicting generic KPIs like remaining time, or activity execution. We use the game theory of Shapley Values to obtain robust explanations of the predictions. The approach has been implemented and tested on real-life benchmarks, showing for the first time how explanations can be given in the field of predictive business process monitoring.




I Introduction

Within the field of Process Mining, predictive monitoring aims to forecast how running process instances will unfold, with the purpose of timely signalling those that require special attention (those that may take too long, cost too much, not be satisfactory, etc.). Several approaches have been proposed in the literature to deal with predictive monitoring (cf. Section III-A and the survey by Márquez et al. [Marquez-Chamorro18]), which has received significant attention in recent years. However, the majority of these approaches rely on black-box models (e.g. based on LSTM, i.e. Long Short-Term Memory neural models), which have been shown to be more accurate, at the cost of being unable to provide feedback to the user. On the other hand, approaches based on explicit rules (e.g. based on classification/regression trees) tend to be significantly less accurate. While the priority remains on giving accurate predictions, users need to be provided with an explanation of why a given process execution is predicted to behave in a certain way. Otherwise, users would not trust the model, and hence they would not adopt the predictive-monitoring technology [10.1007/s11257-017-9195-0, doshivelez2017rigorous].

This paper tackles the problem of equipping process monitoring with explanations of the predictions. It leverages the current state of the art in Explainable AI (cf. Section III-B), defining a framework for explainable process monitoring of generic KPIs. The proposed framework is independent of the machine- or deep-learning technique that is employed to make the predictions. To prove its effectiveness, however, the framework needs to be instantiated: we built a process-monitoring framework based on LSTM models that is also able to explain the predictions of any generic KPI, numerical or nominal. The LSTM choice was motivated by the fact that the literature has shown it to be among the most effective AI techniques for predictive monitoring (see Section 


Experiments were conducted on different benchmarks, including the real-life process of an Italian financial institute, with the aim of predicting different KPIs, namely remaining time, costs, and the eventual occurrence of certain undesired activities. Explanations can be generated at LSTM-model level, to be provided to process stakeholders to understand the general trend of the model, but also at run-time, to explain the predictions of each single running case. The explanations obtained for the aforementioned financial institute are in line with those of the analysts of the process. The remarkable difference is that our results were obtained within a few days of automatic computations, instead of long analyses.

The rest of the paper is organized as follows. Section II states the problem addressed in this paper. Section III summarizes the most relevant work related to process predictive monitoring and Explainable AI. Section IV sketches the state of the art on using LSTM models for predictive monitoring, on which we build to provide explanations. Section V reports on our framework for explainable predictive process monitoring. Section VI reports on our framework’s operationalization, and on the case studies conducted with an Italian financial institute, whereas Section VII concludes the paper.

II Problem Statement

The starting point for a prediction system is an event log. An event log is a multiset of traces. Each trace describes the life-cycle of a particular process instance (i.e., a case) in terms of the activities executed and the process attributes that are manipulated.

Definition II.1 (Events)

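Let $\mathcal{A}$ be the set of process attributes. Let $\mathcal{W}_{\mathcal{A}}$ be a function that assigns a domain $\mathcal{W}_{\mathcal{A}}(a)$ to each process attribute $a \in \mathcal{A}$. Let $\overline{\mathcal{W}}$ be $\bigcup_{a \in \mathcal{A}} \mathcal{W}_{\mathcal{A}}(a)$. An event is a partial function $e : \mathcal{A} \nrightarrow \overline{\mathcal{W}}$ assigning values to process attributes, with $e(a) \in \mathcal{W}_{\mathcal{A}}(a)$.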

Note that the same event can potentially occur in different traces, namely attributes are given the same assignment in different traces. This means that potentially the entire same trace can appear multiple times. This motivates why an event log is to be defined as a multiset of traces. (Footnote 1: Given a set $X$, $\mathbb{B}(X)$ indicates the set of all multisets with the elements in $X$, and $X^*$ indicates the universe of all sequences over elements in $X$, i.e. Kleene's star.)

Definition II.2 (Traces & Event Logs)

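Let $\mathcal{E}$ be the universe of events. A trace $\sigma$ is a sequence of events, i.e. $\sigma \in \mathcal{E}^*$. An event log $\mathcal{L}$ is a multiset of traces, i.e. $\mathcal{L} \in \mathbb{B}(\mathcal{E}^*)$.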

Predictive monitoring aims to estimate the future KPI values of the running cases. Here, we aim to be generic, meaning that KPIs can be of any nature:

Definition II.3 (KPI)

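Let $\mathcal{E}$ be the universe of events defined over a set $\mathcal{A}$ of attributes. Let $\mathcal{W}_K$ be the domain of the KPI values. A KPI is a function $K : \mathcal{E}^* \times \mathbb{N} \to \mathcal{W}_K$ such that, given a trace $\sigma \in \mathcal{E}^*$ and an integer index $1 \le i \le |\sigma|$, $K(\sigma, i)$ returns the KPI value of $\sigma$ after the occurrence of the first $i$ events. (Footnote 2: Given a sequence $X$, $|X|$ indicates the length of $X$.)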

Note that our KPI definition assumes it to be computed a posteriori, when the execution is completed and leaves a complete trail as a certain trace $\sigma$. In many cases, the KPI value is updated after each activity execution, which is recorded as the next event in the trace; at other times, however, it is only known after completion. We aim to be generic and account for all relevant cases. Given a trace $\sigma = \langle e_1, \ldots, e_n \rangle$ that records a complete process execution, the following are three potential KPI definitions:

Remaining time. $K_{\mathrm{remaining}}(\sigma, i)$ is equal to the difference between the timestamp of $e_n$ and that of $e_i$.

Activity occurrence. It measures whether a certain activity is going to eventually occur in the future, such as an activity Open Loan in a loan-application process. The corresponding KPI definition for the occurrence of an activity $A$ is $K_{\mathrm{occur}\_A}(\sigma, i)$, which is equal to true if activity $A$ occurs in $\langle e_{i+1}, \ldots, e_n \rangle$ and $i < n$; otherwise false.

Customer satisfaction. This is a typical KPI for several service providers. Let us assume, without loss of generality, a trace $\sigma = \langle e_1, \ldots, e_n \rangle$ where the satisfaction is known at the end, e.g. through a questionnaire, and that the satisfaction level is recorded with the last event, say as $e_n(\mathit{sat})$. Then, $K_{\mathrm{sat}}(\sigma, i) = e_n(\mathit{sat})$.

The following definition states the prediction problem:

Definition II.4 (The Prediction Problem)

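Let $\mathcal{L}$ be an event log that records the execution of a given process, for which a KPI $K$ is defined. Let $\sigma' = \langle e_1, \ldots, e_k \rangle$ be the trace of a running case, which eventually will complete as $\sigma_T = \langle e_1, \ldots, e_k, e_{k+1}, \ldots, e_n \rangle$. The prediction problem can be formulated as forecasting the value of $K(\sigma_T, i)$ for all $k < i \le n$.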
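As a minimal sketch of the KPI definitions above, a trace can be modelled as a list of events, each event being a mapping from attribute names to values. The attribute names `activity` and `timestamp`, the example activities and the function names are illustrative assumptions and are not fixed by the paper.

```python
from datetime import datetime

# A trace is a list of events; each event maps attribute names to values.
# The attribute names "activity" and "timestamp" are assumptions made here.
trace = [
    {"activity": "Request Created", "timestamp": datetime(2018, 10, 1, 9, 0)},
    {"activity": "Evaluating Request", "timestamp": datetime(2018, 10, 2, 9, 0)},
    {"activity": "Open Loan", "timestamp": datetime(2018, 10, 4, 9, 0)},
]


def kpi_remaining_time(trace, i):
    """Seconds between the i-th event (1-based) and the last event."""
    return (trace[-1]["timestamp"] - trace[i - 1]["timestamp"]).total_seconds()


def kpi_activity_occurrence(trace, i, activity):
    """True if `activity` occurs strictly after the first i events."""
    return any(event["activity"] == activity for event in trace[i:])


remaining_after_first = kpi_remaining_time(trace, 1)
open_loan_after_first = kpi_activity_occurrence(trace, 1, "Open Loan")
open_loan_after_last = kpi_activity_occurrence(trace, 3, "Open Loan")
```

Both functions take the complete trace and a prefix length, mirroring the a-posteriori nature of the KPI definition: at prediction time only the first events are known, and these values are what the predictor has to forecast.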

As indicated in Section I, we aim to provide an explanation for the predictions. In particular, for each running case, we aim to return the set of attributes that influence its prediction the most, with the corresponding magnitude and with the indication whether the attributes increase or decrease the predicted KPI’s value.

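In the light of the above, for each trace $\sigma = \langle e_1, \ldots, e_k \rangle$, the problem can be stated as finding a function $\xi : \mathcal{A} \times \overline{\mathcal{W}} \times \{1, \ldots, k\} \to \mathbb{R}$ such that, for all $a \in \mathcal{A}$, $w \in \overline{\mathcal{W}}$, and for all $i \in \{1, \ldots, k\}$ s.t. $e_i(a) = w$, $\xi(a, w, i)$ is different from zero if and only if the assignment of value $w$ to attribute $a$ by $e_i$ has influenced the prediction of KPI $K$. The absolute value of $\xi(a, w, i)$ indicates how strong this influence is, where a zero value indicates no influence. If $\xi(a, w, i) \neq 0$, its positive or negative sign indicates whether the influence is towards increasing or decreasing the KPI value: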

Definition II.5 (The Prediction-Explanation Problem)

Let be an event log over a set of attributes, with domains . Let be a running case with a KPI definition . Let be . Explaining the prediction is the problem of computing a function , where .

III Related Works

III-A Prediction of Process-Related KPIs

The predictive-monitoring survey of Márquez et al. [Marquez-Chamorro18] reports on the large repertoire of techniques and tools that were developed to address this problem. However, the authors claim that “little attention has been given to [...] explaining the prediction values to the users so that they can determine the best way to act upon”, and that “it is necessary to develop tools that help users to query these models in order to get information that is relevant for them”. These are in fact the problems tackled in this paper, so as to ensure that the predictive-monitoring system is trusted, and thus used.

Predictive monitoring has been built on different machine- and deep-learning techniques, and also on ensembles of them [Marquez-Chamorro18]. Several research works have recently illustrated that Long Short-Term Memory networks (LSTMs) generally outperform other methods (see, e.g., [Park19, TaxVRD17, LSTM_time]). Therefore, while our explanation framework is independent of the machine- or deep-learning technique that is employed, we operationalize it with LSTMs. Section IV provides further details on LSTMs and on how they are employed for business-process predictive monitoring.

It was explained above that little research work has been conducted on explaining the outcome of process predictive monitoring. The most relevant work is by Rehse et al. [Rehse2019], which also aims at providing a dashboard to process participants with predictions and their explanation. However, the paper does not provide sufficient details on the actual usage of the explainable-AI literature, and the very preliminary evaluation is based on one single artificial process that consists of a sequence of five activities. Breuker et al. also try to tackle the problem [10.1007/978-3-319-15895-2_46], but their attempt is not independent of the actual technique employed for predictions. Furthermore, their explanations are only based on the activity names, while the explanations can generally involve resources, time, and more (cf. the case studies reported in Section VI).

III-B Explanation of Machine-Learning Models

A few approaches exist in the literature to explain machine learning models, which arose from the need to understand complex black-box algorithms like ensembles of Decision Trees and Deep Learning [lime, shrikumar2017learning, gradients, shap].

The adoption of explanatory methods in industry is at an early stage. In [Shu:2019:DEF:3292500.3330935], an approach to fake news detection grounded in explainability is introduced. A significant amount of work in the literature is focused on healthcare applications. We highlight [lundberg2018explainable], an implementation of the Shapley Values in healthcare, where the explanatory method is used to prevent hypoxaemia during surgery, and [inbook], where explainability is used for the analysis of patient readmission.

The SHAP implementation of the Shapley values for Deep Learning has the strong theoretical foundation of the original game theory approach, with the advantage of providing offline explanations that are consistent with the online explanations. Moreover, SHAP avoids the problems in consistency seen in other explanatory approaches (e.g. the lack of robustness seen in the online surrogate models, as analysed in [alvarez2018robustness]). The framework proposed in this paper specializes the use of Shapley values to the problem of providing explanations for predictive analytics.

We also considered attention mechanisms [DBLP:conf/iclr/2015] as an alternative. However, two limitations made us opt for Shapley values. First, attention mechanisms necessarily have to be integrated in a Neural Network architecture, while Shapley values can be applied to any Machine or Deep Learning algorithm. The second limitation is linked to the lack of consensus that attention weights are always correlated to feature importance. Jain et al. [DBLP:conf/naacl/JainW19] find it “at best, questionable – especially when a complex encoder is used, which may entangle inputs in the hidden space”, while Serrano et al. [serrano-smith-2019-attention] state that “attention weights often fail to identify the sets of representations most important to the model’s final decision”.

IV The Use of LSTMs for Predictive Monitoring

As indicated in Section I, we implemented our framework by leveraging LSTM models [Hochreiter1997], a special type of Recurrent Neural Networks. LSTM models natively support predictions where the independent variables are sequences of elements, and the literature has shown that they are among the most suitable methods for predictive business monitoring (cf. Section 


The construction of LSTM models falls into the problem of supervised learning, which aims to learn the model from a training set, for which the value of the dependent variable is known. This set is composed of pairs $(x, y) \in \mathcal{X} \times \mathcal{Y}$, where $x$ represents the independent variables with their values (also known as features), and $y$ is the value observed for the dependent variable (i.e. the value we aim to predict).

In the domain of LSTM learning, $\mathcal{X}$ consists of sequences of vectors with a certain number $n$ of dimensions, i.e. $\mathcal{X} = (\mathbb{R}^n)^*$. (Footnote 3: In the literature, LSTMs are often trained on the basis of matrices. However, a sequence of $m$ vectors in $\mathbb{R}^n$ is in fact a matrix in $\mathbb{R}^{n \times m}$. We use here the dataset representation as vectors to simplify the formalization.) When LSTM is used for predictive business monitoring using KPI values in a domain $\mathcal{W}_K$ (cf. Definition II.3), $\mathcal{Y}$ is $\mathcal{W}_K$.

With these preliminaries at hand, we built a framework for process monitoring, which is composed of an off-line and an on-line phase.

The off-line phase requires an event log $\mathcal{L}$ and a KPI definition $K$ as input. This enables creating the dataset for training and testing the LSTM model, which consists of pairs $(x, y) \in (\mathbb{R}^n)^* \times \mathcal{W}_K$. The input $x$ is, hence, a sequence of vectors, whereas a trace is a sequence of events. Therefore, each event needs to be encoded as a vector, which is a widely studied problem: we use the same encoding as in [LSTM_time], which can be abstracted as an event-to-vector encoding function $\rho : \mathcal{E} \to \mathbb{R}^n$. In a nutshell, each numeric attribute $a$ of event $e$ becomes a different dimension of $\rho(e)$, which takes on value $e(a)$. Each boolean attribute $a$ is also a different dimension, with value either $0$ or $1$ depending on whether $e(a)$ is false or true. Each literal attribute $a$ is represented through the so-called one-hot encoding: one different dimension exists for each value $w \in \mathcal{W}_{\mathcal{A}}(a)$, and the dimension referring to value $e(a)$ takes on value $1$, with the other dimensions being assigned value $0$. Function $\rho$ can also be overloaded to traces: $\rho(\langle e_1, \ldots, e_m \rangle) = \langle \rho(e_1), \ldots, \rho(e_m) \rangle$.

The dataset is created starting from each prefix $\sigma'$ of each trace $\sigma \in \mathcal{L}$: the prefix $\sigma' = \langle e_1, \ldots, e_i \rangle$ will generate one item in the dataset consisting of a pair $(x, y)$ where $x = \rho(\sigma')$ and $y = K(\sigma, i)$. The dataset is later divided into one larger part for training the LSTM model, and a smaller part for testing. The test part is used to evaluate the quality of the LSTM model in terms of different metrics. Details of the proportions and the quality metrics employed are discussed in Section VI. The LSTM-based process predictor trained from a dataset can be abstracted as a function $\Phi : (\mathbb{R}^n)^* \to \mathcal{W}_K$.

The aim of the on-line phase is to predict the KPI of interest for a set of running cases of the process, identified by a set $\mathcal{L}'$ of partial traces (i.e., a log). It relies on the LSTM-based process predictor $\Phi$: for each $\sigma' \in \mathcal{L}'$, the predicted KPI value is $\Phi(\rho(\sigma'))$.
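The encoding and the prefix-based dataset construction described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the encoding of [LSTM_time]: the helper names `build_encoder` and `prefix_dataset`, the attribute schema, and the toy KPI (number of events still to come) are all hypothetical, and every event is assumed to assign all numeric and boolean attributes.

```python
def build_encoder(numeric, boolean, categorical):
    """Return (encode, feature_names) for a given attribute schema.

    `categorical` maps each literal attribute to the list of its values.
    """
    names = list(numeric) + list(boolean)
    names += [f"{a}={w}" for a, values in categorical.items() for w in values]

    def encode(event):
        # One dimension per numeric attribute, holding its value.
        vector = [float(event[a]) for a in numeric]
        # One dimension per boolean attribute: 0 for false, 1 for true.
        vector += [1.0 if event[a] else 0.0 for a in boolean]
        # One-hot encoding: one dimension per value of each literal attribute.
        for a, values in categorical.items():
            vector += [1.0 if event.get(a) == w else 0.0 for w in values]
        return vector

    return encode, names


def prefix_dataset(log, encode, kpi):
    """One (x, y) pair per non-empty prefix of each trace in the log."""
    pairs = []
    for trace in log:
        for i in range(1, len(trace) + 1):
            x = [encode(event) for event in trace[:i]]
            pairs.append((x, kpi(trace, i)))
    return pairs


encode, names = build_encoder(
    numeric=["cost"],
    boolean=["urgent"],
    categorical={"role": ["APPLICANT", "BACK-OFFICE"]},
)
log = [[
    {"cost": 10, "urgent": False, "role": "APPLICANT"},
    {"cost": 25, "urgent": True, "role": "BACK-OFFICE"},
]]
# Toy KPI (an assumption for this example): number of events still to come.
dataset = prefix_dataset(log, encode, lambda trace, i: len(trace) - i)
```

Keeping the feature names next to the encoder is a deliberate choice: the explanation step later needs to map each vector dimension back to an attribute and, for one-hot features, to a value.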

V Explanation of Generic KPI Predictions

This section reports on the main contribution of this paper, namely using Shapley Values to explain the predictions of any predictive model.

Section V-A introduces the theory behind Shapley values, while Section V-B illustrates its application and adaptation for predictive process monitoring. Then, in Section V-C we provide the general picture and the two main types of explanations reported.

V-A The Theory of Shapley Values

Shapley Values [shapley1953value] are a game-theoretic approach to fairly distributing the payout among the players that have collaborated in a cooperative game. This theory can be adapted to explain a predictive model. The assumption is that the features of an instance correspond to the players, and the payout is the difference between the prediction made by the predictive model and the average prediction (later referred to as the base value). Intuitively, given a predicted instance, the Shapley Value of a feature expresses how much the feature value contributes to the model prediction [molnar2019]:

Definition V.1 (Shapley Value)

Let $F = \{f_1, \ldots, f_m\}$ be a set of features. The Shapley value for feature $f_i$ is defined as:

$$\psi_i = \sum_{F' \subseteq F \setminus \{f_i\}} \frac{|F'|! \, (|F| - |F'| - 1)!}{|F|!} \left( p(F' \cup \{f_i\}) - p(F') \right)$$

where $p(F')$ is the so-called payout for only using the set of feature values in $F'$ in making the prediction.

Intuitively, the formula in Definition V.1 evaluates the effect of incorporating the feature value $f_i$ into any possible subset of the feature values considered for prediction. In the equation, variable $F'$ runs over all possible subsets of feature values, the term $p(F' \cup \{f_i\}) - p(F')$ corresponds to the marginal value of adding $f_i$ in the prediction using only the set of feature values in $F'$, and the term $\frac{|F'|! \, (|F| - |F'| - 1)!}{|F|!}$ corresponds to the permutations that can be done with subset size $|F'|$, to weight different sets differently in the formula. This way, all possible subsets of attributes are considered, and the corresponding effect is used to compute the Shapley Value of $f_i$.
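To make the definition concrete, the following sketch computes exact Shapley values by enumerating all subsets, which is only feasible for a handful of features. The payout function `toy_payout` and its numbers are hypothetical: in the actual setting the payout would come from the predictive model, whereas here it is a hand-written function with two additive contributions and one interaction bonus.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, payout):
    """Exact Shapley value of every feature.

    `payout` maps a frozenset of features to a number.
    """
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |F'|! (|F| - |F'| - 1)! / |F|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal value of adding f to the subset.
                total += weight * (payout(s | {f}) - payout(s))
        values[f] = total
    return values


def toy_payout(subset):
    """Hypothetical payout: additive terms plus an interaction bonus."""
    base = {"role": 100.0, "closure_type": 300.0, "activity": 0.0}
    bonus = 60.0 if {"role", "closure_type"} <= subset else 0.0
    return sum(base[f] for f in subset) + bonus


psi = shapley_values(["role", "closure_type", "activity"], toy_payout)
```

In this toy game the interaction bonus is split evenly between the two features that produce it, the feature that never changes the payout receives zero, and the values add up to the payout of the full feature set.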

V-B Explainable Predictions through Shapley Values

The starting point is an event-to-vector encoding function $\rho$ that maps each event to a feature vector (cf. Section IV). Given an event $e$, $\rho(e) = [f_1, \ldots, f_n]$, where each feature $f_i$ is associated with an event attribute $a_i$ and, possibly, with a value $w_i$. We mentioned that, if an attribute is categorical, we need to introduce as many features as its possible values (one-hot encoding). Namely, $f_i$ is associated both with an attribute $a_i$ and with a value $w_i$. If the feature associated with attribute $a$ and value $w$ takes on value $1$, then $e(a) = w$; otherwise, the value is $0$. If an attribute $a$ is conversely numerical, only one feature exists with value $e(a)$. When applied for explainable predictive monitoring, the Shapley values of a trace $\sigma = \langle e_1, \ldots, e_k \rangle$ are computed over the features of the vector $\rho(\sigma) = [f^1_1, \ldots, f^1_n, \ldots, f^k_1, \ldots, f^k_n]$, where $\rho(e_j) = [f^j_1, \ldots, f^j_n]$ for $1 \le j \le k$.

When applying Definition V.1 to all features of $\rho(\sigma)$, the result is a vector of Shapley values $\psi = [\psi^1_1, \ldots, \psi^k_n]$, each associated with a feature of $\rho(\sigma)$ and with the corresponding attribute. Any Shapley value can be either positive or negative. A positive or negative value indicates that the feature contributes to increasing or decreasing the value, respectively.

This allows us to construct the explanations. The first step is to determine which features are relevant and at which timestep. For this, we consider the average $\mu$ of the values in the vector $\psi$ of Shapley values, along with their standard deviation $s$. This allows defining an interval $[\mu - \delta s, \mu + \delta s]$ of Shapley values that are not considered to contribute significantly, where $\delta$ is a parameter set by the user. This reduces the number of features that are considered in the explanation, with the effect of limiting its verbosity.
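The filtering step can be sketched as below. Two points are assumptions made for this example and are not confirmed by the text: that the interval is centred on the mean with a half-width of the user parameter times the standard deviation, and that the population standard deviation is used. The function name `significant`, the feature names and the numbers are illustrative only.

```python
from statistics import mean, pstdev


def significant(shapley, delta):
    """Keep the Shapley values outside [mu - delta * sd, mu + delta * sd].

    Assumption: interval centred on the mean, population standard deviation.
    """
    values = list(shapley.values())
    mu = mean(values)
    sd = pstdev(values)
    low, high = mu - delta * sd, mu + delta * sd
    return {name: v for name, v in shapley.items() if v < low or v > high}


# Hypothetical Shapley values for the features of one prefix.
shapley = {
    "ROLE=BACK-OFFICE": -4.0,
    "CLOSURE_TYPE=Inheritance": 9.0,
    "ACTIVITY=Request": 0.5,
    "CE_UO=BOF": -0.5,
    "ROLE=DIRECTOR": 0.0,
}
kept = significant(shapley, delta=1.0)
```

A larger value of the parameter widens the interval and therefore yields shorter explanations; a value of zero keeps every feature whose Shapley value differs from the mean.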

Let us consider each Shapley value , associated with feature and an event’s attribute .

If is a numerical attribute, attribute is the explanation itself, i.e.  .

If is a categorical attribute, is a one-hot encoded feature, and it is also associated with a value . If , the explanation obtained is that contributes to the KPI value: . Otherwise, , and the explanation is , namely .

Any other combination that does not fall into the situations above is such that .

Fig. 1: Two examples of explanations using Shapley Values. When the Remaining Time predicted is high (i.e. higher than the Base Value), the Shapley Values indicate which features increase the prediction. Similarly, when the prediction is smaller than the Base Value, most of the Shapley Values are negative.

While an exact computation of the Shapley values requires considering all combinations of features (hence, the algorithm is exponential in the number of features), efficient estimates can be obtained through polynomial algorithms that use greedy approaches [molnar2019].
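One common way to avoid the exponential enumeration is to sample random feature orderings and average the marginal contributions. The sketch below illustrates that permutation-sampling idea; it is not claimed to be the estimator used by the authors or by the shap library, and `sampled_shapley` and `toy_payout` are hypothetical names with made-up numbers.

```python
import random


def sampled_shapley(features, payout, n_permutations, seed=0):
    """Estimate Shapley values by averaging over random feature orderings."""
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    for _ in range(n_permutations):
        order = list(features)
        rng.shuffle(order)
        coalition = frozenset()
        previous = payout(coalition)
        for f in order:
            # Marginal contribution of f given the features added before it.
            coalition = coalition | {f}
            current = payout(coalition)
            totals[f] += current - previous
            previous = current
    return {f: total / n_permutations for f, total in totals.items()}


def toy_payout(subset):
    """Hypothetical payout: additive terms plus an interaction bonus."""
    base = {"role": 100.0, "closure_type": 300.0, "activity": 0.0}
    bonus = 60.0 if {"role", "closure_type"} <= subset else 0.0
    return sum(base[f] for f in subset) + bonus


features = ["role", "closure_type", "activity"]
estimates = sampled_shapley(features, toy_payout, n_permutations=200)
```

Each sampled ordering costs one payout evaluation per feature, so the total cost grows linearly with the number of features and of sampled orderings, at the price of an approximate result.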

To conclude, let us illustrate how Shapley values help explain a typical KPI in predictive process monitoring: estimated remaining time. Figure 1 shows the estimated remaining time of the same case at two different moments: T1, when the case started (the upper figure, with an estimated remaining time of 1000 seconds), and T2, when it is close to its end (the lower figure, with an estimated remaining time of 260 seconds). Considering that the Base Value is 400 seconds, the explanatory method would indicate, at T1, which features have been useful for the predictive model to predict a high value, i.e. the features with a positive Shapley Value. On the other hand, for T2, most of the Shapley Values would be negative, since the model has predicted a value smaller than the base value.

V-C Overall Approach for Explaining Generic KPI Predictions

Explanations can be used offline to explain the features/factors that the trained model uses to make predictions. Moreover, they can be employed online on each running case to put forward the factors that affected the predictions. In particular, offline explanations are calculated on the test dataset, which is a part of the dataset not used for training the model (information about the division between training and test sets is provided in Section VI).

V-C1 Offline Explanations

Our offline explanation strategy is to provide a heatmap that gives an overview of the importance of each factor in explaining the instances of the test dataset.

CASE ID | REMAINING TIME | Explanations for increasing remaining_time | Explanations for decreasing remaining_time
201810011258 | 5d 6h 7m | ACTIVITY=Evaluating Request (NO registered letter) | CLOSURE_TYPE!=Inheritance
201810000206 | 5d 2h 12m | ROLE=DIRECTOR | CLOSURE_TYPE=Bank Recess
201811010829 | 2d 2h 31m | ROLE!=BACK-OFFICE (-1) AND ACTIVITY!=Service closure Request with BO responsibility (-1) | -

TABLE I: Online explanations for Remaining Time for three running cases. When an explanation is followed by (-1), it refers to the value assigned to the attribute by the event that precedes the last event of the respective case.

Fig. 2: The offline explanation of the remaining time

In particular, given an event log , we consider each prefix of each trace in . Then, we compute the explanations as defined in Section V-B. Figure 2 shows an example of a heatmap reporting the frequency in which an explanation is relevant at different time-steps. The axis lists different explanations of types or while the axis refers to the time-step difference compared to the last event of the considered prefix. A cell with explanation (y axis) and time-step (x axis) takes on a value if there are prefixes of traces in s.t.  and prefixes of traces in s.t. . For instance, let us consider the explanation ROLE=BACK-OFFICE at time-step 0, which is associated with value . This means that is the difference between the number of prefixes in which ROLE=BACK-OFFICE in the last event of contributes to increasing the KPI value and the number of prefixes in which ROLE=BACK-OFFICE of last event contributes to decreasing. Similarly, is the difference when considering the second last event of the prefixes (i.e., the event before the last occurred), in place of the last. A similar reasoning can be repeated for explanations of type . The heatmap uses different shades of blue and red to highlight the magnitude of negative and positive values, respectively.
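The aggregation behind each heatmap cell, as described above, can be sketched as a signed count: plus one for every prefix in which the explanation pushes the KPI up, minus one for every prefix in which it pushes it down. The function name `heatmap_cells`, the triple layout and the Shapley values in the example are hypothetical.

```python
from collections import defaultdict


def heatmap_cells(explanations):
    """Map (explanation, timestep) to #increasing minus #decreasing prefixes.

    `explanations` holds (explanation, timestep, shapley_value) triples,
    one per significant feature of each prefix.
    """
    cells = defaultdict(int)
    for label, timestep, value in explanations:
        if value > 0:
            cells[(label, timestep)] += 1
        elif value < 0:
            cells[(label, timestep)] -= 1
    return dict(cells)


# Hypothetical significant explanations collected over several prefixes.
observed = [
    ("ROLE=BACK-OFFICE", 0, -3.2),
    ("ROLE=BACK-OFFICE", 0, -1.1),
    ("ROLE=BACK-OFFICE", 0, 0.4),
    ("ROLE=BACK-OFFICE", -1, -2.0),
    ("CLOSURE_TYPE!=Inheritance", 0, -5.0),
]
cells = heatmap_cells(observed)
```

Because only the sign of each Shapley value enters the count, a cell reflects how consistently an explanation pushes the prediction in one direction, not how large the individual contributions are.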

V-C2 Online Explanations

When we focus on running cases, we generate a table with one row per running case (see, e.g., Table I). Each row shows the case id, unique for each running case, the prediction for the current KPI, and the explanations that influence the prediction. Section VI discusses the case study in detail, including the results in Table I.
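The textual explanations shown in Table I can be assembled from one-hot features along the following lines. This is a sketch under assumptions: the function names, the tuple layout and the Shapley values are hypothetical, only categorical attributes are handled, and the joining with AND and the dash for an empty column simply mimic the layout of Table I.

```python
def explanation_label(attribute, value, feature_value, offset):
    """Render ATTR=value or ATTR!=value, with a (-k) suffix for older events."""
    operator = "=" if feature_value == 1 else "!="
    label = f"{attribute}{operator}{value}"
    return label if offset == 0 else f"{label} ({offset})"


def explain_case(significant):
    """Split the significant features of a case into the two table columns.

    Each item is (attribute, value, feature_value, offset, shapley_value).
    """
    increasing = [explanation_label(a, w, f, o)
                  for a, w, f, o, s in significant if s > 0]
    decreasing = [explanation_label(a, w, f, o)
                  for a, w, f, o, s in significant if s < 0]
    return " AND ".join(increasing) or "-", " AND ".join(decreasing) or "-"


# Hypothetical significant features mirroring the third row of Table I.
case = [
    ("ROLE", "BACK-OFFICE", 0, -1, 0.8),
    ("ACTIVITY", "Service closure Request with BO responsibility", 0, -1, 0.5),
]
increasing, decreasing = explain_case(case)
```

A one-hot feature equal to zero with a significant Shapley value yields a negated explanation, which is how entries such as ROLE!=BACK-OFFICE arise.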

VI Implementation and Experiments

The framework for explainable predictive monitoring has been implemented in Python, using Pandas to process the data and the shap library (footnote 4: https://shap.readthedocs.io/en/latest) to explain the predictions (footnote 5: https://github.com/PyRicky/LSTM_Generic_explainable).

We relied on the Keras framework for the LSTM implementation. The architecture was composed of 8 layers with 100 neurons each, and the learning algorithm used is the Adam optimizer with Nesterov momentum (Nadam).


The remainder of the paper reports on the experiments with different KPIs for a process carried out in an Italian bank. We also conducted several additional experiments with publicly available event logs, previously used as predictive process-monitoring benchmarks. Space limitations prevent us from reporting on them here; they are discussed in the appendix that complements this paper [Additional_Experiments].

VI-A Domain description

Our assessment is based on the so-called Bank Account Closure, a process executed at an Italian Banking Institution. The process deals with the closure of customer’s accounts, which may be requested either by the customer or by the bank, for several reasons.

From the bank’s information system, we extracted an event log with 32,429 completed traces and 212,721 events. It contains 15 different activities and 654 possible resources (recorded in an attribute labeled Ce_Uo), divided into 3 roles (attribute role). Each trace is associated with an attribute Closure_Type, which encodes the type of procedure that is carried out for the specific account holder, and with an attribute Closure_Reason, namely the reason triggering the closure request. The latter is only known for 79.43% of cases.

For the bank, it is of interest to obtain an estimate of the remaining time until the end for running cases. This allows the bank to decide which cases require special attention, in order to not postpone them too much further. Also, the bank wants to be informed whether there are high chances that one or more of the following activities will occur: Authorization Requested, Pending Request for Acquittance of heirs, and Back-Office Adjustment Requested. They are linked to contingency actions, which should be avoided because they would cause inefficiencies in terms of time, costs, and resource utilization. Finally, the bank is also interested in obtaining an estimate of the total cost of a running case, in order to detect in advance which cases require particular attention.

We used two thirds of the traces as the training set, and one third as the test set. To improve the quality of the trained model, we used hyperparameter optimization, with 20% of the training data set aside for this purpose (validation set).
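The split proportions can be sketched as follows. The text does not say whether traces were assigned randomly or chronologically, so the shuffled split, the fixed seed and the function name `split_traces` are assumptions made for illustration; only the proportions come from the paragraph above.

```python
import random


def split_traces(case_ids, seed=0):
    """Split case ids into training, validation and test sets.

    Assumption: a random split with a fixed seed. One third of the traces
    is held out for testing, and 20% of the rest is used for validation.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_test = len(ids) // 3
    test, train_full = ids[:n_test], ids[n_test:]
    n_validation = round(0.2 * len(train_full))
    validation, train = train_full[:n_validation], train_full[n_validation:]
    return train, validation, test


# 32,429 is the number of completed traces reported for the event log.
train, validation, test = split_traces(range(32429))
```

Splitting by trace rather than by prefix keeps all prefixes of a case in the same set, so that no case contributes to both training and testing.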

Sections VI-B, VI-C and VI-D report on the outcome for remaining-time prediction, for the prediction of the occurrence of one of those three contingency actions and for total cost prediction, respectively.

VI-B Results on Remaining Time

Section V showed that the explanation for a learnt prediction model is given as a heatmap during the offline phase. Figure 2 refers to its application to remaining-time prediction. The fact that the closure type is not Inheritance (Closure_Type!=Inheritance) has the largest absolute value in the heatmap, so it is the factor that influences the prediction the most. The fact that the value is negative (i.e. -71598) indicates that the influence is towards reducing the value, namely towards a lower remaining time. From a domain viewpoint, when the type of procedure is Inheritance, the bank-account holder has passed away. A further analysis of the data confirms this finding: if the type is Inheritance, the process duration is 29 days, versus 14 days when the type is different. The evidence in the explanation illustrates that the LSTM learnt a prediction model that leverages the closure type to estimate the remaining time. Other important attributes are related to the resource performing each activity and to the role associated with that resource. Let us consider attributes Role=Back-office and CE_UO=BOF, which are related to back-office activities, generally performed in the final part of cases; it can be seen in the heatmap that even in this case the model is able to predict that the process instance is about to complete (a negative value again indicates a smaller remaining time).

The discussion has so far focused on the attributes of the last event. However, the values of attributes of previous events also influence the prediction of remaining time, as shown in the heatmaps (see the columns related to timesteps -1, -2, -3 and -4). Consider, e.g., the row ROLE=Back-office and column -1: the value -4223 indicates that if the previous event refers to an activity performed by a resource with role Back-office, this lowers the prediction: the case is getting even closer to the end. When activities are performed by a resource with role director, the behaviour is considered exceptional, while activities performed by resources playing the role of applicant are in general performed in the initial part of cases; consequently, the cases usually take longer to complete. This is indicated by the positive value 4863 of the last event in the row ROLE!=Back-office, which indicates that the influence is towards increasing the remaining time. Notice that the column related to timestep -1 has a bigger value (7011), indicating that if the previous event refers to an exceptional activity, the influence on the prediction will be even stronger. Finally, when the activity performed is other than Network Adjustment Requested, the predicted remaining time is smaller; this is in fact an exceptional activity that only occurs when an error is made in the early stages of the process, and even in this case our framework was able to learn to predict a smaller remaining time when no adjustments need to be made.

Section V indicated that explanations are also given for running cases to explain predictions to process stakeholders. Our implementation returns a CSV file with the predictions for the running cases; a subset of predictions is provided in Table I, which shows the factors that increase or decrease the prediction for the remaining time prediction. Let us consider as an example the last row: the remaining time is predicted as being ca. 2 days and 2 hours, with two explanations increasing the prediction, one related to the fact that the previous activity performed was not Service Closure Request with BO Responsibility, and the other related to the resource performing the previous activity with a role not being Back-Office.

To conclude, since this KPI is numerical and the values are reasonably well balanced, we adopted the Mean Absolute Error (MAE), which is the average absolute difference between the actual and the predicted value, computed over all test-set samples. Here, we achieved an MAE of 4.37 days, which is around 28% of the average case duration (i.e. 15.5 days).
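As a small worked example of the metric, the sketch below computes the MAE on three made-up test samples (the sample values are hypothetical) and then reproduces the ratio between the reported MAE and the reported average case duration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical remaining times in days, for illustration only.
actual_days = [12.0, 30.0, 4.5]
predicted_days = [10.0, 33.0, 5.5]
mae = mean_absolute_error(actual_days, predicted_days)

# Ratio between the reported MAE (4.37 days) and the reported average
# case duration (15.5 days).
relative_error = 4.37 / 15.5
```

The absolute errors in the made-up sample are 2, 3 and 1 days, giving an MAE of 2 days; the ratio of the reported figures is about 0.28, matching the 28% stated above.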

VI-C Results on Prediction of Activity Occurrence

We mentioned that the financial institute aims to avoid activities related to inefficiencies (e.g. rework): Pending Request for Acquittance of Heirs, Back-Office Adjustment Requested and Authorization Requested. Space limitations prevent us from showing all three here: we focus on activity Back-Office Adjustment Requested, while the other two are in the appendix complementing the paper [Additional_Experiments]. The learnt LSTM model was characterized by an F1 score of 0.65, an Area Under the Receiver Operating Characteristics (AUROC) of 0.86, and an Area under the Precision/Recall curve (APR) of 0.69. We computed AUROC and APR because these metrics are more suitable when classes are imbalanced. This is actually the case for our case study: the three activities are contingency actions, which occur infrequently.

Fig. 3: Offline explanations for Back-Office Adjustment Requested

The heatmap for the prediction of Back-Office Adjustment Requested (Figure 3) shows that the attributes related to the type and the reason of bank-account closure have the largest influence. When all bank accounts of a customer are closed (labeled as Closure_Reason=1 - Client lost) or when the customer decides to close one of the several bank accounts he or she owns (labeled as Closure_Reason=2 - Keep bank account. Same dip), a Back-Office Adjustment Requested is unlikely to happen. This is clearly shown in the heatmap by the values -40374 and -18374, respectively, whose influence is towards not predicting the occurrence of this activity. The values -15934, when the Closure Type is Client Recess (it is the client who decides to close the bank account), and -11577, when it is Inheritance (the bank-account holder has passed away), likewise indicate that a Back-Office Adjustment Requested is unlikely to happen. Conversely, when the Closure Type is Bank Recess (the bank account is closed by the bank) or Porting, the rework activity Back-Office Adjustment Requested is more likely to occur.

Explanations are also used on-line to explain the predictions for running cases. Table II shows the factors that make the model predict whether or not activity Back-Office Adjustment Requested will happen for three running cases. Values 1 and 0 indicate that the activity is expected to happen or not, respectively. Consider, for instance, the first case in the table: the rework activity is not expected to happen because Service Closure Request with Network Responsibility was performed two events ago and because the previous event was performed by resource 195. Conversely, it is predicted to eventually happen for the second case in the table, and the explanation is related to the closure type being Porting and the closure reason not being Client lost.

Fig. 4: Offline explanations for Case cost

VI-D Results on Case Cost prediction

Since this KPI is numerical, we adopted the Mean Absolute Error, and we achieved a score of 0.95. Figure 4 shows the explanations for the case-cost prediction in the off-line phase. The main factor that contributes to increasing the cost of a case is Closure_Reason=1 - Client Lost, which is recorded when all bank accounts are going to be closed. The positive value (i.e. 49767) indicates that the influence is towards increasing the cost. This is mainly because, in most of these cases, the director needs to carefully evaluate the request before proceeding, and the director's hourly wage is higher than that of other bank employees. This evaluation is not needed, however, when the closure of the bank account is requested by the bank (labeled as Bank Recess); the predicted case cost is therefore smaller (indicated in the heatmap by the negative value -34243). The director is similarly not involved when customers close only one of their bank accounts (Closure_Reason=2 - Keep bank account. Same dip), which is a factor that yields lower costs. Another reason is that, when only one of several bank accounts is closed, the process is simpler and fewer Back-office adjustment activities need to be performed than when all bank accounts are closed, leading to lower costs. Further indirect evidence that the director's involvement increases costs comes from the explanations based on Activity=Evaluating Request (NO registered letter). This activity takes a lot of time and is performed by the director, leading to high costs (even higher than when a request only has to be authorized). If this activity occurs, the cost will remain permanently high. This is evident in the heatmaps: the fact that this activity was performed at previous timesteps still pushes towards increasing the costs (see the columns for timesteps -1, -2 and -3, whose values are 4987, 4984 and 4948, respectively).

| CASE ID | Back-Office Adjustment Requested | Explanations for Back-Office Adjustment Requested happening | Explanations for Back-Office Adjustment Requested not happening |
|---|---|---|---|
| 201810000206 | 0 | - | ACTIVITY=Service closure Request with network responsibility (-2) AND CE_UO=195 (-1) |
| 201811008237 | 1 | CLOSURE_TYPE=Porting | - |
| 201812005701 | 1 | CLOSURE_REASON!=1 - Client lost | - |

TABLE II: Online explanations for Back-Office Adjustment Requested. Values 1 and 0 indicate if the activity is predicted to occur or not. Explanation followed by (-1): attribute value assigned by the event that precedes the last of the respective case.

VII Conclusion

A lot of research has been devoted to increasingly accurate frameworks for predictive process monitoring. Nonetheless, little attention has been paid to ensuring that the resulting predictive-monitoring system is workable in practice. By practical workability, we mean that process analysts and stakeholders need to trust the system and its predictions. Previous studies have shown that a necessary condition for building trust is to explain the reasons behind the provided predictions [10.1007/s11257-017-9195-0, doshivelez2017rigorous]. Proposals that do not treat explanation as a core feature are not going to be adopted in practice.

This paper has put forward a framework to equip predictive-process-monitoring systems with explanations that are intelligible to the actors of the process. The framework builds on the most recent state of the art in Explainable AI, and is independent of the actual AI predictive-analytics technique.

However, the operationalization of the framework requires one to select an actual AI technique, and here we opted for predictive models based on LSTM, which the present literature has shown to be the most suitable for the problem in question. The implementation is based on Python, and it has been used for several case studies. Here we reported different KPI predictions for a process run in a financial institute in Italy. The case study shows that our framework is able, on the one hand, to provide explanations of the most salient features that influence the prediction models and, on the other hand, to provide online explanations for the running cases.

Future work will follow several directions. First, we aim to verify through interviews whether process stakeholders fully comprehend the heatmaps and the form given to the explanations. Second, we aim to explore the possibilities of Natural Language Generation techniques to report more user-friendly explanations, instead of the output shown in Tables I and II.