Process Knowledge Driven Change Point Detection for Automated Calibration of Discrete Event Simulation Models Using Machine Learning

by   Suleyman Yildirim, et al.
Wayne State University

Initial development and subsequent calibration of discrete event simulation models for complex systems require accurate identification of dynamically changing process characteristics. Existing data driven change point methods (DD-CPD) assume changes are extraneous to the system, thus cannot utilize available process knowledge. This work proposes a unified framework for process-driven multi-variate change point detection (PD-CPD) by combining change point detection models with machine learning and process-driven simulation modeling. The PD-CPD, after initializing with DD-CPD's change point(s), uses simulation models to generate system level outputs as time-series data streams which are then used to train neural network models to predict system characteristics and change points. The accuracy of the predictive models measures the likelihood that the actual process data conforms to the simulated change points in system characteristics. PD-CPD iteratively optimizes change points by repeating simulation and predictive model building steps until the set of change point(s) with the maximum likelihood is identified. Using an emergency department case study, we show that PD-CPD significantly improves change point detection accuracy over DD-CPD estimates and is able to detect actual change points.




I Related Work

Discrete event simulation (DES) models systematize the operations of real-world systems as discrete event sequences and are commonly used to analyze dynamic and complex systems [13]. DES models offer a flexible and dynamic system representation that is widely used for effective analysis of process-based systems in the manufacturing, automotive, transportation, and healthcare industries [13, 14, 15, 16]. This paper aims to automate the detection and estimation of changes in system characteristics to maintain the validity of DES models. Hence, it contributes to two independent literature streams: (1) automated discovery and calibration of DES models, and (2) change point analysis.
Automated discovery and calibration of models has been studied by both the DES and process mining communities. In the DES literature, many studies attempt automated model discovery using formal model specifications for simulation models [17, 18, 19, 20]. More recently, several studies have also proposed data-driven model integration, using external databases and analysis techniques to create simulation models [21, 22]. In addition, some DES studies focus on model calibration and parameter updates using data-driven approaches [23, 24, 25]. Simulation model transformation and parameter calibration within the DES formalism are discussed in different publications by Zeigler [26, 27]. The complexity of simulation model parameter calibration is also studied by Hofmann [28], and the development and evaluation of this procedure is analyzed extensively by Park et al. [29]. Spear [30] studied multivariate statistical analysis for calibration, uniqueness, and goodness of fit for high-dimensional spaces and large simulation models of environmental systems. Model update and parameter calibration are also studied in the process mining literature as concept drift [8, 11]. The concept drift literature primarily focuses on detecting drifts in time [31] and keeping predictive models up to date [32]. Process mining studies present different methods to discover and understand business process model changes over a period [8, 33].

The proposed PD-CPD method is an offline, supervised, multivariate, non-parametric change point detection approach for high-dimensional dependent time series data. Change point detection and estimation is a data-driven method to identify and diagnose abrupt changes in time series data. It is a signal processing tool commonly used not only in mathematics and statistics but also in fields such as machine learning, finance, economics, healthcare, and engineering [34, 35, 36, 37]. Change point detection methods are traditionally classified as online (real-time setting) [38, 39] or offline (retrospective setting) [40, 41]. Most discrete event simulation models are executed offline; therefore, our focus in this paper is on multivariate offline change point detection approaches. Since there are no labels to use in the analysis, the change point detection problem is typically an unsupervised learning problem. In the CPD literature, most methods consider one-dimensional (univariate) data sets, while others use multi-dimensional (multivariate) data sets to detect changes. Change point detection methods for univariate time series have been studied in different domains [36, 42, 43]. Since changes in a complex system may manifest across multiple system performance characteristics, it is important for the CPD method to handle multi-dimensional data. In this paper, we also consider a nonparametric (distribution-free) model with a multivariate dataset, because most simulation models have no known or stable distribution. Matteson and James [44] developed a nonparametric approach for multiple change point analysis of multivariate observations and compared it with alternative methods. Other nonparametric CPD approaches for multivariate time series have also been studied [45, 46, 47].
This paper mainly contributes to the automated DES model calibration literature by proposing a data-driven approach to detect and estimate the change points for resource parameter estimation. It also contributes to the nonparametric multivariate change point detection literature by leveraging the underlying process knowledge through simulation.

II Methodology

This section develops the proposed PD-CPD method - a novel offline multi-dimensional change point detection approach driven by process knowledge and machine learning for automated change detection and calibration of DES models. Without loss of generality, we herein consider a class of periodic type changes, i.e. resource task assignment or shift change, in the system.

Figure 1 presents the flowchart of the PD-CPD methodology. The proposed PD-CPD method executes in four stages. The first stage is the initialization stage, which determines an initial estimate of the system’s change point(s) by applying a multivariate nonparametric change point detection technique to the time-series dataset (for each realization). In the second stage, the simulation stage, the method generates system outputs by simulating the DES model calibrated with the incumbent change points. The DES model is simulated for each realization of the system using the corresponding system inputs. The third stage is predictive model training, where we train a time series neural network model, namely a Nonlinear Autoregressive with Exogenous Input (NARX) model, for each realization. These neural networks are then used to predict the change points given the time-series datasets of the system’s performance outputs. The last stage is the change point evaluation and perturbation stage, in which we evaluate the change point predictions obtained using the NARX models and update the change points using a perturbation-based approach. We repeat this process with the perturbed change point(s) until the change point(s) with the maximum likelihood are identified.

Inputs to the proposed PD-CPD method are multiple realizations of multivariate time-series data obtained from the system, representing its performance outputs at different times. These multiple realizations correspond to unique output samples of the system in which the changes occur at the same change points, e.g., multiple days of outputs from a system where changes are repeated daily. In addition, we process the raw data to extract the parameters of a simulation model of the system. Outputs of the simulation model are also processed using a snapshot strategy to collect multivariate time-series performance output data (features), e.g., number in system, number in queue, and utilization, matching the time points of the actual system’s time-series data.

In what follows, we introduce the methods required for the proposed framework, and integrate them into a unified solution algorithm for process-driven change point prediction.

Fig. 1: Process Knowledge Driven Change Point Detection (PD-CPD) Model Framework.

II-A Data Driven Change Point Estimation

Multivariate non-parametric CPD is the first step of our methodology; it detects abrupt changes using real system data. This approach provides initial CPs from raw data, without using any process knowledge. We use standard offline change point detection methods [41]. Several methods exist for offline non-parametric change point detection, such as non-parametric maximum likelihood, rank-based detection, kernel-based detection, and probabilistic methods. For a time series $x_1, \dots, x_N$, the change point detection method for a single change point $k$ can be expressed in general form as follows:

$$J(k) = \sum_{i=1}^{k-1} \Delta\big(x_i;\, \chi([x_1, \dots, x_{k-1}])\big) + \sum_{i=k}^{N} \Delta\big(x_i;\, \chi([x_k, \dots, x_N])\big)$$

where $\chi$ and $\Delta$ define the section empirical estimate and the deviation measurement, respectively. The formulation can use different types of statistics, such as standard deviation, root mean square, or linear trend.

When there are $K$ change points, the CPD function minimizes

$$J(K) = \sum_{r=0}^{K-1} \sum_{i=k_r}^{k_{r+1}-1} \Delta\big(x_i;\, \chi([x_{k_r}, \dots, x_{k_{r+1}-1}])\big) + \beta K$$

where $k_0$ and $k_K$ are the first and the last sample of the time series data, and the constant $\beta$ represents a fixed penalty added for each change point [48, 49].
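As a minimal sketch of this kind of cost, the following assumes that the section estimate is the section mean and that the deviation is the squared difference from it; both are illustrative choices, and the function names are hypothetical. It searches exhaustively for a single change point and evaluates the penalized cost of a given change point set.

```python
from statistics import fmean

def section_cost(section):
    """Summed squared deviation of a section from its own mean (assumed statistic)."""
    m = fmean(section)
    return sum((v - m) ** 2 for v in section)

def single_change_point_mean(x):
    """Return (k, cost): x[:k] and x[k:] are the two sections with minimal total cost."""
    best_k, best_cost = None, float("inf")
    for k in range(1, len(x)):
        cost = section_cost(x[:k]) + section_cost(x[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

def penalized_cost(x, change_points, beta):
    """Total section cost for the given change points plus beta per change point."""
    bounds = [0, *sorted(change_points), len(x)]
    total = sum(section_cost(x[a:b]) for a, b in zip(bounds[:-1], bounds[1:]))
    return total + beta * len(change_points)

# Toy series with one level shift (made-up values)
series = [0.0] * 5 + [10.0] * 5
k, cost = single_change_point_mean(series)
```

The penalty keeps the search from adding a change point at every sample: an extra change point is only worthwhile when it lowers the section cost by more than `beta`.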

Unique to our application, the multivariate raw dataset may represent multiple realizations of the system’s performance data (i.e., periodic changes in the underlying system characteristics). For instance, the raw time-series dataset may consist of the daily performance outputs of a system over the course of multiple days, i.e., realizations. Hence, applying a DD-CPD method to each realization could lead to a distinct set of change point(s). These distinct CPs implied by multiple realizations can be reconciled by selecting a representative CP set based on the median or mean across realizations. The proposed PD-CPD method is agnostic both to the DD-CPD method used for initialization and to how the CPs from multiple realizations are reconciled.
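The reconciliation across realizations can be sketched as an element-wise median. This assumes every realization yields the same number of change points, expressed here in minutes after midnight; the values and the function name are made up for illustration.

```python
from statistics import median

def representative_change_points(per_realization_cps):
    """Element-wise median of the sorted change times found in each realization."""
    return [median(column) for column in zip(*per_realization_cps)]

# Three hypothetical days, each with two detected change times (minutes)
daily_cps = [[480, 960], [495, 945], [465, 990]]
initial_cps = representative_change_points(daily_cps)
```

The median is less sensitive than the mean to a single day whose detected change time is far off.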

II-B Process-Driven Change Point Prediction

Most process-driven systems have resource level changes in the short term (e.g. shift changes during a day or ad-hoc task assignment due to congestion) and long term (e.g. every 2-3 months or seasonal changes). These changes can be classified as one-of-a-kind (e.g., ad-hoc task assignment in service processes or machine degradation in manufacturing) or periodic (shift changes). In the case of periodic change points, the real system generates multiple realizations of time series data. The historical data from the system (e.g., event logs) is processed to generate features of the system performance in the form of time series data such as average number of entities in system, average number in queue, utilization of a process etc. using a sliding time window approach. Figure 2 illustrates the multi-dimensional time-series feature data for a single change point for a single realization of the system.

Fig. 2: Multivariate time-series features of a process with a single change point.

Determining which features to use is also a decision to be made as part of the proposed approach. For instance, if the goal is to estimate resource shift changes of a process step, then clearly the utilization of that specific process as well as the queues preceding that step should be included in the feature set. While some features are more important than others, determining feature importance is left to the predictive model building step, where less important features can be eliminated or new features can be extracted in a supervised manner [50]. Such feature selection approaches are beyond the scope of our contribution. In addition, the time windowing used to create time-series features of system performance is important. Clearly, these feature sets can be extracted at different levels of resolution by choosing the windowing function and temporal width. A short width promotes greater sensitivity to changes, but it increases the size of the time-series dataset and decreases the performance of the predictive model because of the system’s intrinsic variability during periods with no changes. In comparison, a longer width gives better predictive performance in stationary periods but reduces sensitivity to changes. Next, we discuss the simulation model, NARX neural network, and change point perturbation steps of the PD-CPD.
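To make the windowing trade-off concrete, here is a small sketch that turns an event log into one time-series feature. It assumes the log is a list of hypothetical (arrival, departure) pairs, takes periodic snapshots of the number in system, and averages them over a trailing window; the windowing function used in the paper may differ.

```python
def number_in_system(events, t):
    """Count entities present at time t; events is a list of (arrival, departure)."""
    return sum(1 for arrival, departure in events if arrival <= t < departure)

def windowed_feature(events, horizon, step, width):
    """Snapshot every `step` time units, then average over a trailing window of `width`."""
    snapshots = [number_in_system(events, t) for t in range(0, horizon, step)]
    per_window = max(1, width // step)
    feature = []
    for i in range(len(snapshots)):
        window = snapshots[max(0, i - per_window + 1): i + 1]
        feature.append(sum(window) / len(window))
    return feature

# Made-up log: three entities with (arrival, departure) times in minutes
log = [(0, 30), (10, 20), (25, 40)]
narrow = windowed_feature(log, horizon=40, step=10, width=10)
wide = windowed_feature(log, horizon=40, step=10, width=20)
```

With the narrow window the peak at minute 10 is kept as is, while the wider window smooths it, which mirrors the sensitivity versus stability trade-off described above.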

II-B1 DES Simulation Modeling - Mapping System Characteristics to Performance Outputs

DES models allow us to change the system characteristics and observe how those changes are manifested in the system outputs; they therefore provide a powerful tool for mapping changes in system characteristics to outputs. In implementing the PD-CPD’s second stage (simulation stage), we assume that a DES model of the system is available or can be constructed. Further, we assume that the change points are characterized such that the process step they affect and their number are known; only their timing is unknown. For instance, in our case study of emergency room modeling, we consider dynamic shift changes in resource (staffing) levels. We further assume that all system changes are of the same type (periodic or one-of-a-kind) and share the same periodicity (if periodic). In the case of periodic changes, we repeat the DES simulation runs for each realization of the historical data using the same flow unit attributes and arrival distributions but allow for variability through the processing times. Process duration distributions are estimated by distribution fitting using the data of all realizations. DES models are configured to record high-fidelity system performance data (e.g., queue lengths, wait times, and total time in system). Each DES model corresponding to a realization is replicated multiple times with different random seeds, and the outputs are processed through the same sliding time windowing approach used for the actual system’s dataset to obtain multi-dimensional feature sets. All DES model runs are executed with the same incumbent change points (timing and levels).

II-B2 Change Point Prediction and Evaluation - Nonlinear Autoregressive Exogenous Input (NARX) Neural Network Model

Given the simulated system’s performance feature sets, we train a unique neural network model (one per realization) that predicts temporal resource levels given the time-series features. Next, we test the prediction accuracy of resource levels using the actual system’s features as inputs and the simulated changes in resource levels as labels. For predictive modelling, we use the Nonlinear Autoregressive Exogenous input (NARX) model, a robust, non-parametric class of dynamic recurrent neural network suitable for nonlinear systems and time series. NARX predicts future values of a time series $y(t)$ from past values of that time series and past values of a second time series $x(t)$. The NARX neural network output can be mathematically expressed as follows:

$$y(t) = f\big(y(t-1), \dots, y(t-n_y),\; x(t-1), \dots, x(t-n_x)\big) + \varepsilon(t)$$

where $f$ is a nonlinear function that describes the system behaviour, $n_y$ and $n_x$ are the numbers of output and input delays, and $\varepsilon(t)$ is the approximation error. The NARX neural network framework is illustrated in Figure 3.

Fig. 3: NARX Neural Network Framework with Delayed Inputs and Outputs.
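The delayed inputs and outputs of a NARX-style model can be assembled into training rows as sketched below. This only builds the regressors and targets; the nonlinear function itself is left to whichever model is trained on them. The function name, the argument names `ny` and `nx` for the numbers of delays, and the data are assumptions for illustration.

```python
def narx_regressors(y, x, ny, nx):
    """Build (rows, targets) for a NARX-style model.

    y: target series (e.g., resource level per time step)
    x: list of exogenous feature vectors, one per time step
    Each row holds y(t-1..t-ny) followed by the flattened x(t-1..t-nx).
    """
    start = max(ny, nx)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - d] for d in range(1, ny + 1)]
        past_x = [value for d in range(1, nx + 1) for value in x[t - d]]
        rows.append(past_y + past_x)
        targets.append(y[t])
    return rows, targets

# Made-up data: resource level and a single performance feature
levels = [1, 1, 2, 2, 3]
features = [[10], [11], [12], [13], [14]]
rows, targets = narx_regressors(levels, features, ny=2, nx=2)
```

The first `max(ny, nx)` time steps produce no row because their delayed values are not available.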

In our approach, the NARX neural network model predicts the temporal resource levels using preceding resource levels and the time-series performance features. In training the NARX neural network, we use the multi-dimensional input feature data from the multiple DES replications (of a realization) and the simulated resource levels (as per the incumbent change points). Note that the number of DES model replications (for a given realization) determines the size of the data available for training and validation of the NARX model. For validation of the NARX model, we use a k-fold training and validation approach. To determine the optimal NARX model, we apply the common trial-and-error method presented by Maier and Dandy [51] to specify the number of hidden neurons and the training function. However, the number of neurons, the training function, and other parameters can change from one case study to another. We give all the details of our model parameters and input data in the case study section. The prediction performance of the trained models is compared using mean squared error (MSE) and accuracy measures; the lowest MSE or the highest accuracy score indicates the best prediction performance. Once trained, each NARX model is given the features obtained from the actual system’s realization to make resource level predictions. These resource level predictions are then discretized by a simple rounding procedure and compared with the simulated resource levels in a selected temporal region. The reason for not comparing the predicted resource levels with the simulated ones over the full temporal spectrum is to increase the sensitivity of the accuracy comparison. To illustrate, consider the discretization of a single 24-hour day into 96 intervals of 15 minutes and a single change point at 8 AM (i.e., the 32nd interval). A 30-minute timing discrepancy between the predicted and simulated resource levels corresponds to an accuracy of 98%, which hinders the comparison of accuracies that have high variability. If, instead of all 96 intervals, we use 20 intervals (5 hours, i.e., 2.5 hours on each side of 8 AM), the accuracy would be 90%. Without a priori (even if imperfect) knowledge of the change points, the temporal region for comparison cannot be determined. The proposed PD-CPD therefore begins with a wide temporal region (centered at the initial DD-CPD estimates of the change points), i.e., low sensitivity, and then gradually narrows it to increase sensitivity. The accuracy of the incumbent resource level solution (change point) is then found by averaging the accuracies across multiple realizations.
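The effect of narrowing the comparison region can be checked with a few lines of arithmetic. The sketch below uses the day discretization from the text (96 intervals of 15 minutes, change at interval 32) with made-up resource levels of 2 and 3; the function names are hypothetical.

```python
def step_levels(n_intervals, change_idx, before, after):
    """Resource level per interval with a single change at change_idx."""
    return [before if i < change_idx else after for i in range(n_intervals)]

def accuracy(predicted, simulated, center=None, half_width=None):
    """Share of matching intervals, optionally restricted to a window around center."""
    if center is None:
        idx = range(len(simulated))
    else:
        idx = range(max(0, center - half_width),
                    min(len(simulated), center + half_width))
    hits = sum(predicted[i] == simulated[i] for i in idx)
    return hits / len(idx)

simulated = step_levels(96, 32, before=2, after=3)   # change at 8 AM
predicted = step_levels(96, 34, before=2, after=3)   # predicted 30 minutes late

full_day = accuracy(predicted, simulated)                            # 94 of 96 intervals match
windowed = accuracy(predicted, simulated, center=32, half_width=10)  # 18 of 20 intervals match
```

The same two mismatched intervals cost about 2 percentage points over the full day but 10 percentage points inside the 20-interval window, which is why the narrower region separates candidate change points more clearly.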

II-B3 Identifying Optimal Process-Driven Change Point

In the last step, we perturb and update the change points determined by the PD-CPD method in order to improve the prediction results. After training the NARX neural network and testing it with the actual system’s data, we obtain an average prediction accuracy that indicates the degree of agreement between the simulated resource levels and the system’s actual performance data. Next, the incumbent change points (i.e., change times) are updated by perturbation, and the simulation, NARX training, and testing procedure is repeated to evaluate different change point combinations.

In small instances, one can conduct an extensive neighborhood search, i.e., run simulations and NARX models to assess the fitness of a small set of change point combinations and ascertain the optimal change point. However, for medium to large instances, e.g., those with multiple change points, such an approach may prove impractical and an optimization approach is required. The perturbation-based optimization procedure of the PD-CPD approach belongs to the domain of simulation-based optimization, where at each iteration, given a direction of improvement, the incumbent solution is moved in the improvement direction by a predetermined step size. The proposed PD-CPD approach starts this perturbation with the initial change points identified through DD-CPD. At the end of the first iteration, a perturbation step is executed in which the change times of the resource levels are perturbed to obtain an improvement direction. A common challenge in relying solely on the direction of improvement is the nonlinearity of the objective space, which poses the risk of convergence to a local optimum. Hence, most solution approaches also utilize an exploration step in which iterates are allowed to move to an inferior solution [52].

While many simulation-based optimization approaches are applicable, we use a modified form of the simulated annealing algorithm in our experiments. Simulated annealing is a probabilistic technique that combines exploitation (choosing the direction that most improves accuracy) with exploration, which chooses suboptimal moves with a gradually decreasing probability so that the algorithm can escape local minima. The tradeoff between exploration and exploitation is controlled through a temperature parameter, which yields pure exploitation as the temperature goes to zero and a random walk when the temperature is infinite. A common approach is to start with a higher temperature to allow exploration in the earlier stages, and to lower it after every iteration to ensure stability towards the end of the algorithm. We refer the interested reader to [53, 54] for variants and uses of simulated annealing. Our rationale for choosing simulated annealing was to provide a simple search method that showcases the flexibility of our framework in integrating different simulation-based optimization methods. For a general overview of search algorithms within the simulation literature that can be integrated into our framework, we refer the interested reader to [55].
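For reference, the textbook (Metropolis) acceptance rule of simulated annealing with a geometric cooling schedule can be written as below. This is the standard form and is given only as an assumption about the general mechanism; the modified variant used in the experiments is not specified here and may differ. The symbols $e$, $e'$, $T_j$, and $\alpha$ (incumbent error, candidate error, temperature at iteration $j$, cooling factor) are introduced for illustration.

```latex
P(\text{accept}) =
\begin{cases}
1, & e' \le e, \\
\exp\!\left(-\dfrac{e' - e}{T_j}\right), & e' > e,
\end{cases}
\qquad
T_{j+1} = \alpha\, T_j, \quad 0 < \alpha < 1 .
```

For a worse candidate ($e' > e$), the acceptance probability tends to 0 as $T_j \to 0$ (pure exploitation) and to 1 as $T_j \to \infty$ (random walk), matching the two limits described above.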

II-C Solution Algorithm

In this section, we formally introduce the Process-Driven Change Point Detection algorithm that brings together the set of methods outlined in Sections II-A & B. Main objective of the algorithm is to leverage detailed DES models to capture process-driven insights, which are then used to augment the initial change point estimations that are acquired through purely data-driven methods.

Algorithm 1 Process-Driven Change Point Detection
Data: Time Series Data Observations
Result: Process-Driven Change Point Predictions
Stage A: Data-Driven Change Point Estimation:
  • Convert the observations from raw data into time series of the arrivals and features, , and , respectively. and indicate the set of times and features.
  • Define as the ordered change points such that for any . These change points split the sets and to segments, whereby the segment contains and . is denoted as the corresponding resource level.
  • Given , identify the optimal data driven change points , where and denote the deviation measurement, and the penalty for additional change points, respectively.
Stage B: Process-Driven Change Point Prediction:
  • Let , , , where is a sufficiently large number. Define as the set of change point scenarios.
  • while  do
      Randomly select , an -neighbor of , from the list that includes all change point combinations .
      Execute the Process-Driven Assessment Procedure to obtain the error term associated with , namely .
      If , with probability , let and . Otherwise, and .
      Add the change points and the associated error to the set . Update and .
  • Optimal change point set is , where yields minimum average error, i.e. .


    The proposed algorithm (Algorithm 1) starts with data-driven estimation in Stage A, which includes the acquisition and processing of observation data and the extraction of multi-dimensional time-series features for each observation. These feature sets are used to produce initial, data-driven change point estimates that initialize the search space for Stage B. Stage B incorporates process-driven insights into change point prediction. It starts by selecting change point combinations in the neighborhood of the incumbent solution. For each selected change point combination, multi-replication runs of a DES simulation model are executed as outlined in Section II-B1. The resulting DES simulation outputs are used to train NARX models that predict resource levels. NARX prediction accuracy measures for the observed time-series data streams are evaluated to quantify the success rate of the change point combination. A detailed explanation of this procedure is provided in Section II-B2 and outlined within the Process-Driven Assessment procedure. Given the accuracy estimates for each candidate change point solution, the algorithm decides which candidate solution replaces the incumbent change point. After a sufficient number of iterations, the algorithm terminates, and the change point estimate with minimal error is identified as the optimal prediction. The perturbation-based optimization procedure is a modified version of simulated annealing.
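The Stage B search loop can be sketched as follows. This is a plain simulated annealing loop written under stated assumptions, not the modified variant used in the experiments: the neighbor move (shift one change time by a fixed step), the Metropolis acceptance rule, the geometric cooling, and all parameter values are illustrative. The `toy_error` function is a hypothetical stand-in for the Process-Driven Assessment procedure (simulation, NARX training, and testing), which is far more expensive in practice.

```python
import math
import random

def toy_error(cps, truth=(480, 960)):
    """Stand-in for the simulation + NARX assessment: distance to made-up true CPs."""
    return sum(abs(c - t) for c, t in zip(cps, truth))

def pd_cpd_search(initial_cps, assess, step=15, n_iter=200,
                  temp=30.0, cooling=0.97, seed=0):
    """Perturb change points, accept or reject them, and track every scenario visited."""
    rng = random.Random(seed)
    incumbent = tuple(initial_cps)
    incumbent_err = assess(incumbent)
    visited = {incumbent: incumbent_err}          # set of change point scenarios
    for _ in range(n_iter):
        # Neighbor: shift one change time earlier or later by one step
        i = rng.randrange(len(incumbent))
        cand = list(incumbent)
        cand[i] += rng.choice((-step, step))
        cand = tuple(cand)
        # Only consider candidates whose change points remain strictly ordered
        if all(a < b for a, b in zip(cand, cand[1:])):
            if cand not in visited:
                visited[cand] = assess(cand)
            delta = visited[cand] - incumbent_err
            # Always accept improvements; accept worse moves with decaying probability
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                incumbent, incumbent_err = cand, visited[cand]
        temp *= cooling
    best = min(visited, key=visited.get)
    return best, visited[best]

# Start from hypothetical data-driven estimates (minutes after midnight)
best_cps, best_err = pd_cpd_search((450, 990), toy_error)
```

Caching the visited scenarios matters here because each call to the real assessment would involve multiple simulation replications and model training runs.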

    III Case Study

    To demonstrate the steps of our approach, we conducted a case study using an emergency department (ED) and consider two resource level changes reflecting shift changes within a single day. Resource levels are considered as in our model. This level of resources (along with other parameters and arrival data) corresponds to a realistic ED setting where there are occasional congestion induced waiting. In the pre-processing phase, the input raw data is generated by simulating the ED system for 30 realizations (e.g., 30 day history), which are then converted to time-series data. Since we use snapshot property to generate time series data with minute intervals, each time series input has data points for hours horizon. In this experiment, we use features (such as the number of entities waiting in the system, waiting in queue, time in idle/busy states) in order to predict resource level change times with multivariate data-driven change point detection. Figure 4 depicts input features and initial change point detection results.

    Fig. 4: Model Input Features and CPD Results.

    In this case study, we use a non-parametric offline change point detection method with "mean" and "standard deviation" changes as the change point detection statistics. For the "mean" statistic, the change point detection function minimizes the total residual error from the best horizontal level for each section. Given a time series $x_1, x_2, \dots, x_N$, the CPD function finds the change point $k$ such that:

    $$J(k) = \sum_{i=1}^{k-1} \big(x_i - \operatorname{mean}([x_1, \dots, x_{k-1}])\big)^2 + \sum_{i=k}^{N} \big(x_i - \operatorname{mean}([x_k, \dots, x_N])\big)^2$$

    attains its minimum value. When we use standard deviation, we fix the mean, and use the following function:


    Data-driven change point detection algorithm identifies two change points of the system as and during the 24-hr day. Next, we parametrize the discrete event simulation model by setting up the changes to occur at these initial estimates. Next we simulate the ED for each realization with multiple replications and process the outputs of each realization to obtain the multivariate time-series features.

    Next, using simulation outputs, we train and validate a NARX neural network model for each realization. In the training phase, we use multiple inputs (i.e., 6 features representing the performance outputs of the simulated system) and single output (resource level at each time series point) with training samples corresponding to the simulation replications. We use scaled conjugate gradient backpropagation network function for training and validation. We note that the simulation data is only used for the training and validation, not for testing. In the testing phase, actual historical time-series data is used as input features and simulation labels (resource levels) are used for output, thereby we observe simulation model’s fitness with respect to the actual data. In this experiment, input delays, feedback delays and hidden layer size are considered as 1:2, 1:2, and 5, respectively. Training and validation sets are divided as

    and , respectively, and testing data is taken from the actual data. Figure 5 illustrates the training and validation results for a single NARX neural network. We have two change points and three different resource levels in a day. Since resource shift at is larger, prediction performance in is better than . We note that accuracy results (both in validation and testing) vary across individual NARX models corresponding to each realization (see Figure 7).

    Fig. 5: NARX Neural Network Time Series Response for Training and Validation Sets.

    For each realization, we test the corresponding NARX neural network model and find the average error across all realizations. Next, we perturb the incumbent change points as described in Algorithm 1. We use a modified version of simulated annealing to perturb the change points and obtain the change point combination that yields the minimum error. The perturbation algorithm converges the CP iterates to the actual CPs of and hours.

    Figure 6 characterizes the absolute time deviation (averaged across realizations) response as a function of different change point combinations used in simulations. We note that our proposed PD-CPD is an unsupervised approach seeking to minimize the deviation between the change points of resource levels that are simulated versus those predicted by the NARX neural network models. Figure 6’s depiction thus illustrates how different CPs used in simulations compare with the actual CPs. Absolute time deviation is the absolute distance between the actual CPs ( and hours) and those predicted by the NARX models in the testing stage, respectively. Response surface depicted in Figure 6 reveals the non-convexity of the average absolute time deviation in terms of simulated CPs. It further shows that the local minimum is attained at the actual CPs, i.e. and . The reason for non-zero average time deviation at the minimum is that NARX models for some of the realizations predict change points different than actual CPs given the time-series features of different realizations. However, the method converges to the minimizing solution of and hours which has zero deviation from the actual CPs. This is an improvement of the DD-CPD results which has a total absolute deviation of 70 minutes.

    Fig. 6: PD-CPD Average Absolute Time Deviation Results.

    Figure 7 compares the results of the DD-CPD and PD-CPD approaches on a daily (realization) basis for 30 days using the optimal solution. Of the 30 days, NARX models’ predictions for days 3, 6, 16, and 23 correspond to the actual CPs, i.e. no deviation. The PD-CPD outperforms the DD-CPD in 26 days out of 30 days. Note that while on days 4,26, and 28, PD-CPD’s deviations are higher than DD-CPD, these results are for individual days. Indeed, the PD-CPD’s optimizing solution corresponds to actual CPs which are better than the DD-CPD results of and which has a total absolute deviation of 70 minutes.

    Fig. 7: DD-CPD vs PD-CPD - Daily Based Absolute Time Deviation Results.

    IV Limitation and Future Directions

    In the current study, each day is simulated with multiple replications and a NARX model is trained based on the average features across replications. Instead of averaging, one approach could be to train and validate NARX model with the non-averaged replication results. However, NARX model is not able to process more than one time series sequence as it is not able to handle the time delay between observations across a day’s replications. An alternative way (to averaging across replications) is to replicate each day’s simulation and build a NARX model for each of these simulation replications. These NARX models would then form an ensemble model for each day. We can then make aggregate predictions using these ensemble models, i.e. by choosing the majority resource level across simulation replications.
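The ensemble alternative described above can be sketched as a per-time-step majority vote. This assumes each replication's NARX model has already produced a discretized resource level series of equal length; the values and the function name are made up for illustration.

```python
from collections import Counter

def ensemble_resource_levels(predictions):
    """Per-time-step majority level across the replication models of one day.

    predictions: one list of discretized resource levels per replication model.
    Ties go to the level that appears first among the models at that time step.
    """
    return [Counter(levels).most_common(1)[0][0] for levels in zip(*predictions)]

# Three hypothetical replication models predicting four time steps
per_model = [[2, 2, 3, 3],
             [2, 3, 3, 3],
             [2, 2, 2, 3]]
day_levels = ensemble_resource_levels(per_model)
```

Using an odd number of replication models avoids ties when only two resource levels are possible.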

    One future research opportunity is to extend the present approach to change point detection at multiple timescales. Most discrete event systems have periodic change points that are short term (e.g. daily, weekly) as well as long term (e.g. monthly, quarterly, or yearly). This study focused on detecting change points that repeat on a single timescale, namely daily shift changes. A future extension could investigate the joint detection of short-term and long-term changes (e.g., daily staff shift changes and quarterly surgery block time allocation) with the process knowledge driven CPD method.

    V Conclusion

    In this paper, we propose a novel change point detection algorithm that combines data driven methods with process knowledge to identify when resource levels change in discrete event systems. Unique to our approach is the use of discrete event simulation models to capture the complex process dynamics that typically occur at times of resource level transition. Our experimental results indicate that process knowledge significantly improves change point detection accuracy. The proposed model can complement a large variety of data driven change point detection models and provides an extensive basis for automated discovery of process knowledge in discrete event systems.


    • [1] M. Schluse and J. Rossmann, “From simulation to experimentable digital twins: Simulation-based development and operation of complex technical systems,” in 2016 IEEE International Symposium on Systems Engineering (ISSE).   IEEE, 2016, pp. 1–6.
    • [2] M. Schluse, M. Priggemeyer, L. Atorf, and J. Rossmann, “Experimentable digital twins—streamlining simulation-based systems engineering for industry 4.0,” IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1722–1731, 2018.
    • [3] M. Basseville, “Detecting changes in signals and systems—a survey,” Automatica, vol. 24, no. 3, pp. 309–326, 1988.
    • [4] S. Robinson, Simulation: The Practice of Model Development and Use.   John Wiley and Sons, Ltd, 2004.
    • [5] W. Van Der Aalst, Process mining: discovery, conformance and enhancement of business processes.   Springer, 2011, vol. 2.
    • [6] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, F. M. Maggi, A. Marrella, M. Mecella, and A. Soo, “Automated discovery of process models from event logs: Review and benchmark,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 4, pp. 686–705, 2018.
    • [7] W. Van Der Aalst, A. Adriansyah, A. K. A. De Medeiros, F. Arcieri, T. Baier, T. Blickle, J. C. Bose, P. Van Den Brand, R. Brandtjen, J. Buijs et al., “Process mining manifesto,” in International Conference on Business Process Management.   Springer, 2011, pp. 169–194.
    • [8] R. J. C. Bose, W. M. van der Aalst, I. Žliobaitė, and M. Pechenizkiy, “Handling concept drift in process mining,” in International Conference on Advanced Information Systems Engineering.   Springer, 2011, pp. 391–405.
    • [9] J. Carmona and R. Gavalda, “Online techniques for dealing with concept drift in process mining,” in International Symposium on Intelligent Data Analysis.   Springer, 2012, pp. 90–102.
    • [10] J. Martjushev, R. J. C. Bose, and W. M. van der Aalst, “Change point detection and dealing with gradual and multi-order dynamics in process mining,” in International Conference on Business Informatics Research.   Springer, 2015, pp. 161–178.
    • [11] R. J. C. Bose, W. M. Van Der Aalst, I. Žliobaitė, and M. Pechenizkiy, “Dealing with concept drifts in process mining,” IEEE transactions on neural networks and learning systems, vol. 25, no. 1, pp. 154–171, 2013.
    • [12] D. S. G. Pollock, R. C. Green, and T. Nguyen, Handbook of time series analysis, signal processing, and dynamics.   Elsevier, 1999.
    • [13] J. Banks, Handbook of simulation: principles, methodology, advances, applications, and practice.   John Wiley & Sons, 1998.
    • [14] S. Bangsow, “Use cases of discrete event simulation,” Appliance and research, Berlin, New York, 2012.
    • [15] S. Lucidi, M. Maurici, L. Paulon, F. Rinaldi, and M. Roma, “A simulation-based multiobjective optimization approach for health care service management,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 4, pp. 1480–1491, 2016.
    • [16] K. Labadi, T. Benarbia, J.-P. Barbot, S. Hamaci, and A. Omari, “Stochastic petri net modeling, simulation and analysis of public bicycle sharing systems,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 4, pp. 1380–1395, 2014.
    • [17] O. Balci, R. E. Nance, E. J. Derrick, E. H. Page, and J. L. Bishop, “Model generation issues in a simulation support environment,” in Proceedings of the 22nd conference on Winter simulation.   IEEE Press, Conference Proceedings, pp. 257–263.
    • [18] Y. J. Son, A. T. Jones, and R. A. Wysk, “Automatic generation of simulation models from neutral libraries: an example,” in Proceedings of the 32nd conference on Winter simulation.   Society for Computer Simulation International, Conference Proceedings, pp. 1558–1567.
    • [19] M. Foeken and M. Voskuijl, “Knowledge-based simulation model generation for control law design applied to a quadrotor uav,” Mathematical and Computer Modelling of Dynamical Systems, vol. 16, no. 3, pp. 241–256, 2010.
    • [20] Y. Huang, M. D. Seck, and A. Verbraeck, “From data to simulation models: component-based model generation with a data-driven approach,” in Proceedings of the Winter Simulation Conference.   Winter Simulation Conference, Conference Proceedings, pp. 3724–3734.
    • [21] G. Lucko, P. C. Benjamin, K. Swaminathan, and M. G. Madden, “Comparison of manual and automated simulation generation approaches and their use for construction applications,” in Proceedings of the 2010 winter simulation conference.   IEEE, Conference Proceedings, pp. 3132–3144.
    • [22] K.-Y. Jeong and D. Allan, “Integrated system design, analysis and database-driven simulation model generation,” in Proceedings of the 37th annual symposium on Simulation.   IEEE Computer Society, Conference Proceedings, p. 80.
    • [23] P. K. Davis, J. H. Bigelow, and J. McEver, “Model abstraction techniques and applications: informing and calibrating a multiresolution exploratory analysis model with high resolution simulation: the interdiction problem as a case history,” in Proceedings of the 32nd conference on Winter simulation.   Society for Computer Simulation International, Conference Proceedings, pp. 316–325.
    • [24] P. K. Davis and R. Hillestad, “Families of models that cross levels of resolution: Issues for design, calibration and management,” in Proceedings of 1993 Winter Simulation Conference-(WSC’93).   IEEE, Conference Proceedings, pp. 1003–1012.
    • [25] J. H. Bigelow and P. K. Davis, “Implications for model validation of multiresolution, multiperspective modeling (mrmpm) and exploratory analysis,” Rand Corp Santa Monica CA, Report, 2003.
    • [26] B. P. Zeigler, “Theory of modeling and simulation. john wiley & sons,” Inc., New York, NY, 1976.
    • [27] B. P. Zeigler, T. G. Kim, and H. Praehofer, Theory of modeling and simulation.   Academic press, 2000.
    • [28] M. Hofmann, “On the complexity of parameter calibration in simulation models,” The Journal of Defense Modeling and Simulation, vol. 2, no. 4, pp. 217–226, 2005.
    • [29] B. Park and H. Qi, “Development and evaluation of a procedure for the calibration of simulation models,” Transportation Research Record, vol. 1934, no. 1, pp. 208–217, 2005.
    • [30] R. C. Spear, “Large simulation models: calibration, uniqueness and goodness of fit,” Environmental Modelling and Software, vol. 12, no. 2-3, pp. 219–228, 1997.
    • [31] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, “Learning with drift detection,” in

      Brazilian symposium on artificial intelligence

      .   Springer, 2004, pp. 286–295.
    • [32] J. Z. Kolter and M. A. Maloof, “Dynamic weighted majority: An ensemble method for drifting concepts,” Journal of Machine Learning Research, vol. 8, no. Dec, pp. 2755–2790, 2007.
    • [33] M. Prodel, V. Augusto, B. Jouaneton, L. Lamarsalle, and X. Xie, “Optimal process mining for large and complex event logs,” IEEE Transactions on Automation Science and Engineering, vol. 15, no. 3, pp. 1309–1325, 2018.
    • [34] S. Liu, M. Yamada, N. Collier, and M. Sugiyama, “Change-point detection in time-series data by relative density-ratio estimation,” Neural Networks, vol. 43, pp. 72–83, 2013.
    • [35] J. Du, X. Zhang, and J. Shi, “A physics-specific change point detection method using torque signals in pipe tightening processes,” IEEE Transactions on Automation Science and Engineering, vol. 16, no. 3, pp. 1289–1300, 2018.
    • [36]

      J. Chen and A. K. Gupta, “Testing and locating variance changepoints with application to stock prices,”

      Journal of the American Statistical association, vol. 92, no. 438, pp. 739–747, 1997.
    • [37] A. Y. Kaplan and S. L. Shishkin, Application of the change-point analysis to the investigation of the brain’s electrical activity.   Springer, 2000, pp. 333–388.
    • [38] R. Adams and D. MacKay, “Bayesian online changepoint detection (technical report),” University of Cambridge, Cambridge, 2007.
    • [39] J. Wu, Y. Chen, S. Zhou, and X. Li, “Online steady-state detection for process control using multiple change-point models and particle filters,” IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2, pp. 688–700, 2015.
    • [40] M. Basseville and I. V. Nikiforov, Detection of abrupt changes: theory and application.   Prentice Hall Englewood Cliffs, 1993, vol. 104.
    • [41] C. Truong, L. Oudre, and N. Vayatis, “Selective review of offline change point detection methods,” Signal Processing, p. 107299, 2019.
    • [42] C. Inclan and G. C. Tiao, “Use of cumulative sums of squares for retrospective detection of changes of variance,” Journal of the American Statistical Association, vol. 89, no. 427, pp. 913–923, 1994.
    • [43] M. Lavielle and E. Moulines, “Least‐squares estimation of an unknown number of shifts in a time series,” Journal of time series analysis, vol. 21, no. 1, pp. 33–59, 2000.
    • [44] D. S. Matteson and N. A. James, “A nonparametric approach for multiple change point analysis of multivariate data,” Journal of the American Statistical Association, vol. 109, no. 505, pp. 334–345, 2014.
    • [45] C. Zou, G. Yin, L. Feng, Z. Wang et al., “Nonparametric maximum likelihood approach to multiple change-point problems,” The Annals of Statistics, vol. 42, no. 3, pp. 970–1002, 2014.
    • [46] M. D. Holland and D. M. Hawkins, “A control chart based on a nonparametric multivariate change-point model,” Journal of Quality Technology, vol. 46, no. 1, pp. 63–77, 2014.
    • [47] M. Zhou, X. Zi, W. Geng, and Z. Li, “A distribution-free multivariate change-point model for statistical process control,” Communications in Statistics-Simulation and Computation, vol. 44, no. 8, pp. 1975–1987, 2015.
    • [48] M. Lavielle, “Using penalized contrasts for the change-point problem,” Signal processing, vol. 85, no. 8, pp. 1501–1510, 2005.
    • [49] R. Killick, P. Fearnhead, and I. A. Eckley, “Optimal detection of changepoints with a linear computational cost,” Journal of the American Statistical Association, vol. 107, no. 500, pp. 1590–1598, 2012.
    • [50] J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino, J. Tang, and H. Liu, “Feature selection: A data perspective,” ACM Computing Surveys (CSUR), vol. 50, no. 6, p. 94, 2018.
    • [51] H. R. Maier and G. C. Dandy, “Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications,” Environmental modelling & software, vol. 15, no. 1, pp. 101–124, 2000.
    • [52] S. Amaran, N. V. Sahinidis, B. Sharda, and S. J. Bury, “Simulation optimization: a review of algorithms and applications,” Annals of Operations Research, vol. 240, no. 1, pp. 351–380, 2016.
    • [53] J. M. Thompson and K. A. Dowsland, “Variants of simulated annealing for the examination timetabling problem,” Annals of Operations research, vol. 63, no. 1, pp. 105–128, 1996.
    • [54] D. Bertsimas, J. Tsitsiklis et al., “Simulated annealing,” Statistical science, vol. 8, no. 1, pp. 10–15, 1993.
    • [55] T. Lacksonen, “Empirical comparison of search algorithms for discrete event simulation,” Computers & Industrial Engineering, vol. 40, no. 1-2, pp. 133–148, 2001.