1 Motivation
The increasing amount of data recorded by today's automated and sensor-equipped production plants is an essential impetus for current developments in the industrial area. To face the challenge of actually making use of these recordings, machine learning algorithms can be employed to create models and hence fully utilize the available data. With reference to a real-world production system, such models have to cover the relationships between multiple variables representing inputs (cf. configured process and recipe parameters), internal states (cf. measured time series from condition monitoring) and dependent outputs (cf. measured product quality indicators) in order to identify a system comprehensively. One modeling approach for the analysis of complex systems is the variable interaction network [8]: a directed graph representing system variables as nodes and their impact on others as weighted edges. Primarily, variable interaction networks have been employed to gain a better understanding of the interdependencies within a modeled system [4]. In this work, however, we utilize them to detect changing system behavior, so-called concept drifts [3], online. To this end, identified system relationships are tracked over time and analyzed for changes, which might indicate beginning malfunctions when applied at production plants.
The objective of this approach is closely related to the intensively investigated topic of predictive maintenance [7], which is concerned with forecasting the remaining useful lifetime of a production system based on its current condition and scheduling specific preventive actions proactively. However, data that enables such predictions is quite difficult to gather: the effort ranges from consolidating data from various sources to carrying out a large number of run-to-failure experiments [9] under continuous assessment of the actual system condition. Tracking changing system relationships, however, is also applicable to less strictly controlled environments and allows a closer look into a system's dynamics.
In Section 2, we describe an algorithm to model variable interaction networks and present a sliding window based evaluation method that performs concept drift detection. In Section 3, we introduce a test problem, which we use to generate synthetic data streams. We validate and discuss our approach in Section 4. Finally, we give a brief summary and an outlook on possible future extensions in Section 5.
2 Variable Interaction Networks for Drift Detection
The developed two-phase concept drift detection approach may be categorized as a supervised learning based detector [5]. While the aim of the first phase is to develop a comprehensive model that describes an initially stable system, during the second phase this network is used to detect structural interaction changes on a continuous stream of new data from the respective system. All parts of the approach have been implemented and tested using the open source framework HeuristicLab (https://dev.heuristiclab.com/trac.fcgi/ticket/2288).
2.1 Network Modeling
As a first step, we define a set of variables within the system of interest. For each of them, a regression model is trained using the other variables as inputs. Various machine learning algorithms may be employed for this task, including multivariate linear regression, random forests or symbolic regression. Subsequently, we determine the relevance of each input variable for the respective target within a model: the impact of a variable is calculated from the increase in regression error of the developed model when it is re-evaluated on a data set in which the values of that variable have been randomly shuffled [2]. By this means, the information value of the respective variable is removed from the data set without changing its distribution. Finally, the calculated value is normalized to a fixed range. The directed, weighted graph of the variable interaction network can then be constructed by creating a node for each participating variable and creating edges from input to target nodes, weighted by the calculated impacts.
Several postprocessing measures are advisable in order to prune less important nodes and edges and hence determine more robust network structures. Regression models (and the derived variable impacts) with an estimation accuracy below a problem-dependent threshold should be excluded from the start, as they might not identify the system correctly. Furthermore, variable impacts below a user-defined threshold may be pruned to sparsify the networks without losing much information.
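The impact computation and graph construction described above can be sketched as follows. This is a minimal illustration, not the HeuristicLab implementation: the function names (`variable_impacts`, `build_network`), the use of the mean squared error, the normalization so that impacts of one model sum to 1, and the default pruning threshold of 0.05 are all assumptions made for the example. The "model" is a known function standing in for a trained regression model.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def variable_impacts(predict, rows, target, inputs, rng):
    """Permutation-based impact of each input on one target model.

    predict: callable mapping a row (dict) to an estimate of the target.
    The normalization (impacts sum to 1) is an assumption of this sketch.
    """
    y = [row[target] for row in rows]
    base_error = mse(y, [predict(row) for row in rows])
    raw = {}
    for name in inputs:
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)  # destroys the information, keeps the distribution
        permuted = [{**row, name: value} for row, value in zip(rows, shuffled)]
        error = mse(y, [predict(row) for row in permuted])
        raw[name] = max(0.0, error - base_error)
    total = sum(raw.values())
    return {name: (v / total if total > 0 else 0.0) for name, v in raw.items()}

def build_network(impacts_per_target, min_impact=0.05):
    """Edges (input, target) -> weight; weak edges are pruned (hypothetical threshold)."""
    return {
        (src, tgt): weight
        for tgt, impacts in impacts_per_target.items()
        for src, weight in impacts.items()
        if weight >= min_impact
    }

# Toy data: y depends strongly on a, weakly on b, and not at all on c.
rng = random.Random(42)
rows = []
for _ in range(200):
    a, b, c = rng.random(), rng.random(), rng.random()
    rows.append({"a": a, "b": b, "c": c, "y": 3.0 * a + 0.5 * b})

def model_y(row):
    # Stand-in for a trained regression model of y.
    return 3.0 * row["a"] + 0.5 * row["b"]

impacts = variable_impacts(model_y, rows, "y", ["a", "b", "c"], rng)
network = build_network({"y": impacts})
print(impacts)
print(network)
```

In this toy setting the unused variable `c` receives an impact of zero, and the weak edge from `b` falls below the pruning threshold, so only the edge from `a` to `y` remains.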
Moreover, we developed a routine that uses the previously described impact computation as a base but assembles acyclic graphs, in order to support identifying the correct direction of variable interactions, as summarized in the corresponding listing. Instead of creating an edge for every computed impact, the routine adds edges stepwise, alternating with the removal of the weakest links to break up cycles.
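Since the original listing is not reproduced here, the following is only one possible reading of the routine, written as a hedged sketch. It assumes that edges are inserted from strongest to weakest; under that assumption, any edge that would close a cycle is itself the weakest link of that cycle, so skipping it is equivalent to removing the weakest link. The names `reachable` and `assemble_acyclic` are hypothetical.

```python
def reachable(adjacency, start, goal):
    """Depth-first search: is goal reachable from start along directed edges?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adjacency.get(node, ()))
    return False

def assemble_acyclic(edges):
    """Build an acyclic network from {(source, target): weight}.

    Edges are added in order of decreasing weight. An edge source -> target
    closes a cycle exactly if source is already reachable from target; in that
    case the new edge is the weakest link of the cycle and is dropped.
    """
    adjacency, kept = {}, {}
    for (src, tgt), weight in sorted(edges.items(), key=lambda kv: kv[1], reverse=True):
        if src == tgt or reachable(adjacency, tgt, src):
            continue
        adjacency.setdefault(src, set()).add(tgt)
        kept[(src, tgt)] = weight
    return kept

# Toy example with two cycles: a <-> b and a -> b -> c -> a.
edges = {("a", "b"): 0.9, ("b", "a"): 0.2, ("b", "c"): 0.7, ("c", "a"): 0.4}
acyclic = assemble_acyclic(edges)
print(acyclic)
```

In the toy example the weaker back edges `b -> a` and `c -> a` are dropped, which leaves the chain `a -> b -> c`.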
2.2 Network Evaluation
Within the second phase, as depicted in Figure 1, we consider a stream of new, unseen data, which is analyzed partition-wise (A). The described calculation of variable impacts based on the previously built models, as well as the subsequent creation of networks, is constantly repeated (B.1) while a window slides over this data stream. To identify drifts in the underlying system, we compute the similarity between the initially built network, which represents a stable state, and the updated networks as part of the sliding window evaluation (B.2). Presumably, changing system behavior affects internal variable dependencies to some extent and should therefore be reflected by the freshly created networks. We apply Spearman's rank correlation coefficient and the normalized discounted cumulative gain (NDCG), as proposed in [6], to compare the network structures. Spearman's rank correlation considers only deviations in ranks, such that top-ranked variables are treated the same as lower-ranked variables. In contrast, the NDCG puts more weight on top-ranked variables by using an exponential weighting scheme.
A system may be declared drifting if the similarity score during the evaluation drops below a threshold. If the actual drift state is known, as is the case for synthetic data sets, the correlation between the drift value and the computed similarity might give a good indication of how well the drift detection performed (C).
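To illustrate the difference between the two similarity measures, the sketch below implements textbook versions of Spearman's rank correlation and the NDCG on two impact vectors. The exact formulation used in [6] may differ; in particular, the gain `2**relevance - 1` and the logarithmic position discount are the common textbook choices and are assumed here, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if one vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spearman(reference, current):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(reference), average_ranks(current))

def ndcg(reference, current):
    """NDCG of the ordering induced by current, judged by reference impacts."""
    order = sorted(range(len(current)), key=lambda i: current[i], reverse=True)
    dcg = sum((2 ** reference[i] - 1) / math.log2(pos + 2) for pos, i in enumerate(order))
    ideal = sorted(reference, reverse=True)
    idcg = sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Impacts of four inputs in the reference network and in two updated networks.
reference = [0.6, 0.3, 0.1, 0.0]
top_swapped = [0.3, 0.6, 0.1, 0.0]     # the two most important inputs trade places
bottom_swapped = [0.6, 0.3, 0.0, 0.1]  # the two least important inputs trade places

print(spearman(reference, top_swapped), spearman(reference, bottom_swapped))
print(ndcg(reference, top_swapped), ndcg(reference, bottom_swapped))
```

Both perturbations swap two adjacent ranks, so Spearman's coefficient rates them identically, whereas the NDCG penalizes the swap at the top of the ranking more strongly than the swap at the bottom.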
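The sliding window evaluation, the threshold rule and the correlation check can be sketched as below. The similarity function is deliberately a toy stand-in (one minus the window mean) rather than a comparison of networks, and the names `sliding_windows`, `evaluate_stream`, the window size of 10 and the threshold of 0.8 are assumptions chosen only to keep the example small.

```python
import math

def sliding_windows(stream, size, step=1):
    """Yield (start index, window) pairs for a window sliding over the stream."""
    for start in range(0, len(stream) - size + 1, step):
        yield start, stream[start:start + size]

def evaluate_stream(stream, size, similarity_fn, threshold, step=1):
    """Return all similarity scores and the index of the first alarm (or None).

    The alarm index is the position of the newest data point in the first
    window whose similarity score drops below the threshold.
    """
    scores, first_alarm = [], None
    for start, window in sliding_windows(stream, size, step):
        score = similarity_fn(window)
        scores.append(score)
        if first_alarm is None and score < threshold:
            first_alarm = start + size - 1
    return scores, first_alarm

def pearson(x, y):
    """Pearson correlation; returns 0.0 if one vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Toy stream: stable for 50 points, then a linearly growing deviation.
stream = [0.0] * 50 + [0.02 * i for i in range(1, 51)]
window_size = 10

def toy_similarity(window):
    # Stand-in for "similarity between the initial and the updated network".
    return 1.0 - abs(sum(window) / len(window))

scores, first_alarm = evaluate_stream(stream, window_size, toy_similarity, threshold=0.8)

# Known system state at the newest point of each window (1.0 means no drift).
known_state = [1.0 - stream[start + window_size - 1]
               for start, _ in sliding_windows(stream, window_size)]
print(first_alarm, pearson(scores, known_state))
```

A larger window averages over more points and therefore raises the alarm later, which reflects the trade-off between smooth similarity curves and reaction speed.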
3 Test Problem: Clogging Communicating Vessels
In order to test the proposed concept drift detection algorithm, we designed a synthetic problem based on a system of communicating vessels, as illustrated in Figure 2. The vessels, each described by its current fill state, are continuously filled with fluid from two inlets. The flow rates of these inlets are independent, dynamic and defined by stationary autoregressive models with normally distributed terms. The outlet flow rate of each vessel depends on its current fill state and hence helps to preserve stationarity. The communication channel between the vessels transports fluid into the vessel with the currently lower fill state and is described by its own flow rate.
The dynamics of the system are defined by a system of differential equations (1), (2) and (3). For this particular example, we designed a channel that may gradually clog over time, eventually resulting in a malfunction of the vessel communication. The clogging is controlled by a clogging factor in equation (3). This clogging channel represents the maintenance problem that the proposed detection algorithm is intended to find.
(1)  
(2)  
(3) 
Based on the system definition, we compiled a set of 10 training instances representing stable system states (i.e. the clogging factor remains constant) and 10 evaluation instances with drifting behavior (i.e. the clogging factor slowly decreases), each given as a series of data points. The variables allowed for model training and evaluation, as both inputs and targets, are the two vessel fill states and the two inlet flow rates. In addition, the first numerical derivative (as defined by equations (1) and (2)) and the second numerical derivative of each vessel fill state are provided as additional input variables. The current flow between the vessels and the clogging factor, however, remain unknown to the regression models. This limitation is inspired by real-world problems, in which availability and quality of data are not always fully ensured, whether for technical, monetary or security-related reasons. It is the essential motivation and goal of the proposed drift detection algorithm to estimate and monitor the changing variable interactions when they cannot be observed directly.
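A data generator in the spirit of this test problem might look like the sketch below. Because equations (1) to (3) are not reproduced here, the concrete dynamics are assumptions made purely for illustration: AR(1) inlet flows around a mean of 1.0, an outlet flow proportional to the fill state, a channel flow proportional to the fill state difference scaled by the clogging factor, and explicit Euler integration. All names and coefficients are hypothetical.

```python
import random

def simulate_vessels(n_steps, clog_start=1.0, clog_end=1.0, seed=0, dt=0.1):
    """Simulate two communicating vessels and return one dict per time step.

    ASSUMED dynamics (not taken from the paper's equations):
      inlet_i(t) = 1 + 0.9 * (inlet_i(t-1) - 1) + noise      (stationary AR(1))
      channel    = clog * 0.5 * (v1 - v2)                    (flows to the lower vessel)
      dv1/dt     = inlet1 - v1 - channel
      dv2/dt     = inlet2 - v2 + channel
    The clogging factor moves linearly from clog_start to clog_end.
    """
    rng = random.Random(seed)
    v1 = v2 = 1.0
    inlet1 = inlet2 = 1.0
    rows = []
    for t in range(n_steps):
        clog = clog_start + (clog_end - clog_start) * t / max(1, n_steps - 1)
        inlet1 = 1.0 + 0.9 * (inlet1 - 1.0) + rng.gauss(0.0, 0.1)
        inlet2 = 1.0 + 0.9 * (inlet2 - 1.0) + rng.gauss(0.0, 0.1)
        channel = clog * 0.5 * (v1 - v2)
        dv1 = inlet1 - v1 - channel
        dv2 = inlet2 - v2 + channel
        v1 += dt * dv1  # explicit Euler step
        v2 += dt * dv2
        rows.append({"t": t, "inlet1": inlet1, "inlet2": inlet2,
                     "v1": v1, "v2": v2, "dv1": dv1, "dv2": dv2,
                     "channel": channel, "clog": clog})
    return rows

# A stable training instance and a drifting evaluation instance.
stable = simulate_vessels(500, clog_start=1.0, clog_end=1.0, seed=1)
drifting = simulate_vessels(500, clog_start=1.0, clog_end=0.2, seed=2)

# Only these columns would be visible to the regression models.
observed = ["inlet1", "inlet2", "v1", "v2", "dv1", "dv2"]
print({name: round(drifting[-1][name], 3) for name in observed})
```

The columns `channel` and `clog` are kept in the output only so that a detection result can later be compared with the true drift; they would be withheld from model training.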
4 Experiments and Results
The regression models, which form the foundation of the variable interaction networks, were trained with multivariate linear regression (LR), random forest (RF) and symbolic regression (SR). To tune the random forest and the symbolic regression algorithm, we performed a parameter grid search and settled on the following configurations:

Random Forest: R: 0.5, M: 0.2, 100 trees

Symbolic Regression: Offspring Selection Genetic Algorithm (OSGA) [1], population size: 100, generations: 1000, selection pressure: 100, proportional and random selection, mutation rate: 25%, crossover rate: 100%, a function set of unary and binary functions, max. tree length: 25 nodes
The modeling results, aggregated over all training instances, are summarized in Table 1. Linear and symbolic regression both achieved almost perfect fits on the training as well as the test partition. The random forest models, however, tend to overfit regardless of the tested algorithm parameters.
Table 1: Modeling results, aggregated over all training instances, with one row per modeled target. Each cell reads training / test.

Linear Regression (LR)          Random Forest (RF)              Symbolic Regression (SR)
               NMSE                            NMSE                            NMSE
0.99 / 0.99    0.00 / 0.00      0.96 / 0.76    0.06 / 0.30      0.97 / 0.97    0.02 / 0.03
0.99 / 0.99    0.00 / 0.00      0.95 / 0.70    0.06 / 0.34      0.97 / 0.97    0.02 / 0.03
0.99 / 0.99    0.00 / 0.00      0.95 / 0.67    0.06 / 0.41      0.93 / 0.92    0.06 / 0.09
0.99 / 0.99    0.00 / 0.00      0.95 / 0.71    0.06 / 0.36      0.94 / 0.92    0.05 / 0.09
Based on the regression models, the initial variable interaction networks, representing stable system behavior, were computed. Furthermore, we defined a quality threshold in terms of the NMSE that a model has to achieve to be considered for the network creation. For the random forest models, this threshold was set higher, because in the first modeling step we observed that the predictive quality of RF is lower than that of SR and LR. In addition, we set a minimum variable impact threshold to prune less important edges from the final networks. After the modeling phase (cf. Section 2.1), one cyclic and one acyclic network version was created for each algorithm and each of the 10 training sets.
The second phase (cf. Section 2.2) was performed using the same configurations for the creation of networks during the sliding window evaluation. The results of the drift detection method for each regression modeling algorithm, with varying sliding window size and aggregated over all 10 drift data sets, are depicted in Figure 3. The bar chart illustrates the computed correlation between the network similarity and the synthetic drift, as described in Figure 1.
According to the computed correlation scores, the linear and the symbolic regression models detect the synthetically introduced drifts quite well. Although the performance of the random forest models clearly lags behind, the drifts are still detected to some extent. In conclusion, the detection algorithm is agnostic to the regression models used; however, accurate models with the ability to generalize (i.e. not to overfit) are necessary.
One key factor of the detection algorithm is the sliding window size, which has to be tuned for each problem. With a large window size, more stable network structures can be identified, which remain valid for a longer period while moving over the data stream. This results in a smoother similarity score curve but decreases the reaction speed to underlying trends, and the window size should therefore be limited to a reasonable level. In this example, sizes between 100 and 200 showed similarly good results. Furthermore, the acyclic networks achieved smoother similarity score curves (cf. Figure 1) than networks with cycles. Although these networks offer no advantage in terms of the computed detection quality, it is easier to define a threshold as a minimum similarity score when the values do not vary too much within a certain period, which is a clear benefit of the acyclic networks.
5 Conclusion and Outlook
In this work, we presented a machine learning based approach for identifying changing relationships in dynamical systems, such as industrial production plants. We showed how variable interaction networks are developed and utilized to evaluate a continuous data stream and identify deviations from the original behavior, which eventually might enable triggering maintenance actions proactively. We implemented the algorithm using the open source framework HeuristicLab and successfully tested the approach on a synthetic problem.
As a promising next step to enhance the described approach, we plan to investigate how a closer integration of the modeling and evaluation phases might lead to a more accurate calculation of variable impacts and hence more robust networks. A repeated or open-ended training of regression models, which represent the foundation of the variable interaction networks, on the continuously updated data stream might provide valuable information concerning the current impact of variables. Proceeding from drift detection, a closer investigation of the dependency changes might eventually enable tracking a system change back to its beginnings. Especially considering the potential value for domain experts, such a root-cause analysis would be a powerful component for future production systems.
Acknowledgments
The work described in this paper was done within the project “Smart Factory Lab”, which is funded by the European Fund for Regional Development (EFRE) and the country of Upper Austria as part of the program “Investing in Growth and Jobs 2014-2020”.
Gabriel Kronberger gratefully acknowledges the financial support by the Austrian Federal Ministry for Digital and Economic Affairs and the National Foundation for Research, Technology and Development within the Josef Ressel Centre for Symbolic Regression.
References

[1] (2009) Genetic algorithms and genetic programming: modern concepts and practical applications. CRC.
[2] (2001) Random forests. Machine learning 45 (1), pp. 5–32.
[3] (2014) A survey on concept drift adaptation. ACM computing surveys (CSUR) 46 (4), pp. 44.
[4] (2011) Data mining using unguided symbolic regression on a blast furnace dataset. In European Conference on the Applications of Evolutionary Computation, pp. 274–283.
[5] (2017) Ensemble learning for data stream analysis: a survey. Information Fusion 37, pp. 132–156.
[6] (2017) Measures for the evaluation and comparison of graphical model structures. In International Conference on Computer Aided Systems Theory, pp. 283–290.
[7] (2014) Service innovation and smart analytics for industry 4.0 and big data environment. Procedia Cirp 16, pp. 3–8.
[8] (2007) Variable interaction network based variable selection for multivariate calibration. Analytica chimica acta 599 (1), pp. 24–35.
[9] (2008) Damage propagation modeling for aircraft engine run-to-failure simulation. In International Conference on Prognostics and Health Management 2008, pp. 1–9.