Advanced Symbolic Time Series Analysis in Cyber Physical Systems

02/02/2018 ∙ by Roland Ritt, et al. ∙ Montan Universität Leoben

This paper presents advanced symbolic time series analysis (ASTSA) for large data sets emanating from cyber physical systems (CPS). The definition of CPS most pertinent to this paper is: a CPS is a system with a coupling of the cyber aspects of computing and communications with the physical aspects of dynamics and engineering that must abide by the laws of physics. This includes sensor networks, real-time and hybrid systems. To ensure that the computation results conform to the laws of physics, a linear differential operator (LDO) is embedded in the processing channel for each sensor. In this manner, the dynamics of the system can be incorporated prior to performing symbolic analysis. A non-linear quantization is used for the intervals corresponding to the symbols. The intervals are based on observed modes of the system, which can be determined either during an exploratory phase or online during operation of the system. A complete processing channel is called a single channel lexical analyser; one is made available for each sensor on the machine being observed. The implementation of the LDO in the system is particularly important, since it enables the establishment of a causal link between the observations of the dynamic system and their cause. Without causality there can be no semantics, and without semantics no knowledge acquisition based on the physical background of the system being observed. Correlation alone is not a guarantee of causality. This work was originally motivated by the observation of large bulk material handling systems. Typically, there are n = 150 to 250 sensors per machine, and data is collected in a multi-rate manner: general sensors are sampled at f_s = 1 Hz, while vibration data are sampled in the kilohertz range.




1 Local Linear Differential Operators (LDO)

Although processing the entire 'large' time series is common practice in exploratory data analysis, reliable local computations (implemented as streaming algorithms) are preferred in online data processing. Since this work deals with time series emanating from cyber physical systems, new techniques for local computations that include the physics of the system (described by differential equations) have to be developed.

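An ordinary differential equation (ODE) of the form

$$ a_n(x)\,\frac{d^n y}{dx^n} + a_{n-1}(x)\,\frac{d^{n-1} y}{dx^{n-1}} + \dots + a_1(x)\,\frac{dy}{dx} + a_0(x)\,y = g(x) \qquad (1) $$

can be described using a linear differential operator (LDO) [2], $L = \sum_{i=0}^{n} a_i(x)\,D^i$, such that $L\,y = g(x)$, where $a_i(x)$ is a function of $x$, $D^i$ is the $i$-th derivative with respect to $x$ and $g(x)$ is the exciting function, in our case the noisy sensor data. This yields the notation [3]

$$ a_n(x)\,D^n y + \dots + a_1(x)\,D y + a_0(x)\,y = g(x). \qquad (2) $$

Factoring out $y$ leads to the compact formulation of the model

$$ \left\{ a_n(x)\,D^n + \dots + a_1(x)\,D + a_0(x) \right\} y = g(x), \qquad (3) $$

$$ L\,y = g(x). \qquad (4) $$
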
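As a worked instance of this notation, consider (purely as an illustrative assumption, not an example taken from the measurements) a sensor with a first-order lag and time constant $\tau$. With $n = 1$, $a_1(x) = \tau$ and $a_0(x) = 1$, and $D$ denoting $d/dx$, the ODE and its operator form read

$$ \tau\,\frac{dy}{dx} + y = g(x) \quad\Longleftrightarrow\quad \left\{ \tau\,D + 1 \right\} y = g(x), \qquad L = \tau\,D + 1. $$

The homogeneous equation $L\,y = 0$ has the solutions $y = c\,e^{-x/\tau}$, so the null space of $L$ is one-dimensional and its coefficient $c$ is exactly what the initial value $y(0)$ determines.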
In the discrete case, (3) can be formulated as a matrix equation, $\mathbf{L}\,\mathbf{y} = \mathbf{g}$. Solving this equation for $\mathbf{y}$ is an inverse problem, which can be solved numerically in a discrete sense by

$$ \mathbf{y} = \mathbf{L}^{+}\,\mathbf{g} + \mathbf{N}\,\mathbf{c}, \qquad (5) $$

where $\mathbf{y}$ is the solution to the inverse problem, $\mathbf{L}^{+}$ is the pseudo-inverse of $\mathbf{L}$, $\mathbf{N}$ is an orthonormal basis function set of the null space of $\mathbf{L}$, $\mathbf{c}$ is a coefficient vector for the null space (computed from the initial and/or boundary values) and $\mathbf{g}$ is the noisy time series data vector. Algebraic implementations for the solution of such problems can be found in [4, 5, 6, 7].
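The discrete solution $\mathbf{y} = \mathbf{L}^{+}\mathbf{g} + \mathbf{N}\mathbf{c}$ can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the discrete orthogonal polynomial framework of [4, 5, 6, 7]: it assumes a uniform grid, a first-order LDO $y' + a\,y$ discretized with a two-point (trapezoidal) stencil, and a single initial value. The helpers `first_order_ldo` and `solve_ldo` are hypothetical names. The rectangular $(m-1) \times m$ operator has full row rank, so its null space is one-dimensional and corresponds to the discrete homogeneous solution.

```python
import numpy as np


def first_order_ldo(x, a):
    """Hypothetical discrete LDO for y' + a*y on a uniform grid x.

    Each row applies a two-point (trapezoidal) stencil between neighbouring
    samples, so the matrix has shape (m-1, m) and a one-dimensional null space:
    the discrete homogeneous solution of y' + a*y = 0.
    """
    m = len(x)
    h = x[1] - x[0]
    L = np.zeros((m - 1, m))
    for k in range(m - 1):
        L[k, k] = -1.0 / h + a / 2.0
        L[k, k + 1] = 1.0 / h + a / 2.0
    return L


def solve_ldo(L, g, idx, values):
    """Return y = pinv(L) @ g + N @ c, with c fitted so that y[idx] matches values."""
    y_p = np.linalg.pinv(L) @ g                # particular (minimum-norm) solution
    _, s, vt = np.linalg.svd(L)               # full SVD: vt has shape (m, m)
    rank = int(np.sum(s > s.max() * 1e-10))
    N = vt[rank:].T                           # orthonormal basis of the null space
    c, *_ = np.linalg.lstsq(N[idx], np.asarray(values) - y_p[idx], rcond=None)
    return y_p + N @ c


# Example: y' + 2y = g with y(0) = 0, where g is chosen so that y = sin(x) exactly.
a = 2.0
x = np.linspace(0.0, 2.0 * np.pi, 201)
mid = 0.5 * (x[:-1] + x[1:])                  # g is sampled at the stencil midpoints
g = np.cos(mid) + a * np.sin(mid)
y = solve_ldo(first_order_ldo(x, a), g, [0], [0.0])
print(np.max(np.abs(y - np.sin(x))))          # small discretization error
```

For a higher-order operator with a larger null space, further initial and/or boundary values would be passed through `idx` and `values`; `lstsq` then fits all null-space coefficients at once. In practice `g` would hold the noisy sensor samples.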

LDOs and their inverses can be implemented as local operators and computed efficiently using a convolutional approach. This is essentially a streaming algorithm and is therefore suitable for big-data processing.
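The point that a local operator is a streaming algorithm can be illustrated by applying a fixed finite kernel sample by sample, holding only a short window in memory. A minimal sketch, assuming the local operator has already been reduced to a finite convolution kernel; the three-point central-difference kernel and the name `stream_convolve` are illustrative assumptions, not the authors' operators.

```python
from collections import deque


def stream_convolve(samples, kernel):
    """Apply a fixed FIR kernel to a (possibly unbounded) stream of samples.

    Only len(kernel) samples are buffered. One output is yielded per input
    sample once the buffer is full, matching a 'valid' batch convolution
    y[n] = sum_j kernel[j] * x[n - j].
    """
    window = deque(maxlen=len(kernel))
    for s in samples:
        window.append(s)
        if len(window) == len(kernel):
            # window[-1] is the newest sample x[n]; window[-1 - j] is x[n - j]
            yield sum(k * window[-1 - j] for j, k in enumerate(kernel))


# Three-point central difference (sample spacing 1) as a stand-in local operator:
# the derivative of n**2 evaluated at n = 1..4.
print(list(stream_convolve((n**2 for n in range(6)), [0.5, 0.0, -0.5])))
# [2.0, 4.0, 6.0, 8.0]
```

Because the input is consumed through an iterator, the same loop works on a generator fed directly by a sensor.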

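Furthermore, the covariance of the solution (5) is simply propagated as

$$ \Lambda_{\mathbf{y}} = \mathbf{L}^{+}\,\Lambda_{\mathbf{g}}\,\left(\mathbf{L}^{+}\right)^{\mathrm{T}}, \qquad (6) $$

where $\Lambda_{\mathbf{g}}$ is the covariance of the data vector $\mathbf{g}$. Using $\Lambda_{\mathbf{y}}$ as an estimate of the covariance, in conjunction with Student's t-distribution and/or the F-distribution, permits the estimation of a confidence interval over the complete solution and allows the computation of a prediction interval for future values.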

That is, the approach presented here for implementing linear differential operators not only permits the embedded system dynamics to be solved but also yields a confidence interval for the predicted values of the dynamics.
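Since the solution is linear in the data, the covariance propagation and the resulting pointwise confidence band can be sketched in a few lines of NumPy. Assumptions: the data covariance is known (here independent samples with a made-up standard deviation of 0.1), and the quantile `t_quantile` is supplied by the caller, for instance from `scipy.stats.t.ppf(0.975, dof)` for a two-sided 95 % interval; the value 2.0 below is only a placeholder. A running-sum matrix stands in for the pseudo-inverse $\mathbf{L}^{+}$, and the helper names are hypothetical.

```python
import numpy as np


def propagate_covariance(A, cov_g):
    """Covariance of y = A @ g + const for data g with covariance cov_g."""
    return A @ cov_g @ A.T


def confidence_band(y, cov_y, t_quantile):
    """Pointwise interval y -/+ t_quantile * sqrt(diag(cov_y)).

    t_quantile must be supplied by the caller, e.g. the Student's t quantile
    for the desired confidence level and degrees of freedom.
    """
    half_width = t_quantile * np.sqrt(np.diag(cov_y))
    return y - half_width, y + half_width


# Example: y is the running sum of g (a discrete integrator), and the samples
# of g are independent with standard deviation 0.1.
A = np.tril(np.ones((4, 4)))
cov_y = propagate_covariance(A, 0.01 * np.eye(4))
y = A @ np.array([1.0, 0.5, -0.25, 0.0])
lower, upper = confidence_band(y, cov_y, t_quantile=2.0)
print(np.diag(cov_y))  # variance grows linearly along the integrated signal
```

For the LDO solution one would pass `A = np.linalg.pinv(L)`; the widening band in this example shows how uncertainty accumulates when an inverse operator integrates noisy data.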

2 Symbolic Time Series Analysis

The availability of the sensor signals, their regularized derivatives and/or the application of an LDO permits the implementation of an advanced symbolic time series analysis (ASTSA) that includes the modelling of the system dynamics. As a result, the time series (TS) can be discretized and compressed using unique symbols for different intervals (the so-called alphabet). This step is called lexical analysis. A number of methods for selecting the symbol intervals, based on, e.g., equal probability, variance or entropy, can be found in the literature [8, 9, 10, 11, 12]. Here, in a new approach, we define the intervals to correspond to the modes of the dynamic system in operation, i.e. each symbol corresponds to a mode, or a portion of a mode, that should be identified. Commonly, controllers are designed to operate optimally in a number of specific but distinct modes of the dynamic system.
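Lexical analysis with non-uniform, mode-based intervals amounts to locating each sample between sorted interval edges, which a binary search does directly. A minimal sketch; the edge values below are placeholders, whereas in the approach described here they would be placed at the boundaries of the observed operating modes. The function name `lexical_analysis` is hypothetical.

```python
from bisect import bisect_right


def lexical_analysis(values, edges, alphabet):
    """Map every sample to the symbol of the interval that contains it.

    `edges` must be sorted and len(alphabet) == len(edges) + 1. Symbol i covers
    [edges[i-1], edges[i]); the first and last intervals are open-ended. The
    edges need not be equally spaced, so they can be placed at mode boundaries.
    """
    if len(alphabet) != len(edges) + 1:
        raise ValueError("need exactly one more symbol than interval edges")
    return "".join(alphabet[bisect_right(edges, v)] for v in values)


# Hypothetical edges: derivative values in [-0.1, 0.1) count as stationary.
print(lexical_analysis([0.5, 0.05, -0.3, 0.1, -0.1], [-0.1, 0.1], "dsu"))  # usdus
```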

In the next step, consecutive runs of the same symbol can be compressed into a single symbol annotated with its run length. The combination of an LDO, lexical analysis of the derived signal and compression is called a single channel lexical analyser (SCLA); see Fig. 2.

Figure 2: A single channel lexical analyser (SCLA)
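The three SCLA stages can be chained in a few lines. A minimal sketch, assuming a first-difference derivative as a stand-in for the LDO, interval edges at -0.5 and 0.5, and the alphabet [d, s, u]; all of these, and the name `scla`, are hypothetical choices for illustration.

```python
from bisect import bisect_right
from itertools import groupby


def scla(samples, dt, edges, alphabet):
    """Hypothetical single channel lexical analyser (SCLA) pipeline.

    1. Stand-in for the LDO: a first-difference derivative estimate.
    2. Lexical analysis: map each derivative value to the symbol of its interval.
    3. Compression: collapse runs of equal symbols into (symbol, length) pairs.
    """
    derivative = [(curr - prev) / dt for prev, curr in zip(samples, samples[1:])]
    symbols = [alphabet[bisect_right(edges, v)] for v in derivative]
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


print(scla([0, 1, 2, 2, 2, 1, 0, 0], dt=1.0, edges=[-0.5, 0.5], alphabet="dsu"))
# [('u', 2), ('s', 2), ('d', 2), ('s', 1)]
```

Seven derivative samples are reduced to four (symbol, length) pairs; the longer the stationary or monotone stretches, the stronger the compression.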

Combining the outputs of multiple SCLAs is called a multi channel lexical analyser (MCLA). Two examples of symbolic time series analysis using an MCLA are shown in Fig. 3 and Fig. 4. For signal 1 and signal 2, the alphabet consists of the three symbols [u, s, d], assigned to the direction of the signal (up, stationary, down). The figures show two operation modes of the same machine. It can be clearly seen that the operation modes of the machine have different symbolic representations (visualized as differently shaded colours in the plots), which allows fast, intuitive inspection and characterization of the signal.

Figure 3: Operation mode 1; the coloured areas illustrate the output of the MCLA; different colours represent different combinations of symbols from the SCLA of each channel (in this case two channels); the alphabet used for signals 1 and 2 consists of the three symbols [u, s, d]. Top: machine working in operation mode 1 with longer interruptions in between (light blue area - both signals are stationary); Bottom: snippet of the signal showing the typical repeating pattern of operation mode 1.
Figure 4: Operation mode 2; the coloured areas illustrate the output of the MCLA; different colours represent different combinations of symbols from the SCLA of each channel (in this case two channels); the alphabet used for signals 1 and 2 consists of the three symbols [u, s, d]. Top: machine working in operation mode 2 with interruptions in between (light blue area - both signals are stationary); Bottom: snippet of the signal showing the typical repeating pattern of operation mode 2.

The signal range from the first dashed blue line to the dashed red line (marked in both plots) has the same symbolic representation in both modes, whereas the portion of the signal after the dashed red line shows a different colour code for each mode.
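Combining the run-length-encoded outputs of several SCLAs can be sketched as a merge of runs: a new joint run starts whenever any channel changes its symbol. A minimal sketch, assuming all channels share a common time base (e.g. after resampling of multi-rate data) and single-letter symbols, so that joint symbols read like the labels ss, dd and ud used in this paper; the function name `mcla` is hypothetical.

```python
def mcla(*channels):
    """Merge run-length-encoded SCLA outputs into joint runs.

    Each channel is a list of (symbol, length) pairs covering the same number
    of samples. Each output run holds the concatenated symbols that are active
    in all channels at the same time, together with its length.
    """
    runs = [list(ch) for ch in channels]
    totals = {sum(n for _, n in ch) for ch in runs}
    if len(totals) != 1:
        raise ValueError("need at least one channel, all covering the same samples")
    if totals == {0}:
        return []
    pos = [0] * len(runs)                 # index of the active run per channel
    left = [ch[0][1] for ch in runs]      # samples remaining in that run
    joint = []
    while pos[0] < len(runs[0]):
        step = min(left)                  # samples until the next symbol change
        combo = "".join(ch[p][0] for ch, p in zip(runs, pos))
        if joint and joint[-1][0] == combo:
            # only happens if an input repeats a symbol in consecutive runs
            joint[-1] = (combo, joint[-1][1] + step)
        else:
            joint.append((combo, step))
        for i, ch in enumerate(runs):
            left[i] -= step
            if left[i] == 0:
                pos[i] += 1
                if pos[i] < len(ch):
                    left[i] = ch[pos[i]][1]
    return joint


print(mcla([("u", 3), ("s", 2)], [("u", 1), ("d", 2), ("s", 2)]))
# [('uu', 1), ('ud', 2), ('ss', 2)]
```

Merging the compressed runs directly avoids expanding each channel back to one symbol per sample.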

The generated symbolic representation is used for further analyses. Building histograms of the occurring symbol combinations offers insight into the overall behaviour of the system; see Fig. 5. This allows inter-machine comparison, comparison of different signal portions or ranges, and classification of the operation mode. The top of Fig. 5 presents the histograms of the entire signal ranges shown in Fig. 3 (top) and Fig. 4 (top); the bottom shows the histograms of the typical repeating snippets shown in Fig. 3 (bottom) and Fig. 4 (bottom). Since the machine is interrupted several times in both operation modes, the bins for the stationary state (ss) are more prominent for the entire signal sequences (top). Excluding these bins, the statistics (histograms) of the snippets can act as representatives (motifs) of the operation modes. It can be seen that the histograms differ depending on whether the machine is operating in mode 1 (left) or mode 2 (right); in particular, the occurrences of dd and ud reveal the differences. In future investigations, we plan to define a similarity measure for such histograms in order to compare them qualitatively, and possibly to use it for automatic operation recognition and motif discovery. Note: sorting a histogram in decreasing order of occurrences yields a classical frequency dictionary.

(a) Operation mode 1
(b) Operation mode 2
(c) Operation mode 1 - snippet
(d) Operation mode 2 - snippet
Figure 5: Histograms of occurring symbol combinations of a machine in two different operation modes. Top: Histograms for the entire time range shown in Fig. 3 (top) and Fig. 4 (top); Bottom: Histograms for the signal snippets presented in Fig. 3 (bottom) and Fig. 4 (bottom).
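The histogram and frequency-dictionary steps map directly onto Python's `collections.Counter`. A minimal sketch; whether a run counts once or contributes its length in samples is an assumption exposed as a flag, the run list is made-up data, and the function name `symbol_histogram` is hypothetical.

```python
from collections import Counter


def symbol_histogram(joint_runs, weight_by_length=False, exclude=()):
    """Count joint symbols in run-length-encoded MCLA output.

    By default each run counts once (number of occurrences); with
    weight_by_length=True each run contributes its length in samples.
    Symbols listed in `exclude` (e.g. 'ss' for the stationary state) are skipped.
    """
    hist = Counter()
    for sym, n in joint_runs:
        if sym not in exclude:
            hist[sym] += n if weight_by_length else 1
    return hist


runs = [("uu", 1), ("ud", 2), ("ss", 5), ("dd", 1), ("ud", 1), ("ss", 3)]
hist = symbol_histogram(runs, exclude={"ss"})
# Sorting by decreasing count gives a frequency dictionary.
print(hist.most_common())  # [('ud', 2), ('uu', 1), ('dd', 1)]
```

`most_common()` keeps first-seen order among equal counts, so ties are reported in the order the symbols first occur.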

A major advantage of the presented symbolic time series analysis is that the sequence of symbols, either single or multi channel, can now be addressed with techniques more common in computational linguistics (e.g. regular expressions) [13], which is a growing field of research.
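As an illustration of treating the compressed symbol stream as text, run-length pairs can be serialized into a string and searched with Python's `re` module. The serialization format (symbol followed by run length, e.g. `u3s2d4`), the rise-hold-fall pattern and the function names are hypothetical choices for this sketch; they assume single-letter symbols from the [u, s, d] alphabet.

```python
import re


def encode(runs):
    """Serialise (symbol, length) pairs as e.g. 'u3s2d4' so regexes can match them."""
    return "".join(f"{sym}{n}" for sym, n in runs)


# A rise, an optional stationary hold, then a fall; the groups capture run lengths.
CYCLE = re.compile(r"u(\d+)(?:s(\d+))?d(\d+)")


def find_cycles(runs):
    """Return (rise, hold, fall) lengths of every up-[stationary]-down pattern."""
    return [
        (int(m.group(1)), int(m.group(2) or 0), int(m.group(3)))
        for m in CYCLE.finditer(encode(runs))
    ]


runs = [("s", 5), ("u", 3), ("s", 2), ("d", 4), ("s", 1), ("u", 2), ("d", 2)]
print(find_cycles(runs))  # [(3, 2, 4), (2, 0, 2)]
```

Keeping the run lengths in the string means a single pattern both finds the motif and recovers its durations.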

3 Conclusion

Successful data analytics in large physical systems must embed the modelling of the dynamics of both the individual components and the complete system. This has been addressed by providing a linear differential operator, or its inverse, in each and every signal or derived-data channel. A multivariate symbolic time series analysis has been introduced; it permits a symbolic view of the system and its dynamics. The concept of frequency dictionaries has been applied to automatic operation recognition; this works for operation types that are characterised by a specific distribution of symbols. A major advantage of the proposed method is its intrinsic multi-scale property, which enables the identification of very short events in very large data sets. We are currently researching the relationships between sequences of symbols and the metaphor of language. Initial results indicate that this opens the door to exploiting new methods emerging in computational linguistics.


  • [1] Baheti, R., Gill, H.: Cyber-physical systems. The Impact of Control Technology (2011) 161–166
  • [2] Lanczos, C.: Linear differential operators. SIAM (1961)
  • [3] O’Leary, P., Harker, M., Gugg, C.: An inverse problem approach to approximating sensor data in cyber physical systems. In: 2015 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings. Volume 2015-July., IEEE (May 2015) 1717–1722
  • [4] Gugg, C., Harker, M., O’Leary, P., Rath, G.: An Algebraic Framework for the Real-Time Solution of Inverse Problems on Embedded Systems. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems. Volume V., IEEE (Aug 2015) 1097–1102
  • [5] Harker, M., O’Leary, P.: Discrete Orthogonal Polynomial Toolbox - Matlab File Exchange
  • [6] Gugg, C.: An Algebraic Framework for the Solution of Inverse Problems in Cyber-Physical Systems. PhD thesis, Montanuniversitaet Leoben (2015)
  • [7] O’Leary, P., Harker, M.: An algebraic framework for discrete basis functions in computer vision. In: Proceedings - 6th Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP 2008, IEEE (Dec 2008) 150–157
  • [8] Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery 15(2) (Aug 2007) 107–144
  • [9] Veenman, C., Reinders, M., Backer, E.: A maximum variance cluster algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(9) (Sep 2002) 1273–1280
  • [10] Chau, T., Wong, A.: Pattern discovery by residual analysis and recursive partitioning. IEEE Transactions on Knowledge and Data Engineering 11(6) (1999) 833–852
  • [11] Keogh, E., Lonardi, S., Chiu, B.Y.c.: Finding surprising patterns in a time series database in linear time and space. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’02, New York, New York, USA, ACM Press (2002) 550
  • [12] Daw, C.S., Finney, C.E.A., Tracy, E.R.: A review of symbolic analysis of experimental data. Review of Scientific Instruments 74(2) (Feb 2003) 915–930
  • [13] Clark, A., Fox, C., Lappin, S.: The Handbook of Computational Linguistics and Natural Language Processing. Volume XXXIII. Wiley-Blackwell (2010)