1 Local Linear Differential Operators (LDO)
Although processing the entire 'large' time series is common practice in exploratory data analysis, reliable local computations (implemented as streaming algorithms) are preferred in on-line data processing. Since this work deals with time series emanating from cyber physical systems, new techniques for local computations that include the physics of the system (described by differential equations) have to be developed.
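An ordinary differential equation (ODE) of the form
$$a_n(x)\,y^{(n)} + a_{n-1}(x)\,y^{(n-1)} + \dots + a_1(x)\,y^{(1)} + a_0(x)\,y = g(x) \tag{1}$$
can be described using a linear differential operator (LDO) $D$ [2] such that $D^{(i)} y = y^{(i)}$, where $y$ is a function of $x$, $y^{(i)}$ is the $i$th derivative with respect to $x$ and $g(x)$ is the exciting function, in our case the noisy sensor data. This yields the notation [3]
$$a_n(x)\,D^{(n)} y + a_{n-1}(x)\,D^{(n-1)} y + \dots + a_1(x)\,D^{(1)} y + a_0(x)\,y = g(x). \tag{2}$$
Factoring out $y$ leads to the compact formulation of the model
$$L\,y = g(x) \tag{3}$$
with
$$L \triangleq a_n(x)\,D^{(n)} + a_{n-1}(x)\,D^{(n-1)} + \dots + a_1(x)\,D^{(1)} + a_0(x). \tag{4}$$
In the discrete case, (3) can be formulated as the matrix equation $\mathsf{L}\,\boldsymbol{y} = \boldsymbol{g}$. Solving this equation for $\boldsymbol{y}$ is an inverse problem which can be solved numerically in a discrete sense by
$$\boldsymbol{y} = \mathsf{L}^{+}\boldsymbol{g} + \mathsf{N}_{\mathsf{L}}\,\boldsymbol{\alpha} \tag{5}$$
where $\boldsymbol{y}$ is the solution to the inverse problem, $\mathsf{L}^{+}$ is the pseudoinverse of $\mathsf{L}$, $\mathsf{N}_{\mathsf{L}}$ is an orthonormal basis function set of the null space of $\mathsf{L}$, $\boldsymbol{\alpha}$ is a coefficient vector for the null space (computed from the initial and/or boundary values) and $\boldsymbol{g}$ is the noisy time series data vector. Algebraic implementations for the solution of such problems can be found in [4, 5, 6, 7]. The LDOs, and their inverses, can be implemented as local operators and efficiently computed using a convolutional approach. This is basically a streaming algorithm and thus suitable for big data processing.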
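The discrete solution described above (a pseudoinverse term plus a null-space term fixed by initial or boundary values) can be sketched as follows. This is a minimal illustration and not the implementation from [4, 5, 6, 7]: the dense finite-difference matrix, the function names `derivative_matrix` and `solve_inverse_problem`, and the tolerance `rel_tol` are assumptions made for the example. It treats the simplest case, where the operator is the first derivative and the null space consists of constant vectors.

```python
import numpy as np

def derivative_matrix(n, h):
    """Dense n-by-n first-derivative matrix (assumed discretisation):
    central differences inside, one-sided differences in the end rows."""
    D = np.zeros((n, n))
    for i in range(1, n - 1):
        D[i, i - 1], D[i, i + 1] = -0.5 / h, 0.5 / h
    D[0, 0], D[0, 1] = -1.0 / h, 1.0 / h
    D[-1, -2], D[-1, -1] = -1.0 / h, 1.0 / h
    return D

def solve_inverse_problem(L, g, constrained_idx, constrained_values, rel_tol=1e-10):
    """Return y = pinv(L) @ g + N @ alpha, with alpha chosen so that
    y[constrained_idx] matches constrained_values (least squares)."""
    U, s, Vt = np.linalg.svd(L)
    rank = int(np.sum(s > rel_tol * s[0]))
    # Pseudoinverse built from the singular vectors with non-zero singular value.
    L_pinv = (Vt[:rank].T / s[:rank]) @ U[:, :rank].T
    # Remaining right singular vectors: orthonormal basis of the null space.
    N = Vt[rank:].T
    y_particular = L_pinv @ g
    if N.shape[1] == 0:          # no null space, nothing to constrain
        return y_particular
    residual = np.asarray(constrained_values, dtype=float) - y_particular[constrained_idx]
    alpha = np.linalg.lstsq(N[constrained_idx, :], residual, rcond=None)[0]
    return y_particular + N @ alpha

# Usage: recover a curve from its (here noise-free) discrete derivative
# and one initial value.
n = 50
x = np.linspace(0.0, 1.0, n)
D = derivative_matrix(n, x[1] - x[0])
y_true = x**2 + 1.0
g = D @ y_true
y = solve_inverse_problem(D, g, [0], [y_true[0]])
print(np.max(np.abs(y - y_true)))  # small, limited by floating-point round-off
```

With noisy data the same call returns a least-squares estimate instead of an exact reconstruction. A streaming variant would apply local (windowed) versions of these operators rather than one dense matrix.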
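Furthermore, the covariance of the solution (5) is simply propagated as
$$\mathsf{\Lambda}_{y} = \mathsf{L}^{+}\,\mathsf{\Lambda}_{g}\,\left(\mathsf{L}^{+}\right)^{\mathrm{T}} \tag{6}$$
where $\mathsf{\Lambda}_{g}$ denotes the covariance of the noisy data vector and $\mathsf{L}^{+}$ the pseudoinverse appearing in (5). Using $\mathsf{\Lambda}_{y}$ as an estimate for the covariance in conjunction with the Student t- and/or F-distribution permits the estimation of a confidence interval over the complete solution and allows the computation of a prediction interval for future values.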
That is, the approach presented here to implementing linear differential operators not only permits the solution of embedded system dynamics but also yields a confidence interval for the predicted values of the dynamics.
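The covariance propagation and the resulting confidence band can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the data noise is taken as independent with equal variance, and the Student-t quantile is passed in as a number (for example from a table) rather than computed. The value 2.0 below is only a placeholder.

```python
import numpy as np

def propagate_covariance(L_pinv, cov_g):
    """Covariance of y = L_pinv @ g for data covariance cov_g."""
    return L_pinv @ cov_g @ L_pinv.T

def confidence_band(y, cov_y, quantile):
    """Point-wise band y +/- quantile * standard deviation.
    `quantile` is assumed to be a Student-t quantile supplied by the caller."""
    half_width = quantile * np.sqrt(np.diag(cov_y))
    return y - half_width, y + half_width

# Usage with a small, hypothetical inverse operator (a running sum)
# and independent noise of standard deviation 0.2 on the data.
L_pinv = np.array([[1.0, 0.0],
                   [1.0, 1.0]])
cov_g = 0.2**2 * np.eye(2)
cov_y = propagate_covariance(L_pinv, cov_g)
lower, upper = confidence_band(np.array([1.0, 3.0]), cov_y, quantile=2.0)
print(cov_y)
print(lower, upper)
```

The second sample has the larger variance because it accumulates the noise of both data points, which is the kind of effect the propagated covariance makes visible.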
2 Symbolic Time Series Analysis
The availability of the sensor signals, their regularized derivatives and/or the application of an LDO permits the implementation of an advanced symbolic time series analysis (ASTSA) which includes the modelling of the system dynamics. As a result, the time series (TS) can be discretized and compressed using unique symbols for different intervals (the so-called alphabet). This step is named lexical analysis. A number of methods for selecting the symbol intervals based on, e.g., equal probability, variance or entropy can be found in the literature [8, 9, 10, 11, 12]. Here, in a new approach, we define the intervals to correspond to the modes of the dynamic system in operation, i.e. each symbol corresponds to a mode, or a portion of a mode, which should be identified. Commonly, controllers are designed to operate optimally in a number of specific but distinct modes of the dynamic system. In a next step, connected sequences with the same symbol can be compressed to a single symbol paired with its length. The combination of applying an LDO, lexical analysis of the derived signal and compression is called single channel lexical analyser (SCLA), see Fig. 2.
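The three stages of a single channel lexical analyser can be sketched as a small streaming pipeline. This is an illustration under assumptions, not the authors' implementation: the local operator is a windowed least-squares slope (a simple stand-in for a local LDO), the alphabet is the direction alphabet u/s/d, and the names `local_slope`, `to_symbols`, `run_length`, `scla` and the `threshold` parameter are hypothetical.

```python
from collections import deque
from itertools import groupby

def local_slope(samples, half_width=2, dt=1.0):
    """Stream a least-squares slope estimate for every full window
    of 2 * half_width + 1 samples (a local convolution)."""
    m = half_width
    weights = list(range(-m, m + 1))
    norm = dt * sum(k * k for k in weights)
    window = deque(maxlen=2 * m + 1)
    for value in samples:
        window.append(value)
        if len(window) == window.maxlen:
            yield sum(w * v for w, v in zip(weights, window)) / norm

def to_symbols(slopes, threshold):
    """Lexical analysis: map each slope to up (u), stationary (s) or down (d)."""
    for slope in slopes:
        if slope > threshold:
            yield "u"
        elif slope < -threshold:
            yield "d"
        else:
            yield "s"

def run_length(symbols):
    """Compression: replace each run of equal symbols by (symbol, length)."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(symbols)]

def scla(samples, half_width=2, dt=1.0, threshold=0.1):
    """Local operator, lexical analysis and compression chained together."""
    return run_length(to_symbols(local_slope(samples, half_width, dt), threshold))

# Usage: ramp up, hold, ramp down.
signal = list(range(0, 10)) + [10] * 10 + list(range(10, 0, -1))
print(scla(signal))
```

Because every stage is a generator, samples can be processed as they arrive. Only the final run-length list is materialised here for convenience.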
Combining the output of multiple SCLAs is called multi channel lexical analyser (MCLA). Two examples of symbolic time series analysis using MCLA are demonstrated in Fig. 3 and Fig. 4. For signal 1 and signal 2 the alphabet consists of the three symbols [u, s, d] assigned to the direction of the signal (up, stationary, down). The figures show two operation modes of the same machine. It can be clearly seen that the operation modes of the machine have different symbolic representations (visualized as differently shaded colours in the plots), which allows a fast, intuitive inspection and characterization of the signal.
The signal range from the first dashed blue line to the dashed red line (marked in both plots) has the same symbolic representation in both modes, whereas the portion of the signal after the dashed red line shows a different colour code for each mode.
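One way to combine several channels is to join the per-sample symbols of all channels into a word and then compress runs of equal words. The sketch below assumes exactly that; how the channels are actually merged in the MCLA is not specified in the text, and the function name `mcla` is hypothetical.

```python
from itertools import groupby

def mcla(*channels):
    """Join per-sample symbols of equally long channels into words
    (e.g. 'u' and 'd' give 'ud') and run-length compress the word sequence."""
    if len({len(channel) for channel in channels}) != 1:
        raise ValueError("all channels must have the same number of samples")
    words = ("".join(symbols) for symbols in zip(*channels))
    return [(word, sum(1 for _ in run)) for word, run in groupby(words)]

# Usage with two hypothetical symbol streams over the alphabet [u, s, d].
signal_1 = "uuussddd"
signal_2 = "uussssdd"
print(mcla(signal_1, signal_2))
```

Each distinct word can then be mapped to one shade when plotting, which corresponds to the coloured bands described for the figures.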
The generated symbolic representation is used for further analyses. Building histograms of the occurring symbol combinations offers insight into the overall behaviour of the system, see Fig. 5. This allows inter-machine comparison and comparison of different signal portions/ranges, as well as classification of the operation mode. The top of Fig. 5 presents the histograms of the entire signal ranges shown in Fig. 3 (top) and Fig. 4 (top). The histograms for the typical repeating snippets, shown in Fig. 3 (bottom) and Fig. 4 (bottom), are visualized at the bottom. Since the machine is interrupted several times in both operating modes, the bins for the stationary state (ss) are more prominent for the entire signal sequences (top). Excluding these bins, the statistics (histograms) of the shown snippets can act as representatives (motifs) for the operating modes. The histograms differ depending on whether the machine is operating in mode 1 (left) or mode 2 (right); the occurrences of dd and ud in particular reveal the differences. In future investigations, we plan to define a similarity measure for such histograms in order to compare them qualitatively and to use it for automatic operation recognition and for finding motifs. Note: sorting the histograms in decreasing order of occurrences yields a classical frequency dictionary.
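The histogram of symbol combinations and the derived frequency dictionary can be sketched with the standard library. The function names and the `exclude` parameter are hypothetical, and the word list is made-up illustration data, not taken from the figures.

```python
from collections import Counter

def symbol_histogram(words, exclude=()):
    """Count how often each symbol combination occurs, skipping excluded ones
    (e.g. the stationary word 'ss')."""
    return Counter(word for word in words if word not in exclude)

def frequency_dictionary(histogram):
    """Histogram sorted by decreasing number of occurrences."""
    return histogram.most_common()

# Usage on a hypothetical word sequence.
words = ["ss", "ud", "dd", "ss", "ud", "ss"]
print(symbol_histogram(words))
print(frequency_dictionary(symbol_histogram(words, exclude=("ss",))))
```

Whether occurrences should be counted per run or per sample is a design choice the text leaves open. This sketch counts each listed word once.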
A big advantage of the presented symbolic time series analysis is that the sequence of symbols, either single or multi channel, can now be addressed with techniques more common to computational linguistics (e.g. regex) [13], which is a growing field of research.
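As a small illustration of the regex idea, a single channel symbol sequence can be searched for a pattern such as "rise, then at least two stationary samples, then fall". The pattern, the function name `find_motifs` and the example sequence are assumptions chosen for the sketch.

```python
import re

def find_motifs(symbols, pattern):
    """Return (start, end) sample indices of all non-overlapping matches."""
    return [match.span() for match in re.finditer(pattern, symbols)]

# Usage: one symbol per sample over the alphabet [u, s, d].
sequence = "ssuuusssdddssuusdd"
print(find_motifs(sequence, r"u+s{2,}d+"))
```

The second rise in the example is followed by only one stationary sample, so the pattern does not match there.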
3 Conclusion
Successful data analytics in large physical systems must embed the modelling of the individual component and complete system dynamics. This has been addressed by providing a linear differential operator, or its inverse, for each and every signal or derived data channel. A multivariate symbolic time series analysis has been introduced. It permits a symbolic view of the system and its dynamics. The concept of frequency dictionaries has been applied to automatic operation recognition; this works for operation types which are characterised by a specific distribution of symbols. A major advantage of the proposed method is its intrinsic multiscale property. This enables the identification of very short events in very large data sets. Currently, we are performing research on the relationships between the sequences of symbols and the metaphor of language. Initial results indicate that this opens the door to taking advantage of new methods emerging in computational linguistics.
References
[1] Baheti, R., Gill, H.: Cyber-physical systems. The Impact of Control Technology (2011) 161–166
[2] Lanczos, C.: Linear differential operators. SIAM (1961)
[3] O’Leary, P., Harker, M., Gugg, C.: An inverse problem approach to approximating sensor data in cyber physical systems. In: 2015 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) Proceedings. Volume 2015-July, IEEE (may 2015) 1717–1722
[4] Gugg, C., Harker, M., O’Leary, P., Rath, G.: An Algebraic Framework for the Real-Time Solution of Inverse Problems on Embedded Systems. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems. Volume V., IEEE (aug 2015) 1097–1102
[5] Harker, M., O’Leary, P.: Discrete Orthogonal Polynomial Toolbox - Matlab File Exchange
[6] Gugg, C.: An Algebraic Framework for the Solution of Inverse Problems in Cyber-Physical Systems. PhD thesis, Montanuniversitaet Leoben (2015)
[7] O’Leary, P., Harker, M.: An algebraic framework for discrete basis functions in computer vision. In: Proceedings - 6th Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP 2008, IEEE (dec 2008) 150–157
[8] Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery 15(2) (aug 2007) 107–144
[9] Veenman, C., Reinders, M., Backer, E.: A maximum variance cluster algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(9) (sep 2002) 1273–1280
[10] Chau, T., Wong, A.: Pattern discovery by residual analysis and recursive partitioning. IEEE Transactions on Knowledge and Data Engineering 11(6) (1999) 833–852
[11] Keogh, E., Lonardi, S., Chiu, B.Y.c.: Finding surprising patterns in a time series database in linear time and space. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’02, New York, New York, USA, ACM Press (2002) 550
[12] Daw, C.S., Finney, C.E.A., Tracy, E.R.: A review of symbolic analysis of experimental data. Review of Scientific Instruments 74(2) (feb 2003) 915–930
[13] Clark, A., Fox, C., Lappin, S.: The Handbook of Computational Linguistics and Natural Language Processing. Volume XXXIII. Wiley-Blackwell (2010)