Signature features with the visibility transformation

04/08/2020 ∙ Yu Wu, et al. ∙ Loughborough University, University of Oxford, UCL

The signature in rough path theory provides a graduated summary of a path through an examination of the effects of its increments. Inspired by recent developments of signature features in the context of machine learning, we explore a transformation that is able to embed the effect of the absolute position of the data stream into signature features. This unified feature is particularly effective for its simplifying role in allowing the signature feature set to accommodate nonlinear functions of absolute and relative values.


1 Introduction

Feature extraction is the key to effective model construction in machine learning. Real-world data typically comes with substantial inherent noise and variation, so a good choice of features is needed to provide informative, non-redundant input and to facilitate the subsequent learning step. The focus of this paper, namely the signature feature, which has its roots in rough path theory, by its nature captures the total ordering of streamed data and summarises the data over segments. Signature-based machine learning models have proved effective in several fields of application, from automated recognition of Chinese handwriting [5, 22] to the diagnosis of mental health problems [17, 23]. The reason for this strong performance is that controlled systems, which form a universal and effective quantifiable family modelling all functions on streamed data, have been shown to be completely determined by their signatures [14].

The signature, an infinite sequence of coordinate iterated integrals, makes use only of the increments of the path generated from the streamed data rather than its absolute values. However, in some scenarios, for example handwriting recognition and human action recognition, the absolute position may also be essential for characterising the temporal dynamics. The visibility transformation was therefore introduced in [28] as a complement for skeleton-based human action recognition tasks. By its algebraic nature it retains information about absolute position within the corresponding signature, and therefore provides a unified framework for capturing the effects of path segments and path positions simultaneously. This technique, i.e., extracting signature features from longitudinal data after the visibility transformation, was later used to distinguish different clinical groups from longitudinal self-reported assessment data [27]. In this paper, we give a detailed introduction to the visibility transformation and discuss its properties (see Theorem 7 and Theorem 8), which shed light on why in some applications it outperforms the signature alone. At the same time, the availability of well-established Python packages for calculating signature features from data streams makes feature extraction with the visibility transformation easy to implement. Owing to the fundamental nature of the framework, we foresee a multifaceted impact in data-driven applications.

The paper is organised as follows. In Section 2 the relevant foundations concerning the signature are reviewed briefly, including coordinate iterated integrals, Chen's identity and tree-like equivalence. In Section 3 we formulate the visibility transformation map for bounded variation paths using the concatenation operator, and discuss its ability to capture the effects of positions as well as increments. Its discrete version for streamed data is introduced in Section 3.2 and then assessed in Section 4 on different applications in which the visibility transformation brings in additional useful information. We conclude the paper in Section 5. All proofs are postponed to the Appendix.

2 Preliminaries in signatures

We consider $\mathbb{R}^d$-valued, time-dependent, piecewise-differentiable paths of finite length. Such a path, mapping from $[a,b]$ to $\mathbb{R}^d$, is denoted by $X \colon [a,b] \to \mathbb{R}^d$. Denote by $X_a$ the initial position of the path $X$ and by $X_b$ its tail position; for short we write $X_t$ for $X(t)$. Each coordinate path of $X$ is a real-valued path, denoted by $X^i$ with $i \in \{1, \dots, d\}$. Now, for a fixed ordered multi-index collection $I = (i_1, \dots, i_k)$, with $k \geq 1$ and $i_j \in \{1, \dots, d\}$ for $j = 1, \dots, k$, define the coordinate iterated integral by

$$S(X)^{I}_{a,t} = \int_{a < t_1 < \cdots < t_k < t} dX^{i_1}_{t_1} \cdots dX^{i_k}_{t_k}, \qquad (1)$$

where the subscript $a,t$ denotes the lower and upper limits of the integral. It is easy to verify the recursive relation

$$S(X)^{(i_1, \dots, i_k)}_{a,t} = \int_a^t S(X)^{(i_1, \dots, i_{k-1})}_{a,s} \, dX^{i_k}_s. \qquad (2)$$
Definition 1.

The signature of the path $X$, denoted by $S(X)_{a,b}$, is the infinite collection of all coordinate iterated integrals of $X$. That is,

$$S(X)_{a,b} = \left(1,\; S(X)^{(1)}_{a,b}, \dots, S(X)^{(d)}_{a,b},\; S(X)^{(1,1)}_{a,b}, S(X)^{(1,2)}_{a,b}, \dots\right), \qquad (3)$$

where the zeroth term is equal to 1 by convention, and the superscripts of the subsequent terms run over the set of all multi-indices. The finite collection of all terms whose multi-indices have fixed length $k$ is termed the $k$th level of the signature. The truncated signature up to the $n$th level is denoted by $S^n(X)_{a,b}$.

It is not hard to deduce that the signature of a $d$-dimensional path truncated at level $n$ has $\sum_{k=0}^{n} d^k$ terms (counting the constant zeroth term). One important property that can be derived from the definition is that the signature is invariant under time reparameterizations of $X$, where by a reparameterization we mean a surjective, continuous and non-decreasing map $\psi \colon [a,b] \to [a,b]$. That is, for the new path $\tilde{X}_t = X_{\psi(t)}$ we have

$$S(\tilde{X})_{a,b} = S(X)_{a,b}. \qquad (4)$$

Invariance under time reparameterizations implies that the signature remains the same regardless of the sampling rate. This property gives a practical advantage in identifying the shape of motion trajectories regardless of speed; relevant applications can be found in [5, 22, 28]. It also reveals that the signature only captures the effect of pattern changes and not effects depending on the absolute position. This is further supported by the following calculation from the definition:

$$S(X)^{(i)}_{a,b} = \int_a^b dX^{i}_{t} = X^{i}_{b} - X^{i}_{a}, \qquad S(X)^{(i,j)}_{a,b} = \int_a^b \big(X^{i}_{t} - X^{i}_{a}\big)\, dX^{j}_{t}. \qquad (5)$$

Note that these terms of the signature are completely described by the increments of the coordinates on the right-hand side.
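To make these properties concrete, here is a minimal sketch using the iisignature package [19], assuming its convention of returning levels 1, 2, ... without the leading constant 1. It computes the signature of a short 2-dimensional piecewise-linear stream and checks that the first-level terms are the increments and that the signature is unchanged by resampling the same underlying path.

```python
import numpy as np
import iisignature  # pip install iisignature

# A piecewise-linear path through four points in R^2.
X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 4.0]])
sig = iisignature.sig(X, 3)  # signature truncated at level 3 (constant term omitted)

# First-level terms are exactly the coordinate increments, as in Eq. (5).
assert np.allclose(sig[:2], X[-1] - X[0])

# Reparameterization invariance, Eq. (4): inserting the midpoint of a segment
# changes the sampling but not the underlying path, so the signature is unchanged.
X_resampled = np.insert(X, 1, (X[0] + X[1]) / 2, axis=0)
assert np.allclose(iisignature.sig(X_resampled, 3), sig)
```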

Another crucial property of the signature in rough path theory (cf. [18]) relates higher-level terms of the signature to lower-level terms through the shuffle product.

Definition 2.

Let there be given two multi-indices $I = (i_1, \dots, i_k)$ and $J = (j_1, \dots, j_m)$ with $k \geq 1$ and $m \geq 1$. Define a new multi-index by

$$(r_1, \dots, r_{k+m}) = (i_1, \dots, i_k, j_1, \dots, j_m). \qquad (6)$$

The shuffle product of $I$ and $J$ is the finite set

$$I \sqcup\!\sqcup J = \left\{ \big(r_{\sigma(1)}, \dots, r_{\sigma(k+m)}\big) : \sigma \text{ is a } (k,m)\text{-shuffle} \right\}, \qquad (7)$$

where a permutation $\sigma$ of the set $\{1, \dots, k+m\}$ is called a $(k,m)$-shuffle if $\sigma^{-1}(1) < \cdots < \sigma^{-1}(k)$ and $\sigma^{-1}(k+1) < \cdots < \sigma^{-1}(k+m)$.

An example of the shuffle product: for $I = (1)$ and $J = (2,3)$, $I \sqcup\!\sqcup J = \{(1,2,3), (2,1,3), (2,3,1)\}$.

Theorem 1.

Let there be given a path $X \colon [a,b] \to \mathbb{R}^d$ and two multi-indices $I$ and $J$ of lengths $k \geq 1$ and $m \geq 1$. Then

$$S(X)^{I}_{a,b} \cdot S(X)^{J}_{a,b} = \sum_{K \in I \sqcup\!\sqcup J} S(X)^{K}_{a,b}. \qquad (8)$$

Theorem 1 suggests that a nonlinear effect in terms of lower-level terms is equivalent to a linear effect of higher-level terms. In other words, the shuffle product property of the signature ensures that linear functionals on the signature are dense in the space of all continuous functions on signatures [13].
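As a numerical sanity check of Theorem 1, the sketch below (again assuming iisignature's ordering of terms: levels in increasing order, lexicographic within each level) verifies that $S^{(1)} S^{(2)} = S^{(1,2)} + S^{(2,1)}$ for a random 2-dimensional stream.

```python
import numpy as np
import iisignature

X = np.random.randn(10, 2)
s = iisignature.sig(X, 2)  # assumed order: [S^(1), S^(2), S^(1,1), S^(1,2), S^(2,1), S^(2,2)]

# Shuffle identity (8) for I = (1), J = (2): their shuffle product is {(1,2), (2,1)}.
assert np.isclose(s[0] * s[1], s[3] + s[4])
```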

Alternatively, the signature of a path can be viewed as a non-commutative polynomial on the path space as follows. It also serves as a basis of functionals on the unparameterised path space, which will be defined in the next subsection.

Definition 3.

Denote the formal indeterminates by $e_1, \dots, e_d$. The algebra of formal power series in $d$ non-commuting indeterminates, called the tensor algebra of $\mathbb{R}^d$, is the vector space of all infinite series of the form

$$\sum_{k=0}^{\infty} \; \sum_{i_1, \dots, i_k \in \{1,\dots,d\}} \lambda_{i_1 \dots i_k} \, e_{i_1} \otimes \cdots \otimes e_{i_k}, \qquad (9)$$

where the $e_{i_1} \otimes \cdots \otimes e_{i_k}$ are called monomials, equipped with the tensor product such that

$$\big(e_{i_1} \otimes \cdots \otimes e_{i_k}\big) \otimes \big(e_{j_1} \otimes \cdots \otimes e_{j_m}\big) = e_{i_1} \otimes \cdots \otimes e_{i_k} \otimes e_{j_1} \otimes \cdots \otimes e_{j_m}. \qquad (10)$$

A formal polynomial is a formal power series with only finitely many non-vanishing coefficients in (9).

Definition 3 leads to the following representation of the signature:

$$S(X)_{a,b} = \sum_{k=0}^{\infty} \; \sum_{i_1, \dots, i_k \in \{1,\dots,d\}} S(X)^{(i_1,\dots,i_k)}_{a,b} \, e_{i_1} \otimes \cdots \otimes e_{i_k}, \qquad (11)$$

where the coordinate iterated integrals are the coefficients of the monomials. This gives rise to the multiplicative property of the signature called Chen's identity (cf. [2, 14]).

Theorem 2 (Chen’s identity: a simple version).

Let $a < c < b$; then

$$S(X)_{a,b} = S(X)_{a,c} \otimes S(X)_{c,b}. \qquad (12)$$

Theorem 2 simply asserts the multiplicative property of the signature of a path: the signature of the entire path can be recovered by combining the signatures of its pieces. Before proceeding to the generalised version of Chen's identity, which is crucial for proving one of the main results, Theorem 7, we need to introduce two important operations on paths: concatenation and reversal.

Definition 4.

Given two continuous paths $X \colon [a,b] \to \mathbb{R}^d$ and $Y \colon [b,c] \to \mathbb{R}^d$ with $X_b = Y_b$, the concatenation product $X * Y$ is the continuous path from $[a,c]$ to $\mathbb{R}^d$ defined by

$$(X * Y)_t = \begin{cases} X_t, & t \in [a,b], \\ Y_t, & t \in [b,c]. \end{cases} \qquad (13)$$

Also, the reversal $\overleftarrow{X}$ of $X$ is defined by

$$\overleftarrow{X}_t = X_{a+b-t}, \qquad t \in [a,b]. \qquad (14)$$

We can now present the classical Chen’s identity as follows.

Theorem 3 (Chen’s identity).

Given two continuous paths $X \colon [a,b] \to \mathbb{R}^d$ and $Y \colon [b,c] \to \mathbb{R}^d$ such that $X_b = Y_b$. Then

$$S(X * Y)_{a,c} = S(X)_{a,b} \otimes S(Y)_{b,c}. \qquad (15)$$

Thus the multiplicative property of the signature is preserved under concatenation. The signature map $S$ is thereby revealed as a homomorphism from the monoid of paths (or path segments) under concatenation into the tensor algebra, and reversing a path segment produces the inverse tensor.

Theorem 3 also implies that a bounded variation path which is completely cancelled out by its own reversal has a trivial signature.

Corollary 1.

For an $\mathbb{R}^d$-valued continuous path $X$ of finite length we have that

$$S\big(X * \overleftarrow{X}\big) = (1, 0, 0, \dots). \qquad (16)$$
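Corollary 1 is easy to check numerically: concatenating a piecewise-linear stream with its own reversal yields a tree-like path whose signature terms all vanish. A minimal sketch, again assuming the iisignature package:

```python
import numpy as np
import iisignature

X = np.random.randn(8, 2)
out_and_back = np.vstack([X, X[::-1][1:]])  # X followed by its reversal (shared endpoint)

# The concatenated path retraces itself exactly, so it is tree-like and every
# signature term beyond the constant 1 vanishes (Corollary 1).
assert np.allclose(iisignature.sig(out_and_back, 3), 0.0, atol=1e-9)
```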

2.1 Signatures as features

Rough path theory shows that the solution of a controlled system driven by a path is uniquely determined by the path's signature and the initial condition. Put another way, for a path of finite length, the corresponding signature is a fundamental representation that captures its effect on any nonlinear system. Therefore the coordinate iterated integrals, i.e., the signature as a whole, form a natural feature set for capturing the aspects of the data that predict the effect of the path on a controlled system. Its ability to capture the order of events has proved efficient in exploiting distinctive features of sequential data in several fields, as mentioned in Section 1. The signature removes the infinite-dimensional redundancy caused by time reparameterization while retaining the information on the order of events [15]. Moreover, the signature of a path, as a feature set, has the advantage of handling streams of variable length, unequal spacing and missing data in a unified way [12].

On the other hand, the signature map is not one-to-one, but its kernel is well understood. Two distinct paths can have exactly the same signature; for example, they do if one is a time reparameterization of the other, as discussed above. To characterise the paths sharing the same signature, we introduce a geometric relation on paths of finite length, via the notions of tree-like paths and tree-like equivalence.

Definition 5.

[9] A continuous path $X \colon [a,b] \to \mathbb{R}^d$ is tree-like if there exists a non-negative continuous function $h$, called a height function for $X$, defined on $[a,b]$ such that $h(a) = h(b) = 0$ and

$$\|X_t - X_s\| \leq h(s) + h(t) - 2 \inf_{u \in [s,t]} h(u), \qquad a \leq s \leq t \leq b, \qquad (17)$$

where $\|\cdot\|$ is the Euclidean norm.

Definition 6.

[9] Given two $\mathbb{R}^d$-valued continuous parameterised paths $X$ and $Y$ of finite length sharing the same initial and tail positions, we say $X \sim Y$ if $X * \overleftarrow{Y}$ is a tree-like path.

(a) A 2-dimensional path.
(b) A 2-dimensional path that is tree-like equivalent to the path in (a).
Figure 1: An illustration of tree-like equivalence: the left plot is a 2-dimensional curve, and the right plot is a 2-dimensional curve on the same time range that differs from it only by a completely self-cancelling excursion.

Hambly and Lyons [9] show that $\sim$ is an equivalence relation, and that the equivalence classes form a group under concatenation.

Theorem 4.

[9] Given two $\mathbb{R}^d$-valued parameterised paths of finite length, the relation $\sim$ is an equivalence relation. Concatenation respects $\sim$, and the equivalence classes form a group under this operation.

In order to visualise tree-like equivalence, Figure 1 exhibits two 2-dimensional paths which are tree-like equivalent. Both curves have the same shape except that the right one has a "new part" that is completely self-cancelling. We can now group paths that are mutually tree-like equivalent.

Definition 7.

The unparameterised path associated with $X$ is its tree-like equivalence class, denoted $[X]$, so that $Y \in [X]$ whenever $Y \sim X$.

Indeed, the unparameterised path $[X]$ contains all reparameterisations of the path $X$. Hambly and Lyons [9] also show that the tree-like property can be detected from the corresponding signature:

Theorem 5.

[9] Given an $\mathbb{R}^d$-valued path $X$ of finite length, $X$ is tree-like if and only if $S(X) = (1, 0, 0, \dots)$.

Theorem 5 ensures that the unparameterised path can be uniquely characterised by its signature, i.e., for $X$ and $Y$ which are both $\mathbb{R}^d$-valued parameterised paths with the same initial and tail positions, it holds that $X \sim Y$ if and only if $S(X) = S(Y)$. This implies that the increments of $X$ and $Y$ have the same effects. The work of Boedihardjo et al. [1] extends this result from paths of finite length to weakly geometric rough paths. As real data can usually be modelled by paths of finite length, we do not discuss weakly geometric rough paths further.

Finally, let us present a result that may help construct a path that can be uniquely determined by and recovered from its signature:

Proposition 1.

Given an $\mathbb{R}^d$-valued parameterised path $X$ of finite length with fixed initial position. If at least one coordinate of $X$ is monotone, then $S(X)$ determines $X$ uniquely.

In practice, for a time-augmented path, as its time coordinate is a monotone function, the corresponding signature uniquely determines the original path. The signature thus provides a canonical low-dimensional feature set for a continuous data stream and is therefore a good candidate for handling streamed data sets.
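In practice the monotone coordinate of Proposition 1 is usually supplied by augmenting the stream with time. A minimal sketch (the helper `time_augment` is ours, not part of any package):

```python
import numpy as np
import iisignature

def time_augment(x):
    """Prepend a monotone time coordinate so that, by Proposition 1, the
    augmented path is uniquely determined by its signature."""
    t = np.linspace(0.0, 1.0, len(x)).reshape(-1, 1)
    return np.hstack([t, x])

stream = np.random.randn(30, 2)          # a 2-dimensional stream of 30 observations
features = iisignature.sig(time_augment(stream), 3)
```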

3 The visibility transformation

This section is devoted to the concepts and properties of the visibility transformation. We start by introducing the visibility transformation for $\mathbb{R}^d$-valued bounded variation paths and discussing its algebraic properties in Section 3.1, and then apply it to paths constructed from discrete (tick) data in Section 3.2, which gives a brief idea of how to use the visibility transformation with the signature in real problems involving longitudinal data.

3.1 The continuous path of finite length

To aid understanding of the visibility transformation defined later, we first introduce the plane

$$\{(x^1, \dots, x^d, 1) : (x^1, \dots, x^d) \in \mathbb{R}^d\} \qquad (18)$$

in $\mathbb{R}^{d+1}$ as the visibility plane, and the plane

$$\{(x^1, \dots, x^d, 0) : (x^1, \dots, x^d) \in \mathbb{R}^d\} \qquad (19)$$

as the invisibility plane. Without loss of generality, the time range is always assumed to be $[0,1]$ within this subsection.

Definition 8.

Given a continuous path $X \colon [0,1] \to \mathbb{R}^d$, the initial-position-incorporated visibility transformation ($I$-visibility transformation) maps the path into an $\mathbb{R}^{d+1}$-valued path starting at the origin, where the new path is determined by the continuous function given by

(20)

Similarly, the tail-position-incorporated visibility transformation ($T$-visibility transformation) maps the path into an $\mathbb{R}^{d+1}$-valued path determined by the continuous function given by

(21)

In the definition of the $I$-visibility transformation, we construct two segments from the path $X$ and join them together using concatenation. The new path starts from the origin (on the invisibility plane) and moves to the initial position of the path $X$ while remaining on the invisibility plane; the path is then made visible by being lifted onto the visibility plane. By contrast, for the $T$-visibility transformation, the path is visible first and then invisible. The example in Figure 2 shows how a 2-dimensional path, which is in fact the path from Figure 1, can be extended to a 3-dimensional path with invisibility and visibility information, where the time dimension is discarded.

For simplicity we fix a particular parameterization of the transformed path. Indeed, the speed at which we move along the path does not affect the shape of the image (nor, by reparameterization invariance, its signature).

(a) The path after the $I$-visibility transformation.
(b) The path after the $T$-visibility transformation.
Figure 2: An illustration of the visibility transformations applied to the path from Figure 1: the left plot shows how the $I$-visibility transformation turns the 2-dimensional path into a 3-dimensional curve by joining the origin to the initial position of the original curve within the invisibility plane (z = 0, the light blue plane) and then lifting the 2-dimensional curve onto the visibility plane (z = 1, the light green plane); the right plot shows the $T$-visibility transformation, which first lifts the 2-dimensional curve onto the visibility plane and then joins the end of the original curve to the origin.

Figure 2 illustrates that the structure of the path remains the same despite being lifted to a higher-dimensional space. Indeed, the visibility transformation preserves tree-like equivalence.

Theorem 6.

Let $X$, $Y$ be continuous paths of finite length with $X \sim Y$. Then the paths obtained from $X$ and $Y$ by the $I$-visibility transformation are again tree-like equivalent, and hence have the same signature. The same holds for the $T$-visibility transformation.

In the following, we consider an ordered multi-index collection $I$ with entries in $\{1, \dots, d+1\}$, and write $|I|$ for the number of elements in the collection. An application of Chen's identity leads to the following decomposition of the signature after the visibility transformation, showing that the signature of the path generated by the visibility transformation can be expressed in terms of the signature of the original path.

Theorem 7.

Let $X$ be a continuous path of finite length. For a multi-index collection $I$,

(22)

and

(23)

Here the right-hand sides involve multi-index collections derived from $I$, including one in which the additional visibility index is appended to $I$, together with the corresponding coefficients of these collections.

Corollary 2.

Let $X$ be a continuous path of finite length. For $i \in \{1, \dots, d\}$, we have

(24)

Corollary 2 shows that the $I$-visibility transformation (resp. the $T$-visibility transformation) trivially captures the linear effect of the tail position (resp. the initial position) of the path. Based on the decomposition of the signature after the visibility transformation in Theorem 7, Theorem 8 shows that the signature of the lifted path after the $I$-visibility transformation captures the effects of the initial position and the increments of the path simultaneously. Similarly, the signature of the lifted path after the $T$-visibility transformation captures the effects of the tail position and the increments of the path simultaneously.

Theorem 8.

Let there be given an $\mathbb{R}^d$-valued continuous path $X$ of finite length and a multi-index collection $I$ with entries in $\{1, \dots, d\}$. Define the multi-index obtained by prefixing the additional visibility index to $I$ on the left. Then

(25)

where the coefficient is the one corresponding to that multi-index. Similarly, define the multi-index obtained by postfixing the additional visibility index to $I$ on the right. Then

(26)

Similarly, for the $T$-visibility transformation, we have

(27)

3.2 The path from streamed data

In the machine learning context, we often work with streamed data $x = (x_1, \dots, x_n)$, where $x$ contains $n$ observations and the $i$th observation $x_i$, $i = 1, \dots, n$, is a $d$-dimensional column vector recorded at the $i$th time point. For convenience, we assume the time of the $i$th observation is simply $i$. To extract the signature feature, the first step is to embed the time series data into a path over a continuous time interval. To do this, we usually construct an $\mathbb{R}^d$-valued continuous path from $x$ through piece-wise linear interpolation along each coordinate dimension, denoted by $\hat{X}$. That is,

$$\hat{X}_t = x_{\lfloor t \rfloor} + (t - \lfloor t \rfloor)\big(x_{\lfloor t \rfloor + 1} - x_{\lfloor t \rfloor}\big), \qquad t \in [1, n], \qquad (28)$$

where $\lfloor \cdot \rfloor$ denotes the integer part of a real number. Alternatively, signature features can be obtained directly from the discrete data through the well-established Python packages iisignature [19] or esig, where the piecewise linear interpolation is implemented automatically by the packages.

However, the easiest way to apply the $I$-visibility transformation (resp. the $T$-visibility transformation) to the generated path is not to follow Definition 8 directly. Instead we may first expand the streamed data from $n$ observations of $d$-dimensional vectors to $n+2$ observations of $(d+1)$-dimensional vectors, via what we call the discrete $I$-visibility transformation (resp. the discrete $T$-visibility transformation), as follows:

Definition 9.

Let there be given a discrete data sequence $x = (x_1, \dots, x_n)$ where the $x_i$ are $d$-dimensional column vectors. The discrete $I$-visibility transformation maps $x$ to a new sequence whose elements are $(d+1)$-dimensional column vectors given by

(29)

Here $0$ denotes the $d$-dimensional zero vector and $^{\mathsf{T}}$ denotes the transpose of a matrix.

Similarly, the discrete $T$-visibility transformation maps $x$ to a new sequence whose elements are $(d+1)$-dimensional column vectors given by

(30)

We then generate a new path through piece-wise linear interpolation on the transformed sequence (for either the $I$- or the $T$-visibility transformation). The resulting path is exactly the path obtained by applying the corresponding visibility transformation of Definition 8 to the path generated from $x$; in this sense, the discrete visibility transformation can be treated as an intermediate transformation. The availability of the aforementioned Python packages then allows signature features to be extracted directly from the data after the discrete visibility transformation.
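The following sketch implements the discrete visibility transformations as described above: the $I$-visibility transformation prepends the origin and the initial position on the invisibility plane before lifting the stream to the visibility plane, and the $T$-visibility transformation keeps the stream visible before dropping to the tail position and returning to the origin. Since the exact formulas of Definition 9 are not reproduced here, treat the details (in particular the number and order of the auxiliary points) as our reading of the construction; the signature is then computed directly with iisignature.

```python
import numpy as np
import iisignature

def i_visibility(x):
    """Discrete I-visibility transformation (sketch): origin -> initial position
    on the invisibility plane (last coordinate 0), then the stream on the
    visibility plane (last coordinate 1)."""
    n, d = x.shape
    out = np.zeros((n + 2, d + 1))
    out[1, :d] = x[0]        # initial position, still invisible
    out[2:, :d] = x          # the original stream ...
    out[2:, d] = 1.0         # ... lifted onto the visibility plane
    return out

def t_visibility(x):
    """Discrete T-visibility transformation (sketch): the stream on the visibility
    plane, then the tail position on the invisibility plane, then the origin."""
    n, d = x.shape
    out = np.zeros((n + 2, d + 1))
    out[:n, :d] = x
    out[:n, d] = 1.0         # the stream itself is visible
    out[n, :d] = x[-1]       # drop to the invisibility plane at the tail position
    return out               # the last row is already the origin

stream = np.random.randn(50, 3)                      # 50 observations in R^3
features = iisignature.sig(i_visibility(stream), 3)  # signature truncated at level 3
```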

Meanwhile, streamed data can be manipulated by other transformations in combination with the discrete visibility transformation. For example, the discrete $T$-visibility transformation combined with the lead-lag transform [8], which accounts for the quadratic variability in the data, is used in [28] for applications where the tail position of the streamed data carries more information than the initial position.

Remark 1.

From Theorem 8 we can see that, although we cannot compute the signature of the original path directly when extracting signature features after the visibility transformation, each of its terms is concealed in the signature of the transformed path. More specifically, Theorem 8 illustrates that the $n$th level of the signature of the original path is captured in the $(n+1)$th level of the signature of the transformed path. This implies that we may simply truncate the signature of the transformed path at the $(n+1)$th level if the signature of the original path up to the $n$th level is needed. In this case, however, both the truncation level and the dimension grow by one, so the number of signature terms to be computed increases, which leads to a growth in the computational cost of extracting signature features. On the other hand, under the assumption that position information provides extra predictive features, embedding position information into the signature increases accuracy to some extent. Thus there is a trade-off between computational burden and accuracy.
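The growth in the number of signature terms described in Remark 1 can be inspected directly using iisignature's siglength helper, which, to our understanding, counts the terms from level 1 up to the chosen truncation level (the values of $d$ and $n$ below are illustrative only, not the ones used in the remark).

```python
import iisignature

d, n = 3, 3  # illustrative dimension and truncation level
before = iisignature.siglength(d, n)          # original path, levels 1..n
after = iisignature.siglength(d + 1, n + 1)   # after the visibility transformation, levels 1..n+1
print(f"{before} signature terms -> {after} after the visibility transformation")
```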

Remark 2.

Another important object related to the signature is the log-signature, which is the logarithm of the signature [15]. The log-signature is a parsimonious description of the signature: the truncated log-signature and the truncated signature are in bijection. In contrast to the signature, the log-signature offers the benefit of dimension reduction, but it must be combined with non-linear models in order to approximate arbitrary functionals on the unparameterised path space [12].
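For comparison, the log-signature mentioned in Remark 2 is also available in iisignature via its prepare and logsig functions; the sketch below contrasts its length with that of the full signature at the same truncation level.

```python
import numpy as np
import iisignature

path = np.random.randn(20, 3)           # a 3-dimensional stream of 20 observations
level = 4
prep = iisignature.prepare(3, level)    # precompute the basis data needed by logsig
log_sig = iisignature.logsig(path, prep)
sig = iisignature.sig(path, level)
print(len(log_sig), "log-signature terms vs", len(sig), "signature terms")
```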

4 Applications

4.1 Wiimote gesture classification

The original gesture data were collected from a Nintendo Wiimote remote controller with a built-in 3-axis accelerometer [7]. They include 10 subjects, each performing 10 gestures 10 times, and the resulting time series are of different lengths. The 10 gestures are: picking up, shaking, a move to the right, a move to the left, a move upwards, a move downwards, a left circle, a right circle, a move toward the screen, and a move away from the screen.

The task is to build a signature-based model to classify the 10 gestures. The benefit of the signature feature is that truncating the signature at a given level transforms input data of different lengths, 3-dimensional sequential data in our case, into feature vectors of the same length. Based on experience in [28], the tail position information is more important for gesture recognition. Therefore the original gesture data set was first randomly split into a training set (70%) and a testing set (30%) and then transformed to signature features with and without the $T$-visibility transformation (TVT+SF and SF for short). Meanwhile, an additional feature set is included for comparison by appending the tail position vector directly to the signature feature (TP+SF). A random forest model was then used for classification. The accuracies of the signature-based random forest models on signature features with and without the $T$-visibility transformation, and with explicit tail position information, for classifying the 10 gestures are summarised in Table 1. As mentioned before, if the signature feature up to level $n$ is extracted directly from the original sequential data, the signature feature extracted after the visibility transformation should be truncated at the $(n+1)$th level for comparison. In the experiment, the truncation levels are set to $n = 1, \dots, 6$ for the signature features without the visibility transformation (rows SF and TP+SF of Table 1) and to $n+1 = 2, \dots, 7$ for the features with the visibility transformation (row TVT+SF).
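A minimal sketch of this pipeline is given below, assuming the gesture streams and labels have already been loaded as `gestures` (a list of variable-length arrays of shape (length, 3)) and `labels`; the helper `t_visibility` is the discrete $T$-visibility transformation sketched in Section 3.2, and the model settings here are illustrative rather than the exact ones used for Table 1.

```python
import numpy as np
import iisignature
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def t_visibility(x):
    # Discrete T-visibility transformation (as sketched in Section 3.2).
    n, d = x.shape
    out = np.zeros((n + 2, d + 1))
    out[:n, :d], out[:n, d], out[n, :d] = x, 1.0, x[-1]
    return out

def tvt_sig_features(streams, level):
    # Streams of different lengths all map to feature vectors of the same length.
    return np.stack([iisignature.sig(t_visibility(s), level) for s in streams])

# `gestures` and `labels` are assumed to be loaded from the Wiimote data set [7].
X = tvt_sig_features(gestures, level=3)   # "TVT+SF up to level n+1" with n+1 = 3
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
# clf.feature_importances_ provides the per-term importances used for Figure 3.
```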

For each column in Table 1, the accuracy of the random forest on signature features alone up to level $n$ is much lower than that of both models which use tail position information. Furthermore, between the two models that use the tail position, the one with the visibility transformation performs better. Thus the visibility transformation enhances performance by providing extra useful information for classification, and it achieves fairly high accuracy, up to 87.33%. This suggests that both the true positions and the increments are important for classification in this application. In addition, recall that the first level of the signature of a path is just its increment. In the light of the low accuracy obtained using the first-level signature appended with the tail position (the n=1 cell in row TP+SF), the tail position and the increments of the path alone may not be as instructive as expected.

Meanwhile, the feature importances of the forest trained on the signature features after the $T$-visibility transformation can be obtained, together with the classification result, via the scikit-learn library. To compare the feature importances of the effects of increments and of the tail position, we extract the signature feature after the $T$-visibility transformation and list the feature importances of all second-level terms in descending order in Figure 3. Recall from Theorem 7 that the top six features contain combined information from both the increments and the tail position. From Theorem 8 we can see that the features corresponding to pure increments have roughly the same importance as the pure position features. Needless to say, the coordinate iterated integral of the additional coordinate with itself carries no discriminative information, which coincides with the corresponding bar in Figure 3.

SF (up to level n) n=1 n=2 n=3 n=4 n=5 n=6
Accuracy 32.51% 63.19% 78.43% 70.91% 67.27% 70.31%
TP +SF (up to level n) n=1 n=2 n=3 n=4 n=5 n=6
Accuracy 48.85% 72.21% 84.45% 72.29% 66.90% 73.14%
TVT +SF (up to level n+1) n+1=2 n+1=3 n+1=4 n+1=5 n+1=6 n+1=7
Accuracy 80.91% 87.33% 80.21% 77.36% 75.43% 80.71%
Table 1: The accuracy for gesture classification with different signature features, where 'SF' is short for signature features, 'TP' for the tail position, and 'TVT' for the $T$-visibility transformation.
Figure 3: Feature importances of the second-level signature, that is, the second-order coordinate iterated integrals, in the forest with the signature extracted after the $T$-visibility transformation: the first three coordinates are the 3-dimensional coordinates of the standard Cartesian coordinate system, and the fourth represents the additional dimension due to the visibility transformation; the blue bars are the feature importances, along with their inter-tree variability.

4.2 Classification of handwriting data

The concept of the visibility transformation originated in the analysis of temporal handwriting data, where the portion of the path on the visibility plane indicates when one is writing the letter and the pen is visible. Clearly, where one starts writing is also crucial for making classification decisions. Here we choose the character trajectories data set [4] for assessment, in which multiple labelled samples of pen tip trajectories are recorded while writing individual characters [24, 25, 26]. The data consist of 2858 instances of 20 different characters, captured using a WACOM tablet at 200 Hz. Each character sample is a 3-dimensional pen tip velocity trajectory: the first two coordinates give the trajectory in the writing plane and the third coordinate represents the pen tip force. The lengths of the samples for the same character are not necessarily the same. The data have been numerically differentiated, Gaussian smoothed (with a sigma value of 2) and then normalised.

The original handwriting data come split into a training set (50%) and a testing set (50%). To account for the quadratic variability of the path, we combine the $I$-visibility transformation with the lead-lag transform described at the end of Section 3.2. All the data are then transformed into three different feature sets: signatures, signatures prefixed by the explicit initial position, and signatures after the $I$-visibility transformation and the lead-lag transform. We also extract signature features with the $I$-visibility transformation and the lead-lag transform on the 2-dimensional pen trajectory only, ignoring the pen tip force. A random forest model was then used for classification. The accuracies of the signature-based random forest models using the different features for classifying the 20 characters are plotted in Figure 4. In the experiment, signature features alone truncated at level $n$ are compared with the corresponding features with the visibility transformation truncated at level $n+1$.
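The lead-lag transform of [8] can be written in a few lines. The sketch below uses one common formulation (a 2d-dimensional stream of 2n−1 points whose lead component runs one step ahead of the lag component) and, in a comment, shows how it can be composed with the `i_visibility` helper sketched in Section 3.2; the composition order shown is our choice for illustration, not necessarily the exact one used in the experiments.

```python
import numpy as np
import iisignature

def lead_lag(x):
    """One common formulation of the lead-lag transform [8]: the second-level
    signature of the resulting 2d-dimensional path picks up quadratic variation."""
    n, d = x.shape
    out = np.zeros((2 * n - 1, 2 * d))
    out[0::2, :d] = x          # lead component at the even vertices
    out[1::2, :d] = x[1:]      # lead steps ahead between vertices
    out[0::2, d:] = x          # lag component at the even vertices
    out[1::2, d:] = x[:-1]     # lag stays one step behind
    return out

sample = np.random.randn(40, 2)                 # a stand-in for one pen trajectory
ll_sig = iisignature.sig(lead_lag(sample), 2)   # signature of the lead-lag path
# Composing with the visibility transformation of Section 3.2 gives "LLT+IVT+SF"
# features, up to the choice of ordering:
# features = iisignature.sig(i_visibility(lead_lag(sample)), level)
```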

It is not surprising that the accuracy of each of the three models roughly increases with the truncation level: the higher-order terms of the signature are not redundant in this case. As in Section 4.1, the model with the visibility transformation performs best and the one with signature features alone performs worst across all truncation levels, which illustrates the power of the visibility transformation in such applications. It is also worth noting that classification using signature features with the visibility transformation on the partial path (the pen trajectory only) is superior to classification using either signature features alone or initial-position-prefixed features on the full path. This may imply either that the pen tip force dimension is not very informative, or that the proposed method is efficient at seizing pivotal, non-redundant information from limited data.

Figure 4: Accuracy curves for handwriting classification with different signature features, where 'SF' is short for signature features, 'IP' for the initial position, 'LLT' for the lead-lag transform and 'IVT' for the $I$-visibility transformation.

In Table 2, the best performance of our method is compared with that of other methods proposed in the literature for this handwritten character database. The first four rows report classification results based on different hidden Markov models (HMMs): clustering of HMMs via the variational hierarchical expectation-maximization algorithm (VHEM-H3M) [3], a support vector machine on features from an HMM-based entropy feature extractor ((O,HMM)+SVM) [20], and generative classification based on HMMs with the Fisher kernel (FK) [11] and the TOP kernel (TK) [21]. The accuracy of our proposed method (random forest on LLT+IVT+SF) with the signature truncated at the various levels, i.e., the blue line in Figure 4, is above all of the HMM-related classifiers. The best performance of our method is as high as 97.32%, and the best performance of our method on the partial path reaches 96.02%. Compared with the scalable shapelet discovery (SDD) algorithm of [6] and the modified clustering discriminant analysis (MCDS) algorithm of [10], which is based on an iterative optimisation procedure, both of which provide slightly higher accuracy, our proposed method is much easier to implement thanks to the well-designed Python packages mentioned before.

Method Accuracy
VHEM-H3M [3] 65.10%
FK [11] 89.26%
(O,HMM)+SVM [20] 92.91%
TK [21] 93.67%
LLT+IVT+SF on the partial path 96.02%
LLT+IVT+SF 97.32%
SDD [6] 98.00%
MCDS [10] 98.25%
Table 2: Comparison for handwriting classification with different methods.

5 Conclusion

To capture informative and non-redundant information from streamed data for learning tasks, we have presented a transformation that encodes the effect of the absolute position of streamed data into signature features. The enhanced feature set is unified, theoretically backed, and simple to implement. It outperforms the signature feature alone in applications where the absolute position of the data is also informative. In particular, in our numerical experiments it is superior to many benchmark methods that require careful manual data preparation and the implementation of complicated algorithms.

Acknowledgements

The authors are grateful to the Alan Turing Institute for funding this work under EPSRC grant EP/N510129/1, and to EPSRC for funding through the project EP/S026347/1, titled 'Unparameterised multi-modal data, high order signatures, and the mathematics of data science'.

Appendix

Proof of Theorem 6.

For two continuous bounded variation paths $X$ and $Y$, denote the paths generated by the $I$-visibility transformation according to Definition 8. The assumption $X \sim Y$ naturally gives the tree-like equivalence of the lifted (visible) segments. Meanwhile, $X \sim Y$ implies that $X$ and $Y$ share the same initial position, so the initial (invisible) segments of the two transformed paths coincide and are in particular equivalent. We can then conclude the equivalence of the transformed paths by using the fact that concatenation respects $\sim$ (Theorem 4).

The assertion for the $T$-visibility transformation follows by a similar argument.

Proof of Corollary 2.

It’s trivial. ∎

Proof of Theorem 7.

For a multi-index collection containing only the original coordinates, the corresponding terms of the individual segments of the transformed path are straightforward to compute; note also that a term vanishes on any segment along which one of its coordinates is constant. The assertion then follows from Chen's identity (Theorem 3) and Eqn. (10). ∎

Proof of Theorem 8.

Eqn. (22) in Theorem 7 leads to

(31)

where the coefficients are those corresponding to the respective multi-index collections. On the other hand, the terms associated with the remaining collections vanish; this can be shown by induction and is omitted here. Thus only one non-vanishing term remains, and in total we obtain the claim by setting

(32)

For the second part, with the postfixed multi-index, a similar argument applied to Eqn. (22) in Theorem 7 again gives

(33)

It now remains to compute the final term. For a non-empty multi-index collection, by induction we can again show that

(34)

The assertion for the $T$-visibility transformation follows by a similar argument. ∎

References

  • [1] Boedihardjo, H., Geng, X., Lyons, T. and Yang, D., The signature of a rough path: uniqueness. Advances in Mathematics, 293 (2016): 720-737.
  • [2] Chen, K.T., Integration of paths–A faithful representation of paths by noncommutative formal power series. Transactions of the American Mathematical Society, 89.2 (1958): 395-407.
  • [3] Coviello, E., Chan, A.B. and Lanckriet, G.R., Clustering hidden Markov models with variational HEM. The Journal of Machine Learning Research, 15.1 (2014): 697-747.
  • [4] Dua, D. and Graff, C., UCI Machine Learning Repository. [http://archive.ics.uci.edu/ml], (2019): Irvine, CA: University of California, School of Information and Computer Science.
  • [5] Graham, B., Sparse arrays of signatures for online character recognition. arXiv preprint, arXiv:1308.0371.
  • [6] Grabocka, J., Wistuba, M. and Schmidt-Thieme, L., Fast classification of univariate and multivariate time series through shapelet discovery. Knowledge and information systems, 49.2 (2016): 429-454.
  • [7] Guna, J., Humar, I. and Pogačnik, M., Intuitive gesture based user identification system. In 35th International Conference on Telecommunications and Signal Processing (TSP), IEEE, (2012): 629-633.
  • [8] Gyurkó, L.G., Lyons, T., Kontkowski, M. and Field, J., Extracting information from the signature of a financial data stream. In arXiv preprint, arXiv:1307.7244.
  • [9] Hambly, B. and Lyons, T., Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, 171.1 (2010): 109-167.
  • [10] Iosifidis, A., Tefas, A. and Pitas, I., Multidimensional sequence classification based on fuzzy distances and discriminant analysis. IEEE Transactions on Knowledge and Data Engineering, 25.11 (2012): 2564-2575.
  • [11] Jaakkola, T. and Haussler, D., Exploiting generative models in discriminative classifiers. In Advances in neural information processing systems, (1999): 487-493.
  • [12] Liao, S., Lyons, T., Yang, W. and Ni, H., Learning stochastic differential equations using RNN with log signature features. arXiv preprint, arXiv:1908.08286.
  • [13] Levin, D., Lyons, T. and Ni, H., Learning from the past, predicting the statistics for the future, learning an evolving system. arXiv preprint, arXiv:1309.0260.
  • [14] Lyons, T., and Qian, Z., System control and rough paths. Oxford University Press, 2002.
  • [15] Lyons, T., Caruana, M. and Lévy, T., Differential equations driven by rough paths. Springer Berlin Heidelberg, 2007.
  • [16] Milstein, G.N., Numerical integration of stochastic differential equations. Springer Science & Business Media, 313 (1994).
  • [17] Moore, P.J., Lyons, T., Gallacher, J. and Alzheimer’s Disease Neuroimaging Initiative, Using path signatures to predict a diagnosis of Alzheimer’s disease. PloS one, 14.9 (2019).
  • [18] Ree, R., Lie elements and an algebra associated with shuffles. Annals of Mathematics, (1958): 210-220.
  • [19] Reizenstein, J., The iisignature library: efficient calculation of iterated-integral signatures and log signatures. arXiv preprint, arXiv:1802.08252.
  • [20] Perina, A., Cristani, M., Castellani, U. and Murino, V., A new generative feature set based on entropy distance for discriminative classification. In International Conference on Image Analysis and Processing, (2009): 199-208, Springer, Berlin, Heidelberg.
  • [21] Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S. and Müller, K.R., A new discriminative kernel from probabilistic models. In Advances in Neural Information Processing Systems, (2002): 977-984.
  • [22] Xie, Z., Sun, Z., Jin, L., Ni, H. and Lyons, T., Learning spatial-semantic context with fully convolutional recurrent network for online handwritten Chinese text recognition. IEEE transactions on pattern analysis and machine intelligence, 40.8 (2017): 1903-1917.
  • [23] Wang, B., Liakata, M., Ni, H., Lyons, T., Nevado-Holgado, A.J. and Saunders, K., A Path Signature Approach for Speech Emotion Recognition. Interspeech 2019, ISCA (2019): 1661-1665.
  • [24] Williams, B.H., Toussaint, M. and Storkey, A.J., Extracting motion primitives from natural handwriting data. International Conference on Artificial Neural Networks, (2006): 634-643, Springer, Berlin, Heidelberg.
  • [25] Williams, B.H., Toussaint, M. and Storkey, A.J., A Primitive Based Generative Model to Infer Timing Information in Unpartitioned Handwriting Data. International Joint Conferences on Artificial Intelligence, (2007): 1119-1124.
  • [26] Williams, B., Toussaint, M. and Storkey, A.J., Modelling motion primitives and their timing in biologically executed movements. In Advances in Neural Information Processing Systems, (2008): 1609-1616.
  • [27] Wu Y., Saunders, K. and Lyons, T., Deriving information from missing data: implications for mood prediction in bipolar disorder. In preparation.
  • [28] Yang, W., Lyons, T., Ni, H., Schmid, C., Jin, L. and Chang, J., Leveraging the Path Signature for Skeleton-based Human Action Recognition. arXiv preprint, arXiv:1707.03993.
  • [29] Yu, Z., Moirangthem, D.S. and Lee, M., Continuous timescale long-short term memory neural network for human intent understanding. Frontiers in Neurorobotics, 11 (2017): 42.