Over the years, tools from topological data analysis (TDA) have been used to characterize the invariant structure of data obtained from a noisy sampling of an underlying metric space. Invariance learning is a fundamental problem in computer vision, since common transformations can diminish the performance of algorithms significantly. Past work in invariance learning has fallen into one of two classes. The first approach involves ad-hoc choices of features or metrics between features that offer some invariance to specific factors. However, this approach has suffered from a lack of generalizable solutions. The other approach is to increase the training size by collecting samples that capture all the variations of the data, so that the learning algorithm can implicitly marginalize out the variations. A similar effect can be achieved via simple data augmentation.
In this context, TDA has emerged as a surprisingly powerful tool to analyze underlying invariant properties of data before any contextual modeling assumptions or the need to extract actionable information kicks in. Generally speaking, TDA seeks to characterize the shape of high-dimensional data by quantifying various topological invariants such as connected components, cycles, high-dimensional holes, level-sets and monotonic regions of functions defined on the data. Topological invariants are those properties that do not change under smooth deformations like stretching, bending, and rotation, that is, deformations that do not tear or glue surfaces. We illustrate the connections between topological invariants and learning invariant representations for vision via three applications:
1) Point cloud shape analysis: Shape analysis of 3-dimensional (3D) point cloud data is a topic of major current interest due to the emergence of Light Detection and Ranging (LIDAR) based vision systems in autonomous vehicles. It has been a difficult problem to solve with contemporary methods (e.g. deep learning) due to the non-vectorial nature of the representations. While there is interest in trying to extend deep-net architectures to point-cloud data [53, 44, 72, 46, 32], the invariance one seeks is that of shape articulation, i.e. stretching, skewing, rotation of shape that does not alter the fundamental object class. These invariances are optimally defined in terms of topological invariants.
2) Video analysis: One of the long-standing problems in video analysis, specific to human action recognition, is to deal with variation in body type, execution style, and view-point changes. Work in this area has shown that temporal self-similarity matrices (SSMs) are a robust feature and provide general invariance to the above factors. Temporal self-similarities can be quantified by scalar field topological constructions defined over video features, leading to representations with encoded invariances that do not rely on brute-force training data.
3) Non-linear dynamical modeling: Many time-series analysis problems have been studied under the lens of non-linear dynamical modeling, including motion-capture analysis and wearable-based activity analysis. Results from dynamical systems theory (Takens’ embedding theorem) suggest that the placement-invariant property may be related to the topological properties of reconstructed dynamical attractors via delay-embeddings.
One of the prominent TDA tools is persistent homology. It provides a multi-scale summary of different homological features. This multi-scale information is represented using a persistence diagram (PD), a 2-dimensional (2D) Cartesian plane with a multi-set of points. For a point $(b, d)$ in the PD, a homological feature appears at scale $b$ and disappears at scale $d$. Due to the simplicity of PDs, there has been a surge of interest in using persistent homology for summarizing high-dimensional complex data, which has resulted in its successful application in several research areas [49, 63, 14, 19, 15, 31, 57, 66]. However, applying machine learning (ML) techniques on the space of PDs has always been a challenging task. The gold-standard approach for measuring the distance between PDs is the Bottleneck or the $p$-Wasserstein metric [45, 65]. However, a simple metric structure is not enough to use vector-based ML tools such as support vector machines (SVMs), neural networks, random forests, decision trees, principal component analysis and so on. These metrics are only stable under small perturbations of the data which the PDs summarize, and the complexity of computing distances between PDs grows in the order of $O(n^3)$, where $n$ is the number of points in the PD. Efforts have been made to overcome these problems by attempting to map PDs to spaces that are more suitable for ML tools [5, 12, 52, 48, 51, 3]. A comparison of some recent algorithms for machine learning over topological descriptors can be found in the literature. More recently, topological methods have also shown early promise in improving the performance of image-based classification algorithms in conjunction with deep learning.
Contributions: Using a novel perturbation framework, we propose a topological representation of PDs called Perturbed Topological Signature (PTS). To do this we first generate a set of perturbed PDs by randomly shifting the points in the original PD by a certain amount. A perturbed PD is analogous to extracting the PD from data that is subjected to topological noise. Next, we utilize a 2D probability density function (PDF) estimated by kernels on each of the perturbed PDs to generate a smooth functional representation. Finally, we simplify and summarize the end representation-space for the set of 2D PDFs to a point on the Grassmann manifold (a non-constantly curved manifold). The framework described above is illustrated in figure 1. We develop very efficient ML pipelines over these topological descriptors by leveraging the known metrics and statistical results on the Grassmann manifold. We also develop a stability proof of the Grassmannian representations w.r.t. the normalized geodesic distance over the Grassmannian and the Wasserstein metrics over PDs. Experiments show that our proposed framework recovers the performance lost by functional methods, while still enjoying orders of magnitude faster processing times than the classical $p$-Wasserstein and Bottleneck approaches.
Outline of the paper: Section 2 provides the necessary background on topological data analysis and the Grassmannian. Section 3 discusses related work, while section 4 describes the proposed framework and end representation of the PD for statistical learning tasks. Section 5 describes the experiments and results. Section 6 concludes the paper.
Persistent Topology: Consider a graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ on the high-dimensional point cloud, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ defines the neighborhood relations between the samples. To estimate the topological properties of the graph’s shape, a simplicial complex $\mathcal{S}$ is constructed over $\mathcal{G}$. We denote $\mathcal{S} = (\mathcal{G}, \Sigma)$, where $\Sigma$ is a family of non-empty level sets of $\mathcal{G}$, with each element $\sigma \in \Sigma$ being a simplex. These simplices are constructed using the $\epsilon$-neighborhood rule, $\epsilon$ being the scale parameter. In TDA, Betti numbers $\beta_i$ provide the rank of the homology group $H_i$. For instance, $\beta_0$ denotes the number of connected components, $\beta_1$ denotes the number of holes or loops, $\beta_2$ denotes the number of voids or trapped volumes, etc. They provide a good summary of a shape’s topological features. However, two shapes with the same Betti numbers can have very different PDs, since PDs summarize the birth vs death time information of each topological feature in a homology group. Birth time signifies the scale at which the group is formed and death time is the scale at which it ceases to exist. The difference between the death and the birth times is the lifetime of the homology group. Each PD is a multiset of points in $\mathbb{R}^2$ and is hence represented graphically as a set of points in a 2D plane. The diagonal, where birth equals death, is assumed to contain an infinite number of points since they correspond to groups of zero persistence.
We use the Vietoris-Rips (VR) construction denoted by VR($\mathcal{G}$, $\epsilon$) to obtain simplicial complexes from $\mathcal{G}$ for a given scale $\epsilon$. Algorithms for computing homological persistence, including an efficient dual variant that uses co-homology, are described in the literature. The VR construction obtains the topology of the distance function on the point cloud data. However, given a graph $\mathcal{G}$ and a function $g$ defined on the vertices, it is also possible to quantify the topology induced by $g$ on $\mathcal{G}$. For example, we may want to study the topology of the sub-level or super-level sets of $g$. This is referred to as scalar field topology since $g: \mathcal{V} \rightarrow \mathbb{R}$. A well-known application of this in vision is in 3D shape data, where the graph $\mathcal{G}$ corresponds to the shape mesh and $g$ is a function, such as heat kernel signature (HKS), defined on the mesh. The PD of the $H_0$ homology group of the super-level sets now describes the evolving segments of regions in the shape. For instance, if we compute the PD of the super-level sets induced by HKS in an octopus shape, we can expect to see eight highly persistent segments corresponding to the eight legs. This is because the HKS values are high at regions of high curvature in the shape. In scalar field constructions, the PDs can be obtained efficiently using the Union-Find algorithm by first sorting the nodes of $\mathcal{G}$ as per their function magnitude and keeping a trail of the corresponding connected components.
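The Union-Find procedure mentioned above can be sketched for the simplest scalar-field setting. This is a minimal illustration, not the implementation used in the paper: it assumes a path graph (consecutive samples of a 1D signal are neighbors), tracks only connected components of sub-level sets, and the function name `sublevel_persistence_0d` is hypothetical. Negating the input values gives the super-level set version, with births and deaths negated accordingly.

```python
def sublevel_persistence_0d(values):
    """0-dimensional sub-level set persistence of a function on a path graph.

    Assumption: nodes are indices 0..n-1 and edges join consecutive indices.
    A mesh would supply its own adjacency instead. Returns sorted
    (birth, death) pairs; the component containing the global minimum
    never dies and is reported with death = inf.
    """
    n = len(values)
    parent = [None] * n   # None marks a node that has not been added yet
    birth = {}            # root index -> birth value of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # Sweep the nodes in order of increasing function value.
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this merge.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # drop zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((birth[find(0)], float("inf")))
    return sorted(pairs)


if __name__ == "__main__":
    print(sublevel_persistence_0d([0, 3, 1, 4, 2, 5]))
```

Sorting dominates the cost, so the sweep runs in roughly O(n log n) time for n nodes, which is why scalar-field PDs are cheap to extract compared with a full VR filtration.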
Distance Metrics between PDs: PDs are invariant to rotations, translations and scaling of a given shape, and under continuous deformation conditions are invariant to slight permutations of the vertices [16, 17]. The two classical metrics to measure distances between PDs X and Y are the Bottleneck distance and the $p$-Wasserstein metric [45, 65]. They are appealing as they reflect any small changes such as perturbations of a measured phenomenon on the shape, which results in small shifts to the points in the persistence diagram. The Bottleneck distance is defined as $d_\infty(X, Y) = \inf_{\eta: X \rightarrow Y} \sup_{x \in X} \|x - \eta(x)\|_\infty$, with $\eta$ ranging over all bijections, where $\|\cdot\|_\infty$ is the $\infty$-norm. Similarly, the $p$-Wasserstein distance is defined as $d_p(X, Y) = \left(\inf_{\eta: X \rightarrow Y} \sum_{x \in X} \|x - \eta(x)\|_\infty^p\right)^{1/p}$. However, the complexity of computing distances between PDs with $n$ points is $O(n^3)$. These metrics also do not allow for easy computation of statistics and are unstable under large deformations.
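To make the two definitions concrete, the following sketch evaluates both distances by exhaustive search over matchings. It is only meant for diagrams with a handful of points (the search is factorial in the number of points) and is not the Hungarian-algorithm implementation used in the experiments. The helper names are hypothetical, and the handling of the diagonal follows the usual convention, assumed here, that any point may be matched to its nearest diagonal point at an $\infty$-norm cost of half its lifetime.

```python
from itertools import permutations


def _cost(a, b):
    """L-infinity cost between two entries; None stands for 'the diagonal'."""
    if a is None and b is None:
        return 0.0
    if a is None or b is None:
        birth, death = a if b is None else b
        return abs(death - birth) / 2.0  # distance to the nearest diagonal point
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def pd_distances(X, Y, p=1):
    """Return (Bottleneck, p-Wasserstein) by brute force over all matchings."""
    # Pad each diagram with diagonal slots so any point may stay unmatched.
    A = list(X) + [None] * len(Y)
    B = list(Y) + [None] * len(X)
    best_sup, best_sum = float("inf"), float("inf")
    for perm in permutations(range(len(B))):
        costs = [_cost(a, B[j]) for a, j in zip(A, perm)]
        best_sup = min(best_sup, max(costs, default=0.0))
        best_sum = min(best_sum, sum(c ** p for c in costs))
    return best_sup, best_sum ** (1.0 / p)


if __name__ == "__main__":
    X = [(0.0, 4.0), (1.0, 2.0)]
    Y = [(0.0, 5.0)]
    print(pd_distances(X, Y, p=1))  # (1.0, 1.5)
```

In the example, the long-lived point (0, 4) is matched to (0, 5) at cost 1, while the short-lived point (1, 2) is sent to the diagonal at cost 0.5.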
Grassmann Manifold: Let $n, p$ be two positive integers such that $n > p > 0$. The set of $p$-dimensional linear subspaces in $\mathbb{R}^n$ is called a Grassmann manifold, denoted by $G_{p,n}$. Each point on $G_{p,n}$ is represented by a basis, i.e. an $n \times p$ matrix $Y$ whose orthonormal columns span the subspace. The geometric properties of the Grassmannian have been used for various computer vision applications, such as object recognition, shape analysis, human activity modeling and classification, face and video-based recognition, etc. [9, 29, 64, 28]. We refer our readers to the following papers that provide a good introduction to the geometry, statistical analysis, and techniques for solving optimization problems on the Grassmann manifold [1, 23, 69, 13, 2].
Distance Metrics between Grassmann Representations: The minimal geodesic distance $d_G$ between two points $Y_1$ and $Y_2$ on the Grassmann manifold is the length of the shortest constant speed curve that connects these points. To do this, the velocity matrix $A$ or the inverse exponential map needs to be calculated, with the geodesic path starting at $Y_1$ and ending at $Y_2$. $A$ can be computed using a numerical approximation method described in the literature. The geodesic distance between $Y_1$ and $Y_2$ is represented by the following equation: $d_G(Y_1, Y_2) = \sqrt{\mathrm{trace}(A A^T)}$ or $d_G(Y_1, Y_2) = \sqrt{\mathrm{trace}(\theta^T \theta)}$. Here $\theta$ is the principal angle matrix between $Y_1, Y_2$ and can be computed as $\theta = \arccos(S)$, where $Y_1^T Y_2 = U S V^T$ is a singular value decomposition. To show the stability of the proposed PTS representations in section 4, we use the normalized geodesic distance represented by $d_{NG}(Y_1, Y_2) = \frac{1}{D} d_G(Y_1, Y_2)$, where $D$ is the maximum possible geodesic distance on $G_{p,n}$ [33, 39]. The symmetric directional distance $d_\Delta$ is another popular metric to compute distances between Grassmann representations with different $p$ [61, 67]. It is a widely used measure in areas like computer vision [56, 8, 7, 43, 70], communications, and applied mathematics. It is equivalent to the chordal metric and is defined as $d_\Delta(Y_1, Y_2) = \left(\max(k, l) - \sum_{i,j=1}^{k,l} (y_{1,i}^T y_{2,j})^2\right)^{1/2}$. Here, $k$ and $l$ are subspace dimensions for the orthonormal matrices $Y_1$ and $Y_2$ respectively, and $y_{1,i}$, $y_{2,j}$ are their columns. For all our experiments, we restrict ourselves to distance computations between same-dimension subspaces, i.e. $k = l$. The following papers propose methods to compute distances between subspaces of different dimensions [61, 67, 71].
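The Grassmannian metrics described above can be sketched with NumPy through the principal angles between two subspaces. This is an illustrative sketch under stated assumptions rather than the paper's code: the function names are hypothetical, both inputs are assumed to have orthonormal columns, and the normalizing constant assumes the ambient dimension is at least twice the subspace dimension, so that all principal angles can reach $\pi/2$.

```python
import numpy as np


def principal_angles(Y1, Y2):
    """Principal angles between the column spans of orthonormal Y1 and Y2."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # clip guards against round-off


def geodesic_distance(Y1, Y2):
    """2-norm of the principal angle vector."""
    return float(np.linalg.norm(principal_angles(Y1, Y2)))


def normalized_geodesic_distance(Y1, Y2):
    """Geodesic distance divided by its maximum (assumes n >= 2p)."""
    p = Y1.shape[1]
    d_max = (np.pi / 2.0) * np.sqrt(p)  # every principal angle equal to pi/2
    return geodesic_distance(Y1, Y2) / d_max


def symmetric_directional_distance(Y1, Y2):
    """sqrt(max(k, l) - sum of squared inner products between basis columns)."""
    k, l = Y1.shape[1], Y2.shape[1]
    value = max(k, l) - np.sum((Y1.T @ Y2) ** 2)
    return float(np.sqrt(max(value, 0.0)))


if __name__ == "__main__":
    I = np.eye(4)
    Y1 = I[:, [0, 1]]  # span(e1, e2)
    Y2 = I[:, [0, 2]]  # span(e1, e3): shares one direction with Y1
    print(geodesic_distance(Y1, Y2))               # pi / 2
    print(normalized_geodesic_distance(Y1, Y2))    # 1 / sqrt(2)
    print(symmetric_directional_distance(Y1, Y2))  # 1.0
```

Both distances need only a small SVD or matrix product whose size depends on the subspace dimension, not on the number of points in the underlying PD.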
3 Prior Art
PDs provide a compact multi-scale summary of the different topological features. The traditional metrics used to measure the distance between PDs are the Bottleneck and $p$-Wasserstein metrics [45, 65]. These measures are stable with respect to small continuous deformations of the topology of the inputs [16, 17]. However, they do poorly under large deformations. Further, many ML tools demand more than just a metric, so a compatible feature vector representation is useful. To address this need, researchers have resorted to transforming PDs to other suitable representations [5, 12, 52, 48, 51, 3]. Bubenik proposed persistence landscapes (PL), which are stable and invertible functional representations of PDs in a Banach space. A PL is a sequence of envelope functions defined on the points in PDs that are ordered on the basis of their importance. Bubenik’s main motivation for defining PLs was to derive a unique mean representation for a set of PDs, which is not necessarily obtained using Fréchet means. Their usefulness is however limited, as PLs can assign low importance to moderate-size homological features that generally possess high discriminating power.
Rouse et al. create a simple vector representation by overlaying a grid on top of the PD and counting the number of points that fall into each bin. This method is unstable, since a small shift in the points can result in a different feature representation. This idea has also appeared in other forms, some of which are described below. Pachauri et al. transform PDs into smooth surfaces by fitting Gaussians centered at each point in the PD. Reininghaus et al. create stable representations by taking a weighted sum of positive Gaussians at each point above the diagonal and mirroring the same below the diagonal but with negative Gaussians. Adams et al. design persistence images (PI) by defining a regular grid and obtaining the integral of the Gaussian-surface representation over the bins defined on each grid. Both PIs and the multi-scale kernel defined by Reininghaus et al. show stability with respect to the Wasserstein metrics and do well under small perturbations of the input data. They also weight the points using a weighting function, which can be chosen based on the problem. Prioritizing points with medium lifetimes was used by Bendich et al. to best identify the age of a human brain by studying its arterial geometry. Cohen-Steiner et al. suggested prioritizing points near the death-axis and away from the diagonal.
In this paper, we propose a unique perturbation framework that overcomes the need for selecting a weighting function. We consider a range of topological noise realizations one could expect to see, by perturbing the points in the PD. We summarize the perturbed PDs by creating smooth surfaces from them and consider a subspace of these surfaces, which naturally becomes a point on the Grassmann manifold. We show the effectiveness of our features in section 5 for different problems using data collected from different sensing devices. Compared to the $p$-Wasserstein and Bottleneck distances, the metrics defined on the Grassmannian are computationally less complex and the representations are independent of the number of points present in the PD. The proposed PTS representation is motivated by prior work in which the authors create a subspace representation of blurred faces and perform face recognition on the Grassmannian. Our framework also bears some similarities to work in which the authors use the square root representation of PDFs obtained from PDs.
4 Perturbed Topological Signatures
In this section we go through details of each step in our framework’s pipeline, illustrated in figure 1. In our experiments we transform the axes of the PD from , with .
Create a set of Perturbed PDs: We randomly perturb a given PD to create a set of perturbed PDs. Each of the perturbed PDs has its points randomly displaced by a certain amount compared to the original. The set of randomly perturbed PDs retains the same topological information of the input data as the original PD, but together the perturbed PDs capture the probable variations of the input data when subjected to topological noise. We constrain the extent of perturbation of the individual points in the PD to ensure that the topological structure of the data being analyzed is not abruptly changed.
Convert Perturbed PDs to 2D PDFs: We transform the initial PD and its set of perturbed PDs to a set of 2D PDFs. We do this via kernel density estimation: by fitting a Gaussian kernel function with zero mean and standard deviation $\sigma$ at each point in the PD, and then normalizing the 2D surface. The obtained PDF surface is discretized over a grid similar to the approach of Rouse et al. The standard deviation (also known as the bandwidth parameter) of the Gaussian is not known a priori and is fine-tuned to get the best results. A multi-scale approach can also be employed by generating multiple surfaces using a range of different bandwidth parameters for each of the PDs and still obtain favorable results. Unlike other topological descriptors that use a weighting function over their functional representations of PDs [51, 3], we give equal importance to each point in the PD and do not resort to any weighting function. Adams et al. prove the stability of persistence surfaces obtained using general and Gaussian distributions, together with a weighting function, with respect to the 1-Wasserstein distance between PDs in [3, Thm. 4, 9]. For Gaussian distributions, both $L_1$ and $L_\infty$ distances between persistence surfaces are stable with respect to the 1-Wasserstein distance between PDs.
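The kernel density step can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name `persistence_surface` is hypothetical, the PD coordinates are assumed to be already rescaled to the unit square, and the default bandwidth `sigma=0.04` is a placeholder rather than the value used in the experiments. Every point receives the same unweighted Gaussian, and the surface is normalized to sum to one over the grid.

```python
import numpy as np


def persistence_surface(pd_points, grid_size=50, sigma=0.04, extent=(0.0, 1.0)):
    """Discretized, unweighted Gaussian KDE surface of a persistence diagram.

    pd_points : iterable of 2D points (assumed rescaled into `extent`)
    grid_size : number of samples per axis (hypothetical default)
    sigma     : Gaussian bandwidth (hypothetical default, tune per dataset)
    """
    pts = np.asarray(pd_points, dtype=float).reshape(-1, 2)
    axis = np.linspace(extent[0], extent[1], grid_size)
    gx, gy = np.meshgrid(axis, axis, indexing="xy")
    surface = np.zeros((grid_size, grid_size))
    for x0, y0 in pts:
        # Equal importance for every point: no weighting function is applied.
        surface += np.exp(-((gx - x0) ** 2 + (gy - y0) ** 2) / (2.0 * sigma ** 2))
    total = surface.sum()
    return surface / total if total > 0 else surface


if __name__ == "__main__":
    surface = persistence_surface([(0.2, 0.5), (0.6, 0.3)])
    print(surface.shape, surface.sum())
```

A multi-scale variant would simply call the function with several values of `sigma` and keep all resulting surfaces.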
Projecting 2D PDFs to the Grassmannian: Let $f(x, y)$ be an unperturbed persistence surface, and let $f(x+u, y+v)$ be a randomly shifted perturbation. Under assumptions of small perturbations, we have using Taylor’s theorem:
$$f(x+u, y+v) - f(x, y) \approx [f_x, f_y]\,[u, v]^T$$
Now, in the following, we interpret $\approx$ as an equality, enabling us to stack together the same equation for all $(x, y)$, to get a matrix-vector form
$$\overline{f^{u,v}} - \overline{f} = [\overline{f_x}, \overline{f_y}]_{N \times 2}\,[u, v]^T$$
where $f^{u,v}$ denotes the shifted surface $f(x+u, y+v)$ and the overline indicates a discrete vectorization of the 2D functions. Here, $N$ is the total number of discretized samples from the $(x, y)$ plane. Now consider the set of all small perturbations of $f$, i.e. $\overline{f^{u,v}} - \overline{f}$, over all $[u, v]$. It is easy to see that this set is just a 2D linear-subspace in $\mathbb{R}^N$ which coincides with the column-span of $[\overline{f_x}, \overline{f_y}]$. For a more general affine-perturbation model, we can show that the required subspace corresponds to a 6-dimensional (6D) linear subspace, corresponding to the column-span of the matrix $[\overline{f_x}, \overline{f_y}, \overline{x f_x}, \overline{y f_x}, \overline{x f_y}, \overline{y f_y}]$. More details on this can be found in the supplement. In implementation, we perturb a given PD several times using random offsets, compute their persistence surfaces, use singular value decomposition (SVD) on the stacked matrix of perturbations, then select the $p$ largest left singular vectors, resulting in an $N \times p$ orthonormal matrix. Also, we vary the dimension of the subspace across a range of values. Since the linear span of our matrix can be further identified as a point on the Grassmann manifold, we adopt metrics defined over the Grassmannian to compare our perturbed topological signatures.
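The implementation steps just described (perturb, compute surfaces, stack, SVD, keep the leading left singular vectors) can be put together in a short sketch. All names and default values below are hypothetical and chosen only for illustration: the perturbation here is an independent uniform shift per point bounded by `max_shift`, the PD is assumed non-empty and rescaled to the unit square, and the bandwidth and number of perturbations are placeholders rather than the settings used in the experiments.

```python
import numpy as np


def gaussian_surface(points, axis, sigma):
    """Normalized sum of equal-weight Gaussians sampled on a square grid."""
    gx, gy = np.meshgrid(axis, axis, indexing="xy")
    d2 = (gx[..., None] - points[:, 0]) ** 2 + (gy[..., None] - points[:, 1]) ** 2
    s = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=-1)
    return s / s.sum()


def perturbed_topological_signature(pd_points, p=2, n_perturb=40, max_shift=0.05,
                                    grid_size=50, sigma=0.04, seed=0):
    """Return an (N x p) orthonormal basis, i.e. a point on the Grassmannian.

    Assumptions: pd_points is non-empty and lies in [0, 1]^2; each perturbed
    PD shifts every point by an independent uniform offset in
    [-max_shift, max_shift] along both axes.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(pd_points, dtype=float).reshape(-1, 2)
    axis = np.linspace(0.0, 1.0, grid_size)

    # Column 0 is the surface of the original PD; the rest are perturbations.
    columns = [gaussian_surface(pts, axis, sigma).ravel()]
    for _ in range(n_perturb):
        shift = rng.uniform(-max_shift, max_shift, size=pts.shape)
        shifted = np.clip(pts + shift, 0.0, 1.0)  # keep points inside the grid
        columns.append(gaussian_surface(shifted, axis, sigma).ravel())

    stack = np.stack(columns, axis=1)  # N x (n_perturb + 1), N = grid_size ** 2
    U, _, _ = np.linalg.svd(stack, full_matrices=False)
    return U[:, :p]                    # p leading left singular vectors


if __name__ == "__main__":
    Y = perturbed_topological_signature([(0.2, 0.5), (0.6, 0.3), (0.4, 0.8)], p=2)
    print(Y.shape)  # (2500, 2)
```

The returned basis has a fixed size of `grid_size ** 2` by `p` regardless of how many points the PD contains, which is what makes later comparisons independent of the PD size.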
Stability of Grassmannian metrics w.r.t. Wasserstein: The next natural question to consider is whether the metrics over the Grassmannian for the perturbed stack are in any way related to the Wasserstein metric over the original PDs. Let the column span of be represented by . Let be two persistence surfaces, then are the subspaces spanned by and respectively. Following a result due to Ji-Guang , the normalized geodesic distance between and is upper bounded as follows: . Here, is the spectral norm of the pseudo-inverse of , is the Frobenius norm, and . In the supplement, a full derivation is given, showing , where is the -Wasserstein metric between the original unperturbed PDs, is the maximum number of points in a given PD (a dataset dependent quantity), refers to the total number of discrete samples from and . This is the critical part of the stability proof. The remaining part requires us to upper bound the spectral norm . The spectral-norm of the pseudo-inverse of , i.e. , is simply the inverse of the smallest singular-value of , which in turn corresponds to the square-root of the smallest eigenvalue of
Given that , becomes the 2D structure-tensor of a Gaussian mixture model (GMM). While we are not aware of any results that lower-bound the eigenvalues of a 2D GMM's structure-tensor, in the supplement we show an approach for 1D GMMs that indicates that the smallest eigenvalue can indeed be lower-bounded, if the standard-deviation is upper-bounded. For example, a non-trivial lower-bound is derived for in the supplement. It is inversely proportional to the number of components in the GMM. We used for all our experiments. The approach in the supplement is shown for 1D GMMs, and we posit that a similar approach applies for the 2D case, but it is cumbersome. In empirical tests, we find that even for 2D GMMs defined over the grid , with , the spectral-norms are always upper-bounded. In general, we find , where is a positive monotonically decreasing function of in the domain , and is the number of components in the GMM (points in a given PD). If we denote and as the maximum allowable number of components in the GMM (max points in any PD in given database) and the maximum standard deviation respectively, an upper bound readily develops. Thus, we have
Please refer to the supplement for detailed derivation and explanation of the various constants in the above bound. We note that even though the above shows that the normalized Grassmannian geodesic distance over the perturbed topological signatures is stable w.r.t the -Wasserstein metric over PDs, it still relies on knowledge of the maximum number of points in any given PD across the entire dataset , and also on the sampling of the 2D grid.
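The linear-algebra fact used in the last part of the stability argument, namely that the spectral norm of a pseudo-inverse equals the reciprocal of the smallest singular value, which is the reciprocal square root of the smallest eigenvalue of the Gram matrix, can be checked numerically. The matrix below is a random full-column-rank stand-in chosen only for illustration; it is not the gradient matrix of an actual persistence surface.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))  # hypothetical stand-in, full column rank

# Spectral norm (largest singular value) of the pseudo-inverse of X.
spectral_norm_pinv = np.linalg.norm(np.linalg.pinv(X), ord=2)

# Smallest singular value of X and smallest eigenvalue of X^T X.
sigma_min = np.linalg.svd(X, compute_uv=False).min()
lambda_min = np.linalg.eigvalsh(X.T @ X).min()

print(spectral_norm_pinv, 1.0 / sigma_min, 1.0 / np.sqrt(lambda_min))
```

The three printed values agree up to floating-point error, so any lower bound on the smallest eigenvalue of the Gram matrix translates directly into an upper bound on the spectral norm of the pseudo-inverse.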
In this section we first show the robustness of the PTS descriptor to different levels of topological noise using a sample of shapes from the SHREC 2010 dataset . We then test the proposed framework on three publicly available datasets: SHREC 2010 shape retrieval dataset , IXMAS multi-view video action dataset  and motion capture dataset . We briefly go over the details of each dataset, and describe the experimental objectives and procedures followed. Finally, we show the computational benefits of comparing different PTS representations using the and metrics, over the classical -Wasserstein and Bottleneck metrics used between PDs.
5.1 Robustness to Topological Noise
We conduct this experiment on randomly chosen shapes from the SHREC 2010 dataset . The dataset consists of 200 near-isometric watertight 3D shapes with articulating parts, equally divided into 10 classes. Each 3D mesh is simplified to 2000 faces. The 10 shapes used in the experiment are denoted as , . The minimum bounding sphere for each of these shapes has a mean radius of 54.4 with standard deviation of 3.7 centered at with coordinate-wise standard deviations of respectively. Next, we generate 100 sets of shapes, infused with topological noise. Topological noise is applied by changing the position of the vertices of the triangular mesh face, which results in changing its normal. We do this by applying a zero-mean Gaussian noise to the vertices of the original shape, with the standard deviation varied from 0.1 to 1 in steps of 0.1. For each shape , its 10 noisy shapes with different levels of topological noise are denoted by .
|Method||0.1||0.2||0.3||0.4||0.5||0.6||0.7||0.8||0.9||1.0||Average Accuracy (%)||Average Time Taken ( sec)|
|PSSK - SVM||100.00||100.00||100.00||100.00||100.00||100.00||91.60||90.00||89.80||89.00||96.04||4.55|
|PWGK - SVM||100.00||100.00||100.00||100.00||100.00||99.90||99.40||95.90||87.50||73.30||95.60||0.17|
Table 1: Comparison of methods for correctly classifying the topological representations of noisy shapes to their original shape. Columns 0.1 to 1.0 give the accuracy (%) at each noise standard deviation.
A 17-dimensional scale-invariant heat kernel signature (SIHKS) spectral descriptor function is calculated on each shape, and PDs are extracted for each dimension of this function, resulting in 17 PDs per shape. The PDs are passed through the proposed framework to get the respective PTS descriptors. The 3D mesh, PD and PTS representation for 4 of the 10 shapes (shown in figure 3) and their respective noisy variants (Gaussian noise with standard deviation 1.0) are shown in figure 2. In this experiment, we evaluate the robustness of our proposed feature by correctly classifying shapes with different levels of topological noise. Displacement of vertices by adding varying levels of topological noise, interclass similarities and intraclass variations of the shapes make this a challenging task. A simple unbiased one nearest neighbor (1-NN) classifier is used to classify the topological representations of the noisy shapes in each set. The classification results are averaged over the 100 sets and tabulated in table 1. We also compare our method to other TDA-ML methods like PI , PL , PSSK  and PWGK . For PTS, we set the discretization of the grid . For PIs we chose the linear ramp weighting function, set and for the Gaussian kernel function, same as our PTS feature. For PLs we use the first landscape function with 500 elements. A linear SVM classifier is used instead of the 1-NN classifier for the PSSK and PWGK methods. From table 1, the -Wasserstein and Bottleneck distances over PDs perform poorly even at low levels of topological noise. However, PDs with -Wasserstein distance and PTS representations with , metrics show stability and robustness to even high noise levels. Moreover, comparing two PTS features using either or is at least two orders of magnitude faster on average than computing the -Wasserstein distance, as seen in table 1. We also observe that comparison of PIs, PLs and PWGK is an order of magnitude faster than comparing PTS features. However, these methods show significantly lower performance than the proposed feature at correctly classifying noisy shapes as the noise level increases.
5.2 3D Shape Retrieval
In this experiment, we consider all 10 classes consisting of 200 shapes from the SHREC 2010 dataset, and extract PDs using 3 different spectral descriptor functions defined on each shape, namely: heat kernel signature (HKS) , wave kernel signature (WKS) , and SIHKS . HKS and WKS are used to capture the microscopic and macroscopic properties of the 3D mesh surface, while SIHKS descriptor is the scale-invariant version of HKS.
Using the PTS descriptor we attempt to encode invariances to shape articulations such as rotation, stretching, skewing. For the task of 3D shape retrieval we use a 1-NN classifier to evaluate the performance of the PTS representation against other methods [12, 51, 3, 40, 38]. A linear SVM classifier is used to report the classification accuracy of the PSSK and PWGK methods. Li et al. report best results after carefully selecting weights to normalize the distance combinations of their BoF+PD and ISPM+PD methods. As in , we also use the three spectral descriptors and combine our PTS representations for each descriptor. PIs, PLs and PTS features are also designed the same way as described before. The results reported in table 2 show that the PTS feature (with subspace dimension ) alone using the metric achieves an accuracy of 99.50 %, outperforming other methods. The average classification result of the PTS feature on varying the subspace dimension is 98.42 ± 0.4 % and 98.72 ± 0.25 % using and metrics respectively, thus displaying its stability with respect to the choice of .
|Method||BoF ||SSBoF ||ISPM ||PD (Bottleneck) ||PD (1-Wasserstein)||PD (2-Wasserstein)||BoF+PD ||ISPM+PD ||PI () ||PI () ||PI () ||PL () ||PL () ||PL () ||PSSK (SVM) ||PWGK (SVM) ||PTS||PTS|
|1-NN Accuracy (%)||97.00||97.50||97.50||98.50||98.50||98.50||98.50||99.00||88.50||87.50||89.50||95.00||95.00||95.00||98.50||99.00||99.00||99.50|
5.3 View-invariant Activity Analysis
The IXMAS dataset contains video and silhouette sequences of 11 action classes, performed 3 times by 10 subjects from five different camera views. The 11 classes are as follows - check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick, pick up. Sample frames across 5 views for 2 actions are shown in figure 4. We consider only the silhouette information in the dataset for our PTS representations. For each frame in an action sequence we extract multi-scale shape distributions which are referred to as A3M, D1M, D2M and D3M, over the 2D silhouettes . The multi-scale shape distribution feature captures the local to global changes in different geometric properties of a shape. For additional details about this feature, please see: [58, 59, 47].
For frames in an action sequence and bins in each shape distribution at a certain scale, an matrix representing the action is obtained. Treating the frames as nodes, scalar field topological PDs are calculated across each column, resulting in PDs. PDs capture the structural changes along each bin in the distributions. We select 5 different scales for the multi-scale shape features, giving us PDs per action which are passed through the proposed pipeline resulting in PTS features. PTS features try to encode the possible changes with respect to view-point variation, body-type and execution style. To represent the entire action as a point on the Grassmannian, we select the first two largest singular vectors from each of the PTS descriptors, apply SVD and choose 20 largest components.
|Method||Same Camera Accuracy (%)||Same Camera Accuracy, mean ± std (%)||Any-To-Any Accuracy (%)||Any-To-Any Accuracy, mean ± std (%)|
|SSM-HOG + PTS-HOG||69.01||-||55.13||-|
|SSM-HOG + PTS-A3M||73.15||72.06 ± 1.14||58.36||56.96 ± 1.05|
|SSM-HOG + PTS-D1M||74.25||73.26 ± 1.53||59.26||57.67 ± 1.19|
|SSM-HOG + PTS-D2M||74.92||74.22 ± 1.36||59.77||58.19 ± 1.03|
|SSM-HOG + PTS-D3M||76.18||73.72 ± 1.13||60.33||58.72 ± 1.11|
|SSM-OF + PTS-A3M||72.02||70.25 ± 1.06||58.85||57.48 ± 0.93|
|SSM-OF + PTS-D1M||73.67||71.62 ± 1.17||59.56||57.81 ± 1.05|
|SSM-OF + PTS-D2M||73.45||72.53 ± 1.12||60.60||59.05 ± 1.11|
|SSM-OF + PTS-D3M||74.41||72.21 ± 1.03||61.51||59.33 ± 1.13|
|SSM-HOG-OF + PTS-A3M||79.30||78.05 ± 0.71||64.93||63.58 ± 0.65|
|SSM-HOG-OF + PTS-D1M||79.61||79.03 ± 0.96||65.39||64.27 ± 0.65|
|SSM-HOG-OF + PTS-D2M||79.86||79.35 ± 0.76||65.70||64.62 ± 0.83|
|SSM-HOG-OF + PTS-D3M||81.12||79.49 ± 0.99||66.16||64.99 ± 0.79|
To perform multi-view action recognition, we train non-linear SVMs using the Grassmannian RBF kernel, . Here, , are points on the Grassmannian and is the Frobenius norm. We set in our implementations. Junejo et al. train non-linear SVMs using the kernel over the SSM-based descriptors and follow a one-against-all approach for multi-class classification . We follow the same approach and use a joint weighted kernel between their SSM kernel and our kernel, i.e. , where . The SSM-based descriptors are computed using the histogram of gradients (HOG), optical flow (OF) and fusion of HOG, OF features. The classification results are tabulated in table 3. Apart from reporting results of PTS representations obtained using the multi-scale shape distributions, we also show recognition results of PTS feature computed over the HOG descriptor (PTS-HOG). We see significant improvement in the results by fusing different PTS features with the SSM-based descriptor. We also tabulate the mean and standard deviation values for all classification results obtained after varying from 0.1 to 1.0 and subspace dimension from 1 to 10. These results demonstrate the flexibility and stability associated with the proposed PTS topological descriptor.
5.4 Dynamical Analysis on Motion Capture Data
|Method||Accuracy (%)||Average Time Taken ( sec)|
|PD (Wasserstein) NN||93.68||22.00|
|Hilbert Sphere NN||89.87||590.00|
|Hilbert Sphere PGA+SVM||91.68||-|
|PTS - NN||85.96||0.19|
|PTS - SVM||91.92||-|
This dataset consists of human body joint motion capture sequences in 3D, where each sequence contains 57 trajectories (19 joint trajectories along 3 axes). There are 5 action classes (dance, jump, run, sit and walk), containing 31, 14, 30, 35 and 48 sequences respectively, for 158 sequences in total. Homology group PDs are computed over the reconstructed attractor for each trajectory, resulting in 57 PDs per action, and the corresponding PTS feature is also extracted. We report the average classification performance over 100 random splits, with each split having 25 random test samples (5 samples from each class) and the remaining 133 training samples. For SVM classification, we train non-linear SVMs using the projection kernel.
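The evaluation protocol described above can be sketched as follows. The class sizes come from the text; the function names, the random seed and the `evaluate` callback (standing in for whichever classifier is trained and tested on a split) are hypothetical.

```python
import numpy as np

# Class sizes as stated in the text (158 sequences in total).
CLASS_SIZES = {"dance": 31, "jump": 14, "run": 30, "sit": 35, "walk": 48}


def make_labels(class_sizes):
    """Expand per-class counts into one label per sequence."""
    return np.concatenate(
        [np.full(size, name) for name, size in class_sizes.items()]
    )


def random_split(labels, per_class_test, rng):
    """Draw per_class_test test indices from every class; the rest train."""
    test_parts = []
    for name in np.unique(labels):
        members = np.flatnonzero(labels == name)
        test_parts.append(rng.choice(members, size=per_class_test, replace=False))
    test_idx = np.sort(np.concatenate(test_parts))
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx


def average_accuracy(labels, evaluate, n_splits=100, per_class_test=5, seed=0):
    """Mean of evaluate(train_idx, test_idx) over independent random splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        train_idx, test_idx = random_split(labels, per_class_test, rng)
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))


labels = make_labels(CLASS_SIZES)
train_idx, test_idx = random_split(labels, 5, np.random.default_rng(0))
```

Drawing 5 test samples from each of the 5 classes gives 25 test and 133 training sequences per split, matching the counts quoted in the text.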
The results are tabulated in Table 4. PTS features achieve a classification accuracy of 85.96% and 91.92% using the 1-NN and SVM classifiers respectively. While these results are slightly lower than those obtained with the Wasserstein metric, the proposed descriptor is more than 2 orders of magnitude faster to compare (0.19 versus 22.00 in the time units of Table 4). Topological properties of dynamic attractors for the analysis of time-series data have been studied and applied to tasks such as wheeze detection and pulse pressure wave analysis (both by Emrani et al.), and such applications have been surveyed in the literature. We refer readers to these works for further exploration.
5.5 Time-complexity of Comparing Topological Representations
|Dataset||Average Number of Points in PD||Average Time Taken ( sec)||Subspace Dimension () of PTS Feature|
|SHREC 2010||71||256.00 (Kerber et al.: 219.00)||450.00 (Kerber et al.: 237.00)||36.00 (Kerber et al.: 295.00)||2.30||1.60||10|
|Motion Capture||27||22.00||22.00||2.72||0.24||0.19||1|
All experiments are carried out on a standard Intel i7 CPU with 32 GB of working memory, using MATLAB 2016b. We used the Hungarian algorithm to compute the Bottleneck and Wasserstein distances between PDs. Kerber et al. take advantage of the geometric structure of the input graph and propose geometric variants of the above metrics, thereby showing significant improvements in runtime performance when comparing PDs having several thousand points. However, extracting PDs for most real datasets of interest in this paper does not result in more than a few hundred points. For example, on average we observe 71, 23 and 27 points in each PD for the SHREC 2010, IXMAS and motion capture datasets respectively. The Hungarian algorithm incurs similar computations in this setting, as shown in Table 5. The metrics used to compare different PTS representations (grid size = 50) are fast and computationally less complex than the Bottleneck and Wasserstein distance measures. The average time taken to compare two topological signatures (PD or PTS) for each of the datasets is tabulated in Table 5. The table also shows the average number of points seen per PD and the subspace dimension used for the PTS representation.
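To make the cost of the PD metrics concrete, the following sketch computes a Wasserstein distance between two small PDs by solving an assignment problem on a cost matrix augmented with diagonal matches. It is an illustration under stated assumptions, not the MATLAB code used for the timings: it uses the L-infinity ground distance, treats the order `p` as a free parameter, uses made-up example diagrams, and calls SciPy's `linear_sum_assignment`, which solves the same linear assignment problem that the Hungarian algorithm solves.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_pd(pd_a, pd_b, p=1.0):
    """p-Wasserstein distance between two PDs given as (birth, death) pairs.

    Points may be matched to each other or to the diagonal. The ground
    distance is L-infinity (an assumption made for this sketch).
    """
    a = np.asarray(pd_a, dtype=float).reshape(-1, 2)
    b = np.asarray(pd_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    if n + m == 0:
        return 0.0

    # L-infinity distance from each point to its projection on the diagonal.
    diag_a = (a[:, 1] - a[:, 0]) / 2.0
    diag_b = (b[:, 1] - b[:, 0]) / 2.0

    # Augmented (n + m) x (n + m) cost matrix:
    #   rows 0..n-1 are points of A, rows n.. are diagonal slots;
    #   cols 0..m-1 are points of B, cols m.. are diagonal slots.
    cost = np.zeros((n + m, n + m))
    cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2) ** p
    cost[:n, m:] = (diag_a ** p)[:, None]   # point of A sent to the diagonal
    cost[n:, :m] = (diag_b ** p)[None, :]   # point of B sent to the diagonal
    # The bottom-right block stays zero: diagonal matched to diagonal.

    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum() ** (1.0 / p))


# Hypothetical diagrams, used only to exercise the function.
pd_a = [(0.0, 4.0), (1.0, 1.5)]
pd_b = [(0.0, 3.5)]
dist = wasserstein_pd(pd_a, pd_b, p=1.0)
```

The augmented matrix has n + m rows and columns for diagrams with n and m points, so the cost of the matching grows with the number of points in the PDs, which is the dependence the text contrasts with the PTS metrics.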
|Average Time Taken ( sec)|
|Grid size (k)||5||10||20||40||60||80||100||200||300||400||500|
Table 6 shows how the average time taken to compare PTS features varies with the grid size (k) of the 2D PDF. Here too, the average time is computed over 3000 distance calculations between PTS features computed from PDs of the SHREC 2010 dataset. We observe that the time taken to compare two PTS features at a large grid size is two orders of magnitude greater than at a small grid size. However, these times are still smaller than or on par with the times reported using the Wasserstein and Bottleneck distances between PDs, as seen in Table 5. For all our experiments we set k = 50 for our PTS representations and, as shown in Table 5, the times reported for the PTS metrics are at least an order of magnitude faster than the Bottleneck distance and two orders of magnitude faster than the Wasserstein metrics.
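The dependence on grid size can be seen from the shapes of the matrices involved. In the sketch below, a set of k-by-k surfaces is vectorized and summarized by a p-dimensional subspace of a k²-dimensional space, and two such subspaces are compared through their principal angles. The use of a truncated SVD to obtain the basis, the geodesic (arc-length) distance as the example metric, the random stand-in surfaces and all function names are assumptions made for illustration.

```python
import numpy as np


def subspace_from_surfaces(surfaces, p):
    """Summarize surfaces of shape (count, k, k) by an orthonormal (k*k, p) basis."""
    s = np.asarray(surfaces, dtype=float)
    columns = s.reshape(s.shape[0], -1).T          # one vectorized surface per column
    u, _, _ = np.linalg.svd(columns, full_matrices=False)
    return u[:, :p]                                # top-p left singular vectors


def principal_angles(x, y):
    """Principal angles between the column spaces of orthonormal x and y."""
    cosines = np.linalg.svd(x.T @ y, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))  # clip guards against round-off


def geodesic_distance(x, y):
    """Arc-length distance on the Grassmannian: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(x, y)))


# Hypothetical stand-ins for perturbed surfaces on a 50 x 50 grid.
rng = np.random.default_rng(0)
k, p = 50, 10
x = subspace_from_surfaces(rng.random((40, k, k)), p)
y = subspace_from_surfaces(rng.random((40, k, k)), p)
dist = geodesic_distance(x, y)
```

The matrices being compared have k² rows and p columns, so the comparison cost grows with the grid size and the subspace dimension but does not involve the number of points in the original PD.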
6 Conclusion and Discussion
We believe that a perturbed realization of a PD computed over a high-dimensional shape/graph is robust to topological noise affecting the original shape. Depending on the type of data and application, topological noise can imply different types of variations, such as articulation in 3D shape point cloud data, or diversity in body structure, execution style and view-point pertaining to human actions in video analysis. In this paper, we propose a novel topological representation called PTS that is obtained using a perturbation approach, taking first steps towards robust invariant learning with topological features. We obtain perturbed persistence surfaces and summarize them as a point on the Grassmann manifold, in order to utilize the different distance metrics and Mercer kernels defined for the Grassmannian. The metrics used to compare different Grassmann representations are computationally cheap as they do not depend on the number of points present in the PD, in contrast to the Bottleneck and Wasserstein metrics, which do. The PTS feature offers flexibility in choosing the weighting function, kernel function and perturbation level, which makes it easily adaptable to different types of real-world data. It can also be easily integrated with various ML tools, which is not easily achievable with PDs. Future directions include fusion with contemporary deep-learning architectures to exploit the complementarity of both paradigms. We expect that topological methods will push the state-of-the-art in invariant representations, where the requisite invariance is incorporated using a topological property of an appropriately redefined metric space. Additionally, the proposed methods may help open new feature-pooling options in deep-nets.
References
-  Absil, P.A., Mahony, R., Sepulchre, R.: Riemannian geometry of grassmann manifolds with a view on algorithmic computation. Acta Applicandae Mathematica 80(2), 199–220 (2004)
-  Absil, P.A., Mahony, R., Sepulchre, R.: Optimization algorithms on matrix manifolds. Princeton University Press (2009)
-  Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., Ziegelmeier, L.: Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research 18(8), 1–35 (2017)
-  Ali, S., Basharat, A., Shah, M.: Chaotic invariants for human action recognition. In: IEEE 11th International Conference on Computer Vision (ICCV). pp. 1–8 (2007)
-  Anirudh, R., Venkataraman, V., Natesan Ramamurthy, K., Turaga, P.: A riemannian framework for statistical analysis of topological persistence diagrams. In: The IEEE Conference on Computer Vision and Pattern Recognition Workshops. pp. 68–76 (2016)
-  Aubry, M., Schlickewei, U., Cremers, D.: The wave kernel signature: A quantum mechanical approach to shape analysis. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops). pp. 1626–1633 (2011)
-  Bagherinia, H., Manduchi, R.: A theory of color barcodes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops). pp. 806–813 (2011)
-  Basri, R., Hassner, T., Zelnik-Manor, L.: Approximate nearest subspace search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(2), 266–278 (2011)
-  Begelfor, E., Werman, M.: Affine invariance revisited. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). vol. 2, pp. 2087–2094. IEEE (2006)
-  Bendich, P., Marron, J.S., Miller, E., Pieloch, A., Skwerer, S.: Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10(1), 198–218 (2016)
-  Bertsekas, D.P.: A new algorithm for the assignment problem. Mathematical Programming 21(1), 152–171 (1981)
-  Bubenik, P.: Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research 16(1), 77–102 (2015)
-  Chikuse, Y.: Statistics on special manifolds, vol. 174. Springer Science & Business Media (2012)
-  Chintakunta, H., Gentimis, T., Gonzalez-Diaz, R., Jimenez, M.J., Krim, H.: An entropy-based persistence barcode. Pattern Recognition 48(2), 391–401 (2015)
-  Chung, M.K., Bubenik, P., Kim, P.T.: Persistence diagrams of cortical surface data. In: International Conference on Information Processing in Medical Imaging. pp. 386–397. Springer (2009)
-  Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete & Computational Geometry 37(1), 103–120 (2007)
-  Cohen-Steiner, D., Edelsbrunner, H., Harer, J., Mileyko, Y.: Lipschitz functions have Lp-stable persistence. Foundations of Computational Mathematics 10(2), 127–139 (2010)
-  Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 2nd edition. The MIT Press (2001)
-  Dabaghian, Y., Mémoli, F., Frank, L., Carlsson, G.: A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8(8), 1–14 (2012)
-  De Silva, V., Morozov, D., Vejdemo-Johansson, M.: Dualities in persistent (co) homology. Inverse Problems 27(12), 124003 (2011)
-  Dey, T.K., Mandal, S., Varcho, W.: Improved Image Classification using Topological Persistence. In: Hullin, M., Klein, R., Schultz, T., Yao, A. (eds.) Vision, Modeling & Visualization. The Eurographics Association (2017). https://doi.org/10.2312/vmv.20171272
-  Draper, B., Kirby, M., Marks, J., Marrinan, T., Peterson, C.: A flag representation for finite collections of subspaces of mixed dimensions. Linear Algebra and its Applications 451, 15–32 (2014)
-  Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications 20(2), 303–353 (1998)
-  Edelsbrunner, H., Harer, J.: Computational topology: an introduction. American Mathematical Society (2010)
-  Edelsbrunner, H., Letscher, D., Zomorodian, A.: Topological persistence and simplification. Discrete & Computational Geometry 28(4), 511–533 (2002)
-  Emrani, S., Gentimis, T., Krim, H.: Persistent homology of delay embeddings and its application to wheeze detection. IEEE Signal Process. Lett. 21(4), 459–463 (2014). https://doi.org/10.1109/LSP.2014.2305700
-  Emrani, S., Saponas, T.S., Morris, D., Krim, H.: A novel framework for pulse pressure wave analysis using persistent homology. IEEE Signal Process. Lett. 22(11), 1879–1883 (2015). https://doi.org/10.1109/LSP.2015.2441068
-  Gopalan, R., Taheri, S., Turaga, P., Chellappa, R.: A blur-robust descriptor with applications to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(6), 1220–1226 (2012)
-  Hamm, J., Lee, D.D.: Grassmann discriminant analysis: a unifying view on subspace-based learning. In: Proceedings of the International Conference on Machine Learning (ICML). pp. 376–383. ACM (2008)
-  Harandi, M.T., Salzmann, M., Jayasumana, S., Hartley, R., Li, H.: Expanding the family of grassmannian kernels: An embedding perspective. In: European Conference on Computer Vision (ECCV). pp. 408–423. Springer (2014)
-  Heath, K., Gelfand, N., Ovsjanikov, M., Aanjaneya, M., Guibas, L.J.: Image webs: Computing and exploiting connectivity in image collections. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
-  Hofer, C., Kwitt, R., Niethammer, M., Uhl, A.: Deep learning with topological signatures. arXiv preprint arXiv:1707.04041 (2017)
-  Ji-guang, S.: Perturbation of angles between linear subspaces. Journal of Computational Mathematics pp. 58–61 (1987)
-  Junejo, I.N., Dexter, E., Laptev, I., Perez, P.: View-independent action recognition from temporal self-similarities. IEEE transactions on Pattern Analysis and Machine Intelligence 33(1), 172–185 (2011)
-  Kerber, M., Morozov, D., Nigmetov, A.: Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA) 22(1), 1–4 (2017)
-  Kokkinos, I., Bronstein, M., Yuille, A.: Dense scale invariant descriptors for images and surfaces. Ph.D. thesis, INRIA (2012)
-  Krim, H., Gentimis, T., Chintakunta, H.: Discovering the whole by the coarse: A topological paradigm for data analysis. IEEE Signal Process. Mag. 33(2), 95–104 (2016). https://doi.org/10.1109/MSP.2015.2510703
-  Kusano, G., Hiraoka, Y., Fukumizu, K.: Persistence weighted gaussian kernel for topological data analysis. In: International Conference on Machine Learning (ICML). pp. 2004–2013 (2016)
-  Li, C., Shi, Z., Liu, Y., Xu, B.: Grassmann manifold based shape matching and retrieval under partial occlusions. In: International Symposium on Optoelectronic Technology and Application: Image Processing and Pattern Recognition (2014)
-  Li, C., Ovsjanikov, M., Chazal, F.: Persistence-based structural recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1995–2002 (2014)
-  Lian, Z., Godil, A., Fabry, T., Furuya, T., Hermans, J., Ohbuchi, R., Shu, C., Smeets, D., Suetens, P., Vandermeulen, D., et al.: Shrec’10 track: Non-rigid 3d shape retrieval. Eurographics Workshop on 3D Object Retrieval (3DOR) 10, 101–108 (2010)
-  Liu, X., Srivastava, A., Gallivan, K.: Optimal linear representations of images for object recognition. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) (2003)
-  Luo, D., Huang, H.: Video motion segmentation using new adaptive manifold denoising model. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
-  Masci, J., Boscaini, D., Bronstein, M., Vandergheynst, P.: Geodesic convolutional neural networks on riemannian manifolds. In: IEEE International Conference on Computer Vision Workshops (ICCVW). pp. 37–45 (2015)
-  Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Problems 27(12) (2011)
-  Monti, F., Boscaini, D., Masci, J., Rodolà, E., Svoboda, J., Bronstein, M.M.: Geometric deep learning on graphs and manifolds using mixture model cnns. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
-  Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape distributions. ACM Transactions on Graphics (TOG) 21(4), 807–832 (2002)
-  Pachauri, D., Hinrichs, C., Chung, M.K., Johnson, S.C., Singh, V.: Topology-based kernels with application to inference problems in alzheimer’s disease. IEEE transactions on Medical Imaging 30(10), 1760–1770 (2011)
-  Perea, J.A., Harer, J.: Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics 15(3), 799–838 (2015)
-  Rahmani, H., Mian, A., Shah, M.: Learning a deep model for human action recognition from novel viewpoints. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)
-  Reininghaus, J., Huber, S., Bauer, U., Kwitt, R.: A stable multi-scale kernel for topological machine learning. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
-  Rouse, D., Watkins, A., Porter, D., Harer, J., Bendich, P., Strawn, N., Munch, E., DeSena, J., Clarke, J., Gilbert, J., et al.: Feature-aided multiple hypothesis tracking using topological and statistical behavior classifiers. In: SPIE Defense+Security (2015)
-  Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Transactions on Neural Networks 20(1), 61–80 (2009)
-  Seversky, L.M., Davis, S., Berger, M.: On time-series topological data analysis: New data and opportunities. In: DiffCVML 2016, held in conjunction with IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2016, Las Vegas, NV, USA, June 26 - July 1, 2016. pp. 1014–1022 (2016)
-  Sharafuddin, E., Jiang, N., Jin, Y., Zhang, Z.L.: Know your enemy, know yourself: Block-level network behavior profiling and tracking. In: IEEE Global Telecommunications Conference (GLOBECOM 2010). pp. 1–6 (2010)
-  da Silva, N.P., Costeira, J.P.: The normalized subspace inclusion: Robust clustering of motion subspaces. In: IEEE International Conference on Computer Vision (ICCV). pp. 1444–1450. IEEE (2009)
-  Singh, G., Memoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G., Ringach, D.L.: Topological analysis of population activity in visual cortex. Journal of Vision (2008)
-  Som, A., Krishnamurthi, N., Venkataraman, V., Ramamurthy, K.N., Turaga, P.: Multiscale evolution of attractor-shape descriptors for assessing parkinson’s disease severity. In: IEEE Global Conference on Signal and Information Processing (GlobalSIP) (2017)
-  Som, A., Krishnamurthi, N., Venkataraman, V., Turaga, P.: Attractor-shape descriptors for balance impairment assessment in parkinson’s disease. In: IEEE Conference on Engineering in Medicine and Biology Society (EMBC). pp. 3096–3100 (2016)
-  Sun, J., Ovsjanikov, M., Guibas, L.: A concise and provably informative multi-scale signature based on heat diffusion. In: Computer Graphics Forum. vol. 28, pp. 1383–1392. Wiley Online Library (2009)
-  Sun, X., Wang, L., Feng, J.: Further results on the subspace distance. Pattern Recognition 40(1), 328–329 (2007)
-  Takens, F.: Detecting strange attractors in turbulence. In: Dynamical Systems and Turbulence, vol. 898, pp. 366–381 (1981)
-  Tralie, C.J., Perea, J.A.: (quasi) periodicity quantification in video data, using topology. SIAM Journal on Imaging Sciences 11(2), 1049–1077 (2018)
-  Turaga, P., Veeraraghavan, A., Srivastava, A., Chellappa, R.: Statistical computations on grassmann and stiefel manifolds for image and video-based recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(11), 2273–2286 (2011)
-  Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry 52(1), 44–70 (2014)
-  Venkataraman, V., Ramamurthy, K.N., Turaga, P.: Persistent homology of attractors for action recognition. In: IEEE International Conference on Image Processing (ICIP). pp. 4150–4154. IEEE (2016)
-  Wang, L., Wang, X., Feng, J.: Subspace distance analysis with application to adaptive bayesian algorithm for face recognition. Pattern Recognition 39(3), 456–464 (2006)
-  Weinland, D., Boyer, E., Ronfard, R.: Action recognition from arbitrary views using 3d exemplars. In: IEEE International Conference on Computer Vision (ICCV). pp. 1–7. IEEE (2007)
-  Wong, Y.C.: Differential geometry of grassmann manifolds. Proceedings of the National Academy of Sciences 57(3), 589–594 (1967)
-  Yan, J., Pollefeys, M.: A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In: European Conference on Computer Vision (ECCV). pp. 94–106. Springer (2006)
-  Ye, K., Lim, L.H.: Schubert varieties and distances between subspaces of different dimensions. SIAM Journal on Matrix Analysis and Applications 37(3), 1176–1197 (2016)
-  Yi, L., Su, H., Guo, X., Guibas, L.: Syncspeccnn: Synchronized spectral cnn for 3d shape segmentation. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
-  Zomorodian, A.: Fast construction of the vietoris-rips complex. Computers & Graphics 34(3), 263–271 (2010)