Robust and scalable learning of data manifolds with complex topologies via ElPiGraph

by Luca Albergante, et al.

We present ElPiGraph, a method for approximating data distributions that have non-trivial topological features, such as excluded regions or branching structures. Unlike many existing methods, ElPiGraph is not based on the construction of a k-nearest neighbour graph, a procedure that can perform poorly on multidimensional and noisy data. Instead, ElPiGraph constructs elastic principal graphs in a more robust way by minimizing elastic energy, applying graph grammars, and explicitly controlling topological complexity. Using a trimmed approximation error function makes ElPiGraph extremely robust to background noise without decreasing computational performance, and allows it to deal with complex cases of manifold learning (for example, ElPiGraph can learn disconnected intersecting manifolds). Thanks to the quasi-quadratic nature of the elastic function, ElPiGraph performs almost as fast as simple k-means clustering. It is therefore much more scalable than alternative methods and can work on large datasets containing millions of high-dimensional points on a personal computer. This performance makes it possible to apply resampling and to approximate complex data structures via principal graph ensembles, which can be used to construct consensus principal graphs. ElPiGraph is currently implemented in five programming languages and is accompanied by a graphical user interface, which makes it a versatile tool for dealing with complex data in fields ranging from molecular biology, where it can be used to infer pseudo-time trajectories from single-cell RNA-Seq data, to astronomy, where it can be used to approximate complex structures in the distribution of galaxies.
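To illustrate why a trimmed approximation error helps with background noise, the following is a minimal sketch and not the ElPiGraph implementation. It assumes that the error is the mean squared distance from each data point to its nearest graph node, with each squared distance capped at the square of a trimming radius. The function name `trimmed_approximation_error` and the parameter `trimming_radius` are hypothetical names chosen for this example.

```python
import numpy as np


def trimmed_approximation_error(points, nodes, trimming_radius):
    """Mean squared distance to the nearest node, capped at trimming_radius**2.

    Illustrative assumption: distant points contribute a constant, so they
    cannot pull the approximating nodes towards themselves.
    """
    # Pairwise squared distances, shape (n_points, n_nodes).
    diffs = points[:, None, :] - nodes[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Squared distance from each point to its closest node.
    nearest_sq = sq_dists.min(axis=1)
    # Trimming: cap each contribution at the squared radius.
    capped = np.minimum(nearest_sq, trimming_radius ** 2)
    return capped.mean()


# Two points lying on the nodes and one far-away noise point.
points = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0]])
nodes = np.array([[0.0, 0.0], [1.0, 0.0]])

untrimmed = trimmed_approximation_error(points, nodes, np.inf)
trimmed = trimmed_approximation_error(points, nodes, 2.0)
print(untrimmed, trimmed)
```

In this toy case the single outlier dominates the untrimmed error, while the trimmed error counts it only up to the capped value. The cost of evaluating either version is the same, which is consistent with the claim that trimming does not reduce computational performance.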




