1 Introduction: reconstructions from unorganized clouds
Potential molecules for new colloidal products are tested by simulations that produce unorganized finite clouds of points (one point per molecule in Fig. 1). Molecules tend to form clusters (called micelles) whose shapes (degrees of branching, edge lengths) affect physical properties of colloidal products, e.g. their viscosity.

These 3D micelles can have complicated branched shapes as in Fig. 7 and are visually analyzed by human experts, who struggle to make reliable measurements quickly. To substantially speed up the discovery of new molecules, we propose a new Approximate Skeleton to solve the following problem.
The tree reconstruction problem. Given a point cloud and an error bound, design a fast algorithm to build a straight-line tree (see Definition 1) that has a minimum number of vertices and whose offset (neighborhood) covers the cloud.
The first (combinatorial) guarantee bounds the number of vertices of the output tree, which is close to the minimum number in an optimal tree by a factor of 2 for a given approximation error, see Theorem 8. The second (geometric) guarantee is a near linear time, in the number of points of the cloud, for building the tree, see Corollary 9.
To automatically characterize branching shapes of micelles (clusters of molecules in colloids), an Approximate Skeleton allows us to compute

the topological type of any unorganized cloud, e.g. count all nontrivial vertices of the skeleton whose degree is 1 (endpoints) or more than 2 (branching);

the geometric characteristics of the cloud, e.g. the edge lengths of the skeleton;

the error of approximating a cloud by its skeleton, see Table 10.
Here is the pipeline of the Approximate Skeleton.

Stage 1 in section 3: for a cloud, we build an initial tree with a small number of branching vertices within a Minimum Spanning Tree of the cloud.

Stage 2 in section 4: replace the polygonal paths of this tree by approximate paths with far fewer vertices to get the final skeleton in a near linear time within a given error.
The key novelty and contributions to data skeletonization are the following.

Theorem 8 guarantees a small number of vertices in the Approximate Skeleton, close to the minimum in an optimal tree by a factor of 2 for a given error.

Corollary 9 guarantees a near linear time to compute the skeleton within a given error.
2 Basic definitions and a review of the related past work
Definition 1 (a straight-line graph, approximation)
A straight-line graph (briefly, a graph) consists of vertices at given points and undirected straight-line edges connecting pairs of these vertices in such a way that any two edges meet only at a common vertex. All distances are Euclidean. For an error ε > 0, a cloud is approximated by a graph if the cloud lies within the ε-offset of the graph, which is the union of the balls of radius ε centered at all points of the graph.
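The approximation condition of Definition 1 can be checked directly. Below is a minimal Python sketch (the function names are ours): a cloud lies in the ε-offset of a graph exactly when every cloud point is within ε of some edge, treated as a line segment.

```python
import math

def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a, b
    (works in any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)

def cloud_within_offset(cloud, edges, eps):
    """True if every point of the cloud lies in the eps-offset of the graph,
    i.e. within distance eps of at least one edge (a, b)."""
    return all(min(dist_point_segment(p, a, b) for a, b in edges) <= eps
               for p in cloud)
```

For example, a cloud hugging a single horizontal edge is covered by the 0.2-offset of that edge, but not by the 0.05-offset if one point deviates by 0.1.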
Past algorithms without guarantees. Singh et al. (2000) approximated a cloud by a subgraph of a Delaunay triangulation, which requires a substantial time for large clouds and three thresholds: a minimum number of edges in a cycle and two thresholds for inserting/merging 2nd order Voronoi regions. Similar parameters are needed for principal curves Kégl and Krzyzak (2002), which were later extended to iteratively computed elastic maps Gorban and Zinovyev (2009). Since it is often hard to estimate a rate of convergence for iterative algorithms, we discuss below noniterative methods with theoretical guarantees.
The metric graph reconstruction (MGR) takes as an input a large metric graph, which is an abstract graph with weighted edges, and outputs a smaller abstract metric graph. The distance between any points of a metric graph is defined as the length of a shortest path between these points. If the input is a good approximation to an unknown graph, then M. Aanjaneya et al. (Aanjaneya et al., 2012, Theorem 5) proved the existence of a homeomorphism between the output and the unknown graph that distorts their metrics by a multiplicative factor depending on the length of a shortest edge of the unknown graph.
The authors of the Reeb graph skeletonization (Ge et al., 2011, page 3) have checked that for the MGR algorithm from Aanjaneya et al. (2012) “it is often hard to find suitable parameters in practice, and such local decisions tend to be less reliable when the input data are not as nice (such as a ‘fat’ junction region)”, see this junction in the 2nd picture of Fig. 1.
Definition 2 (a Reeb graph)
Given a topological space X with a function f : X → R, the Reeb graph is obtained from X by collapsing each connected component of every level set of f to a single point, so the Reeb graph is the quotient of X by the equivalence relation: x ~ y if and only if f(x) = f(y) and the points x, y are in the same connected component of the level set.
Skeletonization via Reeb-type graphs. The Vietoris-Rips complex on a cloud consists of all simplices spanned by points whose pairwise distances are at most a fixed scale. Starting from a noisy sample of an unknown graph with a scale parameter, X. Ge et al. (Ge et al., 2011, Theorem 3.1) proved that the Reeb graph of this complex has a correct homotopy type if there is a triangulated space with a continuous deformation that approximates the metric of the sample. The homotopy type of a graph is the equivalence class of graphs under deformations when any edge (with distinct endpoints) can be collapsed to a point.
The graph reconstruction by discrete Morse theory (DMT). T. Dey et al. Dey et al. (2018) substantially improved the discrete Morse-based framework Delgado-Friedrichs et al. (2015) and proved new homotopy guarantees when an input is a density function, which 'concentrates' around a hidden geometric graph. The key advantage of this approach is the unbounded noise model that allows outliers far away from the underlying graph, which has found practical applications to map reconstructions Wang et al. (2015); Dey et al. (2017).

Since the molecules of a micelle form an unorganized cloud of points (with large bounded noise) around hidden tree structures, the Tree Reconstruction problem in section 1 essentially differs from the above approaches. An initial unorganized cloud of points is not an abstract metric graph (as in the metric graph reconstruction problem) and not a simplicial complex with scalar values at vertices (as in the discrete Morse theory approach), so extra preprocessing was needed in section 5.
The α-Reeb graph by F. Chazal et al. Chazal et al. (2015) solves the metric graph reconstruction problem, where the input is not an unorganized cloud, but a large metric graph that should be approximated by a smaller graph. For a base point, the image of the distance function to this base point is covered by intervals of length α with 50% overlap. Every connected component of the preimage of each interval defines a node in the α-Reeb graph. Two nodes are linked if the corresponding components overlap.

Informally, α controls the size of a subset of the input graph that maps to a single vertex of the output. Theorem 4.9 in Chazal et al. (2015) says that if the input is close to an unknown graph with edges of a minimum length, the output is close to this graph in the Gromov-Hausdorff distance between abstract metric spaces, not within one space, with a bound depending on the first Betti number of the graph. The algorithm has a fast time in the number of points. Similarly to Reeb graphs, α-Reeb graphs are abstract without an intrinsic embedding into the space of the cloud, so their drawings can have self-intersections.
The Mapper Singh et al. (2007) extends any clustering algorithm and outputs a network of interlinked clusters; it needs a user-defined filter function, which helps to link different clusters of a cloud. Another parameter is a covering of the image of the filter function by a given number of intervals (often with a 50% overlap). Each of the resulting subclouds is clustered. Every cluster defines a node in the Mapper graph. Two nodes are linked if the corresponding clusters overlap. M. Carrière et al. Carriere and Oudot (2017) have proved the first theoretical guarantees for the Mapper output.
More recent persistence-based algorithms for graph reconstruction Kurlin (2015a, b); Kalisnik et al. (2019) and image segmentation Forsythe et al. (2016); Forsythe and Kurlin (2017); Kurlin and Harvey (2018); Kurlin and Muszynski (2020) essentially find the most persistent cycles hidden in a cloud, hence go beyond the tree reconstruction problem in section 1.
Straightening polygonal curves is a key ingredient in many skeletonization algorithms. Douglas-Peucker's heuristic Douglas and Peucker (1973) approximates a long zigzag line by a simpler line with fewer vertices, see section 4. The elegant algorithm by P. Agarwal et al. Agarwal et al. (2005) guarantees a near linear time and a small number of vertices in a final polygonal approximation when used with the Fréchet distance between curves. For the Hausdorff distance and higher dimensions, there is no near linear time straightening with guarantees on the size of a skeleton to the best of our knowledge.

Definition 3 (a Minimum Spanning Tree)
For a cloud of points, a Minimum Spanning Tree is a connected graph that has (1) the whole cloud as its vertex set, (2) no cycles, and (3) a minimum total length, where the lengths of edges are measured in the Euclidean distance.

If all distances between points of the cloud are distinct, then the Minimum Spanning Tree is unique. We write 'a' Minimum Spanning Tree, and similarly 'an' Approximate Skeleton, to cover all cases.
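A Minimum Spanning Tree of a small cloud can be computed by Prim's algorithm; the quadratic-time sketch below is our simplification for clarity (fine for clouds of a few hundred molecules), whereas Theorem 4 cites a much faster algorithm for large clouds.

```python
import math

def euclidean_mst(points):
    """Prim's algorithm: returns MST edges as index pairs (i, j).
    O(n^2) time, which suffices for small clouds."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

For four points forming a 'T', the unique MST consists of the three unit edges meeting at the central point.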
Theorem 4
(March et al., 2010, Theorem 5.1) For any cloud of n points, a Minimum Spanning Tree can be computed in a near linear time in n that involves the inverse Ackermann function and the expansion constants defined in Appendix A.
3 A new tree defined for any point cloud
This section introduces an important subtree, which has many fewer nontrivial vertices than a usually 'hairy' Minimum Spanning Tree from Definition 3.

This subtree might still have too many zigzags and will be replaced by a better tree with fewer vertices in section 4. A vertex of a degree not equal to 2 is called (topologically) nontrivial, because any vertex of degree 2 can potentially be removed by the straightening algorithms in section 4. Since a Minimum Spanning Tree contains many nontrivial vertices, the next hard step is to identify those few vertices that represent 'true' vertices of a tree, which we try to reconstruct from the cloud.

Definition 5 introduces the depth characterizing how deep a vertex sits within the tree. At a deep vertex of degree at least 3, sufficiently long paths (without common edges) should meet, see the 3 long red paths in Fig. 3. The previous procedural approach by M. Aanjaneya et al. (Aanjaneya et al., 2012, Fig. 1b) to detect branching points used more parameters than the single branching factor below.
Definition 5 (deep vertices)
For a cloud and a vertex v of degree d ≥ 3 in its Minimum Spanning Tree, let B_1, ..., B_d be the branches (subtrees) joined at the vertex v. Let l_i be the length of a longest path within the branch B_i from v to another vertex, i = 1, ..., d. Assuming that l_1 ≥ l_2 ≥ ... ≥ l_d, set depth(v) = l_3. Let l be the average edge length of the Minimum Spanning Tree. For a branching factor β, the vertices whose depths are larger than β·l are called deep.
Taking the minimum of the three longest branch lengths above guarantees that vertices in any short branches of a Minimum Spanning Tree are not deep, hence deep vertices cannot form small clusters. The experiments on real micelles in section 5 justify that Definition 5 separates deep vertices from other shallow vertices for a long range of the factor β.
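Under the above reading of Definition 5 (the depth of a vertex of degree at least 3 as the third largest branch height; this reconstruction of the formula is our assumption), deep vertices of a tree with edge lengths can be detected as follows:

```python
def branch_heights(adj, v):
    """For each neighbor u of v, the length of a longest path from v into the
    branch (subtree) entered through the edge v-u.
    adj: {vertex: {neighbor: edge_length}} describing a tree."""
    heights = []
    for u in adj[v]:
        best = adj[v][u]
        stack = [(u, v, adj[v][u])]   # (node, parent, distance from v)
        while stack:
            node, parent, dist = stack.pop()
            best = max(best, dist)
            for w, l in adj[node].items():
                if w != parent:
                    stack.append((w, node, dist + l))
        heights.append(best)
    return heights

def deep_vertices(adj, beta):
    """Vertices of degree >= 3 whose depth (third largest branch height under
    our assumed reading of Definition 5) exceeds beta * (average edge length)."""
    lengths = [l for a in adj for l in adj[a].values()]  # each edge twice,
    avg = sum(lengths) / len(lengths)                    # average unaffected
    deep = []
    for v in adj:
        if len(adj[v]) >= 3:
            depth = sorted(branch_heights(adj, v), reverse=True)[2]
            if depth > beta * avg:
                deep.append(v)
    return deep
```

For a star of three arms of two unit edges each, the central vertex is the only deep vertex for β = 1, while a star of three single edges has no deep vertices.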
Definition 6 introduces a subtree, which depends non-essentially on the branching factor β and represents the shape of a cloud better than a 'hairy' Minimum Spanning Tree, see Fig. 3.
Definition 6 (a subtree on deep vertices)
If we remove all deep vertices of Definition 5, the Minimum Spanning Tree splits into several subtrees. If the closure of such a subtree has two deep vertices, they are joined by a unique path. If the closure has one deep vertex, take a longest path from this deep vertex to another vertex of the subtree. We ignore such a path if its length is less than β·l, where β is the branching factor from Definition 5 and l is the average edge length. All the deep vertices and the paths between them form the required subtree.

If the closure of a subtree above had three or more deep vertices, then it would contain a vertex with at least 3 long paths to these deep vertices. Such a vertex would itself be deep and would have been removed when splitting the tree, a contradiction. Hence each closure has at most two deep vertices.
In Fig. 3 the two black edges at the red deep vertex of degree 5 are too short, hence ignored in Definition 6. The subtree consists of only the 3 long red paths meeting at this deep vertex. Here are the steps of Stage 1 for the Approximate Skeleton.
Step 1a. If needed, split a cloud into clusters, each approximated separately below.

Step 1b. If the cloud is one cluster, find its Minimum Spanning Tree by the fast algorithm from Theorem 4.

Step 1c. Find the depths of the vertices of the Minimum Spanning Tree by Algorithm 1 in Appendix A.

Step 1d. Identify all deep vertices by their depths from Definition 5.

Step 1e. Form the subtree from all the paths of Definition 6 whose lengths are more than β·l, where β is a given branching factor and l is the average edge length.
4 The Approximate Skeleton of a cloud

The subtree from Definition 6 has only a few nontrivial vertices, but contains noisy zigzags with too many trivial vertices of degree 2. This section discusses how to straighten these zigzags and decrease the total number of vertices.

We have tried Douglas-Peucker's heuristic Douglas and Peucker (1973), which was rather unstable and produced large zigzags on the curved micelles in Fig. 1. Its worst-case complexity is quadratic in the number of points, and a final approximation can be far from a minimal size. Another problem with Douglas and Peucker (1973) is potential self-intersections even in the plane, which are caused by large zigzags that approximate nonmonotone curves Wu et al. (2004).

The problem of straightening polygonal paths in a tree is harder than curve simplification, because the input is a cloud of unorganized points. So a final approximation should take into account the points of a cloud outside the subtree.
Definition 7
Let L be a straight line. An ordered cloud is called monotone with respect to L if the order of its points is preserved by their orthogonal projections to L.
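The monotonicity condition of Definition 7 reduces to checking that scalar projections onto the direction of L are non-decreasing; a minimal sketch for the line through the endpoints of an ordered cloud:

```python
def is_monotone(cloud):
    """True if the ordered cloud is monotone with respect to the straight
    line through its first and last points: the scalar projections onto the
    direction of that line must be non-decreasing."""
    a, b = cloud[0], cloud[-1]
    direction = [bi - ai for ai, bi in zip(a, b)]
    proj = [sum((pi - ai) * di for pi, ai, di in zip(p, a, direction))
            for p in cloud]
    return all(s <= t for s, t in zip(proj, proj[1:]))
```

A zigzag that always advances along the chord is monotone, while a path that steps backwards is not.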
Since there are many paths of the subtree to straighten, we split the cloud into monotone subclouds as formalized in Algorithm 2 in Appendix A. Since monotone subpaths can be quickly found only in the plane Shin and Kim (2004), Theorem 8 below will assume that each subpath between nontrivial vertices is monotone in the sense of Definition 7 with respect to the straight line connecting its endpoints.
All results in this section are proved in Appendix A. Here are the Stage 2 steps.
Step 2a. Split every polygonal path between nontrivial vertices (of degrees not equal to 2) in the subtree into monotone subpaths by Algorithm 2.

Step 2b. Each monotone subpath with fixed endpoints has its subcloud approximated by a polygonal path through points of the cloud by Steps 2c–2f.

Step 2c. For each subcloud of points ordered by their orthogonal projections to the line through the endpoints, start from the first point and repeatedly find the index of the next vertex by Steps 2d–2e, which is possible by Lemma 12 in Appendix A.
Step 2d (exponential search). Starting from the current vertex, double an index step until the distance from the intermediate points to the corresponding line segment, computed orthogonally as in Definition 11, exceeds the allowed error.

Step 2e (binary search). Within the last doubled range, find the maximum index whose segment still approximates the intermediate points within the allowed error, by repeatedly dividing the range into 2 halves.
Step 2f. The found indices specify a polygonal path approximating each monotone subcloud from Step 2b. Combine all these paths into a full skeleton.

Step 2g. Any edges of a length more than β·l from Definition 5 are temporarily removed from the skeleton. Each remaining connected component with only short edges is collapsed to its center of mass. The resulting vertices are connected according to the temporarily removed edges to get the Approximate Skeleton.
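Steps 2c–2e can be sketched as a greedy simplification of one monotone polyline. For brevity, the error of a candidate segment below is measured to the infinite line through its endpoints in the plane, a simplification of the orthogonal distance of Definition 11.

```python
import math

def chord_error(points, i, j):
    """Max distance from the points strictly between indices i and j to the
    straight line through points[i] and points[j] (a simplified stand-in
    for the orthogonal distance of Definition 11)."""
    if j <= i + 1:
        return 0.0
    (ax, ay), (bx, by) = points[i], points[j]
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0:
        return max(math.dist(points[k], points[i]) for k in range(i + 1, j))
    return max(abs(dy * (px - ax) - dx * (py - ay)) / norm
               for px, py in points[i + 1:j])

def simplify_monotone(points, eps):
    """Steps 2c-2e sketch: from the current vertex, double the index step
    until the chord error exceeds eps (exponential phase), then binary-search
    the largest admissible next vertex.  Returns indices of kept vertices."""
    m = len(points) - 1
    indices, i = [0], 0
    while i < m:
        step = 1
        while i + 2 * step <= m and chord_error(points, i, i + 2 * step) <= eps:
            step *= 2
        lo, hi = i + step, min(i + 2 * step, m)  # error known to be <= eps at lo
        while lo < hi:                            # largest j in [lo, hi] within eps
            mid = (lo + hi + 1) // 2
            if chord_error(points, i, mid) <= eps:
                lo = mid
            else:
                hi = mid - 1
        indices.append(lo)
        i = lo
    return indices
```

A straight chain collapses to its two endpoints, a sharp zigzag keeps all its vertices for a small error, and the same zigzag collapses to its endpoints for a large error.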
For a cloud, mark the endpoints of all monotone subpaths obtained by Algorithm 2. Consider all skeletons that have fixed vertices at the marked points and whose every polygonal path between fixed vertices is monotone under the orthogonal projection to the line segment connecting them.

The approximation problem for an error ε is to minimize the total number of vertices in a straight-line graph each of whose monotone paths approximates the corresponding subcloud within ε in the distance of Definition 11.
Theorem 8
Let m be the minimum number of vertices over all graphs approximating a given cloud within an error ε as above. Then the Approximate Skeleton lies within the 2ε-offset of the cloud and has at most m vertices.
Theorem 8 estimates the number of vertices of the Approximate Skeleton for a given geometric error. In practice, the subtree from Definition 6 already has an initial approximation error for a given cloud, because many points of the cloud may not be vertices of this subtree.

We measure the initial error by Definition 11 and take the maximum over the monotone paths computed by Algorithm 2. Stage 2 approximates the cloud by a graph simpler than the subtree, but keeps the approximation error small. The error ε in Corollary 9 equals the initial error multiplied by an error factor, which takes values in a small interval for the experiments in section 5.
Corollary 9
For any cloud of n points and any error factor, an Approximate Skeleton within the ε-offset of the cloud (as in Definition 1) can be computed in a near linear time in n involving the inverse Ackermann function and the expansion constants defined in Appendix A.
5 Comparisons of 5 algorithms on real and synthetic data
This section experimentally compares the Approximate Skeleton with the four skeletonization algorithms from section 2 that have theoretical guarantees and accept any cloud of points: Mapper Singh et al. (2007), Metric Graph Reconstruction (MGR) Aanjaneya et al. (2012), Reeb graphs Chazal et al. (2015) and the most recent discrete Morse theory (DMT) algorithm Dey et al. (2018).
The Mapper Singh et al. (2007) is very flexible in the sense that its parameters can be manually tuned for given data over numerous clustering algorithms. Having tried several possibilities, we have settled on the following choices from the original work Singh et al. (2007).
1) Convert a cloud into a connected neighborhood graph with Euclidean edge lengths by using a distance threshold. The filter function is the graph distance from a root, chosen as the furthest point from a random point of the cloud.

2) The image of the filter function is covered by 10 intervals with a 50% overlap, so the cloud splits into 10 subclouds whose filter values lie in one of the 10 intervals.

3) Each subcloud is clustered by single-linkage clustering with the threshold equal to a factor times the average edge length of the neighborhood graph, where the values of this factor are given in Table 10. The final Mapper graph has a single node representing each cluster.
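The covering of the filter image by intervals with a 50% overlap in step 2) can be sketched as follows (the closed-form interval length is our derivation):

```python
def overlapping_cover(lo, hi, n=10, overlap=0.5):
    """n intervals covering [lo, hi] so that consecutive intervals share the
    fraction `overlap` of their common length.  The interval length solves
    n * length - (n - 1) * overlap * length = hi - lo."""
    length = (hi - lo) / (n - (n - 1) * overlap)
    step = length * (1.0 - overlap)
    return [(lo + k * step, lo + k * step + length) for k in range(n)]
```

The first interval starts at the minimum filter value, the last one ends at the maximum, and each pair of consecutive intervals overlaps in half of an interval length.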
The authors of the DMT algorithm by T. Dey et al. Dey et al. (2018) have kindly made their code available at https://github.com/wangjiayuan007/graph_recon_DM. Starting from an unorganized cloud of points, e.g. centers of molecules of a micelle, we generated scalar values at nodes of a regular grid required for the DMT algorithm.
1) We subdivide the axis-aligned bounding box of a cloud into small boxes: a minimum of 20 rectangular boxes (as close to cubic as possible) along each side.
2) The scalar values are found from a Kernel Density Estimate at every grid node. The computed values are passed to the DMT algorithm with a parameter that regulates how small density values are replaced by 0.

The MGR algorithm has required much more effort, because the original code was lost, as confirmed by the main author of Aanjaneya et al. (2012). Since the algorithm was well explained, we have implemented MGR ourselves and confirmed the earlier claim that “it is often hard to find suitable parameters” (Ge et al., 2011, page 3). Trying many values of the key parameter gave a zero success rate on the homeomorphism type.
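The grid values for the DMT input in step 2) can be produced by a standard Gaussian kernel density estimate; the kernel choice and the fixed bandwidth below are assumptions, not taken from the original setup.

```python
import math

def kde_values(cloud, grid_nodes, bandwidth):
    """Gaussian kernel density estimate of the cloud, evaluated at grid nodes.
    The Gaussian kernel and the single fixed bandwidth are our assumptions."""
    d = len(cloud[0])
    norm = (math.sqrt(2 * math.pi) * bandwidth) ** d * len(cloud)
    return [sum(math.exp(-math.dist(g, p) ** 2 / (2 * bandwidth ** 2))
                for p in cloud) / norm
            for g in grid_nodes]
```

The density is highest at grid nodes closest to the cloud and decays quickly far from it, so the DMT threshold can zero out the far-away values.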
Hence we have improved MGR by splitting this parameter into two: the first (with values fixed in all experiments) was used for detecting vertex points, the second (three values in the Table 10 experiments) was used for clustering points of different types. Only after using these different values have we managed to push the success rates of MGR on the homeomorphism type well above zero.

The Reeb graph Chazal et al. (2015) has one essential scale parameter, three values of which were tried in all experiments. The Approximate Skeleton has little dependence on the branching factor, e.g. all tried values produced almost identical results in Table 10 and Fig. 9.
Since three output graphs (Mapper, Reeb and MGR) are abstract, to compute a geometric error of approximation we map them to the space of the cloud by sending every node to the average point (center of mass) of the cluster (for Mapper and MGR) or subgraph (for Reeb) corresponding to this node. Each link between nodes is mapped to the line segment between the corresponding points.
Figures 5–8 show clouds and outputs of the 5 algorithms. Since real micelles have irregular shapes in 3D, their 2D projections may contain intersections of edges.
Table 10 shows the average results of the algorithms on a dataset of more than 100 real micelles (clouds of about 300 molecules) whose endpoints and homeomorphism types were manually detected. A homeomorphism is a 1-1 continuous map with a continuous inverse, so a homeomorphism type is a stronger shape descriptor than a homotopy type, which counts only linearly independent cycles.

The most important error measure for the tree reconstruction problem in section 1 is the success rate of detecting a correct homeomorphism type. Indeed, an incorrect graph can be perfect on other errors: e.g. a Minimum Spanning Tree is extremely fast, has the zero geometric error (for many distances between a cloud and a reconstructed graph) and even has a correct homotopy type (no cycles) for any underlying tree.

Hence the key results are in the middle column of Table 10 and the top right picture of Fig. 9. We included the success rate on the number of endpoints (degree-1 vertices) as a weaker topological error. As shown above, only if an algorithm performs well on topological reconstruction does it make sense to evaluate its performance on other measures such as geometric distances and time.
algorithm            | parameters | endpoints success | homeomorphism success | time, ms | max distance from cloud
Mapper               |            | 54.21%            | 54.21%                | 18       | 4.59
Mapper               |            | 66.36%            | 66.36%                | 18       | 4.62
Mapper               |            | 68.20%            | 68.20%                | 19       | 4.67
MGR                  |            | 48.60%            | 45.79%                | 25010    | 5.95
MGR                  |            | 40.19%            | 40.19%                | 17410    | 5.51
MGR                  |            | 29.91%            | 29.91%                | 25480    | 3.46
Reeb                 |            | 98.13%            | 98.13%                | 367      | 10.49
Reeb                 |            | 97.20%            | 97.20%                | 375      | 12.50
Reeb                 |            | 98.13%            | 98.13%                | 373      | 14.19
DMT                  |            | 48.60%            | 45.79%                | 6290     | 5.95
DMT                  |            | 40.19%            | 40.19%                | 6192     | 5.51
DMT                  |            | 29.91%            | 29.91%                | 6410     | 3.46
Approximate Skeleton |            | 98.13%            | 98.13%                | 42       | 5.16
Approximate Skeleton |            | 98.13%            | 98.13%                | 42       | 5.16
Approximate Skeleton |            | 97.20%            | 97.20%                | 42       | 5.31
Table 10 shows that the Mapper, MGR and DMT essentially depend on their parameters, because the success rates, run time and distance error vary significantly when the parameters are only slightly changed. The Reeb graph and the Approximate Skeleton were stable, because Table 10 contains almost identical success rates for their different parameters.

Both algorithms achieved the best results on the most important measure, the homeomorphism success rate, while the top right picture in Fig. 9 highlights the Approximate Skeleton and MGR as the best for homeomorphism. In comparison with Reeb and MGR, the Approximate Skeleton is much faster and achieves similar distance errors, see the relevant results in both Table 10 and Fig. 9.
In addition to the comparison on more than 100 micelles, we have tested the algorithms on the much larger dataset of synthetic clouds generated as follows.
1) A k-star has one vertex at the origin and k straight edges of length 100 to endpoints in random directions with a minimum angle between edges.

2) For each of 100 random stars, we found a minimum axis-aligned box containing the star and enlarged this box by the noise bound of 10%.

3) We uniformly chose random points in the resulting box and kept every point at a distance at most 10 (= 10% of the edge length) from the star. The kept points form a noisy sample of the ground truth star.
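The generator of steps 1)–3) can be sketched in the plane as follows; the minimum-angle constraint between edge directions is omitted for brevity, and all names are ours.

```python
import math
import random

def dist_to_star(p, arms):
    """Distance from point p to the star: the minimum over the segments
    from the origin to each arm endpoint."""
    best = math.inf
    for ax, ay in arms:
        denom = ax * ax + ay * ay
        t = max(0.0, min(1.0, (p[0] * ax + p[1] * ay) / denom))
        best = min(best, math.dist(p, (t * ax, t * ay)))
    return best

def noisy_star_sample(k, n_points, edge_len=100.0, noise_frac=0.1, seed=0):
    """A planar k-star with a vertex at the origin and k edges of length
    edge_len in random directions; points are rejection-sampled in the
    bounding box enlarged by the noise bound and kept if they lie within
    noise_frac * edge_len of the star."""
    rng = random.Random(seed)
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(k)]
    arms = [(edge_len * math.cos(a), edge_len * math.sin(a)) for a in angles]
    xs = [0.0] + [a[0] for a in arms]
    ys = [0.0] + [a[1] for a in arms]
    pad = noise_frac * edge_len
    cloud = []
    while len(cloud) < n_points:
        p = (rng.uniform(min(xs) - pad, max(xs) + pad),
             rng.uniform(min(ys) - pad, max(ys) + pad))
        if dist_to_star(p, arms) <= noise_frac * edge_len:
            cloud.append(p)
    return arms, cloud
```

Every accepted point is at a distance at most 10 (= 10% of the edge length) from the ground truth star, as in step 3).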
Fig. 9 shows 4 plots for the 4 error measures of the 5 algorithms, which were averaged over 3 values of the essential parameters as in Table 10: the Mapper threshold factor for single-linkage clustering, the Reeb scale and the branching factor of the Approximate Skeleton.

For the correct number of endpoints, the new skeleton achieves 100% results on the synthetic clouds, because Definition 5 provides a very stable concept of a deep vertex not critically depending on a branching factor. For the homeomorphism type, the minimum success of the Approximate Skeleton is 96%, because all short branches are removed to get a skeleton homeomorphic to an underlying tree.

For the random point sample of the 8-star graph in Fig. 8, the 2nd, 3rd and 5th graphs have several branching vertices instead of one. The 5th graph has several zigzags, which would be straightened by the Approximate Skeleton. The 4th graph has a triangular cycle because of incorrectly detected overlaps of clusters corresponding to vertices.
6 Conclusions and a discussion of the Approximate Skeleton
Though the current implementation was tested on clouds in 3D, all steps and results work in any higher dimensions. Here is the summary of the key contributions to data skeletonization.

The detection of deep (branching) vertices in Definition 5 uses the global structure of longest paths within a Minimum Spanning Tree, hence is more stable under a change of parameters.

To improve the Metric Graph Reconstruction by M. Aanjaneya et al. Aanjaneya et al. (2012), we have split one parameter (used for detecting vertex points and also for clustering later) into two separate parameters with default values, which led to more successful (20–40% rates instead of 0%) reconstructions in Table 10.
Theorem 8 proves the first size guarantees (on a small number of vertices) for the Approximate Skeleton, while all past methods from section 2 considered topological (mostly homotopy type) or metric properties of reconstructed graphs.

Corollary 9 says that the Approximate Skeleton can be quickly computed within a given error, as required in the Tree Reconstruction Problem from section 1.
Because of the page limit, the last author could not include one more result giving realistic conditions on an underlying tree and its noisy sample that guarantee the reconstructed trees are homeomorphic to the underlying tree. This is the first advance after Giesen's guarantees for shortest paths through sample points Giesen (1999) in 1999. The C++ code of the Approximate Skeleton is at https://github.com/YuryUoL/AsKAlgorithm.

In comparison with the past methods in section 2, the Approximate Skeleton starts from the most challenging input (an unorganized cloud of points without any extra structure such as a metric graph, a regular grid or a mesh), outputs an embedded graph in the space of the cloud and provides two guarantees: combinatorial in Theorem 8 and geometric in Corollary 9. Appendix A has all omitted proofs. Appendix B includes more experiments. We thank all reviewers for their helpful suggestions.
Bibliography

Aanjaneya et al. (2012) Metric graph reconstruction from noisy data. Int. J. Comp. Geometry Appl. 22, pp. 305–325.
Agarwal et al. (2005) Near-linear time approximation algorithms for curve simplification. Algorithmica 42 (3-4), pp. 203–219.
Beygelzimer et al. (2006) Cover trees for nearest neighbor. In Proceedings of ICML, pp. 97–104.
Carriere and Oudot (2017) Structure and stability of the one-dimensional Mapper. Foundations of Computational Mathematics, pp. 1–64.
Chazal et al. (2015) Gromov-Hausdorff approximation of filament structure using Reeb-type graphs. Discrete Computational Geometry 53, pp. 621–649.
Delgado-Friedrichs et al. (2015) Skeletonization and partitioning of digital images using discrete Morse theory. Trans. PAMI 37 (3), pp. 654–666.
Dey et al. (2017) Improved road network reconstruction using discrete Morse theory. In Proceedings of the 25th ACM SIGSPATIAL, pp. 58.
Dey et al. (2018) Graph reconstruction by discrete Morse theory. arXiv:1803.05093.
Douglas and Peucker (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10 (2), pp. 112–122.
Forsythe et al. (2016) Resolution-independent superpixels based on convex constrained meshes without small angles. In LNCS, Proceedings of ISVC 2016, Vol. 10072, pp. 223–233.
Forsythe and Kurlin (2017) Convex constrained meshes for superpixel segmentations of images. Journal of Electronic Imaging 26 (6), 061609.
Fredman and Tarjan (1987) Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM 34 (3), pp. 596–615.
Ge et al. (2011) Data skeletonization via Reeb graphs. In Proceedings of Neural Information Processing Systems, pp. 837–845.
Giesen (1999) Curve reconstruction, the traveling salesman problem and Menger's theorem on length. In Proceedings of SoCG, pp. 207–216.
Gorban and Zinovyev (2009) Principal graphs and manifolds. In Handbook of Research on Machine Learning Applications and Trends, pp. 28–59.
Kalisnik et al. (2019) A higher-dimensional homologically persistent skeleton. Advances in Applied Mathematics 102, pp. 113–142.
Kégl and Krzyzak (2002) Piecewise linear skeletonization using principal curves. Trans. Pattern Analysis Machine Intelligence 24, pp. 59–74.
Kurlin and Harvey (2018) Superpixels optimized by color and shape. LNCS, Vol. 10746, pp. 297–311.
Kurlin and Muszynski (2020) Persistence-based resolution-independent meshes of superpixels. Pattern Recognition Letters, pp. 300–306.
Kurlin (2015a) A homologically persistent skeleton is a fast and robust descriptor of interest points in 2D images. In Proceedings of CAIP, Vol. 9256, pp. 606–617.
Kurlin (2015b) A one-dimensional homologically persistent skeleton of an unstructured point cloud in any metric space. Computer Graphics Forum 34 (5), pp. 253–262.
March et al. (2010) Fast Euclidean minimum spanning tree: algorithm, analysis, and applications. In Proceedings of SIGKDD, pp. 603–612.
Shin and Kim (2004) Optimal direction for monotone chain decomposition. In International Conference on Computational Science and Its Applications, pp. 583–591.
Singh et al. (2007) Topological methods for the analysis of high dimensional data and 3D object recognition. In Symp. Point-Based Graphics, pp. 91–100.
Singh et al. (2000) Self-organizing maps for the skeletonization of sparse shapes. IEEE Trans. Neural Networks 11, pp. 241–248.
Wang et al. (2015) Efficient map reconstruction and augmentation via topological methods. In Proceedings of the 23rd SIGSPATIAL, pp. 25.
Wu et al. (2004) The Douglas-Peucker algorithm: sufficiency conditions for non-self-intersections. J. Brazilian Computer Society 9 (3), pp. 67–84.
7 Appendix A: proofs of all the statements from section 4
The proof of Corollary 9 below uses Algorithm 1 for the depths of vertices from Definition 5. The depth is trivial (equal to 0) for any vertex of degree 1. For any other vertex, the depth can be recursively computed from the lengths of edges at this vertex and the depths of its neighbors. Imagine a water flow simultaneously starting from all degree-1 vertices of the tree and moving towards internal vertices. At every vertex of degree d, the flow waits until the vertex is reached from d - 1 directions (edges at this vertex), then the flow moves further in the remaining d-th direction.

Definition 11 introduces a new distance between a cloud and a polygonal line. All distances between points are Euclidean. We assume that the points of a monotone subcloud are ordered by their orthogonal projections to the straight line through its endpoints.
Definition 11
For a point q of a monotone subcloud, let H(q) be the hyperplane that is orthogonal to the straight line L through the endpoints of the subcloud and passes through q. The distance from q to a line segment between two points of the subcloud is measured orthogonally to L as the Euclidean distance from q to the intersection point of H(q) with this segment, see Fig. 10. The distance from the subcloud to the segment is the maximum of these distances over all points whose projections to L lie between the projections of the segment's endpoints. For a polygonal line through successive points of the subcloud, the distance between the subcloud and the polygonal line is defined as the maximum of the distances to all its segments.
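As we read Definition 11, each cloud point is compared with the point of the candidate segment lying in the same hyperplane orthogonal to the axis line through the subcloud's endpoints; a planar sketch assuming distinct projections:

```python
import math

def axis_param(p, a, u):
    """Scalar projection of p onto the unit direction u, measured from a."""
    return sum((pi - ai) * ui for pi, ai, ui in zip(p, a, u))

def def11_distance(cloud, i, j):
    """Distance from the subcloud between indices i and j to the segment
    cloud[i]..cloud[j], measured orthogonally to the axis line through the
    endpoints of the whole monotone subcloud (our reading of Definition 11;
    assumes points have distinct projections)."""
    a, b = cloud[0], cloud[-1]
    u = [bi - ai for ai, bi in zip(a, b)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    ti, tj = axis_param(cloud[i], a, u), axis_param(cloud[j], a, u)
    best = 0.0
    for q in cloud[i:j + 1]:
        # the point of the segment in the same orthogonal hyperplane as q
        s = (axis_param(q, a, u) - ti) / (tj - ti)
        on_segment = [ci + s * (cj - ci) for ci, cj in zip(cloud[i], cloud[j])]
        best = max(best, math.dist(q, on_segment))
    return best
```

For a zigzag over a horizontal axis, a point of height 1 above the chord contributes the distance 1, while points lying on the chord contribute 0.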
Lemma 12 below justifies the steps of Stage 2 in section 4, which outputs the Approximate Skeleton starting from the subtree obtained in Stage 1 at the end of section 3.
Lemma 12
Let points be ordered according to their orthogonal projections to the straight line through the endpoints of a monotone subcloud. For any error ε, one can find the indices of the vertices of an approximating polygonal path in a near linear time so that the estimates below for the distances in Definition 11 hold:

(a) the distance from the subcloud to each selected segment is at most ε;

(b) the distance from the subcloud to any segment extended past the next selected index is larger than ε.
The following lemma is needed for Theorem 8 and is conveniently illustrated in Fig. 10 below. Recall that the distances from Definition 11 are computed orthogonally to the straight line passing through the endpoints of a monotone point cloud.

Lemma 13
Let points be ordered according to their orthogonal projections to the straight line through the endpoints. If the distance from the subcloud to the segment between two of its points is at most ε, then for any two intermediate indices the distance from the corresponding subcloud to the segment between these points is at most 2ε.
Proof. For any point q of the subcloud, let H(q) be the hyperplane that is orthogonal to the straight line through the endpoints and passes through q. Consider the intersection points of H(q) with the outer and inner segments. The endpoints of the inner segment are within ε of the outer segment, hence every intermediate point of the inner segment is also within ε of the outer segment. The triangle inequality implies that the distance from q to the inner segment is at most its distance to the outer segment plus ε, so at most 2ε. Taking the maximum over all points q, we get the required bound. ∎
Proof of Lemma 12. Assuming that the first indices were found, we search for the next index as follows. Search exponentially by doubling a step while the distance from the subcloud to the segment spanning the current step stays within the error ε.

Each evaluation of the distance requires a time linear in the number of points between the current endpoints, because we compare the distances (measured orthogonally to the line through the endpoints) from every such point to the segment.

After finding a step within the error and a doubled step exceeding it, we start a binary search in this range, each time choosing one half of the current range until both conditions (a) and (b) hold.

Finding each next index hence requires logarithmically many evaluations of the distance, each in a linear time, so a near linear time in the number of points considered. Taking the sum over all found indices, the total time is near linear in the number of points of the subcloud. ∎
Proof of Theorem 8. Since the endpoints of all monotone polygonal paths are fixed in the minimization problem before Theorem 8, we separately consider every corresponding monotone subcloud of points ordered by their orthogonal projections to the straight line through its endpoints. Let j_0 < j_1 < j_2 < ... be the indices of the vertices of an optimal approximation (polygonal path) to the subcloud within the error ε. In the notation of Lemma 12, run for the approximation error 2ε, we will prove below by induction on k that the found indices satisfy i_k ≥ j_k. Then the list of indices found in Lemma 12 is not longer than the optimal list, as required. Taking the sum of the upper bounds over all monotone paths, we conclude that the Approximate Skeleton has a total number of vertices not greater than that number for an optimal skeleton.

The base case k = 0 means that i_0 = j_0, i.e. both paths start from the first point of the subcloud. In the inductive step assume that i_k ≥ j_k. If i_{k+1} ≥ j_{k+1}, then the inductive step is complete. The remaining case is i_{k+1} < j_{k+1}. Since the optimal path approximates the subcloud within ε, the segment between the optimal vertices with indices j_k and j_{k+1} is within ε of its subcloud. Lemma 13 implies that the segment from the index i_k to any index j with i_k ≤ j ≤ j_{k+1} is within 2ε of its subcloud. Lemma 12(b) for the approximation error 2ε says that the segment from i_k to any index past i_{k+1} is further than 2ε from its subcloud, hence j_{k+1} ≤ i_{k+1}, a contradiction. ∎
Definition 14 (expansion constants)
Let C be a cloud and B(p, r) be the closed ball with the center p and radius r. The expansion constant c is the smallest real number such that the number of points of C in B(p, 2r) is at most c times the number of points in B(p, r) for all centers p and radii r. A similar constant is defined for the metric space of line segments of the skeleton. Other constants are similarly defined in Beygelzimer et al. (2006).
Proof of Corollary 9. The distance from Definition 11, measured orthogonally to the straight line through fixed endpoints, is not smaller than the Hausdorff distance used for offsets in Definition 1. Hence the algorithm from Lemma 12 produces the required approximations in the sense of Definition 1. The current implementation uses single-linkage clustering based on a Minimum Spanning Tree, so Step 1a runs in the time of Theorem 4. The total time is dominated by Step 1b computing a Minimum Spanning Tree in the near linear time of Theorem 4 involving the inverse Ackermann function.

Algorithm 1 in Step 1c has the pseudocode above and maintains a binary tree on the vertices, which requires a near linear time. Selecting deep vertices in Step 1d and finding longest paths in Step 1e within subtrees need a near linear time by classical algorithms Fredman and Tarjan (1987). Step 2a splitting paths into monotone subclouds is implemented by cover trees for line segments in a near linear time as proved in Beygelzimer et al. (2006). By Lemma 12, Steps 2b–2g approximating every subcloud by a polygonal path run in a near linear time. Hence the total time of Stage 2 over the cloud is near linear in the number of points. ∎
8 Appendix B: more qualitative comparisons of 3 algorithms
Fig. 12–15 show example outputs of 3 algorithms on real and randomly generated clouds. In almost all cases the Mapper and Reeb graphs contain superfluous short edges, which affect the homeomorphism types.

The error factor from Corollary 9 affects the quality of approximation. Fig. 11 shows that higher values of this factor lead to more straightened curves.