Path-Invariant Map Networks

12/31/2018 ∙ by Zaiwei Zhang, et al.

Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem across computer vision and many other relevant fields. Compared to optimizing pairwise maps in isolation, the benefit of map synchronization is that there are natural constraints among a map network that can improve the quality of individual maps. While such self-supervision constraints are well-understood for undirected map networks (e.g., the cycle-consistency constraint), they are under-explored for directed map networks, which naturally arise when maps are given by parametric maps (e.g., a feed-forward neural network). In this paper, we study a natural self-supervision constraint for directed map networks called path-invariance, which enforces that composite maps along different paths between a fixed pair of source and target domains are identical. We introduce path-invariance bases for efficient encoding of the path-invariance constraint and present an algorithm that outputs a path-invariance basis with polynomial time and space complexities. We demonstrate the effectiveness of our formulation on optimizing object correspondences, estimating dense image maps via neural networks, and 3D scene segmentation via map networks of diverse 3D representations. In particular, our approach only requires 8% labeled data to achieve the same performance as training a single 3D segmentation network with 30% to 100% labeled data.







1 Introduction

Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem across computer vision and many other relevant fields. Important applications include establishing consistent feature correspondences for multi-view structure-from-motion [1, 11, 51, 5], computing consistent relative camera poses for 3D reconstruction [24, 20], dense image flows [68, 65], image translation [72, 61], and optimizing consistent dense correspondences for co-segmentation [57, 17, 56] and object discovery [46, 8], just to name a few. The benefit of optimizing a map network versus optimizing maps between pairs of objects in isolation comes from the cycle-consistency constraint [35, 21, 15, 55], namely composite maps along cycles should be the identity map. For example, this constraint allows us to replace an incorrect map between a pair of dissimilar objects by composing maps along a path of similar objects [21]. Computationally, state-of-the-art map synchronization techniques [3, 7, 15, 22, 49, 18, 19, 29, 69, 70, 33] employ matrix representations of maps [29, 21, 15, 56, 18]. This allows us to utilize a low-rank formulation of the cycle-consistency constraint (c.f. [15]), leading to efficient and robust solutions [18, 70, 49, 22].









Figure 1: (Left) A network of 3D representations for the task of semantic segmentation of 3D scenes. (Right) Computed path-invariance basis for regularizing individual neural networks.

In this paper, we focus on a map synchronization setting, where matrix-based map encodings become too costly or even infeasible. Such instances include optimizing dense flows across many high-resolution images [34, 28, 47] or optimizing a network of neural networks, each of which maps one domain to another domain (e.g., 3D semantic segmentation [12] that maps the space of 3D scenes to the space of 3D segmentations). In this setting, maps are usually encoded as broadly defined parametric maps (e.g., feed-forward neural networks), and map optimization reduces to optimizing hyper-parameters and/or network parameters. Synchronizing parametric maps introduces many technical challenges. For example, unlike correspondences between objects, which are undirected, a parametric map may not have a meaningful inverse map (e.g., a neural network that takes a shape as input and outputs its semantic label). This raises the challenge of formulating an equivalent regularization constraint of cycle-consistency for directed map networks. In addition, as matrix-based map encodings are infeasible for parametric maps, another key challenge is how to efficiently enforce the regularization constraint for map synchronization.

We introduce a computational framework for optimizing directed map networks that addresses the challenges described above. Specifically, we propose the so-called path-invariance constraint, which ensures that whenever there exists a map from a source domain to a target domain (through map composition along a path), the map is unique. This path-invariance constraint not only warrants that a map network is well-defined, but more importantly it provides a natural regularization constraint for optimizing directed map networks. To effectively enforce this path-invariance constraint, we introduce the notion of a path-invariance basis, which collects independent path pairs that can induce the path-invariance property of the entire map network. We also present an algorithm for computing a path-invariance basis from an arbitrary directed map network. The algorithm possesses polynomial time and space complexities.

We demonstrate the effectiveness of our approach on three settings of map synchronization. The first setting considers undirected map networks that can be optimized using low-rank formulations [18, 70]. Experimental results show that our new formulation leads to competitive and sometimes better results than state-of-the-art low-rank formulations. The second setting studies consistent dense image maps, where each pairwise map is given by a neural network. Experimental results show that our approach significantly outperforms state-of-the-art approaches for computing dense image correspondences. The third setting considers a map network that consists of different 3D representations (e.g., point cloud and volumetric representations) for the task of 3D semantic segmentation (see Figure 1). By enforcing the path-invariance of neural networks on unlabeled data, our approach only requires 8% labeled data from ScanNet [12] to achieve the same performance as training a single semantic segmentation network with 30% to 100% labeled data.

2 Related Works

Map synchronization. So far most map synchronization techniques [23, 20, 63, 36, 16, 69, 18, 7, 58, 5, 71, 2, 62, 22, 39, 50, 14, 66, 72, 61] have focused on undirected map graphs, where the natural regularization constraint is given by cycle-consistency. Depending on how the cycle-consistency constraint is applied, existing approaches fall into three categories. The first category of methods [23, 20] utilizes the fact that a collection of cycle-consistent maps can be generated from maps associated with a spanning tree. However, these approaches are only suitable for removing incorrect maps from the input maps, and it is hard to apply them for optimizing cycle-consistent neural networks, where the neural networks change during the course of the optimization. The second category of approaches [63, 36, 69] applies constrained optimization to select cycle-consistent maps. These approaches are typically formulated so that the objective functions encode the score of selected maps, and the constraints enforce the consistency of selected maps along cycles. The major advantage of these methods is that the correct maps are determined globally, leading to better performance than the first category of approaches. Our approach is relevant to this category of methods but addresses a different problem of optimizing maps along directed map networks. In particular, we introduce the path-invariance constraint and show how to enforce the path-invariance constraint effectively using path-invariance bases.

The third category of approaches applies modern numerical optimization techniques to optimize cycle-consistent maps. Along this line, people have introduced convex optimization [16, 18, 7, 58], non-convex optimization [5, 71, 2, 62, 22], and spectral techniques [39, 50]. To apply these techniques to parametric maps, we have to hand-craft an additional latent domain, as well as parametric maps between each input domain and this latent domain, which may suffer from the issue of sub-optimal network design. In fact, although people have applied such techniques for multilingual machine translation [25], existing approaches only work if the differences among the input domains are small and there exist meaningful bidirectional maps between them, leading to undirected map networks. In contrast, we focus on directed map networks among diverse domains and explicitly enforce the path-invariance constraint via path-invariance bases.

Joint learning of neural networks. Several recent works have studied the problem of enforcing cycle-consistency among a cycle of neural networks for improving the quality of individual networks along the cycle. Zhou et al. [66] studied how to train dense image correspondences between real images through two real-to-synthetic networks and ground-truth correspondences between synthetic images. [72, 61] enforce the bi-directional consistency of transformation networks between two image domains to improve image translation results. However, in these works the cycles are explicitly given. In contrast, we study how to extend the cycle-consistency constraint on undirected graphs to the path-invariance constraint on directed graphs. In particular, we focus on how to compute a path-invariance basis for enforcing the path-invariance constraint efficiently. A recent work [64] studies how to build a network of representations for boosting individual tasks. However, self-supervision constraints such as cycle-consistency and path-invariance are not employed. Another distinction is that our approach seeks to leverage unlabeled data, while [64] focuses on transferring labeled data across different representations/tasks. Our approach is also related to model/data distillation (see [44] and the references therein), which can be considered as many edges between two domains. In this paper, we focus on defining self-supervision for general graphs.

Cycle-bases of graphs. Path-invariance bases are related to cycle-bases of undirected graphs [26], in which any cycle of a graph is given by a linear combination of the cycles in a cycle-basis. However, besides fundamental cycle-bases [26], which generalize to define cycle-consistency bases, it is an open problem whether other types of cycle-bases generalize or not. Moreover, there are fundamental differences between undirected and directed map networks. This calls for new tools for defining and computing path-invariance bases.

3 Path-Invariance of Directed Map Networks

In this section, we focus on the theoretical contribution of this paper, which introduces an algorithm for computing a path-invariance basis that enforces the path-invariance constraint of a directed map network. In Section 4, we show how to leverage this path-invariance basis to jointly optimize a directed map network to improve the maps in this network. Note that the proofs of theorems and propositions in this section are deferred to the Appendix.

3.1 Path-Invariance Constraint

We first define the notion of a directed map network:

Definition 1.

We define a directed map network as an attributed directed graph G = (V, E). Each vertex v_i ∈ V is associated with a domain D_i. Each edge (v_i, v_j) ∈ E with i ≠ j is associated with a map f_{ij} : D_i → D_j. In the following, we always assume E contains the self-loop at each vertex. The map associated with each self-loop is the identity map at the corresponding domain.

For simplicity, whenever it can be inferred from the context we simplify the terminology of a directed map network as a map network. The following definition considers induced maps along paths of a map network.

Definition 2.

Consider a path p = (v_{i_1}, v_{i_2}, …, v_{i_k}) along G. We define the composite map along p induced from a map network on G as

f_p := f_{i_{k-1} i_k} ∘ ⋯ ∘ f_{i_2 i_3} ∘ f_{i_1 i_2}.    (1)

In particular, we define f_p := 1_{D_i} where p can refer to any self-loop at v_i.

In the remaining text, for two successive paths p and q (i.e., the end vertex of p coincides with the start vertex of q), we use q ∘ p to denote their composition.
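As a concrete (hypothetical) illustration of the composite map in Definition 2, the sketch below folds the edge maps of a toy three-domain network along a vertex path; the `maps` dictionary and the particular lambdas are our own illustrative choices, not from the paper:

```python
def compose(maps, path):
    """Composite map along a vertex path (Definition 2): apply the edge
    map of each consecutive vertex pair in order."""
    def f(x):
        for a, b in zip(path, path[1:]):
            x = maps[(a, b)](x)
        return x
    return f

# Toy network: domains are numbers, with maps 1->2, 2->3, and 1->3.
maps = {
    (1, 2): lambda x: x + 1,
    (2, 3): lambda x: 2 * x,
    (1, 3): lambda x: 2 * (x + 1),
}

# The path pair ((1, 2, 3), (1, 3)) is path-invariant here:
# both composites send 5 to 12.
assert compose(maps, (1, 2, 3))(5) == compose(maps, (1, 3))(5) == 12
```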

Now we state the path-invariance constraint for map networks.

Definition 3.

Let P_{ij} collect all paths in G that connect v_i to v_j. We define the set of all possible path pairs of G as

P := ∪_{i, j} { (p, q) | p, q ∈ P_{ij} }.

We say a map network is path-invariant if f_p = f_q for every (p, q) ∈ P.

Remark 1.

Since E collects the self-loop at each vertex, it is easy to check that path-invariance induces cycle-consistency (c.f. [15]): pairing any cycle with the self-loop at its start vertex forces the composite map along the cycle to be the identity. On the other hand, for undirected map networks it is easy to see that cycle-consistency induces path-invariance. However, this property does not hold for directed map networks. For example, a map network with three vertices v_1, v_2, v_3 and three directed maps f_{12}, f_{23}, and f_{13} has no cycle, but one path pair ((v_1, v_2, v_3), (v_1, v_3)).
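The three-vertex example in Remark 1 can be checked mechanically. The small sketch below (our own, not from the paper) enumerates simple directed paths between a source and a target, showing that the network has a path pair even though it contains no directed cycle:

```python
def simple_paths(adj, s, t):
    """All simple directed paths from s to t (s != t), as vertex tuples."""
    stack = [(s, (s,))]
    out = []
    while stack:
        v, path = stack.pop()
        if v == t:
            out.append(path)
            continue
        for w in adj.get(v, ()):
            if w not in path:       # keep the path simple
                stack.append((w, path + (w,)))
    return sorted(out)

# Remark 1's network: maps 1->2, 2->3, and 1->3 (no directed cycle).
adj = {1: [2, 3], 2: [3]}
# Two distinct paths from vertex 1 to vertex 3 form one path pair.
assert simple_paths(adj, 1, 3) == [(1, 2, 3), (1, 3)]
```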

3.2 Path-Invariance Basis

A challenge of enforcing the path-invariance constraint is that there are many possible paths between each pair of domains in a graph, leading to an intractable number of path pairs. This raises the question of how to compute a path-invariance basis B, which is a set of independent path pairs that is sufficient for enforcing the path-invariance property of any map network on G. To rigorously define a path-invariance basis, we introduce three primitive operations on path pairs, namely merge, stitch and cut (see Figure 2):

Figure 2: Illustrations of the merge, stitch, and cut operations.
Definition 4.

Consider a directed graph G. We say two path pairs (p_1, q_1) and (p_2, q_2) are compatible if one path in (p_2, q_2) is a sub-path of one path in (p_1, q_1) or vice-versa. Without losing generality, suppose p_2 is a sub-path of p_1, and we write p_1 = r ∘ p_2 ∘ l, which stitches three sub-paths l, p_2, and r in order. We define the merge operation so that it takes two compatible path pairs (p_1, q_1) and (p_2, q_2) as input and outputs a new path pair (r ∘ q_2 ∘ l, q_1).

We proceed to define the stitch operation:

Definition 5.

We define the stitch operation so that it takes as input two path pairs (p_1, q_1) and (p_2, q_2), where the end vertex of p_1 coincides with the start vertex of p_2, and outputs (p_2 ∘ p_1, q_2 ∘ q_1).

Finally we define the cut operation on two cycles, which will be useful for strongly connected graphs:

Definition 6.

Operation cut takes as input two path pairs (c_1, 1_v) and (c_2, 1_v), where c_1 and c_2 are two distinct cycles that have two common vertices u and v and share a common path q from u to v. Specifically, we assume these two cycles are c_1 = q ∘ p_1 and c_2 = q ∘ p_2, where p_1 and p_2 are paths from v to u. We define the output of the cut operation as a new path pair (p_1, p_2).

Definition 6 is necessary because enforcing the consistency of both cycles implies f_{p_1} = f_{p_2}. As we will see later, this operation is useful for deriving new path-invariance bases.

Now we define the path-invariance basis, which is the critical concept of this paper:

Definition 7.

We say a collection of path pairs B is a path-invariance basis on G if every path pair in P can be induced from a subset of B through a series of merge, stitch and/or cut operations.

The following proposition shows the importance of a path-invariance basis:

Proposition 1.

Consider a path-invariance basis B of a graph G. Then for any map network on G, if

f_p = f_q for every (p, q) ∈ B,

then the map network is path-invariant.

3.3 Path-Invariance Basis Computation

We first discuss the criteria for path-invariance basis computation. Since we will formulate a loss term for each path pair in a path-invariance basis to enforce the path-invariance constraint of a map network, we place two objectives on computing a path-invariance basis. First, we require the lengths of the paths in each path pair to be small. Intuitively, enforcing consistency between long paths weakens the regularization on each involved map. Second, we want the size of the resulting path-invariance basis to be small in order to increase the effectiveness of gradient-descent based optimization strategies. Note that unlike cycle-bases, which have a fixed size (c.f. [26]), the sizes of path-invariance bases vary. In fact, in the worst case, the size of a path-invariance basis may be exponential in the size of G.

In the following, we present an algorithm that is guaranteed to return a path-invariance basis whose size is polynomial in the size of G, even in the worst case. Our algorithm builds upon the classical result that a directed graph G can be factored into a directed acyclic graph whose vertices are the strongly connected components of G (c.f. [4]). In light of this, we describe our algorithm in three steps. We first show how to compute a path-invariance basis for a directed acyclic graph. We then discuss the case of strongly connected components. Finally, we show how to extend the results of the first two settings to arbitrary directed graphs.

1:input: Directed graph G.
2:output: Path-invariance basis B.
3:     Calculate the strongly connected components (SCCs) of G and the resulting contracted DAG.
4:     Calculate a path-invariance basis for the contracted DAG and transform it into a set whose elements are path pairs on G.
5:     Calculate a path-invariance basis for each SCC.
6:     Calculate path-invariance pairs among the edges between each pair of SCCs whenever one SCC can reach the other in the contracted DAG.
7:     return the union of the path pairs collected above
Algorithm 1 The high-level algorithm flow to find a path-invariance basis.

Directed acyclic graph (or DAG). Our algorithm utilizes an important property that every DAG admits a topological order of vertices that is consistent with the edge orientations (c.f. [4]). Specifically, consider a DAG G = (V, E). A topological order is a bijection σ : V → {1, …, |V|} so that σ(u) < σ(v) whenever (u, v) ∈ E. A topological order of a DAG can be calculated by Tarjan's algorithm (c.f. [53]).
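For concreteness, a topological order can also be computed with Kahn's algorithm (sketched below in place of the DFS-based method cited above); the sketch returns the bijection σ as a dictionary:

```python
def topological_order(vertices, edges):
    """Kahn's algorithm: repeatedly emit a vertex whose unprocessed
    in-degree is zero; returns a bijection sigma: V -> {1, ..., |V|}
    with sigma[u] < sigma[v] for every edge (u, v)."""
    indeg = {v: 0 for v in vertices}
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    frontier = [v for v in vertices if indeg[v] == 0]
    order = []
    while frontier:
        u = frontier.pop()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                frontier.append(w)
    if len(order) != len(vertices):
        raise ValueError("graph contains a directed cycle")
    return {v: i + 1 for i, v in enumerate(order)}

# Diamond DAG: a -> {b, c} -> d; any valid sigma respects all edges.
sigma = topological_order('abcd', [('a','b'), ('a','c'), ('b','d'), ('c','d')])
assert sigma['a'] < sigma['b'] < sigma['d'] and sigma['a'] < sigma['c'] < sigma['d']
```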

Our algorithm starts with a current graph G_0 = (V, ∅), to which we add all edges in E one by one in some order. Specifically, the edges in E are visited with respect to a (partial) edge order where e_1 precedes e_2 if and only if the topological index σ of the head of e_1 is smaller than that of the head of e_2. Note that two edges with the same head can be visited in arbitrary order.

For each newly visited edge e = (u, w), we collect a set C of candidate vertices such that every vertex in C can reach both u and w in the current graph. Next we construct a set C' by removing from C every x that can reach some distinct y ∈ C; in other words, x is redundant because of y in this case. For each vertex x ∈ C', we collect a new path pair (e ∘ p_u, p_w), where p_u and p_w are shortest paths from x to u and from x to w, respectively. After collecting the path pairs, we augment the current graph with e. The resulting path-pair set after all edges have been inserted is the output.
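The incremental procedure above can be sketched as follows. This is our own simplified reading (BFS shortest paths, a naive reachability test, and the redundancy pruning as described); the paper's exact implementation may differ:

```python
from collections import deque

def shortest_path(adj, s, t):
    """BFS shortest path from s to t as a vertex tuple, or None."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            path = []
            while v is not None:
                path.append(v)
                v = prev[v]
            return tuple(reversed(path))
        for w in adj[v]:
            if w not in prev:
                prev[w] = v
                queue.append(w)
    return None

def dag_basis(vertices, edges, sigma):
    """Insert edges in order of the topological index of their head.
    For each new edge (u, w), every surviving candidate x that already
    reaches both u and w contributes the pair (x->u->w, x->w)."""
    adj = {v: [] for v in vertices}
    basis = []
    for u, w in sorted(edges, key=lambda e: sigma[e[1]]):
        cand = [x for x in vertices
                if shortest_path(adj, x, u) and shortest_path(adj, x, w)]
        # prune x if it can reach another candidate y (x is then redundant)
        cand = [x for x in cand
                if not any(y != x and shortest_path(adj, x, y) for y in cand)]
        for x in cand:
            basis.append((shortest_path(adj, x, u) + (w,),
                          shortest_path(adj, x, w)))
        adj[u].append(w)
    return basis

# Diamond DAG: the only pair equates the branches a->b->d and a->c->d.
sigma = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
basis = dag_basis('abcd', [('a','b'), ('a','c'), ('b','d'), ('c','d')], sigma)
assert len(basis) == 1 and set(basis[0]) == {('a','c','d'), ('a','b','d')}
```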

Theorem 3.1.

Every topological order of G returns a path-invariance basis whose size is polynomial in the size of G.

Strongly connected graph (or SCG). To construct a path-invariance basis of an SCG G, we run a slightly-modified depth-first search on G from an arbitrary vertex. Since G is strongly connected, the resulting spanning forest must be a tree, denoted by T. The path pair set B we obtain is initialized as empty. In addition, we use an auxiliary graph G_a to collect an acyclic sub-graph of G, and initially it is set as empty. When traversing an edge (u, v), if v is visited for the first time, then we add (u, v) to both T and G_a. Otherwise, there can be two possible cases:

  • v is an ancestor of u in T. In this case we add the cycle-consistency pair ((u, v) ∘ t_{vu}, 1_v), where t_{vu} is the tree path from v to u, into B.

  • Otherwise, we add (u, v) into G_a.

It can be proved that G_a is indeed an acyclic graph (see Appendix A.3). Thus we can obtain a path-invariance basis on G_a by running the algorithm stated in the DAG case. We add this basis into B. The following proposition ensures that B is a path-invariance basis of G:

Proposition 2.

The path pair set B constructed above is a path-invariance basis of G.
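A minimal sketch of this modified DFS, under our own reading (recursive traversal, ancestor test by walking parent pointers; cycles are recorded against the self-loop at their start vertex):

```python
def scg_decompose(vertices, edges, root):
    """Modified DFS on a strongly connected graph: tree and cross/forward
    edges accumulate in an acyclic subgraph g_a; each back edge closes a
    cycle recorded as a (cycle, self-loop) consistency pair."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    parent = {root: None}
    cycle_pairs, g_a = [], []

    def is_ancestor(a, b):          # is a an ancestor of b in the DFS tree?
        while b is not None:
            if b == a:
                return True
            b = parent[b]
        return False

    def tree_path(a, b):            # tree path a ~> b as a vertex tuple
        path = []
        while b != a:
            path.append(b)
            b = parent[b]
        return tuple(reversed(path + [a]))

    def dfs(u):
        for v in adj[u]:
            if v not in parent:      # tree edge
                parent[v] = u
                g_a.append((u, v))
                dfs(v)
            elif is_ancestor(v, u):  # back edge: cycle v ~> u -> v
                cycle_pairs.append((tree_path(v, u) + (v,), (v,)))
            else:                    # forward/cross edge
                g_a.append((u, v))

    dfs(root)
    return cycle_pairs, g_a

# Directed triangle a -> b -> c -> a: one cycle pair, and g_a stays acyclic.
pairs, g_a = scg_decompose('abc', [('a','b'), ('b','c'), ('c','a')], 'a')
assert pairs == [(('a', 'b', 'c', 'a'), ('a',))]
assert g_a == [('a', 'b'), ('b', 'c')]
```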

General directed graph. Given path-invariance bases constructed on DAGs and SCGs, constructing path-invariance bases on general graphs is straightforward. Specifically, consider the strongly connected components G_1, …, G_k of a graph G. With G_DAG we denote the directed acyclic graph among G_1, …, G_k. We first construct path-invariance bases for G_DAG and for each G_s, respectively. We then construct a path-invariance basis of G by collecting three groups of path pairs. The first group simply combines the bases of the individual components. The second group extends the basis of G_DAG to the original graph. This is done by replacing each edge of G_DAG through a shortest path on G that connects the representatives of the corresponding components, where representatives are arbitrarily chosen at first for each component. To calculate the third group, consider all oriented edges between each pair of components G_s and G_t:

E_{st} := { (u, v) ∈ E | u ∈ G_s, v ∈ G_t }.

Note that when constructing G_DAG, all edges in E_{st} are contracted to one edge in G_DAG. This means when constructing the third group, we have to enforce the consistency among the edges in E_{st} on the original graph G. This can be done by constructing a tree T_{st} whose vertices are the edges in E_{st}. T_{st} is a minimum spanning tree on the graph whose vertex set is E_{st}, and the weight associated with the edge between e = (u, v) and e' = (u', v') is given by the sum of the lengths of p_{u'u} and p_{vv'}. This strategy encourages reducing the total length of the resulting path pairs

(p_{vv'} ∘ e ∘ p_{u'u}, e'),  (e, e') ∈ T_{st},

where p_{u'u} and p_{vv'} denote the shortest paths from u' to u on G_s and from v to v' on G_t, respectively. Algorithm 1 summarizes the overall procedure.

Theorem 3.2.

The path pairs derived from the three groups described above form a path-invariance basis for G.

Proposition 3.

The size of B is upper bounded by a polynomial in the size of G. (We conjecture that computing the path-invariance basis with minimum size is NP-hard.)

4 Joint Map Network Optimization

In this section, we present a formulation for jointly optimizing a map network using the path-invariance basis computed in the preceding section.

Consider the map network defined in Def. 1. We assume the map associated with each edge (u, v) ∈ E is a parametric map f_{uv}^{θ_{uv}}, where θ_{uv} denotes the hyper-parameters or network parameters of f_{uv}. We assume the supervision of the map network is given on a superset of the edge set E. As we will see later, such instances happen when there exist paired data between two domains, but we do not have a direct neural network between them. To utilize such supervision, we define the induced map along such an edge as the composite map (defined in (1)) along the shortest path between its endpoints. Here θ collects all the parameters. We define a supervised loss term l_{uv}(θ) for each supervised edge; the specific definition of l_{uv} will be deferred to Section 5.

Besides the supervised loss terms, the key component of joint map network optimization utilizes a self-supervision loss induced from the path-invariance basis B. Let d_i be a distance measure associated with domain D_i. Consider an empirical distribution of each domain. We define the total loss objective for joint map network optimization as

min_θ  Σ_{supervised (u,v)} l_{uv}(θ) + λ Σ_{(p,q)∈B} E_x [ d_{t(p)}( f_p(x), f_q(x) ) ],    (3)

with l_{uv} the supervised loss terms and x drawn from the empirical distribution of the start domain of p.
Here t(p) denotes the index of the end vertex of p. Essentially, (3) combines the supervised loss terms and an unsupervised regularization term that ensures the learned representations are consistent when passing unlabeled instances across the map network. We employ the ADAM optimizer [31] for optimization. In addition, we start with a small value of λ to solve (3) for 40 epochs. We then double the value of λ every 10 epochs. We stop the training procedure when λ reaches a preset maximum value. The training details are deferred to the Appendix.
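The weight schedule described above can be written down directly; the initial weight, period, and cap below are illustrative placeholders, since the paper's exact constants are not given in this excerpt:

```python
def lambda_schedule(epoch, lam0=0.01, warmup=40, period=10, lam_max=1.0):
    """Hold a small regularization weight for `warmup` epochs, then double
    it every `period` epochs until it reaches `lam_max` (training stops
    once the cap is hit). All constants here are illustrative."""
    if epoch < warmup:
        return lam0
    doublings = (epoch - warmup) // period + 1
    return min(lam0 * 2 ** doublings, lam_max)

assert lambda_schedule(0) == 0.01     # warm-up phase
assert lambda_schedule(40) == 0.02    # first doubling after 40 epochs
assert lambda_schedule(50) == 0.04    # doubled again 10 epochs later
```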

5 Experimental Evaluation

This section presents an experimental evaluation of our joint map network optimization framework across three settings, namely, shape matching (Section 5.1), dense image maps (Section 5.2), and a network of hybrid 3D representations for 3D semantic segmentation (Section 5.3).

5.1 Map Network of Shape Maps

We begin with the task of joint shape matching [36, 29, 16, 18, 10], which seeks to jointly optimize a network of shape maps to improve the initial maps computed between pairs of shapes in isolation. We utilize the functional map representation described in [38, 54, 18]. Specifically, each domain D_i is given by a linear space spanned by the leading eigenvectors of a graph Laplacian [18]. The map from D_i to D_j is given by a matrix X_{ij}. Let B be a path-invariance basis for the associated graph G. Adapting (3), we solve the following optimization problem for joint shape matching:

min_{X}  Σ_{(i,j)∈E} || X_{ij} − X_{ij}^{in} ||_1 + λ Σ_{(p,q)∈B} || X_p − X_q ||_F²,    (4)

where ||·||_1 and ||·||_F are the element-wise L1-norm and the matrix Frobenius norm, respectively; X_p denotes the composite functional map along path p; and X_{ij}^{in} denotes the initial functional map converted from the corresponding initial shape map associated with edge (i, j) using [38].

Dataset. We perform experimental evaluation on SHREC07-Watertight [13], which is a challenging dataset for evaluating shape maps. Specifically, SHREC07-Watertight contains 400 shapes across 20 categories. Among them, we choose 11 categories (i.e., Human, Glasses, Airplane, Ant, Teddy, Hand, Plier, Fish, Bird, Armadillo, Fourleg) that are suitable for inter-shape mapping. We also test our approach on two large-scale datasets, Aliens (200 shapes) and Vase (300 shapes), from ShapeCOSEG [59]. For initial maps, we employ blended intrinsic maps [30], a state-of-the-art method for shape matching. We test our approach under two graphs. The first graph is a clique graph. The second graph connects each shape with its k nearest neighbors with respect to the GMDS descriptor [48].

Baseline approaches and evaluation metric.

We compare our approach to five baseline approaches, including three state-of-the-art approaches and two variants of our approach. The three state-of-the-art approaches are 1) functional-map based low-rank matrix recovery [18], 2) point-map based low-rank matrix recovery via alternating minimization [71], and 3) consistent partial matching via sparse modeling [10]. The two variants are 4) using a set of randomly sampled cycles [63] whose size is the same as that of our path-invariance basis, and 5) using the path-invariance basis derived from the fundamental cycle-basis of the graph (c.f. [26]), which may contain long cycles.

Figure 3:

Baseline comparison on benchmark datasets. We show cumulative distribution functions (or CDFs) of each method with respect to annotated feature correspondences.

aero bike boat bottle bus car chair table mbike sofa train tv mean
Congealing 0.13 0.24 0.05 0.21 0.22 0.11 0.09 0.05 0.14 0.09 0.10 0.09 0.13
RASL 0.18 0.20 0.05 0.36 0.33 0.19 0.14 0.06 0.19 0.13 0.14 0.29 0.19
CollectionFlow 0.17 0.18 0.06 0.33 0.31 0.15 0.15 0.04 0.12 0.11 0.10 0.11 0.12
DSP 0.19 0.33 0.07 0.21 0.36 0.37 0.12 0.07 0.19 0.13 0.15 0.21 0.20
FlowWeb 0.31 0.42 0.08 0.37 0.56 0.51 0.12 0.06 0.23 0.18 0.19 0.34 0.28
Ours-Dense 0.29 0.42 0.07 0.39 0.53 0.55 0.11 0.06 0.22 0.18 0.21 0.31 0.28
Ours-Undirected 0.32 0.43 0.07 0.43 0.56 0.55 0.18 0.06 0.26 0.21 0.25 0.37 0.31
Ours 0.35 0.45 0.07 0.45 0.63 0.62 0.19 0.06 0.27 0.22 0.23 0.38 0.33
Figure 4: (Left) Keypoint matching accuracy (PCK) on 12 rigid PASCAL VOC categories. Higher is better. (Right) Plots of the mean PCK of each method under varying thresholds.
Source Target Congealing RASL CollectionFlow DSP FlowWeb Ours
Figure 5: Visual comparison between our approach and state-of-the-art approaches. This figure is best viewed in color, zoomed in. More examples are included in the appendix.

We evaluate the quality of each map through annotated key points (please refer to the appendix). Following [30, 16, 18], we report the cumulative distribution function (or CDF) of geodesic errors of predicted feature correspondences.

Analysis of results. Figure 3 shows CDFs of our approach and baseline approaches. All participating methods exhibit considerable improvements over the initial maps, demonstrating the benefits of joint shape matching. Compared to state-of-the-art approaches, our approach is comparable when the graph is a clique and exhibits certain performance gains when the graph is sparse. One explanation is that low-rank approaches are based on relaxations of the cycle-consistency constraint (c.f. [15]), and such relaxations become loose on sparse graphs. In contrast, our approach explicitly enforces the cycle-consistency constraint (through the generalized path-invariance constraint). Compared to the two variants of our approach, our approach delivers the best results on both clique graphs and knn-graphs. This is because the two alternative strategies generate many long paths and cycles, making the total objective function (3) hard to optimize. On knn-graphs, both our approach and the baseline using the fundamental cycle-basis outperform the baseline of randomly sampling path pairs, showing the importance of computing a path-invariance basis for enforcing the consistency constraint.

5.2 Map Network of Dense Image Maps

In the second setting, we consider the task of optimizing dense image flows across a collection of relevant images. We again model this task using a map network G, where each domain is given by an image. Our goal is to compute a dense image map (its difference from the identity map gives a dense image flow) between each pair of input images. To this end, we precompute initial dense maps using DSP [28], which is a state-of-the-art approach for dense image flows. Our goal is to obtain improved dense image maps along the edges of G, which lead to dense image maps between all pairs of images via map composition (see (1)). Due to scalability issues, state-of-the-art approaches for joint estimation of dense image flows [32, 27, 41, 69] are limited to a small number of relatively low-resolution images. To address this issue, we encode dense image maps using the neural network described in [67]. Given a fixed map network G and the initial dense maps, we formulate an optimization problem similar to (4) to learn the network parameters:


where B denotes a path-invariance basis associated with G; s(p) is the index of the start vertex of p; and f_p is the composite network along path p.

Dataset. The image sets we use are sampled from 12 rigid categories of the PASCAL-Part dataset [6]. To generate image sets that are meaningful to align, we pick the most popular view for each category (the view with the smallest variance among its 20 nearest neighbors). We then generate an image set for that category by collecting all images whose poses are within a threshold of this view. We construct the map network G by connecting each image with its k nearest neighbors with respect to the DSP matching score [28]. Note that the resulting G is a directed graph, as DSP is directed.

Baseline approaches and evaluation metric. We compare our approach with Congealing [32], Collection Flow [27], RASL [41], and FlowWeb [69]. We use publicly available code for all baselines except Collection Flow, for which we implement our own version in Matlab. Note that both FlowWeb and our approach use DSP as input. Moreover, we did not compare to [67], since it uses additional synthetic images as supervision. To run baseline approaches, we follow the protocol of [69] to further break each dataset into smaller ones with a maximum size of 100. In addition, we consider two variants of our approach: Ours-Dense and Ours-Undirected. Ours-Dense uses the clique graph for G. Ours-Undirected uses an undirected knn-graph, where the weight of each edge averages the bi-directional DSP matching scores (c.f. [28]). We employ the standard PCK measure [60], which reports the percentage of keypoints whose prediction errors fall within α · max(h, w), where h and w are the image height and width, respectively.
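The PCK measure can be sketched as follows, assuming the common convention that the threshold is α · max(h, w); the function and argument names are our own:

```python
def pck(pred, gt, h, w, alpha=0.05):
    """Percentage of keypoints whose Euclidean prediction error falls
    within alpha * max(h, w), for images of height h and width w."""
    thresh = alpha * max(h, w)
    hits = sum(1 for (px, py), (gx, gy) in zip(pred, gt)
               if ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5 <= thresh)
    return hits / len(gt)

# One exact keypoint and one ~14-pixel miss on a 100x100 image
# (threshold is 5 pixels at alpha = 0.05).
assert pck([(0, 0), (10, 10)], [(0, 0), (0, 0)], 100, 100) == 0.5
```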

Analysis of results. As shown in Figure 4 and Figure 5, our approach outperforms all existing approaches across most of the categories. Several factors contribute to such improvements. First, our approach can jointly optimize more images than baseline approaches and thus benefits more from the data-driven effect of joint matching [15, 7]. This explains why all variants of our approach are either comparable or superior to baseline approaches. Second, our approach avoids fitting a neural network directly to dissimilar images and focuses on relatively similar images (other maps are generated by map composition), leading to additional performance gains. In fact, all existing approaches, which operate on sub-groups of similar images, also implicitly benefit from map composition. This explains why FlowWeb exhibits competing performance against Ours-Dense. Finally, Ours is superior to Ours-Undirected. This is because the outlier ratio of the input maps in Ours-Undirected is higher than that of Ours, which selects edges purely based on the directed matching scores.

Ground Truth 8% Label 30% Label 100% Label 8% Label + 92%Unlabel
Figure 6: Qualitative comparisons of 3D semantic segmentation results on ScanNet [12]. Each row represents one testing instance; the ground truth and the top sub-row show predictions for 21 classes, and the bottom sub-row shows only correctly labeled points. (Green indicates correct predictions, while red indicates false predictions.) This figure is best viewed in color, zoomed in.

5.3 Map Network of 3D Representations

In the third setting, we seek to jointly optimize a network of neural networks to improve the performance of individual networks. We are particularly interested in the task of semantic segmentation of 3D scenes. Specifically, we consider a network with seven 3D representations (see Figure 1). The first representation is the input mesh. The last representation is the space of 3D semantic segmentations. The second to fourth 3D representations are point clouds with different numbers of points: PCI (12K), PCII (8K), and PCIII (4K). The motivation for varying the number of points is that the patterns learned under different numbers of points show certain variations, which are beneficial to each other. In a similar fashion, the fifth and sixth are volumetric representations under two resolutions: VOLI and VOLII. The directed maps between different 3D representations fall into three categories, which are summarized below:

1. Semantic segmentation networks. Each point cloud or volumetric representation is associated with a segmentation network. Specifically, we use PointNet++ [42] and 3D U-Net [9], which are state-of-the-art network architectures for point cloud and volumetric representations, respectively.

2. Point cloud sub-sampling maps. We have six point-cloud sub-sampling maps among the mesh representation (from which we uniformly sample 24K points using [37]) and the three point cloud representations. For each sub-sampling map, we force the down-sampled point cloud to align with the feature points of the input point cloud [40]. Note that each down-sampled point cloud is also optimized through a segmentation network to maximize segmentation accuracy.

3. Generating volumetric representations. Each volumetric representation is given by the signed-distance field (or SDF) described in [52]. These SDFs are precomputed.
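As a concrete illustration of the second category, the chain of sub-sampling maps can be sketched as index arrays; the uniform random choice below is only a stand-in for the feature-point-aligned sampling of [40], and the sizes follow the representations described above.

```python
import numpy as np

def subsample_map(points: np.ndarray, n_out: int, seed: int = 0) -> np.ndarray:
    """Return indices defining a sub-sampling map into a smaller cloud.

    A uniform random choice stands in for the feature-point-aligned
    sampling used in the paper; the index array itself is the map.
    """
    rng = np.random.default_rng(seed)
    return rng.choice(len(points), size=n_out, replace=False)

# Chain of maps: mesh samples (24K) -> PCI (12K) -> PCII (8K) -> PCIII (4K)
mesh_pts = np.random.rand(24000, 3)
pc_i = mesh_pts[subsample_map(mesh_pts, 12000)]
pc_ii = pc_i[subsample_map(pc_i, 8000)]
pc_iii = pc_ii[subsample_map(pc_ii, 4000)]
print(pc_i.shape, pc_ii.shape, pc_iii.shape)  # (12000, 3) (8000, 3) (4000, 3)
```

Composing two such index arrays yields a direct map from the mesh samples to PCII, which is exactly the kind of path composition the map network exploits.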

Method                      PCI   PCII  PCIII  VOLI  VOLII  Ensemble
100% Label (Isolated)       84.2  83.3  83.4   81.9  81.5   85
8% Label (Isolated)         79.2  78.3  78.4   78.7  77.4   81.4
8% Label + Unlabel (Joint)  82.3  82.5  82.3   81.6  79.0   83.4
30% Label (Isolated)        80.8  81.9  81.2   80.3  79.5   83.2
Table 1: Semantic surface voxel label prediction accuracy on ScanNet test scenes (in percentages), following [43]. We also show the ensembled prediction accuracy over the five representations in the last column.

Experimental setup. We evaluate our approach on the ScanNet semantic segmentation benchmark [12]. Our goal is to assess the effectiveness of our approach when combining a small labeled dataset with a large unlabeled dataset. To this end, we consider three baseline approaches, which train the segmentation network under each individual representation using 100%, 30%, and 8% of the labeled data, respectively. We then test our approach utilizing 8% of the labeled data, which defines the data term in (3), and 92% of the unlabeled data (only the root has an empirical distribution), which defines the regularization term of (3). We initialize the segmentation networks for point clouds using uniformly sampled points trained on the labeled data, and then fine-tune the entire network using both labeled and unlabeled data. Code is publicly available at
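The resulting objective can be sketched as follows, assuming a cross-entropy data term on the labeled points and a simple squared-difference stand-in for the path-invariance regularizer of (3); the function name, `lam`, and the array shapes are illustrative only.

```python
import numpy as np

def joint_loss(pred_a, pred_b, labels, labeled_mask, lam=1.0):
    """Supervised data term on the labeled points plus a consistency
    term over all points that penalizes disagreement between
    predictions obtained along two different paths to the
    segmentation domain.

    pred_a, pred_b: (n_points, n_classes) class probabilities.
    """
    eps = 1e-9
    # Cross-entropy on the labeled subset (the data term).
    data = -np.log(pred_a[labeled_mask, labels[labeled_mask]] + eps).mean()
    # Squared disagreement between the two paths (the regularizer).
    consistency = np.mean((pred_a - pred_b) ** 2)
    return data + lam * consistency

n = 10
labels = np.zeros(n, dtype=int)
mask = np.arange(n) < 1          # ~1 of 10 points labeled
pred = np.full((n, 3), 1.0 / 3)  # uninformative predictions
loss = joint_loss(pred, pred, labels, mask)
```

In the actual system the two predictions would come from composite networks along different paths through the representation graph; only the labeled root distribution feeds the data term, matching the split described above.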

Analysis of results. Figure 6 and Table 1 present qualitative and quantitative comparisons between our approach and the baselines. Across all 3D representations, our approach leads to consistent improvements, demonstrating its robustness. Specifically, when using 8% labeled data and 92% unlabeled data, our approach achieves performance competitive with training on 30% to 100% of the labeled data under each individual representation. Moreover, the accuracy on VOLI is competitive with using 100% of the labeled data, indicating that the patterns learned under the point cloud representations propagate to the volumetric representations. We also tested the performance of applying majority voting [45] to the predictions obtained from the different 3D representations. The relative performance gains under the different training-data configurations remain similar (see the last column in Table 1). Please refer to Appendix C for more experimental evaluations and baseline comparisons.
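The ensembling step can be sketched as a per-point majority vote over the labels predicted under the different representations (the arrays below are hypothetical; ties break toward the smaller class id):

```python
import numpy as np

def ensemble_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_points) integer class labels.
    Returns the per-point majority label."""
    n_points = predictions.shape[1]
    n_classes = predictions.max() + 1
    # Count votes per point and class, then take the most-voted class.
    counts = np.zeros((n_points, n_classes), dtype=int)
    for model_pred in predictions:
        counts[np.arange(n_points), model_pred] += 1
    return counts.argmax(axis=1)

preds = np.array([[0, 2, 1],
                  [0, 2, 2],
                  [1, 2, 2]])   # 3 models, 3 points
print(ensemble_vote(preds))     # -> [0 2 2]
```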

6 Conclusions

In this paper, we have studied the problem of optimizing a directed map network. We have introduced the path-invariance constraint, which can be effectively encoded using path-invariance bases. We have described an algorithm for computing a path-invariance basis with polynomial time and space complexities. The effectiveness of this approach is demonstrated on three groups of map networks with diverse applications.


  • [1] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building rome in a day. Commun. ACM, 54(10):105–112, Oct. 2011.
  • [2] F. Arrigoni, A. Fusiello, B. Rossi, and P. Fragneto. Robust rotation synchronization via low-rank and sparse matrix decomposition. CoRR, abs/1505.06079, 2015.
  • [3] C. Bajaj, T. Gao, Z. He, Q. Huang, and Z. Liang. Smac: Simultaneous mapping and clustering via spectral decompositions. In ICML, pages 100–108, 2018.
  • [4] J. Bang-Jensen and G. Z. Gutin. Digraphs - theory, algorithms and applications. Springer, 2002.
  • [5] A. Chatterjee and V. M. Govindu. Efficient and robust large-scale rotation averaging. In ICCV, pages 521–528. IEEE Computer Society, 2013.
  • [6] X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun, and A. L. Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. CoRR, abs/1406.2031, 2014.
  • [7] Y. Chen, L. J. Guibas, and Q. Huang. Near-optimal joint object matching via convex relaxation. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 100–108, 2014.
  • [8] M. Cho, S. Kwak, C. Schmid, and J. Ponce. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1201–1210, 2015.
  • [9] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3d u-net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.
  • [10] L. Cosmo, E. Rodolà, A. Albarelli, F. Mémoli, and D. Cremers. Consistent partial matching of shape collections via sparse modeling. Comput. Graph. Forum, 36(1):209–221, 2017.
  • [11] D. J. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher. Sfm with mrfs: Discrete-continuous optimization for large-scale structure from motion. IEEE Trans. Pattern Anal. Mach. Intell., 35(12):2841–2853, 2013.
  • [12] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. CoRR, abs/1702.04405, 2017.
  • [13] D. Giorgi, S. Biasotti, and L. Paraboschi. Shape retrieval contest 2007: Watertight models track, 2007.
  • [14] Q. Huang, Y. Chen, and L. J. Guibas. Scalable semidefinite relaxation for maximum a posteriori estimation. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pages 64–72, 2014.
  • [15] Q. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Proceedings of the Eleventh Eurographics/ACMSIGGRAPH Symposium on Geometry Processing, pages 177–186, 2013.
  • [16] Q. Huang and L. J. Guibas. Consistent shape maps via semidefinite programming. Comput. Graph. Forum, 32(5):177–186, 2013.
  • [17] Q. Huang, F. Wang, and L. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Trans. Graph., 33(4):36:1–36:11, July 2014.
  • [18] Q. Huang, F. Wang, and L. J. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Trans. Graph., 33(4):36:1–36:11, 2014.
  • [19] Q. Huang, G. Zhang, L. Gao, S. Hu, A. Butscher, and L. J. Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, 2012.
  • [20] Q.-X. Huang, S. Flöry, N. Gelfand, M. Hofer, and H. Pottmann. Reassembling fractured objects by geometric matching. ACM Trans. Graph., 25(3):569–578, July 2006.
  • [21] Q.-X. Huang, G.-X. Zhang, L. Gao, S.-M. Hu, A. Butscher, and L. Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, Nov. 2012.
  • [22] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In NIPS, 2017.
  • [23] D. F. Huber and M. Hebert. Fully automatic registration of multiple 3d data sets. Image and Vision Computing, 21:637–650, 2001.
  • [24] D. F. Huber and M. Hebert. Fully automatic registration of multiple 3d data sets. Image Vision Comput., 21(7):637–650, 2003.
  • [25] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. B. Viégas, M. Wattenberg, G. Corrado, M. Hughes, and J. Dean. Google’s multilingual neural machine translation system: Enabling zero-shot translation. CoRR, abs/1611.04558, 2016.
  • [26] T. Kavitha, C. Liebchen, K. Mehlhorn, D. Michail, R. Rizzi, T. Ueckerdt, and K. A. Zweig. Survey: Cycle bases in graphs characterization, algorithms, complexity, and applications. Comput. Sci. Rev., 3(4):199–243, Nov. 2009.
  • [27] I. Kemelmacher-Shlizerman and S. M. Seitz. Collection flow. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1792–1799. IEEE, 2012.
  • [28] J. Kim, C. Liu, F. Sha, and K. Grauman. Deformable spatial pyramid matching for fast dense correspondences. In CVPR, pages 2307–2314. IEEE Computer Society, 2013.
  • [29] V. G. Kim, W. Li, N. J. Mitra, S. DiVerdi, and T. Funkhouser. Exploring collections of 3d models using fuzzy correspondences. ACM Trans. Graph., 31(4):54:1–54:11, July 2012.
  • [30] V. G. Kim, Y. Lipman, and T. Funkhouser. Blended intrinsic maps. In ACM SIGGRAPH 2011 Papers, SIGGRAPH ’11, pages 79:1–79:12, New York, NY, USA, 2011. ACM.
  • [31] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
  • [32] E. G. Learned-Miller. Data driven image models through continuous joint alignment. IEEE Trans. Pattern Anal. Mach. Intell., 28(2):236–250, Feb. 2006.
  • [33] S. Leonardos, X. Zhou, and K. Daniilidis. Distributed consistent data association via permutation synchronization. In ICRA, pages 2645–2652. IEEE, 2017.
  • [34] C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):978–994, May 2011.
  • [35] A. Nguyen, M. Ben-Chen, K. Welnicka, Y. Ye, and L. Guibas. An optimization approach to improving collections of shape maps. In Computer Graphics Forum, volume 30, pages 1481–1491. Wiley Online Library, 2011.
  • [36] A. Nguyen, M. Ben-Chen, K. Welnicka, Y. Ye, and L. J. Guibas. An optimization approach to improving collections of shape maps. Comput. Graph. Forum, 30(5):1481–1491, 2011.
  • [37] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. ACM Trans. Graph., 21(4):807–832, Oct. 2002.
  • [38] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional maps: A flexible representation of maps between shapes. ACM Transactions on Graphics, 31(4), 2012.
  • [39] D. Pachauri, R. Kondor, and V. Singh. Solving the multi-way matching problem by permutation synchronization. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 1860–1868. Curran Associates, Inc., 2013.
  • [40] M. Pauly, R. Keiser, and M. Gross. Multi-scale feature extraction on point-sampled surfaces. Computer Graphics Forum, 2003.
  • [41] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. Rasl: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell., 34(11):2233–2246, Nov. 2012.
  • [42] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, abs/1706.02413, 2017.
  • [43] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
  • [44] I. Radosavovic, P. Dollár, R. Girshick, G. Gkioxari, and K. He. Data distillation: Towards omni-supervised learning. arXiv preprint arXiv:1712.04440, 2017.
  • [45] L. Rokach. Ensemble-based classifiers. Artif. Intell. Rev., 33(1-2):1–39, Feb. 2010.
  • [46] M. Rubinstein, A. Joulin, J. Kopf, and C. Liu. Unsupervised joint object discovery and segmentation in internet images. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013, pages 1939–1946. IEEE Computer Society, 2013.
  • [47] M. Rubinstein, C. Liu, and W. T. Freeman. Joint inference in weakly-annotated image datasets via dense correspondence. Int. J. Comput. Vision, 119(1):23–45, Aug. 2016.
  • [48] R. M. Rustamov. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, SGP ’07, pages 225–233, Aire-la-Ville, Switzerland, 2007. Eurographics Association.
  • [49] Y. Shen, Q. Huang, N. Srebro, and S. Sanghavi. Normalized spectral map synchronization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4925–4933. Curran Associates, Inc., 2016.
  • [50] Y. Shen, Q. Huang, N. Srebro, and S. Sanghavi. Normalized spectral map synchronization. In Neural Information Processing Systems (NIPS), 2016.
  • [51] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3d. ACM Trans. Graph., 25(3):835–846, July 2006.
  • [52] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. Proceedings of 30th IEEE Conference on Computer Vision and Pattern Recognition, 2017.
  • [53] R. E. Tarjan. Edge-disjoint spanning trees and depth-first search. Acta Inf., 6(2):171–185, June 1976.
  • [54] F. Wang, Q. Huang, and L. Guibas. Image co-segmentation via consistent functional maps. In Proceedings of the 14th International Conference on Computer Vision (ICCV), 2013.
  • [55] F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In Proceedings of the 2013 IEEE International Conference on Computer Vision, ICCV ’13, pages 849–856, Washington, DC, USA, 2013. IEEE Computer Society.
  • [56] F. Wang, Q. Huang, M. Ovsjanikov, and L. J. Guibas. Unsupervised multi-class joint image segmentation. In CVPR, pages 3142–3149. IEEE Computer Society, 2014.
  • [57] F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 849–856, 2013.
  • [58] L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. CoRR, abs/1211.2441, 2012.
  • [59] Y. Wang, S. Asafi, O. van Kaick, H. Zhang, D. Cohen-Or, and B. Chen. Active co-analysis of a set of shapes. ACM Trans. Graph., 31(6):165:1–165:10, Nov. 2012.
  • [60] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. Pattern Anal. Mach. Intell., 35(12):2878–2890, Dec. 2013.
  • [61] Z. Yi, H. Zhang, P. Tan, and M. Gong. Dualgan: Unsupervised dual learning for image-to-image translation. CoRR, abs/1704.02510, 2017.
  • [62] Y. Chen and E. J. Candès. The projected power method: An efficient algorithm for joint alignment from pairwise differences, 2016.
  • [63] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE Computer Society, 2010.
  • [64] A. R. Zamir, A. Sax, W. B. Shen, L. J. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. CoRR, abs/1804.08328, 2018.
  • [65] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3d-guided cycle consistency. CoRR, abs/1604.05383, 2016.
  • [66] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3d-guided cycle consistency. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 117–126, 2016.
  • [67] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3d-guided cycle consistency. In Computer Vision and Pattern Recognition (CVPR), 2016.
  • [68] T. Zhou, Y. J. Lee, S. X. Yu, and A. A. Efros. Flowweb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.
  • [69] T. Zhou, Y. J. Lee, S. X. Yu, and A. A. Efros. Flowweb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.
  • [70] X. Zhou, M. Zhu, and K. Daniilidis. Multi-image matching via fast alternating minimization. CoRR, abs/1505.04845, 2015.
  • [71] X. Zhou, M. Zhu, and K. Daniilidis. Multi-image matching via fast alternating minimization. In ICCV, pages 4032–4040, Santiago, Chile, 2015. IEEE Computer Society.
  • [72] J. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017.

Appendix A Proof of Theorems and Propositions

A.1 Proof of Proposition 1

To show that the map network is path-invariant, it suffices to prove consistency for every path pair. By Definition 7, every path pair is either in the basis or can be induced from it by a finite number of merge, stitch, and/or cut operations. Hence, if every operation outputs a consistent path pair whenever its input path pairs are consistent, then by an induction proof all path pairs on the map network are path-invariant. We next verify this for each of the three operations.

  • merge. The merge operation takes as input two path pairs, where one path is formed by stitching three sub-paths in order. By Definition 2, the composite map along the stitched path is the composition of the composite maps along its sub-paths. But we are given that the map network is consistent on the input pairs; so it is also consistent on the output path pair.

  • stitch. The stitch operation takes as input two path pairs whose endpoints match so that the corresponding paths can be concatenated. Since the map network is consistent on both input pairs, it follows immediately that it is also consistent on the concatenated pair.

  • cut. The cut operation takes as input two path pairs (cycles) that pass through two common vertices and share a common intermediate path between them. Since the map network is consistent on both cycles, and the inverse of a map is unique, the map network is also consistent on the path pair obtained by cutting out the shared sub-path.

The consistency of the map network on the output pair of each of the three operations, given consistency on its input pairs, establishes our proposition. ∎
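The stitch case can be made concrete: if a map network is consistent on two path pairs whose endpoints allow concatenation, the concatenated pair is again consistent. A minimal sketch with toy integer maps (all maps and paths below are hypothetical):

```python
def compose(maps):
    """Compose maps along a path: compose([f, g])(x) = g(f(x))."""
    def composite(x):
        for f in maps:
            x = f(x)
        return x
    return composite

double = lambda x: 2 * x
inc = lambda x: x + 1
shift = lambda x: x + 2

# Two consistent path pairs: ([inc, inc], [shift]) both add 2,
# and ([double], [double]) is trivially consistent.
p1, q1 = [inc, inc], [shift]
p2, q2 = [double], [double]

# stitch: concatenating consistent pairs yields a consistent pair.
stitched = (p1 + p2, q1 + q2)
assert all(compose(stitched[0])(x) == compose(stitched[1])(x) for x in range(10))
```

The same check, run over every pair in a path-invariance basis, is exactly what the path-invariance constraint enforces during training.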

A.2 Proof of Theorem 3.1

The algorithm adds the edges one at a time, and during each edge insertion only polynomially many path pairs are added to the basis, so the claimed polynomial bound on its size follows immediately.

Next we show that the constructed set is indeed a path-invariance basis. To this end, we verify, using an induction proof, that every path pair can be induced from a subset of the basis by the three operations. In particular, we claim that at all time points, all path pairs in the current graph can be induced from the current basis by a series of operations. Initially, this inductive assumption holds trivially since the relevant set is empty.

Figure 7: An illustration of path-pair generation.

Suppose now we are processing an edge (so at this time point) and let . By the inductive assumption, all path pairs in can be induced from . After inserting into , it suffices to consider path pairs that contain this edge, since all other path pairs are already covered by the inductive assumption. Let be a path pair in containing . Without loss of generality, suppose and

If , then can be induced by stitching and where . We assume , and then would be a path from to in and would be a path from to in .

Recall the definition of and . immediately follows. If , then there exists such that can reach in and and denote such path as . For convenience, we let and when . Every vertex in corresponds to a path-invariance pair to be added to by our algorithm. Here we assume that it is for where , and is within .

By the property of DAG and the order of edge insertion, all paths from to in are also in since . Thus can be induced from by inductive assumption. Similarly, as is within , is also a path-invariance pair, which can be induced from . Next we give the operation steps to build :


For the last step, notice that and is equivalent to . Thus all path pairs in can be induced by path pairs in with a series of operations, which completes our proof by induction. ∎

A.3 Proof of Proposition 2

Before proving Proposition 2, we first recall some standard terminology for depth-first search. Each vertex v carries two time stamps d(v) and f(v), where d(v) is the time point at which v is visited for the first time and f(v) is the time point at which the visit of v finishes. Every edge (u, v) can be classified into one of four disjoint types as follows:

  • Tree Edge: v is visited for the first time as we traverse the edge (u, v). In this case (u, v) is added into the resulting DFS spanning tree. For a tree edge we have d(u) < d(v) < f(v) < f(u).

  • Back Edge: v has been visited and is an ancestor of u in the current spanning tree. For a back edge we have d(v) < d(u) < f(u) < f(v).

  • Forward Edge: v has been visited and u is an ancestor of v in the current spanning tree. For a forward edge we have d(u) < d(v) < f(v) < f(u).

  • Cross Edge: v has been visited and is neither an ancestor nor a descendant of u in the current spanning tree. For a cross edge we have d(v) < f(v) < d(u) < f(u).
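These four cases can be checked programmatically from the time stamps alone; a minimal sketch, with a hypothetical example digraph:

```python
def classify_edges(adj):
    """Run DFS on a digraph (adjacency lists) and classify every edge
    as a tree, back, forward, or cross edge via discovery/finish times."""
    n = len(adj)
    d, f = [0] * n, [0] * n      # discovery / finish time stamps
    clock = [0]
    kind = {}

    def visit(u):
        clock[0] += 1; d[u] = clock[0]
        for v in adj[u]:
            if d[v] == 0:                 # v unvisited
                kind[(u, v)] = "tree"; visit(v)
            elif f[v] == 0:               # v still on the DFS stack
                kind[(u, v)] = "back"
            elif d[u] < d[v]:             # v is a finished descendant of u
                kind[(u, v)] = "forward"
            else:                         # disjoint sub-trees
                kind[(u, v)] = "cross"
        clock[0] += 1; f[u] = clock[0]

    for s in range(n):
        if d[s] == 0:
            visit(s)
    return d, f, kind

# Edges: 0->1, 1->2 (tree), 2->0 (back), 0->2 (forward).
d, f, kind = classify_edges([[1, 2], [2], [0]])
```

Note that the branch conditions encode exactly the time-stamp inequalities listed above.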

Using these definitions, we prove that:

Any cycle in the graph has a vertex such that all the other cycle vertices are located within the sub-tree rooted at it, i.e., are its descendants in the DFS spanning tree.


Without loss of generality, assume the chosen vertex is the one with the smallest discovery time among all cycle vertices. If not all of the other cycle vertices are its descendants, choose among the non-descendants the one whose predecessor on the cycle is a descendant, so that the connecting edge leads from a descendant to a non-descendant. Obviously this edge cannot be a tree edge or a forward edge, since either would make its head a descendant of its tail, and hence a descendant of the chosen vertex. If it is a back edge, then its head fails to be a descendant only if it is a proper ancestor of the chosen vertex, since the back path within the spanning tree is unique; but then the head would have a smaller discovery time than the chosen vertex, which results in a contradiction. The edge also cannot be a cross edge: its tail is a descendant of the chosen vertex, while by the cross-edge property the head's sub-tree was finished before the tail was discovered; the two sub-trees are therefore disjoint, so the discovery interval of the head precedes that of the chosen vertex, which again contradicts the assumption that the chosen vertex has the smallest discovery time. Hence all the other cycle vertices are descendants of the chosen vertex.

Now we come back to the original proposition and continue using the notation defined above. In addition, we define the sub-path from to , i.e.,

We will show that it can be induced from the basis by a finite number of merge, stitch, and cut operations. First, we may assume the path-invariance property guaranteed by Theorem 3.1. Given that the chosen vertex is the common ancestor of all cycle vertices, we inductively prove the following statement:

The path is equivalent to the tree path from to . Here tree path means a path in which all edges are in the spanning tree .

The base case is trivial. Now suppose is equivalent to tree path from to and we continue to check .

  • If is a tree edge, then is still a tree path and a stitch operation on path pair and gives the equivalency that we want.

  • If is a forward edge, then there exists a tree path from to . By path-invariance on , we can stitch the two path-invariance pairs and to obtain the desired equivalency.

  • If is a back edge, then there exists a tree path from to . In addition by our construction the cycle has been added into our basis set . Denote the tree path from to as , then stitching and gives . On the other hand, by inductive assumption we have path-invariance pair since is just the tree path from to . Thus by merging and we obtain the path pair , or equivalently, .

  • If is a cross edge, then has been included in . Denote by the tree path from to . In this way all would be equivalent to another tree path from to , since all edges involved here are within , which maintains all possible path-invariance pairs. By merging path pairs and we obtain path pair , or , which is exactly what we want to verify.

This completes the inductive proof. In particular, the path (also a cycle) is equivalent to , or more precisely, the path pair can be induced from by a finite number of merge and stitch operations.

To complete our proof, we need to show that all path pairs, not just those considered above, can be induced from the basis. This is relatively easy. Consider two paths, both from the same source vertex to the same target vertex. Since the graph is strongly connected, there must exist some path from the target back to the source. The cut operation applied to the two resulting cycles at these common vertices immediately gives the desired path pair. ∎

Figure 8 column labels (left to right): Source, Target, Congealing, RASL, CollectionFlow, DSP, FlowWeb, Ours.
Figure 8: Visual comparison between our approach and state-of-the-art approaches. This figure is best viewed in color, zoomed in.

A.4 Proof of Theorem 3.2

To prove this theorem, we first prove the following lemma:

Lemma Suppose and are two strongly connected components in with . Given any vertices and with , and paths ,