1 Introduction
Optimizing a network of maps among a collection of objects/domains (or map synchronization) is a central problem in computer vision and many related fields. Important applications include establishing consistent feature correspondences for multi-view structure-from-motion [1, 11, 51, 5], computing consistent relative camera poses for 3D reconstruction [24, 20], dense image flows [68, 65], image translation [72, 61], and optimizing consistent dense correspondences for co-segmentation [57, 17, 56] and object discovery [46, 8], just to name a few. The benefit of optimizing a map network versus optimizing maps between pairs of objects in isolation comes from the cycle-consistency constraint [35, 21, 15, 55], namely that composite maps along cycles should be the identity map. For example, this constraint allows us to replace an incorrect map between a pair of dissimilar objects by composing maps along a path of similar objects [21]. Computationally, state-of-the-art map synchronization techniques [3, 7, 15, 22, 49, 18, 19, 29, 69, 70, 33] employ matrix representations of maps [29, 21, 15, 56, 18]. This allows us to utilize a low-rank formulation of the cycle-consistency constraint (c.f. [15]), leading to efficient and robust solutions [18, 70, 49, 22].
In this paper, we focus on a map synchronization setting where matrix-based map encodings become too costly or even infeasible. Such instances include optimizing dense flows across many high-resolution images [34, 28, 47] or optimizing a network of neural networks, each of which maps one domain to another (e.g., 3D semantic segmentation [12], which maps the space of 3D scenes to the space of 3D segmentations). In this setting, maps are usually encoded as broadly defined parametric maps (e.g., feed-forward neural networks), and map optimization reduces to optimizing hyper-parameters and/or network parameters. Synchronizing parametric maps introduces many technical challenges. For example, unlike correspondences between objects, which are undirected, a parametric map may not have a meaningful inverse map (e.g., a neural network that takes a shape as input and outputs its semantic label). This raises the challenge of formulating a regularization constraint for directed map networks that is equivalent to cycle-consistency. In addition, as matrix-based map encodings are infeasible for parametric maps, another key challenge is how to efficiently enforce the regularization constraint for map synchronization.
We introduce a computational framework for optimizing directed map networks that addresses the challenges described above. Specifically, we propose the so-called path-invariance constraint, which ensures that whenever there exists a map from a source domain to a target domain (through map composition along a path), the map is unique. This path-invariance constraint not only warrants that a map network is well-defined, but more importantly provides a natural regularization for optimizing directed map networks. To effectively enforce this path-invariance constraint, we introduce the notion of a path-invariance basis, which collects independent path pairs that induce the path-invariance property of the entire map network. We also present an algorithm for computing a path-invariance basis from an arbitrary directed map network. The algorithm possesses polynomial time and space complexities.
We demonstrate the effectiveness of our approach on three settings of map synchronization. The first setting considers undirected map networks that can be optimized using low-rank formulations [18, 70]. Experimental results show that our new formulation leads to competitive and sometimes better results than state-of-the-art low-rank formulations. The second setting studies consistent dense image maps, where each pairwise map is given by a neural network. Experimental results show that our approach significantly outperforms state-of-the-art approaches for computing dense image correspondences. The third setting considers a map network that consists of different 3D representations (e.g., point cloud and volumetric representations) for the task of 3D semantic segmentation (see Figure 1). By enforcing the path-invariance of neural networks on unlabeled data, our approach requires only 8% labeled data from ScanNet [12] to achieve the same performance as training a single semantic segmentation network with 30% to 100% labeled data.
2 Related Work
Map synchronization. So far, most map synchronization techniques [23, 20, 63, 36, 16, 69, 18, 7, 58, 5, 71, 2, 62, 22, 39, 50, 14, 66, 72, 61] have focused on undirected map graphs, where the natural regularization constraint is given by cycle-consistency. Depending on how the cycle-consistency constraint is applied, existing approaches fall into three categories. The first category of methods [23, 20] utilizes the fact that a collection of cycle-consistent maps can be generated from the maps associated with a spanning tree. However, these approaches are only suitable for removing incorrect maps from the input maps, and it is hard to apply them to optimizing cycle-consistent neural networks, where the neural networks change during the course of the optimization. The second category of approaches [63, 36, 69] applies constrained optimization to select cycle-consistent maps. These approaches are typically formulated so that the objective functions encode the score of selected maps, and the constraints enforce the consistency of selected maps along cycles. The major advantage of these methods is that the correct maps are determined globally, leading to better performance than the first category of approaches. Our approach is relevant to this category of methods but addresses a different problem of optimizing maps along directed map networks. In particular, we introduce the path-invariance constraint and show how to enforce it effectively using path-invariance bases.
The third category of approaches applies modern numerical optimization techniques to optimize cycle-consistent maps. Along this line, people have introduced convex optimization [16, 18, 7, 58], non-convex optimization [5, 71, 2, 62, 22], and spectral techniques [39, 50]. To apply these techniques to parametric maps, we have to handcraft an additional latent domain, as well as parametric maps between each input domain and this latent domain, which may suffer from suboptimal network design. In fact, although such techniques have been applied to multilingual machine translation [25], existing approaches only work if the differences among the input domains are small and there exist meaningful bidirectional maps between them, leading to undirected map networks. In contrast, we focus on directed map networks among diverse domains and explicitly enforce the path-invariance constraint via path-invariance bases.
Joint learning of neural networks. Several recent works have studied the problem of enforcing cycle-consistency along a cycle of neural networks to improve the quality of the individual networks along the cycle. Zhou et al. [66] studied how to train dense image correspondences between real images through two real-to-synthetic networks and ground-truth correspondences between synthetic images. [72, 61] enforce the bidirectional consistency of transformation networks between two image domains to improve image translation results. However, in these works the cycles are explicitly given. In contrast, we study how to extend the cycle-consistency constraint on undirected graphs to the path-invariance constraint on directed graphs. In particular, we focus on how to compute a path-invariance basis for enforcing the path-invariance constraint efficiently. A recent work [64] studies how to build a network of representations for boosting individual tasks. However, self-supervision constraints such as cycle-consistency and path-invariance are not employed. Another distinction is that our approach seeks to leverage unlabeled data, while [64] focuses on transferring labeled data under different representations/tasks. Our approach is also related to model/data distillation (see [44] and the references therein), which can be viewed as a map network with many edges between two domains. In this paper, we focus on defining self-supervision for general graphs.

Cycle-bases of graphs. Path-invariance bases are related to cycle-bases on undirected graphs [26], in which any cycle of a graph is given by a linear combination of the cycles in a cycle-basis. However, besides fundamental cycle-bases [26], which generalize to define cycle-consistency bases, it is an open problem whether other types of cycle-bases generalize or not. Moreover, there are fundamental differences between undirected and directed map networks. This calls for new tools for defining and computing path-invariance bases.
3 Path-Invariance of Directed Map Networks
In this section, we focus on the theoretical contribution of this paper, which is an algorithm for computing a path-invariance basis that enforces the path-invariance constraint of a directed map network. In Section 4, we show how to leverage this path-invariance basis to jointly optimize a directed map network to improve the maps in the network. Note that the proofs of the theorems and propositions in this section are deferred to the Appendix.
3.1 Path-Invariance Constraint
We first define the notion of a directed map network:
Definition 1.
We define a directed map network as an attributed directed graph G = (V, E). Each vertex i ∈ V is associated with a domain D_i. Each edge (i, j) ∈ E with i ≠ j is associated with a map f_ij : D_i → D_j. In the following, we always assume E contains the self-loop at each vertex. The map associated with each self-loop is the identity map of the corresponding domain.
For simplicity, whenever it can be inferred from the context, we shorten the terminology of a directed map network to a map network. The following definition considers induced maps along paths of a map network.
Definition 2.
Consider a path p = (i_0, i_1, …, i_k) along G. We define the composite map along p induced from a map network F on G as

f_p := f_{i_{k-1} i_k} ∘ ⋯ ∘ f_{i_1 i_2} ∘ f_{i_0 i_1}.    (1)

In particular, we define f_{(i,i)} := id_{D_i}, where (i, i) can refer to any self-loop.
In the remaining text, for two successive paths p and q (where p ends at the vertex at which q starts), we use q ∘ p to denote their composition.
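To make the composition of Definition 2 concrete, here is a minimal Python sketch. The function name and the toy numeric maps are our own illustration, not part of the paper: maps are stored per directed edge, and composition along a path applies them in order.

```python
from functools import reduce

def compose_along_path(maps, path):
    """Composite map along a path (Definition 2): apply the edge maps of
    `path` = (i_0, ..., i_k) in order; a single-vertex path gives the identity."""
    edge_maps = [maps[(u, v)] for u, v in zip(path, path[1:])]
    return lambda x: reduce(lambda acc, f: f(acc), edge_maps, x)

# Toy numeric "domains" with simple affine maps (illustrative only):
maps = {(0, 1): lambda x: x + 1,
        (1, 2): lambda x: 2 * x,
        (0, 2): lambda x: 2 * x + 2}
f_p = compose_along_path(maps, (0, 1, 2))  # path 0 -> 1 -> 2
f_q = compose_along_path(maps, (0, 2))     # direct edge 0 -> 2
assert f_p(3) == f_q(3) == 8               # this particular path pair agrees
```

Here the pair of paths from vertex 0 to vertex 2 happens to induce the same map, which is exactly the condition the path-invariance constraint asks for.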
Now we state the path-invariance constraint for map networks.
Definition 3.
Let P_{st} collect all paths in G that connect vertex s to vertex t. We define the set of all possible path pairs of G as

P := ∪_{(s,t) ∈ V × V} { (p, q) | p, q ∈ P_{st} }.

We say F is path-invariant if

f_p = f_q,  ∀ (p, q) ∈ P.    (2)
Remark 1.
Since E collects the self-loop at each vertex, it is easy to check that path-invariance induces cycle-consistency (c.f. [15]). On the other hand, for undirected map networks it is easy to see that cycle-consistency induces path-invariance. However, this property does not hold for directed map networks. For example, a map network with three vertices and three directed maps f_12, f_13, and f_23 has no cycle, but one path pair (f_23 ∘ f_12, f_13).
3.2 Path-Invariance Basis
A challenge of enforcing the path-invariance constraint is that there are many possible paths between each pair of domains in a graph, leading to an intractable number of path pairs. This raises the question of how to compute a path-invariance basis B, a small set of independent path pairs that is sufficient for enforcing the path-invariance property of any map network F. To rigorously define a path-invariance basis, we introduce three primitive operations on path pairs: merge, stitch, and cut (see Figure 2):
Definition 4.
Consider a directed graph G. We say two path pairs (p_1, q_1) and (p_2, q_2) are compatible if one path in the first pair is a subpath of one path in the second pair, or vice versa. Without loss of generality, suppose q_1 is a subpath of p_2, and write p_2 = r ∘ q_1 ∘ l, which stitches the three subpaths l, q_1, and r in order. We define the merge operation so that it takes two compatible path pairs (p_1, q_1) and (p_2, q_2) as input and outputs the new path pair (r ∘ p_1 ∘ l, q_2).
We proceed to define the stitch operation:
Definition 5.
We define the stitch operation so that it takes as input two path pairs (p_1, q_1) and (p_2, q_2), where the paths in the first pair end at the vertex at which the paths in the second pair start, and outputs (p_2 ∘ p_1, q_2 ∘ q_1).
Finally we define the cut operation on two cycles, which will be useful for strongly connected graphs:
Definition 6.
Operation cut takes as input two path pairs (c_1, 1_s) and (c_2, 1_s), where c_1 and c_2 are two distinct cycles through a vertex s that have two common vertices s and t and share a common path r from s to t (1_s denotes the self-loop at s). Specifically, we assume these two cycles are c_1 = r_1 ∘ r and c_2 = r_2 ∘ r, where r_1 and r_2 are paths from t to s. We define the output of the cut operation as the new path pair (r_1, r_2).
Definition 6 is justified because f_{c_1} = f_{c_2} = id implies f_{r_1} ∘ f_r = f_{r_2} ∘ f_r. As we will see later, this operation is useful for deriving new path-invariance bases.
We now define the path-invariance basis, which is the critical concept of this paper:
Definition 7.
We say a collection B of path pairs is a path-invariance basis on G if every path pair in P can be induced from a subset of B through a series of merge, stitch, and/or cut operations.
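As an illustration of Definitions 4 and 5, the merge and stitch operations can be sketched on paths represented as vertex tuples. This is a toy implementation under our own conventions (the function names and the subpath-matching strategy are assumptions; cut is omitted for brevity):

```python
def stitch(pair1, pair2):
    """Stitch (Definition 5): concatenate two path pairs end to start.
    A path is a vertex tuple; pair1's paths end where pair2's begin."""
    (p1, q1), (p2, q2) = pair1, pair2
    assert p1[-1] == p2[0] and q1[-1] == q2[0]
    return (p1 + p2[1:], q1 + q2[1:])

def merge(pair1, pair2):
    """Merge (Definition 4): substitute pair1's second path, assumed to be a
    contiguous subpath of pair2's first path, by pair1's first path."""
    (p1, q1), (p2, q2) = pair1, pair2
    n = len(q1)
    for s in range(len(p2) - n + 1):
        if p2[s:s + n] == q1:
            # Replace the subpath q1 inside p2 by p1; pair the result with q2.
            return (p2[:s] + p1 + p2[s + n:], q2)
    raise ValueError("pairs are not compatible")

# (0->1->2, 0->2) and (0->2->3, 0->3): the subpath (0, 2) is replaced by (0, 1, 2).
assert merge(((0, 1, 2), (0, 2)), ((0, 2, 3), (0, 3))) == ((0, 1, 2, 3), (0, 3))
assert stitch(((0, 1), (0, 1)), ((1, 2), (1, 2))) == ((0, 1, 2), (0, 1, 2))
```

With these operations, a basis "induces" a path pair in the sense of Definition 7: repeatedly applying them to members of B eventually produces the pair in question.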
The following proposition shows the importance of a path-invariance basis:
Proposition 1.
Consider a path-invariance basis B of a graph G. Then for any map network F on G, if f_p = f_q for every path pair (p, q) ∈ B, then F is path-invariant.
3.3 Path-Invariance Basis Computation
We first discuss the criteria for path-invariance basis computation. Since we will formulate a loss term for each path pair in a path-invariance basis to enforce the path-invariance constraint of a map network, we place two objectives on computing a path-invariance basis. First, we require the length of the paths in each path pair to be small. Intuitively, enforcing consistency between long paths weakens the regularization on each involved map. Second, we want the size of the resulting path-invariance basis to be small, in order to increase the effectiveness of gradient-descent-based optimization strategies. Note that unlike cycle bases, which have a fixed size (c.f. [26]), the sizes of path-invariance bases vary. In fact, in the worst case, the size of a path-invariance basis may be exponential in the size of the graph.
In the following, we present an algorithm that is guaranteed to return a path-invariance basis whose size is polynomial in the size of the graph. Our algorithm builds upon the classical result that a directed graph G can be factored into a directed acyclic graph whose vertices are the strongly connected components of G (c.f. [4]). In light of this, we describe our algorithm in three steps. We first show how to compute a path-invariance basis for a directed acyclic graph. We then discuss the case of strongly connected graphs. Finally, we show how to extend the results of the first two settings to arbitrary directed graphs.
Directed acyclic graph (or DAG). Our algorithm utilizes the important property that every DAG admits a topological order of vertices that is consistent with the edge orientations (c.f. [4]). Specifically, consider a DAG G = (V, E). A topological order is a bijection σ : V → {1, …, |V|} such that σ(u) < σ(v) whenever (u, v) ∈ E. A topological order of a DAG can be calculated by Tarjan's algorithm (c.f. [53]).
Our algorithm starts with a current graph G_c = (V, ∅), to which we add all edges of E in some order. Specifically, the edges in E are visited with respect to a (partial) edge order in which (u_1, v_1) precedes (u_2, v_2) if and only if σ(v_1) < σ(v_2). Note that two edges (u_1, v), (u_2, v) with the same head can be visited in arbitrary order.
For each newly visited edge (u, v), we collect the set S of candidate vertices such that every vertex s ∈ S can reach both u and v in G_c. Next, we construct a set S' ⊆ S by removing from S every s that can reach some distinct s' ∈ S; in this case s is redundant because of s'. For each vertex s ∈ S', we collect a new path pair (p, (u, v) ∘ q), where p and q are shortest paths in G_c from s to v and from s to u, respectively. After collecting the path pairs, we augment G_c with (u, v). With B we denote the resulting path-pair set after all edges have been visited.
Theorem 3.1.
Every topological order of G returns a path-invariance basis whose size is polynomial in |V| and |E|.
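The incremental DAG construction above can be sketched as follows. This is a naive, unoptimized illustration under our own naming (the recomputation of reachability for every edge and the exact tie-breaking are assumptions; the paper's actual implementation may differ):

```python
from collections import defaultdict

def reachable(adj, src):
    """Set of vertices reachable from src (iterative DFS)."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def shortest_path(adj, src, dst):
    """One shortest path src -> dst as a vertex tuple (BFS), or None."""
    parent, queue = {src: None}, [src]
    while queue:
        u = queue.pop(0)
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return tuple(reversed(path))
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

def dag_path_invariance_basis(n, edges, topo):
    """Insert edges ordered by the topological index of their head; for each
    surviving candidate source s, emit the path pair (s -> v, s -> u -> v)."""
    adj, basis = defaultdict(list), []
    for (u, v) in sorted(edges, key=lambda e: topo[e[1]]):
        # Candidates: vertices that already reach both u and v in G_c.
        cand = [s for s in range(n) if {u, v} <= reachable(adj, s)]
        # Prune s if it reaches another distinct candidate (s is then redundant).
        cand = [s for s in cand
                if not any(t != s and t in reachable(adj, s) for t in cand)]
        for s in cand:
            p = shortest_path(adj, s, v)            # existing path s -> v
            q = shortest_path(adj, s, u) + (v,)     # path s -> u, then edge (u, v)
            basis.append((p, q))
        adj[u].append(v)                            # only now add the new edge
    return basis

# Diamond DAG 0->1, 0->2, 1->3, 2->3 yields a single path pair at the source:
basis = dag_path_invariance_basis(4, [(0, 1), (0, 2), (1, 3), (2, 3)],
                                  {0: 0, 1: 1, 2: 2, 3: 3})
assert basis == [((0, 1, 3), (0, 2, 3))]
```

On the diamond graph, only the last inserted edge creates an alternative route, so a single pair of short paths from the source suffices.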
Strongly connected graph (or SCG). To construct a path-invariance basis of an SCG G, we run a slightly modified depth-first search on G from an arbitrary vertex. Since G is strongly connected, the resulting spanning forest must be a tree, denoted by T. The path pair set B is the result we obtain. In addition, we use G' to collect an acyclic subgraph of G; initially it is empty. When traversing an edge (u, v), if v is visited for the first time, then we add (u, v) to both T and G'. Otherwise, there are two possible cases:

- v is an ancestor of u in T. In this case we add the cycle-consistency pair ((u, v) ∘ p, 1_v), where p is the tree path from v to u and 1_v denotes the self-loop at v, into B.

- Otherwise, we add (u, v) into G'.
It can be proved that G' is indeed an acyclic graph (see Appendix A.3). Thus we can obtain a path-invariance basis on G' by running the algorithm stated in the DAG case. We add this basis into B. The following proposition ensures that B is a path-invariance basis of G:
Proposition 2.
The path pair set B constructed above is a path-invariance basis of G.
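The modified depth-first search for the strongly connected case can be sketched as below. This is an illustration under our own naming: path pairs are vertex tuples, the self-loop at v is written (v, v), and the DAG routine above would then be run on the returned acyclic edge list:

```python
def scg_cycle_pairs(adj, root):
    """Classify each edge during a DFS from `root`: tree edges and
    cross/forward edges go to the acyclic remainder G', while back edges
    yield cycle/self-loop path pairs (illustrative sketch only)."""
    tree_parent = {root: None}
    on_stack, cycle_pairs, g_prime = set(), [], []

    def tree_path(v, u):
        """Tree path v -> u, where v is an ancestor of u."""
        path = []
        while u != v:
            path.append(u)
            u = tree_parent[u]
        path.append(v)
        return tuple(reversed(path))

    def dfs(u):
        on_stack.add(u)
        for v in adj.get(u, []):
            if v not in tree_parent:          # tree edge: keep in T and G'
                tree_parent[v] = u
                g_prime.append((u, v))
                dfs(v)
            elif v in on_stack:               # back edge: v is an ancestor of u
                cycle = tree_path(v, u) + (v,)
                cycle_pairs.append((cycle, (v, v)))  # cycle paired with self-loop
            else:                             # cross/forward edge: goes to G'
                g_prime.append((u, v))
        on_stack.remove(u)

    dfs(root)
    return cycle_pairs, g_prime

# Directed triangle 0 -> 1 -> 2 -> 0 (strongly connected):
pairs, g_prime = scg_cycle_pairs({0: [1], 1: [2], 2: [0]}, 0)
assert pairs == [((0, 1, 2, 0), (0, 0))]      # one cycle ~ self-loop pair
assert g_prime == [(0, 1), (1, 2)]            # the acyclic remainder
```

On the triangle, the single back edge closes the only cycle, and the remaining two edges form the acyclic subgraph G' fed to the DAG routine.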
General directed graph. Given the path-invariance bases constructed on DAGs and SCGs, constructing path-invariance bases on general graphs is straightforward. Specifically, consider the strongly connected components G_1, …, G_k of a graph G. With G_s we denote the directed acyclic graph among G_1, …, G_k. We first construct path-invariance bases B_s and B_i for G_s and each G_i, respectively. We then construct a path-invariance basis B of G by collecting three groups of path pairs. The first group simply combines B_1, …, B_k. The second group extends B_s to the original graph G. This is done by replacing each edge of G_s through a shortest path on G that connects the representatives of the corresponding components, where a representative is arbitrarily chosen at first for each component. To calculate the third group, consider all oriented edges E_{ij} between each pair of components G_i and G_j:
Note that when constructing G_s, all oriented edges E_{ij} between a pair of components G_i and G_j are shrunk to one edge of G_s. This means that, when constructing B, we have to enforce the consistency among the edges in E_{ij} on the original graph G. This can be done by constructing a tree T_{ij}, a minimum spanning tree on the graph whose vertex set is E_{ij} and where the weight associated with an edge between (u, v) ∈ E_{ij} and (u', v') ∈ E_{ij} is given by the sum of the lengths of the paths p and q defined below. This strategy encourages reducing the total length of the resulting path pairs in B. For each edge of T_{ij} between (u, v) and (u', v'), we add to B the path pair ((u', v') ∘ p, q ∘ (u, v)), where p and q denote the shortest paths from u to u' on G_i and from v to v' on G_j, respectively. Algorithm 1 summarizes the path-invariance basis computation described above.
Theorem 3.2.
The path pairs derived from the three groups described above form a path-invariance basis for G.
Proposition 3.
The size of B is polynomially bounded in the size of G.¹

¹ We conjecture that computing a path-invariance basis of minimum size is NP-hard.
4 Joint Map Network Optimization
In this section, we present a formulation for jointly optimizing a map network using the path-invariance basis computed in the preceding section.
Consider the map network defined in Def. 1. We assume the map associated with each edge (i, j) is a parametric map f_ij^{θ_ij}, where θ_ij denotes the hyper-parameters or network parameters of f_ij. We assume the supervision of the map network is given on a superset E' ⊇ E of edges. As we will see later, such instances happen when there exist paired data between two domains, but we do not have a direct neural network between them. To utilize such supervision, we define the induced map along an edge (i, j) ∈ E' as the composite map (defined in (1)) along a shortest path from i to j. Here Θ collects all the parameters. We define each supervised loss term as l_ij(Θ). The specific definition of l_ij is deferred to Section 5.
Besides the supervised loss terms, the key component of joint map network optimization is a self-supervision loss induced from the path-invariance basis B. Let d_i be a distance measure associated with domain D_i. Consider an empirical distribution P_i of D_i. We define the total loss objective for joint map network optimization as

min_Θ Σ_{(i,j) ∈ E'} l_ij(Θ) + λ Σ_{(p,q) ∈ B} E_{x ∼ P_{s(p)}} [ d_{t(q)}( f_p(x), f_q(x) ) ],    (3)

where s(p) denotes the index of the start vertex of p and t(q) denotes the index of the end vertex of q. Essentially, (3) combines the supervised loss terms and an unsupervised regularization term that ensures the learned representations are consistent when passing unlabeled instances across the map network. We employ the ADAM optimizer [31] for optimization. In addition, we start with a small value of the trade-off weight λ to solve (3) for 40 epochs. We then double the value of λ every 10 epochs, and we stop the training procedure once λ exceeds a prescribed upper bound. The training details are deferred to the Appendix.

5 Experimental Evaluation
This section presents an experimental evaluation of our joint map network optimization framework across three settings, namely, shape matching (Section 5.1), dense image maps (Section 5.2), and a network of hybrid 3D representations for 3D semantic segmentation (Section 5.3).
5.1 Map Network of Shape Maps
We begin with the task of joint shape matching [36, 29, 16, 18, 10], which seeks to jointly optimize a network of shape maps to improve the initial maps computed between pairs of shapes in isolation. We utilize the functional map representation described in [38, 54, 18]. Specifically, each domain D_i is given by the linear space spanned by the leading eigenvectors of a graph Laplacian [18] (the number of leading eigenvectors is fixed in our experiments). The map from D_i to D_j is given by a matrix X_ij. Let B be a path-invariance basis for the associated graph G. Adapting (3), we solve the following optimization problem for joint shape matching:

min_{{X_ij}} Σ_{(i,j) ∈ E} | X_ij − X_ij^init |_1 + λ Σ_{(p,q) ∈ B} ‖ X_p − X_q ‖_F²,    (4)

where |·|_1 and ‖·‖_F are the element-wise L1-norm and the matrix Frobenius norm, respectively; X_p denotes the composite functional map (matrix product) along path p; and X_ij^init denotes the initial functional map converted from the corresponding initial shape map associated with edge (i, j) using [38].
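Since functional maps are matrices, the composite map along a path is simply a matrix product, and the consistency term of the objective is a sum of Frobenius-norm residuals over the basis. A small NumPy sketch (the function names and the toy 2×2 maps are our own illustration):

```python
import numpy as np

def functional_map_along_path(X, path):
    """Composite functional map along a path: the product of the per-edge
    matrices X[(u, v)], applied in path order."""
    M = np.eye(next(iter(X.values())).shape[0])
    for u, v in zip(path, path[1:]):
        M = X[(u, v)] @ M
    return M

def consistency_residual(X, basis):
    """Sum of squared Frobenius-norm residuals over a path-invariance basis."""
    return sum(np.linalg.norm(functional_map_along_path(X, p)
                              - functional_map_along_path(X, q), 'fro') ** 2
               for p, q in basis)

# Consistent toy network of 2x2 functional maps: X_02 = X_12 @ X_01.
X01 = np.array([[0., 1.], [1., 0.]])
X12 = np.array([[2., 0.], [0., 3.]])
X = {(0, 1): X01, (1, 2): X12, (0, 2): X12 @ X01}
assert consistency_residual(X, [((0, 1, 2), (0, 2))]) < 1e-12
```

Minimizing this residual jointly with the data term pulls each X_ij toward its initialization while making composite maps along equivalent paths agree.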
Dataset. We perform experimental evaluation on SHREC07-Watertight [13], a challenging dataset for evaluating shape maps. Specifically, SHREC07-Watertight contains 400 shapes across 20 categories. Among them, we choose 11 categories (i.e., Human, Glasses, Airplane, Ant, Teddy, Hand, Plier, Fish, Bird, Armadillo, Four-leg) that are suitable for inter-shape mapping. We also test our approach on two large-scale datasets, Aliens (200 shapes) and Vase (300 shapes), from ShapeCOSEG [59]. For the initial maps, we employ blended intrinsic maps [30], a state-of-the-art method for shape matching. We test our approach under two graphs G. The first graph is a clique graph. The second graph connects each shape with its k nearest neighbors with respect to the GMDS descriptor [48] (the value of k is fixed in our experiments).
Baseline approaches and evaluation metric.
We compare our approach to five baseline approaches, including three state-of-the-art approaches and two variants of our approach. The three state-of-the-art approaches are 1) functional-map-based low-rank matrix recovery [18], 2) point-map-based low-rank matrix recovery via alternating minimization [71], and 3) consistent partial matching via sparse modeling [10]. The two variants are 4) using a set of randomly sampled cycles [63] whose size is the same as that of B, and 5) using the path-invariance basis derived from the fundamental cycle-basis of G (c.f. [26]), which may contain long cycles.

Baseline comparison on benchmark datasets. We show the cumulative distribution functions (or CDFs) of each method with respect to annotated feature correspondences.
aero  bike  boat  bottle  bus  car  chair  table  mbike  sofa  train  tv  mean  
Congealing  0.13  0.24  0.05  0.21  0.22  0.11  0.09  0.05  0.14  0.09  0.10  0.09  0.13  
RASL  0.18  0.20  0.05  0.36  0.33  0.19  0.14  0.06  0.19  0.13  0.14  0.29  0.19  
CollectionFlow  0.17  0.18  0.06  0.33  0.31  0.15  0.15  0.04  0.12  0.11  0.10  0.11  0.12  
DSP  0.19  0.33  0.07  0.21  0.36  0.37  0.12  0.07  0.19  0.13  0.15  0.21  0.20  
FlowWeb  0.31  0.42  0.08  0.37  0.56  0.51  0.12  0.06  0.23  0.18  0.19  0.34  0.28  
OursDense  0.29  0.42  0.07  0.39  0.53  0.55  0.11  0.06  0.22  0.18  0.21  0.31  0.28  
OursUndirected  0.32  0.43  0.07  0.43  0.56  0.55  0.18  0.06  0.26  0.21  0.25  0.37  0.31  
Ours  0.35  0.45  0.07  0.45  0.63  0.62  0.19  0.06  0.27  0.22  0.23  0.38  0.33 
(Figure: qualitative source/target alignment results for Congealing, RASL, CollectionFlow, DSP, FlowWeb, and Ours.)
We evaluate the quality of each map through annotated key points (please refer to the Appendix). Following [30, 16, 18], we report the cumulative distribution function (or CDF) of geodesic errors of the predicted feature correspondences.
Analysis of results. Figure 3 shows the CDFs of our approach and the baseline approaches. All participating methods exhibit considerable improvements over the initial maps, demonstrating the benefits of joint shape matching. Compared to state-of-the-art approaches, our approach is comparable when G is a clique and exhibits certain performance gains when G is sparse. One explanation is that low-rank approaches are based on relaxations of the cycle-consistency constraint (c.f. [15]), and such relaxations become loose on sparse graphs. In contrast, our approach explicitly enforces the cycle-consistency constraint (through the generalized path-invariance constraint). Compared to the two variants of our approach, our approach delivers the best results on both clique graphs and knn-graphs. This is because the two alternative strategies generate many long paths and cycles in B, making the total objective function (3) hard to optimize. On knn-graphs, both our approach and the baseline using the fundamental cycle-basis outperform the baseline of randomly sampling path pairs, showing the importance of computing a path-invariance basis for enforcing the consistency constraint.

5.2 Map Network of Dense Image Maps
In the second setting, we consider the task of optimizing dense image flows across a collection of relevant images. We again model this task using a map network G, where each domain is given by an image I_i. Our goal is to compute a dense image map (its difference from the identity map gives a dense image flow) between each pair of input images. To this end, we precompute initial dense maps using DSP [28], a state-of-the-art approach for dense image flows. Our goal is to obtain improved dense image maps along the edges of G, which lead to dense image maps between all pairs of images in G via map composition (see (1)). Due to scalability issues, state-of-the-art approaches for the joint estimation of dense image flows [32, 27, 41, 69] are limited to a small number of relatively low-resolution images. To address this issue, we encode dense image maps using the neural network described in [67]. Given a fixed map network G and the initial dense maps, we formulate an optimization problem similar to (4) to learn the network parameters:
min_Θ Σ_{(i,j) ∈ E} E_{x ∼ I_i} [ ‖ f_ij^{θ_ij}(x) − f_ij^init(x) ‖ ] + λ Σ_{(p,q) ∈ B} E_{x ∼ I_{s(p)}} [ ‖ f_p(x) − f_q(x) ‖ ],    (5)

where B denotes a path-invariance basis associated with G; s(p) is the index of the start vertex of p; and f_p is the composite network along path p.
Dataset. The image sets we use are sampled from 12 rigid categories of the PASCAL-Part dataset [6]. To generate image sets that are meaningful to align, we pick the most popular view of each category (the view with the smallest variance among its 20 nearest neighbors). We then generate an image set for that category by collecting all images whose poses fall within a fixed angular distance of this view. We construct the map network G by connecting each image with its k nearest neighbors with respect to the DSP matching score [28]. Note that the resulting G is a directed graph, as DSP is directed.
Baseline approaches and evaluation metric. We compare our approach with Congealing [32], CollectionFlow [27], RASL [41], and FlowWeb [69]. We use publicly available code for all baselines except CollectionFlow, for which we implemented our own version in Matlab. Note that both FlowWeb and our approach use DSP as input. Moreover, we did not compare to [67], since it uses additional synthetic images as supervision. To run the baseline approaches, we follow the protocol of [69] to further break each dataset into smaller ones with a maximum size of 100. In addition, we consider two variants of our approach: OursDense and OursUndirected. OursDense uses the clique graph for G. OursUndirected uses an undirected knn-graph, where the weight of each edge averages the bidirectional DSP matching scores (c.f. [28]). We employ the standard PCK measure [60], which reports the percentage of keypoints whose prediction errors fall within α · max(h, w), where h and w are the image height and width, respectively.
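For reference, the PCK measure can be computed as follows. This is a sketch: the function name is ours, and α = 0.1 is just an example value (common choices vary).

```python
import numpy as np

def pck(pred, gt, h, w, alpha=0.1):
    """PCK: fraction of keypoints whose prediction error falls within
    alpha * max(h, w) pixels of the ground truth."""
    errors = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float),
                            axis=1)
    return float(np.mean(errors <= alpha * max(h, w)))

# Two of three keypoints lie within 0.1 * max(100, 200) = 20 pixels:
pred = [(10, 10), (50, 50), (90, 90)]
gt   = [(12, 10), (80, 50), (90, 95)]
assert pck(pred, gt, h=100, w=200) == 2 / 3
```

Higher PCK is better; the tables in this section report PCK per category along with the mean.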
Analysis of results. As shown in Figure 4 and Figure 5, our approach outperforms all existing approaches across most of the categories. Several factors contribute to such improvements. First, our approach can jointly optimize more images than the baseline approaches and thus benefits more from the data-driven effect of joint matching [15, 7]. This explains why all variants of our approach are either comparable or superior to the baseline approaches. Second, our approach avoids fitting a neural network directly to dissimilar images and focuses on relatively similar images (other maps are generated by map composition), leading to additional performance gains. In fact, all existing approaches, which operate on subgroups of similar images, also implicitly benefit from map composition. This explains why FlowWeb exhibits competing performance against OursDense. Finally, Ours (which uses the directed knn-graph) is superior to OursUndirected. This is because the outlier ratio of G in OursUndirected is higher than that of the directed graph, which selects edges purely based on matching scores.

(Figure: qualitative 3D segmentation results for Ground Truth, 8% Label, 30% Label, 100% Label, and 8% Label + 92% Unlabel.)
5.3 Map Network of 3D Representations
In the third setting, we seek to jointly optimize a network of neural networks to improve the performance of the individual networks. We are particularly interested in the task of semantic segmentation of 3D scenes. Specifically, we consider a network with seven 3D representations (see Figure 1). The first representation is the input mesh. The last representation is the space of 3D semantic segmentations. The second to fourth representations are point clouds with different numbers of points: PCI (12K), PCII (8K), and PCIII (4K). The motivation for varying the number of points is that the patterns learned under different numbers of points show certain variations, which are beneficial to each other. In a similar fashion, the fifth and sixth representations are volumetric representations under two resolutions: VOLI and VOLII. The directed maps between the different 3D representations fall into three categories, summarized below:
1. Semantic segmentation networks. Each point cloud or volumetric representation is associated with a segmentation network. Specifically, we use PointNet++ [42] and 3D U-Net [9], which are state-of-the-art network architectures for point cloud and volumetric representations, respectively.
2. Point-cloud subsampling maps. We have six point-cloud subsampling maps among the mesh representation (we uniformly sample 24K points using [37]) and the three point cloud representations. For each subsampling map, we force the downsampled point cloud to align with the feature points of the input point cloud [40]. Note that this downsampled point cloud is also optimized through a segmentation network to maximize the segmentation accuracy.
3. Generating volumetric representations. Each volumetric representation is given by the signed-distance field (or SDF) described in [52]. These SDFs are precomputed.
Table 1: Semantic segmentation performance under different 3D representations and training configurations.

                            PCI   PCII  PCIII  VOLI  VOLII  Ensm
100% Label (Isolated)       84.2  83.3  83.4   81.9  81.5   85
8% Label (Isolated)         79.2  78.3  78.4   78.7  77.4   81.4
8% Label + Unlabel (Joint)  82.3  82.5  82.3   81.6  79.0   83.4
30% Label (Isolated)        80.8  81.9  81.2   80.3  79.5   83.2
Experimental setup. We evaluate our approach on the ScanNet semantic segmentation benchmark [12]. Our goal is to evaluate the effectiveness of our approach when using a small labeled dataset and a large unlabeled dataset. To this end, we consider three baseline approaches, which train the segmentation network under each individual representation using 100%, 30%, and 8% of the labeled data. We then test our approach by utilizing 8% of the labeled data, which defines the data term in (3), and 92% of the unlabeled data (only the root domain carries an empirical distribution), which defines the regularization term of (3). We initialize the segmentation networks for point clouds using uniformly sampled points trained on the labeled data. We then fine-tune the entire network using both labeled and unlabeled data. Code is publicly available at https://github.com/zaiweizhang/path_invariance_map_network.
Analysis of results. Figure 6 and Table 1 present qualitative and quantitative comparisons between our approach and the baselines. Across all 3D representations, our approach leads to consistent improvements, demonstrating its robustness. Specifically, when using 8% labeled data and 92% unlabeled data, our approach achieves performance competitive with using 30% to 100% labeled data when training on each individual representation. Moreover, the accuracy on VOLI is competitive against using 100% of the labeled data, indicating that the patterns learned under the point cloud representations are propagated to train the volumetric representations. We also tested the performance of applying majority voting [45] to the predictions obtained under the different 3D representations. The relative performance gains of the different training-data configurations remain similar (see the last column of Table 1). Please refer to Appendix C for more experimental evaluations and baseline comparisons.
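The ensemble column can be illustrated with a simple per-point majority vote over the predictions of the different representations (a sketch with hypothetical names; the actual voting scheme follows [45]):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-representation label predictions by per-point majority
    voting; `predictions` is a list of equal-length label lists."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Three representations voting on three points: the majority label wins.
assert majority_vote([[1, 2, 0], [1, 2, 1], [0, 2, 1]]) == [1, 2, 1]
```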
6 Conclusions
In this paper, we have studied the problem of optimizing a directed map network. We have introduced the path-invariance constraint, which can be effectively encoded using path-invariance bases. We have described an algorithm for computing a path-invariance basis with polynomial time and space complexity. The effectiveness of this approach is demonstrated on three groups of map networks with diverse applications.
References
 [1] S. Agarwal, Y. Furukawa, N. Snavely, I. Simon, B. Curless, S. M. Seitz, and R. Szeliski. Building Rome in a day. Commun. ACM, 54(10):105–112, Oct. 2011.
 [2] F. Arrigoni, A. Fusiello, B. Rossi, and P. Fragneto. Robust rotation synchronization via low-rank and sparse matrix decomposition. CoRR, abs/1505.06079, 2015.
 [3] C. Bajaj, T. Gao, Z. He, Q. Huang, and Z. Liang. Smac: Simultaneous mapping and clustering via spectral decompositions. In ICML, pages 100–108, 2018.
 [4] J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2002.
 [5] A. Chatterjee and V. M. Govindu. Efficient and robust large-scale rotation averaging. In ICCV, pages 521–528. IEEE Computer Society, 2013.
 [6] X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun, and A. L. Yuille. Detect what you can: Detecting and representing objects using holistic models and body parts. CoRR, abs/1406.2031, 2014.

 [7] Y. Chen, L. J. Guibas, and Q. Huang. Near-optimal joint object matching via convex relaxation. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014, pages 100–108, 2014.
 [8] M. Cho, S. Kwak, C. Schmid, and J. Ponce. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pages 1201–1210, 2015.
 [9] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.
 [10] L. Cosmo, E. Rodolà, A. Albarelli, F. Mémoli, and D. Cremers. Consistent partial matching of shape collections via sparse modeling. Comput. Graph. Forum, 36(1):209–221, 2017.
 [11] D. J. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher. SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion. IEEE Trans. Pattern Anal. Mach. Intell., 35(12):2841–2853, 2013.
 [12] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner. ScanNet: Richly-annotated 3D reconstructions of indoor scenes. CoRR, abs/1702.04405, 2017.
 [13] D. Giorgi, S. Biasotti, and L. Paraboschi. Shape retrieval contest 2007: Watertight models track, 2007.
 [14] Q. Huang, Y. Chen, and L. J. Guibas. Scalable semidefinite relaxation for maximum a posteriori estimation. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21–26 June 2014, pages 64–72, 2014.
 [15] Q. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Proceedings of the Eleventh Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pages 177–186, 2013.
 [16] Q. Huang and L. J. Guibas. Consistent shape maps via semidefinite programming. Comput. Graph. Forum, 32(5):177–186, 2013.
 [17] Q. Huang, F. Wang, and L. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Trans. Graph., 33(4):36:1–36:11, July 2014.
 [18] Q. Huang, F. Wang, and L. J. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Trans. Graph., 33(4):36:1–36:11, 2014.
 [19] Q. Huang, G. Zhang, L. Gao, S. Hu, A. Butscher, and L. J. Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, 2012.
 [20] Q.-X. Huang, S. Flöry, N. Gelfand, M. Hofer, and H. Pottmann. Reassembling fractured objects by geometric matching. ACM Trans. Graph., 25(3):569–578, July 2006.
 [21] Q.-X. Huang, G.-X. Zhang, L. Gao, S.-M. Hu, A. Butscher, and L. Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, Nov. 2012.
 [22] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In NIPS, 2017.
 [23] D. F. Huber and M. Hebert. Fully automatic registration of multiple 3d data sets. Image and Vision Computing, 21:637–650, 2001.
 [24] D. F. Huber and M. Hebert. Fully automatic registration of multiple 3d data sets. Image Vision Comput., 21(7):637–650, 2003.
 [25] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. B. Viégas, M. Wattenberg, G. Corrado, M. Hughes, and J. Dean. Google’s multilingual neural machine translation system: Enabling zero-shot translation. CoRR, abs/1611.04558, 2016.
 [26] T. Kavitha, C. Liebchen, K. Mehlhorn, D. Michail, R. Rizzi, T. Ueckerdt, and K. A. Zweig. Survey: Cycle bases in graphs: Characterization, algorithms, complexity, and applications. Comput. Sci. Rev., 3(4):199–243, Nov. 2009.
 [27] I. Kemelmacher-Shlizerman and S. M. Seitz. Collection flow. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1792–1799. IEEE, 2012.
 [28] J. Kim, C. Liu, F. Sha, and K. Grauman. Deformable spatial pyramid matching for fast dense correspondences. In CVPR, pages 2307–2314. IEEE Computer Society, 2013.
 [29] V. G. Kim, W. Li, N. J. Mitra, S. DiVerdi, and T. Funkhouser. Exploring collections of 3d models using fuzzy correspondences. ACM Trans. Graph., 31(4):54:1–54:11, July 2012.
 [30] V. G. Kim, Y. Lipman, and T. Funkhouser. Blended intrinsic maps. In ACM SIGGRAPH 2011 Papers, SIGGRAPH ’11, pages 79:1–79:12, New York, NY, USA, 2011. ACM.
 [31] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 [32] E. G. Learned-Miller. Data driven image models through continuous joint alignment. IEEE Trans. Pattern Anal. Mach. Intell., 28(2):236–250, Feb. 2006.
 [33] S. Leonardos, X. Zhou, and K. Daniilidis. Distributed consistent data association via permutation synchronization. In ICRA, pages 2645–2652. IEEE, 2017.
 [34] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense correspondence across scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):978–994, May 2011.
 [35] A. Nguyen, M. Ben-Chen, K. Welnicka, Y. Ye, and L. Guibas. An optimization approach to improving collections of shape maps. In Computer Graphics Forum, volume 30, pages 1481–1491. Wiley Online Library, 2011.
 [36] A. Nguyen, M. Ben-Chen, K. Welnicka, Y. Ye, and L. J. Guibas. An optimization approach to improving collections of shape maps. Comput. Graph. Forum, 30(5):1481–1491, 2011.
 [37] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. ACM Trans. Graph., 21(4):807–832, Oct. 2002.
 [38] M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. Guibas. Functional maps: A flexible representation of maps between shapes. ACM Transactions on Graphics, 31(4), 2012.
 [39] D. Pachauri, R. Kondor, and V. Singh. Solving the multi-way matching problem by permutation synchronization. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 1860–1868. Curran Associates, Inc., 2013.

 [40] M. Pauly, R. Keiser, and M. Gross. Multi-scale feature extraction on point-sampled surfaces. Computer Graphics Forum, 2003.
 [41] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell., 34(11):2233–2246, Nov. 2012.
 [42] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. CoRR, abs/1706.02413, 2017.
 [43] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems, pages 5099–5108, 2017.
 [44] I. Radosavovic, P. Dollár, R. Girshick, G. Gkioxari, and K. He. Data distillation: Towards omnisupervised learning. arXiv preprint arXiv:1712.04440, 2017.

 [45] L. Rokach. Ensemble-based classifiers. Artif. Intell. Rev., 33(1–2):1–39, Feb. 2010.
 [46] M. Rubinstein, A. Joulin, J. Kopf, and C. Liu. Unsupervised joint object discovery and segmentation in internet images. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23–28, 2013, pages 1939–1946. IEEE Computer Society, 2013.
 [47] M. Rubinstein, C. Liu, and W. T. Freeman. Joint inference in weakly-annotated image datasets via dense correspondence. Int. J. Comput. Vision, 119(1):23–45, Aug. 2016.

 [48] R. M. Rustamov. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the Fifth Eurographics Symposium on Geometry Processing, SGP ’07, pages 225–233, Aire-la-Ville, Switzerland, 2007. Eurographics Association.
 [49] Y. Shen, Q. Huang, N. Srebro, and S. Sanghavi. Normalized spectral map synchronization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 4925–4933. Curran Associates, Inc., 2016.
 [50] Y. Shen, Q. Huang, N. Srebro, and S. Sanghavi. Normalized spectral map synchronization. In Neural Information Processing Systems (NIPS), 2016.
 [51] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: Exploring photo collections in 3d. ACM Trans. Graph., 25(3):835–846, July 2006.
 [52] S. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser. Semantic scene completion from a single depth image. Proceedings of 30th IEEE Conference on Computer Vision and Pattern Recognition, 2017.
 [53] R. E. Tarjan. Edge-disjoint spanning trees and depth-first search. Acta Inf., 6(2):171–185, June 1976.
 [54] F. Wang, Q. Huang, and L. Guibas. Image co-segmentation via consistent functional maps. In Proceedings of the 14th International Conference on Computer Vision (ICCV), 2013.
 [55] F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In Proceedings of the 2013 IEEE International Conference on Computer Vision, ICCV ’13, pages 849–856, Washington, DC, USA, 2013. IEEE Computer Society.
 [56] F. Wang, Q. Huang, M. Ovsjanikov, and L. J. Guibas. Unsupervised multi-class joint image segmentation. In CVPR, pages 3142–3149. IEEE Computer Society, 2014.
 [57] F. Wang, Q. Huang, and L. J. Guibas. Image co-segmentation via consistent functional maps. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1–8, 2013, pages 849–856, 2013.
 [58] L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. CoRR, abs/1211.2441, 2012.
 [59] Y. Wang, S. Asafi, O. van Kaick, H. Zhang, D. Cohen-Or, and B. Chen. Active co-analysis of a set of shapes. ACM Trans. Graph., 31(6):165:1–165:10, Nov. 2012.
 [60] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. IEEE Trans. Pattern Anal. Mach. Intell., 35(12):2878–2890, Dec. 2013.
 [61] Z. Yi, H. Zhang, P. Tan, and M. Gong. DualGAN: Unsupervised dual learning for image-to-image translation. CoRR, abs/1704.02510, 2017.
 [62] Y. Chen and E. Candès. The projected power method: An efficient algorithm for joint alignment from pairwise differences. CoRR, abs/1609.05820, 2016.
 [63] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE Computer Society, 2010.
 [64] A. R. Zamir, A. Sax, W. B. Shen, L. J. Guibas, J. Malik, and S. Savarese. Taskonomy: Disentangling task transfer learning. CoRR, abs/1804.08328, 2018.
 [65] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3D-guided cycle consistency. CoRR, abs/1604.05383, 2016.
 [66] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016, pages 117–126, 2016.
 [67] T. Zhou, P. Krähenbühl, M. Aubry, Q. Huang, and A. A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In Computer Vision and Pattern Recognition (CVPR), 2016.
 [68] T. Zhou, Y. J. Lee, S. X. Yu, and A. A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.
 [69] T. Zhou, Y. J. Lee, S. X. Yu, and A. A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.
 [70] X. Zhou, M. Zhu, and K. Daniilidis. Multi-image matching via fast alternating minimization. CoRR, abs/1505.04845, 2015.
 [71] X. Zhou, M. Zhu, and K. Daniilidis. Multi-image matching via fast alternating minimization. In ICCV, pages 4032–4040, Santiago, Chile, 2015. IEEE Computer Society.
 [72] J. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. CoRR, abs/1703.10593, 2017.
Appendix A Proof of Theorems and Propositions
A.1 Proof of Proposition 1
To show that is path-invariant, it suffices to prove that for every path pair . But by Definition 7, is either in or can be induced by a finite number of merge, stitch, and/or cut operations. So if we can show that the output path pair of every operation remains consistent on the map network whenever the input path pairs are consistent, then all path pairs on are path-invariant by induction. We now verify this for the three operations in turn.

merge. The merge operation takes as input two path pairs where , i.e., is formed by stitching three subpaths and in order. By Definition 2, it is easy to see that
But we are given that is consistent on the input pairs, or equivalently,
Hence
So is also consistent on path pair .

stitch. The stitch operation takes as input two path pairs where and . Since is consistent on and , it follows immediately
which means is also consistent on .

cut. The cut operation takes as input two path pairs and , where and are two common vertices and share a common intermediate path from to . The two cycles can be represented by and where and . Since is consistent on and , we have
Since the inverse of a map, when it exists, is unique, we obtain
or in other words, is consistent on path pair .
Since each of the three operations preserves consistency from its input pairs to its output pair, the proposition follows. ∎
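The inline formulas defining the three operations were lost in extraction, so the following is only an illustrative sketch of how stitch and cut act on path pairs, assuming paths are encoded as vertex tuples. The function names and the encoding are ours, not the paper's formalism:

```python
def stitch(pair1, pair2):
    """Stitch two path pairs (p1, q1) and (p2, q2): when p1/q1 end
    at the vertex where p2/q2 begin, the concatenations form a new
    equivalent pair of paths."""
    p1, q1 = pair1
    p2, q2 = pair2
    assert p1[-1] == p2[0] and q1[-1] == q2[0]
    # drop the duplicated junction vertex when concatenating
    return (p1 + p2[1:], q1 + q2[1:])

def cut(pair, common):
    """Cut a shared suffix path `common` from both members of a
    path pair, yielding a shorter equivalent pair."""
    p, q = pair
    k = len(common) - 1  # number of edges removed from each path
    assert p[len(p) - len(common):] == common == q[len(q) - len(common):]
    return (p[:len(p) - k], q[:len(q) - k])

# Two hypothetical equivalent paths a -> c, extended to a -> e
pair_a = (("a", "b", "c"), ("a", "d", "c"))
pair_b = (("c", "e"), ("c", "e"))
stitched = stitch(pair_a, pair_b)
shortened = cut(stitched, ("c", "e"))
```

Here `cut(stitch(pair_a, pair_b), ("c", "e"))` recovers `pair_a`, mirroring how the proof of Proposition 2 uses cut to strip a common subpath shared by two composed paths.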
A.2 Proof of Theorem 3.1
The algorithm adds exactly edges in total, and during each edge insertion at most path pairs are added to , so it follows immediately that .
Next we show that is indeed a path-invariance basis for . To this end, we verify by induction that every path pair in can be induced from a subset of by operations. In particular, we claim that at every time point, all path pairs in can be induced from by a series of operations. Initially, this inductive assumption holds trivially since is an empty set.
Suppose now we are processing an edge (so at this time point) and let . By the inductive assumption, all path pairs in can be induced from . After inserting into , it suffices to consider path pairs that contain the edge , since all other path pairs are covered by the inductive assumption. Let be a path pair in containing . Without loss of generality, suppose and
If , then can be induced by stitching and , where . We assume ; then is a path from to in and is a path from to in .
Recall the definitions of and ; follows immediately. If , then there exists such that can reach in and , and we denote such a path by . For convenience, we let and when . Every vertex in corresponds to a path-invariance pair added to by our algorithm. Here we assume that it is for , where , and is within .
By the property of the DAG and the order of edge insertion, all paths from to in are also in , since . Thus can be induced from by the inductive assumption. Similarly, as is within , is also a path-invariance pair, which can be induced from . Next we give the operation steps to build :
(6)  
(7)  
(8) 
For the last step, notice that and that is equivalent to . Thus all path pairs in can be induced from path pairs in by a series of operations, which completes the proof by induction. ∎
A.3 Proof of Proposition 2
Before proving Proposition 2, we first recall some well-known notions for depth-first search. Each vertex has two time stamps and , where is the time point at which is visited for the first time and is the time point at which the search finishes visiting . Every edge in can be classified into one of four disjoint types as follows:

Tree Edge: is visited for the first time as we traverse the edge . In this case is added to the resulting DFS spanning tree. For a tree edge we have

Back Edge: is visited and is an ancestor of in the current spanning tree. For a back edge we have

Forward Edge: is visited and is an ancestor of in the current spanning tree. For a forward edge we have

Cross Edge: is visited and is neither an ancestor nor a descendant of in the current spanning tree. For a cross edge we have
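This four-way classification can be reproduced with a short depth-first search that records discovery and finish times. The sketch below is a standard textbook construction, not code from the paper; the graph encoding (a dict mapping each vertex to its successors) and vertex names are ours:

```python
def classify_edges(graph):
    """Classify directed edges as tree/back/forward/cross using a
    DFS that records discovery (d) and finish (f) time stamps.
    `graph` maps each vertex to a list of its successors."""
    d, f, clock = {}, {}, [0]
    kinds = {}

    def dfs(u):
        clock[0] += 1
        d[u] = clock[0]
        for v in graph.get(u, []):
            if v not in d:            # v undiscovered: tree edge
                kinds[(u, v)] = "tree"
                dfs(v)
            elif v not in f:          # v still open, so an ancestor of u
                kinds[(u, v)] = "back"
            elif d[u] < d[v]:         # v is a finished descendant of u
                kinds[(u, v)] = "forward"
            else:                     # disjoint subtrees
                kinds[(u, v)] = "cross"
        clock[0] += 1
        f[u] = clock[0]

    for u in list(graph):
        if u not in d:
            dfs(u)
    return kinds

# Small example: a->b->c plus a shortcut a->c and a cycle edge b->a.
graph = {"a": ["b", "c"], "b": ["c", "a"], "c": []}
kinds = classify_edges(graph)
```

On this example, (a, b) and (b, c) are tree edges, (b, a) is a back edge (it closes a cycle, the case the proof above analyzes), and (a, c) is a forward edge.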
Using these definitions, we prove that:
Any cycle in has a vertex in such that all other vertices are located within the subtree rooted at , i.e., they are descendants of in .
Let
Without loss of generality, assume is the one with the smallest among all . If not all are descendants of , choose to be the one with the smallest , so that is a descendant of but is not. Clearly cannot be a tree edge or a forward edge, since either would make a descendant of and hence also a descendant of . If is a back edge, then is not a descendant of if and only if , since the back path in the spanning tree is in fact unique. But means is a parent of , so there exists a smaller than , a contradiction. Nor can be a cross edge. Indeed, since is a descendant of , we have . Together with from the cross-edge property, this gives . But is neither a descendant nor an ancestor of , so the subtree rooted at is disjoint from the subtree rooted at , and hence the intervals and must be disjoint by the property of depth-first search. Thus implies , which contradicts the assumption that is the smallest among . Hence all are descendants of .
We now return to the original proposition, continuing with the notation defined above. In addition, we define as the subpath from to , i.e.,
We will show that can be induced from by a finite number of merge, stitch, and cut operations. To begin with, we may assume the path-invariance property on by Theorem 3.1. Given that is the common ancestor of all , we inductively prove the following statement:
The path is equivalent to the tree path from to . Here a tree path means a path all of whose edges are in the spanning tree .
The base case is trivial. Now suppose is equivalent to the tree path from to , and we continue to check .

If is a tree edge, then is still a tree path, and a stitch operation on the path pairs and gives the desired equivalence.

If is a forward edge, then there exists a tree path from to . By path-invariance on , we can stitch the two path-invariance pairs and to obtain the desired equivalence.

If is a back edge, then there exists a tree path from to . In addition, by our construction the cycle has been added to our basis set . Denote the tree path from to by ; then stitching and gives . On the other hand, by the inductive assumption we have the path-invariance pair , since is just the tree path from to . Thus by merging and we obtain the path pair , or equivalently, .

If is a cross edge, then has been included in . Denote by the tree path from to . In this way is equivalent to another tree path from to , since all edges involved here are within , which maintains all possible path-invariance pairs. By merging the path pairs and we obtain the path pair , or , which is exactly what we want to verify.
This completes the inductive proof. In particular, the path (also a cycle) is equivalent to ; more precisely, the path pair can be induced from by a finite number of merge and stitch operations.
To complete the proof, we need to show that all path pairs in , not just , can be induced from . This is relatively easy. Consider two paths and , both from to . Since is strongly connected, there must exist some path from to . The cut operation on and at the common vertices and immediately gives the path pair . ∎
[Figure: qualitative alignment results comparing Congealing, RASL, CollectionFlow, DSP, FlowWeb, and our approach on source/target image pairs.]
A.4 Proof of Theorem 3.2
To prove this theorem, we first prove the following lemma:
Lemma. Suppose and are two strongly connected components in with . Given any vertices and with , and paths ,