1 Introduction
Transformation synchronization, i.e., estimating consistent rigid transformations across a collection of images or depth scans, is a fundamental problem in various computer vision applications, including multiview structure from motion
[5, 30, 39, 37], geometry reconstruction from depth scans [21, 9], image editing via solving jigsaw puzzles [8], simultaneous localization and mapping [4], and reassembling fractured surfaces [15], to name just a few. A common approach to transformation synchronization proceeds in two phases. The first phase establishes the relative rigid transformations between pairs of objects in isolation. Due to incomplete information presented in isolated pairs, the estimated relative transformations are usually quite noisy. The second phase improves the relative transformations by jointly optimizing them across all input objects. This is usually made possible by utilizing the socalled cycleconsistency constraint, which states that the composite transformation along every cycle should be the identity transformation, or equivalently, the data matrix that stores pairwise transformations in blocks is lowrank (c.f. [16]). The cycleconsistency constraint allows us to jointly improve relative transformations by either detecting inconsistent cycles [8] or performing lowrank matrix recovery [16, 38].(a)  (b) 
(c)  (d) 
However, the success of existing transformation synchronization [38, 5, 2, 19] and more general map synchronization [16, 32, 31, 7, 34, 19]
techniques heavily depends on the alignment between the loss function and the noise pattern of the input data. For example, approaches based on robust norms (e.g., L1
[16, 7]) can tolerate either a constant fraction of adversarial noise (c.f.[16, 19]) or a sublinear outlier ratio when the noise is independent (c.f.
[7, 34]). Such assumptions, unfortunately, deviate from many practical settings, where the majority of the input relative transformations may be incorrect (e.g., when the input scans are noisy), and/or the noise pattern in relative transformations is highly correlated (there are a quadratic number of measurements from a linear number of sources). This motivates us to consider the problem of learning transformation synchronization, which seeks to learn a suitable loss function that is compatible with the noise pattern of specific datasets.In this paper, we introduce an approach that formulates transformation synchronization as an endtoend neural network. Our approach is motivated by reweighted least squares and their application in transformation synchronization (c.f. [5, 2, 10, 19]
), where the loss function dictates how we update the weight associated with each input relative transformation during the synchronization process. Specifically, we design a recurrent neural network that reflects this reweighted scheme. By learning the weights from data directly, our approach implicitly captures a suitable loss function for performing transformation synchronization.
We have evaluated the proposed technique on two real datasets: Redwood [11] and ScanNet [12]. Experimental results show that our approach leads to considerable improvements compared to the stateoftheart transformation synchronization techniques. For example, on Redwood and Scannet, the best combination of existing pairwise matching and transformation synchronization techniques lead to mean angular rotation errors and , respectively. In contrast, the corresponding statistics of our approach are and , respectively. We also perform an ablation study to evaluate the effectiveness of our approach.
Code is publicly available at https://github.com/xiangruhuang/Learning2Sync.
2 Related Works
Existing techniques on transformation synchronization fall into two categories. The first category of methods [21, 15, 40, 29, 43]
uses combinatorial optimization to select a subgraph that only contains consistent cycles. The second category of methods
[38, 25, 18, 16, 17, 7, 44, 34, 26, 20] can be viewed from the perspective that there is an equivalence between cycleconsistent transformations and the fact that the map collection matrix that stores relative transformations in blocks is semidefinite and/or lowrank (c.f.[16]). These methods formulate transformation synchronization as lowrank matrix recovery, where the input relative transformations are considered noisy measurements of this lowrank matrix. In the literature, people have proposed convex optimization [38, 16, 17, 7], nonconvex optimization [5, 44, 26, 20], and spectral techniques [25, 18, 32, 31, 34, 36] for solving various lowrank matrix recovery formulations. Compared with the first category of methods, the second category of methods is computationally more efficient. Moreover, tight exact recovery conditions of many methods have been established.A message from these exact recovery conditions is that existing methods only work if the fraction of noise in the input relative transformations is below a threshold. The magnitude of this threshold depends on the noise pattern. Existing results either assume adversarial noise [16, 20] or independent random noise [38, 7, 34, 3]
. However, as relative transformations are computed between pairs of objects, it follows that these relative transformations are dependent (i.e., between the same source object to different target objects). This means there are a lot of structures in the noise pattern of relative transformations. Our approach addresses this issue by optimizing transformation synchronization techniques to fit the data distribution of a particular dataset. To best of our knowledge, this work is the first to apply supervised learning to the problem of transformation synchronization.
Our approach is also relevant to utilizing recurrent neural networks for solving the pairwise matching problem. Recent examples include learning correspondences between pairs of images [28], predicting the fundamental matrix between two different images of the same underlying environment [33], and computing a dense image flow between an image pair [24]. We study a different problem of transformation synchronization in this paper. In particular, our weighting module leverages problem specific features (e.g., eigengap) for determining the weights associated with relative transformations. Learning transformation synchronization also poses great challenges in making the network trainable endtoend.
3 Problem Statement and Approach Overview
In this section, we describe the problem statement of transformation synchronization (Section 3.1) and present an overview of our approach (Section 3.2).
3.1 Problem Statement
Consider input scans capturing the same underlying object/scene from different camera poses. Let denote the local coordinate system associated with . The input to transformation synchronization can be described as a model graph [22]. Each edge of the model graph is associated with a relative transformation , where and , are rotational and translational components of , respectively. is usually precomputed using an offtheshelf algorithm (e.g., [27, 41]). For simplicity, we impose the assumption that if and only if (i) , and (ii) their associated transformations are compatible, i.e.,
It is expected that many of these relative transformations are incorrect, due to limited information presented between pairs of scans and limitations of the offtheshelf method being used. The goal of transformation synchronization is to recover the absolute pose of each scan in a world coordinate system . Without losing generality, we assume the world coordinate system is given by . Note that unlike traditional transformation synchronization approaches that merely use (e.g.,[5, 38, 2]), our approach also incorporates additional information extracted from the input scans .
3.2 Approach Overview
Our approach is motivated from iteratively reweighted least squares (or IRLS)[13], which has been applied to transformation synchronization (e.g. [5, 2, 10, 19]). The key idea of IRLS is to maintain an edge weight for each input transformation so that the objective function becomes quadratic in the variables, and transformation synchronization admits a closedform solution. One can then use the closedform solution to update the edge weights. Under a special weighting scheme (c.f.[19]), it has been shown that when the fraction of incorrect measurements is below a constant, the weights associated with these incorrect measurements eventually become . One way to understand reweighting schemes is that when the weights converged, the reweighted square loss becomes the actual robust loss function that is used to solve the corresponding transformation synchronization problem. In contrast to using a generic weighting scheme, we propose to learn the weighting scheme from data by designing a recurrent network that replicates the reweighted transformation synchronization procedure. By doing so, we implicitly learn a suitable loss function for transformation synchronization.
As illustrated in Figure 2, the proposed recurrent module combines a synchronization layer and a weighting module. At the th iteration, the synchronization layer takes as input the initial relative transformations and their associated weights and outputs synchronized poses for the input objects . Initially, we set . The technical details of the synchronization layer are described in Section 4.1.
The weighting module operates on each object pair in isolation. For each edge , the input to the proposed weighting module consists of (1) the input relative transformation , (2) the induced relative transformation at the th iteration
(3) features extracted from the initial alignment of the two input scans, and (4) a status vector
that collects global signals from the synchronization layer at the th iteration (e.g., spectral gap). The output is the associated weight at the th iteration.The network is trained endtoend by penalizing the differences between the groundtruth poses and the output of the last synchronization layer. The technical details of this endtoend training procedure are described in Section 4.3.
4 Approach
In this section, we introduce the technical details of our learning transformation synchronization approach. In Section 4.1, we introduce details of the synchronization layer. In Section 4.2, we describe the weighting module. Finally, we show how to train the proposed network endtoend in Section 4.3. Note that the proofs of the propositions introduced in this section are deferred to the supplemental material.
4.1 Synchronization Layer
For simplicity, we ignore the superscripts and when introducing the synchronization layer. Let and be the input relative transformation and its weights associated with the edge . We assume that this weighted graph is connected. The goal of the synchronization layer is to compute the synchronized pose associated with each scan . Note that a correct relative transformation induces two separate constraints on the rotations and translations , respectively:
We thus perform rotation synchronization and translation synchronization separately.
Our rotation synchronization approach adapts the spectral rotation synchronization approach described in [1]. Specifically, we consider the following optimization problem for rotation synchronization:
(1) 
Solving (1) exactly is difficult. We propose to first relax the constraint to when solving (1) and then project each of the resulting solution to . This leads to the following procedure for rotation synchronization. More precisely, we introduce a connection Laplacian [35], whose blocks are given by
(2) 
where collects all neighbor vertices of in .
Let collect the eigenvectors of
that correspond to the three smallest eigenvalues. We choose the sign of each eigenvector such that
. To compute the absolute rotations, we first perform singular value decomposition (SVD) on each
We then output the corresponding absolute rotation estimate as
(3) 
The following proposition states that although do not exactly optimize (1), they still provide effective synchronized rotations due to the following robust recovery property:
Proposition 1.
(Informal) Consider the groundtruth rotations . Suppose where ,
(4) 
Then in (3) approximately recovers the groundtruth rotations . More precisely, we define
as the estimation error on . With we denote the corresponding error matrix. When the constraints in (4) are exact, or equivalently, , then the recovery is also exact. In this case, we have
In other words, if the weighting module sets weights of outlier relative transformations to , then approximately recover the underlying rotations.
Translation synchronization solves the following least square problem to obtain :
(5) 
Let collect the translation components of the synchronized poses in a column vector. Introduce a column vector where
Then an^{1}^{1}1When is positive semidefinite, then the solution is unique, and (6) gives one optimal solution. optimal solution to (5) is given by
(6) 
Similar to the case of rotation synchronization, we have the following robust recovery property:
Proposition 2.
Similar to the case of rotation synchronization, if the pairwise matching module sets the weights of outlier relative transformations to , then approximately recover the underlying translations.
4.2 Weighting Module
We define the weighting module as the following function:
(8) 
where the input consists of (i) a pair of scans and , (ii) the input relative transformation between them, and (iii) a status vector . The output of this weighting module is given by the new weight at the th iteration. With we denote the trainable weights of the weighting module. In the following, we first introduce the definition of the status vector .
Status vector. The purpose of the status vector is to collect additional signals that are useful for determining the output of the weighting module. Define
(9)  
(10)  
(11)  
(12) 
Essentially, and characterize the difference between current synchronized transformations and the input relative transformations. The motivation for using them comes from the fact that for a standard reweighted scheme for transformation synchronization (c.f. [19]), one simply sets for a weighting function (c.f. [13]). This scheme can already recover the underlying groundtruth in the presence of a constant fraction of adversarial incorrect relative transformations (Please refer to the supplemental material for a rigorous analysis). In contrast, our approach seeks to go beyond this limit by leveraging additional information. The definition of comes from Prop.1. equals to the residual of (5). Intuitively, when both and are small, the weighted relative transformations will be consistent, from which we can recover accurate synchronized transformations . We now describe the network design.
Network design. As shown in Figure 3, the key component of our network design is a subnetwork that takes two scans and and a relative transformation between them and output a score in that indicates whether this is a good scan alignment or not, i.e., means a good alignment, and means an incorrect alignment.
We design as a feedforward network. Its input consists of two color maps that characterize the alignment patterns between the two input scans. The value of each pixel represents the distance of the corresponding 3D point to the closest points on the other scan under (See the second column of images in Figure 3). We then concatenate these two color images and feed them into a neural network (we used a modified AlexNet architecture), which outputs the final score.
With this setup, we define the output weight as
(13) 
Here adopts form of traditional reweighting function and encode the importance of the elements of . With we collect all trainable parameters of (13).
4.3 EndtoEnd Training
Let denote a dataset of scan collections with annotated groundtruth poses. Let be the number of recurrent steps (we used four recurrent steps in our experiments) . We define the following loss function for training the weighting module :
(14) 
where we set in all of our experiments. Note that we compare relative rotations to factor out the global orientation among the poses. The global shift in translation is already handled by (6).
We perform backpropagation to optimize (14). The technical challenges are to compute the derivatives that pass through the synchronization layer, including 1) the derivatives of with respect to the elements of , 2) the derivatives of with respect to the elements of and , and 3) the derivatives of each status vector with respect to the elements of and . In the following, we provide explicit expressions for computing these derivatives.
We first present the derivative between the output of rotation synchronization and its input. To make the notation uncluterred, we compute the derivative by treating is a matrix function. The derivative with respect to
can be easily obtained via chainrule.
Proposition 3.
Consider the setup of Prop. 1. Let and be the th eigenvector and eigenvalue of . Expand the SVD of as follows:
Let be the th canonical basis of . We then have
where
where is defined by ,
The following proposition specifies the derivative of with respect to the elements of and :
Proposition 4.
The derivatives of are given by
Regarding the status vectors, the derivatives of with respect to the elements of are given by Prop. 3; The derivatives of and with respect to the elements of are given by Prop. 4. It remains to compute the derivatives of with respect to the elements of , which can be easily obtained via the derivatives of the eigenvalues of [23], i.e., .
5 Experimental Results
This section presents an experimental evaluation of the proposed learning transformation synchronization approach. We begin with describing the experimental setup in Section 5.1. In Section 5.2, we analyze the results of our approach and compare it against baseline approaches. Finally, we present an ablation study in Section 5.3.
Methods  Redwood  ScanNet  
Rotation Error  Translation Error  Rotation Error  Translation Error  
Mean  0.05  0.1  0.25  0.5  0.75  Mean  Mean  0.05  0.1  0.25  0.5  0.75  Mean  
FastGR (all)  29.4  40.2  52.0  63.8  70.4  22.0  39.6  53.0  60.3  67.0  0.68  9.9  16.8  23.5  31.9  38.4  5.5  13.3  22.0  29.0  36.3  1.67  
FastGR (good)  33.9  45.2  57.2  67.4  73.2  26.7  45.7  58.8  65.9  71.4  0.59  12.4  21.4  29.5  38.6  45.1  7.7  17.6  28.2  36.2  43.4  1.43  
Super4PCS (all)  6.9  10.1  16.7  39.6  52.3  4.2  8.9  18.2  31.0  43.5  1.14  0.5  1.3  4.0  17.4  25.2  0.3  1.2  5.3  13.3  21.6  2.11  
Super4PCS (good)  10.3  14.9  23.9  48.0  60.0  6.4  13.3  26.2  41.2  53.2  0.93  0.8  2.3  6.4  23.0  31.7  0.6  2.2  8.9  19.5  29.5  1.80  
RotAvg (FastGR)  30.4  42.6  59.4  74.4  82.1  23.3  43.2  61.8  72.4  80.7  0.42  6.0  10.4  17.3  36.1  46.1  3.7  9.2  19.5  34.0  45.6  1.26  
multiFastGR (FastGR)  17.8  28.7  47.5  74.2  83.2  4.9  18.4  50.2  72.6  81.4  0.93  0.2  0.6  2.8  16.4  27.1  0.1  0.7  4.8  16.4  28.4  1.80  
RotAvg (Super4PCS)  5.4  8.7  17.4  45.1  59.2  3.2  7.4  17.0  32.3  46.3  0.95  0.3  0.8  3.0  15.4  23.3  0.2  1.0  5.8  16.5  27.6  1.70  
multiFastGR (Super4PCS)  2.1  4.1  10.2  33.1  48.3  1.1  3.1  10.3  21.5  31.8  1.25  1.9  5.1  13.9  36.6  47.1  0.4  2.1  9.8  23.2  34.5  1.82  
Our Approach (FastGR)  67.5  77.5  85.6  91.7  94.4  20.7  40.0  70.9  88.6  94.0  0.26  34.4  41.1  49.0  58.9  62.3  42.9  2.0  7.3  22.3  36.9  48.1  1.16  
Our Approach (Super4PCS)  2.3  5.1  13.2  42.5  60.9  1.1  4.0  13.8  29.0  42.3  1.02  0.4  1.7  6.8  29.6  43.5  0.1  0.8  5.6  16.6  27.0  1.90  
Transf. Sync. (FastGR)  27.1  37.7  56.9  74.4  82.4  17.4  34.4  55.9  70.4  81.3  0.43  3.2  6.5  14.6  35.8  47.4  1.6  5.6  15.5  30.9  43.4  1.31  
Input Only (FastGR)  36.7  51.4  68.1  87.7  91.7  25.1  49.3  73.2  86.4  91.6  0.26  11.7  19.4  30.5  50.7  57.7  5.9  15.4  30.5  43.7  52.2  1.03  
No Recurrent (FastGR)  37.8  52.8  71.1  87.7  91.7  26.3  51.1  77.3  87.1  92.0  0.24  8.6  15.3  26.9  51.4  58.2  3.9  11.1  27.3  43.7  53.9  1.01 
5.1 Experimental Setup
Datasets. We consider two datasets in this paper, Redwood [11] and ScanNet [12]:

Redwood contains RGBD sequences of individual objects. We uniformly sample 60 sequences. For each sequence, we sample 30 RGBD images that are 20 frames away from the next one, which cover 600 frames of the original sequence. For experimental evaluation, we use the poses associated with the reconstruction as the groundtruth. We use 35 sequences for training and 25 sequences for testing. Note that the temporal order among the frames in each sequence is discarded in our experiments.

ScanNet contains RGBD sequences, as well as reconstruction, camera pose, for 706 indoor scenes. Each scene contains 23 sequences of different trajectories. We randomly sample 100 sequences from ScanNet. We use 70 sequences for training and 30 sequences for testing. Again the temporal order among the frames in each sequence is discarded in our experiments.
More details about the sampled sequences are given in the appendix.
Pairwise methods. We consider two stateoftheart pairwise methods for generating the input to our approach:

Super4PCS [27] applies sampling to find consistent matches of four point pairs.
Baseline approaches. We consider the following baseline approaches that are introduced in the literature for transformation synchronization:

Robust Relative Rotation Averaging (RotAvg)[6] is a scalable algorithm that performs robust rotation averaging of relative rotations. To recover translations, we additionally apply a stateoftheart translation synchronization approach [19]. We use default setting of its publicly accessible code. [19] is based on our own Matlab implementation.

Geometric Registration [10] solve multiway registration via pose graph optimization. We modify the Open3D implementation to take inputs from Super4PCS or FastGR.
Note that our approach utilizes a weighting module to score the input relative transformations. To make fair comparisons, we apply our pretrained weighting module to filter all input transformations, whose associated scores are below . We then feed these filtered input transformations to each baseline approach for experimental evaluation.
Evaluation protocol. We employ the evaluation protocols of [5] and [19]
for evaluating rotation synchronization and translation synchronization, respectively. Specifically, for rotations, we first solve the best matching global rotation between the groundtruth and the prediction, we then report the cumulative distribution function (or CDF) of angular deviation
between a prediction and its corresponding groundtruth . For translations, we report the CDF of between each pair of prediction and its corresponding groundtruth .5.2 Analysis of Results
Figure 4 and Figure 5 present quantitative and qualitative results, respectively. Overall, our approach yielded fairly accurate results. On Redwood, the mean errors in rotations/translations of FastGR and our result from FastGR are and , respectively. On ScanNet, the mean errors in rotations/translations of FastGR and our result from FastGR are and , respectively. Note that in both cases, our approach leads to salient improvements from the input. The final results of our approach on ScanNet are less accurate than those on Redwood. Besides the fact that the quality of the initial relative transformations is lower on ScanNet than that on Redwood, another factor is that the depth scans from ScanNet are quite noisy, leading to noisy input (and thus less signals) for the weighting module. Still, the improvements of our approach on ScanNet are salient.
Our approach still requires reasonable initial transformations to begin with. This can be understood from the fact that our approach seeks to perform synchronization by selecting a subset of input relative transformations. Although our approach utilizes learning, its performance shall decrease when the quality of the initial relative transformations drops. An evidence is that our approach only leads to modest performance gains when taking the output of Super4PCS as input.
Comparison with stateoftheart approaches. Although all the two baseline approaches improve from the input relative transformations, our approach exhibits significant further improvements from all baseline approaches. On Redwood, the mean rotation and translation errors of the top performing method RotAvg from FastGR are and , respectively. The reductions in mean error of our approach are and for rotations and translations, respectively, which are significant. The reductions in mean errors of our approach on ScanNet are also noticeable, i.e., and in rotations and translations, respectively.
Our approach also achieved relative performance gains from baseline approaches when taking the output of Super4PCS as input. In particular, for mean rotation errors, our approach leads to reductions of and on Redwood and ScanNet, respectively.
When comparing rotations and translations, the improvements on mean rotation errors are bigger than those on mean translation errors. One explanation is that there are a lot of planar structures in Redwood and ScanNet. When aligning such planar structures, rotation errors easily lead to a large change in nearest neighbor distances and thus can be detected by our weighting module. In contrast, translation errors suffer from the gliding effects on planar structures (c.f.[14]), and our weighting module becomes less effective.
5.3 Ablation Study
In this section, we present two variants of our learning transformation synchronization approach to justify the usefulness of each component of our system. Due to space constraint, we perform ablation study only using FastGR.
Input only.
In the first experiment, we simply learn to classify the input maps, and then apply transformation synchronization techniques on the filtered input transformations. In this setting, stateoftheart transformation synchronization techniques achieves mean rotation/translation errors of
and on Redwood and ScanNet, respectively. By applying our learning approach to fixed initial map weights, e.g., we fix of the weighting module during, our approach reduced the mean errors to and on Redwood and ScanNet, respectively. Although the improvements are noticeable, there are still gaps between this reduced approach and our full approach. This justifies the importance of learning the weighting module together.No recurrent module. Another reduced approach is to directly combine the weighting module and one synchronization layer. Although this approach can improve from the input transformations. There is still a big gap between this approach and our full approach (See the last row in Figure 4). This shows the importance of using weighting modules to gradually reduce the error while simultaneously make the entire procedure trainable endtoend.
6 Conclusions
In this paper, we have introduced a supervised transformation synchronization approach. It modifies a reweighted nonlinear least square approach and applies a neural network to automatically determine the input pairwise transformations and the associated weights. We have shown how to train the resulting recurrent neural network endtoend. Experimental results show that our approach is superior to stateoftheart transformation synchronization techniques on ScanNet and Redwood for two stateoftheart pairwise scan matching methods.
There are ample opportunities for future research. So far we have only considered classifying pairwise transformations, it would be interesting to study how to classifying highorder matches. Another interesting direction is to install ICP alignment into our recurrent procedure, i.e., we start from the current synchronized poses and perform ICP between pairs of scans to obtain more signals for transformation synchronization. Moreover, instead of maintaining one synchronized pose per scan, we can maintain multiple synchronized poses, which offer more pairwise matches between pairs of scans for evaluation. Finally, we would like to apply our approach to synchronize dense correspondences across multiple images/shapes.
Acknowledgement: The authors wish to thank the support of NSF grants DMS1546206, DMS1700234, CHS1528025, a DoD Vannevar Bush Faculty Fellowship, a Google focused research award, a gift from adobe research, a gift from snap research, a hardware donation from NVIDIA, and an Amazon AWS AI Research gift.
References
 [1] M. ArieNachimson, S. Z. Kovalsky, I. KemelmacherShlizerman, A. Singer, and R. Basri. Global motion estimation from point matches. In Proceedings of the 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, 3DIMPVT ’12, pages 81–88, Washington, DC, USA, 2012. IEEE Computer Society.
 [2] F. Arrigoni, A. Fusiello, B. Rossi, and P. Fragneto. Robust rotation synchronization via lowrank and sparse matrix decomposition. CoRR, abs/1505.06079, 2015.

[3]
C. Bajaj, T. Gao, Z. He, Q. Huang, and Z. Liang.
SMAC: simultaneous mapping and clustering using spectral
decompositions.
In
Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 1015, 2018
, pages 334–343, 2018.  [4] L. Carlone, R. Tron, K. Daniilidis, and F. Dellaert. Initialization techniques for 3d SLAM: A survey on rotation estimation and its use in pose graph optimization. In ICRA, pages 4597–4604. IEEE, 2015.
 [5] A. Chatterjee and V. M. Govindu. Efficient and robust largescale rotation averaging. In ICCV, pages 521–528. IEEE Computer Society, 2013.
 [6] A. Chatterjee and V. M. Govindu. Robust relative rotation averaging. IEEE transactions on pattern analysis and machine intelligence, 40(4):958–972, 2018.
 [7] Y. Chen, L. J. Guibas, and Q. Huang. Nearoptimal joint object matching via convex relaxation. In ICML, pages 100–108, 2014.
 [8] T. S. Cho, S. Avidan, and W. T. Freeman. The patch transform. IEEE Trans. Pattern Anal. Mach. Intell., 32(8):1489–1501, 2010.
 [9] S. Choi, Q.Y. Zhou, and V. Koltun. Robust reconstruction of indoor scenes. In CVPR, pages 5556–5565. IEEE Computer Society, 2015.

[10]
S. Choi, Q.Y. Zhou, and V. Koltun.
Robust reconstruction of indoor scenes.
In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, 2015.  [11] S. Choi, Q.Y. Zhou, S. Miller, and V. Koltun. A large dataset of object scans. arXiv:1602.02481, 2016.
 [12] A. Dai, A. X. Chang, M. Savva, M. Halber, T. A. Funkhouser, and M. Nießner. Scannet: Richlyannotated 3d reconstructions of indoor scenes. CoRR, abs/1702.04405, 2017.
 [13] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Report, Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA, June 2008.
 [14] N. Gelfand, S. Rusinkiewicz, L. Ikemoto, and M. Levoy. Geometrically stable sampling for the ICP algorithm. In 3DIM, pages 260–267. IEEE Computer Society, 2003.
 [15] Q. Huang, S. Flöry, N. Gelfand, M. Hofer, and H. Pottmann. Reassembling fractured objects by geometric matching. ACM Trans. Graph., 25(3):569–578, July 2006.
 [16] Q. Huang and L. Guibas. Consistent shape maps via semidefinite programming. In Proceedings of the Eleventh Eurographics/ACMSIGGRAPH Symposium on Geometry Processing, SGP ’13, pages 177–186, AirelaVille, Switzerland, Switzerland, 2013. Eurographics Association.
 [17] Q. Huang, F. Wang, and L. J. Guibas. Functional map networks for analyzing and exploring large shape collections. ACM Trans. Graph., 33(4):36:1–36:11, 2014.
 [18] Q. Huang, G. Zhang, L. Gao, S. Hu, A. Butscher, and L. J. Guibas. An optimization approach for extracting and encoding consistent maps in a shape collection. ACM Trans. Graph., 31(6):167:1–167:11, 2012.
 [19] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 1459–1468. Curran Associates, Inc., 2017.
 [20] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation synchronization via truncated least squares. In NIPS, 2017.
[21] D. Huber. Automatic Three-dimensional Modeling from Reality. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, December 2002.
 [22] D. F. Huber and M. Hebert. Fully automatic registration of multiple 3d data sets. Image and Vision Computing, 21:637–650, 2001.
[23] M. K. Kadalbajoo and A. Gupta. An overview on the eigenvalue computation for matrices. Neural, Parallel Sci. Comput., 19(1-2):129–164, Mar. 2011.

[24] S. Kim, S. Lin, S. R. Jeon, D. Min, and K. Sohn. Recurrent transformer networks for semantic correspondence. In NIPS, 2018.
[25] V. Kim, W. Li, N. Mitra, S. DiVerdi, and T. Funkhouser. Exploring collections of 3d models using fuzzy correspondences. ACM Trans. Graph., 31(4):54:1–54:11, July 2012.
 [26] S. Leonardos, X. Zhou, and K. Daniilidis. Distributed consistent data association via permutation synchronization. In ICRA, pages 2645–2652. IEEE, 2017.
[27] N. Mellado, D. Aiger, and N. J. Mitra. Super 4PCS: Fast global point-cloud registration via smart indexing. Comput. Graph. Forum, 33(5):205–215, Aug. 2014.
 [28] K. Moo Yi, E. Trulls, Y. Ono, V. Lepetit, M. Salzmann, and P. Fua. Learning to find good correspondences. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [29] A. Nguyen, M. BenChen, K. Welnicka, Y. Ye, and L. J. Guibas. An optimization approach to improving collections of shape maps. Comput. Graph. Forum, 30(5):1481–1491, 2011.
 [30] O. Özyesil and A. Singer. Robust camera location estimation by convex programming. CoRR, abs/1412.0165, 2014.
 [31] D. Pachauri, R. Kondor, G. Sargur, and V. Singh. Permutation diffusion maps (PDM) with application to the image association problem in computer vision. In NIPS, pages 541–549, 2014.
 [32] D. Pachauri, R. Kondor, and V. Singh. Solving the multiway matching problem by permutation synchronization. In NIPS, pages 1860–1868, 2013.
[33] R. Ranftl and V. Koltun. Deep fundamental matrix estimation. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part I, pages 292–309, 2018.
 [34] Y. Shen, Q. Huang, N. Srebro, and S. Sanghavi. Normalized spectral map synchronization. In NIPS, pages 4925–4933, 2016.
[35] A. Singer and H.-T. Wu. Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics, 65(8), Aug. 2012.
[36] Y. Sun, Z. Liang, X. Huang, and Q. Huang. Joint map and symmetry synchronization. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part V, pages 257–275, 2018.
[37] C. Sweeney, T. Sattler, T. Höllerer, M. Turk, and M. Pollefeys. Optimizing the viewing graph for structure-from-motion. In ICCV, pages 801–809. IEEE Computer Society, 2015.
 [38] L. Wang and A. Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, 2:145–193, December 2013.
[39] K. Wilson and N. Snavely. Robust global translations with 1DSfM. In D. J. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, ECCV (3), volume 8691 of Lecture Notes in Computer Science, pages 61–75. Springer, 2014.
 [40] C. Zach, M. Klopschitz, and M. Pollefeys. Disambiguating visual relations using loop constraints. In CVPR, pages 1426–1433. IEEE Computer Society, 2010.
[41] Q. Zhou, J. Park, and V. Koltun. Fast global registration. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II, pages 766–782, 2016.
[42] Q. Zhou, J. Park, and V. Koltun. Open3D: A modern library for 3D data processing. CoRR, abs/1801.09847, 2018.
[43] T. Zhou, Y. J. Lee, S. X. Yu, and A. A. Efros. FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences. In CVPR, pages 1191–1200. IEEE Computer Society, 2015.
[44] X. Zhou, M. Zhu, and K. Daniilidis. Multi-image matching via fast alternating minimization. CoRR, abs/1505.04845, 2015.
Appendix A Overview
Appendix B More Experimental Results
B.1 More Visual Comparison Results
Figure 6 shows more visual comparisons between our approach and baseline approaches. Again, our approach produces alignments that are close to the underlying ground-truth. The overall quality of our alignments is superior to that of the baseline approaches.
B.2 Cumulative Distribution Function
Figure 7 plots the cumulative distribution functions of errors in rotations and translations with respect to a varying threshold.
B.3 Illustration of Dataset
To understand the difficulty of the datasets used in our experiments, we pick a typical scene from each of the Redwood and ScanNet datasets and render 15 of the 30 ground-truth point clouds from the same camera viewpoint. From Figure 9 and Figure 8, we can see that ScanNet is generally harder than Redwood, as less information can be extracted by looking at pairs of scans.
Appendix C Proofs of Propositions
We organize this section as follows. In Section C.1, we provide key lemmas regarding the eigendecomposition of a connection Laplacian, including stability of eigenvalues/eigenvectors and derivatives of eigenvectors with respect to elements of the connection Laplacian. In Section C.2, we provide key lemmas regarding the projection operator that maps the space of square matrices to the space of rotations. Section C.3 to Section C.6 describe the proofs of all the propositions stated in the main paper. Section C.7 provides an exact recovery condition of a rotation synchronization scheme via reweighted least squares. Finally, Section C.8 provides proofs for new key lemmas introduced in this section.
C.1 Eigen-Stability of Connection Laplacian
We begin by introducing the problem setting and notation in Section C.1.1. We then present the key lemmas in Section C.1.2.
C.1.1 Problem Setting and Notations
Consider a weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $n$ vertices, i.e., $|\mathcal{V}| = n$. We assume that $\mathcal{G}$ is connected. With $w_{ij} > 0$ we denote the edge weight associated with edge $(i, j) \in \mathcal{E}$. Let $W$ be the weighted adjacency matrix and $L = D - W$ the corresponding weighted graph Laplacian, where $D$ is the diagonal degree matrix (note that we drop the weights from the expression of $L$ to keep the notation uncluttered). It is clear that the leading eigenvector of $L$ is $\frac{1}{\sqrt{n}}\mathbf{1}$, and its corresponding eigenvalue is zero. In the following, we shall denote the eigendecomposition of $L$ as
$$L = U \Lambda U^{\top},$$
where $U \in \mathbb{R}^{n \times (n-1)}$ and $\Lambda$ collect the remaining eigenvectors and their corresponding eigenvalues of $L$, respectively. Our analysis will also use a notation that is closely related to the pseudo-inverse of $L$:
$$L^{+} := U \Lambda^{-1} U^{\top}. \qquad (15)$$
Our goal is to understand the behavior of the leading eigenvectors of the perturbed matrix $L \otimes I_3 + E$ for a symmetric perturbation matrix $E \in \mathbb{R}^{3n \times 3n}$, which is a block matrix whose $3 \times 3$ blocks $E_{ij}$ encode the perturbation imposed on edge $(i, j)$. (Note that when applying the stability results to the problem studied in this paper we always use a specific perturbation; however, we assume a general symmetric $E$ when describing the stability results.)
We are interested in the matrix that collects the three leading eigenvectors of $L \otimes I_3 + E$ in its columns, together with the corresponding eigenvalues. Note that, due to the properties of the connection Laplacian, these eigenvalues are nonnegative. Our goal is to 1) bound these leading eigenvalues, and 2) provide block-wise bounds between the perturbed and unperturbed leading eigenvectors, up to a common rotation matrix.
Besides the notations introduced above that are related to Laplacian matrices, we shall also use a few matrix norms. With $\|\cdot\|$ and $\|\cdot\|_{\mathcal{F}}$ we denote the spectral norm and the Frobenius norm, respectively. Given a vector $v$, we denote by $\|v\|_{\infty}$ its element-wise infinity norm. We will also use a row-wise maximum norm for square matrices, as well as a similar norm defined block-wise for block matrices (i.e., matrices whose blocks are $3 \times 3$).
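The Laplacian notation above can be sanity-checked numerically. The Python sketch below (the graph, its weights, and its size are arbitrary illustrative choices, not taken from the paper) builds a weighted graph Laplacian for a small connected graph, confirms that its leading eigenvector is the normalized all-ones vector with eigenvalue zero, and assembles the pseudo-inverse from the remaining eigenpairs.

```python
import numpy as np

# Small connected weighted graph on n = 4 vertices (edges and weights are
# arbitrary illustrative choices; any connected graph with positive weights works).
n = 4
edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 2.0, (0, 3): 1.5}

# Weighted adjacency matrix W and graph Laplacian L = D - W.
W = np.zeros((n, n))
for (i, j), w in edges.items():
    W[i, j] = W[j, i] = w
L = np.diag(W.sum(axis=1)) - W

# Eigendecomposition: for a connected graph the smallest eigenvalue is 0,
# with eigenvector (1/sqrt(n)) * ones.
vals, vecs = np.linalg.eigh(L)
assert abs(vals[0]) < 1e-10
assert np.allclose(np.abs(vecs[:, 0]), 1.0 / np.sqrt(n))

# Pseudo-inverse assembled from the remaining (nonzero) eigenpairs,
# mirroring L^+ = U Lambda^{-1} U^T.
U, Lam = vecs[:, 1:], vals[1:]
L_pinv = U @ np.diag(1.0 / Lam) @ U.T
assert np.allclose(L_pinv, np.linalg.pinv(L))
```

Because the graph is connected, the zero eigenvalue is simple, so all entries of `Lam` are strictly positive and the inverse is well defined.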
C.1.2 Key Lemmas
This section presents a few key lemmas that will be used to establish the main stability results regarding matrix eigenvectors and matrix eigenvalues. We begin with the classical Weyl inequality:
Lemma C.1.
(Eigenvalue stability) For symmetric matrices $A, E \in \mathbb{R}^{m \times m}$ and every $1 \le i \le m$, we have
$$|\lambda_i(A + E) - \lambda_i(A)| \le \|E\|. \qquad (16)$$
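Lemma C.1 is straightforward to verify numerically. The sketch below draws arbitrary random symmetric matrices (dimensions and scale are illustrative choices) and checks that no eigenvalue moves by more than the spectral norm of the perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weyl's inequality: for symmetric A and symmetric perturbation E,
# every eigenvalue moves by at most the spectral norm of E.
m = 6
A = rng.standard_normal((m, m)); A = (A + A.T) / 2
E = 0.1 * rng.standard_normal((m, m)); E = (E + E.T) / 2

lam_A = np.linalg.eigvalsh(A)
lam_AE = np.linalg.eigvalsh(A + E)
spec_E = np.linalg.norm(E, 2)  # spectral norm (largest singular value)

# eigvalsh returns eigenvalues in ascending order, so matching indices is valid.
assert np.all(np.abs(lam_AE - lam_A) <= spec_E + 1e-12)
```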
We proceed to describe tools for controlling the eigenvector stability. To this end, we rewrite the matrix of leading eigenvectors in a factored form, and bound (i) the deviation of one factor from a rotation matrix and (ii) the blocks of the remaining factor.
We begin with the first factor, for which we adapt a result described in [3]:
Lemma C.2.
(Controlling [3]) If
then there exists a rotation matrix (if not, we can always negate the last column of $U$) such that
In particular,
It remains to control the blocks of the remaining factor. We state a formulation that expresses its columns using a series:
Lemma C.3.
Suppose , then ,
(17) 
We conclude this section by providing an explicit expression for computing the derivative of the leading eigenvectors of a connection Laplacian with its elements:
Lemma C.4.
Let be a nonnegative definite matrix whose eigendecomposition is
(18) 
where .
Suppose . Collect the eigenvectors corresponding to the smallest eigenvalues of as the columns of matrix . Namely, where are the smallest eigenvalues of .
Notice that (18) admits different decompositions when there are repeated eigenvalues. In our case, however, we claim that the quantity of interest is invariant to the particular decomposition chosen, so that it is well defined and has the explicit expression
(19) 
Moreover, the differentials of eigenvalues are
(20) 
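For a simple eigenvalue of a symmetric matrix, the classical first-order perturbation formula reads $d\lambda_i = u_i^{\top}(dA)\,u_i$, where $u_i$ is the corresponding unit eigenvector. The sketch below (random test matrices, with an arbitrary index and step size) checks this formula against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

# First-order perturbation of a simple eigenvalue of a symmetric matrix:
# d(lambda_i) = u_i^T (dA) u_i.  This finite-difference check uses arbitrary
# random symmetric test matrices.
m = 5
A = rng.standard_normal((m, m)); A = (A + A.T) / 2
dA = rng.standard_normal((m, m)); dA = (dA + dA.T) / 2

vals, vecs = np.linalg.eigh(A)
i, t = 2, 1e-6
u = vecs[:, i]

predicted = u @ dA @ u                  # analytic directional derivative
vals_t = np.linalg.eigvalsh(A + t * dA)
numeric = (vals_t[i] - vals[i]) / t     # finite-difference estimate

assert abs(predicted - numeric) < 1e-3
```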
C.2 Key Lemmas Regarding the Projection Operator
This section studies the projection operator which maps the space of square matrices to the space of rotation matrices. We begin with formally defining the projection operator as follows:
Definition 1.
Suppose $\det(M) > 0$ for a square matrix $M \in \mathbb{R}^{3 \times 3}$. Let $M = U \Sigma V^{\top}$ be the singular value decomposition of $M$, where $U$ and $V$ are both orthogonal matrices, and all singular values are nonnegative. Then we define the rotation approximation of $M$ as
$$R(M) := U V^{\top}.$$
It is clear that $R(M)$ is a rotation matrix, since 1) both $U$ and $V$ can be chosen to be rotations, and 2) $\det(U V^{\top}) = \operatorname{sign}(\det(M)) = 1$.
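The projection operator admits a standard SVD-based implementation (the orthogonal Procrustes construction). The sketch below includes a sign-correction factor so that the output is always a rotation (determinant $+1$), which also covers inputs with negative determinant; the test matrix is an arbitrary random choice.

```python
import numpy as np

def project_to_rotation(M):
    """Nearest rotation to a 3x3 matrix via SVD, with a sign correction
    so the result lies in SO(3) even when det(M) < 0."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
R = project_to_rotation(M)

# R is a rotation: orthogonal with determinant +1.
assert np.allclose(R.T @ R, np.eye(3), atol=1e-10)
assert np.isclose(np.linalg.det(R), 1.0)

# A rotation projects to itself (the operator is idempotent on SO(3)).
assert np.allclose(project_to_rotation(R), R, atol=1e-10)
```

For inputs with positive determinant the correction factor is the identity, and the result reduces to $U V^{\top}$.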
Lemma C.5.
Let be a block matrix of form
where . Use to denote the element at position in . Then we have
We then present the following key lemma regarding the stability of the projection operator:
Lemma C.6.
Let be a square matrix and . Suppose , then
Lemma C.7.
Regarding the projection as a function of its matrix argument, its differential is given by
where all notations follow Definition 1.
C.3 Proof of Prop. 4.1
We first present a formal version of Prop. 4.1 in the main paper.
Proposition 5.
Suppose the underlying rotations are given by . Modify the definition of such that
Define
(21) 
Suppose , , and