1 Introduction
Structure-from-motion (SfM) is the problem of recovering the 3D structure of a scene from multiple images taken by a camera at different viewpoints. When the scene structure is rigid, the problem is generally well defined and has been much studied [25, 12, 24, 8], with rigidity at the heart of almost all vision-based 3D reconstruction theories and methods. When the scene structure is nonrigid (deforming surface, articulated motion, etc.), the problem is underconstrained, and constraints such as low dimensionality [4] or local rigidity [23] have been exploited to limit the set of solutions.
Currently, nonrigid structure from motion (NRSfM) lags far behind its rigid counterpart, and is often treated entirely separately from rigid SfM. Part of the reason for this separate treatment lies in the usual formulation of the SfM problem, which approaches the task in two stages: first the relative camera motions with respect to the scene are estimated; then the 3D structure is computed. In each stage, different methods and implementations have to be developed for the rigid and nonrigid scenarios separately because of the different structure priors that are exploited. This has a further disadvantage in that it can be difficult to determine a priori whether the scene is rigid or nonrigid (and if the latter, in what way).
Therefore, it is highly desirable to have a generic SfM framework that can deal with both rigid and nonrigid motion, which is the main theme of this paper. In fact, as early as 1983, Ullman [26] proposed a “maximizing rigidity” principle that relies on a nonconvex rigidity measure to reconstruct 3D structure from both rigid and nonrigid (rubbery) motion. This idea has resurfaced in various works under the ARAP (“as rigid as possible”) moniker [15, 21]. However, it has not been further developed under the modern view of 3D reconstruction, mainly due to the difficulty of its optimization. In this paper we revisit Ullman’s “maximizing rigidity” principle and propose a novel convex rigidity measure that can be incorporated into a modern SfM framework for both rigid and nonrigid shape reconstruction.
Our proposed formulation yields reconstructions that are more accurate than the current state of the art for nonrigid shape reconstruction, and which enforce rigidity when it is present in the scene. This is because our framework aims at maximizing the rigidity while still satisfying the image measurements. We thus achieve a unified theory and paradigm for 3D vision reconstruction tasks for both rigid and nonrigid surfaces. Our method does not need to be told which of these cases applies to the scene being reconstructed; it automatically outputs the optimal solution that best explains the observations.
2 Related Work
Traditionally, under a perspective camera, the pipeline of rigid SfM (RSfM) consists of two steps, i.e., a camera motion estimation step followed by a structure computation step [12, 10, 9, 8]; alternatively, the camera motion and 3D structure can be estimated simultaneously through measurement matrix factorization [24, 22]. In the RSfM literature, the work most related to ours is by Li [11], who proposed an unusual approach to SfM that bypasses the motion-estimation step. This method does not require any explicit motion estimation and was called the “structure-without-motion” method.
In contrast to RSfM, NRSfM remains an open and active research topic [4, 3, 30] in computer vision. One of the commonly used constraints in NRSfM is the local rigidity constraint [23, 28], or in some literature the inextensibility constraint [29]. Taylor et al. [23] formulated an NRSfM framework in terms of a set of linear length recovery equations using local three-point triangles under an orthographic camera, and grouped these “loosely coupled” rigid triangles into nonrigid bodies. Varol et al. [28] estimated homographies between corresponding local planar patches in both images. These yield approximate 3D reconstructions of points within each patch up to a scale factor, where the relative scales are resolved by considering the consistency in the overlapping patches. Both methods form part of a recent trend of piecewise reconstruction methods in NRSfM. Instead of relying on a single model for the full surface, these approaches model small patches of the surface independently. Vicente and Agapito [29] exploited a soft inextensibility prior for template-free nonrigid reconstruction. They formulated an energy function that incorporates the inextensibility prior and minimized it via the QPBO algorithm. Very recently, Chhatkuli et al. [3] presented a global and convex formulation for templateless 3D reconstruction of a deforming object using perspective cameras, where the 3D reconstruction problem is recast as a Second-Order Cone Program (SOCP) using the Maximum-Depth Heuristic (MDH) [17, 18, 20].
The literature on RSfM and NRSfM has advanced in relatively independent directions. To the best of our knowledge, the first attempt to unify the two fields was by Ullman [26], who proposed to use the principle of “maximizing rigidity” to recover 3D structure from both rigid and nonrigid motion. Ullman’s original formulation maintained and updated an internal rigid model of the scene across a temporal sequence. A rigidity metric was defined in terms of point distances to measure the deviation of the estimated structure from the internal model. The 3D structure was recovered by minimizing the overall deviation from rigidity (internal model) to a local optimum via a gradient method. Compared to Ullman’s method, our method unifies rigid and nonrigid SfM within a convex program, from which we obtain a globally optimal solution.
3 Maximizing Rigidity Revisited
In this section, we discuss Ullman’s maximizing rigidity principle in more detail. Ullman assumed that there is an internal model of the scene and that the internal model should be as rigid as possible [26]. Let $L_{ij}$ be the Euclidean distance between points $i$ and $j$ of the internal model, and $l_{ij}$ the Euclidean distance between points $i$ and $j$ of the estimated structure. A measure of the difference between $L_{ij}$ and $l_{ij}$ was defined as
$$D(L_{ij}, l_{ij}) = \frac{(L_{ij} - l_{ij})^2}{L_{ij}^3}. \tag{1}$$
Under an orthographic camera model, the pairwise distance $l_{ij}$ was directly parameterized as $l_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$, where $(x_i, y_i)$ are the known image (coordinate) measurements and $z_i$ are the unknown depths. Unfortunately, no principled way was provided to handle the general perspective camera model. Intuitively, this measure is the least-squares difference reweighted by the interpoint distance of the internal model. The reweighting indicates that a point is more likely to move rigidly with its nearest neighbors [26]. However, the reweighting also makes Ullman’s rigidity measure nonconvex in the unknowns.
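To make the effect of the reweighting concrete, the following Python sketch evaluates a deviation of this kind on a toy orthographic example. It is an illustration only: the form $(L - l)^2 / L^3$ is our reading of Eq. (1), and the function names and coordinates are hypothetical rather than taken from Ullman's or our implementation.

```python
import math

def distance(p, q):
    """Euclidean distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def ullman_deviation(L, l):
    """Assumed form of Eq. (1): squared difference reweighted by 1 / L^3."""
    return (L - l) ** 2 / L ** 3

def total_deviation(model, estimate):
    """Sum the deviation over every pair of points (fully connected graph)."""
    total = 0.0
    for i in range(len(model)):
        for j in range(i + 1, len(model)):
            total += ullman_deviation(distance(model[i], model[j]),
                                      distance(estimate[i], estimate[j]))
    return total

# Orthographic setting: (x, y) are measured, only the depths z are unknown.
flat_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]  # flat initial internal model
estimate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 2.0, 0.0)]    # same (x, y), one changed depth

print(total_deviation(flat_model, flat_model))
print(total_deviation(flat_model, estimate))
```

With this weighting, the same absolute change in length is penalized more on a short edge than on a long one, which is the sense in which nearby points are encouraged to move rigidly together.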
Then the problem of determining the most rigid structure can be formulated as minimizing the overall deviation from rigidity, i.e., the sum of this measure over all pairs of points. Since the internal model of the scene is often unknown, the remedy that Ullman proposed was to start from a flat plane (as the initial internal model) and incrementally estimate the internal model and 3D structure. This internal model was shown to converge to a local optimum for both rigid and nonrigid motions [26, 7]. See Figure 1 for a spring model illustration of Ullman’s method.
From the analysis above, we can identify three major drawbacks of Ullman’s method: (i) it cannot handle the perspective camera in a principled way; (ii) the rigidity measure used is nonconvex, which leads to local optima; (iii) it relies on building fully connected graphs (for every pair of points), which, in practice, is often redundant and unnecessary. In the following section, we show how these drawbacks can be circumvented by introducing a novel convex rigidity measure that can be further incorporated into an edge-based 3D reconstruction framework for perspective projection.
4 Our 3D Shape Reconstruction Model
In this section, we present our unified model for both rigid and nonrigid 3D shape reconstruction. The core component of our model is a novel convex rigidity measure, introduced below. For notation, points are indexed with a subscript $i$, and image views (or frames) are indexed with a superscript $k$. We assume that the world frame is centered at the camera center and aligned with the camera coordinate system.
4.1 Our Rigidity Measure
Ullman’s rigidity measure requires building a fully connected graph within each time frame and penalizes distant edges (as distant point pairs are more likely to move nonrigidly). Instead of using a fully connected graph, we build a K-nearest-neighbor graph (KNNG), which connects each point $i$ to a set of its nearest neighbors, denoted as $\mathcal{N}(i)$, based on the Euclidean distance on 2D images [3]. We also use a different internal model than Ullman’s. Specifically, we define a rigid internal model with interpoint distance $d_{ij}$, i.e., $d_{ij}$ is the maximal distance between points $i$ and $j$ over all frames. For rigid shapes, $d_{ij}$ corresponds to the Euclidean distance between $i$ and $j$, which is invariant over all frames; for nonrigid inextensible shapes, $d_{ij}$ corresponds to the maximal Euclidean distance between $i$ and $j$ over all frames, which generally equals the geodesic distance between $i$ and $j$. For example, for a nonrigidly deforming sheet of paper, the internal model corresponds to the flat paper. To enforce rigidity, we define a measure of the total difference between $d_{ij}$ and the per-frame distances $d_{ij}^k$ over all frames and all edges of the KNNG as
$$\sum_{k} \sum_{i} \sum_{j \in \mathcal{N}(i)} \left| d_{ij} - d_{ij}^{k} \right|. \tag{2}$$
Compared to Ullman’s rigidity measure, our rigidity measure has three merits: (i) we significantly reduce the number of edges by using a KNNG instead of a fully connected graph; (ii) our measure is convex, which is crucial for optimization; (iii) we use a robust L1 norm instead of the L2 norm in the rigidity measure. To make the reconstructed scene as rigid as possible, we need to minimize this measure. As we will see in the following subsections, our rigidity measure can be naturally incorporated into an edge-based 3D reconstruction framework under perspective projection.
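The two ingredients of this measure, the K-nearest-neighbor graph and the per-edge maximum over frames, can be sketched in a few lines of Python. This is a minimal illustration that evaluates the measure on known 3D points, whereas in the actual method the distances are unknowns of the optimization. The helper names, the toy data and the choice of a single reference frame for building the graph are assumptions of this sketch.

```python
import math

def knn_graph(points_2d, k):
    """Map each point index to the indices of its k nearest neighbours in the image."""
    graph = {}
    for i, p in enumerate(points_2d):
        others = [j for j in range(len(points_2d)) if j != i]
        others.sort(key=lambda j: math.dist(p, points_2d[j]))
        graph[i] = others[:k]
    return graph

def rigidity_measure(frames_3d, graph):
    """Sum over frames and graph edges of |d_ij - d_ij^k|, with d_ij the maximum over frames."""
    total = 0.0
    for i, neighbours in graph.items():
        for j in neighbours:
            per_frame = [math.dist(frame[i], frame[j]) for frame in frames_3d]
            d_ij = max(per_frame)                     # internal-model distance
            total += sum(d_ij - d for d in per_frame)  # non-negative by construction
    return total

square = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (1.0, 1.0, 5.0)]
shifted = [(x + 0.5, y, z + 1.0) for x, y, z in square]                # rigid translation
bent = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0), (1.0, 0.8, 5.6)]  # one point moved

graph = knn_graph([(x, y) for x, y, _ in square], k=2)
rigid_score = rigidity_measure([square, shifted], graph)
nonrigid_score = rigidity_measure([square, bent], graph)
print(rigid_score, nonrigid_score)
```

A rigidly moving scene leaves every edge length unchanged, so the measure vanishes, while any change in an edge length across frames contributes a positive term.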
4.2 Edge-Based Reconstruction
Given a set of 2D point correspondences $\mathbf{x}_i^k$ across images, our target is to find their 3D coordinates $\mathbf{X}_i^k$ in the same global coordinate system. We denote the edge (distance) between the camera center and $\mathbf{X}_i^k$, which we call a “leg”, as $l_i^k$, and define the angle between legs $l_i^k$ and $l_j^k$ as $\theta_{ij}^k$. We assume that the camera is intrinsically calibrated, so the angles $\theta_{ij}^k$ can be trivially computed from the image measurements. Denote the Euclidean distance between two points $i$ and $j$ in frame $k$ as $d_{ij}^k$. For rigid motion, $d_{ij}^k$ is constant over frames for the same pair of point correspondences. In the case of nonrigid motions, $d_{ij}^k$ may change over frames, but is bounded by a maximal value $d_{ij}$, i.e., $d_{ij}^k \le d_{ij}$, $\forall k$.
Motivated by [11], we build our model based on viewing triangles formed by each pair of points to compute the 3D structure. See Figure 2 for an illustration. Note that the viewing triangles can only be formed with points of the same frame.
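Within each viewing triangle, the cosine law gives the basic equation
$$(l_i^k)^2 + (l_j^k)^2 - 2\, l_i^k l_j^k \cos\theta_{ij}^k = (d_{ij}^k)^2, \tag{3}$$
where $l_i^k$ and $l_j^k$ are the two legs, $\theta_{ij}^k$ is the angle between them, and $d_{ij}^k$ is the distance between the two points in frame $k$. We can rewrite this equation in matrix form as
$$\begin{bmatrix} l_i^k & l_j^k \end{bmatrix} \begin{bmatrix} 1 & -\cos\theta_{ij}^k \\ -\cos\theta_{ij}^k & 1 \end{bmatrix} \begin{bmatrix} l_i^k \\ l_j^k \end{bmatrix} = (d_{ij}^k)^2. \tag{4}$$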
With all viewing triangles, we can construct a system of quadratic equations of the above form in terms of the unknowns $l_i^k$, $l_j^k$ and $d_{ij}^k$.
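Stack all the legs $l_i^k$ into a vector $\mathbf{l}$, with $\mathbf{l}^k$ the leg vector for frame $k$. Define the cosine matrix $C^k$ as the matrix with diagonal elements equal to one and off-diagonal elements equal to $-\cos\theta_{ij}^k$. Let $\mathbf{e}_i$ be the unit basis vector (i.e., all 0 but 1 at the $i$-th entry), and define the diagonal matrix $E_{ij} = \mathrm{diag}(\mathbf{e}_i + \mathbf{e}_j)$. Eq. (4) can then be rewritten as
$$\mathbf{l}^{k\top} E_{ij}\, C^k E_{ij}\, \mathbf{l}^k = (d_{ij}^k)^2. \tag{5}$$
Note that the matrix $E_{ij} C^k E_{ij}$ is highly sparse with only four nonzero elements.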
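As a sanity check on the viewing-triangle equations, the sketch below builds the cosine matrix for three points seen by a calibrated pinhole camera, masks it down to a single edge, and verifies that the resulting quadratic form in the legs equals the squared interpoint distance. The intrinsics, point coordinates and helper names are hypothetical values chosen for illustration, and the masking is one way of realizing the sparse matrix described above.

```python
import math

f, cx, cy = 500.0, 320.0, 240.0  # hypothetical focal length and principal point

def project(X):
    """Pinhole projection of a 3D point given in the camera frame."""
    return (f * X[0] / X[2] + cx, f * X[1] / X[2] + cy)

def ray(x):
    """Viewing ray K^{-1} [x, y, 1]^T for the pinhole model above."""
    return ((x[0] - cx) / f, (x[1] - cy) / f, 1.0)

def cos_angle(u, v):
    """Cosine of the angle between two rays."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def edge_matrix(C, i, j):
    """Keep only rows and columns i and j of the cosine matrix C."""
    keep = (i, j)
    n = len(C)
    return [[C[r][c] if r in keep and c in keep else 0.0 for c in range(n)]
            for r in range(n)]

def quad_form(Q, l):
    """Evaluate l^T Q l."""
    n = len(l)
    return sum(l[r] * Q[r][c] * l[c] for r in range(n) for c in range(n))

points = [(0.2, -0.1, 4.0), (-0.5, 0.3, 5.0), (0.4, 0.6, 6.0)]
legs = [math.hypot(*X) for X in points]      # distances to the camera center
rays = [ray(project(X)) for X in points]     # angles come from image data only
n = len(points)
C = [[1.0 if r == c else -cos_angle(rays[r], rays[c]) for c in range(n)]
     for r in range(n)]

Q = edge_matrix(C, 0, 2)
lhs = quad_form(Q, legs)
rhs = math.dist(points[0], points[2]) ** 2
nonzeros = sum(1 for row in Q for value in row if value != 0.0)
print(lhs, rhs, nonzeros)
```

The angles are computed from the projected image points alone, which is why only the legs and the interpoint distances remain as unknowns in the reconstruction problem.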
The edgebased 3D reconstruction problem becomes
(6a)  
(6b)  
(6c) 
where $\mathbf{d}$ is a vector containing all the interpoint distances.
However, the above formulation is not well constrained because: (i) the scale of the solutions cannot be uniquely determined due to the homogeneous equation in (6b); (ii) a trivial all-zero solution always exists; (iii) for nonrigid motions, we only have one equality constraint for each $d_{ij}^k$, which is insufficient to get deterministic solutions. (For rigid motions, $d_{ij}^k = d_{ij}$, $\forall k$, and we have sufficient constraints for $d_{ij}$. But in many cases, we do not know a priori whether the scene is rigid or nonrigid.) The scale ambiguity is intrinsic to 3D reconstruction under the perspective camera model [8]. In practice, we can fix a global scale of the scene by normalizing $\mathbf{l}$ or $\mathbf{d}$.
Maximum-Leg Heuristic (MLH).
To get the desired solutions, we apply a so-called Maximum-Leg Heuristic (MLH). After fixing the global scale, we want to maximize the sum of all legs under the nonnegativity constraint, or equivalently
(7a)  
(7b)  
(7c) 
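In this way, trivial solutions are avoided. Note that, in the NRSfM literature, there is a commonly used heuristic called the Maximum Depth Heuristic (MDH) [17, 18, 20], which maximizes the sum of all depths under the condition that each depth and distance is positive. Our MLH plays virtually the same role as MDH because, under a perspective camera, each leg is proportional to the corresponding depth: $l_i^k = z_i^k \, \| K^{-1} \tilde{\mathbf{x}}_i^k \|$, where $z_i^k$ represents the depth of point $i$ in frame $k$, $K$ is the intrinsic calibration matrix, and $\tilde{\mathbf{x}}_i^k$ denotes the homogeneous image coordinates of the point.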
5 Convex Program for 3D Shape Reconstruction
Incorporating our rigidity measure in (2) into (7), we get our overall formulation for 3D shape reconstruction as follows
(8a)  
(8b)  
(8c)  
(8d)  
(8e) 
where $\lambda$ is a tradeoff parameter, and the equality constraint (8d) fixes the global scale of the reconstructed shape. However, the above formulation is still nonconvex due to the quadratic terms on both sides of Eq. (8b) and on the left-hand side of Eq. (8d). To make it convex, we first define $s_{ij} = d_{ij}^2$ and $s_{ij}^k = (d_{ij}^k)^2$. We then change our formulation to the following form
(9a)  
(9b)  
(9c)  
(9d)  
(9e) 
where we approximate $|d_{ij} - d_{ij}^k|$ with $|s_{ij} - s_{ij}^k|$, and drop the absolute value operator as we have an inequality constraint (9c) to make sure $s_{ij} - s_{ij}^k$ is always nonnegative. Due to (9b), our formulation turns out to be a quadratically constrained quadratic program (QCQP), which is unfortunately still nonconvex and even NP-hard when the quadratic form is indefinite [16, 19].
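5.1 Semi-Definite Programming (SDP) Relaxation
We now show how our formulation can be converted to a convex program using SDP relaxation. Writing $Q_{ij}^k$ for the sparse matrix of the quadratic form in Eq. (5), note that we have $\mathbf{l}^{k\top} Q_{ij}^k \mathbf{l}^k = \mathrm{trace}(Q_{ij}^k \mathbf{l}^k \mathbf{l}^{k\top})$. We can introduce auxiliary variables $Y^k = \mathbf{l}^k \mathbf{l}^{k\top}$ for each frame $k$. Then Eq. (9b) equivalently becomes two equality constraints, $\mathrm{trace}(Q_{ij}^k Y^k) = s_{ij}^k$ and $Y^k = \mathbf{l}^k \mathbf{l}^{k\top}$. We can directly relax the last nonconvex equality constraint into a convex positive semidefiniteness constraint $Y^k - \mathbf{l}^k \mathbf{l}^{k\top} \succeq 0$ [5]. Using a Schur complement, this constraint can be reformulated [1] as $\begin{bmatrix} 1 & \mathbf{l}^{k\top} \\ \mathbf{l}^k & Y^k \end{bmatrix} \succeq 0$. Ideally, $Y^k$ should be a rank-one matrix. But, after the relaxation, the rank constraint for $Y^k$ may not be maintained. We can minimize $\mathrm{trace}(Y^k)$ as a convex surrogate of $\mathrm{rank}(Y^k)$. (For positive semidefinite $Y^k$, $\mathrm{trace}(Y^k) = \|Y^k\|_*$, and the nuclear norm is a well-known convex surrogate for the rank.)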
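The lifting step rests on two elementary facts that are easy to check numerically: a quadratic form in the legs equals a trace against the outer product of the leg vector, and the Schur-complement block matrix is positive semidefinite when the lifted variable is exactly that outer product. The Python sketch below verifies both on random data; the variable names and the 4-point example are illustrative assumptions, and no SDP is solved here.

```python
import random

def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[a * b for b in v] for a in u]

def trace_of_product(A, B):
    """trace(A B) for square matrices of equal size."""
    n = len(A)
    return sum(A[r][c] * B[c][r] for r in range(n) for c in range(n))

def quad_form(Q, x):
    """Evaluate x^T Q x."""
    n = len(x)
    return sum(x[r] * Q[r][c] * x[c] for r in range(n) for c in range(n))

random.seed(0)
legs = [random.uniform(1.0, 5.0) for _ in range(4)]

# Sparse edge matrix for the pair (0, 2) with an assumed cos(theta) of 0.9.
cos_theta = 0.9
Q = [[0.0] * 4 for _ in range(4)]
Q[0][0] = Q[2][2] = 1.0
Q[0][2] = Q[2][0] = -cos_theta

Y = outer(legs, legs)                  # lifted variable before relaxation
quadratic = quad_form(Q, legs)         # l^T Q l
lifted = trace_of_product(Q, Y)        # trace(Q Y), linear in Y

# With Y = l l^T the block matrix [[1, l^T], [l, Y]] equals v v^T for v = [1, l],
# so its quadratic form is a square and hence non-negative.
v = [1.0] + legs
M = outer(v, v)
z = [random.uniform(-1.0, 1.0) for _ in range(5)]
psd_value = quad_form(M, z)
trace_Y = sum(Y[i][i] for i in range(4))
print(quadratic, lifted, psd_value, trace_Y)
```

After lifting, the constraint is linear in the new matrix variable, which is what makes the relaxed problem convex; the trace term then discourages solutions that drift far from rank one.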
Our formulation becomes an SDP written as:
(10a)  
(10b)  
(10c)  
(10d)  
(10e) 
where $\lambda_1$, $\lambda_2$ are two positive tradeoff parameters, and $\mathbf{1}$ is an all-one column vector with appropriate dimensions. Note that we remove the term involving $s_{ij}$ from the objective (10a) because its sum is fixed by the scale constraint and is therefore a constant. Our formulation consists of a linear objective subject to linear constraints and SDP constraints, which is a convex problem. This convex SDP problem can be solved effectively by any modern SDP solver to a global optimum.
Incomplete Data.
Incomplete measurements are quite common due to occlusions. To handle incomplete measurements, we can introduce a set of visibility masks $w_i^k$, where $w_i^k = 1$ if point $i$ is visible in frame $k$, and $w_i^k = 0$ otherwise. With the visibility masks, the terms related to $l_i^k$ become $w_i^k l_i^k$ and the terms related to $s_{ij}^k$ become $w_i^k w_j^k s_{ij}^k$. The problem is still convex and solvable with any SDP solver. Here we assume that the number of visible points in one frame is greater than the neighborhood size; otherwise, we remove that frame.
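In terms of bookkeeping, masking amounts to dropping every edge that has an occluded endpoint in a given frame, and discarding frames that do not retain enough visible points. A minimal Python sketch of this filtering follows; the function names, graph and masks are hypothetical, and treating an edge as active only when both endpoints are visible is our reading of the masking described above.

```python
def visible_edges(graph, mask):
    """Edges (i, j) of the neighbour graph whose two endpoints are both visible."""
    return [(i, j) for i, neighbours in graph.items()
            for j in neighbours if mask[i] and mask[j]]

def usable_frames(masks, neighbourhood_size):
    """Indices of frames with more visible points than the neighbourhood size."""
    return [k for k, mask in enumerate(masks) if sum(mask) > neighbourhood_size]

# Toy neighbour graph on 4 points and visibility masks for 3 frames.
graph = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 1]}
masks = [
    [1, 1, 1, 1],   # everything visible
    [1, 0, 1, 1],   # point 1 occluded
    [0, 0, 1, 0],   # too few points: this frame is dropped
]

kept = usable_frames(masks, neighbourhood_size=2)
edges_frame1 = visible_edges(graph, masks[1])
print(kept, edges_frame1)
```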
5.2 3D Reconstruction from Legs
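Under the perspective camera model, we can relate $\mathbf{X}_i^k$ and $l_i^k$ with
$$\mathbf{X}_i^k = l_i^k \, \frac{K^{-1} \tilde{\mathbf{x}}_i^k}{\| K^{-1} \tilde{\mathbf{x}}_i^k \|}, \tag{11}$$
where $K$ is the intrinsic calibration matrix and $\tilde{\mathbf{x}}_i^k$ denotes the homogeneous image coordinates of point $i$ in frame $k$. After we get the solutions for all legs $l_i^k$, we can substitute them back into the above equation to compute the 3D coordinates for all points.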
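The back-substitution step is a one-line computation per point: scale the unit viewing ray by the recovered leg. The Python sketch below performs a round trip on a single point, assuming a simple pinhole camera with hypothetical intrinsics (focal length `f`, principal point `cx`, `cy`) and illustrative helper names.

```python
import math

f, cx, cy = 500.0, 320.0, 240.0  # hypothetical intrinsics

def ray(x):
    """Viewing ray K^{-1} [x, y, 1]^T of an image point."""
    return ((x[0] - cx) / f, (x[1] - cy) / f, 1.0)

def point_from_leg(x, leg):
    """3D point at distance `leg` from the camera center along the ray of x."""
    u = ray(x)
    norm = math.hypot(*u)
    return tuple(leg * c / norm for c in u)

def depth_from_leg(x, leg):
    """Depth (z coordinate) corresponding to a leg length."""
    return leg / math.hypot(*ray(x))

X_true = (0.4, -0.3, 5.0)
x = (f * X_true[0] / X_true[2] + cx, f * X_true[1] / X_true[2] + cy)  # its projection
leg = math.hypot(*X_true)                                            # its leg length

X_rec = point_from_leg(x, leg)
z_rec = depth_from_leg(x, leg)
print(X_rec, z_rec)
```

The same relation shows why maximizing legs and maximizing depths behave alike: for a fixed image point, the leg is the depth multiplied by a known positive constant.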
Degenerate Cases.
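Our system becomes degenerate if there is only pure rotation (around the camera center) in the scene. In fact, a pure rotation $R$ about the camera center does not change the angle between two viewing rays, i.e., for the rays $\mathbf{u}_i = K^{-1} \tilde{\mathbf{x}}_i$ and $\mathbf{u}_j = K^{-1} \tilde{\mathbf{x}}_j$ (with $K$ the intrinsic calibration matrix and $\tilde{\mathbf{x}}$ the homogeneous image coordinates),
$$\frac{(R\mathbf{u}_i)^\top (R\mathbf{u}_j)}{\|R\mathbf{u}_i\| \, \|R\mathbf{u}_j\|} = \frac{\mathbf{u}_i^\top R^\top R \, \mathbf{u}_j}{\|\mathbf{u}_i\| \, \|\mathbf{u}_j\|} = \frac{\mathbf{u}_i^\top \mathbf{u}_j}{\|\mathbf{u}_i\| \, \|\mathbf{u}_j\|}, \tag{12}$$
where the equations hold because $R^\top R = I$ and rotating a vector does not change its length. So if there is only pure rotation in the scene, our system becomes underconstrained. This also corresponds to the fact in epipolar geometry that pure rotation cannot be explained by the essential/fundamental matrix (but by a homography instead). Another degenerate case is when the camera model is close to orthographic. In this case, the viewing angles are all close to zero, which makes our formulation unsolvable.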
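The invariance behind the pure-rotation degeneracy can be checked directly: rotating two viewing rays by the same rotation leaves the cosine of the angle between them unchanged, so a rotated view adds no new constraint. The Python sketch below uses a rotation about the y-axis and two arbitrary rays; the specific values and helper names are illustrative.

```python
import math

def rotate_y(u, angle):
    """Rotate a 3D vector about the y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * u[0] + s * u[2], u[1], -s * u[0] + c * u[2])

def cos_angle(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

u = (0.1, -0.2, 1.0)    # two viewing rays in the first view
v = (-0.3, 0.05, 1.0)

before = cos_angle(u, v)
after = cos_angle(rotate_y(u, 0.4), rotate_y(v, 0.4))
print(before, after)
```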
6 Experiments
We compare our method with four baselines for rigid and nonrigid 3D shape reconstruction. These baselines include: the rigid “structure-without-motion” method for a perspective camera in [11], the nonconvex soft-inextensibility based NRSfM method for an orthographic camera in [29], the prior-free low-rank factorization based NRSfM method for an orthographic camera in [4], and the second-order cone programming based NRSfM method for a perspective camera in [3]. For the baselines, we use the source code provided by the authors. We implement our method in Matlab and use the MOSEK [14] SDP solver to solve our formulation. We fix all the parameters of the baseline methods to their optimal values. We find that our method is not sensitive to its two tradeoff parameters, and use the same values, obtained by validating on a separate dataset, for all our experiments. Due to limited space, our qualitative reconstruction results on all synthetic datasets are provided in the supplementary videos.
The metrics we use to evaluate the performance are the 3D Root Mean Square Error (RMSE) (in mm) and the relative 3D error (denoted as RErr) (in %), both computed with respect to the ground truth coordinates of each point $i$ in each frame $k$. All structure-from-motion methods have a scale ambiguity. For methods that use a perspective camera model, we rescale their reconstructions to best align them with the ground truth before computing the errors. For methods that use an orthographic camera model, we perform Procrustes analysis to solve for a similarity transformation that best aligns the reconstructions with the ground truth.
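For the perspective case, the evaluation protocol can be sketched as a least-squares rescaling followed by the two error measures. The Python code below is a hedged illustration: the RMSE is taken as the root of the mean squared point-to-point distance and the relative error as the ratio of the error norm to the ground-truth norm, which are common conventions assumed here rather than the exact definitions used in our experiments; all function names are hypothetical.

```python
import math

def best_scale(estimate, truth):
    """Scale s minimizing the sum of squared distances between s * estimate and truth."""
    num = sum(a * b for P, Q in zip(estimate, truth) for a, b in zip(P, Q))
    den = sum(a * a for P in estimate for a in P)
    return num / den

def rmse(estimate, truth):
    """Root of the mean squared point-to-point distance (assumed definition)."""
    squared = [sum((a - b) ** 2 for a, b in zip(P, Q)) for P, Q in zip(estimate, truth)]
    return math.sqrt(sum(squared) / len(squared))

def relative_error(estimate, truth):
    """Error norm over ground-truth norm, in percent (assumed definition)."""
    num = math.sqrt(sum((a - b) ** 2 for P, Q in zip(estimate, truth) for a, b in zip(P, Q)))
    den = math.sqrt(sum(b * b for Q in truth for b in Q))
    return 100.0 * num / den

# Points from all frames are flattened into one list for this sketch.
truth = [(0.0, 0.0, 4.0), (1.0, 0.0, 5.0), (0.0, 1.0, 6.0)]
estimate = [(0.5 * x, 0.5 * y, 0.5 * z) for x, y, z in truth]   # correct up to scale

s = best_scale(estimate, truth)
aligned = [(s * x, s * y, s * z) for x, y, z in estimate]
print(s, rmse(aligned, truth), relative_error(aligned, truth))
```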
6.1 Nonrigid Structure from Motion
Our method and [3] rely on constructing a KNNG. For both methods, we use the same KNNG and fix the neighborhood size as 20 for this set of experiments.
The Flag (Semi-Synthetic) Dataset.
This flag dataset [31] consists of an image sequence of a fabric flag waving in the wind. The ground truth 3D points are provided in this dataset, but neither 2D projection trajectories nor camera calibrations are available. We subsample the 3D points in each frame and generate the input data from a virtual perspective camera with a fixed field-of-view angle. The final sequence contains 90 points (in each frame) and 50 frames. We report the 3D RMSE and mean relative 3D error in Table 1. Note that our method achieves the lowest 3D reconstruction error among all the competing methods.
The KINECT Paper, Hulk, and T-Shirt Datasets.
The KINECT paper dataset [27] contains an image sequence of smoothly deforming, well-textured paper captured by a KINECT camera. The camera calibration and ground truth 3D are provided. We use the trajectories provided by [3], which were obtained by tracking interest points in this sequence using the flow-based method of [6]. The trajectories are complete, semi-dense and outlier-free. Due to the large number of points and frames, we subsample the points and frames in this dataset and get a sequence with 151 points (in each frame) and 23 frames.
The Hulk dataset [2] consists of 21 images taken at different unrelated smooth deformations. The deforming scene is a well-textured paper cover of a comic book. The intrinsic camera calibration matrix, 3D ground truth shape and 2D feature trajectories are provided in this dataset. This dataset contains 122 trajectories in 21 views.
The T-Shirt dataset [2] consists of 10 images of a deforming T-shirt. As in the Hulk dataset, the intrinsic camera calibration matrix, 3D ground truth shape and 2D feature trajectories are all provided. This dataset contains 85 point trajectories in 10 frames.
The Jumping Trousers Dataset with Missing Data.
This dataset [31] contains 3D ground truth points for jumping trousers obtained from cloth motion capture. The complete 2D trajectories are generated by projecting the 3D points through a virtual perspective camera. However, due to self-occlusions, the 2D trajectories have a considerable amount of missing entries, and the visibility masks are provided in the original data. We subsample the points and frames, and get a sequence of 97 points and 29 frames. Since the first two baselines [29, 4] cannot handle incomplete data, we input complete trajectories for them. We use the incomplete trajectories for [3] and our method, as these two methods can handle incomplete data. (For incomplete data, we only compute the average 3D reconstruction error for visible points. Note also that this comparison is unfair to [3] and our method, as the other two use complete data.) The results are reported in Table 2. Our method, with incomplete data as input, outperforms all the other baselines.
| Metric | [29] | [4] | [3] | Ours |
| --- | --- | --- | --- | --- |
| RMSE (mm) | 190.17 | 49.97 | 44.05 | 37.70 |
| RErr | 55.10% | 12.67% | 13.57% | 11.65% |
From this set of experiments, we have shown that our method consistently outperforms all the baselines. We note that on those datasets there is always a significant performance gap between those orthographic camera model based methods ([29, 4]) and those perspective camera model based methods ([3] and ours). In the following experiments, we will only compare with the perspective camera model based methods ([11, 3]).
Robustness to various numbers of points/views, different levels of missing data and noise.
In Figure 5, we show the performance of our method and the best-performing baseline [3] on the KINECT paper dataset with increasing numbers of points/views, increasing ratios of missing data, and increasing levels of synthetic zero-mean Gaussian noise (with various standard deviations). The default experimental setting is with 100 points and 30 views, with the remaining parameters held fixed. We can see that our method consistently outperforms the baseline method in all scenarios, which verifies the robustness of our method. We believe that our superior performance comes from the novel maximizing rigidity regularization, which better explains the image observations.
6.2 Rigid Structure from Motion
In this set of experiments, we test our method for rigid structure reconstruction. Since our method does not utilize the rigidity prior of the scene, we can expect it to perform worse than a method specifically designed for rigid scenes. The main goal of this set of experiments is thus to show that our method can achieve rigid structure reconstruction comparable to that of the rigid method. We compare our method with the best-performing baseline [3] for nonrigid structure from motion, and another method [11] specifically designed for rigid structure from motion. The neighborhood size is set to 20 for all methods.
Rigid Synthetic Dataset.
We verify our method for rigid structure computation on a synthetic dataset. To generate the data, we subsample the ground truth 3D points of one frame of the KINECT paper dataset [27], and apply a transformation (rotation and translation) to these points over time. After a perspective projection, we get a sequence for rigid motion with 61 points and 20 frames. The mean 3D reconstruction errors for all competing methods are reported in Figure 6. We also plot the RMSE (in mm) for each frame of the sequence in Figure 6 and compare our method with the state-of-the-art nonrigid SfM method [3] and the rigid SfM method [11]. It is not surprising that [11] achieves the lowest reconstruction errors on this rigid dataset, as it utilizes the prior knowledge that the scene is rigid. Our method, without any prior knowledge of the scene rigidity, gets results close to those of [11] and significantly outperforms the NRSfM method [3].
The Model House Dataset.
We use the VGG model house dataset (http://www.robots.ox.ac.uk/~vgg/data/datamview.html) as the real-world dataset for rigid SfM. The camera projection matrices, 2D feature coordinates and 3D ground truth points are provided in this dataset, and the 2D measurements contain a moderate amount of noise. The camera intrinsic matrices are computed from the camera projection matrices using RQ decomposition [8]. We generate a sequence with complete feature point trajectories of 95 points and 7 frames. We report the 3D reconstruction errors of all methods in Table 3. Again, our method obtains results close to those of [11] and a lower reconstruction error than the NRSfM method [3].
6.3 Articulated Motion Reconstruction
In this set of experiments, we evaluate our method for the 3D reconstruction of articulated motions, and compare our method with the best-performing baseline [3].
Synthetic Articulated Dataset.
We first test our method on two synthetic sequences where the objects undergo articulated motions. To generate the synthetic data, we take a subset of the ground truth 3D points in the first image of the KINECT paper dataset [27] and divide them into two groups. We synthesize two kinds of articulated motions: (i) the point-articulated motion (denoted as “point-articulated” in Table 4), i.e., the two groups of points rotate around a common point in the dataset and meanwhile undergo the same translations through time; (ii) the axis-articulated motion (denoted as “axis-articulated” in Table 4), i.e., the two groups of points rotate around a common axis in the dataset and also undergo the same translations. The 2D feature points are generated by projecting these 3D points with a virtual perspective camera. We finally get two synthetic sequences with 61 points and 19 frames. We report the RMSE (in millimeters) and the mean relative 3D error in Table 4. Our method achieves much lower 3D reconstruction error than the baseline method [3].
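The axis-articulated case can be generated with a few lines of code: each group is rotated by its own angle about a shared axis, and both groups then receive the same translation. The Python sketch below is one possible way to do this, assuming for simplicity that the common axis is parallel to the z-axis; the function names, points and angles are hypothetical and are not the data used in our experiments.

```python
import math

def rotate_about_axis(p, angle, pivot):
    """Rotate point p by `angle` radians about the z-parallel axis through `pivot`."""
    x, y, z = p[0] - pivot[0], p[1] - pivot[1], p[2] - pivot[2]
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + pivot[0], s * x + c * y + pivot[1], z + pivot[2])

def articulated_frame(group_a, group_b, pivot, angle_a, angle_b, translation):
    """One frame: per-group rotation about the common axis, then a shared translation."""
    moved = ([rotate_about_axis(p, angle_a, pivot) for p in group_a] +
             [rotate_about_axis(p, angle_b, pivot) for p in group_b])
    return [tuple(c + t for c, t in zip(p, translation)) for p in moved]

group_a = [(1.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
group_b = [(0.0, 1.0, 0.0), (0.0, 2.0, 1.0)]
pivot = (0.0, 0.0, 0.0)

frame = articulated_frame(group_a, group_b, pivot,
                          angle_a=0.3, angle_b=-0.5, translation=(0.1, 0.0, 2.0))

within_before = math.dist(group_a[0], group_a[1])
within_after = math.dist(frame[0], frame[1])
across_before = math.dist(group_a[0], group_b[0])
across_after = math.dist(frame[0], frame[2])
print(within_before, within_after, across_before, across_after)
```

Distances within each group are preserved while distances across the two groups change, which is exactly the mixed rigid and nonrigid behaviour that the rigidity measure is meant to accommodate.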
| Sequence | [3] | Ours |
| --- | --- | --- |
| point-articulated | 17.48 (2.45%) | 7.70 (1.11%) |
| axis-articulated | 9.13 (1.36%) | 3.07 (0.45%) |
Human Motion Capture Database.
We sample six sequences in the CMU Mocap Database (http://mocap.cs.cmu.edu/) and five sequences (Dance, Drink, Pickup, Yoga, and Stretch) used in [4] to form the human motion capture database. For the latter five sequences, the data are centered to fit the factorization-based methods, so we further add random translations to each frame. Each sequence of this database consists of 28 (for CMU Mocap), 41 (for Drink, Pickup, Yoga, Stretch) or 75 (for Dance) points with 3D ground truth coordinates. The input data are generated from a virtual camera with perspective projection. We uniformly subsample the frames of each sequence with a sample rate of 10 for CMU Mocap and a sample rate of 5 for the other sequences, producing sequences with 52 to 335 frames. For CMU Mocap, we set the neighborhood size to 28 for all competing methods, which lets us use all available points to build the edges; for the other sequences, we set it to 20. We show the quantitative results of our method and the baseline method in Figure 7, and also give a qualitative comparison of the 3D reconstruction results on this dataset in Figure 8. We can see that our method consistently outperforms the baseline [3].
7 Concluding Remarks
In this paper, we have revisited Ullman’s principle of maximizing rigidity and proposed a novel convex rigidity measure that can be incorporated into a modern structure reconstruction framework to unify rigid and nonrigid SfM from multiple perspective images. Our reconstruction method relies on directly building viewing triangles, and thus does not require estimating camera poses. Importantly, our formulation (after SDP relaxation) is convex, so a globally optimal solution is guaranteed. We have verified the efficacy of our method by extensive experiments on multiple rigid, nonrigid and articulated datasets.
Limitation and Future Work.
The computational bottleneck of our method lies in solving the SDPs, whose size grows with the number of views and the number of points per view. Using an interior-point method, the worst-case complexity of one SDP is polynomial in its size for a given solution accuracy [13], which remains the limiting factor preventing us from testing on modern large-scale datasets. In the future, we aim to explore the possibility of applying modern large-scale SDP solvers, such as [33, 32], to solve our problem more efficiently. Furthermore, we also plan to investigate how to address the degenerate cases discussed in Sec. 5.2.
References
 [1] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
 [2] A. Chhatkuli, D. Pizarro, and A. Bartoli. Non-rigid shape-from-motion for isometric surfaces using infinitesimal planarity. In BMVC, 2014.
 [3] A. Chhatkuli, D. Pizarro, T. Collins, and A. Bartoli. Inextensible non-rigid shape-from-motion by second-order cone programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1719–1727, 2016.
 [4] Y. Dai, H. Li, and M. He. A simple prior-free method for non-rigid structure-from-motion factorization. International Journal of Computer Vision, 107(2):101–122, 2014.
 [5] A. d’Aspremont and S. Boyd. Relaxations and randomized methods for nonconvex QCQPs. EE392o Class Notes, Stanford University, 2003.
 [6] R. Garg, A. Roussos, and L. Agapito. Dense variational reconstruction of non-rigid surfaces from monocular video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1272–1279, 2013.
 [7] N. M. Grzywacz and E. C. Hildreth. Incremental rigidity scheme for recovering structure from motion: Position-based versus velocity-based formulations. JOSA A, 4(3):503–518, 1987.
 [8] R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2003.
 [9] R. I. Hartley. In defense of the eight-point algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):580–593, 1997.
 [10] R. I. Hartley and P. Sturm. Triangulation. Computer Vision and Image Understanding, 68(2):146–157, 1997.
 [11] H. Li. Multi-view structure computation without explicitly estimating motion. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2777–2784. IEEE, 2010.
 [12] H. C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, MA Fischler and O. Firschein, eds, pages 61–62, 1987.
 [13] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang. Semidefinite relaxation of quadratic optimization problems. IEEE Signal Processing Magazine, 27(3):20, 2010.
 [14] A. Mosek. The MOSEK optimization software. Online at http://www.mosek.com, 54:2–1, 2010.
 [15] S. Parashar, D. Pizarro, A. Bartoli, and T. Collins. As-rigid-as-possible volumetric shape-from-template. In Proceedings of the IEEE International Conference on Computer Vision, pages 891–899, 2015.
 [16] P. M. Pardalos and S. A. Vavasis. Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1(1):15–22, 1991.
 [17] M. Perriollat, R. Hartley, and A. Bartoli. Monocular template-based reconstruction of inextensible surfaces. In British Machine Vision Conference, 2008.
 [18] M. Perriollat, R. Hartley, and A. Bartoli. Monocular template-based reconstruction of inextensible surfaces. International Journal of Computer Vision, 2010.
 [19] S. Sahni. Computationally related problems. SIAM Journal on Computing, 3(4):262–279, 1974.
 [20] M. Salzmann and P. Fua. Linear local models for monocular reconstruction of deformable surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5):931–944, 2011.
 [21] O. Sorkine and M. Alexa. As-rigid-as-possible surface modeling. In Symposium on Geometry Processing, volume 4, 2007.
 [22] P. Sturm and B. Triggs. A factorization based algorithm for multi-image projective structure and motion. In European Conference on Computer Vision, pages 709–720, 1996.
 [23] J. Taylor, A. D. Jepson, and K. N. Kutulakos. Non-rigid structure from locally-rigid motion. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2761–2768, June 2010.
 [24] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2):137–154, 1992.
 [25] S. Ullman. The interpretation of structure from motion. Proceedings of the Royal Society of London B: Biological Sciences, 203(1153):405–426, 1979.
 [26] S. Ullman. Maximizing rigidity: The incremental recovery of 3D structure from rigid and nonrigid motion. Perception, 13(3):255–274, 1984.
 [27] A. Varol, M. Salzmann, P. Fua, and R. Urtasun. A constrained latent variable model. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2248–2255, 2012.
 [28] A. Varol, M. Salzmann, E. Tola, and P. Fua. Template-free monocular reconstruction of deformable surfaces. In 2009 IEEE 12th International Conference on Computer Vision, pages 1811–1818, Sept 2009.
 [29] S. Vicente and L. Agapito. Soft inextensibility constraints for template-free non-rigid reconstruction. In European Conference on Computer Vision, pages 426–440, 2012.
 [30] X. Wang, M. Salzmann, F. Wang, and J. Zhao. Template-free 3D reconstruction of poorly-textured nonrigid surfaces. In European Conference on Computer Vision, pages 648–663. Springer, 2016.
 [31] R. White, K. Crane, and D. A. Forsyth. Capturing and animating occluded cloth. In ACM Transactions on Graphics (TOG), volume 26, page 34. ACM, 2007.
 [32] L. Yang, D. Sun, and K.-C. Toh. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Mathematical Programming Computation, 7(3):331–366, 2015.
 [33] X.-Y. Zhao, D. Sun, and K.-C. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM Journal on Optimization, 20(4):1737–1765, 2010.