1 Introduction
Nonrigid structure from motion (NRSfM) aims to recover camera motion and nonrigid structure simultaneously from 2D images captured by a monocular camera. It is central to many computer vision applications and has received considerable attention in recent years. This classical problem is highly underconstrained. Although existing NRSfM approaches [6] [8] [24] [14] [4] have presented promising results, most of them assume that only one object in the scene undergoes nonrigid deformation. However, real-world nonrigid scenes are much more complex: for example, multiple persons performing different activities, soccer players on a field, or salsa dancers. All these examples constitute multibody nonrigid deformation, which cannot be explained well under the single nonrigid object assumption. Therefore, it is natural to extend single-body NRSfM to multibody NRSfM, where the task is to jointly reconstruct and segment multiple 3D deforming objects over time.
In solving the multibody NRSfM problem, a natural and direct two-stage process is to reconstruct the nonrigid multibody structure by applying state-of-the-art nonrigid reconstruction methods [9] [18] [29] and then segment distinct objects using subspace clustering methods such as Sparse Subspace Clustering (SSC) [12] or other clustering algorithms, or vice versa. However, such pipelines never exploit the inherent structure of the problem, i.e., nonrigid motion segmentation provides critical information to constrain 3D reconstruction, while 3D nonrigid reconstruction can in turn constrain the corresponding motion segmentation problem. Furthermore, since nonrigid shape deformation actually occurs in 3D space, it is more intuitive to segment objects in 3D space than in the projected 2D image space.
Additionally, it is generally more convenient, both computationally and numerically, to solve a given task with a unified approach than to solve it sequentially. Therefore, in this paper, we propose a framework to simultaneously reconstruct and cluster multiple nonrigid shapes by exploiting the spatiotemporal correlation in the data. Such an approach explains the dynamics of nonrigid shapes in a more intuitive way. Explicitly, we represent multibody NRSfM as a union of subspaces both in 3D trajectory space (spatially) and in 3D shape space (temporally). We use the fact that each 3D trajectory can be expressed in terms of other trajectories only if they come from the same subspace (spatial clustering) [17], and each individual activity can be expressed in terms of activities belonging to the same subspace (temporal clustering) [29]. A visual illustration of the spatiotemporal subspace concept is presented in Fig. 1. Concretely, spatial clustering tries to reconstruct a trajectory using an affine combination of other trajectories from the same deforming object, while temporal clustering tries to explain the shape of the deforming objects using an affine combination of shapes at other frame instances that belong to a similar activity.
By exploiting the spatiotemporal clustering structure, our approach is able to learn affinity matrices that naturally encode subspace information. From the affinity matrices, the number of deformable objects, the different activities, and the membership of each sample can be inferred directly and used to achieve reconstruction. Furthermore, we exploit the fact that the connectivity between samples must be tight if they belong to the same subspace and loose if they belong to different subspaces. Therefore, we propose to use a mixture of $\ell_1$ norm and $\ell_2$ norm regularization (also known as the Elastic Net [31]), which helps in controlling the sparsity of the affinity matrices.
Contributions:

We propose a joint segmentation and reconstruction framework for the challenging task of complex multibody NRSfM by exploiting the inherent spatiotemporal union of subspaces constraint.

We propose to efficiently solve the resulting nonconvex optimization problem using the Alternating Direction Method of Multipliers (ADMM) [5].

Extensive experimental results on both synthetic and real multibody NRSfM datasets demonstrate the superior performance of our proposed framework.
2 Related Works
Multibody structure from motion (SfM) is an important problem in computer vision. For rigid motion, solving this problem is a direct extension of elegant multi-view geometry techniques [13][20]. However, the solution to multibody NRSfM is not straightforward, due to the difficulty of modeling complex nonrigid variations. The recent state of the art in NRSfM reconstruction [9] has shown promising results, while Zhu et al. [29] argued that such an approach may fail when modeling long-term complex nonrigid motions. They note that the method of Dai et al. [8] is “highly dependent on the complexity of the motion” [29]. Hence, to overcome this difficulty, they suggested representing long-term nonrigid motion as a union of subspaces rather than a single subspace. Subsequently, Cho et al. [7] used probabilistic variations to model complex shapes.
Despite the above accomplishments, NRSfM is still far behind its rigid counterpart. This gap is principally due to the difficulty of modeling real-world nonrigid deformation. If the deformation is irregular or arbitrary, explaining the 3D structure is nearly impossible. Nevertheless, many real-world deformations can be constrained; accordingly, Bregler et al. [6] introduced NRSfM in what is considered a seminal work. In that work, they demonstrated that nonrigid deformation can be represented by a linear combination of a set of shape bases. Following this work, several researchers tried to model NRSfM by utilizing additional constraints [25], [27], [21]. In 2008, Akhter et al. [4] presented a dual approach by modeling 3D trajectories. In 2009, Akhter et al. [3] proved that even though there is an ambiguity in the shape bases or trajectory bases, nonrigid shapes can still be recovered uniquely. In 2012, Dai et al. [8] proposed a “prior-free” method to recover camera motion and 3D nonrigid deformation by exploiting the low-rank constraint only. Besides the shape basis model and the trajectory basis model, the shape-trajectory approach [16] combines the two models and formulates the problem as revealing the trajectory of the shape basis coefficients. Besides the linear combination model, Lee et al. [18] proposed a Procrustean Normal Distribution (PND) model, where 3D shapes are aligned and fit into a normal distribution. Simon et al. [23] exploited the Kronecker pattern in the shape-trajectory (spatiotemporal) priors. Zhu and Lucey [30] applied the convolutional sparse coding technique to NRSfM using point trajectories. However, the method requires learning an overcomplete basis of 3D trajectories prior to performing 3D reconstruction.
Recently, Russell et al. [22] proposed to simultaneously segment a complex dynamic scene containing a mixture of multiple objects into its constituent objects and reconstruct a 3D model of the scene. They formulate the problem as hierarchical graph-cut based segmentation, where the whole scene is decomposed into background and foreground objects, and the complex motion of nonrigid or articulated objects is modeled as a set of overlapping rigid parts.
Our method differs from the aforementioned works in the following aspects: 1) We provide a novel framework for joint segmentation and reconstruction of multiple nonrigid deformations; 2) We propose a simple yet efficient and elegant optimization routine and its solution based on ADMM; 3) Our method can be applied to both sparse and dense scenarios (up to the order of ten thousand feature tracks).
3 Formulation
Under our formulation, we intend to reconstruct 3D nonrigid shapes such that they satisfy both the spatiotemporal union of affine subspaces constraint and the nonrigid shape constraints (low rank and spatial coherency). Let $W \in \mathbb{R}^{2F \times P}$ represent the 2D measurement matrix, with $F$ the number of frames and $P$ the number of feature points. We use the orthographic camera model and eliminate the translation component of the camera motions as suggested in [6].
(1)  $W = RS$
where $R \in \mathbb{R}^{2F \times 3F}$ denotes the camera rotation matrix and $S \in \mathbb{R}^{3F \times P}$ represents the 3D shapes of the deforming objects over all frames. This classical representation of the NRSfM problem [6] aims at recovering both the camera motion $R$ and the nonrigid 3D shapes $S$ from the 2D measurement matrix $W$ such that $W = RS$. Following the same representation for the 2D-3D relation, we use $\frac{1}{2}\|W - RS\|_F^2$ to measure the reprojection error.
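The orthographic model can be sketched numerically. The NumPy snippet below is a minimal illustration under stated assumptions, not the authors' code: it assumes a block-diagonal projection matrix of size $2F \times 3F$ built from the first two rows of per-frame rotations, a shape matrix of size $3F \times P$ whose rows are ordered frame by frame, and a half squared Frobenius residual as the reprojection error. The helper names `random_rotation`, `build_projection`, and `reprojection_error` are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR decomposition."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0  # flip one axis so that det(Q) = +1
    return Q

def build_projection(rotations):
    """Place the first two rows of each rotation on the block diagonal (2F x 3F)."""
    F = len(rotations)
    R = np.zeros((2 * F, 3 * F))
    for f, Rf in enumerate(rotations):
        R[2 * f:2 * f + 2, 3 * f:3 * f + 3] = Rf[:2, :]
    return R

def reprojection_error(W, R, S):
    """Half squared Frobenius norm of the residual W - R S (assumed form)."""
    return 0.5 * np.linalg.norm(W - R @ S, "fro") ** 2

rng = np.random.default_rng(0)
F, P = 4, 10                                   # frames, feature points
S = rng.standard_normal((3 * F, P))            # synthetic 3D shapes, 3F x P
R = build_projection([random_rotation(rng) for _ in range(F)])
W = R @ S                                      # noise-free 2D measurements, 2F x P
```

Because each $2 \times 3$ block has orthonormal rows, `R @ R.T` is the identity in this construction, which is a quick sanity check on the synthetic cameras.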
3.1 Representing multiple nonrigid deformations in trajectory space
Representing multiple nonrigid objects using a single linear trajectory space does not provide a compact representation of 3D trajectories [29]. When there are multiple nonrigid objects, each object can be characterized as lying in an affine subspace. Therefore, the 3D trajectories lie in a union of affine subspaces, which can equivalently be formulated in terms of self-expressiveness, i.e.,
(2)  $S = S C_1$
where $C_1 \in \mathbb{R}^{P \times P}$ is the trajectory coefficient matrix. To rule out the trivial solution $C_1 = I$, we explicitly enforce the diagonal constraint $\mathrm{diag}(C_1) = \mathbf{0}$. As we represent each nonrigid object as lying in an affine subspace, we further enforce the affine constraint $\mathbf{1}^T C_1 = \mathbf{1}^T$. Besides the above constraints, we also want trajectories that belong to the same deforming object to be tightly connected and trajectories from different objects to be loosely connected. To accommodate this idea of intra-class and inter-class trajectory clustering, we use the elastic net formulation [28] to balance connectedness and sparsity. Combining all the constraints, we reach the following optimization:
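(3)  $\min_{C_1} \; \lambda_1 \|C_1\|_1 + \gamma_1 \|C_1\|_F^2$
subject to:  $S = S C_1$, $\mathbf{1}^T C_1 = \mathbf{1}^T$, $\mathrm{diag}(C_1) = \mathbf{0}$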
A visual illustration of this idea in trajectory space for a single trajectory is provided in Fig. 2. Here, $\|\cdot\|_1$ and $\|\cdot\|_F$ denote the $\ell_1$ norm and the Frobenius norm, respectively.
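The self-expressiveness idea can be checked on synthetic data. The sketch below is only an illustration with hypothetical names (`sample_affine_subspace`, `affine_code`) and made-up dimensions: it draws trajectories (columns) from two random affine subspaces and tries to write one trajectory as a combination of others, with an extra least-squares row that encourages the coefficients to sum to one. It uses plain least squares rather than the elastic net program of this section.

```python
import numpy as np

def sample_affine_subspace(rng, ambient_dim, sub_dim, n_points):
    """Sample columns mu + B a from a random affine subspace."""
    mu = rng.standard_normal((ambient_dim, 1))
    B = rng.standard_normal((ambient_dim, sub_dim))
    return mu + B @ rng.standard_normal((sub_dim, n_points))

def affine_code(S, j, support):
    """Least-squares coefficients for S[:, j] from S[:, support].

    The appended row of ones asks for sum(c) = 1; it is met exactly
    whenever the augmented linear system is consistent.
    """
    A = np.vstack([S[:, support], np.ones((1, len(support)))])
    b = np.append(S[:, j], 1.0)
    c, *_ = np.linalg.lstsq(A, b, rcond=1e-10)
    residual = np.linalg.norm(S[:, support] @ c - S[:, j])
    return c, residual

rng = np.random.default_rng(1)
ambient_dim, sub_dim, n = 12, 2, 6             # e.g. 3F = 12, six trajectories per object
S = np.hstack([sample_affine_subspace(rng, ambient_dim, sub_dim, n),
               sample_affine_subspace(rng, ambient_dim, sub_dim, n)])

c_same, r_same = affine_code(S, 0, [1, 2, 3, 4, 5])        # same object
c_other, r_other = affine_code(S, 0, list(range(6, 12)))   # other object
```

In this toy setting the residual is numerically zero when the support comes from the same object and clearly nonzero otherwise, which is the property the coefficient matrix is meant to encode.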
3.2 Representing multiple nonrigid deformations in shape space
An example of complex nonrigid motion is shown in Fig. 1, where the subjects perform different activities at different time instances. Each distinct motion adheres to a different local subspace, and the complete nonrigid motion lies in a union of shape subspaces. As mentioned in [29], this assumption leads to superior 3D reconstruction. To incorporate into our formulation the idea that different activities lie in a union of affine subspaces, we express the 3D shapes in terms of self-expressiveness of frames along the temporal direction:
(4)  $S^\sharp = S^\sharp C_2$
where $S^\sharp \in \mathbb{R}^{3P \times F}$ is the reshuffled version of $S$, representing each per-frame 3D shape as a column vector, and $C_2 \in \mathbb{R}^{F \times F}$ is the temporal coefficient matrix.
A visual intuition of this idea in shape space for a single frame is provided in Fig. 3.
For temporal clustering, we also use elastic net regularization, for reasons similar to those given in Section 3.1 for $C_1$, thereby formulating the following optimization:
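(5)  $\min_{C_2} \; \lambda_2 \|C_2\|_1 + \gamma_2 \|C_2\|_F^2$
subject to:  $S^\sharp = S^\sharp C_2$, $\mathbf{1}^T C_2 = \mathbf{1}^T$, $\mathrm{diag}(C_2) = \mathbf{0}$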
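The reshuffling between the two shape representations can be sketched as a pure reshape. This is a minimal illustration that assumes the trajectory-space matrix has size $3F \times P$ with the x, y, z rows of each frame stacked consecutively, and that each column of the reshuffled $3P \times F$ matrix lists all x coordinates of a frame, then all y, then all z. The within-column ordering and the function names `reshuffle` and `unreshuffle` are assumptions made for this example.

```python
import numpy as np

def reshuffle(S):
    """Map S (3F x P, frame-major rows) to a 3P x F matrix with one frame per column."""
    F, P = S.shape[0] // 3, S.shape[1]
    # Row f of the intermediate array is the flattened 3 x P block of frame f.
    return S.reshape(F, 3 * P).T

def unreshuffle(S_sharp):
    """Inverse mapping: 3P x F back to 3F x P."""
    F, P = S_sharp.shape[1], S_sharp.shape[0] // 3
    return S_sharp.T.reshape(3 * F, P)

F, P = 3, 4
S = np.arange(3 * F * P, dtype=float).reshape(3 * F, P)
S_sharp = reshuffle(S)
```

Since the mapping only permutes entries, it is linear and invertible, so constraints written on either representation describe the same unknown shape.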
3.3 Enforcing the global shape constraint
In seeking a compact representation for multibody nonrigid objects, we penalize the number of independent nonrigid shapes. Similar to [8] and [14], we penalize the nuclear norm of the reshuffled shape matrix $S^\sharp \in \mathbb{R}^{3P \times F}$, because the nuclear norm is known to be the convex envelope of the rank function. In this way, the global shape constraint is expressed as:
(6)  $\min_{S^\sharp} \; \|S^\sharp\|_*$
where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, i.e., the sum of its singular values.
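To see why a rank penalty on the reshuffled shape matrix is reasonable, consider the linear shape basis model mentioned in Section 2. The sketch below is an illustration only: it assumes each per-frame shape is a linear combination of `K` basis shapes (the values of `K`, `P`, `F` and the variable names are made up for the example), builds the corresponding $3P \times F$ matrix, and evaluates its rank and nuclear norm.

```python
import numpy as np

rng = np.random.default_rng(2)
P, F, K = 15, 40, 3                       # points, frames, number of basis shapes

basis = rng.standard_normal((3 * P, K))   # each column is one vectorized basis shape
coeffs = rng.standard_normal((K, F))      # per-frame mixing coefficients
S_sharp = basis @ coeffs                  # one vectorized 3D shape per column

singular_values = np.linalg.svd(S_sharp, compute_uv=False)
rank = np.linalg.matrix_rank(S_sharp)
nuclear_norm = singular_values.sum()      # sum of singular values
```

Only `K` singular values are numerically nonzero here, so the matrix has rank `K` even though it has 40 columns; the nuclear norm sums those singular values and serves as a convex surrogate for the rank.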
3.4 Joint Reconstruction and Segmentation Formulation
Putting all the above constraints (spatiotemporal union of subspace constraint and global shape constraint) together, we reach a multibody nonrigid reconstruction and segmentation formulation:
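(7)  $\min_{S, C_1, C_2} \; \frac{1}{2}\|W - RS\|_F^2 + \lambda_1 \|C_1\|_1 + \gamma_1 \|C_1\|_F^2 + \lambda_2 \|C_2\|_1 + \gamma_2 \|C_2\|_F^2 + \lambda_3 \|S^\sharp\|_*$
subject to:  $S = S C_1$, $S^\sharp = S^\sharp C_2$, $\mathbf{1}^T C_1 = \mathbf{1}^T$, $\mathbf{1}^T C_2 = \mathbf{1}^T$, $\mathrm{diag}(C_1) = \mathbf{0}$, $\mathrm{diag}(C_2) = \mathbf{0}$
where $S \in \mathbb{R}^{3F \times P}$, $S^\sharp \in \mathbb{R}^{3P \times F}$, $C_1 \in \mathbb{R}^{P \times P}$, and $C_2 \in \mathbb{R}^{F \times F}$. $\lambda_1$, $\gamma_1$, $\lambda_2$, $\gamma_2$, and $\lambda_3$ are the tradeoff parameters.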
4 Solution
To solve the proposed optimization we introduce decoupling variables in Eq. 7, which leads to the following formulation:
(8)  
subject to:  
The auxiliary variables are introduced to simplify the derivation. denotes the linear mapping from to its reshuffled version . Specifically, S =
and
=
.
The first term in the above optimization penalizes the reprojection error under orthographic projection. Under the single-body NRSfM configuration, the 3D shape can be well characterized as lying in a single low-dimensional linear subspace. However, when there are multiple nonrigid objects, each nonrigid object can be characterized as lying in an affine subspace. To represent this idea mathematically, we introduce the self-expressiveness constraints in shape space and trajectory space, respectively.
In addition, to reveal the intrinsic structure of multibody NRSfM, we seek the sparsest solution in both trajectory and shape space. Consequently, we enforce the $\ell_1$ norm on the coefficient matrices of both spaces. However, high sparsity may lead to misclassification of samples or trajectories. Therefore, to maintain the balance between sparsity and connectedness, we incorporate the elastic net for both coefficient matrices. Lastly, we enforce a global shape constraint (the nuclear norm term) for a compact representation of multibody nonrigid objects by penalizing the rank of the entire nonrigid shape.
Due to the two bilinear terms (products of the unknown shape and coefficient matrices), the overall optimization of Eq. (8) is nonconvex. We solve it via the alternating direction method of multipliers (ADMM), which has proven effective for many nonconvex problems and is widely used in computer vision. ADMM works by decomposing the original optimization problem into several subproblems, each of which can be solved efficiently. To this end, we decompose Eq. (8) into several subproblems.
We introduce Lagrange multipliers into Eq. (8) and obtain the augmented Lagrangian formulation:
(9)  
where we define and . are the Lagrange multipliers. is the penalty parameter, where we use the same parameter for each augmented Lagrange term to simplify the derivation and parameter setting. The symbol represents the Frobenius inner product of two matrices, i.e, the trace of the product of two matrices. For example, given two matrices , the Frobenius inner product is calculated as Tr.
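The Frobenius inner product described above is easy to verify numerically. In this small sketch the matrix names `A` and `B` and the helper `frobenius_inner` are placeholders chosen for illustration; it computes the trace form and compares it with the elementwise sum of products.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product of two equally sized matrices: Tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((4, 5))

via_trace = frobenius_inner(A, B)
via_sum = np.sum(A * B)                   # same value without forming A^T B
```

Taking both arguments equal gives the squared Frobenius norm, which is why the inner product and the quadratic penalty appear together in augmented Lagrangian terms.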
ADMM works by minimizing Eq. (9) with respect to one variable while fixing the others. During each iteration, we update each variable and the Lagrange multipliers in sequence. The detailed derivation of the solution is presented in the Appendix.
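The alternating pattern can be illustrated on a much smaller problem. The sketch below is not the solver for Eq. (9); it applies scaled-form ADMM to a toy elastic net regression, $\min_x \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1 + \gamma\|x\|_2^2$, using the splitting $x = z$. The function name `admm_elastic_net`, the penalty `rho`, and all numeric values are assumptions made for this example.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_elastic_net(A, b, lam, gamma, rho=1.0, iters=300):
    """Scaled-form ADMM for 0.5*||Ax - b||^2 + lam*||z||_1 + gamma*||z||^2 with x = z."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = A.T @ A + rho * np.eye(n)       # fixed system matrix of the x-update
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: quadratic subproblem, solved in closed form
        x = np.linalg.solve(lhs, Atb + rho * (z - u))
        # z-update: elastic net proximal step (shrink, then rescale)
        z = soft_threshold(rho * (x + u), lam) / (2.0 * gamma + rho)
        # dual update: accumulate the constraint violation x - z
        u = u + x - z
    return z

A = np.eye(3)
b = np.array([2.0, 0.3, -1.0])
z = admm_elastic_net(A, b, lam=0.5, gamma=0.25)
# For A = I the problem is separable and has this closed-form minimizer:
expected = soft_threshold(b, 0.5) / (1.0 + 2.0 * 0.25)
```

Each pass updates one block of variables with the others held fixed and then updates the multiplier, which mirrors the order of updates described in the text.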
Solution for $S$: The closed-form solution for $S$ can be derived by taking the derivative of Eq. (9) w.r.t. $S$ and equating it to zero.
(10) 
Solution for : The closed form solution for can be derived by taking derivative of (9) w.r.t and equating to zero.
(11) 
Solution for : The closed form solution for can be derived as
(12) 
(13) 
Solution for : The closed form solution for can be derived as
(14) 
(15) 
Solution for : The optimization of given all the remaining variables can be expressed as:
(16)  
A closedform solution exists for this subproblem. Let’s define the softthresholding operation as , the optimal can be obtained as:
(17) 
where = .
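Soft-thresholding is the building block of the closed-form updates for both $\ell_1$ and nuclear norm terms. The sketch below is an illustration under an assumption: it takes the subproblem to have the standard form $\min_X \tau\|X\|_* + \frac{1}{2}\|X - A\|_F^2$, whose minimizer is obtained by shrinking the singular values of $A$. The names `soft_threshold` and `singular_value_threshold`, and the threshold `tau`, are placeholders rather than the paper's notation.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(A, tau):
    """Minimizer of tau*||X||_* + 0.5*||X - A||_F^2 via shrinking singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt   # scale columns of U, then recombine

shrunk_vector = soft_threshold(np.array([-3.0, -0.5, 0.5, 2.0]), 1.0)
shrunk_matrix = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

Singular values below the threshold are set to zero, so the operator lowers the rank of its input, which matches the role of the global shape constraint.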
Solution for : The closedform solution for can be obtained similarly:
(18) 
Solution for The derivation for the solution of is similar to .
(19) 
Detailed derivations of each subproblem's solution are provided in Appendix A. Finally, the Lagrange multipliers and are updated as:
(20) 
(21) 
(22) 
(23) 
(24) 
5 Experiments and Results
We performed extensive experiments on freely available benchmark datasets. We tested our approach on both real and synthetic data under sparse and semi-dense scenarios. Denoting $S_{est}$ as the estimated 3D structure and $S_{GT}$ as the ground-truth structure, we use the following error metrics to evaluate the performance of the approach:
(i) Relative error in multibody nonrigid 3D reconstruction,
(25)  $e_{3D} = \|S_{est} - S_{GT}\|_F / \|S_{GT}\|_F$
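The reconstruction metric can be computed in a few lines. This sketch assumes the relative error is the Frobenius norm of the difference between the estimated and ground-truth structures divided by the Frobenius norm of the ground truth; the function name `relative_3d_error` and the synthetic matrices are hypothetical.

```python
import numpy as np

def relative_3d_error(S_est, S_gt):
    """Relative reconstruction error ||S_est - S_gt||_F / ||S_gt||_F (assumed form)."""
    return np.linalg.norm(S_est - S_gt, "fro") / np.linalg.norm(S_gt, "fro")

rng = np.random.default_rng(4)
S_gt = rng.standard_normal((3 * 5, 8))    # synthetic ground truth, 3F x P
S_est = 1.1 * S_gt                        # estimate that is uniformly 10% too large
error = relative_3d_error(S_est, S_gt)
```

A uniform 10% scale error yields a relative error of 0.1, which gives a feel for the scale of the reported numbers. The sketch also assumes both structures are already expressed in the same coordinate frame.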
(ii) Error in multibody nonrigid motion segmentation,