Symmetric Non-Rigid Structure from Motion for Category-Specific Object Structure Estimation

09/22/2016 ∙ by Yuan Gao, et al.

Many objects, especially those made by humans, are symmetric, e.g. cars and aeroplanes. This paper addresses the estimation of 3D structures of symmetric objects from multiple images of the same object category, e.g. different cars, seen from various viewpoints. We assume that the deformation between different instances from the same object category is non-rigid and symmetric. In this paper, we extend two leading non-rigid structure from motion (SfM) algorithms to exploit symmetry constraints. We model both methods as energy minimization, in which we also recover the missing observations caused by occlusions. In particular, we show that by rotating the coordinate system, the energy can be decoupled into two independent terms, which still exploit symmetry, so that matrix factorization can be applied separately to each of them for initialization. The results on the Pascal3D+ dataset show that our methods significantly improve performance over baseline methods.

1 Introduction

3D structure reconstruction is a major task in computer vision. Structure from motion (SfM), which aims to estimate 3D structure from annotated 2D keypoints in image sequences, was first proposed for rigid objects [1], and was later extended to non-rigid objects [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. Many man-made objects have symmetric structures [15, 16]. Motivated by this, symmetry has been studied extensively in the past decades [17, 18, 19, 20, 16, 21, 22]. However, this information has not been exploited in recent works on 3D object reconstruction [23, 24], nor used in standard non-rigid structure from motion (NRSfM) algorithms [3, 4, 5, 6, 7, 8, 9, 10, 14].

The goal of this paper is to investigate how symmetry can improve NRSfM. Inspired by recent works [23, 24], we are interested in estimating the 3D structure of objects such as cars and airplanes. This differs from the classic SfM problem because our input consists of images of several different object instances from the same category (e.g. different cars), instead of sequential images of the same object undergoing motion. In other words, our goal is to estimate the 3D structures of objects from the same class, given intra-class instances from various viewpoints. Specifically, the Pascal3D+ keypoint annotations on different objects from the same category are used as input to our method, where the symmetric keypoint pairs can also be easily inferred. In this paper, non-rigidity means that the deformation between objects from the same category can be non-rigid, e.g. between sedan and SUV cars, but the objects themselves are rigid and symmetric.

By exploiting symmetry, we propose two symmetric NRSfM methods. Both assume that the 3D structure can be represented by a linear combination of basis functions, with coefficients that vary across objects. The first method, named Sym-EM-PPCA, is an extension of [5], which is based on an EM approach with a Gaussian prior on the coefficients of the deformation bases. The second method, Sym-PriorFree, is an extension of [9, 10], which is a direct matrix factorization method without prior knowledge. For fair comparison, we use the same projection models and other assumptions as in [5] and [9, 10].

More specifically, our Sym-EM-PPCA method, following [5], assumes weak perspective projection (i.e. orthographic projection plus scale). We group the keypoints into symmetric keypoint pairs. We assume that the 3D structure is also symmetric and consists of a mean shape (of that category) and a linear combination of symmetric deformation bases. As in [5], we put a Gaussian prior on the coefficients of the deformation bases. This is intended partly to regularize the problem and partly to deal with an apparent ambiguity in non-rigid structure from motion. But recent work [25] showed that this is a “gauge freedom” which does not affect the estimation of 3D structure, so the prior is not really needed.

Our Sym-PriorFree method is based on prior free non-rigid SfM algorithms [9, 10], which build on the insights in [25]. We formulate the problem of estimating 3D structure and camera parameters in terms of minimizing an energy function, which exploits symmetry, and at the same time can be re-expressed as the sum of two independent energy functions. Each of these energy functions can be minimized separately by matrix factorization, similar to the methods in [9, 10], and the ambiguities are resolved using orthonormality constraints on the viewpoint parameters. This extends work in a companion paper [26], which shows how symmetry can be used to improve rigid structure from motion methods [1].

Our main contributions are: (I) Sym-EM-PPCA, which imposes symmetric constraints on both 3D structure and deformation bases. Sym-Rigid-SfM (see our companion paper [26]) is used to initialize Sym-EM-PPCA with hard symmetric constraints on the 3D structure. (II) Sym-PriorFree, which extends the matrix factorization methods of [9, 10], to initialize a coordinate descent algorithm.

In this paper, we group keypoints into symmetric keypoint pairs, and use a superscript to denote symmetry, i.e. and are the 2D symmetric keypoint pairs. The paper is organized as follows. We first review related work in Section 2. Section 3 discusses the ambiguities in non-rigid SfM. We then present the Sym-EM-PPCA and Sym-PriorFree algorithms in Section 4. Following the experimental settings in [24], we evaluate our methods on the Pascal3D+ dataset [27] in Section 5, which also includes diagnostic results on noisy 2D annotations to show that our methods are robust to imperfect symmetric annotations. Finally, we give our conclusions in Section 6.

2 Related Works

There is a long history of using symmetry as a cue for computer vision tasks. For example, symmetry has been used in depth recovery [17, 18, 20] as well as in recognizing symmetric objects [19]. Several geometric cues, including symmetry, planarity, orthogonality and parallelism, have been taken into account for 3D scene reconstruction [28, 29], in which the authors used a camera rotation matrix pre-computed from vanishing points [30]. Recently, symmetry has been applied in more areas, such as 3D mesh reconstruction with occlusion [21] and scene reconstruction [16]. For 3D keypoint reconstruction, symmetry combined with planarity and compactness priors has also been studied in [22].

SfM has also been studied extensively in the past decades, ever since the seminal work on rigid SfM [31, 1]. Bregler et al. extended this to the non-rigid case [32]. A Column Space Fitting (CSF) method was proposed for rank- matrix factorization (MF) for SfM under a smooth time-trajectory assumption [7], which was later unified in a more general MF framework [33] (footnote 1: however, the general framework in [33] cannot be applied to SfM directly, because it does not constrain all the keypoints to have the same translation). Early analysis of NRSfM showed that there were ambiguities in 3D structure reconstruction [4]. This led to studies which assumed priors on the non-rigid deformations [4, 5, 34, 35, 7, 6]. But it was then shown that these ambiguities did not affect the final estimate of 3D structure, i.e. all legitimate solutions lying in the same subspace (despite being under-constrained) give the same solutions for the 3D structure [25]. This facilitated the invention of prior-free matrix factorization methods for NRSfM [9, 10]. Recently, SfM methods have also been used for category-specific object reconstruction, e.g. estimating the shape of different cars under various viewing conditions [24, 23], but symmetry cues were not exploited. Note that repetition patterns have recently been incorporated into SfM for urban facade reconstruction [36], but this mainly focused on repetition detection and registration. Finally, in a companion paper [26], we exploited symmetry for rigid SfM.

3 The Ambiguities in Non-rigid SfM

This section reviews the intrinsic ambiguities in non-rigid SfM, i.e. (i) the ambiguities between the camera projection and the 3D structure, and (ii) the ambiguities between the deformation bases and their coefficients [25]. In the following sections (i.e. in Remark 5), we will show the ambiguity between camera projection and 3D structure (i.e. originally the matrix ambiguity as discussed below) can be decomposed into two types of ambiguities under the symmetric constraints, i.e. a scale ambiguity along the symmetry axis, and a matrix ambiguity on the other two axes.

The key idea of non-rigid SfM is to represent the non-rigid deformations of objects in terms of a linear combination of bases:

(1)

where is the stacked 2D keypoints, is the camera projection for the images. is the 3D structure which is modeled by the linear combination of the stacked deformation bases , and is the coefficient.

Firstly, as is well known, there are ambiguities between the projection and the 3D structure in the matrix factorization, i.e. let

be an invertible matrix, then

and will not change the value of . These ambiguities can be solved by imposing orthogonality constraints on the camera parameters up to a fixed rotation, which is a “gauge freedom” [37] corresponding to a choice of coordinate system.
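
The factorization ambiguity above can be checked numerically. The following is a minimal sketch, not taken from the paper: the names `R`, `S`, `A` and all sizes are illustrative assumptions, with `R_orth` standing in for a single 2×3 projection with orthonormal rows and `S` for a 3×P structure matrix. It shows that inserting any invertible matrix leaves the product unchanged, and that requiring orthonormal projection rows rules out a generic invertible matrix but still admits a rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one 2x3 projection with orthonormal rows, 8 keypoints.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R_orth = Q[:2, :]
S = rng.standard_normal((3, 8))

# Any invertible 3x3 matrix can be inserted without changing the 2D data.
A = rng.standard_normal((3, 3))
Y = R_orth @ S
Y_alt = (R_orth @ A) @ (np.linalg.inv(A) @ S)
same_projection = np.allclose(Y, Y_alt)

def rows_orthonormal(M):
    """True if the two rows of M are orthogonal and have unit norm."""
    return np.allclose(M @ M.T, np.eye(2))

# A generic A breaks orthonormality of the projection rows ...
generic_ok = rows_orthonormal(R_orth @ A)

# ... whereas an orthogonal matrix (a rotation or reflection) preserves it,
# which is the remaining freedom in the choice of coordinate system.
G, _ = np.linalg.qr(rng.standard_normal((3, 3)))
rotation_ok = rows_orthonormal(R_orth @ G)

print(same_projection, generic_ok, rotation_ok)
```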

In addition, there are other ambiguities between the coefficients and the deformation bases [4]. Specifically, let be another invertible matrix, and let lie in the null space of the projected deformation bases , then and , or will not change the value of . This motivated Bregler et al. to impose a Gaussian prior on the coefficients in order to eliminate the ambiguities. Recently, it was proved in [25] that these ambiguities are also “fake”, i.e. they do not affect the estimate of the 3D structure. This proof facilitated prior-free matrix factorization methods for non-rigid SfM [9, 10].

4 Symmetric Non-Rigid Structure from Motion

In this paper we extend non-rigid SfM methods by requiring that the 3D structure is symmetric. We assume the deformations are non-rigid and also symmetric (footnote 2: we assume symmetric deformations because our problem involves deformations from one symmetric object to another, but the approach can also be extended to non-symmetric deformations straightforwardly). We propose two symmetric non-rigid SfM models. One is the extension of the iterative EM-PPCA model with a prior on the deformation coefficients [5], and the other extends the prior-free matrix factorization model [9, 10].

For simplicity of derivation, we focus on estimating the 3D structure and camera parameters. In practice, there are occluded keypoints in almost all images in the Pascal3D+ dataset. We use standard ways to deal with them: we initialize them, ignoring symmetry, by rank-3 recovery using the three largest singular values, and then treat them as missing data to be estimated by the EM or coordinate descent algorithms. In our companion paper [26], we gave details of these methods for the simpler case of rigid structure from motion.

Note that we use slightly different camera models for Sym-EM-PPCA (weak perspective projection) and Sym-PriorFree (orthographic projection). This is to stay consistent with the non-symmetric methods which we compare with, namely [5] and [9, 10]. Similarly, we treat translation differently, either by centering the data or by treating it as a variable to be estimated, as appropriate. We will discuss this further when presenting the Sym-PriorFree method.

4.1 The Symmetric EM-PPCA Model

In EM-PPCA [5], Bregler et al. assume that the 3D structure is represented by a mean structure plus a non-rigid deformation. Suppose there are keypoints on the structure, the non-rigid model of EM-PPCA is:

(2)

where , and

are the stacked vectors of 2D keypoints, 3D mean structure and translations.

, in which is the scale parameter for weak perspective projection, is the grouped deformation bases, is the coefficient of the bases, and is the Gaussian noise .

Extending Eq. (2) to our symmetry problem in which there are keypoint pairs and , we have:

(3)

Assuming that the object is symmetric along the x-axis, the relationship between and , and are:

(4)

where , is a matrix operator which negates the first row, and

is an identity matrix. (Footnote 3: we set hard constraints on and , i.e. replace by in Eq. (5), because this can be guaranteed by the Sym-RSfM initialization in our companion paper [26]. The initialization of and by PCA cannot guarantee this property, so a Lagrange multiplier term is used for the constraint on and in Eq. (9).) Thus, we have:

(5)
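
As an illustration of the symmetry relation, the sketch below applies a reflection that negates the x-coordinate to every keypoint of a stacked structure vector. It is a sketch under stated assumptions rather than the paper's notation: the ordering `[x1, y1, z1, x2, ...]` of the stacked vector, the Kronecker-product construction, and the names `A`, `A_stacked` and `s` are all hypothetical.

```python
import numpy as np

n_pairs = 4                                   # hypothetical number of keypoint pairs

# 3x3 reflection that negates the first (x) coordinate.
A = np.diag([-1.0, 1.0, 1.0])

# Assumed stacking: s = [x1, y1, z1, x2, y2, z2, ...], so the reflection
# of the whole vector is block diagonal with one copy of A per keypoint.
A_stacked = np.kron(np.eye(n_pairs), A)

rng = np.random.default_rng(0)
s = rng.standard_normal(3 * n_pairs)          # keypoints on one side of the object
s_sym = A_stacked @ s                         # their mirrored counterparts

# Reflecting twice returns the original keypoints.
s_back = A_stacked @ s_sym
```

Because the reflection is its own inverse, a constraint of this kind can be imposed in either direction between the two halves of each keypoint pair.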

Following Bregler et al. [5], we introduce a prior on the coefficient variable . This prior is a zero-mean, unit-variance Gaussian. It is used partly to regularize the inference task and partly to deal with the ambiguities between the basis coefficients and bases , as mentioned above (when [5] was published it was not realized that these are “gauge freedoms”). This enables us to treat as the hidden variable and use the EM algorithm to estimate the structure and camera viewpoint parameters. The formulation of the problem in terms of Gaussian distributions (or, more technically, the use of conjugate priors) means that both steps of the EM algorithm are straightforward to implement.

Remark 1

Our Sym-EM-PPCA method is a natural extension of the method in [5] to maximize the marginal probability with a Gaussian prior on and a Lagrange multiplier term (i.e. a regularization term) on . This can be solved by the general EM algorithm [38], where both the E and M steps take simple forms because the underlying probability distributions are Gaussians (due to the conjugate Gaussian prior).

Figure 1: The graphical model of the variables and parameters.

E-Step: This step is to get the statistics of from its posterior. Let the prior on be as in [5]. Then, we have , and , which do not provide the complete posterior distribution directly. Fortunately, the conditional dependence of the variables shown in Fig. 1 (graphical model) implies that the posterior of can be calculated by:

(6)

The last equation of Eq. (6) is obtained by the fact that the prior and the conditional distributions of are all Gaussians (conjugate prior). Then the first and second order statistics of can be obtained as:

(7)
(8)

where .
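
For intuition, the E-step of a generic (non-symmetric) probabilistic PCA model has the closed form sketched below. This is the textbook PPCA result for `y = mu + W z + noise` with a standard normal prior on `z` and isotropic Gaussian noise, not the symmetric model of this paper: the names `W`, `mu`, `sigma2` and `ppca_e_step` are illustrative assumptions, and the paired-keypoint likelihood used here would have to be substituted in.

```python
import numpy as np

def ppca_e_step(y, W, mu, sigma2):
    """Posterior moments of z for y = mu + W z + eps,
    with z ~ N(0, I) and eps ~ N(0, sigma2 * I)."""
    d, k = W.shape
    # beta maps a centered observation to the posterior mean of z.
    beta = W.T @ np.linalg.inv(W @ W.T + sigma2 * np.eye(d))
    mean = beta @ (y - mu)                                   # E[z | y]
    second = np.eye(k) - beta @ W + np.outer(mean, mean)     # E[z z^T | y]
    return mean, second

# Small synthetic example with hypothetical sizes.
rng = np.random.default_rng(0)
d, k, sigma2 = 10, 3, 0.25
W = rng.standard_normal((d, k))
mu = rng.standard_normal(d)
y = mu + W @ rng.standard_normal(k) + np.sqrt(sigma2) * rng.standard_normal(d)
z_mean, z_second = ppca_e_step(y, W, mu, sigma2)
```

Both moments are needed because the M-step objective is quadratic in the hidden coefficients, so its expectation depends on the posterior mean and on the posterior second moment.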

M-Step: This is to maximize the joint likelihood which is similar to the coordinate descent in Sym-RSfM (in a companion paper [26]) and that in Sym-PriorFree method in the later sections. The complete log-likelihood is:

(9)

The maximization of Eq. (9) is straightforward, i.e. taking the derivative with respect to each unknown parameter in and equating it to 0. The update rule of each parameter is very similar to the original EM-PPCA [5] (except that should be updated jointly), and is given in Appendix A2.

Initialization. and are initialized by the PCA on the residual of the 2D keypoints minus their rigid projections iteratively. Other variables (including the rigid projections) are initialized by Sym-RSfM [26]. Specifically, , and the occluded points can be initialized directly from Sym-RSfM, is initialized as 1, is initialized by .

4.2 The Symmetric Prior-Free Matrix Factorization Model

In the Prior-Free NRSfM [9, 10], Dai et al. also used a linear combination of several deformation bases to represent the non-rigid deformation. But, unlike EM-PPCA [5], Dai et al. estimated the non-rigid structure directly without using the mean structure and the prior on the coefficients. We make the same assumptions so that we can directly compare with them.

Assume that are the keypoints for image , then we have:

(10)

where , , and .

Without loss of generality, we assume that the symmetry is across the x-axis: , where is a matrix operator negating the first row of . Then the first two terms in Eq. (10) give us the energy function (or the likelihood) to estimate the unknown and recover the missing data by coordinate descent on:

(11)

where and are the index sets of the visible and invisible keypoints, respectively. and are the 2D and 3D ’th keypoints of the ’th image. We treat the as missing/hidden variables to be estimated.

Remark 2

It is straightforward to minimize Eq. (11) by coordinate descent. The missing points can be initialized simply by rank-3 recovery (i.e. by reconstruction using the three largest singular values), ignoring the symmetry property and non-rigidity. But it is much harder to get good initializations for the and . In the following, we describe how to obtain good initializations for each and by exploiting symmetry after the missing points have been initialized.
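
One common way to realize such a rank-3 recovery is iterative truncated-SVD imputation, sketched below. This is an assumption about the procedure rather than the paper's exact implementation: the function name `rank3_impute`, the row-mean starting values, and the fixed iteration count are hypothetical, and translation handling (centering) is omitted.

```python
import numpy as np

def rank3_impute(Y, observed, n_iters=100):
    """Fill unobserved entries of Y using repeated rank-3 SVD reconstruction.

    Y        : 2D array of measurements (values at unobserved entries are ignored)
    observed : boolean array, True where Y is observed
    """
    # Start from the mean of the observed entries in each row.
    counts = np.maximum(observed.sum(axis=1, keepdims=True), 1)
    row_means = np.where(observed, Y, 0.0).sum(axis=1, keepdims=True) / counts
    X = np.where(observed, Y, row_means)
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :3] * s[:3]) @ Vt[:3]   # keep the three largest singular values
        X = np.where(observed, Y, low_rank)      # observed entries are never changed
    return X

# Synthetic check on an exactly rank-3 matrix with about 10% missing entries.
rng = np.random.default_rng(0)
truth = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 40))
observed = rng.random((30, 40)) > 0.1
Y = np.where(observed, truth, np.nan)
filled = rank3_impute(Y, observed)
```

The filled-in values only serve as a starting point; in the method described here they are subsequently refined as hidden variables during coordinate descent.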

Let be the stacked keypoints of images, , the model is represented by:

(12)

where are the stacked camera projection matrices, in which blkdiag denotes block diagonal. are the stacked 3D structures. , where are the stacked coefficients. Similar equations apply to .

Note that , are stacked differently than how they were stacked for the Sym-EM-PPCA method (i.e. , ). It is because now we have different ’s (i.e. ), while there is only one in the Sym-EM-PPCA method.

In the following, we assume the deformation bases are symmetric, which ensures that the non-rigid structures are symmetric (e.g. the deformation from sedan to truck is non-rigid and symmetric since sedan and truck are both symmetric). This yields an energy function:

(13)

where , and .

Remark 3

Note that we cannot use the first equation of Eq. (13) to solve directly (even without exploiting symmetry), because and are of rank but estimating directly by SVD on and/or requires rank matrix factorization. Hence we first focus on the last equation of Eq. (13) to get the initialization of . Then, can be updated by coordinate descent on the first equation of Eq. (13) under orthogonality constraints on and a low-rank constraint on .

Observe that the last equation of Eq. (13) cannot be optimized directly by SVD either, because it consists of two terms which are not independent. In other words, the matrix factorizations of and do not give consistent estimates of and . Instead, we now discuss how to estimate and by rotating the coordinate axes (to decouple the dependent energy terms), performing matrix factorization, and using subspace intersection (to eliminate the ambiguities), which is an extension of the original prior-free method [9, 10] and our companion Sym-RSfM [26].

We first rotate coordinate systems (of ) to obtain decoupled equations:

(14)

where the two right-hand sides of the equation depend on different components of . More specifically, by discarding the all-zero rows of the bases, , , , .

This yields two independent energies to be minimized separately by SVD:

(15)

Remark 4

We have formulated Sym-PriorFree as minimizing two energy terms in Eq. (15), which consist of independent variables. This implies that we can solve them by matrix factorization on each energy term separately, which gives solutions for and for the basis vectors up to an ambiguity . This is discussed more explicitly in the following, where we show how to use the orthogonality of the camera parameters to partially solve for .
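
The decoupling can be illustrated in the simplest setting of a single rigid, noise-free symmetric shape seen in several orthographic views. The sketch below is an illustration under those simplifying assumptions, not the non-rigid formulation of Eq. (15): the names `R`, `S`, `A`, `L`, `M` and the factor 0.5 are hypothetical choices. Taking the difference and the sum of the paired keypoints isolates the component along the symmetry axis from the other two components, so each part can be factorized on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_pairs = 5, 6
A = np.diag([-1.0, 1.0, 1.0])                 # reflection across the x-axis

def random_projection(rng):
    """First two rows of a random orthogonal matrix (orthographic camera)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q[:2, :]

# Stack the 2x3 projections of all images into a (2 * n_images) x 3 matrix.
R = np.vstack([random_projection(rng) for _ in range(n_images)])
S = rng.standard_normal((3, n_pairs))         # one side of the symmetric shape

Y = R @ S                                     # keypoints on one side
Y_sym = R @ (A @ S)                           # their symmetric counterparts

L = 0.5 * (Y - Y_sym)                         # depends only on the x-component
M = 0.5 * (Y + Y_sym)                         # depends only on the y- and z-components

rank_L = np.linalg.matrix_rank(L, tol=1e-10)
rank_M = np.linalg.matrix_rank(M, tol=1e-10)
print(rank_L, rank_M)
```

In this rigid case the two parts have ranks 1 and 2 and involve disjoint columns of the projection matrix, which is why they can be factorized separately and why the orthonormality constraints are then needed to tie the two factorizations back together.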

Solving Eq. (15) by matrix factorization gives us solutions up to a matrix ambiguity . More precisely, there are ambiguity matrices between the true solutions and the initial estimation output by matrix factorization :

(16)

where and .

Now the problem becomes finding . Note that we have orthonormality constraints on each camera projection matrix , which further impose constraints on . Thus, they can be used to partially estimate the ambiguity matrices . Since the factorized matrices, i.e. and , are the stacked 2D keypoints for all the images, and obtained from one image must satisfy the orthonormality constraints on the other images; hence we use (i.e. from image ) for our derivation.

Let , where are the first columns of the first and second rows of , and are the last columns of the first and second rows of , respectively. Thus, Eq. (16) implies:

(17)
(18)

where are the ’th double-row of . is the first column of the camera projection matrix of the ’th image , and is the second and third columns of .

Let be the th column and double-column of , respectively. Then, from Eqs. (17) and (18), we get:

(19)

By merging the equations of Eq. (19) together, can be represented by:

(20)

Remark 5

Similar to the rigid symmetry case in [26], Eq. (20) indicates that there are no rotation ambiguities in the symmetric direction. The rotation ambiguities only exist in the yz-plane (i.e. the non-symmetric plane).

The orthonormality constraints can be imposed to estimate :

(21)

Thus, we have:

(22)
(23)
(24)

Remark 6

The main difference between the rigid and non-rigid cases in the derivation from the orthonormality constraints is that, for the rigid case, the dot product of each row of is equal to 1, while for the non-rigid case the dot product on each row of gives us an unknown value . But note that is the same for both rows, i.e. Eqs. (22) and (23), corresponding to the same projection.

Eliminating the unknown value in Eqs. (22) and (23) (by subtraction) and rewriting in vectorized form gives:

(25)

Letting , yields the constraints:

(26)

Remark 7

As shown in Xiao et al. [4], the orthonormality constraints, i.e. Eq. (26), are not sufficient to solve for the ambiguity matrix . But Xiao et al. showed that the solution of lies in the null space of of dimensionality [4]. Akhter et al. [6] proved that this was a “gauge freedom” because all legitimate solutions lying in this subspace (despite being under-constrained) gave the same solutions for the 3D structure. More technically, the ambiguity of corresponds only to a linear combination of ’s column-triplet and a rotation on [25]. This observation was exploited by Dai et al. in [9, 10], where they showed that, up to the aforementioned ambiguities, can be solved by the intersection of 3 subspaces, as described in the following.

Following the strategy in [9, 10], we have intersection of subspaces conditions:

(27)

The first subspace comes from Eq. (26), i.e. the solutions of Eq. (26) lie in the null space of of dimensionality [4]. The second subspace requires that and are positive semi-definite. The third subspace comes from the fact that is of rank 1 and is of rank 2.

Note that as stated in [9, 10], Eq. (27) imposes all the necessary constraints on . There is no difference in the recovered 3D structures using the different solutions that satisfy Eq. (27).

We can obtain a solution of , under the condition of Eq. (27), by standard semi-definite programming (SDP):

(28)

where indicates the trace norm.

Remark 8

After recovering and , we can estimate the camera parameters as follows. Note that this does not require recovering the whole ambiguity matrix [9, 10].

After has been solved, Eq. (20) (i.e. ) implies that the camera projection matrix can be obtained by normalizing the two rows of to have unit norm. Then, can be constructed by .

Remark 9

After estimating the camera parameters, we can solve for the 3D structure by adopting the methods in [9, 10], i.e. by minimizing a low-rank constraint on the rearranged (i.e. more compact) under the orthographic projection model.

Similar to [9, 10], the structure can be estimated by:

(29)

where , and is rearranged and more compact , i.e.

and are the row-permutation matrices of 0 and 1 that select to form , i.e. for .

Remark 10

After obtaining the initial estimates of (from matrix factorization as described above) and the occluded keypoints, we can minimize the full energy (likelihood) in Eq. (11) by coordinate descent to obtain better estimates of and the occluded keypoints.

Energy Minimization. After obtaining the initial , and missing points, Eq. (11) can be minimized by coordinate descent. The energy about , is:

(30)

Note that can be updated exactly as in its initialization in Eq. (29), by the low-rank constraint. Each of , in contrast, should be updated under the nonlinear orthonormality constraints, similar to the idea in EM-PPCA [5]: we first parameterize as a full rotation matrix and then update it by its rotation increment. Please refer to Appendix A3.
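
The idea of updating a projection through a rotation increment can be sketched as follows. This is a generic illustration, not the update rule of Appendix A3: the helper names `complete_rotation`, `rodrigues` and `apply_increment`, and the use of an axis-angle vector `omega` for the increment, are assumptions. In an actual optimizer, `omega` would be chosen to decrease the energy; here it is simply a fixed small vector.

```python
import numpy as np

def complete_rotation(R2):
    """Extend a 2x3 matrix with orthonormal rows to a 3x3 rotation matrix."""
    r3 = np.cross(R2[0], R2[1])               # third row completes a right-handed frame
    return np.vstack([R2, r3])

def rodrigues(omega):
    """Rotation matrix for the axis-angle vector omega (Rodrigues' formula)."""
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        return np.eye(3)
    K = np.array([[0.0, -omega[2], omega[1]],
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def apply_increment(R2, omega):
    """Apply a small rotation to the full matrix, then keep its first two rows."""
    R_full = rodrigues(omega) @ complete_rotation(R2)
    return R_full[:2, :]

# Hypothetical example: perturb a random orthographic projection.
Q, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((3, 3)))
R2 = Q[:2, :]
R2_new = apply_increment(R2, np.array([0.01, -0.02, 0.03]))
```

Because the increment is itself a rotation, the updated projection keeps orthonormal rows by construction, so the constraint never has to be re-imposed after a step.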

The occluded points and with are updated by minimizing the full energy in Eq. (11) directly:

(31)

Similar to Sym-RSfM [26], after updating the occluded points, we also re-estimate the translation for each image by