Rigid Multiview Varieties

The multiview variety from computer vision is generalized to images by n cameras of points linked by a distance constraint. The resulting five-dimensional variety lives in a product of 2n projective planes. We determine defining polynomial equations, and we explore generalizations of this variety to scenarios of interest in applications.


1. Introduction

The emerging field of Algebraic Vision is concerned with interactions between computer vision and algebraic geometry. A central role in this endeavor is played by projective varieties that arise in multiview geometry [5].

The set-up is as follows: A camera is a linear map from the three-dimensional projective space $\mathbb{P}^3$ to the projective plane $\mathbb{P}^2$, both over $\mathbb{R}$. We represent $n$ cameras by $3\times 4$-matrices $A_1,\ldots,A_n$ of rank $3$. The kernel of $A_j$ is the focal point $f_j\in\mathbb{P}^3$. Each image point $u_j\in\mathbb{P}^2$ of camera $A_j$ has a line through $f_j$ as its fiber in $\mathbb{P}^3$. This is the back-projected line.

We assume throughout that the focal points of the cameras are in general position, i.e. all distinct, no three on a line, and no four on a plane. Let $\beta_{jk}$ denote the line in $\mathbb{P}^3$ spanned by the focal points $f_j$ and $f_k$. This is the baseline of the camera pair $(A_j,A_k)$. The image of the focal point $f_j$ in the image plane of the camera $A_k$ is the epipole $e_{k\leftarrow j} = A_kf_j$. Note that the baseline $\beta_{jk}$ is the back-projected line of $e_{j\leftarrow k}$ with respect to $A_j$ and also the back-projected line of $e_{k\leftarrow j}$ with respect to $A_k$. See Figure 1 for a sketch.

Fix a point $X\in\mathbb{P}^3$ which is not on the baseline $\beta_{jk}$, and let $u_j = A_jX$ and $u_k = A_kX$ be the images of $X$ under $A_j$ and $A_k$. Since $X$ is not on the baseline, neither image point is the epipole for the other camera. The two back-projected lines of $u_j$ and $u_k$ meet in a unique point, which is $X$. This process of reconstructing $X$ from two images $u_j$ and $u_k$ is called triangulation [5, §9.1].

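The triangulation procedure amounts to solving the linear equations $A_jX = \lambda_ju_j$ and $A_kX = \lambda_ku_k$, that is,

$$B^{jk}\begin{bmatrix} X\\ -\lambda_j\\ -\lambda_k\end{bmatrix} = 0,\qquad\text{where}\quad B^{jk} = \begin{bmatrix} A_j & u_j & 0\\ A_k & 0 & u_k\end{bmatrix}. \tag{1}$$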

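For general data we have $\operatorname{rank}(B^{jk}_i) = 5$, where $B^{jk}_i$ is obtained from $B^{jk}$ by deleting the $i$th row. Cramer's Rule can be used to recover $X$. Let $\wedge_5 B^{jk}_i$ be the column vector formed by the signed maximal minors of $B^{jk}_i$. Write $\widetilde{\wedge_5 B^{jk}_i}$ for the first four coordinates of $\wedge_5 B^{jk}_i$. These are bilinear functions of $u_j$ and $u_k$. They yield

$$X = \widetilde{\wedge_5 B^{jk}_1} = \widetilde{\wedge_5 B^{jk}_2} = \cdots = \widetilde{\wedge_5 B^{jk}_6}, \tag{2}$$

where the equalities hold in $\mathbb{P}^3$, that is, up to scaling.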
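The Cramer's Rule recipe in (1) and (2) can be checked with exact arithmetic. The Python sketch below is a minimal illustration, not the authors' Macaulay2 code: the two integer cameras are hypothetical, chosen of the form $[M\mid -Mc]$ so that their focal points are $(0,0,0,1)$ and $(1,0,0,1)$, and the helper names `pair_matrix` and `wedge_tilde` are ours. It builds $B^{12}$, deletes each row in turn, and checks that every nonzero vector $\widetilde{\wedge_5 B^{12}_i}$ is a multiple of the world point $X$.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    """Matrix-vector product A X with exact rational entries."""
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    """The 6x6 matrix B^{jk} = [[A_j, u_j, 0], [A_k, 0, u_k]] of equation (1)."""
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def wedge_tilde(B, i):
    """First four signed maximal minors of B with row i (0-based) deleted.

    The full vector w_c = (-1)^c * det(B_i without column c) lies in the kernel
    of the 5x6 matrix B_i; it is nonzero exactly when B_i has rank 5.
    """
    M = [row for r, row in enumerate(B) if r != i]
    return [(-1) ** c * det([row[:c] + row[c + 1:] for row in M]) for c in range(4)]


def proportional(p, q):
    """True if the 4-vectors p and q are linearly dependent."""
    return all(p[a] * q[b] == p[b] * q[a] for a in range(4) for b in range(4))


# Hypothetical cameras [M | -M c] with focal points (0,0,0,1) and (1,0,0,1).
A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]

X = [1, 2, 3, 1]                       # world point off the baseline (the X_0-axis)
u1, u2 = apply(A1, X), apply(A2, X)
B = pair_matrix(A1, A2, u1, u2)
candidates = [wedge_tilde(B, i) for i in range(6)]
nonzero = [w for w in candidates if any(w)]
X_rec = [x / nonzero[0][3] for x in nonzero[0]]   # normalize the last coordinate
```

Normalizing by the last coordinate recovers $X = (1,2,3,1)$ exactly; for special data some of the six vectors may vanish, which is why the sketch keeps only the nonzero ones.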

We note that, in most practical applications, the data will be noisy, in which case triangulation requires techniques from optimization [1].

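The multiview variety $V_A$ of the camera configuration $A = (A_1,\ldots,A_n)$ was defined in [3] as the closure of the image of the rational map

$$\phi_A\colon \mathbb{P}^3\dashrightarrow\mathbb{P}^2\times\mathbb{P}^2\times\cdots\times\mathbb{P}^2,\quad X\mapsto(A_1X,A_2X,\ldots,A_nX). \tag{3}$$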

The points $(u_1,\ldots,u_n)\in V_A$ are the consistent views in $n$ cameras. The prime ideal $I_A$ of $V_A$ was determined in [3, Corollary 2.7]. It is generated by the $\binom{n}{2}$ bilinear polynomials $\det(B^{jk})$ plus $\binom{n}{3}$ further trilinear polynomials. See [8] for the natural generalization of this variety to higher dimensions.

The analysis in [3] was restricted to a single world point $X\in\mathbb{P}^3$. In this paper we study the case of two world points $X,Y\in\mathbb{P}^3$ that are linked by a distance constraint. Consider the hypersurface $V(Q)$ in $\mathbb{P}^3\times\mathbb{P}^3$ defined by

$$Q = (X_0Y_3-Y_0X_3)^2+(X_1Y_3-Y_1X_3)^2+(X_2Y_3-Y_2X_3)^2-X_3^2Y_3^2. \tag{4}$$

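The real affine part of $V(Q)$ in $\mathbb{R}^3\times\mathbb{R}^3$ (set $X_3 = Y_3 = 1$) consists of pairs of points whose Euclidean distance is $1$. The rigid multiview map is the rational map

$$\psi_A\colon V(Q)\hookrightarrow\mathbb{P}^3\times\mathbb{P}^3\dashrightarrow(\mathbb{P}^2)^n\times(\mathbb{P}^2)^n,\quad (X,Y)\mapsto\bigl((A_1X,\ldots,A_nX),(A_1Y,\ldots,A_nY)\bigr). \tag{5}$$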

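The rigid multiview variety $\Gamma_A$ is the closure of the image of this map. This is a $5$-dimensional subvariety of $(\mathbb{P}^2)^{2n}$. Its multihomogeneous prime ideal $J_A$ lives in the polynomial ring $\mathbb{R}[u,v]$ in the $6n$ variables $u_{i0},u_{i1},u_{i2},v_{i0},v_{i1},v_{i2}$ for $i=1,\ldots,n$, where $(u_{i0}:u_{i1}:u_{i2})$ and $(v_{i0}:v_{i1}:v_{i2})$ are coordinates for the $i$th factor on the left respectively right in $(\mathbb{P}^2)^n\times(\mathbb{P}^2)^n$. Our aim is to determine the ideal $J_A$. Knowing generators of $J_A$ could be useful for designing optimization tools as in [1] for triangulation in the presence of distance constraints.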

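The choice of world and image coordinates for the camera configuration $A$ gives our problem the following group symmetries. Let $g$ be an element of the Euclidean group of motions of $\mathbb{R}^3$, which is generated by rotations and translations, acting on $\mathbb{P}^3$ by $4\times4$-matrices. We may multiply the camera configuration on the right by $g$ to obtain $Ag = (A_1g,\ldots,A_ng)$. Then $J_{Ag} = J_A$, since $Q$ is invariant under $g$, that is, $Q(gX,gY) = Q(X,Y)$. For $h = (h_1,\ldots,h_n)\in\mathrm{PGL}(3,\mathbb{R})^n$, we may multiply $A$ on the left to obtain $hA = (h_1A_1,\ldots,h_nA_n)$. Then $J_{hA}$ is obtained from $J_A$ by the linear change of coordinates $u_i\mapsto h_i^{-1}u_i$, $v_i\mapsto h_i^{-1}v_i$.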
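Both the affine meaning of (4) and its Euclidean invariance can be spot-checked with exact arithmetic. In the sketch below, the function names are illustrative, $g$ is a hypothetical Euclidean motion (a rotation built from the $3$-$4$-$5$ triangle followed by a translation), and a scaling matrix, which is not a Euclidean motion, is included for contrast. The identity $Q(X,Y) = X_3^2Y_3^2\,(\lVert x-y\rVert^2-1)$ for the affine points $x,y$ is what makes $V(Q)$ the unit-distance locus.

```python
from fractions import Fraction


def Q(X, Y):
    """The biquadratic form (4) on P^3 x P^3 (unit distance constraint)."""
    return (sum((X[a] * Y[3] - Y[a] * X[3]) ** 2 for a in range(3))
            - X[3] ** 2 * Y[3] ** 2)


def squared_distance(x, y):
    """Squared Euclidean distance between two affine points of R^3."""
    return sum((Fraction(a) - Fraction(b)) ** 2 for a, b in zip(x, y))


def transform(g, X):
    """Apply a 4x4 matrix g to the homogeneous coordinates of X."""
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in g]


# Hypothetical Euclidean motion: rotation in the (X_0, X_1)-plane built from the
# 3-4-5 triangle, followed by the translation (2, -1, 7).
g = [[Fraction(3, 5), Fraction(-4, 5), 0, 2],
     [Fraction(4, 5), Fraction(3, 5), 0, -1],
     [0, 0, 1, 7],
     [0, 0, 0, 1]]
# Scaling of the first axis: projective, but not a Euclidean motion.
s = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

pairs = [([1, 2, 3, 1], [5, -1, 2, 3]),
         ([0, 0, 0, 1], [1, 1, 1, 1]),
         ([2, 4, 6, 2], [8, 14, 15, 5])]     # the last pair has distance exactly 1
```

Under the scaling $s$, the unit-distance pair $(0,0,0)$, $(1,0,0)$ is moved to distance $2$, so $Q$ is not preserved; this is why a non-Euclidean change of world coordinates also requires changing $Q$.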

This paper is organized as follows. In Section 2 we present the explicit computation of the rigid multiview ideal $J_A$ for $n\le 4$. Our main result, to be stated and proved in Section 3, is a system of equations that cuts out the rigid multiview variety $\Gamma_A$ for any $n$. Section 4 is devoted to generalizations. The general idea is to replace $V(Q)$ by arbitrary subvarieties of $(\mathbb{P}^3)^m$ that represent polynomial constraints on $m$ world points. We focus on scenarios that are of interest in applications to computer vision.

Our results in Propositions 1, 3, 4 and Corollary 2 are proved by computations with Macaulay2 [4]; for details see Appendix A. Following standard practice in computational algebraic geometry, we carry out the computation on many samples in a Zariski dense set of parameters, and then conclude that the result holds generically.

2. Two, Three and Four Cameras

In this section we offer a detailed case study of the rigid multiview variety when the number $n$ of cameras is small. We begin with the case $n=2$. The prime ideal $J_A$ lives in the polynomial ring $\mathbb{R}[u,v]$ in $12$ variables. This is the homogeneous coordinate ring of $(\mathbb{P}^2)^4$, so it is naturally $\mathbb{Z}^4$-graded. The variables $u_1 = (u_{10},u_{11},u_{12})$ have degree $(1,0,0,0)$, the variables $u_2$ have degree $(0,1,0,0)$, the variables $v_1$ have degree $(0,0,1,0)$, and the variables $v_2$ have degree $(0,0,0,1)$. Our ideal $J_A$ is $\mathbb{Z}^4$-homogeneous.

Throughout this section we shall assume that the camera configuration $A$ is generic in the sense of algebraic geometry. This means that $A$ lies in the complement of a certain (unknown) proper algebraic subvariety in the affine space of all $n$-tuples of $3\times4$-matrices. All our results in Section 2 were obtained by symbolic computations with sufficiently many random choices of $A$ (see Appendix A for details). Such random choices of camera matrices are generic: they avoid the exceptional subvariety with probability $1$.

Proposition 1.

For $n=2$, the rigid multiview ideal $J_A$ is minimally generated by eleven $\mathbb{Z}^4$-homogeneous polynomials in twelve variables, one of degree $(1,1,0,0)$, one of degree $(0,0,1,1)$, and nine of degree $(2,2,2,2)$.

We prove this result by sufficiently many random computations with Macaulay2. A slightly simplified version of the code is shown in Listing 1 in Appendix A.

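Let us look at the result in more detail. The first two bilinear generators are the familiar $6\times6$-determinants

$$\det\begin{bmatrix} A_1 & u_1 & 0\\ A_2 & 0 & u_2\end{bmatrix}\quad\text{and}\quad\det\begin{bmatrix} A_1 & v_1 & 0\\ A_2 & 0 & v_2\end{bmatrix}. \tag{6}$$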

These cut out two copies of the multiview threefold $V_A$, in separate variables, for $u=(u_1,u_2)$ and $v=(v_1,v_2)$. If we write the two bilinear forms in (6) as $u_2^\top Fu_1$ and $v_2^\top Fv_1$, then $F$ is a real $3\times3$-matrix of rank $2$, known as the fundamental matrix [5, §9] of the camera pair $(A_1,A_2)$.
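Because the determinant in (6) is linear in $u_1$ and in $u_2$, the entries of $F$ can be read off by plugging in unit vectors. The sketch below does this for the hypothetical integer cameras with focal points $(0,0,0,1)$ and $(1,0,0,1)$ used in the triangulation sketch (helpers repeated so the block runs on its own; `fundamental_matrix` is our name). It checks that $F$ is singular, that it kills the epipole $e_{1\leftarrow2} = A_1f_2$, and that $u_2^\top Fu_1$ vanishes on consistent image pairs.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    """Matrix-vector product A X with exact rational entries."""
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    """The 6x6 matrix [[A_j, u_j, 0], [A_k, 0, u_k]] whose determinant is (6)."""
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def fundamental_matrix(A1, A2):
    """3x3 matrix F with u2^T F u1 = det B^{12}(u1, u2), read off on unit vectors."""
    e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    return [[det(pair_matrix(A1, A2, e[b], e[a])) for b in range(3)] for a in range(3)]


def bilinear(F, u1, u2):
    """Evaluate the bilinear form u2^T F u1."""
    return sum(u2[a] * F[a][b] * u1[b] for a in range(3) for b in range(3))


A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]       # hypothetical cameras
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]
F = fundamental_matrix(A1, A2)
e12 = apply(A1, [1, 0, 0, 1])                          # epipole: image of f_2 in camera 1
```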

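The rigid multiview variety $\Gamma_A$ is a divisor in $V_A\times V_A$. The nine octics that cut out this divisor can be understood as follows. We write $B$ and $C$ for the $6\times6$-matrices in (6), and $B_i$ and $C_i$ for the $5\times6$-matrices obtained by deleting their $i$th rows. The kernels of these $5\times6$-matrices are represented, via Cramer's Rule, by $\wedge_5B_i$ and $\wedge_5C_i$. We write $\widetilde{\wedge_5B_i}$ and $\widetilde{\wedge_5C_i}$ for the vectors given by their first four entries. As in (2), these represent the two world points $X$ and $Y$ in $\mathbb{P}^3$. Their coordinates are bilinear forms in $(u_1,u_2)$ or $(v_1,v_2)$, where each coefficient is a $3\times3$-minor of $\begin{bmatrix}A_1\\ A_2\end{bmatrix}$. For instance, writing $a_{rsi}$ for the $(r,s)$ entry of $A_i$, the first coordinate of $\widetilde{\wedge_5B_1}$ is

$$\begin{aligned}
&-(a_{321}a_{232}a_{342}-a_{321}a_{242}a_{332}-a_{331}a_{222}a_{342}+a_{331}a_{242}a_{322}+a_{341}a_{222}a_{332}-a_{341}a_{232}a_{322})\,u_{11}u_{20}\\
&+(a_{321}a_{132}a_{342}-a_{321}a_{142}a_{332}-a_{331}a_{122}a_{342}+a_{331}a_{142}a_{322}+a_{341}a_{122}a_{332}-a_{341}a_{132}a_{322})\,u_{11}u_{21}\\
&-(a_{321}a_{132}a_{242}-a_{321}a_{142}a_{232}-a_{331}a_{122}a_{242}+a_{331}a_{142}a_{222}+a_{341}a_{122}a_{232}-a_{341}a_{132}a_{222})\,u_{11}u_{22}\\
&+(a_{221}a_{232}a_{342}-a_{221}a_{242}a_{332}-a_{231}a_{222}a_{342}+a_{231}a_{242}a_{322}+a_{241}a_{222}a_{332}-a_{241}a_{232}a_{322})\,u_{12}u_{20}\\
&-(a_{221}a_{132}a_{342}-a_{221}a_{142}a_{332}-a_{231}a_{122}a_{342}+a_{231}a_{142}a_{322}+a_{241}a_{122}a_{332}-a_{241}a_{132}a_{322})\,u_{12}u_{21}\\
&+(a_{221}a_{132}a_{242}-a_{221}a_{142}a_{232}-a_{231}a_{122}a_{242}+a_{231}a_{142}a_{222}+a_{241}a_{122}a_{232}-a_{241}a_{132}a_{222})\,u_{12}u_{22}.
\end{aligned}$$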

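Recall that the two world points in $\mathbb{P}^3$ are linked by the distance constraint (4), expressed as a biquadratic polynomial $Q(X,Y)$. We set $Q(X,Y) = T(X,X,Y,Y)$, where $T$ is a quadrilinear form. We regard $T$ as a $4\times4\times4\times4$ tensor of order $4$. It lives in the subspace $S^2(\mathbb{R}^4)\otimes S^2(\mathbb{R}^4)$ of $(\mathbb{R}^4)^{\otimes4}$. Here $S^d(\mathbb{R}^4)$ denotes the space of symmetric tensors of order $d$ on $\mathbb{R}^4$.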

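We now substitute our Cramer's Rule formulas for $X$ and $Y$ into the quadrilinear form $T$. For any choice of indices $i,j,k,l\in\{1,\ldots,6\}$,

$$T\bigl(\widetilde{\wedge_5B_i},\widetilde{\wedge_5B_j},\widetilde{\wedge_5C_k},\widetilde{\wedge_5C_l}\bigr) \tag{7}$$

is a multihomogeneous polynomial in $(u,v)$ of degree $(2,2,2,2)$. On $\Gamma_A$, each $\widetilde{\wedge_5B_i}$ is a multiple of $X$ and each $\widetilde{\wedge_5C_k}$ is a multiple of $Y$, so (7) restricts to a multiple of $T(X,X,Y,Y) = Q(X,Y) = 0$. Hence this polynomial lies in $J_A$ but not in the ideal of $V_A\times V_A$, so it can serve as one of the nine minimal generators described in Proposition 1.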

The number of distinct polynomials appearing in (7) equals $21^2 = 441$, since $T$ is symmetric in its first two and in its last two arguments. A computation verifies that the image of their real linear span modulo the degree $(2,2,2,2)$ component of the ideal $I_A(u)+I_A(v)$ has dimension $9$, matching the nine octic generators in Proposition 1.
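The vanishing of (7) on $\Gamma_A$ and the count $441$ can both be spot-checked. The sketch below uses the hypothetical cameras and helpers from the triangulation sketch (repeated so that the block runs on its own), the polarization $T$ of $Q$, one pair of world points at distance $1$ and one pair at distance $\sqrt2$. It evaluates all $441$ octics (7) at the corresponding image points; this tests points only and is not a proof of ideal membership.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def wedge_tilde(B, i):
    """First four signed maximal minors of B with row i (0-based) deleted."""
    M = [row for r, row in enumerate(B) if r != i]
    return [(-1) ** c * det([row[:c] + row[c + 1:] for row in M]) for c in range(4)]


def T(X, Xp, Y, Yp):
    """Polarization of (4): symmetric in (X, Xp) and (Y, Yp), with T(X, X, Y, Y) = Q(X, Y)."""
    def L(a, P, R):
        return P[a] * R[3] - R[a] * P[3]
    half = Fraction(1, 2)
    squares = sum(half * (L(a, X, Y) * L(a, Xp, Yp) + L(a, X, Yp) * L(a, Xp, Y))
                  for a in range(3))
    return squares - X[3] * Xp[3] * Y[3] * Yp[3]


def tilde_vectors(Aj, Ak, P):
    """The six vectors wedge_tilde(B^{jk}, i) built from the images of P."""
    B = pair_matrix(Aj, Ak, apply(Aj, P), apply(Ak, P))
    return [wedge_tilde(B, i) for i in range(6)]


A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]        # hypothetical cameras
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]

X = [1, 2, 3, 1]
Y_rigid = [8, 14, 15, 5]      # affine point (8/5, 14/5, 3), at distance 1 from X
Y_loose = [2, 3, 3, 1]        # affine point (2, 3, 3), at distance sqrt(2) from X
tB = tilde_vectors(A1, A2, X)

# T is symmetric in (i, j) and in (k, l), so unordered index pairs enumerate (7).
index_pairs = list(combinations_with_replacement(range(6), 2))


def octics(Y):
    tC = tilde_vectors(A1, A2, Y)
    return [T(tB[i], tB[j], tC[k], tC[l])
            for (i, j), (k, l) in product(index_pairs, repeat=2)]


rigid_values, loose_values = octics(Y_rigid), octics(Y_loose)
```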

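We record three more features of the rigid multiview variety with $n=2$ cameras. The first is the multidegree [9, §8.5], or, equivalently, the cohomology class of $\Gamma_A$ in $H^*((\mathbb{P}^2)^4,\mathbb{Z}) = \mathbb{Z}[u_1,u_2,v_1,v_2]/\langle u_1^3,u_2^3,v_1^3,v_2^3\rangle$. It equals

$$2u_1^2v_1+2u_1u_2v_1+2u_2^2v_1+2u_1^2v_2+2u_1u_2v_2+2u_2^2v_2+2u_1v_1^2+2u_1v_1v_2+2u_1v_2^2+2u_2v_1^2+2u_2v_1v_2+2u_2v_2^2.$$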

This is found with the built-in command multidegree in Macaulay2.

The second is the table of the Betti numbers of the minimal free resolution of $J_A$ in the format of Macaulay2 [4]. In that format, the columns correspond to the syzygy modules, while the entry in row $r$ and column $i$ counts the $i$th syzygies of total degree $i+r$. For $n=2$ we obtain

                0  1  2  3  4  5
        total:  1 11 25 22  8  1
            0:  1  .  .  .  .  .
            1:  .  2  .  .  .  .
            2:  .  .  1  .  .  .
            7:  .  9 24 22  8  1

The column labeled 1 lists the minimal generators from Proposition 1. Since the codimension of $\Gamma_A$ is $3$ while the resolution has length $5$, the table shows that $J_A$ is not Cohen-Macaulay. The unique $5$th syzygy has degree $(3,3,3,3)$ in the $\mathbb{Z}^4$-grading.

The third point is an explicit choice for the nine generators of degree $(2,2,2,2)$ in Proposition 1. Namely, we take $i=j$ and $k=l$ in (7). The following corollary is also found by computation:

Corollary 2.

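The rigid multiview ideal $J_A$ for $n=2$ is generated by $I_A(u)+I_A(v)$ together with the nine polynomials $Q\bigl(\widetilde{\wedge_5B_i},\widetilde{\wedge_5C_k}\bigr)$ for $1\le i,k\le 3$.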
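The Betti table and the multidegree recorded above can be cross-checked against each other by hand. Coarsening to the standard $\mathbb{Z}$-grading, the K-polynomial $K(t) = \sum_{i,j}(-1)^i\beta_{i,j}t^j$ must be divisible by $(1-t)^3$, because $\Gamma_A$ has codimension $3$, and the quotient $h(t)$ must satisfy that $h(1)$ equals the sum of the coefficients of the multidegree (a standard fact: the multidegree is the lowest-order part of the K-polynomial after substituting $t_i = 1-u_i$). The plain-Python sketch below uses only the numbers printed above. Since a Cohen-Macaulay standard graded algebra has an $h$-polynomial with nonnegative coefficients, the coefficient $-5$ that appears independently confirms that $J_A$ is not Cohen-Macaulay.

```python
# Betti table of J_A for n = 2 (Macaulay2 format): key (i, r) is column i, row r,
# and the entry counts i-th syzygies in total degree i + r.
betti = {(0, 0): 1, (1, 1): 2, (2, 2): 1,
         (1, 7): 9, (2, 7): 24, (3, 7): 22, (4, 7): 8, (5, 7): 1}

# Multidegree of Gamma_A: exponents of (u1, u2, v1, v2) -> coefficient.
multidegree = {
    (2, 0, 1, 0): 2, (1, 1, 1, 0): 2, (0, 2, 1, 0): 2,
    (2, 0, 0, 1): 2, (1, 1, 0, 1): 2, (0, 2, 0, 1): 2,
    (1, 0, 2, 0): 2, (1, 0, 1, 1): 2, (1, 0, 0, 2): 2,
    (0, 1, 2, 0): 2, (0, 1, 1, 1): 2, (0, 1, 0, 2): 2,
}


def divide_by_one_minus_t(coeffs):
    """Return (quotient, remainder) with p(t) = (1 - t) * quotient(t) + remainder.

    Coefficient lists are ordered from degree 0 upwards.
    """
    n = len(coeffs) - 1
    q = [0] * n                        # synthetic division of p(t) by (t - 1)
    q[n - 1] = coeffs[n]
    for k in range(n - 1, 0, -1):
        q[k - 1] = coeffs[k] + q[k]
    return [-c for c in q], coeffs[0] + q[0]


# K-polynomial in the coarse Z-grading: K(t) = sum of (-1)^i * beta_{i,j} * t^j.
K = [0] * 13
for (i, r), b in betti.items():
    K[i + r] += (-1) ** i * b

h, remainders = K, []
for _ in range(3):                     # codimension 3: divide by (1 - t) three times
    h, rem = divide_by_one_minus_t(h)
    remainders.append(rem)
```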

We next come to the case of three cameras:

Proposition 3.

For $n=3$, the rigid multiview ideal $J_A$ is minimally generated by $177$ polynomials in $18$ variables. Its Betti table is given in Table 1.

Proposition 3 is proved by computation. The generators occur in eight symmetry classes of multidegrees. The number of generators in each multidegree of these classes is

$$\begin{array}{llll}
(110000):\ 1 & (220111):\ 3 & (220220):\ 9 & (211211):\ 1\\
(111000):\ 1 & (211111):\ 1 & (220211):\ 3 & (111111):\ 1
\end{array}$$

For instance, there are nine generators in degree $(2,2,0,2,2,0)$, arising from Proposition 1 for the first two cameras. Using various pairs among the three cameras when forming the matrices $B$ and $C$ in (7), we can construct the generators of further degree classes in the list above.

Table 1 shows the Betti table for $J_A$ in Macaulay2 format. The first two entries (6 and 2) in the 1-column refer to the eight minimal generators of $I_A(u)+I_A(v)$. These are six bilinear forms, representing the three fundamental matrices, and two trilinear forms, representing the trifocal tensor of the three cameras (cf. [2], [5, §15]). The entry 1 in row 5 of column 1 marks the unique sextic generator of $J_A$, which has $\mathbb{Z}^6$-degree $(1,1,1,1,1,1)$.

For the case of four cameras we obtain the following result.

Proposition 4.

For $n=4$, the rigid multiview ideal $J_A$ is minimally generated by $1176$ polynomials in $24$ variables. All of them are induced from $n=3$. Up to symmetry, the degrees of the generators in the $\mathbb{Z}^8$-grading are

$$\begin{array}{llll}
(11000000):\ 1 & (22001110):\ 3 & (22002200):\ 9 & (21102110):\ 1\\
(11100000):\ 1 & (21101110):\ 1 & (22002110):\ 3 & (11101110):\ 1
\end{array}$$

We next give a brief explanation of how the rigid multiview ideals were computed with Macaulay2 [4]. For the purpose of efficiency, we introduce projective coordinates for the image points and affine coordinates for the world points. We work in the corresponding polynomial ring

$$\mathbb{Q}[u,v][X_0,X_1,X_2,Y_0,Y_1,Y_2].$$

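The rigid multiview map $\psi_A$ is thus restricted to the affine chart $\mathbb{R}^3\times\mathbb{R}^3$ of $\mathbb{P}^3\times\mathbb{P}^3$. The prime ideal of its graph is generated by the following two classes of polynomials:

1. the $2\times2$ minors of the $3\times2$ matrices $\bigl[\,A_i(X_0,X_1,X_2,1)^\top\ \ u_i\,\bigr]$ and $\bigl[\,A_i(Y_0,Y_1,Y_2,1)^\top\ \ v_i\,\bigr]$ for $i=1,\ldots,n$;

2. the dehomogenized distance constraint

 $$Q\bigl((X_0,X_1,X_2,1)^\top,(Y_0,Y_1,Y_2,1)^\top\bigr).$$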

From this ideal we eliminate the six world coordinates $X_0,X_1,X_2,Y_0,Y_1,Y_2$.
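The two classes of generators are easy to evaluate at a point. In the sketch below (hypothetical cameras and helper names, exact rational arithmetic), the $2\times2$ minors of $[A_iX\mid u_i]$ are computed as the entries of a cross product; these agree with the minors up to sign and vanish exactly when $u_i$ is a multiple of $A_iX$. The elimination of $X_0,\ldots,Y_2$ itself is done in Macaulay2 and is not reproduced here.

```python
from fractions import Fraction


def apply(A, X):
    """Matrix-vector product A X with exact rational entries."""
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def Q(X, Y):
    """The biquadratic distance form (4)."""
    return (sum((X[a] * Y[3] - Y[a] * X[3]) ** 2 for a in range(3))
            - X[3] ** 2 * Y[3] ** 2)


def cross_minors(a, u):
    """The 2x2 minors of the 3x2 matrix [a | u], up to sign: the cross product a x u."""
    return [a[1] * u[2] - a[2] * u[1],
            a[2] * u[0] - a[0] * u[2],
            a[0] * u[1] - a[1] * u[0]]


def graph_generators(cameras, x, y, us, vs):
    """Values of the generators 1. and 2. at affine world points x, y and images us, vs."""
    X, Y = list(x) + [1], list(y) + [1]
    values = []
    for A, u, v in zip(cameras, us, vs):
        values += cross_minors(apply(A, X), u) + cross_minors(apply(A, Y), v)
    values.append(Q(X, Y))             # dehomogenized distance constraint
    return values


cameras = [[[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],       # hypothetical A_1
           [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]]     # hypothetical A_2
x, y = [1, 2, 3], [Fraction(8, 5), Fraction(14, 5), 3]      # at distance 1
us = [[7 * c for c in apply(A, x + [1])] for A in cameras]  # any nonzero rescaling works
vs = [apply(A, y + [1]) for A in cameras]
on_graph = graph_generators(cameras, x, y, us, vs)
off_graph = graph_generators(cameras, x, y, [[1, 0, 0]] + us[1:], vs)
```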

For a speed up, we exploit the group actions described in Section 1. We replace $A$ by $hAg$ and $Q(X,Y)$ by $Q(gX,gY)$. Here $h\in\mathrm{PGL}(3,\mathbb{R})^n$ and $g\in\mathrm{PGL}(4,\mathbb{R})$ are chosen so that $hAg$ is sparse. The modification to $Q$ is needed since we generally use a matrix $g$ that is not a Euclidean motion. The elimination above now computes the ideal $J_{hA}$, which differs from $J_A$ only by a linear change of coordinates, and it terminates much faster. For example, in one of our cases the computation took two minutes for sparse $hAg$ and more than one hour for non-sparse $A$. In another case, Macaulay2 ran out of memory after 18 hours of CPU time for non-sparse $A$. The complete code used in this paper can be accessed via http://www3.math.tu-berlin.de/combi/dmg/data/rigidMulti/.
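The sparsification step can be sketched as follows, with the caveat that the paper does not say how $g$ and $h$ were chosen. In this hypothetical choice, $h$ is the identity and $g = S^{-1}$, where $S$ stacks a dense camera $A_1$ on top of the row $(0,0,0,1)$, so that $A_1g = [I_3\mid 0]$. World points change to $g^{-1}X = SX$, image points stay the same, and because $g$ is not a Euclidean motion the constraint becomes $Q(gX,gY)$; the helper names `inverse` and `Q_g` are ours.

```python
from fractions import Fraction


def Q(X, Y):
    """The biquadratic distance form (4)."""
    return (sum((X[a] * Y[3] - Y[a] * X[3]) ** 2 for a in range(3))
            - X[3] ** 2 * Y[3] ** 2)


def apply(A, X):
    """Matrix-vector product A X with exact rational entries."""
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def matmul(P, R):
    """Product of two matrices given as lists of rows."""
    return [[sum(Fraction(P[i][k]) * R[k][j] for k in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(P))]


def inverse(m):
    """Inverse of a square matrix by Gauss-Jordan elimination over Q."""
    n = len(m)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        a[c], a[pivot] = a[pivot], a[c]
        scale = a[c][c]
        a[c] = [x / scale for x in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0:
                factor = a[r][c]
                a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


A1 = [[2, 1, 0, 1], [0, 3, 1, 2], [1, 0, 1, 4]]   # hypothetical dense camera
S = A1 + [[0, 0, 0, 1]]                            # extra row making S invertible
g = inverse(S)                                     # then A1 g = [I | 0]
A1g = matmul(A1, g)


def Q_g(X, Y):
    """Modified distance constraint Q(gX, gY) in the new world coordinates."""
    return Q(apply(g, X), apply(g, Y))
```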

One last question is whether the Gröbner basis property in [3, §2] extends to the rigid case. This does not seem to be the case in general. Only in Proposition 1 can we choose minimal generators that form a Gröbner basis.

Remark 5.

Let $n=2$. The reduced Gröbner basis of $J_A$ in the reverse lexicographic term order is a minimal generating set. For a generic choice of cameras the initial ideal equals

For special cameras the exact form of the initial ideal may change. However, up to symmetry the degrees of the generators in the $\mathbb{Z}^4$-grading stay the same. In general, a universal Gröbner basis for the rigid multiview ideal consists of octics of degree $(2,2,2,2)$ plus the two quadrics (6). This was verified using the Gfan [6] package in Macaulay2. Analogous statements do not hold for $n\ge3$.

3. Equations for the Rigid Multiview Variety

The computations presented in Section 2 suggest the following conjecture.

Conjecture 6.

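The rigid multiview ideal $J_A$ is minimally generated by $\bigl(\tfrac16 n(n-1)(4n+1)\bigr)^2+\tfrac13(n-1)n(n+1)$ polynomials. These polynomials come from two triples of cameras, and their number per class of degrees is

$$\begin{array}{ll}
(110..000..):\ 1\cdot2\binom{n}{2} & (220..111..):\ 3\cdot2\binom{n}{2}\binom{n}{3}\\[4pt]
(220..220..):\ 9\cdot\binom{n}{2}^2 & (211..211..):\ 1\cdot n^2\binom{n-1}{2}^2\\[4pt]
(111..000..):\ 1\cdot2\binom{n}{3} & (211..111..):\ 1\cdot2n\binom{n-1}{2}\binom{n}{3}\\[4pt]
(220..211..):\ 3\cdot2n\binom{n}{2}\binom{n-1}{2} & (111..111..):\ 1\cdot\binom{n}{3}^2
\end{array}$$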

At the moment we have a computational proof only up to $n=4$. Table 2 offers a summary of the corresponding numbers of generators.
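The class counts in Conjecture 6 can be reproduced by enumerating multidegrees. The sketch below is our own bookkeeping: it takes the per-multidegree multiplicities $1,3,9,1,1,1,3,1$ from the display above, builds each symmetry class by permuting the $n$ cameras independently on the $u$-side and on the $v$-side and by swapping $u$ with $v$, and sums. It gives $11$ for $n=2$, $177$ for $n=3$ and $1176$ for $n=4$, and agrees with the closed form $\bigl(\tfrac16n(n-1)(4n+1)\bigr)^2+\tfrac13(n-1)n(n+1)$.

```python
from itertools import permutations

# Symmetry classes of Conjecture 6 as (u-pattern, v-pattern, generators per
# multidegree); trailing zeros are implicit, so (2, 2) stands for (220..).
CLASSES = [
    ((1, 1), (), 1), ((2, 2), (1, 1, 1), 3), ((2, 2), (2, 2), 9),
    ((2, 1, 1), (2, 1, 1), 1), ((1, 1, 1), (), 1), ((2, 1, 1), (1, 1, 1), 1),
    ((2, 2), (2, 1, 1), 3), ((1, 1, 1), (1, 1, 1), 1),
]


def orbit(u_pattern, v_pattern, n):
    """Multidegrees in Z^n x Z^n from permuting cameras on each side and swapping u, v."""
    if max(len(u_pattern), len(v_pattern)) > n:
        return set()                   # the class needs more cameras than n

    def pad(p):
        return tuple(p) + (0,) * (n - len(p))

    us = set(permutations(pad(u_pattern)))
    vs = set(permutations(pad(v_pattern)))
    degrees = {(a, b) for a in us for b in vs}
    return degrees | {(b, a) for a, b in degrees}


def conjectured_count(n):
    """Total number of minimal generators predicted by the class table."""
    return sum(k * len(orbit(u, v, n)) for u, v, k in CLASSES)


def closed_form(n):
    return (n * (n - 1) * (4 * n + 1) // 6) ** 2 + (n - 1) * n * (n + 1) // 3
```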

Conjecture 6 implies that $\Gamma_A$ is set-theoretically defined by the equations coming from triples of cameras. It turns out that, for the set-theoretic description, pairs of cameras suffice. The following is our main result:

Theorem 7.

Suppose that the focal points of $A_1,\ldots,A_n$ are in general position in $\mathbb{P}^3$. The rigid multiview variety $\Gamma_A$ is cut out as a subset of $V_A\times V_A$ by the octic generators of degree class $(220..220..)$. In other words, equations coming from any two pairs of cameras suffice set-theoretically.

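With notation as in the introduction, the relevant octic polynomials are

$$T\bigl(\widetilde{\wedge_5B^{j_1k_1}_{i_1}},\widetilde{\wedge_5B^{j_1k_1}_{i_2}},\widetilde{\wedge_5C^{j_2k_2}_{i_3}},\widetilde{\wedge_5C^{j_2k_2}_{i_4}}\bigr),$$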

for all possible choices of indices. Let $H_A$ denote the ideal generated by these polynomials in $\mathbb{R}[u,v]$, the polynomial ring in $6n$ variables. As before, we write $I_A(u)+I_A(v)$ for the prime ideal that defines the $6$-dimensional variety $V_A\times V_A$ in $(\mathbb{P}^2)^{2n}$. It is generated by $2\binom{n}{2}$ bilinear forms and $2\binom{n}{3}$ trilinear forms, corresponding to fundamental matrices and trifocal tensors. In light of Hilbert's Nullstellensatz, Theorem 7 states that the radical of $I_A(u)+I_A(v)+H_A$ is equal to $J_A$. To prove this, we need a lemma.

A point $u=(u_1,\ldots,u_n)$ in the multiview variety $V_A$ is triangulable if there exists a pair of indices $j<k$ such that the matrix $B^{jk}$ has rank $5$. Equivalently, there exists a pair of cameras for which the unique world point can be found by triangulation. Algebraically, this means $\widetilde{\wedge_5B^{jk}_i}\neq0$ for some $i$.

Lemma 8.

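All points in $V_A$ are triangulable except for the pair of epipoles, $(e_{1\leftarrow2},e_{2\leftarrow1})$, in the case where $n=2$. Here, the rigid multiview variety $\Gamma_A$ contains the threefolds $\{(e_{1\leftarrow2},e_{2\leftarrow1})\}\times V_A$ and $V_A\times\{(e_{1\leftarrow2},e_{2\leftarrow1})\}$.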

Proof.

Let us first consider the case of $n=2$ cameras. The first claim holds because the back-projected lines of the two camera images $u_1$ and $u_2$ always span a plane in $\mathbb{P}^3$ except when $u_1 = e_{1\leftarrow2}$ and $u_2 = e_{2\leftarrow1}$. In that case both back-projected lines agree with the common baseline $\beta_{12}$. Alternatively, we can check algebraically that the variety defined by the $5\times5$-minors of the matrix $B^{12}$ consists of the single point $(e_{1\leftarrow2},e_{2\leftarrow1})$.

For the second claim, fix a generic point $X$ in $\mathbb{P}^3$ and consider the surface

$$X_Q = \{Y\in\mathbb{P}^3 : Q(X,Y) = 0\}. \tag{8}$$

Working over $\mathbb{C}$, the baseline $\beta_{12}$ is either tangent to $X_Q$, or it meets that quadric in exactly two points. Our assumption on the genericity of $X$ implies that no point $Y_X$ in the intersection $\beta_{12}\cap X_Q$ is a focal point. This gives

$$(A_1X,A_2X,A_1Y_X,A_2Y_X) = (A_1X,A_2X,e_{1\leftarrow2},e_{2\leftarrow1}). \tag{9}$$

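The point $(A_1X,A_2X)$ lies in the multiview variety $V_A$. Each generic point in $V_A$ has this form for some $X$. Hence (9) proves the desired inclusion $V_A\times\{(e_{1\leftarrow2},e_{2\leftarrow1})\}\subseteq\Gamma_A$. The other inclusion follows by switching the roles of $X$ and $Y$.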

If there are more than two cameras then for each world point $X$, due to general position of the cameras, there is a pair of cameras such that $X$ avoids the pair's baseline. This shows that each point of $V_A$ is triangulable if $n\ge3$. ∎
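Lemma 8 can be illustrated with the hypothetical cameras from the earlier sketches, whose focal points $f_1 = (0,0,0,1)$ and $f_2 = (1,0,0,1)$ span the baseline $\beta_{12}$ (the $X_0$-axis). At the pair of epipoles every $5\times5$ minor of $B^{12}$ vanishes, so no vector $\widetilde{\wedge_5B^{12}_i}$ recovers a world point, while every point of the baseline is mapped to that same pair.

```python
from fractions import Fraction


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def wedge_tilde(B, i):
    """First four signed maximal minors of B with row i (0-based) deleted."""
    M = [row for r, row in enumerate(B) if r != i]
    return [(-1) ** c * det([row[:c] + row[c + 1:] for row in M]) for c in range(4)]


A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]        # hypothetical cameras
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]
f1, f2 = [0, 0, 0, 1], [1, 0, 0, 1]                   # focal points: A1 f1 = A2 f2 = 0
e12 = apply(A1, f2)                                    # epipole e_{1<-2}
e21 = apply(A2, f1)                                    # epipole e_{2<-1}

B_epi = pair_matrix(A1, A2, e12, e21)
epipole_vectors = [wedge_tilde(B_epi, i) for i in range(6)]

X = [1, 2, 3, 1]                                       # generic point off the baseline
B_gen = pair_matrix(A1, A2, apply(A1, X), apply(A2, X))
generic_vectors = [wedge_tilde(B_gen, i) for i in range(6)]
```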

Proof of Theorem 7.

It follows immediately from the definition of the ideals in question that the following inclusion of varieties holds in $(\mathbb{P}^2)^{2n}$:

$$V(J_A)\subseteq V\bigl(I_A(u)+I_A(v)+H_A\bigr).$$

We prove the reverse inclusion. Let $(u,v)$ be a point in the right hand side.

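Suppose that $u$ and $v$ are both triangulable. Then $u$ has a unique preimage $X$ in $\mathbb{P}^3$, determined by a single camera pair $(A_{j_1},A_{k_1})$. Likewise, $v$ has a unique preimage $Y$ in $\mathbb{P}^3$, also determined by a single camera pair $(A_{j_2},A_{k_2})$. There exist indices $i_1,i_2$ such that

$$X = \widetilde{\wedge_5B^{j_1k_1}_{i_1}}\quad\text{and}\quad Y = \widetilde{\wedge_5C^{j_2k_2}_{i_2}}.$$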

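Suppose that $(u,v)$ is not in $V(J_A)$. Then $Q(X,Y)\neq0$. This implies

$$Q(X,Y) = T(X,X,Y,Y) = T\bigl(\widetilde{\wedge_5B^{j_1k_1}_{i_1}},\widetilde{\wedge_5B^{j_1k_1}_{i_1}},\widetilde{\wedge_5C^{j_2k_2}_{i_2}},\widetilde{\wedge_5C^{j_2k_2}_{i_2}}\bigr)\neq0,$$

and hence $(u,v)\notin V(H_A)$. This is a contradiction to our choice of $(u,v)$.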

It remains to consider the case where $u$ is not triangulable. By Lemma 8, we have $n=2$ and $u = (e_{1\leftarrow2},e_{2\leftarrow1})$, as well as $v\in V_A$ and $(u,v)\in\Gamma_A$. The case where $v$ is not triangulable is symmetric, and this proves the theorem. ∎
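The mechanism of this proof can be mirrored numerically with two different camera pairs. In the sketch below, a third hypothetical camera with focal point $(0,1,0,1)$ is added; the $u$-side uses the pair $(1,2)$, the $v$-side uses the pair $(1,3)$, and all $6^4$ octics of Theorem 7 for these two pairs are evaluated. They vanish when the two world points have distance $1$, and at least one of them is nonzero when they do not.

```python
from fractions import Fraction
from itertools import product


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def wedge_tilde(B, i):
    """First four signed maximal minors of B with row i (0-based) deleted."""
    M = [row for r, row in enumerate(B) if r != i]
    return [(-1) ** c * det([row[:c] + row[c + 1:] for row in M]) for c in range(4)]


def T(X, Xp, Y, Yp):
    """Polarization of (4): symmetric in (X, Xp) and (Y, Yp), with T(X, X, Y, Y) = Q(X, Y)."""
    def L(a, P, R):
        return P[a] * R[3] - R[a] * P[3]
    half = Fraction(1, 2)
    squares = sum(half * (L(a, X, Y) * L(a, Xp, Yp) + L(a, X, Yp) * L(a, Xp, Y))
                  for a in range(3))
    return squares - X[3] * Xp[3] * Y[3] * Yp[3]


def tilde_vectors(Aj, Ak, P):
    B = pair_matrix(Aj, Ak, apply(Aj, P), apply(Ak, P))
    return [wedge_tilde(B, i) for i in range(6)]


# Hypothetical cameras with non-collinear focal points (0,0,0), (1,0,0), (0,1,0).
A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]
A3 = [[1, 0, 1, 0], [1, 1, 0, -1], [0, 1, 2, -1]]

X = [1, 2, 3, 1]
tB = tilde_vectors(A1, A2, X)                   # u-side: camera pair (1, 2)


def theorem7_octics(Y):
    """All 6^4 octics of Theorem 7 with pair (1, 2) for u and pair (1, 3) for v."""
    tC = tilde_vectors(A1, A3, Y)               # v-side: camera pair (1, 3)
    return [T(tB[a], tB[b], tC[c], tC[d]) for a, b, c, d in product(range(6), repeat=4)]


rigid = theorem7_octics([8, 14, 15, 5])         # distance 1 from X
loose = theorem7_octics([2, 3, 3, 1])           # distance sqrt(2) from X
```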

The equations in Theorem 7 are fairly robust, in the sense that they also work for many special position scenarios. However, when the cameras are generic then the number of octics that cut out the divisor $\Gamma_A$ inside $V_A\times V_A$ can be reduced dramatically, namely to $16$.

Corollary 9.

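As a subset of the $6$-dimensional ambient space $V_A\times V_A$, the $5$-dimensional rigid multiview variety $\Gamma_A$ is cut out by $16$ polynomials of degree class $(220..220..)$. One choice of such polynomials is given by

$$\begin{aligned}
&Q\bigl(\widetilde{\wedge_5B^{12}_i},\widetilde{\wedge_5C^{12}_k}\bigr), && Q\bigl(\widetilde{\wedge_5B^{12}_i},\widetilde{\wedge_5C^{13}_k}\bigr),\\
&Q\bigl(\widetilde{\wedge_5B^{13}_i},\widetilde{\wedge_5C^{12}_k}\bigr), && Q\bigl(\widetilde{\wedge_5B^{13}_i},\widetilde{\wedge_5C^{13}_k}\bigr)\qquad\text{for all }1\le i,k\le2.
\end{aligned}$$
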
Proof.

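First we claim that for each triangulable point $u\in V_A$ at least one of the matrices $B^{12}$ or $B^{13}$ has rank $5$, and the same for $v$ with $C^{12}$ or $C^{13}$. We prove this by contradiction. By symmetry between $u$ and $v$, we can assume that both $B^{12}$ and $B^{13}$ have rank at most $4$. Then, by the argument in the proof of Lemma 8, $u_1 = e_{1\leftarrow2}$, $u_1 = e_{1\leftarrow3}$, and hence $e_{1\leftarrow2} = e_{1\leftarrow3}$. However, this last equality of the two epipoles is a contradiction to the hypothesis that the focal points of the cameras are not collinear.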

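Next we claim that if $B^{12}$ has rank $5$ then at least one of the submatrices $B^{12}_1$ or $B^{12}_2$ has rank $5$, and the same for $B^{13}$, $C^{12}$ and $C^{13}$. Note that the bottom $4\times6$ submatrix of $B^{12}$ has rank $4$, since its first four columns are linearly independent, by genericity of $A_1$ and $A_2$. If both $B^{12}_1$ and $B^{12}_2$ had rank at most $4$, then the first and the second row of $B^{12}$ would lie in the span of the bottom four rows, and $B^{12}$ would have rank $4$. The claim follows. ∎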

4. Other Constraints, More Points, and No Labels

In this section we discuss several extensions of our results. A first observation is that there was nothing special about the constraint $Q$ in (4). For instance, fix positive integers $d$ and $e$, and let $Q(X,Y)$ be any irreducible polynomial that is bihomogeneous of degree $(d,e)$. Its variety $V(Q)$ is a hypersurface of degree $(d,e)$ in $\mathbb{P}^3\times\mathbb{P}^3$. The following analogue to Theorem 7 holds, if we define the map $\psi_A$ as in (5).

Theorem 10.

The closure of the image of the map $\psi_A$ is cut out in $V_A\times V_A$ by polynomials of degree class $(dd0..ee0..)$. In other words, the equations coming from any two pairs of cameras suffice set-theoretically.

Proof.

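The tensor $T$ that represents $Q$ now lives in $S^d(\mathbb{R}^4)\otimes S^e(\mathbb{R}^4)$. For each choice of indices, the polynomial

$$T\bigl(\widetilde{\wedge_5B^{j_1k_1}_{i_1}},\ldots,\widetilde{\wedge_5B^{j_1k_1}_{i_d}},\widetilde{\wedge_5C^{j_2k_2}_{i_{d+1}}},\ldots,\widetilde{\wedge_5C^{j_2k_2}_{i_{d+e}}}\bigr)$$

vanishes on the image of $\psi_A$ and has degree class $(dd0..ee0..)$. The proof of Theorem 7 remains valid, where the surface $X_Q$ in (8) is now irreducible of degree $e$ in $\mathbb{P}^3$. Hence these polynomials cut out that image inside $V_A\times V_A$. ∎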

Remark 11.

In the generic case, the number of these polynomials can be reduced to 16, as in Corollary 9.

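Another natural generalization is to consider $m$ world points $X_1,\ldots,X_m$ that are linked by one or several constraints in $(\mathbb{P}^3)^m$. Taking images with $n$ cameras, we obtain a variety which lives in $(\mathbb{P}^2)^{mn}$. For instance, if $X_1,X_2,X_3$ and $X_4$ are constrained to lie on a plane in $\mathbb{P}^3$, then $m=4$ and the resulting variety has dimension $11$ in $(\mathbb{P}^2)^{4n}$. Taking $6\times6$-matrices $B,C,D,E$ as in (1) for the four points, we then form

$$\det\bigl(\widetilde{\wedge_5B_i},\widetilde{\wedge_5C_j},\widetilde{\wedge_5D_k},\widetilde{\wedge_5E_l}\bigr)\qquad\text{for all }1\le i,j,k,l\le6. \tag{10}$$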

For $n=2$ we verified with Macaulay2 that the prime ideal of this variety is generated by a subset of these determinants, along with the four bilinear forms $\det(B)$, $\det(C)$, $\det(D)$, $\det(E)$.
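The determinantal equations (10) can be tested in the same way. In the sketch below (the hypothetical cameras from the earlier sketches, $n=2$), four world points on the plane $X_2 = 3X_3$ give vanishing determinants for all $6^4$ index choices, while moving one point off that plane makes some determinant nonzero. This is a point check, not the Macaulay2 computation of the prime ideal.

```python
from fractions import Fraction
from itertools import product


def det(m):
    """Exact determinant of a square matrix by Gaussian elimination over Q."""
    a = [[Fraction(x) for x in row] for row in m]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def apply(A, X):
    return [sum(Fraction(a) * x for a, x in zip(row, X)) for row in A]


def pair_matrix(Aj, Ak, uj, uk):
    return ([list(Aj[r]) + [uj[r], 0] for r in range(3)]
            + [list(Ak[r]) + [0, uk[r]] for r in range(3)])


def wedge_tilde(B, i):
    """First four signed maximal minors of B with row i (0-based) deleted."""
    M = [row for r, row in enumerate(B) if r != i]
    return [(-1) ** c * det([row[:c] + row[c + 1:] for row in M]) for c in range(4)]


A1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]        # hypothetical cameras
A2 = [[2, 1, 0, -2], [0, 1, 1, 0], [1, 0, 1, -1]]


def tilde_vectors(P):
    """The six vectors wedge_tilde(B^{12}, i) for the images of P in cameras 1, 2."""
    B = pair_matrix(A1, A2, apply(A1, P), apply(A2, P))
    return [wedge_tilde(B, i) for i in range(6)]


def plane_determinants(points):
    """All 6^4 determinants (10) for four world points."""
    tB, tC, tD, tE = (tilde_vectors(P) for P in points)
    return [det([tB[i], tC[j], tD[k], tE[l]]) for i, j, k, l in product(range(6), repeat=4)]


coplanar = [[1, 2, 3, 1], [0, 1, 3, 1], [2, 5, 3, 1], [4, 1, 3, 1]]   # plane X_2 = 3 X_3
tilted = coplanar[:3] + [[4, 1, 4, 1]]                                # last point moved off it
on_plane, off_plane = plane_determinants(coplanar), plane_determinants(tilted)
```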

Proposition 12.

The variety is cut out in $V_A\times V_A\times V_A\times V_A$ by the polynomials from (10). In other words, the equations coming from any two pairs of cameras suffice set-theoretically.

Proof.

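Each polynomial (10) lies in the prime ideal of this variety, since four coplanar points have vanishing $4\times4$ determinant. The proof of Theorem 7 remains valid, where planes take over the role of the surface $X_Q$ in (8): they intersect the baseline in one point each. ∎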

To continue the theme of rigidity, we may impose distance constraints on pairs of points. Fixing a nonzero distance $d_{ij}$ between points $X_i$ and $X_j$ gives

$$Q_{ij} = (X_{i0}X_{j3}-X_{j0}X_{i3})^2+(X_{i1}X_{j3}-X_{j1}X_{i3})^2+(X_{i2}X_{j3}-X_{j2}X_{i3})^2-d_{ij}^2X_{i3}^2X_{j3}^2.$$

We are interested in the image of the variety