The reconstruction of three dimensional depth information given a set of two dimensional input images is a classic problem in computer vision. The class of methods fulfilling this task by inferring local shape from brightness analysis is called photometric methods BKP1986 ; Woehler2013 . They usually employ a static view point and variations in illumination to obtain the 3D structure. Fundamental photometric reconstruction processes are SFS and PS BKP1986 . SFS typically requires a single input image, whereas PS makes use of several input images taken from a fixed view point under different illumination. PS incorporates SFS in the sense that SFS equations applied to each of the input images are integrated into a common PS process in order to obtain the 3D shape. This integrated model is usually formulated as an optimisation task that best explains the input images in terms of a pointwise estimation of shape and appearance.
The pioneer of the PS method was Woodham in 1978 Woodham1978a , see also Horn et al. HWS78 . The mathematical formulation of the PS problem is based on the use of the IIE as in SFS for the individual input images, respectively. The IIE constitutes a relation between the image intensity and the reflectance map. The classic proceeding is thereby to consider Lambert’s law Lambert1760 for modelling the appearance of a shape given information on its geometry and albedo as well as the lighting in a scene. It has been shown that the orientation of a Lambertian surface can be uniquely determined from the resulting appearance variations provided that the surface is illuminated by at least three known, non-coplanar light sources, corresponding to at least three input images Woodham1980 . However, let us also mention the classic work of Kozera Kozera91 as well as Onn and Bruckstein OB90 where refined existence and uniqueness results are presented for the two-image case. As a beneficial aspect beyond the possible estimation of 3D shape, PS enables to compute an albedo map allowing to deal with non-uniform object materials or textured objects in a photographed scene.
As to complete our brief review of some general aspects of PS, let us note that it is possible to extend Woodham’s classic PS model as for instance to non-Lambertian reflectance as e.g. in Bartal2017 ; Ikehata2014a ; KhBoBr17 ; Mecca2016 ; TMDD2016 , or to take into account several types of lighting in a scene, see e.g. BJK07 ; QuDuWuCrLaDu17
. One may also consider a PS approach based on solving partial differential equations (PDEs) corresponding to ratios of the underlying IIEs, see for instanceMeRoCr15 ; TMDD2016 . The latter approach makes it possible to compute the 3D shape directly whereas in most methods following the classic PS setting a field of surface normals is computed which needs to be integrated in another step; see e.g. BBQBD17 for a recent discussion of integration techniques.
Let us turn to the formulation of the PS approach we make use of. At this stage we keep the presentation rather general as we elaborate on the details that are of some importance in the context of applying our optimisation approach in Section 2. However, in order to explain the developments documented in this paper, it is useful to provide some formulae here.
Photometric 3D reconstruction is often formulated as an inverse problem: given an image , the aim is to compute a depth map that best explains the observed grey levels of the data. To this end we use the IIE where represent the coordinates over the reconstruction domain and where denotes the reflectance map BKP1986 . This model describes interactions between the surface and the lighting
. The vectorrepresents reflectance parameters as e.g. the albedo, which can be either known or considered as hidden unknown parameters. For the sake of simplicity, we will consider in this paper only Lambertian reflectance without shadows, and we assume that the lighting of a photographed scene is directional and known. Moreover our camera is assumed to perform an orthographic projection. As in PS several input images , are considered under varying lighting , , the PS problem consists in finding a depth map that best explains all IIE simultaneously:
Our contribution. Our aim is to obtain the solution of the PS problem, see also Figure 1 for an account. We show that estimating the optimal solution necessarily involves non-trivial optimisation methods, even with the simplest models for the reflectance function and the most simple deviations from the model assumptions that may occur, i.e. we consider Lambertian reflectance without shadows and additive, zero-mean Gaussian noise.
To achieve our goal we propose a numerical framework to approximate an optimal solution which can be used to refine classic PS results. Our approach relies on matrix differential theory for analytic derivations and on recent developments in non-convex optimisation. In that novel framework for this class of problems we prove here the convergence of the optimisation method. The theoretical results are supplemented by a thorough numerical investigation that highlights some important observations on the optimisation routine.
The basic procedure of this work has been the subject of our conference paper HoQuBrRa16 , the results of which are mainly contained in the second, third and beginning of the fourth section of this article. Our current paper extends that previous work significantly by providing the mathematical validation of convergence and the extended analysis of the numerical optimisation algorithm. These are also exactly the core contributions of this paper. Moreover, we give a much more detailed description of the matrix calculus framework we employ.
Example input image for PS
Classic PS with integration
2 Construction of our method and more related work
As shown by Woodham Woodham1980 , all surface normals can be estimated in the classic PS model without ambiguity, provided input images and non-coplanar calibrated lighting vectors are given. In addition, the reflectance parameters (e.g. the albedo) can also be estimated. This is usually achieved by minimising the difference between the given data, i.e. the input images and the reprojection according to the estimated normal and albedo:
with a penaliser . As a result, one obtains an approximation of the normal and the albedo at each position .
Since there is no coupling between the normals estimated in two neighboring pixels, those estimates are the optimal local explanations of the image, in the sense of the estimator . Yet, the estimated normal field is in general not integrable. Thus, the depth map that can be obtained by integration is not an optimal image explanation, but only a smooth explanation of the noisy normal field, c.f. Figure 1.
Instead of this pointwise joint estimation of the normal and the albedo, it is, as already mentioned in the introduction, possible to employ photometric ratios. Following that procedure means to divide the -th by the -th IIE in (1). This way, one obtains a homogeneous linear system in each normal vector that does not depend on the albedo, see MeRoCr15
. However, these ratios introduce additional difficulties in the models. It is common to assume that image data is corrupted by additive, zero-mean, Gaussian noise. In that case the ML function should be chosen to be quadratic. Unfortunately, the ratio of two Gaussian random variables follows a Cauchy distributionHinkley1969 . Thus, additional care has to be taken to find the most efficient penaliser. Another frequent assumption is that the estimated normal fields should be integrable, yet, this is a rather restrictive assumption. The normal field computed by many aforementioned PS approaches does not necessarily need to be integrable. Hence, the integration task is usually formulated as another optimisation problem which aims at minimising the discrepancy between the estimated normal field and that of the recovered surface. Following that approach we now go into some more details.
Assuming orthographic camera projection, the relation between the normal and the depth is given by:
where is the gradient of . Then, the best smooth surface explaining the computed normals can be estimated in several ways BBQBD17 , for instance by solving the variational problem:
One may realise that, at this stage of the process chain of PS with integration, the images are not explicitly considered anymore. Thus, the final surface is in general not necessarily optimal in the sense of the reprojection criterion. Regularising the normal field before integration Reddy2009 ; Zeisl2014 may also ensure integrability, but since such methods only use the normal field, and not the images, they may be unable to assert optimality with respect to the reprojection.
Global PS approaches solve the latter problem as they represent a way to ensure that the recovered surface is optimal with respect to the reprojection criterion. Moreover, it is possible to solve the system (1) directly in terms of the depth Clark1992 : this ensures both that the recovered surface is regular, and that it is optimal with respect to the reprojection criterion, calculated from the depth map and not from a non-integrable estimate of its gradient. Some PDE-based PS approaches have been recently proposed, and were shown to ease the resolution in particularly difficult situations such as pointwise lighting QuDuWuCrLaDu17 and specular reflectance TMDD2016 . To ensure robustness, such methods can be coupled with variational methods. In other words, the criterion which should be considered for ensuring optimality of a surface reconstruction by PS is not the local criterion (2), but rather:
A theoretical analysis of the choice , can be found in Chabrowski1993 . Numerical resolution methods based on proximal splittings were more recently introduced in SSVM2015a . Yet, this last work relies on an “optimise then discretise” approach which would involve non-trivial oblique BC, replaced there for simplicity reasons by Dirichlet BC. Obviously, this represents a strong limitation which prevents working with many real-world data where this oblique BC is rarely available.
The optimisation problem (5) is usually non-linear and non-convex. The ratio procedure described earlier can be used: it simultaneously eliminates the albedo and the non-linear terms, c.f. Smith2016 ; Mecca2016 ; TMDD2016 ; GSSM2015 and obviously removes the bias due to non-integrability. But let us recall that it is only the best linear unbiased estimate, and also not the optimal one. To guarantee optimality, it is necessary to minimise the nonlinear, non-convex energy, i.e. without employing ratios. Other methods QuDuWuCrLaDu17 ; Queau2017 overcome the nonlinearity by absorbing it in the auxiliary albedo variable. Again, the solution is not that of the original problem (5) which remains, to the best of our knowledge, unsolved.
Solving (5) is a challenging problem. Efficient strategies to find the sought minimum are scarce. Recently Ochs et al. OCBP2013 proposed a novel method to handle such non-convex optimisation problems, called iPiano. A major asset of the approach is the extensive convergence theory provided in OCBP2013 ; P2016 . Because of this solid mathematical foundation we explore the iPiano approach in this work. The scheme makes explicit use of the derivative of the cost function, which in our case involves derivatives of matrix-valued functions, and we will employ as a technical novelty, matrix differential theory MN1985 ; MN2007 to derive the resulting scheme.
3 Non-convex discrete variational model for PS
In this section we describe the details of our framework for estimating both the depth and the (Lambertian) reflectance parameters over the domain .
3.1 Assumptions on the PS model
We assume grey level images , , are available, along with the lighting vectors , assumed to be known and non-coplanar. We also assume Lambertian reflectance and neglect shadows, which leads to the following well-known model:
where , and is the albedo at the surface point conjugated to position , considered as a hidden unknown parameter. Let us note that real-world PS images can be processed by low-rank factorisation techniques in order to match the linear reflectance model (6), c.f. Wu2010 .
We further assume orthographic projection, hence the normal is given by (3). Then the reflectance model becomes a function of the depth map :
with , for all . Eventually, we assume that the images differ from this reflectance model only up to additive, zero-mean, Gaussian noise. The ML estimator is thus the least-squares estimator , and the cost function in the reprojection criterion (5) becomes:
3.2 Tikhonov regularisation of the model
Our energy in (8) only depends on the gradient and not on the depth itself. As a consequence, solutions can only be determined up to an arbitrary constant. As a remedy we follow Mecca2016 and introduce a reference depth , thus regularising our initial model with a zero-th order Tikhonov regulariser controlled by a parameter :
In practice, can be set to any small value, so that a solution of (9) lies as close as possible to a minimiser of (8). In all our experiments we set and as the classic PS solution followed by least-squares integration BBQBD17 .
As already mentioned, “optimise then discretise” approaches for solving (9), such as SSVM2015a , involve non-trivial BC. Hence, we prefer a “discretise then optimise”, finite dimensional formulation of the variational PS problem (9).
In our discrete setting we are given images , , with pixels labelled with a single index running from 1 to . We discretise (9) in the following way:
where is the vector of intensities at pixel , represents now a finite difference approximation of the gradient of at pixel , and is a matrix containing the stacked lighting vectors .
We remark that the matrix can be decomposed into two sub-matrices and of dimensions and such that , and so that
Let us also introduce a block matrix , such that each block is a matrix containing the finite difference coefficients used for approximating the gradient:
We further introduce the aliases
and stack them, respectively, in a block-diagonal matrix
and a vector
Using these notational conventions as well as
the task in (10) can be rewritten compactly as
which is the discrete PS model we propose to tackle in this paper. Observe that, if and were constant, problem (19) would be a linear least squares problem with respect to .
Let us remark that (19) can be easily extended to include more realistic reflectance Ju2013 ; KhBoBr17 and lighting BMVC.28.128 ; QuDuWuCrLaDu17 models, as well as more robust estimators Ikehata2014a ; Queau2017 : this only requires to change the definition of , which stands for the global reprojection error .
3.4 Alternating optimisation strategy
In order to ensure applicability of our method to real-world data, the albedo cannot be assumed to be known. Inspired by the well-known Expectation-Maximisation algorithm, we treat as a hidden parameter, and opt for an alternating strategy which iteratively refines the depth with fixed albedo, and the hidden parameter with fixed depth:
starting from and taking as the albedo obtained by the classic PS approach Woodham1980 . Of course, the choice of a particular prior has a direct influence on the convergence of the algorithm. The proposed scheme globally converges towards the solution, even with a trivial prior , but convergence is very slow in this case. Thus the proposed method should be considered as a post-processing technique to refine classic PS approaches, rather than as a standalone PS method.
Now, let us comment on the two optimisation problems in (20). Updating amounts pointwise to a linear least-squares problem, which admits the following closed-form solution at each pixel:
The computation of is considerably harder, and it is dealt with in the following paragraphs.
4 An inertial proximal point algorithm for PS
In this section we discuss the numerical solution strategy of our problem (20). We especially discuss the main difficulty within this strategy, that is to compute the gradient with respect to for the function in (17). Apart from the explicit formula for this gradient we also investigate an approximation leading to efficient computations on a desktop computer with an intel CPU.
4.1 The iPiano algorithm
We will now make precise the iPiano algorithm OCBP2013 for our problem (20). Since the albedo is fixed for the purpose of the corresponding optimisation stage, we denote . The iPiano algorithm seeks a minimiser of
where is convex and is smooth. What makes iPiano appealing is the fact that must not necessarily be smooth and is not required to be convex. This allows manifold designs of novel fixed-point schemes. In its general form it evaluates
where the proximal operator is given by
and which goes back to Moreau M1965 . Before we can define the final algorithm we also need to determine the gradient of .
4.2 Matrix Calculus
We will first recall some general rules to derive the Jacobian of a matrix valued function, before we apply these rules to our setting in the next section.
In our setting the main difficulty is that the matrix depends on our sought unknown . In order to state a useful representation of arising differential expressions we have to resort to matrix differential calculus. We refer to MN1985 ; P1985 ; MN2007 ; PP2008 for a more in-depth representation. A key notion is the definition of the Jacobian of a matrix, which can be obtained in several ways. In this paper we follow the one given in MN1985 .
Definition 1 (Jacobian of a Matrix Valued Function)
Let be a differentiable real matrix function of an matrix of real variables, i.e. . The Jacobian matrix of at is the matrix
where corresponds to the vectorisation operator described in HJ1994 (Definition 4.29). This operator stacks column-wise all the entries from its matrix argument to form a large vector.
Here, differentiability of a matrix valued function means that the corresponding vectorised function is differentiable in the usual sense. By this definition the computation of a matrix Jacobian can be reduced to computing a Jacobian for a vector valued function.
Let be a differentiable diagonal matrix
then the Jacobian matrix of at has the form
where denotes a block of zeros.
The following two lemmas state extensions of the product and chain-rule to matrix valued settings. They provide us closed form representations that will be useful for the forthcoming findings. These results have been extracted fromMN1985 (Theorem 7 and 9 respectively). Since these lemmas have been copied verbatim, we refer to their source for the detailed proofs.
Lemma 1 (Chain Rule)
Let be a subset of and assume that is differentiable at an interior point of . Let be a subset of such that for all , and assume that is differentiable at an interior point of . Then the composite function defined by is differentiable at and
Definition 2 (Kronecker Product)
Let be a matrix and be a matrix then the Kronecker Product is defined as
For a row vector
and the identity matrixwe have
Lemma 2 (Product Rule)
Let and be two matrix functions defined and differentiable on an open set . Then the matrix product is differentiable on and the Jacobian matrix is given by
Here, represents the identity matrix in .
Let be a differentiable -matrix and be a -matrix, then by Lemma 2 we have
4.3 Gradient computation
The following two corollaries are a direct consequence from the foregoing statements. It suffices to plug in the corresponding quantities. We also remind, that our choice of the matrix derivative allows us to interpret vectors as matrices having a single column only.
Let be a matrix depending on and a matrix which does not depend on , then the Jacobian of the matrix-vector product is given by
We apply the product rule on the product between and and subsequently on the product . In a first step this yields
Since the result follows immediately. ∎
where denotes the gradient with respect to .
Since is given by we conclude from the chain- and product-rule that
Since the gradient is simply the transposed version of the Jacobian, we obtain
from which the statement follows immediately. ∎
Let us now come to our main result.
Let be given with sufficiently smooth data and . Then we have for the gradient of the following closed form expression:
From the relationship between the canonical scalar product in and the Euclidean norm we deduce that
Applying the gradient at each term separately and using the results from the previous corollaries, we obtain
which can be simplified to
The result follows now from the linearity of the Jacobian. ∎
4.4 Approximation of the gradient of
Our numerical scheme depends on a gradient descent step of from (17) (resp. (39)) with respect to . However, the evaluation of is computationally expensive. It contains several matrix-matrix multiplications as well as the evaluation of a matrix Jacobian and a Kronecker product. These computations need to be done in every iteration. As we will see in Lemma 5, the evaluation of can be done in a way, so that the main effort lies in computing dyadics of vectors and .
In order to improve the performance of our numerical approach we further seek an approximation to
that requires significantly less floating point operations. To this end, we assume for a moment that neither our matrix, nor our vector depend on the unknown . In that case we obtain
Our conclusions from (44) are twofold. First of all, seems to be a good candidate for a descent direction. At least when our data and does not depend on , then is an optimal and significantly easier to evaluate descent direction. Secondly, we can exploit (44) to derive a refined version of the iPiano algorithm for our task at hand. If we apply a lagged iteration on the descent step of , then our matrix and our vector become automatically independent of our current iterate and would be the steepest descent direction. The fact that would not have to be recomputed in every iteration could outweigh the loss of accuracy and yield an additional performance boost.
is non-negative. This is especially true if is positive semi-definite.
Reordering the terms for in (43) yields the following relation between and
where we have omitted the obvious dependencies on . Our vector will be a descent direction if . Using (47) we conclude
Expanding in the second inner product yields
Thus, we obtain
Now, we are in presence of a descent direction whenever the expression
is non-negative. This is especially true, if the matrix
is positive semi-definite.∎
Let us conclude this section by remarking that the matrix
does not have any particular structure. Indeed, in general, it is made up from a product of non-symmetric and non-square matrices. Thus, additional claims on the spectral properties of this matrix are difficult to derive.
Nevertheless, we conjecture that
is an efficient way to approximate for computations. We will investigate possible deficiencies later in our numerical experiments.
4.5 Summary of the solution strategy
Our final algorithm for the computation of the depth and the albedo is given in Algorithm 1. For the step sizes we employed the “lazy backtracking” algorithm as in OCBP2013 . This includes increasing the Lipschitz constant for by multiplication with a parameter ( in our experiments), until the new iterate fulfils