Sparse Depth Sensing for Resource-Constrained Robots
We consider the case in which a robot has to navigate in an unknown environment but does not have enough on-board power or payload to carry a traditional depth sensor (e.g., a 3D lidar) and thus can only acquire a few (point-wise) depth measurements. We address the following question: is it possible to reconstruct the geometry of an unknown environment using sparse and incomplete depth measurements? Reconstruction from incomplete data is not possible in general, but when the robot operates in man-made environments, the depth exhibits some regularity (e.g., many planar surfaces with only a few edges); we leverage this regularity to infer depth from a small number of measurements. Our first contribution is a formulation of the depth reconstruction problem that bridges robot perception with the compressive sensing literature in signal processing. The second contribution includes a set of formal results that ascertain the exactness and stability of the depth reconstruction in 2D and 3D problems, and completely characterize the geometry of the profiles that we can reconstruct. Our third contribution is a set of practical algorithms for depth reconstruction: our formulation directly translates into algorithms for depth estimation based on convex programming. In real-world problems, these convex programs are very large and general-purpose solvers are relatively slow. For this reason, we discuss ad-hoc solvers that enable fast depth reconstruction in real problems. The last contribution is an extensive experimental evaluation in 2D and 3D problems, including Monte Carlo runs on simulated instances and testing on multiple real datasets. Empirical results confirm that the proposed approach ensures accurate depth reconstruction, outperforms interpolation-based strategies, and performs well even when the assumption of structured environment is violated.
Video demonstrations:
https://youtu.be/vE56akCGeJQ
Source code:
https://github.com/sparse-depth-sensing
Recent years have witnessed a growing interest in miniaturized robots, for instance the RoboBee [1], Piccolissimo [2], the DelFly [3, 4], the Black Hornet Nano [5], and Salto [6]. These robots are usually palm-sized (or even smaller), can be deployed in large volumes, and provide a new perspective on societally relevant applications, including artificial pollination, environmental monitoring, and disaster response. Despite the rapid development and recent success in control, actuation, and manufacturing of miniature robots, on-board sensing and perception capabilities for such robots remain a relatively unexplored, challenging open problem. These small platforms have extremely limited payload, power, and on-board computational resources, thus preventing the use of standard sensing and computation paradigms.
In this paper we explore novel sensing techniques for miniaturized robots that cannot carry standard sensors. In the last two decades, a large body of robotics research focused on the development of techniques to perform inference from data produced by “information-rich” sensors (e.g., high-resolution cameras, 2D and 3D laser scanners). A variety of approaches has been proposed to perform geometric reconstruction using these sensors, for instance see [7, 8, 9] and the references therein. On the other extreme of the sensor spectrum, applications and theories have been developed to cope with the case of minimalistic sensing [10, 11, 12, 13]. In this latter case, the sensor data is usually not metric (i.e., the sensor cannot measure distances or angles) but instead binary in nature (e.g., binary detection of landmarks), and the goal is to infer only the topology of the (usually planar) environment rather than its geometry. This work studies a relatively unexplored region between these two extremes of the sensor spectrum.
Our goal is to design algorithms (and lay the theoretical foundations) to reconstruct a depth profile (i.e., a laser scan in 2D, or a depth image in 3D, see fig:zed) from sparse and incomplete depth measurements. Contrary to the literature on minimalistic sensing, we provide tools to recover complete geometric information, while requiring much fewer data points compared to standard information-rich sensors. This effort complements recent work on hardware and sensor design, including the development of lightweight, small-sized depth sensors. For instance, a number of ultra-tiny laser range sensors are being developed as research prototypes (e.g., the dime-sized, 20-gram laser of [14], and an even smaller lidar-on-a-chip system with no moving parts [15]), while some other distance sensors have already been released to the market (e.g., the TeraRanger’s single-beam, 8-gram distance sensor [16], and the LeddarVu’s 8-beam, 100-gram laser scanner [17]). These sensors provide potential hardware solutions for sensing on micro (or even nano) robots. Although these sensors meet the requirements of payload and power consumption of miniature robots, they only provide very sparse and incomplete depth data, in the sense that the raw depth measurements are extremely low-resolution (or even provide only a few beams). In other words, the output of these sensors cannot be utilized directly in high-level tasks (e.g., object recognition and mapping), and the need to reconstruct a complete depth profile from such sparse data arises.
Contribution. We address the following question: is it possible to reconstruct a complete depth profile from sparse and incomplete depth samples?
In general, the answer is negative, since the environment can be very adversarial (e.g., 2D laser scan where each beam is drawn randomly from a uniform distribution), and it is impossible to recover the depth from a small set of measurements. However, when the robot operates in structured environments (e.g., indoor, urban scenarios) the depth data exhibits some regularity. For instance, man-made environments are characterized by the presence of many planar surfaces and a few edges and corners. This work shows how to leverage this regularity to recover a depth profile from a handful of sensor measurements. Our overarching goal is two-fold: to establish theoretical conditions under which depth reconstruction from sparse and incomplete measurements is possible, and to develop practical inference algorithms for depth estimation.
Our first contribution, presented in sec:cs, is a general formulation of the depth estimation problem. Here we recognize that the "regularity" of a depth profile is captured by a specific function (the $\ell_0$-norm of the 2^{nd}-order differences of the depth profile). We also show that by relaxing the $\ell_0$-norm to the (convex) $\ell_1$-norm, our problem falls within the cosparsity model in compressive sensing (CS). We review related work and give preliminaries on CS in sec:relatedWork and sec:preliminaries.
The second contribution, presented in sec:sensingConstraints, is the derivation of theoretical conditions for depth recovery. In particular, we provide conditions under which reconstruction of a profile from incomplete measurements is possible, investigate the robustness of depth reconstruction in the presence of noise, and provide bounds on the reconstruction error. Contrary to the existing literature in CS, our conditions are geometric (rather than algebraic) and provide actionable information to guide sampling strategy.
Our third contribution, presented in sec:algorithms, is algorithmic. We discuss practical algorithms for depth reconstruction, including different variants of the proposed optimization-based formulation, and solvers that enable fast depth recovery. In particular, we discuss the application of a state-of-the-art solver for non-smooth convex programming, called NESTA [18].
Our fourth contribution, presented in sec:experiments, is an extensive experimental evaluation, including Monte Carlo runs on simulated data and testing with real sensors. The experiments confirm our theoretical findings and show that our depth reconstruction approach is extremely resilient to noise and works well even when the regularity assumptions are partially violated. We discuss many applications for the proposed approach. Besides our motivating scenario of navigation with miniaturized robots, our approach finds application in several endeavors, including data compression and super-resolution depth estimation.
sec:conclusion draws conclusions and discusses future research. Proofs and extra visualizations are given in the appendix.
This paper extends the preliminary results presented in [19] in multiple directions. In particular, the error bounds in sec:optimality and sec:stableRecovery, the algorithms and solvers in sec:algorithms, and most of the experiments of sec:experiments are novel and have not been previously published.
This work intersects several lines of research across fields.
Minimalistic Sensing. Our study of depth reconstruction from sparse sensor data is related to the literature on minimalistic sensing. Early work on minimalistic sensing includes contributions on sensor-less manipulation [20], robot sensor design [21, 22], and target tracking [23]. [10, 11, 24] use binary measurements of the presence of landmarks to infer the topology of the environment. [25, 26] reconstruct the topology of a sensor network from unlabeled observations gathered by a mobile robot. [27] and [28] investigate a localization problem using contact sensors. [29] use depth-discontinuity measurements to support exploration and search in unknown environments. [30, 13] propose a combinatorial filter to estimate the path (up to homotopy class) of a robot from binary detections. [31] addresses minimality of information for vision-based place recognition.
Sensing and perception on miniaturized robots. A fairly recent body of work in robotics focuses on miniaturized robots and draws inspiration from small animals and insects. Most of the existing literature focuses on the control of such robots, either open-loop or based on information from external infrastructures. However, there has been relatively little work on onboard sensing and perception. For example, the Black Hornet Nano [5] is a military-grade micro aerial vehicle equipped with three cameras but with basically no autonomy. Salto [6] is Berkeley's 100g legged robot with agile jumping skills. The jump behavior is open-loop due to lack of sensing capabilities, and the motion is controlled by a remote laptop. The RoboBee [1] is an 80-milligram, insect-scale robot capable of hovering motion. Its state estimation relies on an external array of cameras. Piccolissimo [2] is a tiny, self-powered drone with only two moving parts, completely controlled by an external, hand-held infrared device. The DelFly Explorer [3, 4] is a 20-gram flying robot with an onboard stereo vision system. It is capable of producing a coarse depth image at 11 Hz and is thus one of the first examples of a miniaturized flying robot with basic obstacle-avoidance capabilities.
Fast Perception and Dense 3D Reconstruction.
The idea of leveraging priors on the structure of the environment to improve or enable geometry estimation has been investigated in early work in computer vision for single-view 3D reconstruction and feature matching [32, 33]. Early work by [34] addresses Structure from Motion by assuming the environment to be piecewise planar. More recently, [35] propose an approach to speed up stereo reconstruction by computing the disparity at a small set of pixels and considering the environment to be piecewise planar elsewhere. [36] combine live dense reconstruction with shape-priors-based 3D tracking and reconstruction. [37] propose a regularization based on the structure tensor to better capture the local geometry of images.
[38] produce high-resolution depth maps from subsampled depth measurements by using segmentation based on both RGB images and depth samples. [39] compute a dense depth map from a sparse point cloud. This work is related to our proposal, with three main differences. First, the work [39] uses an energy minimization approach that requires parameter tuning (the authors use Bayesian optimization to learn such parameters); our approach is parameter-free and only assumes bounded noise. Second, we use a 2^{nd}-order difference operator to promote depth regularity, while [39] considers alternative costs, including nonconvex regularizers. Finally, by recognizing connections with the cosparsity model in CS, we provide theoretical foundations for the reconstruction problem.

Map Compression. Our approach is also motivated by the recent interest in map compression. [40] propose a compression method for occupancy grid maps, based on the information bottleneck theory. [41, 42] use Gaussian processes to improve 2D mapping quality from a smaller amount of laser data. [43] investigate wavelet-based compression techniques for 3D point clouds. [44, 45] discuss point cloud compression techniques based on sparse coding. [46, 47] propose a variable selection method to retain only an important subset of measurements during map building.
Compressive Sensing (CS). Finally, our work is related to the literature on compressive sensing [48, 49, 50, 51]. While Shannon's theorem states that to reconstruct a signal (e.g., a depth profile) we need a sampling rate (e.g., the spatial resolution of our sensor) of at least twice the maximum frequency of the signal, CS revolutionized signal processing by showing that a signal can be reconstructed from a much smaller set of samples if it is sparse in some domain. CS mainly invokes two principles. First, by inserting randomness in the data acquisition, one can improve reconstruction. Second, one can use $\ell_1$-minimization to encourage sparsity of the reconstructed signal. Since its emergence, CS has impacted many research areas, including image processing (e.g., inpainting [52], total variation minimization [53]), data compression and 3D reconstruction [54, 55, 56], tactile sensor data acquisition [57], inverse problems and regularization [58], matrix completion [59], and single-pixel imaging techniques [60, 61, 62]. While most of the CS literature assumes that the original signal $x$ is sparse in a particular domain, i.e., $x = \Phi s$ for some matrix $\Phi$ and a sparse vector $s$ (this setup is usually called the synthesis model), very recent work considers the case in which the signal becomes sparse after a transformation is applied (i.e., given a matrix $\Omega$, the vector $\Omega x$ is sparse). The latter setup is called the analysis (or cosparsity) model [63, 64]. An important application of the analysis model in compressive sensing is total variation minimization, which is ubiquitous in image processing [53, 65]. In hindsight, we generalize total variation (which applies to piecewise constant signals) to piecewise linear functions.

Depth Estimation from Sparse Measurements. A few recent papers investigate the problem of reconstructing a dense depth image from sparse measurements. [66] exploit the sparsity of disparity maps in the Wavelet domain. The dense reconstruction problem is then posed as an optimization problem that simultaneously seeks a sparse coefficient vector in the Wavelet domain while preserving image smoothness. They also introduce a conjugate subgradient method for the resulting large-scale optimization problem. Liu et al. [67] empirically show that a combined dictionary of wavelets and contourlets produces a better sparse representation of disparity maps, leading to more accurate reconstruction. In comparison with [66, 67], our work has four major advantages. First, our algorithm works with a remarkably small number of samples (e.g., 0.5%), while both [66, 67] operate with at least 5% of the samples, depending on the image resolution. Second, our algorithm significantly outperforms previous work in both reconstruction accuracy and computation time, hence pushing the boundary of achievable performance in depth reconstruction from sparse measurements. An extensive experimental comparison is presented in sec:exp-comparison. Third, the sparse representation presented in this work is specifically designed to encode depth profiles, while both [66, 67] use wavelet representations, which do not explicitly leverage the geometry of the problem.
Indeed, our representation is derived from a simple, intuitive geometric model and thus has a clear physical interpretation. Lastly, unlike previous work, which is mostly algorithmic in nature, we provide theoretical guarantees and error bounds, as well as conditions under which reconstruction is possible.
We use uppercase letters for matrices, e.g., $A$, and lowercase letters for vectors and scalars, e.g., $a$ and $\alpha$. Sets are denoted with calligraphic fonts, e.g., $\mathcal{S}$. The cardinality of a set $\mathcal{S}$ is denoted with $|\mathcal{S}|$. For a set $\mathcal{S}$, the symbol $\mathcal{S}^c$ denotes its complement. For a vector $x$ and a set of indices $\mathcal{S}$, $x_{\mathcal{S}}$ is the sub-vector of $x$ corresponding to the entries of $x$ with indices in $\mathcal{S}$. In particular, $x_i$ is the $i$-th entry. The symbols $\mathbf{1}$ (resp. $\mathbf{0}$) denote a vector of all ones (resp. zeros) of suitable dimension.
The support set of a vector $x$ is denoted with $\mathrm{supp}(x) = \{i : x_i \neq 0\}$. We denote with $\|x\|_2$ the Euclidean norm and we also use the following norms:
$\|x\|_0 = |\mathrm{supp}(x)|$ (1)
$\|x\|_1 = \sum_i |x_i|$ (2)
$\|x\|_\infty = \max_i |x_i|$ (3)
Note that $\|x\|_0$ is simply the number of nonzero elements in $x$. The sign vector of $x$ is a vector $\mathrm{sign}(x)$ with entries $\mathrm{sign}(x)_i = x_i / |x_i|$ if $x_i \neq 0$, and $\mathrm{sign}(x)_i = 0$ otherwise.
For a matrix $A$ and an index set $\mathcal{S}$, let $A_{\mathcal{S}}$ denote the sub-matrix of $A$ containing only the rows of $A$ with indices in $\mathcal{S}$; in particular, $A_i$ is the $i$-th row of $A$. Similarly, given two index sets $\mathcal{S}_1$ and $\mathcal{S}_2$, let $A_{\mathcal{S}_1 \mathcal{S}_2}$ denote the sub-matrix of $A$ including only rows in $\mathcal{S}_1$ and columns in $\mathcal{S}_2$. Let $I_n$ denote the $n \times n$ identity matrix. Given a matrix $A$, we define the matrix operator norm $\|A\|_\infty = \max_i \sum_j |A_{ij}|$.

In the rest of the paper we use the cosparsity model in CS. In particular, we assume that the signal of interest is sparse under the application of an analysis operator. The following definitions formalize this concept.
A vector $x$ is said to be $k$-cosparse with respect to a matrix $\Omega$ if $\|\Omega x\|_0 \leq k$.
Given a vector $x$ and a matrix $\Omega$, the $\Omega$-support of $x$ is the set $\Gamma$ of indices corresponding to the nonzero entries of $\Omega x$, i.e., $\Gamma = \mathrm{supp}(\Omega x)$. The $\Omega$-cosupport $\Gamma^c$ is the complement of $\Gamma$, i.e., the indices of the zero entries of $\Omega x$.
Our goal is to reconstruct 2D depth profiles (e.g., a scan from a 2D laser range finder) and 3D depth profiles (e.g., a depth image produced by a Kinect or a stereo camera) from partial and incomplete depth measurements. In this section we formalize the depth reconstruction problem, by first considering the 2D and the 3D cases separately, and then reconciling them under a unified framework.
In this section we discuss how to recover a 2D depth profile $x \in \mathbb{R}^n$. One can imagine that the vector $x$ includes $n$ (unknown) depth measurements at discrete angles; this is what a standard planar range finder would measure.
In our problem, due to sensing constraints, we do not have direct access to $x$, and we only observe a subset of its entries. In particular, we measure

$y = Ax + \epsilon$ (4)

where the matrix $A \in \mathbb{R}^{m \times n}$ with $m \ll n$ is the measurement matrix, and $\epsilon \in \mathbb{R}^m$ represents measurement noise. The structure of $A$ is formalized in the following definition.
A sample set $\mathcal{S} \subseteq \{1, \ldots, n\}$ is the set of entries of the profile $x$ that are measured. A matrix $A \in \mathbb{R}^{m \times n}$ is called a (sparse) sampling matrix (with sample set $\mathcal{S}$) if $A = I_{\mathcal{S}}$, where $m = |\mathcal{S}|$.
Recall that $I_{\mathcal{S}}$ is a sub-matrix of the identity matrix, with only the rows of indices in $\mathcal{S}$. It follows that $Ax = x_{\mathcal{S}}$, i.e., the matrix $A$ selects a subset of entries from $x$. Since $m \ll n$, we have much fewer measurements than unknowns. Consequently, $x$ cannot be recovered from $y$ without further assumptions.
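For concreteness, the action of a sparse sampling matrix can be sketched in a few lines of NumPy (the helper name is ours, not from the paper's released code): selecting rows of the identity and multiplying is the same as indexing the profile.

```python
import numpy as np

def sampling_matrix(n, sample_set):
    """Sub-matrix of the n x n identity with rows indexed by the sample set."""
    return np.eye(n)[sorted(sample_set), :]

# Measuring y = A x simply selects the sampled entries of x.
x = np.array([2.0, 2.5, 3.0, 3.5, 4.0])
S = [0, 2, 4]
A = sampling_matrix(len(x), S)
y = A @ x  # identical to x[S]
```

In practice one would never form $A$ densely; indexing the profile directly is equivalent and cheaper.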
In this paper we assume that the profile $x$ is sufficiently regular, in the sense that it contains only a few "corners", e.g., fig:nam1(a). Corners are produced by changes of slope: considering 3 consecutive points at coordinates $(t_{i-1}, x_{i-1})$, $(t_i, x_i)$, and $(t_{i+1}, x_{i+1})$ (note that $t$ corresponds to the horizontal axis in fig:nam1(a), while the depth is shown on the vertical axis in the figure), there is a corner at $i$ if

$\frac{x_i - x_{i-1}}{t_i - t_{i-1}} \neq \frac{x_{i+1} - x_i}{t_{i+1} - t_i}$ (5)

In the following we assume that $t_{i+1} - t_i = 1$ for all $i$: this comes without loss of generality, since the full profile is unknown and we can reconstruct it at arbitrary resolution (i.e., at arbitrary $t_i$); hence (5) simplifies to $x_{i-1} - 2x_i + x_{i+1} \neq 0$. We formalize the definition of "corner" as follows.
Given a 2D depth profile $x$, the corner set $\mathcal{C}$ is the set of indices $i$ such that $x_{i-1} - 2x_i + x_{i+1} \neq 0$.
Intuitively, $x_{i-1} - 2x_i + x_{i+1}$ is the discrete equivalent of the 2^{nd}-order derivative at $i$. We call this quantity the curvature at sample $i$: if it is zero, the neighborhood of $i$ is flat (the three points are collinear); if it is negative, the curve is locally concave; if it is positive, it is locally convex. To make the notation more compact, we introduce the 2^{nd}-order difference operator $D \in \mathbb{R}^{(n-2) \times n}$:

$D = \begin{bmatrix} 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \end{bmatrix}$ (6)
Then a profile $x$ with only a few corners is one for which $Dx$ is sparse. In fact, the $\ell_0$-norm of $Dx$ counts exactly the number of corners of the profile:

$\|Dx\|_0 = |\mathcal{C}|$ (7)

where $|\mathcal{C}|$ is the number of corners in the profile.
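As a quick numerical illustration (a minimal sketch of our own, not the paper's code), the second-order difference operator can be built explicitly and the corner count obtained as the number of nonzero entries of $Dx$:

```python
import numpy as np

def second_order_diff(n):
    """(n-2) x n second-order difference operator D of eq. (6)."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

# Piecewise-linear profile with a single corner (the slope changes at index 2).
x = np.array([0.0, 1.0, 2.0, 2.0, 2.0])
D = second_order_diff(len(x))
num_corners = np.count_nonzero(np.abs(D @ x) > 1e-9)  # ||Dx||_0
```

With floating-point data a small tolerance replaces the exact "nonzero" test.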
When operating in indoor environments, it is reasonable to assume that $x$ has only a few corners. Therefore, we want to exploit this regularity assumption and the partial measurements in (4) to reconstruct $x$. Let us start from the noiseless case in which $\epsilon = \mathbf{0}$ in (4). In this case, a reasonable way to reconstruct the profile is to solve the following optimization problem:
$\min_{x} \ \|Dx\|_0 \quad \text{subject to} \quad Ax = y$ (L0)
which seeks the profile that is consistent with the measurements (4) and contains the smallest number of corners. Unfortunately, problem (L0) is NP-hard due to the nonconvexity of the $\ell_0$ (pseudo) norm. In this work we study the following relaxation of problem (L0):
$\min_{x} \ \|Dx\|_1 \quad \text{subject to} \quad Ax = y$ (L1)
which is a convex program (it can indeed be rephrased as a linear program), and can be solved efficiently in practice. sec:sensingConstraints provides conditions under which (L1) recovers the solution of (L0). Problem (L1) falls in the class of the cosparsity models in CS [64].

In the presence of bounded measurement noise in (4), i.e., $\|\epsilon\|_\infty \leq \bar{\epsilon}$, the $\ell_1$-minimization problem becomes:
$\min_{x} \ \|Dx\|_1 \quad \text{subject to} \quad \|Ax - y\|_\infty \leq \bar{\epsilon}$ (L1ε)
Note that we assume that the $\ell_\infty$-norm of the noise is bounded, since this naturally reflects the sensor model in our robotic applications (i.e., bounded error in each laser beam). On the other hand, most of the CS literature considers the $\ell_2$-norm of the error to be bounded and thus obtains an optimization problem with an $\ell_2$-norm constraint. The use of the $\ell_\infty$-norm as a constraint in (L1ε) resembles the Dantzig selector of Candes and Tao [68], with the main difference being the presence of the matrix $D$ in the objective.
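To make the formulation concrete, the noiseless $\ell_1$ problem admits a standard linear-programming reformulation with auxiliary variables $t \geq |Dx|$. The sketch below is our own (it uses SciPy's general-purpose `linprog`, not the fast solvers discussed later in the paper), and the example samples a corner, its neighbors, and the boundary, as in the exact-recovery condition:

```python
import numpy as np
from scipy.optimize import linprog

def l1_reconstruct(y, sample_idx, n):
    """Solve min ||D x||_1 s.t. x[sample_idx] = y as an LP in variables (x, t)."""
    m = n - 2
    D = np.zeros((m, n))
    for i in range(m):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    c = np.concatenate([np.zeros(n), np.ones(m)])         # minimize sum(t)
    A_ub = np.block([[D, -np.eye(m)], [-D, -np.eye(m)]])  # encodes |Dx| <= t
    b_ub = np.zeros(2 * m)
    A_eq = np.zeros((len(sample_idx), n + m))
    A_eq[np.arange(len(sample_idx)), sample_idx] = 1.0    # x agrees with the samples
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * m)
    return res.x[:n]

# One corner at index 4; sample the corner, its neighbors, and the boundary.
x_true = np.array([0., 1., 2., 3., 4., 4., 4., 4., 4.])
S = [0, 3, 4, 5, 8]
x_hat = l1_reconstruct(x_true[S], S, len(x_true))
```

For the noisy variant one would replace the equality constraints with the box constraints $|x_i - y_i| \leq \bar{\epsilon}$ on the sampled entries.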
In this section we discuss how to recover a 3D depth profile $Z \in \mathbb{R}^{r \times c}$ (a depth map, as the one in fig:zed(a)), using incomplete measurements. As in the 2D setup, we do not have direct access to $Z$, but instead only have access to point-wise measurements of the form:

$y_{ij} = Z_{ij} + \epsilon_{ij}, \quad (i,j) \in \mathcal{S}$ (8)

where $\epsilon_{ij}$ represents measurement noise. Each measurement $y_{ij}$ is a noisy sample of the depth of $Z$ at pixel $(i,j)$.
We assume that $Z$ is sufficiently regular, which intuitively means that the depth profile contains mostly planar regions and only a few "edges". We define the edges as follows.
Given a 3D profile $Z$, the vertical edge set $\mathcal{E}_v$ is the set of indices $(i,j)$ such that $Z_{i-1,j} - 2Z_{i,j} + Z_{i+1,j} \neq 0$. The horizontal edge set $\mathcal{E}_h$ is the set of indices $(i,j)$ such that $Z_{i,j-1} - 2Z_{i,j} + Z_{i,j+1} \neq 0$. The edge set $\mathcal{E}$ is the union of the two sets: $\mathcal{E} = \mathcal{E}_v \cup \mathcal{E}_h$.
Intuitively, $(i,j)$ is not in the edge set if the patch centered at $(i,j)$ is planar, while $(i,j) \in \mathcal{E}$ otherwise. As in the 2D case, we introduce 2^{nd}-order difference operators $D_v$ and $D_h$ to compute the vertical differences $D_v Z$ and the horizontal differences $Z D_h^\mathsf{T}$:

$D_v Z \in \mathbb{R}^{(r-2) \times c}, \qquad Z D_h^\mathsf{T} \in \mathbb{R}^{r \times (c-2)}$ (9)

where the matrices $D_v$ and $D_h$ are the same as the one defined in (6), but with suitable dimensions; each entry of the matrix $D_v Z$ contains the vertical (2^{nd}-order) differences at a pixel, while $Z D_h^\mathsf{T}$ collects the horizontal differences.
Following the same reasoning as in the 2D case, we obtain the following $\ell_1$-norm minimization:

$\min_{Z} \ \|\mathrm{vec}(D_v Z)\|_1 + \|\mathrm{vec}(Z D_h^\mathsf{T})\|_1 \quad \text{subject to} \quad Z_{ij} = y_{ij}, \ (i,j) \in \mathcal{S}$ (10)
where $\mathrm{vec}(\cdot)$ denotes the (column-wise) vectorization of a matrix, and we assume noiseless measurements. In the presence of measurement noise, the equality constraint in (10) is again replaced by $|Z_{ij} - y_{ij}| \leq \bar{\epsilon}$, $(i,j) \in \mathcal{S}$, where $\bar{\epsilon}$ is an upper bound on the pixel-wise noise $|\epsilon_{ij}|$.
In this section we show that the 3D depth reconstruction problem (10) can be reformulated to be closer to its 2D counterpart (L1), if we vectorize the depth profile (matrix $Z$). For a given profile $Z \in \mathbb{R}^{r \times c}$, we define the number of pixels $n = rc$, and we call $z$ the vectorized version of $Z$, i.e., $z = \mathrm{vec}(Z) \in \mathbb{R}^n$. Using standard properties of the vectorization operator, we get

$\mathrm{vec}(D_v Z) = (I_c \otimes D_v)\, z, \qquad \mathrm{vec}(Z D_h^\mathsf{T}) = (D_h \otimes I_r)\, z$ (11)

where $\otimes$ is the Kronecker product and $I_r$ is an identity matrix of size $r$; moreover, each measurement (8) can be written as $y_{ij} = e_{(j-1)r+i}^\mathsf{T} z + \epsilon_{ij}$, where $e_k \in \mathbb{R}^n$ is a vector which is zero everywhere except the $k$-th entry, which is $1$. Stacking all measurements (8) in a vector $y$ and using (11), problem (10) can be written succinctly as follows:
$\min_{z} \ \|\Delta z\|_1 \quad \text{subject to} \quad Az = y$ (L1-3D)
where the matrix $A$ (stacking rows of the form $e_{(j-1)r+i}^\mathsf{T}$) has the same structure of the sampling matrix introduced in Definition 3, and the "regularization" matrix $\Delta$ is:

$\Delta = \begin{bmatrix} I_c \otimes D_v \\ D_h \otimes I_r \end{bmatrix}$ (12)
Note that (L1-3D) is the same as (L1), except for the fact that the matrix $D$ in the objective is replaced with a larger matrix $\Delta$. It is worth noticing that the matrix $\Delta$ is also sparse, with only 3 non-zero entries ($1$, $-2$, and $1$) on each row in suitable (but not necessarily consecutive) positions.
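The Kronecker structure makes the regularization matrix cheap to assemble in sparse form. A small SciPy sketch (helper names are ours; we use column-major vectorization to match the Kronecker identities) verifies that a planar depth map is annihilated by the operator:

```python
import numpy as np
import scipy.sparse as sp

def second_order_diff(n):
    """Sparse (n-2) x n second-order difference operator."""
    return sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))

def regularizer_3d(r, c):
    """Delta = [I_c kron D_v ; D_h kron I_r], acting on z = vec(Z) (column-major)."""
    Dv, Dh = second_order_diff(r), second_order_diff(c)
    return sp.vstack([sp.kron(sp.eye(c), Dv), sp.kron(Dh, sp.eye(r))])

# A planar depth map has zero vertical and horizontal second-order differences,
# so Delta maps its vectorization to zero.
r, c = 4, 5
i, j = np.meshgrid(np.arange(r), np.arange(c), indexing="ij")
Z = 2.0 * i + 3.0 * j + 1.0
Delta = regularizer_3d(r, c)
residual = Delta @ Z.flatten(order="F")  # vec(Z)
```

Each row of the assembled matrix indeed contains only the three entries $1, -2, 1$.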
In the presence of noise, we define an error vector $\epsilon$ which stacks the noise terms $\epsilon_{ij}$ in (8) for each pixel $(i,j) \in \mathcal{S}$, and assume pixel-wise bounded noise $\|\epsilon\|_\infty \leq \bar{\epsilon}$. The noisy 3D depth reconstruction problem then becomes:

$\min_{z} \ \|\Delta z\|_1 \quad \text{subject to} \quad \|Az - y\|_\infty \leq \bar{\epsilon}$ (L1ε-3D)
Again, comparing (L1ε) and (L1ε-3D), it is clear that in 2D and 3D we solve the same optimization problem, with the only difference lying in the matrices $D$ and $\Delta$.
2D/3D | Sampling Strategy | Result | Remark
---|---|---|---
2D & 3D | noiseless | prop:nam | sufficient condition for exact recovery (algebraic condition)
2D | noiseless, corners & neighbors | prop:nam1D | sufficient condition for exact recovery (geometric condition)
3D | noiseless, edges & neighbors | prop:nam2D | sufficient condition for exact recovery (geometric condition)
2D | noiseless | prop:subdifferential | necessary and sufficient condition for optimality (algebraic condition)
3D | noiseless | cor:subdifferential2D | necessary and sufficient condition for optimality (algebraic condition)
2D | noiseless, twin samples & boundaries | thm:1Doptimality | necessary and sufficient condition for optimality (geometric condition)
2D | noiseless, twin samples & boundaries | prop:1DrecoveryError | reconstruction error bound
3D | noiseless, grid samples | thm:2Doptimality_xtrue | sufficient condition for optimality (geometric condition)
3D | noiseless, grid samples | prop:2DrecoveryError | reconstruction error bound
2D | noisy | prop:robust_subdifferential | necessary and sufficient condition for robust optimality (algebraic condition)
3D | noisy | cor:robust_subdifferential2D | necessary and sufficient condition for robust optimality (algebraic condition)
2D | noisy | thm:1Doptimality_robust | necessary condition for robust optimality (geometric condition)
2D | noisy, twin samples & boundaries | prop:1DrecoveryError_robust | reconstruction error bound
3D | noisy, grid samples | prop:2DrecoveryError_robust | reconstruction error bound
This section provides a comprehensive analysis of the quality of the depth profiles reconstructed by solving problems (L1) and (L1ε) in the 2D case, and problems (L1-3D) and (L1ε-3D) in 3D. A summary of the key technical results presented in this paper is given in tab:summary.
In particular, sec:BP_exact discusses exact recovery and provides the conditions on the depth measurements such that the full depth profile can be recovered exactly. Since these conditions are quite restrictive in practice (although we will discuss an interesting application to data compression in sec:experiments), sec:optimality analyzes the reconstructed profiles under more general conditions. More specifically, we derive error bounds that quantify the distance between the ground truth depth profile and our reconstruction. sec:stableRecovery extends these error bounds to the case in which the depth measurements are noisy.
In this section we provide sufficient conditions under which the full depth profile can be reconstructed exactly from the given depth samples.
Recent results on cosparsity in compressive sensing provide sufficient conditions for exact recovery of a cosparse profile $x$ from measurements $y = Ax$ (where $A$ is a generic measurement matrix). We recall this condition in prop:nam below and, after presenting the result, we discuss why this condition is not directly amenable for roboticists to use.
Despite its generality, prop:nam provides only an algebraic condition. In our depth estimation problem, it would be more desirable to have geometric conditions, which suggest the best sampling locations. Our contribution in this section is a geometric interpretation of prop:nam.
We first provide a result for the 2D case. The proof is given in proof:prop-nam1D.
Let $x$ be a 2D depth profile with corner set $\mathcal{C}$. Assuming noiseless measurements (4), the following hold:

if the sample set $\mathcal{S}$ is the union of the corner set $\mathcal{C}$, the neighbors of the corners, and the first and last entries of $x$, then $x$ is the unique minimizer of (L1);
prop:nam1D implies that we can recover the original profile exactly if we measure the neighborhood of each corner. An example that satisfies such a condition is illustrated in fig:samplingCorners(a). When we sample only the corners, however, the sufficient condition of prop:nam1D is no longer satisfied; in principle, in this case one might still hope to recover the profile $x$, since the condition in Proposition 6 is only sufficient for exact recovery. But it turns out that in our problem one can find counterexamples with $\mathcal{S} = \mathcal{C}$ in which $\ell_1$-minimization fails to recover $x$. A pictorial example is shown in fig:samplingCorners(b), where we show an optimal solution which differs from the true profile $x$.
We derive a similar condition for 3D problems. The proof is given in proof:prop-nam2D.
In the experimental section, we show that these initial results already unleash interesting applications. For instance, in stereo vision problems, we could locate the position of the edges from the RGB images and recover the depth in a neighborhood of the edge pixels. Then, the complete depth profile can be recovered (at arbitrary resolution) via (L1).
The exact recovery conditions of prop:nam1D and prop:nam2D are quite restrictive if we do not have prior knowledge of the position of the corners or edges. In this section we provide more powerful results that do not require sampling corners or edges. Empirically, we observe that when we do not sample all the edges, the optimization problems (L1) and (L1-3D) admit multiple solutions, i.e., multiple profiles attain the same optimal cost. The basic questions addressed in this section are: which profiles are in the solution set of problems (L1) and (L1-3D)? Is the ground truth profile $x^\star$ among these optimal solutions? How far can an optimal solution be from $x^\star$? In order to answer these questions, in this section we derive optimality conditions for problems (L1) and (L1-3D), under the assumption that all measurements are noise-free.
In this section, we derive a general algebraic condition for a 2D profile $x$ (resp. a 3D profile $z$) to be in the solution set of (L1) (resp. (L1-3D)). sec:BP_optimality1D and sec:BP_optimality2D translate this algebraic condition into a geometric constraint on the curvature of the profiles in the solution set.
Let $A$ be the sampling matrix and $\mathcal{S}$ be the sample set. Given a profile $x$ which is feasible for (L1), $x$ is a minimizer of (L1) if and only if there exists a vector $v \in \mathbb{R}^{n-2}$ such that

$v_\Gamma = \mathrm{sign}\left((Dx)_\Gamma\right), \qquad \|v_{\Gamma^c}\|_\infty \leq 1, \qquad (D^\mathsf{T} v)_{\mathcal{S}^c} = \mathbf{0}$ (14)

where $\Gamma$ is the $D$-support of $x$ (i.e., the set of indices of the nonzero entries of $Dx$) and $\mathcal{S}^c$ is the set of entries of $x$ that we do not sample (i.e., the complement of $\mathcal{S}$).
The proof of prop:subdifferential is based on the subdifferential of the objective of the $\ell_1$-minimization problem and is provided in proof:subdifferential. An analogous result holds in 3D.
A given profile $z$ is in the set of minimizers of (L1-3D) if and only if the conditions of prop:subdifferential hold, replacing $D$ with $\Delta$ in (14).
We omit the proof of cor:subdifferential2D since it follows the same line of the proof of prop:subdifferential.
In this section we derive necessary and sufficient geometric conditions for a profile $x$ to be in the solution set of (L1). Using these findings we obtain two practical results: (i) an upper bound on how far any solution of (L1) can be from the ground truth profile $x^\star$; (ii) a general algorithm that recovers $x^\star$ even when the conditions of prop:nam1D fail (the algorithm is presented in sec:algorithmicVariants).
To introduce our results, we need the following definition.
Let (sign of the curvature at ). A 2D depth profile is sign consistent if, for any two consecutive samples , one of the two conditions holds:
no sign change: for any two integers , with , if and , then ;
sign change only at the boundary: for any integer , with , ;
This technical definition has a clear geometric interpretation. In words, a profile is sign consistent if its curvature does not change sign (i.e., the profile is either convex or concave) within each interval between consecutive samples. See fig:signConsistency for examples of sign consistency, along with a counterexample.
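To make the geometric picture concrete, the following sketch checks a simplified version of sign consistency for a discretized profile: within each interval between consecutive samples, the nonzero second-order differences must all share the same sign. This is our own illustrative implementation (the function name and tolerance are ours, and the formal definition above additionally distinguishes the boundary case):

```python
import numpy as np

def is_sign_consistent(x, samples):
    """Simplified 2D sign-consistency check: within each interval between
    consecutive samples, the nonzero second-order differences of x must
    all have the same sign (profile is convex or concave there)."""
    d2 = x[:-2] - 2 * x[1:-1] + x[2:]            # curvature at indices 1..n-2
    s = np.sign(np.where(np.abs(d2) < 1e-9, 0.0, d2))
    samples = np.sort(np.asarray(samples))
    for a, b in zip(samples[:-1], samples[1:]):
        # second differences whose center lies strictly between the samples
        seg = s[a:b - 1]                          # centers a+1 .. b-1
        nz = seg[seg != 0]
        if nz.size and not (np.all(nz > 0) or np.all(nz < 0)):
            return False
    return True
```

For example, a "tent" profile sampled at its corner is sign consistent, while a profile that oscillates between two samples is not.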
In the following we show that any optimal solution of problem () must be sign consistent. To simplify the analysis for thm:1Doptimality below, we assume that we pick pairs of consecutive samples (rather than individual, isolated samples). We formalize this notion as follows.
A twin sample is a pair of consecutive samples, i.e., with .
The proof of thm:1Doptimality is given in proof:thm-1Doptimality. This theorem provides a tight geometric condition for a profile to be optimal. More specifically, a profile is optimal for problem () if it passes through the given set of samples (i.e., it satisfies the constraint in ()) and does not change curvature between consecutive samples. This result also provides insights into the conditions under which the ground truth profile will be among the minimizers of (), and how one can bound the depth estimation error, as stated in the following proposition.
Let be the ground truth profile generating noiseless measurements (4). Assume that we sample the boundary of and the sample set includes a twin sample in each linear segment in . Then, is in the set of minimizers of (). Moreover, denote with the naive solution obtained by connecting consecutive samples with a straight line (linear interpolation). Then, any optimal solution lies between and , i.e., for any index , it holds . Moreover, it holds
(15)
where is the distance between the sample and the nearest corner in , while is the angle that the line connecting with the nearest corner forms with the vertical.
A visualization of the parameters and is given in fig:toyExample(a). The proof of prop:1DrecoveryError is given in proof:1DrecoveryError.
prop:1DrecoveryError provides two important results. First, it states that any optimal solution (e.g., the dotted green line in fig:nam1(b)) must lie between the ground truth depth (solid black line) and the naive solution (dashed blue line). In other words, any arbitrary set of twin samples defines an envelope that contains all possible solutions. An example of such an envelope is illustrated in fig:toyExample(b). The width of this envelope bounds the maximum distance between any optimal solution and the ground truth, hence the envelope provides a point-wise quantification of the reconstruction error. Second, prop:1DrecoveryError provides an upper bound on the overall reconstruction error in eq. (15). The inequality implies that the reconstruction error grows with the parameter , the distance between our samples and the corners. In addition, the error also increases if the parameter is small, meaning that the ground truth profile is “pointy”, with abrupt changes of slope between consecutive segments. An instance of such “pointy” behavior is the second corner from the right in fig:toyExample(b).
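The envelope described above is easy to evaluate numerically when the ground truth is available for analysis: the naive baseline is a linear interpolation through the samples, and the point-wise gap between the ground truth and this baseline bounds how far any optimal solution can be from the ground truth. A minimal sketch (our own helper, using NumPy's `interp`; the names are not from the paper):

```python
import numpy as np

def interpolation_envelope(x_true, samples):
    """Return the naive linear-interpolation baseline m and the
    point-wise envelope width |x_true - m|.  Per the proposition, any
    optimal solution lies between x_true and m, so the width bounds the
    point-wise reconstruction error."""
    samples = np.sort(np.asarray(samples))
    grid = np.arange(len(x_true))
    m = np.interp(grid, samples, x_true[samples])
    return m, np.abs(x_true - m)
```

For a tent-shaped profile whose apex is not sampled, the envelope is widest at the apex and shrinks to zero at every sample, matching the intuition that the error is largest far from the samples.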
We will further show in sec:algorithms that prop:1DrecoveryError has algorithmic implications. Based on prop:1DrecoveryError, we design an algorithm that exactly recovers a 2D profile, even when the sample set does not contain all corners. Before moving to algorithmic aspects, let us consider the 3D case.
In this section we provide a sufficient geometric condition for a 3D profile to be in the solution set of (). We start by introducing a specific sampling strategy (the analog of the twin samples in 2D) to simplify the analysis.
Given a 3D profile , a grid sample set includes pairs of consecutive rows and columns of , along with the boundaries (first and last two rows, first and last two columns). This sampling strategy divides the image into rectangular patches, i.e., sets of non-sampled pixels enclosed by row-samples and column-samples.
fig:envelop1D(a) shows an example of grid samples and patches. If we have patches and we denote the set of non-sampled pixels in patch with , then the union includes all the pixels in the depth image. We can now extend the notion of sign consistency to the 3D case.
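A grid sample set of this kind can be built as a boolean mask over the image; the non-sampled pixels then split into rectangular patches, and the sampled and non-sampled pixels together cover the whole image. A small sketch (our own construction, following the definition above; argument names are hypothetical):

```python
import numpy as np

def grid_sample_mask(h, w, row_pairs, col_pairs):
    """Boolean sampling mask for a grid sample set: the given pairs of
    consecutive rows/columns plus the two boundary rows/columns on each
    side.  The False entries form the rectangular non-sampled patches."""
    mask = np.zeros((h, w), dtype=bool)
    rows = {0, 1, h - 2, h - 1}
    cols = {0, 1, w - 2, w - 1}
    for r in row_pairs:
        rows.update({r, r + 1})
    for c in col_pairs:
        cols.update({c, c + 1})
    mask[sorted(rows), :] = True
    mask[:, sorted(cols)] = True
    return mask
```

On a 10x10 image with one interior row pair and one interior column pair, the non-sampled pixels form four 2x2 patches.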
Let be a 3D depth profile. Let be a grid sampling set and be the non-sampled patches. Let be the restriction of to its entries in . Then, is called 3D sign consistent if for all , the nonzero entries of are all or , and the nonzero entries of are all or , where is the 2^{nd}-order difference operator (6) of suitable dimension.
Intuitively, 3D sign consistency indicates that the sign of the profile’s curvature does not change, either horizontally or vertically, within each non-sampled patch. We now present a sufficient condition for to be in the solution set of ().
The proof is given in proof:thm-2Doptimality_xtrue. thm:2Doptimality_xtrue is weaker than its 2D counterpart, thm:1Doptimality, since our definition of 3D sign consistency is only sufficient, but not necessary, for optimality. Nevertheless, it can be used to bound the depth recovery error as follows.
Let be the ground truth profile generating noiseless measurements (4). Let be a grid sampling set and assume to be 3D sign consistent with respect to . Moreover, let and be the point-wise lower and upper bound of the row-wise envelope, built as in fig:toyExample(b) by considering each row of the 3D depth profile as a 2D profile. Then, is an optimal solution of (), and any other optimal solution of () satisfies:
(16)
Roughly speaking, if our grid sampling is “fine” enough to capture all changes in the sign of the curvature of , then is among the solutions of (). Despite the similarity to prop:1DrecoveryError, the result in prop:2DrecoveryError is weaker. More specifically, prop:2DrecoveryError relies on the fact that we can compute an envelope only for the ground truth profile (but not for all the optimal solutions, as in prop:1DrecoveryError). Moreover, the estimation error bound in eq. (16) can only be computed a posteriori, i.e., after obtaining an optimal solution . Nevertheless, the result can be readily used in practical applications in which one wants to bound the depth estimation error. An example of the row-wise envelope is given in fig:envelop1D(b).
In this section we analyze the depth reconstruction quality for the case where the measurements (4) are noisy. In other words, we now focus on problems () and ().
In this section, we derive a general algebraic condition for a 2D profile (resp. 3D) to be in the solution set of () (resp. ()). This condition generalizes the optimality condition of sec:optimality_algebraic to the noisy case. In sec:BPD_optimality1D and sec:BPD_optimality2D, we apply this algebraic condition to bound the depth reconstruction error.
Let be the sampling matrix, be the sample set and be the noisy measurements as in (4), with and . Given a profile which is feasible for (), define the active set as follows
(17)
We also define its two subsets
(18)
Also denote . Then is a minimizer of () if and only if there exists a vector such that
(19)
(20)
where is the -support of , and is the set of un-sampled entries in (i.e., the complement of ).
The proof is given in proof:prop-robust_subdifferential. A visual illustration of the active set is given in fig:activeSet. We will provide some geometric insights on the algebraic conditions in prop:robust_subdifferential in the next two sections. Before moving on, we note that the robust optimality conditions extend straightforwardly to the 3D case.
We skip the proof of cor:robust_subdifferential2D since it proceeds along the same lines as the proof of prop:robust_subdifferential.
In this section we consider the 2D case and provide a geometric interpretation of the algebraic conditions in prop:robust_subdifferential. The geometric interpretation follows from a basic observation, which enables us to relate the noisy case to our noiseless analysis of sec:BP_optimality1D. The observation is that if a profile satisfies the robust optimality conditions (19)-(20), then it also satisfies the noiseless optimality condition (14), and is hence sign consistent, as per thm:1Doptimality.
We present a brief proof for thm:1Doptimality_robust below.
thm:1Doptimality_robust will help establish error bounds on the depth reconstruction. Before presenting these bounds, we formally define the 2D sign consistent -envelope.
Assume that the sample set includes only twin samples and we sample the “boundary” of the profile, i.e., , and . Moreover, for each pair of consecutive twin samples and , define the following line segments for :
Further define the following profiles:
and
where denotes the point-wise maximum among the segments in eqs. (1), (3), and (5). We define the 2D sign consistent -envelope as the region enclosed between the upper bound and the lower bound .
A pictorial representation of the line segments (1)-(6) in def:scEnvelop is given in fig:envelopNoisy(a)-(b). fig:envelopNoisy(a) shows an example where line segment (1) intersects with (3) and line segment (2) intersects with (4). In fig:envelopNoisy(b), these line segments do not intersect. An example of the resulting 2D sign consistent -envelope is illustrated in fig:envelopNoisy(c).
Our interest towards the 2D sign consistent -envelope is motivated by the following proposition.
Under the conditions of def:scEnvelop, any 2D sign-consistent profile belongs to the 2D sign consistent -envelope.
The proof of prop:scEnvelop is given in proof:prop-noisyEnvelop.
Next we introduce a proposition that characterizes the depth reconstruction error bounds of an optimal solution.
Let be the ground truth generating noisy measurements (4). Assume that we sample the boundary of and the sample set includes a twin sample in each linear segment in . Then, belongs to the 2D sign consistent -envelope, and any optimal solution of () also lies in the -envelope. Moreover, denoting with and the point-wise lower and upper bound of the -envelope (def:scEnvelop), and considering any consecutive pair of twin samples and , for all , it holds:
The proof of prop:1DrecoveryError_robust is given in proof:prop-1DrecoveryError_robust.
In this section we characterize the error bounds of an optimal solution of () in the noisy case. The result is similar to its noiseless counterpart in prop:2DrecoveryError.
Let be the ground truth generating noisy measurements (4). Let be a grid sample set and assume to be 3D sign consistent with respect to . Moreover, let and be the point-wise lower and upper bound of the row-wise 2D sign consistent -envelope, built as in fig:envelopNoisy(b) by considering each row of the 3D depth profile as a 2D profile. Then, given any optimal solution of (), it holds that
(21)
The proof of prop:2DrecoveryError_robust follows the same lines as that of prop:2DrecoveryError, and we omit it for brevity.
The formulations discussed so far, namely (), (), (), (), directly translate into algorithms: each optimization problem can be solved using standard convex programming routines and returns an optimal depth profile.
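As a concrete illustration of this point, the 2D (single-scanline) formulation, minimizing the L1 norm of second-order differences subject to the sampling constraints, can be cast as a linear program and handed to an off-the-shelf solver. The sketch below is our own minimal implementation (assuming SciPy is available; the function names are ours, not the paper's): it introduces auxiliary variables t with |Dx| <= t element-wise and minimizes the sum of t.

```python
import numpy as np
from scipy.optimize import linprog

def second_diff_matrix(n):
    """Second-order difference operator D of size (n-2) x n."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D

def reconstruct_l1(n, sample_idx, sample_val):
    """Solve  min ||D x||_1  s.t.  x[sample_idx] = sample_val
    as an LP over z = [x; t], with D x <= t and -D x <= t."""
    D = second_diff_matrix(n)
    m = D.shape[0]
    c = np.concatenate([np.zeros(n), np.ones(m)])      # minimize sum(t)
    A_ub = np.block([[D, -np.eye(m)], [-D, -np.eye(m)]])
    b_ub = np.zeros(2 * m)
    A_eq = np.zeros((len(sample_idx), n + m))          # x at samples = y
    A_eq[np.arange(len(sample_idx)), sample_idx] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=sample_val,
                  bounds=[(None, None)] * (n + m))
    return res.x[:n]
```

Consistent with the exact-recovery results, sampling twin samples around every corner of a piecewise-linear profile makes the ground truth the unique minimizer, so the LP recovers it exactly.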
This section describes two algorithmic variants that further enhance the quality of the depth reconstruction (sec:algorithmicVariants), and then presents a fast solver for the resulting -minimization problems (sec:solvers).
In this section we describe other algorithmic variants for the 2D and 3D case. sec:alg1 proposes a first algorithm that solves 2D problems and is inspired by prop:1DrecoveryError. sec:alg2 discusses variants of () for 3D problems.
prop:1DrecoveryError dictates that any optimal solution of () lies between the naive interpolation solution and the ground truth profile (recall fig:nam1(b)). alg:1 is based on a simple idea: on the one hand, if the true profile is concave between two consecutive samples (cf. the first corner in fig:nam1(b)), then we should look for an optimal profile having depth “as large as possible” in that particular interval (while still being within the optimal set of ()); on the other hand, if the shape is convex (second corner in fig:nam1(b)), we should look for an optimal profile with depth “as small as possible”, since this is the closest to .
alg:1 first solves problem () and computes an optimal solution and the corresponding optimal cost. It then solves a second optimization problem whose constraints include the same sampling constraint of (), plus an additional constraint that restricts the solution to stay within the optimal solution set of (). Therefore, it only remains to design a new objective function that “encourages” a solution that is close to while still being within this optimal set. To this end, we use a simple linear objective , where is a vector of coefficients, such that the objective function penalizes large entries in the profile if , and rewards large entries when . More specifically, the procedure for choosing a proper coefficient is as follows. For any pair of consecutive twin samples and (), the algorithm looks at the slope difference between the second pair (i.e., ) and the first pair (). If this difference is negative, then the profile is expected to be concave between the samples; in this case the sign for any point between the samples is set to . If the difference is positive, then the signs are set to . Otherwise the signs are set to 0. We prove the following result.
Under the assumptions of prop:1DrecoveryError, alg:1 recovers the 2D depth profile exactly.
In the formulations () and () we used the matrix to encourage “flatness”, or in other words, regularity of the depth profiles. In this section we discuss alternative objective functions which we evaluate experimentally in sec:experiments. These objectives simply adopt different definitions for the matrix in () and (). For clarity, we denote the formulation introduced earlier in this paper (using the matrix defined in (12)) as the “L1” formulation (also recalled below), and we introduce two new formulations, denoted as “L1diag” and “L1cart”, which use different objectives.
L1 formulation: Although we already discussed the structure of the matrix in sec:rec2d3d_unified, here we adopt a slightly different perspective that will make the presentation of the variants L1diag and L1cart clearer. In particular, rather than taking a matrix view as done in sec:rec2d3d_unified, we interpret the action of the matrices and in eq. (10) as the application of a kernel (or convolution filter) to the 3D depth profile . In particular, we note that:
(22)
where “” denotes the action of a discrete convolution filter and the kernels and are defined as
Intuitively, and applied at a pixel return the 2^{nd}-order differences along the horizontal and vertical directions at that pixel, respectively. The L1 objective, presented in sec:rec2d3d_unified, can then be written as:
L1diag formulation: While L1 only penalizes, for each pixel, variations along the horizontal and vertical direction, the objective of the L1diag formulation includes an additional 2^{nd}-order derivative, which penalizes changes along the diagonal direction. This additional term can be written as , where the kernel is:
Therefore, the objective in the L1diag formulation is
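The L1 and L1diag objectives can be evaluated with simple array slicing instead of explicit convolutions. The sketch below is our own illustration; in particular, the diagonal kernel shown is one plausible choice of diagonal 2^{nd}-order difference (the paper defines its own kernel above), and the function names are ours:

```python
import numpy as np

def d2_h(X):
    """Horizontal second difference, kernel [1, -2, 1]."""
    return X[:, :-2] - 2 * X[:, 1:-1] + X[:, 2:]

def d2_v(X):
    """Vertical second difference, kernel [1, -2, 1]^T."""
    return X[:-2, :] - 2 * X[1:-1, :] + X[2:, :]

def d2_diag(X):
    """Diagonal second difference (one plausible kernel, not
    necessarily the paper's):  X[i-1,j-1] - 2 X[i,j] + X[i+1,j+1]."""
    return X[:-2, :-2] - 2 * X[1:-1, 1:-1] + X[2:, 2:]

def l1_objective(X):
    return np.abs(d2_h(X)).sum() + np.abs(d2_v(X)).sum()

def l1diag_objective(X):
    return l1_objective(X) + np.abs(d2_diag(X)).sum()
```

Both objectives vanish on a planar depth image, which is exactly the regularity these formulations reward, and become positive across a "roof" edge.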
L1cart formulation: When introducing the L1 formulation in sec:cs, we assumed that we reconstruct the depth at uniformly-spaced points, i.e., the coordinates of each point belong to a uniform grid (with slight abuse of notation, here we use and to denote the horizontal and vertical coordinates of a point with respect to the image plane); in other words, looking at the notion of curvature in (5), we assumed (also in the 3D case). While this comes without loss of generality, since the full profile is unknown and we can reconstruct it at arbitrary resolution, we note that typical sensors, even in 2D, do not produce measurements with uniform spacing; see fig:metricExample.
For this reason, in this section we generalize the L1 objective to account for irregularly-spaced points. If we denote with and the horizontal and vertical coordinates of the 3D point observed at pixel , a general expression for the horizontal and vertical 2^{nd}-order differences is:
(23)
where the convolution kernels at pixel are defined as:
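For a single scanline, a standard three-point stencil for the 2^{nd}-order difference on irregularly-spaced coordinates plays the role of these spatially-varying kernels; the sketch below is our own illustration (the paper's exact kernel weights are those of eq. (23)), and it reduces to the uniform kernel [1, -2, 1] for unit spacing:

```python
import numpy as np

def second_diff_nonuniform(f, u):
    """Second-order difference of samples f taken at irregular
    coordinates u, via the standard three-point nonuniform stencil:
      2 [ f_{i-1}/(hl (hl+hr)) - f_i/(hl hr) + f_{i+1}/(hr (hl+hr)) ]
    where hl, hr are the left/right spacings.  Exact for quadratics."""
    f, u = np.asarray(f, float), np.asarray(u, float)
    hl = u[1:-1] - u[:-2]     # left spacing
    hr = u[2:] - u[1:-1]      # right spacing
    return 2 * (hr * f[:-2] - (hl + hr) * f[1:-1] + hl * f[2:]) \
        / (hl * hr * (hl + hr))
```

Applied to depths sampled from a quadratic profile at arbitrary coordinates, the stencil returns the exact (constant) second derivative, which is why it is a natural generalization of the uniform-grid curvature used in the L1 formulation.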