Image-Guided Depth Sampling and Reconstruction

08/04/2019 ∙ by Adam Wolff, et al. ∙ Technion

Depth acquisition, based on active illumination, is essential for autonomous and robotic navigation. LiDARs (Light Detection And Ranging) with mechanical, fixed sampling templates are commonly used in today's autonomous vehicles. An emerging technology, based on solid-state depth sensors with no mechanical parts, allows fast, adaptive, programmable scans. In this paper, we investigate adaptive, image-driven sampling and reconstruction strategies. First, we formulate a piece-wise linear depth model with several tolerance parameters and estimate its validity for indoor and outdoor scenes. Our model and experiments predict that, in the optimal case, about 20-60 piece-wise linear structures can approximate a depth map well. This translates to a depth-to-image sampling ratio of about 1/1200. We propose a simple, generic sampling and reconstruction algorithm based on super-pixels. The sampling rate we reach is still far from the optimal case; however, our sampling improves upon grid and random sampling consistently, for a wide variety of reconstruction methods. Moreover, our proposed reconstruction achieves state-of-the-art results compared to image-guided depth completion algorithms, reducing the required sampling rate by a factor of 3-4. A single-pixel depth camera built in our lab illustrates the concept.


1 Introduction

In recent years, depth sensing has become essential for a variety of new significant applications. For example, depth sensors assist autonomous cars in navigation and in collision prevention [31]. The physical constraints on active depth sensing mobile devices, such as light detection and ranging (LiDAR), yield sparse depth measurements per scan. This results in a coarse point cloud and requires an additional estimation of missing data.

Traditional LiDARs have a restricted scanning mechanism. These devices measure distance at specified angle intervals, using a fixed number of horizontal scan-lines (usually 16 to 64), depending on the number of transceivers. A revolutionary new technology of solid-state depth sensors is now emerging. These sensors are based on optical phased-arrays with no mechanical parts and can thus scan the scene quickly in an adaptive manner (programmable scanning) [9, 29]. In addition, these innovative devices are much cheaper than those currently in use. This calls for the development of new, efficient sampling strategies, which reduce the reconstruction error per sample. Since autonomous platforms are almost always equipped with RGB cameras, we investigate the possibility of improving the depth sampling process by taking the RGB information into account.

In this paper, we address the topic of image-guided depth sampling and reconstruction. First, we introduce the concept of adaptive depth sampling and develop an appropriate model of the data. Then, we introduce a fast and practical image-guided algorithm for depth sampling and reconstruction, based on super-pixels. An example of the output of our algorithm is shown in Fig. 1. We demonstrate through experiments that our framework outperforms state-of-the-art depth completion methods for both indoor and outdoor scenes. Finally, since current solid-state technology is not yet open for reconfiguration of the sampling pattern, we illustrate the concept in real life with a single-pixel depth camera, which was 3D-printed in our lab.

2 Related Work

Depth completion: The task of depth reconstruction from scattered sparse samples is being increasingly investigated. The main methods can be divided into those which require only the sparse depth input (unguided) and those assisted by additional information, e.g. a color image (guided).

Among the unguided methods, some use classical approaches [19, 24], while others rely on more advanced tools such as deep learning [11, 35].

In contrast, guided methods exploit the connection between depth maps and their corresponding color images. Earlier methods used traditional image processing tools [6, 14]. Recently, several deep learning-based methods [10, 15, 17, 18, 20, 21, 25, 26] have achieved state-of-the-art results.

Early guided depth sampling: Despite the intensive development in depth completion, the issue of adaptive sampling has received little attention. Only [16, 22] have proposed a non-trivial sampling pattern (i.e., other than uniformly random or grid) as a step preceding depth reconstruction. Both studies select samples at locations which are most likely to exhibit strong depth gradients. However, they do not cope well with very low sampling budgets of less than 5% of the ground-truth pixels.

Nonuniform sampling: Over the years, the field of nonuniform sampling has been well established [2, 4, 27, 36]. However, these studies focus on the reconstruction of the signal for a given nonuniform sampling pattern and not on how to design data-driven patterns, given side information.

3 The Space of Depth Images

Any sampling strategy is based on a model of the signal to be sampled. For example, in classical Fourier analysis, the assumption is of band-limited signals; thus, sampling at the Nyquist frequency guarantees perfect reconstruction. Compressed sensing [8] assumes a sparse underlying model of the signal (such as in terms of edges). Sub-Nyquist sampling [28] is based on the ability to correctly manipulate aliased signals, given prior knowledge of the frequency structure of the data. However, the models above assume a single source of data to be sampled. We would like to examine an appropriate model for depth scenes, as well as its relation to the RGB data of the same scene. We propose a simple depth model and try to validate it experimentally on benchmark data. We then relate it to RGB.

Figure 2: Examples of piece-wise planar approximation on Synthia and NYU-Depth-v2. Rows: RGB, ground truth (GT), piece-wise planar approximation.

Figure 3: Statistics of piece-wise planar approximation on Synthia and NYU-Depth-v2. Top: percentage of approximated planes per image. Bottom: the relative error tolerance $\epsilon_d$, as defined in Eq. (2).
(a) Fit: piece-wise approximately flat objects
(b) Do not fit: non-flat and non-convex objects
Figure 4: Examples of image parts which fit (top) and parts which do not fit (bottom) the piece-wise planar depth model.

3.1 Piece-wise planar depth model

Our primary objective is to obtain depth information for autonomous navigation. Thus, an appropriate model should represent well the general geometrical setting (roads, walls, sidewalks) as well as the location of significant landmarks and obstacles (poles, signs, rocks and objects in a room). For objects, we would like to obtain their location but not necessarily their precise geometry. This leads us to a piece-wise planar model, which was mentioned in [5, 33] but not yet formulated and tested. Given a depth image $D$, our hypothesis is that most of the scene can be well represented by a piece-wise planar approximation. More formally, let $\Omega$ be the image domain, where $|\Omega|$ is its area. Let $\Omega_i$, $i = 1, \dots, N$, be a set of sub-domains which define a partition of $\Omega$. Thus, $\Omega_i \subset \Omega$, $\Omega_i \cap \Omega_j = \emptyset$ for $i \neq j$, and $\bigcup_{i=1}^{N} \Omega_i = \Omega$. Let $f$ be a 2D piece-wise linear function, defined on each sub-domain $\Omega_i$ by

$f(x, y) = a_i x + b_i y + c_i, \quad (x, y) \in \Omega_i,$   (1)

where $a_i, b_i, c_i$ are some constants. Let $v(x, y)$ be a binary function in $\Omega$ which indicates the validity of the model, where $v = 1$ indicates validity and $v = 0$ invalidity. We denote by $\Omega_v$ the set of valid points, $\Omega_v = \{(x, y) \in \Omega : v(x, y) = 1\}$. We assume $|\Omega_v| \geq (1 - \epsilon_v)|\Omega|$, with $\epsilon_v \ll 1$. Our hypothesis is that, given some small tolerance parameters $\epsilon_d, \epsilon_v$, a validity map $v$ and a small number of regions $N$, for any depth map $D$ there exists a piece-wise planar approximation $f$, defined by Eq. (1), such that

$\dfrac{|D(x, y) - f(x, y)|}{D(x, y)} \leq \epsilon_d, \quad \forall (x, y) \in \Omega_v.$   (2)

Thus, $D$ can be well approximated by a 2D piece-wise linear function, almost everywhere, provided we know the partition set $\{\Omega_i\}$ and the plane parameters $(a_i, b_i, c_i)$ for each $\Omega_i$. Our aim is to approximate $\{\Omega_i\}$ from the RGB image. In order to recover $f$ on a region $\Omega_i$ we need to sample it 3 times, to estimate its coefficients (in the noiseless case). This gives us a lower bound on the number of samples required to obtain a high quality depth image:

$M_{low} = 3N.$   (3)

We now turn to experimentally check this hypothesis and examine the values of $N$, $\epsilon_d$ and $\epsilon_v$ in indoor and outdoor scenes.
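For concreteness, a per-region least-squares plane fit of this form can be computed as in the following NumPy sketch. The routine, its name and the tolerance value eps_d are our illustration only, and not necessarily the exact procedure used to produce the statistics below.

```python
import numpy as np

def piecewise_planar_fit(depth, labels, eps_d=0.05):
    """Fit a plane a_i*x + b_i*y + c_i (Eq. (1)) to each region of `labels`
    and flag pixels whose relative error exceeds eps_d (cf. Eq. (2))."""
    approx = np.zeros_like(depth, dtype=np.float64)
    valid = np.zeros_like(depth, dtype=bool)
    for r in np.unique(labels):
        ys, xs = np.nonzero(labels == r)
        # Least-squares plane fit over the region's pixels.
        A = np.stack([xs, ys, np.ones_like(xs)], axis=1).astype(np.float64)
        coeffs, *_ = np.linalg.lstsq(A, depth[ys, xs].astype(np.float64), rcond=None)
        plane = A @ coeffs
        approx[ys, xs] = plane
        valid[ys, xs] = np.abs(plane - depth[ys, xs]) <= eps_d * depth[ys, xs]
    return approx, valid
```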

To validate the proposed model, we computed a piece-wise planar depth approximation for two datasets which have dense ground-truth depth. For outdoor scenes, there are few real-life benchmarks with dense depth; we therefore resorted to a high quality emulation, using 787 downsampled images from summer sequence 5 (left stereo, front view) of the Synthia dataset [30]. For indoor scenes we used 654 downsampled and center-cropped images (following [26]) from the NYU-Depth-v2 [32] test set.

Examples of piece-wise planar approximations are shown in Fig. 2 (bottom), compared to the ground truth (middle row). Dark-blue indicates regions not in the valid set $\Omega_v$. It can be observed that the approximation is quite accurate. Statistical results, including the average model parameters recovered for Synthia and for NYU-Depth-v2, are presented in Fig. 3.

In these cases, according to Eq. (3), Synthia can be well approximated in an optimal scenario by an average of only 200 samples, whereas NYU-v2 by an average of 56 samples (in both cases, this translates to a sampling ratio of about 1/1200, compared to the ground-truth depth resolution). In Fig. 4 we show examples of objects which fit a piece-wise planar approximation well (top) and counter-examples of highly non-convex structures or structures with high curvature (bottom).

3.2 Relation of RGB and depth

Next, we want to examine the possibility of estimating the partition set $\{\Omega_i\}$ from the RGB data. This is a very challenging task, which is an open problem at this point. We thus turn to a simpler problem of checking the relation between RGB edges and depth discontinuities. Given the set of RGB boundaries (edges) $E_I$ and depth boundaries $E_D$, we would like to calculate empirically, for each coordinate $x$, the following conditional probabilities:

$P(x \in E_D \mid x \in E_I),$   (4)
$P(x \in E_I \mid x \in E_D).$   (5)

We compute the set $E_I$ for each image by using a generic edge-detector, well suited for natural images [13]. For the set $E_D$ we employ a threshold on the depth gradient, normalized by the depth value. We allow some tolerance for misalignment in the registration of the images, so for each coordinate $x$ the search is performed within a small pixel neighborhood. For both Synthia and NYU-v2, the measured $P(x \in E_I \mid x \in E_D)$ is high, whereas $P(x \in E_D \mid x \in E_I)$ is considerably lower. The high values of $P(x \in E_I \mid x \in E_D)$ indicate the ability to predict depth discontinuities well, based on RGB edges. The relatively low values of $P(x \in E_D \mid x \in E_I)$ indicate that we should expect many false partitions (edges which appear only in the RGB, but not in the depth data). Thus we can expect to be able to approximate, to some extent, the partition set based solely on the RGB image, by over-segmentation. As this partition is quite a rough approximation, additional samples are required above the lower bound expressed in Eq. (3). In Fig. 5 we show examples of boundaries in the RGB and depth for both sets.
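The two probabilities can be estimated from binary edge maps along the following lines (a sketch assuming SciPy; the neighborhood tolerance tol is a placeholder, not the value used in the paper):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def edge_cooccurrence(rgb_edges, depth_edges, tol=1):
    """Empirical estimates of Eqs. (4)-(5): P(x in E_D | x in E_I) and
    P(x in E_I | x in E_D), allowing a small registration tolerance `tol`."""
    st = np.ones((2 * tol + 1, 2 * tol + 1), dtype=bool)
    depth_near = binary_dilation(depth_edges, structure=st)  # depth edge within tol
    rgb_near = binary_dilation(rgb_edges, structure=st)      # RGB edge within tol
    p_d_given_i = (rgb_edges & depth_near).sum() / max(int(rgb_edges.sum()), 1)
    p_i_given_d = (depth_edges & rgb_near).sum() / max(int(depth_edges.sum()), 1)
    return p_d_given_i, p_i_given_d
```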

Figure 5: Examples of depth discontinuities (red) and RGB edges (green) correlation. Blue pixels include both types.

4 Method

We propose a generic and simple method for depth sparse sampling and dense reconstruction. The following assumptions are made:

  1. Measurements are of high quality, such that noise of the range measurement is negligible, compared to the global error of the dense reconstruction.

  2. The sampling budget is limited to $M$ samples.

  3. An RGB image of the scene is available to guide the process. The reconstructed depth is registered to this image. Sensitive cameras may be used for night scenes.

  4. Sampling is point-wise. The system can sample at any desired location of the RGB image. The sampling pattern can change for each image.

Figure 6: Algorithm block diagram.

4.1 Algorithm design requirements

Several requirements are vital to the design of such an algorithm: it needs to capture well the shape and boundaries of objects, to be computationally fast and memory efficient, and to allow control over the number of samples. Surprisingly, all these requirements coincide with those of super-pixels (SPs) [1], an over-segmentation technique applied to RGB images. This led us to the following algorithm.

4.2 Proposed algorithm

The proposed algorithm is divided into two parts, sampling and reconstruction. It includes the following steps:

  • Sampling:

    S.1. A super-pixel map is generated from the RGB image using SLIC [1]. The desired number of SPs is set to the sampling budget $M$. The SP compactness is set to a high value to ensure regularly shaped SPs.

    S.2. For each SP, the SP center of mass (CoM) is computed as the mean of the coordinates of all pixels in the SP. A depth sample is taken at the CoM location. If the CoM falls outside of the SP (for some non-convex SP), the depth sample is taken at the in-SP location closest to the CoM.

  • Reconstruction: Our reconstruction is based on the samples and SPs of the sampling stage. A code sketch of the full pipeline is given after this list.

    R.1. For each SP, a single depth measurement is available, thus a zero-order estimation is performed; that is, the entire SP takes the depth value of its sample. Let $\hat{D}_0$ be the resulting depth image.

    R.2. A log-domain image is calculated by $D_{log} = \log(\hat{D}_0)$.

    R.3. A bilateral filter [34] is applied over $D_{log}$. The filter's parameters are fixed for a given number of samples and type of scene (road / room). Let $D_{BF}$ denote the bilateral filter result.

    R.4. The final dense reconstructed depth image is calculated by $\hat{D} = \exp(D_{BF})$.

See Fig. 6 for a high-level diagram of our framework.
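The following is a minimal end-to-end sketch of steps S.1-S.2 and R.1-R.4, not the authors' implementation: it assumes scikit-image's SLIC and OpenCV's bilateral filter, emulates the programmable sensor with a hypothetical sample_depth callback, and uses placeholder filter parameters.

```python
import numpy as np
import cv2
from skimage.segmentation import slic

def sample_and_reconstruct(rgb, sample_depth, n_samples=500, compactness=30.0):
    """Sketch of steps S.1-S.2 and R.1-R.4.

    rgb          : HxWx3 uint8 image guiding the sampling.
    sample_depth : callable (row, col) -> range value, emulating a programmable sensor.
    n_samples    : sampling budget M (one sample per super-pixel).
    """
    # S.1: over-segment the RGB image into ~M regularly shaped super-pixels.
    labels = slic(rgb, n_segments=n_samples, compactness=compactness, start_label=0)

    depth0 = np.zeros(labels.shape, dtype=np.float32)
    for sp in np.unique(labels):
        mask = labels == sp
        rows, cols = np.nonzero(mask)
        # S.2: sample at the in-SP pixel closest to the SP center of mass.
        com_r, com_c = rows.mean(), cols.mean()
        idx = np.argmin((rows - com_r) ** 2 + (cols - com_c) ** 2)
        # R.1: zero-order estimation -- the whole SP takes the sampled value.
        depth0[mask] = sample_depth(rows[idx], cols[idx])

    # R.2-R.4: bilateral filtering in the log domain, then back to metric depth.
    log_depth = np.log(np.maximum(depth0, 1e-3))
    log_filtered = cv2.bilateralFilter(log_depth, d=9, sigmaColor=0.1, sigmaSpace=15)
    return np.exp(log_filtered)
```

In a real system, sample_depth would trigger a range measurement in the requested direction; in simulation it can simply read from a dense ground-truth depth map.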

4.3 Principles of the algorithm

Figure 7: Reconstruction variants study on the NYU-Depth-v2 dataset. We compare 1st-order reconstruction with 3 samples per segment to 3 variants of 0-order reconstruction, all with the same total number of samples.

4.3.1 Sampling

Sampling based on RGB segmentation. This follows the model and the relation between RGB edges and depth discontinuities, discussed in the previous section.

Sampling at center of mass of segment. There are several reasons for this choice: first, it reduces an inherent uncertainty near discontinuities. Secondly, for piece-wise planar depth regions, the center of mass minimizes the RMSE. Moreover, practical depth sensing technologies have a finite spatial resolution and cannot sample well near depth discontinuities. Fig. 7 demonstrates that sampling at the SP CoM leads to more accurate depth reconstruction than sampling at a random pixel location inside the SP.
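To see the second point, note that for a planar depth $f(x, y) = a x + b y + c$ over a region $\Omega_i$, the constant value minimizing the squared error is the mean of $f$ over the region which, by linearity of $f$, equals its value at the center of mass $(\bar{x}, \bar{y})$:

$\arg\min_{d} \int_{\Omega_i} \big(f(x, y) - d\big)^2 \, dx \, dy$ is attained at $d = \dfrac{1}{|\Omega_i|} \int_{\Omega_i} f(x, y) \, dx \, dy = f(\bar{x}, \bar{y}).$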

Why super-pixels. Sampling with super-pixels enables measuring small elements. It also limits the reconstruction error, since the size of each segment is limited. When a depth discontinuity is not well reflected in the RGB (we term such cases camouflaged objects), the sampling reduces to an approximate grid-sampling scheme, which provides a lower bound on the resolution. This is illustrated in Fig. 9.

4.3.2 Reconstruction

0-order vs. 2D linear reconstruction in each segment. At first glance, it seems natural to estimate by SPs the sub-domains $\Omega_i$ of the model, which requires 3 samples per SP to obtain a plane approximation in the region of the SP. However, we found that this is not an optimal strategy. A better approach is to increase the number of segments by a factor of 3 and to sample each segment once. This increases the overall resolution (or smallest measurable object size) of the system, while large planar segments can still be recovered reasonably well with a proper nonlinear filtering operation (see below). The depth of the smallest objects that the system can measure is estimated by a constant value. This facilitates the detection of poles, signs and small obstacles at a low sampling cost. Fig. 7 demonstrates that 0-order reconstruction is much more accurate than linear reconstruction in terms of RMSE, for a given sampling budget $M$. In Fig. 8 the ability to detect small objects well by 0-order estimation is illustrated.

Bilateral filtering. Having more SPs with zero-order estimation allows sampling small objects well. However, large flat regions are now heavily degraded by staircasing artifacts. Our proposed solution is to apply a fast, nonlinear, edge-preserving filter [34]. It is designed such that actual depth discontinuities are preserved, whereas false edges, which stem from the 0-order estimation, are smoothed out. Due to the log function, smoothing is relative to the depth. This approximates well the piece-wise planar model for large regions, such as walls and roads, also yielding a lower RMSE, as seen in Fig. 7. As can be seen in Fig. 8, the artifacts in the reconstruction of the large planar background region are quite minimal.
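To make the "relative to the depth" property explicit: for two nearby depth values $D_1 \approx D_2$,

$|\log D_1 - \log D_2| = \left|\log \dfrac{D_1}{D_2}\right| \approx \dfrac{|D_1 - D_2|}{D_2},$

so a fixed range-kernel width in the log domain corresponds to a fixed relative depth difference, treating near and far surfaces consistently.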

(a) RGB
(b) Large seg. samp.
(c) Small seg. samp.
(d) Depth GT
(e) 1st-order recon.
(f) Our recon.
Figure 8: Toy example 1: Comparing 1st-order estimation of larger segments (middle row, RMSE=55.0) to zero-order estimation for smaller segments, following nonlinear smoothing (bottom, RMSE=7.7). In both cases 75 samples were used. The latter approach allows for smaller objects to be well reconstructed at the expense of slight staircasing artifacts in larger flat regions (roads, walls etc.).

(a) RGB
(b) Depth GT
(c) Samples
(d) Recon.
Figure 9: Toy example 2: Camouflaged object. In some rare cases, the object is camouflaged and can hardly be detected in the RGB image. In this case image-guided sampling fails. As our method is based on super-pixels, when there are no distinct edges, the method degenerates in a natural manner to classical grid sampling.

4.3.3 MTF analysis

We aim to measure the spatial resolution of our sampling and reconstruction strategy. We use the modulation transfer function (MTF), a standard tool for characterizing the resolution of imaging systems [7]. Our chart is based on the Siemens Star test chart, modified for RGB-guided depth sampling. We modified the original test chart into a road-like scene, replacing the white regions with typical background content and the black regions with typical foreground content. Depth is assumed to take binary values, as in the original chart. The MTF calculation is explained in detail in [23].

Fig. 10 presents the results computed from the reconstructed images in Fig. 11. One can observe the clear increase in resolution of the proposed method, compared to RGB-guided and non-guided depth completion approaches.

Figure 10: MTF comparison between bilinear interpolation (we use Delaunay triangulation [3] to perform a bivariate linear interpolation), bilateral solver [6], L1diag [24] and ours. Our method achieves significantly higher resolution at the same sampling rate.
(a) RGB
(b) Depth GT
(c) Bilinear
(d) Bilat. solver [6]
(e) L1diag [24]
(f) Ours
Figure 11: Qualitative results for depth reconstruction on our modified MTF road-like test chart at a 1% sampling ratio.

5 Experiments

We evaluate the performance of our depth sampling and reconstruction method and compare it to other approaches. For all other examined methods we simulate uniformly random depth samples at different sparsity levels. We define pixel density as the ratio between the number of sampled pixels and the total number of pixels in the image. To demonstrate the generalization of our algorithm, we use two distinct datasets for evaluation, one for outdoor scenarios and one for indoor scenarios. For the outdoor dataset we also evaluate over a subset of image areas focusing on small obstacles. We then make a qualitative comparison between different sampling patterns and show that ours leads to better results. Finally, we show initial experimental results using a real system we built, based on the proposed principles. We measure performance in all experiments with RMSE (root mean squared error), and also report the REL (relative absolute error) metric on NYU-Depth-v2.
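For reference, the two metrics in their standard form (a minimal sketch; the valid-pixel mask and other implementation details are our assumptions, as the paper does not specify them):

```python
import numpy as np

def rmse(pred, gt, mask=None):
    """Root mean squared error over valid ground-truth pixels."""
    mask = (gt > 0) if mask is None else mask
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))

def rel(pred, gt, mask=None):
    """Mean relative absolute error over valid ground-truth pixels."""
    mask = (gt > 0) if mask is None else mask
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))
```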

5.1 Outdoor data (Synthia)

The Synthia dataset [30] provides synthetic RGB, depth and semantic images for urban driving scenarios. We use synthetic data since, at present, no large non-synthetic dataset provides a dense and accurate depth map. We need a dense depth map to be able to sample at any given point, and accuracy is required especially to demonstrate the increased resolution we obtain. Thus, two large real-life datasets do not apply: the KITTI depth completion benchmark [35] has only semi-dense depth, and Cityscapes [12] has low-resolution depth, which is in some cases inaccurate. More technical details are given in Section 3.1. We evaluate on the 0-100m depth range, which is similar to the range of a typical vehicle-mounted LiDAR.

Full scene experiment. Quantitative results are presented in Fig. 12(a). We achieve a 30% lower RMSE than the second best method at 0.45% density, and retain the best result across all densities. We also evaluated a deep-learning method [15] trained on the KITTI benchmark [35], but it failed to obtain comparable results. A qualitative comparison is shown in Fig. 13, exhibiting precise and sharp reconstruction, especially of small objects.

(a) Synthia
(b) Synthia obstacles
Figure 12: Quantitative comparison on Synthia (left) and obstacles (right) datasets between bilinear interpolation, bilateral solver [6], IP-Basic [19], L1diag [24] and ours.
Figure 13: Qualitative results of depth completion on Synthia and our obstacles dataset at 2% density (5000 samples). Tested completion methods: bilinear interpolation, bilateral solver [6], IP-Basic [19], L1diag [24] and ours.

Obstacles set. To enable evaluation over important objects in the image and to reduce the impact of the far background on the performance measurement, we derived from Synthia a set of 100 obstacles, which we refer to as the obstacles dataset. We apply sampling and reconstruction over the entire image, but evaluate only over the obstacle mask. Quantitative results are presented in Fig. 12(b), and a qualitative comparison is shown in Fig. 13. Table 1 compares the number of samples required to achieve certain levels of accuracy. We require 3-4 times fewer samples for a given RMSE.

RMSE [m] 0.6 0.75 0.95
Bilinear interp. 7.87% 2.90% 1.23%
Bilateral solver [6] 8% 8% 8%
IP-Basic [19] 8% 4.78% 2.42%
L1diag [24] 7.09% 3.08% 1.33%
Ours 1.89% 0.86% 0.40%
Table 1: Quantitative comparison for depth completion of the required sampling percentage per RMSE level on our Synthia obstacles dataset. Our method is 3-4 times more economical than the second best.

Sampling only. We claim that using only our proposed sampling pattern, any completion method can achieve better results than with other existing sampling patterns, especially for small objects in the scene. Fig. 14 demonstrates this qualitatively for 3 distinct reconstruction methods.

Figure 14: Sampling methods comparison: 4 sampling patterns are compared for different reconstruction methods: uniform random, grid, Liu et al. [22] and ours (sampling only). The reconstruction methods are bilinear interpolation, IP-Basic [19] and L1diag [24].
Samples Method RMSE [m] REL
200 Bilinear interp. 0.257 0.047
200 L1diag [24] 0.236 0.044
225 Liao et al. [21] 0.442 0.104
200 Ma et al. [26] 0.230 0.044
200 HMS-Net [17] 0.233 0.044
200 Li et al. [20] 0.256 0.046
200 Ours 0.211 0.035
Table 2: Quantitative comparison with state-of-the-art on the NYU-Depth-v2 dataset.

5.2 Indoor data (NYU-Depth-v2)

The NYU-Depth-v2 dataset [32] includes labeled pairs of aligned RGB and dense depth images collected from different indoor scenes. More technical details are given in Section 3.1. Quantitative results are listed in Table 2. Our method outperforms all other methods. Note that 4 of the other 6 methods are based on deep learning, while ours is not. A qualitative comparison is shown in Fig. 15. Although suffering from slight staircasing artifacts, our result preserves edges better and stays precise even for small items.

Figure 15: Qualitative results of depth completion on NYU-Depth-v2 dataset at 0.7% density (500 samples). Tested completion methods: bilinear interpolation, L1diag [24], Ma et al. [26] and ours.

5.3 Prototype: Single-pixel mechanical sampler

Finally, we performed an experiment on a real scene designed in our laboratory. To enable controllable sampling, we built a sampling device (Fig. 16) assembled from a laser rangefinder, a camera, motors and printed parts. We generated ground-truth images for comparison with a Kinect 2 sensor. Note that the ground truth is in real-world coordinates, while our system measures range values.

We created two scenes and sampled them with 3 different patterns to demonstrate the superiority of our method. Results are presented in Fig. 17. While the first scene (top) is a toy example for testing the sampling resolution, the second scene (bottom) is more realistic. In both cases our method is able to sample all objects (even the thinner ones) in the scene and reconstruct them quite accurately.

Figure 16: Our mechanical sampler.

Figure 17: Experimental results on a simple scene (Experiment 1, top, 35 samples per pattern) and a challenging scene (Experiment 2, bottom, 70 samples per pattern) taken in our lab. Panels: RGB, GT, depth samples, random + [6], grid + [6], ours. Random and grid patterns, both reconstructed by the bilateral solver [6], fail to sample some of the targets. In contrast, we manage to take a measurement from each object and recover its shape precisely.

6 Conclusion

In this paper, we introduced a novel approach for image-guided sparse depth sampling and dense reconstruction. We suggested a parametric piece-wise linear model and showed its validity for indoor and outdoor datasets. We demonstrated that the correlation between the depth and color domains allows depth scenes to be approximated well using only an RGB image and a small number of carefully chosen depth samples. A single-pixel depth sampler was constructed as a proof-of-concept, verifying our predictions. We believe that this new direction calls for additional extensive research, in order to develop advanced, cheap and accurate depth sensing systems. In future work, we plan to combine classical and modern learning methods to further improve performance and accuracy.

References

  • [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, et al. Slic superpixels compared to state-of-the-art superpixel methods. IEEE transactions on pattern analysis and machine intelligence, 34(11):2274–2282, 2012.
  • [2] A. Aldroubi and K. Gröchenig. Nonuniform sampling and reconstruction in shift-invariant spaces. SIAM review, 43(4):585–620, 2001.
  • [3] I. Amidror. Scattered data interpolation methods for electronic imaging systems: a survey. Journal of electronic imaging, 11(2):157–177, 2002.
  • [4] P. Babu and P. Stoica. Spectral analysis of nonuniformly sampled data–a review. Digital Signal Processing, 20(2):359–378, 2010.
  • [5] S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 98CB36231), pages 434–441. IEEE, 1998.
  • [6] J. T. Barron and B. Poole. The fast bilateral solver. In European Conference on Computer Vision, pages 617–632. Springer, 2016.
  • [7] G. D. Boreman. Modulation transfer function in optical and electro-optical systems, volume 21. SPIE press Bellingham, WA, 2001.
  • [8] E. J. Candes, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
  • [9] P. Cheben, R. Halir, J. H. Schmid, H. A. Atwater, and D. R. Smith. Subwavelength integrated photonics. Nature, 560(7720):565, 2018.
  • [10] Z. Chen, V. Badrinarayanan, G. Drozdov, and A. Rabinovich. Estimating depth from rgb and sparse sensing. arXiv preprint arXiv:1804.02771, 2018.
  • [11] N. Chodosh, C. Wang, and S. Lucey. Deep convolutional compressed sensing for lidar depth completion. arXiv preprint arXiv:1803.08949, 2018.
  • [12] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3213–3223, 2016.
  • [13] P. Dollár and C. L. Zitnick. Structured forests for fast edge detection. In Proceedings of the IEEE international conference on computer vision, pages 1841–1848, 2013.
  • [14] G. Drozdov, Y. Shapiro, and G. Gilboa. Robust recovery of heavily degraded depth measurements. In 3D Vision (3DV), 2016 Fourth International Conference on, pages 56–65. IEEE, 2016.
  • [15] A. Eldesokey, M. Felsberg, and F. S. Khan. Propagating confidences through cnns for sparse data regression. arXiv preprint arXiv:1805.11913, 2018.
  • [16] S. Hawe, M. Kleinsteuber, and K. Diepold. Dense disparity maps from sparse disparity measurements. In 13th International Conference on Computer Vision, 2011.
  • [17] Z. Huang, J. Fan, S. Yi, X. Wang, and H. Li. Hms-net: Hierarchical multi-scale sparsity-invariant network for sparse depth completion. arXiv preprint arXiv:1808.08685, 2018.
  • [18] M. Jaritz, R. De Charette, E. Wirbel, X. Perrotton, and F. Nashashibi. Sparse and dense data with cnns: Depth completion and semantic segmentation. In 2018 International Conference on 3D Vision (3DV), pages 52–60. IEEE, 2018.
  • [19] J. Ku, A. Harakeh, and S. L. Waslander. In defense of classical image processing: Fast depth completion on the cpu. arXiv preprint arXiv:1802.00036, 2018.
  • [20] Y. Li, K. Qian, T. Huang, and J. Zhou. Depth estimation from monocular image and coarse depth points based on conditional gan. In MATEC Web of Conferences, volume 175, page 03055. EDP Sciences, 2018.
  • [21] Y. Liao, L. Huang, Y. Wang, S. Kodagoda, Y. Yu, and Y. Liu. Parse geometry from a line: Monocular depth estimation with partial laser observation. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 5059–5066. IEEE, 2017.
  • [22] L.-K. Liu, S. H. Chan, and T. Q. Nguyen. Depth reconstruction from sparse samples: Representation, algorithm, and sampling. IEEE Transactions on Image Processing, 24(6):1983–1996, 2015.
  • [23] C. Loebich, D. Wueller, B. Klingen, and A. Jaeger. Digital camera resolution measurement using sinusoidal siemens stars. In Digital Photography III, volume 6502, page 65020N. International Society for Optics and Photonics, 2007.
  • [24] F. Ma, L. Carlone, U. Ayaz, and S. Karaman. Sparse depth sensing for resource-constrained robots. arXiv preprint arXiv:1703.01398, 2017.
  • [25] F. Ma, G. V. Cavalheiro, and S. Karaman. Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera. arXiv preprint arXiv:1807.00275, 2018.
  • [26] F. Ma and S. Karaman. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 1–8. IEEE, 2018.
  • [27] F. Marvasti. Nonuniform sampling: theory and practice. Springer Science & Business Media, 2012.
  • [28] M. Mishali and Y. C. Eldar. From theory to practice: Sub-nyquist sampling of sparse wideband analog signals. IEEE Journal of Selected Topics in Signal Processing, 4(2):375–391, 2010.
  • [29] C. V. Poulton, A. Yaacobi, D. B. Cole, M. J. Byrd, M. Raval, D. Vermeulen, and M. R. Watts. Coherent solid-state lidar with silicon photonic optical phased arrays. Optics letters, 42(20):4091–4094, 2017.
  • [30] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez. The synthia dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3234–3243, 2016.
  • [31] B. Schwarz. Lidar: Mapping the world in 3d. Nature Photonics, 4(7):429, 2010.
  • [32] N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor segmentation and support inference from rgbd images. In European Conference on Computer Vision, pages 746–760. Springer, 2012.
  • [33] H. Tao, H. S. Sawhney, and R. Kumar. A global matching framework for stereo computation. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, volume 1, pages 532–539. IEEE, 2001.
  • [34] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Computer Vision, 1998. Sixth International Conference on, pages 839–846. IEEE, 1998.
  • [35] J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger. Sparsity invariant cnns. arXiv preprint arXiv:1708.06500, 2017.
  • [36] J. Yen. On nonuniform sampling of bandwidth-limited signals. IRE Transactions on circuit theory, 3(4):251–257, 1956.