I Introduction
Most contemporary architectures for geometric scene understanding cast the problem as one of regression: given an image, infer a depth for each pixel. However, in safety-critical systems such as autonomous vehicles, such perceptual inferences are used to make critical decisions and motion plans with considerable implications for safety. For example, what if the estimated depth of an obstacle on the road is incorrect? Here, it is crucial to build recognition systems that (1) allow for graceful degradation in functionality rather than catastrophic failures; (2) are self-aware enough to diagnose when such failures occur; and (3) extract enough information to take an appropriate action, e.g., a slow-down, pull-over, or alerting of a manual operator. Such requirements are explicitly laid out in Automotive Safety Integrity Level (ASIL) standards, which self-driving vehicles will be required to satisfy [18].
Such safety standards represent significant challenges for data-driven machine vision algorithms, which are unlikely to provide formal guarantees of performance [27]. One attractive solution is probabilistic modeling, where uncertainty estimates are propagated throughout a model. In the contemporary world of deep learning, deep Bayesian methods [6, 17] provide uncertainty estimates over model parameters (e.g., observing a scene that looks different from experience) and uncertainty estimates arising from ambiguous data (e.g., a sensor failure). We apply such approaches to the problem of depth estimation from a single camera. Our particular approach differs from prior work in two notable aspects. First, prior methods often require Monte Carlo sampling to compute uncertainty estimates [6], which can be too slow for real-time safety-critical applications. Second, while such certainty estimates provide some degree of self-awareness, they are limited to unimodal estimates of scene structure, implicitly producing a Gaussian estimate of depth represented by a regressed mean and regressed variance (or confidence) [17]. Instead, we develop representations that report back multimodal distributions, which allow us to ask more nuanced questions (e.g., "what is the second most likely depth of a pixel?", "how many modes exist in the distribution?"), as shown in Fig. 1 and Fig. 3.

From a practical perspective, one may ask: why bother estimating depth from a single camera when special-purpose sensors for depth estimation exist (such as LIDAR or multi-view camera rigs)? Common arguments include the cost, payload, and power consumption of robots [29], but we motivate this problem from a safety perspective. One crucial method for ensuring ASIL certification is redundancy: estimates of scene geometry that are independently produced from different sensors (e.g., independently from LIDAR and from cameras) and that agree provide additional fault tolerance. In Fig. 4, we illustrate a situation in which monocular depth estimation complements range sensing.
Our overall approach to probabilistic reasoning is to recast the continuous problem of depth regression (given an image patch, regress a depth value) as a discrete problem of selecting one out of many possible discretized depths. Previous work [3] has already demonstrated that discretization can improve the accuracy of the underlying depth regression task, but we show that discretization is even more useful for producing simple, efficient (and possibly multimodal) uncertainty estimates of depth. Intuitively, K-way classifiers are often trained with softmax loss functions, and so naturally report a distribution over possible discrete depths. Importantly, we find that such distributions can be further improved by recasting the multi-class formulation as a binary multi-label task: essentially, we train independent binary classifiers that classify patches at particular discrete depths. It is straightforward to show that the binary multi-label formulation can be seen as a relaxation of the multi-class problem that removes a linear constraint. Removing this constraint creates a more challenging learning problem that appears to be better regularized in terms of uncertainty reports. At test time, we use the logits as an unnormalized distribution over possible depths, though they can easily be normalized post hoc (to compute summary statistics such as the expected depth).

Our main contributions are as follows:

We formulate the problem of monocular depth estimation in a probabilistic framework, which gives us confidence intervals over depth instead of point estimates.

We recast the problem of depth regression as multi-label depth classification, which yields reliable, multimodal distributions over depth.

Our method produces accurate depth and significantly better uncertainty estimates than prior art on KITTI and NYU depth v2 while running in near real-time.

Our predicted distribution over depths improves monocular 3D map reconstruction, reducing streak-like artifacts and improving accuracy as well as memory efficiency.
II Related Work
Single Image Depth Estimation: Early works [13, 26] popularized the problem of inferring scene depth maps from a single image, making use of hand-crafted features. Eigen et al. [4] take a data-driven approach, learning features in a coarse-to-fine network that refines global structure with local predictions. More recent work substantially improves single image depth estimation with better deep network architectures [20, 23, 28].

Depth Estimation as Classification: Closely related to our work, Cao et al. [3] formulate depth estimation as a multi-class classification problem and use soft targets to train the model. However, they perform inference by choosing the most likely depth class, which does not take full advantage of the depth distribution, whereas we explore richer inference methods based on the predicted distributions. More importantly, the standard multi-class classification approach tends to make confident errors and does not yield reliable uncertainty estimates. Instead, we learn the classification model as independent binary classifiers, which regularizes the model and gives us much better uncertainty estimation as well as a noticeable performance improvement on standard benchmarks. Fu et al. [5] formulate depth estimation as ordinal regression, aiming to predict a CDF over depth. However, they do not ensure that the predicted CDF is monotonically non-decreasing, which makes probabilistic reasoning about uncertainty ill-founded. In contrast, we formulate depth estimation as a discrete classification problem, aiming to predict a valid depth PDF.
Uncertainty in Depth Estimation: Kendall et al. [17] introduce two kinds of uncertainty: epistemic uncertainty (over model parameters) and aleatoric uncertainty (over output distributions). They show that epistemic uncertainty can be reduced with more training data, while aleatoric uncertainty cannot. They model aleatoric uncertainty by fitting the variance of a Gaussian distribution (also proposed in recent work on lightweight probabilistic extensions for deep networks [7]). However, this can lead to unstable training and suboptimal performance. More importantly, it ignores the fact that depth distributions are multimodal in many cases (for example, at depth discontinuities and on reflective surfaces). They capture epistemic uncertainty with Bayesian neural networks [6], which require expensive Monte Carlo sampling to obtain depth predictions and uncertainty estimates. Instead, we focus on modeling multimodal distributions over depth, which gives us more reliable uncertainty metrics without the additional computational overhead.

Multiple Hypotheses Learning (MHL): Prior works [11, 21] formulate the problem of learning to predict a set of plausible hypotheses as multiple-choice learning. They train an ensemble of models to produce multiple possibilities and define an oracle to pick the best hypothesis. Rupprecht et al. [25] use a shared architecture to produce multiple hypotheses and train the network by assigning each sample to the closest hypothesis. Different from these approaches, we train a single network to produce a multimodal distribution, from which we can obtain multiple predictions without directly optimizing an oracle loss during training.
III Method
We solve the problem of inferring continuous depth through discrete classification. To illustrate the method, we first describe how we discretize continuous depth into discrete categories. We then formulate depth estimation as a multi-class classification task (mutually exclusive classes) and as a multi-label (binary) classification task (not mutually exclusive). Finally, we discuss the output of our model, a categorical probability distribution over discrete depths, and how we evaluate it, both as standard depth estimation and as depth estimation with uncertainty.
Discretization: We discretize continuous depth values in log space. Given a continuous depth range [d_min, d_max], we discretize it into N intervals, with bin edges

b_i = exp( log d_min + (i/N) log(d_max / d_min) ),   i = 0, …, N.   (1)
This reflects the perceptual behavior of human visual systems, i.e., we care more about differences in the depths of close objects than distant ones. Furthermore, due to sensor sampling effects, we tend to observe more close points than far-away ones; working in log space partially alleviates this class imbalance.
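As a concrete sketch of this log-space discretization (the depth range, bin count, and indexing convention here are illustrative, not necessarily the paper's exact settings):

```python
import numpy as np

def make_log_bins(d_min, d_max, n_bins):
    """Bin edges spaced uniformly in log-depth, so near-range bins are narrower."""
    return np.exp(np.linspace(np.log(d_min), np.log(d_max), n_bins + 1))

def depth_to_bin(depth, edges):
    """Map a continuous depth value to its discrete class index."""
    return int(np.clip(np.searchsorted(edges, depth, side="right") - 1,
                       0, len(edges) - 2))

# e.g., a KITTI-style range discretized into 64 bins
edges = make_log_bins(1.0, 80.0, 64)
```

Note how the first intervals span fractions of a meter while the last span several meters, matching the perceptual motivation above.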
Multi-class Classification: As a baseline, we first recast continuous regression as a multi-class classification problem. A discrete distribution over depth can be parameterized by a categorical distribution p = (p_1, …, p_N). We learn to predict the probability p_i of each depth label by minimizing the negative log-likelihood. Since we use the output of a softmax layer as the predicted probability, we also refer to this variant as "Softmax" in the following text. Given a ground-truth label y*, image feature x, and model parameters θ, the loss function can be written as

L(θ) = −Σ_i y_i log p_i(x; θ),   (2)

where y is the one-hot vector with y_{y*} = 1. Here the distribution p is predicted by an N-way multi-class classifier.
Equation (2) is the cross-entropy between a one-hot label vector y and the predicted distribution p. To incorporate the ordinal nature of the depth labels, i.e., to penalize predictions close to the ground truth less than predictions far away, we replace the one-hot target vector with a discretized Gaussian centered on the ground-truth label:

y_i = (1/Z) exp( −(i − y*)² / (2σ²) ),   (3)

where Z is the partition function.
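A minimal sketch of the soft-target construction in Eq. (3), assuming integer bin indices and an illustrative σ:

```python
import numpy as np

def soft_target(y_star, n_bins, sigma=2.0):
    """Discretized Gaussian centered on the ground-truth bin y_star,
    normalized by the partition function Z so it sums to 1."""
    i = np.arange(n_bins)
    w = np.exp(-0.5 * ((i - y_star) / sigma) ** 2)
    return w / w.sum()  # divide by Z
```

Neighboring bins receive symmetric, smoothly decaying probability mass, so near-misses are penalized less than distant errors.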
Binary Classification: To alleviate competition between depth classes, we further model continuous depth as a collection of N independent Bernoulli random variables, where p_i encodes the probability of the depth falling into the i-th interval. We also refer to this variant as "multi-label" in the paper. The loss function is written as

L(θ) = −Σ_i [ ȳ_i log p_i(x; θ) + (1 − ȳ_i) log(1 − p_i(x; θ)) ],   (4)

where ȳ is an unnormalized version of the soft target distribution in Eq. (3). One can see this as a relaxation of the training objective from Eq. (2) that drops the constraint Σ_i p_i = 1 [22]. The variance is chosen such that depth classes close to the ground truth receive large soft labels. At test time, we push the pre-sigmoid scores of each binary classifier through a softmax and obtain a distribution over discrete depths, as shown in Fig. 5.
Predicting Depth from a Distribution: Given the distribution over depth, Cao et al. [3] report the most confident depth class, ignoring the multimodal nature of the predicted distribution. In contrast, we report the expected depth under the predicted distribution, E[d] = Σ_i p_i d_i (with d_i the representative depth of the i-th interval), which takes the whole distribution into account and yields better depth estimates.
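A sketch of the two inference rules, expectation versus most-likely class (the bin centers here are illustrative):

```python
import numpy as np

def expected_depth(probs, bin_centers):
    """E[d] = sum_i p_i * d_i: uses the whole distribution, not just its mode."""
    return float(np.dot(probs, bin_centers))

def most_likely_depth(probs, bin_centers):
    """The inference rule of Cao et al.: report the most confident class."""
    return float(bin_centers[np.argmax(probs)])
```

When probability mass is split across adjacent bins, the expectation interpolates between them, whereas the argmax snaps to a single bin center.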
Uncertainty and Multiple Hypotheses: We now describe statistics that can be computed from our multimodal distribution, motivated by autonomous robotic perception. Because a robot's perception module must be self-aware enough to report potential failures to the downstream planner or online-mapping module when faced with ambiguous scenes, the first statistic is uncertainty, computed as the Shannon entropy:

H(p) = −Σ_i p_i log p_i.   (5)
Second, even if the most-likely (or expected) depth of a particular pixel is far away, a robotic motion planner may wish to decrease speed if there is a non-negligible probability that the depth is in the near field (due to, say, a translucent obstacle). As such, our network can directly output multiple depth modes to downstream planners.
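The entropy of Eq. (5) and one simple way to extract multiple modes (here, strict local maxima of the discrete distribution; the paper does not prescribe this exact mode-finding rule) can be sketched as:

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of the predicted depth distribution (Eq. 5)."""
    return float(-np.sum(probs * np.log(probs + eps)))

def depth_modes(probs, bin_centers):
    """Strict local maxima of the distribution: each is a depth hypothesis."""
    modes = [i for i in range(len(probs))
             if (i == 0 or probs[i] > probs[i - 1])
             and (i == len(probs) - 1 or probs[i] > probs[i + 1])]
    return [(float(bin_centers[i]), float(probs[i])) for i in modes]
```

A bimodal distribution (e.g., at a depth discontinuity) yields two hypotheses, each with its own probability mass, which a planner can weigh against the near-field risk described above.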
Evaluation: Evaluating the above functionality on a robotic platform is difficult. Instead, to evaluate the quality of uncertainty estimation, we use the area under the ROC curve (AUC), which is widely used in stereo vision and optical flow [2, 15]. To assess the accuracy of the multi-hypothesis output, we follow past work on MHL [11, 21] and use an "oracle" evaluation protocol in which an algorithm may report multiple depth predictions and the best one is chosen to compute accuracy [11]. We also report standard metrics [4] on depth estimation benchmarks.
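A sketch of a sparsification-style AUC, under the assumption that it averages the error over all keep-fractions when pixels are ranked by predicted uncertainty (the exact protocol follows [2, 15]):

```python
import numpy as np

def sparsification_auc(errors, uncertainties):
    """Mean error over the k most-confident pixels, averaged over all k.
    Lower is better; an oracle would rank pixels by the true error itself."""
    order = np.argsort(uncertainties)  # most confident (least uncertain) first
    sorted_err = errors[order]
    cum_mean = np.cumsum(sorted_err) / np.arange(1, len(errors) + 1)
    return float(np.mean(cum_mean))
```

If uncertainty perfectly predicts error, low-error pixels are kept first and the score is minimized; an inverted ranking keeps the worst pixels first and inflates it.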
Implementation: We follow the architecture of Kuznietsov et al. [19], as shown in Fig. 5, and add a spatial pyramid pooling module [12] to extract global and semi-global features from the scene. We experimented with different numbers of bins on KITTI: with 32, 64, 96, and 128 bins, our method achieves an absolute relative error (ARE) of 9.34%, 8.61%, 8.60%, and 8.59%, respectively. As the improvement becomes marginal, we use 64 bins for all experiments in this paper. Fig. 6 shows the unnormalized soft-target distribution used to train the binary classifiers.
IV Experiments
We first introduce our experimental setup, including datasets and training details. We then compare to prior methods that reason about uncertainty. Finally, we compare our method with the state-of-the-art on the standard depth estimation task, as well as under multi-hypothesis evaluation [25].
Setup: We test our method on standard depth estimation benchmarks: KITTI [8] for outdoor scenes (1–80 m) and NYUv2 [4] for indoor scenes (0.5–10 m). On KITTI, we follow Eigen's split [4] for training and testing. On NYUv2, we sample training images following [20] and test on the official test split.
Training: We initialize the weights of our ResNet50 backbone with ImageNet-pretrained weights. To augment the training data, we apply random gamma, brightness, and color shifts, as in [10]. We fine-tune the weights with the Adam optimizer, decaying the initial learning rate by a fixed factor partway through training. We train our KITTI model for a total of 60 epochs and our NYUv2 model for 160 epochs. Our experiments run on a machine with a GeForce GTX Titan X GPU using TensorFlow.

IV-A Depth Estimation with Uncertainty
Baselines: Since most prior art does not reason about uncertainty, we compare against predictive Gaussian and predictive Gaussian with Monte Carlo dropout ("Gaussian-dropout") [7, 17] on depth estimation with uncertainty, as shown in Tab. I. For a fair comparison, we reimplement and train both baselines on KITTI and NYU depth v2, ensuring the reimplementations use an architecture as close as possible to ours. For predictive Gaussian, we use the same backbone but a different prediction head, which predicts the mean and variance of a Gaussian distribution over depth in log space; we train it by minimizing the per-batch negative log-likelihood under the predicted mean and variance. For Gaussian-dropout, we use the same backbone and prediction head except that we apply dropout with probability 0.5 after several convolutional layers, as in Kendall et al. [16]. During inference, we draw 32 samples to make predictions and estimate uncertainty. Following the same idea, we apply Monte Carlo dropout to our binary model, referred to as Binary-dropout.
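A toy sketch of Monte Carlo dropout inference, with a single linear layer standing in for the network head (the real models apply dropout after convolutional layers and draw 32 samples):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_with_dropout(features, weights, drop_p=0.5):
    """One stochastic forward pass: Bernoulli-mask the features (MC dropout)."""
    mask = rng.random(features.shape) >= drop_p
    return (features * mask / (1.0 - drop_p)) @ weights

def mc_dropout_estimate(features, weights, n_samples=32):
    """Mean prediction and predictive variance from repeated stochastic passes."""
    samples = np.stack([predict_with_dropout(features, weights)
                        for _ in range(n_samples)])
    return samples.mean(axis=0), samples.var(axis=0)
```

The sample variance serves as the (epistemic) uncertainty estimate; the cost is one full forward pass per sample, which is why this approach is slow for real-time use.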
Following Hu et al. [15], we plot ROC curves to evaluate depth estimation with uncertainty, as shown in Fig. 7 and Fig. 8. Such curves show how well the predicted uncertainty correlates with actual depth estimation performance: a point (x, y) on the curve indicates an error of y on the x% least-uncertain predictions over all pixels in the test set. Perfect uncertainty estimation, from the perspective of the ROC curve, should rank predictions exactly as they would be ranked by the actual error; as a reference, we include curves for such an oracle with respect to a specific error metric (absolute relative error, or ARE). Below, we first compare two variants of our model (binary classification and multi-class classification), and then compare our model to prior art that predicts uncertainty (predictive Gaussian and Gaussian-dropout). For each sub-metric under AUC, we follow the definitions in Eigen et al. [4].
Binary classification vs Multi-class classification: In Fig. 7, we compare the model trained with the binary classification loss ("Binary") to the model trained with the multi-class classification loss ("Softmax"). As the left side of both plots shows, the uncertainty predicted by the multi-class classifier does not correlate well with the actual error rate, especially for the least-uncertain (most confident) pixels. In contrast, the model trained with the binary classification loss produces a curve that increases monotonically as the uncertainty threshold goes up, because it correctly ranks more accurate pixels as more confident. We posit that our multi-label loss (which removes a linear constraint present in the multi-class formulation) acts as an additional regularizer that improves uncertainty estimation.
Gaussian vs Binary: In Fig. 8, predictive Gaussian also yields reliable uncertainty estimates, producing a monotonically increasing curve. Overall it performs slightly worse than our model trained with binary classification, possibly due to its unimodal assumption and optimization difficulties during training (discussed further in our ablation study). Interestingly, adding Monte Carlo dropout significantly improves NYU performance for both predictive Gaussian ("Gaussian-dropout") and our approach ("Binary-dropout"). On KITTI, however, it strictly worsens the predictive Gaussian.
Quantitative evaluation: In Tab. I, we further compare uncertainty estimation quantitatively using the metrics introduced in Section III. Our binary classification method achieves better AUC than predictive Gaussian and its Monte Carlo dropout variant in terms of ARE and the δ threshold metric, without expensive Monte Carlo sampling. Adding Monte Carlo dropout to our model further improves the AUC of ARE, RMSE, and δ on NYU depth v2. Although predictive Gaussian with Monte Carlo dropout outperforms our binary loss on the RMSE-based metrics, it is too slow for real-time perception. Please refer to Tab. I for a more detailed comparison.
Table I: Depth estimation with uncertainty.

|       | Method                | AUC (ARE) | AUC (RMSE) | AUC (δ) | time (ms) |
|-------|-----------------------|-----------|------------|---------|-----------|
| KITTI | Gaussian [17]         | 4.38      | 1.42       | 2.63    | 64        |
|       | Softmax               | 5.19      | 2.88       | 2.93    | 74        |
|       | Binary                | 4.17      | 1.33       | 1.79    | 74        |
|       | Gaussian-dropout [17] | 5.18      | 1.21       | 3.61    | 467       |
|       | Binary-dropout        | 4.20      | 1.33       | 2.06    | 540       |
| NYU   | Gaussian [17]         | 10.94     | 0.41       | 10.95   | 44        |
|       | Softmax               | 11.17     | 0.53       | 11.09   | 52        |
|       | Binary                | 10.28     | 0.42       | 9.26    | 52        |
|       | Gaussian-dropout [17] | 10.33     | 0.32       | 10.30   | 353       |
|       | Binary-dropout        | 9.39      | 0.40       | 7.79    | 410       |
Table II: Standard depth estimation results.

|       | Method             | ARE (%) | RMSE | δ<1.25 (%) | time (ms) |
|-------|--------------------|---------|------|------------|-----------|
| KITTI | Binary             | 8.9     | 3.85 | 90.7       | 74        |
|       | Fu et al. [5]      | 9.1     | 3.90 | 90.5       | 74        |
|       | Cao et al. [3]     | 9.3     | 4.02 | 90.8       | 74        |
|       | Eigen et al. [4]   | 19.0    | 7.16 | 69.2       | 13        |
|       | Godard et al. [10] | 11.4    | 4.94 | 86.1       | 35        |
|       | Cao et al. [3]     | 11.5    | 4.71 | 88.7       | –         |
|       | Fu et al. [5]      | 7.2     | 2.73 | 93.2       | 1250      |
| NYU   | Binary             | 14.2    | 0.51 | 82.7       | 52        |
|       | Binary-dropout     | 13.9    | 0.50 | 82.8       | 410       |
|       | Kendall et al. [17]| 14.4    | 0.51 | 81.5       | 353       |
|       | Eigen et al. [4]   | 15.8    | 0.64 | 76.9       | 10        |
|       | Laina et al. [20]  | 12.7    | 0.57 | 81.1       | 55        |
|       | Fu et al. [5]      | 11.5    | 0.51 | 82.8       | –         |
|       | Kendall et al. [17]| 11.0    | 0.51 | 81.7       | 7500      |
IV-B Multi-hypothesis Depth Prediction
We first evaluate standard depth prediction performance on KITTI and NYUv2 using the metrics proposed in [4], as shown in Tab. II. We then extend the evaluation by allowing multiple depth hypotheses. For a fair comparison, we reimplement Fu et al. [5] and Cao et al. [3] under the same setup as ours (a lightweight backbone and no test-time ensemble); we also include the numbers from the original papers as a reference. Please refer to Tab. II for a detailed comparison.
To evaluate our multimodal distributions, we follow the standard protocol in multi-hypothesis learning [21]: after computing the pre-sigmoid scores, we report the depth hypotheses with the highest scores, and the one with the lowest error is selected by the oracle for evaluation.
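The oracle protocol for a single pixel can be sketched as follows (the hypothesis count m and error metric are illustrative):

```python
import numpy as np

def oracle_error(probs, bin_centers, gt_depth, m=3):
    """Take the m highest-scoring bins as depth hypotheses;
    the oracle keeps whichever is closest to the ground truth."""
    top = np.argsort(probs)[-m:]          # indices of the m largest scores
    hypotheses = bin_centers[top]
    # best absolute relative error among the m hypotheses
    return float(np.min(np.abs(hypotheses - gt_depth) / gt_depth))
```

By construction the oracle error is non-increasing in m: allowing more hypotheses can only help, which is what the multi-hypothesis curves measure.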
Since most methods cannot output multiple hypotheses, we compare to one that can be trained to do so [25], referred to as MHL. As in traditional regression, MHL directly regresses depth in log space; at training time, however, it makes multiple predictions and constructs an oracle loss by selecting the prediction closest to the ground truth. We train the MHL baseline and use an oracle to select the best prediction for evaluation. Please see Fig. 9 for an analysis of the results.
V Building Maps with Uncertainty
In this section, we demonstrate one application of geometric uncertainty estimation: robust map reconstruction. Though maps are often constructed in an offline stage, online mapping can be an integral part of autonomous navigation in unknown/changing environments [24].
In practice, it is notoriously difficult to build 3D maps from raw depth predictions because they tend to contain "streak-like artifacts" [1], which not only degrade the quality of the map but also increase memory usage (because they often result in larger occupied volumes). Empirically, we find that such artifacts often occur where the ground-truth depth is inherently ambiguous and follows a multimodal distribution, e.g., at depth discontinuities and on reflective surfaces. Since our depth estimator is designed to predict multimodal distributions over depth, we use it to improve the accuracy of map reconstruction: by simply thresholding the uncertainty of each pixel's predicted distribution, we can significantly reduce streak artifacts and memory usage, as shown in Fig. 2.
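The per-pixel uncertainty thresholding can be sketched as follows (the keep-fraction and NaN masking here are illustrative; the actual pipeline simply omits the dropped pixels before feeding depths to the mapper):

```python
import numpy as np

def filter_depth_by_entropy(depth_map, prob_map, keep_frac=0.8, eps=1e-12):
    """Drop the most-uncertain pixels before map reconstruction.
    prob_map: (H, W, n_bins) per-pixel depth distributions."""
    H = -np.sum(prob_map * np.log(prob_map + eps), axis=-1)  # per-pixel entropy
    thresh = np.quantile(H, keep_frac)
    mask = H <= thresh                       # keep the most confident pixels
    return np.where(mask, depth_map, np.nan), mask
```

Pixels with high-entropy (ambiguous, often multimodal) distributions are exactly those that tend to produce streak artifacts, so discarding them trades a small loss of coverage for cleaner, smaller maps.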
We evaluate map reconstruction with and without uncertainty on KITTI odometry sequence 00 [9], which is not included in the training set. Specifically, we run our monocular depth estimator on the left RGB images and feed the output depth maps, together with ground-truth odometry, into Octomap [14]. Accuracy is measured as the percentage of correctly mapped cells, where a cell counts as correctly mapped if it has the same state (free or occupied) as the LiDAR map (ground truth). As shown in Tab. III, applying a simple uncertainty-based ranking and selection improves the accuracy of monocular maps by 1.8% and reduces memory usage by 25%.
Table III: Map reconstruction on KITTI odometry sequence 00.

| Method          | Accuracy (%) | Memory (MB) |
|-----------------|--------------|-------------|
| LiDAR-FOV       | 95.9         | 1220.9      |
| Ours-binary     | 88.3         | 1682.6      |
| Ours-binary-80% | 89.9         | 1263.2      |
VI Conclusion
Robotic applications of perception present new challenges for safety-critical, fault-tolerant operation. Inspired by past approaches that advocate a probabilistic Bayesian perspective, we demonstrate a simple but effective strategy of discretization (with the appropriate quantization, smoothing, and training scheme) as a mechanism for generating detailed predictions that support such safety-critical operation.
Appendix A Supplementary Material
A-A Ablation Study
To reveal the contribution of each design choice to the accuracy of the standard depth estimation task, we perform an extensive ablation study as shown in Tab. V.
Classification vs Regression: We first compare the regression loss to the classification losses (Binary and Multi-class). We find that the classification losses always outperform the regression method in terms of absolute relative error and δ. However, regression achieves competitive RMSE, likely because it directly minimizes squared error. We also implement the Berhu regression loss [20], and it is still easily outperformed by the classification-based methods.
Multi-class vs Binary classification: Training with the binary classification loss achieves performance similar to the multi-class classification loss on KITTI, but yields significantly better results on NYU. Since the NYU test images differ more from the training images than KITTI's do, we posit that the binary classification loss generalizes better than the multi-class loss.
Effect of Monte Carlo dropout: On KITTI, Monte Carlo dropout worsens prediction performance for both the binary classification method and predictive Gaussian; on NYU, it improves results for both. This is plausible because NYU contains more diverse scenes, where dropout helps prevent overfitting, whereas KITTI's training and testing data are highly correlated, so regularizing the model with dropout does not help.
Expectation vs Most-likely class inference: On KITTI, we find that the expectation yields better results on all but one metric; on NYU, the expectation always outperforms (or matches) the most-likely class. This indicates that the expectation is a better way to make a prediction from a depth distribution, since it makes use of the whole distribution.
Soft targets vs One-hot targets: Comparing training with the soft-target distribution to training with one-hot labels, we find that soft targets always perform better. We posit that training with soft targets lets our model benefit from sample sharing across adjacent depth classes, and thus performs better than using one-hot labels.
Table IV: Jointly trained model evaluated on both datasets.

| Test dataset | Abs Rel (%) | RMSE  | δ<1.25 (%) |
|--------------|-------------|-------|------------|
| KITTI        | 9.9         | 3.969 | 89.1       |
| NYUv2        | 15.3        | 0.541 | 80.3       |
A-B Training on Mixed KITTI and NYUv2
To obtain a robust model that works for both indoor and outdoor scenes, we train a single model on KITTI and NYUv2 jointly. To capture the full depth range of both datasets, we widen the depth range and increase the number of depth intervals accordingly. At training time, we randomly crop the data and average the loss over each image before averaging over the batch. As shown in Tab. IV, when trained jointly, our model's performance is not severely affected on either dataset.
Acknowledgements: This work was supported by the CMU Argo AI Center for Autonomous Vehicle Research.
References
 [1] (2018) Robust dense mapping for large-scale dynamic environments. In ICRA, Cited by: §V.
 [2] (2006) A confidence measure for variational optic flow methods. In Geometric Properties for Incomplete Data, pp. 283–298. Cited by: §III.
 [3] (2017) Estimating depth from monocular images as classification using deep fully convolutional residual networks. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: TABLE V, §I, §II, §III, Fig. 7, Fig. 9, §IV-B, TABLE II.
 [4] (2014) Depth map prediction from a single image using a multi-scale deep network. In NIPS, Cited by: §II, §III, §IV-A, §IV-B, TABLE II, §IV.
 [5] (2018) Deep ordinal regression network for monocular depth estimation. In CVPR, Cited by: TABLE V, §II, Fig. 9, §IV-B, TABLE II.
 [6] (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In ICML, Cited by: §I, §II.
 [7] (2018) Lightweight probabilistic deep networks. In CVPR, Cited by: §II, §IV-A.
 [8] (2013) Vision meets robotics: the kitti dataset. IJRR. Cited by: §IV.
 [9] (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In CVPR, Cited by: §V.
 [10] Unsupervised monocular depth estimation with left-right consistency. In CVPR, pp. 7. Cited by: TABLE II, §IV.
 [11] (2012) Multiple choice learning: learning to produce multiple structured outputs. In NIPS, Cited by: §II, §III.
 [12] (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, Cited by: §III.
 [13] (2005) Geometric context from a single image. In ICCV, Cited by: §II.
 [14] (2013) OctoMap: an efficient probabilistic 3d mapping framework based on octrees. Autonomous robots 34 (3), pp. 189–206. Cited by: Fig. 2, §V.
 [15] (2012) A quantitative evaluation of confidence measures for stereo vision. TPAMI. Cited by: §III, §IV-A.
 [16] (2015) Bayesian SegNet: model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680. Cited by: §IV-A, TABLE II.
 [17] (2017) What uncertainties do we need in Bayesian deep learning for computer vision? In NIPS, Cited by: TABLE V, §I, §II, Fig. 8, §IV-A, TABLE I, TABLE II.
 [18] (2016) Challenges in autonomous vehicle testing and validation. SAE International Journal of Transportation Safety 4 (1), pp. 15–24. Cited by: §I.
 [19] (2017) Semi-supervised deep learning for monocular depth map prediction. In CVPR, Cited by: §III.
 [20] (2016) Deeper depth prediction with fully convolutional residual networks. In 3DV, Cited by: §A-A, TABLE V, §II, TABLE II, §IV.
 [21] (2016) Stochastic multiple choice learning for training diverse deep ensembles. In NIPS, Cited by: §II, §III, §IV-B.
 [22] (2018) Brute-force facial landmark analysis with a 140,000-way classifier. AAAI. Cited by: §III.
 [23] (2016) Learning depth from single monocular images using deep convolutional neural fields.. TPAMI. Cited by: §II.
 [24] (2018) Autonomous vehicle navigation in rural environments without detailed prior maps. In ICRA, Cited by: §V.
 [25] (2017) Learning in an uncertain world: representing ambiguity through multiple hypotheses. In ICCV, Cited by: §II, §IV-B, §IV.
 [26] (2006) Learning depth from single monocular images. In NIPS, Cited by: §II.
 [27] (2017) On a formal model of safe and scalable selfdriving cars. CoRR abs/1708.06374. External Links: Link, 1708.06374 Cited by: §I.
 [28] (2017) Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. In CVPR, Cited by: §II.
 [29] (2017) Real-time monocular dense mapping on aerial robots using visual-inertial fusion. In ICRA, Cited by: §I.