A robust and accurate object detection system using on-board sensors (e.g. camera, LiDAR, radar) is crucial for road scene understanding in autonomous driving. Among these sensors, LiDAR provides accurate depth information and is robust under different illumination conditions, such as daytime and nighttime. These properties make LiDAR indispensable for safe autonomous driving. The recent fatal accident involving an Uber autonomous test vehicle could have been avoided if the LiDAR perception system had robustly detected the pedestrian, or had informed the human driver in time to trigger emergency braking because it was uncertain about the driving situation.
Recently, deep learning approaches have brought significant improvements to the object detection problem. Many methods have been proposed that use LiDAR point clouds [3, 4, 5, 6, 7, 8, 9, 10, 11] or fuse them with camera images [12, 13, 14, 15, 16, 17, 18, 19]. However, these methods only produce deterministic bounding box regression and use softmax scores to represent the recognition probability, which do not necessarily reflect the uncertainties in the network. In other words, they do not provide detection confidence regarding classification and localization. For a robust perception system, we need to explicitly model the network's uncertainties.
Towards this goal, in this work we build a probabilistic two-stage object detector from LiDAR point clouds by introducing heteroscedastic aleatoric uncertainties, i.e. the uncertainties that represent sensor observation noises and vary with the data input. The method works by adding auxiliary outputs to model the aleatoric uncertainties, and by training the network with a robust multi-loss function. In this way, the network learns to focus more on informative training samples and ignore the noisy ones. We call our method PROD (Probabilistic Real-time Object Detector). Our contributions can be summarized as follows:
We model heteroscedastic aleatoric uncertainties in a 3D object detection network using LiDAR point clouds.
We show that by leveraging aleatoric uncertainties, the network produces state-of-the-art results and significantly increases the average precision by up to  compared to the baseline method without any uncertainty estimation.
We systematically study how the aleatoric uncertainties behave. We show that the uncertainties are related to each other and are influenced by multiple factors such as detection distance, occlusion, softmax score, and orientation.
In the sequel, we first summarize related works in Sec. II, and then describe our proposed method in Sec. III in detail. Sec. IV shows the experimental results regarding the improvement of object detection performance by leveraging aleatoric uncertainties and understanding how the uncertainties behave. Sec. V summarizes the work and discusses future research. The video of this work is provided as supplementary material.
II Related Works
In the following, we summarize methods for LiDAR-based object detection in autonomous driving and uncertainty quantification in deep neural networks.
II-A LiDAR-based Object Detection
employs a 3D fully convolutional neural network on discretized point clouds to predict an objectness map and a 3D bounding box map. Other works project 3D point clouds onto a 2D plane and use 2D convolutional networks to process these LiDAR feature maps. They can be represented by front-view cylindrical images [4, 21, 19, 12], camera-coordinate images [8, 22, 23], or bird's eye view (BEV) maps [24, 25, 26, 27]. Besides LiDAR, an autonomous driving car is usually equipped with other sensors such as cameras or radar sensors. Therefore, it is natural to fuse them for more robust and accurate object detection. For instance, Chen et al.  use the LiDAR BEV map to generate region proposals, and fuse the regional features from LiDAR BEV and front-view maps, as well as camera images, for 3D car detection. Qi et al.  propose to generate 2D object bounding boxes by an image detector, and use the regional point clouds within these bounding boxes for object detection.
II-B Uncertainty Quantification in Deep Neural Networks
There are two types of uncertainties we can model in neural networks. Epistemic uncertainty captures the model's uncertainty in describing the training dataset. It can be quantified through variational inference , sampling techniques [29, 20] or ensembles , and has been applied to active learning [31, 32], image semantic segmentation [33, 34], camera relocalization  or open-set object detection problems . Aleatoric uncertainty, on the other hand, models the observation noises of the input data. It has been modeled by a Laplacian or Gaussian distribution for camera image semantics and geometry predictions [37, 38]. Recently, we explicitly modeled and compared epistemic and aleatoric uncertainties in an object detector . We have shown that epistemic uncertainties are related to detection accuracy, whereas aleatoric uncertainties are influenced by observation noises.
III Proposed Method
In this section, we present our method, which leverages heteroscedastic aleatoric uncertainties for robust LiDAR 3D object detection. We start by illustrating our network architecture, followed by a description of how to model the uncertainties. We end the section with a description of our proposed robust multi-loss function.
III-A Network Architecture
III-A1 LiDAR Point Cloud Transformation
In this work, LiDAR point clouds are encoded in 2D bird’s eye view (BEV) feature maps as network inputs. These feature maps include height and density information generated by projecting the 3D point clouds on the 2D grid .
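As a rough illustration, the BEV encoding above can be sketched as follows. The grid extents, resolution, slice count, and the log-density normalization are assumptions for illustration only, not the paper's exact values:

```python
import numpy as np

def lidar_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                 z_range=(-2.5, 1.0), res=0.1, n_slices=5):
    """Encode a LiDAR point cloud (N, 3) as BEV height-slice and density maps.

    Grid extents, resolution and slice count are illustrative assumptions.
    """
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    height_maps = np.zeros((n_slices, H, W), dtype=np.float32)
    density = np.zeros((H, W), dtype=np.float32)

    # Keep only points inside the grid.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    xi = ((pts[:, 0] - x_range[0]) / res).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / res).astype(int)
    slice_h = (z_range[1] - z_range[0]) / n_slices
    zi = ((pts[:, 2] - z_range[0]) / slice_h).astype(int).clip(0, n_slices - 1)

    for x, y, z, h in zip(xi, yi, zi, pts[:, 2]):
        # Maximum height per cell and slice.
        height_maps[z, x, y] = max(height_maps[z, x, y], h - z_range[0])
        density[x, y] += 1.0

    # Log-scaled density normalization, a common choice in BEV encodings.
    density = np.minimum(1.0, np.log(density + 1.0) / np.log(64.0))
    return np.concatenate([height_maps, density[None]], axis=0)
```

The result is a stack of height slices plus one density channel that can be consumed by an ordinary 2D convolutional backbone.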
III-A2 Two-stage Object Detector
The network architecture is shown in Fig. 1. We follow the two-stage object detection network proposed in . The LiDAR BEV feature maps are fed into a pre-processing network based on VGG16 to extract high-level LiDAR features. After the pre-processing layers, a region proposal network (RPN) produces 3D regions of interest (ROIs) based on pre-defined 3D anchors at each pixel of the feature map. The RPN consists of two task-specific fully connected layers, each with  hidden units. A ROI is parametrized by , with  and  indicating the ROI position in the bird's eye view plane,  its height, and , ,  its dimensions, together with the softmax objectness score. The anchor dimensions are determined by clustering the training samples into two clusters; the anchor height is based on the LiDAR's height above the ground plane. Similar to , the RPN regresses the region proposals by the normalized offsets denoted as  and predicts the objectness score by a softmax layer.
The Fast-RCNN head (FRH), with three fully connected hidden layers ( units each), is designed to fine-tune the ROIs generated by the RPN. It produces multi-task outputs, i.e. the softmax probability , the 3D bounding box location and the orientation. We encode the location with the four-corner method introduced in : , with  and  being the relative position of a bounding box corner in the  and  axes of the LiDAR coordinate frame, and  being the height offsets from the ground plane. We also encode the orientation as , with  being the object orientation in BEV. As explained in , explicitly modeling the orientation can remedy the angle-wrapping problem, resulting in better object detection performance.
III-B Modeling Heteroscedastic Aleatoric Uncertainties
As introduced in Sec. I, heteroscedastic aleatoric uncertainties indicate data-dependent observation noises in the LiDAR sensor. For example, a distant or occluded object should yield high aleatoric uncertainties, since only a few LiDAR points represent it. In our proposed robust LiDAR 3D object detector, we extract aleatoric uncertainties in both RPN and FRH.
Let us denote an input LiDAR BEV feature image as , and a region of interest produced by RPN as . We also refer to and as the classification labels for RPN and FRH outputs respectively, with indicating “object” and “background” class. We denote as the network weights and as the noise-free outputs of the object detection network.
We use multivariate Gaussian distributions with diagonal covariance matrices to model the observation likelihood for the anchor position , 3D bounding box location and orientation :
where ,  and  refer to the observation noise vectors, each element of which indicates an aleatoric uncertainty scalar corresponding to an element in ,  and .
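The elided likelihood above can be written in the standard heteroscedastic form (a reconstruction following Kendall and Gal's formulation; the symbol names are ours, not necessarily the paper's):

```latex
p\big(\mathbf{y}_t \mid \mathbf{x}, \mathbf{W}\big)
  = \mathcal{N}\big(\mathbf{y}_t;\ \mathbf{f}^{\mathbf{W}}_t(\mathbf{x}),\
    \operatorname{diag}(\boldsymbol{\sigma}_t^2)\big),
\qquad t \in \{\text{anchor},\ \text{location},\ \text{orientation}\},
```

where each diagonal entry of $\boldsymbol{\sigma}_t^2$ is one aleatoric uncertainty scalar predicted by the network alongside the noise-free output $\mathbf{f}^{\mathbf{W}}_t(\mathbf{x})$.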
For classification tasks, the observation likelihood and can be represented by softmax functions:
Here, we do not explicitly model the aleatoric classification uncertainties, as they are self-contained from the softmax scores which follow the categorical distribution.
The uncertainty scores , ,  can be obtained by adding auxiliary output layers to the object detection network. To increase numerical stability and satisfy the positivity constraint, we use  for regression. The regression outputs of RPN that model the aleatoric uncertainties can be formulated as , and for FRH they are  and .
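A minimal numerical sketch of such auxiliary outputs, with plain linear heads standing in for the paper's fully connected layers (all names and shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_head_with_uncertainty(features, w_mu, w_s):
    """Auxiliary-output head: predicts the regression target and s = log(sigma^2).

    Predicting the log variance keeps the raw network output unconstrained,
    while exp(s) guarantees a positive variance -- the numerical-stability
    trick referred to in the text.
    """
    mu = features @ w_mu          # noise-free regression output
    s = features @ w_s            # log variance (unconstrained)
    sigma2 = np.exp(s)            # implied aleatoric variance, always > 0
    return mu, s, sigma2

features = rng.normal(size=(4, 16))   # e.g. 4 ROIs with 16-dim features
w_mu = rng.normal(size=(16, 8))       # e.g. 8 box-regression targets
w_s = rng.normal(size=(16, 8))
mu, s, sigma2 = regression_head_with_uncertainty(features, w_mu, w_s)
assert (sigma2 > 0).all()             # positivity comes for free
```

In a real detector the same pattern is simply an extra output layer per regression task, trained jointly with the existing heads.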
III-C Robust Multi-Loss Function
We incorporate aleatoric uncertainties , and in a multi-loss function for training our object detector via:
where we use the smooth  loss for ,  and , and the cross entropy loss for  and , similar to . Modeling auxiliary uncertainties in this multi-loss function has two effects. First, an uncertainty score can serve as a relative weight for a sub-loss. Optimizing these relative weights enables the object detector to balance the contribution of each sub-loss, allowing the network to be trained more easily. Second, aleatoric uncertainties can increase the network's robustness against noisy input data. For an input sample with high aleatoric uncertainties, i.e. a noisy sample, the model decreases the residual regression loss because  becomes small. Conversely, the network is encouraged to learn from informative samples with low aleatoric uncertainty, as the residual regression loss is increased by a larger  term.
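The attenuation effect can be sketched numerically. This follows the common heteroscedastic loss form L = exp(-s)·smoothL1(r) + s/2, which may differ in constants from the paper's Eq. 3:

```python
import numpy as np

def attenuated_smooth_l1(residual, log_var):
    """Smooth L1 regression loss attenuated by a predicted log variance s.

    exp(-s) down-weights the residual of samples the network deems noisy,
    while the s/2 term penalizes predicting large uncertainty everywhere.
    Constants follow the common heteroscedastic-loss form, not necessarily
    the paper's exact Eq. 3.
    """
    r = np.abs(residual)
    sl1 = np.where(r < 1.0, 0.5 * r ** 2, r - 0.5)   # standard smooth L1
    return np.exp(-log_var) * sl1 + 0.5 * log_var

# A sample with high predicted uncertainty (s = 3) contributes less
# regression loss than the same residual with low uncertainty (s = 0):
assert attenuated_smooth_l1(4.0, 3.0) < attenuated_smooth_l1(4.0, 0.0)
```

During training the network can thus trade residual error against the uncertainty penalty per sample, which is exactly the "learn less from noisy samples" behavior described above.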
IV Experimental Results
IV-A Experimental Setup
IV-A1 Dataset and Input Representation
We evaluate the performance of our proposed method on the “Car” category from the KITTI object detection benchmark . We use the LiDAR point cloud within the range - meters in the LiDAR coordinate frame. The point clouds are discretized into height slices along the axis with meters resolution and the length and width are discretized with meters resolution, similar to . After incorporating a density feature map, the input LiDAR point clouds are represented by the feature maps with size .
IV-A2 Implementation Details
The RPN and the Fast-RCNN head are trained jointly in an end-to-end fashion. The background anchors are ignored for the RPN regression loss. An anchor is assigned to the "Car" class when its Intersection over Union (IoU) with the ground truth in the BEV is larger than , and to "background" if it is below . Anchors that are neither "Car" nor "background" do not contribute to the learning. In addition, we apply Non-Maximum Suppression (NMS) with a threshold of  on the region proposals to reduce redundancy, keeping  proposals during training and  at test time. To train the Fast-RCNN head, a ROI is labeled as positive when its 2D IoU overlap with the ground truth in BEV is larger than , and as negative when it is less than . We train the network with the Adam optimizer. The learning rate is initialized as  and decayed exponentially every  steps with a decay factor of . We also use Dropout and  regularization to prevent over-fitting. We first train the network without aleatoric uncertainties for  steps and then use the robust multi-loss function (Eq. 3) for another  steps. We find that the network converges faster with this training strategy.
IV-B Comparison with State-of-the-art Methods
We first compare the performance of our proposed network PROD (Ours) with the baseline method, which does not explicitly model aleatoric uncertainties, as well as with other state-of-the-art methods (see Tab. I and Tab. II). For a fair comparison, we only consider LiDAR-only methods. We use the 3D Average Precision for 3D detection and the bird's eye view Average Precision for 3D localization. The AP values are calculated at an Intersection over Union (IoU) threshold of 0.7, introduced in , unless mentioned otherwise.
Tab. I shows the performance on the KITTI test set. The baseline method performs similarly to the MV3D (BV+FV) network. By leveraging aleatoric uncertainties, PROD improves over the baseline method by up to  in , and produces results comparable to PIXOR  and VoxelNet . Tab. II shows the detection performance on the KITTI val set with the same train-val split introduced in . By modeling aleatoric uncertainties, PROD improves over the baseline method by up to nearly . It also outperforms all other methods in  for the moderate and hard settings. These experiments show the effectiveness of our proposed method.
|Method||3D AP (Easy)||3D AP (Moderate)||3D AP (Hard)||BEV AP (Easy)||BEV AP (Moderate)||BEV AP (Hard)|
|3D FCN ||-||-||-||69.94||62.54||55.94|
|MV3D (BV+FV) ||66.77||52.73||51.31||85.82||77.00||68.94|
|MV3D (BV+FV) ||71.19||56.60||55.30||86.18||77.32||76.33|
|F-PointNet (LiDAR) ||69.50||62.30||59.73||-||-||-|
IV-C Ablation Study
We then conduct an extensive study regarding where to model the uncertainties, network speed and memory, as well as a qualitative analysis. We use the train-val split introduced in  for evaluation.
IV-C1 Where to Model Aleatoric Uncertainties
In this experiment we study the effectiveness of modeling aleatoric uncertainties in the different sub-networks of our LiDAR 3D object detector. To this end, we train another two detectors that only capture uncertainties either in the RPN (Aleatoric RPN) or the FRH (Aleatoric FRH), and compare their 3D detection performance with the baseline method (Baseline) and our proposed method (Ours), which models uncertainties in both RPN and FRH. Tab. III lists the AP values and their improvements over Baseline on the KITTI easy, moderate, and hard settings. We find that modeling the aleatoric uncertainties in either RPN, FRH, or both can improve the detection performance, while modeling the uncertainties in both networks brings the largest performance gain in the moderate and hard settings. Furthermore, we evaluate the detection performance over different LiDAR ranges, shown in Tab. IV. Again, our method consistently improves the detection performance compared to Baseline. Modeling the uncertainties in both RPN and FRH shows the highest improvement between  meters, indicating that the aleatoric uncertainties in both networks can compensate for each other. As we will demonstrate in the following section (Sec. IV-D), our proposed network handles cars from the easy, moderate, hard, near-range, and long-range settings differently. It learns to adapt to noisy data, resulting in improved detection performance.
|Method||Easy||Moderate||Hard|
|Aleatoric RPN||72.92 (+1.42)||63.84 (+0.13)||58.61 (+1.30)|
|Aleatoric FRH||81.07 (+9.57)||65.51 (+1.80)||65.09 (+7.78)|
|Ours||78.81 (+7.31)||65.89 (+2.18)||65.19 (+7.88)|
|Method||0-20 (m)||20-35 (m)||35-50 (m)||50-70 (m)|
|Aleatoric RPN||80.86 (+8.44)||79.72 (+0.76)||65.44 (+7.57)||30.54 (+4.37)|
|Aleatoric FRH||79.10 (+6.68)||83.89 (+4.93)||61.98 (+4.11)||29.67 (+3.50)|
|Ours||80.78 (+8.36)||84.75 (+5.79)||66.81 (+8.94)||34.07 (+7.90)|
IV-C2 Runtime and Number of Parameters
We use the runtime and the number of parameters to evaluate the computational efficiency and memory requirements. Tab. V shows the results of PROD relative to the baseline network. We only need  additional runtime and  additional parameters to predict aleatoric uncertainties during inference, showing the high efficiency of our proposed method.
|Method||Number of parameters||Runtime|
IV-C3 Qualitative Analysis
However, the network also tends to mis-classify objects with car-like shapes; for example, in Fig. 2 the network incorrectly predicts the fences on the bottom-left side as a car. Such failures could be avoided by fusing image features from vision cameras.
IV-D Understanding Aleatoric Uncertainties
We finally conduct comprehensive experiments to understand how aleatoric uncertainties behave. We use the scalar Total Variance (TV) introduced in  to quantify aleatoric uncertainties. A large TV score indicates high uncertainty. We also use the Pearson Correlation Coefficient (PCC) to measure linear correlation. We study RPN uncertainties and FRH uncertainties. The RPN uncertainties indicate observation noises when predicting anchor positions, whereas the FRH uncertainties indicate the noises of the final bounding box predictions. The latter consist of FRH location uncertainties for the bounding box regression and FRH orientation uncertainties for the heading predictions. We evaluate all predictions with a score larger than  from the KITTI val set unless mentioned otherwise.
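The two evaluation quantities can be sketched as follows, with TV as the sum of the predicted variances (the trace of the diagonal covariance) and PCC in its textbook form; both are standard definitions, though the variable names are ours:

```python
import numpy as np

def total_variance(sigma2):
    """Total Variance (TV): sum of the predicted per-dimension variances,
    i.e. the trace of the diagonal covariance. Larger TV = higher uncertainty."""
    return np.sum(sigma2, axis=-1)

def pearson_corr(a, b):
    """Pearson Correlation Coefficient between two uncertainty series."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sum((a - a.mean()) * (b - b.mean())) / (a.std() * b.std() * len(a))

# TV per detection, given diagonal variances for each regression dimension:
sigma2 = np.array([[0.1, 0.2, 0.3],    # a confident detection
                   [1.0, 2.0, 3.0]])   # an uncertain detection
tv = total_variance(sigma2)
assert tv[1] > tv[0]                   # the uncertain detection has larger TV
```

PCC values near +1 or -1 then indicate the strong (anti-)correlations reported between uncertainty types, softmax scores, and distances.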
IV-D1 Relationship Between Uncertainties
Fig. 3 shows the prediction distribution of FRH uncertainties against RPN uncertainties, as well as of FRH location uncertainties against orientation uncertainties. The uncertainties are highly correlated with each other, indicating that our detector has learned to represent LiDAR observation noises consistently across its sub-networks and prediction outputs.
IV-D2 Orientation Uncertainties
Fig. 4 illustrates the prediction distribution of FRH orientation uncertainties (radial axis) w.r.t. angle values (angular axis) in polar coordinates. Most predictions lie at four base angles, i.e. . Fig. 4 also shows the average orientation uncertainties against the orientation difference between the predicted angles and the nearest base angles. They are highly correlated with PCC=0.99, showing that the network produces high observation noises when predicting car headings that deviate from the base angles.
IV-D3 Relationship Between Softmax Score, Detection Distance and Uncertainties
In Fig. 5 we plot the average RPN and FRH uncertainties against increasing softmax scores for anchor and final object classification. We find a strong negative correlation between them. As introduced by Eq. 2, the softmax scores can be regarded as aleatoric uncertainties for classification. This indicates that our network has learned to adapt the uncertainties in the regression tasks (i.e. anchor position, bounding box location and orientation) to those in the classification, i.e. the uncertainties increase as the softmax score decreases. Fig. 5 also shows that the average RPN and FRH uncertainties become larger as the detection distance increases. This is because the LiDAR sensor measures fewer reflections from a distant car, which yields high observation noises.
IV-D4 Uncertainty Distribution for Easy, Moderate, and Hard Settings
We finally evaluate the FRH uncertainty distribution for easy, moderate, and hard objects, shown in Fig. 6. The uncertainty distributions vary: for the easy setting there are more predictions with lower uncertainties, whereas for hard objects, which have larger occlusions, truncations, and detection distances, the network estimates higher uncertainties. The result indicates that the network has learned to treat objects from these three settings differently.
IV-D5 Qualitative Observations
In Fig. 7 we visualize nine exemplary detections whose uncertainties are equally distributed at log scale, ranging from -3 to -1. We observe: (1) the uncertainties are influenced by occlusion, detection distance, orientation, and softmax score, as discussed above; (2) the network shows higher aleatoric uncertainties if there are fewer points around the car. The results show that our network has captured reasonable LiDAR observation noises.
V Discussion and Conclusion
We have presented our robust LiDAR 3D object detection network called PROD, which leverages heteroscedastic aleatoric uncertainties to significantly improve detection performance. Trained with a multi-loss function that incorporates aleatoric uncertainties, PROD learns to adapt to noisy data and increases the average precision by up to , producing state-of-the-art results on the KITTI object detection benchmark. Our method only requires modifying the cost function and the output layers, with only  additional inference time. This makes our method suitable for real-time autonomous driving applications.
We have qualitatively analyzed how PROD learns to deal with noisy data in Sec. IV-C. The network tends to predict high uncertainties for detections that are highly occluded (Fig. 7), far from the ego-vehicle (Fig. 5), oriented differently from the base angles (Fig. 4), or assigned a low objectness score (Fig. 5). Therefore, our network learns less from noisy samples during the training process by penalizing the multi-loss objective with the  terms (Eq. 3). Conversely, the network is encouraged to learn more from the informative training samples with more LiDAR reflections. In this way, its robustness against noisy data is enhanced, resulting in improved detection performance for data from the easy, moderate, and hard settings (Tab. III) or at different distances (Tab. IV). Note that this effect arises because we explicitly model the observation noises, rather than from an ad-hoc solution.
Compared to Focal Loss , which incorporates the prediction probability in the loss function to tackle the positive-negative sample imbalance problem, our proposed robust multi-loss function works in the opposite way: it down-weights "outlier" training samples with high aleatoric uncertainties, and encourages the network to learn from those with small errors. It is interesting future work to investigate how a network behaves with Focal Loss versus our proposed method on different driving datasets.
We thank Zhongyu Lou for his suggestions and inspiring discussions. We also thank Xiao Wei for proofreading the manuscript.
-  “Preliminary report highway: Hwy18mh010,” National Transportation Safety Board (NTSB), Tech. Rep., May 2018.
-  J. Janai, F. Güney, A. Behl, and A. Geiger, “Computer vision for autonomous vehicles: Problems, datasets and state-of-the-art,” arXiv preprint arXiv:1704.05519, 2017.
-  B. Li, “3d fully convolutional network for vehicle detection in point cloud,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 1513–1518.
-  B. Li, T. Zhang, and T. Xia, “Vehicle detection from 3d lidar using fully convolutional network,” Robotics:Science and Systems, 2016.
-  Y. Zhou and O. Tuzel, “Voxelnet: End-to-end learning for point cloud based 3d object detection,” arXiv preprint arXiv:1711.06396, 2017.
-  D. Z. Wang and I. Posner, “Voting for voting in online point cloud object detection.” in Robotics: Science and Systems, vol. 1, 2015, p. 5.
-  M. Engelcke, D. Rao, D. Z. Wang, C. H. Tong, and I. Posner, “Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 1355–1361.
-  A. Asvadi, L. Garrote, C. Premebida, P. Peixoto, and U. J. Nunes, “Depthcn: vehicle detection using 3d-lidar and convnet,” in IEEE International Conference on Intelligent Transportation Systems, 2017.
-  K. Minemura, H. Liau, A. Monrroy, and S. Kato, “Lmnet: Real-time multiclass object detection on cpu using 3d lidars,” arXiv preprint arXiv:1805.04902, 2018.
-  Y. Zeng, Y. Hu, S. Liu, J. Ye, Y. Han, X. Li, and N. Sun, “Rt3d: Real-time 3-d vehicle detection in lidar point cloud for autonomous driving,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3434–3440, Oct 2018.
-  S. Yu, T. Westfechtel, R. Hamada, K. Ohno, and S. Tadokoro, “Vehicle detection and localization on bird’s eye view elevation images using convolutional neural network,” in 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), Oct 2017, pp. 102–109.
-  X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3d object detection network for autonomous driving,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
-  J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. Waslander, “Joint 3d proposal generation and object detection from view aggregation,” arXiv preprint arXiv:1712.02294, 2017.
-  D. Xu, D. Anguelov, and A. Jain, “Pointfusion: Deep sensor fusion for 3d bounding box estimation,” arXiv preprint arXiv:1711.10871, 2017.
-  C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, “Frustum pointnets for 3d object detection from rgb-d data,” arXiv preprint arXiv:1711.08488, 2017.
-  X. Du, M. H. Ang Jr, S. Karaman, and D. Rus, “A general pipeline for 3d detection of vehicles,” arXiv preprint arXiv:1803.00387, 2018.
-  X. Du, M. H. Ang, and D. Rus, “Car detection for autonomous vehicle: Lidar and vision fusion approach through deep learning framework,” in Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on. IEEE, 2017, pp. 749–754.
-  D. Matti, H. K. Ekenel, and J.-P. Thiran, “Combining lidar space clustering and convolutional neural networks for pedestrian detection,” in Advanced Video and Signal Based Surveillance (AVSS), 2017 14th IEEE International Conference on. IEEE, 2017, pp. 1–6.
-  A. Pfeuffer and K. Dietmayer, “Optimal sensor data fusion architecture for object detection in adverse weather conditions,” in Proceedings of International Conference on Information Fusion, 2018, pp. 2592 – 2599.
-  Y. Gal, “Uncertainty in deep learning,” Ph.D. dissertation, University of Cambridge, 2016.
-  B. Wu, A. Wan, X. Yue, and K. Keutzer, “Squeezeseg: Convolutional neural nets with recurrent crf for real-time road-object segmentation from 3d lidar point cloud,” arXiv preprint arXiv:1710.07368, 2017.
-  T. Kim and J. Ghosh, “Robust detection of non-motorized road users using deep learning on optical and lidar data,” in Intelligent Transportation Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE, 2016, pp. 271–276.
-  J. Schlosser, C. K. Chow, and Z. Kira, “Fusing lidar and images for pedestrian detection using convolutional neural networks,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016, pp. 2198–2205.
-  D. Feng, L. Rosenbaum, and K. Dietmayer, “Towards safe autonomous driving: Capture uncertainty in the deep neural network for lidar 3d vehicle detection,” arXiv preprint arXiv:1804.05132, 2018.
-  L. Caltagirone, S. Scheidegger, L. Svensson, and M. Wahde, “Fast lidar-based road detection using fully convolutional neural networks,” in Intelligent Vehicles Symposium (IV), 2017 IEEE. IEEE, 2017, pp. 1019–1024.
-  J. Beltran, C. Guindel, F. M. Moreno, D. Cruzado, F. Garcia, and A. de la Escalera, “Birdnet: a 3d object detection framework from lidar information,” arXiv preprint arXiv:1805.01195, 2018.
-  B. Yang, W. Luo, and R. Urtasun, “Pixor: Real-time 3d object detection from point clouds,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
-  G. E. Hinton and D. Van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proceedings of the sixth annual conference on Computational learning theory. ACM, 1993, pp. 5–13.
-  A. Graves, “Practical variational inference for neural networks,” in Advances in neural information processing systems, 2011, pp. 2348–2356.
-  I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep exploration via bootstrapped dqn,” in Advances in neural information processing systems, 2016, pp. 4026–4034.
-  Y. Gal, R. Islam, and Z. Ghahramani, “Deep bayesian active learning with image data,” in International Conference on Machine Learning, 2017, pp. 1183–1192.
-  W. H. Beluch, T. Genewein, A. Nürnberger, and J. M. Köhler, “The power of ensembles for active learning in image classification,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
-  M. Kampffmeyer, A.-B. Salberg, and R. Jenssen, “Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2016 IEEE Conference on. IEEE, 2016, pp. 680–688.
-  A. Kendall, V. Badrinarayanan, and R. Cipolla, “Bayesian segnet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding,” in Proceedings of the British Machine Vision Conference (BMVC), 2017.
-  A. Kendall and R. Cipolla, “Modelling uncertainty in deep learning for camera relocalization,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016, pp. 4762–4769.
-  D. Miller, L. Nicholson, F. Dayoub, and N. Sünderhauf, “Dropout sampling for robust object detection in open-set conditions,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.
-  A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” in Advances in Neural Information Processing Systems, 2017, pp. 5580–5590.
-  A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
-  S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.
-  A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 3354–3361.
-  T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in International Conference on Computer Vision (ICCV), Venice, Italy, 2017.