False Positive Removal for 3D Vehicle Detection with Penetrated Point Classifier

05/27/2020 ∙ by Sungmin Woo, et al. ∙ Yonsei University

Recently, researchers have been leveraging LiDAR point clouds for higher accuracy in 3D vehicle detection. Most state-of-the-art methods are deep learning based, but are easily affected by the number of points generated on the object. This vulnerability leads to numerous false positive boxes at high recall positions, where objects are occasionally predicted with few points. To address the issue, we introduce the Penetrated Point Classifier (PPC), based on the underlying property of LiDAR that points cannot be generated behind vehicles. It determines whether a point exists behind the vehicle of the predicted box, and if one does, the box is classified as a false positive. Our straightforward yet unprecedented approach is evaluated on the KITTI dataset and improves the performance of PointRCNN, one of the state-of-the-art methods. The experimental results show that precision at the highest recall position increases dramatically, by 15.46 percentage points and 14.63 percentage points on the moderate and hard difficulties of the car class, respectively.
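The geometric intuition behind the PPC can be sketched as a simple bird's-eye-view test: if any LiDAR point lies farther from the sensor than the predicted box, along the ray that passes through the box, that point "penetrated" the vehicle, which a real vehicle would not allow. The function below is a minimal illustrative sketch of that intuition only; all names, the corridor approximation, and the `margin` parameter are assumptions, not the paper's actual classifier, which operates on learned point features.

```python
import numpy as np

def has_penetrating_points(points, box_center, box_radius, margin=0.5):
    """Illustrative sketch (not the paper's exact method): return True if
    any LiDAR point lies behind the predicted box along the sensor-to-box
    ray, i.e. inside the region the box should occlude.

    points     : (N, 3) array of LiDAR points in the sensor frame
    box_center : (3,) predicted box center
    box_radius : rough half-extent of the box in the bird's-eye view
    """
    center_xy = box_center[:2]
    center_dist = np.linalg.norm(center_xy)
    dir_xy = center_xy / center_dist                  # unit ray: sensor -> box
    normal_xy = np.array([-dir_xy[1], dir_xy[0]])     # perpendicular to the ray

    pts_xy = points[:, :2]
    along = pts_xy @ dir_xy                           # distance along the ray
    lateral = np.abs(pts_xy @ normal_xy)              # offset from the ray

    behind = along > center_dist + box_radius + margin  # past the box's far side
    occluded = lateral < box_radius                     # within the shadow corridor
    return bool(np.any(behind & occluded))
```

A box predicted at 10 m with a point 20 m away on the same ray would be flagged as a false positive, while a box with all points in front of it would pass the test.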




1 Introduction

3D object detection is a crucial task in autonomous driving since vehicles must be provided with accurate scene information. Safe driving requires precisely recognizing other cars driving ahead or parked at the roadside. To achieve this goal, recent works on 3D object detection utilize data from several types of sensors, such as a monocular camera, stereo cameras, and LiDAR. They provide varied and useful data, but each sensor has its limitations.

RGB images from cameras contain appearance information that can be efficiently exploited by advanced 2D Convolutional Neural Network (CNN) techniques. However, they lack the depth information that is essential for 3D object detection. This drawback results in considerably low performance for monocular 3D object detection [naiden2019shift, Wang_2019_CVPR, Chen_2016_CVPR, Mousavian_2017_CVPR], which uses only a single image to detect objects in 3D space. To overcome this limitation, [you2019pseudo, li2019stereo, chen2020dsgn] use two images from a stereo camera, sharing the weights of the image feature extractors, to estimate the depth of the scene more precisely. The depth estimated from stereo images is more accurate than that from monocular images, but due to the discrete nature of pixels, it is still insufficient for real-world applications. Therefore, depth prediction from 2D images alone is not adequate for autonomous driving, where an accurate object location is required.

On the other hand, point clouds from LiDAR provide precise distance information, which makes it easy to capture the spatial structure of a scene. Thus, most high-performance 3D object detection methods are point-based. However, the LiDAR point cloud has critical weaknesses: it is sparse at long range, and its points are irregular and unordered. To handle these challenges, methods such as PointNet [Qi_2017_CVPR] and PointNet++ [qi2017pointnet++] directly process the raw point cloud regardless of the order of the points. Alternatively, point clouds are converted into forms that can be processed more easily, either by projecting the point cloud onto a corresponding image for 2D CNNs [ku2018joint, yang2018pixor, liang2018deep, chen2017multi] or by dividing it into voxels for 3D CNNs [zhou2018voxelnet, song2016deep, yan2018second].
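The voxel-based alternative mentioned above can be illustrated with a minimal sketch: quantize each point's coordinates onto an integer grid and group points by cell, turning the irregular cloud into a regular volume a 3D CNN can consume. The function name, voxel sizes, and per-voxel point cap below are illustrative choices, not values from any of the cited detectors.

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_pts=32):
    """Illustrative sketch of VoxelNet-style voxelization: map each point
    to an integer grid cell by floor-dividing its coordinates by the cell
    size, then bucket points per cell (capped at max_pts per voxel).

    points : (N, 3+) array of LiDAR points (x, y, z, ...)
    Returns a dict mapping (ix, iy, iz) -> list of points in that cell.
    """
    coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    voxels = {}
    for pt, c in zip(points, coords):
        bucket = voxels.setdefault(tuple(c), [])
        if len(bucket) < max_pts:      # discard overflow points in dense cells
            bucket.append(pt)
    return voxels
```

With a 0.2 m horizontal cell size, two points 5 cm apart land in the same voxel, while a point a meter away occupies its own cell; the resulting sparse grid is what voxel-based detectors feed to their 3D convolutions.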