A survey of Object Classification and Detection based on 2D/3D data

05/29/2019
by Xiaoke Shen, et al.

Recently, deep neural network based algorithms have significantly improved object classification, detection, and semantic segmentation. However, one limitation of 2D image-based systems is that they cannot provide accurate 3D location information, which is critical for location-sensitive applications such as autonomous driving and robot navigation. In contrast, 3D methods, such as RGB-D and RGB-LiDAR based systems, can significantly improve on RGB-only approaches, which makes them an active research area for both industry and academia. Compared with 2D image-based systems, 3D-based systems are more complicated for the following five reasons: 1) The data representation itself is more complicated: while 2D images are regular pixel grids, 3D data can be represented as point clouds, meshes, or volumes. 2) Computation and memory requirements are higher because of the extra dimension. 3) Differences in object distribution and scene extent between indoor and outdoor environments make a single unified framework hard to achieve. 4) 3D data, especially in outdoor scenarios, is sparse compared with dense 2D images, which makes detection more challenging. 5) Large labelled datasets, which are essential for supervised algorithms, are still under construction, in contrast to well-established 2D datasets such as ImageNet. Based on the challenges listed above, the surveyed systems are organized by application scenario, data representation, and main task addressed. Influential 2D-based systems that have shaped the 3D ones are also introduced to show the connections between them.
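To make the representation and sparsity points concrete, the following is a minimal sketch (not taken from the survey) that contrasts a dense 2D pixel grid with an unordered point cloud and then voxelizes the cloud into an occupancy grid. The image size, scene bounds, point count, and voxel size are illustrative assumptions rather than values from the paper.

# Minimal sketch contrasting 2D and 3D data representations and
# illustrating why sparsity matters. Shapes, scene bounds, and voxel size
# are illustrative assumptions, not values from the survey.
import numpy as np

# 2D: a dense pixel grid. Every cell holds a value, so memory scales with
# H * W regardless of scene content.
height, width = 375, 1242            # assumed RGB frame size
image = np.zeros((height, width, 3), dtype=np.uint8)

# 3D: an unordered point cloud of (x, y, z) coordinates. Memory scales with
# the number of measured points, not with the scene volume.
num_points = 100_000                 # assumed size of one outdoor LiDAR sweep
points = np.random.uniform(
    low=[0.0, -40.0, -3.0],          # assumed scene bounds in metres
    high=[70.0, 40.0, 1.0],
    size=(num_points, 3),
).astype(np.float32)

# Voxelizing the cloud into a dense occupancy grid makes the third
# dimension explicit and shows how sparsely it is filled.
voxel_size = 0.1                     # metres per voxel (assumption)
mins = points.min(axis=0)
indices = np.floor((points - mins) / voxel_size).astype(np.int64)
grid_shape = indices.max(axis=0) + 1
occupancy = np.zeros(grid_shape, dtype=bool)
occupancy[indices[:, 0], indices[:, 1], indices[:, 2]] = True

print(f"pixel grid cells : {image.shape[0] * image.shape[1]:,}")
print(f"raw 3D points    : {num_points:,}")
print(f"voxel grid cells : {occupancy.size:,}")
print(f"occupied voxels  : {int(occupancy.sum()):,} "
      f"({100.0 * occupancy.sum() / occupancy.size:.2f}% of the grid)")

Running the sketch shows that the dense voxel grid contains orders of magnitude more cells than occupied voxels, which is the sparsity gap the abstract highlights for outdoor 3D data, and it also illustrates why adding the third dimension inflates memory and computation compared with a 2D pixel grid.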
