ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning

07/15/2022
by   Shengchao Hu, et al.
0

Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously, which is called ST-P3. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-arts on both open-loop nuScenes dataset as well as closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are made publicly available at https://github.com/OpenPerceptionX/ST-P3.

READ FULL TEXT

page 2

page 11

page 21

page 23

page 24

page 25

research
03/17/2023

TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

Vision-centric joint perception and prediction (PnP) has become an emerg...
research
03/15/2020

MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps

The ability to reliably perceive the environmental states, particularly ...
research
03/31/2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3D visual perception tasks, including 3D detection and map segmentation ...
research
06/16/2022

Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline

Current end-to-end autonomous driving methods either run a controller ba...
research
10/09/2018

Functionally Modular and Interpretable Temporal Filtering for Robust Segmentation

The performance of autonomous systems heavily relies on their ability to...
research
06/30/2020

Deep Learning for Vision-based Prediction: A Survey

Vision-based prediction algorithms have a wide range of applications inc...
research
06/19/2023

PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird's-Eye View

Accurately perceiving instances and predicting their future motion are k...

Please sign up or login with your details

Forgot password? Click here to reset