SODFormer: Streaming Object Detection with Transformer Using Events and Frames

08/08/2023
by Dianze Li, et al.

The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), is increasingly used to address major object detection challenges such as fast motion blur and low light. However, effectively leveraging rich temporal cues and fusing two heterogeneous visual streams remains challenging. To address this, we propose a novel streaming object detector with Transformer, namely SODFormer, which is the first to integrate events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture that treats object detection as an end-to-end sequence prediction problem, where a novel temporal Transformer module leverages rich temporal cues from the two visual streams to improve detection performance. Finally, we propose an asynchronous attention-based fusion module that integrates the two heterogeneous sensing modalities and exploits the complementary advantages of each; it can be queried at any time to locate objects, overcoming the limited output frequency of synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code are available at https://github.com/dianzl/SODFormer.
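
To make the idea of fusion that "can be queried at any time" concrete, here is a minimal sketch in plain Python. It is not the SODFormer fusion module: the function names (`latest_before`, `fuse_at`), the list-based stream layout, and the single dot-product attention step are all illustrative assumptions. It only shows how features from two streams arriving at different rates might be combined with attention weights at an arbitrary query time.

```python
import math
from bisect import bisect_right


def latest_before(timestamps, features, t):
    """Return the most recent feature with timestamp <= t, or None if none exists."""
    i = bisect_right(timestamps, t)
    return features[i - 1] if i > 0 else None


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_at(t, event_stream, frame_stream, query):
    """Hypothetical helper: attention-weighted mix of the latest feature
    available from each modality at query time t.

    Each stream is a (sorted_timestamps, feature_vectors) pair.
    """
    candidates = []
    for timestamps, features in (event_stream, frame_stream):
        feature = latest_before(timestamps, features, t)
        if feature is not None:
            candidates.append(feature)
    if not candidates:
        return None  # nothing has arrived from either stream yet

    # Scaled dot-product score between the query and each modality feature.
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, f)) / scale for f in candidates]
    weights = softmax(scores)

    # Weighted sum of the candidate features, one dimension at a time.
    return [
        sum(w * f[d] for w, f in zip(weights, candidates))
        for d in range(len(query))
    ]


# Toy data (assumed): events arrive every 10 ms, frames every 40 ms.
event_stream = ([0.0, 0.01, 0.02], [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4]])
frame_stream = ([0.0, 0.04], [[0.0, 1.0], [0.2, 0.8]])
fused = fuse_at(0.025, event_stream, frame_stream, query=[1.0, 1.0])
```

Because the query time is decoupled from the frame timestamps, the output rate in this sketch is not tied to the slower frame stream, which is the property the abstract attributes to asynchronous fusion.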


