Event-based Monocular Dense Depth Estimation with Recurrent Transformers

12/06/2022
by Xu Liu, et al.

Event cameras, which offer high temporal resolution and high dynamic range, bring a new perspective to common challenges in monocular depth estimation, such as motion blur and low light. However, effectively exploiting the sparse spatial information and rich temporal cues in asynchronous events remains challenging. To this end, we propose EReFormer, a novel event-based monocular depth estimator with recurrent transformers, which is the first pure transformer with a recursive mechanism for processing continuous event streams. For spatial modeling, we present a transformer-based encoder-decoder with a spatial transformer fusion module, which models global context better than CNN-based methods. For temporal modeling, we design a gate recurrent vision transformer unit that introduces a recursive mechanism into transformers, improving temporal modeling while reducing the expensive GPU memory cost. Experimental results show that EReFormer outperforms state-of-the-art methods by a margin on both synthetic and real-world datasets. We hope this work will encourage further research on transformers in the event-based vision community. Our open-source code can be found in the supplemental material.
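
The abstract does not describe the internals of the gate recurrent vision transformer unit. As a rough illustration of the general idea of combining GRU-style gating with self-attention, the pure-Python sketch below shows one hypothetical formulation; it is not EReFormer's actual design. The class `GatedAttentionRecurrentCell`, its weight names, the update/reset gates, and the use of single-head attention over per-token mixed states are all assumptions made for illustration. At each time step the cell takes a set of event-feature tokens and a same-shaped hidden state, computes per-token update and reset gates, mixes the tokens with self-attention to form a candidate state, and blends the old and candidate states with the update gate.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    d = len(Wq)
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


class GatedAttentionRecurrentCell:
    """Hypothetical GRU-style cell whose candidate state is mixed by self-attention."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.Wz = mat(dim, 2 * dim)  # update gate, applied to [x; h]
        self.Wr = mat(dim, 2 * dim)  # reset gate, applied to [x; h]
        self.Wq, self.Wk, self.Wv = mat(dim, dim), mat(dim, dim), mat(dim, dim)

    def step(self, x_tokens, h_tokens):
        update_gates, mixed = [], []
        for x, h in zip(x_tokens, h_tokens):
            xh = x + h  # list concatenation gives the 2*dim vector [x; h]
            z = [sigmoid(s) for s in matvec(self.Wz, xh)]
            r = [sigmoid(s) for s in matvec(self.Wr, xh)]
            update_gates.append(z)
            # Reset gate controls how much of the old state enters the candidate.
            mixed.append([xi + ri * hi for xi, ri, hi in zip(x, r, h)])
        # Self-attention lets every token see every other token (global context).
        attended = self_attention(mixed, self.Wq, self.Wk, self.Wv)
        new_h = []
        for h, z, a in zip(h_tokens, update_gates, attended):
            cand = [math.tanh(s) for s in a]
            # Update gate interpolates between the old state and the candidate.
            new_h.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)])
        return new_h


# Usage: run the cell over five time steps of random "event feature" tokens.
dim, num_tokens = 4, 3
cell = GatedAttentionRecurrentCell(dim)
rng = random.Random(1)
h = [[0.0] * dim for _ in range(num_tokens)]
for _ in range(5):
    x = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
    h = cell.step(x, h)
print(len(h), len(h[0]))  # 3 4
```

Because the hidden state keeps a fixed size (here, three tokens of dimension four) no matter how many time steps have been processed, a recurrence of this kind avoids attending over the entire event history at once. That is one plausible reading of the abstract's claim about reducing GPU memory cost, not a description of the paper's implementation.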
