MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation

07/26/2023
by Reiner Birkl, et al.

We release MiDaS v3.1 for monocular depth estimation, offering a variety of new models based on different encoder backbones. This release is motivated by the success of transformers in computer vision, with a large variety of pretrained vision transformers now available. We explore how using the most promising vision transformers as image encoders impacts depth estimation quality and runtime of the MiDaS architecture. Our investigation also includes recent convolutional approaches that achieve comparable quality to vision transformers in image classification tasks. While the previous release MiDaS v3.0 solely leverages the vanilla vision transformer ViT, MiDaS v3.1 offers additional models based on BEiT, Swin, SwinV2, Next-ViT and LeViT. These models offer different performance-runtime trade-offs. The best model improves the depth estimation quality by 28% while efficient models enable downstream tasks requiring high frame rates. We also describe the general process for integrating new backbones. A video summarizing the work can be found at https://youtu.be/UjaeNNFf9sE and the code is available at https://github.com/isl-org/MiDaS.
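As a concrete illustration (not part of the abstract itself), the released models are exposed through PyTorch Hub. The snippet below is a minimal sketch of single-image inference, assuming the "DPT_BEiT_L_512" entry point and transform attribute names from the repository's hubconf; a lighter checkpoint such as "DPT_SwinV2_T_256" or "DPT_LeViT_224" could be swapped in when runtime matters more than quality.

```python
import cv2
import torch

# Minimal sketch: load a MiDaS v3.1 checkpoint from PyTorch Hub and
# predict relative (inverse) depth for a single image. The entry-point
# name "DPT_BEiT_L_512" follows the repository's hubconf; other backbones
# trade estimation quality for speed.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("isl-org/MiDaS", "DPT_BEiT_L_512").to(device).eval()

# The repo also ships matching preprocessing transforms; fall back to the
# generic DPT transform if the backbone-specific attribute is named
# differently than assumed here.
transforms = torch.hub.load("isl-org/MiDaS", "transforms")
transform = getattr(transforms, "beit512_transform", transforms.dpt_transform)

img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img).to(device))
    # Upsample the inverse-depth map back to the input resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

print(depth.shape, depth.min(), depth.max())
```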
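The abstract also mentions a general process for integrating new backbones. In the DPT-style MiDaS architecture this comes down to tapping encoder activations at several depths and handing them to the fusion decoder. Below is a hedged sketch of that pattern using timm forward hooks; the model name and block indices are illustrative assumptions, not necessarily the exact ones MiDaS v3.1 uses.

```python
import timm
import torch

# Sketch: harvest multi-scale activations from a pretrained transformer
# encoder, the raw material a DPT-style decoder would fuse into depth.
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output
    return hook

# Illustrative choice of backbone; any timm model exposing a .blocks list
# of transformer blocks can be tapped the same way.
encoder = timm.create_model("beit_large_patch16_512", pretrained=False)

# Hook four evenly spaced blocks (indices are assumptions for this sketch).
for i, idx in enumerate([5, 11, 17, 23]):
    encoder.blocks[idx].register_forward_hook(save_activation(f"stage{i + 1}"))

x = torch.randn(1, 3, 512, 512)  # dummy image batch
with torch.no_grad():
    encoder(x)

# Each entry is a token sequence; a real integration would reshape these
# tokens into 2D feature maps of decreasing resolution for the decoder.
for name, feat in activations.items():
    print(name, tuple(feat.shape))
```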

