Deep Laparoscopic Stereo Matching with Transformers

07/25/2022
by Xuelian Cheng, et al.

The self-attention mechanism, successfully employed in the transformer architecture, has shown promise in many computer vision tasks, including image recognition and object detection. Despite this surge of interest, the use of transformers for stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of transformers for stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design. Specifically, we investigate several ways to introduce transformers into volumetric stereo matching pipelines by analyzing the loss landscape of the designs and their in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning while using CNNs for cost aggregation leads to faster convergence, higher accuracy, and better generalization than other options. Our extensive experiments on the Sceneflow, SCARED2019, and dVPN datasets demonstrate the superior performance of our HybridStereoNet.
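To make the term "volumetric stereo matching pipeline" concrete, the following is a minimal sketch of the generic stages such a pipeline usually contains: per-pixel features, a cost volume over candidate disparities, and a disparity readout. It is not HybridStereoNet. The function names `build_cost_volume` and `soft_argmin` are hypothetical, the features are passed in directly (where the abstract would place a transformer), the cost aggregation step (where the abstract would place a CNN) is omitted, and the L1 cost and soft-argmin readout are assumptions chosen for illustration.

```python
import math


def build_cost_volume(left, right, max_disp):
    """Build a cost volume for one scanline.

    left, right: lists of per-pixel feature vectors (lists of floats).
    In a learned pipeline these would come from a feature network;
    here they are simply given (assumption for illustration).
    Returns volume[d][x] = L1 distance between left[x] and right[x - d],
    or infinity where x - d falls outside the image.
    """
    width = len(left)
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(width):
            if x - d < 0:
                row.append(math.inf)
            else:
                row.append(sum(abs(a - b) for a, b in zip(left[x], right[x - d])))
        volume.append(row)
    return volume


def soft_argmin(volume, x):
    """Expected disparity at pixel x under softmax(-cost).

    A learned aggregation network would normally refine the volume
    before this step; that stage is skipped in this sketch.
    """
    costs = [volume[d][x] for d in range(len(volume))]
    # Disparity 0 is always in bounds, so at least one cost is finite.
    lowest = min(c for c in costs if c != math.inf)
    weights = [0.0 if c == math.inf else math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total


# Toy scanline: the left view is the right view shifted by 2 pixels.
right = [[0.0], [10.0], [20.0], [30.0], [40.0], [50.0]]
left = [[-20.0], [-10.0], [0.0], [10.0], [20.0], [30.0]]
volume = build_cost_volume(left, right, max_disp=4)
disparity_at_4 = soft_argmin(volume, 4)
```

In this toy case the cost at pixel 4 is zero at disparity 2, so the soft-argmin estimate is very close to 2. The design question studied in the abstract is which network type should produce the features and which should refine the volume before the readout.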

