Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue

12/15/2021
by Shuwei Shao, et al.
Recently, self-supervised learning technology has been applied to calculate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios. One widely adopted assumption of depth and ego-motion self-supervised learning is that the image brightness remains constant within nearby frames. Unfortunately, the endoscopic scene does not meet this assumption because there are severe brightness fluctuations induced by illumination variations, non-Lambertian reflections and interreflections during data collection, and these brightness fluctuations inevitably deteriorate the depth and ego-motion estimation accuracy. In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem. The appearance flow takes into consideration any variations in the brightness pattern and enables us to develop a generalized dynamic image constraint. Furthermore, we build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes, which comprises a structure module, a motion module, an appearance module and a correspondence module, to accurately reconstruct the appearance and calibrate the image brightness. Extensive experiments are conducted on the SCARED dataset and EndoSLAM dataset, and the proposed unified framework exceeds other self-supervised approaches by a large margin. To validate our framework's generalization ability on different patients and cameras, we train our model on SCARED but test it on the SERV-CT and Hamlyn datasets without any fine-tuning, and the superior results reveal its strong generalization ability. Code will be available at: <https://github.com/ShuweiShao/AF-SfMLearner>.
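The brightness constancy assumption described above can be illustrated with a toy photometric loss. The sketch below is not taken from the paper or its repository: it assumes, purely for illustration, that the appearance flow acts as a per-pixel additive brightness offset applied to the warped source frame before comparison with the target frame. The function names `photometric_l1` and `calibrated_photometric_l1` and the synthetic data are hypothetical.

```python
import numpy as np


def photometric_l1(target, warped_source):
    """Mean absolute brightness difference.

    This form implicitly assumes brightness constancy between frames.
    """
    return float(np.mean(np.abs(target - warped_source)))


def calibrated_photometric_l1(target, warped_source, appearance_flow):
    """Photometric loss after brightness calibration.

    Assumption (not from the source): the appearance flow is a per-pixel
    additive offset that is applied to the warped source frame.
    """
    calibrated = warped_source + appearance_flow
    return float(np.mean(np.abs(target - calibrated)))


# Toy data: a perfectly aligned source frame whose brightness drifts
# smoothly, as might happen when an endoscope light source moves.
rng = np.random.default_rng(0)
source = rng.uniform(0.2, 0.8, size=(4, 4))
brightness_change = np.linspace(0.0, 0.1, 16).reshape(4, 4)
target = source + brightness_change

# Without calibration, the loss is non-zero even though the geometry
# (here, an identity warp) is already correct.
plain = photometric_l1(target, source)

# With the (here, ground-truth) brightness offset, the residual vanishes.
calibrated = calibrated_photometric_l1(target, source, brightness_change)

print(f"plain loss: {plain:.4f}, calibrated loss: {calibrated:.4f}")
```

In this toy setting the uncalibrated loss penalizes a correct alignment only because the illumination changed, which is the failure mode the abstract attributes to endoscopic scenes. In a learned system the offset would be predicted by a network rather than supplied directly.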

research
04/03/2020

Towards Better Generalization: Joint Depth-Pose Learning without PoseNet

In this work, we tackle the essential problem of scale inconsistency for...
research
04/08/2020

Self-Supervised Monocular Scene Flow Estimation

Scene flow estimation has been receiving increasing attention for 3D env...
research
09/04/2023

EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

Self-supervised monocular scene flow estimation, aiming to understand bo...
research
05/07/2020

Self-Supervised Human Depth Estimation from Monocular Videos

Previous methods on estimating detailed human depth often require superv...
research
10/09/2022

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

Masked autoencoders (MAEs) have emerged recently as art self-supervised ...
research
02/25/2019

Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation

Accurate relative pose is one of the key components in visual odometry (...
research
07/05/2021

Do Different Tracking Tasks Require Different Appearance Models?

Tracking objects of interest in a video is one of the most popular and w...
