Epipolar Transformers

05/10/2020
by Yihui He, et al.

A common approach to localizing 3D human joints in a synchronized and calibrated multi-view setup consists of two steps: (1) apply a 2D detector separately to each view to localize joints in 2D, and (2) perform robust triangulation on the 2D detections from each view to obtain the 3D joint locations. However, in step 1, the 2D detector must resolve challenging cases, such as occlusions and oblique viewing angles, purely in 2D and without any 3D information, even though such cases could potentially be better resolved in 3D. Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is as follows: given a 2D location p in the current view, we first find its corresponding point p' in a neighboring view, and then combine the features at p' with the features at p, yielding a 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'. Experiments on InterHand and Human3.6M show that our approach consistently improves over the baselines. Specifically, when no external data is used, our Human3.6M model trained with a ResNet-50 backbone and an image size of 256 x 256 outperforms the state of the art by 4.23 mm, achieving an MPJPE of 26.9 mm.
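To make the intuition concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a known fundamental matrix F mapping points in the current view to epipolar lines in the neighboring view, feature maps stored as (H, W, C) arrays, dot-product similarity with a softmax as the feature-matching step, and simple additive fusion. All function names (`epipolar_line`, `sample_points_on_line`, `bilinear_sample`, `epipolar_fuse`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def epipolar_line(F, p):
    """Epipolar line l' = F @ [x, y, 1] in the neighboring view.
    Returned as (a, b, c) with a*x' + b*y' + c = 0."""
    return F @ np.array([p[0], p[1], 1.0])

def sample_points_on_line(line, width, height, num_samples=64):
    """Sample candidate locations for p' along the epipolar line,
    keeping only those that fall inside the image."""
    a, b, c = line
    if max(abs(a), abs(b)) < 1e-12:   # degenerate line (p at the epipole)
        return np.zeros((0, 2))
    if abs(b) >= abs(a):              # closer to horizontal: sweep x
        xs = np.linspace(0, width - 1, num_samples)
        ys = -(a * xs + c) / b
    else:                             # closer to vertical: sweep y
        ys = np.linspace(0, height - 1, num_samples)
        xs = -(b * ys + c) / a
    inside = (xs >= 0) & (xs <= width - 1) & (ys >= 0) & (ys <= height - 1)
    return np.stack([xs, ys], axis=1)[inside]

def bilinear_sample(feat, pts):
    """Bilinearly sample feat (H, W, C) at pts (K, 2) given as (x, y)."""
    H, W, _ = feat.shape
    x, y = pts[:, 0], pts[:, 1]
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    dx, dy = (x - x0)[:, None], (y - y0)[:, None]
    return (feat[y0, x0] * (1 - dx) * (1 - dy)
            + feat[y0, x0 + 1] * dx * (1 - dy)
            + feat[y0 + 1, x0] * (1 - dx) * dy
            + feat[y0 + 1, x0 + 1] * dx * dy)

def epipolar_fuse(feat_ref, feat_src, F, p, num_samples=64):
    """Approximate the feature at p' by a similarity-weighted sum of
    features sampled along the epipolar line, then add it to the
    feature at p (additive fusion is an assumption of this sketch)."""
    H, W, _ = feat_src.shape
    query = bilinear_sample(feat_ref, np.array([p], dtype=float))[0]
    pts = sample_points_on_line(epipolar_line(F, p), W, H, num_samples)
    if len(pts) == 0:                 # line misses the neighboring image
        return query
    keys = bilinear_sample(feat_src, pts)    # (K, C)
    scores = keys @ query                    # dot-product similarity
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return query + weights @ keys
```

For a rectified stereo pair, where the epipolar line of (x, y) is the row y' = y, one could call `epipolar_fuse(feat_ref, feat_src, F, (3.0, 5.0))` with `F = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])`. Because every step is a composition of differentiable operations, the same computation could in principle be written in an autodiff framework and trained end to end.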


