EgoHumans: An Egocentric 3D Multi-Human Benchmark

05/25/2023
by Rawal Khirodkar, et al.

We present EgoHumans, a new multi-view multi-human video benchmark to advance the state of the art in egocentric human 3D pose estimation and tracking. Existing egocentric benchmarks capture either a single subject or indoor-only scenarios, which limits the generalization of computer vision algorithms to real-world applications. We propose a novel 3D capture setup to construct a comprehensive egocentric multi-human benchmark in the wild, with annotations to support diverse tasks such as human detection, tracking, 2D/3D pose estimation, and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses for the egocentric view, which enables us to capture dynamic activities such as soccer, fencing, and volleyball. Furthermore, our multi-view setup generates accurate 3D ground truth even under severe or complete occlusion. The dataset consists of more than 125k egocentric images, spanning diverse scenes with a particular focus on challenging and unchoreographed multi-human activities and fast-moving egocentric views. We rigorously evaluate existing state-of-the-art methods and highlight their limitations in the egocentric scenario, specifically on multi-human tracking. To address such limitations, we propose EgoFormer, a novel approach with a multi-stream transformer architecture and explicit 3D spatial reasoning to estimate and track the human pose. EgoFormer significantly outperforms prior art by 13.6 on the EgoHumans dataset.
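
The abstract does not say how the multi-view setup produces 3D ground truth. As a rough illustration of the general idea only, the sketch below uses standard linear (DLT) triangulation, which recovers a 3D joint from its 2D locations in two or more calibrated views. This is a swapped-in textbook technique, not the authors' pipeline; the camera matrices, the joint position, and the helper names `triangulate_dlt` and `project` are all hypothetical.

```python
import numpy as np

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices (assumed known).
    points_2d:   list of (u, v) pixel observations, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical calibrated cameras, one metre apart along the x axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# A made-up joint position, observed without noise in both views.
joint = np.array([0.2, -0.1, 4.0])
observations = [project(P1, joint), project(P2, joint)]

recovered = triangulate_dlt([P1, P2], observations)
```

Because each extra camera only adds rows to the linear system, a joint that is occluded in the egocentric view can still be recovered as long as at least two other calibrated views observe it.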
