H2O: A Benchmark for Visual Human-human Object Handover Analysis

by Ruolin Ye, et al.

Object handover is a common collaborative human behavior that has attracted attention from researchers in robotics and cognitive science. Although visual perception plays an important role in object handover, the whole handover process has rarely been explored specifically. In this work, we propose a novel, richly annotated dataset, H2O, for visual analysis of human-human object handovers. H2O contains 18K video clips of 15 people handing over 30 objects to each other and serves as a multi-purpose benchmark. It supports several vision-based tasks; among them, we provide a baseline method, RGPNet, for a less-explored task named Receiver Grasp Prediction. Extensive experiments show that RGPNet can produce plausible grasps based on the giver's hand-object states in the pre-handover phase. We also report hand and object pose errors with existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task. Dataset, model and code will be made public.



