The Object at Hand: Automated Editing for Mixed Reality Video Guidance from Hand-Object Interactions

09/29/2021
by Yao Lu, et al.

In this paper, we address the problem of automatically extracting the steps that compose real-life hand activities. This is a key capability for processing, monitoring, and providing video guidance in Mixed Reality systems. We use egocentric vision to observe hand-object interactions in real-world tasks and automatically decompose a video into its constituent steps. Our approach combines hand-object interaction (HOI) detection, object similarity measurement, and a finite state machine (FSM) representation to automatically edit videos into steps. We use a combination of Convolutional Neural Networks (CNNs) and the FSM to discover and edit cuts and to merge segments while observing real hand activities. We evaluate our algorithm quantitatively and qualitatively on two datasets: GTEA, and a new dataset we introduce for Chinese tea making. Results show that our method can segment hand-object interaction videos into key step segments with high precision.
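
To make the pipeline described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a per-frame HOI detector has already produced a contact flag and an object embedding for each frame. The names `Frame`, `Segment`, `segment_steps`, `sim_threshold`, and `max_gap` are hypothetical, as are the two-state FSM and the use of cosine similarity as the object similarity measure; the abstract does not specify any of these details.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Frame:
    in_contact: bool           # assumed output of a per-frame HOI detector
    obj_feat: Sequence[float]  # assumed embedding of the object in hand


@dataclass
class Segment:
    start: int         # first frame index (inclusive)
    end: int           # last frame index (inclusive)
    feat: List[float]  # mean object embedding over the segment


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def _mean(feats: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(feats)
    return [sum(col) / n for col in zip(*feats)]


def segment_steps(frames: List[Frame],
                  sim_threshold: float = 0.9,
                  max_gap: int = 2) -> List[Segment]:
    """Cut on contact start/end with a two-state FSM, then merge
    neighbouring segments that involve a similar object."""
    segments: List[Segment] = []
    state = "IDLE"  # hypothetical states: IDLE, INTERACTING
    start = 0
    acc: List[Sequence[float]] = []

    for i, frame in enumerate(frames):
        if state == "IDLE" and frame.in_contact:
            # Contact begins: open a new candidate step.
            state, start, acc = "INTERACTING", i, [frame.obj_feat]
        elif state == "INTERACTING" and frame.in_contact:
            acc.append(frame.obj_feat)
        elif state == "INTERACTING" and not frame.in_contact:
            # Contact ends: place a cut and close the candidate step.
            segments.append(Segment(start, i - 1, _mean(acc)))
            state = "IDLE"
    if state == "INTERACTING":
        # The video ended while the hand was still in contact.
        segments.append(Segment(start, len(frames) - 1, _mean(acc)))

    merged: List[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        if (prev is not None
                and seg.start - prev.end - 1 <= max_gap
                and cosine(prev.feat, seg.feat) >= sim_threshold):
            # Short gap and similar object: treat as one step.
            n1 = prev.end - prev.start + 1
            n2 = seg.end - seg.start + 1
            feat = [(n1 * a + n2 * b) / (n1 + n2)
                    for a, b in zip(prev.feat, seg.feat)]
            merged[-1] = Segment(prev.start, seg.end, feat)
        else:
            merged.append(seg)
    return merged
```

In this sketch, a brief loss of contact with the same object (for example, a one-frame detector dropout) does not split a step, while a long pause or a change of object starts a new one. Both thresholds are placeholders that would need tuning on real data.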


