How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors

10/04/2021
by Satoshi Tsutsui, et al.

Understanding users' activities from head-mounted cameras is a fundamental task for Augmented and Virtual Reality (AR/VR) applications. A typical approach is to train a classifier in a supervised manner on human-labeled data. This approach is limited by the high cost of annotation and by the closed set of activity labels it can cover. Self-supervised learning (SSL) is a potential way to address these limitations: instead of relying on human annotations, SSL leverages intrinsic properties of the data to learn representations. We are particularly interested in learning egocentric video representations that benefit from the head motion generated by users' daily activities, which can be easily obtained from the IMU sensors embedded in AR/VR devices. Toward this goal, we propose a simple but effective approach that learns video representations by learning to tell whether a video clip and a head-motion signal correspond to each other. We demonstrate the effectiveness of the learned representation for recognizing egocentric activities of people and dogs.
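The correspondence task described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the pairing scheme, the loss, or the network, so everything here is an assumption made for illustration: the function names `make_correspondence_pairs` and `correspondence_loss` are hypothetical, negatives are drawn by pairing a clip with the IMU window of a different clip, and a binary cross-entropy on a scalar matching score stands in for whatever objective the authors actually use.

```python
import math
import random


def make_correspondence_pairs(num_clips, rng):
    """Build (video_idx, imu_idx, label) triples for a correspondence task.

    Assumption: clip i and IMU window i were recorded at the same time, so
    (i, i) is a positive pair. A negative pair matches clip i with the IMU
    window of a different, randomly chosen clip.
    """
    if num_clips < 2:
        raise ValueError("need at least two clips to form a negative pair")
    pairs = []
    for i in range(num_clips):
        pairs.append((i, i, 1))  # synchronized clip and head motion
        j = rng.randrange(num_clips - 1)
        if j >= i:
            j += 1  # shift past i so that j is never equal to i
        pairs.append((i, j, 0))  # mismatched clip and head motion
    return pairs


def correspondence_loss(score, label):
    """Binary cross-entropy on a raw matching score (a logit).

    Written in the numerically stable form
    max(s, 0) - s * y + log(1 + exp(-|s|)).
    """
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))


if __name__ == "__main__":
    rng = random.Random(0)
    for video_idx, imu_idx, label in make_correspondence_pairs(4, rng):
        print(video_idx, imu_idx, label)
```

In a real system the score would come from a video encoder and an IMU encoder whose outputs are compared; training the pair to minimize this loss is what would push the video encoder toward motion-aware features.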

