What I See Is What You See: Joint Attention Learning for First and Third Person Video Co-analysis

04/16/2019
by   Huangyue Yu, et al.

In recent years, more and more videos have been captured from the first-person viewpoint by wearable cameras. Such first-person video provides information beyond that of traditional third-person video and thus has a wide range of applications. However, techniques for analyzing first-person video can be fundamentally different from those for third-person video, and it is even more difficult to exploit the information shared between the two viewpoints. In this paper, we propose a novel method for first- and third-person video co-analysis. At the core of our method is the notion of "joint attention": a learnable representation that corresponds to the shared attention regions across the two viewpoints and thus links them. To this end, we develop a multi-branch deep network with a triplet loss to extract joint attention from first- and third-person videos via self-supervised learning. We evaluate our method on a public dataset with cross-viewpoint video matching tasks, where it outperforms the state of the art both qualitatively and quantitatively. We also demonstrate, through a set of additional experiments, how the learned joint attention benefits various applications.
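The abstract names a triplet loss as the training signal for the multi-branch network. The paper's exact architecture and margin are not given here, but the loss itself is standard and can be sketched: an anchor embedding (e.g. from a first-person clip) is pulled toward a positive (the matching third-person clip) and pushed away from a negative (a non-matching clip) by at least a margin. All names and the margin value below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on embedding vectors.

    Penalizes the anchor being closer (in Euclidean distance) to the
    negative than to the positive by less than `margin`. The embeddings
    here stand in for the paper's learned cross-viewpoint features;
    the margin value is an illustrative choice, not the paper's.
    """
    d_pos = np.linalg.norm(anchor - positive)  # anchor <-> matching clip
    d_neg = np.linalg.norm(anchor - negative)  # anchor <-> non-matching clip
    return max(0.0, d_pos - d_neg + margin)

# Illustrative embeddings: a well-separated triple incurs zero loss,
# a hard negative (close to the anchor) incurs a positive loss.
a = np.array([0.0, 0.0])
p = np.array([0.0, 0.1])
easy_n = np.array([1.0, 1.0])
hard_n = np.array([0.0, 0.2])

print(triplet_loss(a, p, easy_n))  # 0.0  (negative already far enough)
print(triplet_loss(a, p, hard_n))  # 0.1  (violates the margin)
```

In self-supervised training of this kind, positives and negatives come for free from synchronization: first- and third-person frames recorded at the same moment form positive pairs, while temporally mismatched frames serve as negatives.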


