Three-Stream Fusion Network for First-Person Interaction Recognition

02/19/2020
by Ye-Ji Kim, et al.

First-person interaction recognition is a challenging task because the camera wearer's movement makes the video unstable. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: a three-stream architecture and a three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. The three-stream correlation fusion combines the feature maps of the three streams to account for the correlations among the target appearance, target motion, and camera ego-motion. The fused feature vector is robust to camera movement and compensates for the noise of the camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model considers the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method successfully generated a discriminative feature vector, and our network outperformed all competing activity recognition methods in first-person videos where considerable camera ego-motion occurs.
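The data flow described above (three per-stream feature maps, a fusion step that accounts for correlations between streams, then an LSTM over short-term intervals) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the fusion operator, so the sketch assumes that "correlation" can be approximated by spatially averaged elementwise products of each pair of streams. The names `correlation_fuse`, `lstm_step`, and `encode_video` are hypothetical, the feature maps are random stand-ins for CNN outputs, and the LSTM weights are untrained.

```python
import numpy as np


def correlation_fuse(appearance, motion, ego_motion):
    """Fuse three (C, H, W) feature maps into one vector of length 6 * C.

    ASSUMPTION: pairwise correlation is modeled as the elementwise product
    of two streams averaged over space. The paper's operator may differ.
    """
    streams = [appearance, motion, ego_motion]
    if len({s.shape for s in streams}) != 1:
        raise ValueError("all three feature maps must share one shape")
    # Per-stream descriptors: global average pooling over H and W.
    pooled = [s.mean(axis=(1, 2)) for s in streams]
    # Pairwise interaction descriptors for the three stream pairs.
    pairs = [(0, 1), (0, 2), (1, 2)]
    interactions = [(streams[i] * streams[j]).mean(axis=(1, 2)) for i, j in pairs]
    return np.concatenate(pooled + interactions)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_step(x, h, c, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, input + hidden); b has shape (4 * hidden,).
    """
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def encode_video(segments, hidden=8, seed=0):
    """Run an LSTM over the fused vectors of consecutive short-term segments.

    `segments` is a list of (appearance, motion, ego_motion) tuples.
    The weights are random (untrained); a real model would learn them.
    """
    rng = np.random.default_rng(seed)
    fused = [correlation_fuse(*seg) for seg in segments]
    W = rng.standard_normal((4 * hidden, fused[0].size + hidden)) * 0.1
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in fused:
        h, c = lstm_step(x, h, c, W, b)
    return h  # final hidden state, the input a classifier would use


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two short-term segments, each with three 4-channel 3x3 feature maps.
    video = [tuple(rng.standard_normal((4, 3, 3)) for _ in range(3)) for _ in range(2)]
    print(encode_video(video).shape)
```

In this sketch the pairwise products give the fused vector terms that depend jointly on two streams, for example target motion and camera ego-motion. That is one simple way a later layer could learn to discount motion explained by the camera; whether it matches the paper's fusion is an open assumption.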


