A Unified Method for First and Third Person Action Recognition

12/30/2017
by Ali Javidani, et al.

This paper proposes a new video classification method that applies to both first-person and third-person videos. The main idea is to capture complementary appearance and motion information efficiently by running two independent streams over each video. The first stream captures long-term motion from shorter motions by tracking how elements of the optical flow images change over time. Each optical flow image is described by a pre-trained network that was trained on a large-scale image dataset. Placing these descriptions side by side in temporal order yields a set of multi-channel time series. Motion features are extracted from these time series using the PoT representation together with a novel pooling operator. The second stream extracts appearance features, which are vital for video classification. The proposed method has been evaluated on both first-person and third-person datasets, and the results show that it reaches the state of the art.
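The motion stream described above turns per-frame descriptors into time series and then pools them over time. The sketch below illustrates that pooling step only, under stated assumptions: it uses pooling operators commonly associated with the PoT representation (max, sum, and sums of positive and negative temporal gradients) over a simple temporal pyramid. It does not implement the paper's novel pooling operator, which the abstract does not specify. The function name `pot_pool`, the `levels` parameter, and the toy input are hypothetical.

```python
import numpy as np

def pot_pool(series, levels=2):
    """Pool a multi-channel time series into one fixed-length vector.

    series: array of shape (T, D), one row per frame and one column per
            descriptor channel (assumed to come from a pre-trained network).
    levels: number of temporal pyramid levels; level k splits the time
            axis into 2**k segments (an assumption for this sketch).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 2:
        raise ValueError("series must have shape (T, D)")
    if series.shape[0] < 2 ** (levels - 1):
        raise ValueError("too few frames for the requested pyramid depth")

    feats = []
    for level in range(levels):
        for seg in np.array_split(series, 2 ** level, axis=0):
            grad = np.diff(seg, axis=0)  # frame-to-frame change per channel
            feats.extend([
                seg.max(axis=0),                      # max pooling
                seg.sum(axis=0),                      # sum pooling
                np.clip(grad, 0, None).sum(axis=0),   # total increase
                np.clip(-grad, 0, None).sum(axis=0),  # total decrease
            ])
    return np.concatenate(feats)

# Toy input: 4 frames, 2 descriptor channels (made-up values).
toy = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 3.0], [4.0, 2.0]])
pooled = pot_pool(toy, levels=2)
```

With `levels=2` the pyramid has three segments (the whole clip plus its two halves), so the output length is 3 segments x 4 operators x D channels. The resulting vector could then be combined with appearance features and passed to a classifier, which is outside the scope of this sketch.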

research
02/19/2018

Learning Representative Temporal Features for Action Recognition

In this paper we present a novel video classification methodology that a...
research
05/28/2019

Hallucinating Optical Flow Features for Video Classification

Appearance and motion are two key components to depict and characterize ...
research
10/22/2019

Predictive Coding Networks Meet Action Recognition

Action recognition is a key problem in computer vision that labels video...
research
02/10/2020

Joint Encoding of Appearance and Motion Features with Self-supervision for First Person Action Recognition

Wearable cameras are becoming more and more popular in several applicati...
research
08/10/2020

2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework

To address the problem of training on small datasets for action recognit...
research
12/19/2014

Pooled Motion Features for First-Person Videos

In this paper, we present a new feature representation for first-person ...
research
05/30/2019

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

Learning to represent videos is a very challenging task both algorithmic...
