A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset

by Alejandro López-Cifuentes, et al.

Action recognition is currently one of the most challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance, but they rely on fixed-size spatio-temporal windows of analysis, which limits their temporal receptive fields. Among action recognition datasets, egocentrically recorded sequences have become increasingly relevant, while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with this by estimating the ego-motion, or camera motion. The estimate is used to temporally partition video sequences into motion-compensated temporal chunks that show the action against stable backgrounds and allow for content-driven temporal sampling. A CNN trained end-to-end is used to extract temporal features from each chunk, and these features are late fused. This process extracts features from the whole temporal range of an action, increasing the temporal receptive field of the network.
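
The partition-then-fuse idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual partitioning criterion or fusion rule, so everything here is an assumption made for illustration: the per-frame camera-motion magnitudes are taken as given, a chunk is closed once its accumulated motion would exceed a hypothetical `budget`, one frame is sampled per chunk, and late fusion is a plain average of per-chunk class scores. The function names are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def partition_by_motion(motion: Sequence[float], budget: float) -> List[Tuple[int, int]]:
    """Split frames into chunks [start, end) of bounded accumulated camera motion.

    ASSUMPTION: `motion[i]` is an already-estimated ego-motion magnitude for
    frame i, and `budget` is an illustrative threshold; neither comes from
    the paper.
    """
    chunks: List[Tuple[int, int]] = []
    start = 0
    accumulated = 0.0
    for i, m in enumerate(motion):
        # Close the current chunk before frame i if adding it would exceed the budget.
        if i > start and accumulated + m > budget:
            chunks.append((start, i))
            start = i
            accumulated = 0.0
        accumulated += m
    if start < len(motion):
        chunks.append((start, len(motion)))
    return chunks


def sample_middle_frames(chunks: Sequence[Tuple[int, int]]) -> List[int]:
    """Content-driven sampling: pick the middle frame index of each chunk."""
    return [(start + end - 1) // 2 for start, end in chunks]


def late_fuse(chunk_scores: Sequence[Sequence[float]]) -> List[float]:
    """Late fusion (assumed here to be averaging) of per-chunk class scores."""
    n = len(chunk_scores)
    return [sum(column) / n for column in zip(*chunk_scores)]


if __name__ == "__main__":
    # Toy motion magnitudes with two large camera movements (frames 2 and 6).
    motion = [1.0, 1.0, 9.0, 1.0, 1.0, 1.0, 12.0, 1.0]
    chunks = partition_by_motion(motion, budget=10.0)
    print(chunks)
    print(sample_middle_frames(chunks))
    print(late_fuse([[0.25, 0.75], [0.75, 0.25]]))
```

In this sketch, chunks become shorter where the camera moves a lot and longer where it is stable, so the sampled frames follow the content of the sequence rather than a fixed stride while still covering the whole temporal range of the action.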




