Action recognition with spatial-temporal discriminative filter banks

08/20/2019
by Brais Martinez, et al.

Action recognition has seen a dramatic performance improvement in the last few years. Most of the current state-of-the-art literature either aims to improve performance through changes to the backbone CNN or explores different trade-offs between computational efficiency and performance, again by altering the backbone. However, almost all of these works keep the same last layers of the network, which simply consist of global average pooling followed by a fully connected layer. In this work we focus on improving the representation capacity of the network, but rather than altering the backbone, we improve its last layers, where changes have a low impact on computational cost. In particular, we show that current architectures have poor sensitivity to finer details, and we exploit recent advances in the fine-grained recognition literature to improve our model in this respect. With the proposed approach, we obtain state-of-the-art performance on Kinetics-400 and Something-Something-V1, the two major large-scale action recognition benchmarks.
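
To make the baseline concrete, the following is a minimal sketch of the standard head the abstract refers to: global average pooling over time and space, followed by a fully connected layer. It is not the proposed method. The tensor sizes, the values, and the function names `global_average_pool` and `fully_connected` are assumptions made up for illustration, and plain Python lists stand in for a real deep learning framework.

```python
from statistics import fmean


def global_average_pool(features):
    """Average a C x T x H x W feature volume over time and space.

    Returns one value per channel.
    """
    return [
        fmean(v for frame in channel for row in frame for v in row)
        for channel in features
    ]


def fully_connected(vector, weights, bias):
    """Linear classifier: one score per class from the pooled descriptor."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical backbone output: 2 channels, 2 frames, 2x2 spatial grid.
features = [
    # Channel 0: uniform response in each frame.
    [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]],
    # Channel 1: a single strong, localized response in one frame.
    [[[0.0, 0.0], [0.0, 8.0]], [[0.0, 0.0], [0.0, 0.0]]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for readability only
bias = [0.0, 0.0]

pooled = global_average_pool(features)
scores = fully_connected(pooled, weights, bias)
print(pooled)  # [2.0, 1.0]
print(scores)  # [2.0, 1.0]
```

In this toy input, the localized response of 8.0 in channel 1 is averaged down to 1.0. That may be one way to picture the limited sensitivity to finer details that the abstract attributes to current architectures, although the abstract itself does not spell out the mechanism.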


