Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

11/27/2017
by Xiang Long, et al.

Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential of purely attention-based local feature integration. Accounting for the characteristics of such features in video classification, we propose a local feature integration framework based on attention clusters, and introduce a shifting operation to capture more diverse signals. We carefully analyze and compare the effect of different attention mechanisms, cluster sizes, and the use of the shifting operation, and also investigate the combination of attention clusters for multimodal integration. We demonstrate the effectiveness of our framework on three real-world video classification datasets, and our model achieves competitive results on all of them. In particular, on the large-scale Kinetics dataset, our framework obtains a single-model accuracy of 79.4% on the validation set. The attention clusters are the backbone of our winning solution at the ActivityNet Kinetics Challenge 2017. Code and models will be released soon.
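
The abstract names the two building blocks (attention clusters and a shifting operation) but does not give their formulation, so the following is only a rough sketch under stated assumptions, not the authors' released code. It assumes each attention unit scores local features with a single linear vector, pools them by a softmax-weighted sum, and then applies a scalar scale `alpha` and shift `beta` followed by L2 normalisation and division by the square root of the number of units. All function and parameter names here are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_unit(features, w, alpha=1.0, beta=0.0, n_units=1):
    """One attention unit with an assumed scale-and-shift step.

    features: list of L local feature vectors, each of length D
    w:        scoring vector of length D (hypothetical parameter)
    """
    # Score each local feature independently; temporal order is never used.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Attention-weighted sum of the local features.
    pooled = [sum(a * x[d] for a, x in zip(weights, features)) for d in range(dim)]
    # Assumed shifting operation: scale, shift, then L2-normalise.
    shifted = [alpha * p + beta for p in pooled]
    norm = math.sqrt(sum(s * s for s in shifted)) or 1.0
    return [s / (math.sqrt(n_units) * norm) for s in shifted]


def attention_cluster(features, unit_params):
    """Concatenate the outputs of several independent attention units."""
    n = len(unit_params)
    out = []
    for w, alpha, beta in unit_params:
        out.extend(attention_unit(features, w, alpha, beta, n_units=n))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    L, D, N = 5, 4, 3  # local features, feature dimension, units per cluster
    feats = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(L)]
    params = [([rng.gauss(0, 1) for _ in range(D)], 1.0, 0.1 * k) for k in range(N)]
    global_feature = attention_cluster(feats, params)
    print(len(global_feature))  # N * D = 12
```

Because the pooled vector is a weighted sum, reordering the local features leaves the output unchanged up to floating-point rounding, which is one way to read the claim that longer-term temporal patterns may not be required. Under the same assumptions, multimodal integration could be sketched by running one cluster per modality and concatenating the results.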


