iqiyi Submission to ActivityNet Challenge 2019 Kinetics-700 challenge: Hierarchical Group-wise Attention

02/07/2020
by   Qian Liu, et al.

This report describes the iqiyi submission to the ActivityNet 2019 Kinetics-700 challenge. The final ensemble combines three models: TSN, HG-NL and StNet. We propose the hierarchical group-wise non-local (HG-NL) module for aggregating frame-level features in video classification. The standard non-local (NL) module is effective at aggregating frame-level features for video classification, but it has low parameter efficiency and high computational cost. The HG-NL module uses a hierarchical group-wise structure and generates multiple attention maps to enhance performance. Based on this structure, the proposed method achieves competitive accuracy with fewer parameters and lower computational cost than the standard NL module. On the ActivityNet 2019 Kinetics-700 challenge, after model ensembling, we obtain an averaged top-1 and top-5 error percentage of 28.444
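
The reported figure is an average of the top-1 and top-5 error rates. As a rough illustration of how such a number is computed, here is a minimal sketch that assumes each video has a list of per-class scores and one integer ground-truth label. The function names and the toy data are hypothetical, and this is not the challenge's official evaluation code.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


def averaged_error(scores, labels):
    """Mean of the top-1 and top-5 error rates (assumed metric definition)."""
    return 0.5 * (top_k_error(scores, labels, 1) + top_k_error(scores, labels, 5))


# Toy data: 4 videos, 6 classes (illustrative values only).
scores = [
    [0.50, 0.20, 0.10, 0.08, 0.07, 0.05],  # label 0: correct at top-1
    [0.10, 0.40, 0.20, 0.15, 0.09, 0.06],  # label 2: wrong at top-1, inside top-5
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # label 5: outside top-5
    [0.05, 0.10, 0.15, 0.20, 0.22, 0.28],  # label 5: correct at top-1
]
labels = [0, 2, 5, 5]

print(averaged_error(scores, labels) * 100)  # error as a percentage
```

On this toy data the top-1 error is 50% and the top-5 error is 25%, so the averaged error is 37.5%.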


