A Better Baseline for AVA

07/26/2018
by   Rohit Girdhar, et al.
0

We introduce a simple baseline for action localization on the AVA dataset. The model builds upon the Faster R-CNN bounding box detection framework, adapted to operate on pure spatiotemporal features - in our case produced exclusively by an I3D model pretrained on Kinetics. This model obtains 21.9 average AP on the validation set of AVA v2.1, up from 14.5 spatiotemporal model used in the original AVA paper (which was pretrained on Kinetics and ImageNet), and up from 11.3 of the publicly available baseline using a ResNet101 image feature extractor, that was pretrained on ImageNet. Our final model obtains 22.8 submissions to the AVA challenge at CVPR 2018.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/15/2023

Fine-tune the pretrained ATST model for sound event detection

Sound event detection (SED) often suffers from the data deficiency probl...
research
07/22/2020

Rethinking CNN Models for Audio Classification

In this paper, we show that ImageNet-Pretrained standard deep CNN models...
research
11/27/2017

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

The purpose of this study is to determine whether current video datasets...
research
09/15/2021

Miðeind's WMT 2021 submission

We present Miðeind's submission for the English→Icelandic and Icelandic→...
research
11/01/2017

Improving Object Localization with Fitness NMS and Bounded IoU Loss

We demonstrate that many detection methods are designed to identify only...
research
11/21/2017

Functional Map of the World

We present a new dataset, Functional Map of the World (fMoW), which aims...
research
07/24/2022

Affective Behaviour Analysis Using Pretrained Model with Facial Priori

Affective behaviour analysis has aroused researchers' attention due to i...

Please sign up or login with your details

Forgot password? Click here to reset