YOWO-Plus: An Incremental Improvement

10/20/2022
by   Jianhua Yang, et al.

In this technical report, we introduce our updates to YOWO, a real-time method for spatio-temporal action detection. We make a number of small design changes to improve it. For the network structure, we keep the components of the official YOWO implementation, namely 3D-ResNext-101 and YOLOv2, but initialize YOLOv2 with pretrained weights from our own reimplementation, which outperforms the official YOLOv2. We also optimize the label assignment used in YOWO. To detect action instances more accurately, we adopt the GIoU loss for box regression. With these incremental improvements, YOWO achieves 84.9% frame mAP and 50.5% video mAP on UCF101-24, significantly higher than the official YOWO. On AVA, our optimized YOWO achieves 20.6% frame mAP with 16 frames, also exceeding the official YOWO. With 32 frames, it achieves 21.6% frame mAP at 25 FPS on an RTX 3090 GPU. We name the optimized model YOWO-Plus. Moreover, we replace 3D-ResNext-101 with the efficient 3D-ShuffleNet-v2 to design a lightweight action detector, YOWO-Nano. YOWO-Nano achieves 81.0% frame mAP and 49.7% video mAP at over 90 FPS on UCF101-24, and 18.4% frame mAP at about 90 FPS on AVA. To our knowledge, YOWO-Nano is the fastest state-of-the-art action detector. Our code is available at https://github.com/yjh0410/PyTorch_YOWO.
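As a rough illustration of the GIoU loss mentioned in the abstract, the sketch below computes the standard generalized IoU between two axis-aligned boxes and the corresponding loss, 1 - GIoU. It is a minimal pure-Python sketch, not the code from the authors' repository: the function names `giou` and `giou_loss` and the `(x1, y1, x2, y2)` box layout are assumptions made for this example.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes (hypothetical layout
    chosen for this sketch). The result lies in (-1, 1].
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    if area_a <= 0 or area_b <= 0:
        raise ValueError("boxes must have positive area")

    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred_box, target_box):
    """Box-regression loss: 0 for a perfect match, up to 2 for distant boxes."""
    return 1.0 - giou(pred_box, target_box)
```

Unlike a plain IoU loss, this quantity still changes when the predicted and target boxes do not overlap, because the enclosing-box term grows as the boxes move apart. For example, `giou_loss((0, 0, 1, 1), (2, 0, 3, 1))` is larger than `giou_loss((0, 0, 1, 1), (1, 0, 2, 1))`, although both pairs have zero IoU.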


research
02/14/2023

YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection

Designing a real-time framework for the spatio-temporal action detection...
research
04/08/2018

YOLOv3: An Incremental Improvement

We present some updates to YOLO! We made a bunch of little design change...
research
06/13/2019

Grid R-CNN Plus: Faster and Better

Grid R-CNN is a well-performed objection detection framework. It transfo...
research
01/02/2022

TVNet: Temporal Voting Network for Action Localization

We propose a Temporal Voting Network (TVNet) for action localization in ...
research
05/14/2022

ETAD: A Unified Framework for Efficient Temporal Action Detection

Untrimmed video understanding such as temporal action detection (TAD) of...
research
01/14/2020

Actions as Moving Points

The existing action tubelet detectors mainly depend on heuristic anchor ...
research
11/02/2022

Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning

Recent incremental learning for action recognition usually stores repres...
