YOLOV: Making Still Image Object Detectors Great at Video Object Detection

08/20/2022
by Yuheng Shi, et al.

Video object detection (VID) is challenging because of the high variation in object appearance and the various forms of degradation that affect some frames. On the positive side, detection in a given video frame, unlike detection in a still image, can draw support from other frames. How to aggregate features across frames is therefore pivotal to the VID problem. Most existing aggregation algorithms are customized for two-stage detectors. However, detectors in this category are usually computationally expensive because of their two-stage nature. This work proposes a simple yet effective strategy to address these concerns, adding marginal overhead while delivering significant gains in accuracy. Concretely, in contrast to the traditional two-stage pipeline, we advocate placing the region-level selection after the one-stage detection to avoid processing massive numbers of low-quality candidates. In addition, a novel module is constructed to evaluate the relationship between a target frame and its reference frames, and to guide the aggregation. Extensive experiments and ablation studies are conducted to verify the efficacy of our design, and they reveal its superiority over other state-of-the-art VID approaches in both effectiveness and efficiency. Our YOLOX-based model achieves promising performance (e.g., 87.5% AP50 at over 30 FPS on the ImageNet VID dataset on a single 2080Ti GPU), making it attractive for large-scale or real-time applications. The implementation is simple; the demo code and models are available at https://github.com/YuHengsss/YOLOV .
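
The two ideas in the abstract, selecting region-level candidates after one-stage detection and then aggregating their features across frames, can be illustrated with a minimal sketch. This is not the authors' module: the function names (`top_k_candidates`, `aggregate`), the `(score, feature)` candidate layout, and the use of cosine similarity with a softmax as the relationship measure are all assumptions made for illustration only.

```python
import math


def top_k_candidates(candidates, k):
    """Keep the k highest-scoring (score, feature) pairs from one frame.

    Hypothetical stand-in for post-detection region selection.
    """
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:k]


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(target, references, k=2, temperature=1.0):
    """Refine target-frame features using selected candidates from all frames.

    target: list of (score, feature) pairs for the target frame.
    references: list of frames, each a list of (score, feature) pairs.
    The similarity-weighted average is an assumed aggregation rule.
    """
    kept_target = top_k_candidates(target, k)
    # Pool the selected candidates of the target frame and every reference frame.
    pool = [f for frame in [target] + references
            for _, f in top_k_candidates(frame, k)]
    refined = []
    for score, feat in kept_target:
        weights = softmax([cosine(feat, g) / temperature for g in pool])
        agg = [sum(w * g[d] for w, g in zip(weights, pool))
               for d in range(len(feat))]
        refined.append((score, agg))
    return refined


if __name__ == "__main__":
    target = [(0.9, [1.0, 0.0]), (0.2, [0.0, 1.0]), (0.1, [0.5, 0.5])]
    refs = [[(0.8, [1.0, 0.0]), (0.05, [0.0, 1.0])]]
    print(aggregate(target, refs, k=1))
```

Because selection runs on the detector's output, the similarity computation touches only `k` candidates per frame rather than every dense prediction, which is the efficiency argument the abstract makes.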


research
07/07/2020

Single Shot Video Object Detector

Single shot detectors that are potentially faster and simpler than two-s...
research
07/22/2022

QueryProp: Object Query Propagation for High-Performance Video Object Detection

Video object detection has been an important yet challenging topic in co...
research
07/14/2023

TALL: Thumbnail Layout for Deepfake Video Detection

The growing threats of deepfakes to society and cybersecurity have raise...
research
09/30/2022

INT: Towards Infinite-frames 3D Detection with An Efficient Framework

It is natural to construct a multi-frame instead of a single-frame 3D de...
research
03/15/2023

FAQ: Feature Aggregated Queries for Transformer-based Video Object Detectors

Video object detection needs to solve feature degradation situations tha...
research
10/02/2022

DFA: Dynamic Feature Aggregation for Efficient Video Object Detection

Video object detection is a fundamental yet challenging task in computer...
research
07/15/2019

Sequence Level Semantics Aggregation for Video Object Detection

Video object detection (VID) has been a rising research direction in ...
