TF-Blender: Temporal Feature Blender for Video Object Detection

08/12/2021
by   Yiming Cui, et al.

Video object detection is a challenging task because isolated video frames may suffer from appearance deterioration, which confuses detectors. A popular solution is to exploit temporal information and enhance the per-frame representation by aggregating features from neighboring frames. Despite improving detection, existing methods focus on selecting higher-level video frames for aggregation rather than modeling lower-level temporal relations to strengthen the feature representation. To address this limitation, we propose a novel solution named TF-Blender, which includes three modules: 1) Temporal relation models the relations between the current frame and its neighboring frames to preserve spatial information; 2) Feature adjustment enriches the representation of every neighboring feature map; 3) Feature blender combines the outputs of the first two modules and produces stronger features for the later detection tasks. Owing to its simplicity, TF-Blender can be effortlessly plugged into any detection network to improve detection behavior. Extensive evaluations on the ImageNet VID and YouTube-VIS benchmarks indicate performance gains from applying TF-Blender to recent state-of-the-art methods.
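The three-module structure described above can be illustrated with a minimal sketch. The abstract does not specify how each module is computed, so everything below is an assumption made for illustration: feature maps are flattened to plain lists, `temporal_relation` uses cosine similarity, `feature_adjustment` is a fixed interpolation controlled by a hypothetical `alpha`, and `feature_blender` is a softmax-weighted sum. In the actual method these modules are presumably learned; the function names are placeholders, not the authors' API.

```python
import math

def temporal_relation(current, neighbor):
    """Assumed stand-in: cosine similarity between two flattened feature maps."""
    dot = sum(c * n for c, n in zip(current, neighbor))
    norm = math.sqrt(sum(c * c for c in current)) * math.sqrt(sum(n * n for n in neighbor))
    return dot / norm if norm > 0 else 0.0

def feature_adjustment(current, neighbor, alpha=0.5):
    """Assumed stand-in: move the neighbor feature part of the way toward the current one."""
    return [n + alpha * (c - n) for c, n in zip(current, neighbor)]

def feature_blender(current, neighbors):
    """Combine the two modules: softmax the relation scores, then sum the adjusted features."""
    if not neighbors:
        # Nothing to aggregate, so fall back to the current frame's own feature.
        return list(current), []
    scores = [temporal_relation(current, n) for n in neighbors]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    adjusted = [feature_adjustment(current, n) for n in neighbors]
    blended = [
        sum(w * a[i] for w, a in zip(weights, adjusted))
        for i in range(len(current))
    ]
    return blended, weights

# Toy example: one neighbor identical to the current frame, one orthogonal to it.
current = [1.0, 0.0, 1.0]
neighbors = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
blended, weights = feature_blender(current, neighbors)
```

In this toy example the neighbor that matches the current frame receives the larger weight, which is the behavior one would expect from a relation-aware aggregation. A real detector would apply the same idea to per-location features rather than to a single flattened vector.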


