RN-VID: A Feature Fusion Architecture for Video Object Detection

03/24/2020
by Hughes Perreault, et al.

Consecutive frames in a video are highly redundant. Therefore, running a single-frame detector on every frame without reusing any information is quite wasteful. With this idea in mind, we propose RN-VID, a novel approach to video object detection. Our contributions are twofold. First, we propose a new architecture that uses information from nearby frames to enhance feature maps. Second, we propose a novel module that merges feature maps of the same dimensions using a re-ordering of channels and 1×1 convolutions. We then demonstrate that RN-VID achieves better mean average precision (mAP) than the corresponding single-frame detectors, with little additional cost during inference.
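To make the fusion idea concrete, below is a minimal sketch of a channel re-ordering and 1×1 convolution merge, written in PyTorch. It is an illustration under stated assumptions, not the authors' exact implementation: the class name ChannelFusion and its arguments are hypothetical, and only the general recipe (interleave corresponding channels from nearby frames, then mix them with a 1×1 convolution) follows the abstract.

```python
import torch
import torch.nn as nn


class ChannelFusion(nn.Module):
    """Hypothetical sketch: merge same-sized feature maps from nearby frames
    by re-ordering channels and applying a 1x1 convolution."""

    def __init__(self, channels: int, num_frames: int):
        super().__init__()
        self.num_frames = num_frames
        # 1x1 convolution reduces the stacked channels back to `channels`.
        self.merge = nn.Conv2d(channels * num_frames, channels, kernel_size=1)

    def forward(self, feature_maps):
        # feature_maps: list of length num_frames, each tensor (B, C, H, W).
        B, C, H, W = feature_maps[0].shape
        # Stack along a new frame dimension: (B, T, C, H, W).
        stacked = torch.stack(feature_maps, dim=1)
        # Re-order so the same channel index from every frame is adjacent:
        # (B, C, T, H, W) flattened to (B, C*T, H, W).
        reordered = stacked.permute(0, 2, 1, 3, 4).reshape(
            B, C * self.num_frames, H, W
        )
        # The 1x1 convolution mixes corresponding channels across frames.
        return self.merge(reordered)


if __name__ == "__main__":
    fusion = ChannelFusion(channels=256, num_frames=3)
    maps = [torch.randn(1, 256, 32, 32) for _ in range(3)]
    print(fusion(maps).shape)  # torch.Size([1, 256, 32, 32])
```

Because the output has the same shape as a single-frame feature map, a module like this could in principle be dropped into an existing detector backbone, which is consistent with the paper's claim of little additional inference cost.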

Related research

- FFAVOD: Feature Fusion Architecture for Video Object Detection (09/15/2021)
- Road User Detection in Videos (03/28/2019)
- Recurrent Residual Module for Fast Inference in Videos (02/27/2018)
- Improving Object Detection with Inverted Attention (03/28/2019)
- Mobile Video Object Detection with Temporally-Aware Feature Maps (11/17/2017)
- 3D-MAN: 3D Multi-frame Attention Network for Object Detection (03/30/2021)
- Dual Semantic Fusion Network for Video Object Detection (09/16/2020)
