Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images

10/04/2018
by Avisek Lahiri, et al.

Deep learning based object detectors require thousands of diverse examples annotated with bounding boxes and class labels. Although image object detectors have progressed rapidly in recent years with the release of multiple large-scale static image datasets, object detection in videos remains an open problem because annotated video frames are scarce. A robust video object detector is an essential component for video understanding and for curating large-scale automated annotations in videos. The domain difference between images and videos makes the transfer of image object detectors to videos sub-optimal. The most common solution is to use weakly supervised annotations, where each video frame is tagged for the presence or absence of object categories; this still requires manual effort. In this paper we take a step forward by adapting unsupervised adversarial image-to-image translation to perturb high-quality static images so that they are visually indistinguishable from a set of video frames. We assume the presence of a fully annotated static image dataset and an unannotated video dataset. An object detector is trained on the adversarially transformed image dataset using the annotations of the original dataset. Experiments on the Youtube-Objects and Youtube-Objects-Subset datasets with two contemporary baseline object detectors reveal that such unsupervised pixel-level domain adaptation boosts generalization performance on video frames compared to direct application of the original image object detector. We also achieve competitive performance compared to recent weakly supervised baselines. This paper can be seen as an application of image translation to cross-domain object detection.
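The data flow described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: `toy_generator` is a hypothetical stand-in for a trained image-to-video-frame generator, and `Sample` and `translate_dataset` are names introduced here for illustration only. The point it shows is that a pixel-level translation leaves the spatial layout untouched, so the bounding boxes and class labels of the original images can be reused unchanged when training a detector on the translated images.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]             # toy grayscale image, values in [0, 1]
Box = Tuple[int, int, int, int, str]  # (x_min, y_min, x_max, y_max, class_name)


@dataclass
class Sample:
    image: Image
    boxes: List[Box]


def toy_generator(image: Image) -> Image:
    """Hypothetical stand-in for a trained adversarial generator.

    It only lowers contrast pixel by pixel, so image size and object
    positions are unchanged (an assumption made for this sketch).
    """
    return [[0.5 + 0.6 * (p - 0.5) for p in row] for row in image]


def translate_dataset(
    samples: List[Sample], generator: Callable[[Image], Image]
) -> List[Sample]:
    """Translate every image and carry over the original annotations."""
    translated = []
    for sample in samples:
        new_image = generator(sample.image)
        # Annotations stay valid only if the translation keeps the image size.
        assert len(new_image) == len(sample.image)
        assert len(new_image[0]) == len(sample.image[0])
        translated.append(Sample(new_image, list(sample.boxes)))
    return translated


# Usage: one annotated 2x2 "image" becomes a video-like training sample.
source = [Sample([[0.0, 1.0], [0.5, 0.25]], [(0, 0, 1, 1, "cat")])]
video_like = translate_dataset(source, toy_generator)
```

A detector would then be trained on `video_like` rather than `source`; the unannotated video frames are needed only to train the generator, which this sketch does not cover.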


research
03/30/2018

Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation

Can we detect common objects in a variety of image domains without insta...
research
08/02/2017

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

In this paper, we investigate a weakly-supervised object detection frame...
research
02/01/2021

ConvNets for Counting: Object Detection of Transient Phenomena in Steelpan Drums

We train an object detector built from convolutional neural networks to ...
research
09/04/2023

SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

Despite significant progress in semi-supervised learning for image objec...
research
12/16/2021

HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images

Existing state-of-the-art methods for Video Object Segmentation (VOS) le...
research
07/30/2017

Discover and Learn New Objects from Documentaries

Despite the remarkable progress in recent years, detecting objects in a ...
research
09/19/2018

Towards Large-Scale Video Video Object Mining

We propose to leverage a generic object tracker in order to perform obje...
