Self-supervised object detection from audio-visual correspondence

04/13/2021
by Triantafyllos Afouras, et al.

We tackle the problem of learning object detectors without supervision. Unlike weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio component to "teach" the object detector. While this problem is related to sound source localisation, it is considerably harder because the detector must classify the objects by type, enumerate each instance of the object, and do so even when the object is silent. We address it by first designing a self-supervised framework with a contrastive objective that jointly learns to classify and localise objects. Then, without using any supervision, we use these self-supervised labels and boxes to train an image-based object detector. The resulting detector outperforms previous unsupervised and weakly-supervised detectors on the tasks of object detection and sound source localisation. We also show that we can align this detector to ground-truth classes with as little as one label per pseudo-class, and that our method can learn to detect generic objects beyond instruments, such as airplanes and cats.
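To make the idea of a contrastive audio-visual objective concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the loss, so the sketch assumes a standard InfoNCE-style formulation in which an audio embedding is compared against every spatial location of a visual feature map and the peak similarity is used as the clip-level score. All function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def heatmap(audio_vec, visual_map):
    """Similarity of one audio embedding to every spatial cell of a visual feature map."""
    return [[dot(audio_vec, cell) for cell in row] for row in visual_map]

def clip_score(audio_vec, visual_map):
    """Assumed pooling: the maximum over locations is the audio-frame matching score."""
    return max(max(row) for row in heatmap(audio_vec, visual_map))

def peak_location(audio_vec, visual_map):
    """(row, col) of the highest-scoring cell, a coarse localisation of the sound source."""
    h = heatmap(audio_vec, visual_map)
    best = max((val, r, c) for r, row in enumerate(h) for c, val in enumerate(row))
    return best[1], best[2]

def contrastive_loss(audio_vecs, visual_maps, temperature=0.1):
    """InfoNCE over a batch: audio i should score higher with frame i than with any other frame."""
    n = len(audio_vecs)
    total = 0.0
    for i in range(n):
        logits = [clip_score(audio_vecs[i], visual_maps[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n

# Toy batch: two clips, 2-D embeddings, 2x2 visual feature maps (hypothetical values).
audio = [[1.0, 0.0], [0.0, 1.0]]
frames = [
    [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]],  # clip 0: matching cell at (0, 1)
    [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]],  # clip 1: matching cell at (1, 0)
]
matched = contrastive_loss(audio, frames)
mismatched = contrastive_loss(audio, frames[::-1])
```

In this toy setup the loss is lower when each audio embedding is paired with its own frame than when the frames are swapped, and the peak of the similarity map gives the location that a box could later be fitted around. How the actual method turns such maps into class labels and boxes is not described in the abstract and is left out here.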


research
08/16/2022

Object Discovery via Contrastive Learning for Weakly Supervised Object Detection

Weakly Supervised Object Detection (WSOD) is a task that detects objects...
research
05/23/2020

Self-supervised Robust Object Detectors from Partially Labelled datasets

In the object detection task, merging various datasets from similar cont...
research
12/16/2020

Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera

Deep learning is the essential building block of state-of-the-art person...
research
08/10/2019

Object-Aware Instance Labeling for Weakly Supervised Object Detection

Weakly supervised object detection (WSOD), where a detector is trained w...
research
02/21/2023

Self-improving object detection via disagreement reconciliation

Object detectors often experience a drop in performance when new environ...
research
02/07/2023

Look around and learn: self-improving object detection by exploration

Object detectors often experience a drop in performance when new environ...
research
11/09/2018

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

We tackle the problem of audiovisual scene analysis for weakly-labeled d...
