Fast Object Class Labelling via Speech

11/23/2018
by Michael Gygli, et al.

Object class labelling is the task of annotating images with labels on the presence or absence of objects from a given class vocabulary. Simply asking one yes-no question per class, however, has a cost that is linear in the vocabulary size and is thus inefficient for large vocabularies. Modern approaches rely on a hierarchical organization of the vocabulary to reduce annotation time, but remain expensive (several minutes per image for the 200 classes in ILSVRC). Instead, we propose a new interface where classes are annotated via speech. Speaking is fast and allows for direct access to the class name, without searching through a list or hierarchy. As additional advantages, annotators can simultaneously speak and scan the image for objects, the interface can be kept extremely simple, and using it requires less mouse movement. However, a key challenge is to train annotators to only say words from the given class vocabulary. We present a way to tackle this challenge and show that our method yields high-quality annotations at significant speed gains (2.3x to 14.9x faster than existing methods).
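To make the cost argument concrete, the following is a minimal sketch that counts the questions an annotator would answer under a flat protocol (one yes-no question per class) and under a simple two-level hierarchy. The toy hierarchy, the class names, and the helper functions `flat_questions` and `hierarchical_questions` are assumptions made for illustration; they are not the protocols or the vocabulary evaluated in the paper.

```python
from typing import Dict, List, Set


def flat_questions(vocabulary: List[str]) -> int:
    """One yes-no question per class: cost is linear in the vocabulary size."""
    return len(vocabulary)


def hierarchical_questions(hierarchy: Dict[str, List[str]], present: Set[str]) -> int:
    """One question per group, plus one per class inside each group
    that contains at least one present class."""
    count = 0
    for classes in hierarchy.values():
        count += 1  # "Is there any object from this group in the image?"
        if any(c in present for c in classes):
            count += len(classes)  # descend: ask about each class in the group
    return count


# Toy two-level hierarchy (hypothetical, not the ILSVRC hierarchy).
hierarchy = {
    "animal": ["dog", "cat", "horse"],
    "vehicle": ["car", "bus", "bicycle"],
    "furniture": ["chair", "table", "sofa"],
}
vocabulary = [c for classes in hierarchy.values() for c in classes]
present = {"dog", "cat"}  # classes actually visible in the example image

n_flat = flat_questions(vocabulary)
n_hier = hierarchical_questions(hierarchy, present)
print(f"flat: {n_flat} questions, hierarchical: {n_hier} questions")
```

In this toy setting the hierarchy saves questions only when whole groups are absent from the image. A speech interface avoids the enumeration altogether, since the annotator names the present classes directly.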


