Natural Vocabulary Emerges from Free-Form Annotations

by Jordi Pont-Tuset, et al.

We propose an approach for annotating object classes using free-form text written by undirected and untrained annotators. Free-form labeling is natural for annotators: they intuitively provide very specific and exhaustive labels, and no training stage is necessary. We first collect 729 labels on 15k images using 124 different annotators. We then automatically enrich the structure of these free-form annotations by discovering a natural vocabulary of 4020 classes within them. This vocabulary represents the natural distribution of objects well and is learned directly from data, instead of being an educated guess made before collecting any labels. Hence, the natural vocabulary emerges from a large mass of free-form annotations. To do so, we (i) map the raw input strings to entities in an ontology of physical objects (which gives them an unambiguous meaning); and (ii) leverage inter-annotator co-occurrences, as well as biases and knowledge specific to individual annotators. Finally, we also automatically extract natural vocabularies of reduced size that have high object coverage while remaining specific. These reduced vocabularies represent the natural distribution of objects much better than commonly used predefined vocabularies. Moreover, they feature a more uniform sample distribution over classes.
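The pipeline the abstract describes — canonicalizing raw strings against an ontology, then extracting a reduced vocabulary with high coverage — can be illustrated with a toy sketch. This is not the authors' method; the tiny `ONTOLOGY` dictionary stands in for a real ontology of physical objects, and all names and the greedy coverage criterion are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for an ontology of physical objects:
# maps raw annotator strings to canonical entities.
ONTOLOGY = {
    "pup": "dog", "puppy": "dog", "dog": "dog",
    "kitty": "cat", "cat": "cat",
    "auto": "car", "car": "car", "automobile": "car",
}

def canonicalize(raw_labels):
    """Step (i): map raw free-form strings to ontology entities,
    discarding strings the ontology cannot resolve."""
    return [ONTOLOGY[s.lower()] for s in raw_labels if s.lower() in ONTOLOGY]

def reduced_vocabulary(annotations, coverage=0.9):
    """Toy version of vocabulary reduction: greedily keep the most
    frequent classes until they cover `coverage` of all resolved labels."""
    counts = Counter(e for labels in annotations for e in canonicalize(labels))
    total = sum(counts.values())
    vocab, covered = [], 0
    for entity, n in counts.most_common():
        vocab.append(entity)
        covered += n
        if covered / total >= coverage:
            break
    return vocab

# One inner list per image's free-form annotations.
annotations = [["pup", "auto"], ["dog"], ["kitty", "car"], ["puppy"]]
print(reduced_vocabulary(annotations, coverage=0.8))  # → ['dog', 'car']
```

A frequency-greedy cut is the simplest coverage heuristic; the paper's approach additionally exploits inter-annotator co-occurrences and per-annotator biases, which this sketch omits.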



