Do We Need More Training Data?

03/05/2015
by Xiangxin Zhu, et al.

Datasets for training object recognition systems are steadily increasing in size. This paper investigates whether existing detectors will continue to improve as data grows, or whether their performance will saturate due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grow. Surprisingly, even with proper treatment of regularization and "outliers", the performance of classic mixture models appears to saturate quickly (∼10 templates and ∼100 positive training examples per template). This is not a limitation of the feature space, as compositional mixtures that share template parameters via parts and that can synthesize new templates not encountered during training yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.
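
The abstract does not spell out how a mixture of templates scores a candidate window. One common formulation, assumed here rather than taken from the paper, treats each mixture component as a linear template and scores a feature vector by the best-matching component. The minimal sketch below illustrates that idea; the function name `mixture_score` and its parameters are hypothetical, and the feature vector stands in for an oriented gradient descriptor.

```python
from typing import List, Sequence, Tuple


def mixture_score(
    x: Sequence[float],
    templates: List[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[float, int]:
    """Score a feature vector against a mixture of linear templates.

    Assumed scoring rule (not stated in the abstract):
        score(x) = max_k (w_k . x + b_k)
    Returns the best score and the index of the winning component.
    """
    if not templates or len(templates) != len(biases):
        raise ValueError("need one bias per template and at least one template")

    best_score = float("-inf")
    best_index = -1
    for k, (w, b) in enumerate(zip(templates, biases)):
        if len(w) != len(x):
            raise ValueError("template and feature vector lengths differ")
        # Linear response of component k: dot product plus bias.
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        if s > best_score:
            best_score, best_index = s, k
    return best_score, best_index
```

Under this assumption, adding mixture components only enlarges the set of templates the maximum ranges over, which is the kind of model capacity the abstract reports as saturating at roughly 10 templates.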


