Synbols: Probing Learning Algorithms with Synthetic Datasets

09/14/2020
by Alexandre Lacoste, et al.

Progress in machine learning has been fueled by the introduction of benchmark datasets that push the limits of existing algorithms. Enabling the design of datasets that test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. To this end, we introduce Synbols (Synthetic Symbols), a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions over the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws of standard learning algorithms in various learning setups, including supervised learning, active learning, out-of-distribution generalization, unsupervised representation learning, and object counting.
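To make the idea of "distributions over latent features" concrete, here is a minimal sketch using only the Python standard library. It is not the Synbols API: the names `SymbolLatents`, `sample_latents`, and `make_dataset`, the attribute set, the placeholder font names, and the chosen distributions are all assumptions made for illustration. The sketch only samples latent attributes; rendering them to images is left out.

```python
import random
from dataclasses import dataclass


# Hypothetical latent attributes of one symbol (illustrative, not the Synbols API).
@dataclass(frozen=True)
class SymbolLatents:
    char: str        # which Unicode character to draw
    font: str        # which font to draw it with
    scale: float     # relative size of the symbol
    rotation: float  # rotation angle in radians
    occluded: bool   # whether an occluding shape is drawn on top


def sample_latents(rng, chars, fonts, occlusion_prob=0.0):
    """Draw one set of latent attributes from simple, assumed distributions."""
    return SymbolLatents(
        char=rng.choice(chars),
        font=rng.choice(fonts),
        scale=rng.uniform(0.5, 1.0),
        rotation=rng.gauss(0.0, 0.3),
        occluded=rng.random() < occlusion_prob,
    )


def make_dataset(n, seed=0, **kwargs):
    """Sample n latent descriptions reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_latents(rng, **kwargs) for _ in range(n)]


# Placeholder character set and font names, chosen only for the example.
dataset = make_dataset(
    1000, seed=0, chars="abc", fonts=["FontA", "FontB"], occlusion_prob=0.2
)

# One way to probe out-of-distribution generalization: hold out a font entirely.
train = [s for s in dataset if s.font != "FontB"]
test = [s for s in dataset if s.font == "FontB"]
```

Changing the distribution of a single attribute, or holding one attribute value out of the training split as above, is the kind of controlled variation that a synthetic generator makes cheap to set up.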


01/11/2019

Machine Learning Automation Toolbox (MLaut)

In this paper we present MLaut (Machine Learning AUtomation Toolbox) for...
12/08/2020

Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods

Many ground-breaking advancements in machine learning can be attributed ...
12/23/2019

The Labeling Distribution Matrix (LDM): A Tool for Estimating Machine Learning Algorithm Capacity

Algorithm performance in supervised learning is a combination of memoriz...
03/12/2022

The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

In recent years, the machine learning research community has benefited t...
12/18/2014

Stochastic Descent Analysis of Representation Learning Algorithms

Although stochastic approximation learning methods have been widely used...
07/28/2021

Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators

The machine learning toolbox for estimation of heterogeneous treatment e...
03/19/2020

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions

Creating open-ended algorithms, which generate their own never-ending st...

Code Repositories

synbols

The Synbols dataset generator



synbols-benchmarks

Benchmarks for the Synbols project



MLRC2020

Reproducing: "Synbols: Probing Learning Algorithms with Synthetic Datasets"



DER-SSL

DER-SSL: Dark Experience Replay with Self-Supervised Learning

