The Randomness of Input Data Spaces is an A Priori Predictor for Generalization

06/08/2021
by Martin Briesch, et al.

Over-parameterized models can perfectly learn various types of data distributions; however, generalization error is usually lower for real data than for artificial data. This suggests that the properties of data distributions affect generalization capability. This work focuses on the search space defined by the input data and assumes that the correlation between labels of neighboring input values influences generalization. If this correlation is low, the randomness of the input data space is high, which leads to high generalization error. We suggest measuring the randomness of an input data space using Maurer's universal statistical test. Results for synthetic classification tasks and common image classification benchmarks (MNIST, CIFAR10, and Microsoft's cats vs. dogs data set) show a high correlation between the randomness of input data spaces and the generalization error of deep neural networks for binary classification problems.
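To make the randomness measure concrete, the following is a minimal sketch of the statistic behind Maurer's universal test, applied to a sequence of binary labels. It is an illustration under stated assumptions, not the authors' implementation: the function name `maurer_statistic`, the default block length, and the default number of initialization blocks are hypothetical choices, and the abstract does not say how labels are ordered along neighboring inputs, so the sketch simply takes an already ordered bit sequence.

```python
import math
import random


def maurer_statistic(bits, block_len=2, init_blocks=None):
    """Average log2 distance between repeated blocks of `block_len` bits.

    Assumption: `bits` is a sequence of 0/1 labels that has already been
    ordered so that neighboring positions correspond to neighboring inputs.
    """
    n_blocks = len(bits) // block_len
    if init_blocks is None:
        # Hypothetical default: 10 * 2^L blocks to warm up the lookup table.
        init_blocks = 10 * 2 ** block_len
    test_blocks = n_blocks - init_blocks
    if test_blocks <= 0:
        raise ValueError("sequence too short for the chosen block length")

    last_seen = {}  # block pattern -> index of its most recent occurrence
    total = 0.0
    for i in range(1, n_blocks + 1):
        start = (i - 1) * block_len
        pattern = tuple(bits[start:start + block_len])
        if i > init_blocks:
            # A pattern never seen before counts as last seen at index 0.
            total += math.log2(i - last_seen.get(pattern, 0))
        last_seen[pattern] = i
    return total / test_blocks


# Hypothetical label sequences: one fully regular, one pseudo-random.
rng = random.Random(0)
regular_labels = [0, 1] * 10_000
random_labels = [rng.randint(0, 1) for _ in range(20_000)]

print("regular:", maurer_statistic(regular_labels))
print("random: ", maurer_statistic(random_labels))
```

A regular label sequence repeats the same block at every step, so each distance is 1 and the statistic is 0. A sequence with little structure has longer gaps between repeats and therefore a larger value, which is the sense in which a higher statistic indicates a more random input data space.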


03/09/2021

More data or more parameters? Investigating the effect of data structure on generalization

One of the central features of deep learning is the generalization abili...
01/16/2017

Datenqualität in Regressionsproblemen (Data Quality in Regression Problems)

Regression models are increasingly built using datasets which do not fol...
01/18/2019

Foothill: A Quasiconvex Regularization Function

Deep neural networks (DNNs) have demonstrated success for many supervise...
12/03/2019

Windable Heads Recognizing NL with Constant Randomness

Every language in NL has a k-head two-way nondeterministic finite automa...
09/11/2021

MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning

Among the biggest challenges we face in utilizing neural networks traine...
12/14/2016

Deep Function Machines: Generalized Neural Networks for Topological Layer Expression

In this paper we propose a generalization of deep neural networks called...
07/19/2018

The Deep Kernelized Autoencoder

Autoencoders learn data representations (codes) in such a way that the i...