The Labeling Distribution Matrix (LDM): A Tool for Estimating Machine Learning Algorithm Capacity

12/23/2019
by Pedro Sandoval Segura, et al.

Algorithm performance in supervised learning is a combination of memorization, generalization, and luck. By estimating how much information an algorithm can memorize from a dataset, we can set a lower bound on the amount of performance due to other factors such as generalization and luck. With this goal in mind, we introduce the Labeling Distribution Matrix (LDM) as a tool for estimating the capacity of learning algorithms. The method attempts to characterize the diversity of outputs an algorithm can produce across different training datasets, and uses this diversity to measure the algorithm's flexibility and responsiveness to data. We test the method on several supervised learning algorithms and find that, while the results are not conclusive, the LDM does offer potentially valuable insight into the prediction behavior of algorithms. We also introduce the Label Autoencoder as an additional tool for estimating algorithm capacity, with more promising initial results.
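
The abstract does not give the construction of the LDM, so the following is only a rough illustration of the general idea it describes: train an algorithm on many different training datasets, record the labeling it assigns to a fixed holdout set each time, and measure how diverse those labelings are. Everything here is an assumption made for illustration, not the authors' method: the toy learners `one_nn` and `constant_zero`, the helper names `labeling_counts` and `entropy_bits`, the synthetic data, and the choice of Shannon entropy as the diversity measure.

```python
import math
import random
from collections import Counter

def one_nn(train, x):
    """Toy 1-nearest-neighbor on 1-D inputs: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def constant_zero(train, x):
    """Toy baseline that ignores its training data and always predicts 0."""
    return 0

def labeling_counts(predict, pool, holdout, n_datasets=200, train_size=5, seed=0):
    """Count how often each holdout labeling occurs across sampled training sets."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_datasets):
        train = rng.sample(pool, train_size)  # one training dataset
        labeling = tuple(predict(train, x) for x in holdout)
        counts[labeling] += 1
    return counts

def entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution over labelings."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical data: 20 points on a line with alternating binary labels.
pool = [(x / 20, x % 2) for x in range(20)]
holdout = [0.1, 0.3, 0.5, 0.7, 0.9]

nn_counts = labeling_counts(one_nn, pool, holdout)
const_counts = labeling_counts(constant_zero, pool, holdout)

print("1-NN labeling entropy (bits):", entropy_bits(nn_counts))
print("constant labeling entropy (bits):", entropy_bits(const_counts))
```

In this sketch a learner that ignores its training data yields a single labeling and zero entropy, while a learner that responds to the data spreads its outputs over several labelings, with at most 5 bits possible for five binary holdout points. That contrast is one simple way to read "flexibility and responsiveness to data", under the assumptions stated above.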

research
09/30/2020

First-order Optimization for Superquantile-based Supervised Learning

Classical supervised learning via empirical risk (or negative log-likeli...
research
10/02/2022

Learning Algorithm Generalization Error Bounds via Auxiliary Distributions

Generalization error boundaries are essential for comprehending how well...
research
11/04/2022

Impact Learning: A Learning Method from Features Impact and Competition

Machine learning is the study of computer algorithms that can automatica...
research
09/14/2020

Synbols: Probing Learning Algorithms with Synthetic Datasets

Progress in the field of machine learning has been fueled by the introdu...
research
07/22/2015

An Empirical Comparison of SVM and Some Supervised Learning Algorithms for Vowel recognition

In this article, we conduct a study on the performance of some supervise...
research
05/13/2023

On the Capacity of DNA Labeling

DNA labeling is a powerful tool in molecular biology and biotechnology t...
research
09/02/2022

Feature diversity in self-supervised learning

Many studies on scaling laws consider basic factors such as model size, ...