A Coreset Learning Reality Check

01/15/2023
by   Fred Lu, et al.
Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets. In recent years, several works have proposed methods for subsampling rows from a data matrix while maintaining relevant information for classification. While these works are supported by theory and limited experiments, to date there has not been a comprehensive evaluation of these methods. In our work, we directly compare multiple methods for logistic regression drawn from the coreset and optimal subsampling literature and discover inconsistencies in their effectiveness. In many cases, these methods do not outperform simple uniform subsampling.
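To make the uniform-subsampling baseline concrete, here is a minimal sketch of how it might look for logistic regression. It is not the paper's code: the function names (`subsample`, `fit_weighted_logreg`), the synthetic data, the Newton solver, and the small ridge term are all assumptions chosen for illustration. Rows are drawn with replacement and reweighted by the inverse of their sampling probability, so the weighted loss on the subsample is an unbiased estimate of the full-data loss. Passing a non-uniform `probs` vector would turn the same routine into an importance-sampling scheme.

```python
import numpy as np

def subsample(X, y, m, probs=None, seed=0):
    """Draw m rows with replacement; return them with inverse-probability weights.

    Hypothetical helper: probs=None gives the uniform baseline.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    if probs is None:
        probs = np.full(n, 1.0 / n)  # uniform sampling probabilities
    idx = rng.choice(n, size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])  # weighted loss is unbiased for the full loss
    return X[idx], y[idx], weights

def fit_weighted_logreg(X, y, w, n_iter=25, ridge=1e-6):
    """Weighted logistic regression (labels in {0, 1}) via Newton's method.

    The tiny ridge term is an assumption added only for numerical stability.
    """
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (p - y)) + ridge * beta
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(d)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(1)
n, d = 20000, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

# Full-data fit versus a fit on a 10% uniform subsample.
beta_full = fit_weighted_logreg(X, y, np.ones(n))
Xs, ys, ws = subsample(X, y, m=2000)
beta_unif = fit_weighted_logreg(Xs, ys, ws)
print("distance to full-data fit:", np.linalg.norm(beta_unif - beta_full))
```

Any alternative sampling scheme can be evaluated the same way: supply its probabilities through `probs`, refit, and compare the distance to the full-data coefficients against the uniform baseline at the same subsample size `m`.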

research
05/22/2018

On Coresets for Logistic Regression

Coresets are one of the central methods to facilitate the analysis of la...
research
06/01/2021

Logistic Regression Through the Veil of Imprecise Data

Logistic regression is an important statistical tool for assessing the p...
research
11/01/2019

Robust contrastive learning and nonlinear ICA in the presence of outliers

Nonlinear independent component analysis (ICA) is a general framework fo...
research
10/04/2018

Privacy-Preserving Multiparty Learning For Logistic Regression

In recent years, machine learning techniques are widely used in numerous...
research
12/03/2018

On functional logistic regression via RKHS's

In this work we address the problem of functional logistic regression, r...
research
08/20/2019

Compliance Change Tracking in Business Process Services

Regulatory compliance is an organization's adherence to laws, regulation...
