Loss-Proportional Subsampling for Subsequent ERM

06/07/2013
by Paul Mineiro, et al.

We propose a sampling scheme suitable for reducing a data set prior to selecting a hypothesis with minimum empirical risk. The sampling only considers a subset of the ultimate (unknown) hypothesis set, but can nonetheless guarantee that the final excess risk will compare favorably with utilizing the entire original data set. We demonstrate the practical benefits of our approach on a large dataset which we subsample and subsequently fit with boosted trees.
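
The sketch below illustrates the general flavor of the approach: train a cheap pilot model on a small uniform subsample, sample each example with probability related to its loss under the pilot, keep inverse-probability importance weights, and then run ERM (here, boosted trees) on the weighted subsample. The pilot model, the log-loss choice, and parameters such as `keep_fraction` are illustrative assumptions, not the paper's exact construction; see the paper for the precise sampling probabilities and excess-risk guarantees.

```python
# A minimal sketch of loss-proportional subsampling followed by ERM on the
# retained points. The pilot model, loss, and `keep_fraction` are
# illustrative assumptions, not taken verbatim from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic binary-classification data standing in for the "large dataset".
n, d = 100_000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# 1. Fit a cheap pilot hypothesis on a small uniform subsample.
pilot_idx = rng.choice(n, size=5_000, replace=False)
pilot = LogisticRegression(max_iter=1000).fit(X[pilot_idx], y[pilot_idx])

# 2. Score every example by its loss under the pilot (log loss here).
p_hat = np.clip(pilot.predict_proba(X)[:, 1], 1e-6, 1 - 1e-6)
loss = -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

# 3. Sample each example with probability proportional to its loss,
#    capped at 1 and scaled so the expected subsample size is about
#    keep_fraction * n.
keep_fraction = 0.1
probs = np.minimum(1.0, keep_fraction * n * loss / loss.sum())
keep = rng.random(n) < probs

# 4. Reweight retained examples by 1 / p_i so the weighted empirical risk
#    is an unbiased estimate of the full-data empirical risk.
weights = 1.0 / probs[keep]

# 5. Run ERM (boosted trees) on the weighted subsample.
model = GradientBoostingClassifier()
model.fit(X[keep], y[keep], sample_weight=weights)
print(f"kept {keep.sum()} of {n} examples")
```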

Related research:

- Learning Bounds for Risk-sensitive Learning (06/15/2020): In risk-sensitive learning, one aims to find a hypothesis that minimizes...
- Optimize-via-Predict: Realizing out-of-sample optimality in data-driven optimization (09/20/2023): We examine a stochastic formulation for data-driven optimization wherein...
- Frank-Wolfe Algorithm for Exemplar Selection (11/06/2018): In this paper, we consider the problem of selecting representatives from...
- A Mathematical Theory of Learning (05/07/2014): In this paper, a mathematical theory of learning is proposed that has ma...
- The Multiple Random Dot Product Graph Model (11/29/2018): Data in the form of graphs, or networks, arise naturally in a number of ...
- Obfuscation of Discrete Data (04/14/2023): Data obfuscation deals with the problem of masking a data-set in such a ...
- Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data (06/27/2018): Empirical risk minimization is the principal tool for prediction problem...
