AutoCoreset: An Automatic Practical Coreset Construction Framework

05/19/2023
by   Alaa Maalouf, et al.

A coreset is a tiny weighted subset of an input set that closely approximates the loss function of the full set with respect to a certain set of queries. Coresets have become prevalent in machine learning, as they have been shown to be advantageous for many applications. Although coreset construction is an active research area, coresets are unfortunately constructed in a problem-dependent manner: for each problem, a new coreset construction algorithm is usually suggested, a process that may take time or may be hard for researchers new to the field. Even the generic frameworks require the user to carry out additional (problem-dependent) computations or proofs. Besides, many problems do not have (provably) small coresets, which limits the applicability of coresets. To this end, we suggest an automatic practical framework for constructing coresets, which requires from the user only the input data and the desired cost function, with no other task-related computation. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors we aim to sum are the loss vectors of a specific subset of the queries, so that we approximate the image of the function on this subset. We show that while this set is limited, the coreset is quite general. An extensive experimental study on various machine learning applications is also conducted. Finally, we provide a "plug and play" style implementation, proposing a user-friendly system that can be easily used to apply coresets to many problems. Full open source code can be found at https://github.com/alaamaalouf/AutoCoreset. We believe that these contributions enable future research and easier use and application of coresets.
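The reduction to vector summation described in the abstract can be illustrated with a minimal sketch. It is not the AutoCoreset algorithm and does not use the API of the linked repository: the function names `loss_matrix` and `sample_vector_sum_coreset`, the squared-error loss, the randomly drawn queries, and the norm-proportional importance sampling are all assumptions chosen for illustration. Each input point is mapped to a loss vector (its loss under every query in a fixed subset), and a weighted subset of points is chosen so that the weighted sum of its loss vectors is close to the sum over all points.

```python
import numpy as np

def loss_matrix(X, y, queries):
    # Row i is the loss vector of point i: its squared-error loss
    # under each of the k queries (hypothetical choice of loss).
    return (X @ queries.T - y[:, None]) ** 2

def sample_vector_sum_coreset(M, m, rng):
    # Stand-in technique (an assumption, not the paper's method):
    # sample rows with probability proportional to their norm and
    # reweight so the weighted sum is an unbiased estimate of M.sum(0).
    norms = np.linalg.norm(M, axis=1)
    probs = norms / norms.sum()
    idx = rng.choice(M.shape[0], size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return idx, weights

rng = np.random.default_rng(0)
n, d, k, m = 2000, 5, 20, 200

# Synthetic regression data and a fixed subset of k queries.
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
queries = rng.normal(size=(k, d))

M = loss_matrix(X, y, queries)             # shape (n, k)
idx, weights = sample_vector_sum_coreset(M, m, rng)

full = M.sum(axis=0)                       # total loss per query
approx = weights @ M[idx]                  # weighted coreset loss per query
rel_err = np.linalg.norm(approx - full) / np.linalg.norm(full)
print(f"coreset size: {m}, relative error of the vector sum: {rel_err:.3f}")
```

In this sketch the user supplies only the data and the loss, which mirrors the interface described in the abstract; how the query subset is chosen and how the summation is approximated in the actual framework is specified in the full text, not here.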

research
11/04/2021

A Unified Approach to Coreset Learning

Coreset of a given dataset and loss function is usually a small weighed ...
research
04/17/2023

MFGLib: A Library for Mean-Field Games

Mean-field games (MFGs) are limiting models to approximate N-player game...
research
07/06/2018

The CodRep Machine Learning on Source Code Competition

CodRep is a machine learning competition on source code data. It is care...
research
02/21/2022

A Probabilistic Approach to The Perfect Sum Problem

The subset sum problem is known to be an NP-hard problem in the field of...
research
06/09/2020

Faster PAC Learning and Smaller Coresets via Smoothed Analysis

PAC-learning usually aims to compute a small subset (ε-sample/net) from ...
research
11/04/2021

Introduction to Coresets: Approximated Mean

A strong coreset for the mean queries of a set P in ℝ^d is a small weigh...
research
10/19/2019

Introduction to Coresets: Accurate Coresets

A coreset (or core-set) of an input set is its small summation, such tha...
