Data Summarization via Bilevel Optimization

09/26/2021
by Zalán Borsos, et al.
The increasing availability of massive data sets poses a series of challenges for machine learning. Prominent among these is the need to learn models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is to operate on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and k-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice differentiable model, including neural networks. We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.
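
As an illustration of the cardinality-constrained bilevel formulation described above, the selection problem can be sketched as follows. The notation is an assumption introduced here and does not appear in the abstract: n data points, per-example loss \ell_i, nonnegative weights w, model parameters \theta, and a budget m on the number of selected points.

```latex
% Hedged sketch: the symbols n, m, w, \theta, \ell_i are illustrative assumptions.
% Outer problem: choose at most m weighted points so that the model trained on
% them performs well on the full data set.
\begin{equation*}
  \min_{w \in \mathbb{R}^{n}_{\ge 0},\; \|w\|_{0} \le m}
    \; \sum_{i=1}^{n} \ell_i\bigl(\theta^{*}(w)\bigr)
  \quad \text{s.t.} \quad
  \theta^{*}(w) \in \arg\min_{\theta} \sum_{i=1}^{n} w_i \, \ell_i(\theta)
\end{equation*}

% Inner problem: \theta^{*}(w) is the model fitted on the weighted subset.
% If the inner solution is unique and the Hessian below is invertible,
% the implicit function theorem gives the sensitivity of the fitted model
% to each weight:
\begin{equation*}
  \frac{\partial \theta^{*}(w)}{\partial w_i}
    = - \Bigl( \sum_{j=1}^{n} w_j \, \nabla^{2}_{\theta} \ell_j\bigl(\theta^{*}(w)\bigr) \Bigr)^{-1}
        \nabla_{\theta} \ell_i\bigl(\theta^{*}(w)\bigr)
\end{equation*}
```

Under this reading, the second expression suggests one plausible reason the framework asks for twice differentiable models: differentiating the outer objective with respect to the weights involves the Hessian of the inner objective.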
