Dataset Condensation via Efficient Synthetic-Data Parameterization

05/30/2022
by Jang-Hyun Kim et al.

The great success of machine learning with massive amounts of data comes at the price of huge computation and storage costs for training and tuning. Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset. However, existing approaches have fundamental limitations in optimization due to the limited representability of synthetic datasets that do not account for any regularity in the data. To this end, we propose a novel condensation framework that generates multiple synthetic examples within a limited storage budget via an efficient parameterization that exploits data regularity. We further analyze the shortcomings of existing gradient matching-based condensation methods and develop an effective optimization technique that improves the condensation of training-data information. We propose a unified algorithm that drastically improves the quality of condensed data over the current state of the art on CIFAR-10, ImageNet, and Speech Commands.
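The gradient matching objective that the abstract builds on can be illustrated with a toy sketch. This is not the paper's algorithm: a linear regression model stands in for a deep network, finite differences stand in for automatic differentiation, and all names and sizes here are illustrative. The idea is to learn a few synthetic samples whose training gradient mimics the gradient of the full real dataset across random model initializations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: 100 samples, 8 features, linear-regression targets.
X_real = rng.normal(size=(100, 8))
w_true = rng.normal(size=8)
y_real = X_real @ w_true

# Condensed set: only 4 learnable synthetic samples with fixed targets.
X_syn = rng.normal(size=(4, 8))
y_syn = rng.normal(size=4)

def grad_wrt_weights(X, y, w):
    """Gradient of the mean squared error (1/2n)*||Xw - y||^2 w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

def matching_loss(X, w):
    """Squared distance between real and synthetic training gradients."""
    g_real = grad_wrt_weights(X_real, y_real, w)
    g_syn = grad_wrt_weights(X, y_syn, w)
    return float(np.sum((g_real - g_syn) ** 2))

# Fixed evaluation initializations to measure progress before and after.
eval_ws = [np.random.default_rng(100 + k).normal(size=8) for k in range(20)]
def avg_loss(X):
    return np.mean([matching_loss(X, w) for w in eval_ws])

loss_before = avg_loss(X_syn)

# Update the synthetic inputs so that, across random model initializations,
# they induce nearly the same weight gradient as the full real dataset.
lr, eps = 0.02, 1e-5
for step in range(400):
    w = rng.normal(size=8)  # sample a fresh model initialization
    base = matching_loss(X_syn, w)
    grad = np.zeros_like(X_syn)
    for i in range(X_syn.shape[0]):      # finite-difference gradient
        for j in range(X_syn.shape[1]):  # w.r.t. each synthetic entry
            X_pert = X_syn.copy()
            X_pert[i, j] += eps
            grad[i, j] = (matching_loss(X_pert, w) - base) / eps
    X_syn -= lr * grad

loss_after = avg_loss(X_syn)
print(f"avg matching loss: {loss_before:.3f} -> {loss_after:.3f}")
```

After optimization the four synthetic samples induce a gradient much closer to that of the hundred real samples, which is the sense in which the condensed set can substitute for the full set during training. The paper's contribution is, in part, to parameterize such synthetic sets more efficiently than storing each sample directly.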


Related research

- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation (11/20/2022)
- Post-processing Private Synthetic Data for Improving Utility on Selected Measures (05/24/2023)
- Synthetic Data for Object Classification in Industrial Applications (12/09/2022)
- Condensing Graphs via One-Step Gradient Matching (06/15/2022)
- CAFE: Learning to Condense Dataset by Aligning Features (03/03/2022)
- LIRS: Enabling efficient machine learning on NVM-based storage via a lightweight implementation of random shuffling (10/10/2018)
