Dataset Distillation: A Comprehensive Review

01/17/2023
by Ruonan Yu, et al.

The recent success of deep learning can be largely attributed to the huge amounts of data used to train deep neural networks. However, such large volumes of data significantly increase the burden of storage and transmission. Training models on such large datasets also consumes considerable time and computational resources. Moreover, directly publishing raw data inevitably raises privacy and copyright concerns. To alleviate these issues, dataset distillation (DD), also known as dataset condensation (DC), has become a popular research topic in recent years. Given an original large dataset, DD aims to derive a much smaller dataset of synthetic samples such that models trained on the synthetic dataset perform comparably to those trained on the original real one. This paper presents a comprehensive review and summary of recent advances in DD and its applications. We first formally introduce the task and propose an overall algorithmic framework that all existing DD methods follow. We then provide a systematic taxonomy of current methodologies in this area and discuss their theoretical relationships. Finally, we point out current challenges in DD through extensive experiments and envision possible directions for future work.
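To make the DD objective concrete, here is a toy sketch (not any specific method surveyed in the review): for 1-D linear regression, the trained model depends on the data only through a few summary statistics, so a single synthetic sample that preserves those statistics yields the same trained model as the full dataset. All variable names below are illustrative.

```python
import random

random.seed(0)

# Real dataset: 1000 scalar points with y close to 2.5 * x (no intercept).
xs_real = [random.gauss(0, 1) for _ in range(1000)]
ys_real = [2.5 * x + 0.1 * random.gauss(0, 1) for x in xs_real]

def fit(xs, ys):
    """Least-squares slope through the origin: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# "Distill" the 1000 points into ONE synthetic point that preserves the
# sufficient statistics sum(x*x)/n and sum(x*y)/n, so training on the
# synthetic point recovers the same model as training on the real data.
sxx = sum(x * x for x in xs_real) / len(xs_real)
sxy = sum(x * y for x, y in zip(xs_real, ys_real)) / len(xs_real)
x_syn = sxx ** 0.5
y_syn = sxy / x_syn

w_real = fit(xs_real, ys_real)   # model trained on 1000 real samples
w_syn = fit([x_syn], [y_syn])    # model trained on 1 synthetic sample
print(abs(w_real - w_syn) < 1e-9)
```

Real DD methods face a much harder version of this problem: for deep networks no closed-form sufficient statistics exist, so the synthetic samples themselves are optimized iteratively, e.g. by matching training gradients or trajectories between the real and synthetic datasets.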

Related research:

- Improving Dataset Distillation (10/06/2019)
- A Survey on Dataset Distillation: Approaches, Applications and Future Directions (05/03/2023)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching (05/29/2023)
- A Comprehensive Survey to Dataset Distillation (01/13/2023)
- Data Distillation: A Survey (01/11/2023)
- A Data-Centric Approach for Training Deep Neural Networks with Less Data (10/07/2021)
- Deep Learning for Material recognition: most recent advances and open challenges (12/14/2020)
