Merged-GHCIDR: Geometrical Approach to Reduce Image Data

09/06/2022
by Devvrat Joshi, et al.

The computational resources required to train a model have been increasing since the inception of deep networks. Training neural networks on massive datasets has become a challenging and time-consuming task, so there is a need to reduce dataset size without compromising accuracy. In this paper, we present novel variations of an earlier approach, reduction through homogeneous clustering, for reducing dataset size. The proposed methods partition the dataset into homogeneous clusters and select the images that contribute significantly to accuracy. We propose two variations on the baseline algorithm, Reduction through Homogeneous Clustering (RHC): Geometrical Homogeneous Clustering for Image Data Reduction (GHCIDR) and Merged-GHCIDR, both aimed at better accuracy and training time. The intuition behind GHCIDR is to select data points by cluster weights and the geometrical distribution of the training set. Merged-GHCIDR merges clusters that have the same label using complete linkage clustering. We used three deep learning models: Fully Connected Networks (FCN), VGG1, and VGG16. We evaluated the two variants on four datasets: MNIST, CIFAR10, Fashion-MNIST, and Tiny-Imagenet. Merged-GHCIDR with the same percentage reduction as RHC showed an increase of 2.8 MNIST, Fashion-MNIST, CIFAR10, and Tiny-Imagenet, respectively.
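The partition-and-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`homogeneous_clusters`, `select_representatives`) are hypothetical, and the splitting rule (assign each point to the nearest class mean, then recurse on mixed clusters) is an assumption about the general shape of homogeneous clustering. Keeping the member closest to the cluster mean is only a stand-in for GHCIDR's selection by cluster weights and geometry, and the complete-linkage merging step of Merged-GHCIDR is omitted.

```python
from collections import defaultdict, deque
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def homogeneous_clusters(X, y):
    """Split the dataset into clusters whose members all share one label.

    Returns a list of index lists. A mixed cluster is split by assigning
    each of its points to the nearest class mean (assumed splitting rule).
    """
    queue = deque([list(range(len(X)))])
    done = []
    while queue:
        idx = queue.popleft()
        by_label = defaultdict(list)
        for i in idx:
            by_label[y[i]].append(i)
        if len(by_label) == 1:
            done.append(idx)  # already homogeneous
            continue
        # One seed per class present in this cluster.
        seeds = [mean([X[i] for i in members]) for members in by_label.values()]
        parts = defaultdict(list)
        for i in idx:
            nearest = min(range(len(seeds)), key=lambda s: dist(X[i], seeds[s]))
            parts[nearest].append(i)
        if len(parts) == 1:
            # Degenerate case (e.g. coincident class means): split by label
            # so that the loop is guaranteed to terminate.
            queue.extend(by_label.values())
        else:
            queue.extend(parts.values())
    return done


def select_representatives(X, clusters):
    """Keep, per cluster, the index of the member closest to the cluster mean.

    This selection rule is an illustrative placeholder, not GHCIDR's.
    """
    keep = []
    for idx in clusters:
        centre = mean([X[i] for i in idx])
        keep.append(min(idx, key=lambda i: dist(X[i], centre)))
    return sorted(keep)
```

Calling `homogeneous_clusters(X, y)` on a list of feature vectors `X` and labels `y`, then passing the result to `select_representatives`, yields the indices of a reduced training set with one retained sample per homogeneous cluster.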


