A graphical heuristic for reduction and partitioning of large datasets for scalable supervised training

07/24/2019
by Sumedh Yadav, et al.

A scalable graphical method is presented for selecting and partitioning datasets for the training phase of a classification task. The heuristic requires a clustering algorithm to keep its computation cost in reasonable proportion to that of the task itself. This step is followed by construction of an information graph of the underlying classification patterns using approximate nearest neighbor methods. The presented method consists of two approaches, one for reducing a given training set and another for partitioning the selected/reduced set. The heuristic targets large datasets, since the primary goal is a significant reduction in training run-time without compromising prediction accuracy. Test results show that both approaches significantly speed up the training task when compared against the state-of-the-art shrinking heuristic available in LIBSVM. Furthermore, their prediction accuracy closely follows or even exceeds that of the shrinking heuristic. A network design is also presented for the partitioning-based distributed training formulation. An additional speed-up in training run-time is observed relative to the serial implementation of the approaches.
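The abstract does not spell out the selection or partitioning rules, so the following is only a minimal sketch of the general idea under stated assumptions. It builds a nearest-neighbor graph, keeps the points that have a neighbor of a different class, and splits the kept points into chunks. The function names (`knn_graph`, `boundary_subset`, `partition`), the boundary-point rule, the round-robin split, and the exact brute-force neighbor search (standing in for an approximate method) are all illustrative assumptions, not the authors' algorithm.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_graph(points: Sequence[Point], k: int) -> List[List[int]]:
    """Exact brute-force k-nearest-neighbor graph.

    Assumption: stands in for the approximate nearest neighbor
    methods mentioned in the abstract. Ties are broken by index.
    """
    graph = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph.append([j for _, j in dists[:k]])
    return graph


def boundary_subset(
    points: Sequence[Point], labels: Sequence[int], k: int = 2
) -> List[int]:
    """Indices of points with at least one neighbor of another class.

    Assumption: a hypothetical reduction rule chosen for illustration.
    """
    graph = knn_graph(points, k)
    return [
        i
        for i, nbrs in enumerate(graph)
        if any(labels[j] != labels[i] for j in nbrs)
    ]


def partition(indices: Sequence[int], n_parts: int) -> List[List[int]]:
    """Round-robin split of the selected indices into n_parts chunks.

    Assumption: a hypothetical partitioning rule chosen for illustration.
    """
    return [list(indices[p::n_parts]) for p in range(n_parts)]


# Toy data: two classes laid out along a line.
points = [(float(x), 0.0) for x in range(8)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected = boundary_subset(points, labels, k=2)
chunks = partition(selected, n_parts=2)
```

On this toy data only the two points adjacent to the class boundary are kept, which illustrates how a graph-based rule can shrink a training set before a classifier such as an SVM is trained on each chunk.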


research
04/11/2023

Partitioner Selection with EASE to Optimize Distributed Graph Processing

For distributed graph processing on massive graphs, a graph is partition...
research
03/23/2022

Out-of-Core Edge Partitioning at Linear Run-Time

Graph edge partitioning is an important preprocessing step to optimize d...
research
12/22/2017

ADWISE: Adaptive Window-based Streaming Edge Partitioning for High-Speed Graph Processing

In recent years, the graph partitioning problem gained importance as a m...
research
02/04/2023

Reducing Nearest Neighbor Training Sets Optimally and Exactly

In nearest-neighbor classification, a training set P of points in ℝ^d wi...
research
02/16/2018

Recognizing Cuneiform Signs Using Graph Based Methods

The cuneiform script constitutes one of the earliest systems of writing ...
research
05/10/2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs

We present GSPMD, an automatic, compiler-based parallelization system fo...
research
09/06/2022

Merged-GHCIDR: Geometrical Approach to Reduce Image Data

The computational resources required to train a model have been increasi...
