Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

12/12/2015
by   Jinyang Gao, et al.

Recent years have witnessed amazing outcomes from "Big Models" trained by "Big Data". Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more "learning value" to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than on evident cases, noisy data, or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches for optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by the stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler speeds up the training of SVM, feature selection, and deep learning models by 1.6-2.2x at comparable training quality.
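To make the idea of non-uniform data access concrete, here is a minimal sketch of score-weighted mini-batch sampling for SGD on a hinge loss. It is not the paper's algorithm or its vectorized implementation: the abstract does not say how "learning value" is measured, so this sketch assumes the most recently observed gradient norm as a proxy, and it makes no attempt to down-weight noisy data or outliers. The names `active_sample`, `hinge_grad`, `sgd_step`, and the `floor` parameter are all hypothetical.

```python
import random


def active_sample(scores, batch_size, rng, floor=1e-3):
    """Draw indices with probability proportional to a per-instance score.

    Returns (indices, weights), where each weight is 1 / (n * p_i). Scaling a
    sampled gradient by this weight keeps the mini-batch gradient an unbiased
    estimate of the uniform full-data average.
    """
    n = len(scores)
    # Assumed smoothing: the floor keeps every instance reachable.
    raw = [max(s, 0.0) + floor for s in scores]
    total = sum(raw)
    probs = [r / total for r in raw]
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


def hinge_grad(w, x, y):
    """Subgradient of max(0, 1 - y * <w, x>) with respect to w."""
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    if margin >= 1.0:
        return [0.0] * len(w)
    return [-y * xj for xj in x]


def sgd_step(w, data, scores, batch_size, lr, rng):
    """One mini-batch SGD step using the weighted sampler above."""
    indices, weights = active_sample(scores, batch_size, rng)
    grad = [0.0] * len(w)
    for i, iw in zip(indices, weights):
        x, y = data[i]
        g = hinge_grad(w, x, y)
        # Assumed proxy for "learning value": the gradient norm just observed.
        scores[i] = sum(gj * gj for gj in g) ** 0.5
        for j, gj in enumerate(g):
            grad[j] += iw * gj / batch_size
    return [wj - lr * gj for wj, gj in zip(w, grad)]
```

Under this proxy, instances that already satisfy the margin produce a zero gradient, so their scores fall to the floor and they are drawn rarely, while instances near or across the decision boundary keep being revisited. The `1 / (n * p_i)` weights are the standard importance-sampling correction; without them, the skewed sampling would change the objective being minimized.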

research
04/11/2022

The Principle of Least Sensing: A Privacy-Friendly Sensing Paradigm for Urban Big Data Analytics

With the worldwide emergence of data protection regulations, how to cond...
research
02/24/2018

Stochastic Gradient Descent on Highly-Parallel Architectures

There is an increased interest in building data analytics frameworks wit...
research
07/18/2021

A stepped sampling method for video detection using LSTM

Artificial neural networks that simulate human achieves great successes....
research
04/01/2018

SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized Data

State-of-the-art techniques of artificial intelligence, in particular de...
research
01/02/2019

Approximate Computation for Big Data Analytics

Over the past a few years, research and development has made significant...
research
04/02/2020

High Bandwidth Memory on FPGAs: A Data Analytics Perspective

FPGA-based data processing in datacenters is increasing in popularity du...
