Parallel-and-stream accelerator for computationally fast supervised learning

10/29/2021
by   Emily C. Hector, et al.

Two dominant distributed computing strategies have emerged to overcome the computational bottleneck of supervised learning with big data: parallel data processing in the MapReduce paradigm and serial data processing in the online streaming paradigm. Despite the two strategies' common divide-and-combine approach, they differ in how they aggregate information, leading to different trade-offs between statistical and computational performance. In this paper, we propose a new hybrid paradigm, termed a Parallel-and-Stream Accelerator (PASA), that uses the strengths of both strategies for computationally fast and statistically efficient supervised learning. PASA's architecture nests online streaming processing into each distributed and parallelized data process in a MapReduce framework. PASA leverages the advantages and mitigates the disadvantages of both the MapReduce and online streaming approaches to deliver a more flexible paradigm satisfying practical computing needs. We study the analytic properties and computational complexity of PASA, and detail its implementation for two key statistical learning tasks. We illustrate its performance through simulations and a large-scale data example building a prediction model for online purchases from advertising data.
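The nesting described above, with an online streaming pass inside each parallel worker followed by a combine step, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy simple linear regression whose sufficient statistics can be summed exactly, and the names `Stats`, `stream_worker`, and `pasa_style_fit` are hypothetical. The estimators studied in the paper may be combined differently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Stats:
    """Sufficient statistics for the toy model y = a + b*x (an assumption)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, batch):
        # Online step: fold one batch of (x, y) pairs into the running totals,
        # so the batch itself never needs to be kept in memory afterwards.
        for x, y in batch:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        # Combine step: sufficient statistics from two workers simply add.
        return Stats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def solve(self):
        # Closed-form least-squares intercept and slope from the totals.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def stream_worker(shard, batch_size):
    # One parallel worker: process its shard serially, one batch at a time.
    stats = Stats()
    for start in range(0, len(shard), batch_size):
        stats.update(shard[start:start + batch_size])
    return stats


def pasa_style_fit(data, n_workers=4, batch_size=50):
    # Map: split the data into shards and stream through each one in parallel.
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda s: stream_worker(s, batch_size), shards))
    # Reduce: merge the per-worker summaries and solve once.
    total = Stats()
    for partial in partials:
        total = total.merge(partial)
    return total.solve()


# Noise-free toy data generated from y = 2 + 3x.
data = [(float(x), 2.0 + 3.0 * x) for x in range(1000)]
intercept, slope = pasa_style_fit(data)
```

In this sketch the number of workers controls the parallel (MapReduce) dimension and the batch size controls the streaming dimension; threads are used only to keep the example self-contained, and a real deployment would distribute the shards across processes or machines.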


