Processing Analytical Workloads Incrementally

09/16/2015
by Priyank Gupta, et al.

Analyzing large data collections with popular machine learning and statistical algorithms is a topic of increasing research interest. A typical analysis workload applies an algorithm to build a model on a data collection and then refines the model based on the results. In this paper we introduce model materialization and incremental model reuse as first-class citizens in the execution of analysis workloads. Instead of discarding built models, we materialize them so that they can be reused in subsequent computations. We also consider manipulating an existing model (adding data to it or deleting data from it) in order to build a new one. We discuss our approach in the context of popular machine learning models. We specify how to maintain models incrementally and outline the optimizations needed to make the best use of models and their incremental adjustments when building new ones. We detail our techniques for linear regression, Naive Bayes and logistic regression, and present the algorithms and optimizations for handling these models in our framework. We present the results of a detailed performance evaluation using real and synthetic data sets. Our experiments analyze the trade-offs inherent in our approach and demonstrate substantial performance benefits.
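To make the idea of adding or deleting data from a materialized model concrete, here is a minimal sketch for single-feature linear regression. It is not the paper's algorithm. It assumes the model is stored as running sums (sufficient statistics) and that a deleted row was added earlier. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IncrementalSimpleRegression:
    """Hypothetical helper: least-squares fit of y = slope * x + intercept.

    The model is materialized as five running sums, so rows can be
    added or deleted without rescanning the data already absorbed.
    """
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def add(self, rows):
        # Fold new (x, y) rows into the stored sums.
        for x, y in rows:
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def remove(self, rows):
        # Assumption: every removed row was previously added.
        for x, y in rows:
            self.n -= 1
            self.sum_x -= x
            self.sum_y -= y
            self.sum_xx -= x * x
            self.sum_xy -= x * y

    def coefficients(self):
        # Closed-form least-squares solution from the stored sums.
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


# Build a model once, then derive a new one by deleting an outlier row.
model = IncrementalSimpleRegression()
model.add([(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 20.0)])
model.remove([(4.0, 20.0)])
slope, intercept = model.coefficients()
```

After the deletion, the remaining rows lie on the line y = 2x + 1, so the derived model matches one rebuilt from scratch on those three rows. The update cost depends only on the number of rows added or deleted, not on the size of the original collection. With floating-point data, repeated additions and deletions can accumulate rounding error, which a real system would need to account for.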
