Support vector regression model for BigData systems

12/05/2016
by   Alessandro Maria Rizzi, et al.

Big Data is becoming increasingly important, and many sectors of the economy are now guided by data-driven decision processes. Big Data and business intelligence applications are facilitated by the MapReduce programming model, while at the infrastructure layer cloud computing provides flexible and cost-effective solutions for allocating large clusters on demand. In such systems, capacity allocation, that is, the ability to determine the minimal resources needed to achieve a certain level of performance, is a key challenge in enhancing the performance of MapReduce jobs and minimizing cloud resource costs. To do so, one of the biggest challenges is to build an accurate performance model that estimates the job execution time of MapReduce systems. Previous works applied simulation-based models to such systems. Although this approach can accurately describe the behavior of Big Data clusters, it is too computationally expensive and does not scale to large systems. We try to overcome these issues by applying machine learning techniques. More precisely, we focus on Support Vector Regression (SVR), which is intrinsically more robust than other techniques, such as neural networks, and less sensitive to outliers in the training set. To better investigate these benefits, we compare SVR to linear regression.
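One way to see why SVR can be less sensitive to outliers than least-squares linear regression is to compare the loss each one assigns to a prediction error. The sketch below is only an illustration and is not taken from the paper: it assumes the standard epsilon-insensitive loss commonly used in SVR, and the residual values and the `epsilon` setting are hypothetical.

```python
def squared_loss(residual: float) -> float:
    """Loss minimized by ordinary least-squares linear regression."""
    return residual ** 2


def epsilon_insensitive_loss(residual: float, epsilon: float = 0.1) -> float:
    """Standard SVR loss: errors inside the epsilon tube cost nothing,
    larger errors are penalized linearly rather than quadratically."""
    return max(0.0, abs(residual) - epsilon)


# Hypothetical residuals (predicted minus measured job execution time,
# arbitrary units); the last value plays the role of an outlier.
residuals = [0.05, -0.2, 0.3, 10.0]

for r in residuals:
    print(f"residual={r:6.2f}  "
          f"squared={squared_loss(r):8.4f}  "
          f"eps_insensitive={epsilon_insensitive_loss(r):7.4f}")
```

Under these assumptions the outlier's penalty grows linearly with its error for the epsilon-insensitive loss and quadratically for the squared loss, so a single extreme measurement pulls a least-squares fit more strongly.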

research
08/27/2021

Machine Learning for Performance Prediction of Spark Cloud Applications

Big data applications and analytics are employed in many sectors for a v...
research
10/01/2019

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

This survey article reviews the challenges associated with deploying and...
research
12/03/2018

Resource Management and Scheduling for Big Data Applications in Cloud Computing Environments

This chapter presents software architectures of the big data processing ...
research
08/13/2018

Allocation of Graph Jobs in Geo-Distributed Cloud Networks

Recently, processing of big-data has drawn tremendous attention, where c...
research
10/17/2019

ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance

Configuration space complexity makes the big-data software systems hard ...
research
03/03/2019

Multiple Learning for Regression in big data

Regression problems that have closed-form solutions are well understood ...
research
09/22/2019

Cutting the Unnecessary Long Tail: Cost-Effective Big Data Clustering in the Cloud

Clustering big data often requires tremendous computational resources wh...
