Run Time Prediction for Big Data Iterative ML Algorithms: a KMeans case study

10/09/2017
by Eduardo Rodrigues, et al.

Data science and machine learning algorithms running on big data infrastructure are increasingly important in activities ranging from business intelligence and analytics to cybersecurity, smart city management, and many fields of science and engineering. As these algorithms become further integrated into daily operations, understanding how long they take to run on big data infrastructure is essential to controlling costs and delivery times. In this paper we discuss the issues involved in understanding the run time of iterative machine learning algorithms and provide a case study of one such algorithm, including a statistical characterization and a model, built with the Edward probabilistic programming language, of the run time of an implementation of K-Means for the Spark big data engine.


