A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

by Sanaa Hamid Mohamed, et al.

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely used open-source platform, Hadoop, are enabling the development of a large number of cloud-based services and big data applications. MapReduce and Hadoop thus introduce innovative, efficient, and accelerated approaches to intensive computation and analytics. These services usually utilize commodity clusters within geographically distributed data centers and provide cost-effective and elastic solutions. However, the increasing traffic between and within the data centers that migrate, store, and process big data is becoming a bottleneck that calls for enhanced infrastructures capable of reducing congestion and power consumption. Moreover, enterprises with multiple tenants requesting various big data services are challenged by the need to optimize the leasing of their resources at reduced running costs and power consumption while avoiding under- or over-utilization. In this survey, we present a summary of the characteristics of various big data programming models and applications and provide a review of cloud computing infrastructures and related technologies, such as virtualization and software-defined networking, that increasingly support big data systems. We also provide a brief review of data center topologies, routing protocols, and traffic characteristics, and emphasize the implications of big data on such cloud data centers and their supporting networks. Wide-ranging efforts have been devoted to optimizing systems that handle big data in terms of various application performance metrics and/or infrastructure energy efficiency. Finally, some insights and future research directions are provided.
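As a rough illustration of the MapReduce programming model mentioned above, the following minimal single-process sketch runs its three conceptual phases (map, shuffle, reduce) on a word-count job. It is not Hadoop's API: the functions `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names chosen for this example. On a real cluster, the shuffle phase moves intermediate key-value pairs over the network from mapper nodes to reducer nodes, which is one source of the data center traffic described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key.

    On a real cluster this is where intermediate data crosses the network
    from mapper nodes to reducer nodes; here it is only an in-memory dict.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Hypothetical reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    docs = ["big data in the cloud", "cloud data centers", "big data networks"]
    print(word_count(docs))
    # {'big': 2, 'data': 3, 'in': 1, 'the': 1, 'cloud': 2, 'centers': 1, 'networks': 1}
```

A production MapReduce job would additionally partition keys across many reducers (Hadoop's default partitioner hashes the key) and spill intermediate data to disk; both are omitted here to keep the phases visible.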



