Distributed Log Analysis on the Cloud Using MapReduce

02/10/2018
by Galip Aydın, et al.

In this paper we describe our work on designing a web-based, distributed data analysis system built on the popular MapReduce framework, deployed on a small cloud, and developed specifically for analyzing web server logs. The log analysis system consists of several cluster nodes; it splits large log files across a distributed file system and processes them quickly using the MapReduce programming model. The cluster is created with an open source cloud infrastructure, which lets us expand its computational power simply by adding new nodes. This gives us the ability to resize the cluster automatically according to the data analysis requirements. We implemented MapReduce programs for basic log analysis needs, such as frequency analysis, error detection, and busy-hour detection, as well as for more complex analyses that require running several jobs. The system can automatically identify and analyze several web server log types, such as Apache, IIS, and Squid. We use open source projects both to create the cloud infrastructure and to run the MapReduce jobs.
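The paper's own programs are not reproduced here. As a minimal, single-process sketch of the ideas the abstract lists (log-type identification, frequency analysis, error detection, busy-hour detection, and an analysis that chains two jobs), the Python below simulates the map, shuffle, and reduce phases in memory. Everything in it is an assumption for illustration: the regular expressions, the normalized record fields, and the function names `parse_line`, `run_job`, and `analyze` are hypothetical. Only the Apache Common Log Format and Squid's native access.log format are recognized (IIS is omitted), and a real deployment would run equivalent mappers and reducers on a distributed framework such as Hadoop, which is itself an assumption since the abstract only says open source projects are used.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional

# Hypothetical patterns for two log formats (a fuller system would also cover IIS W3C logs).
# Apache Common Log Format: host ident user [time] "METHOD URL PROTOCOL" status bytes
APACHE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) \S+')
# Squid native format: time elapsed client action/status bytes method URL ...
SQUID_RE = re.compile(r'^(\d+\.\d+)\s+\d+\s+(\S+)\s+\S+/(\d{3})\s+\d+\s+\S+\s+(\S+)')


def parse_line(line: str) -> Optional[dict]:
    """Identify the log type of one line and normalize it to a common record."""
    m = APACHE_RE.match(line)
    if m:
        client, ts, url, status = m.groups()
        when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")  # hour in the log's own offset
        return {"format": "apache", "client": client, "url": url,
                "status": int(status), "hour": when.hour}
    m = SQUID_RE.match(line)
    if m:
        ts, client, status, url = m.groups()
        when = datetime.fromtimestamp(float(ts), tz=timezone.utc)  # Squid logs Unix time
        return {"format": "squid", "client": client, "url": url,
                "status": int(status), "hour": when.hour}
    return None  # unrecognized line; skipped by the jobs below


def run_job(records: Iterable, mapper: Callable, reducer: Callable) -> dict:
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase emits (key, value) pairs
            groups[key].append(value)       # shuffle: collect all values for each key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def count(_key, values):
    return sum(values)

def url_mapper(rec):
    yield rec["url"], 1  # frequency analysis: requests per URL

def error_mapper(rec):
    if rec["status"] >= 400:  # error detection: 4xx client and 5xx server errors
        yield rec["status"], 1

def hour_mapper(rec):
    yield rec["hour"], 1  # requests per hour of day

def busiest_mapper(item):
    hour, hits = item
    yield "busiest", (hits, hour)  # one key sends every hour to a single reducer

def busiest_reducer(_key, values):
    hits, hour = max(values)  # highest count wins; ties go to the later hour
    return hour, hits


def analyze(lines: Iterable[str]) -> dict:
    records = [r for r in map(parse_line, lines) if r is not None]
    per_hour = run_job(records, hour_mapper, count)
    # Busy-hour detection as two chained jobs: job 2 consumes job 1's output.
    busiest = run_job(per_hour.items(), busiest_mapper, busiest_reducer).get("busiest")
    return {
        "url_frequency": run_job(records, url_mapper, count),
        "errors": run_job(records, error_mapper, count),
        "busiest_hour": busiest,
    }


sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:58:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '10.0.0.5 - - [10/Oct/2000:14:02:11 -0700] "GET /apache_pb.gif HTTP/1.0" 500 0',
    '1286536309.585    921 192.168.0.68 TCP_MISS/200 507 POST http://rcv.example.com/ - DIRECT/1.2.3.4 application/xml',
    'not a log line',
]
result = analyze(sample)
print(result["busiest_hour"])  # (hour, request count)
```

Funnelling every per-hour count to the single key `"busiest"` mirrors how a second MapReduce job can compute a global maximum over the first job's output; on a real cluster that final step runs on one reducer, which is acceptable here because it sees at most 24 values.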
