Distributed Streaming Analytics on Large-scale Oceanographic Data using Apache Spark

07/31/2019
by Janak Dahal, et al.

Real-world data from diverse domains require real-time, scalable analysis. Large-scale data processing frameworks such as Hadoop fall short when results are needed on the fly. Apache Spark's streaming library is becoming a popular choice because it can stream and analyze a significant amount of data. In this paper, we analyze large-scale geo-temporal data collected from the USGODAE (United States Global Ocean Data Assimilation Experiment) data catalog to showcase and assess Spark's stream-processing capabilities. We measure streaming latency and monitor scalability by adding and removing nodes in the middle of a streaming job. We also verify fault tolerance by stopping nodes in the middle of a job and confirming that the job is rescheduled and completed on other nodes. We design a full-stack application that automates data collection, data processing, and visualization of the results. We use the Google Maps API to visualize results by color-coding the world map with values from various analytics.
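
As a rough illustration of the kind of measurement described above, the following plain-Python sketch splits a stream of geo-temporal records into micro-batches, averages the values per grid cell, and times each batch. It does not use Spark and is not the authors' code. The record layout `(latitude, longitude, value)`, the 10-degree grid, and the function names are all assumptions made for this example. Spark Streaming forms batches by time interval; batching by record count here is a simplification.

```python
import time
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (latitude, longitude, value).
Record = Tuple[float, float, float]
Cell = Tuple[int, int]


def micro_batches(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Split an incoming record stream into fixed-size micro-batches."""
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch


def mean_per_cell(batch: List[Record], cell_deg: float = 10.0) -> Dict[Cell, float]:
    """Average the values that fall into each cell_deg x cell_deg grid cell."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for lat, lon, value in batch:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def process_stream(
    records: Iterable[Record], batch_size: int
) -> List[Tuple[Dict[Cell, float], float]]:
    """Process each micro-batch and record its processing time in seconds."""
    results = []
    for batch in micro_batches(records, batch_size):
        start = time.perf_counter()
        means = mean_per_cell(batch)
        latency = time.perf_counter() - start
        results.append((means, latency))
    return results
```

The per-cell means are the sort of values that could then be mapped to colors on a world map. The per-batch timings only capture local processing time, not the end-to-end latency of a distributed job.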