A Fast, Scalable, Universal Approach For Distributed Data Aggregations

10/27/2020
by Niranda Perera, et al.

In the current era of Big Data, data engineering has become an essential field of study across many branches of science. Advances in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (termed reduce in functional programming) are an integral part of these applications. They have traditionally been used to derive meaningful information from large datasets, and today they are also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables or arrays and are combined with other operations such as grouping of values. Several frameworks excel in these domains individually, but we believe there is a clear need for a data analytics tool that integrates universally with existing frameworks and thereby increases the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fill this void. In this paper, we present Cylon's fast and scalable aggregation operations, implemented on top of a distributed in-memory table structure that integrates universally with existing frameworks.
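Aggregation combined with grouping over a partitioned table is commonly done in two phases: each worker first reduces its own rows per key, the partial results are hash-partitioned by key, and the worker that owns a key merges its partials. The sketch below is a single-process Python simulation of that general pattern. It is an illustrative assumption, not Cylon's API and not necessarily the algorithm the paper describes; all function names (`local_aggregate`, `shuffle`, `merge_and_finalize`, `distributed_groupby_mean`) are hypothetical.

```python
from collections import defaultdict


def local_aggregate(partition):
    """Phase 1 (hypothetical): reduce one worker's (key, value) rows to a (sum, count) partial per key."""
    partials = defaultdict(lambda: [0.0, 0])
    for key, value in partition:
        acc = partials[key]
        acc[0] += value
        acc[1] += 1
    return {k: (s, c) for k, (s, c) in partials.items()}


def shuffle(partials_per_worker, n_workers):
    """Phase 2 (hypothetical): route each key's partial to the worker chosen by hashing the key."""
    buckets = [[] for _ in range(n_workers)]
    for partials in partials_per_worker:
        for key, partial in partials.items():
            buckets[hash(key) % n_workers].append((key, partial))
    return buckets


def merge_and_finalize(bucket):
    """Phase 3 (hypothetical): merge (sum, count) partials for each key and compute the final mean."""
    merged = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in bucket:
        merged[key][0] += s
        merged[key][1] += c
    return {k: {"sum": s, "count": c, "mean": s / c} for k, (s, c) in merged.items()}


def distributed_groupby_mean(partitions):
    """Simulate a group-by mean where each inner list stands in for one worker's partition."""
    n_workers = len(partitions)
    partials = [local_aggregate(p) for p in partitions]
    buckets = shuffle(partials, n_workers)
    result = {}
    for bucket in buckets:
        # Hash partitioning sends every key to exactly one bucket, so keys never collide here.
        result.update(merge_and_finalize(bucket))
    return result


if __name__ == "__main__":
    partitions = [
        [("a", 1.0), ("b", 2.0), ("a", 3.0)],
        [("b", 6.0), ("c", 5.0)],
        [("a", 5.0), ("c", 2.0)],
    ]
    for key, agg in sorted(distributed_groupby_mean(partitions).items()):
        print(key, agg)
```

Carrying (sum, count) through the shuffle, rather than a per-partition mean, is what makes the mean decomposable: averaging partial means gives the wrong answer when partitions hold different numbers of rows for a key. Pre-aggregating locally also limits the data exchanged to at most one partial per key per worker.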
