A Fast, Scalable, Universal Approach For Distributed Data Aggregations

by Niranda Perera, et al.
University of Moratuwa
Indiana University

In the current era of Big Data, data engineering has become an essential field of study across many branches of science. Advancements in Artificial Intelligence (AI) have broadened the scope of data engineering and opened up new applications in both enterprise and research communities. Aggregations (also termed reduce in functional programming) are an integral part of these applications. Traditionally aimed at generating meaningful information from large datasets, they are now also used to engineer more effective features for complex AI models. Aggregations are usually carried out on top of data abstractions such as tables and arrays, and are combined with other operations such as grouping of values. There are frameworks that excel in each of these domains individually. However, we believe there is an essential need for a data analytics tool that can universally integrate with existing frameworks and thereby increase the productivity and efficiency of the entire data analytics pipeline. Cylon endeavors to fill this void. In this paper, we present Cylon's fast and scalable aggregation operations, implemented on top of a distributed in-memory table structure that universally integrates with existing frameworks.
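
To make the idea of a grouped aggregation over a partitioned table concrete, the following is a minimal sketch in plain Python. It is not Cylon's API or the authors' implementation: the function names (`local_groupby_sum`, `merge_partials`, `distributed_groupby_sum`) and the list-of-tuples table layout are hypothetical, chosen only for illustration. It assumes a common two-phase pattern, in which each partition first computes a partial result per key and the partial results are then combined with a reduce.

```python
from collections import defaultdict
from functools import reduce


def local_groupby_sum(rows):
    """Phase 1: sum the values per key within a single partition."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def merge_partials(left, right):
    """Phase 2: combine two partial results by adding sums of matching keys."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def distributed_groupby_sum(partitions):
    """Aggregate each partition locally, then reduce the partial results."""
    partials = [local_groupby_sum(rows) for rows in partitions]
    return reduce(merge_partials, partials, {})


# Hypothetical data: each inner list stands in for the rows held by one worker.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5)],
]
result = distributed_groupby_sum(partitions)
```

This pattern works for sums because addition is associative and commutative, so partial results can be merged in any order. Aggregations such as the mean would need to carry extra state (for example, a sum and a count) through both phases.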




