HPTMT Parallel Operators for High Performance Data Science & Data Engineering

08/13/2021
by Vibhatha Abeykoon, et al.

Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit different application domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture, which we proposed recently, identifies a set of data structures, operators, and an execution model for creating rich data applications, efficiently linking all aspects of data engineering and data science together. This paper elaborates on and illustrates this architecture using an end-to-end application in which deep learning and data engineering parts work together.
