A New Framework for Expressing, Parallelizing and Optimizing Big Data Applications

03/02/2022
by A. Hommelberg, et al.

The Forelem framework was first introduced as a means to optimize database queries using optimization techniques developed for compilers. Since its introduction, Forelem has proven to be more versatile and applicable beyond database applications. In this paper we show that the original Forelem framework can be used to express and optimize Big Data applications, specifically k-Means clustering and PageRank, resulting in automatically generated implementations of these applications. These implementations are more efficient than state-of-the-art, hand-written MPI C/C++ implementations of k-Means and PageRank, and they significantly outperform state-of-the-art Hadoop implementations.

research
06/03/2019

Big-Data Clustering: K-Means or K-Indicators?

The K-means algorithm is arguably the most popular data clustering metho...
research
10/17/2016

High-performance K-means Implementation based on a Simplified Map-Reduce Architecture

The k-means algorithm is one of the most common clustering algorithms an...
research
02/03/2021

Optimization meets Big Data: A survey

This paper reviews recent advances in big data optimization, providing t...
research
04/14/2022

Big-means: Less is More for K-means Clustering

K-means clustering plays a vital role in data mining. However, its perfo...
research
04/30/2019

FastContext: an efficient and scalable implementation of the ConText algorithm

Objective: To develop and evaluate FastContext, an efficient, scalable i...
research
01/22/2018

Smoke: Fine-grained Lineage at Interactive Speed

Data lineage describes the relationship between individual input and out...
research
01/18/2016

Pulse processing routines for neutron time-of-flight data

A pulse shape analysis framework is described, which was developed for n...
