Big Data Dwarfs: Towards Fully Understanding Big Data Analytics Workloads

02/01/2018
by Wanling Gao, et al.

Though big data benchmark suites such as BigDataBench and CloudSuite have been used in architecture and system research, a fundamental question remains unanswered: what are the abstractions of frequently appearing units of computation in big data analytics? We call these abstractions big data dwarfs. For the first time, we identify eight big data dwarfs, each of which captures the common requirements of a class of units of computation while being reasonably divorced from individual implementations across a wide variety of big data analytics workloads. We implement the eight dwarfs on different software stacks as dwarf components. We then apply the big data dwarfs to construct big data proxy benchmarks, using directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks shorten execution time by hundreds of times on real systems while remaining qualified for both earlier architecture design and later system evaluation across different architectures.
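The idea of a weighted, DAG-like combination of dwarf components can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the component names (`sample`, `sort`, `statistic`), the `run_proxy` helper, and the reading of a weight as a repetition count are hypothetical and are not taken from the paper or from BigDataBench.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dwarf components: tiny stand-ins, not the paper's implementations.
def sample_component(data):
    return data[::2]  # keep every second element

def sort_component(data):
    return sorted(data)

def statistic_component(data):
    return [sum(data) / len(data)] if data else []

COMPONENTS = {
    "sample": sample_component,
    "sort": sort_component,
    "statistic": statistic_component,
}

def run_proxy(dag, weights, data):
    """Run components in topological order.

    dag maps each node to the list of its predecessors.
    weights maps each node to a repetition count (an assumed meaning of "weight").
    """
    outputs = {}
    order = list(TopologicalSorter(dag).static_order())
    for node in order:
        preds = dag.get(node, ())
        # A source node reads the raw input; any other node reads the
        # concatenated outputs of its predecessors.
        node_input = [x for p in preds for x in outputs[p]] if preds else list(data)
        result = node_input
        for _ in range(weights.get(node, 1)):
            result = COMPONENTS[node](node_input)
        outputs[node] = result
    return order, outputs

# Usage: a three-stage chain in which sorting carries the largest weight.
dag = {"sample": [], "sort": ["sample"], "statistic": ["sort"]}
weights = {"sample": 1, "sort": 3, "statistic": 2}
order, outputs = run_proxy(dag, weights, [5, 3, 8, 1, 9, 2])
```

In this sketch, adjusting the weights changes how much of the total run time each component contributes, which is one plausible way a proxy could be tuned to resemble a target workload.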

research
11/09/2017

A Dwarf-based Scalable Big Data Benchmarking Methodology

Different from the traditional benchmarking methodology that creates a n...
research
10/18/2018

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads

For the architecture community, reasonable simulation time is a strong r...
research
11/06/2018

Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

The design and construction of high performance computing (HPC) systems ...
research
12/14/2022

Towards Interactive, Adaptive and Result-aware Big Data Analytics

As data volumes grow across applications, analytics of large amounts of ...
research
11/16/2020

ACCORDANT: A Domain Specific Model and DevOps Approach for Big Data Analytics Architectures

Big data analytics (BDA) applications use machine learning algorithms to...
research
09/04/2023

Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications

The Function-as-a-service (FaaS) computing model has recently seen signi...
research
02/26/2018

Digital Archives as Big Data

Digital archives contribute to Big data. Combining social network analys...