
Big Data Dwarfs: Towards Fully Understanding Big Data Analytics Workloads

by Wanling Gao, et al.

Although big data benchmark suites such as BigDataBench and CloudSuite have been used in architecture and systems research, a fundamental question remains unanswered: what are the abstractions of frequently appearing units of computation in big data analytics? We call these abstractions big data dwarfs. For the first time, we identify eight big data dwarfs, each of which captures the common requirements of a class of units of computation while being reasonably divorced from individual implementations across a wide variety of big data analytics workloads. We implement the eight dwarfs on different software stacks as dwarf components. We then apply the big data dwarfs to construct big data proxy benchmarks, using directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks shorten execution time by hundreds of times on real systems while remaining qualified for both early architecture design and later system evaluation across different architectures.
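
The idea of combining weighted dwarf components in a DAG can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `run_proxy` helper, the two stand-in components (`sort` and `stat`), and the reading of a weight as the fraction of the input a component processes are hypothetical and are not taken from the paper's implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_proxy(nodes, edges, source):
    """Run hypothetical dwarf components in dependency order.

    nodes:  name -> (function, weight)
    edges:  name -> set of predecessor names
    source: input data for components that have no predecessors
    """
    graph = {name: edges.get(name, set()) for name in nodes}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, weight = nodes[name]
        preds = sorted(edges.get(name, set()))
        # Root components read the source; others read their predecessors' outputs.
        data = [x for p in preds for x in outputs[p]] if preds else list(source)
        # Assumption: the weight is the fraction of the input the component processes.
        share = data[: max(1, round(len(data) * weight))]
        outputs[name] = func(share)
    return outputs


# Two stand-in components chained as sort -> stat (hypothetical names).
nodes = {
    "sort": (sorted, 0.5),
    "stat": (lambda xs: [sum(xs)], 1.0),
}
edges = {"stat": {"sort"}}

result = run_proxy(nodes, edges, [5, 3, 8, 1])
print(result)
```

In this sketch, changing a weight changes how much work one component contributes to the whole run, which is one plausible way to tune a proxy so that its behavior approaches that of a target benchmark.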



