A Dwarf-based Scalable Big Data Benchmarking Methodology

11/09/2017
by Wanling Gao, et al.

Unlike the traditional benchmarking methodology, which creates a new benchmark or proxy for every possible workload, this paper presents a scalable big data benchmarking methodology. Among a wide variety of big data analytics workloads, we identify eight big data dwarfs, each of which captures the common requirements of a class of units of computation while being reasonably divorced from individual implementations. We implement the eight dwarfs on different software stacks (e.g., OpenMP, MPI, and Hadoop) as the dwarf components. For the purpose of architecture simulation, we construct and tune big data proxy benchmarks using directed acyclic graph (DAG)-like combinations of the dwarf components with different weights to mimic the benchmarks in BigDataBench. Our proxy benchmarks preserve the micro-architecture, memory, and I/O characteristics, and they shorten the simulation time by hundreds of times while maintaining an average micro-architectural data accuracy above 90 percent on both x86-64 and ARMv8 processors. We will open-source the big data dwarf components and proxy benchmarks soon.
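
The idea of a weighted, DAG-like combination of dwarf components can be illustrated with a minimal Python sketch. This is not the authors' implementation. The three component functions, the `run_proxy` helper, and the reading of a weight as "fraction of the incoming data a component processes" are all assumptions made for illustration, and the `accuracy` formula is one plausible way to compare a proxy metric against the real benchmark, not a definition taken from the abstract.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stand-ins for dwarf components (names are illustrative only).
def sampling(data):
    return data[::2]            # keep every second element

def sort(data):
    return sorted(data)

def statistic(data):
    return [sum(data) / len(data)] if data else []

COMPONENTS = {"sampling": sampling, "sort": sort, "statistic": statistic}

def run_proxy(dag, weights, data):
    """Run components in topological order.

    dag maps each node to the set of its predecessor nodes.
    weights maps each node to the fraction of its input it processes
    (an assumed interpretation of "weight").
    """
    outputs = {}
    for node in TopologicalSorter(dag).static_order():
        preds = sorted(dag.get(node, ()))
        # Source nodes read the raw data; others read predecessor outputs.
        inputs = data if not preds else [x for p in preds for x in outputs[p]]
        n = int(len(inputs) * weights[node])
        outputs[node] = COMPONENTS[node](inputs[:n])
    return outputs

def accuracy(proxy_value, real_value):
    """Assumed metric: 1 minus the relative deviation from the real value."""
    return 1 - abs(proxy_value - real_value) / real_value

# Example: sampling -> sort -> statistic, with statistic seeing half its input.
dag = {"sampling": set(), "sort": {"sampling"}, "statistic": {"sort"}}
weights = {"sampling": 1.0, "sort": 1.0, "statistic": 0.5}
result = run_proxy(dag, weights, [5, 3, 8, 1, 9, 2, 7, 4])
```

In this toy setup, tuning would mean adjusting the entries of `weights` until metrics measured on the proxy (for example cache miss rates or IPC, collected with an external profiler) score close to 1 under `accuracy` against the same metrics from the real benchmark.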

research
02/01/2018

Big Data Dwarfs: Towards Fully Understanding Big Data Analytics Workloads

Though the big data benchmark suites like BigDataBench and CloudSuite ha...
research
10/18/2018

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads

For the architecture community, reasonable simulation time is a strong r...
research
05/26/2020

Benchmarking Graph Data Management and Processing Systems: A Survey

The development of scalable, representative, and widely adopted benchmar...
research
01/24/2019

Accuracy vs. Computational Cost Tradeoff in Distributed Computer System Simulation

Simulation is a fundamental research tool in the computer architecture f...
research
08/26/2018

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads

The complexity and diversity of big data and AI workloads make understan...
research
02/01/2018

Data Dwarfs: A Lens Towards Fully Understanding Big Data and AI Workloads

The complexity and diversity of big data and AI workloads make understan...
research
06/15/2019

Proxy expenditure weights for Consumer Price Index: Audit sampling inference for big data statistics

Purchase data from retail chains provide proxy measures of private house...
