BigDataBench: A Dwarf-based Big Data and AI Benchmark Suite

02/23/2018
by   Wanling Gao, et al.
0

As architecture, system, data management, and machine learning communities pay greater attention to innovative big data and data-driven artificial intelligence (in short, AI) algorithms, architecture, and systems, the pressure of benchmarking rises. However, complexity, diversity, frequently changed workloads, and rapid evolution of big data, especially AI systems raise great challenges in benchmarking. First, for the sake of conciseness, benchmarking scalability, portability cost, reproducibility, and better interpretation of performance data, we need understand what are the abstractions of frequently-appearing units of computation, which we call dwarfs, among big data and AI workloads. Second, for the sake of fairness, the benchmarks must include diversity of data and workloads. Third, for co-design of software and hardware, the benchmarks should be consistent across different communities. Other than creating a new benchmark or proxy for every possible workload, we propose using dwarf-based benchmarks--the combination of eight dwarfs--to represent diversity of big data and AI workloads. The current version--BigDataBench 4.0 provides 13 representative real-world data sets and 47 big data and AI benchmarks, including seven workload types: online service, offline analytics, graph analytics, AI, data warehouse, NoSQL, and streaming. BigDataBench 4.0 is publicly available from http://prof.ict.ac.cn/BigDataBench. Also, for the first time, we comprehensively characterize the benchmarks of seven workload types in BigDataBench 4.0 in addition to traditional benchmarks like SPECCPU, PARSEC and HPCC in a hierarchical manner and drill down on five levels, using the Top-Down analysis from an architecture perspective.

READ FULL TEXT

page 6

page 13

page 15

page 16

page 17

page 18

research
02/23/2018

BigDataBench: A Scalable and Unified Big Data and AI Benchmark Suite

Several fundamental changes in technology indicate domain-specific hardw...
research
05/26/2020

Benchmarking Graph Data Management and Processing Systems: A Survey

The development of scalable, representative, and widely adopted benchmar...
research
08/26/2018

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads

The complexity and diversity of big data and AI workloads make understan...
research
06/12/2023

Benchmarking Neural Network Training Algorithms

Training algorithms, broadly construed, are an essential part of every d...
research
06/05/2019

Evaluating Geospatial RDF stores Using the Benchmark Geographica 2

Since 2007, geospatial extensions of SPARQL, like GeoSPARQL and stSPARQL...
research
11/27/2018

AI Matrix - Synthetic Benchmarks for DNN

Deep neural network (DNN) architectures, such as convolutional neural ne...
research
08/15/2017

GARDENIA: A Domain-specific Benchmark Suite for Next-generation Accelerators

This paper presents the Graph Analytics Repository for Designing Next-ge...

Please sign up or login with your details

Forgot password? Click here to reset