AIPerf: Automated machine learning as an AI-HPC benchmark

08/17/2020
by Zhixiang Ren, et al.

The plethora of complex artificial intelligence (AI) algorithms and available high performance computing (HPC) power stimulates the convergence of AI and HPC. The rapid development of AI components, in both the hardware and software domains, increases system heterogeneity, which makes fair and comprehensive benchmarking a challenge. Existing HPC and AI benchmarks fail to cover the variety of heterogeneous systems while also providing a simple quantitative measurement that reflects the overall performance of large clusters on AI tasks. To address these challenges, we specify the requirements of an AI-HPC benchmark with future scenarios in mind and propose an end-to-end benchmark suite that uses automated machine learning (AutoML) as a representative AI application. Its extremely high computational cost and high scalability make AutoML a desirable workload candidate for an AI-HPC benchmark. We implement the algorithms in a highly efficient and parallel way so that they adapt automatically to various systems with respect to the memory and number of AI accelerators. The benchmark's back-end training framework and hyperparameters are customizable so that it can achieve optimal performance on diverse systems. The major metric for quantifying machine performance is floating-point operations per second (FLOPS), which is measured with a systematic, analytical approach. We also provide a regulated score as a complementary result that reflects the combined performance of hardware and software. We verify the benchmark's linear scalability on clusters of up to 16 nodes equipped with 128 GPUs, and we evaluate its stability and reproducibility at discrete timestamps. The source code, specifications, and detailed procedures are publicly accessible on GitHub: https://github.com/AI-HPC-Research-Team/AIPerf.
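
The abstract does not spell out how FLOPS is counted analytically or how linear scalability is checked, so the following is only a minimal sketch of the general idea, not AIPerf's actual procedure. It assumes the common convention that one multiply-add counts as 2 operations, considers the forward pass only, and ignores biases and activations. All function names (`conv2d_flops`, `dense_flops`, `achieved_flops`, `scaling_efficiency`) are hypothetical and introduced here for illustration.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    """Forward-pass operations of a k x k convolution for one sample.

    Assumption: each multiply-add counts as 2 operations; bias is ignored.
    """
    return 2 * h_out * w_out * c_in * c_out * k * k


def dense_flops(n_in, n_out):
    """Forward-pass operations of a fully connected layer for one sample."""
    return 2 * n_in * n_out


def achieved_flops(ops_per_sample, samples_processed, elapsed_seconds):
    """Analytically counted operations divided by measured wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return ops_per_sample * samples_processed / elapsed_seconds


def scaling_efficiency(base_rate, rate, n_units):
    """Ratio of the measured rate on n_units to ideal linear scaling.

    base_rate is the rate measured on a single unit (for example one node).
    A value of 1.0 corresponds to perfectly linear scaling.
    """
    return rate / (base_rate * n_units)


if __name__ == "__main__":
    # Toy network: one 3x3 convolution followed by one dense layer.
    ops = conv2d_flops(32, 32, 3, 16, 3) + dense_flops(10, 5)
    print("operations per sample:", ops)
    print("achieved FLOPS:", achieved_flops(ops, 1000, 2.0))
    print("scaling efficiency:", scaling_efficiency(1.0, 8.0, 8))
```

Counting operations from the layer shapes rather than from hardware counters keeps the metric comparable across different accelerators, which is the motivation for an analytical approach; the exact counting rules used by the benchmark are defined in its repository.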


research
12/07/2022

SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems

Novel artificial intelligence (AI) technology has expedited various scie...
research
10/21/2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Scientific communities are increasingly adopting machine learning and de...
research
12/13/2022

Towards Seamless Management of AI Models in High-Performance Computing

With the increasing prevalence of artificial intelligence (AI) in divers...
research
06/01/2017

On the Scalability of Data Reduction Techniques in Current and Upcoming HPC Systems from an Application Perspective

We implement and benchmark parallel I/O methods for the fully-manycore d...
research
04/07/2022

Predicting Performance of Heterogeneous AI Systems with Discrete-Event Simulations

In recent years, artificial intelligence (AI) technologies have found in...
research
10/02/2019

MLPerf Training Benchmark

Machine learning is experiencing an explosion of software and hardware s...
research
08/04/2021

The MIT Supercloud Dataset

Artificial intelligence (AI) and Machine learning (ML) workloads are an ...
