Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training

08/12/2022
by Jie You, et al.

Training deep neural networks (DNNs) is becoming more resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices for improving training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework that navigates this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements while adapting to data drifts over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3
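To make the energy-performance tradeoff concrete, the following is a minimal sketch of how one configuration might be chosen from a handful of profiled options. It is not the Zeus algorithm: the abstract gives no formulas, so the weighted cost, the `eta` knob, the `Profile` type, the function names, and every number in `PROFILES` are assumptions invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical measurement of one training job under one GPU power limit."""
    power_limit_w: float  # GPU power cap, in watts
    energy_j: float       # energy consumed to finish the job, in joules
    time_s: float         # time taken to finish the job, in seconds


def cost(p: Profile, eta: float, max_power_w: float) -> float:
    """Assumed weighted cost: eta=1 counts only energy, eta=0 counts only time.

    Time is multiplied by max_power_w so that both terms are in joules.
    """
    return eta * p.energy_j + (1.0 - eta) * max_power_w * p.time_s


def pick_power_limit(profiles, eta: float, max_power_w: float) -> Profile:
    """Return the profiled configuration with the lowest assumed cost."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return min(profiles, key=lambda p: cost(p, eta, max_power_w))


# Made-up numbers: lowering the power cap saves energy but slows the job down.
PROFILES = [
    Profile(power_limit_w=300.0, energy_j=9.0e6, time_s=36_000.0),
    Profile(power_limit_w=250.0, energy_j=8.0e6, time_s=38_000.0),
    Profile(power_limit_w=200.0, energy_j=7.5e6, time_s=45_000.0),
]

if __name__ == "__main__":
    for eta in (0.0, 0.5, 1.0):
        best = pick_power_limit(PROFILES, eta, max_power_w=300.0)
        print(f"eta={eta}: choose {best.power_limit_w:.0f} W")
```

With these invented numbers, a pure time preference picks the highest power cap, a pure energy preference picks the lowest, and an even weighting lands on the middle setting. The point of the sketch is only that the best configuration depends on how the two goals are weighted.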

research
04/13/2023

Energy-Efficient GPU Clusters Scheduling for Deep Learning

Training deep neural networks (DNNs) is a major workload in datacenters ...
research
06/12/2018

End-to-End Learning of Energy-Constrained Deep Neural Networks

Deep Neural Networks (DNN) are increasingly deployed in highly energy-co...
research
03/04/2023

Chasing Low-Carbon Electricity for Practical and Sustainable DNN Training

Deep learning has experienced significant growth in recent years, result...
research
05/27/2019

The Impact of GPU DVFS on the Energy and Performance of Deep Learning: an Empirical Study

Over the past years, great progress has been made in improving the compu...
research
11/05/2018

Workload-aware Automatic Parallelization for Multi-GPU DNN Training

Deep neural networks (DNNs) have emerged as successful solutions for var...
research
04/13/2020

Enabling Incremental Knowledge Transfer for Object Detection at the Edge

Object detection using deep neural networks (DNNs) involves a huge amoun...
research
12/10/2017

Optimal Energy Tradeoff among Communication, Computation and Caching with QoI-Guarantee

Many applications must ingest and analyze data that are continuously gen...
