Leveraging GPU batching for scalable nonlinear programming through massive Lagrangian decomposition

06/28/2021
by Youngdae Kim, et al.

We present ExaTron, an implementation of a trust-region Newton algorithm for bound-constrained nonlinear programming problems that runs entirely on multiple GPUs. Because it requires no data transfers between CPU and GPU, our implementation eliminates a major performance bottleneck in memory-bound situations, particularly when solving many small problems in batch. We discuss the design principles and implementation details of our kernel function and core operations, and justify the design choices with numerical experiments. Using distributed control of alternating current optimal power flow as an application, in which a large problem is decomposed into many smaller nonlinear programs by a Lagrangian approach, we demonstrate the computational performance of ExaTron on the Summit supercomputer at Oak Ridge National Laboratory. Our numerical results show linear scaling with respect to the batch size and the number of GPUs, and a speedup of more than 35 times on 6 GPUs over the 40 CPUs available on a single node.
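The batching idea can be illustrated with a small sketch. It is not ExaTron and does not use a trust-region Newton method: as a simplifying assumption, each subproblem is a tiny box-constrained quadratic program solved by plain projected gradient descent on the CPU. The names `solve_box_qp` and `solve_batch`, the step size, and the iteration count are all hypothetical. The point is only that the subproblems share no data, so a batch of them can in principle be solved in parallel.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One subproblem: minimize 0.5*x'Qx + c'x subject to lower <= x <= upper.
Problem = Tuple[Matrix, Vector, Vector, Vector]


def project(x: Vector, lower: Vector, upper: Vector) -> Vector:
    """Clip each coordinate of x into its bound interval."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def gradient(Q: Matrix, c: Vector, x: Vector) -> Vector:
    """Gradient of the quadratic objective: Qx + c."""
    return [sum(qij * xj for qij, xj in zip(row, x)) + ci
            for row, ci in zip(Q, c)]


def solve_box_qp(Q: Matrix, c: Vector, lower: Vector, upper: Vector,
                 step: float = 0.1, iters: int = 500) -> Vector:
    """Projected gradient descent (a stand-in, NOT trust-region Newton).

    Assumes Q is positive definite and step < 2 / (largest eigenvalue of Q).
    """
    x = project([0.0] * len(c), lower, upper)
    for _ in range(iters):
        g = gradient(Q, c, x)
        x = project([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x


def solve_batch(problems: List[Problem]) -> List[Vector]:
    """Solve every subproblem independently.

    No subproblem reads another's data, so this loop is the part that a
    batched GPU implementation would run in parallel.
    """
    return [solve_box_qp(Q, c, lo, hi) for Q, c, lo, hi in problems]


if __name__ == "__main__":
    batch = [
        ([[2.0, 0.0], [0.0, 2.0]], [-2.0, -8.0], [0.0, 0.0], [3.0, 3.0]),
        ([[1.0, 0.0], [0.0, 4.0]], [1.0, -4.0], [-0.5, -0.5], [2.0, 2.0]),
    ]
    for solution in solve_batch(batch):
        print(solution)
```

In the first subproblem the unconstrained minimizer is (1, 4), so the second coordinate ends on its upper bound of 3. In the second, the first coordinate ends on its lower bound of -0.5.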

