The Adaptive s-step Conjugate Gradient Method
On modern large-scale parallel computers, the performance of Krylov subspace iterative methods is limited by global synchronization. This has inspired the development of s-step Krylov subspace method variants, in which iterations are computed in blocks of s, which can reduce the number of global synchronizations per iteration by a factor of O(s). Although the s-step variants are mathematically equivalent to their classical counterparts, they can behave quite differently in finite precision depending on the parameter s. If s is chosen too large, the s-step method can suffer a convergence delay and a decrease in attainable accuracy relative to the classical method. This makes it difficult for a potential user of such methods - the s value that minimizes the time per iteration may not be the best s for minimizing the overall time-to-solution, and further may cause an unacceptable decrease in accuracy. Towards improving the reliability and usability of s-step Krylov subspace methods, in this work we derive the adaptive s-step CG method, a variable s-step CG method where in block k, the parameter s_k is determined automatically such that a user-specified accuracy is attainable. The method for determining s_k is based on a bound on growth of the residual gap within block k, from which we derive a constraint on the condition numbers of the computed O(s_k)-dimensional Krylov subspace bases. The computations required for determining the block size s_k can be performed without increasing the number of global synchronizations per block. Our numerical experiments demonstrate that the adaptive s-step CG method is able to attain up to the same accuracy as classical CG while still significantly reducing the total number of global synchronizations.
READ FULL TEXT