SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate via Compiler Co-design

06/27/2023
by Fu-Ming Guo, et al.

This paper introduces SparseOptimizer, a novel deep learning optimizer that exploits Moreau-Yosida regularization to naturally induce sparsity in large language models such as BERT, ALBERT, and GPT. Key to the design of SparseOptimizer is an embedded shrinkage operator, which imparts sparsity directly within the optimization process. The operator admits an analytical solution and is backed by a sound theoretical framework, reinforcing the optimizer's robustness and efficacy. Crucially, SparseOptimizer's plug-and-play design requires no code modifications, making it a universally adaptable tool for a wide array of large language models. Empirical evaluations on benchmark datasets such as GLUE, RACE, SQuAD1, and SQuAD2 confirm that SparseBERT and SparseALBERT, when sparsified using SparseOptimizer, achieve performance comparable to their dense counterparts, BERT and ALBERT, while significantly reducing their parameter count. Further, this work proposes an optimizer-compiler co-design strategy, demonstrating inference speedups of 3.37x, 6.30x, and 7.15x over PyTorch, TensorFlow, and generic LLVM compilation, respectively, for SparseBERT when paired with an appropriately designed compiler. This study represents a significant step toward efficient, scalable, and high-performing large language models, setting a precedent for future exploration and optimization in this domain. The SparseOptimizer code and the SparseALBERT model will be publicly released upon paper acceptance.
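To illustrate the kind of embedded shrinkage operator the abstract describes, the sketch below wraps an arbitrary PyTorch optimizer and applies the analytical soft-thresholding solution of the L1 proximal map after each update step. This is a minimal, hedged example of the general technique, not the authors' released code: the names ShrinkageWrapper and lambda_reg are illustrative assumptions, and the exact regularizer used by SparseOptimizer may differ.

    import torch

    class ShrinkageWrapper:
        """Wraps an inner optimizer and applies the analytical shrinkage
        (soft-thresholding) operator to every parameter after each step."""

        def __init__(self, inner_optimizer, lambda_reg=1e-4):
            self.inner = inner_optimizer   # e.g. AdamW over model.parameters()
            self.lambda_reg = lambda_reg   # regularization strength (assumed name)

        @torch.no_grad()
        def step(self, closure=None):
            loss = self.inner.step(closure)
            for group in self.inner.param_groups:
                thresh = group["lr"] * self.lambda_reg
                for p in group["params"]:
                    if p.grad is None:
                        continue
                    # Analytical prox of the L1 term: sign(w) * max(|w| - thresh, 0)
                    p.copy_(torch.sign(p) * torch.clamp(p.abs() - thresh, min=0.0))
            return loss

        def zero_grad(self, set_to_none=True):
            self.inner.zero_grad(set_to_none=set_to_none)

    # Usage: a drop-in wrapper around an existing optimizer, no model-code changes.
    # optimizer = ShrinkageWrapper(torch.optim.AdamW(model.parameters(), lr=2e-5),
    #                              lambda_reg=1e-3)

Because the shrinkage is applied outside the model's forward and backward passes, the same wrapper works unchanged across different architectures, which is the plug-and-play property the abstract emphasizes.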

