With the significant computational growth of DNN models Hernandez and Brown (2020), AI researchers around the globe have started to promote and adopt the concept of ‘Green AI’ Schwartz et al. (2020). Many recent works Strubell et al. (2019); Lacoste et al. (2019); Patterson et al. (2021); Dai et al. (2021) address environmental challenges such as the energy usage and carbon emissions of DNN models and develop more efficient deep learning solutions. In response, we introduce a robust and easy-to-use low-rank matrix factorization toolkit that reduces not only the computational cost but also the size of a model with minimal performance loss.
Low-rank matrix factorization decomposes a large matrix into two or more smaller matrices, reducing computation and memory costs. Post-training factorization methods based on singular value decomposition (SVD) Golub and Reinsch (1970) and non-negative matrix factorization (NMF) Lee and Seung (2001) have been applied to approximate the weight matrices of a trained model Winata et al. (2019); Ben Noach and Goldberg (2020). In another line of work, factorization-by-design applies matrix factorization directly to the model structure prior to training. This approach produces impressive results: the compressed model is not only smaller and faster but can also outperform the uncompressed model Winata et al. (2020); Cahyawijaya (2021); Kuchaiev and Ginsburg (2017).
Despite the fact that many works have been published on low-rank matrix factorization, all the solutions are model-dependent, making applicability to different model architectures difficult and cumbersome. To improve the generalization and applicability of the low-rank matrix factorization method, we introduce Greenformer, an elegant low-rank matrix factorization toolkit that supports multiple use cases of matrix factorization and is currently implemented for the PyTorch framework Paszke et al. (2019). As shown in Figure 1, with Greenformer, we can easily factorize any deep neural network to perform both factorization-by-design and post-training factorization. We further demonstrate the effectiveness of our Greenformer toolkit for three different use cases: 1) factorization-by-design, 2) post-training factorization, and 3) few-shot in-context learning factorization.
Design and Consideration
Greenformer performs decomposition on the weight matrices of linear and convolution layers. Namely, a weight matrix $W \in \mathbb{R}^{m \times n}$ is decomposed into two low-rank matrices $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{r \times n}$, where $W \approx UV$ and $r < \min(m, n)$.
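The effect of such a decomposition on parameter count can be sketched as follows; the symbol names ($W$, $U$, $V$, $r$) and the matrix sizes are illustrative, not taken from Greenformer itself.

```python
import numpy as np

# Illustrative sizes: a 512x256 weight matrix factorized at rank 32.
m, n, r = 512, 256, 32
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))

# Truncated SVD gives the best rank-r approximation in the Frobenius norm.
U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
U = U_full[:, :r] * s[:r]   # (m, r), singular values folded into the left factor
V = Vt[:r, :]               # (r, n)

# Parameter counts: m*n for the original vs. r*(m + n) for the two factors.
orig_params = m * n
fact_params = r * (m + n)
assert U.shape == (m, r) and V.shape == (r, n)
assert fact_params < orig_params
```

At rank 32 the two factors hold 24,576 parameters against 131,072 in the original matrix, a roughly 5x reduction; the same ratio governs the multiply-accumulate cost of applying the layer.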
Greenformer decomposes a matrix by utilizing a factorization solver. Three factorization solvers are implemented in Greenformer: Random, SVD Golub and Reinsch (1970), and Semi-Nonnegative Matrix Factorization (SNMF) Lee and Seung (2001). The Random solver replaces the original matrix with two random matrices according to the original size and the specified target rank. Note that the Random solver is not suitable for post-training factorization: since it does not approximate the original matrix, it may discard what the model learnt during the main training. The SVD solver computes the truncated decomposition $W \approx U \Sigma V^\top$, where $\Sigma$ is a diagonal matrix containing the singular values of $W$. SNMF is an extension of NMF which relaxes the non-negativity constraint on one of the factors. The SNMF solver performs the decomposition $W \approx FG$, where $G$ is strictly non-negative while $F$ has no restriction on signs.
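The Random and SVD solvers can be sketched as below; these are minimal illustrations of the two strategies, not Greenformer's actual solver code, and the SNMF solver (which requires an iterative update scheme) is omitted for brevity.

```python
import numpy as np

def random_solver(W, rank, rng=None):
    # Two random factors of the right shapes. Does NOT approximate W,
    # so it is only appropriate for factorization-by-design (before training).
    rng = rng or np.random.default_rng(0)
    m, n = W.shape
    return (rng.standard_normal((m, rank)) / np.sqrt(rank),
            rng.standard_normal((rank, n)))

def svd_solver(W, rank):
    # Rank-truncated SVD: keep only the `rank` largest singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]

# Sanity check on a diagonal matrix: the rank-2 SVD of diag(3, 2, 1)
# keeps the two largest singular values, leaving a residual of norm 1.
W = np.diag([3.0, 2.0, 1.0])
U, V = svd_solver(W, 2)
err = np.linalg.norm(W - U @ V)
assert abs(err - 1.0) < 1e-8

Ur, Vr = random_solver(W, 2)
assert Ur.shape == (3, 2) and Vr.shape == (2, 3)
```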
As the three solvers mentioned above cannot handle tensors, Greenformer rearranges weight tensors into matrices to decompose convolution layers. For instance, a 1D convolution layer has a weight $W \in \mathbb{R}^{c_{out} \times c_{in} \times k}$, where $c_{in}$ and $c_{out}$ denote the number of input and output channels and $k$ denotes the size of the convolution kernel. Greenformer rearranges the weight into a 2-dimensional matrix $W' \in \mathbb{R}^{c_{out} \times c_{in} k}$. The matrix is then decomposed and converted back into the original dimensions, producing two low-rank convolution tensors. The same trick is applied to 2D and 3D convolution layers.
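One way this rearrangement can work for a 1D convolution is sketched below: flatten the kernel axis into the input-channel axis, factorize, then fold the factors back into a rank-$r$ convolution kernel and a $1 \times 1$ mixing kernel. The exact tensor layout in Greenformer may differ; the shapes here are an assumption for illustration.

```python
import numpy as np

# Illustrative Conv1d weight: (c_out, c_in, k), factorized at rank r.
c_out, c_in, k, r = 64, 32, 3, 8
W = np.random.default_rng(0).standard_normal((c_out, c_in, k))

# Flatten to a matrix, decompose with truncated SVD, fold back into tensors.
W2d = W.reshape(c_out, c_in * k)                     # (c_out, c_in*k)
U, s, Vt = np.linalg.svd(W2d, full_matrices=False)
enc = (s[:r, None] * Vt[:r, :]).reshape(r, c_in, k)  # rank-r conv kernel
dec = U[:, :r].reshape(c_out, r, 1)                  # 1x1 conv over r channels

assert enc.shape == (r, c_in, k)
assert dec.shape == (c_out, r, 1)
```

Applying `enc` as a convolution followed by `dec` as a $1 \times 1$ convolution approximates the original layer while touching only $r(c_{in} k + c_{out})$ weights instead of $c_{out} c_{in} k$.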
The decomposed matrices and/or tensors are then wrapped into a compatible low-rank module which replaces the original linear and/or convolution layers of the model. Specifically, we replace a linear layer with a Linear Encoder-Decoder (LED) layer and a convolution layer with a Convolution Encoder-Decoder (CED) layer. The LED and CED layers are depicted in Figure 3. Both LED and CED have the same input and output shapes as the linear and convolution layers they replace; hence, they maintain compatibility with the model.
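A minimal LED sketch in PyTorch might look as follows; this is an assumption-driven illustration of the encoder-decoder idea, not Greenformer's actual `LED` implementation.

```python
import torch
import torch.nn as nn

class LED(nn.Module):
    """Linear Encoder-Decoder: a drop-in stand-in for nn.Linear(m, n).

    The encoder projects the input down to `rank` dimensions; the decoder
    projects it back up, so the input/output interface is unchanged.
    """
    def __init__(self, in_features, out_features, rank, bias=True):
        super().__init__()
        self.encoder = nn.Linear(in_features, rank, bias=False)
        self.decoder = nn.Linear(rank, out_features, bias=bias)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Same call signature and output shape as nn.Linear(512, 256).
layer = LED(512, 256, rank=32)
x = torch.randn(4, 512)
out = layer(x)
assert out.shape == (4, 256)
```

Because the interface matches `nn.Linear`, the surrounding model needs no changes when the replacement happens.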
To maximize the outcome of automatic factorization, Greenformer only performs factorization when the specified rank is less than the maximum rank, ensuring a reduction of the theoretical computational cost. For a given weight matrix of size $m \times n$, the maximum rank is defined as:

$$r_{max} = \frac{mn}{m + n}$$
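The maximum-rank criterion follows from requiring the factorized cost $r(m + n)$ to be below the original cost $mn$, which holds exactly when $r < mn/(m+n)$. A minimal sketch, with the floor taken because ranks are integers:

```python
def max_rank(m, n):
    # Factorization reduces parameters and multiply-accumulates only when
    # r * (m + n) < m * n, i.e. r < m*n / (m + n).
    return (m * n) // (m + n)

# A square 1024x1024 weight only benefits from factorization below rank 512.
assert max_rank(1024, 1024) == 512
# A typical 768x3072 transformer FFN weight benefits below rank 614.
assert max_rank(768, 3072) == 614
```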
To improve its flexibility, Greenformer supports factorization with a dynamic rank across layers by computing each layer's rank from a ratio to the maximum rank of the corresponding layer. We also observe that applying factorization to all layers of large pretrained models leads to a significant performance loss. To address this problem, Greenformer is equipped with a filtering feature that enables factorization only on a specific set of modules.
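The combination of a rank ratio and module filtering can be sketched as a recursive traversal over the module tree; the function and filter names below are hypothetical illustrations, not Greenformer's API.

```python
import torch.nn as nn

def factorize_model(model, rank_ratio=0.25, allowed=("mlp",)):
    # Hypothetical sketch: walk the module tree and swap each nn.Linear
    # whose attribute name matches the filter for a two-layer factorization.
    for name, child in list(model.named_children()):
        if isinstance(child, nn.Linear) and any(a in name for a in allowed):
            m, n = child.in_features, child.out_features
            # Dynamic rank: a ratio of this layer's maximum useful rank.
            rank = max(1, int(rank_ratio * (m * n // (m + n))))
            replacement = nn.Sequential(
                nn.Linear(m, rank, bias=False),
                nn.Linear(rank, n, bias=child.bias is not None),
            )
            setattr(model, name, replacement)
        else:
            factorize_model(child, rank_ratio, allowed)
    return model

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Linear(128, 128)   # matches the filter: factorized
        self.head = nn.Linear(128, 10)   # outside the filter: untouched

m = factorize_model(Toy())
assert isinstance(m.mlp, nn.Sequential)
assert isinstance(m.head, nn.Linear)
```

The filter leaves sensitive layers (here, the classification head) intact, which mirrors the observation above that factorizing every layer of a large pretrained model hurts performance.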
We test our toolkit on three use cases: 1) factorization-by-design, where we factorize models prior to training; 2) post-training factorization, where we factorize models prior to the evaluation phase; and 3) in-context learning factorization, where we apply factorization to large pretrained language models and perform in-context learning following Brown et al. (2020). We evaluate on 3 text classification tasks and 2 image classification tasks. We show the effectiveness of our Greenformer toolkit in all use cases in Figure 2.
We present Greenformer, an automatic factorization toolkit that provides significant efficiency improvements while maintaining model performance. In addition, Greenformer is flexible, easy to use, and applicable to multiple scenarios. For future work, it would be interesting to extend Greenformer to more energy-intensive use cases, such as large-scale model pretraining and neural architecture search.
- Compressing pre-trained language models by matrix decomposition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, pp. 884–889. Cited by: Introduction.
- Language models are few-shot learners. Cited by: Design and Consideration.
- Greenformers: improving computation and memory efficiency in transformer models via low-rank approximation. Cited by: Introduction.
- Multimodal end-to-end sparse model for emotion recognition. In NAACL. Cited by: Introduction.
- Singular value decomposition and least squares solutions. Numer. Math. 14 (5), pp. 403–420. Cited by: Introduction, Design and Consideration.
- Measuring the algorithmic efficiency of neural networks. Cited by: Introduction.
- Factorization tricks for LSTM networks. ICLR Workshop. Cited by: Introduction.
- Quantifying the carbon emissions of machine learning. Workshop on Tackling Climate Change with Machine Learning at NeurIPS 2019. Cited by: Introduction.
- Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, Vol. 13. Cited by: Introduction, Design and Consideration.
- PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035. Cited by: Introduction.
- Carbon emissions and large neural network training. Cited by: Introduction.
- Green AI. Commun. ACM 63 (12), pp. 54–63. Cited by: Introduction.
- Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3645–3650. Cited by: Introduction.
- Lightweight and efficient end-to-end speech recognition using low-rank transformer. In ICASSP 2020, Barcelona, Spain, pp. 6144–6148. Cited by: Introduction.
- On the effectiveness of low-rank matrix factorization for LSTM model compression. In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, pp. 253–262. Cited by: Introduction.