Introduction
With the significant computational growth of DNN models Hernandez and Brown (2020), AI researchers around the globe have started to promote and adopt the concept of 'Green AI' Schwartz et al. (2020). Many recent works Strubell et al. (2019); Lacoste et al. (2019); Patterson et al. (2021); Dai et al. (2021)
address environmental challenges such as the energy usage and carbon emissions of DNN models and develop more efficient deep learning solutions. In response to this problem, we introduce a robust and easy-to-use low-rank matrix factorization toolkit that reduces not only the computational cost but also the size of the model, with minimal performance loss.
Low-rank matrix factorization decomposes a large matrix into two or more smaller matrices, reducing computation and memory costs. Post-training factorization
methods with singular value decomposition (SVD)
Golub and Reinsch (1970) and non-negative matrix factorization (NMF) Lee and Seung (2001) have been applied to approximate the weight matrices of a trained model Winata et al. (2019); Ben Noach and Goldberg (2020). In another line of work, factorization-by-design applies matrix factorization directly to the model structure prior to training. This method produces impressive results: the compressed model is not only smaller and faster but also able to outperform the uncompressed model Winata et al. (2020); Cahyawijaya (2021); Kuchaiev and Ginsburg (2017). Although many works have been published on low-rank matrix factorization, the existing solutions are model-dependent, making them difficult and cumbersome to apply to different model architectures. To improve the generalization and applicability of the low-rank matrix factorization method, we introduce Greenformer, an elegant low-rank matrix factorization toolkit that supports multiple use cases of matrix factorization and is currently implemented for the PyTorch framework
Paszke et al. (2019). As shown in Figure 1, with Greenformer, we can easily factorize any deep neural network to perform both factorization-by-design and post-training factorization. We further demonstrate the effectiveness of our Greenformer toolkit in three different use cases: 1) factorization-by-design, 2) post-training factorization, and 3) few-shot via in-context learning factorization.

Design and Consideration
Greenformer performs decomposition of the weight matrices of linear and convolution layers. Namely, a weight matrix $\mathbf{W} \in \mathbb{R}^{m \times n}$ is decomposed into two low-rank matrices $\mathbf{U} \in \mathbb{R}^{m \times r}$ and $\mathbf{V} \in \mathbb{R}^{r \times n}$, where $r \ll \min(m, n)$.
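For intuition, the following minimal sketch (illustrative only, not Greenformer's internal code) shows how a rank-$r$ factorization of a square weight matrix trades $m \cdot n$ parameters for $r \cdot (m + n)$:

```python
import numpy as np

# Illustrative sketch: a rank-r factorization W ~= U @ V replaces
# the m*n parameters of a dense weight with r*(m + n) parameters.
m, n, r = 1024, 1024, 64

U = np.random.randn(m, r)
V = np.random.randn(r, n)
W_approx = U @ V  # same shape as the original (m, n) weight

params_full = m * n              # 1,048,576 parameters
params_factorized = r * (m + n)  # 131,072 parameters, an 8x reduction

assert W_approx.shape == (m, n)
assert params_factorized < params_full
```

The same arithmetic governs the per-example compute of a linear layer, since applying $\mathbf{U}(\mathbf{V}\mathbf{x})$ costs two small matrix-vector products instead of one large one.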
Greenformer decomposes a matrix by utilizing a factorization solver. Three factorization solvers are implemented in Greenformer: Random, SVD Golub and Reinsch (1970), and Semi-Nonnegative Matrix Factorization (SNMF) Lee and Seung (2001). The Random solver replaces the original matrix with two random matrices based on the original size and the specified target rank. Note that the Random solver is not suitable for post-training factorization: because it does not approximate the original matrix, it may destroy what the model learnt during the main training. The SVD solver computes $\mathbf{W} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$, where
$\mathbf{\Sigma}$ is a diagonal matrix containing the singular values of $\mathbf{W}$. SNMF is an extension of NMF which alleviates the non-negative constraint on
$\mathbf{W}$. The SNMF solver performs the decomposition $\mathbf{W} \approx \mathbf{U}\mathbf{V}$, where $\mathbf{V}$ is strictly non-negative while $\mathbf{U}$ has no restriction on signs.

As the three solvers mentioned above cannot handle tensors, Greenformer rearranges weight tensors into matrices to decompose convolution layers. For instance, a 1D convolution layer consists of a weight
$\mathbf{W} \in \mathbb{R}^{c_{out} \times c_{in} \times k}$, where $c_{in}$ and $c_{out}$ denote the number of input and output channels, and $k$ denotes the size of the convolution kernel. Greenformer rearranges the weight into a two-dimensional matrix $\mathbf{W}' \in \mathbb{R}^{c_{out} \times (c_{in} \cdot k)}$. The matrix is then decomposed and converted back into the original dimensions, producing the tensors $\mathbf{U}$ and $\mathbf{V}$. The same trick is also applied to 2D and 3D convolution layers.

The decomposed matrices and/or tensors are then wrapped into a compatible low-rank module which replaces the original linear and/or convolution layers of the model. Specifically, we replace a linear layer with a Linear Encoder-Decoder (LED) layer and a convolution layer with a Convolution Encoder-Decoder (CED) layer. How the LED and CED layers work is depicted in Figure 3. Both LED and CED have the same inputs and outputs as the linear and convolution layers; hence, they maintain compatibility with the model.
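The SVD-based post-training path can be sketched as follows (a minimal illustration, assuming a trained weight matrix `W`; the variable names are ours, not Greenformer's). Truncating to the top-$r$ singular values yields the best rank-$r$ approximation of $\mathbf{W}$ in the Frobenius norm, and the two resulting factors play the roles of the LED layer's decoder and encoder:

```python
import numpy as np

# Sketch of SVD-based post-training factorization of a trained weight W.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))  # stand-in for a trained (m x n) weight
r = 32

U, s, Vt = np.linalg.svd(W, full_matrices=False)
# Fold the singular values into the left factor, yielding the two
# matrices an LED layer stores: a decoder (m x r) and an encoder (r x n).
decoder = U[:, :r] * s[:r]     # shape (256, 32)
encoder = Vt[:r, :]            # shape (32, 128)
W_lowrank = decoder @ encoder  # rank-r approximation of W

# The factorized layer applies two smaller matmuls instead of one:
x = rng.standard_normal(128)
y = decoder @ (encoder @ x)
assert np.allclose(y, W_lowrank @ x)
```

Note that the composition `decoder @ (encoder @ x)` is never materialized as a full matrix at inference time; only the two thin factors are kept.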
To maximize the outcome of automatic factorization, Greenformer only performs factorization when the target rank is lower than the maximum rank, ensuring a reduction of the theoretical computational cost. For a given weight matrix of size $m \times n$, the maximum rank is defined as:

$r_{\max} = \frac{m \times n}{m + n}$ (1)
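The threshold in Eq. (1) follows directly from the parameter counts: factorizing an $m \times n$ weight into $m \times r$ and $r \times n$ factors only reduces cost when $r(m + n) < mn$. A worked check (the layer size is our illustrative choice):

```python
# Rank threshold from Eq. (1): factorization into (m x r) and (r x n)
# reduces cost only when r*(m + n) < m*n, i.e. r < (m * n) / (m + n).
def max_rank(m: int, n: int) -> float:
    return (m * n) / (m + n)

# Example: a square 768x768 projection benefits from factorization
# only for ranks below 384.
assert max_rank(768, 768) == 384.0

# At r = r_max exactly, the factorized form has the same parameter
# count as the original, so nothing is gained.
m, n, r = 768, 768, 384
assert r * (m + n) == m * n
```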
To improve its flexibility, Greenformer supports factorization with a dynamic rank across all layers by computing the rank based on a ratio to the maximum rank of the corresponding layer. Additionally, we also observe that applying factorization to all layers of large pretrained models leads to significant performance loss. To address this problem, Greenformer is equipped with a filtering feature that enables factorization only on a specific set of modules.
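The two features above can be combined as in the following hedged sketch; the function name, arguments, and module-matching scheme are illustrative assumptions, not Greenformer's actual API. Weights here are freshly initialized (i.e. the Random solver), as in factorization-by-design:

```python
import torch.nn as nn

# Hypothetical sketch of dynamic-rank factorization with module
# filtering; `factorize_by_ratio` and `only` are illustrative names,
# not Greenformer's actual API.
def factorize_by_ratio(model: nn.Module, ratio: float, only=("ffn",)) -> nn.Module:
    # Collect targets first so we do not mutate the model mid-iteration.
    targets = [(name, mod) for name, mod in model.named_modules()
               if isinstance(mod, nn.Linear) and any(k in name for k in only)]
    for name, mod in targets:
        m, n = mod.out_features, mod.in_features
        # Dynamic rank: a ratio of the layer's maximum rank m*n/(m+n).
        r = max(1, int(ratio * m * n / (m + n)))
        led = nn.Sequential(  # encoder-decoder pair replacing the layer
            nn.Linear(n, r, bias=False),
            nn.Linear(r, m, bias=mod.bias is not None),
        )
        parent_name, _, child = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child, led)
    return model

model = nn.ModuleDict({"ffn": nn.Linear(768, 768), "attn": nn.Linear(768, 768)})
factorize_by_ratio(model, ratio=0.5)  # only the "ffn" layer is factorized
```

With `ratio=0.5`, the 768x768 layer gets rank 192 (half of its maximum rank 384), while the filtered-out attention layer is left untouched.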
We test our toolkit on three use cases: 1) factorization-by-design, where we factorize models prior to training; 2) post-training factorization, where we factorize models prior to the evaluation phase; and 3) in-context learning factorization, where we apply factorization to large pre-trained language models and perform in-context learning following
Brown et al. (2020). We test our toolkit on 3 text classification tasks and 2 image classification tasks. We show the effectiveness of our Greenformer toolkit in all use cases in Figure 2.

Conclusion
We present Greenformer, an automatic factorization toolkit that provides a significant efficiency improvement while maintaining model performance. In addition, Greenformer is flexible, easy-to-use, and applicable to multiple scenarios. For future work, it would be interesting to extend Greenformer to more energy-intensive use cases, such as large-scale model pre-training and network architecture search.
References
Matan Ben Noach and Yoav Goldberg (2020). Compressing pre-trained language models by matrix decomposition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, pp. 884–889.

Tom Brown et al. (2020). Language models are few-shot learners. arXiv:2005.14165.

Samuel Cahyawijaya (2021). Greenformers: improving computation and memory efficiency in transformer models via low-rank approximation. arXiv:2108.10808.

Wenliang Dai et al. (2021). Multimodal end-to-end sparse model for emotion recognition. In NAACL.

Gene H. Golub and Christian Reinsch (1970). Singular value decomposition and least squares solutions. Numerische Mathematik 14(5), pp. 403–420.

Danny Hernandez and Tom B. Brown (2020). Measuring the algorithmic efficiency of neural networks. arXiv:2005.04305.

Oleksii Kuchaiev and Boris Ginsburg (2017). Factorization tricks for LSTM networks. ICLR Workshop.

Alexandre Lacoste et al. (2019). Quantifying the carbon emissions of machine learning. Workshop on Tackling Climate Change with Machine Learning at NeurIPS 2019.

Daniel D. Lee and H. Sebastian Seung (2001). Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, Vol. 13.

Adam Paszke et al. (2019). PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pp. 8024–8035.

David Patterson et al. (2021). Carbon emissions and large neural network training. arXiv:2104.10350.

Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni (2020). Green AI. Communications of the ACM 63(12), pp. 54–63.

Emma Strubell, Ananya Ganesh, and Andrew McCallum (2019). Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3645–3650.

Genta Indra Winata et al. (2020). Lightweight and efficient end-to-end speech recognition using low-rank transformer. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), Barcelona, Spain, pp. 6144–6148.

Genta Indra Winata et al. (2019). On the effectiveness of low-rank matrix factorization for LSTM model compression. In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, pp. 253–262.