Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably. In this paper, we introduce Greenformer, a toolkit to accelerate the computation of neural networks through matrix factorization while maintaining performance. Greenformer can be easily applied with a single line of code to any DNN model. Our experimental results show that Greenformer is effective for a wide range of scenarios. We provide the showcase of Greenformer at






With the significant computational growth of DNN models Hernandez and Brown (2020), AI researchers around the globe have started to promote and adopt the concept of 'Green AI' Schwartz et al. (2020). Many recent works Strubell et al. (2019); Lacoste et al. (2019); Patterson et al. (2021); Dai et al. (2021) address the environmental challenges of DNN models, such as energy usage and carbon emissions, and develop more efficient deep learning solutions. In response to this problem, we introduce a robust and easy-to-use low-rank matrix factorization toolkit which reduces not only the computational cost but also the size of the model with minimal performance loss.

Low-rank matrix factorization decomposes a large matrix into two or more smaller matrices, reducing computation and memory costs. Post-training factorization methods with singular value decomposition (SVD) Golub and Reinsch (1970) and non-negative matrix factorization (NMF) Lee and Seung (2001) have been applied to approximate the weight matrices of a trained model Winata et al. (2019); Ben Noach and Goldberg (2020). In another line of work, factorization-by-design applies matrix factorization directly to the model structure prior to training. This method produces impressive results: the compressed model is not only smaller and faster but can even outperform the uncompressed model Winata et al. (2020); Cahyawijaya (2021); Kuchaiev and Ginsburg (2017).
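To make the post-training idea concrete, here is a minimal pure-Python sketch of SVD-style factorization on a tiny weight matrix, using power iteration to find the top singular triple. This is an illustration only, not the toolkit's actual solver (which would use a library SVD routine); the function name `rank1_factorize` is hypothetical.

```python
import math

def rank1_factorize(W, iters=50):
    """Approximate the top singular triple (sigma, u, v) of a small matrix
    via power iteration on W^T W -- a pure-Python sketch, not a real solver."""
    m, n = len(W), len(W[0])
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        # w = W v
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
        # v = W^T w, then normalize
        v = [sum(W[i][j] * w[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in w))
    u = [x / sigma for x in w]
    return sigma, u, v

W = [[2.0, 4.0], [1.0, 2.0]]        # a rank-1 weight matrix
sigma, u, v = rank1_factorize(W)
# Rank-1 reconstruction sigma * u v^T recovers this W exactly,
# since the matrix truly has rank 1.
W_hat = [[sigma * u[i] * v[j] for j in range(2)] for i in range(2)]
```

For a trained model, the same idea is applied with a truncation rank r, trading a small approximation error for fewer parameters and multiply-adds.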

Figure 1: Model factorization with Greenformer for efficient compute time. Greenformer provides an efficiency boost with minimal changes to the code base.

Despite the many works published on low-rank matrix factorization, existing solutions are model-dependent, which makes applying them to different model architectures difficult and cumbersome. To improve the generality and applicability of low-rank matrix factorization, we introduce Greenformer, an elegant low-rank matrix factorization toolkit that supports multiple use cases of matrix factorization and is currently implemented for the PyTorch framework Paszke et al. (2019). As shown in Figure 1, with Greenformer, we can easily factorize any deep neural network to perform both factorization-by-design and post-training factorization. We further demonstrate the effectiveness of our Greenformer toolkit in three use cases: 1) factorization-by-design, 2) post-training factorization, and 3) few-shot in-context learning factorization.

Figure 2: Performance and efficiency trade-off of utilizing Greenformer on (left) factorization-by-design, (center) post-training factorization, and (right) in-context learning factorization use cases. Purple and green lines denote the relative performance and speed up ratio against the uncompressed model averaged across all tasks.

Design and Consideration

Figure 3: Automatic factorization flow with LED. (a) A linear layer is factorized, creating an LED layer. (b) The LED layer replaces the linear layer in the model, producing (c) a model that is more efficient than the original.

Greenformer performs decomposition on the weight matrices of linear and convolution layers. Namely, a weight matrix W ∈ ℝ^{m×n} is decomposed into two low-rank matrices U ∈ ℝ^{m×r} and V ∈ ℝ^{r×n}, where W ≈ UV and r < min(m, n).

Greenformer decomposes a matrix by utilizing a factorization solver. There are three different factorization solvers implemented in Greenformer: Random, SVD Golub and Reinsch (1970), and Semi-Nonnegative Matrix Factorization (SNMF) Lee and Seung (2001). The Random solver replaces the original matrix with two random matrices, referring only to the original size and the specified target rank. Note that the Random solver is not suitable for post-training factorization: since it does not approximate the original matrix, it may break what the model learnt during the main training. The SVD solver computes W ≈ U_r Σ_r V_rᵀ, where Σ_r is a diagonal matrix holding the top-r singular values. SNMF is an extension of NMF which relaxes the non-negativity constraint on one of the factors. The SNMF solver performs the decomposition W ≈ FG, where G is strictly non-negative while F has no restriction on signs.

As the three solvers mentioned above cannot handle tensors, Greenformer rearranges weight tensors into matrices to decompose convolution layers. For instance, a 1D convolution layer has a weight tensor W ∈ ℝ^{c_out × c_in × k}, where c_in and c_out denote the number of input and output channels, and k denotes the size of the convolution kernel. Greenformer rearranges the weight into a 2-dimensional matrix W′ ∈ ℝ^{c_out × (c_in · k)}. The matrix is then decomposed and the factors are converted back into convolution-shaped tensors U ∈ ℝ^{r × c_in × k} and V ∈ ℝ^{c_out × r × 1}. The same trick is applied to 2D and 3D convolution layers.

The decomposed matrices and/or tensors are then wrapped into a compatible low-rank module which replaces the original linear and/or convolution layers of the model. Specifically, we replace a linear layer with a Linear Encoder-Decoder (LED) layer and a convolution layer with a Convolution Encoder-Decoder (CED) layer. The depiction of how the LED and CED layers work is shown in Figure 3. Both LED and CED layers have the same inputs and outputs as the linear and convolution layers they replace; hence, they maintain compatibility with the model.
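A minimal pure-Python sketch of the LED idea, assuming a linear layer that computes y = xW with W factorized as UV (bias omitted, class and helper names hypothetical):

```python
def matmul(A, B):
    """Naive matrix product, just for this sketch."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

class LED:
    """Linear Encoder-Decoder sketch: y = (x U) V. With U of shape m x r and
    V of shape r x n, the layer consumes and produces the same shapes as a
    dense m x n linear layer, so it is a drop-in replacement."""
    def __init__(self, U, V):
        self.U, self.V = U, V

    def forward(self, x):
        return matmul(matmul(x, self.U), self.V)

# m=2, r=1, n=2: two rank-1 projections instead of one dense 2x2 product.
led = LED(U=[[1.0], [1.0]], V=[[2.0, 3.0]])
y = led.forward([[1.0, 2.0]])
```

Chaining the two small products costs m·r + r·n multiply-adds per input row instead of m·n, which is where the speed-up comes from.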

To maximize the outcome of automatic factorization, Greenformer only performs factorization when the chosen rank is less than the maximum rank, ensuring a reduction of the theoretical computational cost. For a given weight matrix of size m × n, the maximum rank is defined as:

r_max = ⌊(m · n) / (m + n)⌋
To improve its flexibility, Greenformer supports factorization with a dynamic rank across all layers by computing the rank based on a ratio to the maximum rank of the corresponding layer. Additionally, we also observe that applying factorization to all layers of large pretrained models leads to significant performance loss. To address this problem, Greenformer is equipped with a filtering feature that enables factorization only on a specific set of modules.
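The rank threshold and the ratio-based dynamic rank can be sketched as follows; the function names are hypothetical, but the arithmetic follows directly from comparing the factorized cost m·r + r·n with the dense cost m·n.

```python
def max_rank(m, n):
    """Largest rank at which the factorized cost m*r + r*n does not exceed
    the dense cost m*n, i.e. the break-even point floor(mn / (m + n))."""
    return (m * n) // (m + n)

def rank_from_ratio(m, n, ratio):
    """Dynamic per-layer rank: a fraction of the layer's maximum rank."""
    return max(1, int(ratio * max_rank(m, n)))

r_max = max_rank(512, 512)            # 256 for a square 512 x 512 layer
dense = 512 * 512                     # parameters of the dense layer
factorized = 512 * (r_max - 1) * 2    # U and V at a rank below r_max
```

Any rank strictly below r_max reduces both the parameter count and the multiply-adds, which is why Greenformer skips layers whose target rank would not clear this threshold.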

We test our toolkit on three use cases: 1) factorization-by-design, where we factorize models prior to training; 2) post-training factorization, where we factorize trained models prior to the evaluation phase; and 3) in-context learning factorization, where we apply factorization to large pretrained language models and perform in-context learning following Brown et al. (2020). We evaluate on three text classification tasks and two image classification tasks, and show the effectiveness of our Greenformer toolkit in all use cases in Figure 2.


We present Greenformer, an automatic factorization toolkit that provides a significant efficiency improvement while maintaining model performance. In addition, Greenformer is flexible, easy to use, and applicable to multiple scenarios. For future work, it would be interesting to extend Greenformer to more energy-intensive use cases, such as large-scale model pretraining and neural architecture search.


  • M. Ben Noach and Y. Goldberg (2020) Compressing pre-trained language models by matrix decomposition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, pp. 884–889. External Links: Link Cited by: Introduction.
  • T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020) Language models are few-shot learners. External Links: 2005.14165 Cited by: Design and Consideration.
  • S. Cahyawijaya (2021) Greenformers: improving computation and memory efficiency in transformer models via low-rank approximation. External Links: 2108.10808 Cited by: Introduction.
  • W. Dai, S. Cahyawijaya, Z. Liu, and P. Fung (2021) Multimodal end-to-end sparse model for emotion recognition. In NAACL, Cited by: Introduction.
  • G. H. Golub and C. Reinsch (1970) Singular value decomposition and least squares solutions. Numer. Math. 14 (5), pp. 403–420. External Links: ISSN 0029-599X, Link, Document Cited by: Introduction, Design and Consideration.
  • D. Hernandez and T. B. Brown (2020) Measuring the algorithmic efficiency of neural networks. External Links: 2005.04305 Cited by: Introduction.
  • O. Kuchaiev and B. Ginsburg (2017) Factorization tricks for lstm networks. ICLR Workshop. Cited by: Introduction.
  • A. Lacoste, A. Luccioni, V. Schmidt, and T. Dandres (2019) Quantifying the carbon emissions of machine learning. Workshop on Tackling Climate Change with Machine Learning at NeurIPS 2019. Cited by: Introduction.
  • D. Lee and H. S. Seung (2001) Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, T. Leen, T. Dietterich, and V. Tresp (Eds.), Vol. 13, pp. . External Links: Link Cited by: Introduction, Design and Consideration.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), pp. 8024–8035. External Links: Link Cited by: Introduction.
  • D. Patterson, J. Gonzalez, Q. Le, C. Liang, L. Munguia, D. Rothchild, D. So, M. Texier, and J. Dean (2021) Carbon emissions and large neural network training. External Links: 2104.10350 Cited by: Introduction.
  • R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni (2020) Green AI. Commun. ACM 63 (12), pp. 54–63. External Links: ISSN 0001-0782, Link, Document Cited by: Introduction.
  • E. Strubell, A. Ganesh, and A. McCallum (2019) Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, pp. 3645–3650. External Links: Link, Document Cited by: Introduction.
  • G. I. Winata, S. Cahyawijaya, Z. Lin, Z. Liu, and P. Fung (2020) Lightweight and efficient end-to-end speech recognition using low-rank transformer. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pp. 6144–6148. External Links: Link, Document Cited by: Introduction.
  • G. I. Winata, A. Madotto, J. Shin, E. J. Barezi, and P. Fung (2019) On the effectiveness of low-rank matrix factorization for lstm model compression. In Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, pp. 253–262. Cited by: Introduction.