Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

05/06/2021
by   Yue Song, et al.

Global covariance pooling (GCP) exploits the second-order statistics of convolutional features, and its effectiveness in boosting the classification performance of Convolutional Neural Networks (CNNs) has been well demonstrated. Singular Value Decomposition (SVD) can be used in GCP to compute the matrix square root. However, the approximate matrix square root calculated via Newton-Schulz iteration <cit.> outperforms the accurate one computed by SVD <cit.>. We empirically analyze the reasons behind this performance gap from the perspectives of data precision and gradient smoothness, and investigate various remedies for computing smooth SVD gradients. Based on our observations and analyses, we propose a hybrid training protocol for SVD-based GCP meta-layers that achieves performance competitive with Newton-Schulz iteration. Moreover, we propose a new GCP meta-layer that uses SVD in the forward pass and Padé approximants in the backward pass to compute the gradients. The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performance on both large-scale and fine-grained datasets.
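To make the two square-root routines concrete, below is a minimal PyTorch sketch of both computations. This is an illustration rather than the authors' implementation: the function names are ours, and the Frobenius-norm pre-normalization is one common choice (iSQRT-COV, for instance, normalizes by the trace).

```python
import torch

def newton_schulz_sqrt(A, num_iters=5):
    """Approximate square root of an SPD matrix A via the coupled
    Newton-Schulz iteration. Pre-normalization keeps the iteration
    inside its convergence region; a few iterations usually suffice."""
    n = A.size(0)
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    norm = A.norm(p='fro')   # illustrative choice; trace normalization is also common
    Y, Z = A / norm, I.clone()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z  # Y -> (A/norm)^{1/2}, Z -> (A/norm)^{-1/2}
    return Y * norm.sqrt()   # undo the normalization

def svd_sqrt(A):
    """Accurate square root of a symmetric PSD matrix via SVD:
    A = U diag(s) U^T, hence A^{1/2} = U diag(sqrt(s)) U^T."""
    U, s, _ = torch.linalg.svd(A)
    return U @ torch.diag(s.sqrt()) @ U.t()

# Sanity check on a covariance-like SPD matrix.
X = torch.randn(64, 256, dtype=torch.double)
cov = X @ X.t() / 256
err = (newton_schulz_sqrt(cov) - svd_sqrt(cov)).norm() / svd_sqrt(cov).norm()
print(f"relative error of the Newton-Schulz approximation: {err:.2e}")
```

The backward passes differ in the way the abstract describes: autograd simply differentiates through the smooth Newton-Schulz iterates, whereas the analytic SVD gradient contains terms proportional to 1/(s_i - s_j) that blow up when singular values are close, which motivates both the gradient remedies studied in the paper and the Padé-approximant gradients of the proposed meta-layer.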

research · 01/29/2022 · Fast Differentiable Matrix Square Root and Inverse Square Root
Computing the matrix square root and its inverse in a differentiable man...

research · 12/04/2017 · Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in Convolutional neural networks has achieved i...

research · 07/21/2017 · Improved Bilinear Pooling with CNNs
Bilinear pooling of Convolutional Neural Network (CNN) features [22, 23]...

research · 01/21/2022 · Fast Differentiable Matrix Square Root
Computing the matrix square root or its inverse in a differentiable mann...

research · 07/05/2022 · Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
Inserting an SVD meta-layer into neural networks is prone to make the co...

research · 04/08/2021 · Robust Differentiable SVD
Eigendecomposition of symmetric matrices is at the heart of many compute...

research · 12/11/2022 · Orthogonal SVD Covariance Conditioning and Latent Disentanglement
Inserting an SVD meta-layer into neural networks is prone to make the co...
