
Cogradient Descent for Dependable Learning

by Runqi Wang, et al.

Conventional gradient descent methods compute the gradient of each variable through its partial derivative, treating coupled variables independently. Ignoring their interaction, however, leads to insufficient optimization of bilinear models. In this paper, we propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem, providing a systematic way to coordinate the gradients of coupled variables through a kernelized projection function. CoGD is introduced to solve bilinear problems in which one variable is subject to a sparsity constraint, as often occurs in modern learning paradigms. CoGD can also decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs) and improve model capacity. We apply CoGD to representative bilinear problems, including image reconstruction, image inpainting, network pruning, and CNN training. Extensive experiments show that CoGD improves the state of the art by significant margins. Code is available at
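The idea of coordinating the gradients of coupled variables can be illustrated on a toy bilinear least-squares objective. The sketch below is only illustrative: the objective, the elementwise coupling, and the `proj` term (a simplified stand-in for the paper's kernelized projection function) are all assumptions, not the paper's actual formulation.

```python
import numpy as np

# Toy bilinear objective: f(w, x) = ||w * x - y||^2 / 2, with w and x
# coupled elementwise. Plain gradient descent would update w and x
# independently; here x's update is modulated by a projection of w's
# gradient, so the two updates are coordinated rather than decoupled.

def cogd_step(w, x, y, lr=0.1):
    r = w * x - y                  # residual of the bilinear model
    gw = r * x                     # partial derivative w.r.t. w
    gx = r * w                     # partial derivative w.r.t. x
    # Couple the updates: scale x's gradient by the normalized magnitude
    # of w's gradient (an illustrative choice, not the paper's kernel).
    proj = np.abs(gw) / (np.abs(gw).max() + 1e-8)
    return w - lr * gw, x - lr * proj * gx

rng = np.random.default_rng(0)
y = rng.normal(size=8)             # synthetic target
w, x = np.ones(8), np.ones(8)      # symmetric initialization
for _ in range(500):
    w, x = cogd_step(w, x, y)
loss = 0.5 * np.sum((w * x - y) ** 2)
```

Because the projection scales the two updates differently, it also breaks the symmetry of the `w = x` initialization, which plain alternating updates on this objective would preserve.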





