How and When Random Feedback Works: A Case Study of Low-Rank Matrix Factorization

11/17/2021
by Shivam Garg, et al.

The success of gradient descent in ML and especially for learning neural networks is remarkable and robust. In the context of how the brain learns, one aspect of gradient descent that appears biologically difficult to realize (if not implausible) is that its updates rely on feedback from later layers to earlier layers through the same connections. Such bidirected links are relatively few in brain networks, and even when reciprocal connections exist, they may not be equi-weighted. Random Feedback Alignment (Lillicrap et al., 2016), where the backward weights are random and fixed, has been proposed as a bio-plausible alternative and found to be effective empirically. We investigate how and when feedback alignment (FA) works, focusing on one of the most basic problems with layered structure: low-rank matrix factorization. In this problem, given a matrix Y_{n×m}, the goal is to find a low-rank factorization Z_{n×r} W_{r×m} that minimizes the error ‖ZW − Y‖_F. Gradient descent solves this problem optimally. We show that FA converges to the optimal solution when r ≥ rank(Y). We also shed light on how FA works. It is observed empirically that the forward weight matrices and (random) feedback matrices come closer during FA updates. Our analysis rigorously derives this phenomenon and shows how it facilitates convergence of FA. We also show that FA can be far from optimal when r < rank(Y). This is the first provable separation result between gradient descent and FA. Moreover, the representations found by gradient descent and FA can be almost orthogonal even when their errors ‖ZW − Y‖_F are approximately equal.
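To make the two update rules concrete, below is a minimal NumPy sketch (not code from the paper) that contrasts plain gradient descent with feedback alignment on the objective ‖ZW − Y‖_F². The dimensions, learning rate, step count, and random seed are illustrative choices; FA is instantiated in the usual way for a two-layer linear model, with a fixed random matrix B standing in for Z^T in the backward path, while the last layer Z still receives its true gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

def factorize(Y, r, feedback="backprop", steps=50_000, lr=5e-4):
    """Minimize ||Z @ W - Y||_F^2 with gradient descent or feedback alignment (FA)."""
    n, m = Y.shape
    Z = 0.01 * rng.standard_normal((n, r))   # later layer
    W = 0.01 * rng.standard_normal((r, m))   # earlier layer
    B = rng.standard_normal((r, n))          # fixed random feedback matrix (used by FA only)
    for _ in range(steps):
        E = Z @ W - Y                        # residual
        grad_Z = E @ W.T                     # the last layer always gets its true gradient
        if feedback == "backprop":
            grad_W = Z.T @ E                 # backprop sends the error back through Z^T ...
        else:
            grad_W = B @ E                   # ... FA sends it back through the fixed random B
        Z -= lr * grad_Z
        W -= lr * grad_W
    return Z, W, B

# Illustrative sizes: Y is 30 x 40 with rank exactly 5, factorized with r = 5,
# i.e. the regime r >= rank(Y) in which FA is claimed to reach the optimum.
n, m, r = 30, 40, 5
Y = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

for mode in ("backprop", "feedback_alignment"):
    Z, W, B = factorize(Y, r, feedback=mode)
    err = np.linalg.norm(Z @ W - Y)
    align = np.sum(Z * B.T) / (np.linalg.norm(Z) * np.linalg.norm(B))
    print(f"{mode:20s}  ||ZW - Y||_F = {err:.3e}   cos(Z, B^T) = {align:+.2f}")
```

With r ≥ rank(Y), the regime covered by the positive result above, both runs should drive the error close to zero, and the FA run should show a clearly positive cosine between Z and B^T, reflecting the forward-weight/feedback-matrix alignment described in the abstract; when r < rank(Y), the two methods can behave very differently.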
