Sparse Coding and Autoencoders

08/12/2017
by Akshay Rangamani, et al.

In "Dictionary Learning" one tries to recover incoherent matrices A^* ∈R^n × h (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors x^* ∈R^h with a small support of size h^p for some 0 <p < 1 while having access to observations y ∈R^n where y = A^*x^*. In this work we undertake a rigorous analysis of whether gradient descent on the squared loss of an autoencoder can solve the dictionary learning problem. The "Autoencoder" architecture we consider is a R^n →R^n mapping with a single ReLU activation layer of size h. Under very mild distributional assumptions on x^*, we prove that the norm of the expected gradient of the standard squared loss function is asymptotically (in sparse code dimension) negligible for all points in a small neighborhood of A^*. This is supported with experimental evidence using synthetic data. We also conduct experiments to suggest that A^* is a local minimum. Along the way we prove that a layer of ReLU gates can be set up to automatically recover the support of the sparse codes. This property holds independent of the loss function. We believe that it could be of independent interest.
