Learning sparse transformations through backpropagation

10/22/2018
by Peter Bloem, et al.

Many transformations in deep learning architectures are sparsely connected. When such transformations cannot be designed by hand, they can be learned, even through plain backpropagation, for instance in attention mechanisms. However, during learning, such sparse structures are often represented in a dense form, as we do not know beforehand which elements will eventually become non-zero. We introduce the adaptive, sparse hyperlayer, a method for learning a sparse transformation that is parametrized sparsely: as index-tuples with associated values. To overcome the lack of gradients from such a discrete structure, we introduce a method of randomly sampling connections and backpropagating over the resulting randomly wired computation graph. To show that this approach allows us to train models to competitive performance on real data, we use it to build two architectures: first, an attention mechanism for visual classification; second, a method for differentiable sorting, specifically learning to sort unlabeled MNIST digits given only the correct order.
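The core idea can be sketched in a few lines of PyTorch. The module below is a simplified illustration, not the paper's exact algorithm: each of k connections is stored as a continuous (row, column) coordinate plus a value, the integer grid points surrounding each coordinate are enumerated at every forward pass, and each is weighted by a differentiable proximity term, so that gradients reach the index coordinates as well as the values. The class name, the corner-enumeration scheme, and the proximity weighting are illustrative assumptions; the paper's hyperlayer instead samples integer index-tuples at random.

import torch
import torch.nn as nn

class SparseHyperlayerSketch(nn.Module):
    # Each of the k connections is stored as a continuous (row, col) coordinate
    # plus a value. The forward pass enumerates the integer grid points around
    # each coordinate and weights them by a differentiable proximity term, so
    # gradients flow to both the values and the index coordinates.
    def __init__(self, in_size, out_size, k):
        super().__init__()
        self.means = nn.Parameter(torch.rand(k, 2))    # continuous (row, col) in [0, 1)
        self.values = nn.Parameter(torch.randn(k))
        self.in_size, self.out_size = in_size, out_size

    def forward(self, x):                              # x: (batch, in_size)
        sizes = torch.tensor([self.out_size, self.in_size],
                             dtype=x.dtype, device=x.device)
        cont = self.means * (sizes - 1)                # scale to matrix coordinates
        base = cont.detach().floor()                   # lower integer corner, no gradient
        out = x.new_zeros(x.size(0), self.out_size)
        for dr in (0.0, 1.0):                          # enumerate the four surrounding corners
            for dc in (0.0, 1.0):
                corner = base + torch.tensor([dr, dc], dtype=x.dtype, device=x.device)
                corner = torch.min(corner.clamp(min=0.0), sizes - 1)
                # proximity weight: product over axes of (1 - distance), differentiable in cont
                w = (1.0 - (cont - corner).abs()).clamp(min=0.0).prod(dim=1)   # (k,)
                rows = corner[:, 0].long()             # output indices of the connections
                cols = corner[:, 1].long()             # input indices of the connections
                contrib = x[:, cols] * (w * self.values)                       # (batch, k)
                out.scatter_add_(1, rows.repeat(x.size(0), 1), contrib)
        return out

A layer like this could stand in for a dense linear map, for example SparseHyperlayerSketch(784, 10, k=64) applied to flattened MNIST images, with k chosen far below 784 x 10 so that the transformation stays sparse throughout training.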
