Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice

07/17/2020
by Nir Ailon, et al.

The Fast Fourier transform, wavelets, and other well-known transforms in signal processing have a structured representation as a product of sparse matrices; these products are referred to as butterfly structures. Research in the recent past has used such structured linear networks, together with randomness, as preconditioners to improve the computational performance of large-scale linear algebraic operations. With the advent of deep learning and AI, and given the computational efficiency of such structured matrices, it is natural to study sparse linear deep networks in which the locations of the non-zero weights are predetermined by the butterfly structure. This work studies, both theoretically and empirically, the feasibility of training such networks in different scenarios. Unlike convolutional neural networks, which are structured sparse networks designed to recognize local patterns in lattices representing a spatial or temporal structure, the butterfly architecture used in this work can replace any dense linear operator with a gadget consisting of logarithmically many (in the network width) sparse layers containing a near-linear total number of weights. This improves on the quadratic number of weights required in a standard dense layer, with little compromise in the expressiveness of the resulting operator. We show in a collection of empirical experiments that our proposed architecture not only produces results that match, and often outperform, those of existing known architectures, but also offers faster training and prediction in deployment. This empirical phenomenon is observed in a wide variety of experiments, including supervised prediction on NLP and vision data as well as unsupervised representation learning using autoencoders. Preliminary theoretical results presented in the paper explain why training speed and outcome are not compromised by our proposed approach.

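As a concrete illustration, here is a minimal NumPy sketch (not the authors' implementation) of the fixed butterfly sparsity pattern described above: a width-n linear operator, with n a power of two, is factored into log2(n) layers, each holding exactly two non-zero weights per row, for O(n log n) trainable parameters in total. The function names are hypothetical, and each factor is stored densely only for readability; a practical implementation would keep and train just the non-zero entries.

```python
import numpy as np

def random_butterfly_factors(n, seed=None):
    """Build the log2(n) sparse factors of an n x n butterfly gadget (n a power of 2).

    Each factor has exactly two non-zeros per row (as in an FFT butterfly stage),
    so the product uses O(n log n) weights instead of the n^2 of a dense layer.
    """
    rng = np.random.default_rng(seed)
    k = int(round(np.log2(n)))
    assert 2 ** k == n, "n must be a power of two"
    factors = []
    for level in range(k):
        stride = 2 ** level
        B = np.zeros((n, n))
        for i in range(n):
            j = i ^ stride                              # butterfly partner of index i at this level
            B[i, i], B[i, j] = rng.standard_normal(2)   # the two trainable weights in row i
        factors.append(B)
    return factors

def apply_butterfly(factors, x):
    """Apply the product of butterfly factors (one sparse layer after another) to x."""
    for B in factors:
        x = B @ x
    return x

# Example: an 8x8 operator represented with 3 factors * 8 rows * 2 weights = 48
# parameters, in place of the 64 weights of a dense 8x8 layer.
factors = random_butterfly_factors(8, seed=0)
y = apply_butterfly(factors, np.ones(8))
```

Replacing each dense layer of a network with such a gadget, and training only the predetermined non-zero positions, yields the kind of sparse linear network studied in the paper.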
