Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

02/16/2021
by Itay Hubara, et al.

Recently, researchers proposed pruning deep neural network (DNN) weights using an N:M fine-grained block sparsity mask, in which each block of M weights contains at least N zeros. In contrast to unstructured sparsity, N:M fine-grained block sparsity can be accelerated on modern hardware, but so far it has been used only for DNN acceleration at the inference phase. We first suggest a method to convert a pretrained model with unstructured sparsity into an N:M fine-grained block sparsity model with little to no retraining. Then, to enable such acceleration in the training phase as well, we propose a novel transposable fine-grained sparsity mask in which the same mask is used for both the forward and backward passes. Our transposable mask ensures that both the weight matrix and its transpose follow the same sparsity pattern; thus the matrix multiplication required to propagate the error backward can also be accelerated. We discuss the transposable constraint and devise a new measure for mask constraints, called mask diversity (MD), which correlates with their expected accuracy. We then formulate the problem of finding the optimal transposable mask as a minimum-cost flow problem and suggest a fast linear approximation that can be used when the masks change dynamically during training. Our experiments show a 2x speed-up with no accuracy degradation on vision and language models. A reference implementation can be found at https://github.com/papers-submission/structured_transposable_masks.
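To make the transposable constraint concrete, the sketch below is a minimal NumPy illustration (not the paper's minimum-cost flow algorithm) of a 2:4 transposable mask: for each 4x4 block it brute-forces the binary mask with exactly two nonzeros in every row and every column that retains the largest total weight magnitude, so both the block and its transpose satisfy the 2:4 pattern. The function names and the brute-force search are assumptions made for illustration only.

```python
import itertools
import numpy as np

def transposable_2to4_mask(block):
    """Brute-force search for a 2:4 transposable mask on a 4x4 block.

    Keeps exactly 2 of 4 weights in every row AND every column, so both
    the block and its transpose follow the 2:4 sparsity pattern. Among
    all such masks, returns the one retaining the largest total |weight|.
    """
    # All row patterns with exactly two ones (6 per row, 6^4 = 1296 combinations).
    rows = [np.array(p) for p in itertools.product([0, 1], repeat=4) if sum(p) == 2]
    best_mask, best_score = None, -1.0
    for combo in itertools.product(rows, repeat=4):
        mask = np.stack(combo)
        if not np.all(mask.sum(axis=0) == 2):   # also require two nonzeros per column
            continue
        score = np.abs(block * mask).sum()
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask

def apply_transposable_mask(weights):
    """Tile a weight matrix (dimensions divisible by 4) into 4x4 blocks and prune each."""
    out = np.zeros_like(weights)
    for i in range(0, weights.shape[0], 4):
        for j in range(0, weights.shape[1], 4):
            block = weights[i:i+4, j:j+4]
            out[i:i+4, j:j+4] = block * transposable_2to4_mask(block)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 8))
    pruned = apply_transposable_mask(w)
    # Both the pruned matrix and its transpose have exactly 2 nonzeros in
    # every aligned group of 4 elements, i.e. the 2:4 pattern holds in the
    # forward (W) and backward (W^T) matrix multiplications.
    print((pruned.reshape(-1, 4) != 0).sum(axis=1))    # all 2
    print((pruned.T.reshape(-1, 4) != 0).sum(axis=1))  # all 2
```

The exhaustive search over 1296 candidates is only practical for small blocks; the paper's contribution is an efficient, provable formulation (minimum-cost flow) and a fast approximation suitable for masks that change during training.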

