A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication

11/27/2018
by   Sanghamitra Dutta, et al.

This paper has two contributions. First, we propose a novel coded matrix multiplication technique called Generalized PolyDot codes that advances on existing methods for coded matrix multiplication under storage and communication constraints. This technique uses "garbage alignment," i.e., aligning computations in coded computing that are not part of the desired output. Generalized PolyDot codes bridge Polynomial codes and MatDot codes, trading off recovery threshold against communication cost. Second, we demonstrate that Generalized PolyDot codes can be used for training large Deep Neural Networks (DNNs) on unreliable nodes prone to soft-errors. This requires us to address three additional challenges: (i) the prohibitively large overhead of coding the weight matrices in each layer of the DNN at each iteration; (ii) nonlinear operations during training, which are incompatible with linear coding; and (iii) the absence of an error-free master node, requiring a fully decentralized implementation without any "single point of failure." We allow all primary DNN training steps, namely matrix multiplication, nonlinear activation, Hadamard product, and the update step, as well as the encoding and decoding, to be error-prone. We consider the cases of mini-batch size B = 1 and B > 1, leveraging coded matrix-vector and coded matrix-matrix products, respectively. The problem of DNN training under soft-errors also motivates an interesting probabilistic error model under which a real-number (P, Q) MDS code is shown to correct P - Q - 1 errors with probability 1, compared to ⌊(P - Q)/2⌋ under the more conventional adversarial error model. We also demonstrate that our proposed strategy can provide unbounded gains in error tolerance over a competing replication strategy and a preliminary MDS-code-based strategy under both error models.
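To make the recovery-threshold idea concrete, here is a minimal sketch of MatDot-style coded matrix multiplication, one endpoint of the Generalized PolyDot trade-off described above. A and B are split into m = 2 blocks and encoded as matrix polynomials; each worker computes a single small product, and the true product AB is recovered as the coefficient of x^(m-1) in the product polynomial, so any 2m - 1 = 3 of the 4 workers suffice. This is an illustrative sketch using numpy, not the paper's implementation; the function names and evaluation points are chosen for clarity.

```python
import numpy as np

def matdot_encode(A, B, xs, m=2):
    """Encode A (column-blocks) and B (row-blocks) as polynomial evaluations."""
    A_blocks = np.hsplit(A, m)          # A = [A_0 | A_1 | ...]
    B_blocks = np.vsplit(B, m)          # B stacked as [B_0; B_1; ...]
    tasks = []
    for x in xs:
        # pA(x) = sum_i A_i x^i ;  pB(x) = sum_j B_j x^(m-1-j)
        pA = sum(Ai * x**i for i, Ai in enumerate(A_blocks))
        pB = sum(Bj * x**(m - 1 - j) for j, Bj in enumerate(B_blocks))
        tasks.append((pA, pB))
    return tasks

def matdot_decode(results, xs, m=2):
    """Interpolate the coefficient of x^(m-1), which equals AB."""
    k = 2 * m - 1                                      # recovery threshold
    V = np.vander(np.array(xs[:k]), k, increasing=True)
    inv_row = np.linalg.inv(V)[m - 1]                  # extracts that coefficient
    return sum(c * R for c, R in zip(inv_row, results[:k]))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
xs = [1.0, 2.0, 3.0, 4.0]                  # 4 workers; any 3 results suffice
tasks = matdot_encode(A, B, xs)
results = [pA @ pB for pA, pB in tasks]    # each worker does one small product
C = matdot_decode(results[:3], xs[:3])     # decode from 3 surviving workers
assert np.allclose(C, A @ B)
```

With m = 2, the product polynomial is pA(x)pB(x) = A0·B1 + (A0·B0 + A1·B1)x + A1·B0·x^2; the unwanted terms A0·B1 and A1·B0 are the "garbage" that the alignment confines to the other coefficients. Polynomial codes sit at the other end of the trade-off, with lower communication per worker but a higher recovery threshold.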


Related research:

- Folded Polynomial Codes for Coded Distributed AA^⊤-Type Matrix Multiplication (11/28/2022): In this paper, due to the important value in practical applications, we ...
- CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors (03/04/2019): This work proposes the first strategy to make distributed training of ne...
- Cross Subspace Alignment Codes for Coded Distributed Batch Matrix Multiplication (09/30/2019): The goal of coded distributed matrix multiplication (CDMM) is to efficie...
- Private Matrix Multiplication From MDS-Coded Storage With Colluding Servers (05/03/2022): In this paper, we study the two problems of Private and Secure Matrix Mu...
- Array BP-XOR Codes for Parallel Matrix Multiplication using Hierarchical Computing (04/25/2019): This study presents a novel coded computation technique for parallel mat...
- An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation (11/28/2018): We propose a novel application of coded computing to the problem of the ...
- Numerically Stable Polynomially Coded Computing (03/20/2019): We study the numerical stability of polynomial based encoding methods, w...
