DeepAI AI Chat
Log In Sign Up

Common Subexpression-based Compression and Multiplication of Sparse Constant Matrices

03/26/2023
by   Emre Bilgili, et al.
0

In deep learning inference, model parameters are pruned and quantized to reduce the model size. Compression methods and common subexpression (CSE) elimination algorithms are applied on sparse constant matrices to deploy the models on low-cost embedded devices. However, the state-of-the-art CSE elimination methods do not scale well for handling large matrices. They reach hours for extracting CSEs in a 200 × 200 matrix while their matrix multiplication algorithms execute longer than the conventional matrix multiplication methods. Besides, there exist no compression methods for matrices utilizing CSEs. As a remedy to this problem, a random search-based algorithm is proposed in this paper to extract CSEs in the column pairs of a constant matrix. It produces an adder tree for a 1000 × 1000 matrix in a minute. To compress the adder tree, this paper presents a compression format by extending the Compressed Sparse Row (CSR) to include CSEs. While compression rates of more than 50% can be achieved compared to the original CSR format, simulations for a single-core embedded system show that the matrix multiplication execution time can be reduced by 20%.

READ FULL TEXT
05/20/2017

Sparse Matrix Multiplication On An Associative Processor

Sparse matrix multiplication is an important component of linear algebra...
11/18/2018

Stark: Fast and Scalable Strassen's Matrix Multiplication using Apache Spark

This paper presents a new fast, highly scalable distributed matrix multi...
12/18/2018

MatRox: A Model-Based Algorithm with an Efficient Storage Format for Parallel HSS-Structured Matrix Approximations

We present MatRox, a novel model-based algorithm and implementation of H...
12/22/2022

EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models

With the advent of deep learning application on edge devices, researcher...
01/13/2023

Linear Computation Coding: Exponential Search and Reduced-State Algorithms

Linear computation coding is concerned with the compression of multidime...
06/02/2019

Sparse Matrix to Matrix Multiplication: A Representation and Architecture for Acceleration (long version)

Accelerators for sparse matrix multiplication are important components i...
02/09/2023

Exact computations with quasiseparable matrices

Quasi-separable matrices are a class of rank-structured matriceswidely u...