Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices

03/28/2022
by Paolo Ferragina, et al.

As Machine Learning (ML) techniques generate ever larger data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless compression scheme for real-valued matrices which achieves efficient performance in terms of compression ratio and time for linear-algebra operations. Experiments show that, as a compressor, our tool is clearly superior to gzip and is usually within 20% of xz in compression ratio. In addition, our compressed format supports matrix-vector multiplications in time and space proportional to the size of the compressed representation, unlike gzip and xz, which require full decompression of the compressed matrix. To our knowledge, our lossless compressor is the first to achieve time and space complexities that match the theoretical limit expressed by the k-th order statistical entropy of the input. To achieve further time/space reductions, we propose column-reordering algorithms hinging on a novel column-similarity score. Our experiments on various data sets of ML matrices show that, with a modest preprocessing time, our column reordering can yield a further reduction of up to 16% in memory usage during matrix-vector multiplication. Finally, we compare our proposal against the state-of-the-art Compressed Linear Algebra (CLA) approach, showing that ours always runs at least twice as fast (in a multi-thread setting) and achieves better compressed space occupancy for most of the tested data sets. This experimentally confirms the provably effective theoretical bounds we show for our compressed-matrix approach.
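To make the claim about multiplying in time proportional to the compressed size more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the encoding (each nonzero as a `(value, column)` terminal, rows as symbol sequences, binary grammar rules) and all names such as `grammar_matvec` are assumptions chosen for illustration. The idea it shows is that each grammar symbol's contribution to the product is computed once and then reused wherever that symbol appears.

```python
from typing import List, Tuple

def grammar_matvec(
    terminals: List[Tuple[float, int]],  # symbol i < len(terminals) is (value, column)
    rules: List[Tuple[int, int]],        # rule k defines symbol len(terminals) + k
    rows: List[List[int]],               # each row is a sequence of symbol ids
    x: List[float],
) -> List[float]:
    """Compute y = M x without expanding the grammar (illustrative only).

    Assumes every rule references only symbols with smaller ids,
    so a single left-to-right pass evaluates all of them.
    """
    # Contribution of each terminal: value * x[column].
    weight = [v * x[j] for v, j in terminals]
    # Contribution of each nonterminal: sum of its two children, computed once.
    for left, right in rules:
        weight.append(weight[left] + weight[right])
    # Each output entry is the sum of the symbol contributions in that row.
    return [sum(weight[s] for s in row) for row in rows]

# Hypothetical 3x3 matrix [[1.5, 0, 2.0], [1.5, 0, 2.0], [0, 3.0, 2.0]].
terminals = [(1.5, 0), (2.0, 2), (3.0, 1)]
rules = [(0, 1)]          # symbol 3 expands to symbols 0 and 1
rows = [[3], [3], [2, 1]] # the first two rows share the same nonterminal
x = [1.0, 2.0, 3.0]
y = grammar_matvec(terminals, rules, rows, x)
```

In this sketch the work is one operation per terminal, one per rule, and one per symbol in the compressed rows, so repeated patterns across rows are paid for only once.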
