Sparse Factorization of Large Square Matrices

09/16/2021
by Ruslan Khalitov, et al.

Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in both memory and time, so an economical approximation is needed. Conventional approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint becomes a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. The approximation needs only N(log N)^2 non-zero numbers for an N × N full matrix. We present both non-parametric and parametric ways to find the factorization: in the former, we learn the factorizing matrices directly, and in the latter, we train neural networks to map input data to the non-zero matrix entries. The sparse factorization method is tested on a variety of synthetic and real-world square matrices. The experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. Based on this finding, we use our parametric method as a scalable attention architecture that performs strongly on learning tasks for long sequential data and outperforms the Transformer and several of its variants.
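As a rough illustration of the non-parametric variant, the sketch below fits log2(N) sparse factors whose non-zero positions follow a chord-like pattern (an assumption chosen for illustration; the paper's exact sparsity pattern, initialization, and training procedure may differ), so the factors together carry on the order of N(log N)^2 non-zeros:

import math
import torch

def chord_mask(n: int) -> torch.Tensor:
    # Non-zeros on the diagonal and at (i, (i + 2^k) mod n) for
    # k = 0 .. log2(n) - 1; a chord-style pattern assumed for illustration.
    mask = torch.eye(n, dtype=torch.bool)
    rows = torch.arange(n)
    for k in range(int(math.log2(n))):
        mask[rows, (rows + 2 ** k) % n] = True
    return mask

def sparse_factorize(W: torch.Tensor, n_steps: int = 2000, lr: float = 1e-2):
    # Fit log2(N) masked dense factors so that their product approximates W.
    n = W.shape[0]
    mask = chord_mask(n).float()
    factors = [torch.randn(n, n, requires_grad=True)
               for _ in range(int(math.log2(n)))]
    opt = torch.optim.Adam(factors, lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        approx = torch.eye(n)
        for F in factors:
            approx = approx @ (F * mask)  # mask zeroes all off-pattern entries
        loss = ((approx - W) ** 2).sum()  # squared Frobenius error
        loss.backward()
        opt.step()
    return [(F * mask).detach() for F in factors], loss.item()

# Example: a sparse, typically high-rank random target matrix.
W = torch.randn(64, 64) * (torch.rand(64, 64) < 0.1).float()
factors, err = sparse_factorize(W)
print(f"squared Frobenius error after fitting: {err:.4f}")

Multiplying each factor by the fixed mask inside the forward pass confines gradient updates to the permitted positions, so the factors stay sparse throughout training without any explicit projection step.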


