a-Tucker: Input-Adaptive and Matricization-Free Tucker Decomposition for Dense Tensors on CPUs and GPUs

10/20/2020
by Min Li, et al.

Tucker decomposition is one of the most popular models for analyzing and compressing large-scale tensorial data. Existing Tucker decomposition algorithms usually rely on a single solver to compute the factor matrices and core tensor, and are not flexible enough to adapt to the diversity of input data and hardware. Moreover, to exploit highly efficient GEMM kernels, most Tucker decomposition implementations rely on explicit matricizations, which can introduce extra costs in data conversion and memory usage. In this paper, we present a-Tucker, a new framework for input-adaptive and matricization-free Tucker decomposition of dense tensors. A mode-wise flexible Tucker decomposition algorithm is proposed that can switch between different solvers for the factor matrices and core tensor, and a machine-learning-based adaptive solver selector is applied to automatically cope with variations in both the input data and the hardware. To further improve performance and memory efficiency, we implement a-Tucker in a fully matricization-free manner, without any conversion between tensors and matrices. Experiments with a variety of synthetic and real-world tensors show that a-Tucker substantially outperforms existing works on both CPUs and GPUs.
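For context, the conventional approach that a-Tucker improves upon can be sketched as a higher-order orthogonal iteration (HOOI) with explicit mode-n matricization, where each factor update unfolds the projected tensor into a matrix before a truncated SVD. This is a minimal NumPy illustration of that baseline, not the paper's a-Tucker implementation; the helper names `unfold`, `ttm`, and `tucker_hooi` are illustrative choices, not APIs from the paper.

```python
import numpy as np

def unfold(T, mode):
    # Explicit mode-n matricization: bring `mode` to the front
    # and flatten the remaining modes into columns.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def ttm(T, M, mode):
    # Tensor-times-matrix along `mode` for M of shape (J, I_mode):
    # contracts the I_mode axis of T against the columns of M.
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def tucker_hooi(T, ranks, n_iter=10):
    # Baseline HOOI Tucker decomposition. Each factor update forms an
    # explicit matricization of the projected tensor -- exactly the
    # data-conversion cost a matricization-free scheme avoids.
    N = T.ndim
    # HOSVD initialization: leading singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(N):
            # Project T onto all factors except mode n.
            Y = T
            for m in range(N):
                if m != n:
                    Y = ttm(Y, U[m].T, m)
            # Update factor n from the unfolded projection.
            U[n] = np.linalg.svd(unfold(Y, n),
                                 full_matrices=False)[0][:, :ranks[n]]
    # Core tensor: project T onto all factors.
    G = T
    for m in range(N):
        G = ttm(G, U[m].T, m)
    return G, U

# Usage: decompose an exactly low-rank 5x6x4 tensor and reconstruct it.
rng = np.random.default_rng(0)
G0 = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((d, r)) for d, r in zip((5, 6, 4), (2, 3, 2))]
T = G0
for m, Um in enumerate(factors):
    T = ttm(T, Um, m)

G, U = tucker_hooi(T, (2, 3, 2))
X = G
for m in range(3):
    X = ttm(X, U[m], m)
rel_err = np.linalg.norm(X - T) / np.linalg.norm(T)
```

Note how every factor update calls `unfold`, materializing a reshaped copy of the projected tensor; a matricization-free design instead applies the tensor-times-matrix chains directly on the multi-dimensional layout.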

Related research:

- Tensor Networks for Latent Variable Analysis: Higher Order Canonical Polyadic Decomposition (09/03/2018)
- Efficient Computation of Tucker Decomposition for Streaming Scientific Data Compression (08/31/2023)
- VeST: Very Sparse Tucker Factorization of Large-Scale Tensors (04/04/2019)
- DPar2: Fast and Scalable PARAFAC2 Decomposition for Irregular Dense Tensors (03/24/2022)
- Out-of-Core and Distributed Algorithms for Dense Subtensor Mining (02/04/2018)
- A High-Throughput Solver for Marginalized Graph Kernels on GPU (10/14/2019)
- Fast and Accurate Dual-Way Streaming PARAFAC2 for Irregular Tensors: Algorithm and Application (05/28/2023)
