Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights

by   Shail Dave, et al.

Machine learning (ML) models are widely used in many domains including media processing and generation, computer vision, medical diagnosis, embedded systems, high-performance and scientific computing, and recommendation systems. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular-shaped computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This paper provides a comprehensive survey on how to efficiently execute sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses additional enhancement modules in architecture design and software support; categorizes different hardware designs and acceleration techniques and analyzes them in terms of hardware and execution costs; highlights further opportunities in terms of hardware/software/algorithm co-design optimizations and joint optimizations among described hardware and software enhancement modules. The takeaways from this paper include: understanding the key challenges in accelerating sparse, irregular-shaped, and quantized tensors; understanding enhancements in acceleration systems for supporting their efficient computations; analyzing trade-offs in opting for a specific type of design enhancement; understanding how to map and compile models with sparse tensors on the accelerators; understanding recent design trends for efficient accelerations and further opportunities.


page 1

page 10

page 24

page 25


GPTPU: Accelerating Applications using Edge Tensor Processing Units

Neural network (NN) accelerators have been integrated into a wide-spectr...

A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms

Recent trends in deep learning (DL) imposed hardware accelerators as the...

DPar2: Fast and Scalable PARAFAC2 Decomposition for Irregular Dense Tensors

Given an irregular dense tensor, how can we efficiently analyze it? An i...

Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures

Continual demand for memory bandwidth has made it worthwhile for memory ...

∇SD: Differentiable Programming for Sparse Tensors

Sparse tensors are prevalent in many data-intensive applications, yet ex...

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

Tensor computations overwhelm traditional general-purpose computing devi...

Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More

Deep learning implementations on CPUs (Central Processing Units) are gai...

Please sign up or login with your details

Forgot password? Click here to reset