DeepAI AI Chat
Log In Sign Up

Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation

by   Nicolas Nytko, et al.

Sparse matrix representations are ubiquitous in computational science and machine learning, leading to significant reductions in compute time, in comparison to dense representation, for problems that have local connectivity. The adoption of sparse representation in leading ML frameworks such as PyTorch is incomplete, however, with support for both automatic differentiation and GPU acceleration missing. In this work, we present an implementation of a CSR-based sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix operations, as well as automatic differentiability. We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and performance measurements versus their dense counterparts.


page 1

page 2

page 3

page 4


Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

This paper presents our work toward correct and efficient automatic diff...

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsit...

MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems

Sparse linear algebra kernels play a critical role in numerous applicati...

Optimizing Block-Sparse Matrix Multiplications on CUDA with TVM

We implemented and optimized matrix multiplications between dense and bl...

Dynamic Automatic Differentiation of GPU Broadcast Kernels

We show how forward-mode automatic differentiation (AD) can be employed ...

AutoMat – Automatic Differentiation for Generalized Standard Materials on GPUs

We propose a universal method for the evaluation of generalized standard...

Differentiate Everything with a Reversible Domain-Specific Language

Traditional machine instruction level reverse mode automatic differentia...