Efficient Inference via Universal LSH Kernel

06/21/2021
by Zichang Liu et al.

Large machine learning models achieve unprecedented performance on various tasks and have become the go-to technique. However, deploying these compute- and memory-hungry models in resource-constrained environments poses new challenges. In this work, we propose Representer Sketch, a mathematically provable, concise set of count arrays that approximates the inference procedure with simple hashing computations and aggregations. Representer Sketch builds on the popular Representer Theorem from the kernel literature, hence the name, providing a generic, fundamental alternative for efficient inference that goes beyond popular approaches such as quantization, iterative pruning, and knowledge distillation. A neural network function is transformed into its weighted kernel density representation, which can be estimated very efficiently with our sketching algorithm. Empirically, we show that Representer Sketch achieves up to a 114x reduction in storage requirements and a 59x reduction in computational complexity without any drop in accuracy.
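The abstract describes approximating inference as a weighted kernel density estimate maintained in count arrays indexed by LSH functions. The paper's exact construction is not given here, so as a hedged illustration, the sketch below is a minimal RACE-style count-array estimator using signed-random-projection (SRP) hashing; the class name `RaceSketch` and all parameters (`rows`, `p`) are illustrative choices, not the authors' implementation.

```python
import numpy as np

class RaceSketch:
    """Count-array sketch for LSH kernel density estimation.

    Each of the `rows` rows holds one count array indexed by an
    independent SRP hash built from `p` random hyperplanes, giving
    2**p buckets per row. The mean collision count over rows
    approximates the average SRP-collision kernel between a query
    and the inserted points.
    """

    def __init__(self, dim, rows=50, p=4, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.p = rows, p
        # One set of p random hyperplanes per row.
        self.planes = rng.standard_normal((rows, p, dim))
        self.counts = np.zeros((rows, 2 ** p), dtype=np.int64)
        self.n = 0

    def _buckets(self, x):
        # Sign pattern of the p projections -> integer bucket id per row.
        bits = (self.planes @ x) > 0  # shape (rows, p)
        return bits.astype(np.int64) @ (1 << np.arange(self.p))

    def add(self, x):
        # Insertion touches one counter per row: a hash and an increment.
        self.counts[np.arange(self.rows), self._buckets(x)] += 1
        self.n += 1

    def query(self, q):
        # Density estimate: mean collision frequency across rows.
        hits = self.counts[np.arange(self.rows), self._buckets(q)]
        return hits.mean() / self.n
```

A query therefore costs only `rows` hash evaluations and a small aggregation, independent of the number of inserted points, which is the source of the storage and computation savings the abstract claims.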


Related research:

- A One-Pass Private Sketch for Most Machine Learning Tasks (06/16/2020): Differential privacy (DP) is a compelling privacy definition that explai...
- Sketch based Reduced Memory Hough Transform (11/15/2018): This paper proposes using sketch algorithms to represent the votes in Ho...
- A Count Sketch Kaczmarz Method For Solving Large Overdetermined Linear Systems (04/06/2020): In this paper, combining count sketch and maximal weighted residual Kacz...
- Model Compression Methods for YOLOv5: A Review (07/21/2023): Over the past few years, extensive research has been devoted to enhancin...
- Effective and Sparse Count-Sketch via k-means clustering (11/24/2020): Count-sketch is a popular matrix sketching algorithm that can produce a...
- GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks (01/07/2022): We develop the "generalized consistent weighted sampling" (GCWS) for has...
- A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks (01/24/2021): In this work, to limit the number of required attention inference hops i...
