Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

by Tahseen Rabbani, et al.

Locality-sensitive hashing (LSH) based frameworks have been used to efficiently select the weight vectors in a dense hidden layer that have high cosine similarity to an input, enabling dynamic pruning. While this type of scheme has been shown to improve computational training efficiency, existing algorithms require repeated randomized projection of the full layer weight, which is impractical for compute- and memory-constrained devices. In a distributed setting, deferring LSH analysis to a centralized host (i) is slow if the device cluster is large and (ii) requires access to input data, which is forbidden in a federated context. Using a new family of hash functions, we develop one of the first private, personalized, and memory-efficient on-device LSH frameworks. Our framework enables privacy and personalization by allowing each device to generate hash tables, without the help of a central host, using device-specific hashing hyper-parameters (e.g., number of hash tables or hash length). Hash tables are generated with a compressed set of the full weights, and can be serially generated and discarded if the process is memory-intensive. This allows devices to avoid maintaining (i) the full-sized model and (ii) a large number of hash tables in local memory for LSH analysis. We prove several statistical and sensitivity properties of our hash functions, and experimentally demonstrate that our framework is competitive in training large-scale recommender networks compared to other LSH frameworks that assume unrestricted on-device capacity.
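
To make the selection step concrete, the following is a minimal sketch of LSH-based neuron selection in which each table is generated, queried, and then discarded before the next one is built. It uses classical sign-random-projection hashing (SimHash) as a stand-in; it is not the new hash family proposed in the paper, and it hashes the full weight matrix rather than a compressed set of weights. The function names `simhash` and `active_neurons`, and the defaults for `num_tables` and `hash_length`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def simhash(vectors, projections):
    """Hash each row of `vectors` to an integer via sign random projections.

    Assumption: classical SimHash, used here only as a stand-in for the
    paper's hash family, which is not specified in the abstract.
    """
    bits = (vectors @ projections.T) > 0  # shape: (n_vectors, hash_length)
    place_values = 1 << np.arange(projections.shape[0], dtype=np.int64)
    return bits.astype(np.int64) @ place_values


def active_neurons(x, weights, num_tables=4, hash_length=8, seed=0):
    """Return indices of weight rows that collide with `x` in any table.

    Tables are produced one at a time and dropped after the lookup, so at
    most one table's worth of hash codes is held in memory at once.
    """
    rng = np.random.default_rng(seed)
    selected = set()
    for _ in range(num_tables):
        # Fresh random hyperplanes define one hash table.
        projections = rng.standard_normal((hash_length, weights.shape[1]))
        neuron_codes = simhash(weights, projections)
        input_code = simhash(x[None, :], projections)[0]
        # Neurons sharing the input's bucket are likely to have high
        # cosine similarity to the input.
        selected.update(np.flatnonzero(neuron_codes == input_code).tolist())
        # `projections` and `neuron_codes` are overwritten on the next pass.
    return sorted(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((1000, 64))  # hypothetical dense layer weights
    x = W[42]                            # an input aligned with neuron 42
    print(active_neurons(x, W))
```

In this sketch, `num_tables` and `hash_length` play the role of the device-specific hashing hyper-parameters mentioned above: more tables raise the chance of retrieving similar neurons, while longer hashes make each bucket more selective.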



Cuckoo++ Hash Tables: High-Performance Hash Tables for Networking Applications

Hash tables are an essential data-structure for numerous networking appl...

Hyperdimensional Hashing: A Robust and Efficient Dynamic Hash Table

Most cloud services and distributed applications rely on hashing algorit...

Dash: Scalable Hashing on Persistent Memory

Byte-addressable persistent memory (PM) brings hash tables the potential...

Large-scale Speaker Retrieval on Random Speaker Variability Subspace

This paper describes a fast speaker search system to retrieve segments o...

Learning to Hash for Indexing Big Data - A Survey

The explosive growth in big data has attracted much attention in designi...

Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering

First introduced in 1954, linear probing is one of the oldest data struc...

Efficient algorithms for collecting the statistics of large-scale IP address data

Compiling the statistics of large-scale IP address data is an essential ...
