Accelerating Polynomial Multiplication for Homomorphic Encryption on GPUs

09/02/2022
by Kaustubh Shivdikar, et al.

Homomorphic Encryption (HE) enables users to securely outsource both the storage and computation of sensitive data to untrusted servers. HE offers an attractive solution for security in cloud systems, and lattice-based HE systems are also believed to be resistant to attacks by quantum computers. However, current HE implementations suffer from prohibitively high latency. For lattice-based HE to become viable for real-world systems, its key bottlenecks, particularly polynomial multiplication, must be highly efficient. In this paper, we present a characterization of GPU-based implementations of polynomial multiplication. We begin with a survey of modular reduction techniques and analyze several variants of the widely used Barrett modular reduction algorithm. We then propose a modular reduction variant optimized for 64-bit integer words on the GPU, obtaining a 1.8x speedup over existing comparable implementations. Next, we explore the following GPU-specific improvements for polynomial multiplication, targeted at optimizing latency and throughput: 1) we present a 2D mixed-radix, multi-block implementation of the number theoretic transform (NTT) that results in a 1.85x average speedup over the previous state of the art; 2) we explore shared memory optimizations aimed at reducing redundant memory accesses, further improving speedups by 1.2x; and 3) finally, we fuse the Hadamard product with neighboring stages of the NTT, reducing the twiddle factor memory footprint by 50%. Overall, we achieve speedups of 123.13x and 2.37x over the previous state-of-the-art CPU and GPU implementations of NTT kernels, respectively.
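
The abstract builds on Barrett modular reduction. Below is a minimal Python sketch of the classical textbook Barrett algorithm, offered only for orientation. It is not the 64-bit GPU variant the paper proposes. The function names `barrett_setup`, `barrett_reduce`, and `barrett_mulmod`, as well as the choice of the 61-bit Mersenne prime 2^61 - 1 as the modulus, are illustrative assumptions.

```python
def barrett_setup(q: int) -> tuple[int, int]:
    """Precompute k = bit length of q and mu = floor(2^(2k) / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Return x mod q for 0 <= x < q*q without dividing by q."""
    if not 0 <= x < q * q:
        raise ValueError("this sketch assumes 0 <= x < q^2")
    # Quotient estimate: never exceeds floor(x / q) and is at most 2 below it.
    q_hat = ((x >> (k - 1)) * mu) >> (k + 1)
    r = x - q_hat * q
    while r >= q:  # runs at most twice
        r -= q
    return r


def barrett_mulmod(a: int, b: int, q: int, k: int, mu: int) -> int:
    """Modular multiplication: full product, then Barrett reduction."""
    return barrett_reduce(a * b, q, k, mu)


Q = (1 << 61) - 1  # illustrative 61-bit prime modulus (a Mersenne prime)
K, MU = barrett_setup(Q)
```

Barrett reduction replaces the division in `x % q` with multiplications and shifts by the precomputed constant `mu`. This is why it suits GPUs, where integer division is expensive. With 64-bit words, `x` is a 128-bit product, so the shifted multiply needs the high half of a 64x64-bit product. Python's arbitrary-precision integers hide that detail.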

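Lattice-based HE schemes commonly multiply polynomials in the ring Z_q[x]/(x^n + 1) through the NTT. Both operands are transformed, their Hadamard (element-wise) product is taken, and the result is transformed back. The sketch below illustrates that pipeline under several assumptions. It uses a plain radix-2 Cooley-Tukey NTT rather than the paper's 2D mixed-radix, multi-block kernel. It uses toy parameters n = 64 and q = 7681, and all function names are hypothetical. Unlike the fused kernel described above, it keeps the Hadamard product as a separate pass.

```python
def find_psi(n: int, q: int) -> int:
    """Return a primitive 2n-th root of unity mod prime q (requires 2n | q - 1)."""
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:  # cand^n = -1 implies order exactly 2n
            return cand
    raise ValueError("no primitive 2n-th root of unity for these parameters")


def ntt(a: list[int], omega: int, q: int) -> list[int]:
    """Iterative radix-2 NTT; len(a) is a power of two and omega has order len(a)."""
    n = len(a)
    a = list(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:  # log2(n) butterfly stages
        w_len = pow(omega, n // length, q)  # twiddle factor of order `length`
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % q
                a[k], a[k + half] = (u + v) % q, (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a


def negacyclic_multiply(a: list[int], b: list[int], q: int, psi: int) -> list[int]:
    """Compute a*b in Z_q[x]/(x^n + 1) using NTTs and a Hadamard product."""
    n = len(a)
    omega = psi * psi % q
    twist = [pow(psi, i, q) for i in range(n)]  # psi^i turns cyclic into negacyclic
    A = ntt([x * t % q for x, t in zip(a, twist)], omega, q)
    B = ntt([x * t % q for x, t in zip(b, twist)], omega, q)
    C = [x * y % q for x, y in zip(A, B)]  # Hadamard (element-wise) product
    c = ntt(C, pow(omega, -1, q), q)  # inverse transform, up to a factor of n
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a: list[int], b: list[int], q: int) -> list[int]:
    """O(n^2) reference product in Z_q[x]/(x^n + 1): x^n wraps around to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]


N, Q = 64, 7681  # toy parameters: 7681 is prime and 2N = 128 divides Q - 1
PSI = find_psi(N, Q)
```

As a quick check, multiplying `x` by `x^(N-1)` gives `x^N`, which equals -1 in this ring. The result therefore has `Q - 1` as its constant coefficient. The NTT path costs O(n log n) per transform, compared with O(n^2) for `schoolbook_negacyclic`. That gap is why NTT kernels dominate HE polynomial multiplication.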