PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

03/03/2023
by Weihang Tan, et al.

High-speed long polynomial multiplication is important for homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). The Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. The proposed architecture, PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles needed to process the polynomials. Because this number of clock cycles is inversely proportional to the level of parallelism, the parallel design can enable real-time processing for HE applications. Second, the proposed architecture eliminates the need to permute the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces the buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented, where each modulus can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures are proposed for the pre-processing step, which computes the residual polynomials using the CRT, and for the post-processing step, which combines the residual polynomials. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long polynomial modular multiplication architectures are ideal for applications that require low latency and a high sample rate, as these feed-forward architectures can be pipelined at arbitrary levels.
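The abstract does not give PaReNTT's algorithms. As a hedged software sketch of the underlying arithmetic, the Python below multiplies two length-n polynomials modulo (x^n + 1, q) with a forward NTT, a point-wise product, and an iNTT. It pairs a Cooley-Tukey forward transform (natural-order input, bit-reversed output) with a Gentleman-Sande inverse (bit-reversed input, natural-order output), a common software way to avoid any permutation between the NTT outputs and the iNTT input. This is only an analogy to the second contribution, not the paper's parallel hardware design; the negacyclic modulus x^n + 1, the function names, and the requirement that q be a prime with q ≡ 1 (mod 2n) are assumptions for illustration.

```python
# Negacyclic polynomial multiplication in Z_q[x]/(x^n + 1) via NTT / iNTT.
# Assumptions (illustrative only): n is a power of two, q is prime, q ≡ 1 (mod 2n).

def bit_reverse(k: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def find_psi(n: int, q: int) -> int:
    """Return a primitive 2n-th root of unity psi modulo prime q (psi^n = -1)."""
    if (q - 1) % (2 * n) != 0:
        raise ValueError("q must satisfy q ≡ 1 (mod 2n)")
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:  # cand^n = -1, so its order is exactly 2n
            return cand
    raise ValueError("no primitive 2n-th root of unity found")


def ntt_ct(a, psi_rev, q):
    """Cooley-Tukey NTT: natural-order input, bit-reversed-order output."""
    a = list(a)
    n = len(a)
    t, m = n, 1
    while m < n:
        t //= 2
        for i in range(m):
            j1 = 2 * i * t
            s = psi_rev[m + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t] * s % q
                a[j], a[j + t] = (u + v) % q, (u - v) % q
        m *= 2
    return a


def intt_gs(a, psi_inv_rev, q):
    """Gentleman-Sande iNTT: bit-reversed-order input, natural-order output."""
    a = list(a)
    n = len(a)
    t, m = 1, n
    while m > 1:
        j1, h = 0, m // 2
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t]
                a[j], a[j + t] = (u + v) % q, (u - v) * s % q
            j1 += 2 * t
        t *= 2
        m //= 2
    n_inv = pow(n, -1, q)  # final scaling by n^{-1} mod q
    return [x * n_inv % q for x in a]


def ntt_negacyclic_mul(a, b, q):
    """Compute a*b mod (x^n + 1, q) with no permutation between NTT and iNTT."""
    n = len(a)
    bits = n.bit_length() - 1
    psi = find_psi(n, q)
    psi_inv = pow(psi, -1, q)
    psi_rev = [pow(psi, bit_reverse(k, bits), q) for k in range(n)]
    psi_inv_rev = [pow(psi_inv, bit_reverse(k, bits), q) for k in range(n)]
    A, B = ntt_ct(a, psi_rev, q), ntt_ct(b, psi_rev, q)
    # The point-wise product is taken directly in bit-reversed order.
    C = [x * y % q for x, y in zip(A, B)]
    return intt_gs(C, psi_inv_rev, q)
```

For example, `ntt_negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8], 12289)` returns the coefficients of the product reduced modulo x^4 + 1 and 12289. Because the inverse transform consumes the bit-reversed order that the forward transform produces, no reordering buffer sits between them in this sketch.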
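To illustrate the CRT step (pre-processing into residual polynomials and post-processing to combine them), the hedged sketch below splits each coefficient of a polynomial over Z_Q, where Q is the product of pairwise-coprime moduli q_i, into one residue per modulus, multiplies each residue pair independently, and recombines the results with the textbook CRT formula. The moduli 3329, 7681 and 12289 are illustrative NTT-friendly primes, not moduli taken from the paper; the schoolbook multiplier stands in for an NTT-based one, and all function names are hypothetical.

```python
from math import prod


def negacyclic_mul(a, b, q):
    """Schoolbook a*b mod (x^n + 1, q); a stand-in for an NTT-based multiplier."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:  # x^n ≡ -1, so wrapped terms change sign
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def crt_decompose(poly, moduli):
    """Pre-processing: the residual polynomial of `poly` modulo each q_i."""
    return [[c % qi for c in poly] for qi in moduli]


def crt_combine(residues, moduli):
    """Post-processing: rebuild coefficients mod Q = prod(q_i) with the CRT."""
    Q = prod(moduli)
    n = len(residues[0])
    out = [0] * n
    for r, qi in zip(residues, moduli):
        Mi = Q // qi          # product of the other moduli
        yi = pow(Mi, -1, qi)  # Mi^{-1} mod q_i
        for k in range(n):
            out[k] = (out[k] + r[k] * Mi * yi) % Q
    return out


def crt_poly_mul(a, b, moduli):
    """Multiply in Z_Q[x]/(x^n + 1) through independent residue channels."""
    ra, rb = crt_decompose(a, moduli), crt_decompose(b, moduli)
    rc = [negacyclic_mul(x, y, qi) for x, y, qi in zip(ra, rb, moduli)]
    return crt_combine(rc, moduli)
```

Because Z_Q[x]/(x^n + 1) is isomorphic to the product of the rings Z_{q_i}[x]/(x^n + 1) when the q_i are pairwise coprime, the residue channels never interact, which is what allows them to be processed in parallel between the pre-processing and post-processing steps.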
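The third contribution relies on moduli with few signed power-of-two terms, so that modular reduction needs only shifts and additions. The abstract does not list the chosen moduli; as an assumed example, the sketch below reduces modulo q = 2^k - 2^m + 1 (for instance 7681 = 2^13 - 2^9 + 1) by repeatedly applying 2^k ≡ 2^m - 1 (mod q). The function name and the two-term form are hypothetical, and other signed power-of-two shapes, such as 12289 = 2^13 + 2^12 + 1, would need a differently signed fold.

```python
def reduce_special(x: int, k: int, m: int) -> int:
    """Reduce x >= 0 modulo q = 2^k - 2^m + 1 (0 < m < k) with shifts and adds.

    Uses 2^k ≡ 2^m - 1 (mod q): split x = hi*2^k + lo and fold hi back in.
    """
    if not 0 < m < k:
        raise ValueError("require 0 < m < k")
    if x < 0:
        raise ValueError("x must be non-negative")
    q = (1 << k) - (1 << m) + 1
    mask = (1 << k) - 1
    while x >> k:                       # while x >= 2^k
        hi, lo = x >> k, x & mask
        x = lo + (hi << m) - hi         # lo + hi*(2^m - 1); x shrinks by hi*q
    while x >= q:                       # final conditional subtraction(s)
        x -= q
    return x
```

Each fold preserves the value modulo q and lowers x by exactly hi*q, so the loop terminates with x < 2^k. In a hardware datapath the shifts would typically be fixed wiring and each fold one adder/subtractor stage, so a fixed number of folds, set by the input width, would replace the data-dependent loop used here.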
