Faster multiplication over 𝔽_2[X] using AVX512 instruction set and VPCLMULQDQ instruction

01/25/2022 ∙ by Jean-Marc Robert, et al.

Code-based cryptography is one of the main proposals for post-quantum cryptography, and several protocols of this kind have been submitted to the NIST standardization process. Among them, BIKE and HQC are two of the five alternate candidates selected in the third round of that process in the KEM category. Both schemes multiply large polynomials over binary rings, and because of the polynomial size (from 10,000 to 60,000 bits), this operation is one of the costliest in key generation, encapsulation, and decapsulation. In this work, we revisit the existing constant-time algorithms for arbitrary polynomial multiplication. We explore the different Karatsuba and Toom-Cook constructions in order to determine the best combination for each polynomial degree range, for both the AVX2 and AVX512 instruction sets. This leads to different kernels and constructions in each case. In particular, with AVX512 we use the VPCLMULQDQ instruction, a vectorized binary polynomial multiplication instruction. It performs up to four multiplications of polynomials of degree up to 63, with the four results stored in a single 512-bit word. This reduces the number of retired instructions for the operation by a factor of roughly 3 compared with the AVX2 implementations, while the speedup is up to 39 cycles. These results differ from those estimated in Drucker (Fast multiplication of binary polynomials with the forthcoming vectorized vpclmulqdq instruction, 2018). To illustrate the benefit of the new VPCLMULQDQ instruction, we used the HQC code to evaluate our approaches. When implemented in the HQC protocol, for the security levels 128, 192, and 256, our approaches provide up to 12...
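To make the building blocks named in the abstract concrete, here is a minimal Python sketch of carry-less (binary polynomial) multiplication and a recursive Karatsuba layered on top of it. It is an illustration under stated assumptions, not the authors' implementation. The function names `clmul64`, `clmul_4lanes`, and `karatsuba_gf2` are hypothetical. `clmul_4lanes` only models the idea of four independent 64x64-bit products packed into one 512-bit value. Plain Python integers are neither constant-time nor vectorized, so this shows the arithmetic only.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less product of two 64-bit words, i.e. of two polynomials
    over F_2 of degree <= 63. The result has degree <= 126."""
    assert 0 <= a <= MASK64 and 0 <= b <= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in F_2[X] is XOR, so no carries
        a <<= 1
        b >>= 1
    return result


def clmul_4lanes(xs, ys) -> int:
    """Software model (assumption: illustrative only) of four independent
    64x64 carry-less products packed into one 512-bit value, with lane i
    occupying bits 128*i .. 128*i + 127."""
    assert len(xs) == len(ys) == 4
    packed = 0
    for lane, (x, y) in enumerate(zip(xs, ys)):
        packed |= clmul64(x, y) << (128 * lane)
    return packed


def karatsuba_gf2(a: int, b: int, nbits: int) -> int:
    """Multiply two binary polynomials with fewer than nbits coefficients
    using three half-size products per level instead of four."""
    if nbits <= 64:
        return clmul64(a, b)
    half = (nbits + 1) // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba_gf2(a0, b0, half)
    hi = karatsuba_gf2(a1, b1, half)
    # (a0 + a1)(b0 + b1) + lo + hi equals a0*b1 + a1*b0 in characteristic 2
    mid = karatsuba_gf2(a0 ^ a1, b0 ^ b1, half) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))


# (X + 1)^2 = X^2 + 1 over F_2
assert clmul64(0b11, 0b11) == 0b101
```

At the 64-bit base case, each recursion level produces three independent `clmul64` calls. Grouping such independent products into the lanes of a wider multiplier is what a vectorized instruction makes possible. How the paper actually schedules its kernels is not shown here.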
