Devising efficient post-quantum cryptographic schemes is a primary challenge, as witnessed by the recently started NIST post-quantum standardization initiative. Among post-quantum cryptographic primitives, solutions based on error correcting codes and lattices play a primary role.
In this paper we deal with post-quantum cryptographic primitives based on codes and, in particular, with code-based digital signatures. While it is relatively simple to devise code-based public-key encryption schemes, mostly derived from the well-known McEliece system, the same cannot be said for digital signature schemes. In fact, code-based public-key encryption schemes are characterized by an expansion of the plaintext into the ciphertext, due to the redundancy added by encoding. As a consequence of this expansion, some bit-strings of the same length as a ciphertext do not belong to the codomain of the encryption function. Therefore, it is not possible to exploit the same symmetry present in, e.g., the Rivest-Shamir-Adleman (RSA) scheme to derive a signature scheme from a public-key encryption cryptosystem.
This makes the problem of finding secure yet efficient code-based digital signature schemes a challenging one. Currently, the scheme introduced by Courtois, Finiasz and Sendrier (CFS) is the best known solution to this problem, having withstood seventeen years of cryptanalysis. The main drawback of this scheme is the need to decode any syndrome vector obtained as the hash of the message to be signed, which is addressed by appending a counter to the message or by performing complete decoding. However, this solution forces choices of the code parameters that result in high complexity and may weaken the system security. More recent approaches exploit different families of codes, such as low-density generator matrix (LDGM) codes and codes with other special structures, in order to design more practical code-based digital signature schemes.
In this paper we focus on the former solution and describe a code-based digital signature scheme we name Low-dEnsity generator matrix coDe-bAsed digital signature algorithm (LEDAsig). It implements and improves the LDGM code-based digital signature scheme proposed in earlier work, which has stood since 2013 as a very fast code-based digital signature scheme with very compact public keys. In fact, this system has been implemented on embedded hardware, achieving the fastest implementation of code-based signatures in the open literature, with a signature generation throughput of around signatures per second.
This code-based signature scheme is characterized by very fast key generation, signature generation and signature verification procedures. This speed is achieved by exploiting a special instance of the syndrome decoding problem (SDP) which allows decoding to be reduced to a straightforward vector manipulation. This is done by considering only a subset of all possible syndromes, formed by those having a fixed and low Hamming weight. For this reason, we can say that LEDAsig relies on the sparse SDP which, absent efficient algorithms exploiting the secret structure of the code, is no easier to solve than the general SDP.
The main known attacks against LEDAsig are those already devised against its predecessor, plus recently introduced statistical attacks. As has been shown, the original digital signature scheme can use the same keypair to perform only a limited number of signatures before exposing the system to statistical attacks that may be able to recover the secret key. LEDAsig defines new choices of the system parameters which allow achieving a reasonably long lifespan for each key pair. Besides detailing recent statistical attacks, we carefully analyze all known attacks and provide a parametrization for LEDAsig to achieve given computational security guarantees, taking into account the cost reduction which follows from the use of a quantum computer in the solution of the underlying computationally hard problems. We also provide efficient algorithmic solutions for the implementation of all the LEDAsig functions. These solutions have been included in a publicly available reference software implementation of LEDAsig. Based on this implementation, we carry out performance benchmarks of LEDAsig and provide performance figures that highlight its benefits in terms of signature generation and verification time.
The paper is organized as follows. In Section 2 we describe the scheme and the efficient algorithmic solutions we propose for its implementation. In Section 3 we consider all known attacks that can be mounted against LEDAsig and provide complexity estimates by considering both classical and quantum computers. In Section 4 we design some system instances to achieve given security levels. In Section 5 we assess performance based on the reference implementation of LEDAsig. In Section 6 we provide some concluding remarks.
2 Description of the Scheme
Following the original proposal, in LEDAsig the secret and the public keys are the characteristic matrices of two linear block codes: a private quasi-cyclic low-density generator matrix (QC-LDGM) code and a public quasi-cyclic (QC) code derived from the former. Some background concepts about these codes are recalled in Section 2.1. In the description of the cryptoscheme, two public functions are used: a hash function and a function that converts the hash output into a sparse vector of fixed length and weight. This sparse vector is a public syndrome vector resulting from the signature generation procedure. The output of the conversion function is uniformly distributed over all vectors of the prescribed length and weight, and depends on a parameter which is chosen for each message to be signed and is made public by the signer.
The design of these functions is discussed next, where we provide a procedural description of the main steps of LEDAsig, i.e., key generation, signature generation and signature verification. We also provide some methods to accelerate the generation of the elements of the private key and to guarantee the non-singularity condition which is required for some of the involved matrices. Efficient representations of the matrices involved in LEDAsig are also introduced. In the procedural descriptions we consider the following functions:
randGen: generates distinct integers in ;
matrGen: generates a random binary matrix;
circGen: generates a random circulant matrix with row (and column) weight equal to ;
permGen: generates a random permutation matrix.
We use ⊗ to denote the Kronecker product, while the classical matrix product is denoted explicitly only when the Kronecker product appears in the same equation, and is otherwise omitted. We denote by a_{i,j} the element of a matrix A in the i-th row and j-th column. We use 0 to denote the null matrix and 1 to denote the matrix with all entries equal to one.
2.1 Coding Background
Let F_2^n denote the n-dimensional vector space defined over the binary field F_2. A binary linear block code, denoted as C(n, k), is defined as a bijective linear map between any binary k-tuple (i.e., an information word) and the corresponding binary n-tuple (denoted as codeword). The value n is known as the length of the code, while k denotes its dimension. A generator matrix G (resp. parity-check matrix H) for C(n, k) is a matrix whose row span (resp. kernel) coincides with the set of codewords of C(n, k).
A binary linear block code is said to be LDGM if at least one of its generator matrices is sparse, i.e., has only a small fraction of its entries set to one. LEDAsig uses a secret binary LDGM code with length n and dimension k, characterized by a generator matrix in the systematic form G = [I | V],
where I is the k × k identity matrix and V is a sparse k × r matrix (with r = n − k being the code redundancy). As will be shown next, a special form of V is considered, known as QC form, which makes the LDGM code a QC code as well. The rows of V have fixed Hamming weight, which means that G has constant row weight as well.
Due to their sparse nature, it is very likely that, by adding two or more rows of the generator matrix of an LDGM code, a vector with Hamming weight is obtained. If the linear combination of any group of rows of yields a codeword with weight greater than or equal to , then the LDGM code has minimum distance . This is even more likely if the rows of are chosen in such a way as to be quasi-orthogonal, that is, with a small number of overlapping ones.
The code defined by eq. (1) admits a sparse parity-check matrix in the form H = [V^T | I],
where I is the r × r identity matrix. Due to the sparsity of V, the parity-check matrix in eq. (2) is a sparse matrix as well. Therefore, an LDGM code with generator matrix as in eq. (1) is also a low-density parity-check (LDPC) code.
The special class of LDGM codes used in LEDAsig is that of QC-LDGM codes, having generator and parity-check matrices formed by circulant blocks of size p × p. In fact, the QC property allows reducing the memory needed to store these matrices, and yields important advantages in terms of algorithmic complexity. In the case of a QC-LDGM code, the sparse matrix appearing in eq. (1) and eq. (2) has the following general form
where each block represents either a sparse circulant matrix or a null matrix of size p × p. Hence, in this case the code length, dimension and redundancy are n = n0 p, k = k0 p and r = r0 p, respectively. For the rest of the paper, we will use a superscript to denote QC matrices and, for a given QC matrix, we refer to its circulant block at position (i, j) accordingly.
Since a circulant matrix is completely defined by one of its rows (conventionally the first), storing a binary QC matrix as in eq. (3) requires one p-bit row per circulant block, yielding a reduction in size by a factor p with respect to a matrix in general form. Moreover, given the sparse nature of the matrices, a further size reduction can be achieved by storing only the positions of the nonzero entries in the first row of each circulant block.
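To make the storage figures concrete, the following Python sketch (our illustration, not taken from the reference implementation) represents a circulant block by the support of its first row and expands the full p × p block only when needed:

```python
def expand_circulant(first_row_support, p):
    """Expand a p x p binary circulant block from the positions of the
    ones in its first row; row i is the first row cyclically shifted
    right by i positions."""
    rows = []
    for i in range(p):
        row = [0] * p
        for j in first_row_support:
            row[(i + j) % p] = 1
        rows.append(row)
    return rows

# A sparse circulant block of size p = 7 with first-row support {0, 2, 3}
# is stored as 3 integers instead of 49 bits.
block = expand_circulant([0, 2, 3], 7)
```

Each row of the expanded block has the same Hamming weight as the stored support, which is exactly the property the compact representation relies on.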
The set of p × p binary circulant matrices forms a ring under the operations of modulo-2 matrix addition and multiplication. The zero element is the all-zero matrix, and the identity element is the p × p identity matrix. If we consider the algebra of polynomials over F_2 modulo x^p + 1, i.e., F_2[x]/(x^p + 1), the following map is an isomorphism between this algebra and that of p × p circulant matrices over F_2.
According to eq. (4), any binary circulant matrix is associated to a polynomial in the variable x with coefficients over F_2 which coincide with the entries in the first row of the matrix, i.e., a(x) = a_0 + a_1 x + ... + a_{p−1} x^{p−1}.
Also according to eq. (4), the all-zero circulant matrix corresponds to the null polynomial and the identity matrix to the unitary polynomial. In the same way, the set of QC matrices formed by p × p circulant blocks is a ring under the standard operations of modulo-2 matrix addition and multiplication. The null element corresponds to the all-zero matrix, while the identity element is the QC identity matrix. Matrices in QC form can be efficiently represented through the polynomials associated with their circulant blocks, leading to very compact representations.
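The isomorphism can be exercised directly: multiplying two circulant blocks reduces to multiplying their first-row polynomials modulo x^p + 1 over F_2. The following sketch is illustrative, not the reference implementation:

```python
def poly_mul_mod(a, b, p):
    """Multiply two binary polynomials (coefficient lists, lowest degree
    first) modulo x^p + 1 over F_2.  This is the polynomial counterpart
    of the product of the two p x p binary circulant matrices whose
    first rows are a and b."""
    res = [0] * p
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    # x^(i+j) wraps around since x^p = 1 mod (x^p + 1)
                    res[(i + j) % p] ^= 1
    return res
```

For example, with p = 5, multiplying x by x^4 gives x^5 = 1 modulo x^5 + 1, matching the fact that two cyclic shifts compose into a single shift.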
The LDGM codes used in LEDAsig are described by generator matrices with constant row weight, a feature which is employed to easily obtain a random codeword with a prescribed low weight, equal to a small multiple of the row weight. In fact, since the rows of the generator matrix are sparse, it is very likely that, by adding together a few of them, the Hamming weight of the resulting vector is about the sum of the Hamming weights of its addends, bar some cancellations due to overlapping ones. If the sum of a set of rows does not fit the desired weight, some other row can be added, some row can be replaced, or another combination of rows can be tested, in order to approach the desired weight. In fact, using codewords with a weight slightly smaller than the target may still allow achieving the target security level. In any case, generating a codeword with weight equal or almost equal to the target can be accomplished very quickly.
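The row-summing procedure just described can be sketched as follows (a simplified illustration; the function name and retry policy are ours, not the scheme's specification):

```python
import random

def random_sparse_codeword(G_rows, num_rows, target_weight, max_tries=1000):
    """Pick num_rows random rows of a sparse generator matrix (list of
    0/1 row lists) and add them over F_2; retry with a different row set
    until the resulting codeword weight matches the target."""
    k = len(G_rows)
    n = len(G_rows[0])
    for _ in range(max_tries):
        picked = random.sample(range(k), num_rows)
        c = [0] * n
        for i in picked:
            for j, bit in enumerate(G_rows[i]):
                c[j] ^= bit
        if sum(c) == target_weight:
            return c, picked
    return None, None
```

When the selected rows have few overlapping ones, the weight of the sum is close to the sum of the row weights, so very few retries are needed in practice.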
Based on these considerations, the number of random codewords with weight close to the target which can be easily generated at random from an LDGM code with the given constant row weight, when the target weight is a multiple of it, can be roughly estimated as
2.2 Private Key Generation
The private key in LEDAsig includes the characteristic matrices of an LDGM code with length n, dimension k and co-dimension r. In particular, we consider circulant matrices of size p × p, so that n = n0 p, k = k0 p and r = r0 p. We denote the generator and the parity-check matrices as in eq. (1) and eq. (2), respectively. Because of their systematic forms, these matrices can be represented through the sparse matrix V alone. The public key is a dense QC matrix.
2.2.1 Generation of and
2.2.2 Generation of and
The matrix is a binary matrix, with constant row and column weight equal to . There are several methods for generating such a matrix. We consider the procedure described in Algorithm 2, which allows an efficient computation of .
According to Algorithm 2, and denoting as a diagonal matrix with blocks along the main diagonal, we can write:
Based on eq. (2.2.2), we have
Now, since and are permutation matrices, their inverses correspond to their transposes, yielding
This approach allows achieving significant speedups in the inversion, since the most complex part of the computation is the inversion of a matrix whose size is typically two orders of magnitude smaller than the code length n. The existence of this smaller inverse is sufficient to guarantee that the overall matrix is non-singular. If we choose p odd and such that the corresponding polynomial is irreducible, then the inverse always exists.
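The key fact exploited in the step above, namely that the inverse of a permutation matrix is its transpose, can be checked with a tiny Python sketch (ours, for illustration only):

```python
def permutation_from_indices(perm):
    """Build the permutation matrix P with P[i][perm[i]] = 1."""
    n = len(perm)
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul_gf2(A, B):
    """Matrix product over F_2 (dense list-of-lists representation)."""
    n, m, q = len(A), len(B), len(B[0])
    return [[sum(A[i][t] & B[t][j] for t in range(m)) % 2 for j in range(q)]
            for i in range(n)]
```

Since each row and column of a permutation matrix contains exactly one 1, the product P · P^T is the identity, so "inverting" a permutation costs nothing beyond a transposition.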
2.2.3 Generation of and
The matrix is a matrix obtained as
where the first term is a dense matrix with prescribed rank and the second is a sparse matrix. The density of the sparse term can be considered as a parameter of the system design. In the following, we assume that it has constant row and column weight equal to 1 (i.e., it is a permutation matrix), since this choice has several advantages from the complexity standpoint. In particular, we propose the following construction for the matrices on the r.h.s. of (11):
in which the first two factors are random binary matrices, the next denotes a QC diagonal matrix having circulant permutation matrices along the main diagonal, and the last is a permutation matrix. These matrices are chosen such that the dense matrix has maximum rank; since one of its factors has the given rank, the rank of the product equals that of this factor, and so it cannot be larger. The overall row and column weight of the sparse component will be denoted as indicated in the following. As we show next, the inverse of the resulting matrix can be easily computed, and its existence depends on the choice of the constituent matrices. This is taken into account in Algorithm 3 for their generation.
For the sake of simplicity, let us define and . We exploit the following result to obtain a strategy for performing an efficient inversion of .
Woodbury identity: Given two matrices of compatible sizes, we have
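In its standard general form (stated over a generic field; over F_2 the minus signs disappear, and the paper's exact instantiation may specialize the factor shapes), the identity reads:

```latex
\left( A + UCV \right)^{-1}
  = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
```

Its usefulness here is that inverting a large matrix perturbed by a low-rank term reduces to inverting the large matrix (cheap when it is a permutation) plus a matrix whose size equals the small rank of the perturbation.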
In the case of , we have and , so , and . Using the Woodbury identity, we obtain
In order to facilitate the computation of , let us first consider that . We have
The last equality is justified by the fact that the matrix can be thought of as the composition of vectors having either of two fixed forms, which are thus invariant to permutations. Hence, we have
The product corresponds to a sum of ones; therefore, its value depends only on the parity of their number. Based on these considerations, we obtain
and so we can define the matrix , such that
So, combining these results, regardless of the parity of , we have that
This expression can be further simplified by considering the special structure of the involved matrices, thus obtaining the following expression for that is convenient from the complexity standpoint:
We report the full derivation in Appendix A.
Based on this analysis, we note that choosing an even p simplifies the computation, since it guarantees that the relevant matrix can always be inverted. However, it has been recently shown that using circulant blocks with even size may reduce the security of the systems relying on them. Therefore, it is advisable to choose odd values of p, although in this case the non-singularity of the matrix is no longer guaranteed and more than one attempt may be needed to generate a non-singular instance. We point out that, when the relevant parameter assumes small values (such as the ones we consider in this paper), this choice has a negligible impact on the efficiency of the scheme, since generating the matrix and checking its non-singularity is fast.
2.3 Public key generation
The public key is simply computed as
Exploiting the systematic structure of , we have
2.4 Signature Generation
In order to implement the function introduced in Section 2, let us consider a constant-weight encoding function that takes as input a binary vector of given length and returns a sparse vector of fixed length and weight. In particular, given a message that must be signed, we first compute its digest through a public hash function. The input given to the constant-weight encoding function is the concatenation of the digest with the binary representation of the parameter mentioned above, which can be the value of a counter or a pseudo-random integer. In other words, given a message, this parameter is used to obtain several different outputs from the constant-weight encoding function. This feature is necessary because, as we explain in Section 2.4.2, the output of the encoding function must lie in the kernel of a certain matrix. If, for a given message, the current output does not verify this property, we just change the value of the parameter and try again. The signature of a message consists of a binary string and the chosen value of the parameter. In the signature generation, the former is obtained from a low-weight codeword of the code and an error vector generated from the digest and the parameter through the encoding function.
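The interface of such a constant-weight encoder can be illustrated with the toy sketch below. Everything in it, from the use of SHA-256 to the 8-byte counter encoding and the position-drawing loop, is our illustrative choice and not the scheme's specification; it only shows how a digest plus a counter can deterministically yield a fixed-weight vector:

```python
import hashlib

def constant_weight_encode(message, counter, n, w):
    """Derive a length-n binary vector of Hamming weight exactly w from
    hash(message || counter).  Positions are drawn from an extendable
    digest stream until w distinct positions are collected."""
    positions = set()
    round_ = 0
    while len(positions) < w:
        digest = hashlib.sha256(
            message + counter.to_bytes(8, "big") + round_.to_bytes(4, "big")
        ).digest()
        for i in range(0, len(digest) - 3, 4):
            positions.add(int.from_bytes(digest[i:i + 4], "big") % n)
            if len(positions) == w:
                break
        round_ += 1
    vec = [0] * n
    for pos in positions:
        vec[pos] = 1
    return vec
```

Changing the counter changes every drawn position, which is exactly the retry mechanism the scheme needs when the current output fails the kernel condition.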
2.4.1 Random codeword generation
Each signature is built upon a random sparse codeword with low weight. As we explained in Section 2.1, such a codeword can be easily obtained by summing a small number of rows of the generator matrix. Let us consider the corresponding length-k information sequence; thanks to the systematic form of the generator matrix, we have
This means that we can easily obtain such a codeword by randomly picking a sparse information sequence of suitable weight and computing the corresponding codeword as in eq. (24), picking a different set of rows to be added together if the weight of the sum does not fit.
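With a systematic generator matrix G = [I | V] (notation as reconstructed above), encoding reduces to computing the parity part u·V over F_2 and appending it to the information word u. A minimal sketch:

```python
def encode_systematic(u, V):
    """Encode an information word u (list of bits) with a systematic
    generator matrix G = [I | V]: the codeword is u followed by the
    parity part u * V computed over F_2."""
    r = len(V[0])
    parity = [0] * r
    for i, bit in enumerate(u):
        if bit:
            # XOR in the i-th row of V for every set information bit
            for j in range(r):
                parity[j] ^= V[i][j]
    return u + parity
```

Note that the codeword weight is the weight of u plus the weight of the parity part, so a sparse u and a sparse V together keep the codeword weight low.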
2.4.2 Error vector generation
In order to generate the error vector, we first compute its syndrome as the digest of the message through the encoding function, choosing a value of the parameter such that the relevant product has the same weight as the syndrome. Subsequently, the error vector is obtained from the syndrome through the private key. We point out that the constraint on the weight can be simply satisfied by imposing the kernel condition. Indeed, recalling eq. (12), we have
Since the involved matrix is a permutation matrix, when the product with the sparse component is null, the result just corresponds to a permuted version of the syndrome. This condition can be checked efficiently. First of all, let us write the syndrome as the concatenation of length-p blocks, and decompose the other vector involved in the same way. Through some straightforward computations, it can be verified that the product is null only when the sum of the Hamming weights of the blocks indexed by each row of the relevant matrix is even.
The syndrome is constructed from the digest through the constant-weight encoding function and has fixed weight. An algorithmic way to compute the syndrome and the corresponding error vector is described in Algorithm 4. A parameter to optimize is the maximum value allowed for the counter, which must be sufficiently large to ensure that a value satisfying the kernel condition is found with very high probability. Thus, by increasing it, the probability of a signing failure can be made negligible.
To this end, in the implementation we chose to represent the counter as a fixed-size unsigned integer value. This limits the probability of never finding a suitable counter value to a negligible amount. We remark that the current parametrization of the proposed LEDAsig primitive makes the failure probability negligible for all practical purposes. Once the error vector is obtained, the signature is computed from it as described above.
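Under a simplified model in which each counter value independently yields an acceptable syndrome with some probability q, the chance that all 2^l counter values fail is (1 − q)^(2^l). The helper below (an illustrative model of ours, not part of the scheme) computes the base-2 logarithm of this failure probability:

```python
from math import log2

def failure_exponent(success_prob, counter_bits):
    """log2 of the probability that none of the 2^counter_bits counter
    values yields an acceptable syndrome, assuming independent tries,
    each succeeding with probability success_prob."""
    # (1 - q)^(2^l) = 2^(2^l * log2(1 - q))
    return (2 ** counter_bits) * log2(1 - success_prob)
```

For instance, even a modest per-try success probability of 1/2 combined with an 8-bit counter already drives the failure probability down to 2^-256, which illustrates why the signing failure rate can be made negligible with a small counter.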
2.4.3 Number of different signatures
An important parameter for any digital signature scheme is the total number of different signatures. Computing this number is useful, for example, to verify that collision attacks are infeasible (see Section 3.3). In LEDAsig, a unique signature corresponds to a specific n-bit vector with fixed weight. Only vectors lying in the kernel of the relevant matrix are acceptable: since this matrix has a given rank, its kernel has dimension equal to n minus this rank, and the number of binary vectors in the kernel is 2 raised to this dimension. We suppose that these vectors are uniformly distributed among all vectors of length n. This in turn implies that, considering the n-tuples with the prescribed weight, we expect the same fraction of them to be in the kernel. Thus, the total number of different signatures is
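Under the uniform-spread assumption stated above, the count is the number of weight-w, length-n vectors scaled by the fraction 2^(−rank) of all vectors that lie in the kernel. A small helper (ours, for illustration) computes its base-2 logarithm, which is the quantity relevant to the collision analysis of Section 3.3:

```python
from math import comb, log2

def log2_num_signatures(n, w, rank):
    """log2 of the expected number of weight-w vectors of length n lying
    in the kernel of a matrix with the given rank, assuming kernel
    vectors are uniformly spread: binom(n, w) * 2^(-rank)."""
    return log2(comb(n, w)) - rank
```

Working in the logarithmic domain avoids overflow for cryptographic sizes of n and directly yields the exponent used in the birthday-bound estimates.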
2.5 Signature verification
According to the scheme described above, the signature verification basically coincides with the computation of a new syndrome through the public code and the execution of some checks on the result. The final check consists in verifying that the new syndrome coincides with the one resulting from feeding the message digest to the constant-weight encoding function. These two vectors should coincide because
An algorithmic description of the signature verification procedure is reported in Algorithm 5.
3 Security Analysis
In this section we review the main known attack strategies against LEDAsig and their complexity.
3.1 Decoding attacks
In LEDAsig, an attacker knows that the signature is the sum of an error vector and a codeword of the public code, i.e., a vector whose syndrome through the public code is null. Hence, the signature can be considered as an error vector affecting a codeword of the public code and having the public syndrome as its syndrome. Therefore, an attack strategy consists in exploiting decoding algorithms for general linear codes to recover the error vector from the public data. If this succeeds, then the attacker has to find a codeword of the public code with suitable weight to be added to the recovered vector in order to forge a valid signature.
The problem of finding from is known as SDP. If the SDP admits a unique solution, information set decoding (ISD) algorithms are those achieving the best performance in solving it.
In order to determine whether the SDP has a unique solution or not, we need to estimate the minimum distance of the public code. The public code admits a generator matrix which is also sparse. Hence, the public code contains low-weight codewords, coinciding with the rows of this matrix. Since the sum of any two or more rows gives a codeword with larger weight with overwhelming probability, we can take the row weight as a reliable estimate of the minimum distance of the public code, which will hence be characterized by decoding spheres of the corresponding radius. Since we want to guarantee the uniqueness of the SDP solution, we must impose
In order to satisfy the previous condition, we choose the system parameters accordingly. With this choice, we guarantee that there is no algorithm that can solve the SDP more efficiently than ISD; thus, we consider the ISD work factor (WF) to compute the security level of LEDAsig against decoding attacks.
The ISD approach, which was pioneered by Prange, attempts to perform the decoding of a general linear code more efficiently than an exhaustive search. Subsequent improvements of Prange's algorithm were presented by Lee and Brickell, Leon, and Stern. Among these variants, Stern's algorithm is currently the one best exploiting the speedups provided by quantum computers. In particular, a significant portion of Stern's algorithm can be solved employing Grover's algorithm to reduce the running time to the square root of the one needed for the computation on a classical platform. By contrast, when execution on classical computers is considered, the most efficient ISD variant turns out to be the Becker-Joux-May-Meurer (BJMM) algorithm, which is part of a family of results on the subject [21, 22, 23, 24]. All the aforementioned approaches have a running time growing exponentially in the effective key size of the scheme (a function of the number of errors, code size and rate), regardless of the availability of a quantum computer.
As a consequence, the security levels against attackers performing a decoding attack (DA) with classical computers have been estimated by considering the WF of the BJMM algorithm, while the security levels against quantum computer-equipped attackers were computed taking into account Stern’s algorithm.
We defend LEDAsig from DAs employing parameters which prevent the syndrome decoding from succeeding given a computational power bounded by the desired security level. To this end, we take into account the fact that the nature of the QC codes employed in LEDAsig provides a speedup by a factor with respect to the running time of the ISD algorithm employed to perform decoding of a general linear code .
3.1.1 Quantum Stern’s algorithm
Considering the fact that Stern's algorithm is the one best suited for quantum computer execution, and will thus be employed to determine the parameters of LEDAsig, we briefly summarize the relevant results, describing how the application of Grover's algorithm to ISD algorithms can be taken into account when computing the complexity of key recovery attacks and DAs.
An ISD algorithm takes as input a code with given length and dimension, and tries to find a codeword with a given weight or, equivalently, an error vector with a given weight, starting from the code and the corresponding syndrome of the error through the code. In LEDAsig, employing ISD to perform general decoding means having it act on the public code, trying to correct an error vector of the combined weight.
The basic structure of each ISD algorithm is essentially the same, and relies on the identification of an information set, that is, guessing a set of error-free positions in the error vector, corresponding to a set of linearly independent columns of the generator matrix of the code. Recovering the entries of the error vector affecting this set is enough to reconstruct the whole error vector. The algorithm must be run iteratively, and each iteration has a probability of success; the expected number of iterations that makes the attack successful is the reciprocal of this probability. The success probability is obtained as the product of two terms: the probability that an iteration of ISD has selected a set of linearly independent vectors, and the probability that the error vector entries affecting the selected set can be recovered. It can be proven that the former converges to a constant as the size of the binary matrix being inverted increases, while for the latter we have
according to , where and are parameters which influence the complexity of the algorithm and must be optimized to minimize the value of .
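The 1/p_succ iteration count can be illustrated in the simplest case: in plain Prange ISD an iteration succeeds when all k selected positions are error-free, which for a weight-t error happens with probability binom(n−t, k)/binom(n, k). The sketch below (ours; Stern's algorithm refines this considerably) computes the resulting expected iteration count:

```python
from math import comb

def prange_expected_iterations(n, k, t):
    """Expected number of iterations of plain Prange ISD on a code of
    length n and dimension k with a weight-t error: one iteration
    succeeds when the k selected positions all avoid the t errors."""
    p_succ = comb(n - t, k) / comb(n, k)
    return 1 / p_succ
```

Plugging in cryptographic parameters shows the characteristic exponential growth of the iteration count in t, which is the quantity the square-root Grover speedup acts upon.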
Taking into account the speedup following from the application of Grover’s algorithm to Stern’s algorithm, it follows that the algorithm is successful after performing only iterations on average, instead of . Let us define:
as the cost in qubit operations of decoding the input qubits to the inputs of the classical algorithm which must be performed whenever an iteration is completed on the quantum computer;
as the number of bit operations needed to perform an iteration of the classical Stern’s algorithm;
as the cost of inverting the matrix obtained with the columns selected during the iteration; in fact, since a quantum implementation of Stern's algorithm must be performed entirely with reversible operations, skipping an iteration is not possible, even if the selected columns do not correspond to an information set (i.e., they are not linearly independent).
By taking the conservative assumption that a qubit operation has the same cost as a bit operation, it is possible to express the amount of operations required to execute Stern's algorithm on a quantum computer as
Estimating the actual value of this cost may be very hard, since it depends on the size of the input given to the function. For example, some input parameters can be fixed (in this case, the number of bits needed to represent the input decreases) but, at the same time, the resulting value might get lower (since, in this case, we might not consider an optimal input choice). While published estimates put this cost in a wide range, we conservatively consider the lower end. Finally, to compute the two remaining computational costs, we refer to the following expressions
We point out that, for the cases we are interested in, the values of (28) depend only slightly on the neglected term, so we can conservatively neglect it without significant variations in the attack complexity.
3.1.2 BJMM algorithm complexity
As already mentioned, when only classical computers are available, the most efficient ISD algorithm turns out to be the BJMM algorithm. A precise estimate of the WF of this algorithm in the finite-length regime is available, and it has been used to compute the WF of attacks based on ISD against the proposed instances of LEDAsig, when performed with classical computers. While the complete expression of the computational complexity of the BJMM algorithm is rather involved, a simple, approximate but fairly intuitive expression for it has also been reported.
3.2 Key recovery attacks
An attacker could mount a key recovery attack (KRA) against LEDAsig, aimed at obtaining the private key. A potential vulnerability in this sense comes from the use of LDGM codes: these codes offer the advantage of having a predictable (and sufficiently high) number of codewords with a moderately low weight, and of making their random selection very easy for the signer. On the other hand, as pointed out in the previous section, the public code is characterized by low-weight codewords. Since the secret generator matrix has k rows, and summing any two of them gives higher-weight codewords with overwhelming probability, we can consider that the multiplicity of these low-weight codewords in the public code equals the number of rows.
It is possible to show that the low-weight codeword finding problem is equivalent to the general linear code decoding problem, thus allowing ISD to be retrofitted to this task too. Thus, rows of the secret matrix might be recovered using ISD algorithms to search for low-weight codewords in the public code.
We assume that knowing one row of can be enough to recover the whole matrix, even if this is a very conservative approach. Taking into account the multiplicity of low weight codewords, we consider a speedup factor of , with respect to the application of ISD to a general code.
As already explained in Section 3.1, in the case of a quantum computer-equipped attacker, the best ISD algorithm is Stern’s algorithm, (described in Section 3.1.1), while in the case of a classical computer, the best solution is the BJMM algorithm (described in Section 3.1.2).
Another possible vulnerability comes from the fact that an attacker could obtain the vector space generated by the secret matrix, as well as its dual space, by observing public syndromes. Hence, we can suppose that an attacker knows a matrix generating the same space. The attacker also knows that the public code admits any non-singular generator matrix of a certain form. The matrix corresponding to the trivial choice is likely to be the most sparse one among them, and it can be attacked by searching for low-weight codewords in the public code, as we have already observed. On the other hand, knowing the generated space does not help to reduce the complexity of attacking any of these matrices, hence it cannot be exploited by an attacker to perform a KRA. A newly devised attack which is more efficient in recovering the secret key is instead targeted against another secret matrix, and will be discussed in Section 3.7.
3.3 Collision attacks
As for any other hash-and-sign scheme, classical collision birthday attacks represent a threat for LEDAsig. Since the system admits a bounded number of different signatures, given by (25), it is sufficient to collect on the order of the square root of this number of signatures to have a high probability of finding a collision with a classical computer. Hence, the security level reached by the system under classical computing cannot exceed half the binary logarithm of the number of signatures.
If we consider an attacker equipped with a quantum computer, we must take into account the BHT algorithm, implying that the security level cannot exceed one third of the binary logarithm of the number of signatures.
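Both bounds can be captured in one small helper (ours, for illustration): the classical birthday bound halves the exponent of the signature space, while the quantum BHT bound divides it by three.

```python
def collision_security_bits(log2_num_signatures):
    """Upper bounds on the security level (in bits) imposed by collision
    search over a space of 2^log2_num_signatures signatures: the
    classical birthday bound is exponent / 2, while the quantum BHT
    algorithm gives exponent / 3."""
    return log2_num_signatures / 2, log2_num_signatures / 3
```

For example, guaranteeing 128-bit security against a quantum collision search requires the signature space exponent from (25) to be at least 384.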
3.4 Forgery attacks based on right-inverse matrices
In order to forge signatures, an attacker could search for a right-inverse matrix of the public key. If the signature were dense, it would be easy to find a right-inverse matrix able to forge it. In fact, provided that the relevant matrix is invertible, a right-inverse can be directly constructed from it. The matrix