The code-based cryptosystems introduced by McEliece and Niederreiter are among the oldest and most studied post-quantum public-key cryptosystems. They are commonly built upon a family of error-correcting codes for which an efficient decoding algorithm is known. The security of these systems is based on the hardness of the so-called syndrome decoding problem (SDP), i.e., the problem of decoding a random linear code, which has been proven to be NP-hard. The best SDP solvers are known as information set decoding (ISD) algorithms and were introduced in 1962 by Prange; improved through the years (see [17, 26, 6] for some well-known variants), these algorithms are characterized by exponential complexity, even when considering adversaries equipped with quantum computers. Because of their well-studied and assessed security, code-based cryptosystems are nowadays considered among the most promising candidates for the post-quantum world.
In the above schemes, and others of the same type, the private key is the representation of a code whose parameters are chosen in such a way as to guarantee decoding of a given amount of errors, which are intentionally introduced in the plaintext during encryption. The public key is obtained through linear transformations of the secret key, with the aim of masking the structure of the secret code. In the original McEliece proposal, Goppa codes were used: on the one hand, this choice leads to well-assessed security (the original proposal is still substantially unbroken); on the other hand, the corresponding public keys do not allow for any compact representation, and thus have very large sizes.
A well-known way to address the latter issue is that of adding some geometrical structure to the code, in order to guarantee that the public key admits a compact representation. The use of quasi-cyclic (QC) codes with a sparse parity-check matrix naturally fits this framework: the sparsity of the parity-check matrix allows for efficient decoding techniques, while the quasi-cyclic structure guarantees compactness in the public key. In such a context, the additional geometrical structure can be added without exposing the secret code: QC low-density parity-check (QC-LDPC) and QC moderate-density parity-check (QC-MDPC) code-based cryptosystems have been extensively studied in recent years [2, 19, 5, 4, 1] and currently achieve very small public keys. However, unlike the bounded-distance decoders used for algebraic codes (like the aforementioned Goppa codes), the iterative decoders used for sparse parity-check codes are not characterized by a deterministic decoding radius and, thus, decoding might fail with some probability, called the decoding failure rate (DFR).
Such a feature is crucial, since it has been shown how this probabilistic nature of the decoder actually exposes the system to cryptanalysis techniques based on the observation of the decryption phase. State-of-the-art attacks of this kind are commonly called reaction attacks, when based on decoding failure events [15, 12, 13, 22], or side-channel attacks, when based on information such as the duration of the decoding phase (in this case we properly speak of timing attacks) or other quantities [11, 10, 21]. All these previous techniques exploit the QC structure of the code and aim at recovering some characteristics of the secret key by performing a statistical analysis on a sufficiently large amount of collected data. The rationale is that many quantities that are typical of the decryption procedure depend on a statistical relation between some properties of the secret key and the error vector that is used during encryption. Thus, after observing a sufficiently large number of decryption instances, an adversary can exploit the gathered information to reconstruct the secret key, or an equivalent version of it. The reconstruction phase is commonly very efficient, unless some specific choices in the system design are made [24, 25] which, however, may have some significant drawbacks in terms of public key size. All the aforementioned attack techniques are instead prevented if the DFR is negligible and the algorithm is implemented in constant time. Nevertheless, in their current state, these solutions are far from being practical and efficient, and the use of ephemeral keys (which means that each key-pair is refreshed after just one decryption) is necessary to make these systems secure [1, 5].
In this paper we study reaction and timing attacks, and we show that information leakage in the decoding phase can actually be related to the number of overlapping ones between columns of the secret parity-check matrix. Furthermore, we show that all attacks of this kind can be analyzed as a unique procedure, which can be applied to recover information about the secret key, regardless of the code structure. Such an algorithm, when applied to a QC code, makes it possible to recover an amount of information greater than that retrievable through previously published attacks. Moreover, we provide an approximate model that allows predicting the behaviour of the decoder in the first iteration with good accuracy. This model justifies the phenomenon that is at the basis of all the aforementioned attacks and can even be used to conjecture new attacks. Our results are confirmed by numerical simulations and reinforce the need for constant-time decoders, with constant power consumption and negligible DFR, in order to allow for the use of long-lasting keys in these systems.
The paper is organized as follows: Section II describes the notation used throughout the manuscript and provides some basic notions about cryptosystems based on sparse parity-check codes. In Section III we summarize state-of-the-art reaction and timing attacks, and present a general algorithm that can be used to attack any sparse parity-check code. An approximate model for the analysis of the first iteration of the BF decoder is presented in Section IV. In Section V we describe some additional sources of information leakage that can be used by an opponent to mount a complete cryptanalysis of the system. Finally, in Section VI we draw some conclusive remarks.
We represent matrices and vectors through bold capital and small letters, respectively. Given a matrix $\mathbf{A}$, we denote as $\mathbf{a}_i$ its $i$-th column and as $a_{i,j}$ its element in position $(i,j)$. Given a vector $\mathbf{a}$, its $i$-th entry is referred to as $a_i$; its support is denoted as $S(\mathbf{a})$ and is defined as the set containing the positions of its non-null entries. The vector $\mathbf{0}_n$ corresponds to the all-zero $n$-tuple; the function returning the Hamming weight of its input is denoted as $\mathrm{wt}(\cdot)$.
II-A LDPC and MDPC code-based cryptosystems
The schemes we consider are built upon a code $\mathscr{C}$ described by a sparse parity-check matrix $\mathbf{H} \in \mathbb{F}_2^{r \times n}$, where $n$ is the code blocklength. We here focus on the case of regular matrices, in which all the rows and all the columns have constant weights, respectively equal to $w$ and $v$. The code is then $(v,w)$-regular, and is commonly called a low-density parity-check (LDPC) code if the column weight $v$ is a small constant, or a moderate-density parity-check (MDPC) code if $v$ grows as $O(\sqrt{n})$. Regardless of such a distinction, these two families of codes actually have similar properties: they can be decoded with the same decoding algorithms and are thus exposed in the same way to the attacks we consider. So, from now on we will not distinguish between these two families, and just refer to $(v,w)$-regular codes.
In the McEliece framework, the public key is a generator matrix $\mathbf{G}$ for $\mathscr{C}$; a ciphertext is obtained as
$$\mathbf{c} = \mathbf{m}\mathbf{G} + \mathbf{e}, \quad (1)$$
where $\mathbf{m}$ is the plaintext and $\mathbf{e}$ is a randomly generated $n$-tuple with weight $t$. Decryption starts with the computation of the syndrome $\mathbf{s} = \mathbf{H}\mathbf{c}^T = \mathbf{H}\mathbf{e}^T$, where $T$ denotes transposition. Then, an efficient syndrome decoding algorithm is applied on $\mathbf{s}$, in order to recover $\mathbf{e}$.
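As an illustration, the encryption and syndrome-decoding steps can be sketched with a toy [7,4] Hamming code standing in for the large sparse codes used in practice; the matrices, variable names and single-error decoder below are ours, chosen only to keep the example self-contained:

```python
import numpy as np

# Toy [7,4] Hamming code; G and H satisfy G H^T = 0 over GF(2),
# and G is systematic (first 4 positions carry the plaintext).
G = np.array([[1,0,0,0,0,1,1],
              [0,1,0,0,1,0,1],
              [0,0,1,0,1,1,0],
              [0,0,0,1,1,1,1]])
H = np.array([[0,1,1,1,1,0,0],
              [1,0,1,1,0,1,0],
              [1,1,0,1,0,0,1]])

def encrypt(m, e):
    """McEliece-style encryption: c = mG + e over GF(2)."""
    return (m @ G + e) % 2

def decrypt(c):
    """Syndrome decoding for a single error (t = 1): the syndrome
    s = H c^T = H e^T equals the column of H indexed by the error position."""
    s = (H @ c) % 2
    e_hat = np.zeros(7, dtype=int)
    if s.any():
        pos = next(j for j in range(7) if np.array_equal(H[:, j], s))
        e_hat[pos] = 1
    m_hat = (c + e_hat) % 2
    return m_hat[:4], e_hat   # systematic G: first 4 bits are the plaintext

m = np.array([1, 0, 1, 1])
e = np.array([0, 0, 0, 0, 1, 0, 0])   # weight-1 intentional error
m_hat, e_hat = decrypt(encrypt(m, e))
```

In the real schemes $t$ is much larger than 1 and the syndrome is processed by an iterative decoder; the column-matching step here only illustrates the role played by the syndrome.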
In the Niederreiter formulation, the public key is a parity-check matrix $\mathbf{H}' = \mathbf{S}\mathbf{H}$ for $\mathscr{C}$, where $\mathbf{S}$ is a dense non-singular matrix. The plaintext $\mathbf{m}$ is converted into an $n$-tuple $\mathbf{e}$ with weight $t$ by means of an invertible mapping
$$\mathbf{e} = \varphi(\mathbf{m}), \quad (2)$$
where $\mathcal{M}$ is the space of all possible plaintexts $\mathbf{m}$. The ciphertext is then computed as
$$\mathbf{c} = \mathbf{H}'\mathbf{e}^T. \quad (3)$$
Decryption starts with the computation of $\mathbf{s} = \mathbf{S}^{-1}\mathbf{c} = \mathbf{H}\mathbf{e}^T$; then, an efficient syndrome decoding algorithm is applied on $\mathbf{s}$, in order to recover $\mathbf{e}$, from which the plaintext is reconstructed by inverting (2).
Regardless of the particular system formulation we are considering (McEliece or Niederreiter), the decryption phase relies on a syndrome decoding algorithm, applied on the syndrome of a weight-$t$ error vector. Since, in the attacks we consider, information is leaked during the decoding phase, we will not distinguish between the McEliece and the Niederreiter formulations in the following.
The decoding algorithm must show a good trade-off between complexity and DFR; for this reason, a common approach is that of relying on the so-called bit-flipping (BF) decoders, whose principle was introduced by Gallager. The description of a basic BF decoding procedure is given in Algorithm 1.
The decoder goes through a maximum number of iterations $i_{\max}$, and at each iteration it exploits a likelihood criterion to estimate the error vector. Outputs of the decoder are the estimate $\hat{\mathbf{e}}$ of the error vector and a boolean value reporting events of decoding failure. When $\hat{\mathbf{e}} = \mathbf{e}$, decoding is successful; if $\hat{\mathbf{e}} \neq \mathbf{e}$, we have encountered a decoding failure. So, clearly, the DFR can be expressed as the probability $\Pr[\hat{\mathbf{e}} \neq \mathbf{e}]$. The likelihood criterion is based on a threshold $b$ (line 11 of the algorithm), which, in principle, might also vary through the iterations; all the simulation results we show in this paper refer to the simple case in which the threshold is kept constant throughout the decoding procedure. In particular, in the simulations we have run, the values of $b$ and $i_{\max}$ have been chosen empirically. Our analysis is general and can easily be extended to decoders other than those considered here. Indeed, many different decoders have been analyzed in the literature and, as for the outcome of reaction and timing attacks, there is no meaningful difference between them. This strongly hints that such attacks are possible because of the probabilistic nature of the decoder, and are only slightly affected by the particular choice of the decoder and its settings. However, the analysis we provide in Section IV, which describes the decoder behaviour in the first iteration, takes into account the effect of the threshold value.
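The out-of-place BF procedure described above can be sketched as follows; this is a simplified rendition of Algorithm 1, with function and variable names of our choosing:

```python
import numpy as np

def bf_decode(H, s, b, imax):
    """Out-of-place bit-flipping decoder, in the spirit of Algorithm 1.
    H: r x n binary parity-check matrix; s: syndrome of the error vector;
    b: flipping threshold; imax: maximum number of iterations.
    Returns (e_hat, failure_flag)."""
    e_hat = np.zeros(H.shape[1], dtype=int)
    s = s.copy()
    for _ in range(imax):
        if not s.any():
            return e_hat, False              # zero syndrome: success
        sigma = H.T @ s                      # unsatisfied checks per bit
        flips = np.flatnonzero(sigma >= b)   # likelihood criterion
        if flips.size == 0:
            break                            # no bit reaches the threshold
        e_hat[flips] ^= 1
        # out-of-place update: syndrome recomputed once, after all flips
        s = (s + H[:, flips].sum(axis=1)) % 2
    return e_hat, bool(s.any())
```

On a small parity-check matrix, a single error is corrected in one iteration when $b$ equals the weight of the corresponding column; with many errors, several iterations may be needed, and the residual syndrome at exit signals a failure.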
We point out that Algorithm 1 is commonly called an out-of-place decoder, as the syndrome is updated only after the whole set of bits to be flipped is computed. A different procedure is that of in-place decoders, in which the syndrome is updated every time a bit is estimated as error affected (i.e., after the instruction in line 11). In this paper we only focus on out-of-place decoders. The reason is that the attacks we consider seem to be emphasized when in-place decoders are used [10, 21]. However, even if a careful analysis is needed, it is very likely that our results can be extended to in-place decoders as well.
III A general framework for reaction and timing attacks
In this section we describe a family of attacks based on statistical analyses, namely statistical attacks. This family includes reaction attacks, in which data is collected through the observation of Bob’s reactions, and side-channel attacks. A statistical attack of the types here considered can be described as follows.
Let us consider a public-key cryptosystem with private and public keys $sk$ and $pk$, respectively, and security parameter $\lambda$ (i.e., the best attack on the system has complexity $2^\lambda$). We denote as Decrypt a decryption algorithm that, given a ciphertext and $sk$ as inputs, returns either the plaintext or a decryption failure. We define as $\mathcal{O}$ an oracle that, queried with a ciphertext, runs Decrypt and returns some metrics that describe the execution of the decryption algorithm. More details about the oracle's replies are provided next. An adversary, which is given $pk$, queries the oracle with $N$ ciphertexts; we denote as $y_i$ the oracle's reply to the $i$-th query $\mathbf{c}_i$. The adversary then runs an algorithm $\mathcal{A}$ that takes as inputs $pk$ and the pairs $(\mathbf{c}_i, y_i)$ of oracle queries and replies, and returns a candidate key $sk'$. The algorithm $\mathcal{A}$ models the procedure that performs a statistical analysis of the gathered data and reconstructs the secret key, or an equivalent version of it. The time complexity of this whole procedure can be approximated as
$$C \approx N C_q + C_{\mathcal{A}},$$
where $C_q$ corresponds to the average number of operations performed for each query and $C_{\mathcal{A}}$ is the complexity of executing the algorithm $\mathcal{A}$. The adversary is then challenged with a randomly generated ciphertext $\mathbf{c}^*$, corresponding to a plaintext $\mathbf{m}^*$. We consider the attack successful if $C < 2^\lambda$ and the probability of Decrypt$(\mathbf{c}^*, sk')$ being equal to $\mathbf{m}^*$ is not negligible (i.e., larger than $2^{-\lambda}$).
We point out that this formulation is general, since it does not distinguish between the McEliece and Niederreiter cases. In the same way, the private and public keys might be generic. For example, this model also describes reaction attacks against LEDA cryptosystems, in which the secret key consists of $\mathbf{H}$ and an additional sparse matrix $\mathbf{Q}$.
The above model allows for taking into account many kinds of attacks, depending on the oracle’s reply. For instance, when considering attacks based on decryption failures, the oracle’s reply is a boolean value which is true in case of a failure and false otherwise. When considering timing attacks based on the number of iterations, then the oracle’s reply corresponds to the number of iterations run by the decoding algorithm.
In this paper we focus on systems with security against chosen-ciphertext attacks (CCA), that is, the case in which a proper conversion is applied to the McEliece/Niederreiter cryptosystem in order to achieve CCA security. In our attack model, this corresponds to assuming that the oracle queries are all randomly generated, i.e., the error vectors used during encryption can be seen as randomly picked elements from the ensemble of all $n$-tuples with weight $t$. As opposed to the CCA case, in the chosen-plaintext attack (CPA) case the opponent is free to choose the error vectors used during encryption: from the adversary's standpoint, the CPA assumption is clearly more optimistic than that of CCA, and leads to improvements in the attack [15, 21]. Obviously, all results we discuss in this paper can be extended to the CPA case.
One final remark is about the schemes we consider: as shown in [24, 25], the complexity of the reconstruction algorithm can be increased with proper choices in the structure of the secret key. Basically, in these cases the adversary can gather information about the secret key, but cannot efficiently use this information to reconstruct the secret key, or to obtain an equivalent version of it. In this paper we do not consider such approaches and we assume that the reconstruction algorithm always runs in a feasible time, as occurs in [15, 12].
III-A State-of-the-art statistical attacks
Modern statistical attacks [15, 10, 12, 13, 21] are specific to the sole case of QC codes having the originally proposed structure, which are defined through a secret parity-check matrix in the form
$$\mathbf{H} = \left[ \mathbf{H}_0, \mathbf{H}_1, \ldots, \mathbf{H}_{n_0-1} \right], \quad (5)$$
where each $\mathbf{H}_i$ is a $p \times p$ circulant matrix of weight $v$ and $n_0$ is a small integer. Thus, the corresponding code is a $(v, n_0 v)$-regular code.
All existing statistical attacks are focused on guessing the existence (or absence) of some cyclic distances between symbols in $\mathbf{H}$.
In particular, an adversary aims at recovering the following quantities.
Distance spectrum: given a vector $\mathbf{a}$, with support $S(\mathbf{a})$ and length $p$, its distance spectrum is defined as
$$\mathrm{DS}(\mathbf{a}) = \left\{ \min\left\{ |i-j|,\ p-|i-j| \right\} \ :\ i, j \in S(\mathbf{a}),\ i \neq j \right\}.$$
We say that a distance $d$ has multiplicity $\mu_d$ if there are $\mu_d$ distinct pairs in $S(\mathbf{a})$ which produce the same distance $d$.
Basically, the distance spectrum is the set of all distances with multiplicity larger than zero. It can easily be shown that all the rows of a circulant matrix are characterized by the same distance spectrum; thus, given a circulant matrix $\mathbf{A}$, we denote the distance spectrum of any of its rows (say, the first one) as $\mathrm{DS}(\mathbf{A})$.
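The distance spectrum, with multiplicities, can be computed directly from the support of a row; the small helper below is ours, using the cyclic distance $\min\{|i-j|, p-|i-j|\}$:

```python
from itertools import combinations
from collections import Counter

def distance_spectrum(support, p):
    """Distance spectrum, with multiplicities, of a length-p vector given
    by its support; cyclic distance d(i, j) = min(|i - j|, p - |i - j|)."""
    return dict(Counter(min(abs(i - j), p - abs(i - j))
                        for i, j in combinations(sorted(support), 2)))

# First row of a weight-3 circulant block: support {0, 2, 5}, p = 11.
# The three pairs give distances 2, 5 and 3, each with multiplicity 1.
spectrum = distance_spectrum({0, 2, 5}, 11)
```

Since every row of a circulant matrix is a cyclic shift of the first one, applying this helper to the first row yields the spectrum of the whole block.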
Statistical attacks proposed in the literature aim at estimating the distance spectra of the circulant blocks in the secret $\mathbf{H}$, and are based on the observation that some quantities that are typical of the decryption procedure depend on the number of common distances between the error vector and the rows of the parity-check matrix. In particular, the generic procedure of a statistical attack on a cryptosystem whose secret key is in the form (5) is described in Algorithm 2; we have called the algorithm Ex-GJS in order to emphasize the fact that it is an extended version of the original GJS attack, which was only focused on a single circulant block in $\mathbf{H}$. Our algorithm is a generalization of that procedure, in which all the circulant blocks in $\mathbf{H}$ are taken into account. We present this algorithm in order to show the maximum amount of information that state-of-the-art statistical attacks allow recovering.
The error vector used for each query is expressed as $\mathbf{e} = [\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_{n_0-1}]$, where each block $\mathbf{e}_j$ has length $p$. The estimates $a_d^{(j)}$ and $v_d^{(j)}$ gathered by the algorithm are then used by the adversary to guess the distance spectra of the blocks in the secret key. Indeed, let us define $E_d^{(j)}$ as the ensemble of all error vectors having length $n$, weight $t$ and such that they exhibit a distance $d$ in the distance spectrum of the $j$-th length-$p$ block. Then, depending on the meaning of the oracle's reply, the ratios $a_d^{(j)}/v_d^{(j)}$ correspond to the estimate of the average value of some quantity, when the error vector belongs to $E_d^{(j)}$. For instance, when considering attacks based on decryption failures, the oracle's reply is either $0$ or $1$, depending on whether the decryption was successful or failed. In such a case, the ratio $a_d^{(j)}/v_d^{(j)}$ corresponds to the empirical measurement of the DFR, conditioned on the event that the error vector belongs to $E_d^{(j)}$. In general, statistical attacks are successful because many quantities that are typical of the decoding procedure depend on the multiplicities of the distances in the distance spectra of the secret blocks. In the next section we generalize this procedure, by considering different ensembles for the error vectors; then, in Section IV, we provide a theoretical explanation for such a phenomenon.
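The data-collection loop of a statistical attack of this kind can be sketched as follows; this is a simplified rendition of the Ex-GJS idea, with an abstract oracle, and all names are ours:

```python
import random
from itertools import combinations

def exgjs_statistics(oracle, n0, p, t, num_queries):
    """Data-collection loop of an Ex-GJS-style attack: for each circulant
    block j and each distance d, accumulate the oracle replies observed
    when d appears in the distance spectrum of the j-th length-p slice of
    the error vector. The reply can be a failure flag, an iteration
    count, etc."""
    a = [[0] * (p // 2 + 1) for _ in range(n0)]   # accumulated replies
    v = [[0] * (p // 2 + 1) for _ in range(n0)]   # number of occurrences
    for _ in range(num_queries):
        e_supp = random.sample(range(n0 * p), t)  # random weight-t error
        y = oracle(e_supp)
        for j in range(n0):
            block = [x - j * p for x in e_supp if j * p <= x < (j + 1) * p]
            for d in {min(abs(i - k), p - abs(i - k))
                      for i, k in combinations(block, 2)}:
                a[j][d] += y                      # reply, conditioned on d
                v[j][d] += 1
    return a, v   # a[j][d] / v[j][d] estimates E[reply | d in block j]
```

With a failure-flag oracle, the ratios cluster according to the multiplicity of $d$ in the spectrum of the corresponding secret block, which is exactly the leakage the attack exploits.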
III-B Exploiting decryption failures on generic codes
In this section we generalize the Ex-GJS procedure, and describe an algorithm which can be used to recover information about any regular code. In particular, our analysis shows that decoding failure events i) do not strictly depend on the QC structure of the adopted code, and ii) make it possible to retrieve a quantity that is more general than distance spectra.
We first show that, for generic regular codes, there is a connection between the syndrome weight and the DFR. This statement is validated by numerical simulations on $(v,w)$-regular codes obtained through Gallager's construction. In particular, we have considered two codes with the same length and redundancy but different pairs $(v,w)$, decoded through Algorithm 1; their DFR (i.e., the probability of Algorithm 1 returning a failure) vs. syndrome weight is shown in Fig. 1. We notice from Fig. 1 that there is a strong dependence between the initial syndrome weight and the DFR, and that different pairs $(v,w)$ can lead to two different trends in the DFR evolution. Section IV is devoted to the explanation of this phenomenon.
Let us now define $E_{i,j}$ as the ensemble of all vectors having length $n$, weight $t$ and whose support contains both positions $i$ and $j$. Let $\mathbf{s}$ be the syndrome of an error vector $\mathbf{e} \in E_{i,j}$: we have
$$\mathbf{s} = \mathbf{H}\mathbf{e}^T = \mathbf{h}_i + \mathbf{h}_j + \sum_{l \in S(\mathbf{e}) \setminus \{i,j\}} \mathbf{h}_l.$$
The syndrome weight has a probability distribution that depends on the interplay between $\mathbf{h}_i$ and $\mathbf{h}_j$: basically, when these two columns overlap in a small (large) number of ones, the average syndrome weight gets larger (smaller). Moreover, motivated by the empirical evidence of Fig. 1, one can expect that the DFR experienced over error vectors belonging to different ensembles depends on the number of overlapping ones between columns $\mathbf{h}_i$ and $\mathbf{h}_j$. Then, a statistical attack against a generic regular code can be mounted, as described in Algorithm 3, which we denote as General Statistical Attack (GSA). The output of the algorithm is represented by the matrices $\mathbf{A}$ and $\mathbf{V}$, which are used by the adversary to estimate the average value of the oracle's replies, as a function of the pair $(i,j)$. Notice that in Algorithm 3 the oracle's reply does not need to be specified further. We will indeed show in Section V that the same procedure can be used to exploit information sources other than the success (or failure) of decryption.
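The collection loop for generic codes can be sketched as follows; this is a simplified rendition of the GSA idea, with names of our choosing:

```python
import random
from itertools import combinations

def gsa_statistics(oracle, n, t, num_queries):
    """Data-collection loop of the General Statistical Attack (GSA):
    accumulate the oracle replies, conditioned on every pair (i, j)
    contained in the support of the queried error vector."""
    A = [[0] * n for _ in range(n)]   # summed replies per pair (i, j)
    V = [[0] * n for _ in range(n)]   # number of occurrences per pair
    for _ in range(num_queries):
        supp = sorted(random.sample(range(n), t))
        y = oracle(supp)              # e.g. 1 on decoding failure, else 0
        for i, j in combinations(supp, 2):
            A[i][j] += y
            V[i][j] += 1
    return A, V   # A[i][j] / V[i][j] estimates E[reply | e in E_{i,j}]
```

Note that, unlike the Ex-GJS loop, no QC structure is assumed: the statistics are indexed directly by pairs of positions, one pair per column pair of the secret matrix.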
We now focus on the case of the oracle's reply being $0$ or $1$, depending on whether decryption was successful or not. Then, each ratio $a_{i,j}/v_{i,j}$ is the empirical estimate of the probability of encountering a decryption failure, when the error vector contains both $i$ and $j$ in its support.
One might expect that the ratios $a_{i,j}/v_{i,j}$ are distributed on the basis of the number of overlapping ones between columns $\mathbf{h}_i$ and $\mathbf{h}_j$ in $\mathbf{H}$. We have verified this intuition by means of numerical simulations; the results we have obtained are shown in Fig. 2, for the case of error vectors belonging to ensembles $E_{i,j}$. The figure clearly shows that the ratios can be used to guess the number of overlapping ones between any pair of columns in $\mathbf{H}$.
These empirical results confirm the conjecture that the DFR corresponding to error vectors in $E_{i,j}$ depends on the number of overlapping ones between the columns $\mathbf{h}_i$ and $\mathbf{h}_j$. Moreover, these results show that the same idea of the GJS attack, with some generalization, can be applied to whichever kind of code.
We now show that, even when QC codes are considered, our algorithm recovers more information than that which can be obtained through the Ex-GJS procedure. For such a purpose, let us consider a parity-check matrix in the form (5), and let $m_{i,j}$ be the number of overlapping ones between columns $\mathbf{h}_i$ and $\mathbf{h}_j$. Now, because of the QC structure, simultaneously advancing both columns by one position within their respective circulant blocks does not change the number of overlapping ones, that is,
$$m_{i,j} = m_{i',j'}, \quad (8)$$
where $i'$ and $j'$ denote the so-shifted indices.
We now consider two columns that belong to the same circulant block in $\mathbf{H}$, i.e., $i = up + l_1$ and $j = up + l_2$, where $0 \le l_1, l_2 \le p-1$; then, (8) can be rewritten as
$$m_{up+l_1,\, up+l_2} = m_{up+((l_1+1) \bmod p),\, up+((l_2+1) \bmod p)}. \quad (9)$$
With some simple computations, we finally obtain
$$m_{up+l_1,\, up+l_2} = m_{up,\, up+((l_2-l_1) \bmod p)}, \quad (10)$$
which holds for all indices $l_1, l_2 \in \{0, \ldots, p-1\}$. Similar considerations can be carried out if the two columns do not belong to the same circulant block. So, (10) shows that the whole information about overlapping ones between columns in $\mathbf{H}$ is actually represented by a subset of all the possible values of $m_{i,j}$. This means that the execution of Algorithm 3 can be sped up by taking the QC structure into account.
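The shift invariance of the overlaps can be checked numerically on a toy circulant block; the block size and support below are ours, chosen only for illustration:

```python
import numpy as np

p = 7
first_row = np.zeros(p, dtype=int)
first_row[[0, 2, 3]] = 1                                 # row support
C = np.array([np.roll(first_row, k) for k in range(p)])  # circulant block

def overlap(l1, l2):
    """Number of overlapping ones between columns l1 and l2 of C."""
    return int(C[:, l1] @ C[:, l2])

# The overlaps only depend on the difference (l2 - l1) mod p:
for l1 in range(p):
    for l2 in range(p):
        assert overlap(l1, l2) == overlap(0, (l2 - l1) % p)
```

The values overlap(0, d) also expose the distance information: with support {0, 2, 3}, overlap(0, 1) = 1, because exactly one pair of ones (positions 2 and 3) lies at cyclic distance 1.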
In particular, the values of $m_{i,j}$ can be used to obtain the distance spectra of the blocks in $\mathbf{H}$ in a straightforward way. Let us refer to Equation (10), and look at two columns $\mathbf{h}_{up}$ and $\mathbf{h}_{up+d}$, with $d \in \{1, \ldots, p-1\}$. We denote the support of $\mathbf{h}_{up}$ as $\{s_1, s_2, \ldots, s_v\}$. The support of $\mathbf{h}_{up+d}$ can then be expressed as
$$\left\{ (s_1+d) \bmod p,\ (s_2+d) \bmod p,\ \ldots,\ (s_v+d) \bmod p \right\}.$$
Then, $m_{up,\, up+d}$ equals the number of pairs $(s_a, s_b)$, with $s_a, s_b \in S(\mathbf{h}_{up})$, such that $(s_b - s_a) \bmod p = d$; this number coincides with the multiplicity of the distance $d$ in the distance spectrum of the block.
This proves that the procedure described by Algorithm 3 allows obtaining at least the same amount of information recovered through the Ex-GJS algorithm, which is specific to the QC case and guarantees a complete cryptanalysis of the system. In other words, our analysis confirms that Algorithm 3 is applicable and successful in at least all the scenarios in which previously published attacks work. Moreover, our procedure allows for recovering a larger amount of information about the secret key, and thus defines a broader perimeter of information retrieval, which encompasses existing and future attacks.
IV An approximate model for reaction attacks
The main result of this section is summarized in the following proposition, for which we provide theoretical justifications and empirical evidence.
Let $\mathbf{H}$ be the parity-check matrix of a $(v,w)$-regular code, which is decoded through Algorithm 1 with decoding threshold $b$. Let $(i,j)$ and $(i',j')$ be two distinct pairs of indexes, and consider error vectors $\mathbf{e} \in E_{i,j}$ and $\mathbf{e}' \in E_{i',j'}$. Let $P_{i,j}$ and $P_{i',j'}$ be the probabilities that $\mathbf{e}$ and $\mathbf{e}'$ result in a decoding failure, respectively. Let $\hat{\mathbf{e}}^{(1)}$ be the error vector estimate after the first iteration; we define $t_{i,j} = \mathrm{wt}(\mathbf{e} + \hat{\mathbf{e}}^{(1)})$ and, analogously, $t_{i',j'}$ for $\mathbf{e}'$. Then, $P_{i,j} > P_{i',j'}$ if and only if $\mathbb{E}[t_{i,j}] > \mathbb{E}[t_{i',j'}]$, where $\mathbb{E}[\cdot]$ denotes the expected value.
Essentially, the above proposition implies that increments (reductions) of the DFR are due to the fact that, depending on the particular matrix $\mathbf{H}$, some error patterns tend to produce, on average, a larger (smaller) amount of residual errors after the first decoder iteration. First of all, this statement is actually supported by empirical evidence: we have run numerical simulations on the same codes as those in Figs. 1 and 2, and have evaluated the number of residual errors after the first iteration. The results are shown in Fig. 3; as we can see, in accordance with Proposition 1, the trends of the DFR and of the average number of residual errors are the same for the analyzed codes.
We now derive a statistical model which approximates how the BF decoder described in Algorithm 1 evolves during the first iteration; through this model we can predict the average number of residual errors and, thus, also justify the different trends of the DFR observed in Figs. 1 and 2. We choose two distinct integers $i$ and $j$ and consider the case of an error vector randomly drawn from the ensemble $E_{i,j}$, that is, we take columns $\mathbf{h}_i$ and $\mathbf{h}_j$ of $\mathbf{H}$ as a reference, assuming that $i, j \in S(\mathbf{e})$. We also suppose that the columns $\mathbf{h}_i$ and $\mathbf{h}_j$ overlap in $m$ positions, and aim at expressing the average number of residual errors as a function of the code parameters, the decoding threshold $b$ and the value of $m$.
Let us first partition the sets of parity-check equations and variable nodes as follows.
Given an $r \times n$ parity-check matrix $\mathbf{H}$, the set of its parity-check equations can be partitioned into three subsets, defined as follows:
$\mathcal{S}_2$: the set of parity-check equations that involve both bits $i$ and $j$;
$\mathcal{S}_1$: the set of parity-check equations that involve either bit $i$ or bit $j$, but not both;
$\mathcal{S}_0$: the set of parity-check equations that involve neither bit $i$ nor bit $j$.
Let $B_u$, with $u \in \{0,1,2\}$, be the set of variable nodes, other than $i$ and $j$, that participate in at least one parity-check equation of $\mathcal{S}_u$ and in no equation of $\mathcal{S}_{u'}$ with $u' > u$.
The cardinality of each set depends on the particular matrix $\mathbf{H}$. However, when considering a regular code, we can derive some general properties, as stated in the following lemma.
Given a $(v,w)$-regular code, the following bounds on the sizes of the sets $B_u$, with $u \in \{0,1,2\}$, as defined in Definition 2, hold:
$$|B_2| \le (w-2)\,|\mathcal{S}_2|, \qquad |B_1| \le (w-1)\,|\mathcal{S}_1|, \qquad |B_0| \ge n - 2 - (w-2)\,|\mathcal{S}_2| - (w-1)\,|\mathcal{S}_1|. \quad (15)$$
The first bound in (15) follows from the fact that any parity-check equation in $\mathcal{S}_2$ involves $w$ bits, including $i$ and $j$. So, all the parity-check equations in $\mathcal{S}_2$ involve, at most, $(w-2)|\mathcal{S}_2|$ bits other than $i$ and $j$. The second bound in (15) can be derived with similar arguments, considering that either $i$ or $j$ participates in each of the parity-check equations in $\mathcal{S}_1$. The third bound is simply obtained by considering the remaining variable nodes, when the other two bounds in (15) hold with equality. ∎
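The partition of the parity-check equations and the resulting counting bounds can be checked on a toy regular code; the small (2,3)-regular matrix, the tracked positions and the set names below are all ours:

```python
import numpy as np

# Small (2, 3)-regular toy parity-check matrix (two Gallager-style layers)
H = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [1, 0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0, 1]])
v, w = 2, 3
i, j = 0, 3                       # the two tracked bit positions

rows = [set(np.flatnonzero(row)) for row in H]
# S[u]: parity-check equations involving exactly u bits among {i, j}
S = {u: [k for k, r in enumerate(rows) if len(r & {i, j}) == u]
     for u in (0, 1, 2)}
# Variable nodes (other than i, j) reached by the equations of each class
B2 = set().union(*(rows[k] for k in S[2])) - {i, j}
B1 = set().union(*(rows[k] for k in S[1])) - {i, j} - B2

# Each equation involving both i and j contributes at most w - 2 further
# bits; each equation involving exactly one of them at most w - 1.
assert len(B2) <= (w - 2) * len(S[2])
assert len(B1) <= (w - 1) * len(S[1])
```

On this toy matrix one equation involves both tracked bits, two involve exactly one of them, and one involves neither, so the bounds can be verified by inspection as well.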
From now on, in order to make our analysis as general as possible, i.e., independent of the particular $\mathbf{H}$, we make the following assumption.
Let $\mathbf{H}$ be the parity-check matrix of a $(v,w)$-regular code. We assume that each variable node other than $i$ and $j$ either participates in just one parity-check equation from $\mathcal{S}_2$ or $\mathcal{S}_1$ and in $v-1$ equations in $\mathcal{S}_0$, or participates in $v$ parity-check equations in $\mathcal{S}_0$. This means that the bounds in (15) hold with equality.
The previous assumption is justified by the fact that, due to the sparsity of $\mathbf{H}$, we have $v, w \ll n$; it then follows that the sets $B_2$ and $B_1$ are small compared to $B_0$, and that variable nodes shared by two equations in $\mathcal{S}_2 \cup \mathcal{S}_1$ are unlikely. Clearly, this assumption becomes more realistic as the matrix $\mathbf{H}$ gets sparser.
We additionally define $t_u$ as the number of errors affecting bits in $B_u$; clearly, we have $t_2 + t_1 + t_0 = t - 2$. The probability of having a specific configuration of $t_2$, $t_1$ and $t_0$ is equal to
$$\Pr[t_2, t_1, t_0] = \frac{\binom{|B_2|}{t_2}\binom{|B_1|}{t_1}\binom{|B_0|}{t_0}}{\binom{n-2}{t-2}}.$$
In an analogous way, we define $\hat{t}_u$ as the number of nodes that are in $B_u$ and are simultaneously set in $\mathbf{e} + \hat{\mathbf{e}}^{(1)}$. In other words, $\hat{t}_u$ corresponds to the number of errors that, after the first iteration, affect bits in $B_u$.
The definitions of the sets $B_u$ are useful to analyze how the value of $m$ influences the decoder choices. We focus on a generic $k$-th bit, with $k \notin \{i,j\}$, and consider the value of $\sigma_k$, i.e., the number of unsatisfied parity-check equations it participates in, as defined in Algorithm 1. Because of Assumption 1, we have that
if $k \in B_2$ (resp. $k \in B_1$), the $k$-th bit participates in one parity-check equation from $\mathcal{S}_2$ (resp. $\mathcal{S}_1$) and in $v-1$ parity-check equations in $\mathcal{S}_0$;
if $k \in B_0$, the $k$-th bit participates in $v$ parity-check equations in $\mathcal{S}_0$;
if $k \in \{i,j\}$, then it participates in $m$ parity-check equations from $\mathcal{S}_2$ and in $v - m$ parity-check equations from $\mathcal{S}_1$.
Let $p_u^{(x)}$, with $u \in \{0,1,2\}$ and $x \in \{0,1\}$, be the probability that a parity-check equation involving the $k$-th bit (with $k \notin \{i,j\}$) and contained in $\mathcal{S}_u$ is unsatisfied, in the case of $e_k = x$; the value of such a probability is expressed by Lemma 2.
Let us consider a $(v,w)$-regular code with blocklength $n$ and an error vector $\mathbf{e} \in E_{i,j}$ with weight $t$. Then, the probabilities $p_u^{(x)}$, with $u \in \{0,1,2\}$ and $x \in \{0,1\}$, can be calculated as
$$p_u^{(x)} = \frac{\displaystyle\sum_{l \,:\, l+u+x \text{ odd}} \binom{w-1-u}{l}\binom{n-2-w+u}{t-2-x-l}}{\displaystyle\binom{n-3}{t-2-x}}. \quad (18)$$
Let us first consider $u = 2$ and $x = 0$, and the $k$-th bit, different from $i$ and $j$. Any parity-check equation in $\mathcal{S}_2$ overlaps with the error vector in two known positions, as it involves both bits $i$ and $j$; since we are looking at an error-free bit, the parity-check equation will be unsatisfied only if the remaining $t-2$ errors intercept an odd number of its remaining $w-3$ positions. Simple combinatorial arguments lead to the corresponding expression in (18). All the other expressions can be derived with similar arguments. ∎
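The combinatorial argument in the proof amounts to a hypergeometric computation: an equation is unsatisfied exactly when the randomly placed remaining errors bring its total error parity to odd. The generic helper below is ours (the function name and parametrization are not from the paper), and is validated against exhaustive enumeration on tiny parameters:

```python
from math import comb
from itertools import combinations

def p_unsat(n_free, eq_free, t_free, fixed_errors):
    """Probability that a parity-check equation is unsatisfied when it
    already contains `fixed_errors` known errors, `eq_free` of its
    positions lie among the `n_free` positions over which the remaining
    `t_free` errors are placed uniformly at random. The equation is
    unsatisfied iff the total number of errors it intercepts is odd."""
    num = sum(comb(eq_free, l) * comb(n_free - eq_free, t_free - l)
              for l in range(min(eq_free, t_free) + 1)
              if (fixed_errors + l) % 2 == 1)
    return num / comb(n_free, t_free)

# Exhaustive sanity check on tiny parameters: 2 remaining errors over 5
# positions, 2 of which belong to the equation, no fixed errors.
count = sum(len(set(sub) & {0, 1}) % 2 == 1
            for sub in combinations(range(5), 2))
assert abs(p_unsat(5, 2, 2, 0) - count / comb(5, 2)) < 1e-12
```

Each case of the lemma corresponds to one choice of the arguments: the fixed errors are those among $i$, $j$ and the observed bit that the equation is guaranteed to contain, and the free slots are its remaining positions.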
We also define $q_u$, with $u \in \{1,2\}$, as the probability that a parity-check equation involving a bit $k \in \{i,j\}$, and contained in $\mathcal{S}_u$, is unsatisfied; the value of such a probability is derived in Lemma 3.
Let us consider a $(v,w)$-regular code with blocklength $n$ and an error vector $\mathbf{e} \in E_{i,j}$ with weight $t$. Then, the probabilities $q_u$, with $u \in \{1,2\}$, can be calculated as
$$q_u = \frac{\displaystyle\sum_{l \,:\, l+u \text{ odd}} \binom{w-u}{l}\binom{n-2-w+u}{t-2-l}}{\displaystyle\binom{n-2}{t-2}}.$$
The proof can be carried out with the same arguments as in the proof of Lemma 2. ∎
We now consider the following assumption.
Let $\mathbf{H}$ be the parity-check matrix of a $(v,w)$-regular code. We assume that the parity-check equations in which the $k$-th bit is involved are statistically independent; thus, $\sigma_k$, defined as in Algorithm 1, can be described in the first decoding iteration as the sum of $v$ independent Bernoulli random variables, each one having its own probability of being set, which corresponds either to $p_u^{(x)}$ or to $q_u$, with $u \in \{0,1,2\}$ and $x \in \{0,1\}$, depending on the subset each equation belongs to.
We now define $f_u^{(x)}$ as the probability that the decoder flips the $k$-th bit, in the case that $k \in B_u$ and $e_k = x$, when $k \notin \{i,j\}$. In an analogous way, we denote the probability that, when $k \in \{i,j\}$, the decoder flips the $k$-th bit. The above probabilities are computed in Lemmas 4 and 5, respectively.
When $k \in B_2$, the $k$-th bit is involved in one parity-check equation in $\mathcal{S}_2$ and in $v-1$ equations in $\mathcal{S}_0$. The probability that the decoder flips the $k$-th bit in the first iteration equals the probability that at least $b$ of these equations are unsatisfied.
In particular, we have
$$f_2^{(x)} = \sum_{\substack{z_2 \in \{0,1\},\ 0 \le z_0 \le v-1 \\ z_2 + z_0 \ge b}} \left(p_2^{(x)}\right)^{z_2}\left(1-p_2^{(x)}\right)^{1-z_2} \binom{v-1}{z_0}\left(p_0^{(x)}\right)^{z_0}\left(1-p_0^{(x)}\right)^{v-1-z_0}.$$
Similarly, if $k \in B_1$, then it is involved in one parity-check equation in $\mathcal{S}_1$ and in $v-1$ equations in $\mathcal{S}_0$; thus, we have
$$f_1^{(x)} = \sum_{\substack{z_1 \in \{0,1\},\ 0 \le z_0 \le v-1 \\ z_1 + z_0 \ge b}} \left(p_1^{(x)}\right)^{z_1}\left(1-p_1^{(x)}\right)^{1-z_1} \binom{v-1}{z_0}\left(p_0^{(x)}\right)^{z_0}\left(1-p_0^{(x)}\right)^{v-1-z_0}.$$
Finally, if $k \in B_0$, then it is involved in $v$ parity-check equations in $\mathcal{S}_0$; using a similar reasoning as in the previous cases, we can write
$$f_0^{(x)} = \sum_{z_0 = b}^{v} \binom{v}{z_0}\left(p_0^{(x)}\right)^{z_0}\left(1-p_0^{(x)}\right)^{v-z_0}.$$
This proves the lemma. ∎
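Under Assumption 2, each flip probability is the tail probability of a sum of independent, non-identically distributed Bernoulli variables (a Poisson-binomial distribution). The generic routine below, of our own naming, computes such a tail by convolving the distributions one variable at a time, and covers all the cases of Lemmas 4 and 5 by feeding it the appropriate list of per-equation probabilities:

```python
def flip_probability(bernoulli_probs, b):
    """P[X_1 + ... + X_v >= b] for independent Bernoulli variables with
    the given success probabilities (a Poisson-binomial tail), computed
    by convolving the distributions one variable at a time."""
    dist = [1.0]                       # distribution of the partial sum
    for p in bernoulli_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += (1 - p) * q      # this equation is satisfied
            new[k + 1] += p * q        # this equation is unsatisfied
        dist = new
    return sum(dist[b:])

# Example: a bit involved in one equation unsatisfied w.p. 0.9 and two
# equations unsatisfied w.p. 0.1 each, with flipping threshold b = 2.
prob = flip_probability([0.9, 0.1, 0.1], 2)
```

For instance, $f_2^{(x)}$ is obtained with one entry equal to $p_2^{(x)}$ and $v-1$ entries equal to $p_0^{(x)}$; the closed-form sums in the lemmas and this convolution are two ways of computing the same quantity.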
In order to estimate the average number of bits flipped after one iteration, we have to consider all the possible configurations of the error vector $\mathbf{e}$. As for the bits which are not in