NP-Complete Problems for Lee Metric Codes

02/27/2020 ∙ by Violetta Weger, et al. ∙ 0

We consider codes over finite rings endowed with the Lee metric and prove the NP-completeness of the associated syndrome decoding problem (SDP). Then, we study the best known algorithms for solving the SDP, which are information set decoding (ISD) algorithms, and generalize them to the Lee metric case. Finally we assess their complexity for a wide range of parameters.


1 Introduction

To compare the hardness of mathematical problems, in complexity theory one introduces the complexity classes P, NP, NP-hard and NP-complete. A problem belongs to P if it can be solved by a deterministic Turing machine in polynomial time, whereas a problem belongs to NP if it can be solved by a non-deterministic Turing machine or, equivalently, if one can check whether an instance is a solution to the problem in polynomial time. Thus, clearly, P lies inside NP. A problem is said to be NP-hard if any problem in NP can be reduced to this problem in polynomial time; thus, in some sense, they mark the hardest problems in mathematics. To show that a new problem is NP-hard it suffices to find a polynomial time reduction from a known NP-hard problem to the new problem. In addition, a problem is said to be NP-complete if it is NP-hard and in NP.

NP-complete problems play a fundamental role in cryptography, as systems based on them are promising candidates for post-quantum cryptography. In particular, NP-complete problems in coding theory are the basis of code-based cryptography. Historically, code-based cryptography was initiated by the seminal works of McEliece in 1978 [2] and Niederreiter in 1986 [3]. This area is currently regarded as one of the most consolidated and thoroughly assessed in public-key cryptography [4]. Code-based schemes are usually built upon the SDP, which is equivalent to the problem of decoding a random linear code. In [5] and [6], the SDP has been proven to be NP-complete for codes defined over finite fields and endowed with the Hamming metric. An adversary can always apply the best non-structural attack against the cryptosystem, which in the case of the SDP is an ISD algorithm. These algorithms are hence important to determine which public key size is needed to achieve a given security level. The first ISD algorithm was proposed by Prange in 1962 [7].

Besides these classical results, there has recently been a growing interest in changing the underlying metric or the underlying algebraic structure (for example, to finite rings). This is the case of the rank version of the SDP, which, analogously to the Hamming metric case, has been proven to be NP-complete [8]. Code-based cryptosystems using the rank metric provide surprisingly low key sizes (see for example [9]). Replacing the classical Hamming metric with other metrics therefore seems promising, and we want to study the impact of the Lee metric on code-based cryptography. Some cryptosystems have already been proposed over finite rings (see [10, 11, 12, 13]); in particular, Horlemann-Trautmann and Weger in [13] have considered the use of codes defined over $\mathbb{Z}_4$, endowed with the Lee metric.

In this paper we prove the NP-completeness of the SDP for codes over finite rings equipped with the Lee metric by showing that the shortest path decision problem, which has been proven to be NP-complete in [14], can be reduced (in polynomial time) to our problem.

Moreover, we extend the work in [13] and propose original algorithms that are inspired by Stern’s [15], Lee-Brickell’s [16] and Prange’s [7] ISD algorithms and that solve the Lee metric variant of the SDP for any Galois ring. We carry out a detailed complexity analysis of the proposed algorithms and compare the results with the Hamming case.

The paper is organized as follows. In Section 2 we introduce the notation used throughout the paper, give some preliminary notions on the Lee metric and formulate some of its general properties. In Section 3 we prove the NP-completeness of the Lee metric version of the SDP. In Section 4 we extend several information set decoding algorithms to $\mathbb{Z}_{p^s}$ endowed with the Lee metric and carry out a complexity analysis of these algorithms. We provide a comparison of the ISD algorithms in the Lee metric and in the Hamming metric in Section 5. In Section 6 we draw some concluding remarks and formulate some open problems.

2 Notation and preliminaries

Let $q$ be a prime power and $m$ be a positive integer. We denote with $\mathbb{Z}_m$ the ring of integers modulo $m$, and with $\mathbb{F}_q$ the finite field with $q$ elements, as usual. Given an integer $x$, we denote its absolute value as $|x|$. We use capital letters to denote sets of integers; for an ordered set $V$, we refer to its $i$-th element as $V(i)$. The cardinality of a set $V$ is denoted as $|V|$. We use bold lower case (respectively upper case) letters to denote vectors (respectively matrices). The identity matrix with size $k$ is denoted as $\mathrm{Id}_k$. Given a vector $\mathbf{x}$ of length $n$ and a set $S \subseteq \{1, \dots, n\}$, we denote by $\mathbf{x}_S$ the vector consisting of the entries of $\mathbf{x}$ indexed by $S$. In the same way, for a matrix $\mathbf{A} \in \mathbb{F}_q^{k \times n}$, $\mathbf{A}_S$ denotes the matrix obtained by taking the columns of $\mathbf{A}$ that are indexed by $S$. This, of course, can be easily generalized to $\mathbb{Z}_m$. The support of a vector $\mathbf{a}$ is defined as $S(\mathbf{a}) = \{ j \mid a_j \neq 0 \}$. For $S \subseteq \{1, \dots, n\}$, we denote by $\mathbb{F}_q^n(S)$ the vectors in $\mathbb{F}_q^n$ having support in $S$.

2.1 Coding Theoretic Preliminaries

In this subsection we recall the definitions and main properties of linear codes over finite fields endowed with the Hamming metric, as well as linear codes over finite rings endowed with the Lee metric.

Definition 1

An $[n, k]$ linear code $\mathcal{C}$ over $\mathbb{F}_q$ is a linear subspace of $\mathbb{F}_q^n$ of dimension $k$.

The size of the code, denoted as $|\mathcal{C}|$, is the number of its codewords. Notice that, for an $[n, k]$ linear code over $\mathbb{F}_q$, we have $|\mathcal{C}| = q^k$. The generator matrix of $\mathcal{C}$ is a $k \times n$ matrix whose row space is $\mathcal{C}$. Moreover, $\mathcal{C}$ is the null space of an $r \times n$ parity-check matrix, where $r = n - k$. In classical coding theory one considers codes endowed with the Hamming metric, formally defined as follows.

Definition 2

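The Hamming weight of $\mathbf{x} \in \mathbb{F}_q^n$ is equal to the size of its support, i.e.,
$$\mathrm{wt}_H(\mathbf{x}) = |S(\mathbf{x})| = |\{ i \in \{1, \dots, n\} \mid x_i \neq 0 \}|.$$

The Hamming distance of $\mathbf{x}, \mathbf{y} \in \mathbb{F}_q^n$ is defined as the Hamming weight of their difference, i.e.,
$$d_H(\mathbf{x}, \mathbf{y}) = \mathrm{wt}_H(\mathbf{x} - \mathbf{y}).$$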

Definition 3

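Let $\mathcal{C}$ be an $[n, k]$ linear code over $\mathbb{F}_q$, then we call its minimum distance $d$ the minimum Hamming weight of a non-zero codeword, i.e.,
$$d = \min \{ \mathrm{wt}_H(\mathbf{c}) \mid \mathbf{c} \in \mathcal{C}, \ \mathbf{c} \neq \mathbf{0} \}.$$

We will sometimes refer to $\mathcal{C}$ as an $[n, k, d]$ code. For a linear code $\mathcal{C}$ over $\mathbb{F}_q$ and $S \subseteq \{1, \dots, n\}$ we denote by
$$\mathcal{C}_S = \{ \mathbf{c}_S \mid \mathbf{c} \in \mathcal{C} \}.$$

We will use the following definition of information set, which fits perfectly in the context of ring-linear codes.

Definition 4

For a code $\mathcal{C}$ over $\mathbb{F}_q$ of length $n$ and dimension $k$, we call a set $I \subseteq \{1, \dots, n\}$ of size $k$ an information set if $|\mathcal{C}_I| = |\mathcal{C}|$.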

These definitions can be extended to finite rings.

Definition 5

Let and be positive integers and let be a finite ring. is called an -linear code of length and type if is a submodule of , with .

We will restrict to the most preferred case of Galois rings , for some prime and a positive integer .

Definition 6

We say that is a ring linear code of length if is an additive subgroup of .

can be endowed with several metrics, e.g., the Hamming metric, the Lee metric, the homogeneous metric, the Euclidean metric and so on; for an overview see [17].

Definition 7

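For $x \in \mathbb{Z}_m$ we define the Lee value to be
$$\mathrm{wt}_L(x) = \min \{ x, m - x \},$$
where $x$ is identified with its representative in $\{0, \dots, m - 1\}$.

Then, for $\mathbf{x} \in \mathbb{Z}_m^n$, we define the Lee weight to be the sum of the Lee values of its coordinates:
$$\mathrm{wt}_L(\mathbf{x}) = \sum_{i=1}^{n} \mathrm{wt}_L(x_i).$$

As for the Hamming case, we then get a distance.

Definition 8

For $\mathbf{x}, \mathbf{y} \in \mathbb{Z}_m^n$, the Lee distance is defined as
$$d_L(\mathbf{x}, \mathbf{y}) = \mathrm{wt}_L(\mathbf{x} - \mathbf{y}).$$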
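The Lee value, Lee weight and Lee distance can be illustrated with a short Python sketch. It assumes the standard definitions over the integers modulo $m$ (Lee value $\min\{x, m - x\}$, weight as the coordinate-wise sum, distance as the weight of the difference); the function names are illustrative and do not come from the paper.

```python
def lee_value(x, m):
    """Lee value of x in Z_m: distance from x to 0 on the cycle of length m."""
    x %= m
    return min(x, m - x)


def lee_weight(vec, m):
    """Lee weight of a vector over Z_m: sum of the Lee values of its entries."""
    return sum(lee_value(x, m) for x in vec)


def lee_distance(u, v, m):
    """Lee distance of two equal-length vectors over Z_m."""
    return lee_weight([a - b for a, b in zip(u, v)], m)
```

For $m \in \{2, 3\}$ the Lee weight computed this way coincides with the Hamming weight, since every non-zero element then has Lee value 1.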

Definition 9

We say that is a Lee metric code of length if is an additive subgroup of of type endowed with the Lee metric.

We can define the minimum distance and the concept of information set for Lee metric codes.

Definition 10

Let be a Lee metric code over of length ; then, we call its minimum Lee distance the minimum Lee weight of a non-zero codeword:

Definition 11

For a Lee metric code over of length and type

we call a set of size a (ring-linear) information set if .

This definition makes more sense when we look at the generator matrix and the parity check matrix of ring-linear codes.

Definition 12

Let be a linear code over of length and type . Then is permutation equivalent to a code having the following generator matrix of size , where .

Similarly, is permutation equivalent to a code that has the following parity check matrix of size

(1)

2.2 Properties of the Lee metric

In this subsection we devise some general properties of the Lee metric that will be useful for the rest of the paper. In the following lemma, resulting from a Plotkin-type bound in the Lee metric (see [18, Problem 10.15]), we compute the average Lee weight of an element in .

Lemma 1

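Let $x \in \mathbb{Z}_m$ be chosen uniformly at random; then the expected Lee weight of $x$ is given by
$$\begin{cases} \dfrac{m}{4} & \text{if } m \text{ is even}, \\[2mm] \dfrac{m^2 - 1}{4m} & \text{if } m \text{ is odd}. \end{cases}$$

Proof.

If $m$ is even, then summing up all weights gives
$$2 \sum_{i=1}^{m/2 - 1} i + \frac{m}{2} = \frac{m^2}{4}.$$

If $m$ is odd, then we get
$$2 \sum_{i=1}^{(m-1)/2} i = \frac{m^2 - 1}{4}.$$

To get the average we divide both cases by $m$ and get the desired formula. ∎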
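The averaging argument in the proof can be checked numerically. The sketch below assumes the Lee value $\min\{x, m - x\}$ on $\mathbb{Z}_m$ and compares the exact average over all residues with the closed form obtained by direct summation, namely $m/4$ for even $m$ and $(m^2 - 1)/(4m)$ for odd $m$. The function names are illustrative.

```python
from fractions import Fraction


def average_lee_value(m):
    """Exact average Lee value of a uniformly random element of Z_m."""
    total = sum(min(x, m - x) for x in range(m))
    return Fraction(total, m)


def average_lee_value_closed_form(m):
    """Closed form obtained by summing the Lee values directly."""
    if m % 2 == 0:
        return Fraction(m, 4)
    return Fraction(m * m - 1, 4 * m)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.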

Next, we want to count the vectors in having Lee weight i.e.,

We will consider two cases: either is even, or is odd. Indeed, in the former case there exists only one element in having Lee value , whereas in the latter case there exist two such elements. We will first count the vectors in having Lee weight and a fixed size of support . For this, we introduce

Proposition 1

Let , let and , such that . Then

  • if is even:

  • if is odd:

Proof.

A vector having a support of size has at least Lee weight and can have at most Lee weight , which implies that there are no vectors such that .

In the case where is even, there exists only one element in having Lee value , thus if , we can only choose this element in the non-zero positions, which can be done in different ways.

Now we check whether or . In the first case the vector cannot have an entry of Lee value , thus we can choose non-zero positions, compose the wanted Lee weight into parts and for each choice of a part , there exists also the choice , hence many. In the other case, firstly, an entry of the vector could have Lee value , so we cannot simply multiply by anymore and, secondly, the compositions of into parts also consists of parts being greater than which, however, is the largest possible Lee value. For this reason, we have to define recursively. We start with all possible orderings of the desired Lee weight into parts and then take away the orderings that we cannot have, which are starting from a part being and proceed until the largest part is . Thus, we have to take away , repeating this times: the factor 2 is justified by the fact that we have assumed that there are always two choices for an element having Lee value , and times for the position of the entry having Lee value . The case has to be taken away only once, since, in the case where is even, we only have one element having Lee value .

The case in which is odd is simpler, since an element having Lee value does not need to be treated as a special case. ∎

Finally, to get the amount of vectors in having Lee weight , we only have to sum all from to .

Corollary 1

Let let and let . Then

(2)

An upper bound, also observed in [18, Proposition 10.10], and a lower bound on (2) can easily be derived as reported next.

Corollary 2

Let and . Then, is at most

(3)

and at least

(4)
Proof.

The proof of the upper bound is given in [18, Proposition 10.10]. For the lower bound, if we count the vectors in with entries in . If , we count the vectors in . ∎
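For small parameters, the number of vectors of a given Lee weight can be computed directly, which is a convenient sanity check for the counting formulas and bounds of this subsection. The following Python sketch does not implement the closed formula of Proposition 1; it is a plain dynamic program over the coordinates, assuming the Lee value $\min\{x, m - x\}$ on $\mathbb{Z}_m$, together with an exhaustive count for comparison. All names are illustrative.

```python
from itertools import product


def count_lee_weight(n, m, w):
    """Number of vectors in Z_m^n with Lee weight exactly w (dynamic program)."""
    # How many elements of Z_m have each possible Lee value.
    multiplicity = {}
    for x in range(m):
        v = min(x, m - x)
        multiplicity[v] = multiplicity.get(v, 0) + 1

    # counts[j] = number of vectors over the coordinates processed so far
    # whose Lee weight equals j.
    counts = [1] + [0] * w
    for _ in range(n):
        updated = [0] * (w + 1)
        for current, c in enumerate(counts):
            if c == 0:
                continue
            for v, mult in multiplicity.items():
                if current + v <= w:
                    updated[current + v] += c * mult
        counts = updated
    return counts[w]


def brute_force_count(n, m, w):
    """Same count by exhaustive enumeration; only feasible for tiny n and m."""
    return sum(
        1
        for vec in product(range(m), repeat=n)
        if sum(min(x, m - x) for x in vec) == w
    )
```

The multiplicity table makes the even/odd distinction of the proof explicit: for even $m$ the Lee value $m/2$ is attained by a single element, whereas every other non-zero Lee value is attained by two elements.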

Simple computations show that the addends of the sum in (3) are monotonically increasing if and only if, for ,

(5)

Under these assumptions, the following relation holds

3 An NP-complete coding-theory problem for the Lee metric

In this section we prove NP-completeness of the Decisional Lee - Syndrome Decoding Problem (DL-SDP) and the Computational Lee - Syndrome Decoding Problem (CL-SDP), which are formalized in the following.

Problem 1

Decisional Lee - Syndrome Decoding Problem (DL-SDP)
Let and be positive integers. Given , and , does there exist a vector such that and ?

Problem 2

Computational Lee - Syndrome Decoding Problem (CL-SDP)
Let and be positive integers. Given , and , find a vector , such that and .
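To make the computational problem concrete, the following Python sketch solves tiny instances by exhaustive search. It is not one of the algorithms discussed in this paper and its running time is exponential in the length. It assumes the usual syndrome equation $\mathbf{H}\mathbf{e}^\top = \mathbf{s}^\top$ over $\mathbb{Z}_m$ and a weight constraint of the form "Lee weight at most $t$"; the symbols `H`, `s`, `t`, `m` and the function name are illustrative assumptions.

```python
from itertools import product


def solve_cl_sdp_bruteforce(H, s, t, m):
    """Return some e in Z_m^n with H e^T = s^T (mod m) and Lee weight <= t.

    H is a list of rows, s a list of syndrome entries. Returns None if no
    such vector exists. Exhaustive search: only usable for tiny instances.
    """
    n = len(H[0])
    for e in product(range(m), repeat=n):
        # Discard candidates that exceed the Lee weight bound.
        if sum(min(x, m - x) for x in e) > t:
            continue
        # Check every parity-check equation modulo m.
        if all(
            sum(h * x for h, x in zip(row, e)) % m == si % m
            for row, si in zip(H, s)
        ):
            return list(e)
    return None
```

Verifying a candidate solution only requires one matrix-vector product and one weight computation, which is the polynomial-time check mentioned in the text; the hard part is the search over exponentially many candidates.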

Notice that we consider finite rings whose size is not necessarily a prime power, hence in order to avoid confusion with the variable , where is a prime number and a positive integer, we use a to denote the size of the considered ring.

Clearly, checking whether a vector is in fact a solution of the CL-SDP can be done in polynomial time. Hence for the NP-completeness, it is enough to show that CL-SDP is NP-hard.

Proving that there does not exist a polynomial time algorithm that solves the L-SDP for all choices of is straightforward, since for and the Lee metric on , respectively on , is the same as the Hamming metric, where it is proven that such a solver does not exist. The more interesting question is if there exists a polynomial time algorithm that solves the L-SDP for an arbitrary but fixed .

3.1 The shortest path problem in circulant graphs and its connection with the Lee metric

In this section we introduce the Shortest Path Problem (SPP), proven NP-complete for the class of circulant graphs [14, Theorem 5], upon which we mainly rely for our reduction to DL-SDP.

Let be positive integers. Let be of size and be a graph with nodes and edges , such that

Observe that the considered graph is circulant, i.e., its adjacency matrix is circulant. A path from to of length is a vector , such that and for all . In our case a path from to is associated to a vector , such that there are steps of the form i) if , or ii) if . In other words, we can write

(6)

Then, the length of the path corresponds to the -norm of the associated vector , that is . In particular, (6) depends only on the difference , rather than on the particular values and . Then, for , we define the set of all possible paths connecting two nodes having label difference , that is

(7)

We may then be interested in finding the shortest length of such paths, that is

(8)

The (decisional) shortest path problem on a circulant graph is then formalized as follows.

Problem 3

Circulant - Shortest Path Problem (C-SPP)
Given the positive integers , a set , and a bound , is ?

The above problem is NP-complete [14, Theorem 5]. We remark that the hardness of the problem comes from the circulant structure of the considered graph, which admits a very compact description. Indeed, for a general undirected unweighted graph that is given explicitly (and is not necessarily circulant), an efficient solver is known, whose running time is polynomial in the size of the graph; hence the general problem is not NP-complete unless P = NP. A circulant graph, instead, is unambiguously described by its number of nodes and its connection set, which can be represented with a number of bits that grows only logarithmically in the number of nodes. This logarithmic reduction in the graph representation is what differentiates the variant of the problem on circulant graphs from its general formulation on standard graphs.
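The gap between the two input sizes can be seen in a small Python sketch. It assumes a circulant graph on the nodes $0, \dots, m - 1$ in which node $u$ is adjacent to $u \pm a \bmod m$ for every $a$ in the connection set; the names `m`, `gens` and `d` are illustrative. Breadth-first search finds the shortest path length, but it visits up to $m$ nodes, which is exponential in the roughly $\log m$ bits needed to write down the instance.

```python
from collections import deque


def circulant_shortest_path(m, gens, d):
    """Length of a shortest path from node 0 to node d (mod m).

    Nodes are 0..m-1; u is adjacent to (u + a) % m and (u - a) % m for every
    a in gens. Returns None if d is not reachable from 0.
    """
    target = d % m
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for a in gens:
            for v in ((u + a) % m, (u - a) % m):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    return None
```

Because the graph is circulant, the distance between two nodes depends only on the difference of their labels, so starting the search from node 0 is no restriction.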

In the following lemma, we provide an important analogy between the Lee metric and the -norm.

Lemma 2

Let be positive integers, such that , and , then

where, with a slight abuse of notation that will be kept throughout the paper, we consider

Proof.

Since the -norm (resp. the Lee weight) of a vector is defined as the sum of the absolute value (resp. the Lee value) of its entries, it is enough to prove the claim for . Let and , then

If , then for and for all it holds that Therefore there exists at least one with Since we are interested in the minimal absolute value of the elements in it is enough to consider elements in Notice that on the set the -norm and the Lee value of an element coincide. In fact: if is such that is minimal, then

∎
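The one-dimensional fact that the proof appears to rely on is that the Lee value of a residue equals the smallest absolute value among its integer representatives. The following Python sketch checks this for small moduli, assuming the Lee value $\min\{x, m - x\}$; the function name and the `span` parameter are illustrative.

```python
def min_abs_representative(x, m, span=2):
    """Smallest |y| over the integers y = x + k*m with |k| <= span."""
    return min(abs(x + k * m) for k in range(-span, span + 1))
```

For $x \in \{0, \dots, m - 1\}$ the minimum is always attained at $k = 0$ or $k = -1$, so a small `span` suffices; summing over the coordinates gives the corresponding statement for vectors.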

As a consequence of Lemma 2, Problem 3 can also be stated as follows.

Problem 4

Lowest Lee Subset Sum Problem (LLSSP)
Given the positive integers , a set , and a bound , decide whether the following relation holds

Then, since Problem 3 is NP-complete and Lemma 2 holds, Problem 4 is NP-complete as well.

Finally, we introduce a general version of Problem 4, which is described as follows. We consider the collection of sets

and a vector , and define

We then define the following problem, strongly related to LLSSP.

Problem 5

Multiple Lowest Lee Subset Sum Problem (MLLSSP)
Let and be positive integers, let be a collection of length- sets over and . Given a bound , decide whether the following relation holds

Theorem 3.1

The MLLSSP is NP-hard.

Proof.

We give a polynomial-time reduction from the NP-hard problem LLSSP to MLLSSP.

Given an instance of LLSSP with input , and , we can construct an MLLSSP instance with an arbitrary value of , and such that , and . Thus, solving MLLSSP in polynomial time allows an efficient solution of LLSSP. ∎

Remark 1

Observe that does not need to consist of distinct elements, since we can clearly transform in polynomial time that instance to one with a set , formed by the distinct elements of . It is very easy to see that, as , we have .

3.2 NP-completeness of DL-SDP and CL-SDP

In this section we prove NP-completeness of the Lee metric syndrome decoding problems DL-SDP and CL-SDP by using the results of the previous subsection. We first provide some additional notation.

Let and be positive integers, and , we define

(9)

Furthermore, let

(10)

Then, we introduce the following problem.

Problem 6

Decisional Minimum Lee Syndrome Decoding Problem (DML-SDP)
Let and be positive integers; given , and , is ?

Theorem 3.2

The DML-SDP, the DL-SDP and the CL-SDP are NP-hard.

Proof.

We first reduce the NP-hard problem MLLSSP to DML-SDP.

Let be a given instance of MLLSSP. Define , and as and . It is obvious that a solution of the DML-SDP on provides a solution for the initial instance of MLLSSP. Since MLLSSP is NP-hard, DML-SDP is NP-hard as well.

As a next step, we reduce the NP-hard problem DML-SDP to DL-SDP.

Starting from a DML-SDP instance , we can consider an instance of DL-SDP with the same input. A yes (resp. no) answer to DL-SDP implies a yes (resp. no) answer to the DML-SDP. Thus, the NP-hardness of DML-SDP implies the NP-hardness of DL-SDP.

Clearly, if the decisional problem DL-SDP is NP-hard, then the computational problem CL-SDP is NP-hard as well. ∎

4 Information set decoding over : adaptation to the Lee metric

The first ISD algorithm was proposed by Prange in 1962 [7] and can be summarized as follows. As a first step, one chooses an information set and, then, the parity-check matrix is brought into a standard form through Gaussian elimination. Assuming that the errors are outside of the information set, we perform the same row operations on the syndrome and check if the weight of the transformed syndrome is now equal to the given weight (usually the error correction capacity of the code). If this is the case, the transformed syndrome is indeed the error vector. Notice that, in this formulation, we only consider a particular pattern for the error vector; this restriction plays an important role in all ISD algorithms. The weight distribution of the error vector assumed in Prange’s algorithm is indeed not very likely and, even though the cost of one iteration is low, the overall cost of the algorithm, which is, in general, given by the product of the cost of one iteration and the inverse of the success probability of one iteration, is huge, due to the relatively large number of iterations needed.

Observe that ISD algorithms are not brute-force algorithms: in brute-force algorithms one has to fix an information set and go through all possible error patterns; on the other hand, in ISD algorithms we fix an error pattern and go through all information sets. As a result, ISD algorithms are not deterministic. There have been many improvements upon the original algorithm by Prange, focusing on a more likely error pattern. These approaches increase the cost of one iteration but, on average, require a smaller number of iterations (see [16, 19, 15, 20, 21, 22, 23, 24, 25, 26, 27]). For a complete overview for the binary case see [28]. With new cryptographic schemes proposed over general finite fields, most of these algorithms have been generalized (see [29, 30, 31, 32, 33]).

All ISD algorithms are characterized by the same approach of first randomly choosing a set of positions in the code and then applying some operations that, if the chosen set has a relatively small intersection with the support of the error vector, allow one to retrieve the error vector itself. For each ISD variant, the average computational cost is estimated by multiplying the complexity of each iteration by the expected number of performed iterations; the latter quantity corresponds to the reciprocal of the probability that a random choice of the set leads to a successful iteration. Then, for all ISD algorithms, the computational cost is estimated as the ratio between the expected number of (binary) operations that are performed in each iteration and the probability that the choice of the set of positions is indeed successful. We now derive some formulas for the complexity of Prange’s, Stern’s and Lee-Brickell’s ISD algorithms, when adapted to the Lee metric.

Notice that, in Definition 12, we observed that for Lee linear codes over of length and type we have a different systematic form to the one in the Hamming metric over finite fields and that a Lee linear code over has an information set of size .

4.1 Prange’s ISD adaptation to the Lee metric

The idea of Prange’s algorithm is to first find an information set that does not overlap with the support of the searched error vector $\mathbf{e}$; when such a set is found, permuting $\mathbf{H}$ and computing its row echelon form is enough to reveal the error vector. In the Lee analogue of this algorithm we use the same idea. Our proposed adaptation of Prange’s ISD is reported in Algorithm 1. We first find an information set $I$, and then bring the matrix $\mathbf{H}$ into a systematic form, by multiplying it by an invertible matrix $\mathbf{U}$. For the sake of clarity, we assume that the information set is , such that

where and . Since we assume that no errors occur in the information set, we have that , with . Thus, if we also partition the new syndrome into parts of the same sizes as the (row-)parts of , and we multiply by the unknown , we get the following situation

It follows that , hence we are only left to check the weight of .

Input: , , .

Output: with and .

1:Choose an information set of size and define .
2:Compute such that
where and .
3:Compute with .
4:if :  then
5:     Return such that and .
6:Start over with Step 1 and a new selection of .
Algorithm 1 Prange’s Algorithm over in the Lee metric
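The overall structure of Algorithm 1 can be sketched in Python under simplifying assumptions that are not those of the paper: the modulus `p` is a prime, so that the ring is a field and ordinary Gaussian elimination applies (the general ring case needs the systematic form of Definition 12), and the information sets are enumerated in a fixed order rather than sampled at random, so that the example is reproducible. All function and variable names are illustrative.

```python
from itertools import combinations


def lee_weight(vec, m):
    """Sum of the Lee values min(x, m - x) of the entries, taken modulo m."""
    return sum(min(x % m, (-x) % m) for x in vec)


def solve_mod_p(A, b, p):
    """Solve the square system A x = b over Z_p (p prime) by Gauss-Jordan
    elimination. Returns None if A is singular."""
    r = len(A)
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(r):
        pivot = next((i for i in range(col, r) if M[i][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse, Python 3.8+
        M[col] = [(v * inv) % p for v in M[col]]
        for i in range(r):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [(vi - f * vc) % p for vi, vc in zip(M[i], M[col])]
    return [M[i][r] for i in range(r)]


def prange_lee(H, s, t, p):
    """Prange-style search: assume the error is zero on the information set,
    solve for the remaining positions, accept if the Lee weight equals t."""
    r, n = len(H), len(H[0])
    k = n - r
    for info_set in combinations(range(n), k):
        redundancy = [j for j in range(n) if j not in info_set]
        H_J = [[row[j] for j in redundancy] for row in H]
        x = solve_mod_p(H_J, s, p)
        if x is None:
            continue  # the complement of this set gives a singular submatrix
        if lee_weight(x, p) == t:
            e = [0] * n
            for j, v in zip(redundancy, x):
                e[j] = v
            return e
    return None
```

Solving the square system on the redundancy positions plays the role of multiplying by the invertible matrix and reading off the transformed syndrome; the only change with respect to the Hamming metric is the weight check in the last step.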

4.2 Complexity analysis: Prange’s ISD in the Lee metric

In this section we provide a complexity estimate of our adaptation of Prange’s ISD to the Lee metric. First of all, we assume that adding two elements in costs binary operations and multiplying two elements costs binary operations [34, 35]. An iteration of Prange’s ISD only consists in bringing into systematic form and to apply the same row operations on the syndrome; thus, the cost can be assumed equal to that of computing , from which we obtain a broad estimate as

(11)

The success probability is given by having chosen the correct weight distribution of ; in this case, we require that does not overlap with the chosen information set, hence

(12)

The estimated overall computational cost of Prange’s ISD in the Lee metric is

(13)

We now analytically compare the complexity of Prange’s ISD in the Lee and Hamming metric, exploiting the properties derived in Section 2. Under the assumption that , with , from Corollary 2 we derive the following chain of inequalities

(14)

where corresponds to the success probability of an iteration of Prange’s ISD over the Hamming metric, seeking for an error vector of Hamming weight , in a code with length and dimension . A crude approximation, which however is particularly tight when , shows that  [36]. Then, we have

Since does not depend on the considered metric, this simple analysis shows that the complexities of Prange’s algorithm over the Lee metric and over the Hamming metric differ at most by a polynomial factor. For all known ISD variants, the complexity grows asymptotically as , where is a constant that depends on the code rate [37]; different ISD variants essentially differ only in the value of . Our analysis shows that, for the Lee metric, Prange’s algorithm leads to an analogous expression. Thus, our results indicate that decoding instances in the Lee metric are as hard as their corresponding Hamming counterparts, except for a relatively small polynomial factor.
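For reference, the Hamming-metric side of this comparison can be evaluated numerically. The sketch below assumes the standard textbook expression for the success probability of one iteration of Prange's algorithm in the Hamming metric, $\binom{n-k}{t} / \binom{n}{t}$, and the common approximation $(1 - k/n)^t$ for small $t$; whether these coincide with the expressions elided above cannot be confirmed from the text, and the function names are illustrative.

```python
from math import comb


def prange_success_prob_hamming(n, k, t):
    """Probability that a uniformly random information set of size k avoids
    the support of a fixed error of Hamming weight t (length n)."""
    return comb(n - k, t) / comb(n, t)


def prange_success_prob_approx(n, k, t):
    """Approximation (1 - k/n)^t, reasonable when t is small compared to n."""
    return (1 - k / n) ** t


def expected_iterations(n, k, t):
    """Expected number of iterations: reciprocal of the success probability."""
    return 1 / prange_success_prob_hamming(n, k, t)
```

Multiplying `expected_iterations` by a per-iteration cost gives the usual ISD work factor estimate described at the beginning of this section.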

4.3 Stern’s ISD adaptation to the Lee metric

As a further contribution of this paper, we improve upon the basic algorithm by Prange by adapting the idea of Stern’s ISD to the Lee metric. In this algorithm, we relax the requirements on the weight distribution, by allowing an information set with small Lee weight and the existence of a (small) set of size , called zero-window, within the redundant set, where no errors occur. Our proposed adaptation of Stern’s algorithm to the Lee metric is reported in Algorithm 2.

For the sake of readability, in the following explanation we consider an information set and a zero-window given by , such that , with and . The systematic form of is obtained as

where and . Using the same row-partitions for the syndrome , we get

which implies the following three conditions

(15)
(16)
(17)

We want to choose such that it has support in the information set and Lee weight , whereas should have a support disjoint from that of , and the remaining Lee weight . More precisely, we test , where and have disjoint supports of respective maximal sizes and and equal weight . In order for (15) and (17) to be satisfied we construct two sets and , where contains the equations regarding and contains the equations regarding . For all choices of and , we check whether the entries of and coincide, if they do we call this a collision. For each collision, we construct from (16) and check if has the missing Lee weight : if this occurs, we have found the error vector .

All these considerations are incorporated in Algorithm 2, where we allow any choice of and .

Input: , , , such that , , and .

Output: with and .

1:Choose an information set of size .
2:Choose a set of size and define .
3:Choose a uniform random partition of into disjoint sets and of size and , respectively.
4:Find an invertible matrix such that
where and .
5:Compute with and .
6:Compute the set consisting of all triples