# Improved Singleton bound on insertion-deletion codes and optimal constructions

Insertion-deletion codes (insdel codes for short) play an important role in synchronization error correction. The higher the minimum insdel distance, the more insdel errors the code can correct. Haeupler and Shahrasbi established the Singleton bound for insdel codes: the minimum insdel distance of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d\le 2n-2k+2$. There have been some constructions of insdel codes through Reed-Solomon codes with high capabilities, but none has come close to this bound. Recently, Do Duc et al. showed that the minimum insdel distance of any $[n,k]$ Reed-Solomon code is no more than $2n-2k$ if $q$ is large enough compared to the code length $n$; optimal codes that meet the new bound were also constructed explicitly. The contribution of this paper is twofold. We first show that the minimum insdel distance of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d\le 2n-2k$ if $n>k>1$. This result improves and generalizes the previously known results in the literature. We then give a sufficient condition under which the minimum insdel distance of a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ is exactly equal to $2n-4$. As a consequence, we show that the sufficient condition is not hard to achieve; we explicitly construct an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound.


## 1 Introduction

Insertion-deletion codes (insdel codes for short) are designed to protect against synchronization errors [5], [6] in communication systems caused by the loss of positional information of the message. Insdel codes have found applications in many interesting fields such as DNA storage, DNA analysis [7], [17], language processing [2], [12] and race-track memory error correction [3].

The insdel distance between two vectors is defined as the smallest number of insertions and deletions needed to transform one vector into the other. The minimum insdel distance of a code is defined in the natural way: it is the minimum insdel distance among all pairs of distinct codewords. As with classical linear codes under the Hamming distance, the minimum insdel distance of an insdel code is an important parameter, since it determines the code's insdel error-correcting capability. The higher the minimum insdel distance, the more insdel errors the code can correct. The study of insdel codes dates back to the 1960s [15]. Insdel codes with small code lengths were constructed explicitly by using various mathematical methods, e.g., see [1], [10], [18]. Sloane [13] constructed a family of codes capable of correcting a single deletion.

For a fixed code length, it would certainly be desirable for both the code size (which is a measure of the efficiency of the code) and the minimum insdel distance to be as large as possible. However, as in the Hamming metric case, these two parameters constrain each other for any fixed code length. The Singleton bound for insdel codes says that the minimum insdel distance of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d\le 2n-2k+2$, see [5]. It is, therefore, natural to consider the problem of constructing insdel linear codes achieving the Singleton bound with equality. Unfortunately, although there have been a few constructions of Reed-Solomon codes with high capabilities, none of them meets or comes close to this bound. For example, Wang et al. [16] constructed a class of Reed-Solomon codes with code length , dimension and minimum insdel distance at most ; Tonien et al. [14] constructed a class of generalized Reed-Solomon codes of length and dimension with deletion error-correcting capability of up to . These results led to the question of whether the Singleton bound is a tight upper bound for the minimum insdel distance of Reed-Solomon codes. In 2007, McAven et al. [11] showed that Reed-Solomon codes of length and dimension over prime fields can never meet the Singleton bound. Recently, Do Duc et al. [4] improved this result by showing that when the field size is sufficiently large compared to the code length, the Singleton bound cannot be achieved by Reed-Solomon codes; more explicitly, it was shown that the minimum insdel distance of a $k$-dimensional Reed-Solomon code is at most $2n-2k$ if $q$ is large enough compared to the code length $n$ ([4, Theorem 1]); optimal codes that meet the new bound were also constructed explicitly [4, Theorems 2 and 3]. Very recently, Liu et al. [8] established a set of sufficient conditions for two-dimensional insdel Reed-Solomon codes to have optimal asymptotic error-correcting capabilities.

The aforementioned works lead us to the study of Singleton-type bounds for the minimum insdel distance of general linear codes and optimal constructions of such codes. The contribution of this paper is twofold. We first show that the minimum insdel distance of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d\le 2n-2k$ if $n>k>1$. This result improves and generalizes [4, Theorem 1] in two directions: first, our result holds for general linear codes, not just Reed-Solomon codes; second, we do not require $q$ to be large compared to $n$. More precisely, we obtain the following result.

Theorem A  Suppose $C$ is an $[n,k]$ linear code over $\mathbb{F}_q$ with $n>k>1$. Then the minimum insdel distance of $C$ is at most $2n-2k$, i.e., $d(C)\le 2n-2k$.

We then give a sufficient condition under which the minimum insdel distance of a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ is exactly equal to $2n-4$. Our approach and conclusion are quite different from those given in [4, Theorems 2 and 3]: the proofs of [4, Theorems 2 and 3] are long and technical, and [4, Theorem 3] requires conditions on the divisors of and some related values; our methods are more natural and direct, and our result mainly concerns the order of the finite field. Our result is stated below.

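Theorem B  Let $\mathbb{F}_q$ be a finite field with $q=p^e$ elements, where $p$ is a prime number and $e$ is a positive integer. Suppose $\theta$ is a primitive element of $\mathbb{F}_q$, i.e., the order of $\theta$ in the multiplicative group of $\mathbb{F}_q$ is equal to $q-1$. Let

$$C=\{(\lambda+\mu\theta^{i_1},\lambda+\mu\theta^{i_2},\cdots,\lambda+\mu\theta^{i_n})\mid\lambda,\mu\in\mathbb{F}_q\}$$

be a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$. We assume that $0\le i_1<i_2<\cdots<i_n\le q-2$. If

• (1) $i_{n-1}+i_n<e$, and

• (2) the number of elements of $D=\{i_j-i_k\mid 1\le k<j\le n\}$ is equal to $n(n-1)/2$,

then the minimum insdel distance of $C$ is equal to $2n-4$. In other words, $C$ is optimal in the sense that it meets the bound obtained in Theorem A.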
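The two hypotheses of Theorem B are simple arithmetic conditions on the exponents, so they can be tested mechanically. The sketch below assumes that condition (1) reads $i_{n-1}+i_n<e$ and that condition (2) reads $|D|=n(n-1)/2$ with $D=\{i_j-i_k\mid 1\le k<j\le n\}$, which is how the proof in Section 4 uses them; it also assumes $n\ge 3$, as needed to apply Theorem A with $k=2$. The function names are illustrative and not taken from the paper.

```python
from itertools import combinations


def difference_set(exponents):
    """Set D of all differences i_j - i_k with k < j (exponents increasing)."""
    return {later - earlier for earlier, later in combinations(exponents, 2)}


def satisfies_theorem_b(exponents, p, e):
    """Check the (assumed) hypotheses of Theorem B for q = p**e."""
    q = p ** e
    n = len(exponents)
    increasing = all(x < y for x, y in zip(exponents, exponents[1:]))
    # Assumed standing hypothesis: n >= 3 and 0 <= i_1 < ... < i_n <= q - 2.
    if n < 3 or not increasing or exponents[0] < 0 or exponents[-1] > q - 2:
        return False
    condition_1 = exponents[-2] + exponents[-1] < e
    condition_2 = len(difference_set(exponents)) == n * (n - 1) // 2
    return condition_1 and condition_2


print(satisfies_theorem_b([1, 2, 4], p=2, e=7))  # both conditions hold
print(satisfies_theorem_b([1, 2, 3], p=2, e=7))  # the difference 1 repeats
print(satisfies_theorem_b([1, 2, 4], p=2, e=6))  # 2 + 4 is not smaller than 6
```

The check only inspects the exponents and the extension degree $e$; it never constructs the field, which reflects the remark that the result mainly concerns the order of the finite field.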

As a consequence, we show that conditions (1) and (2) listed in Theorem B are not hard to achieve; we explicitly construct an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound, as we show below.

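Corollary C  Let $p$ be a prime number and let $e$ be a positive integer. Let $i_j=2^{j-1}$ for $1\le j\le n$ satisfying $2^{n-2}+2^{n-1}<e$. Let $\theta$ be a primitive element in the finite field $\mathbb{F}_{p^e}$. Then

$$C=\{(\lambda+\mu\theta^{i_1},\lambda+\mu\theta^{i_2},\cdots,\lambda+\mu\theta^{i_n})\mid\lambda,\mu\in\mathbb{F}_{p^e}\}$$

is a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_{p^e}$ whose minimum insdel distance is equal to $2n-4$.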

This paper is organized as follows. In Section 2, we recall some definitions and basic results about general linear codes, insdel codes and Reed-Solomon codes. In Section 3, we give the proof of Theorem A, and in Section 4, the proofs of Theorem B and Corollary C are presented. We conclude this paper with remarks on possible future work in Section 5.

## 2 Preliminaries

Let $\mathbb{F}_q$ be a finite field with $q$ elements and let $\mathbb{F}_q^n$ be the set of all vectors of length $n$ over $\mathbb{F}_q$. A subspace $C$ of $\mathbb{F}_q^n$ over $\mathbb{F}_q$ is called a linear code of length $n$ over $\mathbb{F}_q$. The Hamming distance between two vectors $a,b\in\mathbb{F}_q^n$, which is defined to be the number of coordinates in which $a$ and $b$ differ, is denoted by $d_H(a,b)$. The minimum Hamming distance $d_H(C)$ of a code $C$ is the smallest Hamming distance among all pairs of distinct codewords of $C$. The Hamming weight of a vector $a$ is the number of nonzero coordinates in $a$. It is well known that if $C$ is a linear code, then the minimum Hamming distance is the same as the minimum Hamming weight of the nonzero codewords of $C$. A linear code of length $n$, dimension $k$ and minimum Hamming distance $d_H$ over $\mathbb{F}_q$ is often called a $q$-ary $[n,k,d_H]$ code or, if $q$ is clear from the context, an $[n,k,d_H]$ code. It is well known that an $[n,k,d_H]$ linear code $C$ over $\mathbb{F}_q$ must obey the Singleton bound, i.e., the code length $n$, dimension $k$ and minimum Hamming distance $d_H(C)$ satisfy

$$d_H(C)\le n-k+1.$$

The linear codes over $\mathbb{F}_q$ meeting the Singleton bound are called maximum distance separable codes (MDS codes for short).

In this paper, for linear codes over $\mathbb{F}_q$, we mainly consider the insdel distance used in the high insertion and deletion noise regime. We restate this definition as follows.

###### Definition 2.1.

For two vectors $a,b\in\mathbb{F}_q^n$, the insdel distance $d(a,b)$ between $a$ and $b$ is the minimum number of insertions and deletions which are needed to transform $a$ into $b$. It can be verified that $d(a,b)$ is indeed a metric on $\mathbb{F}_q^n$.

It has been shown that the insdel distance between any two vectors can be characterized via their longest common subsequences.

###### Lemma 2.2.

[4, Lemma 1] Let $a,b\in\mathbb{F}_q^n$. Then we have

$$d(a,b)=2n-2\ell,$$

where $\ell$ denotes the length of a longest common subsequence of $a$ and $b$.

Lemma 2.2 is useful in calculating the insdel distance of two vectors in $\mathbb{F}_q^n$.
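As an illustration of how Lemma 2.2 turns the insdel distance into a longest-common-subsequence computation, here is a minimal sketch. It uses the textbook dynamic program for the longest common subsequence; the function names `lcs_length` and `insdel_distance` are chosen for this example and do not come from the paper.

```python
from typing import Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of a longest common subsequence, by the standard dynamic program."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def insdel_distance(a: Sequence, b: Sequence) -> int:
    """Insdel distance of two equal-length vectors via Lemma 2.2: d = 2n - 2*l."""
    if len(a) != len(b):
        raise ValueError("Lemma 2.2 is stated for vectors of the same length")
    return 2 * len(a) - 2 * lcs_length(a, b)


a = (0, 1, 2, 3)
b = (1, 2, 3, 0)
# The longest common subsequence is (1, 2, 3), so l = 3 and d = 8 - 6 = 2.
print(lcs_length(a, b), insdel_distance(a, b))
```

Deleting the leading 0 of `a` and inserting a 0 at the end produces `b`, which matches the computed distance of 2.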

Similar to the definition of the minimum Hamming distance of linear codes over $\mathbb{F}_q$, we give the definition of the minimum insdel distance of a linear code over $\mathbb{F}_q$ below; it is one of the most important parameters, as it indicates the insdel error-correcting capability.

###### Definition 2.3.

An insdel linear code $C$ of length $n$ is a linear subspace of $\mathbb{F}_q^n$ with minimum insdel distance being defined as

$$d(C)=\min_{c_1,c_2\in C,\ c_1\ne c_2}\{d(c_1,c_2)\}.$$

A linear code over $\mathbb{F}_q$ of length $n$, dimension $k$ and minimum insdel distance $d$ is called an $[n,k,d]$ insdel linear code over $\mathbb{F}_q$. As we mentioned in the first section, an insdel linear code must obey the following Singleton-type bound.

###### Proposition 2.4.

(Singleton Bound [5]) Let $C$ be an $[n,k]$ insdel linear code over $\mathbb{F}_q$. Then

$$d(C)\le 2n-2k+2.$$

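In the rest of this section we give the definition and some basic facts about Reed-Solomon codes. Let $n$ and $k$ be two positive integers with $k\le n$. Let $\mathbb{F}_q$ be a finite field with $q$ elements and choose $n$ distinct elements $\alpha_1,\alpha_2,\cdots,\alpha_n$ of $\mathbb{F}_q$. Denote by $\mathbb{F}_q[x]_{<k}$ the set of polynomials in $\mathbb{F}_q[x]$ of degree less than $k$. For $\alpha=(\alpha_1,\alpha_2,\cdots,\alpha_n)$, the Reed-Solomon code of length $n$ and dimension $k$ with code locators $\alpha$ is defined as

$$RS_{n,k}(\alpha)=\{(f(\alpha_1),f(\alpha_2),\cdots,f(\alpha_n))\mid f(x)\in\mathbb{F}_q[x]_{<k}\}.$$

Then $RS_{n,k}(\alpha)$ is an $[n,k,n-k+1]$ linear code over $\mathbb{F}_q$ with length $n\le q$. In particular, Reed-Solomon codes are MDS codes.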
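The definition can be made concrete by enumerating a small Reed-Solomon code. The sketch below is restricted, as a simplifying assumption, to a prime field $\mathbb{F}_p$ so that field arithmetic is plain arithmetic modulo $p$; the helper names are hypothetical. It evaluates every polynomial of degree less than $k$ at the code locators and confirms the MDS property numerically.

```python
from itertools import product


def rs_codewords(p, locators, k):
    """All codewords of RS_{n,k}(alpha) over the prime field F_p.

    Each codeword is (f(alpha_1), ..., f(alpha_n)) for a polynomial f of
    degree less than k, given by its coefficient tuple (c_0, ..., c_{k-1}).
    """
    if len(set(a % p for a in locators)) != len(locators):
        raise ValueError("code locators must be distinct elements of F_p")
    words = []
    for coeffs in product(range(p), repeat=k):
        word = tuple(
            sum(c * pow(alpha, j, p) for j, c in enumerate(coeffs)) % p
            for alpha in locators
        )
        words.append(word)
    return words


def min_hamming_distance(words):
    """Minimum Hamming weight of a non-zero codeword (valid for linear codes)."""
    return min(sum(1 for x in w if x != 0) for w in words if any(w))


p, k = 7, 2
locators = (1, 2, 3, 4, 5)
code = rs_codewords(p, locators, k)
n = len(locators)
# For an MDS code the minimum Hamming distance equals n - k + 1.
print(len(code), min_hamming_distance(code), n - k + 1)
```

Exhaustive enumeration is only practical for very small parameters, since the code has $p^k$ codewords.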

## 3 Proof of Theorem A

Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. As before, the minimum Hamming distance of the linear code $C$ is denoted by $d_H(C)$; the minimum insdel distance of $C$ is denoted by $d(C)$. For two typical codewords $a,b$ of $C$, the Hamming (resp. insdel) distance between $a$ and $b$ is denoted by $d_H(a,b)$ (resp. $d(a,b)$). It follows from Lemma 2.2 that the insdel distance between $a$ and $b$ is less than or equal to $2n$, i.e., $d(a,b)\le 2n$. In order to prove Theorem A, we first need to improve this upper bound, as we show below.

###### Lemma 3.1.

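Let $a$ and $b$ be two vectors of length $n$ over $\mathbb{F}_q$. Then we have

$$d(a,b)\le 2d_H(a,b).$$

###### Proof.

Let $\ell$ denote the length of a longest common subsequence of $a$ and $b$. Observe that $\ell\ge n-d_H(a,b)$, since the $n-d_H(a,b)$ coordinates in which $a$ and $b$ agree form a common subsequence. Then by Lemma 2.2 we immediately have

$$d(a,b)=2n-2\ell\le 2n-2(n-d_H(a,b))=2d_H(a,b).$$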

The lemma is proved. ∎
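The inequality of Lemma 3.1 is easy to probe numerically. The following sketch draws random pairs of vectors over a small alphabet and checks $d(a,b)\le 2d_H(a,b)$ for each pair; the alphabet size, length, number of trials and seed are arbitrary choices for the illustration, and a passing run is a sanity check, not a proof.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of coordinates in which a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def insdel_distance(a, b):
    """Insdel distance via Lemma 2.2."""
    return 2 * len(a) - 2 * lcs_length(a, b)


def check_lemma_3_1(q, n, trials, seed):
    """Return True if d(a, b) <= 2 * d_H(a, b) on every sampled pair."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(q) for _ in range(n)]
        b = [rng.randrange(q) for _ in range(n)]
        if insdel_distance(a, b) > 2 * hamming_distance(a, b):
            return False
    return True


print(check_lemma_3_1(q=3, n=8, trials=1000, seed=0))
```

The bound can be far from tight: a cyclic shift of a vector with distinct entries differs from it in every coordinate, yet the two are at insdel distance 2.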

We have shown that the insdel distance of two arbitrary vectors in $\mathbb{F}_q^n$ is at most twice their Hamming distance. We next show that the same conclusion holds for the minimum insdel distance and the minimum Hamming distance of any linear code.

###### Lemma 3.2.

Let $C$ be a non-zero linear code of length $n$ over $\mathbb{F}_q$. We then have

$$d(C)\le 2d_H(C).$$

###### Proof.

Choose two distinct codewords $a_0$ and $b_0$ of $C$ such that $d_H(a_0,b_0)=d_H(C)$. Recall that $d(C)\le d(a_0,b_0)$. Thus by Lemma 3.1 we have

$$d(C)\le d(a_0,b_0)\le 2d_H(a_0,b_0)=2d_H(C).$$

We are done. ∎

###### Remark 3.3.

Note, by the classical Singleton bound for an $[n,k]$ linear code $C$, that $d_H(C)\le n-k+1$. We therefore conclude from Lemma 3.2 that the minimum insdel distance of an $[n,k]$ linear code $C$ must be less than or equal to $2(n-k+1)$, i.e.,

$$d(C)\le 2n-2k+2.$$

This Singleton bound for an insdel code was exhibited in [5]. We also note by Lemma 3.2 that if the minimum Hamming distance of $C$ is at most $n-k$, then $d(C)\le 2n-2k$, proving Theorem A in this special case. By virtue of this fact, to complete the proof of Theorem A, we only need to consider the case where $C$ is an MDS code.

The following lemma gives the desired result, which is a crucial step in the process of proving our Theorem A.

###### Lemma 3.4.

Let $C$ be an $[n,k]$ linear MDS code over $\mathbb{F}_q$ with $n>k>1$. Then

$$d(C)\le 2n-2k.$$

###### Proof.

We use a characterization of MDS codes to complete the proof: An $[n,k]$ linear code $C$ over $\mathbb{F}_q$ is MDS if and only if $C$ has a minimum weight codeword in any $n-k+1$ coordinates (see [9, Chap. 11, Theorem 4]). Now $C$ is an $[n,k,n-k+1]$ MDS code over $\mathbb{F}_q$ with $n>k>1$, which gives that $C$ has a minimum weight codeword in any $n-k+1$ coordinates. First we choose the last $n-k+1$ coordinates to be non-zero, and we suppose further that the $k$th coordinate is equal to $1$, i.e.,

$$a=(\underbrace{0,\cdots,0}_{k-1},1,*,\cdots,*)$$

where $*$ denotes some non-zero elements of $\mathbb{F}_q$. It follows that suitable non-zero elements $*$ of $\mathbb{F}_q$ can be found such that $a$ is a codeword of $C$. Likewise, suitable non-zero elements $\star$ of $\mathbb{F}_q$ can be found such that

$$b=(\star,\underbrace{0,\cdots,0}_{k-1},1,\star,\cdots,\star)$$

is also a codeword of $C$. Now we have two distinct codewords $a$ and $b$, which share the common subsequence consisting of $k-1$ zeros followed by $1$. Thus the length $\ell$ of a longest common subsequence of $a$ and $b$ satisfies $\ell\ge k$. Therefore, by Lemma 2.2 we get that

$$d(C)\le d(a,b)=2n-2\ell\le 2n-2k.$$

We are done. ∎
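The two codewords used in the proof can be exhibited explicitly in a small MDS code. The sketch below takes, as an assumed example, the $[5,3]$ Reed-Solomon code over the prime field $\mathbb{F}_7$ with locators $1,\ldots,5$, searches for the codewords $a$ and $b$ of the prescribed shapes, and confirms that they share a common subsequence of length at least $k$. The helper names are hypothetical.

```python
from itertools import product


def lcs_length(a, b):
    """Length of a longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rs_code(p, locators, k):
    """Codewords of the Reed-Solomon code over the prime field F_p."""
    return [
        tuple(sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p for x in locators)
        for coeffs in product(range(p), repeat=k)
    ]


def find_codeword(words, zero_positions, one_position):
    """First codeword that vanishes on zero_positions and equals 1 at one_position."""
    for w in words:
        if all(w[i] == 0 for i in zero_positions) and w[one_position] == 1:
            return w
    raise LookupError("no codeword with the requested shape")


p, k = 7, 3
locators = (1, 2, 3, 4, 5)
n = len(locators)
code = rs_code(p, locators, k)

# a = (0, ..., 0, 1, *, ..., *): zeros in the first k - 1 coordinates.
a = find_codeword(code, range(0, k - 1), k - 1)
# b = (*, 0, ..., 0, 1, *, ..., *): zeros in coordinates 2, ..., k.
b = find_codeword(code, range(1, k), k)

ell = lcs_length(a, b)
print(a, b, ell, 2 * n - 2 * ell)
```

Because the code is MDS, each of the two searches has exactly one solution, and all remaining coordinates of $a$ and $b$ are non-zero.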

By Remark 3.3 and Lemma 3.4, we immediately arrive at Theorem A, which says that the minimum insdel distance of any $[n,k]$ linear code with $n>k>1$ is at most $2n-2k$.
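Theorem A can be checked by brute force on a small code. The sketch below uses, as an assumed example, the ternary $[4,2]$ code spanned by $(1,0,1,1)$ and $(0,1,1,2)$, computes its minimum Hamming and insdel distances over all pairs of codewords, and compares them with the bounds of Lemma 3.2 and Theorem A. Function names are illustrative.

```python
from itertools import combinations, product


def lcs_length(a, b):
    """Length of a longest common subsequence (standard dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def span(p, generator_rows):
    """All F_p-linear combinations of the generator rows (p prime)."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product(range(p), repeat=len(generator_rows)):
        words.add(tuple(
            sum(c * row[i] for c, row in zip(coeffs, generator_rows)) % p
            for i in range(n)
        ))
    return sorted(words)


def min_hamming_distance(words):
    """Minimum weight of a non-zero codeword of a linear code."""
    return min(sum(1 for x in w if x != 0) for w in words if any(w))


def min_insdel_distance(words):
    """Minimum insdel distance over all pairs of distinct codewords."""
    n = len(words[0])
    return min(2 * n - 2 * lcs_length(a, b) for a, b in combinations(words, 2))


n, k = 4, 2
code = span(3, [(1, 0, 1, 1), (0, 1, 1, 2)])
d_hamming = min_hamming_distance(code)
d_insdel = min_insdel_distance(code)
print(len(code), d_hamming, d_insdel, 2 * n - 2 * k)
```

The pairwise search is quadratic in the number of codewords, so it only serves as a check for very small parameters.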

## 4 Proofs of Theorem B and Corollary C

The primary goal of this section is to present a proof for Theorem B, which gives a sufficient condition to guarantee a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ to have minimum insdel distance $2n-4$. Such codes are optimal in the sense that they meet the upper bound obtained in Theorem A.

We first fix some notation. Let $\mathbb{F}_q$ be a finite field with $q=p^e$ elements, where $p$ is a prime number and $e$ is a positive integer. Let $\mathbb{F}_q^*$ be the multiplicative group of $\mathbb{F}_q$, and let $\theta$ be a primitive element of $\mathbb{F}_q$, i.e., $\theta$ generates the cyclic group $\mathbb{F}_q^*$. Then $e$ is the degree of the minimal polynomial of $\theta$ over the prime field $\mathbb{F}_p$.

Let $i_1,i_2,\cdots,i_n$ be integers such that $0\le i_1<i_2<\cdots<i_n\le q-2$, where $n\ge 3$. Let $C$ be a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ with code locators $\theta^{i_1},\theta^{i_2},\cdots,\theta^{i_n}$, i.e.,

$$C=\{(\lambda+\mu\theta^{i_1},\lambda+\mu\theta^{i_2},\cdots,\lambda+\mu\theta^{i_n})\mid\lambda,\mu\in\mathbb{F}_q\}.\qquad(4.1)$$

Our goal therefore reduces to finding suitable integers $i_1,i_2,\cdots,i_n$ such that $C$ has minimum insdel distance $2n-4$. For this purpose, let

$$D=\{i_j-i_k\mid 1\le k<j\le n\};$$

then

$$D\subseteq\{1,2,\cdots,q-2\}.$$

We are now in a position to prove Theorem B.

###### Proof.

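Suppose $C$ is a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ as given in (4.1). According to Theorem A we have that $d(C)\le 2n-4$. Thus it remains to show that $d(C)\ge 2n-4$. To this end, by virtue of Lemma 2.2, it is enough to prove the following claim:

Claim: $\ell\le 2$ for any distinct codewords $a,b\in C$, where $\ell$ is the length of a longest common subsequence of $a$ and $b$.

Now we consider seven cases separately to investigate the number $\ell$.

Case 1: Either $a$ or $b$ is the zero vector. It is trivial to see that $\ell\le 1$ because $C$ is an MDS code with parameters $[n,2,n-1]$, so a non-zero codeword has at most one zero coordinate.

Henceforth, we can assume that both $a$ and $b$ are non-zero.

Case 2: Assume that

$$a=(\lambda,\lambda,\cdots,\lambda),\quad b=(\mu,\mu,\cdots,\mu),$$

where $\lambda,\mu$ are elements of $\mathbb{F}_q^*$ with $\lambda\ne\mu$. Then it is easy to see that $\ell=0$.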

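Case 3: Assume that

$$a=(\lambda,\lambda,\cdots,\lambda),\quad b=(\mu\theta^{i_1},\mu\theta^{i_2},\cdots,\mu\theta^{i_n}),$$

where $\lambda,\mu\in\mathbb{F}_q^*$. Suppose otherwise that $\ell\ge 3$. Then there exist three integers $r_1,r_2,r_3$ satisfying $1\le r_1<r_2<r_3\le n$ such that

$$\begin{cases}\lambda=\mu\theta^{i_{r_1}},\\ \lambda=\mu\theta^{i_{r_2}},\\ \lambda=\mu\theta^{i_{r_3}}.\end{cases}$$

This gives

$$\theta^{i_{r_1}}=\theta^{i_{r_2}}=\theta^{i_{r_3}}=\frac{\lambda}{\mu}.$$

Thus

$$\theta^{i_{r_2}-i_{r_1}}=\theta^{i_{r_3}-i_{r_1}}=1.$$

Recall that $D\subseteq\{1,2,\cdots,q-2\}$, and $i_{r_j}-i_{r_1}\in D$ for $j=2,3$. Since $\theta$ has order $q-1$, we then have

$$i_{r_2}-i_{r_1}=i_{r_3}-i_{r_1}.$$

This contradicts condition (2) in the theorem. It follows that $\ell\le 2$.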

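Case 4: Assume that

$$a=(\lambda,\lambda,\cdots,\lambda),\quad b=(\lambda_1+\mu_1\theta^{i_1},\lambda_1+\mu_1\theta^{i_2},\cdots,\lambda_1+\mu_1\theta^{i_n}),$$

where $\lambda,\lambda_1,\mu_1\in\mathbb{F}_q^*$. Suppose otherwise that $\ell\ge 3$. Then there exist three integers $r_1,r_2,r_3$ satisfying $1\le r_1<r_2<r_3\le n$ such that

$$\begin{cases}\lambda=\lambda_1+\mu_1\theta^{i_{r_1}},\\ \lambda=\lambda_1+\mu_1\theta^{i_{r_2}},\\ \lambda=\lambda_1+\mu_1\theta^{i_{r_3}}.\end{cases}$$

This gives

$$\theta^{i_{r_1}}=\theta^{i_{r_2}}=\theta^{i_{r_3}}=\frac{\lambda-\lambda_1}{\mu_1}.$$

Thus

$$\theta^{i_{r_2}-i_{r_1}}=\theta^{i_{r_3}-i_{r_1}}=1,$$

which gives

$$i_{r_2}-i_{r_1}=i_{r_3}-i_{r_1}.$$

This contradicts condition (2) in the theorem again. It follows that $\ell\le 2$.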

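Case 5: Assume that

$$a=(\lambda\theta^{i_1},\lambda\theta^{i_2},\cdots,\lambda\theta^{i_n}),\quad b=(\mu\theta^{i_1},\mu\theta^{i_2},\cdots,\mu\theta^{i_n}),$$

where $\lambda,\mu\in\mathbb{F}_q^*$ with $\lambda\ne\mu$. Suppose otherwise that $\ell\ge 3$. Then there exist six integers $k_1,k_2,k_3,r_1,r_2,r_3$ satisfying $1\le k_1<k_2<k_3\le n$ and $1\le r_1<r_2<r_3\le n$ such that

$$\begin{cases}\lambda\theta^{i_{k_1}}=\mu\theta^{i_{r_1}},\\ \lambda\theta^{i_{k_2}}=\mu\theta^{i_{r_2}},\\ \lambda\theta^{i_{k_3}}=\mu\theta^{i_{r_3}}.\end{cases}$$

This gives

$$\theta^{i_{k_1}-i_{r_1}}=\theta^{i_{k_2}-i_{r_2}}=\theta^{i_{k_3}-i_{r_3}}=\frac{\mu}{\lambda},$$

which implies that

$$\begin{cases}i_{k_1}-i_{r_1}\equiv i_{k_2}-i_{r_2}\pmod{q-1},\\ i_{k_1}-i_{r_1}\equiv i_{k_3}-i_{r_3}\pmod{q-1}.\end{cases}$$

This is equivalent to

$$\begin{cases}i_{k_2}-i_{k_1}\equiv i_{r_2}-i_{r_1}\pmod{q-1},\\ i_{k_3}-i_{k_1}\equiv i_{r_3}-i_{r_1}\pmod{q-1}.\end{cases}$$

Since all of these differences lie in $D\subseteq\{1,2,\cdots,q-2\}$, we have

$$\begin{cases}i_{k_2}-i_{k_1}=i_{r_2}-i_{r_1},\\ i_{k_3}-i_{k_1}=i_{r_3}-i_{r_1}.\end{cases}$$

Because $\mu/\lambda\ne 1$ forces $k_1\ne r_1$, this contradicts condition (2) again. It follows that $\ell\le 2$.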

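Case 6: Assume that

$$a=(\lambda\theta^{i_1},\lambda\theta^{i_2},\cdots,\lambda\theta^{i_n}),\quad b=(\lambda_1+\mu_1\theta^{i_1},\lambda_1+\mu_1\theta^{i_2},\cdots,\lambda_1+\mu_1\theta^{i_n}),$$

where $\lambda,\lambda_1,\mu_1\in\mathbb{F}_q^*$. Suppose otherwise that $\ell\ge 3$. Then there exist six integers $k_1,k_2,k_3,r_1,r_2,r_3$ satisfying $1\le k_1<k_2<k_3\le n$ and $1\le r_1<r_2<r_3\le n$ such that

$$\begin{cases}\lambda\theta^{i_{k_1}}=\lambda_1+\mu_1\theta^{i_{r_1}},\\ \lambda\theta^{i_{k_2}}=\lambda_1+\mu_1\theta^{i_{r_2}},\\ \lambda\theta^{i_{k_3}}=\lambda_1+\mu_1\theta^{i_{r_3}}.\end{cases}$$

In matrix form, this is equivalent to saying that

$$\begin{pmatrix}\theta^{i_{k_1}}&\theta^{i_{r_1}}&1\\ \theta^{i_{k_2}}&\theta^{i_{r_2}}&1\\ \theta^{i_{k_3}}&\theta^{i_{r_3}}&1\end{pmatrix}\begin{pmatrix}\lambda\\ -\mu_1\\ -\lambda_1\end{pmatrix}=0.$$

Since

$$\begin{pmatrix}\lambda\\ -\mu_1\\ -\lambda_1\end{pmatrix}\ne 0,$$

we have that the determinant of the coefficient matrix is zero, i.e.,

$$\begin{vmatrix}\theta^{i_{k_1}}&\theta^{i_{r_1}}&1\\ \theta^{i_{k_2}}&\theta^{i_{r_2}}&1\\ \theta^{i_{k_3}}&\theta^{i_{r_3}}&1\end{vmatrix}=0.$$

Now suppose $x$ is an indeterminate over the finite field $\mathbb{F}_p$. Then we have the polynomial

$$f(x)=\begin{vmatrix}x^{i_{k_1}}&x^{i_{r_1}}&1\\ x^{i_{k_2}}&x^{i_{r_2}}&1\\ x^{i_{k_3}}&x^{i_{r_3}}&1\end{vmatrix}.$$

It is clear that $f(\theta)=0$. On the other hand, we can expand the determinant to have

$$\begin{aligned}f(x)&=\begin{vmatrix}x^{i_{k_1}}&x^{i_{r_1}}&1\\ x^{i_{k_2}}&x^{i_{r_2}}&1\\ x^{i_{k_3}}&x^{i_{r_3}}&1\end{vmatrix}=\begin{vmatrix}x^{i_{k_1}}-x^{i_{k_3}}&x^{i_{r_1}}-x^{i_{r_3}}&0\\ x^{i_{k_2}}-x^{i_{k_3}}&x^{i_{r_2}}-x^{i_{r_3}}&0\\ x^{i_{k_3}}&x^{i_{r_3}}&1\end{vmatrix}\\ &=(x^{i_{k_1}}-x^{i_{k_3}})(x^{i_{r_2}}-x^{i_{r_3}})-(x^{i_{k_2}}-x^{i_{k_3}})(x^{i_{r_1}}-x^{i_{r_3}})\\ &=x^{i_{k_1}+i_{r_2}}+x^{i_{k_2}+i_{r_3}}+x^{i_{k_3}+i_{r_1}}-x^{i_{k_1}+i_{r_3}}-x^{i_{k_2}+i_{r_1}}-x^{i_{k_3}+i_{r_2}}.\end{aligned}$$

Observing that $i_{k_1}<i_{k_2}<i_{k_3}$ and $i_{r_1}<i_{r_2}<i_{r_3}$, we have

$$i_{k_1}+i_{r_2}<\min\{i_{k_1}+i_{r_3},\ i_{k_2}+i_{r_3},\ i_{k_3}+i_{r_2}\},\qquad i_{k_2}+i_{r_1}<\min\{i_{k_3}+i_{r_1},\ i_{k_2}+i_{r_3},\ i_{k_3}+i_{r_2}\}.$$

Thus the term of the minimum degree in the polynomial $f(x)$ is $x^{i_{k_1}+i_{r_2}}$ or $-x^{i_{k_2}+i_{r_1}}$. As $i_{k_1}+i_{r_2}\ne i_{k_2}+i_{r_1}$, otherwise we would have $i_{k_2}-i_{k_1}=i_{r_2}-i_{r_1}$, a contradiction to condition (2) (note that $(k_1,k_2)\ne(r_1,r_2)$, since otherwise the first two equations of the system would give $\lambda=\mu_1$ and $\lambda_1=0$), we conclude that $f(x)\ne 0$.

Since $f(x)$ is a non-zero polynomial with $f(\theta)=0$, and $e$ is the degree of the minimal polynomial of $\theta$ over $\mathbb{F}_p$, we obtain that $\deg(f(x))\ge e$. Thus it is easy to see that the degree of $f(x)$ satisfies

$$e\le\deg(f(x))\le i_{n-1}+i_n<e.$$

The last inequality is from our condition (1). This is a contradiction, and we conclude the proof of $\ell\le 2$.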
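The determinant expansion used in Case 6 can be double-checked numerically. The sketch below works over the integers rather than over $\mathbb{F}_p$ (coefficients would be reduced modulo $p$ in the proof), with exponent triples chosen arbitrarily for the illustration; it compares the $3\times 3$ determinant with the six-term expansion at several integer points and reports the smallest and largest exponents that occur.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def expanded_terms(ik, ir):
    """Map exponent -> integer coefficient for the six-term expansion of f(x).

    ik = (i_{k1}, i_{k2}, i_{k3}) and ir = (i_{r1}, i_{r2}, i_{r3}).
    """
    k1, k2, k3 = ik
    r1, r2, r3 = ir
    signed = [
        (k1 + r2, 1), (k2 + r3, 1), (k3 + r1, 1),
        (k1 + r3, -1), (k2 + r1, -1), (k3 + r2, -1),
    ]
    terms = {}
    for exponent, sign in signed:
        terms[exponent] = terms.get(exponent, 0) + sign
    return {ex: c for ex, c in terms.items() if c != 0}


def evaluate(terms, x):
    """Evaluate the expanded polynomial at the integer x."""
    return sum(c * x ** ex for ex, c in terms.items())


ik = (1, 2, 8)  # hypothetical exponents i_{k1} < i_{k2} < i_{k3}
ir = (2, 4, 8)  # hypothetical exponents i_{r1} < i_{r2} < i_{r3}
terms = expanded_terms(ik, ir)
for x in (2, 3, 5):
    rows = [[x ** k, x ** r, 1] for k, r in zip(ik, ir)]
    assert det3(rows) == evaluate(terms, x)

# The lowest exponent is min(i_{k1} + i_{r2}, i_{k2} + i_{r1}).
print(min(terms), max(terms))
```

In this example the lowest exponent comes from the term $-x^{i_{k_2}+i_{r_1}}$, and the highest exponent stays below the bound $i_{n-1}+i_n$ used in the proof when the exponents are drawn from $\{1,2,4,8\}$.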

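Case 7: Assume that

$$a=(\lambda_2+\mu_2\theta^{i_1},\lambda_2+\mu_2\theta^{i_2},\cdots,\lambda_2+\mu_2\theta^{i_n}),\quad b=(\lambda_1+\mu_1\theta^{i_1},\lambda_1+\mu_1\theta^{i_2},\cdots,\lambda_1+\mu_1\theta^{i_n}),$$

where $\lambda_1,\lambda_2,\mu_1,\mu_2\in\mathbb{F}_q^*$. Suppose otherwise that $\ell\ge 3$. Then there exist six integers $k_1,k_2,k_3,r_1,r_2,r_3$ satisfying $1\le k_1<k_2<k_3\le n$ and $1\le r_1<r_2<r_3\le n$ such that

$$\begin{cases}\lambda_2+\mu_2\theta^{i_{k_1}}=\lambda_1+\mu_1\theta^{i_{r_1}},\\ \lambda_2+\mu_2\theta^{i_{k_2}}=\lambda_1+\mu_1\theta^{i_{r_2}},\\ \lambda_2+\mu_2\theta^{i_{k_3}}=\lambda_1+\mu_1\theta^{i_{r_3}}.\end{cases}$$

We then have

$$\begin{pmatrix}\theta^{i_{k_1}}&\theta^{i_{r_1}}&1\\ \theta^{i_{k_2}}&\theta^{i_{r_2}}&1\\ \theta^{i_{k_3}}&\theta^{i_{r_3}}&1\end{pmatrix}\begin{pmatrix}\mu_2\\ -\mu_1\\ \lambda_2-\lambda_1\end{pmatrix}=0.$$

Since

$$\begin{pmatrix}\mu_2\\ -\mu_1\\ \lambda_2-\lambda_1\end{pmatrix}\ne 0,$$

we have that

$$\begin{vmatrix}\theta^{i_{k_1}}&\theta^{i_{r_1}}&1\\ \theta^{i_{k_2}}&\theta^{i_{r_2}}&1\\ \theta^{i_{k_3}}&\theta^{i_{r_3}}&1\end{vmatrix}=0.$$

Using the same discussion as in Case 6, we get a contradiction and conclude the proof of $\ell\le 2$.

Based on the above cases, we have established the claim. Then by Lemma 2.2 we obtain $d(C)\ge 2n-4$, which forces $d(C)=2n-4$ by Theorem A. This completes the proof. ∎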

With Theorem B, we can prove Corollary C easily, which generates an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound in Theorem A.

###### Proof.

It is enough to check that conditions (1) and (2) in Theorem B are both satisfied. Condition (1) in Theorem B is satisfied since $i_{n-1}+i_n=2^{n-2}+2^{n-1}<e$. Next, let us compute $|D|$. Suppose that

$$2^i-2^j=2^s-2^t\quad(0\le i,j,s,t\le n-1,\ j<i,\ t<s).$$

If $j\ne t$, then without loss of generality, assume that $j<t$. This gives $2^{i-j}-1=2^{s-j}-2^{t-j}$, a contradiction (the left-hand side is odd while the right-hand side is even), which shows that $j=t$ and $i=s$. Therefore $|D|=n(n-1)/2$. We have shown that Condition (2) in Theorem B is satisfied. By Theorem B, we conclude that the code $C$ is a two-dimensional Reed-Solomon code of length $n$ with minimum insdel distance $2n-4$. ∎
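The counting argument in this proof can be confirmed for small lengths. The sketch below assumes, as reconstructed from the proof, that the exponents are the powers of two $2^0,2^1,\ldots,2^{n-1}$ and that the hypothesis on $e$ is $2^{n-2}+2^{n-1}<e$; it checks that all pairwise differences are distinct and lists the smallest admissible extension degree for each $n$. The function names are illustrative.

```python
from itertools import combinations


def power_of_two_exponents(n):
    """Exponents 2^0, 2^1, ..., 2^(n-1) (assumed form of i_1, ..., i_n)."""
    return [2 ** j for j in range(n)]


def differences_are_distinct(exponents):
    """True if the n(n-1)/2 differences i_j - i_k (k < j) are pairwise distinct."""
    diffs = [later - earlier for earlier, later in combinations(exponents, 2)]
    return len(set(diffs)) == len(diffs)


def smallest_extension_degree(n):
    """Smallest e with 2^(n-2) + 2^(n-1) < e (assumed form of condition (1))."""
    return 2 ** (n - 2) + 2 ** (n - 1) + 1


for n in range(3, 9):
    exps = power_of_two_exponents(n)
    print(n, differences_are_distinct(exps), smallest_extension_degree(n))
```

Under these assumptions the required extension degree grows like $3\cdot 2^{n-2}$, so the field size $p^e$ grows doubly exponentially in the code length $n$.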

## 5 Conclusion and future work

In this paper, we showed that if $C$ is an $[n,k]$ linear code over $\mathbb{F}_q$ with $n>k>1$, then the minimum insdel distance of $C$ is at most $2n-2k$ (see Theorem A). This result significantly improves the previously known results in [4] and [5], as we mentioned in the Introduction. We gave a sufficient condition under which a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ has minimum insdel distance $2n-4$ (see Theorem B); as a corollary, we showed that the conditions listed in Theorem B are easy to achieve (see Corollary C). Consequently, we have explicitly constructed an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound in Theorem A. Compared with [4], our methods are more direct and easier to understand.

A possible direction for future work is to find non-MDS codes that meet the bound exhibited in Theorem A; if this can be done, it may tell us more about the insertion-deletion metric. Apart from this problem, there could be many other interesting problems associated with insertion-deletion codes. For instance, it would be interesting to establish other bounds with respect to the insertion-deletion metric and give some optimal constructions.

## References

• [1] P. A. H. Bours, On the construction of perfect deletion-correcting codes using design theory, Designs, Codes and Cryptography, vol. 6, no. 1, pp. 5-20, 1995.
• [2] E. Brill and R. C. Moore, An improved error model for noisy channel spelling correction, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL '00), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 286-293, 2000.
• [3] Y. M. Chee, H. M. Kiah, A. Vardy, V. K. Vu and E. Yaakobi, Codes correcting position errors in racetrack memories, 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, pp. 161-165, 2017.
• [4] T. Do Duc, S. Liu, I. Tjuawinata and C. Xing, Explicit constructions of two-dimensional Reed-Solomon codes in high insertion and deletion noise regime, IEEE Transactions on Information Theory, DOI 10.1109/TIT.2021.3065618, 2021.
• [5] B. Haeupler and A. Shahrasbi, Synchronization strings: codes for insertions and deletions approaching the Singleton Bound, Proceedings of the Forty-Ninth Annual ACM Symposium on Theory of Computing, 2017.
• [6] B. Haeupler, A. Shahrasbi and M. Sudan, Synchronization strings: list decoding for insertions and deletions, 45th International Colloquium on Automata, Languages and Programming (ICALP), 2018.
• [7] S. Jain, F. F. Hassanzadeh, M. Schwartz and J. Bruck, Duplication-correcting codes for data storage in the DNA of living organisms, IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, 2017.
• [8] S. Liu and I. Tjuawinata, On 2-dimensional insertion-deletion Reed-Solomon codes with optimal asymptotic error-correcting capability, Finite Fields and Their Applications, vol. 73, 101841, 2021.
• [9] F. J. MacWilliams and N. J. A. Sloane, The theory of error-correcting codes, North Holland, 1983.
• [10] A. Mahmoodi, Existence of perfect 3-deletion-correcting codes, Designs, Codes and Cryptography, vol. 14, no. 1, pp. 81-87, 1998.
• [11] L. McAven and R. Safavi-Naini, Classification of the deletion correcting capabilities of Reed-Solomon codes of dimension 2 over prime fields, IEEE Transactions on Information Theory, vol. 53, no. 6, pp. 2280-2294, June 2007.
• [12] F. J. Och, Minimum error rate training in statistical machine translation, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Volume 1 (ACL '03), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 160-167, 2003.
• [13] N. J. A. Sloane, On single-deletion correcting codes, Codes and Designs, Columbus, OH: Math. Res. Inst. Publications, Ohio Univ., vol. 10, pp. 273-291, 2002.
• [14] J. Tonien and R. Safavi-Naini, Construction of deletion correcting codes using generalized Reed-Solomon codes and their subcodes, Designs, Codes and Cryptography, vol. 42, pp. 227-237, 2007.
• [15] R. R. Varshamov and G. M. Tenengolts, Codes which correct single asymmetric errors (in Russian), Avtomatika i Telemekhanika, vol. 161, no. 3, pp. 288-292, 1965.
• [16] Y. Wang, L. McAven and R. Safavi-Naini, Deletion correcting using generalized Reed-Solomon codes, Progress in Computer Science and Applied Logic, vol. 23, pp. 345-358, 2004.
• [17] R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.
• [18] J. Yin, A combinatorial construction for perfect deletion-correcting codes, Designs, Codes and Cryptography, vol. 23, no. 1, pp. 99-110, 2001.