1 Introduction
Insertion-deletion codes (insdel codes for short) are designed to protect against synchronization errors [5], [6] in communication systems, which arise when positional information about the message is lost. Insdel codes have found applications in many fields, such as DNA storage and DNA analysis [7], [17], language processing [2], [12], and racetrack memory error correction [3].
The insdel distance between two vectors is the smallest number of insertions and deletions needed to transform one into the other. The minimum insdel distance of a code is defined in the natural way, as the minimum insdel distance over all pairs of distinct codewords. As with classical linear codes under the Hamming distance, the minimum insdel distance of an insdel code is an important parameter because it determines the insdel error-correcting capability: the higher the minimum insdel distance, the more insdel errors the code can correct. The study of insdel codes dates back to the 1960s [15]. Insdel codes with small code lengths were constructed explicitly by various mathematical methods; see, e.g., [1], [10], [18]. Sloane [13] constructed a family of codes capable of correcting a single deletion.
For a fixed code length, it is desirable that both the code size (a measure of the efficiency of the code) and the minimum insdel distance be as large as possible. However, as in the Hamming metric case, these two parameters constrain each other for any fixed code length. The Singleton bound for insdel codes says that the minimum insdel distance $d$ of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d \le 2n-2k+2$; see [5]. It is therefore natural to consider the problem of constructing insdel linear codes achieving the Singleton bound with equality. There have been a few constructions of Reed-Solomon codes with high insdel error-correcting capabilities, but none of them meets or comes close to this bound. For example, Wang et al. [16] constructed a class of Reed-Solomon codes with code length , dimension and minimum insdel distance at most ; Tonien et al. [14] constructed a class of generalized Reed-Solomon codes of length and dimension with deletion error-correcting capability of up to . These results led researchers to investigate whether the Singleton bound is a tight upper bound for the minimum insdel distance of Reed-Solomon codes. In 2007, McAven et al. [11] showed that Reed-Solomon codes of length and dimension over prime fields can never meet the Singleton bound. Recently, Do Duc et al. [4] improved this result by showing that when the field size is sufficiently large compared to the code length, the Singleton bound cannot be achieved by Reed-Solomon codes; more explicitly, it was shown that the minimum insdel distance of a $k$-dimensional Reed-Solomon code is at most $2n-2k$ if the code length satisfies and ([4, Theorem 1]); optimal codes that meet the new bound were also constructed explicitly [4, Theorems 2 and 3]. Very recently, Liu et al. [8] established a set of sufficient conditions for two-dimensional insdel Reed-Solomon codes to have optimal asymptotic error-correcting capabilities.
The aforementioned works lead us to study Singleton-type bounds for the minimum insdel distance of general linear codes, together with optimal constructions of such codes. The contribution of this paper is twofold. We first show that the minimum insdel distance $d$ of any $[n,k]$ linear code over $\mathbb{F}_q$ satisfies $d \le 2n-2k$ if $n > k \ge 2$. This result improves and generalizes [4, Theorem 1] in two directions: first, our result holds for general linear codes, not just Reed-Solomon codes; second, we do not require the field size to be large compared to the code length. More precisely, we obtain the following result.
Theorem A Suppose $C$ is an $[n,k]$ linear code over $\mathbb{F}_q$ with $n > k \ge 2$. Then the minimum insdel distance of $C$ is at most $2n-2k$, i.e., $d_I(C) \le 2n-2k$.
We then give a sufficient condition under which the minimum insdel distance of a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ is exactly equal to $2n-4$. Our approach and conclusion are quite different from those given in [4, Theorems 2 and 3]: the proofs of [4, Theorems 2 and 3] are long and technical, and [4, Theorem 3] requires conditions on the divisors of and some related values; our methods are more direct, and our result mainly concerns the order of the finite field. Our result is stated below.
Theorem B Let $\mathbb{F}_q$ be a finite field with $q = p^m$ elements, where $p$ is a prime number and $m$ is a positive integer. Suppose $\alpha$ is a primitive element of $\mathbb{F}_q$, i.e., the order of $\alpha$ in the multiplicative group of $\mathbb{F}_q$ is equal to $q-1$. Let
be a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$. We assume that . If

and

the number of elements of is equal to ,
then the minimum insdel distance of $C$ is equal to $2n-4$. In other words, $C$ is optimal in the sense that it meets the bound obtained in Theorem A.
As a consequence, we show that conditions (1) and (2) listed in Theorem B are not hard to achieve; we explicitly construct an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound, as we show below.
Corollary C Let $p$ be a prime number and let $m$ be a positive integer. Let for satisfying . Let $\alpha$ be a primitive element of the finite field $\mathbb{F}_{p^m}$. Then
is a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_{p^m}$ whose minimum insdel distance is equal to $2n-4$.
This paper is organized as follows. In Section 2, we recall some definitions and basic results about general linear codes, insdel codes and Reed-Solomon codes. In Section 3, we give the proof of Theorem A, and in Section 4, we present the proofs of Theorem B and Corollary C. We conclude the paper with remarks on possible future work in Section 5.
2 Preliminaries
Let $\mathbb{F}_q$ be a finite field with $q$ elements and let $\mathbb{F}_q^n$ be the set of all vectors of length $n$ over $\mathbb{F}_q$. A subspace $C$ of $\mathbb{F}_q^n$ over $\mathbb{F}_q$ is called a linear code of length $n$ over $\mathbb{F}_q$. The Hamming distance between two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{F}_q^n$, which is defined to be the number of coordinates in which $\mathbf{a}$ and $\mathbf{b}$ differ, is denoted by $d_H(\mathbf{a}, \mathbf{b})$. The minimum Hamming distance $d_H(C)$ of a code $C$ is the smallest Hamming distance among all pairs of distinct codewords of $C$. The Hamming weight of a vector $\mathbf{a}$ is the number of nonzero coordinates in $\mathbf{a}$. It is well known that if $C$ is a linear code, then the minimum Hamming distance is the same as the minimum Hamming weight of the nonzero codewords of $C$. A linear code of length $n$, dimension $k$ and minimum Hamming distance $d$ over $\mathbb{F}_q$ is often called a $q$-ary $[n,k,d]$ code or, if $q$ is clear from the context, an $[n,k,d]$ code. It is well known that an $[n,k,d]$ linear code over $\mathbb{F}_q$ must obey the Singleton bound, i.e., the code length $n$, dimension $k$ and minimum Hamming distance $d$ satisfy
$$d \le n-k+1.$$
The linear codes over $\mathbb{F}_q$ meeting the Singleton bound are called maximum distance separable codes (MDS codes for short).
In this paper, for linear codes over $\mathbb{F}_q$, we mainly consider the insdel distance used in the high insertion and deletion noise regime. We restate its definition as follows.
Definition 2.1.
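For two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{F}_q^n$, the insdel distance $d_I(\mathbf{a}, \mathbf{b})$ between $\mathbf{a}$ and $\mathbf{b}$ is the minimum number of insertions and deletions needed to transform $\mathbf{a}$ into $\mathbf{b}$. It can be verified that $d_I$ is indeed a metric on $\mathbb{F}_q^n$.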
It has been shown that the insdel distance between any two vectors can be characterized via their longest common subsequences.
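Lemma 2.2.
For any two vectors $\mathbf{a}, \mathbf{b} \in \mathbb{F}_q^n$, we have $d_I(\mathbf{a}, \mathbf{b}) = 2n - 2\ell$, where $\ell$ denotes the length of a longest common subsequence of $\mathbf{a}$ and $\mathbf{b}$.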
Lemma 2.2 is useful in calculating the insdel distance of two vectors in .
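As a rough illustration of how Lemma 2.2 can be used in computations, the following Python sketch obtains the insdel distance from the length of a longest common subsequence. It assumes the standard identity that the insdel distance of two sequences equals the sum of their lengths minus twice the length of a longest common subsequence (which is $2n - 2\ell$ for two vectors of length $n$); the function names `lcs_length` and `insdel_distance` are hypothetical and do not come from the paper.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of the sequences a and b."""
    # prev[j] is the LCS length of the already processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def insdel_distance(a, b):
    """Insdel distance, assuming the LCS characterization described above."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


# Two vectors of length 4 sharing the common subsequence (1, 2, 3).
print(insdel_distance((0, 1, 2, 3), (1, 2, 3, 4)))
```

The dynamic program runs in time proportional to the product of the two lengths, which is more than adequate for the short codewords considered here.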
Similar to the definition of the minimum Hamming distance of linear codes over $\mathbb{F}_q$, we define below the minimum insdel distance of a linear code over $\mathbb{F}_q$, which is one of its most important parameters because it indicates the insdel error-correcting capability.
Definition 2.3.
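An insdel linear code $C$ of length $n$ is a linear subspace of $\mathbb{F}_q^n$ with minimum insdel distance defined as
$$d_I(C) = \min\{ d_I(\mathbf{a}, \mathbf{b}) : \mathbf{a}, \mathbf{b} \in C, \ \mathbf{a} \ne \mathbf{b} \}.$$
An $[n,k]$ linear code over $\mathbb{F}_q$ of length $n$, dimension $k$ and minimum insdel distance $d$ is called an $[n,k,d]$ insdel linear code over $\mathbb{F}_q$. As mentioned in the Introduction, an insdel linear code must obey the following Singleton-type bound.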
Proposition 2.4.
(Singleton Bound [5]) Let $C$ be an $[n,k,d]$ insdel linear code over $\mathbb{F}_q$. Then
$$d \le 2n-2k+2.$$
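In the rest of this section we give the definition and some basic facts about Reed-Solomon codes. Let $n$ and $k$ be two positive integers with $k \le n$. Let $\mathbb{F}_q$ be a finite field with $q$ elements and choose $n$ distinct elements $\alpha_1, \dots, \alpha_n$ of $\mathbb{F}_q$. Denote by $\mathbb{F}_q[x]_{<k}$ the set of polynomials in $\mathbb{F}_q[x]$ of degree less than $k$. For $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_n)$, the Reed-Solomon code of length $n$ and dimension $k$ with code locators $\alpha_1, \dots, \alpha_n$ is defined as
$$\mathrm{RS}_{n,k}(\boldsymbol{\alpha}) = \{ (f(\alpha_1), \dots, f(\alpha_n)) : f \in \mathbb{F}_q[x]_{<k} \}.$$
Then $\mathrm{RS}_{n,k}(\boldsymbol{\alpha})$ is an $[n,k,n-k+1]$ linear code over $\mathbb{F}_q$ with length $n \le q$. In particular, Reed-Solomon codes are MDS codes.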
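To make the definition concrete, here is a minimal Python sketch that lists all codewords of a small Reed-Solomon code and checks the MDS property numerically. It is restricted to a prime field, where arithmetic is simply arithmetic modulo $p$ (an extension field would need a dedicated field implementation); the helper names `rs_codewords` and `min_hamming_distance` and the chosen parameters are assumptions made for illustration only.

```python
from itertools import product


def rs_codewords(p, locators, k):
    """All codewords (f(x_1), ..., f(x_n)) with deg f < k over the prime field F_p."""
    words = []
    for coeffs in product(range(p), repeat=k):
        # coeffs[i] is the coefficient of x^i in the message polynomial f.
        word = tuple(
            sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
            for x in locators
        )
        words.append(word)
    return words


def min_hamming_distance(words):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(
        sum(u != v for u, v in zip(a, b))
        for i, a in enumerate(words)
        for b in words[i + 1:]
    )


p, k = 7, 2
locators = [1, 2, 3, 4, 5]  # distinct elements of F_7
n = len(locators)
code = rs_codewords(p, locators, k)
print(len(code), min_hamming_distance(code), n - k + 1)
```

For these parameters the code has $7^2 = 49$ codewords, and the computed minimum Hamming distance agrees with $n - k + 1$, as the MDS property predicts.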
3 Proof of Theorem A
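Let $C$ be a linear code of length $n$ over $\mathbb{F}_q$. As before, the minimum Hamming distance of the linear code $C$ is denoted by $d_H(C)$; the minimum insdel distance of $C$ is denoted by $d_I(C)$. For two arbitrary codewords $\mathbf{a}, \mathbf{b}$ of $C$, the Hamming (resp. insdel) distance between $\mathbf{a}$ and $\mathbf{b}$ is denoted by $d_H(\mathbf{a}, \mathbf{b})$ (resp. $d_I(\mathbf{a}, \mathbf{b})$). It follows from Lemma 2.2 that the insdel distance between $\mathbf{a}$ and $\mathbf{b}$ is less than or equal to $2n$, i.e., $d_I(\mathbf{a}, \mathbf{b}) \le 2n$. In order to prove Theorem A, we first need to improve this upper bound, as we show below.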
Lemma 3.1.
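Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors of length $n$ over $\mathbb{F}_q$. Then we have
$$d_I(\mathbf{a}, \mathbf{b}) \le 2\, d_H(\mathbf{a}, \mathbf{b}).$$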
Proof.
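Let $\ell$ denote the length of a longest common subsequence of $\mathbf{a}$ and $\mathbf{b}$. Observe that $\ell \ge n - d_H(\mathbf{a}, \mathbf{b})$, since the coordinates in which $\mathbf{a}$ and $\mathbf{b}$ agree form a common subsequence, and then by Lemma 2.2 we immediately have
$$d_I(\mathbf{a}, \mathbf{b}) = 2n - 2\ell \le 2n - 2\bigl(n - d_H(\mathbf{a}, \mathbf{b})\bigr) = 2\, d_H(\mathbf{a}, \mathbf{b}).$$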
The lemma is proved. ∎
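The inequality of Lemma 3.1 can also be spot-checked numerically. The sketch below draws random pairs of vectors over an alphabet of size $q$ (represented simply as integers $0, \dots, q-1$, since only equality of symbols matters here) and tests that the insdel distance never exceeds twice the Hamming distance. The function names and the sampling parameters are hypothetical, and such a check is of course no substitute for the proof.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of the sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of coordinates in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def insdel_distance(a, b):
    """Insdel distance via the LCS characterization (assumed as in Lemma 2.2)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def check_lemma_3_1(q, n, trials, seed=0):
    """Return True if d_I <= 2 * d_H holds for every sampled pair of vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.randrange(q) for _ in range(n)]
        b = [rng.randrange(q) for _ in range(n)]
        if insdel_distance(a, b) > 2 * hamming_distance(a, b):
            return False
    return True


print(check_lemma_3_1(q=3, n=6, trials=300))
```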
We have shown that the insdel distance of any two vectors in $\mathbb{F}_q^n$ is at most twice their Hamming distance. We next show that the same conclusion holds for the minimum insdel distance and the minimum Hamming distance of any linear code.
Lemma 3.2.
Let $C$ be a nonzero linear code of length $n$ over $\mathbb{F}_q$. We then have
$$d_I(C) \le 2\, d_H(C).$$
Proof.
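Choose two distinct codewords $\mathbf{a}$ and $\mathbf{b}$ of $C$ such that $d_H(\mathbf{a}, \mathbf{b}) = d_H(C)$. Recall that $d_I(C) \le d_I(\mathbf{a}, \mathbf{b})$. Thus by Lemma 3.1 we have
$$d_I(C) \le d_I(\mathbf{a}, \mathbf{b}) \le 2\, d_H(\mathbf{a}, \mathbf{b}) = 2\, d_H(C).$$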
We are done. ∎
Remark 3.3.
Note, by the classical Singleton bound for an $[n,k]$ linear code $C$, that $d_H(C) \le n-k+1$. We therefore conclude from Lemma 3.2 that the minimum insdel distance of an $[n,k]$ linear code $C$ must be less than or equal to $2n-2k+2$, i.e.,
$$d_I(C) \le 2\, d_H(C) \le 2n-2k+2.$$
This Singleton bound for an insdel code was exhibited in [5]. We also note by Lemma 3.2 that if the minimum Hamming distance of $C$ is at most $n-k$, then $d_I(C) \le 2n-2k$, proving Theorem A in this special case. By virtue of this fact, to complete the proof of Theorem A, we only need to consider the case where $C$ is an MDS code.
The following lemma gives the desired result, which is a crucial step in the process of proving our Theorem A.
Lemma 3.4.
Let $C$ be an $[n,k]$ linear MDS code over $\mathbb{F}_q$ with $n > k \ge 2$. Then
$$d_I(C) \le 2n-2k.$$
Proof.
We use a characterization of MDS codes to complete the proof: an $[n,k,d]$ linear code $C$ over $\mathbb{F}_q$ is MDS if and only if $C$ has a minimum weight codeword in any $d$ coordinates (see [9, Chap. 11, Theorem 4]). Now $C$ is an MDS code over $\mathbb{F}_q$ with parameters $[n,k,n-k+1]$, which gives that $C$ has a minimum weight codeword in any $n-k+1$ coordinates. First we choose the last $n-k+1$ coordinates to be nonzero, and we suppose further that the $k$th coordinate is equal to $1$, i.e.,
where denotes some nonzero elements of . It follows that suitable nonzero elements of can be found such that is a codeword of . Likewise, suitable nonzero elements of can be found such that
is also a codeword of $C$. Now we have two distinct codewords of $C$. The length $\ell$ of a longest common subsequence of these two codewords satisfies $\ell \ge k$. Therefore, by Lemma 2.2 we get that
$$d_I(C) \le 2n - 2\ell \le 2n-2k.$$
We are done. ∎
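For very small parameters, the upper bound can be confirmed by exhaustive search. The following sketch, which assumes that the bound of Theorem A reads $d_I(C) \le 2n-2k$ for $n > k \ge 2$, enumerates a two-dimensional Reed-Solomon code over the prime field $\mathbb{F}_5$ and computes its minimum insdel distance through the longest-common-subsequence characterization. The helper names and the choice of code locators are hypothetical and serve only as an illustration.

```python
from itertools import combinations, product


def lcs_length(a, b):
    """Length of a longest common subsequence of the sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rs_codewords(p, locators, k):
    """All codewords of the Reed-Solomon code over the prime field F_p."""
    return [
        tuple(sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
              for x in locators)
        for coeffs in product(range(p), repeat=k)
    ]


def min_insdel_distance(words):
    """Minimum of 2n - 2*LCS over all pairs of distinct codewords of length n."""
    n = len(words[0])
    return min(2 * n - 2 * lcs_length(a, b) for a, b in combinations(words, 2))


p, k = 5, 2
locators = [0, 1, 2, 3]  # distinct elements of F_5
n = len(locators)
code = rs_codewords(p, locators, k)
d_insdel = min_insdel_distance(code)
print(d_insdel, "<=", 2 * n - 2 * k)
```

With these locators the codewords $(0,1,2,3)$ and $(1,2,3,4)$ share the subsequence $(1,2,3)$, so the minimum insdel distance is only $2$, well below the bound $2n-2k = 4$. An arbitrary choice of code locators therefore does not attain the bound in general.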
4 Proofs of Theorem B and Corollary C
The primary goal of this section is to present a proof of Theorem B, which gives a sufficient condition guaranteeing that a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ has minimum insdel distance $2n-4$. Such codes are optimal in the sense that they meet the upper bound obtained in Theorem A.
We first fix some notation. Let $\mathbb{F}_q$ be a finite field with $q = p^m$ elements, where $p$ is a prime number and $m$ is a positive integer. Let $\mathbb{F}_q^*$ be the multiplicative group of $\mathbb{F}_q$, and let $\alpha$ be a primitive element of $\mathbb{F}_q$, i.e., $\alpha$ generates the cyclic group $\mathbb{F}_q^*$. Then $m$ is the degree of the minimal polynomial of $\alpha$ over the prime field $\mathbb{F}_p$.
Let be integers such that , where . Let be a twodimensional ReedSolomon code of length over with code locators , i.e.,
(4.1) 
Our goal is therefore reduced to finding suitable numbers such that $C$ has minimum insdel distance $2n-4$. For this purpose, let
then
We are now in a position to prove Theorem B.
Proof.
Suppose $C$ is a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ as given in (4.1). According to Theorem A we have $d_I(C) \le 2n-4$. Thus it remains to show that $d_I(C) \ge 2n-4$. To this end, by virtue of Lemma 2.2, it is enough to prove the following claim:
Claim: $\ell \le 2$ for any distinct codewords $\mathbf{c}_1, \mathbf{c}_2 \in C$, where $\ell$ is the length of a longest common subsequence of $\mathbf{c}_1$ and $\mathbf{c}_2$.
We now consider several cases separately to bound the number $\ell$.
Case 1: Either $\mathbf{c}_1$ or $\mathbf{c}_2$ is the zero vector. It is trivial to see that $\ell \le 1$ because $C$ is an MDS code with parameters $[n,2,n-1]$.
Henceforth, we can assume that both $\mathbf{c}_1$ and $\mathbf{c}_2$ are nonzero.
Case 2: Assume that
where are elements of with . Then it is easy to see that .
Case 3: Assume that
where . Suppose otherwise that there exist three integers satisfying such that
This gives
Thus
Recall that , and for . We then have
This contradicts the condition in the theorem. It follows that $\ell \le 2$.
Case 4: Assume that
where . Suppose otherwise that there exist three integers satisfying such that
This leads to
Thus
which gives
This again contradicts the condition in the theorem. It follows that $\ell \le 2$.
Case 5: Assume that
where with . Suppose otherwise that there exist six integers satisfying such that
This gives
which implies that
This is equivalent to
Thus we have
This is a contradiction again. It follows that $\ell \le 2$.
Case 6: Assume that
where . Suppose otherwise that there exist six integers satisfying such that
In the matrix version, that is equivalent to saying that
Since
we have that the determinant of the coefficient matrix is zero, i.e.,
Now suppose is an indeterminate over the finite field . Then we have the polynomial
It is clear that . On the other hand, we can expand the determinant to have
Observing that and , we have
Thus the term of the minimum degree in the polynomial is or . As , otherwise we would have , a contradiction, we conclude that .
Since is a nonzero polynomial, and is the degree of the minimal polynomial of over , we obtain that . Thus it is easy to see that the degree of satisfies
The last inequality is from our condition . This is a contradiction, and we conclude the proof of .
Case 7: Assume that
where . Suppose otherwise that there exist six integers satisfying such that
We then have
Since
we have that
Using the same argument as in Case 6, we get a contradiction and conclude that $\ell \le 2$.
Based on the above cases, we have established the claim. Then by Lemma 2.2 we obtain $d_I(C) \ge 2n-4$, which forces $d_I(C) = 2n-4$ by Theorem A. This completes the proof. ∎
With Theorem B in hand, we can easily prove Corollary C, which yields an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound in Theorem A.
Proof.
It is enough to check that conditions (1) and (2) in Theorem B are both satisfied. Condition (1) in Theorem B is satisfied since . Next, let us compute . Suppose that
If , then without loss of generality, assume that . This gives , a contradiction, which shows that and . Therefore . We have shown that condition (2) in Theorem B is satisfied. By Theorem B, we conclude that the code is a two-dimensional Reed-Solomon code of length $n$ with minimum insdel distance $2n-4$. ∎
5 Conclusion and future work
In this paper, we showed that if $C$ is an $[n,k]$ linear code over $\mathbb{F}_q$ with $n > k \ge 2$, then the minimum insdel distance of $C$ is at most $2n-2k$ (see Theorem A). This result improves the previously known results in [4] and [5], as mentioned in the Introduction. We gave a sufficient condition under which a two-dimensional Reed-Solomon code of length $n$ over $\mathbb{F}_q$ has minimum insdel distance $2n-4$ (see Theorem B); as a corollary, we showed that the conditions listed in Theorem B are easy to achieve (see Corollary C). Consequently, we have explicitly constructed an infinite family of optimal two-dimensional Reed-Solomon codes meeting the bound in Theorem A. Compared with [4], our methods are more direct and easier to follow.
A possible direction for future work is to find non-MDS codes that meet the bound exhibited in Theorem A; if this can be done, it may tell us more about the insertion-deletion metric. Apart from this problem, there are many other interesting problems associated with insertion-deletion codes. For instance, it would be interesting to establish other bounds with respect to the insertion-deletion metric and to give optimal constructions.
References
 [1] P. A. H. Bours, On the construction of perfect deletion-correcting codes using design theory, Designs, Codes and Cryptography, vol. 6, no. 1, pp. 5-20, 1995.
 [2] E. Brill and R. C. Moore, An improved error model for noisy channel spelling correction, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL '00), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 286-293, 2000.
 [3] Y. M. Chee, H. M. Kiah, A. Vardy, V. K. Vu and E. Yaakobi, Codes correcting position errors in racetrack memories, 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, pp. 161-165, 2017.
 [4] T. Do Duc, S. Liu, I. Tjuawinata and C. Xing, Explicit constructions of two-dimensional Reed-Solomon codes in high insertion and deletion noise regime, IEEE Transactions on Information Theory, DOI 10.1109/TIT.2021.3065618, 2021.
 [5] B. Haeupler and A. Shahrasbi, Synchronization strings: codes for insertions and deletions approaching the Singleton bound, Proceedings of the Forty-Ninth Annual ACM Symposium on Theory of Computing, 2017.
 [6] B. Haeupler, A. Shahrasbi and M. Sudan, Synchronization strings: list decoding for insertions and deletions, 45th International Colloquium on Automata, Languages and Programming (ICALP), 2018.
 [7] S. Jain, F. F. Hassanzadeh, M. Schwartz and J. Bruck, Duplication-correcting codes for data storage in the DNA of living organisms, IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, 2017.
 [8] S. Liu and I. Tjuawinata, On 2-dimensional insertion-deletion Reed-Solomon codes with optimal asymptotic error-correcting capability, Finite Fields and Their Applications, vol. 73, 101841, 2021.
 [9] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North Holland, 1983.
 [10] A. Mahmoodi, Existence of perfect deletion-correcting codes, Designs, Codes and Cryptography, vol. 14, no. 1, pp. 81-87, 1998.
 [11] L. McAven and R. Safavi-Naini, Classification of the deletion correcting capabilities of Reed-Solomon codes of dimension 2 over prime fields, IEEE Transactions on Information Theory, vol. 53, no. 6, pp. 2280-2294, June 2007.
 [12] F. J. Och, Minimum error rate training in statistical machine translation, Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Volume 1 (ACL '03), Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 160-167, 2003.
 [13] N. J. A. Sloane, On single-deletion correcting codes, Codes and Designs, Columbus, OH: Math. Res. Inst. Publications, Ohio Univ., vol. 10, pp. 273-291, 2002.
 [14] J. Tonien and R. Safavi-Naini, Construction of deletion correcting codes using generalized Reed-Solomon codes and their subcodes, Designs, Codes and Cryptography, vol. 42, pp. 227-237, 2007.
 [15] R. R. Varshamov and G. M. Tenengolts, Codes which correct single asymmetric errors (in Russian), Automatika i Telemekhanika, vol. 161, no. 3, pp. 288-292, 1965.
 [16] Y. Wang, L. McAven and R. Safavi-Naini, Deletion correcting using generalized Reed-Solomon codes, Progress in Computer Science and Applied Logic, vol. 23, pp. 345-358, 2004.
 [17] R. Xu and D. Wunsch, Survey of clustering algorithms, IEEE Transactions on Neural Networks, vol. 16, no. 3, pp. 645-678, 2005.
 [18] J. Yin, A combinatorial construction for perfect deletion-correcting codes, Designs, Codes and Cryptography, vol. 23, no. 1, pp. 99-110, 2001.