I Introduction
Deletions and insertions occur in many systems, for example in certain communication and storage channels and in biological sequences. Studying deletion/insertion correcting codes may therefore lead to important insight into genetic processes and into many communication problems. Deletion correcting codes have been the subject of intense research for more than fifty years [36, 37, 44], with recent results settling long-standing open problems regarding constructions of multiple deletion correcting codes with low redundancy [10, 9]. Nevertheless, our understanding of these codes and of channels with this type of error is still very limited, and many open problems remain, especially concerning constructions of deletion correcting codes that satisfy additional constraints, such as weight or parity constraints. Examples include codes in the Damerau distance [20], based on single deletion correcting codes with even weight, and Shifted Varshamov–Tenengolts codes [42] used for burst deletion correction. In such settings, one important question is to determine the weight enumerators of the component deletion correcting codes in order to estimate the size [13, 28] of the weight-constrained deletion correcting codes. The component deletion correcting code is frequently defined in terms of a linear congruence for which the number of solutions of some fixed weight determines the size of the constrained code.
Here, we introduce a general class of codes which includes several well-known classes of deletion/insertion correcting codes as special cases. Then, using a number theoretic method, we give an explicit formula for the weight enumerator of our code, which in turn gives explicit formulas for the weight enumerators and for the sizes of the codes described above (see also [13, 28] for some general upper bounds for the size of deletion correcting codes). Our initial motivation for studying this problem comes from number theory, and pertains to a possible $q$-ary generalization of Lehmer’s Theorem (see Section II).
Before we proceed with our technical exposition, we review some well-known classes of deletion correcting codes.
Throughout the paper, we let $\mathbb{Z}_n=\{0,\ldots,n-1\}$. Varshamov and Tenengolts [49] in 1965 introduced an important class of codes, known as the Varshamov–Tenengolts codes (henceforth, VT-codes), and proved that these codes are capable of correcting single asymmetric errors on a Z-channel.
Definition I.1.
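Let $n$ be a positive integer and $0\leq b\leq n$. The Varshamov–Tenengolts code $VT_b(n)$ is the set of all binary $n$-tuples $\langle x_1,\ldots,x_n\rangle$ such that
\[ \sum_{i=1}^{n} i\,x_i \equiv b \pmod{n+1}. \]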
A generalization of VT-codes to Abelian groups where the code length is one less than the order of the group was proposed by Constantin and Rao [12]; the size and weight distribution of the latter codes were studied in [16, 24, 26, 34]. Despite the fact that the VT codes can correct only a single deletion [32], the codes and their variants have found many applications, including DNA-based data storage [20, 31] and distributed message synchronization [51, 52].
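The VT definition can be explored by brute force for small lengths. The following Python sketch assumes the standard defining congruence $\sum_{i} i\,x_i \equiv b \pmod{n+1}$ and checks the single-deletion property by testing that the deletion balls of distinct codewords are disjoint. The function names are illustrative and are not taken from the paper.

```python
from itertools import product

def vt_code(n, b):
    """All binary n-tuples x with sum(i * x_i) congruent to b mod (n + 1); positions are 1-indexed."""
    return [x for x in product((0, 1), repeat=n)
            if (sum(i * xi for i, xi in enumerate(x, start=1)) - b) % (n + 1) == 0]

def deletion_ball(x):
    """All strings obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def corrects_single_deletion(code):
    """True if the single-deletion balls of distinct codewords are pairwise disjoint."""
    seen = set()
    for c in code:
        ball = deletion_ball(c)
        if seen & ball:      # some shortened string already belongs to another codeword
            return False
        seen |= ball
    return True
```

Exhaustive enumeration is only practical for small $n$, since it visits all $2^n$ binary strings.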
Levenshtein [32] proved that any code that can correct $s$ deletions (or $s$ insertions) can also correct a total of $s$ deletions and insertions. In the same paper, he also proposed the following important generalization of VT codes.
Definition I.2.
Let $n$, $k$ be positive integers and $b\in\mathbb{Z}_k$. The Levenshtein code $L_b(n,k)$ is the set of all binary $n$-tuples $\langle x_1,\ldots,x_n\rangle$ such that
\[ \sum_{i=1}^{n} i\,x_i \equiv b \pmod{k}. \]
By giving an elegant decoding algorithm, Levenshtein [32] showed that if $k\geq n+1$, then the code can correct a single deletion (and consequently, can correct a single insertion). Furthermore, Levenshtein [32] proved that if $k\geq 2n$ then the code can correct either a single deletion/insertion error or a single substitution error. The Levenshtein code has found many interesting applications and is considered to be one of the most important examples of deletion/insertion correcting codes.
Motivated by applications in burst deletion correction, a variant of the Levenshtein code was introduced in [42] under the name of Shifted Varshamov–Tenengolts codes. Gabrys et al. [20] used Shifted VT-codes to construct codes in the Damerau distance. Shifted VT-codes combine a linear congruence constraint with a parity constraint, as stated in the next definition.
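To make the single-deletion claim concrete, here is a minimal Python sketch of the commonly described decoder for codes defined by $\sum_i i\,x_i \equiv b \pmod{k}$ with $k \geq n+1$. It is an illustration under those assumptions, not necessarily the exact procedure given in [32], and the function names are hypothetical. The decoder compares the checksum deficiency $D$ with the weight $w$ of the received word: if $D \leq w$ a 0 was deleted and is reinserted with exactly $D$ ones to its right; otherwise a 1 was deleted and is reinserted with exactly $D-w-1$ zeros to its left.

```python
from itertools import product

def levenshtein_code(n, k, b):
    """All binary n-tuples x with sum(i * x_i) congruent to b mod k (1-indexed positions)."""
    return [x for x in product((0, 1), repeat=n)
            if (sum(i * xi for i, xi in enumerate(x, start=1)) - b) % k == 0]

def decode_single_deletion(y, k, b):
    """Recover the codeword from y, assumed to be a codeword with exactly one symbol deleted.

    Assumes k >= len(y) + 2, i.e. k >= n + 1 for the original length n.
    """
    y = list(y)
    w = sum(y)                                                   # weight of the received word
    deficiency = (b - sum(i * yi for i, yi in enumerate(y, start=1))) % k
    if deficiency <= w:
        # A 0 was deleted: reinsert it so that exactly `deficiency` ones lie to its right.
        pos, ones = len(y), 0
        while ones < deficiency:
            pos -= 1
            ones += y[pos]
        return tuple(y[:pos] + [0] + y[pos:])
    # A 1 was deleted: reinsert it so that exactly deficiency - w - 1 zeros lie to its left.
    zeros_needed, pos, zeros = deficiency - w - 1, 0, 0
    while zeros < zeros_needed:
        zeros += 1 - y[pos]
        pos += 1
    return tuple(y[:pos] + [1] + y[pos:])
```

The sketch does not validate its input; if `y` did not arise from a single deletion in a codeword, the result is meaningless or the scan may run past the end of the word.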
Definition I.3.
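Let $n$, $k$ be positive integers, $b\in\mathbb{Z}_k$, and $r\in\{0,1\}$. The Shifted Varshamov–Tenengolts code $SVT_{b,r}(n,k)$ is the set of all binary $n$-tuples $\langle x_1,\ldots,x_n\rangle$ such that
\[ \sum_{i=1}^{n} i\,x_i \equiv b \pmod{k}, \qquad \sum_{i=1}^{n} x_i \equiv r \pmod{2}. \]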
The reason why these codes are called “shifted” is that they can correct a single deletion where the location of the deleted bit is known to be within certain consecutive positions. A variation of the Shifted VT-codes appeared in [14, 15].
Helberg and Ferreira [23] introduced a generalization of the Levenshtein code, referred to as the Helberg code, by replacing the coefficients (weights) with modified versions of the Fibonacci numbers.
Definition I.4.
Let $s$, $n$ be positive integers. The Helberg code $H_b(n,s)$ is the set of all binary $n$-tuples $\langle x_1,\ldots,x_n\rangle$ such that
\[ \sum_{i=1}^{n} v_i\,x_i \equiv b \pmod{m}, \]
where $v_i=0$, for $i\leq 0$, $v_i=1+\sum_{j=1}^{s} v_{i-j}$, for $i\geq 1$, $m=v_{n+1}$, and $b\in\mathbb{Z}_m$. Note that the multipliers $v_i$ depend on $s$, and $m$ depends on both $s$ and $n$.
Clearly, the Helberg code with $s=1$ coincides with the VT code. Helberg and Ferreira [23] gave numerical values for the maximum cardinality of this code for some special parameter choices. Abdel-Ghaffar et al. [1] proved that the Helberg code can correct multiple deletion/insertion errors (see also [22] for a short proof of this result). Furthermore, multiple deletion correcting codes over nonbinary alphabets generalizing the Helberg code were recently proposed by Le and Nguyen [29]. The Helberg code constraint was combined with the parity constraint of Shifted VT-codes for the purpose of devising special types of DNA-based data storage codes in [20].
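The Helberg multipliers are easy to generate. The sketch below assumes the recurrence $v_i = 1 + \sum_{j=1}^{s} v_{i-j}$ with $v_i = 0$ for $i \leq 0$ and modulus $m = v_{n+1}$; the function name `helberg_weights` is illustrative only.

```python
def helberg_weights(n, s):
    """Return ([v_1, ..., v_n], m) with v_i = 1 + v_{i-1} + ... + v_{i-s} and m = v_{n+1}.

    Terms with non-positive index are taken to be zero.
    """
    v = []
    for i in range(1, n + 2):
        # v[0] stores v_1, so the previous s terms are v[i-1-s : i-1], clipped at the start.
        v.append(1 + sum(v[max(0, i - 1 - s): i - 1]))
    return v[:n], v[n]

# s = 1 gives v_i = i and m = n + 1, i.e. the VT congruence.
print(helberg_weights(4, 1))   # ([1, 2, 3, 4], 5)
print(helberg_weights(5, 2))   # ([1, 2, 4, 7, 12], 20)
```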
We now introduce our general code family which includes the above codes as special cases.
Definition I.5.
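Let $n$, $k$ be positive integers, $\mathbf{a}=\langle a_1,\ldots,a_k\rangle\in\mathbb{Z}^k$, and $b\in\mathbb{Z}_n$. We define the Binary Linear Congruence Code (BLCC) $\mathcal{C}$ as the set of all binary $k$-tuples $\langle x_1,\ldots,x_k\rangle$ such that
\[ a_1x_1+\cdots+a_kx_k \equiv b \pmod{n}. \]
The Hamming weight of a string $x$ over an alphabet, denoted by $w(x)$, is the number of non-zero symbols in $x$. Equivalently, the Hamming weight of a string is the Hamming distance between that string and the all-zero string of the same length. The weight enumerator of a code is defined as follows.
Definition I.6.
Let $n$ be a positive integer, $\mathbb{F}_q$ be a finite field, and let $C\subseteq\mathbb{F}_q^n$. Then the weight enumerator of the code $C$ is defined as
\[ W_C(z)=\sum_{c\in C} z^{w(c)}=\sum_{t=0}^{n} A_t z^t, \]
where $w(c)$ is the Hamming weight of $c$, and $A_t$ is the number of codewords in $C$ of Hamming weight $t$. Also, the homogeneous weight enumerator of the code $C$ is defined as
\[ W_C(x,y)=\sum_{c\in C} x^{w(c)}y^{n-w(c)}=\sum_{t=0}^{n} A_t x^t y^{n-t}. \]
Clearly, by setting $z=1$ in the weight enumerator (or $x=y=1$ in the homogeneous weight enumerator) we obtain the size of the code $C$.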
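For small parameters, the weight distribution of a code defined by a binary linear congruence can be tabulated directly. The following sketch assumes the congruence $a_1x_1+\cdots+a_kx_k\equiv b \pmod{n}$ over binary tuples; the names `blcc` and `weight_distribution` are illustrative.

```python
from itertools import product

def blcc(a, b, n):
    """All binary tuples x of length len(a) with a . x congruent to b mod n."""
    return [x for x in product((0, 1), repeat=len(a))
            if (sum(ai * xi for ai, xi in zip(a, x)) - b) % n == 0]

def weight_distribution(code, length):
    """Return [A_0, ..., A_length], where A_t counts codewords of Hamming weight t."""
    counts = [0] * (length + 1)
    for c in code:
        counts[sum(c)] += 1
    return counts

# VT code of length 4 with b = 0: coefficients 1..4, modulus 5.
dist = weight_distribution(blcc([1, 2, 3, 4], 0, 5), 4)
print(dist)        # [1, 0, 2, 0, 1], i.e. W(z) = 1 + 2 z^2 + z^4
print(sum(dist))   # 4, the code size W(1)
```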
What can we say about the size, or more generally, about the weight enumerator of the Binary Linear Congruence Code (BLCC)? In the next section, we review linear congruences, exponential sums and in particular, Ramanujan sums. Then, in Section III, we give an explicit formula for the weight enumerator of the BLCC. In Section IV, we derive explicit formulas for the weight enumerators and for the sizes of the previously described deletion correcting codes. We also obtain a formula for the size of the Shifted Varshamov–Tenengolts codes.
II Linear congruences and Ramanujan sums
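Let $a_1,\ldots,a_k,b,n\in\mathbb{Z}$, $n\geq 1$. Throughout the paper, an ordered $k$-tuple of integers is denoted by $\langle a_1,\ldots,a_k\rangle$. Also, by $\mathbf{a}\cdot\mathbf{x}$ we mean the scalar product of the vectors $\mathbf{a}=\langle a_1,\ldots,a_k\rangle$ and $\mathbf{x}=\langle x_1,\ldots,x_k\rangle$. A linear congruence in $k$ unknowns $x_1,\ldots,x_k$ is of the form
\[ a_1x_1+\cdots+a_kx_k \equiv b \pmod{n}. \tag{II.1} \]
A solution of (II.1) is an ordered $k$-tuple of integers $\mathbf{x}=\langle x_1,\ldots,x_k\rangle\in\mathbb{Z}_n^k$ that satisfies (II.1). The following result, proved by Lehmer [30], gives the number of solutions of the above linear congruence.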
Theorem II.1.
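Let $a_1,\ldots,a_k,b,n\in\mathbb{Z}$, $n\geq 1$. The linear congruence $a_1x_1+\cdots+a_kx_k\equiv b \pmod{n}$ has a solution $\langle x_1,\ldots,x_k\rangle\in\mathbb{Z}_n^k$ if and only if $\ell\mid b$, where $\ell=\gcd(a_1,\ldots,a_k,n)$. Furthermore, if this condition is satisfied, then there are $\ell n^{k-1}$ solutions.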
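Lehmer's count can be checked numerically for small moduli. The sketch below assumes the usual statement of the theorem: with $\ell=\gcd(a_1,\ldots,a_k,n)$, the congruence has $\ell\,n^{k-1}$ solutions in $\mathbb{Z}_n^k$ when $\ell \mid b$ and none otherwise. Function names are illustrative.

```python
from functools import reduce
from itertools import product
from math import gcd

def count_solutions(a, b, n):
    """Brute-force count of x in Z_n^k with a . x congruent to b mod n."""
    return sum(1 for x in product(range(n), repeat=len(a))
               if (sum(ai * xi for ai, xi in zip(a, x)) - b) % n == 0)

def lehmer_count(a, b, n):
    """Closed-form count: ell * n^(k-1) if ell divides b, else 0, with ell = gcd(a_1, ..., a_k, n)."""
    ell = reduce(gcd, a, n)
    return ell * n ** (len(a) - 1) if b % ell == 0 else 0

print(count_solutions((2, 4), 2, 6), lehmer_count((2, 4), 2, 6))   # 12 12
print(count_solutions((2, 4), 1, 6), lehmer_count((2, 4), 1, 6))   # 0 0
```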
Lehmer’s Theorem and its variants have been studied extensively and have found intriguing applications in several areas of mathematics, computer science, and physics (see [2, 3, 4, 5, 6, 11, 25] and the references therein).
Now, we pose the following problem that asks for a $q$-ary generalization of Lehmer’s Theorem:
Problem II.2.
Let $a_1,\ldots,a_k,b,n\in\mathbb{Z}$, $n\geq 1$, and let $q$ be a positive integer. Give an explicit formula for the number of solutions of the linear congruence $a_1x_1+\cdots+a_kx_k\equiv b \pmod{n}$ with $\langle x_1,\ldots,x_k\rangle\in\mathbb{Z}_q^k$.
Note that we have only changed $\mathbb{Z}_n^k$ to $\mathbb{Z}_q^k$. For example, when $q=2$, the problem is asking for an explicit formula for the number of binary solutions of an arbitrary linear congruence. This is a very natural problem and might lead to interesting applications. In Section III, we solve the binary version of the above problem as an immediate consequence of our main result.
Remark II.3.
Next, we review some properties of exponential sums and in particular, Ramanujan sums. Throughout the paper, we let $e(x)=\exp(2\pi i x)$ denote the complex exponential with period 1.
Lemma II.4.
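Let $n$ be a positive integer and $x$ be a real number. Then we have
\[ \sum_{m=1}^{n} e(mx)=\begin{cases} n, & \text{if } x\in\mathbb{Z},\\[4pt] \dfrac{\sin(\pi n x)}{\sin(\pi x)}\, e\!\left(\dfrac{(n+1)x}{2}\right), & \text{if } x\in\mathbb{R}\setminus\mathbb{Z}. \end{cases} \tag{II.2} \]
Proof.
When $x\in\mathbb{Z}$ the result is clear because in this case $e(mx)=1$ for every $m$. So, we let $x\in\mathbb{R}\setminus\mathbb{Z}$. Since $e(x)\neq 1$, summing the geometric progression gives
\[ \sum_{m=1}^{n} e(mx)=\frac{e(x)\left(e(nx)-1\right)}{e(x)-1}=e\!\left(\frac{(n+1)x}{2}\right)\frac{e(nx/2)-e(-nx/2)}{e(x/2)-e(-x/2)}=\frac{\sin(\pi n x)}{\sin(\pi x)}\, e\!\left(\frac{(n+1)x}{2}\right). \]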
∎
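A quick numerical check of this geometric-sum identity can be sketched as follows, assuming $e(x)=\exp(2\pi i x)$ and the closed form $\frac{\sin(\pi n x)}{\sin(\pi x)}\,e\!\left(\frac{(n+1)x}{2}\right)$ for non-integer $x$. The helper names are illustrative.

```python
import cmath
import math

def e(x):
    """Complex exponential with period 1: e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def exp_sum(n, x):
    """Direct evaluation of sum_{m=1}^{n} e(m*x)."""
    return sum(e(m * x) for m in range(1, n + 1))

def closed_form(n, x):
    """Closed form of the same sum: n for integer x, a sine ratio times a phase otherwise."""
    if float(x).is_integer():
        return complex(n)
    return math.sin(math.pi * n * x) / math.sin(math.pi * x) * e((n + 1) * x / 2)

print(abs(exp_sum(7, 0.3) - closed_form(7, 0.3)) < 1e-9)   # True
```

When $x=m/n$ with $n \nmid m$, the sine ratio vanishes, which is the case used when such sums filter solutions of a congruence modulo $n$.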
For integers $m$ and $n$ with $n\geq 1$ the quantity
\[ c_n(m)=\sum_{\substack{j=1\\ (j,n)=1}}^{n} e\!\left(\frac{jm}{n}\right) \tag{II.3} \]
is called a Ramanujan sum. It is the sum of the $m$-th powers of the primitive $n$-th roots of unity, and is also denoted by $c(m,n)$ in the literature. Even though the Ramanujan sum $c_n(m)$ is defined as a sum of some complex numbers, it is integer-valued (see Theorem II.5 below). From (II.3), it is clear that $c_n(-m)=c_n(m)$.
Ramanujan sums and some of their properties were certainly known before Ramanujan’s paper [41], as Ramanujan himself declared [41]; nonetheless, the reason that these sums bear Ramanujan’s name is probably that “Ramanujan was the first to appreciate the importance of the sum and to use it systematically”, according to Hardy (see [19] for a discussion about this).
Ramanujan sums have important applications in additive number theory, for example, in the context of the Hardy–Littlewood circle method, Waring’s problem, and sieve theory (see, e.g., [38, 39, 50] and the references therein). As a major result in this direction, one can mention Vinogradov’s theorem (in whose proof Ramanujan sums play a key role), stating that every sufficiently large odd integer is the sum of three primes, and so every sufficiently large even integer is the sum of four primes (see, e.g., [39, Chapter 8]). Ramanujan sums also have interesting applications in cryptography [6, 43], coding theory [4, 21], combinatorics [5, 35], graph theory [18, 33], signal processing [47, 48], and physics [2, 40].
Clearly, $c_n(0)=\varphi(n)$, where $\varphi(n)$ is Euler’s totient function. Also, by Theorem II.5 (see below), $c_n(1)=\mu(n)$, where $\mu(n)$ is the Möbius function defined by
\[ \mu(n)=\begin{cases} 1, & \text{if } n=1,\\ 0, & \text{if } n \text{ is not square-free},\\ (-1)^{\kappa}, & \text{if } n \text{ is the product of } \kappa \text{ distinct primes}. \end{cases} \tag{II.4} \]
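The following theorem, attributed to Kluyver [27], gives an explicit formula for $c_n(m)$:
Theorem II.5.
For integers $m$ and $n$, with $n\geq 1$,
\[ c_n(m)=\sum_{d\,\mid\,(m,n)} \mu\!\left(\frac{n}{d}\right) d. \tag{II.5} \]
Thus, $c_n(m)$ can be easily computed provided $n$ can be factored efficiently. One should compare (II.5) with the formula
\[ \varphi(n)=\sum_{d\,\mid\,n} \mu\!\left(\frac{n}{d}\right) d. \tag{II.6} \]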
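The two descriptions of the Ramanujan sum can be compared numerically. The sketch below assumes the definition $c_n(m)=\sum_{(j,n)=1} e(jm/n)$ and Kluyver's formula $c_n(m)=\sum_{d\mid (m,n)}\mu(n/d)\,d$; the function names are illustrative, and trial division is used only because the inputs are small.

```python
import cmath
from math import gcd

def mobius(n):
    """Moebius function by trial division (adequate for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_sum_def(m, n):
    """c_n(m) from the definition: sum of e(r*m/n) over 1 <= r <= n with gcd(r, n) = 1."""
    total = sum(cmath.exp(2j * cmath.pi * r * m / n)
                for r in range(1, n + 1) if gcd(r, n) == 1)
    return round(total.real)

def ramanujan_sum(m, n):
    """c_n(m) via Kluyver's formula: sum over d | gcd(m, n) of mu(n/d) * d."""
    g = gcd(m, n)               # gcd(0, n) = n, so m = 0 yields Euler's totient
    return sum(mobius(n // d) * d for d in range(1, g + 1) if g % d == 0)

print(ramanujan_sum(0, 12), ramanujan_sum(1, 12), ramanujan_sum(4, 12))
```

Only the divisor-sum version uses exact integer arithmetic; the definition-based version rounds a floating-point sum and is included purely as a cross-check.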
III Weight enumerator of the Binary Linear Congruence Code
Using a simple number theoretic argument, we give an explicit formula for the weight enumerator (and the size) of the Binary Linear Congruence Code (BLCC). An immediate consequence is an explicit formula for the number of binary solutions of an arbitrary linear congruence which, to the best of our knowledge, is the first result of its kind in the literature and may be of independent interest.
The following lemma is useful for proving our main result.
Lemma III.1.
Let , be positive integers. For any -tuple , we have
(III.1) |
Proof.
Expand the left-hand side of (III.1) and note that . ∎
Now we are ready to state and prove our main result.
Theorem III.2.
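Let $n$, $k$ be positive integers, $\mathbf{a}=\langle a_1,\ldots,a_k\rangle\in\mathbb{Z}^k$, and $b\in\mathbb{Z}_n$. The weight enumerator of the Binary Linear Congruence Code (BLCC) $\mathcal{C}$ is
\[ W_{\mathcal{C}}(z)=\frac{1}{n}\sum_{m=1}^{n} e\!\left(\frac{-bm}{n}\right)\prod_{j=1}^{k}\left(1+z\,e\!\left(\frac{a_j m}{n}\right)\right). \tag{III.2} \]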
Proof.
By Lemma III.1, for any -tuple we have
Let be a solution of the linear congruence . Then we have
Let and . Note that since is a solution of the linear congruence , we get , for some . Similarly, , for some and .
Therefore,
Thus,
Thus,
By Lemma II.4,
Note that if then (and so ), and if then (and so because ). This implies that
and
Consequently,
∎
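The exponential-sum approach can be sanity-checked against brute force. The sketch below assumes the weight enumerator takes the form $W(z)=\frac{1}{n}\sum_{m=1}^{n} e(-bm/n)\prod_{j=1}^{k}\bigl(1+z\,e(a_jm/n)\bigr)$, which follows from averaging $e(m(\mathbf{a}\cdot\mathbf{x}-b)/n)$ over $m$ to detect solutions of the congruence. It expands the product as a polynomial in $z$ with complex coefficients and rounds the result; all names are illustrative.

```python
import cmath
from itertools import product

def e(x):
    """Complex exponential with period 1."""
    return cmath.exp(2j * cmath.pi * x)

def weight_enumerator_formula(a, b, n):
    """Coefficients [A_0, ..., A_k] of (1/n) * sum_m e(-b*m/n) * prod_j (1 + z*e(a_j*m/n))."""
    total = [0j] * (len(a) + 1)
    for m in range(1, n + 1):
        poly = [e(-b * m / n)]                      # constant polynomial e(-bm/n)
        for aj in a:
            w = e(aj * m / n)
            # Multiply poly by (1 + w*z): new coefficient t is poly[t] + w * poly[t-1].
            poly = [c0 + w * c1 for c0, c1 in zip(poly + [0j], [0j] + poly)]
        for t, c in enumerate(poly):
            total[t] += c
    return [round((c / n).real) for c in total]

def weight_enumerator_bruteforce(a, b, n):
    """Same coefficients obtained by enumerating all binary tuples."""
    counts = [0] * (len(a) + 1)
    for x in product((0, 1), repeat=len(a)):
        if (sum(ai * xi for ai, xi in zip(a, x)) - b) % n == 0:
            counts[sum(x)] += 1
    return counts

print(weight_enumerator_formula([1, 2, 3, 4], 0, 5))   # [1, 0, 2, 0, 1]
```

The formula costs on the order of $nk^2$ complex operations, whereas enumeration costs $2^k$; rounding is safe here only because the parameters are small.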
Setting $z=1$ in (III.2) gives the size of the Binary Linear Congruence Code (BLCC). Equivalently, it solves Problem II.2 when $q=2$, that is, it gives an explicit formula for the number of binary solutions of an arbitrary linear congruence.
Corollary III.3.
Let , be positive integers, , and . The number of solutions of the linear congruence in is
(III.3) |
where . This implies that
(III.4) |
Proof.
We have
where . Consequently, we have
∎
IV Weight enumerators of the aforementioned codes
Using Theorem III.2, we now describe explicit formulas for the weight enumerators (and the sizes) of the Helberg code, the Levenshtein code, and the Varshamov–Tenengolts code. Note that the same approach may be used to derive the weight enumerators of most variants of these codes, since they are special cases of Binary Linear Congruence Codes (BLCC). In addition, we derive a formula for the size of the Shifted Varshamov–Tenengolts code.
IV-A Weight enumerator of the Helberg code
The Helberg code has the same structure as the Binary Linear Congruence Code (BLCC) but with some additional restrictions on the coefficients and the modulus. So, Theorem III.2 immediately gives the following result.
Theorem IV.1.
The weight enumerator of the Helberg code is
(IV.1) |
As the coefficients in the Helberg code are a modified version of the Fibonacci numbers, it may be possible to connect trigonometric sums as described above with the Fibonacci and Lucas numbers [8], and hence simplify (IV.1).
Corollary IV.2.
The size of the Helberg code equals
(IV.2) |
where . This implies that
(IV.3) |
IV-B Weight enumerator of the Levenshtein code
Theorem III.2 also allows for deriving an explicit formula for the weight enumerator of the Levenshtein code.
Theorem IV.3.
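The weight enumerator of the Levenshtein code $L_b(n,k)$ is
\[ W_{L_b(n,k)}(z)=\frac{1}{k}\sum_{m=1}^{k} e\!\left(\frac{-bm}{k}\right)\prod_{j=1}^{n}\left(1+z\,e\!\left(\frac{jm}{k}\right)\right). \tag{IV.4} \]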
Corollary IV.4.
The size of the Levenshtein code equals
(IV.5) |
where . This implies that
(IV.6) |
IV-C The size of the Shifted Varshamov–Tenengolts code
Next, using Theorem IV.3 once again, we give an explicit formula for the size of the Shifted Varshamov–Tenengolts code. Note that this code is the set of codewords in the Levenshtein code with even Hamming weight (when $r=0$) or with odd Hamming weight (when $r=1$).
Theorem IV.5.
If then the size of the Shifted Varshamov–Tenengolts code is
(IV.7) |
and if then the size of is
(IV.8) |
where ,
Proof.
To find the number of codewords in the Levenshtein code with even Hamming weight (when $r=0$) and with odd Hamming weight (when $r=1$), we proceed as follows. Let $W(z)$ denote the weight enumerator of the Levenshtein code. If $r=0$ then the size of the Shifted VT code equals $\frac{1}{2}\left(W(1)+W(-1)\right)$, and if $r=1$, the size equals $\frac{1}{2}\left(W(1)-W(-1)\right)$. Invoking Theorem IV.3 proves the claimed result. ∎
IV-D Weight enumerators of VT codes
Using Theorem III.2 we re-derive the formula for the weight enumerator of the Varshamov–Tenengolts code. Due to the special structure of the coefficients in the congruences, our formula simplifies significantly.
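The parity split used in this proof can be illustrated on small examples. The sketch below assumes the Shifted VT code is the subset of the Levenshtein code (congruence $\sum_i i\,x_i\equiv b \pmod{k}$) with weight parity $r$, and evaluates $\frac{1}{2}\bigl(W(1)+(-1)^r W(-1)\bigr)$ from a brute-force weight distribution. Names are illustrative.

```python
from itertools import product

def levenshtein_code(n, k, b):
    """All binary n-tuples x with sum(i * x_i) congruent to b mod k (1-indexed positions)."""
    return [x for x in product((0, 1), repeat=n)
            if (sum(i * xi for i, xi in enumerate(x, start=1)) - b) % k == 0]

def shifted_vt_size(n, k, b, r):
    """Size of the parity-r part of the code, computed as (W(1) + (-1)^r * W(-1)) / 2."""
    dist = [0] * (n + 1)
    for c in levenshtein_code(n, k, b):
        dist[sum(c)] += 1
    w_plus = sum(dist)                                               # W(1)
    w_minus = sum((-1) ** t * count for t, count in enumerate(dist)) # W(-1)
    return (w_plus + (-1) ** r * w_minus) // 2

print(shifted_vt_size(6, 7, 0, 0), shifted_vt_size(6, 7, 0, 1))
```

Since $W(1)$ counts all codewords and $W(-1)$ counts even-weight minus odd-weight codewords, their sum and difference are always even, so the integer division is exact.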
We start with the following lemma.
Lemma IV.6.
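Let $n$ be a positive integer and $m$ be a non-negative integer. Then, we have
\[ \prod_{j=1}^{n}\left(1-z\,e\!\left(\frac{jm}{n}\right)\right)=\left(1-z^{n/d}\right)^{d}, \]
where $d=(m,n)$.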
Theorem IV.7.
The weight enumerator of the VT code is
(IV.9) |
Based on Theorem IV.7, one can easily obtain the following explicit formula for the general term of the weight distribution of VT codes. This result was recently proved using a different method by Bibak et al. [4] (for a related earlier result, see also [17]).
Theorem IV.8.
The number of codewords with Hamming weight in the Varshamov–Tenengolts code equals
(IV.10) |
Proof.
The proof reduces to using the binomial theorem to find the coefficient of in the sum of (IV.9). ∎
Corollary IV.9.
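The size of the VT code $VT_b(n)$ equals
\[ |VT_b(n)|=\frac{1}{2(n+1)}\sum_{\substack{d\,\mid\,n+1\\ d \text{ odd}}} c_d(b)\,2^{(n+1)/d}. \tag{IV.11} \]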
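The size formula for VT codes can be verified against enumeration for small lengths. The sketch assumes the well-known form $|VT_b(n)|=\frac{1}{2(n+1)}\sum_{d\mid n+1,\ d\text{ odd}} c_d(b)\,2^{(n+1)/d}$, with the Ramanujan sum $c_d(b)$ evaluated by Kluyver's divisor sum. Function names are illustrative.

```python
from itertools import product
from math import gcd

def mobius(n):
    """Moebius function by trial division (adequate for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_sum(m, n):
    """c_n(m) = sum over d | gcd(m, n) of mu(n/d) * d."""
    g = gcd(m, n)
    return sum(mobius(n // d) * d for d in range(1, g + 1) if g % d == 0)

def vt_size_formula(n, b):
    """Size of VT_b(n) from the divisor sum over odd d dividing n + 1."""
    N = n + 1
    total = sum(ramanujan_sum(b, d) * 2 ** (N // d)
                for d in range(1, N + 1, 2) if N % d == 0)
    return total // (2 * N)

def vt_size_bruteforce(n, b):
    """Size of VT_b(n) by enumerating all binary n-tuples."""
    return sum(1 for x in product((0, 1), repeat=n)
               if (sum(i * xi for i, xi in enumerate(x, start=1)) - b) % (n + 1) == 0)

print(vt_size_formula(4, 0), vt_size_formula(4, 1))   # 4 3
```

For $b=0$ every term of the sum is positive, so the $d=1$ term alone already gives the lower bound $2^n/(n+1)$.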
Remark IV.10.
Ginzburg [21] in 1967 proved the following explicit formula for the size of the $q$-ary, rather than binary, Varshamov–Tenengolts code, where $q$ is an arbitrary positive integer:
(IV.12) |
Formula (IV.12) (in fact, a more complicated version of it) was later rediscovered by Stanley and Yoder [46] in 1973. Formula (IV.12) for the binary case was also rediscovered by Sloane [44] in 2002. Bibak et al. [4] derived the binary case formula as a corollary of a general number theory problem.
Remark IV.11.
Remark IV.12.
Setting $b=0$ in Formula (IV.11) gives the bound
\[ |VT_0(n)|\geq\frac{2^{n}}{n+1}. \]
On the other hand, by a result of Levenshtein [32], the size of the largest single deletion correcting binary code of length $n$, where $n$ is sufficiently large, is roughly $2^n/n$. Therefore, as is well known, the VT-codes $VT_0(n)$, for sufficiently large $n$, are close to optimal.
Acknowledgements
The authors would like to thank the reviewers for helpful comments that improved the presentation of this paper. This work was supported in part by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCR-0939370, and by the NSF grant CCF1618366.
References
- [1] K. A. S. Abdel-Ghaffar, F. Paluncic, H. C. Ferreira, and W. A. Clarke, On Helberg’s generalization of the Levenshtein code for multiple deletion/insertion error correction, IEEE Trans. Inform. Theory 58 (2012), 1804–1808.
- [2] K. Bibak, B. M. Kapron, and V. Srinivasan, Counting surface-kernel epimorphisms from a co-compact Fuchsian group to a cyclic group with motivations from string theory and QFT, Nuclear Phys. B 910 (2016), 712–723.
- [3] K. Bibak, B. M. Kapron, and V. Srinivasan, MMH with arbitrary modulus is always almost-universal, Inform. Process. Lett. 116 (2016), 481–483.
- [4] K. Bibak, B. M. Kapron, and V. Srinivasan, Unweighted linear congruences with distinct coordinates and the Varshamov–Tenengolts codes, Des. Codes Cryptogr. 86 (2018), 1893–1904.
- [5] K. Bibak, B. M. Kapron, V. Srinivasan, R. Tauraso, and L. Tóth, Restricted linear congruences, J. Number Theory 171 (2017), 128–144.
- [6] K. Bibak, B. M. Kapron, V. Srinivasan, and L. Tóth, On an almost-universal hash function family with applications to authentication and secrecy codes, Internat. J. Found. Comput. Sci. 29 (2018), 357–375.
- [7] K. Bibak and O. Milenkovic, Weight enumerators of some classes of deletion correcting codes, ISIT 2018, 431–435.
- [8] K. Bibak and M. H. Shirdareh Haghighi, Some trigonometric identities involving Fibonacci and Lucas numbers, J. Integer Seq. 12 (2009), Article 09.8.4.
- [9] J. Brakensiek, V. Guruswami, and S. Zbarsky, Efficient low-redundancy codes for correcting multiple deletions, IEEE Trans. Inform. Theory 64 (2018), 3403–3410.
- [10] B. Bukh, V. Guruswami, and J. Håstad, An improved bound on the fraction of correctable deletions, IEEE Trans. Inform. Theory 63 (2017), 93–103.
- [11] E. Cohen, A class of arithmetical functions, Proc. Natl. Acad. Sci. USA 41 (1955), 939–944.
- [12] S. D. Constantin and T. R. N. Rao, On the theory of binary asymmetric error correcting codes, Inform. Contr. 40 (1979), 20–36.
- [13] D. Cullina and N. Kiyavash, An improvement to Levenshtein’s upper bound on the cardinality of deletion correcting codes, IEEE Trans. Inform. Theory 60 (2014), 3862–3870.
- [14] D. Cullina, N. Kiyavash, and A. A. Kulkarni, Restricted composition deletion correcting codes, IEEE Trans. Inform. Theory 62 (2016), 4819–4832.
- [15] D. Cullina, A. A. Kulkarni, and N. Kiyavash, A coloring approach to constructing deletion correcting codes from constant weight subgraphs, ISIT 2012, 513–517.
- [16] Ph. Delsarte and Ph. Piret, Spectral enumerators for certain additive-error-correcting codes over integer alphabets, Inform. Contr. 48 (1981), 193–210.
- [17] L. Dolecek and V. Anantharam, Repetition error correcting sets: Explicit constructions and prefixing methods, SIAM J. Discrete Math. 23 (2010), 2120–2146.
- [18] A. Droll, A classification of Ramanujan unitary Cayley graphs, Electron. J. Combin. 17 (2010), #N29.
- [19] C. F. Fowler, S. R. Garcia, and G. Karaali, Ramanujan sums as supercharacters, Ramanujan J. 35 (2014), 205–241.
- [20] R. Gabrys, E. Yaakobi, and O. Milenkovic, Codes in the Damerau distance for deletion and adjacent transposition correction, IEEE Trans. Inform. Theory 64 (2018), 2550–2570.
- [21] B. D. Ginzburg, A certain number-theoretic function which has an application in coding theory (Russian), Problemy Kibernet. 19 (1967), 249–252.
- [22] M. Hagiwara, A short proof for the multi-deletion error correction property of Helberg codes, IEICE Comm. Express 5 (2016), 49–51.
- [23] A. S. J. Helberg and H. C. Ferreira, On multiple insertion/deletion correcting codes, IEEE Trans. Inform. Theory 48 (2002), 305–308.
- [24] T. Helleseth and T. Kløve, On group-theoretic codes for asymmetric channels, Inform. Contr. 49 (1981), 1–9.
- [25] D. Jacobson and K. S. Williams, On the number of distinguished representations of a group element, Duke Math. J. 39 (1972), 521–527.
- [26] T. Kløve, Error correcting codes for the asymmetric channel, Tech. Rep., Department of Informatics, University of Bergen, Norway, 1995.
- [27] J. C. Kluyver, Some formulae concerning the integers less than n and prime to n, In Proc. R. Neth. Acad. Arts Sci. (KNAW) 9 (1906), 408–414.
- [28] A. A. Kulkarni and N. Kiyavash, Nonasymptotic upper bounds for deletion correcting codes, IEEE Trans. Inform. Theory 59 (2013), 5115–5130.
- [29] T. A. Le and H. D. Nguyen, New multiple insertion/deletion correcting codes for non-binary alphabets, IEEE Trans. Inform. Theory 62 (2016), 2682–2693.
- [30] D. N. Lehmer, Certain theorems in the theory of quadratic residues, Amer. Math. Monthly 20 (1913), 151–157.
- [31] A. Lenz, P. H. Siegel, A. Wachter-Zeh, and E. Yaakobi, Coding over sets for DNA storage, ISIT 2018, 2411–2415.
- [32] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals (in Russian), Dokl. Akad. Nauk SSSR 163 (1965), 845–848. English translation in Soviet Physics Dokl. 10 (1966), 707–710.
- [33] B. Mans and I. Shparlinski, Random walks, bisections and gossiping in circulant graphs, Algorithmica 70 (2014), 301–325.
- [34] R. J. McEliece and E. R. Rodemich, The Constantin–Rao construction for binary asymmetric error-correcting codes, Inform. Contr. 44 (1980), 187–196.
- [35] A. Mednykh and R. Nedela, Enumeration of unrooted maps of a given genus, J. Combin. Theory Ser. B 96 (2006), 706–729.
- [36] H. Mercier, V. K. Bhargava, and V. Tarokh, A survey of error-correcting codes for channels with symbol synchronization errors, IEEE Commun. Surv. Tutor. 12 (2010), 87–96.
- [37] M. Mitzenmacher, A survey of results for deletion channels and related synchronization channels, Probab. Surv. 6 (2009), 1–33.
- [38] H. L. Montgomery and R. C. Vaughan, Multiplicative Number Theory I: Classical Theory, Cambridge University Press, (2006).
- [39] M. B. Nathanson, Additive Number Theory: The Classical Bases, Springer-Verlag, (1996).
- [40] M. Planat, M. Minarovjech, and M. Saniga, Ramanujan sums analysis of long-period sequences and noise, Europhys. Lett. EPL 85 (2009), 40005.
- [41] S. Ramanujan, On certain trigonometric sums and their applications in the theory of numbers, Trans. Cambridge Philos. Soc. 22 (1918), 259–276.
- [42] C. Schoeny, A. Wachter-Zeh, R. Gabrys, and E. Yaakobi, Codes correcting a burst of deletions or insertions, IEEE Trans. Inform. Theory 63 (2017), 1971–1985.
- [43] P. Scholl and N. Smart, Improved key generation for Gentry’s fully homomorphic encryption scheme, Cryptogr. Coding, LNCS 7089 (2011), 10–22.
- [44] N. J. A. Sloane, On single-deletion-correcting codes, In Codes and Designs, Ohio State University, May 2000 (Ray-Chaudhuri Festschrift), K. T. Arasu and A. Seress (editors), Walter de Gruyter, Berlin, 2002, pp. 273–291.
- [45] R. P. Stanley, Enumerative Combinatorics, Vol. 1, 2nd ed., Cambridge University Press, (2012).
- [46] R. P. Stanley and M. F. Yoder, A study of Varshamov codes for asymmetric channels, Jet Propulsion Laboratory, Technical Report 32-1526, Vol. XIV (1973), 117–123.
- [47] P. P. Vaidyanathan, Ramanujan sums in the context of signal processing–Part I: Fundamentals, IEEE Trans. Signal Process. 62 (2014), 4145–4157.
- [48] P. P. Vaidyanathan, Ramanujan sums in the context of signal processing–Part II: FIR representations and applications, IEEE Trans. Signal Process. 62 (2014), 4158–4172.
- [49] R. R. Varshamov and G. M. Tenengolts, A code which corrects single asymmetric errors (in Russian), Avtomat. i Telemeh. 26 (1965), 288–292. English translation in Automat. Remote Control 26 (1965), 286–290.
- [50] R. C. Vaughan, The Hardy-Littlewood Method, second edition, Cambridge University Press, (1997).
- [51] R. Venkataramanan, V. N. Swamy, and K. Ramchandran, Low-complexity interactive algorithms for synchronization from deletions, insertions, and substitutions, IEEE Trans. Inform. Theory 61 (2015), 5670–5689.
- [52] S. M. S. Tabatabaei Yazdi and L. Dolecek, A deterministic polynomial-time protocol for synchronizing from deletions, IEEE Trans. Inform. Theory 60 (2014), 397–409.