 # Tutorial on algebraic deletion correction codes

The deletion channel is notoriously difficult to design error-correction codes for. In spite of this difficulty, there are some beautiful code constructions which give intuition about the channel and about what good deletion codes look like. In this tutorial we will take a look at some of them. This document is a transcript of my talk at the coding theory reading group on some interesting works on the deletion channel. It is not intended to be an exhaustive survey of work on the deletion channel, but rather a tutorial on some of the important and cute ideas in this area. For a comprehensive survey, we refer the reader to the cited sources and surveys. We also provide an implementation of VT codes that correct single insertion/deletion errors for general alphabets at https://github.com/shubhamchandak94/VT_codes/.


## 1 Introduction

The deletion channel is one of the most fundamental channels, and yet it is still not well understood. Most of you know what a deletion channel/error looks like, but to give an example, this is what a single deletion looks like:

 10110→1010

Here the decoder receives 1010 and needs to recover the original message 10110. This corresponds to a single deletion, in either position 3 or 4; we cannot say at which position the deletion occurred.

Compare this with our favourite erasure channel, where such a single error might look like:

 10110→10e10

Here, we know at what position the erasure happened. The deletion channel is in fact a strictly worse channel, as we can convert an erasure channel output into a deletion channel output by simply removing all the 'e' symbols from the output.

We will also see that it has connections to our second favourite channel, the Binary Symmetric Channel (BSC).

### 1.1 Capacity of Deletion Channel

We define the binary deletion channel BDC(α) with deletion rate α, where each input symbol is deleted independently with probability α. This definition has similarities with the BEC and BSC, but that's where the similarity ends: surprisingly, unlike the BEC and BSC, we know much less about the BDC.

1. We still do not know the capacity of the BDC. One reason is that the BDC is not really a discrete memoryless channel (DMC). For a DMC, you should be able to write:

 p(y^n|x^n) = \prod_{i=1}^{n} p(y_i|x_i)

We cannot write the same for deletion channels; one reason is that the output does not have length n, its length is in fact a random variable. Shannon theory gives us a nice characterization of the capacity of DMCs:

 C = \max_{p(x)} I(X;Y) \quad (1)

But, without this nice expression, finding the capacity is a much more difficult task.

2. We can of course get some bounds on the deletion channel capacity. For example, we know that the binary erasure channel is strictly better than the BDC. Thus:

 C_{BDC}(\alpha) \le 1 - \alpha \quad (2)

Pinning down the capacity has been an open problem for quite some time now, but there has been some recent progress, which I will briefly talk about.

Most of these results are covered in the survey by Mitzenmacher. Kalai, Mitzenmacher, and Sudan showed that as α → 0 the capacity is almost equal to that of the BSC. The best capacity lower bounds I am aware of are by Drinea and Mitzenmacher. Recently, improved capacity upper bounds were obtained by Cheraghchi, which state that:

 C_{BDC}(\alpha) \le (1-\alpha)\log_2 \varphi \quad (3)

for α ≥ 1/2, where φ ≈ 1.618 is the golden ratio. For those of you who are interested: they essentially find capacity bounds for a channel known as the Poisson repeat channel (where the number of times each symbol is repeated is a Poisson random variable), which are then ported over to the deletion channel.

This is mainly to give a sense of how difficult the deletion channel is to analyze, and how little we really know about it. Typically, not knowing the capacity directly translates to not knowing good codes. But surprisingly, we do know some nice code constructions in specific settings, which we will discuss next.

The deletion channel is also related to the insertion channel, to the indel channel, where both insertions and deletions happen, and to general edit-distance channels: indels plus substitutions/reversals. A better understanding of the deletion channel is useful not just for communication, but also for the problem of denoising, which is quite common (say, when we type things and miss a character).

## 2 Adversarial deletion error

We spoke about "Shannon-type" random deletion errors and capacity. But for the majority of the talk, we will consider adversarial errors: when we say k errors, we mean that at most k symbols are deleted. In this context let us define a k-deletion error correction code.

###### Definition 1

For any vector x^n ∈ {0,1}^n, we define D_k(x^n) as the set of descendants of x^n; i.e., all the vectors in {0,1}^{n−k} obtained after k deletions from x^n.

For example:
For x = 000, D_1(x) = {00}.
For x = 010, D_1(x) = {10, 00, 01}.

Thus the descendant balls need not all be of the same size, unlike the Hamming balls we are familiar with. And this will lead to some interesting scenarios, as we will see. In fact, we can analyze the size of 1-deletion balls exactly:

###### Lemma 1

The size of the 1-deletion descendant ball of a vector x^n is equal to the number of "runs" in x^n.

For example:
x = 110001 has 3 runs in total (11, 000, 1). Thus, |D_1(x)| = 3. It is easy to see why this is true: any deletion within a run leads to the same length-(n−1) sequence in the descendant ball.
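This lemma is easy to check by brute force. A minimal Python sketch (the helper names are my own, not from the VT_codes repository):

```python
def deletion_ball(x):
    """All distinct strings obtained from x by deleting exactly one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def num_runs(x):
    """Number of maximal runs of identical symbols in x."""
    return sum(1 for i in range(len(x)) if i == 0 or x[i] != x[i - 1])

# "110001" has runs 11, 000, 1, so its 1-deletion ball has exactly 3 elements
assert len(deletion_ball("110001")) == num_runs("110001") == 3
```

Checking the identity over all strings of a given length is a one-liner once these helpers exist.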

###### Definition 2

We call a subset C ⊆ {0,1}^n a k-deletion error correction code if for any x, y ∈ C with x ≠ y, we have D_k(x) ∩ D_k(y) = ∅.

We will mainly deal with k = 1 during the talk.

### 2.1 Repetition coding

Let's start with the simplest 1-deletion correction code you can think of: the "repetition code". We repeat every symbol 2 times. Is that sufficient? Let us say we observe some symbol an odd number of times; then we know that a symbol was deleted in that run of 0's or 1's. Note that we still cannot figure out the location of the deletion, but we can figure out what was deleted.

This idea can in fact be extended to correct k deletions by repeating every symbol k+1 times. But this is quite bad: for correcting 1 error, our communication rate is 1/2. Still better than the BSC case, where we had to repeat things 3 times.
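As a sanity check, here is a minimal sketch of the repetition scheme (function names are mine): since at most k symbols are deleted in total, each maximal run of the received word lost at most k symbols, so rounding run lengths up to the nearest multiple of k+1 recovers the data.

```python
from itertools import groupby

def rep_encode(bits, k):
    """Repeat each bit k+1 times; the result survives up to k deletions."""
    return [b for b in bits for _ in range(k + 1)]

def rep_decode(received, k):
    """Each maximal run lost at most k symbols in total, so rounding its
    length up to the nearest multiple of k+1 recovers the bit count."""
    out = []
    for b, grp in groupby(received):
        run_len = len(list(grp))
        out.extend([b] * -(-run_len // (k + 1)))   # ceiling division
    return out

y = rep_encode([1, 0, 0, 1], 2)    # each bit repeated 3 times; corrects <= 2 deletions
del y[6]
del y[3]                           # two deletions anywhere in the word
assert rep_decode(y, 2) == [1, 0, 0, 1]
```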

Cool! Can we do better? We will next look at a cool class of codes known as VT-codes, or Varshamov-Tenengolts codes. But before doing so, let's look at a puzzle.

### 2.2 A puzzle

I believe the puzzle has some connection with VT-codes and might help with the understanding; but if not, it is still a cool, simple puzzle! So let's say Mary is the Queen of the seven kingdoms, and she has ordered 100 big barrels filled with gold coins, where each coin weighs 10 gm. But she knows from her secret agency that one of the barrels contains counterfeit coins weighing only 9 gm. She has an electronic weighing scale which she can use; so the question is:

How can she determine which barrel contains the counterfeit coins with a single measurement?

The solution is simple: she takes i coins from barrel i and places them all on the electronic weighing scale. Now, if the weight is less than expected by i grams, then barrel i is the one with counterfeit coins! We will come back to this puzzle :)
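The trick is a weighted checksum: the index of the bad barrel is encoded into the magnitude of the weight deficit. A tiny simulation (barrel 37 below is an arbitrary choice of mine):

```python
def find_counterfeit(measured_grams):
    """Take i coins from barrel i (i = 1..100) and weigh them together.
    Each counterfeit coin is 1 gm light, so the shortfall in grams
    equals the counterfeit barrel's index."""
    expected = 10 * sum(range(1, 101))
    return expected - measured_grams

# simulate: barrel 37 secretly holds the 9 gm coins
measured = sum(i * (9 if i == 37 else 10) for i in range(1, 101))
assert find_counterfeit(measured) == 37
```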

## 3 VT-codes

Alright! We are all set to define VT-codes.

###### Definition 3

The Varshamov-Tenengolts code VT_a(n) is defined as:

 VT_a(n) = \{ x^n \mid \sum_{i=1}^{n} i x_i \equiv a \pmod{n+1} \} \quad (4)

Some historical context on these codes: they were first proposed as error correction codes for a 1-bit Z-channel error, which means essentially that a 1 can turn into a 0, but not the other way round. Z-channel errors are known as asymmetric bit-flips. Varshamov and Tenengolts proposed these codes in 1965, and then Levenshtein discovered that these codes in fact work well for the deletion channel as well!

#### Z-channel correction

So before we look into 1-deletion correction, let us see how these codes can correct one Z-channel error, i.e. one of the 1's can flip to a 0.

Let x̂^n be the received word; then we can still compute:

 S = \left(a - \sum_{i=1}^{n} i \hat{x}_i\right) \bmod (n+1)

In case there is a flip at position p, then S = p. Thus, we can correct the Z-channel error! Here is where the similarity with the puzzle can be seen.
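A minimal sketch of this Z-channel corrector (my own function names; positions are 1-indexed as in the definition):

```python
def correct_z_flip(y, a):
    """Correct at most one 1 -> 0 flip in a VT_a(n) codeword y."""
    n = len(y)
    s = (a - sum(i * yi for i, yi in enumerate(y, start=1))) % (n + 1)
    x = list(y)
    if s != 0:       # S equals the position p of the flipped bit
        x[s - 1] = 1
    return x

# x = 11100 is in VT_0(5) since 1 + 2 + 3 = 6 ≡ 0 (mod 6); flip position 2:
assert correct_z_flip([1, 0, 1, 0, 0], a=0) == [1, 1, 1, 0, 0]
```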

### 3.1 VT-codes decoding

We are all set to discuss the decoding for deletion channel:

1. First of all, note that if it is only a deletion channel, then "error detection" comes for free from the length of the received word, unlike with bit-flip errors.

2. Let y^{n−1} be the received erroneous codeword after a deletion. Define:

 w = \sum_{i=1}^{n-1} y_i \quad (5)
 S = \left(a - \sum_{i=1}^{n-1} i y_i\right) \bmod (n+1) \quad (6)
3. Let p be the position at which the deletion occurred, i.e., x_p was deleted. Let L_0, L_1 be the counts of 0's and 1's to the left of position p, and R_0, R_1 be the counts to the right. In that case, S = R_1 if x_p = 0, and if x_p = 1:

 S = p + R_1 = L_0 + L_1 + 1 + R_1 = w + L_0 + 1

Thus, if S ≤ w, a 0 was deleted at a position with S 1's to its right; otherwise, a 1 was deleted at a position with S − w − 1 zeros to its left.

4. Note that we can thus uniquely determine the codeword, but we cannot determine the exact location of the deletion, as it can be any position within the run we identified.

This beautiful decoding algorithm was given by Levenshtein in his 1966 work. Also, most of the material I will cover is from the survey by Sloane.
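The decoding steps above fit in a few lines of Python (a sketch with my own naming; bits are 0/1 integers and positions are 1-indexed):

```python
def vt_decode(y, a):
    """Recover the VT_a(n) codeword x from y = x with one bit deleted."""
    n = len(y) + 1
    w = sum(y)                                          # weight of received word
    s = (a - sum(i * yi for i, yi in enumerate(y, 1))) % (n + 1)
    if s <= w:
        # a 0 was deleted; insert it so that s ones lie to its right
        ones, pos = 0, len(y)
        while ones < s:
            pos -= 1
            ones += y[pos]
        return y[:pos] + [0] + y[pos:]
    # a 1 was deleted; insert it so that s - w - 1 zeros lie to its left
    zeros, pos = 0, 0
    while zeros < s - w - 1:
        zeros += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]
```

One can brute-force check over a small code that every codeword is recovered from each of its 1-deletion descendants.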

### 3.2 VT-codes rate

As the VT-codes are combinatorial objects, we can get exact formulae for their sizes; exact combinatorial formulae can be found in the references. Here are a few interesting facts:

1. ###### Lemma 2

For some a ∈ {0, 1, …, n}:

 |VT_a(n)| \ge \frac{2^n}{n+1}

As every x^n ∈ {0,1}^n lies in exactly one of the n+1 sets:

 \sum_{a=0}^{n} |VT_a(n)| = 2^n

This leads to the lemma by the pigeonhole principle.

2. It can in fact be shown that a = 0 leads to the largest code size, and a = 1 to the smallest:

 |VT_1(n)| \le |VT_a(n)| \le |VT_0(n)|
3. For n = 2^m − 1, all the VT-code sizes are in fact equal to 2^n/(n+1).
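These size facts are easy to verify numerically for small n (a brute-force sketch, my own naming):

```python
from itertools import product

def vt_sizes(n):
    """|VT_a(n)| for each a in 0..n, by direct enumeration."""
    counts = [0] * (n + 1)
    for x in product((0, 1), repeat=n):
        counts[sum(i * xi for i, xi in enumerate(x, 1)) % (n + 1)] += 1
    return counts

sizes = vt_sizes(8)
assert sum(sizes) == 2 ** 8                     # the VT_a(n) partition {0,1}^n
assert max(sizes) == sizes[0] >= 2 ** 8 // 9    # a = 0 gives the largest code
```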

### 3.3 Optimality of VT-codes

We say a k-error correction code is "optimal" if it has the largest size amongst all k-error correction codes. Let us analyze the "optimality" of VT-codes.

1. Levenshtein showed that optimal 1-deletion correction codes have asymptotic size 2^n/n. This makes VT-codes asymptotically optimal.

2. People have not been able to prove that VT-codes are optimal non-asymptotically. Finding the "optimal" 1-deletion code is an NP-hard problem in general, as it involves finding a maximum independent set in the graph where two vectors are connected if they share a deletion descendant. But for n ≤ 8 it is known, using computer search, that VT-codes are optimal. The sizes of these codes are:

 1, 2, 2, 4, 6, 10, 16, 30

For higher n, due to the exponential nature of the algorithms, we cannot say anything yet.

3. VT-codes also have the property that they are "perfect codes", which means that their descendant sets, which are disjoint, cover the entire space {0,1}^{n−1}. For example:

 VT_0(3) = {000, 101}
 Descendants = {00, 10, 01, 11}

Levenshtein showed that, surprisingly, this is true for every n, which is quite cool in itself! The perfect-code analogy comes from Hamming codes being perfect. But, unlike the Hamming-distance case, here perfect codes do not imply optimal codes. Why? Because the number of descendants is not fixed: some codewords have fewer descendants and some have more, so a code that is not perfect can potentially still be larger.

4. Linearity: VT-codes are linear only for very small n, never beyond! Variants of VT-codes (restrictions of VT-codes) can be made linear, at the cost of significantly more redundancy than log(n+1). Although I am not sure how linearity of codes is useful here, if the decoding is still non-linear (linear in time complexity, but non-linear as a map).
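The perfect-code property from point 3 can be verified exhaustively for small n (a brute-force sketch, my own naming):

```python
from itertools import product

def is_perfect(n):
    """Do the 1-deletion descendant sets of VT_0(n) partition {0,1}^(n-1)?"""
    code = [x for x in product((0, 1), repeat=n)
            if sum(i * xi for i, xi in enumerate(x, 1)) % (n + 1) == 0]
    seen = set()
    for c in code:
        ball = {c[:i] + c[i + 1:] for i in range(n)}
        if seen & ball:                    # descendant sets must be pairwise disjoint
            return False
        seen |= ball
    return len(seen) == 2 ** (n - 1)       # and must cover all shorter words

assert all(is_perfect(n) for n in range(2, 10))
```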

### 3.4 Systematic Encoding

Now that we have taken a look at the linear-time decoding of VT-codes, it is natural to ask if there exists a nice way to encode data. This problem surprisingly remained open for more than 30 years, until 1998, when Abdel-Ghaffar et al. provided a very convenient way of "systematic encoding" of data.

1. For n = 2^q − 1, let k = n − q be the number of data bits, and q the number of "parity" bits. Let the data bits be b_1, …, b_k, and the codeword to be formed be x^n ∈ VT_a(n).

2. Fill in the data bits at all positions except the positions 2^0, 2^1, …, 2^{q−1}. Thus the codeword looks like:

 x_1, x_2, b_1, x_4, b_2, b_3, b_4, x_8, …, b_k

We can compute the residue â ∈ {0, …, n} that the parity positions must contribute for the overall syndrome to be a mod (n+1). As all positions of x^n except the powers of 2 are decided, to obtain x^n ∈ VT_a(n), we need:

 \sum_{j=0}^{q-1} 2^j x_{2^j} \equiv \hat{a} \pmod{n+1}

Since â ≤ n = 2^q − 1, we can now conveniently choose (x_1, x_2, x_4, …, x_{2^{q-1}}) as the q-bit binary expansion of â.
3. Note that the rate of this code for 1-deletion correction is exactly the same as the rate of the Hamming code. Not sure if this is a coincidence or something more!
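A sketch of this systematic encoder (my own naming; n = 2^q − 1, positions are 1-based, and the parity bits sit at the powers of two):

```python
def vt_encode_systematic(data, q, a=0):
    """Place data in non-power-of-2 positions of a length-(2^q - 1) word,
    then set the q parity bits x_1, x_2, x_4, ... so the VT syndrome is a."""
    n = 2 ** q - 1
    assert len(data) == n - q
    parity_pos = {2 ** j for j in range(q)}
    x = [0] * (n + 1)                       # x[1..n], x[0] unused
    it = iter(data)
    for i in range(1, n + 1):
        if i not in parity_pos:
            x[i] = next(it)
    need = (a - sum(i * x[i] for i in range(1, n + 1))) % (n + 1)
    for j in range(q):                      # binary expansion of need <= n
        x[2 ** j] = (need >> j) & 1
    return x[1:]

cw = vt_encode_systematic([1, 0, 1, 1], q=3)              # n = 7, k = 4
assert sum(i * b for i, b in enumerate(cw, 1)) % 8 == 0   # cw is in VT_0(7)
```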

## 4 Insertion + Deletion + Substitution codes

We looked at 1-deletion correcting codes in depth. In the next part of the tutorial, we will extend this understanding to more general scenarios. The first scenario is insertion errors instead of deletion errors.

### 4.1 General indel error codes

Levenshtein  showed this general lemma:

###### Lemma 3

Any k-deletion correction code can also correct any combination of s deletions and r insertions, where r + s ≤ k.

Note that here we do not need to know r and s beforehand. The general proof is quite simple; here we will prove the simpler statement that a 1-deletion correction code can correct a 1-insertion error, as that is sufficient to get an intuitive understanding.

1. Let us assume that C is a 1-deletion correction code. We want to show that C can correct 1-insertion errors as well.

2. Let us assume, on the contrary, that there exist codewords x^n ≠ y^n such that after one insertion error in each, the resulting noisy codewords are equal to a common word z^{n+1}. Let the insertion occur at position p in x^n and at position q in y^n.

3. Now delete both positions p and q from z^{n+1}. The resulting length-(n−1) vector can be obtained from x^n by one deletion, and also from y^n by one deletion. Thus it is a 1-deletion descendant of both codewords, which contradicts the definition of a 1-deletion correction code, as no two codewords can share a descendant. Thus, C must correct 1-insertion errors as well.

Note that this is more of an existential result, and efficient deletion decoding algorithms might not directly translate into efficient insertion decoding algorithms.
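For the special case proved above, the claim can also be checked exhaustively: in a 1-deletion code such as VT_0(6), no two codewords can produce the same word after one insertion (a brute-force sketch):

```python
from itertools import product, combinations

def insertions(x):
    """All words obtained from x by inserting a single bit."""
    return {x[:i] + (b,) + x[i:] for i in range(len(x) + 1) for b in (0, 1)}

# VT_0(6) is a 1-deletion code; check its 1-insertion balls are pairwise disjoint
code = [x for x in product((0, 1), repeat=6)
        if sum(i * xi for i, xi in enumerate(x, 1)) % 7 == 0]
for c1, c2 in combinations(code, 2):
    assert not (insertions(c1) & insertions(c2))
```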

### 4.2 VT-codes for 1-insertion, deletion, substitution correction

Levenshtein showed the surprising fact that, with a simple modification, standard VT-codes can be converted into 1-insertion, 1-deletion and 1-substitution correction codes. The modification is to take the syndrome modulo 2n+1 instead of n+1; the code V̂T_a(n) is defined as:

 \hat{VT}_a(n) = \{ x^n \mid \sum_{i=1}^{n} i x_i \equiv a \pmod{2n+1} \} \quad (7)

Let us try to understand why these codes work:

1. First of all, from the length of the received word, we know whether there was an insertion, a deletion, or a substitution.

2. Recollect that deletion error correction in VT-codes only depends on the syndrome values being distinct remainders modulo n+1. This still holds true if the modulus is taken to be 2n+1. Thus, with the new modulus, 1-deletion correction still works.

3. The 1-insertion correction ability already follows from the general lemma earlier. Moreover, using a remainder trick similar to the deletion case, insertions can in fact be corrected efficiently with these codes.

4. The only case remaining to analyze is the 1-substitution, or 1-bitflip, case. Let a 0 flip to a 1 at position p in the codeword, resulting in the noisy codeword x̂^n. Clearly:

 S = \left(a - \sum_{i=1}^{n} i \hat{x}_i\right) \bmod (2n+1) = 2n+1-p

If instead a 1 flips to a 0 at position p, then:

 S = \left(a - \sum_{i=1}^{n} i \hat{x}_i\right) \bmod (2n+1) = p

As p ∈ {1, …, n} and 2n+1−p ∈ {n+1, …, 2n}, all these values are distinct, and we can correct the 1-bitflip. Note that this construction is not optimal for just 1-bitflip correction, as it essentially encodes 1 bit less than the Hamming code.
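A sketch of the resulting 1-substitution corrector (my own naming; it implements exactly the two cases above):

```python
def correct_one_flip(y, a):
    """Correct at most one bit flip in a codeword of the mod-(2n+1) VT code."""
    n = len(y)
    s = (a - sum(i * yi for i, yi in enumerate(y, 1))) % (2 * n + 1)
    x = list(y)
    if 1 <= s <= n:
        x[s - 1] = 1               # a 1 at position s was flipped to 0
    elif s > n:
        x[2 * n + 1 - s - 1] = 0   # a 0 at position 2n+1-s was flipped to 1
    return x                       # s == 0 means no substitution occurred

# x = 11101 has syndrome 1+2+3+5 = 11 ≡ 0 (mod 11), so it is a codeword for a = 0
assert correct_one_flip([1, 0, 1, 0, 1], a=0) == [1, 1, 1, 0, 1]  # 1 -> 0 at pos 2
assert correct_one_flip([1, 1, 1, 1, 1], a=0) == [1, 1, 1, 0, 1]  # 0 -> 1 at pos 4
```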

## 5 VT-codes for larger alphabet

Creating deletion codes for larger alphabets becomes a bit tricky. Of course, repetition coding still works on non-binary alphabets.

#### Code which does not work

When I started thinking about this problem, I came up with the following code, which has a bug! Let us still take a look at it, as it gives some understanding of the intricacies of code design:

Define β_i for x^n ∈ {0, 1, …, q−1}^n as:

 β_i = 1 if x_i ≠ 0, and β_i = 0 otherwise

Then we consider the code:

 C = \{ x^n \mid \sum_{i=1}^{n} i \beta_i \equiv a \pmod{n+1},\ \sum_{i=1}^{n} x_i \equiv b \pmod{q} \}

Let us look at the argument for why the code supposedly works, and try to find the bug!
Argument: The first equation, similar to the binary VT-code, will tell us the position of the deletion, and the second equation tells us its value.

Why does the above argument not work? The reason is that binary VT-codes do not actually tell us the position of the deletion. They can only tell us in which "run" the deletion happened, which suffices to recover a binary codeword, but not the exact position. Here that is fatal: a run in β corresponds to a stretch of nonzero symbols which can all be different, so reinserting the known value at different positions within the run gives different sequences.

How do we solve for that?

#### Code which works

This code appears in the work of Tenengolts in 1984.
Define α_i for x^n ∈ {0, 1, …, q−1}^n and i = 2, …, n as:

 α_i = 1 if x_i ≥ x_{i−1}, and α_i = 0 if x_i < x_{i−1}

Then we consider the code:

 C = \{ x^n \mid \sum_{i=2}^{n} (i-1) \alpha_i \equiv a \pmod{n},\ \sum_{i=1}^{n} x_i \equiv b \pmod{q} \}

Let us try to analyze the decoding for this code:

1. First of all, as in the previous code, from the second constraint (the sum mod q), we figure out the value of the deleted symbol. What remains is to determine its position.

2. The α sequence essentially captures the monotone regions of the sequence: α_i = 1 when the sequence is non-decreasing at position i, and α_i = 0 when it is decreasing. Thus, a deletion in the x sequence leads to exactly one deletion in the α sequence (the position of the deletion might be shifted by 1, but it does not matter, as VT-codes do not recover the exact position anyway).

3. As the first constraint makes α a VT-codeword, we can determine the run of α in which the deletion occurred. As every run in the α sequence corresponds to a monotone non-decreasing or decreasing stretch of x, and we know the value of the deleted symbol, we can correctly place it within that stretch and complete the decoding!

Efficient systematic encoding for this code was discovered recently in a work by Abroshan et al.  and is included in the implementation at https://github.com/shubhamchandak94/VT_codes/.
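For small n and q, the whole construction can be verified by brute force; the sketch below enumerates Tenengolts' code (for one arbitrary choice a = b = 0, with my own naming) and checks that no two codewords share a 1-deletion descendant:

```python
from itertools import product, combinations

def tenengolts_code(n, q, a=0, b=0):
    """q-ary words of length n satisfying Tenengolts' two constraints."""
    code = []
    for x in product(range(q), repeat=n):
        alpha = [1 if x[i] >= x[i - 1] else 0 for i in range(1, n)]
        if (sum(i * ai for i, ai in enumerate(alpha, 1)) % n == a
                and sum(x) % q == b):
            code.append(x)
    return code

def ball(x):
    """1-deletion descendants of x."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}

# single-deletion correction <=> descendant sets are pairwise disjoint
for c1, c2 in combinations(tenengolts_code(5, 3), 2):
    assert not (ball(c1) & ball(c2))
```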

## 6 Bursty deletion codes

In this section we will look at bursty deletions. By a single bursty deletion of size s, we mean that some s consecutive symbols were deleted. Note that an s-bursty deletion correction code can correct exactly one burst of size s, but surprisingly it need not correct a burst of smaller size. For example, the construction below can correct 2-bursty errors, but not single deletion errors!

### 6.1 VT-code based construction

We consider a construction based on single-deletion correcting VT-codes to correct a single burst of s deletions. How should one do that? One simple trick is to distribute these s deletions across the length-n sequence, so that each of s subsequences has exactly 1 deletion. For simplicity, let s divide n.

Then the codeword has the property that each of the s rows below belongs to a VT code.

 x_1, x_{s+1}, x_{2s+1}, …, x_{n−s+1}
 x_2, x_{s+2}, x_{2s+2}, …, x_{n−s+2}
 ⋮
 x_s, x_{2s}, x_{3s}, …, x_n
1. Let us analyze the scenario where 1 bursty error of size s occurs: the symbols at positions p, p+1, …, p+s−1 are deleted. This corresponds to exactly 1 deletion in each of the rows.

2. We still need to figure out which symbols of the received word belong to which rows, as the alignment might no longer hold. The cool thing is that if there are exactly s consecutive deletions, then every surviving symbol is still correctly aligned to its row.

3. Thus, our code can in fact correct a bursty error of exactly s deletions, but need not correct a smaller number of deletions, which is quite unusual!

One important observation here is that the position of the deletion in each of the rows is the same, or shifted by 1. Thus, if one of the rows is a VT-code, and the other rows merely encode whether the error position is odd or even, that is enough to resolve the bursty error. This observation is the basis of further improvements to bursty error correction; for more details, take a look at the paper by Schoeny et al.
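The alignment claim in step 2 is easy to verify numerically: delete any burst of exactly s consecutive symbols from an interleaved word and check that every row loses exactly one symbol (a small sketch using a stand-in codeword with distinct symbols):

```python
def rows(x, s):
    """Interleave: row r holds x[r], x[r+s], x[r+2s], ... (0-indexed)."""
    return [x[r::s] for r in range(s)]

x = list(range(12))                # stand-in codeword with distinct symbols
s = 3
for start in range(len(x) - s + 1):
    y = x[:start] + x[start + s:]  # delete a burst of exactly s symbols
    for orig, recv in zip(rows(x, s), rows(y, s)):
        # each row sees exactly one deletion, with alignment preserved
        assert len(recv) == len(orig) - 1
        assert any(recv == orig[:i] + orig[i + 1:] for i in range(len(orig)))
```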

## 7 Multiple Deletions

One would imagine that it should be possible to extend the elegant construction of VT-codes from single deletion correction to multiple deletions. However, this problem has proved to be much more difficult.

There have been some recent works which extend single deletion correction to multiple errors. Gabrys and Sala provide an extension of the VT-codes idea to correct two deletions. There have been other recent works which provide multiple-deletion correcting codes using different (non-VT-based) ideas, such as Brakensiek, Guruswami, and Zbarsky.

## Acknowledgement

I would like to thank Jay Mardia and Mary Wootters for interesting discussions on deletion codes.

## References

•  Khaled AS Abdel-Ghaffar and Hendrik C Ferreira. Systematic encoding of the Varshamov-Tenengol’ts codes and the Constantin-Rao codes. IEEE Transactions on Information Theory, 44(1):340–345, 1998.
•  Mahed Abroshan, Ramji Venkataramanan, and Albert Guillen i Fabregas. Efficient systematic encoding of non-binary VT codes. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 91–95. IEEE, 2018.
•  Joshua Brakensiek, Venkatesan Guruswami, and Samuel Zbarsky. Efficient low-redundancy codes for correcting multiple deletions. IEEE Transactions on Information Theory, 64(5):3403–3410, 2017.
•  Mahdi Cheraghchi. Capacity upper bounds for deletion-type channels. Journal of the ACM (JACM), 66(2):9, 2019.
•  Eleni Drinea and Michael Mitzenmacher. Improved lower bounds for the capacity of iid deletion and duplication channels. IEEE Transactions on Information Theory, 53(8):2693–2714, 2007.
•  Ryan Gabrys and Frederic Sala. Codes correcting two deletions. IEEE Transactions on Information Theory, 65(2):965–974, 2018.
•  Adam Kalai, Michael Mitzenmacher, and Madhu Sudan. Tight asymptotic bounds for the deletion channel with small deletion probabilities. In 2010 IEEE International Symposium on Information Theory, pages 997–1001. IEEE, 2010.
•  Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady, pages 707–710, 1966.
•  Vladimir I Levenshtein. On perfect codes in deletion and insertion metric. Discrete Mathematics and Applications, 2(3):241–258, 1992.
•  Michael Mitzenmacher. A survey of results for deletion channels and related synchronization channels. Probability Surveys, 6:1–33, 2009.
•  Clayton Schoeny, Antonia Wachter-Zeh, Ryan Gabrys, and Eitan Yaakobi. Codes correcting a burst of deletions or insertions. IEEE Transactions on Information Theory, 63(4):1971–1985, 2017.
•  Neil JA Sloane. On single-deletion-correcting codes. Codes and designs, 10:273–291, 2000.
•  Grigory Tenengolts. Nonbinary codes, correcting single deletion or insertion (Corresp.). IEEE Transactions on Information Theory, 30(5):766–769, 1984.
•  RR Varshamov and GM Tenengolts. Codes which correct single asymmetric errors (in Russian). Automatika i Telemkhanika, 161(3):288–292, 1965.