 # Trace Reconstruction: Generalized and Parameterized

In the beautifully simple-to-state problem of trace reconstruction, the goal is to reconstruct an unknown binary string x given random "traces" of x, where each trace is generated by deleting each coordinate of x independently with probability p < 1. The problem is well studied both when the unknown string is arbitrary and when it is chosen uniformly at random. For both settings, there is still an exponential gap between upper and lower sample complexity bounds, and our understanding of the problem is still surprisingly limited. In this paper, we consider natural parameterizations and generalizations of this problem in an effort to attain a deeper and more comprehensive understanding. Our results are as follows:

- We prove that exp(O(n^{1/4} √(log n))) traces suffice for reconstructing arbitrary matrices. In the matrix version of the problem, each row and column of an unknown √n × √n matrix is deleted independently with probability p. This contrasts with the best known results for sequence reconstruction, where the best known upper bound is exp(O(n^{1/3})).
- An optimal result for random matrix reconstruction: we show that Θ(log n) traces are necessary and sufficient. This is in contrast to the problem for random sequences, where there is a super-logarithmic lower bound and the best known upper bound is exp(O(log^{1/3} n)).
- We show that exp(O(k^{1/3} log^{2/3} n)) traces suffice to reconstruct k-sparse strings, providing an improvement over the best known sequence reconstruction results when k = o(n / log^2 n).
- We show that poly(n) traces suffice if x is k-sparse and we additionally have a "separation" promise, specifically that the indices of 1s in x all differ by Ω(k log n).


## 1 Introduction

In the trace reconstruction problem, first proposed by Batu et al. (2004), the goal is to reconstruct an unknown string x ∈ {0,1}^n given a set of random subsequences of x. Each subsequence, or "trace", is generated by passing x through the deletion channel, in which each entry of x is deleted independently with probability p. The locations of the deletions are not known; if they were, the channel would be an erasure channel. The central question is to find how many traces are required to exactly reconstruct x with high probability.
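
The channel itself is trivial to simulate; the following minimal sketch (our own illustration, not code from the paper) may help fix the setup:

```python
import random

def deletion_channel(x: str, p: float) -> str:
    """Generate one trace of the binary string x: each coordinate is
    deleted independently with probability p (retained with q = 1 - p)."""
    return "".join(bit for bit in x if random.random() >= p)
```

With p = 0 the trace is x itself, and with p = 1 it is empty; the reconstruction question is how many independent traces pin down x for intermediate p.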

This intriguing problem has attracted significant attention from a large number of researchers (Kannan and McGregor, 2005; Viswanathan and Swaminathan, 2008; Batu et al., 2004; Holenstein et al., 2008; Holden et al., 2018; Peres and Zhai, 2017; Hartung et al., 2018; Nazarov and Peres, 2017; De et al., 2017; McGregor et al., 2014; Davies et al., 2019; Cheraghchi et al., 2019). In a recent breakthrough, De et al. (2017) and Nazarov and Peres (2017) independently showed that exp(O(n^{1/3})) traces suffice, where n is the length of the string. This bound is achieved by a mean-based algorithm, which means that the only information used is the fraction of traces that have a 1 in each position. While exp(Θ(n^{1/3})) is known to be optimal amongst mean-based algorithms, the best algorithm-independent lower bound is the much weaker Ω̃(n^{5/4}) (Holden and Lyons, 2018).

Many variants of the problem have also been considered, including: (1) larger alphabets and (2) an average case analysis where x is drawn uniformly from {0,1}^n. Larger alphabets are only easier than the binary case, since we can encode the alphabet in binary, e.g., by mapping a single character to 1 and the rest to 0 and repeating for all characters. In the average case analysis, the state-of-the-art result is that exp(O(log^{1/3} n)) traces suffice (Holden et al., 2018), where the deletion probability is assumed to be constant, whereas super-logarithmically many traces are necessary (Hartung et al., 2018; Holden and Lyons, 2018). Very recently, and concurrent with our work, other variants have been studied, including (a) where the bits of x are associated with nodes of a tree whose topology determines the distribution of traces generated (Davies et al., 2019) and (b) where x is a codeword from a code with redundancy (Cheraghchi et al., 2019).

In this paper, in order to develop a deeper understanding of this intriguing problem, we consider fine-grained parameterization and structured generalizations of trace reconstruction. We prove several new results for these variations that shed new light on the problem. Moreover, in studying these settings, we refine existing tools and introduce new techniques that we believe may be helpful in closing the gaps in the fully general problem.

### 1.1 Our Results

#### Parametrizations.

We begin by considering parameterizations of the trace reconstruction problem. Given the important role that sparsity plays in other reconstruction problems (see, e.g., Gilbert and Indyk (2010)), we first study the recovery of sparse strings. Here we prove the following result.

###### Theorem 1.

If x has at most k non-zeros, exp(O((k/q)^{1/3} log^{2/3} n)) traces suffice to recover x exactly, with high probability, where q = 1 − p is the retention probability.

As some points of comparison, note that there is a trivial exp(O(k log n)) upper bound, which our result improves on with a polynomially better dependence on k in the exponent. The best known result for the general case is exp(O(n^{1/3})) (Nazarov and Peres, 2017; De et al., 2017), and our result is a strict improvement when k = o(n / log^2 n). Note that since we have no restrictions on k in the statement, improving upon the k^{1/3} dependence would imply an improved bound in the general setting.

Somewhat surprisingly, our actual result is considerably stronger (see Corollary 1 for a precise statement). We also obtain exp(O(k^{1/3} log^{2/3} n))-type sample complexity in an asymmetric deletion channel, where each 0 is deleted with probability exponentially close to 1, but each 1 is deleted with probability p. With such a channel, all but a vanishingly small fraction of the traces contain only 1s, yet we are still able to exactly identify the location of every 0. Since we can accommodate k as large as n, this result also applies to the general case with an asymmetric channel, yielding improvements over De et al. (2017) and Nazarov and Peres (2017).

We elaborate more on our techniques in the next section, but the result is obtained by establishing a connection between trace reconstruction and learning binomial mixtures. There is a large body of work devoted to learning mixtures (Dasgupta, 1999; Achlioptas and McSherry, 2005; Kalai et al., 2010; Belkin and Sinha, 2010; Arora and Kannan, 2001; Moitra and Valiant, 2010; Feldman et al., 2008; Chan et al., 2013; Hopkins and Li, 2018; Hardt and Price, 2015), where it is common to assume that the mixture components are well-separated. In our context, separation corresponds to a promise that each pair of consecutive 1s in the original string is separated by a 0-run of a certain length. Our second result concerns strings with a separation promise.

###### Theorem 2.

If x has at most k 1s and each pair of consecutive 1s is separated by a 0-run of length Ω(k log n), then poly(n) traces suffice to recover x with high probability.

Note that reconstruction with poly(n) traces is straightforward if every 1 is separated by a 0-run of length Ω̃(√n); the basic idea is that we can identify which 1s in a collection of traces correspond to the same 1 in the original sequence, and then we can use the indices of these 1s in their respective traces to infer the index of the 1 in the original string. However, reducing the required separation from Ω̃(√n) to Ω(k log n) is rather involved and is perhaps the most technically challenging result in this paper.

Here as well, we actually obtain a slightly stronger result. Instead of parameterizing by the sparsity and the separation, we instead parameterize by the number of runs and the run lengths, where a run is a contiguous sequence of the same character. We require that each 0-run has length Ω(r log n), where r is the total number of runs. Note that this parameterization yields a stronger result since r is at most 2k + 1 if the string is k-sparse, but it can be much smaller, for example if the 1-runs are very long. On the other hand, the best lower bound, which is Ω̃(n^{5/4}) (Holden and Lyons, 2018), considers strings with Θ(n) runs and constant run lengths.

As our last parametrization, we consider a sparse testing problem. We specifically consider testing whether the true string is x or y, with the promise that the Hamming distance between x and y, Δ(x, y), is at most k. This question is naturally related to sparse reconstruction, since the difference sequence is sparse, although of course neither string may be sparse on its own. Here we obtain the following result.

###### Theorem 3.

For any pair x, y with Δ(x, y) ≤ k, n^{O(k)} traces suffice to distinguish between x and y with high probability.

#### Generalizations.

Turning to generalizations, we consider a natural multivariate version of the trace reconstruction problem, which we call matrix reconstruction. Here we receive matrix traces of an unknown binary matrix X ∈ {0,1}^{√n × √n}, where each matrix trace is obtained by deleting each row and each column of X with probability p, independently. Here the deletion channel is much more structured, as there are only 2√n random bits, rather than n in the sequence case. Our results show that we can exploit this structure to obtain improved sample complexity guarantees.
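
The matrix channel is likewise easy to simulate; the following sketch (our own illustration, not code from the paper) makes the structure explicit: one retain/delete coin per row and per column.

```python
import random

def matrix_deletion_channel(X, p):
    """Generate one matrix trace: delete each row and each column of X
    independently with probability p. A single coin flip governs an
    entire row or column, so the noise is highly correlated."""
    kept_rows = [i for i in range(len(X)) if random.random() >= p]
    kept_cols = [j for j in range(len(X[0])) if random.random() >= p]
    return [[X[i][j] for j in kept_cols] for i in kept_rows]
```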

In the worst case, we prove the following theorem.

###### Theorem 4.

For the matrix deletion channel with deletion probability p < 1, exp(O(n^{1/4} √(log n))) traces suffice to recover an arbitrary matrix X ∈ {0,1}^{√n × √n}.

While no existing results are directly comparable, it is possible to obtain exp(Õ(n^{1/2})) sample complexity via a combinatorial result due to Kós et al. (2009). This agrees with the results from the sequence case, but is obtained using very different techniques. Additionally, our proof is constructive, and the algorithm is actually mean-based, so the only information it requires are estimates of the probabilities that each received entry is 1. As we mentioned, for the sequence case, both Nazarov and Peres (2017) and De et al. (2017) prove an exp(Ω(n^{1/3})) lower bound for mean-based algorithms. Thus, our result provides a strict separation between matrix and sequence reconstruction, at least from the perspective of mean-based approaches.

Lastly, we consider the random matrix case, where every entry of X is drawn iid from Bernoulli(1/2). Here we show that O(log n) traces are sufficient.

###### Theorem 5.

For any constant deletion probability p < 1, O(log n) traces suffice to reconstruct a random X ∈ {0,1}^{√n × √n} with high probability over the randomness in X and the channel.

This result is optimal, since with o(log n) traces, there is reasonable probability that a row/column will be deleted from all traces, at which point recovering this row/column is impossible. The result should be contrasted with the analogous results in the sequence case. For sequences, the best results for random strings are an exp(O(log^{1/3} n)) upper bound (Holden et al., 2018) and a super-logarithmic lower bound (Holden and Lyons, 2018). In light of the lower bound for sequences, it is surprising that matrix reconstruction admits O(log n) sample complexity.

### 1.2 Our Techniques

To prove our results, we refine and extend many existing ideas in prior trace reconstruction results, and we also introduce several new techniques.

Theorems 1 and 4 follow the complex-analytic recipe introduced by De et al. (2017) and Nazarov and Peres (2017) for the general problem. The basic idea is to show that when passing two distinct strings through the deletion channel, their expected traces differ significantly in at least one position. This is done by constructing a certain Littlewood polynomial (a polynomial whose coefficients are in {−1, 0, 1}) and using a fact due to Borwein and Erdélyi (1997) that for any such polynomial A, there exists a complex number z on a small arc of the unit circle such that |A(z)| is not too small. This bound shows that the polynomial is non-trivially large, which demonstrates separation between the traces from x and y. The exp(O(n^{1/3})) dependence arises because the polynomial has degree n.

For Theorem 1, our insight is that we can construct a polynomial that has degree k, which (a) can be estimated from traces, and (b) uniquely identifies x. This polynomial arises from a connection to learning binomial mixtures: for any 0 received in a trace, the number of 1s that precede it in the trace is drawn from Bin(a, q), where a is the partial sum of the original string up to this 0 (i.e., the number of 1s preceding it) and q is the retention probability of the channel. Learning the binomial parameters recovers the partial sums and hence the string, and to solve this latter problem, we apply the recipe above. While the polynomial we construct now has degree k, which leads to the refined guarantee, it is not a Littlewood polynomial. Fortunately, the result of Borwein and Erdélyi (1997) actually applies more broadly to polynomials with lower bounded coefficients, although this was not used in the prior analyses. Here we leverage this generalization, and we expect moving beyond Littlewood polynomials will be useful elsewhere.

For Theorem 4, we extend the Littlewood argument to multivariate polynomials. Since the matrices are √n × √n, we use a natural bivariate polynomial of degree √n in each variable, which yields the n^{1/4} improvement. However, the result of Borwein and Erdélyi (1997) applies only to univariate polynomials. Our key technical result is a generalization of their result to accommodate bivariate Littlewood polynomials, which we then use to demonstrate separation.

In contrast with the above analytic arguments, Theorem 2 is proved using classical algorithmic methods. The algorithm performs a hierarchical clustering to group the individual 1s in all received traces according to their corresponding position in the original string. This clustering step requires a careful recursion, where in each step we ensure no false negatives (two 1s from the same origin are always clustered together) but we have many false positives, which we successively reduce. At the bottom of the recursion, we can identify a large fraction of the received copies of each 1 in the original string. However, as the recursion eliminates many of the 1s, simply averaging the positions of the surviving fraction leads to a biased estimate. To resolve this, we introduce a de-biasing step which eliminates even more 1s, but ensures the survivors are unbiased, so that we can accurately estimate the location of each 1 in the original string. Somewhat interestingly, the initial recursion has O(log log n) levels, which is critical since the debiasing step involves conditioning on the presence of O(log n) specific 1s in a trace, which only happens with probability n^{-O(1)}.

For Theorem 5, our approach is also algorithmic. Using an averaging argument and exploiting randomness in the original matrix, we construct a statistical test to determine if two rows (or columns) from two different traces correspond to the same row (column) in the original matrix. We show that this test succeeds with overwhelming probability, which lets us align the rows and columns in all traces. Once aligned, we know which rows/columns were deleted from each trace, so we can simply read off the original matrix X.

Lastly, Theorem 3 leverages combinatorial arguments about k-decks (the multiset of all length-k subsequences of a string) due to Krasikov and Roditty (1997). We defer details to the appendix, but mention the result, as it demonstrates the utility of these combinatorial tools in trace reconstruction. As further evidence for the utility of combinatorial tools, the connection to k-decks was also used by Ban et al. (2019) in independent concurrent work on the deletion channel.

### 1.3 Notation

Throughout, n is the length of the binary string x being reconstructed, n0 is the number of 0s, and k is the number of 1s, i.e., the sparsity or weight. For matrices, n is the total number of entries, and we focus on square √n × √n matrices. For most of our results, we assume throughout that n, n0, and k are known since, if not, they can easily be estimated using a polynomial number of traces. Let p denote the deletion probability when the 1s and 0s are deleted with the same probability, and let q = 1 − p denote the corresponding retention probability. We also study a channel where the 1s and 0s are deleted with different probabilities; in this case, p0 is the deletion probability of a 0 and p1 is the deletion probability of a 1. We refer to the corresponding channel as the (p0, p1)-Deletion Channel or the asymmetric deletion channel. It will also be convenient to define q0 = 1 − p0 and q1 = 1 − p1 as the corresponding retention probabilities. Throughout, m denotes the number of traces.

## 2 Sparsity and Learning Binomial Mixtures

We begin with the sparse trace reconstruction problem, where we assume that the unknown string x has at most k 1s. Our analysis for this setting is based on a simple reduction from trace reconstruction to learning a mixture of binomial distributions, followed by a new sample complexity guarantee for the latter problem. This approach yields two new results: first, we obtain an exp(O(k^{1/3} log^{2/3} n)) sample complexity bound for sparse trace reconstruction, and second, we show that this guarantee applies even if the deletion probability for 0s is exponentially close to 1.

To establish our results, we introduce a slightly more challenging channel which we refer to as the Austere Deletion Channel. The bulk of the proof analyzes this channel, and we obtain results for the (p0, p1)-deletion channel via a simple reduction.

###### Theorem 6 (Austere Deletion Channel Reconstruction).

In the Austere Deletion Channel, all but exactly one 0 are deleted (the choice of which 0 to retain is made uniformly at random) and each 1 is deleted with probability p1. For such a channel,

 m = exp(O((k/q1)^{1/3} log^{2/3} n))

traces suffice for sparse trace reconstruction, where q1 = 1 − p1, provided k/q1 = Ω(log n).

We will prove this result shortly, but we first derive our main result for this section as a simple corollary.

###### Corollary 1 (Deletion Channel Reconstruction).

For the (p0, p1)-deletion channel,

 m = q0^{−1} · exp(O((k/q1)^{1/3} log^{2/3} n))

traces suffice for sparse trace reconstruction, where q0 = 1 − p0 and q1 = 1 − p1.

###### Proof.

This follows from Theorem 6. By focusing on just a single 0, it is clear that the probability that a trace from the (p0, p1)-deletion channel contains at least one 0 is at least q0. If among the retained 0s we keep one at random and remove the rest, we generate a sample from the austere deletion channel. Thus, with m samples from the deletion channel, we obtain at least q0·m samples from the austere channel in expectation, and the result follows. Note that Theorem 1 is the special case where p0 = p1 = p. ∎

#### Remarks.

First, note that the case where q1 is constant (a typical setting for the problem) and k = O(log n) is not covered by the corollary. However, in this case a simpler approach applies to argue that poly(n) traces suffice: with probability q1^k = n^{−O(1)}, no 1s are deleted in the generation of the trace, and given such traces, we can infer the original position of each 1 based on the average position of each 1 in each trace. Second, note that the weak dependence on q0 ensures that as long as q0 ≥ exp(−O((k/q1)^{1/3} log^{2/3} n)), we still have the exp(O((k/q1)^{1/3} log^{2/3} n)) bound. Thus, our result shows that sparse trace reconstruction is possible even when zeros are retained with exponentially small probability.

#### Reduction to Learning Binomial Mixtures.

We prove Theorem 6 via a reduction to learning binomial mixtures. Given a string x of length n, let a_i be the number of ones before the i-th zero in x. For example, if x = 10110 then (a_1, a_2) = (1, 3). Note that the multi-set {a_1, …, a_{n0}} uniquely determines x (given n), that each a_i ≤ k, and that the multi-set has size n0. The reduction from trace reconstruction to learning binomial mixtures is appealingly simple:

1. Given traces T_1, …, T_m from the austere channel, let b_j be the number of leading ones in T_j.

2. Observe that each b_j is generated by a uniform mixture of Bin(a_1, q1), …, Bin(a_{n0}, q1) (note that since the a_i are not necessarily distinct, some of the binomial distributions are the same). Hence, learning the parameters a_1, …, a_{n0} from b_1, …, b_m allows us to reconstruct x.
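
The reduction above is easy to simulate end to end; the sketch below (our own illustration, with hypothetical function names) computes the parameters a_i and draws the statistic b_j from an austere-channel trace:

```python
import random

def partial_sum_profile(x: str):
    """a_i = number of 1s before the i-th 0 of x; the multiset of the a_i,
    together with the length of x, determines x."""
    profile, ones = [], 0
    for bit in x:
        if bit == "1":
            ones += 1
        else:
            profile.append(ones)
    return profile

def austere_trace(x: str, p1: float) -> str:
    """Austere channel: retain exactly one 0 (chosen uniformly at random),
    delete every other 0, and delete each 1 independently with probability
    p1. Assumes x contains at least one 0."""
    keep = random.choice([i for i, bit in enumerate(x) if bit == "0"])
    return "".join(
        bit for i, bit in enumerate(x)
        if (bit == "0" and i == keep) or (bit == "1" and random.random() >= p1)
    )

def leading_ones(trace: str) -> int:
    """The statistic b_j: the number of 1s preceding the unique 0."""
    return trace.index("0")
```

For x = 10110 the profile is [1, 3], and with p1 = 0 each draw of leading_ones(austere_trace(x, 0)) equals 1 or 3 with probability 1/2 each: exactly a uniform mixture of Bin(1, 1) and Bin(3, 1).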

To obtain Theorem 6, we establish the following new guarantee for learning binomial mixtures.

###### Theorem 7 (Learning Binomial Mixtures).

Let D be a mixture of ℓ binomials:

 draw a sample from Bin(a_i, q) with probability α_i,

where a_1 < ⋯ < a_ℓ ≤ k are distinct integers, the values α_i are multiples of some precision δ > 0, and Σ_i α_i = 1. Then exp(O((k/q)^{1/3} log^{2/3}(1/δ))) samples suffice to learn the parameters exactly with high probability.

We defer the proof to the appendix, as it uses many ideas from the work of Nazarov and Peres (2017) and De et al. (2017). Theorem 6 now follows from Theorem 7, since in the reduction, we have n0 binomials, one per 0 in x, each α_i is a multiple of 1/n0, and, importantly, we have a_i ≤ k. The key is that we have a polynomial with degree k rather than a degree-n polynomial as in the previous analysis.

#### Remark.

If all the α_i are equal, Theorem 7 can be improved by using a more refined bound from Borwein and Erdélyi (1997) in our proof. This follows by observing that if all the α_i are equal, the polynomial we construct is a multiple of a Littlewood polynomial, and we may use the stronger lower bound that Borwein and Erdélyi (1997) prove for that case.

#### Lower Bound on Learning Binomial Mixtures.

As an aside, we prove that the exponential dependence on k in Theorem 7 is necessary. The proof is deferred to the appendix.

###### Theorem 8 (Binomial Mixtures Lower Bound).

There exist distinct subsets A, B ⊆ {0, 1, …, k} such that if DA and DB are the uniform mixtures of {Bin(a, 1/2) : a ∈ A} and {Bin(b, 1/2) : b ∈ B} respectively, then the total variation distance between DA and DB is exp(−Ω(k^{1/3})). Thus, exp(Ω(k^{1/3})) samples are required to distinguish DA from DB.

## 3 Well-Separated Sequences

We now prove Theorem 2, showing that poly(n) traces suffice for reconstruction of a k-sparse string when there are Ω(k log n) 0s between each consecutive pair of 1s. We call such sequences of 0s the 0-runs of the string. We also refer to the length of the shortest 0-run as the gap of the string x.

###### Theorem (Restatement of thm:gaps_intro).

Let x be a k-sparse string of length n with gap at least Ck log n for a large enough constant C. Then poly(n) traces from the p-Deletion Channel suffice to recover x with high probability.

In Section 3.1, we present the basic ideas and technical challenges in proving the theorem. We also describe the algorithm in detail and explain how to set the parameters. Full details are presented in the appendix. In Section 3.2, we strengthen Theorem 2 to show that poly(n) traces suffice under the weaker assumption that each 0-run has length Ω(r log n), where r is the total number of runs (0-runs + 1-runs). Observe that this is a weaker assumption, since r ≤ 2k + 1 always, but r can be much less than k.

### 3.1 A Recursive Hierarchical Clustering Algorithm and Its Analysis: Overview

Let s_1 < s_2 < ⋯ < s_k denote the positions (index of the coordinate from the left) of the 1s in the original string x. Let V denote the multi-set of all positions of all received 1s, and let N = |V|. We will construct a graph G on N vertices where every vertex is associated with a received 1. We decorate each vertex v with a number pos(v), which is the position of the associated received 1 within its trace. Each vertex also has an unknown label ℓ(v) ∈ {1, …, k} denoting the corresponding 1 in the original string.

At a high level, our approach uses the observed values pos(v) to recover the unknown labels ℓ(v). Once this "alignment" has been performed, the original string can be recovered easily, since the average of (pos(v) − 1)/q + 1 over the vertices v with ℓ(v) = i is an unbiased estimator for s_i.

#### A starting observation.

Our first observation is a simple fact about binomial concentration, which we will use to define the edge set in G: by the Chernoff bound, with high probability, for every vertex v, if ℓ(v) = i then we must have |pos(v) − q·s_i| ≤ c√(n log n) for some constant c. Defining the edges in G to be E = {(u, v) : |pos(u) − pos(v)| ≤ 2c√(n log n)} then guarantees that all vertices u, v with ℓ(u) = ℓ(v) are connected. This immediately yields an algorithm for the much stronger gap condition Ω̃(√n), since with such separation, no two vertices u, v with ℓ(u) ≠ ℓ(v) will have an edge. Therefore, the connected components reveal the labeling, so that poly(n) traces suffice with gap Ω̃(√n).
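
In one dimension, the connected components of this threshold graph are simply maximal chains of received positions with consecutive gaps at most the threshold, so the clustering step can be sketched in a few lines (our own illustration, not code from the paper):

```python
def cluster_positions(positions, tau):
    """Connected components of the graph on received-1 positions with an
    edge whenever |pos(u) - pos(v)| <= tau. After sorting, a component is
    exactly a maximal run of positions whose consecutive gaps are <= tau,
    since no edge can cross a gap larger than tau."""
    components = []
    for pos in sorted(positions):
        if components and pos - components[-1][-1] <= tau:
            components[-1].append(pos)
        else:
            components.append([pos])
    return components
```

With well-separated true positions, each component collects the received copies of a single original 1; with weaker separation, components merge, which is what drives the recursion described next.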

Intuitively, we have constructed a clustering of the received 1s that corresponds to the underlying labeling. To tolerate a weaker gap condition, we proceed recursively, in effect constructing a hierarchical clustering. However, there are many subtleties that must be resolved.

#### The first recursion.

To proceed, let us consider the weaker gap condition of Ω̃(k^{1/2} n^{1/4}). In this regime, G still maintains a consistency property: for each i, all vertices v with ℓ(v) = i are in the same connected component. But now a connected component may have vertices with different labels, so each connected component C identifies a contiguous set of the original 1s. Moreover, due to the sparsity assumption, C must have length, defined as max_{u,v ∈ C} |pos(u) − pos(v)|, at most Õ(k√n). Therefore, if we can correctly identify every trace that contains the left-most and right-most 1 in C, we can recurse and are left to solve a subproblem of length Õ(k√n). Appealing to our starting observation, this can be done with a gap of Ω̃(√(k√n)) = Ω̃(k^{1/2} n^{1/4}).

The challenge for this step is in identifying every trace that contains the left-most and right-most 1 in C, which we call z_L and z_R respectively. This is important for ensuring a "clean" recursion, meaning that the traces used in the subproblem are generated by passing exactly the same substring through the deletion channel. To solve this problem we use a device that we call a Length Filter. For every trace, consider the subtrace that starts with the first received 1 in C and ends with the last received 1 in C (this subtrace can be identified using the positions pos(v)). If the trace contains z_L and z_R, then the length of this subtrace is distributed as Bin(d, q), where d is the distance between z_L and z_R in the original string. On the other hand, if the subtrace does not contain both end points, then the length is stochastically dominated by Bin(d − g, q), where g is the gap. Since we know that d = Õ(k√n) and we are operating with gap condition Ω̃(k^{1/2} n^{1/4}), binomial concentration implies that with high probability we can exactly identify the subtraces containing z_L and z_R.

#### Further recursion.

The difficulty in applying a second recursive step is that, with a weaker gap condition, the length filter cannot exactly isolate the subtraces that contain the leftmost and rightmost 1s for a block C, so we cannot guarantee a clean recursion. However, substrings that pass through the filter are only missing a short prefix/suffix, which upper bounds any error in the indices of the received 1s. We ensure consistency at subsequent levels by incorporating this error into a more cautious definition of the edge set (in fact the additional error is the same order as the binomial deviation at the next level, so it has negligible effect). In this way, we can continue the recursion until we have isolated each 1 from the original string. The lower bound on run length arises since the gap at level d + 1 of the recursion, τ_{d+1}, is related to the gap at level d via τ_{d+1} = Õ(√(k·τ_d)) with τ_1 = Õ(√n), and this recursion asymptotes at Õ(k).

The last technical challenge is that, while we can isolate each original 1, the error in our length filter introduces some bias into the recursion, so simply averaging the values of the clustered vertices does not accurately estimate the original position. However, since we have isolated each 1 into pure clusters, for any connected component corresponding to a block of 1s, we can identify all traces that contain the first and last 1 in the block. Applying this idea recursively from the bottom up allows us to debias the recursion and accurately estimate all positions.

#### The algorithm in detail: recursive hierarchical clustering.

We now describe the recursive process in more detail. Let us define the thresholds:

 τ_1 = Õ(n^{1/2}), τ_2 = Õ(k^{1/2} n^{1/4}), τ_3 = Õ(k^{3/4} n^{1/8}), …, τ_d = Õ(k^{1−1/2^{d−1}} n^{1/2^d}), …,

which will be used in the length filter and in the definitions of the edge sets. Observe that with D = O(log log n), we have τ_D = Õ(k). Let T_1, …, T_m denote the traces. We will construct a sequence of graphs G_1, …, G_D on vertex sets V_1 ⊇ V_2 ⊇ ⋯ ⊇ V_D, where each vertex corresponds to a received 1 in some trace and is decorated with its position pos(v) and the unknown label ℓ(v). The d-th round of the algorithm is specified as follows, with V_1 as the set of all received 1s.

1. Define G_d on V_d with edge set E_d = {(u, v) : |pos(u) − pos(v)| ≤ τ_d}.

2. Extract the connected components C^{(d)}_1, C^{(d)}_2, … from G_d.

3. For each connected component C^{(d)}_i, extract subtraces ~x^{(d,i)}_j, where ~x^{(d,i)}_j is the substring of trace T_j starting with the first 1 of C^{(d)}_i in T_j and ending with the last 1 of C^{(d)}_i in T_j. Formally, with f_j = min{pos(v) : v ∈ C^{(d)}_i ∩ T_j} and l_j = max{pos(v) : v ∈ C^{(d)}_i ∩ T_j}, we define ~x^{(d,i)}_j = T_j[f_j, l_j].

4. Length Filter: Define L^{(d,i)} = max_j len(~x^{(d,i)}_j). If

 len(~x^{(d,i)}_j) ≤ L^{(d,i)} − Ω(√(L^{(d,i)} log(L^{(d,i)}))),

delete all vertices v with v ∈ ~x^{(d,i)}_j. Let V_{d+1} be all surviving vertices.

5. If d < D, increment d and recurse on V_{d+1}.
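
The length filter of step 4 can be sketched as follows (our own illustration; the constant hidden in the Ω(·) is left as a parameter):

```python
import math

def length_filter(subtraces, c=1.0):
    """Step 4: keep only the subtraces whose length is within the binomial
    deviation of the maximum observed length L, i.e., discard any subtrace
    of length <= L - c*sqrt(L*log L). Subtraces missing an endpoint are
    shorter by roughly q times the gap, so they fall below the cutoff."""
    L = max(len(t) for t in subtraces)
    cutoff = L - c * math.sqrt(L * math.log(max(L, 2)))
    return [t for t in subtraces if len(t) > cutoff]
```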

We analyze the procedure via a sequence of lemmas. The first one establishes a basic consistency property: two 1s originating from the same source 1 are always clustered together.

###### Lemma 1 (Consistency).

At level d, let V^{(i)}_d = {v ∈ V_d : ℓ(v) = i} for each i ∈ {1, …, k}. Then with high probability, for each d and i, there exists some component C at level d such that V^{(i)}_d ⊆ C.

The next lemma provides a length upper bound on any component, which is important for the recursion. At a high level, since we are using a threshold of τ_d at level d and the string is k-sparse, no connected component can span more than Õ(k·τ_d) positions.

###### Lemma 2 (Length Bound).

For every component C at level d, we have len(C) = Õ(k·τ_d). Moreover, if S is a contiguous subsequence of s_1, …, s_k whose received copies all lie in C, then the span of S in the original string is Õ(k·τ_d).

Finally we characterize the length filter.

###### Lemma 3 (Length Filter).

For a component C at level d, let S be the maximal contiguous subsequence of s_1, …, s_k such that every received copy of a 1 in S lies in C. Define z_L and z_R to be the first and last elements of S. Then for any trace T_j, if z_L and z_R are present in T_j, then ~x^{(d,i)}_j survives to round d + 1, that is, its vertices appear in V_{d+1}. Moreover, for any surviving subtrace ~x^{(d,i)}_j, let z denote the original position of the first 1 from S that is also in the trace T_j. Then we have |z − z_L| = Õ(τ_{d+1}).

The lemmas are all interconnected and proved formally in the appendix. It is important that the error incurred by the length filter is Õ(τ_{d+1}), which is exactly the binomial deviation at level d + 1. Thus the threshold used to construct E_{d+1} accounts for both the length filter error and the binomial deviation. This property, established in Lemma 3, is critical in the proof of Lemma 1.

For the hierarchical clustering, observe that after D = O(log log n) iterations, we have τ_D = Õ(k). With gap condition Ω(k log n) and applying Lemma 1, this means that the connected components at level D each correspond to exactly one 1 in the original string. Moreover, since the length filter preserves every trace containing the left-most and right-most 1 in the component, the probability that a subtrace passes through a given length filter is at least q^2. Hence, after D levels, the expected number of surviving traces in each cluster is at least m·q^{2D} = m/polylog(n). Thus for each original 1 s_i, our recursion identifies at least Ω̃(m) vertices v such that ℓ(v) = i.

#### Removing Bias.

The last step in the algorithm is to overcome the bias introduced by the length filter. The de-biasing process works upward from the bottom of the recursion. Since we have isolated the vertices corresponding to each 1 in the original string, for a component C at level D, we can identify all subtraces that survived to this level that contain the first and last 1 of the corresponding block B. Thus, we can eliminate all subtraces that erroneously passed this length filter.

Working upwards, consider a component C at level d that corresponds to a block B of 1s in the original string. Since we have performed further clustering, we have effectively partitioned B into sub-blocks B_1, …, B_t. We would like to identify exactly the subtraces that survived to level d that contain the first and last 1 of B, but unfortunately this is not possible due to a weak gap condition. However, by induction, we can exactly identify all subtraces that survive to level d + 1 that contain the first and last 1 of the first and last sub-block of B, namely B_1 and B_t. Thus we can de-bias the length filter at level d by filtering based on a more stringent event, namely the presence of these nodes. In total, to de-bias all length filters above a particular component, we require the presence of O(2^D) = O(log n) nodes, which happens with probability q^{O(log n)} = n^{−O(1)}. Thus we can debias with only a polynomial overhead in sample complexity. See fig:debias for an illustration.

### 3.2 Strengthening to a Parameterization by Runs

We next consider parameterizing the problem by the number of runs in the string being reconstructed. The number of runs in x is defined as r = 1 + |{i : x_i ≠ x_{i+1}}|. We will argue that if every 0-run has length Ω(r log n), then poly(n) traces suffice. The proof is via a reduction to the k-sparse case considered in the previous sections.

#### Reduction to Sparse Case.

Let y be the string formed by replacing every run of 1s in x by a single 1. We first argue that we can reconstruct y with high probability using traces generated by applying the p-Deletion Channel to x.

We will prove this result for the case r = ω(log n), since otherwise poly(n) traces suffice even with no gap promise. (Specifically, if r = O(log n), with probability at least q^r = n^{−O(1)} a trace also has r runs, and given poly(n) traces with r runs we can estimate the length of each run, because we know the i-th run in each such trace corresponds to the i-th run in the original string.) Observe that with poly(n) traces, if every 0-run in x has length at least c log n for some sufficiently large constant c, then a bit from every 0-run of x appears in every trace with high probability. Conditioned on this event, no two 1s that originally appeared in different runs of x are adjacent in any trace. Next, replace each run of 1s in each trace with a single 1. The end result is that we generate traces distributed as if we had deleted each 0 in y with probability p and each 1 in y with probability p^t, where t is the length of the run that the 1 belonged to in x. This channel is not equivalent to the p-Deletion Channel, but our analysis for the sparse case continues to hold even if the deletion probability of each 1 is different. Thus we can apply Theorem 2 to recover y, and the sparsity of y is at most ⌈r/2⌉.
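
The collapsing step in this reduction is simple string processing; a minimal sketch (our own illustration, not code from the paper):

```python
from itertools import groupby

def collapse_ones(x: str) -> str:
    """Form y from x: replace every maximal run of 1s by a single 1,
    leaving the 0-runs untouched."""
    return "".join("1" if bit == "1" else "".join(run)
                   for bit, run in groupby(x))
```

Applying the same collapsing to each trace (after conditioning on every 0-run surviving) yields traces of y distributed according to the modified channel described above.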

Since the algorithm identifies corresponding 1s of x̃ in the different traces, we can then estimate the length of the 1-runs in x that were collapsed to each single 1 of x̃ by looking at the lengths of the corresponding 1-runs in the traces of x before they were collapsed.
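The collapsing step of the reduction can be sketched as follows. The helper names are ours; the induced deletion probability p^ℓ for a collapsed 1 follows because the collapsed 1 survives iff any of the ℓ ones in its run survives.

```python
import re

def collapse_ones(x: str):
    """Collapse every maximal run of 1s in x to a single 1.

    Returns the collapsed string x~ together with the length of the
    1-run that each surviving 1 replaces."""
    collapsed, one_run_lengths = [], []
    for run in re.findall(r"1+|0+", x):
        if run[0] == "1":
            collapsed.append("1")
            one_run_lengths.append(len(run))
        else:
            collapsed.append(run)
    return "".join(collapsed), one_run_lengths

def induced_deletion_probs(one_run_lengths, p):
    """A collapsed 1 is deleted only if all l ones in its original run
    are deleted, which happens with probability p**l (0s keep prob p)."""
    return [p ** l for l in one_run_lengths]

x = "0011100010110000"
x_tilde, lengths = collapse_ones(x)
```

Here `collapse_ones("0011100010110000")` yields `"0010001010000"` with 1-run lengths `[3, 1, 2]`, so for p = 1/2 the three collapsed 1s are deleted with probabilities 1/8, 1/2, and 1/4 respectively.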

###### Theorem 9.

For the p-Deletion Channel, poly(n) traces suffice if the lengths of the 0-runs of x are Ω(r log n), where r is the number of runs in x.

## 4 Reconstructing Arbitrary Matrices

Recall that in the matrix reconstruction problem, we are given traces of an unknown √n × √n binary matrix X passed through the matrix deletion channel, which deletes each row and each column independently with probability p. In this section we prove thm:matrix_intro.

###### Theorem (Restatement of thm:matrix_intro).

For matrix reconstruction, exp(O(n^{1/4}√(p log n)/q)) traces suffice to recover an arbitrary matrix X ∈ {0,1}^{√n×√n}, where p is the deletion probability and q = 1 − p.

At a high level we follow the recipe of Nazarov and Peres (2017) for the sequence case. The core of the proof is a procedure to test between two candidate matrices X and Y. This test is based on identifying a particular received entry where the traces must differ significantly, and to show such an entry exists, we analyze a certain bivariate Littlewood polynomial, which constitutes the bulk of the proof. Equipped with this test, we can apply a union bound and simply search over all pairs of matrices to recover the matrix.

For a matrix X, let X̃ denote a matrix trace. Let us denote the (i, j) entry of the matrix as X_{i,j}, with zero-based indexing, a protocol we adhere to for every matrix. For two complex numbers w_1, w_2, observe that

\[
\mathbb{E}\left[\sum_{i,j=0}^{\sqrt{n}-1}\tilde X_{i,j}\,w_1^i w_2^j\right]
= q^2\sum_{i,j} w_1^i w_2^j \sum_{k_i\ge i,\;k_j\ge j} X_{k_i,k_j}\binom{k_i}{i}\binom{k_j}{j}\,p^{k_i-i}q^{i}\,p^{k_j-j}q^{j}
= q^2\sum_{k_1,k_2=0}^{\sqrt{n}-1} X_{k_1,k_2}(qw_1+p)^{k_1}(qw_2+p)^{k_2}.
\]

Thus, for two matrices X and Y, we have

\[
\frac{1}{q^2}\,\mathbb{E}\left[\sum_{i,j=0}^{\sqrt{n}-1}\left(\tilde X_{i,j}-\tilde Y_{i,j}\right)w_1^i w_2^j\right]
=\sum_{k_1,k_2=0}^{\sqrt{n}-1}\left(X_{k_1,k_2}-Y_{k_1,k_2}\right)(qw_1+p)^{k_1}(qw_2+p)^{k_2}
\triangleq A(z_1,z_2),
\]

where we are rebinding z_1 = qw_1 + p and z_2 = qw_2 + p. Observe that A is a bivariate Littlewood polynomial; all coefficients are in {−1, 0, 1}, and the degree in each variable is at most √n − 1. For such polynomials, we have the following estimate, which extends a result due to Borwein and Erdélyi (1997) for univariate polynomials. The proof is deferred to the end of this section.
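As a sanity check, the expectation identity above can be verified exactly on a small example by enumerating every pattern of surviving rows and columns (the matrix and parameter values below are arbitrary; function names are ours):

```python
from itertools import product

def trace_poly_expectation(X, p, w1, w2):
    """Exact E[sum_{i,j} Xtilde_{i,j} w1^i w2^j], computed by enumerating
    every row/column survival pattern of the matrix deletion channel."""
    n, q = len(X), 1 - p
    total = 0.0
    for rows in product([0, 1], repeat=n):
        for cols in product([0, 1], repeat=n):
            prob = 1.0
            for bit in rows + cols:
                prob *= q if bit else p
            kept_r = [k for k in range(n) if rows[k]]
            kept_c = [k for k in range(n) if cols[k]]
            # Surviving row k_r lands at trace row i = its rank among
            # surviving rows (similarly for columns).
            total += prob * sum(X[kr][kc] * w1**i * w2**j
                                for i, kr in enumerate(kept_r)
                                for j, kc in enumerate(kept_c))
    return total

def closed_form(X, p, w1, w2):
    """q^2 * sum_{k1,k2} X_{k1,k2} (q*w1+p)^k1 (q*w2+p)^k2."""
    n, q = len(X), 1 - p
    return q**2 * sum(X[k1][k2] * (q*w1 + p)**k1 * (q*w2 + p)**k2
                      for k1 in range(n) for k2 in range(n))

X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
p, w1, w2 = 0.3, 0.7, -0.4
lhs = trace_poly_expectation(X, p, w1, w2)
rhs = closed_form(X, p, w1, w2)
```

The two quantities agree up to floating-point error, reflecting that a surviving row k becomes trace row i exactly when i of the k earlier rows survive, which has probability C(k, i) q^i p^{k−i}.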

###### Lemma 4.

Let f be a non-zero bivariate Littlewood polynomial of degree at most √n in each variable. Then,

\[
\left|f\left(z_1^\star,z_2^\star\right)\right|\ge \exp\left(-C_1L^2\log n\right)
\]

for some z_1^⋆, z_2^⋆ ∈ γ_L, where γ_L = {e^{iθ} : |θ| ≤ π/L}, and C_1 is a universal constant.

Let γ_L denote the arc specified in lem:bivariate_littlewood. For any z ∈ γ_L, Nazarov and Peres (2017) provide the following estimate for the modulus of (z − p)/q:

\[
\forall z\in\gamma_L:\quad \left|\frac{z-p}{q}\right|\le \exp\left(\frac{C_2\,p}{(Lq)^2}\right).
\]

Using these two estimates, we may sandwich max_{z_1,z_2∈γ_L} |A(z_1,z_2)| by

\[
\exp\left(-C_1L^2\log n\right)\le \max_{z_1,z_2\in\gamma_L}\left|A(z_1,z_2)\right| \le \exp\left(\frac{C'p\sqrt{n}}{(Lq)^2}\right)\frac{1}{q^2}\sum_{i,j}\left|\mathbb{E}\left[\tilde X_{ij}-\tilde Y_{ij}\right]\right|.
\]

This implies that there exists some coordinate (i, j) such that

\[
\left|\mathbb{E}\left[\tilde X_{ij}-\tilde Y_{ij}\right]\right|\ge \frac{q^2}{n}\exp\left(-C_1L^2\log n-\frac{C'p\sqrt{n}}{L^2q^2}\right)\ge \frac{q^2}{n}\exp\left(-\frac{C\,n^{1/4}\sqrt{p\log n}}{q}\right),
\]

where the second inequality follows by optimizing over L.
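Concretely, the optimization balances the two terms in the exponent (a routine calculation, with the constants C_1, C′ as above):

```latex
C_1L^2\log n=\frac{C'p\sqrt{n}}{L^2q^2}
\quad\Longleftrightarrow\quad
L=\left(\frac{C'p\sqrt{n}}{C_1q^2\log n}\right)^{1/4},
\]
\[
\text{at which point}\quad
C_1L^2\log n=\sqrt{C_1C'}\cdot\frac{n^{1/4}\sqrt{p\log n}}{q}.
```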

The remainder of the proof follows the argument of Nazarov and Peres (2017): since we have witnessed significant separation between the traces received from X and those received from Y, we can test between these cases with exp(O(n^{1/4}√(p log n)/q)) samples (via a simple Chernoff bound). Since we do not know which matrix is the truth, we actually test between all pairs, where the test has no guarantee if neither matrix is the truth. However, via a union bound, the true matrix will beat every other matrix in these tests, and this only introduces a poly(n) factor in the sample complexity. For details, see app:omittedproofs, where we use a similar argument towards proving thm:sparsity_intro, or see Nazarov and Peres (2017).
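The tournament step can be sketched as follows. This is a toy illustration rather than the paper's exact procedure: all function names are ours, the entrywise expectation comes from the binomial identity earlier in this section, and each pair of candidates is compared at the entry where their expected traces differ most.

```python
import itertools
import math
import random

def expected_entry(X, p, i, j):
    """E[Xtilde_{i,j}] under the matrix deletion channel, treating the
    entry as 0 when the trace is too small (binomial identity above)."""
    n, q = len(X), 1 - p
    return q * q * sum(
        X[k1][k2] * math.comb(k1, i) * math.comb(k2, j)
        * p ** (k1 - i) * q ** i * p ** (k2 - j) * q ** j
        for k1 in range(i, n) for k2 in range(j, n))

def sample_trace(X, p, rng):
    """Delete each row and each column independently with probability p."""
    n = len(X)
    rows = [k for k in range(n) if rng.random() > p]
    cols = [k for k in range(n) if rng.random() > p]
    return [[X[r][c] for c in cols] for r in rows]

def best_candidate(candidates, traces, p):
    """Pairwise tournament: compare each pair at the entry where their
    expectations differ most; the empirical mean decides the winner."""
    n = len(candidates[0])

    def emp_mean(i, j):  # deleted positions count as 0, matching expected_entry
        return sum(t[i][j] for t in traces
                   if i < len(t) and j < len(t[0])) / len(traces)

    wins = [0] * len(candidates)
    for a, b in itertools.combinations(range(len(candidates)), 2):
        X, Y = candidates[a], candidates[b]
        i, j = max(((i, j) for i in range(n) for j in range(n)),
                   key=lambda ij: abs(expected_entry(X, p, *ij)
                                      - expected_entry(Y, p, *ij)))
        m = emp_mean(i, j)
        if abs(m - expected_entry(X, p, i, j)) <= abs(m - expected_entry(Y, p, i, j)):
            wins[a] += 1
        else:
            wins[b] += 1
    return candidates[max(range(len(candidates)), key=wins.__getitem__)]

rng = random.Random(0)
truth = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
impostors = [[[0] * 3 for _ in range(3)], [[1] * 3 for _ in range(3)]]
traces = [sample_trace(truth, 0.3, rng) for _ in range(400)]
recovered = best_candidate([truth] + impostors, traces, 0.3)
```

With a handful of candidates and a few hundred traces of a 3×3 matrix, the true matrix wins every pairwise comparison; in the actual proof the candidate set is all 2^n matrices and the trace count is exponential in the separation bound.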

### 4.1 Proof of the polynomial lemma (lem:bivariate_littlewood)

Fix L and define the polynomial

\[
F(z_1,z_2)=\prod_{1\le a\le L,\;1\le b\le L} f\left(z_1e^{\pi i a/L},\,z_2e^{\pi i b/L}\right).
\]

We first show that there exist z_1^⋆, z_2^⋆ on the unit disk such that |F(z_1^⋆, z_2^⋆)| ≥ 1. This follows from an iterated application of the maximum modulus principle. First factorize F(z_1, z_2) = z_2^m G(z_1, z_2), where m is chosen such that the terms of G have no common factor of z_2. Since F has non-zero coefficients, this implies that G(z_1, 0) is a non-zero univariate polynomial. Further factorize G(z_1, 0) = z_1^{m′} H(z_1) so that the terms in H have no common factor of z_1. H has integer coefficients and a non-zero constant term, so that |H(0)| ≥ 1. Thus by the maximum modulus principle:

\[
\left|F(z_1^\star,z_2^\star)\right|=\left|G(z_1^\star,z_2^\star)\right|\ge \left|G(z_1^\star,0)\right|\ge \left|H(z_1^\star)\right|\ge \left|H(0)\right|\ge 1.
\]

Now, for any a, b ∈ {1, …, L} we have

\[
1\le \left|F(z_1^\star,z_2^\star)\right|\le \left|f\left(z_1^\star e^{\pi i a/L},\,z_2^\star e^{\pi i b/L}\right)\right|\cdot n^{L^2-1},
\]

where we are using the fact that |f| ≤ n on the unit torus to bound every factor of F but one. This proves the lemma, since we may choose a and b such that z_1^⋆ e^{πia/L}, z_2^⋆ e^{πib/L} ∈ γ_L.

## 5 Reconstructing Random Matrices

In this section, we prove thm:random_matrix_intro, showing that O(log n) traces suffice to reconstruct a random matrix with high probability for any constant deletion probability p < 1. This is optimal since Ω(log n) traces are necessary just to ensure that every bit appears in at least one trace.

Our result is proved in two steps. We first design an oracle that allows us to identify when two rows (or two columns) in different matrix traces correspond to the same row (resp. column) of the original matrix. We then use this oracle to identify which rows and columns of the original matrix were deleted to generate each trace. This allows us to identify the original position of each bit in each trace. Hence, as long as each bit is preserved in at least one trace (and O(log n) traces suffice to ensure this with high probability), we can reconstruct the entire original matrix.
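The logarithmic bound in both directions is a standard coupon-collector calculation: a fixed bit survives a given trace only if both its row and its column survive, which happens with probability q² per trace, where q = 1 − p is the preservation probability. With T independent traces (constants suppressed):

```latex
\Pr[\text{bit } (i,j) \text{ appears in no trace}] = (1-q^2)^T .
\]
\[
% Union bound over all n bits:
n(1-q^2)^T \le n\,e^{-q^2T} \le n^{-\Omega(1)}
\quad\text{once } T=\Theta(\log n),
```

while for T = o(log n) the expected number of missing bits, n(1−q²)^T, is polynomially large, so some bit is missed and exact reconstruction is impossible.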

#### Oracle for Identifying Corresponding Rows/Columns:

We will first design an oracle that, given two strings s and t, distinguishes, for any constant preservation probability q > 0, with high probability between the cases:

Case 1:

s and t are traces generated by the deletion channel with preservation probability q from the same random string.

Case 2:

s and t are traces generated by the deletion channel with preservation probability q from independent random strings.

If s and t are two rows (or two columns) from two different matrix traces, then this test determines whether s and t correspond to the same or different row (resp. column) of the original matrix. In sec:oracle, we show how to perform this test with polynomially small failure probability. In fact, the failure probability can be made exponentially small, but a polynomially small failure probability will be sufficient for our purposes.

#### Using the Oracle for Reconstruction.

Given m = O(log n) traces, we can ensure that every bit of X appears in at least one of the matrix traces with high probability. We then use the oracle to associate each row in each trace with the rows in other traces that are subsequences of the same original row. This requires at most

\[
\binom{m\sqrt{n}}{2}\le \left(m\sqrt{n}\right)^2
\]

applications of the oracle, and so, by the union bound, all of these tests succeed with polynomially small failure probability for sufficiently large n.

After using the oracle to identify corresponding rows amongst the different traces, we group all the rows of the traces into √n groups, one per row of X, where the expected size of each group is qm. We next infer which group corresponds to the i-th row of X for each i. Let π be the bijection between groups and row indices that we are trying to learn, i.e., π(g) = i if group g corresponds to the i-th row of X. It suffices to determine whether π(g) < π(g′) or π(g) > π(g′) for each pair of groups g, g′. If there exists a matrix trace that includes a row in g and a row in g′, then we can infer the relative ordering of π(g) and π(g′) based on whether the row from g appears higher or lower in that trace than the row from g′. The probability that such a trace exists is at least 1 − (1 − q²)^m, so with m = O(log n) traces we can learn the bijection with high probability.

We also perform an analogous process with columns. After both rows and columns have been processed, we know exactly which rows and columns were deleted to form each trace, which reveals the original position of each received bit in each trace. Given that every bit of X appeared in at least one trace, this suffices to reconstruct X, proving thm:random_matrix_intro.
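The ordering step can be illustrated in isolation. Below we assume the oracle has already tagged each trace row with its group id, so each trace is just the sequence of group ids of its surviving rows in top-to-bottom order; recovering the bijection then reduces to sorting the groups by the relative order observed in the traces (function names are ours):

```python
from functools import cmp_to_key

def order_groups(traces_with_groups, num_groups):
    """Recover the original top-to-bottom order of row-groups.

    Each trace is a list of group ids in the order its rows appear;
    a pair's relative order is read off any trace containing both."""
    above = [[False] * num_groups for _ in range(num_groups)]
    for trace in traces_with_groups:
        pos = {g: k for k, g in enumerate(trace)}
        for g in pos:
            for h in pos:
                if pos[g] < pos[h]:
                    above[g][h] = True  # group g observed above group h

    def cmp(g, h):
        if above[g][h]:
            return -1
        if above[h][g]:
            return 1
        return 0  # pair never co-occurred in a trace; needs more traces

    return sorted(range(num_groups), key=cmp_to_key(cmp))

# Toy example: the true top-to-bottom group order is 2, 0, 3, 1 and every
# pair of groups co-occurs in at least one trace.
order = order_groups([[2, 0, 1], [0, 3, 1], [2, 3]], 4)
```

Once enough traces are available that every pair of groups co-occurs somewhere (which the 1 − (1 − q²)^m bound above guarantees with high probability), the comparator is total and the sort recovers the bijection.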

###### Theorem (Restatement of thm:random_matrix_intro).

For any constant deletion probability p < 1, O(log n) traces are sufficient to reconstruct a random √n × √n binary matrix with high probability.

### 5.1 Oracle: Testing whether two traces come from same random string

Define S_1, S_2, … to be disjoint contiguous sets of indices of equal size. Note that there are gaps between each S_i and S_{i+1}, i.e., indices that are larger than every element of S_i and smaller than every element of S_{i+1}. This will later help us argue that the bits in positions S_i and S_j in different traces are independent. Given a trace t, define the three quantities:

\[
X_i=\sum_{j\in S_i} t_j,\qquad Y_i=
\]