In the problem of channel coding, a sender needs to communicate data over a noisy channel to a receiver. In the most general setting, a message is encoded into a codeword, which is transmitted over the channel; the channel distorts the codeword, and a decoder then tries to reconstruct the message from the distorted codeword.
The channel is viewed as an adversary and is characterized by the type of operations it uses to produce the noise, and by a parameter that quantifies the maximum noise that we want to tolerate. Roughly speaking, most studies have focused on channels defined by a fixed set of possible noise-adding operations, with the noise parameter set to the maximum number of such operations that the (encoder, decoder) pair can handle. Perhaps the most investigated setting is the theory of error-correcting codes, where the only allowed operation is a one-bit flip; in this case, the noise parameter is the maximum tolerated Hamming distance between the transmitted and the received word. Another case where there has recently been significant progress allows one-bit flips, one-bit deletions, and one-bit insertions; in this case, the parameter is the maximum tolerated edit distance. Still another case is when the only allowed operation is the erasure, which transforms a bit into “?”. Other types of channels that distort the codeword in various ways have also been investigated.
Our setting is different in two important ways. Firstly, we consider channels that can apply arbitrary distortions. We consider two scenarios for how the channel produces the distortion, depending on whether it “knows” the codeword or only the message.
In the Hamming scenario, a channel is defined by a bipartite graph in which left nodes represent codewords that are inputs of the channel, and right nodes represent distorted codewords that are outputs. A left node and a right node are connected if the channel may distort the codeword on the left into the one on the right. The level of the noise in the channel is the logarithm of the maximal degree of a right node, i.e., the logarithm of the maximal number of input codewords that can produce the same distorted codeword. No other assumptions are made on the channel.
In the oblivious scenario, a channel takes as input codewords from an additive group. The channel is defined by a set of error vectors: on input the codeword of a message, it adds a vector from this set to the codeword. The choice of the error vector does not depend on the codeword, but it may depend on the message. The level of the noise is the logarithm of the size of the set.
Secondly, our goal is to have a single encoding function that works against any channel. We call this a universal code. Put differently, a universal code is resilient to any type of distortion, provided the noise level is within the tolerated bound. The decoding function, on the other hand, is assumed to have full knowledge of the channel.
In order to construct universal codes, we assume a special set-up for the communication process: the universal encoder and the decoder functions are probabilistic and share random bits. Such codes are called private codes. They were introduced by Shannon [Sha58] (under the name random codes) and more recently studied by Langberg [Lan04] (see also [Smi07, GS16]). The channel does not have access to the shared random bits, although, in the Hamming setting, the codeword might reveal some information about the randomness indirectly.
There are two important parameters. The first is the rate of the code, defined as the ratio between the logarithm of the number of messages that we can send and the logarithm of the number of codewords that the channel can transmit. The second is the shared randomness of the code, which is the number of random bits that the encoder and decoder share. Given a noise level, we want to maximize the rate and minimize the shared randomness.
It is not difficult to show that for a universal code, roughly, the product of the number of messages and the distortion cannot exceed the number of possible channel inputs, up to a factor depending on the error probability of reconstructing the message. This implies an upper bound on the rate of such a code; see Section 2. We construct universal codes with rates that converge to this optimal value and have small shared randomness. The following simplified statements are valid for constant error probability.
Theorem 1.1 (Main Result - informal statement).
There exists a universal code in the Hamming scenario with rate and shared randomness .
There exists a universal code in the oblivious scenario with rate and shared randomness .
For both codes in Theorem 1.1, the universal encoding function is polynomial-time computable, but the decoding functions, which depend on the channel, are in general not efficiently computable.
We prove lower bounds for the amount of shared randomness in both scenarios. When the noise level is a constant fraction of the codeword length, which is typical in most applications, the amount of shared randomness is optimal, among universal codes with optimal rate, according to our precise model for shared randomness. (Footnote: We are currently investigating a model that allows the encoder to use both shared and non-shared randomness. Our results indicate that the codes presented here also use an optimal amount of shared randomness in this more general model. However, the analysis is surprisingly more difficult.) Thus, in this regime, the universal codes in Theorem 1.1 are optimal for both rate and randomness.
In general, by simple random coding one can easily obtain private codes, but this method uses many random bits. In the proof of Theorem 1.1 (a), the number of shared random bits is reduced by standard pairwise-independent hashing. The proof of Theorem 1.1 (b) is more involved and is the main technical contribution of this paper.
Note that one can always remove the shared randomness by letting the decoder try all possible random strings. In this way we obtain a list decodable code in which encoding is still probabilistic but decoding is deterministic and with list size exponential in the randomness of the code (the list has one element for each possible random string). Thus, Theorem 1.1 (b) implies a universal list decodable code for the oblivious scenario with a deterministic decoder that produces a list of polynomial size, which, with high probability, contains the message that was encoded.
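The seed-enumeration argument above can be sketched in a few lines of illustrative code; the function names and interface below are ours, not the paper's.

```python
# Sketch of list decoding obtained by enumerating all shared random strings.
# `decode(y, r)` stands for a channel-dependent decoder run with shared
# random string r; both names are illustrative.

def list_decode(y, decode, num_shared_bits):
    """Collect the decoder's outputs under every possible seed.
    The list size is at most 2**num_shared_bits, i.e., exponential
    in the randomness of the code."""
    candidates = set()
    for r in range(2 ** num_shared_bits):
        m = decode(y, r)
        if m is not None:
            candidates.add(m)
    return candidates
```

With high probability over the encoder's actual seed, the transmitted message appears somewhere in the returned list, which is the list-decoding guarantee described above.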
We next present the full details of our model and state the results formally.
1.1 Definitions and results
A Hamming channel from one set to another is a bipartite graph whose left set represents the codewords that are input to the channel, and whose right set represents the distorted outputs returned by the channel. On input a left node, the channel may output any right node connected to it by an edge. The distortion of the channel is the maximal right degree. We assume that the left degree of each node is at least 1.
An oblivious channel over an additive group is a subset of the group. On input a codeword, the channel adds an element of this subset to it. The distortion is the size of the subset.
Example. Consider a bit-flip channel that has bit strings of a fixed length as input and output, and may flip a bounded number of bits. This channel can be represented as a Hamming channel: a left node is connected to a right node if their Hamming distance is within the bound. The distortion of the channel is equal to the size of a Hamming ball whose radius is the bound. The bit-flip channel can also be viewed as an oblivious channel: the sum of two bit strings is defined by bitwise addition modulo 2, and the error set contains all strings of Hamming weight within the bound.
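For concreteness, the error set of this example can be generated mechanically; a small Python sketch, with our own representation choices (bit vectors as integers):

```python
from itertools import combinations
from math import comb

def bit_flip_error_set(n, t):
    """Error set of the bit-flip channel viewed as an oblivious channel:
    all n-bit vectors of Hamming weight at most t, encoded as integers.
    The channel's action on a codeword x is x ^ e for some e in the set."""
    errors = {0}
    for w in range(1, t + 1):
        for positions in combinations(range(n), w):
            e = 0
            for p in positions:
                e |= 1 << p
            errors.add(e)
    return errors

# The distortion equals the size of a Hamming ball of radius t,
# i.e., the sum of C(n, w) for w = 0..t.
```

The same set, read as edges between strings at Hamming distance at most t, gives the Hamming-channel view of the example.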
An encoding function is a mapping , where the second argument is used for the shared randomness. A decoding function is a mapping . We use the notation and . A channel function of a Hamming channel is a mapping from left nodes to right nodes.
A private code is -resilient in the Hamming scenario if for every Hamming channel from to a set with distortion at most there exists a decoding function such that for all channel functions of this channel and all
Let be an additive group. A private code is -resilient in the oblivious scenario if for every oblivious channel of size at most , there exists a decoding function such that for all and
The next two theorems restate the two claims in Theorem 1.1 with full specification of parameters.
For every satisfying , there exists a polynomial time computable private code that is -resilient in the Hamming scenario such that
The encoder and the decoder functions share random bits.
Moreover, given oracle access to a channel in Definition 1.2, we can evaluate a corresponding decoding function in polynomial space.
For the results regarding the oblivious scenario, we view as the vector space in the natural way.
There exists a constant such that for every satisfying , there exists a polynomial-time computable private code that is -resilient in the oblivious scenario such that
The encoder and the decoder functions share random bits.
Moreover, given oracle access to a channel in Definition 1.3, we can evaluate a corresponding decoding function in polynomial space.
Note that if , then the rate of the code is . The next code for the oblivious scenario has even better rate (for larger than ) but uses more shared random bits.
There exists a constant such that for every satisfying , there exists a polynomial-time computable private code that is -resilient in the oblivious scenario such that
The encoder and the decoder functions share random bits.
Moreover, given oracle access to a channel in Definition 1.3, we can evaluate a corresponding decoding function in polynomial space.
1.2 Related works and comparison with our results
The setting of our results has two distinctive features: there is no restriction on the type of channel distortion, and the codes we construct are universal, meaning that the encoder does not know the type of channel he has to cope with.
Channels with general distortion capabilities have been studied starting with the paper of Shannon [Sha48] that initiated Information Theory, and which contains one of the most basic results of this theory, the Channel Coding theorem. In [Sha48], a channel is given by probability mass functions (one such function for each symbol in a given finite alphabet), with the interpretation that when a symbol is transmitted, another symbol (also from a finite alphabet) is received with the corresponding probability. In Shannon’s paper, the channel is memoryless: when a string of symbols is transmitted, each symbol is distorted independently according to the per-symbol distribution. The Channel Coding theorem determines the maximum encoding rate for which decoding is possible with error probability converging to zero as the block length grows. Verdú and Han [VH94] prove a Channel Coding theorem for channels that are not required to be memoryless (in their model, the transition probabilities are defined for blocks of symbols). We note that to achieve maximum rate, the encoding function in both [Sha48] and [VH94] knows the transition probabilities, and therefore it is not universal.
General channels have also been studied in Zero-Error Information Theory, a subfield in which encoding/decoding must succeed for all transmitted messages. A channel is given by the set of input–output pairs that occur with positive probability. This set can be viewed as the set of edges of a bipartite graph, with the same interpretation as in our definition for the Hamming scenario: when a left node is transmitted, the receiver gets one of its neighbors, chosen by the channel. One can retain just the graph (ignoring the probabilities, so that the channel behaves adversarially) and obtain a purely combinatorial framework. Two left nodes are separated if they have no common neighbor, and encoding amounts essentially to finding a set of pairwise separated strings, which form the codewords of a code. This model is very general, but most results assume that the bipartite graph has certain properties; see the survey paper [KO98]. To the best of our knowledge, all the results assume that the encoding function knows the bipartite graph, and thus it is not universal. The settings of Zero-Error Information Theory and of our study share some features: besides modeling a channel by a bipartite graph, neither assumes any stochastic process, and both require encoding/decoding to succeed for all messages (in our setting, success is with high probability over the shared random bits).
Guruswami and Smith [GS16] study channels in the oblivious scenario (they call them oblivious channels or additive channels) and in the Hamming scenario, similar to our definitions, except that their channel may only add noise vectors of bounded Hamming weight, while in our setting the noise vectors come from an arbitrary but fixed set (of the same size as a Hamming ball of the corresponding radius, and known only to the decoder). In their setting, the encoder is probabilistic and the decoder is deterministic. They obtain codes in the oblivious scenario with polynomial-time encoding and decoding and optimal rate. In our results, the encoder and the decoder share randomness and the decoder is not efficient, but the codes are universal and are resilient to a more general type of noise, because the set of noise vectors may contain vectors of any Hamming weight.
The concept of a universal code introduced in this paper is directly inspired by the universal compressor in [BZ19]. There, a decompressor is a (deterministic) partial function mapping strings to strings. For a string, its Kolmogorov complexity relative to a decompressor is the length of a shortest input that the decompressor maps to the string. We consider probabilistic compression algorithms that take a target length and a target error probability as extra inputs. More precisely, a compressor maps every triple (error probability, target length, input string) to a string of the target length, representing the compressed version of the input. Such a compressor is universal with a given overhead if for every decompressor there exists another decompressor such that, whenever the target length exceeds the complexity of the input by at least the overhead, the second decompressor recovers the input from its compressed version with probability at least one minus the error parameter.
It is shown in [BZ19] that there exists a universal compressor computable in polynomial time and having polylogarithmic overhead. In other words, for every compressor/decompressor pair, no matter how slow the decompressor is, or even if it is not computable, the universal compressor produces in polynomial time codes that are almost as short as those of the given pair (the difference in length is the polylogarithmic overhead). The cost is that decompression from such codes is slower.
The universal compressor also provides an optimal solution to the so-called document exchange problem. (Footnote: This problem is also called information reconciliation. In the Information Theory literature it is typically called compression with side information at the receiver, or asymmetric Slepian-Wolf coding.) In this problem, Alice holds the updated version of a file, and Bob holds an obsolete version of it. Using the universal compressor, Alice can compute in polynomial time a short string which she sends to Bob, and as long as the complexity of Alice's file given Bob's file (with respect to some decompressor) is below the length of this string, Bob can recover Alice's file from the string and his own version. What is remarkable is that Alice does not know the decompressor. Moreover, she does not know Bob's version. The connection to our setting comes from the fact that a decompressor is equivalent to a bipartite graph as in our definitions, and the complexity condition is the same as saying that Alice's file is a left neighbor of a right node of bounded degree.
As we have already mentioned, the proof of Theorem 1.4 for the Hamming scenario uses random coding and the well-known technique of pairwise-independent hashing to reduce the number of shared random bits from exponential to linear in .
The proofs of Theorem 1.5 and Theorem 1.6 for the oblivious scenario reduce the number of shared random bits to logarithmic (respectively, polylogarithmic), and they use more advanced techniques. They are based on a similarity between the document exchange problem and channel coding. In both problems, the receiver needs to reconstruct the sent string from a string that is close to it: close in the sense of small conditional complexity in document exchange, or, in this paper, in the sense that the sent string is one of the boundedly many left neighbors of the received string in the bipartite graph that represents the channel (this holds for the Hamming scenario; in the oblivious scenario, a similar “closeness” relation exists). The difference is that in the document exchange problem, the receiver holds the close string before transmission, while in channel coding, it is received via transmission and is the channel-distorted version of the codeword.
The connection between the two problems has been exploited in several papers starting with the original proof of the Slepian-Wolf theorem [SW73], which solves the document exchange problem using codes obtained via the standard technique in the Channel Coding Theorem. Wyner [Wyn74] gives an alternative proof using linear error correcting codes and syndromes, and there are other papers that have used this idea [Orl93, GD05, CR18]. Our approach is similar but works in the other direction: we take linear codes obtained via the method from [BZ19] for the document exchange problem and use them for channel coding.
The technique used in [BZ19] is based on condensers and is related to previous solutions for several versions of the document exchange problem, which used a stronger tool, namely extractors [BFL01, Muc02, MRS11, BMVZ18, BZ14, Zim17]. We remark that none of these previous papers requires linearity, which is crucial for the method in this paper.
It is common to first obtain non-explicit objects using the probabilistic method and then to attempt explicit constructions. In our case, however, it is not clear how to show the existence of linear extractors with the probabilistic method. Instead of extractors, we use condensers, and fortunately, a random linear function is a condenser. Moreover, the explicit condensers obtained by Guruswami, Umans, and Vadhan [GUV09], Ta-Shma and Umans [TU12], and Raz, Reingold and Vadhan [RRV02] (this one is actually an extractor) happen to be linear.
2 Rate upper bounds and lower bounds for the number of shared random bits for universal codes
If the encoder and the decoder do not use randomness, an upper bound for the rate can be derived via the following standard sphere-packing argument. Consider an oblivious channel defined by a noise set of a given size. The maximal number of messages we can send is at most the number of possible codewords divided by the size of the noise set, because for any two messages, the sets of channel outputs that their codewords can produce must be disjoint. The same holds for the Hamming scenario, because we can view the channel as a bipartite graph (two nodes are connected if their difference is in the noise set), and the right degree is at most the size of the noise set as well. In the next proposition, we adapt this argument to private codes.
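The disjointness condition at the heart of this argument can be checked mechanically; a minimal sketch over bit strings with XOR as the group operation (our own illustration, not from the paper):

```python
def translates_disjoint(codewords, noise_set):
    """Packing condition for an oblivious channel over n-bit strings:
    the output sets {c ^ e : e in noise_set} of distinct codewords must
    be pairwise disjoint, which forces the number of codewords to be
    at most 2**n / len(noise_set)."""
    seen = set()
    for c in codewords:
        outputs = {c ^ e for e in noise_set}
        if seen & outputs:
            return False
        seen |= outputs
    return True
```

If two translates intersected, some channel output would be reachable from two different codewords, and no deterministic decoder could distinguish the corresponding messages.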
Let be a private code that is -resilient in the oblivious scenario, or in the Hamming scenario. Then
We consider the oblivious scenario. For the Hamming scenario, the argument is similar. Let be a set of size exactly . For a random selection of , and , we have
For , consider the set
For a random , we have
because the left-hand side is precisely the probability above. This implies that there must exist a choice of the shared randomness for which the bound holds; fix such a choice. Note that no two pairs in the set defined above can yield the same value. Hence, the size of this set is bounded. The statement of the proposition follows by combining these two inequalities. ∎
We now move to lower bounds for the amount of randomness. We note that there exist universal codes in which the encoder is randomized and the decoder is deterministic, and thus they do not share randomness. We provide a non-explicit construction of such a code in Appendix E. This code does not achieve an optimal rate. In an extended version of this paper, we show that for some parameter choices in the oblivious scenario, any universal code that is resilient and has optimal rate must use shared randomness. In general, the trade-off between shared randomness and rate for universal codes is very intricate, and for a (lengthy) discussion we refer to the extended version of this paper.
Therefore, in what follows we restrict to private codes, i.e., to the model in which the universal encoder and the channel-dependent decoders share randomness, and the encoder does not have access to other types of randomness. We show lower bounds for the number of random bits in both the Hamming and oblivious scenarios. (Footnote: In Appendix F, we discuss a different model, which is intermediate between oblivious and Hamming.)
We first show that for any private universal code in the oblivious scenario, the encoding function must use at least random bits, regardless of rate, where is the noise level. The universal code for the oblivious scenario in Theorem 1.5 has random bits, and has optimal rate in the asymptotic sense. Thus the number of random bits in Theorem 1.5 matches the lower bound (up to the constant hidden in the notation), in the case of noise level , which is typical.
If , and is a private -resilient code in the oblivious scenario, then , i.e., requires more than random bits.
It is enough to prove the theorem for just two messages. Let and . Consider the channel defined by the set given by the span of the vectors
with and . Select and randomly and consider the value of on the above vector, which is a value in . Note that if we used message b instead of a in the expression above, then the probabilities with which the messages appear do not change (since this corresponds to flipping all bits of ). Assume that the value b appears with probability at least . If this is not the case, we flip the roles of a and b in the expression above and in the explanations below. There exists a choice of such that, for at least half of the values , the value of for the above vector is equal to b. Let be the corresponding vector. For , the probability in (1) is at most . Hence, for the inequality is false, and this implies that, in this case, equation (1) cannot be satisfied. ∎
We prove a similar result for the Hamming scenario.
If , , and is a private -resilient code in the Hamming scenario, then , i.e., requires more than random bits.
Again, it is enough to prove the statement for two messages. Let and . Thus, we are given a universal code for some arbitrary and the code is resilient in the Hamming scenario up to distortion with probability , where . This means that for every bipartite graph with left nodes and right nodes, with degree of every right node , the event (when is chosen at random in )
has probability at most . We show in the next lemma that if , then , from which the conclusion follows.
For every encoding function , there exists a bipartite graph of the above type such that the event in (*) has probability at least .
We construct a bipartite graph with the set of left nodes and right nodes both equal to , and with left and right degrees at most (thus the lower bound is valid even for channels where the left degree is also bounded by ). Consider the matrix obtained by setting the -th entry equal to the number of random strings for which and . Since there are strings , the sum of all entries of this matrix is as well.
The weight of a column is the sum of all its entries, and similarly for the weight of a row. A column is heavy if its weight is and a heavy row is defined in the same way. Note that there are at most heavy rows and at most heavy columns. We consider three cases:
The set of heavy columns has total weight at least .
The set of heavy rows has total weight at least .
None of the conditions above are true.
In the last case the construction is easy. We set all entries of heavy columns and rows equal to zero. The remaining matrix has weight at least , and all its rows and columns have weight less than (because they are not heavy).
We define the bipartite graph in which a left node is connected to a right node if or the entry of the matrix is positive.
Since the matrix contains nonnegative integers, every column has less than positive entries, and hence every left node has degree at most . By a symmetric argument with rows, we conclude that also the right degrees are at most .
We prove that the event (*) has probability at least . Indeed, select randomly, and let and . With probability at least , the entry is positive, and this implies that is a neighbor of both and . Thus, in the last case, the lemma is satisfied.
Note that the first and second case are symmetric after flipping the first and second message in . Hence, it remains to prove the claim for the second case. In the matrix, we set all rows that have weight less than equal to zero. The assumption states that the remaining matrix has weight at least .
The idea for proving (*) is to consider a set of values , which we call pointers. We connect each heavy row to every pointer. Each nonzero column will be connected to a single pointer as well. Since there are at most nonzero columns, we can indeed satisfy the degree bound using at most pointers. Finally, choose to be this pointer for each nonzero column . Now the inequality fails for , since with probability , we have that is a heavy row and that is a nonzero column. Hence, they are both connected to the pointer . Now we give the details.
By the assumption, and taking into account that there are at most heavy rows, we can select rows containing only zeros. The chosen rows are called pointers. We assign to each nonzero column a pointer so that no pointer is assigned to more than columns. Note that there are at most nonzero columns and pointers, and thus this assignment is possible.
The bipartite graph connects a left node to a right node
if is a heavy row and is a pointer, or
if is a nonzero column and is its associated pointer.
The conditions on the degree are satisfied, because every left node is only connected to pointers, and there are at most of them. Every right node has degree at most , because we only need to check this for pointers , and they are connected to heavy rows and to at most nonzero associated columns.
Finally, we need to prove that the event (*) has probability at least . For each nonzero column , let be the associated pointer, and thus also a neighbor of . With probability , for a random , the value of will be a heavy row and a nonzero column. This means that is a pointer, and hence connected to all heavy rows; thus, in particular, it is also a neighbor of . Thus, the event in (*) happens with probability at least . ∎
3 Construction of universal codes for the Hamming scenario
We prove Theorem 1.4.
We construct a code that satisfies the conditions of the theorem, identifying messages and codewords with bit strings in the natural way. We first construct a code in which the number of shared random bits is exponential: the shared random string is a concatenation of independently chosen random blocks, one block per message, each block a bit string of codeword length, and the encoding function maps each message to its block.
We need to prove that this code is -resilient. Consider a channel, and for any , let be the set of left neighbors of in the bipartite graph. The size of is at most . For a fixed , by the union bound, the probability that there exists such that is at most .
Let and let . The string is independent of the value of , for every , and thus, for every channel and every channel function , the value of is also independent of . Therefore, the probability that for some we have , is also less than . Consequently, with probability at least , one can recover from and by exhaustive search.
We now reduce the number of shared random bits from exponential to linear. The key observation is that in the above argument we only need the codewords to be pairwise independent. It is well known that if we pick two elements of a finite field at random and consider the affine function they define, then its values at distinct points are pairwise independent. Therefore, we replace in Equation (2) each independently chosen random block by the value of this affine function at the corresponding message. Now the encoder and the decoder only need to share the two field elements, and the conclusion follows.
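A toy version of this hashing step, using a prime field Z_p in place of the field GF(2^n) from the proof (the prime and the interface are our illustrative choices):

```python
# Pairwise-independent codewords from two shared field elements a, b:
# the codeword of message m is a*m + b in Z_p. For distinct messages
# m1 != m2, the pair (a*m1 + b, a*m2 + b) is uniformly distributed when
# (a, b) is uniform, which is exactly pairwise independence.

def make_encoder(a, b, p):
    return lambda m: (a * m + b) % p
```

The shared randomness thus shrinks from one random block per message to just the two field elements a and b.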
4 Construction of universal codes for the oblivious scenario
4.1 Proof overview
The basic idea of our constructions is to take the code to be a linear subspace picked at random from a class of subspaces. More precisely, the codewords belong to the null space of a random linear function, i.e., the function maps every codeword to zero, where the function is chosen at random from a certain set of matrices. The encoder and the decoder share the choice of the matrix. The decoder receives the noisy codeword and, since the codeword itself lies in the null space, applying the matrix to the received word yields the image of the noise vector, which we view as a random fingerprint of the noise (also called the syndrome, in the terminology of linear codes). If the matrix has certain properties, this allows the decoder to find the noise vector, assuming it is within the tolerated noise level. The next result implements this idea in a simple way by taking the class to consist of all matrices of appropriate size. It has a short proof and produces a universal code for the oblivious scenario with close-to-optimal rate for large noise levels. It has the disadvantage that the number of shared random bits is more than linear in the codeword length.
For every such that , there exists a private code that is -resilient in the oblivious scenario, with rate , where .
The encoder and the decoder share random bits.
The encoder and the decoder share a random linear function .
Since has rank at most , the null space of has dimension at least . The encoder maps every message into the -th element of the null space of (for details, see Remark 1).
Consider now an oblivious channel of size at most , a message , let be the codeword for and let , where is the noise added by a channel. Observe that
The decoder works as follows. On input and , he first computes . He knows that (by (3)), and he also knows that belongs to . For each different from , the probability over that is . By the union bound, with probability , there is only one element in such that , namely . Consequently, can find with probability , by doing an exhaustive search. Next he finds , and finally from he finds .
The rate of the code is . ∎
The encoder function in Proposition 4.1 can be computed in time polynomial in as follows. First we compute independent vectors in the null space of by finding solutions of the equation with having in the last coordinates the values (the single is in position ). Next, we form the -by- matrix having rows and finally .
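The null-space computation behind this remark can be sketched over GF(2) with rows represented as integer bitmasks; this is our own minimal implementation, not the paper's algorithm.

```python
def null_space_basis(H_rows, n):
    """Basis of the null space of a GF(2) matrix with n columns, rows
    given as integer bitmasks (bit i = column i), via Gaussian
    elimination. Row operations preserve the null space."""
    rows = list(H_rows)
    pivot_cols = []
    r = 0
    for col in range(n):
        sel = next((i for i in range(r, len(rows)) if rows[i] >> col & 1), None)
        if sel is None:
            continue
        rows[r], rows[sel] = rows[sel], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i] >> col & 1:
                rows[i] ^= rows[r]
        pivot_cols.append(col)
        r += 1
    basis = []
    for f in (c for c in range(n) if c not in pivot_cols):
        v = 1 << f                      # free variable set to 1
        for i, pc in enumerate(pivot_cols):
            if rows[i] >> f & 1:
                v |= 1 << pc            # back-substitute the pivot variables
        basis.append(v)
    return basis

def encode(m, basis):
    """Map message m to a null-space element: XOR together the basis
    vectors selected by the bits of m."""
    c = 0
    for i, v in enumerate(basis):
        if m >> i & 1:
            c ^= v
    return c
```

Indexing null-space elements by the bits of the message is one natural way to realize the mapping "message to the m-th element of the null space" mentioned above.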
On the other hand, the computation of the decoder function is slow, because it requires the enumeration of all the elements in .
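The exhaustive search performed by the decoder can be sketched as follows (GF(2) again, with bit vectors as integers; the names are ours):

```python
def syndrome(H_rows, x):
    """H*x over GF(2): one parity bit per row of H."""
    return tuple(bin(row & x).count("1") % 2 for row in H_rows)

def decode(H_rows, y, noise_set):
    """Since the codeword lies in the null space of H, the syndrome of
    the received word y = c ^ e equals the syndrome of the noise e.
    Enumerate the noise set and keep the unique match; uniqueness holds
    with high probability over the random choice of H."""
    s = syndrome(H_rows, y)
    matches = [e for e in noise_set if syndrome(H_rows, e) == s]
    if len(matches) == 1:
        return y ^ matches[0]   # the recovered codeword
    return None                 # decoding failure
```

The running time is dominated by the enumeration of the noise set, which is exactly the slowness noted above.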
The codes in Theorem 1.5 and Theorem 1.6 are constructed using pseudo-randomness tools to reduce the space from which is selected and consequently reduce the number of shared random bits to logarithmic in (respectively, polylogarithmic in ). The construction of the codes in these two theorems is done in two steps:
In Step 1, we show how linear invertible functions (defined below) can be used to construct private universal codes in the oblivious scenario. This step is presented in Section 4.2. In Step 2, we show how condensers (a type of function that has been studied in the theory of pseudorandomness) can be used to construct invertible functions. This step is based on the technique in [BZ19] and is presented in Section 4.3.
Theorem 1.5 and Theorem 1.6 are obtained by taking condensers built by Guruswami, Umans, and Vadhan [GUV09], Ta-Shma and Umans [TU12] and Raz, Reingold, and Vadhan [RRV02], and using Step 2 to obtain invertible functions, followed by Step 1, to obtain the codes. The details are presented in Section 4.4.
4.2 Construction of private universal codes in the oblivious scenario from linear invertible functions
A -invertible function is a probabilistic function that on input produces a random fingerprint of , such that if (the “list of suspects”) is a set of size at most that contains , then there is an algorithm that given the set and the fingerprint, with probability correctly identifies among the suspects. To be useful in the construction of codes, we need the invertible function to be linear for any fixed value of randomness. Also, in order to obtain codes with good rates, we want the length of the fingerprint to be , for small . We also define an online version, which is useful in case the list of suspects is not available as a whole set at the beginning of the algorithm, but is instead enumerated by a process, and thus is accessible in an online manner.
Definition 4.2 (Invertible function).
A function is -invertible if there exists a partial function mapping a -tuple into such that for every subset of size and for every
is linear if for every , the function is linear, i.e., for every , , where we view and as elements of the linear space , and the output of as an element of the linear space .
is online-invertible if the function satisfying (4) is monotone in , meaning that if extends , then is an extension of . For the online-invertible property, is a list (i.e., a totally-ordered set), and extends as a list.
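A toy, deliberately simplified instance of this definition: the fingerprint is an affine hash over a prime field, and the inverter scans the list of suspects for the unique match. Note that, unlike what the construction below requires, this toy fingerprint is not linear over GF(2); it only illustrates the invertibility mechanism.

```python
P = 2**31 - 1   # illustrative prime field size

def fingerprint(x, seed):
    """Randomized fingerprint of x; `seed` plays the role of the
    shared randomness of the invertible function."""
    a, b = seed
    return (a * x + b) % P

def invert(suspects, fp, seed):
    """Identify x among the suspects from its fingerprint. This succeeds
    whenever x is the unique suspect matching the fingerprint, which
    happens with high probability for a random seed and a short list."""
    matches = [x for x in suspects if fingerprint(x, seed) == fp]
    return matches[0] if len(matches) == 1 else None
```

Scanning the suspects one element at a time, and committing to the first match, also suggests how the online (monotone) variant of inversion operates when the list is enumerated by a process.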
The next two lemmas show that, as announced, a linear, -invertible function can be used to construct a -resilient private code in the oblivious scenario (and also in the weak Hamming scenario discussed in Section F). In the oblivious scenario, the encoder and the decoder share the random bits used by the invertible function (in the weak Hamming case, they share more random bits, namely the random bits of the invertible function).
Lemma 4.3 (Invertible function code in the oblivious scenario).
Let be a family of functions such that for every , is linear, -invertible and .
Then there exists a private code that is -resilient in the oblivious scenario, with rate , and such that the encoder and the decoder share random bits.
To simplify the notation, we assume a fixed and drop the subscript in and .
Since is a linear function, it is given by a -by- matrix with entries in , such that (recall that we view as an -vector over ). The matrices are viewed as parity-check matrices of linear codes.
The encoding and decoding procedures are as follows:
The encoder and the decoder share a random string .
on input a message of length computes the codeword of length as follows:
View as a positive integer in the natural way (based on the base 2 representation of integers).
The codeword is obtained by picking the -th element in the null space of (so ). Note that the dimension of the null space of is at least , because the rank of is at most . Thus the encoder is well defined.
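A minimal sketch of this encoding step, assuming bit strings are packed into Python integers and the parity-check matrix is chosen at random over GF(2) (the helper names and the sizes are ours, not the paper's):

```python
import random

def gf2_nullspace_basis(rows, n):
    """Basis of the null space of a GF(2) matrix whose rows are int bitmasks
    over n columns, computed by row reduction."""
    pivot_cols, reduced = [], []
    for r in rows:
        for pc, pr in zip(pivot_cols, reduced):
            if (r >> pc) & 1:
                r ^= pr
        if r:
            pc = r.bit_length() - 1
            for i, pr in enumerate(reduced):       # keep earlier rows reduced
                if (pr >> pc) & 1:
                    reduced[i] ^= r
            pivot_cols.append(pc)
            reduced.append(r)
    basis = []
    for free in (c for c in range(n) if c not in pivot_cols):
        v = 1 << free                              # set one free variable to 1
        for pc, pr in zip(pivot_cols, reduced):
            if (pr >> free) & 1:                   # solve for the pivot variable
                v |= 1 << pc
        basis.append(v)
    return basis

def encode(index, basis):
    """The index-th element of the null space: XOR of a subset of basis vectors."""
    cw = 0
    for j, v in enumerate(basis):
        if (index >> j) & 1:
            cw ^= v
    return cw

random.seed(1)
n, m = 16, 6                                       # illustrative sizes
H = [random.getrandbits(n) for _ in range(m)]      # shared random parity checks
basis = gf2_nullspace_basis(H, n)                  # dimension >= n - m, as in the text
codeword = encode(5, basis)
syndrome = [bin(row & codeword).count("1") % 2 for row in H]   # all zeros
```

Since the basis vectors are linearly independent, distinct message indices yield distinct codewords, which is what makes the encoder well defined.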
Consider an oblivious channel of size at most .
The decoder , on input , where is the noise added by the channel, attempts to find as follows:
computes (i.e., is the syndrome of ).
Thus is also the syndrome of , and, consequently, .
uses the inverter function given by (4). It runs on input and with probability , obtains . Next, , and finally from , it finds .
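The decoding steps above can be sketched as follows; here a brute-force scan over the channel's noise set plays the role of the inverter (the real construction uses the invertible function, and all sizes below are illustrative):

```python
import random

def syndrome(H, v):
    # H * v over GF(2); the rows of H are int bitmasks
    return tuple(bin(row & v).count("1") % 2 for row in H)

random.seed(2)
n, m = 24, 16
H = [random.getrandbits(n) for _ in range(m)]      # shared random parity checks

# An oblivious channel of size 4: a fixed set of possible noise vectors
noise_set = sorted({random.getrandbits(n) for _ in range(4)})
e = noise_set[0]                                   # the noise actually added
x = 0                                              # a codeword; 0 is in every null space
y = x ^ e                                          # distorted codeword

# Decoder: syndrome(H, y) == syndrome(H, e), because syndrome(H, x) is all zeros
target = syndrome(H, y)
candidates = [e2 for e2 in noise_set if syndrome(H, e2) == target]
# with probability >= 1 - |noise_set| * 2^-m over H, e is the unique candidate
decoded = y ^ candidates[0]
```

The key point mirrors the text: the syndrome of the received word equals the syndrome of the noise, so identifying the noise inside the channel's set recovers the codeword.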
The rate of the code is
We make the following observations regarding the complexity of the encoder and decoder functions in Lemma 4.3. The invertible function is assumed to be linear and thus , for some matrix . If the mapping is computable in time polynomial in , then is computable in time polynomial in . This can be shown in the same way as in Remark 1.
If the inverse of can be evaluated in polynomial space with oracle access to , then is computable in polynomial space given oracle access to the oblivious channel . This is the case for all invertible functions constructed with explicit condensers, obtained through the method in Corollary 2.13 in [BZ19], which is also used in this paper (this follows from Remark 3 in [BZ19]).
An interesting approach to defining channels is to use conditional Kolmogorov complexity. We might consider the set of all distortion vectors that satisfy , and there exist at most such vectors. The corresponding channel is not computable, but on input and , the set can be enumerated. If is online-invertible, then the decoding algorithm explained above can be used with a simple modification of step 4, (c). Each time an element is enumerated in , we rerun the monotone inverse with the augmented set . If one of the runs of halts with some output, then also halts with the same output. Note that when is enumerated in , on input , and returns with probability . By the monotonicity of , later updates of cannot change a given value of once it has been generated, and this implies that with probability , no previous runs of generated a different output. Thus also returns with probability .
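A hedged sketch of this online pattern, with a toy fingerprint and a toy enumeration (neither is from the paper): the inverter returns the first enumerated suspect matching the fingerprint, and this rule is monotone because extending the list can never change an answer that has already been produced.

```python
def fingerprint(v):
    # toy stand-in for the shared-randomness fingerprint, illustrative only
    return v % 97

def monotone_invert(fp, suspects_so_far):
    # First suspect (in enumeration order) matching the fingerprint;
    # appending new suspects never changes an already-produced answer.
    for s in suspects_so_far:
        if fingerprint(s) == fp:
            return s
    return None

def online_decode(fp, enumerate_suspects):
    seen = []
    for s in enumerate_suspects:
        seen.append(s)                   # the list of suspects grows over time
        ans = monotone_invert(fp, seen)  # rerun the monotone inverse
        if ans is not None:
            return ans                   # halt with the first produced output

target = 123
fp = fingerprint(target)
stream = iter([41, 7, 123, 999])         # suspects enumerated one at a time
recovered = online_decode(fp, stream)    # recovers 123 once it is enumerated
```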
4.3 Construction of invertible functions from condensers
A condenser is a type of function that has been studied in the theory of pseudorandomness and can be seen as a relaxation of randomness extractors (see [Vad12]). It has the property that it maps a random variable ranging over -bit strings and having min-entropy , which can be far less than , into a random variable that ranges over shorter -bit strings and is within statistical distance of a random variable that has min-entropy closer to . (Recall that the min-entropy of a finite distribution is the largest integer such that all probabilities in the distribution are at most . The statistical distance between two finite distributions and is .)
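The two notions just recalled can be computed directly for small distributions; a minimal sketch, following the integer definition of min-entropy (the example distributions are ours):

```python
import math

def min_entropy(dist):
    # largest integer k such that every probability is at most 2^-k
    return math.floor(-math.log2(max(dist.values())))

def statistical_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

uniform = {b: 0.25 for b in ("00", "01", "10", "11")}
skewed = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
# min_entropy(uniform) is 2, min_entropy(skewed) is 1,
# and the statistical distance between them is 0.25
```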
A condenser can be viewed as a procedure that takes as input a distribution that is far from the uniform distribution and a uniform distribution over a set of short strings, called seeds, and outputs a distribution that is closer to uniform.
Given a set , we denote to be a random variable that is uniformly distributed on . As mentioned, a condenser uses an additional random variable, which is uniformly distributed over the set of -bit strings, for some small . We let and identify with .
A function is a condenser, if for every of size at least , the random variable is -close to a random variable that has min-entropy at least .
The quantity is called the entropy loss of the condenser (because the input has min-entropy and the output is close to having min-entropy ). will always be a function such that the entropy loss is non-decreasing in .
We use functions that are condensers for an entire range of . More precisely, in the following discussion, for some , is a condenser for all .
We view as a bipartite graph in the usual way: the left nodes are the strings in , the right nodes are the strings in and for each , there is an edge (thus, for some , there may exist multiple edges ).
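The graph view can be made concrete for tiny parameters; the function below is an arbitrary toy map (not a condenser), used only to show how the left degrees and the multi-edges arise:

```python
def bipartite_edges(f, n, d):
    # Edges (x, f(x, y)) for every left node x in {0,1}^n and seed y in {0,1}^d;
    # the same pair (x, z) may appear for several seeds, giving multi-edges.
    return [(x, f(x, y)) for x in range(1 << n) for y in range(1 << d)]

toy_f = lambda x, y: (x ^ y) & 0b11      # arbitrary toy map, illustrative only
edges = bipartite_edges(toy_f, n=3, d=1)
# every left node has out-degree exactly 2^d = 2
```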
Let of size at most , and let . When we use the following definitions, (with the corresponding graph ) is a condenser and the set is clear from the context.
A right node is heavy if it has more than neighbors in .
denotes the set of heavy nodes.
A left node is deficient if it has more than neighbors in .
Let us give some intuition for the notions defined above. Recall that is a random neighbor of and is a right node in the bipartite graph. We view as a random fingerprint of . A right node is heavy if it causes many collisions with other strings in . A left node is deficient if more than fraction of its neighbors are heavy. Thus, for a non-deficient , if we pick as fingerprint one of its right neighbors at random, with probability , we obtain a fingerprint that causes few collisions with other strings in .
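These counting notions are easy to state in code. The thresholds below are placeholders (the paper's actual parameters, tied to the condenser, were lost in this excerpt), and we count right-node neighbors with edge multiplicity:

```python
from collections import Counter

def heavy_right_nodes(neighbors, S, t):
    """Right nodes with more than t neighbors in S (t is a placeholder
    for the paper's threshold); edges are counted with multiplicity."""
    counts = Counter(r for x in S for r in neighbors[x])
    return {r for r, c in counts.items() if c > t}

def deficient_left_nodes(neighbors, S, t, frac):
    """Left nodes in S for which more than a `frac` fraction of their
    neighbors are heavy (frac is likewise a placeholder)."""
    H = heavy_right_nodes(neighbors, S, t)
    return {x for x in S
            if sum(r in H for r in neighbors[x]) > frac * len(neighbors[x])}

# Tiny bipartite graph: left node -> list of right neighbors (multi-edges allowed)
neighbors = {
    "a": ["r1", "r1", "r2"],
    "b": ["r1", "r3", "r3"],
    "c": ["r2", "r3", "r4"],
}
S = {"a", "b", "c"}
heavy = heavy_right_nodes(neighbors, S, t=2)          # {"r1", "r3"}
deficient = deficient_left_nodes(neighbors, S, t=2, frac=0.5)   # {"a", "b"}
```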
The next lemma shows the main property of a condenser that is used in pruning the list of suspects. It is proved in the appendix, Section A.
Let be a condenser. For every set of left nodes with , the number of deficient strings in is at most .
The next lemma shows that a fingerprint produced by a condenser permits with high probability the reduction of a list of suspects of size into a smaller list of size approximately .
Lemma 4.7 (Pruning Lemma).
Let be a function that is a condenser for all , for some . Then there exists a function that maps a -tuple into a set of size at most with the following property: for every of size at most and for every ,
Moreover, is online-computable, meaning that if then, for all and , .
Let , i.e., is a random neighbor of in the graph corresponding to the condenser . The pruning algorithm runs as follows:
on input :
enumerate the elements that are right neighbors of in the bipartite graph ,
(i.e., all in such that for some , ).
(1) add the first enumerated elements in .
(2) add all the enumerated strings that are deficient for into .
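Many parameters of the algorithm above were lost in this excerpt, so the following is only a shape-level reconstruction under stated assumptions: the thresholds `t` and `frac`, the cap on how many elements step (1) keeps, and the stopping size are all placeholders we chose for illustration.

```python
from collections import Counter

def heavy_right(neighbors, B, t):
    counts = Counter(r for x in B for r in neighbors[x])
    return {r for r, c in counts.items() if c > t}

def deficient_left(neighbors, B, t, frac):
    H = heavy_right(neighbors, B, t)
    return {x for x in B
            if sum(r in H for r in neighbors[x]) > frac * len(neighbors[x])}

def prune(neighbors, z, B, t=1, frac=0.5, cap=4, final_size=2):
    """Sketch of the loop's shape: collect the few suspects whose fingerprint
    could be z (step (1)), then keep only the deficient suspects and iterate
    (step (2)) until the suspect set is small."""
    output = []
    while len(B) > final_size:
        matches = [x for x in sorted(B) if z in neighbors[x]]
        output.extend(matches[:cap])                       # step (1)
        smaller = deficient_left(neighbors, B, t, frac)    # step (2)
        if len(smaller) >= len(B):                         # safety guard for this toy sketch
            break
        B = smaller
    output.extend(x for x in sorted(B) if z in neighbors[x] and x not in output)
    return output

neighbors = {"a": ["z1", "r2"], "b": ["r2", "r3"],
             "c": ["r3", "r1"], "d": ["r1", "r2"]}
shortlist = prune(neighbors, z="z1", B={"a", "b", "c", "d"})
# "a" is the only suspect with z1 as a neighbor, so it lands on the shortlist
```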
We next establish two facts about the iterations of the while loop in the above algorithm. The iterations are indexed by the current value of , which takes in order the values .
At the start of each iteration , the size of is at most .
Proof by (reverse) induction on . Initially, and . Consider now a generic iteration . By the induction hypothesis, at the start of iteration , . If , then by Lemma 4.6, the number of deficient strings is at most , which is at most . If, on the other hand, , then again the number of deficient strings is at most , because the deficient strings form a subset of . Since in step (2) is updated to be a subset of deficient strings (namely, the set of neighbors of which are deficient), the conclusion follows. ∎
There must be an iteration , at which is non-deficient for .
Otherwise, remains in at every iteration (because by step (2), the deficient neighbors of survive in ). But by Fact 1, eventually has at most two elements, and such a set has no deficient strings. ∎
We can now finish the proof. Consider the iteration , guaranteed by Fact 2, at which is non-deficient for the first time. It means that at the beginning of this iteration is in , because at previous iterations has been deficient and all deficient strings that are neighbors of survive in . With probability , is one of the non-heavy right neighbors of , so has at most neighbors in ( being one of them). By step (1), is added to with probability . Since the size of increases at iteration by at most (recall that the entropy loss is non-decreasing), and the number of iterations is , the size of at the end is at most .
The fact that is online-computable can be seen from the algorithm highlighted with grey background, by taking into account the fact that if (as lists, see Definition 4.2, (3)), then any element deficient for is also deficient for . ∎
We use the Pruning Lemma 4.7 to construct a invertible function that uses random bits and has overhead (assuming ).
We use the following condensers.
Theorem 4.8 ([TU12], Theorem 3.2).
For every , , there exists an explicit function such that
For every , is a condenser, with ,