# Quantum Communication Complexity of Distribution Testing

The classical communication complexity of testing closeness of discrete distributions has recently been studied by Andoni, Malkin and Nosatzki (ICALP'19). In this problem, two players each receive t samples from one distribution over [n], and the goal is to decide whether their two distributions are equal or ϵ-far apart in the l_1-distance. In the present paper we show that the quantum communication complexity of this problem is Õ(n/(tϵ^2)) qubits when the distributions have low l_2-norm, which gives a quadratic improvement over the classical communication complexity obtained by Andoni, Malkin and Nosatzki. We also obtain a matching lower bound by using the pattern matrix method. Let us stress that the samples received by each of the parties are classical, and that only the communication between them is quantum. Our results thus give one setting where quantum protocols outperform classical protocols for a testing problem with purely classical samples.


## 1 Introduction

#### Background.

Property testing [Goldreich17, GR00] is the task of (approximately) distinguishing objects having some specific property from those which are “far” from having it, without necessarily looking at the objects in their entirety. An interesting subfield is discrete distribution testing [Canonne15], where the objects are probability distributions.

One of the main tasks in (discrete) distribution testing, namely closeness testing, is about deciding whether two distributions $a$ and $b$ over $[n]$ are equal or $\epsilon$-far from each other in the $\ell_1$-norm, given access only to a limited number of samples of each distribution. Early testers [BFRSW00, BFRSW13] used a method based on collisions. Using improved estimators, testers with optimal sample complexity have since been constructed [CDVV14, DK16].

Very recently, Andoni, Malkin and Nosatzki [AMN19] have, for the first time, considered distribution testing in the two-party setting. Here two players, Alice and Bob, each own as input samples of the distributions $a$ and $b$: Alice has $t$ samples from $a$ and Bob has $t$ samples from $b$. The goal is for Alice and Bob to decide if the two distributions are equal or $\epsilon$-far from each other in the $\ell_1$-norm, using as little communication as possible. By adapting the techniques from prior works [CDVV14, DK16], Andoni, Malkin and Nosatzki have shown that this problem (named in [AMN19]) can be solved with high probability using $\tilde{O}(n^2/(t^2\epsilon^4))$ bits of communication whenever $t$ is above the information-theoretic lower bound (given in Equation (1) below), which is the minimum number of samples needed so that meaningful information about $a$ and $b$ can be extracted from them. They also showed a matching lower bound on the two-party communication complexity of this problem.

#### Our results.

In this paper we investigate the quantum communication complexity of this problem. Our main result shows that a significant advantage can be obtained in the quantum setting when at least one of the two distributions has low $\ell_2$-norm. Concretely, for any , we consider the version of in which the inputs satisfy the condition . We denote this problem . The lower bound from [AMN19] shows that for , this version is as hard as the original version of the problem. (Indeed, the lower bound from [AMN19] is shown for input distributions such that .) As in all previous works [AMN19, CDVV14, DK16], we will assume throughout the paper that $t$ is above the information-theoretic threshold:

$$t \ge C \max\left(n^{2/3}\cdot\epsilon^{-4/3},\ \sqrt{n}\cdot\epsilon^{-2}\right), \tag{1}$$

where $C$ is a universal constant (see [AMN19, CDVV14] for details).
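To make the threshold (1) concrete, here is a minimal Python sketch that evaluates its right-hand side. The function name `sample_threshold` and the default value `C=1.0` are assumptions made for illustration only; the text does not specify the value of the universal constant.

```python
import math

def sample_threshold(n: int, eps: float, C: float = 1.0) -> float:
    """Right-hand side of (1). C is a placeholder for the unspecified constant."""
    first_term = n ** (2 / 3) * eps ** (-4 / 3)
    second_term = math.sqrt(n) * eps ** (-2)
    return C * max(first_term, second_term)

# Toy values: for n = 64 the two terms cross at eps = n**(-1/4), about 0.35.
print(sample_threshold(64, 1.0))   # first term dominates
print(sample_threshold(64, 0.1))   # second term dominates
```

Comparing the two terms shows that the first one dominates exactly when $\epsilon \ge n^{-1/4}$, which is why both regimes appear in the example.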

We first show the following theorem.

###### Theorem 1.

There exists an absolute constant such that the following holds: for all , the problem can be solved with high probability by a quantum protocol that uses $\tilde{O}(n/(t\epsilon^2))$ qubits of communication.

Theorem 1 shows that a significant advantage (a quadratic improvement in the communication complexity) can be obtained in the quantum setting when at least one of the two distributions has low $\ell_2$-norm.

We also obtain the following lower bound, which shows that the upper bound of Theorem 1 is optimal, even for .

###### Theorem 2.

There exists an absolute constant such that the following statement holds for any value and any : any quantum protocol that solves with high probability requires qubits of communication.

Precisely, the lower bound holds for any greater than some . Therefore there is a regime where , for which our upper bound is then tight.

#### Overview of our main techniques.

Our upper bound (Theorem 1) is obtained by following the framework used in [AMN19], which relies on the estimator from [CDVV14] for the $\ell_2$-distance. Indeed, as suggested by [BFRSW00, CDVV14] and then extensively studied in [DK16], there is a reduction from closeness testing in the $\ell_1$-distance to closeness testing in the $\ell_2$-distance. The efficiency of the reduction, however, depends on the $\ell_2$-norm of the distribution. The protocol in [AMN19] thus proceeds in two steps. In the first step, Bob shares some information with Alice about the observed shape of his distribution, so that they can recast their distributions into two new distributions that have smaller $\ell_2$-norm while preserving the $\ell_1$-distance. In the second step, Alice and Bob use the above reduction to closeness testing in the $\ell_2$-distance and implement the estimator of [CDVV14] in the two-party setting. This estimator requires estimating with good precision the $\ell_2$-distance between two vectors. This is done by using a two-party implementation of the sketching method by Alon, Matias and Szegedy [AMS99].

For the case of low-norm distributions (more precisely, when considering the problem with constant), the first step is unnecessary: the two distributions already have a low enough $\ell_2$-norm, so that the reduction to closeness testing in the $\ell_2$-distance can be used without any preprocessing. We thus only need to show how to implement the second step from [AMN19] more efficiently using quantum communication. The key idea is to use the quantum algorithm by Montanaro [Montanaro16], which gives a quadratic speedup over the classical sketching method from [AMS99] in the query complexity model. We show how to adapt this quantum algorithm to the two-party setting in Section 3.2.

Our lower bound (Theorem 2) first applies the same reduction as in [AMN19], which reduces some specific version of the Gap-Hamming distance to . In the classical case, [AMN19] showed that the communication complexity of that version of the Gap-Hamming distance is bits. Our main technical contribution proves that the quantum communication complexity of this problem is . We use the pattern matrix method [SherstovSICOMP11]. To obtain our lower bound, we show that the pattern matrix method, which is generally formulated only for total functions, can be generalized to partial functions.

#### Relation with known quantum advantages in testing and learning.

Several quantum algorithms have been designed for property testing and learning theory (we refer to [Arunachalam+17] and [Montanaro+16] for excellent surveys of these fields). In most settings considered so far in quantum learning theory, however, the quantum algorithms crucially exploit the fact that the data can be accessed in a quantum way (e.g., we can query a quantum superposition of samples), which makes it difficult to directly compare the performance of quantum algorithms with the performance of classical methods (which can only access the data in a classical way). The results of the present paper show a quantum advantage, in terms of communication cost, for the setting where both classical and quantum protocols can access the data in the same way – the input is given as a set of classical samples.

#### Open problem.

An intriguing question is whether a similar quantum advantage is achievable when both input distributions have higher $\ell_2$-norm, i.e., whether the upper bound we obtain in Theorem 1 holds not only for constant  but also for larger values of . Currently, we do not know how to use quantum communication to improve the complexity of the first part of the classical protocol from [AMN19], which (as mentioned above) converts the input distributions into distributions of sufficiently small $\ell_2$-norm. We leave this question as an open problem.

## 2 Preliminaries

### 2.1 Definitions and Notations

A typical communication task for Alice and Bob is to compute (sometimes only with some probability of success) a function $f(x,y)$ on some inputs $(x,y)$, where $x$ is given to Alice and $y$ to Bob. A communication protocol is an algorithmic description of message sending between Alice and Bob that solves the task for any possible pair of inputs. The communication complexity [KN97] of such a function is the minimum number of bits that the most efficient protocol solving the task must exchange in the worst case over the inputs.

The quantum communication complexity [Wolf02, Brassard04] of a function is the equivalent using qubits instead of bits. Qubits correspond to elements of some Hilbert space of dimension . We will use the bra-ket notation to denote a qubit (and by extension an -qubit string) of a register .

As described in the introduction, Alice’s input consists of $t$ samples from a discrete distribution $a$. Bob’s input consists of $t$ samples from a discrete distribution $b$. We call $X_i$ (resp. $Y_i$) the number of samples of Alice (resp. Bob) corresponding to element $i$. We call $X, Y$ the occurrence vectors of Alice and Bob, i.e., the vectors with $i$-th coordinate $X_i$ and $Y_i$, respectively.

For a vector $v$, we will denote by $\|v\|_1$ the $\ell_1$-norm, which is defined as $\sum_i |v_i|$, and denote by $\|v\|_2$ the $\ell_2$-norm, which is defined as $\sqrt{\sum_i v_i^2}$. We use $\tilde{O}$ instead of $O$ when we neglect factors of logarithmic order in the parameters of the problem. We denote by $\mathrm{Poi}(\lambda)$ the Poisson distribution with parameter $\lambda$.

### 2.2 CDVV [CDVV14] Estimator

The idea behind the estimator from [CDVV14] is similar to estimation using collisions [BFRSW13], except that it assumes Poisson sampling to obtain some independence that simplifies the analysis, and therefore needs some corrective terms to shift the mean so that the estimation is unbiased. It is defined by first drawing a number $M$ from a Poisson distribution, whose parameter is chosen high enough, and then taking $M$ arbitrary samples on each side and computing $Z$ using the occurrence vectors of those samples:

$$Z = \frac{\sqrt{\sum_i \left((X_i-Y_i)^2 - X_i - Y_i\right)}}{M} = \frac{\sqrt{\sum_i (X_i-Y_i)^2 - 2M}}{M}.$$
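As an illustration of the formula above, the following Python sketch computes the statistic from two occurrence vectors. The function name `cdvv_statistic` and the clamping of a negative value under the square root to zero are assumptions of this sketch, not part of the estimator as stated in the text.

```python
import math

def cdvv_statistic(X, Y):
    """Compute Z from two occurrence vectors that each sum to M."""
    M = sum(X)
    if M == 0 or sum(Y) != M:
        raise ValueError("both sides must hold the same positive number of samples")
    s = sum((x - y) ** 2 - x - y for x, y in zip(X, Y))
    # The corrected sum can be negative on small inputs; clamping it to 0
    # is an assumption of this sketch.
    return math.sqrt(max(s, 0)) / M

X = [4, 0, 0]  # toy occurrence vector for Alice
Y = [0, 4, 0]  # toy occurrence vector for Bob
M = sum(X)
second_form = math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)) - 2 * M) / M
print(cdvv_statistic(X, Y), second_form)
```

The two printed values agree because the corrective terms sum to $\sum_i (X_i + Y_i) = 2M$.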

### 2.3 Classical Protocol

We now describe the protocol from [AMN19] for the case of small $\ell_2$-norm.

The main idea behind the protocol is based on the CDVV estimator described in the previous subsection, combined with a reduction from $\ell_1$-distance estimation to $\ell_2$-distance estimation. Taking into account the errors of the estimator as well, after rescaling and shifting from $Z$, Andoni, Malkin and Nosatzki found that it is enough to compare some approximation of the term $\|X-Y\|_2^2$ to some threshold to distinguish the two cases. As discussed in the introduction, the original protocol from [AMN19] has a communication step to recast the probability distributions into ones with smaller $\ell_2$-norms. Since this step is not necessary for the case of distributions with small $\ell_2$-norms, we omit it in the following description.

The analysis from [AMN19], in particular Lemma 6, shows the correctness of the protocol. More precisely, the following statement can be obtained for the case of input distributions with low $\ell_2$-norms.

###### Theorem 3.

[AMN19] There exists an absolute constant such that the following holds: for any input distributions $a$ and $b$ such that , the above protocol correctly distinguishes between the case $a = b$ and the case $\|a-b\|_1 \ge \epsilon$ with probability at least 2/3.

The communication complexity is dominated by the third step, which requires bits.

## 3 Quantum Protocol

In this section we describe our quantum protocol for the problem , and prove Theorem 1.

### 3.1 Description of the Whole Protocol

The communication complexity is again dominated by the third step, which requires only $\tilde{O}(n/(t\epsilon^2))$ qubits, as described in the next subsection. The correctness is guaranteed by the analysis of Theorem 3, since the quantum protocol performs the same calculations as the classical protocol. This proves Theorem 1.

### 3.2 Montanaro Approximation

The classical protocol [AMN19] uses standard techniques, such as the AMS algorithm [AMS99], in order to approximate $\|X-Y\|_2^2$. The AMS algorithm uses a family of hash functions $h_i$ that are 4-wise independent. Given a list of numbers , the AMS algorithm gives an estimate of by computing random estimates with the following subroutine many times and taking the median of the results. (The idea behind the AMS algorithm is that, when the square is expanded, the influence of the “crossed” product terms should vanish, i.e., for because then , while the terms will always stay.)

In the classical setting, this subroutine has to be repeated times to get a -approximation.

Montanaro showed how to achieve the same approximation quantumly using this subroutine only times (see Theorems 12 and 14 in [Montanaro16]). The subroutine, however, needs to be called in superposition, i.e., Montanaro’s approach requires a quantum oracle that performs the following map:

$$O_f : |i\rangle|y\rangle \mapsto |i\rangle|y+f(i)\rangle.$$

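In our communication setting, we want to use this approach with $f(i) = \left(\sum_{j=1}^{n} h_i(j)\cdot(X_j - Y_j)\right)^2$. A difficulty is that the data is split between the two parties: only Alice knows $X$ and only Bob knows $Y$. We now explain how to overcome this difficulty. For a particular index $i$, Alice can compute $\sigma_a(i) = \sum_{j=1}^{n} h_i(j)\cdot X_j$, and then transmit it to Bob. Bob can similarly do his own computation $\sigma_b(i) = \sum_{j=1}^{n} h_i(j)\cdot Y_j$, then subtract and square in order to get

$$(\sigma_a(i)-\sigma_b(i))^2 = \left(\sum_{j=1}^{n} h_i(j)\cdot(X_j - Y_j)\right)^2 = f(i). \tag{2}$$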
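The linearity expressed by Equation (2) can be checked classically on a toy instance. The Python sketch below is only an illustration: it replaces the 4-wise independent hash family by an exhaustive enumeration of all sign vectors (an assumption that is feasible only for tiny $n$), and the names `sigma`, `f_value` and `exact_mean_f` are hypothetical.

```python
import itertools

def sigma(signs, counts):
    """One party's local linear sketch: sum_j h(j) * counts[j]."""
    return sum(s * c for s, c in zip(signs, counts))

def f_value(signs, X, Y):
    """(sigma_a - sigma_b)^2, computed from the two local sketches only."""
    return (sigma(signs, X) - sigma(signs, Y)) ** 2

def exact_mean_f(X, Y):
    """Average of f over all 2^n sign vectors (feasible only for tiny n)."""
    n = len(X)
    total = sum(f_value(signs, X, Y)
                for signs in itertools.product((-1, 1), repeat=n))
    return total / 2 ** n

X = [3, 0, 1, 2]  # Alice's occurrence vector (toy data)
Y = [1, 2, 1, 0]  # Bob's occurrence vector (toy data)
squared_l2 = sum((x - y) ** 2 for x, y in zip(X, Y))
print(exact_mean_f(X, Y), squared_l2)
```

Each party only ever touches its own vector inside `sigma`, and the average of $f$ over uniformly random signs equals the squared $\ell_2$-distance of the two occurrence vectors, which is the quantity the sketch is meant to estimate.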

We describe below our implementation of the oracle , based on these ideas. In the following description, Bob’s input is the quantum state for some bit-strings and , where and are two quantum registers.

Note that it is the linear property of the AMS computation (more precisely, Equation (2)) that allows the approximation to be computed even when the data are shared by several parties. Transmitting register requires qubits. Transmitting register requires qubits. The overall communication complexity of the oracle protocol is thus .

An implementation of the inverse $O_f^{-1}$ can be obtained similarly, by subtracting instead of adding the value of $f(i)$ at Step 3. We can thus apply Montanaro’s algorithm [Montanaro16], simply by replacing each oracle query in Montanaro’s algorithm by our distributed implementation of $O_f$ (or $O_f^{-1}$). This enables us to obtain, with high probability, an $\alpha$-approximation of $\|X-Y\|_2^2$ using

$$\tilde{O}\left((1/\alpha)(\log n + \log t)\right) = \tilde{O}(1/\alpha)$$

qubits of communication, as claimed.

## 4 Quantum Lower Bound

In this section we prove Theorem 2.

### 4.1 Hamming Reduction

For bit-strings $x, y \in \{0,1\}^n$, we denote by $x \cap y$ the set of indices where $x$ and $y$ both have a one, and write $|x \cap y|$ for its size. Note that for bit-strings the $\ell_1$-norm corresponds to the Hamming weight, i.e., the number of ones in the string, and the $\ell_1$-distance corresponds to the Hamming distance.

In the classical communication setting, Andoni, Malkin and Nosatzki [AMN19] proved a lower bound for some closeness testing problem by considering the following problem involving the Hamming distance of two binary strings.

Let be a multiple of . Let , and . With probability at least , for with , distinguish between the case where versus .

Notice that if $|x| = |y| = n/2$ then the equality

$$\|x-y\|_1 = 2(n/2 - |x\cap y|) = n - 2|x\cap y|$$
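This identity between Hamming distance and intersection size can be verified exhaustively for small lengths. The following Python sketch does so; the helper names are hypothetical and the check is only an illustration, not part of the proof.

```python
from itertools import combinations

def indicator(n, ones):
    """Bit-string of length n with ones exactly at the given positions."""
    return [1 if i in ones else 0 for i in range(n)]

def l1_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def intersection_size(x, y):
    return sum(a & b for a, b in zip(x, y))

def identity_holds(n):
    """Check ||x - y||_1 = n - 2|x ∩ y| for all x, y of weight n/2."""
    half = n // 2
    for sx in combinations(range(n), half):
        for sy in combinations(range(n), half):
            x, y = indicator(n, set(sx)), indicator(n, set(sy))
            if l1_distance(x, y) != n - 2 * intersection_size(x, y):
                return False
    return True

print(identity_holds(4), identity_holds(6))
```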

holds. The above problem then can be reformulated as the following communication problem that we call PromisedGHD():

PromisedGHD() For , where is a multiple of , with , , and , with probability at least distinguish between and .

Particularly, the reduction of [AMN19] only concerns the regime where .

The main result of this subsection is:

###### Theorem 4.

The quantum communication complexity of PromisedGHD() is .

In order to prove Theorem 4, we will consider a similar problem defined on smaller inputs. More precisely, define the following problem SmallPGHD():

SmallPGHD() For where is a multiple of , with , with probability at least distinguish between and

We first show a reduction from SmallPGHD() to PromisedGHD() for appropriate parameters.

###### Lemma 1.

For any , SmallPGHD() reduces to PromisedGHD().

###### Proof.

Assume are inputs of size to the SmallPGHD() problem. By repeating them times in a padding fashion, we build inputs for the PromisedGHD() problem where and therefore . If , we have . Otherwise, if , we have as long as . Indeed, for the upper part of the interval:

 n4−β2=n′24−n′2√32≥n′(n′4−1)

while for the lower part, we need:

 n4−κβ2=n′24−κn′2√32≤n′(n′4−g),

which is equivalent to the bound on mentioned just above. ∎
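The proof uses the fact that repeating both inputs the same number of times scales Hamming weights and intersection sizes by that number. The Python sketch below illustrates only this scaling on plain repetition; the exact padding and parameters of the reduction are not reproduced here, and the choice of repeating a length-4 input 4 times is an assumption made for the example.

```python
def weight(x):
    """Hamming weight of a bit-string."""
    return sum(x)

def intersection_size(x, y):
    """Number of positions where both strings have a one."""
    return sum(a & b for a, b in zip(x, y))

def repeat(x, r):
    """Concatenate r copies of x."""
    return list(x) * r

x = [1, 1, 0, 0]  # toy input for Alice
y = [1, 0, 1, 0]  # toy input for Bob
r = len(x)        # assumed number of repetitions for this example
xr, yr = repeat(x, r), repeat(y, r)
print(len(xr), weight(xr), intersection_size(xr, yr))
```

A gap of one in the intersection size of the small inputs therefore becomes a gap of `r` after repetition, which is what lets the small problem be embedded into the promise problem on longer strings.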

The hardness of SmallPGHD() will follow from the hardness of a more restricted problem stated in Theorem 5. The latter theorem is the main technical contribution of this section, and we devote Section 4.2 to its proof.

### 4.2 Main Technical Result

We will model partial functions as mappings $f\colon X \to \{-1,+1,*\}$, where $X$ is a finite set and the range element $*$ represents undefined output. We let $\operatorname{dom} f = \{x \in X : f(x) \ne *\}$. It will be convenient here to represent the Boolean values “true” and “false” by $-1$ and $+1$, respectively. This departure from the classical representation (of using $1$ and $0$) has no effect on quantum communication complexity. For a communication problem $F\colon X \times Y \to \{-1,+1,*\}$, we let $Q^*_\epsilon(F)$ denote the $\epsilon$-error quantum communication complexity of $F$ with arbitrary prior entanglement. Note that an $\epsilon$-error protocol for $F$ is allowed to behave arbitrarily on inputs outside $\operatorname{dom} F$.

We are now ready to prove the main technical result used in the proof of Theorem 2.

###### Theorem 5.

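Let $n$ be an integer divisible by $4$. Consider the partial communication problem $F_n\colon \{0,1\}^n \times \{0,1\}^n \to \{-1,+1,*\}$ given by

$$F_n(x,y) = \begin{cases} -1 & \text{if } |x|=|y|=n/2 \text{ and } |x\cap y| = n/4,\\ +1 & \text{if } |x|=|y|=n/2 \text{ and } |x\cap y| = n/4-1,\\ * & \text{otherwise.} \end{cases}$$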

Then

The remaining part of this section is devoted to the proof of this theorem. We start by reviewing relevant background on the pattern matrix method [SherstovSICOMP11] for quantum communication lower bounds. Let $n$ and $k$ be positive integers, where $k \le n$ and $k$ divides $n$. Partition $[n]$ into $k$ contiguous blocks, each with $n/k$ elements:

$$[n] = \left\{1,2,\dots,\tfrac{n}{k}\right\} \cup \left\{\tfrac{n}{k}+1,\dots,\tfrac{2n}{k}\right\} \cup \cdots \cup \left\{\tfrac{(k-1)n}{k}+1,\dots,n\right\}.$$

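Let $\mathcal{V}(n,k)$ denote the family of subsets $V \subseteq [n]$ that have exactly one element in each of these blocks (in particular, $|V| = k$). Clearly, $|\mathcal{V}(n,k)| = (n/k)^k$. For a bit string $x \in \{0,1\}^n$ and a set $V \in \mathcal{V}(n,k)$, define the projection of $x$ onto $V$ by $x|_V = (x_{i_1}, x_{i_2}, \dots, x_{i_k}) \in \{0,1\}^k$, where $i_1 < i_2 < \cdots < i_k$ are the elements of $V$. For $\phi\colon \{0,1\}^k \to \mathbb{R}$, the $(n,k,\phi)$-pattern matrix is the matrix $A$ given by

$$A = \big[\phi(x|_V \oplus w)\big]_{x\in\{0,1\}^n,\ (V,w)\in\mathcal{V}(n,k)\times\{0,1\}^k}.$$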

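In words, $A$ is the matrix of size $2^n$ by $(n/k)^k 2^k$ whose rows are indexed by strings $x \in \{0,1\}^n$, whose columns are indexed by pairs $(V,w) \in \mathcal{V}(n,k)\times\{0,1\}^k$, and whose entries are given by $A_{x,(V,w)} = \phi(x|_V \oplus w)$.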
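The construction of a pattern matrix can be made concrete for very small parameters. The Python sketch below builds the matrix explicitly as nested lists; it uses 0-based indices instead of the 1-based indices of the text, and the function name `pattern_matrix` and the toy generating function are assumptions chosen for illustration.

```python
import itertools

def pattern_matrix(n, k, phi):
    """Explicit pattern matrix for tiny n, k (rows: x, columns: pairs (V, w))."""
    assert n % k == 0, "k must divide n"
    block = n // k
    # Contiguous blocks of [n], written with 0-based indices.
    blocks = [range(b * block, (b + 1) * block) for b in range(k)]
    rows = list(itertools.product((0, 1), repeat=n))
    cols = [(V, w)
            for V in itertools.product(*blocks)          # one index per block
            for w in itertools.product((0, 1), repeat=k)]
    A = [[phi(tuple(x[i] ^ wi for i, wi in zip(V, w))) for (V, w) in cols]
         for x in rows]
    return rows, cols, A

# Toy generating function: return the first bit of its input.
rows, cols, A = pattern_matrix(4, 2, lambda z: z[0])
print(len(rows), len(cols))
```

For $n = 4$ and $k = 2$ this gives $2^4 = 16$ rows and $(4/2)^2 \cdot 2^2 = 16$ columns, matching the dimensions stated above. The explicit construction grows exponentially and is meant only for sanity checks.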

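The pattern matrix method gives a lower bound on the quantum communication complexity of a pattern matrix in terms of the approximate degree of its generating function. We now define this notion formally. Let $f\colon X \to \{-1,+1\}$ be given, for a finite subset $X \subset \mathbb{R}^n$. The $\epsilon$-approximate degree of $f$, denoted $\deg_\epsilon(f)$, is the least degree of a real polynomial $\pi$ such that $|f(x) - \pi(x)| \le \epsilon$ for all $x \in X$. One generalizes this definition to partial functions $f\colon X \to \{-1,+1,*\}$ by letting $\deg_\epsilon(f)$ be the least degree of a real polynomial $\pi$ with

$$\begin{aligned} |f(x)-\pi(x)| &\le \epsilon, && x \in \operatorname{dom} f,\\ |\pi(x)| &\le 1+\epsilon, && x \in X \setminus \operatorname{dom} f. \end{aligned}$$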

We will need the following version of the pattern matrix method for quantum lower bounds.

###### Theorem 6.

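Let $F$ be the $(n,k,f)$-pattern matrix, where $f\colon \{0,1\}^k \to \{-1,+1,*\}$ is given. Then for every $\epsilon \in [0,1)$ and every $\delta < \epsilon/2$,

$$Q^*_\delta(F) \ge \frac{1}{4}\deg_\epsilon(f)\log\left(\frac{n}{k}\right) - \frac{1}{2}\log\left(\frac{3}{\epsilon-2\delta}\right).$$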

Theorem 6 is a generalization of the original pattern matrix method of [SherstovSICOMP11] to partial functions. For the reader’s convenience, we give a detailed proof of Theorem 6 in Appendix A.

Proof of Theorem 5. The communication complexity of is monotone in due to As a result, it suffices to prove the theorem for divisible by Under this divisibility assumption, define and consider the function given by

$$\mathrm{PMAJ}_k(x) = \begin{cases} -1 & \text{if } |x| = k/2,\\ +1 & \text{if } |x| = k/2-1,\\ * & \text{otherwise.} \end{cases}$$

Let be the -pattern matrix. It is a well-known fact [paturi92approx, bun-thaler13and-or-tree] that As a result, Theorem 6 implies that and hence also

Writing makes it clear that is a restriction of the more general communication problem defined by

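$$G(x,y) = \begin{cases} -1 & \text{if } |x|=2k,\ |y|=k, \text{ and } |x\cap y| = k/2,\\ +1 & \text{if } |x|=2k,\ |y|=k, \text{ and } |x\cap y| = k/2-1,\\ * & \text{otherwise.} \end{cases}$$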

As a result, This in turn implies that because .

### 4.3 Closeness Testing Reduction

In this subsection we explain how the lower bound on the quantum communication complexity of PromisedGHD() (Theorem 4) implies Theorem 2.

Proof of Theorem 2. In [AMN19], Andoni, Malkin and Nosatzki show a reduction from the problem PromisedGHD() to the problem with parameters and . (The reduction is actually stated, in Theorem 9 in [AMN19], as a reduction from PromisedGHD() to the variant of where the number of samples is Poi() instead of exactly . Nevertheless, the Poisson version easily reduces to the original problem with number of samples if we allow some extra error, since by Chebyshev’s inequality the probability that the Poisson version gives more than samples is less than .) Theorem 4 thus gives us the claimed lower bound. In the remainder of the proof, we show that the distributions used to prove the lower bound have low $\ell_2$-norm.

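The input distributions of used in the reduction shown in [AMN19], which we will denote $a$ and $b$, are of the following form: half of the mass is uniformly distributed on $d = n/10$ elements, and the other half of the mass on $l = C_0 t\log n$ other elements, where $C_0$ is some constant. Therefore:

$$\|a\|_2 = \|b\|_2 = \sqrt{d\left(\frac{1}{2d}\right)^2 + l\left(\frac{1}{2l}\right)^2} = \frac{1}{2}\sqrt{\frac{1}{d}+\frac{1}{l}} = \frac{1}{2}\sqrt{\frac{10}{n}+\frac{1}{C_0 t\log n}} \le \frac{1}{2}\sqrt{\left(10+\frac{1}{C_0}\right)\frac{1}{t\log n}},$$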
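The closed form for the $\ell_2$-norm of such a two-level distribution can be checked numerically. In the Python sketch below, the function names and the toy sizes `d = 10` and `l = 5` are assumptions chosen only to exercise the formula, not the parameters of the reduction.

```python
import math

def two_level_distribution(d, l):
    """Half of the mass uniform on d elements, half uniform on l other elements."""
    return [1 / (2 * d)] * d + [1 / (2 * l)] * l

def l2_norm(p):
    return math.sqrt(sum(v * v for v in p))

d, l = 10, 5  # toy sizes
a = two_level_distribution(d, l)
closed_form = 0.5 * math.sqrt(1 / d + 1 / l)
print(sum(a), l2_norm(a), closed_form)
```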

since because .

Define as the smallest such that holds. The above calculations show that

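$$\gamma_{LW} = \|a\|_2\,\frac{n}{t\epsilon^2} \le \frac{1}{2}\sqrt{10+\frac{1}{C_0}}\;\frac{n}{t\epsilon^2\sqrt{t\log n}} \le \frac{1}{2C^{3/2}}\sqrt{10+\frac{1}{C_0}}\;\frac{1}{\sqrt{\log n}}$$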

because by (1). Thus . This concludes the proof.

Acknowledgements. Guillaume Malod was partially supported by a JSPS Invitational Fellowships for Research in Japan. Aleksandrs Belovs is supported by the ERDF project number 1.1.1.2/I/16/113. Arturo Castellanos is grateful to Shin-ichi Minato for his support, and also to MEXT. Alexander A. Sherstov was supported by NSF grant CCF-1814947. François Le Gall was supported by JSPS KAKENHI grants Nos. JP16H01705, JP19H04066, JP20H00579, JP20H04139 and by the MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) grant No. JPMXS0118067394.

## Appendix A The Pattern Matrix Method for Partial Functions

The purpose of this appendix is to provide a detailed proof of Theorem 6 for partial functions. Our proof closely follows the original proof of the pattern matrix method in [SherstovSICOMP11], developed there for total functions.

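We start by recalling the Fourier transform for functions $f\colon \{0,1\}^n \to \mathbb{R}$. For $S \subseteq \{1,2,\dots,n\}$, define $\chi_S\colon \{0,1\}^n \to \{-1,+1\}$ by $\chi_S(x) = (-1)^{\sum_{i\in S} x_i}$. Then every function $f\colon \{0,1\}^n \to \mathbb{R}$ has a unique representation of the form

$$f = \sum_{S\subseteq\{1,2,\dots,n\}} \hat{f}(S)\,\chi_S,$$

where $\hat{f}(S) = 2^{-n}\sum_{x\in\{0,1\}^n} f(x)\chi_S(x)$. The reals $\hat{f}(S)$ are called the Fourier coefficients of $f$.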

For a real matrix $A$, we let $\|A\|_1$ denote the sum of the absolute values of the entries of $A$. We let $\|A\|$ denote the spectral norm of $A$. Recall that $\|A\| = \max_{x \ne 0} \|Ax\|_2/\|x\|_2$. The following theorem [SherstovSICOMP11, Theorem 4.3] determines the spectral norm of a pattern matrix in terms of the Fourier spectrum of its generating function.

###### Theorem 7.

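Let $\phi\colon \{0,1\}^k \to \mathbb{R}$ be given. Let $A$ be the $(n,k,\phi)$-pattern matrix. Then

$$\|A\| = \sqrt{2^{n+k}\left(\frac{n}{k}\right)^{k}}\ \max_{S\subseteq[k]}\left\{|\hat{\phi}(S)|\left(\frac{k}{n}\right)^{|S|/2}\right\}.$$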

We will also need the following dual characterization of approximate degree of partial functions, analogous to the dual characterization for total functions used in [SherstovSICOMP11].

###### Theorem 8.

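Let $f\colon \{0,1\}^n \to \{-1,+1,*\}$ be a given function, $d \ge 0$ an integer. Then $\deg_\epsilon(f) \ge d$ if and only if there exists $\psi\colon \{0,1\}^n \to \mathbb{R}$ such that

$$\sum_{x\in\operatorname{dom} f} f(x)\psi(x) - \sum_{x\notin\operatorname{dom} f}|\psi(x)| - \epsilon\|\psi\|_1 > 0,$$

and $\hat{\psi}(S) = 0$ for $|S| < d$.

Theorem 8 follows from linear programming duality; see [SherstovSICOMP11, sherstov11quantum-sdpt] for details.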

Next, we derive a version of the generalized discrepancy method for partial functions, by adapting the analogous proof in [SherstovSICOMP11, Theorem 2.8] for total functions.

###### Theorem 9.

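Let $X, Y$ be finite sets and $F\colon X \times Y \to \{-1,+1,*\}$ a given function. Let $\Psi$ be any real matrix with $\|\Psi\|_1 = 1$. Then for each $\epsilon > 0$,

$$4^{Q^*_\epsilon(F)} \ge \frac{1}{3\|\Psi\|\sqrt{|X||Y|}}\left(\sum_{(x,y)\in\operatorname{dom}F}\Psi_{x,y}F(x,y) - \sum_{(x,y)\notin\operatorname{dom}F}|\Psi_{x,y}| - 2\epsilon\right).$$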
###### Proof.

Consider a quantum protocol with prior entanglement that computes $F$ with error $\epsilon$ and cost $C$. Let $\Pi$ be the matrix of acceptance probabilities of this protocol, so that $\Pi_{x,y}$ is the probability that the protocol accepts the input $(x,y)$. It is shown in the proof of [SherstovSICOMP11, Theorem 2.8] that

$$\sum_{x\in X}\sum_{y\in Y}\Psi_{x,y}(1-2\Pi_{x,y}) \le \|\Psi\|\,(2\cdot 4^{C}+1)\sqrt{|X||Y|}. \tag{3}$$

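Now observe that $1 - 2\Pi_{x,y}$ ranges in $[F(x,y) - 2\epsilon, F(x,y) + 2\epsilon]$ on $\operatorname{dom} F$ and is bounded in absolute value by $1$ otherwise. This gives

$$\begin{aligned}
\sum_{x\in X}\sum_{y\in Y}\Psi_{x,y}(1-2\Pi_{x,y}) &\ge \sum_{(x,y)\in\operatorname{dom}F}\left(\Psi_{x,y}F(x,y) - 2\epsilon|\Psi_{x,y}|\right) - \sum_{(x,y)\notin\operatorname{dom}F}|\Psi_{x,y}| \\
&\ge \sum_{(x,y)\in\operatorname{dom}F}\Psi_{x,y}F(x,y) - 2\epsilon - \sum_{(x,y)\notin\operatorname{dom}F}|\Psi_{x,y}|,
\end{aligned} \tag{4}$$

where the last step uses $\|\Psi\|_1 = 1$. The theorem follows by comparing the upper bound (3) with the lower bound (4). ∎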
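The final comparison of (3) and (4) can be spelled out as follows. This is a sketch of the omitted step, where $S$ is shorthand introduced here (not in the original) for the difference of the two sums appearing in Theorem 9, and where the only extra fact used is $2\cdot 4^{C}+1 \le 3\cdot 4^{C}$ for $C \ge 0$.

```latex
\begin{align*}
S &:= \sum_{(x,y)\in\operatorname{dom}F}\Psi_{x,y}F(x,y)
      - \sum_{(x,y)\notin\operatorname{dom}F}|\Psi_{x,y}|, \\
S - 2\epsilon
  &\le \sum_{x\in X}\sum_{y\in Y}\Psi_{x,y}(1-2\Pi_{x,y})
   \le \|\Psi\|\,(2\cdot 4^{C}+1)\sqrt{|X||Y|}
   \le 3\cdot 4^{C}\,\|\Psi\|\sqrt{|X||Y|}, \\
4^{C} &\ge \frac{S-2\epsilon}{3\,\|\Psi\|\sqrt{|X||Y|}}.
\end{align*}
```

Taking a protocol of optimal cost $C = Q^*_\epsilon(F)$ gives the inequality in the statement of the theorem.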

We are now in a position to prove Theorem 6, which we restate here for the reader’s convenience.

###### Theorem 10 (restatement of Theorem 6).

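Let $F$ be the $(n,k,f)$-pattern matrix, where $f\colon \{0,1\}^k \to \{-1,+1,*\}$ is given. Then for every $\epsilon \in [0,1)$ and every $\delta < \epsilon/2$,

$$Q^*_\delta(F) \ge \frac{1}{4}\deg_\epsilon(f)\log\left(\frac{n}{k}\right) - \frac{1}{2}\log\left(\frac{3}{\epsilon-2\delta}\right). \tag{5}$$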
###### Proof.

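Let $d = \deg_\epsilon(f)$. By Theorem 8, there is a function $\psi\colon \{0,1\}^k \to \mathbb{R}$ such that:

$$\hat{\psi}(S) = 0 \quad (|S| < d), \tag{6}$$

$$\sum_{z\in\{0,1\}^k}|\psi(z)| = 1, \tag{7}$$

$$\sum_{z\in\operatorname{dom}f} f(z)\psi(z) - \sum_{z\notin\operatorname{dom}f}|\psi(z)| > \epsilon. \tag{8}$$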

Let $\Psi$ be the $(n,k,2^{-n}(n/k)^{-k}\psi)$-pattern matrix. Then (7) and (8) show that

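$$\|\Psi\|_1 = 1, \tag{9}$$

$$\sum_{(x,y)\in\operatorname{dom}F} F_{x,y}\Psi_{x,y} - \sum_{(x,y)\notin\operatorname{dom}F}|\Psi_{x,y}| > \epsilon. \tag{10}$$

Our last task is to calculate $\|\Psi\|$. It follows from (7) that

$$\max_{S\subseteq[k]}|\hat{\psi}(S)| \le 2^{-k}. \tag{11}$$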

Theorem 7 yields, in view of (6) and (11):

 (12)

Now (5) follows from (9), (10), (12), and Theorem 9. ∎
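For readers who want to see the arithmetic of this last step, here is a hedged sketch. It assumes that the spectral norm bound (12) has the same form as in the total-function argument of [SherstovSICOMP11], namely the first line below with $d = \deg_\epsilon(f)$; this form is an assumption of the sketch. It also uses that the pattern matrix has $|X| = 2^{n}$ rows and $|Y| = (n/k)^{k}2^{k}$ columns, and that logarithms are taken in base 2.

```latex
\begin{align*}
\|\Psi\| &\le \left(\frac{k}{n}\right)^{d/2}
             \left(2^{n+k}\left(\frac{n}{k}\right)^{k}\right)^{-1/2}
  && \text{(assumed form of (12))} \\
\|\Psi\|\sqrt{|X||Y|} &\le \left(\frac{k}{n}\right)^{d/2}
  && \text{since } |X||Y| = 2^{n+k}\left(\frac{n}{k}\right)^{k} \\
4^{Q^*_\delta(F)} &\ge \frac{\epsilon-2\delta}{3\,\|\Psi\|\sqrt{|X||Y|}}
   \ge \frac{\epsilon-2\delta}{3}\left(\frac{n}{k}\right)^{d/2}
  && \text{by Theorem 9 with (9) and (10)} \\
Q^*_\delta(F) &\ge \frac{d}{4}\log\left(\frac{n}{k}\right)
   - \frac{1}{2}\log\left(\frac{3}{\epsilon-2\delta}\right).
\end{align*}
```

The last line is (5) with $d = \deg_\epsilon(f)$.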