# On The Hardness of Approximate and Exact (Bichromatic) Maximum Inner Product

In this paper we study the (Bichromatic) Maximum Inner Product Problem (Max-IP), in which we are given sets A and B of vectors, and the goal is to find a ∈ A and b ∈ B maximizing the inner product a · b. Max-IP is very basic and serves as the base problem in the recent breakthrough of [Abboud et al., FOCS 2017] on hardness of approximation for polynomial-time problems. It is also used (implicitly) in the argument for hardness of exact ℓ_2-Furthest Pair (and other important problems in computational geometry) in poly-log-log dimensions in [Williams, SODA 2018]. We have three main results regarding this problem. First, we study the best multiplicative approximation ratio for Boolean Max-IP in sub-quadratic time. We show that, for Max-IP with two sets of n vectors from {0,1}^d, there is an n^{2−Ω(1)} time (d/log n)^{Ω(1)}-multiplicative-approximating algorithm, and we show this is conditionally optimal, as such a (d/log n)^{o(1)}-approximating algorithm would refute SETH. Second, we achieve a similar characterization for the best additive approximation error to Boolean Max-IP. We show that, for Max-IP with two sets of n vectors from {0,1}^d, there is an n^{2−Ω(1)} time Ω(d)-additive-approximating algorithm, and we show this is conditionally optimal, as such an o(d)-approximating algorithm would refute SETH. Last, we revisit the hardness of solving Max-IP exactly for vectors with integer entries. We show that, under SETH, for Max-IP with sets of n vectors from Z^d for some d = 2^{O(log^* n)}, every exact algorithm requires n^{2−o(1)} time. With the reduction from [Williams, SODA 2018], it follows that ℓ_2-Furthest Pair and Bichromatic ℓ_2-Closest Pair in 2^{O(log^* n)} dimensions require n^{2−o(1)} time.


## 1 Introduction

We study the following fundamental problem from similarity search and statistics, which asks to find the most correlated pair in a dataset:

###### Definition 1.1 (Bichromatic Maximum Inner Product (Max-IP)).

For n, d ∈ ℕ, the Max-IP_{n,d} problem is defined as: given two sets A, B, each of n vectors from {0,1}^d, compute

 OPT(A,B) := max_{a∈A, b∈B} a⋅b.

We use Z-Max-IP_{n,d} (R-Max-IP_{n,d}) to denote the same problem, but with A and B being sets of vectors from Z^d (R^d).
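As a concrete reference point, the quadratic-time baseline that all results below are measured against can be sketched in a few lines of Python (an illustrative sketch; the function name is ours, not from the paper):

```python
def max_ip(A, B):
    """Exact bichromatic Max-IP by exhaustive search: O(n^2 * d) time
    for two sets of n Boolean vectors of dimension d."""
    return max(sum(ai * bi for ai, bi in zip(a, b)) for a in A for b in B)

A = [(1, 0, 1, 1), (0, 1, 0, 0)]
B = [(1, 1, 1, 0), (0, 0, 0, 1)]
print(max_ip(A, B))  # 2, attained by (1,0,1,1)·(1,1,1,0)
```

The results of this paper concern when this quadratic barrier can be beaten, exactly or approximately.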

##### Hardness of Approximating Max-IP.

A natural brute-force algorithm solves Max-IP_{n,d} in O(n^2⋅d) time. Assuming SETH (the Strong Exponential Time Hypothesis, which states that for every ε > 0 there is a k such that k-SAT cannot be solved in 2^{(1−ε)n} time [IP01]), there is no n^{2−Ω(1)}-time algorithm for Max-IP_{n,d} when d = ω(log n) [Wil05].

Despite Max-IP being one of the most central problems in similarity search, with numerous applications [IM98, AI06, RR07, RG12, SL14, AINR14, AIL15, AR15, NS15, SL15, Val15, AW15, KKK16, APRS16, TG16, CP16, Chr17], until the recent breakthrough of Abboud, Rubinstein and Williams [ARW17] it was unclear whether a near-linear-time algorithm could achieve any non-trivial multiplicative approximation (see [ARW17] for a thorough discussion of the state of affairs on hardness of approximation in P before their work).

In [ARW17], a framework for proving inapproximability results for problems in P is established (the distributed PCP framework), from which it follows:

###### Theorem 1.2 ([ARW17]).

Assuming SETH, there is no 2^{(log n)^{1−o(1)}}-multiplicative-approximating n^{2−Ω(1)}-time algorithm for Max-IP_{n,n^{o(1)}}.

Theorem 1.2 is an exciting breakthrough for hardness of approximation in P, implying other important inapproximability results for a host of problems including Bichromatic LCS Closest Pair Over Permutations, Approximate Regular Expression Matching, and Diameter in Product Metrics [ARW17]. However, we still do not have a complete understanding of the approximation hardness of Max-IP. For instance, consider the following two concrete questions:

###### Question 1.

Is there a constant-multiplicative-approximating n^{2−Ω(1)}-time algorithm for Max-IP_{n,c log n}? What about a polylog(n)-multiplicative-approximating one for Max-IP_{n,polylog(n)}?

###### Question 2.

Is there an o(d)-additive-approximating n^{2−Ω(1)}-time algorithm for Max-IP_{n,d}?

We note that the lower bound from [ARW17] cannot answer Question 1. Tracing the details of their proofs, one can see that they only show approximation hardness for dimension d = (log n)^{ω(1)}. Question 2, concerning additive approximation, is not addressed at all by [ARW17]. Given the importance of Max-IP, it is interesting to ask:

For which approximation ratios t do n^{2−Ω(1)}-time t-approximating algorithms exist for Max-IP?

Does the best-possible approximation ratio (in n^{2−Ω(1)} time) relate to the dimensionality in some way?

In an important recent work, Rubinstein [Rub18] improved the distributed PCP construction in a crucial way, from which one can derive more refined lower bounds on approximating Max-IP. Building on this technique, in this paper we provide full characterizations, determining the essentially optimal multiplicative and additive approximations to Max-IP, under SETH.

##### Hardness of Exact Z-Max-IP.

Recall that from [Wil05], there is no n^{2−Ω(1)}-time algorithm for exact Boolean Max-IP_{n,ω(log n)}. Since real-life applications of similarity search often deal with real-valued data instead of just Boolean data, it is natural to ask about Z-Max-IP (which is certainly a special case of R-Max-IP): what is the maximum d such that Z-Max-IP_{n,d} can be solved exactly in n^{2−Ω(1)} time?

Besides being interesting in its own right, there are also reductions from Z-Max-IP to ℓ_2-Furthest Pair and Bichromatic ℓ_2-Closest Pair. Hence, lower bounds for Z-Max-IP imply lower bounds for these two famous problems in computational geometry (see [Wil18] for a discussion on this topic).

Prior to our work, it was implicitly shown in [Wil18] that:

###### Theorem 1.3 ([Wil18]).

Assuming SETH, there is no n^{2−Ω(1)}-time algorithm for Z-Max-IP_{n,d} with d = poly(log log n) dimensions, with vectors of poly(log n)-bit entries.

However, the best known exact algorithm for Z-Max-IP_{n,d} runs in roughly n^{2−Θ(1/d)} time [Mat92, AESW91, Yao82] ([AESW91, Yao82] are for ℓ_2-Furthest Pair or Bichromatic ℓ_2-Closest Pair; they also work for Z-Max-IP, as there are reductions from Z-Max-IP to these two problems, see [Wil18] or Lemma 4.5 and Lemma 4.6), hence there is still a gap between the lower bound and the best known upper bounds. To confirm that these algorithms are in fact optimal, we would like to prove an n^{2−o(1)} lower bound for ω(1) dimensions.

In this paper, we significantly strengthen the previous lower bound, from poly(log log n) dimensions to 2^{O(log^* n)} dimensions (log^* is an extremely slow-growing function; see the preliminaries for its formal definition).

### 1.1 Our Results

We use OV_{n,d} to denote the Orthogonal Vectors problem: given two sets A, B, each consisting of n vectors from {0,1}^d, determine whether there are a ∈ A and b ∈ B such that a⋅b = 0. (Here we use the bichromatic version of OV instead of the monochromatic one for convenience, as they are equivalent.) Similarly, we use Z-OV_{n,d} to denote the same problem except that A and B consist of vectors from Z^d (Z-OV is also called Hopcroft’s problem).

All our results are based on the following widely used conjecture about OV:

###### Conjecture 1.4 (Orthogonal Vectors Conjecture (OVC) [Wil05, AVW14]).

For every ε > 0, there exists a c ≥ 1 such that OV_{n,d} requires n^{2−ε} time when d = c log n.

OVC is a plausible conjecture, as it is implied by the popular Strong Exponential Time Hypothesis [IP01, CIP09] on the time complexity of solving k-SAT [Wil05, WY14].

### Characterizations of Hardness of Approximate Max-IP

The first main result of our paper characterizes when there is a truly sub-quadratic time (n^{2−Ω(1)} time, for some universal constant hidden in the big-Ω) t-multiplicative-approximating algorithm for Max-IP, and characterizes the best-possible additive approximations as well. We begin with formal definitions of these two standard types of approximation:

• We say an algorithm for Max-IP_{n,d} (Z-Max-IP_{n,d}) is t-multiplicative-approximating if, for all inputs A, B, it outputs a value ~OPT(A,B) such that OPT(A,B)/t ≤ ~OPT(A,B) ≤ OPT(A,B).

• We say an algorithm for Max-IP_{n,d} (Z-Max-IP_{n,d}) is t-additive-approximating if, for all inputs A, B, it outputs a value ~OPT(A,B) such that |~OPT(A,B) − OPT(A,B)| ≤ t.

• To avoid ambiguity, we call an algorithm computing OPT(A,B) exactly an exact algorithm for Max-IP_{n,d} (Z-Max-IP_{n,d}).

##### Multiplicative Approximations for Max-IP.

In the multiplicative case, our characterization (formally stated below) basically says that there is a t-multiplicative-approximating n^{2−Ω(1)}-time algorithm for Max-IP_{n,d} if and only if t = (d/log n)^{Ω(1)}. Note that in the following theorem we require d = ω(log n), since in the case d = O(log n) there are n^{2−Ω(1)}-time algorithms for exact Max-IP_{n,d} [AW15, ACW16].

###### Theorem 1.5.

Letting d = ω(log n) and t = t(n, d) ≥ 2 (note that d and t are both functions of n; we assume they are efficiently computable throughout this paper for simplicity), the following holds:

1. There is an n^{2−Ω(1)}-time t-multiplicative-approximating algorithm for Max-IP_{n,d} if

 t = (d/log n)^{Ω(1)},

and under SETH (or OVC), there is no n^{2−Ω(1)}-time t-multiplicative-approximating algorithm for Max-IP_{n,d} if

 t = (d/log n)^{o(1)}.

2. Moreover, let ε := log t / log(d/log n). There are t-multiplicative-approximating deterministic algorithms for Max-IP_{n,d} running in time

 O(n^{2+o(1) − 0.31⋅1/(ε^{−1}+0.312)}) = O(n^{2+o(1)−Ω(ε)})

or in time

 O(n^{2 − 0.17⋅1/(ε^{−1}+0.172)} ⋅ polylog(n)) = O(n^{2−Ω(ε)} ⋅ polylog(n)).
###### Remark 1.6.

The first algorithm is slightly faster, but it is only truly sub-quadratic when ε is a constant, while the second algorithm still gets a non-trivial speed-up over the brute-force algorithm even for smaller, sub-constant ε.

We remark here that the above algorithms indeed work in the case where the sets consist of non-negative reals (i.e., Max-IP over non-negative real vectors):

###### Corollary 1.7.

Assuming d = n^{o(1)} and letting ε := log t / log(d/log n), there is a t-multiplicative-approximating deterministic algorithm for Max-IP_{n,d} with non-negative real entries, running in time

 O(n^{2−Ω(ε)} ⋅ polylog(n)).

The lower bound is a direct corollary of the new improved protocols for Set-Disjointness from [Rub18], which are based on Algebraic Geometry codes. Together with the framework of [ARW17], that protocol implies a reduction from OV to approximating Max-IP.

Our upper bounds are applications of the polynomial method [Wil14, AWY15]: we define appropriate sparse polynomials approximating Max-IP on small groups of vectors, and use fast matrix multiplication to speed up the evaluation of these polynomials on many pairs of points.

Via the known reduction from Max-IP to LCS-Pair in [ARW17], we also obtain a more refined lower bound for approximating the LCS Closest Pair problem (defined below).

###### Definition 1.8 (LCS Closest Pair).

The LCS-Closest-Pair_{n,d} problem is: given two sets A, B, each of n strings from Σ^d (Σ is a finite alphabet), determine

 max_{a∈A, b∈B} LCS(a,b),

where LCS(a,b) is the length of the longest common subsequence of the strings a and b.
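To pin down the definition, here is a small brute-force Python sketch (names are ours): it computes LCS by the classic dynamic program and takes the maximum over all cross pairs, so it runs in O(n^2 d^2) time.

```python
def lcs(a, b):
    # classic O(|a|*|b|) dynamic program for longest-common-subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_closest_pair(A, B):
    # brute-force LCS-Closest-Pair: maximize LCS over all pairs (a, b)
    return max(lcs(a, b) for a in A for b in B)

print(lcs_closest_pair(["abcd", "xyz"], ["acd", "zzy"]))  # 3, via LCS("abcd", "acd")
```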

###### Corollary 1.9 (Improved Inapproximability for LCS-Closest-Pair).

Assuming SETH (or OVC), for every t ≥ 2, t-multiplicative-approximating LCS-Closest-Pair_{n,d} requires n^{2−o(1)} time if t = (d/log n)^{o(1)}.

##### A Different Approach Based on Approximate Polynomial for OR.

Making use of the Õ(√d)-degree approximate polynomial for OR [BCDWZ99, dW08], we also give a completely different proof of the hardness of multiplicative approximation to Boolean Max-IP (that is, Max-IP with sets A and B of vectors from {0,1}^d). The lower bound from that approach is inferior to Theorem 1.5; in particular, it cannot achieve a characterization.

It was asked in [ARW17] whether one can make use of the quantum communication protocol for Set-Disjointness [BCW98] to prove conditional lower bounds. Indeed, that quantum communication protocol is based on the O(√d)-query quantum algorithm for OR (Grover’s algorithm [Gro96]), which induces the needed approximate polynomial for OR. Hence, the following theorem in some sense answers their question in the affirmative:

###### Theorem 1.10 (Informal).

Assuming SETH (or OVC), there is no n^{2−Ω(1)}-time t-multiplicative-approximating algorithm for Max-IP_{n,d} for a certain range of t and d.

The full statement can be found in Theorem C.1 and Theorem C.2.

Our characterization for additive approximations to Max-IP says that there is a t-additive-approximating n^{2−Ω(1)}-time algorithm for Max-IP_{n,d} if and only if t = Ω(d).

###### Theorem 1.11.

Letting d = ω(log n) and t = t(n, d), the following holds:

1. There is an n^{2−Ω(1)}-time t-additive-approximating algorithm for Max-IP_{n,d} if

 t = Ω(d),

and under SETH (or OVC), there is no n^{2−Ω(1)}-time t-additive-approximating algorithm for Max-IP_{n,d} if

 t = o(d).

2. Moreover, letting ε := t/d, there is an

 O(n^{2−Ω(ε^{1/3}/log ε^{−1})})

time, t-additive-approximating randomized algorithm for Max-IP_{n,d}, when d = n^{o(1)}.

The lower bound above was already established in [Rub18], while the upper bound works by reducing the problem to a low-dimensional case via randomly sampling coordinates, and then solving the reduced problem via known methods [AW15, ACW16].
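The random-sampling step can be illustrated as follows: estimating a⋅b from k sampled coordinates, rescaled by d/k, concentrates within additive error ±εd for k ≈ O(log n / ε^2) by a Chernoff bound. This is only an illustrative Python sketch of the sampling idea, not the paper's full algorithm:

```python
import random

def sampled_ip_estimate(a, b, k, rng):
    """Estimate a·b for Boolean vectors by sampling k coordinates with
    replacement and rescaling by d/k; the estimate always lies in [0, d]
    and concentrates around the true inner product as k grows."""
    d = len(a)
    hits = sum(a[i] * b[i] for i in (rng.randrange(d) for _ in range(k)))
    return d * hits / k

rng = random.Random(0)          # fixed seed for reproducibility
a = [1, 0] * 50                 # d = 100, true inner product a·b = 50
b = [1, 1] * 50
est = sampled_ip_estimate(a, b, 400, rng)
print(0 <= est <= 100)          # the estimate is always within the trivial range
```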

###### Remark 1.12.

We want to remark here that the lower bounds for approximating Max-IP are direct corollaries of the new protocols for Set-Disjointness in [Rub18]. Our main contribution is providing the complementary upper bounds, showing that these lower bounds are indeed tight under SETH (or OVC).

##### All-Pair-Max-IP.

Finally, we remark here that our algorithms (with slight adaptations) also work for the following stronger problem (since All-Pair-Max-IP is stronger than Max-IP, lower bounds for Max-IP automatically apply to All-Pair-Max-IP): All-Pair-Max-IP_{n,d}, in which we are given two sets A and B, each of n vectors from {0,1}^d, and for each a ∈ A we must compute max_{b∈B} a⋅b. An algorithm is t-multiplicative-approximating (t-additive-approximating) for All-Pair-Max-IP if, for all a ∈ A, it computes a corresponding approximate answer.

###### Corollary 1.13.

Suppose d = n^{o(1)}, and let

 ε_M := min(log t / log(d/log n), 1) and ε_A := min(t, d)/d.

There is an n^{2−Ω(ε_M)+o(1)}-time t-multiplicative-approximating algorithm and an n^{2−Ω(ε_A^{1/3}/log ε_A^{−1})}-time t-additive-approximating algorithm for All-Pair-Max-IP_{n,d}.

### Hardness of Exact Z-Max-IP in 2^{O(log^* n)} Dimensions

Third, we show that Z-Max-IP is hard to solve in truly sub-quadratic time, even for 2^{O(log^* n)}-dimensional vectors:

###### Theorem 1.14.

Assuming SETH (or OVC), there is a constant c such that any exact algorithm for Z-Max-IP_{n,d} with d = c^{log^* n} dimensions requires n^{2−o(1)} time, with vectors of O(log n)-bit entries.

As direct corollaries of the above theorem, using reductions implicit in [Wil18], we also conclude hardness for ℓ_2-Furthest Pair and Bichromatic ℓ_2-Closest Pair under SETH (or OVC) in 2^{O(log^* n)} dimensions.

###### Theorem 1.15 (Hardness of ℓ_2-Furthest Pair in c^{log^* n} Dimensions).

Assuming SETH (or OVC), there is a constant c such that ℓ_2-Furthest Pair in c^{log^* n} dimensions requires n^{2−o(1)} time, with vectors of O(log n)-bit entries.

###### Theorem 1.16 (Hardness of Bichromatic ℓ_2-Closest Pair in c^{log^* n} Dimensions).

Assuming SETH (or OVC), there is a constant c such that Bichromatic ℓ_2-Closest Pair in c^{log^* n} dimensions requires n^{2−o(1)} time, with vectors of O(log n)-bit entries.

The above lower bounds on ℓ_2-Furthest Pair and Bichromatic ℓ_2-Closest Pair are in sharp contrast with the case of (monochromatic) ℓ_2-Closest Pair, which can be solved in 2^{O(d)}⋅n⋅polylog(n) time [BS76, KM95, DHKP97].

### Improved Dimensionality Reduction for OV and Hopcroft’s Problem

Our hardness result for Z-Max-IP is established by a reduction from Hopcroft’s problem, whose hardness is in turn derived from the following significantly improved dimensionality reduction for OV.

###### Lemma 1.17 (Improved Dimensionality Reduction for OV).

Let 1 ≤ ℓ ≤ d. There is an

 O(n ⋅ ℓ^{O(6^{log^* d} ⋅ (d/ℓ))} ⋅ poly(d))-time

reduction from OV_{n,d} to ℓ^{O(6^{log^* d} ⋅ (d/ℓ))} instances of Z-OV_{n,ℓ+1}, with vectors of entries of bit-length O((d/ℓ) ⋅ log ℓ).

##### Comparison with [Wil18].

Compared to the old construction in [Wil18], our reduction here is more efficient when the target dimension is much smaller than d (which is the case we care about): [Wil18] reduces OV_{n,d} to a number of Z-OV instances that grows roughly like d^{O(d/ℓ)}, while our improved reduction yields only ℓ^{O(6^{log^* d} ⋅ (d/ℓ))} instances. Since OVC lets us take d = c log n for an arbitrary constant c, this saving is exactly what pushes the achievable dimension down from poly(log log n) to 2^{O(log^* n)}.

From Lemma 1.17, the following theorem follows in the same way as in [Wil18].

###### Theorem 1.18 (Hardness of Hopcroft’s Problem in c^{log^* n} Dimensions).

Assuming SETH (or OVC), there is a constant c such that Z-OV_{n,c^{log^* n}} with vectors of O(log n)-bit entries requires n^{2−o(1)} time.

### Connection between Z-Max-IP lower bounds and NP⋅UPP communication protocols

We also show a new connection between -Max-IP and a special type of communication protocol. Let us first recall the Set-Disjointness problem:

###### Definition 1.19 (Set-Disjointness).

Let n ∈ ℕ. In Set-Disjointness (DISJ_n), Alice holds a vector X ∈ {0,1}^n, Bob holds a vector Y ∈ {0,1}^n, and they want to determine whether X ⋅ Y = 0.

Recall that in [ARW17], the hardness of approximating Max-IP is established via a connection to communication protocols (in particular, a fast MA communication protocol for Set-Disjointness). Our lower bound for (exact) Z-Max-IP can also be connected to similar protocols, of an NP⋅UPP flavor.

Formally, we define NP⋅UPP communication protocols as follows:

###### Definition 1.20.

For a problem Π with inputs of length n (Alice holds X and Bob holds Y), we say a communication protocol for Π is an (m, ℓ)-efficient NP⋅UPP communication protocol if the following holds:

• There are three parties, Alice, Bob and Merlin, in the protocol.

• Merlin sends Alice and Bob an advice string z of length m, which is a function of X and Y.

• Given Y and z, Bob sends Alice ℓ bits, and Alice decides whether to accept or not. (In UPP, one-way communication is in fact equivalent to the seemingly more powerful setting in which Alice and Bob communicate interactively [PS86].) Alice and Bob have an unlimited supply of private random coins (not public, which is important) during their conversation. The following conditions hold:

• If the answer is yes, then there is an advice z from Merlin such that Alice accepts with probability strictly greater than 1/2.

• Otherwise, for all possible advice strings from Merlin, Alice accepts with probability strictly less than 1/2.

Moreover, we say the protocol is

(m, ℓ)-computational-efficient, if in addition the probability distributions of both Alice’s and Bob’s behavior can be computed in

poly(n) time given their input and the advice.

Our new reduction from OV to Z-Max-IP actually implies a super-efficient NP⋅UPP protocol for Set-Disjointness.

###### Theorem 1.21.

For every integer α ≥ 1, there is an

 (α ⋅ 6^{log^* n} ⋅ (n/2^α), O(α))-computational-efficient

NP⋅UPP communication protocol for DISJ_n.

For example, when α = log n, Theorem 1.21 implies there is a (6^{log^* n} ⋅ log n, O(log n))-computational-efficient NP⋅UPP communication protocol for DISJ_n. Moreover, we show that if the protocol of Theorem 1.21 can be improved a little (removing the 6^{log^* n} term), we would obtain the desired hardness for Z-Max-IP in ω(1) dimensions.

###### Theorem 1.22.

Assuming SETH (or OVC), if there is an increasing and unbounded function f such that for all α there is an

 (n/f(α), α)-computational-efficient

NP⋅UPP communication protocol for DISJ_n, then Z-Max-IP_{n,ω(1)} requires n^{2−o(1)} time with vectors of polylog(n)-bit entries. The same holds for ℓ_2-Furthest Pair and Bichromatic ℓ_2-Closest Pair.

### Improved MA Protocols for Set-Disjointness

Finally, we also obtain a new MA protocol for Set-Disjointness, which improves on the previous O(√n log n)-communication MA protocol in [AW09], and is closer to the Ω(√n) lower bound of [Kla03]. Like the protocol in [AW09], our new protocol also works for the following slightly harder problem, Inner Product.

###### Definition 1.23 (Inner Product).

Let n ∈ ℕ. In Inner Product (IP_n), Alice holds a vector X ∈ {0,1}^n, Bob holds a vector Y ∈ {0,1}^n, and they want to compute X ⋅ Y.

###### Theorem 1.24.

There is an MA protocol for DISJ_n and IP_n with communication complexity

 O(√(n log n log log n)).

In [Rub18], the author asked whether the MA communication complexity of DISJ (IP) is Θ(√n) or Θ(√n log n), and suggested that Θ(√n log n) may be necessary for IP. Our result makes progress on that question by showing that the true complexity lies between Ω(√n) and O(√(n log n log log n)).

### 1.2 Intuition for the Dimensionality Self-Reduction for OV

The 6^{log^* d} factor in Lemma 1.17 is not common in theoretical computer science (other examples include an O(n^{4/3} ⋅ 2^{O(log^* n)})-time algorithm for Hopcroft’s problem [Mat93], n log n ⋅ 2^{O(log^* n)}-time algorithms (Fürer’s algorithm with its modifications) for Fast Integer Multiplication [Für09, CT15, HVDHL16], and an O(n^{d/2} ⋅ 2^{O(log^* n)})-time algorithm for Klee’s measure problem [Cha08]), and our new reduction for OV is considerably more complicated than the polynomial-based construction from [Wil18]. Hence, it is worth discussing the intuition behind Lemma 1.17, and the reason why we get a factor of 6^{log^* d}.

##### A Direct Chinese Remainder Theorem Based Approach.

We first discuss a direct reduction based on the Chinese Remainder Theorem (CRT) (see Theorem 2.5 for a formal statement). CRT says that given a collection of distinct primes q_1, q_2, …, q_ℓ and a collection of integers r_1, r_2, …, r_ℓ, there exists a unique integer 0 ≤ t < ∏_i q_i such that t ≡ r_i (mod q_i) for each i; we write t = CRR({r_i}; {q_i}) (CRR stands for Chinese Remainder Representation).

Now, fix a block length ℓ, and suppose we would like to have a dimensionality reduction from OV_{n,d} to Z-OV_{n,d/ℓ}. We can partition an input x ∈ {0,1}^d into d/ℓ blocks, each of length ℓ, and represent each block via CRT: that is, for a block z ∈ {0,1}^ℓ, we map it into the single integer φ_block(z) := CRR({z_i}; {q_i}), and φ(x) is the concatenation of φ_block over all blocks of x.

The key idea here is that, for each i ∈ [ℓ], φ_block(z) ⋅ φ_block(z′) mod q_i is simply z_i ⋅ z′_i. That is, the multiplication of two integers simulates the coordinate-wise multiplication of the two vectors z and z′!

Therefore, if we make all primes larger than d, we can in fact determine whether x ⋅ y = 0 from φ(x) ⋅ φ(y), by looking at φ(x) ⋅ φ(y) mod q_i for each i (the sum of at most d/ℓ coordinate products never wraps around modulo q_i). That is,

 x ⋅ y = 0 ⇔ φ(x) ⋅ φ(y) ≡ 0 (mod q_i) for all i.

Hence, letting V be the set of all integers v in the relevant range such that v ≡ 0 (mod q_i) for all i, we have

 x ⋅ y = 0 ⇔ φ(x) ⋅ φ(y) ∈ V.

The reduction is completed by enumerating all integers v ∈ V, and appending corresponding values to φ(x) and φ(y) so that testing whether φ(x) ⋅ φ(y) = v becomes testing orthogonality of the padded integer vectors (this padding step is from [Wil18]).

Note that a nice property of φ is that each block of φ(x) only depends on the corresponding block of x, and the mapping is the same on each block (φ_block); we call this the block mapping property.
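The direct CRR block mapping above can be made concrete with a small Python sketch (toy parameters chosen by us: d = 6, block length 3, primes 7, 11, 13, all larger than d, so no wrap-around occurs):

```python
def crt(residues, moduli):
    # Chinese Remainder: the unique t in [0, prod(moduli)) with t ≡ r_i (mod q_i);
    # pow(x, -1, q) (Python 3.8+) computes the modular inverse.
    M = 1
    for q in moduli:
        M *= q
    t = 0
    for r, q in zip(residues, moduli):
        Mq = M // q
        t += r * Mq * pow(Mq, -1, q)
    return t % M

def encode(x, primes, blk):
    # phi(x): map each length-blk block of x to one CRR integer
    return [crt(x[j:j + blk], primes) for j in range(0, len(x), blk)]

def orthogonal_via_crr(x, y, primes, blk):
    # x·y = 0 iff the encoded inner product is ≡ 0 modulo every prime
    ip = sum(u * v for u, v in zip(encode(x, primes, blk), encode(y, primes, blk)))
    return all(ip % q == 0 for q in primes)

primes = [7, 11, 13]                         # all larger than d = 6
x, y = (1, 0, 1, 0, 1, 1), (0, 1, 0, 1, 0, 0)
print(orthogonal_via_crr(x, y, primes, 3))   # True, since x·y = 0
```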

##### Analysis of the Direct Reduction.

To continue building intuition, let us analyze the above reduction. The size of V is the number of instances we create. The primes have to be all distinct and larger than d, so ∏_i q_i ≥ d^ℓ, and hence |V| is at least d^ℓ. Since we want to create at most n^δ instances (for arbitrarily small δ), we need d^ℓ ≤ n^δ. Moreover, to base our hardness on OVC, which deals with d = c log n-dimensional vectors for an arbitrary constant c, we must take ℓ = O(log n / log log n), so the resulting dimension d/ℓ is Ω(log log n). Therefore, the above reduction only obtains the same hardness result as [Wil18].

##### Key Observation: “Most Space Modulo qi” is Actually Wasted.

To improve the above reduction, we need to make |V| smaller. Our key observation about V is that the primes q_i are all larger than d, yet the residues φ_block(z) mod q_i lie in {0,1} for all these q_i’s. Hence, “most space modulo q_i” is actually wasted.

##### Make More “Efficient” Use of the “Space”: Recursive Reduction.

Based on the previous observation, we want to use the “space modulo q_i” more efficiently, and it is natural to consider a recursive reduction. Let b be a block length much smaller than ℓ, and let ψ, with a corresponding set V_ψ and a block mapping ψ_block on blocks of length b, be a similar reduction on much smaller inputs: u ⋅ v = 0 ⇔ ψ(u) ⋅ ψ(v) ∈ V_ψ. We also require that ψ_block(z) ⋅ ψ_block(z′) < q_i for all micro blocks z, z′ and all i.

For an input x ∈ {0,1}^d and a block z of x, our key idea is to partition z again into “micro” blocks, each of size b. For a block z with micro blocks z_1, z_2, …, we map z into the integer φ_block(z) := CRR({ψ_block(z_i)}; {q_i}).

Now, given two blocks z, z′, we can see that

 φ_block(z) ⋅ φ_block(z′) ≡ ψ_block(z_i) ⋅ ψ_block(z′_i) (mod q_i).

That is, φ(x) ⋅ φ(y) mod q_i is in fact equal to ψ(x_i) ⋅ ψ(y_i), where x_i is the concatenation of the i-th micro blocks of each block of x, and y_i is defined similarly. Hence, we can determine whether x_i ⋅ y_i = 0 from φ(x) ⋅ φ(y) mod q_i for each i, and therefore also determine whether x ⋅ y = 0 from φ(x) ⋅ φ(y).

We can now observe that the primes q_i only need to exceed the (much smaller) values arising from ψ, so |V| is smaller than before; thus we get an improvement, depending on how small the inner encodings can be made. Clearly, the reduction ψ can itself be constructed from even smaller reductions, and after recursing O(log^* d) times we can switch to the direct construction discussed before. A straightforward (but tedious) calculation then yields Lemma 1.17.

##### High-Level Explanation of the 2^{O(log^* n)} Factor.

Ideally, we would like a reduction from OV to Z-OV creating only ℓ^{O(d/ℓ)} instances; in other words, we want |V| = ℓ^{O(d/ℓ)}. The reason we need to pay an extra 6^{log^* d} factor in the exponent is as follows:

In our reduction, |V| is essentially governed by the upper bound on the coordinates produced by the reduction: φ_block(z) is a CRR encoding, whose value can be as large as ∏_i q_i. That is, all we need is to control the upper bound on the coordinates of the reduction.

Suppose we are constructing an “outer” reduction from a “micro” reduction ψ with coordinate upper bound U (that is, ψ_block(z) ≤ U), and write U as a power of the ideal bound, the exponent overhead being the “extra factor”. Recall that we have to ensure ψ_block(z) ⋅ ψ_block(z′) < q_i to make the construction work, and therefore we have to set each q_i larger than U^2.

The coordinate upper bound for φ_block is then a product of such primes, so after one level of the recursion the “extra factor” in the exponent at least doubles. Since our recursion proceeds for Θ(log^* d) rounds, we have to pay a 2^{O(log^* d)} extra factor in the exponent.

### 1.3 Related Works

##### SETH-based Conditional Lower Bound.

SETH is one of the most fruitful conjectures in Fine-Grained Complexity. There are numerous conditional lower bounds based on it for problems in P across different areas, including dynamic data structures [AV14], computational geometry [Bri14, Wil18, DKL16], string problems [AVW14, BI15, BI16, BGL16, BK18], and graph algorithms [RV13, GIKW17, AVY15, KT17]. See [Vas18] for a very recent survey on SETH-based lower bounds (and more).

##### Communication Complexity and Conditional Hardness.

The connection between communication protocols (in various models) for Set-Disjointness and SETH dates back at least to [PW10], in which it is shown that a sub-linear, computationally efficient protocol for multi-party Number-On-Forehead Set-Disjointness would refute SETH. It is also worth mentioning that [AR18]’s result builds on the IP communication protocol for Set-Disjointness in [AW09].

##### Distributed PCP.

Using Algebraic Geometry codes, [Rub18] obtains a better protocol, which in turn improves the efficiency of the previous distributed PCP construction of [ARW17]. He then shows n^{2−o(1)} time hardness for (1+o(1))-multiplicative approximation to Bichromatic ℓ_2-Closest Pair and o(d)-additive approximation to Max-IP with this new technique.

[KLM17] use the Distributed PCP framework to derive inapproximability results for k-Dominating Set under various assumptions. In particular, building on the techniques of [Rub18], it is shown that under SETH, k-Dominating Set admits no (log n)^{1/poly(k, e(ε))}-approximation in n^{k−ε} time (where e(ε) is some function of ε).

##### Hardness of Approximation in P.

Making use of Chebyshev embeddings, [APRS16] prove an inapproximability lower bound for Max-IP (which is improved by our Theorem 1.10). [AB17] take an approach different from Distributed PCP, and show that under certain complexity assumptions, LCS does not have a deterministic (1+o(1))-approximation in n^{2−Ω(1)} time. They also establish a connection with circuit lower bounds, showing that the existence of such a deterministic algorithm implies that E^NP does not have non-uniform linear-size Valiant Series-Parallel circuits. In [AR18], this is improved: any constant-factor-approximation deterministic algorithm for LCS in n^{2−Ω(1)} time implies that E^NP does not have non-uniform linear-size circuits. See [ARW17] for more related results in hardness of approximation in P.

### Organization of the Paper

In Section 2, we introduce the needed preliminaries for this paper. In Section 3, we prove our characterizations of approximating Max-IP and other related results. In Section 4, we prove the 2^{O(log^* n)}-dimensional hardness for Z-Max-IP and other related problems. In Section 5, we establish the connection between NP⋅UPP communication protocols and SETH-based lower bounds for exact Z-Max-IP. In Section 6, we present the new MA protocol for Set-Disjointness.

## 2 Preliminaries

We begin by introducing some notation. For an integer n, we use [n] to denote the set of integers from 1 to n. For a vector u, we use u_i to denote the i-th element of u.

We use log(x) to denote the logarithm of x with base 2 (with ceilings as appropriate), and ln(x) to denote the natural logarithm of x.

In our arguments, we use the iterated logarithm function , which is defined recursively as follows:

 log^*(n) := 0 if n ≤ 1, and log^*(log n) + 1 if n > 1.
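A direct Python transcription of this definition (ignoring ceilings) shows how slowly log^* grows:

```python
import math

def log_star(n):
    # iterated logarithm: how many times log2 must be applied before n <= 1
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print([log_star(n) for n in (1, 2, 4, 16, 65536)])  # [0, 1, 2, 3, 4]
```

Even for n = 2^65536, the value is only 5, which is why a 2^{O(log^* n)}-dimensional lower bound is nearly as strong as one for constant dimension.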

### 2.1 Fast Rectangular Matrix Multiplication

Similar to previous algorithms using the polynomial method, our algorithms make use of the algorithms for fast rectangular matrix multiplication.

###### Theorem 2.1 ([GU18]).

There is an N^{2+o(1)}-time algorithm for multiplying two matrices A and B with sizes N × N^α and N^α × N, where α ≤ 0.31.

###### Theorem 2.2 ([Cop82]).

There is an O(N^2 ⋅ polylog(N))-time algorithm for multiplying two matrices A and B with sizes N × N^α and N^α × N, where α ≤ 0.17.

### 2.2 Number Theory

Here we recall some facts from number theory. In our reduction from OV to Z-OV, we will apply the famous prime number theorem, which supplies a good estimate of the number of primes smaller than a given number; see e.g. [Apo13] for a reference.

###### Theorem 2.3 (Prime Number Theorem).

Let π(n) be the number of primes ≤ n. Then we have

 lim_{n→∞} π(n)/(n/ln n) = 1.

From a simple calculation, we obtain:

###### Lemma 2.4.

There are more than 10n distinct primes in [n, n^2] for every large enough n.

###### Proof.

For a large enough n, from the prime number theorem, the number of primes in [n, n^2] is

 π(n^2) − π(n) ∼ n^2/(2 ln n) − n/ln n ≫ 10n.

Next we recall the Chinese remainder theorem, and Chinese remainder representation.

###### Theorem 2.5.

Given pairwise co-prime integers q_1, q_2, …, q_d and integers r_1, r_2, …, r_d, there is exactly one integer 0 ≤ t < ∏_{i∈[d]} q_i such that

 t ≡ r_i (mod q_i) for all i ∈ [d].

We call t the Chinese remainder representation (or the CRR encoding) of the r_i’s (with respect to these q_i’s). We also write

 t = CRR({r_i}; {q_i})

for convenience, and we sometimes omit the sequence {q_i} when it is clear from the context.

Moreover, t can be computed in time polynomial in the total number of bits of all the given integers.
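The standard constructive proof behind this computation can be sketched in a few lines of Python (using Python 3.8+'s `pow(x, -1, q)` for modular inverses):

```python
def crr(residues, moduli):
    # The unique t in [0, q_1*...*q_d) with t ≡ r_i (mod q_i), via the
    # standard CRT formula: t = sum_i r_i * (M/q_i) * ((M/q_i)^-1 mod q_i).
    M = 1
    for q in moduli:
        M *= q
    t = sum(r * (M // q) * pow(M // q, -1, q) for r, q in zip(residues, moduli))
    return t % M

print(crr([2, 3, 2], [3, 5, 7]))  # 23, since 23 ≡ 2 (mod 3), 3 (mod 5), 2 (mod 7)
```

Since Python has arbitrary-precision integers, this runs in time polynomial in the total bit-length of the inputs, matching the statement above.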

### 2.3 Communication Complexity

In our paper we will make use of a certain kind of MA protocol, which we call (m, r, ℓ, s)-efficient protocols. (Our notation here is adopted from [KLM17]; they also defined similar k-party communication protocols, while we only discuss 2-party protocols in this paper.)

###### Definition 2.6.

We say an MA protocol is (m, r, ℓ, s)-efficient for a communication problem, if in the protocol:

• There are three parties, Alice, Bob and Merlin; Alice holds input x and Bob holds input y.

• Merlin sends an advice string z of length m to Alice, which is a function of x and y.

• Alice and Bob jointly toss coins to obtain a random string of length r.

• Given y and the random string, Bob sends Alice a message of length ℓ.

• After that, Alice decides whether to accept or not.

• When the answer is yes, Merlin has exactly one advice z such that Alice always accepts.

• When the answer is no, or Merlin sends a wrong advice, Alice accepts with probability at most s.

### 2.4 Derandomization

We make use of expander graphs to reduce the amount of random coins needed in one of our communication protocols. We abstract the following result for our use here.

###### Theorem 2.7 (see e.g. Theorem 21.12 and Theorem 21.19 in [AB09]).

Let m be an integer and B ⊆ [m] be a set with |B| ≥ m/2. There is a universal constant c_1 such that for all ε > 0, there is a poly(log m, log ε^{−1})-time computable function F : {0,1}^{log m + c_1 ⋅ log ε^{−1}} → [m]^{O(log ε^{−1})}, such that

 Pr_{w ∈ {0,1}^{log m + c_1 ⋅ log ε^{−1}}}[a ∉ B for all a ∈ F(w)] ≤ ε,

here a ∈ F(w) means a is one of the elements in the sequence F(w).

## 3 Hardness of Approximate Max-IP

In this section we prove our characterizations of approximating Max-IP.

### 3.1 The Multiplicative Case

We begin with the proof of Theorem 1.5. We recap it here for convenience.

Reminder of Theorem 1.5 Letting d = ω(log n) and t = t(n, d) ≥ 2, the following holds:

1. There is an n^{2−Ω(1)}-time t-multiplicative-approximating algorithm for Max-IP_{n,d} if

 t = (d/log n)^{Ω(1)},

and under SETH (or OVC), there is no n^{2−Ω(1)}-time t-multiplicative-approximating algorithm for Max-IP_{n,d} if

 t = (d/log n)^{o(1)}.

2. Moreover, let ε := log t / log(d/log n). There are t-multiplicative-approximating deterministic algorithms for Max-IP_{n,d} running in time

 O(n^{2+o(1) − 0.31⋅1/(ε^{−1}+0.312)}) = O(n^{2+o(1)−Ω(ε)})

or in time

 O(n^{2 − 0.17⋅1/(ε^{−1}+0.172)} ⋅ polylog(n)) = O(n^{2−Ω(ε)} ⋅ polylog(n)).

In Lemma 3.2 we construct the desired approximation algorithm, and we prove the matching lower bound in the lemma that follows.

#### The Algorithm

First we need the following simple lemma, which says that the k-th root of the sum of the k-th powers of non-negative reals gives a good approximation to their maximum.

###### Lemma 3.1.

Let S be a set of non-negative real numbers, k be a positive integer, and x_max := max_{x∈S} x. We have

 (Σ_{x∈S} x^k)^{1/k} ∈ [x_max, x_max ⋅ |S|^{1/k}].
###### Proof.

Since

 Σ_{x∈S} x^k ∈ [x_max^k, |S| ⋅ x_max^k],

the lemma follows directly by taking the k-th root of both sides.
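A quick numerical check of Lemma 3.1 in Python (values chosen arbitrarily): as k grows, the k-th root of the power sum converges to the maximum from above, since the overestimation factor is at most |S|^{1/k}.

```python
def power_sum_max(S, k):
    # k-th root of the sum of k-th powers; by Lemma 3.1 this lies in
    # [x_max, x_max * |S|^(1/k)], so larger k gives a sharper estimate
    return sum(x ** k for x in S) ** (1.0 / k)

S = [3.0, 7.0, 2.0, 5.0]      # x_max = 7
for k in (2, 8, 32):
    est = power_sum_max(S, k)
    assert 7.0 <= est <= 7.0 * len(S) ** (1.0 / k)
    print(k, est)
```

This is exactly why the algorithm below can afford to compute a power sum (which is a low-degree polynomial, amenable to fast batch evaluation) instead of a maximum.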

###### Lemma 3.2.

Assuming d = n^{o(1)} and letting ε := log t / log(d/log n), there are t-multiplicative-approximating deterministic algorithms for Max-IP_{n,d} running in time

 O(n^{2+o(1) − 0.31⋅1/(ε^{−1}+0.312)}) = O(n^{2+o(1)−Ω(ε)})

or in time

 O(n^{2 − 0.17⋅1/(ε^{−1}+0.172)} ⋅ polylog(n)) = O(n^{2−Ω(ε)} ⋅ polylog(n)).
###### Proof.

Let ε := log t / log(d/log n). From the assumption, we have d = n^{o(1)} and t = (d/log n)^ε. When ε > 1, a t-multiplicative approximation only becomes easier, hence in the following we assume ε ≤ 1. We begin with the first algorithm here.

##### Construction and Analysis of the Power-of-Sum Polynomial P_r(z).

Let r be a parameter to be specified later. For a vector z from {0,1}^d, consider the following polynomial:

 P_r(z) := (Σ_{i=1}^{d} z_i)^r.

Observe that since each z_i takes values in {0,1}, we have z_i^k = z_i for all k ≥ 1. Therefore, by expanding out the polynomial and replacing each z_i^k with k ≥ 2 by z_i, we can write P_r as

 P_r(z) = Σ_{S⊆[d], |S|≤r} c_S ⋅ z_S,

in which z_S := ∏_{i∈S} z_i, and the c_S’s are the corresponding coefficients. Note that P_r has

 m := Σ_{k=0}^{r} C(d,k) ≤ (ed/r)^r

terms.
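The monomial-count bound above can be sanity-checked directly in Python (toy values d = 20, r = 3 chosen by us):

```python
from math import comb, e

# Count the monomials of the multilinearized P_r(z): one per subset of
# size at most r, i.e. m = sum_{k<=r} C(d, k), bounded by (e*d/r)^r.
d, r = 20, 3
m = sum(comb(d, k) for k in range(r + 1))
print(m)                      # 1351 = 1 + 20 + 190 + 1140
assert m <= (e * d / r) ** r  # the (ed/r)^r upper bound from the text
```

Keeping m small relative to n is what makes the fast-matrix-multiplication batch evaluation below efficient.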

Then consider P_r evaluated at the coordinate-wise product of x and y: plugging in z_i = x_i ⋅ y_i, it can be written as

 P_r(x,y) := Σ_{S⊆[d], |S|≤r} c_S ⋅ x_S ⋅ y_S,

where x_S := ∏_{i∈S} x_i, and y_S is defined similarly. Note that P_r(x,y) = (x ⋅ y)^r.

##### Construction and Analysis of the Batch Evaluation Polynomial P_r(X,Y).

Now, let X and Y be two sets of vectors from {0,1}^d. We define

 P_r(X,Y) := Σ_{x∈X, y∈Y} P_r(x,y).