1 Introduction
A matrix is rigid if it is far in Hamming distance from low-rank matrices; it is explicit if its entries are computable in polynomial time. A classic result of Valiant proves that explicit rigid matrices imply superlinear lower bounds for linear circuits [35], a major open problem in computational complexity [34, 37]. Implications of new lower bounds for communication complexity and other models are also known [24, 39]. Unfortunately, the current bounds for explicit matrices are very far from the required parameters [16, 33], and natural candidates (e.g., Fourier and Hadamard matrices) have been discovered to be less rigid than desired [3, 12, 14]. This motivates alternative avenues for constructing rigid matrices. Recently, multiple connections between data structures and circuits have arisen [7, 11, 13, 38]. The premise of these results is that hard problems for these models may shed new light on rigid matrices and circuits. We take a similar angle, studying a generic linear problem for a model that resembles a depth-two circuit with linear gates.
Valiant’s result concerns arithmetic circuits computing the linear map for a matrix . In other words, the circuit computes the inner products between and the rows of . We study a related data structure problem, the inner product problem. The task is to preprocess an bit vector to compute inner products over for queries , where is the query set. This problem generalizes the prefix-sum problem [17] and the vector-matrix-vector problem [8, 23].
We consider solving this problem using a restricted data structure model, the systematic linear model. This model may only store verbatim along with a small number of redundant bits, which are the evaluations of linear functions of . To compute for , the query algorithm must output a linear function of these bits along with any bits of , where is the query time. We motivate this model with a simple upper bound. Suppose that the query set happens to be close to an dimensional subspace . More precisely, assume that for any , where and denotes the Hamming distance. The systematic linear model will store bits that correspond to inner products between and some vectors that form a basis for . The query algorithm computes by invoking the identity , using any vector with . Indeed, the precomputed bits suffice to determine , and at most bits of are needed to calculate .
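The upper bound just described can be illustrated on toy instances. The following is a minimal sketch, assuming arithmetic over the two-element field, vectors represented as Python lists of bits, and a brute-force search for the subspace vector closest to the query; the names `preprocess` and `answer_query` are hypothetical and are not taken from the paper.

```python
from itertools import product

def dot(a, b):
    """Inner product over F_2."""
    return sum(x & y for x, y in zip(a, b)) % 2

def add(a, b):
    """Coordinate-wise sum over F_2 (XOR)."""
    return [x ^ y for x, y in zip(a, b)]

def preprocess(x, basis):
    """Redundant bits: one inner product <b_i, x> per basis vector."""
    return [dot(b, x) for b in basis]

def answer_query(q, x, basis, redundant):
    """Return (<q, x>, number of bits of x that were read)."""
    n = len(q)
    best = None
    # Brute force over coefficient vectors to find the span element closest to q.
    for coeffs in product([0, 1], repeat=len(basis)):
        u = [0] * n
        for c, b in zip(coeffs, basis):
            if c:
                u = add(u, b)
        diff = add(q, u)
        if best is None or sum(diff) < sum(best[1]):
            best = (coeffs, diff)
    coeffs, diff = best
    # <u, x> is a linear function of the redundant bits only.
    from_redundant = sum(c & r for c, r in zip(coeffs, redundant)) % 2
    # <q - u, x> needs only the coordinates of x where q and u differ.
    support = [i for i, d in enumerate(diff) if d]
    from_data = sum(x[i] for i in support) % 2
    return (from_redundant ^ from_data, len(support))
```

In this sketch the number of bits of the input that are read equals the Hamming distance from the query to the subspace, which is the quantity the upper bound charges as query time.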
We observe that rigidity exactly captures the complexity of the inner product problem in the above model. This connection uses a notion of rigid sets, defined by Alon, Panigrahy and Yekhanin [5]. Our result shows that an efficient algorithm exists in the above model if and only if the query set is not rigid in their sense. Conversely, it is possible to derive new rigidity lower bounds by proving lower bounds for the systematic linear model. A parameter of interest is the size of the rigid set, which corresponds to the number of queries in the inner product problem.
Dvir, Golovnev, and Weinstein also demonstrate a connection between rigidity and a different linear model, which is a restriction of the cell probe model [13]. This model stores linear functions, and the query algorithm outputs a linear function of of these bits. For the inner product problem with query set , they show that a lower bound for linear data structures leads to a semi-explicit rigid set. When , their result uses a time algorithm that requires access to an NP oracle. Compared to their work, our connection preserves explicitness and offers a two-way equivalence via the systematic linear model. In particular, when , a lower bound of in the systematic linear model implies that is rigid with better parameters than known results. Their work requires a lower bound of against the linear model, and the resulting set is not explicit. Our results also extend to show that linear data structure lower bounds lead to explicit rigid matrices. However, compared to the work of Dvir, Golovnev, and Weinstein, we require stronger lower bounds to achieve new rigidity parameters.
As an application of our framework, we provide new results for the vector-matrix-vector problem. The task is to preprocess a 0-1 matrix to compute when given vectors as the query. The Boolean semiring version of this problem has received much recent attention due to connections to the online matrix-vector multiplication conjecture [18]. Moreover, this problem has motivated the study of data structures for a superpolynomial number of queries, even when the output is binary [8, 9]. Other prior work has either studied binary output problems with queries (see e.g. [28, 30]) or achieved better lower bounds by looking at multi-output problems (see e.g. [10, 20]). In general, the vector-matrix-vector problem is a good testbed for proving better data structure lower bounds, because linear algebraic tools could provide new insights.
The variant of this problem specializes the inner product problem because equals the inner product of and (viewed as vectors). The query set consists of matrices with rank one; its size satisfies . As another contribution, we lower bound the rigidity of this set, and consequently, we obtain a query time lower bound of for the systematic linear model with redundancy . Any asymptotically better lower bounds for this problem (in the systematic linear model) would directly imply that this query set is rigid with better parameters than the currently known results for explicit matrices [4, 5].
As a final result, we prove a new cell probe lower bound for the vector-matrix-vector problem, without restrictions on the data structure. Our result improves the current best lower bound due to Chattopadhyay, Koucký, Loff, and Mukhopadhyay [9]. Our lower bound matches the limit of present techniques and achieves the current best time-space tradeoff in terms of query set size.
1.1 Rigid sets, systematic linear model, and the inner product partial function
Throughout, let and and denote positive integers, with . Alon, Panigrahy and Yekhanin defined the following notion of a rigid set [5].
Definition (Rigid Set).
A set is rigid if for every subspace with dimension at most , some vector has Hamming distance at least from all vectors in , that is, .
We define rigid for nonintegral to mean rigid. It will be convenient to equate a set with a matrix by arranging vectors in as rows in in any order. If is rigid and , then the corresponding matrix is rigid in the usual sense: for any rank matrix , some row in contains at least nonzero entries. Hence, we may refer to rigid sets and rigid rectangular matrices interchangeably. A matrix in (or a set of dimensional vectors) is explicit if every entry can be computed in time.
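For intuition, the distance appearing in the definition can be computed by brute force on small examples. The sketch below assumes vectors over the two-element field encoded as Python integers (bit i is coordinate i) and a subspace given by a list of generators. It evaluates a single fixed subspace only; certifying rigidity would additionally require ranging over every subspace of the allowed dimension, which this sketch does not attempt. All function names are illustrative.

```python
def span(generators):
    """All F_2-linear combinations of the generators (as int bitmasks)."""
    vectors = {0}
    for g in generators:
        vectors |= {v ^ g for v in vectors}
    return vectors

def distance_to_subspace(q, subspace):
    """Minimum Hamming distance from q to any vector in the subspace."""
    return min(bin(q ^ v).count("1") for v in subspace)

def max_distance(query_set, generators):
    """Largest distance from an element of query_set to span(generators)."""
    subspace = span(generators)
    return max(distance_to_subspace(q, subspace) for q in query_set)
```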
A random matrix with will be rigid with high probability for some constants . The key challenge here is to construct explicit rigid matrices, because they provide circuit lower bounds for functions that can be described in polynomial time [35]. Alon, Panigrahy and Yekhanin [5] followed by Alon and Cohen [4] exhibit multiple examples of explicit matrices that are rigid with
(1)
where and is a constant. Note that when , the current best bound is . For , this amounts to , exponentially far from the ideal bounds (i.e., matching random constructions). It is an important open problem to improve the dependence on in Eq. 1 and to find other candidate sets that may be rigid with better parameters.
Our connection between rigidity and data structures arises via the inner product problem. The task is to preprocess a vector to compute inner products. The queries are specified by , which is called the query set. The data structure must compute the inner product of and any , that is, where denotes the coordinate of .
Consider the following model for solving this problem, known as a systematic linear data structure. During preprocessing, the data structure stores along with the evaluations of linear functions , where these inner products are single bits, and denote vectors in . To compute the answer on query , the data structure accesses these bits in addition to any entries of . That is, the linear functions are fixed, and the bits from may depend on and the linear functions. Finally, the query algorithm must output a linear function of these bits and the entries of . In this fashion it must be able to correctly compute for all queries . We note that a result of Jukna and Schnitger [19] shows that the vectors do not depend on without loss of generality. Letting denote the minimum value of the best data structure for this problem (over worst-case ), we formalize the model as follows.
Definition (Systematic Linear Model).
Let be a set. Define to be the maximum over all of the minimum sufficient to compute the inner product for every when only allowed to output a linear function of precomputed linear functions of along with any bits of .
Note that the model does not charge the query time for accessing the precomputed bits, even if . This coincides with the systematic model studied by Chakraborty, Kamma and Larsen [8].
1.2 Equivalence between rigidity and data structures
We prove that the rigidity of a set corresponds to the time complexity in the systematic linear data structure model. Some aspects of this result are implicit in prior work [19, 31], but no previous work seems to show this exact correspondence.
Theorem 1.
A set is rigid if and only if .
Proof.
We first prove that implies that is rigid. Assume for contradiction that there is an dimensional subspace such that for all . Let be the input data. Store along with the bits , where form a basis for . For every , there exists such that has Hamming weight less than . Using the redundant bits, the algorithm on query can compute by writing in terms of the stored basis vectors. Then, it computes by accessing fewer than coordinates of . Since , we have that , which is a contradiction.
We now prove that if is rigid, then . Let denote the standard basis, and let be the query time. We show that . Consider a systematic linear data structure whose redundant bits are given by . Let denote the span of . As is rigid, there exists with . When is the query, assume that the query algorithm accesses the bits for indices to compute . Now, define to be the span of . Observe that all points in are at distance at most from . Thus, . We will show that , which implies that . We claim that if , then the query algorithm makes an error. Since , there exists a vector with . Moreover, this vector can be taken to be orthogonal to so that for every . In other words, for every we have . Hence, the query algorithm sees the same values on input data and because it only accesses the input via vectors in , and we have . Thus, the algorithm on query must err either on input or because . ∎
1.3 Relationship to the cell probe model and other models
The systematic linear model specializes the systematic model [8, 17]. The latter model still stores the input data verbatim, and it also stores bits that can be precomputed from , where these need not be linear functions of the input data. The query time is if the query algorithm reads at most bits from to compute a query. The output can also be an arbitrary function of these bits along with the precomputed bits. The systematic linear model only makes sense for linear queries, whereas the systematic model applies to arbitrary query functions.
Yao’s cell probe model is the most general data structure model [40]. On input data , the data structure stores cells, containing bits that are arbitrary functions of . Here, is the word size and is the space. The query time is if the algorithm accesses at most cells to answer any query about from a set of possible query functions. There is a rich collection of lower bounds for this model (see e.g. [2, 15, 20, 26, 27, 28, 29]). The best lower bounds known are of the form
(2) 
where is the number of queries and is a constant. It is a longstanding problem to prove that for any explicit problem, even in the linear space regime .
A special case of the cell probe model is the linear model [1, 13]. The latter model stores linear functions of (implicitly is fixed). The query time is if the query algorithm reads at most of these bits to compute a query. The output is restricted to be a linear function of these bits. A distinguishing aspect between linear and systematic linear is that in the latter model, the query algorithm is not charged for accessing the precomputed bits. In Section 2, we compare the linear and systematic linear models in the context of rigidity and previous work [13].
Equivalences all the way down.
We note that the systematic data structure model is identical to the common bits model defined by Valiant [36]. Corrigan-Gibbs and Kogan [11] demonstrate a relationship between the common bits model and a variant of the systematic model defined by Gal and Miltersen [17]. The common bits model is nothing but a certain depth-two circuit, and the systematic linear model is simply the common bits model with the restriction that the common bits and output gates are linear functions [31]. Hence, in the language of data structures, the linearization conjecture of Jukna and Schnitger posits that the systematic linear model is asymptotically as powerful as the systematic model for answering linear queries [19].
1.4 The vector-matrix-vector problem
We now define the vector-matrix-vector problem, which we call “the uMv problem” for short. Let be a perfect square. After preprocessing a matrix , the goal is to output the binary value for vectors . It will be convenient to consider a matrix as an bit vector by concatenating consecutive rows. More formally, let , and for , set , where and satisfy and . Then, . In this way we consider the uMv problem a special case of the inner product problem. The query set is the collection of rank one binary matrices. Let denote the set of vectors obtained from rank one binary matrices via , that is,
(3) 
This set has size .
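The reduction to the inner product problem can be checked numerically on small matrices. The sketch below assumes arithmetic over the two-element field and square matrices stored as lists of rows; `flatten` plays the role of the row-concatenation map described above, and all names are illustrative rather than the paper's notation.

```python
from itertools import product

def flatten(matrix):
    """Concatenate consecutive rows into one vector."""
    return [entry for row in matrix for entry in row]

def outer(u, v):
    """The matrix u v^T over F_2 (rank one when u and v are nonzero)."""
    return [[ui & vj for vj in v] for ui in u]

def umv(u, matrix, v):
    """Direct evaluation of u^T M v over F_2."""
    return sum(u[i] & matrix[i][j] & v[j]
               for i in range(len(u)) for j in range(len(v))) % 2

def inner_product(a, b):
    """Inner product over F_2."""
    return sum(x & y for x, y in zip(a, b)) % 2
```

For every pair of query vectors, `umv(u, M, v)` agrees with `inner_product(flatten(M), flatten(outer(u, v)))`, so each query corresponds to one flattened rank one matrix.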
A classic result of Arlazarov, Dinic, Kronrod and Faradzev [6] provides a data structure with space , word size , and time . In fact, this algorithm operates in the linear cell probe model. It is a central open question to determine whether is necessary in the linear space regime, that is, when .
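To see why a table-lookup approach of this kind fits the linear cell probe model, consider the following toy sketch. It is only in the spirit of the cited algorithm and makes no claim to match its parameters: it assumes a hypothetical block size `b` that divides the side length, stores one bit per block and per pair of block-restricted query vectors, and answers a query by XORing one stored bit per block. Each stored bit is a linear function of the matrix, and the output is a linear function of the bits read.

```python
from itertools import product

def build_tables(matrix, b):
    """For each b-by-b block, tabulate u^T B v over F_2 for every pair (u, v)."""
    side = len(matrix)
    assert side % b == 0, "sketch assumes b divides the side length"
    tables = {}
    for bi in range(0, side, b):
        for bj in range(0, side, b):
            table = {}
            for u in product([0, 1], repeat=b):
                for v in product([0, 1], repeat=b):
                    table[(u, v)] = sum(
                        u[i] & matrix[bi + i][bj + j] & v[j]
                        for i in range(b) for j in range(b)) % 2
            tables[(bi, bj)] = table
    return tables

def query(tables, u, v, b):
    """XOR one stored bit per block; returns (answer, number of lookups)."""
    answer, lookups = 0, 0
    for (bi, bj), table in tables.items():
        key = (tuple(u[bi:bi + b]), tuple(v[bj:bj + b]))
        answer ^= table[key]
        lookups += 1
    return answer, lookups
```

The number of lookups is the number of blocks, so larger blocks trade more space for fewer probes.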
The current best cell probe lower bound for the problem is due to Chattopadhyay, Koucký, Loff, and Mukhopadhyay [9]. Moreover, their lower bound holds for a randomized model with high error. For constants and , they prove that if for every matrix and every query , the query algorithm correctly computes with probability at least , then
(4) 
Better lower bounds for the problem are known in the systematic model. Chakraborty, Kamma, and Larsen [8] prove that and must satisfy as long as . In the case of , they prove that . As the systematic model subsumes the linear version of this model, combining their result with Theorem 1 implies that is rigid with
(5) 
1.5 New results on the rigidity of and the cell probe complexity of the problem
We lower bound the rigidity of , defined in Eq. 3. This also implies a lower bound in the systematic linear model. The proof is inspired by a result of Alon, Panigrahy, and Yekhanin [5].
Theorem 2.
Let . The set of rank one matrices is rigid with .
We improve the prior bound in Eq. 5 by an factor. For example, when , then , and when , then . Theorem 2 matches Eq. 1, the current best bound for explicit rigid sets. We do not know whether there is a subspace of linear dimension such that all elements of are at distance from (unlike for some set rigidity results, where the bounds are tight). As a corollary of Theorem 1, we immediately get that
In other words, we prove a lower bound for the problem in the systematic linear model that improves the prior bound by an factor. The proof of Theorem 2 appears in Section 3.
We also prove a general cell probe lower bound for the uMv problem in the high error regime. Our result improves the previous lower bound in Eq. 4. For example, in the linear space regime, when , we show that while the prior result gives only .
Theorem 3.
Let be a matrix. If a randomized data structure with space , word size , and time correctly computes queries for the problem with probability at least , then
where is a universal constant and .
The prior work utilizes a general lifting result for two-way communication complexity from parity decision trees [9]. To obtain the improved bound, we use a variant of the cell sampling technique [21, 27] combined with a reduction to a new lower bound on one-way communication (via discrepancy). The modifications over standard techniques are needed to handle the high error regime for a binary output problem. We note that a recent result of Larsen, Weinstein and Yu also uses one-way communication to prove lower bounds for binary output problems for dynamic data structures [22]. However, their method seems limited to only handling zero error query algorithms. The proof of Theorem 3 appears in Section 4. Specifically, see Lemma 14 in Section 4 for the variant of cell sampling and see Theorem 11 in Section 4.1 for the discrepancy argument.
2 Linear Data Structures and Rigidity
In this section, we relate linear data structures and rigidity. As linear data structures are a special case of the cell probe model, we may obtain rigidity lower bounds from strong enough static data structure lower bounds (when the queries are linear). We also compare with Dvir, Golovnev, and Weinstein, who exhibit a similar connection [13]. We first provide some notation.
Definition.
Let be a set. Define to be the maximum over all of the minimum sufficient to compute the inner product for every when the query algorithm’s output is a linear function of bits chosen from the precomputed linear functions of .
Table 1 provides a glimpse of our results on linear data structures along with a comparison to [13]. Recall that a set is explicit if each coordinate of an arbitrary element of the set can be computed in time. The prior work shows that sufficiently strong lower bounds against linear data structures will imply semi-explicit rigid sets. A bit more formally, consider a data structure query set of size for the inner product problem. They show the following: If for some constant , then there is a rigid set of size at most contained in , where . However, the set is only semi-explicit in that it is in – every element can be computed by a time algorithm with access to an NP oracle.
We now summarize a few differences between our work and [13]. Our result proves that polynomial lower bounds on the query time imply the existence of an explicit rigid set, in contrast to the semi-explicit sets obtained by [13]. On the other hand, explicitness comes with a cost; when , we need much stronger data structure lower bounds to produce explicit rigid sets. When , the algorithm of [13] takes time with access to an NP oracle to compute an element of the semi-explicit rigid set. For problems such as the uMv problem, this is superpolynomial time. The rest of this section concerns proving the following theorem, which implies all of our results in Table 1.
Theorem 4.
Let and let of size be an explicit query set. There exists a set with size at most , whose elements can be computed in time. Moreover, if , then is explicit and rigid.
Note that for every , we have that . Hence, a sufficiently strong lower bound on for any will imply a rigidity lower bound. The following corollary shows the consequence of Theorem 4 for specific values of .
Corollary 5.
Let and let of size be an explicit query set. There exists a set with size at most , whose elements can be computed in time. Moreover,

If , then is explicit and rigid.

If for some , then is explicit and rigid.
Corollary 5(a) explains the first and last rows in Table 1, and Corollary 5(b) explains the middle row. Using Corollary 5(a) applied to with , we obtain that a lower bound of would imply the existence of an explicit set of size that is rigid. We note that it is an open question to prove .
2.1 Proof of Theorem 4
We already know the equivalence between systematic linear data structures and rigidity (from Theorem 1). Therefore, it is sufficient to design a linear data structure from a systematic linear data structure to relate the former with rigidity.
Proposition 6.
Let be a query set. If , then .
Proof.
Let be the input data, and let be the redundant bits stored by the systematic linear data structure. We now describe a linear data structure for with space and query time . The data structure stores , where are the standard basis vectors. The query algorithm on first accesses and then simulates the query algorithm of the systematic linear data structure on . Since the systematic linear data structure accesses at most bits from , we can conclude that the query time is at most . ∎
We prove that if a set contained in a dimensional space is rigid, then there is another rigid set which is contained in a dimensional space.
Lemma 7.
Let be positive integers. If is rigid of size , then there is a set of size at most that is rigid. Moreover, if is explicit, then each element of can be computed in time.
Proof.
Let and define by
for each . Additionally, if is not an integer, then define
otherwise set . Define . We claim that is rigid. Indeed, for the sake of contradiction assume that there is a subspace in of dimension such that all points in are at a distance less than from . Consider the subspace and project it to the first coordinates. Call this subspace , which has dimension . Now, the distance of each point in from is less than , which is a contradiction.
Regarding the explicitness of , it is clear that all coordinates of an element of correspond to some coordinate of a specific element of . Since is explicit, we can infer that each element of can be computed in . ∎
Proof of Theorem 4.
Since and , Proposition 6 implies that . Therefore by Theorem 1, we can conclude that is rigid. Lemma 7 implies that there exists a set that is rigid and the size of is at most . Moreover, every element of can be computed in time . Since , we can conclude that is explicit. ∎
3 Rigidity Lower Bounds for the Set of Rank One Matrices
Proposition 8.
For integers ,

.

if , then .
We will need a useful property about the distance of a point from a subspace.
Lemma 9.
Let be a subspace. For ,
Proof.
Let be the points in closest to and respectively. Since , we have
Note that is the number of ones in , which is at most the sum of the number of ones in and . Therefore,
A simple counting argument establishes the existence of a point that is far away in Hamming distance from a collection of large sized sets.
Lemma 10.
Let be subsets of , each of size at most . If , then there is a vector such that the Hamming distance of from each is at least .
Proof.
For every , define For any , the number of vectors in at a distance less than from is at most , where the inequality follows from Proposition 8. Hence Since
there is a such that for every . ∎
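The counting argument can be made concrete on small instances. The sketch below is an illustration under stated assumptions: points are tuples of bits, the parameter `t` stands for the target distance, and the union bound is evaluated exactly with binomial coefficients rather than through the estimate of Proposition 8. The function names are hypothetical.

```python
from itertools import product
from math import comb

def ball_size(n, t):
    """Number of vectors at Hamming distance less than t from a fixed point."""
    return sum(comb(n, i) for i in range(t))

def union_bound_guarantees_far_vector(n, sets, t):
    """True if the Hamming balls around all set elements cannot cover {0,1}^n."""
    covered_at_most = sum(len(s) for s in sets) * ball_size(n, t)
    return covered_at_most < 2 ** n

def find_far_vector(n, sets, t):
    """Brute-force search for a vector at distance >= t from every set."""
    for y in product([0, 1], repeat=n):
        if all(sum(a != b for a, b in zip(y, s)) >= t
               for group in sets for s in group):
            return y
    return None
```

Whenever the union bound check succeeds, the brute-force search is guaranteed to return a vector; the converse need not hold, since the union bound overcounts overlapping balls.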
3.1 Proof of Theorem 2
Let be any dimensional subspace of , where is the smallest positive integer divisible by . We first define the inverse of . For every , define to be the matrix obtained by splitting into length consecutive blocks and stacking each of these blocks to form a matrix. Formally, is such that for every . Note that .
We provide a brief outline of the proof of Theorem 2. The first step of the proof is to produce a vector in that is at a distance of from and is low rank. The rank being low is helpful as we can express as the sum of a small number of rank one matrices. Lemma 9 will then imply the existence of a rank one matrix that is far away from . If we only cared about the existence of a vector that is far away from , Lemma 10 would suffice. To ensure that simultaneously the rank is small, we first project on to coordinates indexed by consecutive blocks each of length . Then we find a vector that is far away from all the projections, which is still guaranteed by Lemma 10. Concatenating with itself times has the property that its corresponding matrix is low rank.
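The last step of the outline, that concatenating a short vector with itself yields a low rank matrix, can be checked directly. The sketch below assumes, for simplicity, that the block length is a multiple of the matrix side length and divides the total length, so no zero padding is needed; the general case in the proof pads with zeros. The helper names are illustrative.

```python
def gf2_rank(rows):
    """Rank over F_2 of a matrix given as a list of 0/1 rows."""
    pivots = {}  # leading-bit position -> reduced row (as int)
    for row in rows:
        value = int("".join(map(str, row)), 2) if row else 0
        while value:
            lead = value.bit_length()
            if lead not in pivots:
                pivots[lead] = value
                break
            value ^= pivots[lead]
    return len(pivots)

def repeat_and_reshape(block, side):
    """Concatenate copies of `block` to length side*side, then split into rows."""
    assert (side * side) % len(block) == 0 and len(block) % side == 0
    copies = (side * side) // len(block)
    vector = block * copies
    return [vector[i * side:(i + 1) * side] for i in range(side)]
```

A block covering k rows produces a matrix whose rows repeat with period k, so its rank is at most k, and it can be written as a sum of at most k rank one matrices.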
Let . The goal is to find a such that and the rank of is at most . If , then define such that
for ; otherwise, define . By definition, the dimension of is at most , for every . Since and , we can infer that and . Lemma 10 implies the existence of a with the property that for every . Now define by
for all . In words, is the length vector that is the concatenation of copies of along with the vector of zeros of length . By the choice of , we get that,
Moreover, the rank of is at most Therefore we can express
for some . By Lemma 9, we know that
Hence there exists an such that . The observation that completes the proof of the theorem.
Remark (Extension to strong rigidity).
Alon and Cohen [4] defined the notion of strong rigidity; a set is strongly rigid if for every subspace of of dimension at most , the average distance of all the points to the subspace is at least . For strong rigidity, the best lower bounds known for explicit sets are also of the form given in Eq. 1. We can show that is strongly rigid with , matching the best strong rigidity bounds known for explicit sets. We sketch the proof here. We know that
where and are standard basis vectors in . This fact can be used to prove that the matrix corresponding to the set is a generator matrix of a 4-query locally decodable code that tolerates a constant fraction of errors. A result of [13, Theorem 6] shows that Theorem 2 and the locally decodable code property of imply the strong rigidity of .
4 Cell Probe Lower Bounds for the Problem
We know of two techniques for proving cell probe lower bounds matching Eq. 2. One is a technique of Pǎtraşcu and Thorup [30] who combined the communication complexity simulation of Miltersen [25] with multiple queries on the same input data. The other is the technique we use, which is based on cell sampling. Cell sampling typically requires one to work with large fields in order to handle errors. This large field size is needed to encode a large subset of the correctly computed queries using a small subset of cells. Here, we avoid encoding the subset of queries by a reduction to one-way communication complexity.
Proof outline for Theorem 3.
By Yao’s minimax principle, it suffices to prove a lower bound on deterministic data structures. The hard distribution on the input data and query we use is given by sampling uniformly and independently at random. We prove the theorem by contradiction, and we start by assuming that the query time is small. The proof is carried out in three steps. First, modify the data structure so that for every , the fraction of queries correctly computed is at least . This modification only increases the query time and space by , and it can only increase the overall probability of the query algorithm being correct. Second, for a given , we use a variant of cell sampling (see Lemma 14) to obtain a small subset of cells and a large subset of queries such that all queries in can be computed by only accessing cells in . Moreover,
Third, we show that can be used to design an efficient protocol for the following communication game: Alice’s input is and Bob’s input is , and the goal is for Bob to correctly compute on a sufficiently good fraction of the inputs after receiving a message from Alice.
We now describe the protocol (see Figure 1). Alice sends the locations and contents of . This ensures that Bob correctly computes on a large fraction of queries in . Alice also communicates the majority value of for so that Bob is correct on half of his possible inputs that are not in . Overall, Bob’s output is correct on a sufficiently good fraction of all . Since we have assumed that the query time is small, we are able to show that Alice’s communication is small. This contradicts a lower bound on the communication complexity of this game. More precisely, we prove the following lower bound.
Theorem 11.
Suppose that Alice gets a uniformly random matrix as input and Bob receives a uniform pair as input. If Alice sends a deterministic message to Bob and Bob computes such that
then Alice must communicate at least bits.
Previously, in the randomized two-way communication setting, Chattopadhyay, Koucký, Loff, and Mukhopadhyay [9] proved a lower bound for the game given in Theorem 11. Their lower bound implies the lower bound in Theorem 11 against randomized protocols. We need a lower bound against deterministic protocols under the uniform distribution on the inputs, and we cannot use their theorem as a black box. We provide a straightforward proof of Theorem 11 in Section 4.1 by using the discrepancy method on a related communication game (resembling a direct sum, where Bob receives multiple inputs).
Preliminaries.
Before presenting the proof of Theorem 3, we define some notation. For a real valued function with a finite domain , . Similarly, for , . An argument in the proof of Theorem 3 requires an upper bound on the number of bits to encode the contents and locations of a subset of the cells, which is given by the following proposition.
Proposition 12.
Let be a subset of the cells of a data structure with word length and size . Then, the contents and locations of can be encoded in