1 Introduction
In this work, we consider the streaming approximability of various Boolean constraint satisfaction problems, and we begin by defining these terms. See [CGSV21boolean, §1.12] for more details on the definitions.
1.1 Setup: The streaming approximability of Boolean CSPs
Boolean CSPs.
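Let $f:\{-1,1\}^k\to\{0,1\}$ be a Boolean function. In an $n$-variable instance of the problem Max-CSP($f$), a constraint is a pair $C=(j,b)$, where $j=(j_1,\ldots,j_k)\in[n]^k$ is a $k$-tuple of distinct indices, and $b=(b_1,\ldots,b_k)\in\{-1,1\}^k$ is a negation pattern.
For Boolean vectors $a,b\in\{-1,1\}^k$, let $a\odot b$ denote their coordinatewise product $(a_1b_1,\ldots,a_kb_k)$. An assignment $x\in\{-1,1\}^n$ satisfies $C$ iff $f(b\odot x|_j)=1$, where $x|_j$ is the $k$-tuple $(x_{j_1},\ldots,x_{j_k})$ (i.e., $x$ satisfies $C$ iff $f(b_1x_{j_1},\ldots,b_kx_{j_k})=1$). An instance $\Psi$ of Max-CSP($f$) consists of a list of $m$ constraints; the value $\mathrm{val}_\Psi(x)$ of an assignment $x$ to $\Psi$ is the fraction of constraints in $\Psi$ satisfied by $x$; and the value $\mathrm{val}_\Psi$ of an instance $\Psi$ is the maximum value of any assignment $x\in\{-1,1\}^n$.
Approximations to CSPs.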
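For $\alpha\in[0,1]$, we consider the problem of $\alpha$-approximating Max-CSP($f$). In this problem, the goal of an algorithm $\mathcal{A}$ is to, on input an instance $\Psi$, output an estimate $\mathcal{A}(\Psi)$ such that with probability at least $2/3$, $\alpha\cdot\mathrm{val}_\Psi\le\mathcal{A}(\Psi)\le\mathrm{val}_\Psi$. For $0\le\beta<\gamma\le1$, we also consider the closely related $(\gamma,\beta)$-approximation to Max-CSP($f$) problem (denoted $(\gamma,\beta)$-Max-CSP($f$) for short). In this problem, the algorithm’s input instance $\Psi$ is promised to either have value $\mathrm{val}_\Psi\ge\gamma$ or value $\mathrm{val}_\Psi\le\beta$, and the goal is to decide which is the case with probability at least $2/3$. By standard arguments, the optimal approximation ratio for Max-CSP($f$) equals the infimum of $\beta/\gamma$ over all pairs $(\gamma,\beta)$ such that $(\gamma,\beta)$-Max-CSP($f$) cannot be solved (see [CGSV21boolean, Proposition 2.10] for details).
Streaming and sketching algorithms for CSPs.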
For various Boolean functions $f$, we consider algorithms which attempt to approximate Max-CSP($f$) instances in the (single-pass, insertion-only) $s(n)$-space streaming setting. Such algorithms can only use $s(n)$ space (which is ideally small, such as $O(\log n)$, where $n$ is the number of variables in an input instance), and, when given as input a CSP instance $\Psi$, can only read the list of constraints in a single, left-to-right pass.
We also consider a (seemingly) weak class of streaming algorithms for CSPs called $s(n)$-space sketching algorithms, which are composable in the following sense. After seeing each constraint in the stream, the algorithm’s state (a string in $\{0,1\}^{s(n)}$) is a sketch. If $\mathcal{C}$ is the set of possible stream elements, the algorithm must provide a compression function $\mathrm{COMP}:\mathcal{C}\to\{0,1\}^{s(n)}$ encoding stream elements to “sketches” and a combination function $\mathrm{COMB}:\{0,1\}^{s(n)}\times\{0,1\}^{s(n)}\to\{0,1\}^{s(n)}$ for combining pairs of sketches such that:

1. Given current state $Z$ and new stream element $\sigma$, the new state of the streaming algorithm is $\mathrm{COMB}(Z,\mathrm{COMP}(\sigma))$.

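2. For two streams $\boldsymbol\sigma_1,\boldsymbol\sigma_2$, we have $\mathrm{COMP}(\boldsymbol\sigma_1\circ\boldsymbol\sigma_2)=\mathrm{COMB}(\mathrm{COMP}(\boldsymbol\sigma_1),\mathrm{COMP}(\boldsymbol\sigma_2))$ (where $\circ$ denotes concatenation, and $\mathrm{COMP}$ of a stream denotes the algorithm’s state after reading it).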
Requirement (2) represents the key feature of sketching algorithms: $\mathrm{COMB}$ can be used to “compose” the results of the streaming algorithm run on $\boldsymbol\sigma_1$ and $\boldsymbol\sigma_2$ separately. ($\mathrm{COMP}$ and $\mathrm{COMB}$ can be designed jointly using underlying shared randomness.) A special case of sketching algorithms are the linear sketches, where each sketch (i.e., element of $\{0,1\}^{s(n)}$) encodes an element of a vector space and $\mathrm{COMB}$ performs vector addition.
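To make requirements (1) and (2) concrete, here is a minimal Python sketch of a linear sketch for Max-CSP streams. It is an illustration under stated assumptions, not the construction of [CGSV21boolean]: variables are 0-indexed, constraints are pairs `(j, b)` with `b` in $\{-1,1\}^k$, and the state is an integer vector (we ignore its encoding as a string in $\{0,1\}^{s(n)}$). The hypothetical `comp` multiplies a constraint’s signed-count vector by a shared random $\pm1$ matrix, and `comb` is vector addition, so requirement (2) holds by linearity.

```python
import random
from typing import List, Sequence, Tuple

# A constraint is (j, b): j = tuple of distinct 0-based variable indices,
# b = negation pattern with entries in {-1, 1} (same length as j).
Constraint = Tuple[Tuple[int, ...], Tuple[int, ...]]


def make_shared_matrix(n: int, rows: int, seed: int) -> List[List[int]]:
    """Random +/-1 matrix A (rows x n) playing the role of shared randomness."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def comp(A: List[List[int]], constraint: Constraint) -> List[int]:
    """COMP: sketch of one constraint, i.e. A times its signed-count vector."""
    j, b = constraint
    return [sum(row[i] * bi for i, bi in zip(j, b)) for row in A]


def comb(s1: Sequence[int], s2: Sequence[int]) -> List[int]:
    """COMB: for a linear sketch, combining two sketches is vector addition."""
    return [x + y for x, y in zip(s1, s2)]


def run_stream(A: List[List[int]], stream: Sequence[Constraint]) -> List[int]:
    """Streaming algorithm: state <- COMB(state, COMP(sigma)) for each element."""
    state = [0] * len(A)  # sketch of the empty stream
    for sigma in stream:
        state = comb(state, comp(A, sigma))
    return state


A = make_shared_matrix(n=5, rows=8, seed=1)
s1 = [((0, 1), (1, -1)), ((2, 3), (1, 1))]
s2 = [((1, 4), (-1, -1))]
# Requirement (2): the sketch of a concatenation is the COMB of the two sketches.
assert run_stream(A, s1 + s2) == comb(run_stream(A, s1), run_stream(A, s2))
```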
This paper.
The main goal of this paper is to explicitly determine closed-form expressions for the optimal sketching approximation ratios $\alpha(f)$ for various Max-CSPs of interest such as Max-$k$AND.
1.2 Prior work and motivations
1.2.1 Prior results on streaming and sketching
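We first give a brief review of what is already known about the streaming and sketching approximability of Max-CSP($f$). For $f:\{-1,1\}^k\to\{0,1\}$, denote $\rho(f)=|f^{-1}(1)|/2^k$. Note that the Max-CSP($f$) problem has a trivial $\rho(f)$-approximation given by simply outputting $\rho(f)$ (a uniformly random assignment satisfies each constraint with probability $\rho(f)$, so $\rho(f)\le\mathrm{val}_\Psi\le1$). We refer to a function $f$ as approximation-resistant for some class of algorithms (e.g., streaming or sketching algorithms with some space bound) if it cannot be $(\rho(f)+\epsilon)$-approximated for any constant $\epsilon>0$ (equivalently, for every constant $\epsilon>0$, the algorithms cannot solve $(\gamma,\beta)$-Max-CSP($f$) for some pair with $\beta/\gamma\le\rho(f)+\epsilon$). Otherwise, we refer to $f$ as approximable for the class of algorithms.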
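The definitions of $\mathrm{val}_\Psi$ and $\rho(f)$, and the fact behind the trivial $\rho(f)$-approximation, can be checked by brute force on tiny instances. A minimal Python sketch, assuming the $\{-1,1\}$ encoding of Section 1.1, 0-based variable indices, and Boolean functions given as Python callables returning 0 or 1; the exponential-time enumeration is for illustration only.

```python
from fractions import Fraction
from itertools import product


def satisfies(f, constraint, x):
    """1 iff assignment x satisfies constraint (j, b), i.e. f(b ⊙ x|_j) = 1."""
    j, b = constraint
    return f(tuple(bi * x[i] for i, bi in zip(j, b)))


def value(f, instance, x):
    """val_Psi(x): fraction of constraints satisfied by x."""
    return Fraction(sum(satisfies(f, c, x) for c in instance), len(instance))


def max_value(f, instance, n):
    """val_Psi by exhaustive search over {-1,1}^n (tiny n only)."""
    return max(value(f, instance, x) for x in product((-1, 1), repeat=n))


def rho(f, k):
    """rho(f) = |f^{-1}(1)| / 2^k."""
    return Fraction(sum(f(a) for a in product((-1, 1), repeat=k)), 2**k)


def expected_random_value(f, instance, n):
    """Average of val_Psi(x) over all x: the value of a uniform random assignment."""
    return sum(value(f, instance, x) for x in product((-1, 1), repeat=n)) / 2**n


two_and = lambda a: int(all(v == 1 for v in a))
psi = [((0, 1), (1, 1)), ((1, 2), (1, -1)), ((0, 2), (-1, -1))]
print(rho(two_and, 2), expected_random_value(two_and, psi, 3), max_value(two_and, psi, 3))
# prints: 1/4 1/4 2/3
```

Since every constraint is satisfied with probability exactly $\rho(f)$ by a uniform assignment, $\mathrm{val}_\Psi\ge\rho(f)$ always holds, which is why outputting $\rho(f)$ is a valid $\rho(f)$-approximation.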
The first two CSPs whose $o(\sqrt n)$-space streaming approximabilities were resolved were Max-2XOR and Max-2AND. Kapralov, Khanna, and Sudan [KKS15] and Kogan and Krauthgamer [KK15] concurrently showed that Max-2XOR is approximation-resistant to $o(\sqrt n)$-space streaming algorithms. Later, Chou, Golovnev, and Velusamy [CGV20], building on earlier work of Guruswami, Velingker, and Velusamy [GVV17], gave an $O(\log n)$-space linear sketch which $(4/9-\epsilon)$-approximates Max-2AND for every $\epsilon>0$ and showed that $(4/9+\epsilon)$-approximations require $\Omega(\sqrt n)$ space, even for streaming algorithms.
In two recent works [CGSV21boolean, CGSV21finite], Chou, Golovnev, Sudan, and Velusamy proved so-called dichotomy theorems for sketching CSPs. [CGSV21boolean] dealt with CSPs over Boolean alphabets with negations, while [CGSV21finite] dealt with the more general case of CSPs over finite alphabets.¹
¹More precisely, [CGSV21boolean] and [CGSV21finite] both consider the more general case of CSPs defined by families of functions of a specific arity. We do not need this generality for the purposes of our paper, and therefore omit it.
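[CGSV21boolean] is most relevant for our purposes, as it concerns Boolean CSPs. For a fixed constraint function $f:\{-1,1\}^k\to\{0,1\}$, [CGSV21boolean]’s main result is a dichotomy theorem in the following sense: For any $0\le\beta<\gamma\le1$, either

1. $(\gamma,\beta)$-Max-CSP($f$) has an $O(\mathrm{polylog}\,n)$-space linear sketching algorithm, or

2. For all $\epsilon>0$, sketching algorithms for $(\gamma-\epsilon,\beta+\epsilon)$-Max-CSP($f$) require $\Omega(\sqrt n)$ space.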
We will defer stating the technical condition which distinguishes cases (1) and (2) until Section 2.1 (see also the discussion in Section 1.4.1), but do mention that [CGSV21boolean] extends the lower bound (case 2) to streaming algorithms when special objects called padded one-wise pairs exist (whose definition we also defer). The padded one-wise pair case is sufficient to recover all previous streaming approximability results for Boolean functions (i.e., [KK15, KKS15, CGV20]), and prove several new ones. In particular, [CGSV21boolean] proves that if $f$ has the property that there exists a distribution supported on $f^{-1}(1)$ whose marginals are all zero (i.e., each coordinate is equally likely to be $1$ or $-1$), which they term “supporting one-wise independence”, then $f$ is streaming approximation-resistant. [CGSV21finite] uses analogous tools to recover streaming approximation-resistance of Max-UniqueGames (proven earlier by [GT19]) and prove approximation-resistance of several new problems.² However, neither [CGSV21boolean] nor [CGSV21finite] explicitly analyze any new approximable problems, since Max-2AND’s approximability had already been established by [GVV17, CGV20].
²Indeed, Chou, Golovnev, Sudan, Velingker, and Velusamy [CGS+21] recently proved that some of these “nice” functions are streaming approximation-resistant even in $o(n)$ space, building on Kapralov and Krachun’s work [KK19] for Max-2XOR.
1.2.2 Questions from previous work
In this work, we address several major questions which [CGSV21boolean] leaves unanswered:

Can we use [CGSV21boolean]’s dichotomy theorem to find closed-form sketching approximability ratios $\alpha(f)$ for approximable problems beyond Max-2AND?

[CGSV21boolean] implies the following “trivial upper bound” on streaming approximability: for all $f$, $\alpha(f)\le2\rho(f)$, as observed later in [CGS+21, §1.3]. How tight is this upper bound?

Does [CGSV21boolean]’s streaming lower bound (i.e., the “padded one-wise pair” criterion) suffice to resolve the streaming approximability of every function?

[CGSV21boolean, Proposition 2.10] gives an $(\alpha(f)-\epsilon)$-approximation algorithm for Max-CSP($f$), where $\alpha(f)$ is the infimum of $\beta/\gamma$ over pairs $(\gamma,\beta)$ such that the $(\gamma,\beta)$-Max-CSP($f$) distinguishing problem is hard to sketch. However, this approximation algorithm requires running a “grid” of distinguishers for many distinguishing problems in parallel. Can we give simpler and more useful approximation algorithms?
1.3 Results
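We study the questions in Section 1.2.2 through the lens of CSPs defined by symmetric Boolean functions. A set $S\subseteq\{0,1,\ldots,k\}$ defines a symmetric function $\mathrm{Sym}_S:\{-1,1\}^k\to\{0,1\}$, which on input $b$ is the indicator for $|b|\in S$ (where $|b|$ is $b$’s Hamming weight, i.e., its number of $1$’s). The simplest symmetric functions are $k\mathrm{AND}=\mathrm{Sym}_{\{k\}}$ and the threshold functions $\mathrm{Th}^t_k=\mathrm{Sym}_{\{t,\ldots,k\}}$.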
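The symmetric functions above are easy to write down directly. A minimal Python sketch, assuming inputs in $\{-1,1\}^k$ and Hamming weight counted as the number of $1$ entries; the helper names are illustrative. Counting inputs level by level gives $\rho(\mathrm{Sym}_S)=\sum_{s\in S}\binom{k}{s}/2^k$, e.g. $\rho(k\mathrm{AND})=2^{-k}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def hamming_weight(a):
    """Number of coordinates equal to 1 (inputs are in {-1,1}^k)."""
    return sum(1 for v in a if v == 1)


def sym(S):
    """Sym_S: indicator that the Hamming weight of the input lies in S."""
    S = frozenset(S)
    return lambda a: int(hamming_weight(a) in S)


def k_and(k):
    return sym({k})               # kAND = Sym_{k}


def threshold(t, k):
    return sym(range(t, k + 1))   # Th^t_k = Sym_{t,...,k}


def rho_sym(S, k):
    """rho(Sym_S) = sum_{s in S} C(k, s) / 2^k."""
    return Fraction(sum(comb(k, s) for s in S), 2**k)


# Brute-force cross-check of rho_sym for Th^2_3 (inputs of weight 2 or 3).
brute = Fraction(sum(threshold(2, 3)(a) for a in product((-1, 1), repeat=3)), 2**3)
assert brute == rho_sym(range(2, 4), 3)
```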
The sketching approximability of Max-$k$AND.
[CGV20] showed that $\alpha(2\mathrm{AND})=4/9$ (which holds even for streaming algorithms), but for $k\ge3$, nothing was known prior to this work.
We give a closed-form resolution of the sketching approximability of Max-$k$AND for every $k\ge2$. For odd $k$, define the constant
$$\alpha'_k=\left(\frac{(k-1)(k+1)}{4k^2}\right)^{(k-1)/2}.$$
In Section 4, we prove the following theorem:
Theorem 1.1.
For odd $k\ge3$, $\alpha(k\mathrm{AND})=\alpha'_k$, and for even $k\ge2$, $\alpha(k\mathrm{AND})=2\alpha'_{k+1}$.
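For instance, $\alpha(3\mathrm{AND})=\alpha'_3=2/9\approx0.2222$ (and $\alpha(2\mathrm{AND})=2\alpha'_3=4/9$, recovering [CGV20]). Theorem 1.1 also has the following corollary, recalling that $\rho(k\mathrm{AND})=2^{-k}$: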
Corollary 1.2.
$$\lim_{k\to\infty}\frac{\alpha(k\mathrm{AND})}{2\rho(k\mathrm{AND})}=1.$$
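The closed forms in Theorem 1.1 can be evaluated exactly with rational arithmetic. A minimal Python sketch (not the authors’ Mathematica code), assuming the constant $\alpha'_k=((k-1)(k+1)/(4k^2))^{(k-1)/2}$ for odd $k$; it recovers $\alpha(2\mathrm{AND})=4/9$ and $\alpha(3\mathrm{AND})=2/9$ and shows the ratio in Corollary 1.2 approaching $1$.

```python
from fractions import Fraction


def alpha_prime(k: int) -> Fraction:
    """alpha'_k = ((k-1)(k+1) / (4k^2))^((k-1)/2), defined for odd k."""
    if k % 2 == 0:
        raise ValueError("alpha'_k is defined for odd k")
    return Fraction((k - 1) * (k + 1), 4 * k * k) ** ((k - 1) // 2)


def alpha_kand(k: int) -> Fraction:
    """alpha(kAND) as stated in Theorem 1.1."""
    return alpha_prime(k) if k % 2 == 1 else 2 * alpha_prime(k + 1)


for k in (2, 3, 4, 5, 10, 100):
    # Corollary 1.2: alpha(kAND) / (2 rho(kAND)) -> 1, where rho(kAND) = 2^-k.
    ratio = alpha_kand(k) / Fraction(2, 2**k)
    print(k, float(alpha_kand(k)), float(ratio))
```

For even $k$ the ratio simplifies to $(1-1/(k+1)^2)^{k/2}$, which makes the convergence to $1$ in Corollary 1.2 visible directly.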
Interestingly, [CGSV21boolean]’s trivial upper bound shows that $\alpha(k\mathrm{AND})\le2\rho(k\mathrm{AND})=2^{-(k-1)}$; [CGS+21] later extended this hardness to the $o(n)$-space streaming setting (which is optimal up to logarithmic factors). Hence Corollary 1.2 implies that $k$AND is an “asymptotically optimally streaming-approximable” function and [CGSV21boolean]’s trivial upper bound is tight for a function family of interest.
The sketching approximability of other symmetric functions.
In Section 5, we resolve the sketching approximability of a number of other symmetric Boolean functions. Specifically, in Section 5.1, we resolve the approximability of the functions $\mathrm{Th}^{k-1}_k$ for even $k$:
Theorem 1.3.
For even , .
We also provide partial results for $\mathrm{Th}^{k-1}_k$ where $k$ is odd in Section 5.2, including closed forms for small $k$ and an asymptotic result:
Theorem 1.4 (Informal version of Theorem 5.9).
For odd , is the root of a quadratic in .
Corollary 1.5.
For odd , the limit of is as .
Finally, in Section 5.3, we explicitly resolve fifteen other cases (e.g., and ).
Simple approximation algorithms for threshold functions.
[CGV20]’s optimal $(4/9-\epsilon)$-approximation for Max-2AND, like [GVV17]’s earlier approximation, is based on measuring a quantity called the bias of an instance $\Psi$, denoted $\mathrm{bias}(\Psi)$, which is defined as follows: The bias of variable $i$, denoted $\mathrm{bias}_\Psi(i)$, is the absolute value of the difference between the number of positive and negative appearances of $x_i$ in $\Psi$, and $\mathrm{bias}(\Psi)=\frac{1}{mk}\sum_{i=1}^n\mathrm{bias}_\Psi(i)$.³ In the sketching setting, $\mathrm{bias}(\Psi)$ can be estimated using standard $\ell_1$-norm sketching algorithms [Ind06, KNW10] (see Theorem 2.10 below).
³[GVV17, CGV20] did not normalize by $mk$.
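Up to normalization, $\mathrm{bias}(\Psi)$ is the $\ell_1$ norm of the vector of signed appearance counts, so any linear $\ell_1$ sketch can maintain it. The following minimal Python sketch, assuming normalization by $mk$, uses Indyk-style Cauchy (1-stable) projections with a median estimator. This is our illustrative choice, not necessarily the exact sketch of Theorem 2.10, and it stores the random matrix explicitly for simplicity, whereas a space-efficient version would derive its entries pseudorandomly from a short seed.

```python
import math
import random
from statistics import median


def signed_counts(instance, n):
    """c[i] = (# positive appearances of x_i) - (# negative appearances of x_i)."""
    c = [0] * n
    for j, b in instance:
        for i, bi in zip(j, b):
            c[i] += bi
    return c


def bias(instance, n, k):
    """bias(Psi) = (1 / (m k)) * sum_i |c[i]|, computed exactly (not streaming)."""
    return sum(abs(v) for v in signed_counts(instance, n)) / (len(instance) * k)


class CauchyL1Sketch:
    """Linear sketch y = C c with i.i.d. standard Cauchy entries in C.

    Each y_r is distributed as ||c||_1 times a standard Cauchy variable, and the
    median of |Cauchy| is 1, so median_r |y_r| estimates ||c||_1.
    """

    def __init__(self, n, rows, seed=0):
        rng = random.Random(seed)
        # Standard Cauchy samples via the inverse CDF tan(pi * (U - 1/2)).
        self.C = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
                  for _ in range(rows)]
        self.y = [0.0] * rows
        self.m = 0  # number of constraints seen

    def update(self, constraint):
        j, b = constraint
        self.m += 1
        for r, row in enumerate(self.C):
            self.y[r] += sum(row[i] * bi for i, bi in zip(j, b))

    def estimate_bias(self, k):
        return median(abs(v) for v in self.y) / (self.m * k)


rng = random.Random(7)
n, k, m = 200, 3, 500
psi = [(tuple(rng.sample(range(n), k)), tuple(rng.choice((-1, 1)) for _ in range(k)))
       for _ in range(m)]
sketch = CauchyL1Sketch(n, rows=1001, seed=3)
for constraint in psi:
    sketch.update(constraint)
print(bias(psi, n, k), sketch.estimate_bias(k))
```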
In Section 3, we show that when $f$ is a threshold function, Max-CSP($f$) has a very simple bias-based approximation:
Theorem 1.6.
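Let $f$ be a threshold function. Then for every $\epsilon>0$, there exists a piecewise linear function $g:[0,1]\to[0,1]$ and a constant $\epsilon'>0$ such that the following is a sketching $(\alpha(f)-\epsilon)$-approximation for Max-CSP($f$): On input $\Psi$, compute an estimate $\hat b$ for $\mathrm{bias}(\Psi)$ up to a multiplicative $(1\pm\epsilon')$ error and output $g(\hat b)$.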
Our construction generalizes [CGV20]’s analysis for Max-2AND to all threshold functions, and is also a simplification, since [CGV20]’s algorithm computes a more complicated function of $\mathrm{bias}(\Psi)$ (see 3.3 below).
For all CSPs we study in this paper (Sections 4 and 5.3), we apply an analytical technique which we term the “max-min method”; see the discussion in Section 1.4 below. For these functions, our algorithm has interesting implications beyond the sketching setting, for the classical problem of outputting an approximately optimal assignment (instead of simply deciding whether one exists). Indeed, we describe a simple linear-time algorithm for this problem achieving the same approximation factor as our sketching algorithm:
Corollary 1.7 (Informal version of Corollary 3.4).
Let $f$ be a function for which the max-min method applies, such as $k$AND (for any $k$) or $\mathrm{Th}^{k-1}_k$ (for any even $k$). Then there exists a constant $p^*\in[0,1]$ such that the following algorithm, on input $\Psi$, outputs an assignment which is approximately optimal in expectation: Assign every variable to $1$ if it occurs more often positively than negatively, and $-1$ otherwise, and then flip each variable’s assignment independently with probability $1-p^*$.
This algorithm can be derandomized using universal hash families (following the recent argument of Biswas and Raman [BR21] for [CGV20]’s algorithm).
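A minimal Python sketch of the rounding procedure in Corollary 1.7, assuming the same 0-indexed `(j, b)` constraint format as above. The flip probability is left as a parameter `flip_prob` (a hypothetical name), because its value is the constant from the corollary, which depends on $f$ and is not computed here. Ties (a variable appearing equally often positively and negatively) are assigned $-1$, following the corollary’s wording, and the whole procedure runs in time linear in the instance size.

```python
import random
from typing import List, Optional, Sequence, Tuple

Constraint = Tuple[Tuple[int, ...], Tuple[int, ...]]


def bias_rounding(instance: Sequence[Constraint], n: int, flip_prob: float,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Majority-sign assignment, then independent flips with probability flip_prob."""
    rng = rng or random.Random()
    counts = [0] * n                 # (#positive - #negative) appearances of x_i
    for j, b in instance:
        for i, bi in zip(j, b):
            counts[i] += bi
    x = [1 if c > 0 else -1 for c in counts]
    return [-xi if rng.random() < flip_prob else xi for xi in x]


psi = [((0, 1), (1, 1)), ((0, 2), (1, -1)), ((1, 2), (-1, -1))]
print(bias_rounding(psi, 3, flip_prob=0.0))  # [1, -1, -1]
```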
Sketching vs. streaming approximability.
In Section 6, we show that [CGSV21boolean]’s techniques cannot resolve the streaming approximability of Max-3AND. That is, while Theorem 1.1 implies $\alpha(3\mathrm{AND})=2/9$, [CGSV21boolean]’s techniques cannot show that streaming algorithms fail to beat this ratio. However, these techniques do give an almost-tight bound:
Theorem 1.8 (Informal version of Theorem 6.1 + 6.5).
[CGSV21boolean]’s padded one-wise pair criterion is not strong enough to rule out $o(\sqrt n)$-space streaming $(2/9+\epsilon)$-approximations for Max-3AND for every $\epsilon>0$; however, it does rule out approximations whose ratio is only slightly larger than $2/9$.
Separately, Theorem 1.3 implies that , and the padded one-wise pair criterion can be used to show that approximating requires space in the streaming setting (see 5.5 below).
1.4 Techniques: the max-min method
Next, we give more background on the technical aspects of [CGSV21boolean]’s dichotomy theorem and the novel aspects of our analysis which allow us to obtain closed-form expressions for $\alpha(f)$ for various functions of interest.
1.4.1 Background from [CGSV21boolean]
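Fix a constraint function $f:\{-1,1\}^k\to\{0,1\}$ and let $\Delta(\{-1,1\}^k)$ denote the set of all distributions on the set $\{-1,1\}^k$. An element $D\in\Delta(\{-1,1\}^k)$ can be viewed as a weighted instance of Max-CSP($f$) on $k$ variables (where the constraint with negation pattern $b$, applied to the variables $(x_1,\ldots,x_k)$, has weight $D(b)$). For a distribution $D$, let $\mu(D)=\mathbb{E}_{b\sim D}[b]\in[-1,1]^k$ denote $D$’s vector of marginals.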
Morally, [CGSV21boolean]’s dichotomy theorem states that “all sketching algorithms can do is measure the marginals of their input instances, and to design algorithms it suffices to reduce to the $k$-variable case.” Indeed, for any pair $0\le\beta<\gamma\le1$, [CGSV21boolean] defines a set $S^Y_\gamma(f)$ of weighted “satisfiable” instances and a set $S^N_\beta(f)$ of “unsatisfiable” instances. The [CGSV21boolean] dichotomy theorem then states that if there exist $D_Y\in S^Y_\gamma(f)$ and $D_N\in S^N_\beta(f)$ such that $\mu(D_Y)=\mu(D_N)$, then for every $\epsilon>0$, $(\gamma-\epsilon,\beta+\epsilon)$-Max-CSP($f$) cannot be solved by $o(\sqrt n)$-space sketching algorithms, and otherwise, $(\gamma,\beta)$-Max-CSP($f$) can be solved by $O(\mathrm{polylog}\,n)$-space linear sketching algorithms. (See Definition 2.1 below for the definitions of $S^Y_\gamma(f)$ and $S^N_\beta(f)$, and Theorem 2.2 for the formal statement of the dichotomy theorem.) Thus, [CGSV21boolean]’s dichotomy theorem implies that Max-CSP($f$) can be $(\alpha(f)-\epsilon)$-approximated by $O(\mathrm{polylog}\,n)$-space linear sketches, but not $(\alpha(f)+\epsilon)$-approximated by $o(\sqrt n)$-space sketches, where
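$$\alpha(f)=\inf_{\substack{D_N,D_Y\in\Delta(\{-1,1\}^k)\\ \mu(D_N)=\mu(D_Y)}}\frac{\sup_{p\in[0,1]}\lambda_f(D_N,p)}{\lambda_f(D_Y,1)},\tag{1.9}$$
with $\lambda_f$ as defined in Section 2.1.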
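Let $\Delta_2$ denote the set of symmetric distributions over $\{-1,1\}^2$. In order to find a closed form for $\alpha(2\mathrm{AND})$, [CGSV21boolean, Example 1] makes the observation that since 2AND is symmetric, it suffices WLOG to only consider symmetric distributions $D_N,D_Y\in\Delta_2$. Distributions $D\in\Delta_2$ can be represented using $3$ instead of $4$ probabilities and have scalar marginals $\mu(D)\in[-1,1]$, reducing the dimension of the optimization problem. Thus, [CGSV21boolean] considers the following more convenient optimization problem:
$$\alpha(2\mathrm{AND})=\inf_{\mu\in[-1,1]}\frac{\beta(\mu)}{\gamma(\mu)},$$
where $\beta(\mu)$ and $\gamma(\mu)$ are the smallest $\beta$ and largest $\gamma$, respectively, for which there exists some distribution with marginal $\mu$ in the appropriate $S^N_\beta$ or $S^Y_\gamma$ set. [CGSV21boolean] then shows that $\gamma(\mu)$ is a linear function of $\mu$, while $\beta(\mu)$ can be written as
$$\beta(\mu)=\inf_{\substack{D_N\in\Delta_2\\ \mu(D_N)=\mu}}\ \sup_{p\in[0,1]}\lambda_{2\mathrm{AND}}(D_N,p),$$
where $\lambda_{2\mathrm{AND}}(D_N,p)$ is a multivariate polynomial which is linear in the probabilities of $D_N$ and quadratic in $p$. Finally, [CGSV21boolean] calculates a closed form for $\beta(\mu)$ using the quadratic formula and uses this to determine $\alpha(2\mathrm{AND})$.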
1.4.2 Our contribution: The max-min method
When $f=\mathrm{Sym}_S$ is a general symmetric function, applying similar ideas to [CGSV21boolean]’s 2AND analysis leads us to consider the set $\Delta_k$ of symmetric distributions over $\{-1,1\}^k$; each element $D\in\Delta_k$ is described by $k+1$ instead of $2^k$ probabilities and has a scalar marginal $\mu(D)\in[-1,1]$. We can analogously define $\gamma_S(\mu)$ and $\beta_S(\mu)$. We calculate that $\gamma_S(\mu)$ is a piecewise linear function of $\mu$ (Lemma 2.8 below), while $\beta_S(\mu)$ now involves a supremum over $p\in[0,1]$ of a function which is linear in $D_N$ and degree-$k$ in $p$ (Lemma 2.9 below). Thus, for larger $k$, to the best of our knowledge the “[CGSV21boolean]-style analysis” of explicitly calculating $\beta_S(\mu)$ becomes impractical (as it involves reasoning about the maxima of a generic degree-$k$ polynomial).
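Instead, we introduce a slightly different formulation of calculating $\alpha(\mathrm{Sym}_S)$, parametrized now by $D_N$:
$$\alpha(\mathrm{Sym}_S)=\inf_{D_N\in\Delta_k}\ \sup_{p\in[0,1]}\frac{\lambda_S(D_N,p)}{\gamma_S(\mu(D_N))}.\tag{1.10}$$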
We view optimizing directly over $D_N$ as an important conceptual switch. In particular, our formulation emphasizes the calculation of the optimal $D_N$ as the central difficulty (as opposed to first calculating $\beta_S(\mu)$ and then optimizing over all $\mu$), yet we can still take advantage of the easiness of calculating $\gamma_S(\mu)$.
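For $k=2$ the formulation in Eq. 1.10 can be checked numerically. The following minimal Python sketch is an illustration under stated assumptions, not the analysis of the paper: it grids over $\Delta_2$ using the level masses $(D\langle0\rangle,D\langle1\rangle,D\langle2\rangle)$, uses $\mu(D)=D\langle2\rangle-D\langle0\rangle$, takes $\gamma(\mu)=(1+\mu)/2$ for 2AND (the best $D_Y$ with marginal $\mu$ mixes levels $0$ and $2$), and uses $\lambda(D,p)=D\langle0\rangle(1-p)^2+D\langle1\rangle p(1-p)+D\langle2\rangle p^2$, maximizing this quadratic in $p$ exactly. On a grid of step $1/100$ it finds the value $4/9$, attained at $D_N=(0,4/5,1/5)$ with $p=2/3$.

```python
from fractions import Fraction


def sup_quadratic(a, b, c):
    """max over p in [0,1] of a(1-p)^2 + b p(1-p) + c p^2, computed exactly."""
    q = lambda p: a * (1 - p) ** 2 + b * p * (1 - p) + c * p ** 2
    candidates = [Fraction(0), Fraction(1)]
    lead = a - b + c                      # coefficient of p^2
    if lead != 0:
        p0 = Fraction(2 * a - b) / (2 * lead)   # stationary point of q
        if 0 < p0 < 1:
            candidates.append(p0)
    return max(q(p) for p in candidates)


def alpha_2and_grid(N: int = 100):
    """Minimize sup_p lambda(D_N, p) / gamma(mu(D_N)) over a grid on Delta_2."""
    best = None
    for i in range(N + 1):
        for j in range(N + 1 - i):
            d0, d1, d2 = Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N)
            gamma = (1 + (d2 - d0)) / 2   # gamma(mu) = (1 + mu) / 2 for 2AND
            if gamma == 0:
                continue                  # mu = -1: no valid D_Y, ratio undefined
            r = sup_quadratic(d0, d1, d2) / gamma
            if best is None or r < best[0]:
                best = (r, (d0, d1, d2))
    return best


r, d_star = alpha_2and_grid()
print(r, d_star)
```

A grid search can only certify the infimum on the grid points; the agreement with $4/9$ here is a sanity check, not a proof.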
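A priori, calculating $\sup_{p\in[0,1]}\lambda_S(D_N,p)$ still involves maximizing a degree-$k$ polynomial. To get around this difficulty, we have a crucial insight, which was not noticed by [CGSV21boolean] even in the 2AND case. If $D_N^*$ minimizes the right-hand side of Eq. 1.10, and $p^*$ maximizes $\lambda_S(D_N^*,\cdot)$, the max-min inequality gives
$$\alpha(\mathrm{Sym}_S)=\frac{\lambda_S(D_N^*,p^*)}{\gamma_S(\mu(D_N^*))}\ \ge\ \inf_{D_N\in\Delta_k}\frac{\lambda_S(D_N,p^*)}{\gamma_S(\mu(D_N))}.\tag{1.11}$$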
The right-hand side of Eq. 1.11 is relatively easy to calculate, being a ratio of a linear and a piecewise linear function of $D_N$. Our insight is that, in a wide variety of cases, the quantity on the right-hand side of Eq. 1.11 serendipitously equals $\alpha(\mathrm{Sym}_S)$; that is, $(D_N^*,p^*)$ is a saddle point of $\lambda_S(D_N,p)/\gamma_S(\mu(D_N))$.⁴ This yields a novel technique, which we call the “max-min method”, for resolving the sketching approximability of Max-CSP($\mathrm{Sym}_S$): find $D_N^*$ and $p^*$, and then show that $\lambda_S(D_N,p)/\gamma_S(\mu(D_N))$ has a saddle point at $(D_N^*,p^*)$. For instance, in Section 4, in order to give a closed form for $\alpha(k\mathrm{AND})$ for odd $k$ (i.e., the odd case of Theorem 1.1), we construct $D_N^*$ by placing all the probability mass on strings of Hamming weight $(k+1)/2$ (all of which are equally likely), set $p^*=(k+1)/(2k)$, and prove $\alpha(k\mathrm{AND})=\alpha'_k$ by analyzing the right-hand side of the appropriate instantiation of Eq. 1.11. While we initially found this pattern for $D_N^*$ by numerically investigating small odd $k$, in Section 4, we use the max-min method to provide an analytical proof for all odd $k$. We use similar techniques for the cases of $k$AND for even $k$ (also Theorem 1.1), $\mathrm{Th}^{k-1}_k$ for even $k$ (Theorem 1.3, proved in Section 5.1), and several other cases in Section 5.
⁴This term comes from the optimization literature; such points are also said to satisfy the “strong max-min property” (see, e.g., [BV04, pp. 115, 238]). The saddle-point property is guaranteed by von Neumann’s minimax theorem for functions which are concave and convex in the first and second arguments, respectively, but this theorem and the generalizations we are aware of do not apply even to 2AND.
In all of these cases, the $D_N^*$ we construct is supported on at most two distinct Hamming weights, which is the property which makes finding $D_N^*$ tractable (using computer assistance). However, this technique is not a “silver bullet”: it is not the case that the sketching approximability of every symmetric Boolean CSP can be exactly calculated by finding the optimal $D_N^*$ supported on two elements and using the max-min method. Indeed, (as mentioned in Section 5) we verify using computer assistance that this is not the case for .
Finally, we remark that the saddle-point property is precisely what defines the value $p^*$ required for our simple classical algorithm for outputting approximately optimal assignments for Max-CSP($f$) where $f$ is a threshold function (see Corollary 3.4 below).
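For $k$AND with odd $k$, the saddle-point claim can be checked exactly for small $k$. A minimal Python sketch, assuming (as in the case $S=\{k\}$ of Lemma 2.8) that $\gamma(\mu)=(1+\mu)/2$, and using that a point mass on level $i$ has $\lambda=p^i(1-p)^{k-i}$ (a weight-$i$ pattern is satisfied iff the noise keeps its $i$ positive coordinates and flips its $k-i$ negative ones) and $\gamma=i/k$. Because $\gamma$ is linear, the ratio on the right-hand side of Eq. 1.11 is linear-fractional in $D_N$, so its infimum is attained at a point mass on some level $i\ge1$. This is a finite check for small $k$, not a proof.

```python
from fractions import Fraction


def alpha_prime(k: int) -> Fraction:
    """alpha'_k = ((k-1)(k+1) / (4k^2))^((k-1)/2) for odd k."""
    return Fraction((k - 1) * (k + 1), 4 * k * k) ** ((k - 1) // 2)


def rhs_1_11_kand(k: int, p: Fraction) -> Fraction:
    """inf over D_N of lambda(D_N, p) / gamma(mu(D_N)) for kAND.

    gamma(mu) = (1 + mu)/2 is linear, so the ratio is linear-fractional in D_N and
    its infimum is attained at a point mass on a level i >= 1 (level 0 has gamma = 0).
    A point mass on level i has lambda = p^i (1-p)^(k-i) and gamma = i/k.
    """
    return min(p**i * (1 - p) ** (k - i) * Fraction(k, i) for i in range(1, k + 1))


for k in (3, 5, 7, 9):
    p_star = Fraction(k + 1, 2 * k)
    # Saddle-point check: the lower bound from Eq. 1.11 meets alpha'_k.
    print(k, rhs_1_11_kand(k, p_star), alpha_prime(k))
```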
1.5 Related work
The classical approximability of Max-$k$AND has been the subject of intense study, both in terms of algorithms [GW95, FG95, Zwi98, Tre98alg, TSSW00, Has04, Has05] and hardness-of-approximation [Has01, Tre98hardness, ST98, ST00, EH08, ST09]. Currently, the best results appear to be as follows: Hast [Has05] constructed a approximation to ; Engebretsen and Holmerin [EH08] proved that it is hard to approximate , and this was improved by Samorodnitsky and Trevisan [ST09] to under the unique games conjecture, matching [Has05]’s algorithm up to logarithmic factors.
Interestingly, recalling that $\alpha(k\mathrm{AND})/2^{-(k-1)}\to1$ as $k\to\infty$, in the large-$k$ limit our simple sketching algorithm (given by Theorem 1.6) matches the performance of Trevisan [Tre98alg]’s parallelizable LP-based algorithm for Max-$k$AND, which (to the best of our knowledge) was the first work on the general problem! (The subsequent works [Has04, Has05] superseding [Tre98alg] used more complex techniques involving SDPs and random restrictions.)
1.6 Future directions
In this paper, we introduce the max-min method and use it to resolve the sketching approximability of a wide variety of symmetric Boolean CSPs (including multiple infinite families). However, these techniques are in a sense “ad hoc,” as they require numerically solving the intended optimization problem with computer assistance. We conjecture that the max-min method applies for all symmetric Boolean CSPs. We also hope to develop new techniques for finding $D_N^*$ and $p^*$ in a wider variety of cases (including those where $D_N^*$ is not supported on two elements).
Separately, Theorem 6.1 proves that [CGSV21boolean]’s streaming-hardness classification is incomplete and establishes resolving the streaming approximability of Max-3AND as a significant frontier problem.
Code
Our Mathematica code, which we use for calculations primarily in Sections 5 and 6, is available online on GitHub at https://gist.github.com/singerng/48f1e28e1dc671319ad75578fb45c0f0.
2 Preliminaries
2.1 Definitions and results from [CGSV21boolean]
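Let us begin by defining the sets $S^Y_\gamma(f)$ and $S^N_\beta(f)$. For $D\in\Delta(\{-1,1\}^k)$ and $p\in[0,1]$, let
$$\lambda_f(D,p)=\mathbb{E}_{b\sim D,\ X\sim X_p^k}\left[f(b\odot X)\right],$$
where $X_p\in\{-1,1\}$ is $1$ with probability $p$ and $-1$ with probability $1-p$, and $X_p^k$ denotes $k$ independent copies of $X_p$; i.e., $\lambda_f(D,p)$ is the probability that a “noisy” random assignment drawn from $X_p^k$ satisfies a constraint drawn from $D$. Then we have:
Definition 2.1 (The sets $S^Y_\gamma(f)$ and $S^N_\beta(f)$).
Let $f:\{-1,1\}^k\to\{0,1\}$ and $0\le\beta<\gamma\le1$. Then
$$S^Y_\gamma(f)=\left\{D_Y\in\Delta(\{-1,1\}^k):\lambda_f(D_Y,1)\ge\gamma\right\}$$
and
$$S^N_\beta(f)=\left\{D_N\in\Delta(\{-1,1\}^k):\lambda_f(D_N,p)\le\beta\text{ for all }p\in[0,1]\right\}.$$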
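For tiny $k$, $\lambda_f$ and membership in these sets can be evaluated directly from the definitions. A minimal Python sketch, assuming distributions are given as dictionaries from negation patterns to probabilities; note that `in_no_set_on_grid` (a hypothetical helper) only tests finitely many values of $p$, so it is a necessary condition for membership in $S^N_\beta(f)$, not a proof. The example pair for 2AND has matching marginals $(1/5,1/5)$, with $\lambda(D_Y,1)=3/5$ and $\sup_p\lambda(D_N,p)=4/15$ (attained at $p=2/3$), so by Theorem 2.2 it witnesses hardness at ratio $(4/15)/(3/5)=4/9$.

```python
from fractions import Fraction
from itertools import product


def lam(f, D, p):
    """lambda_f(D, p) = Pr_{b ~ D, X ~ X_p^k}[f(b ⊙ X) = 1], by exact enumeration.

    D maps negation patterns b in {-1,1}^k to probabilities; each X_t is 1 w.p. p.
    """
    total = Fraction(0)
    for b, w in D.items():
        for X in product((-1, 1), repeat=len(b)):
            pr = Fraction(1)
            for x in X:
                pr *= p if x == 1 else 1 - p
            total += w * pr * f(tuple(bt * x for bt, x in zip(b, X)))
    return total


def marginals(D):
    """mu(D): the vector of coordinate means E_{b ~ D}[b_t]."""
    k = len(next(iter(D)))
    return tuple(sum(w * b[t] for b, w in D.items()) for t in range(k))


def in_yes_set(f, D, gamma):
    return lam(f, D, 1) >= gamma


def in_no_set_on_grid(f, D, beta, grid=300):
    """Necessary condition for D in S^N_beta: checks only p = 0, 1/grid, ..., 1."""
    return all(lam(f, D, Fraction(t, grid)) <= beta for t in range(grid + 1))


two_and = lambda a: int(a == (1, 1))
D_Y = {(1, 1): Fraction(3, 5), (-1, -1): Fraction(2, 5)}
D_N = {(1, -1): Fraction(2, 5), (-1, 1): Fraction(2, 5), (1, 1): Fraction(1, 5)}
print(marginals(D_Y), marginals(D_N))
print(in_yes_set(two_and, D_Y, Fraction(3, 5)), in_no_set_on_grid(two_and, D_N, Fraction(4, 15)))
```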
The main result of [CGSV21boolean] is a dichotomy theorem for sketching approximations to CSPs based on the marginals of distributions in $S^Y_\gamma(f)$ and $S^N_\beta(f)$:
Theorem 2.2 (Sketching dichotomy theorem, [CGSV21boolean, Theorem 2.3]).
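For every function $f:\{-1,1\}^k\to\{0,1\}$ and for every $0\le\beta<\gamma\le1$, the following hold:

1. If there exist $D_Y\in S^Y_\gamma(f)$ and $D_N\in S^N_\beta(f)$ such that $\mu(D_Y)=\mu(D_N)$, then for every $\epsilon>0$, every sketching algorithm for $(\gamma-\epsilon,\beta+\epsilon)$-Max-CSP($f$) requires $\Omega(\sqrt n)$ space.

2. If not, then $(\gamma,\beta)$-Max-CSP($f$) admits a linear sketching algorithm that uses $O(\mathrm{polylog}\,n)$ space.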
[CGSV21boolean] also defines the following condition on pairs $(D_Y,D_N)$, stronger than $\mu(D_Y)=\mu(D_N)$, which implies hardness of $(\gamma-\epsilon,\beta+\epsilon)$-Max-CSP($f$) for streaming algorithms:
Definition 2.3 (Padded one-wise pairs, [CGSV21boolean, §2.3]).
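A pair of distributions $(D_Y,D_N)$ forms a padded one-wise pair if there exist $\tau\in[0,1]$ and distributions $D_0$, $D_Y'$, and $D_N'$ such that (1) $\mu(D_Y')=\mu(D_N')=0^k$ and (2) $D_Y=\tau D_0+(1-\tau)D_Y'$ and $D_N=\tau D_0+(1-\tau)D_N'$.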
Theorem 2.4 (Streaming lower bound for padded one-wise pairs, [CGSV21boolean, Theorem 2.11]).
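For every function $f:\{-1,1\}^k\to\{0,1\}$ and for every $0\le\beta<\gamma\le1$, if there exists a padded one-wise pair of distributions $D_Y\in S^Y_\gamma(f)$ and $D_N\in S^N_\beta(f)$, then, for every $\epsilon>0$, $(\gamma-\epsilon,\beta+\epsilon)$-Max-CSP($f$) requires $\Omega(\sqrt n)$ space in the streaming setting.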
2.2 Setup for the symmetric case
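Recall, we denote by $\mathrm{Sym}_S$ the symmetric function which is the indicator for its input having Hamming weight in $S$, and $\Delta_k$ denotes the set of symmetric distributions on $\{-1,1\}^k$. When studying the approximability of Max-CSP($\mathrm{Sym}_S$), we restrict WLOG to the case where every element of $S$ is larger than $k/2$, since if $S$ contains elements $s\le k/2$ and $t\ge k/2$, not necessarily distinct, then $\mathrm{Sym}_S$ supports one-wise independence and is therefore streaming approximation-resistant. (If instead every element of $S$ is smaller than $k/2$, we may replace $S$ by $\{k-s:s\in S\}$, since negating every negation pattern maps instances of Max-CSP($\mathrm{Sym}_S$) to instances of Max-CSP($\mathrm{Sym}_{\{k-s:s\in S\}}$) without changing any values.)
Given $D\in\Delta(\{-1,1\}^k)$, we define its symmetrization $D^{\mathrm{sym}}\in\Delta_k$ as the symmetric distribution given by randomly permuting the coordinates of a random element of $D$. Then the following proposition lets us restrict to examining symmetric distributions in $S^Y_\gamma(f)$ and $S^N_\beta(f)$ for the purposes of determining the sketching and streaming approximability of symmetric functions.
Proposition 2.5.
Let $f$ be a symmetric function. For $0\le\beta<\gamma\le1$, suppose that there exist $D_Y\in S^Y_\gamma(f)$ and $D_N\in S^N_\beta(f)$ with $\mu(D_Y)=\mu(D_N)$. Then there exist symmetric $D_Y'\in S^Y_\gamma(f)\cap\Delta_k$ and $D_N'\in S^N_\beta(f)\cap\Delta_k$ with $\mu(D_Y')=\mu(D_N')$. Moreover, if $(D_Y,D_N)$ is a padded one-wise pair, then so is $(D_Y',D_N')$.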
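Symmetrization is easy to compute for small $k$. A minimal Python sketch, assuming the dictionary representation of distributions used above: averaging over all $k!$ coordinate permutations makes every coordinate’s marginal equal to the average of $D$’s marginals, and for symmetric $f$ it leaves $\lambda_f(D,p)$ unchanged, because permuting the coordinates of $b$ does not change the distribution of $|b\odot X|$ when $X$ has i.i.d. coordinates. This illustrates why the symmetrizations are natural candidates for the distributions in Proposition 2.5.

```python
from fractions import Fraction
from itertools import permutations


def symmetrize(D):
    """D^sym: draw b ~ D, then permute its coordinates uniformly at random."""
    k = len(next(iter(D)))
    perms = list(permutations(range(k)))
    out = {}
    for b, w in D.items():
        for pi in perms:
            c = tuple(b[pi[t]] for t in range(k))
            out[c] = out.get(c, 0) + w / len(perms)
    return out


def marginals(D):
    """Vector of coordinate means E_{b ~ D}[b_t]."""
    k = len(next(iter(D)))
    return tuple(sum(w * b[t] for b, w in D.items()) for t in range(k))


D = {(1, 1, -1): Fraction(1, 2), (1, -1, -1): Fraction(1, 2)}
D_sym = symmetrize(D)
print(marginals(D), marginals(D_sym))   # per-coordinate marginals get averaged
```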
We typically write a distribution $D\in\Delta_k$ as a vector $(D\langle0\rangle,\ldots,D\langle k\rangle)$ (where each $D\langle i\rangle$ is the total probability mass on strings of Hamming weight $i$, which we refer to as the “mass on level $i$”), but use $b\sim D$ to indicate an element of $\{-1,1\}^k$ drawn from $D$ according to the induced distribution.
The following proposition encapsulates the optimization problem arising from calculating the sketching approximability of a symmetric function according to [CGSV21boolean]:
Proposition 2.6.
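Let $S\subseteq\{0,1,\ldots,k\}$ be such that every element is larger than $k/2$. Then
$$\alpha(\mathrm{Sym}_S)=\inf_{D_N\in\Delta_k}\frac{\beta_S(D_N)}{\gamma_S(\mu(D_N))},$$
where
$$\beta_S(D)=\sup_{p\in[0,1]}\lambda_S(D,p)\quad\text{and}\quad\gamma_S(\mu)=\sup_{D_Y\in\Delta_k:\ \mu(D_Y)=\mu}\lambda_S(D_Y,1),$$
and where for $D\in\Delta_k$, $\mu(D)=\mathbb{E}_{b\sim D}[b_1]\in[-1,1]$ is its scalar marginal (all coordinates of a symmetric $D$ have the same marginal), and for $D\in\Delta_k$ and $p\in[0,1]$,
$$\lambda_S(D,p)=\Pr_{b\sim D,\ X\sim X_p^k}\left[\mathrm{Sym}_S(b\odot X)=1\right].$$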
Moreover, we have the following explicit formulae for $\mu(D)$, $\gamma_S(\mu)$, and $\lambda_S(D,p)$:
Lemma 2.7.
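For any $D\in\Delta_k$,
$$\mu(D)=\sum_{i=0}^k\epsilon_{i,k}\,D\langle i\rangle,$$
where $\epsilon_{i,k}=2i/k-1$ for each $i\in\{0,1,\ldots,k\}$.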
Proof.
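Use linearity of expectation; the contribution of weight-$i$ vectors to $\mu(D)$ is $\epsilon_{i,k}D\langle i\rangle$, since a uniformly random vector of Hamming weight $i$ has first coordinate $1$ with probability $i/k$. ∎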
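Lemma 2.7 can be sanity-checked by spreading each level’s mass uniformly over its strings and computing $\mathbb{E}[b_1]$ directly. A minimal Python sketch, assuming $D$ is given by its level masses $(D\langle0\rangle,\ldots,D\langle k\rangle)$ as exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb


def marginal_by_enumeration(levels):
    """mu(D) = E[b_1] for symmetric D, spreading D<i> uniformly over weight-i strings."""
    k = len(levels) - 1
    mu = Fraction(0)
    for b in product((-1, 1), repeat=k):
        i = sum(1 for v in b if v == 1)
        mu += levels[i] / comb(k, i) * b[0]
    return mu


def marginal_lemma_2_7(levels):
    """mu(D) = sum_i eps_{i,k} D<i> with eps_{i,k} = 2i/k - 1."""
    k = len(levels) - 1
    return sum((Fraction(2 * i, k) - 1) * d for i, d in enumerate(levels))


levels = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]  # k = 3
print(marginal_by_enumeration(levels), marginal_lemma_2_7(levels))
```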
Lemma 2.8.
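Let $S\subseteq\{0,1,\ldots,k\}$ be nonempty, and let $u$ be its smallest element and $v$ its largest element (they need not be distinct). Then for $\mu\in[-1,1]$ (with $\epsilon_{i,k}=2i/k-1$ as in Lemma 2.7),
$$\gamma_S(\mu)=\begin{cases}\dfrac{1+\mu}{1+\epsilon_{u,k}} & \text{if }\mu<\epsilon_{u,k},\\[4pt] 1 & \text{if }\epsilon_{u,k}\le\mu\le\epsilon_{v,k},\\[4pt] \dfrac{1-\mu}{1-\epsilon_{v,k}} & \text{if }\mu>\epsilon_{v,k}\end{cases}$$
(which also equals $\min\left\{\frac{1+\mu}{1+\epsilon_{u,k}},\,1,\,\frac{1-\mu}{1-\epsilon_{v,k}}\right\}$, omitting any term whose denominator is zero).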
Proof.
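Note that for any $D\in\Delta_k$, $\lambda_S(D,1)=\sum_{s\in S}D\langle s\rangle$. We handle the cases separately.
Case 1: $\mu<\epsilon_{u,k}$.
Our strategy is to reduce to $D_Y$ being supported on $\{0,u\}$ while preserving the marginal and (possibly weakly) increasing the value of $\lambda_S(D_Y,1)$.
Consider the following operation on distributions: For $0\le i<j<\ell\le k$, increase $D\langle i\rangle$ by $\frac{\epsilon_{\ell,k}-\epsilon_{j,k}}{\epsilon_{\ell,k}-\epsilon_{i,k}}D\langle j\rangle$, increase $D\langle\ell\rangle$ by $\frac{\epsilon_{j,k}-\epsilon_{i,k}}{\epsilon_{\ell,k}-\epsilon_{i,k}}D\langle j\rangle$, and set $D\langle j\rangle$ to zero. The two coefficients sum to $1$, so the total mass is unchanged, and this results in a new distribution with the same marginal, since
$$\frac{\epsilon_{\ell,k}-\epsilon_{j,k}}{\epsilon_{\ell,k}-\epsilon_{i,k}}\,\epsilon_{i,k}+\frac{\epsilon_{j,k}-\epsilon_{i,k}}{\epsilon_{\ell,k}-\epsilon_{i,k}}\,\epsilon_{\ell,k}=\epsilon_{j,k}.$$
Given an initial distribution $D_Y$, we can apply this operation to zero out $D_Y\langle j\rangle$ for $0<j<u$ by redistributing to $D_Y\langle0\rangle$ and $D_Y\langle u\rangle$, preserving the marginal and only increasing the value of $\lambda_S(D_Y,1)$ (since $j\notin S$ while $u\in S$). Similarly, we can redistribute $D_Y\langle j\rangle$ for $u<j<v$ to $D_Y\langle u\rangle$ and $D_Y\langle v\rangle$, and $D_Y\langle j\rangle$ for $v<j<k$ to $D_Y\langle v\rangle$ and $D_Y\langle k\rangle$. Thus, we need only consider $D_Y$ supported on $\{0,u,v,k\}$, which we assume WLOG are distinct.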
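We have $\mu=-D_Y\langle0\rangle+\epsilon_{u,k}D_Y\langle u\rangle+\epsilon_{v,k}D_Y\langle v\rangle+D_Y\langle k\rangle$.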
Substituting and multiplying through by , we have
defining , we can rearrange to get . Then given , we can zero out and , decrease by , and correspondingly increase by . This fixes the marginal since
and can only increase .
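Thus, it suffices to consider $D_Y$ supported only on $\{0,u\}$; setting $D_Y\langle0\rangle=\frac{\epsilon_{u,k}-\mu}{1+\epsilon_{u,k}}$ and $D_Y\langle u\rangle=\frac{1+\mu}{1+\epsilon_{u,k}}$ gives a distribution which has marginal $\mu$ and the desired $\lambda_S(D_Y,1)=\frac{1+\mu}{1+\epsilon_{u,k}}$.
Case 2: $\epsilon_{u,k}\le\mu\le\epsilon_{v,k}$.
We simply construct $D_Y$ with $D_Y\langle u\rangle=\frac{\epsilon_{v,k}-\mu}{\epsilon_{v,k}-\epsilon_{u,k}}$ and $D_Y\langle v\rangle=\frac{\mu-\epsilon_{u,k}}{\epsilon_{v,k}-\epsilon_{u,k}}$ (or $D_Y\langle u\rangle=1$ if $u=v$, in which case $\mu=\epsilon_{u,k}$); we have $\mu(D_Y)=\mu$ and $\lambda_S(D_Y,1)=1$.
Case 3: $\mu>\epsilon_{v,k}$.
Following logic symmetric to Case 1, we consider $D_Y$ supported on $\{v,k\}$ and set $D_Y\langle v\rangle=\frac{1-\mu}{1-\epsilon_{v,k}}$ and $D_Y\langle k\rangle=\frac{\mu-\epsilon_{v,k}}{1-\epsilon_{v,k}}$, yielding $\mu(D_Y)=\mu$ and $\lambda_S(D_Y,1)=\frac{1-\mu}{1-\epsilon_{v,k}}$. ∎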
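The piecewise formula of Lemma 2.8 can be cross-checked by solving the underlying linear program exactly. A minimal Python sketch, assuming $D_Y$ is described by level masses: maximizing $\sum_{s\in S}D\langle s\rangle$ subject to total mass $1$ and marginal $\sum_i\epsilon_{i,k}D\langle i\rangle=\mu$ is an LP with two equality constraints, so some optimal vertex is supported on at most two levels, and enumerating all pairs of levels is therefore exact.

```python
from fractions import Fraction


def eps(i: int, k: int) -> Fraction:
    return Fraction(2 * i, k) - 1


def gamma_closed_form(S, k, mu):
    """gamma_S(mu) as stated in Lemma 2.8."""
    u, v = min(S), max(S)
    if mu < eps(u, k):
        return (1 + mu) / (1 + eps(u, k))
    if mu <= eps(v, k):
        return Fraction(1)
    return (1 - mu) / (1 - eps(v, k))


def gamma_by_two_level_search(S, k, mu):
    """max sum_{s in S} D<s> subject to sum_i eps_{i,k} D<i> = mu, over level masses D."""
    best = None
    for i in range(k + 1):
        for j in range(i, k + 1):
            if i == j:
                if eps(i, k) == mu:
                    cand = Fraction(int(i in S))
                else:
                    continue
            else:
                w = (eps(j, k) - mu) / (eps(j, k) - eps(i, k))  # mass on level i
                if not 0 <= w <= 1:
                    continue
                cand = w * int(i in S) + (1 - w) * int(j in S)
            if best is None or cand > best:
                best = cand
    return best


for S, k in [({3}, 3), ({2, 3}, 3), ({3, 4}, 4), ({1, 2}, 4)]:
    mus = [Fraction(t, 12) for t in range(-12, 13)]
    assert all(gamma_closed_form(S, k, m) == gamma_by_two_level_search(S, k, m) for m in mus)
```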
Lemma 2.9.
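For any $D\in\Delta_k$ and $p\in[0,1]$, we have
$$\lambda_S(D,p)=\sum_{s\in S}\sum_{i=0}^{k}D\langle i\rangle\sum_{j=\max(0,\,s-k+i)}^{\min(i,\,s)}\binom{i}{j}\binom{k-i}{s-j}\,p^{k+2j-i-s}(1-p)^{i+s-2j}.$$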
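The binomial formula for $\lambda_S(D,p)$ can be cross-checked against a direct enumeration of its definition. A minimal Python sketch, assuming $D$ is given by exact level masses: in the formula, $j$ counts the positive coordinates of $b$ that the noise keeps at $1$ (probability $p$ each), and $s-j$ counts the negative coordinates of $b$ that the noise turns into $1$ (probability $1-p$ each).

```python
from fractions import Fraction
from itertools import product
from math import comb


def lambda_lemma_2_9(S, levels, p):
    """lambda_S(D, p) via the binomial formula of Lemma 2.9 (D given by level masses)."""
    k = len(levels) - 1
    total = Fraction(0)
    for i, d in enumerate(levels):
        for s in S:
            # j = positive coordinates of b kept at +1; s - j = negative ones turned to +1
            for j in range(max(0, s - (k - i)), min(i, s) + 1):
                total += (d * comb(i, j) * comb(k - i, s - j)
                          * p ** (k + 2 * j - i - s) * (1 - p) ** (i + s - 2 * j))
    return total


def lambda_brute_force(S, levels, p):
    """lambda_S(D, p) = Pr[|b ⊙ X| in S], enumerating b and X directly."""
    k = len(levels) - 1
    total = Fraction(0)
    for b in product((-1, 1), repeat=k):
        i = sum(1 for v in b if v == 1)
        weight_b = levels[i] / comb(k, i)   # symmetric D is uniform within a level
        for X in product((-1, 1), repeat=k):
            pr = Fraction(1)
            for x in X:
                pr *= p if x == 1 else 1 - p
            if sum(1 for bt, x in zip(b, X) if bt * x == 1) in S:
                total += weight_b * pr
    return total


cases = [({3}, [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)], Fraction(2, 3)),
         ({3, 4}, [Fraction(1, 5), Fraction(0), Fraction(2, 5), Fraction(0), Fraction(2, 5)], Fraction(1, 3))]
for S, levels, p in cases:
    print(lambda_lemma_2_9(S, levels, p), lambda_brute_force(S, levels, p))
```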