 # ETH-Hardness of Approximating 2-CSPs and Directed Steiner Network

We study the 2-ary constraint satisfaction problems (2-CSPs), which can be stated as follows: given a constraint graph G=(V,E), an alphabet set Σ and, for each {u, v}∈ E, a constraint C_uv⊆Σ×Σ, the goal is to find an assignment σ: V →Σ that satisfies as many constraints as possible, where a constraint C_uv is satisfied if (σ(u),σ(v))∈ C_uv. While the approximability of 2-CSPs is quite well understood when |Σ| is constant, many problems are still open when |Σ| becomes super-constant. One such problem is whether it is hard to approximate 2-CSPs to within a polynomial factor of |Σ| |V|. Bellare et al. (1993) suggested that the answer to this question might be positive. Alas, despite efforts to resolve this conjecture, it remains open to this day. In this work, we separate |V| and |Σ| and ask a related but weaker question: is it hard to approximate 2-CSPs to within a polynomial factor of |V| (while |Σ| may be super-polynomial in |V|)? Assuming the exponential time hypothesis (ETH), we answer this question positively by showing that no polynomial time algorithm can approximate 2-CSPs to within a factor of |V|^{1 - o(1)}. Note that our ratio is almost linear, which is almost optimal as a trivial algorithm gives a |V|-approximation for 2-CSPs. Thanks to a known reduction, our result implies an ETH-hardness of approximating Directed Steiner Network with ratio k^{1/4 - o(1)}, where k is the number of demand pairs. The ratio is roughly the square root of the best known ratio achieved by polynomial time algorithms (Chekuri et al., 2011; Feldman et al., 2012). Additionally, under Gap-ETH, our reduction for 2-CSPs not only rules out polynomial time algorithms, but also FPT algorithms parameterized by |V|. A similar statement applies for DSN parameterized by k.


## 1 Introduction

We study the 2-ary constraint satisfaction problems (2-CSPs), which can be stated as follows: given a constraint graph G=(V,E), an alphabet set Σ and, for each edge {u, v}∈ E, a constraint C_uv⊆Σ×Σ, the goal is to find an assignment σ: V →Σ that satisfies as many constraints as possible, where a constraint C_uv is said to be satisfied by σ if (σ(u),σ(v))∈ C_uv. Throughout the paper, we use to denote the number of variables |V|, to denote the alphabet size |Σ|, and to denote the instance size .

Constraint satisfaction problems and their inapproximability have been studied extensively since the proof of the PCP theorem in the early 90’s [AS98, ALM98]. Most of the effort has been directed towards understanding the approximability of CSPs with constant arity and constant alphabet size, leading to a reasonable if yet incomplete understanding of the landscape [Hås01, Kho02, KKMO07, Rag08, AM09, Cha16]. When the alphabet size grows, the sliding scale conjecture of [BGLR93] predicts that the hardness of approximation ratio will grow as well, and be at least polynomial in the alphabet size . (Footnote 1: Throughout the paper, we use polynomial in (or ) to refer to for some real number .) This has been confirmed for values of up to , see [RS97, AS97, DFK11]. Proving the same for that is polynomial in is the so-called polynomial sliding scale conjecture and is still quite open. Before we proceed, let us note that the aforementioned results of [RS97, AS97, DFK11] work only for arity strictly larger than two and, hence, do not imply inapproximability for 2-CSPs. We will discuss the special case of 2-CSPs in detail below.

The polynomial sliding scale conjecture has been approached from different angles. In [DHK15] the authors try to find the smallest arity and alphabet size such that the hardness factor is polynomial in , and in [Din16] the conjecture is shown to follow (in some weaker sense) from the Gap-ETH hypothesis, which we discuss in more detail later. In this work we focus on yet another angle, which is to separate and and ask whether it is hard to approximate constant arity CSPs to within a factor that is polynomial in (but possibly not polynomial in ). Observe here that obtaining NP-hardness of factor is likely to be as hard as obtaining one with ; this is because CSPs can be solved exactly in time , which means that, unless NP is contained in subexponential time (i.e. ), NP-hard instances of CSPs must have .

This motivates us to look for hardness of approximation from assumptions stronger than P ≠ NP. Specifically, our result will be based on the Exponential Time Hypothesis (ETH), which states that no subexponential time algorithm can solve 3-SAT (see Conjecture 1). We show that, unless ETH fails, no polynomial time algorithm can approximate 2-CSPs to within an almost linear ratio in |V|, as stated below. This is almost optimal since there is a straightforward |V|-approximation for any 2-CSP, by simply satisfying all constraints that touch the variable with highest degree.
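The straightforward algorithm mentioned above can be illustrated with a minimal sketch. The source only says that one satisfies the constraints touching the highest-degree variable; the way this is carried out below (try every label for that variable, then pick for each neighbor any label that satisfies the shared constraint) is an assumption made for illustration, and the name `star_labeling` and the dictionary encoding of constraints are hypothetical.

```python
from collections import defaultdict


def star_labeling(variables, alphabet, constraints):
    """Best-effort labeling centered on the highest-degree variable.

    `constraints` maps an edge (u, v) to the set of allowed label pairs
    (label_of_u, label_of_v). This encoding is an assumption of the sketch.
    """
    # Pick the variable that touches the most constraints.
    degree = defaultdict(int)
    for u, v in constraints:
        degree[u] += 1
        degree[v] += 1
    center = max(variables, key=lambda x: degree[x])

    def satisfied(labeling):
        # Number of constraints satisfied by a full labeling.
        return sum((labeling[u], labeling[v]) in allowed
                   for (u, v), allowed in constraints.items())

    best_labeling, best_count = None, -1
    for a in alphabet:
        # Default every variable to an arbitrary label, then fix the center.
        labeling = {x: alphabet[0] for x in variables}
        labeling[center] = a
        for (u, v), allowed in constraints.items():
            if u == center:
                match = next((b for b in alphabet if (a, b) in allowed), None)
                if match is not None:
                    labeling[v] = match
            elif v == center:
                match = next((b for b in alphabet if (b, a) in allowed), None)
                if match is not None:
                    labeling[u] = match
        count = satisfied(labeling)
        if count > best_count:
            best_labeling, best_count = labeling, count
    return center, best_labeling, best_count
```

If some labeling satisfies every constraint, then its label for the center lets every edge at the center be satisfied, so the sketch satisfies at least as many constraints as the center's degree. Edges that do not touch the center are left to chance.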

[Main Theorem] Assuming ETH, for any constant , no algorithm can, given a 2-CSP instance with alphabet size and variables such that the constraint graph is the complete graph on the variables, distinguish between the following two cases in polynomial time:

• (Completeness) , and,

• (Soundness) .

Here denotes the maximum fraction of edges satisfied by any assignment.

To paint a full picture of how our result stands in comparison to previous results, let us state what is known about the approximability of 2-CSPs; due to the vast literature regarding 2-CSPs, we will focus only on the regime of large alphabets, which is most relevant to our setting. In terms of NP-hardness, the best known inapproximability ratio is for every constant ; this follows from the Moshkovitz-Raz PCP [MR10] and the Parallel Repetition Theorem for the low soundness regime [DS14]. Assuming a slightly stronger assumption that NP is not contained in quasipolynomial time (i.e. ), 2-CSP is hard to approximate to within a factor of for every constant ; this can be proved by applying Raz’s original Parallel Repetition Theorem [Raz98] to the PCP Theorem. In [Din16], the author observed that the running time for parallel repetition can be reduced by looking at unordered sets instead of ordered tuples. This observation implies that, assuming ETH, no polynomial time -approximation algorithm exists for 2-CSPs for some constant . (Footnote 2: In [Din16], only the Gap-ETH-hardness result is stated. However, the ETH-hardness result follows rather easily.) Moreover, under Gap-ETH (which will be stated shortly), it was shown that, for every sufficiently small , an -approximation algorithm must run in time . Note that, while this latest result comes close to the polynomial sliding scale conjecture, it does not quite resolve the conjecture yet. In particular, even the weak form of the conjecture postulates that there exists for which no polynomial time algorithm can approximate 2-CSPs to within factor of the optimum. This statement does not follow from the result of [Din16]. Nevertheless, the Gap-ETH-hardness of [Din16] does imply that, for any , no polynomial time algorithm can approximate 2-CSPs to within a factor of .

In all hardness results mentioned above, the constructions give 2-CSP instances in which the alphabet size is smaller than the number of variables . In other words, even if we aim for an inapproximability ratio in terms of instead of , we still get the same ratios as stated above. Thus, our result is the first hardness of approximation for 2-CSPs with factor. Note again that our result rules out any polynomial time algorithm and not just the -time algorithms ruled out by [Din16]. Moreover, our ratio is almost linear in whereas the result of [Din16] only holds for that is sufficiently small depending on the parameters of the Gap-ETH Hypothesis.

An interesting feature of our reduction is that it produces 2-CSP instances with an alphabet size that is much larger than . This is reminiscent of the setting of 2-CSPs parameterized by the number of variables . In this setting, the algorithm’s running time is allowed to depend not only polynomially on but also on any function of (i.e. running time for some function ); such an algorithm is called a fixed parameter tractable (FPT) algorithm parameterized by . The question here is whether this added running time can help us approximate the problem beyond the factor achieved by the straightforward algorithm. We show that, even in this parameterized setting, the trivial algorithm is still essentially optimal (up to lower order terms). This result holds under the Gap Exponential Time Hypothesis (Gap-ETH), a strengthening of ETH which states that, for some , even distinguishing between a satisfiable 3-CNF formula and one which is not even -satisfiable cannot be done in subexponential time (see Conjecture 1), as stated below.

Assuming Gap-ETH, for any constant and any function , no algorithm can, given a 2-CSP instance with alphabet size and variables such that the constraint graph is the complete graph on the variables, distinguish between the following two cases in time:

• (Completeness) , and,

• (Soundness) .

To the best of our knowledge, the only previous inapproximability result for parameterized 2-CSPs is from [CFM17]. There the authors showed that, assuming Gap-ETH, no -approximation -time algorithm exists; this is shown via a simple reduction from the parameterized inapproximability of Densest-k Subgraph from [CCK17] (which is in turn based on a construction from [Man17]). Our result is a direct improvement over this result.

We end our discussion on 2-CSPs by noting that several approximation algorithms have also been devised for 2-CSPs with large alphabets [Pel07, CHK11, KKT16, MM17, CMMV17]. In particular, while our results suggest that the trivial algorithm achieves an essentially optimal ratio in terms of , non-trivial approximation is possible when we measure the ratio in terms of instead of : specifically, a polynomial time -approximation algorithm is known [CHK11].

##### Directed Steiner Network.

As a corollary of our hardness of approximation results for 2-CSPs, we obtain an inapproximability result for Directed Steiner Network with polynomial ratio in terms of the number of demand pairs. In the Directed Steiner Network (DSN) problem (sometimes referred to as the Directed Steiner Forest problem [FKN12, CDKL17]), we are given an edge-weighted directed graph and a set of demand pairs and the goal is to find a subgraph of with minimum weight such that there is a path in from to for every . DSN was first studied in the approximation algorithms context by Charikar et al. [CCC99], who gave a polynomial time -approximation algorithm for the problem. This ratio was later improved to for every by Chekuri et al. [CEGS11]. Later, a different approximation algorithm with similar approximation ratio was proposed by Feldman et al. [FKN12].

Algorithms with approximation ratios in terms of the number of vertices have also been devised [FKN12, BBM13, CDKL17, AB17]. In this case, the best known algorithm is that of Berman et al. [BBM13], which yields an -approximation for every constant in polynomial time. Moreover, when the graph is unweighted (i.e. each edge costs the same), Abboud and Bodwin recently gave an improved -approximation algorithm for the problem [AB17].

On the hardness side, there exists a known reduction from 2-CSP to DSN that preserves approximation ratio to within polynomial factor [DK99]. (Footnote 3: That is, for any non-decreasing function , if DSN admits -approximation in polynomial time, then 2-CSP also admits -approximation in polynomial time for some absolute constant .) Hence, known hardness of approximation of 2-CSPs translates immediately to that of DSN: it is NP-hard to approximate to within any polylogarithmic ratio [MR10, DS14], it is hard to approximate to within factor for every unless  [Raz98], and it is Gap-ETH-hard to approximate to within factor [Din16]. Note that, since is always bounded above by , all these hardness results also hold when is replaced by in the ratios. Recently, this reduction was also used by Chitnis et al. [CFM17] to rule out -FPT-approximation algorithms for DSN parameterized by assuming Gap-ETH. Alas, none of these hardness results achieve ratios that are polynomial in either or and it remains open whether DSN is hard to approximate to within a factor that is polynomial in or in .

By plugging our hardness result for 2-CSPs into the reduction, we immediately get ETH-hardness and Gap-ETH-hardness of approximating DSN to within a factor of k^{1/4 - o(1)}, where k is the number of demand pairs, as stated below.

Assuming ETH, for any constant , there is no polynomial time -approximation algorithm for DSN.

Assuming Gap-ETH, for any constant and any function , there is no -time -approximation algorithm for DSN.

In other words, if one wants a polynomial time approximation algorithm with ratio depending only on and not on , then the algorithms of Chekuri et al. [CEGS11] and Feldman et al. [FKN12] are roughly within a square of the optimal algorithm. To the best of our knowledge, these are the first inapproximability results for DSN whose ratios are polynomial in terms of . Again, Corollary 1 is a direct improvement over the FPT inapproximability result from [CFM17] which, under the same assumption, rules out only -factor FPT-approximation algorithms.

#### Agreement tests

Our main result is proved through an agreement testing argument. In agreement testing there is a universe , a collection of subsets , and for each subset we are given a local function . A pair of subsets are said to agree if their local functions agree on every element in the intersection. The goal is, given a non-negligible fraction of agreeing pairs, to deduce the existence of a global function that (approximately) coincides with many of the local functions. For a more complete description see [DK17].

Agreement tests capture a natural local-to-global statement and are present in essentially all PCPs; for example, they appear explicitly in the line vs. line and plane vs. plane low degree tests [RS96, AS97, RS97]. Our reduction is based on a combinatorial agreement test, where the universe is and the subsets have elements each and are “in general position”, namely they behave like subsets chosen independently at random. A convenient feature of this setting is that every pair of subsets intersect.

Since we are aiming for a large gap, the agreement test must work (i.e., yield a global function) with a very small fraction of agreeing pairs, which in our case is close to .

In this small agreement regime the idea, as pioneered in the work of Raz-Safra [RS97], is to zero in on a sub-collection of subsets that is (almost) perfectly consistent. From this sub-collection it is easy to recover a global function and show that it coincides almost perfectly with the local functions in the sub-collection. A major difference between our combinatorial setting and the algebraic setting of Raz-Safra is the lack of “distance” in our case: we cannot assume that two distinct local functions differ on many points (in contrast, this is a key feature of low degree polynomials). We overcome this by considering different “strengths” of agreement, depending on the fraction of points on which the two subsets agree. This notion too is present in several previous works on combinatorial agreement tests [IKW12, DN17].

##### Hardness of Approximation through Subexponential Time Reductions.

Our result is one of the many results in recent years that show hardness of approximation via subexponential time reductions [AIM14, BKW15, Rub16b, DFS16, Din16, BKRW17, MR17, Man17, Rub16a, Rub16b, Rub17, ARW17, CCK17, KLM18, Rub18, BGKM18]. These results are often based on the Exponential Time Hypothesis (ETH) and its variants. Proposed by Impagliazzo and Paturi [IP01], ETH can be formally stated as follows:

[Exponential Time Hypothesis (ETH) [IP01]] There exists a constant such that no algorithm can decide whether any given 3-CNF formula is satisfiable in time where denotes the number of clauses. (Footnote 4: The original conjecture states the lower bound as exponential in terms of the number of variables, not clauses. However, thanks to the sparsification lemma [IPZ01], it is by now known that the two versions are equivalent.)

A crucial ingredient in most, but not all, reductions in this line of work is a nearly-linear size PCP Theorem. (Footnote 5: The exceptions are [Rub16b, ARW17, Rub18, KLM18, Che18], in which gaps are not created via the PCP Theorem.) For the purpose of our work, the PCP Theorem can be viewed as a polynomial time transformation of a 3-SAT instance to another 3-SAT instance that creates a gap between the YES and NO cases. Specifically, if is satisfiable, remains satisfiable. On the other hand, if is unsatisfiable, then is not only unsatisfiable but it is also not even -satisfiable for some constant (i.e. no assignment satisfies fraction of clauses). The “nearly-linear size” part refers to the size of the new instance compared to that of . Currently, the best known dependency in this form of the PCP Theorem between the two sizes is quasi-linear (i.e. with a polylogarithmic blow-up), as stated below.

[Quasi-Linear Size PCP [BS08, Din07]] For some constants , there is a polynomial time algorithm that, given any 3-CNF formula with clauses, produces another 3-CNF formula with clauses such that

• (Completeness) if , then , and,

• (Soundness) if , then , and,

• (Bounded Degree) each variable in appears in at most clauses.

The aforementioned ETH-hardness of approximation proofs typically proceed in two steps. First, the PCP Theorem is invoked to reduce a 3-SAT instance of size to an instance of the gap version of 3-SAT of size . Second, the gap version of 3-SAT is reduced in subexponential time to the problem at hand. As long as the reduction takes time , we can obtain a hardness of approximation result for the latter problem. This is in contrast to proving NP-hardness of approximation, for which a polynomial time reduction is required.

Another related but stronger version of ETH that we will also employ is the Gap Exponential Time Hypothesis (Gap-ETH), which states that even the gap version of 3-SAT cannot be solved in subexponential time:

[Gap Exponential Time Hypothesis (Gap-ETH) [Din16, MR16]] There exist constants such that no algorithm can, given any 3-CNF formula such that each of its variables appears in at most clauses (footnote 6: this bounded degree assumption can be assumed without loss of generality; see [MR16] for more details), distinguish between the following two cases (footnote 7: note that when satisfies neither case (i.e. ), the algorithm is allowed to output anything) in time where denotes the number of clauses:

• (Completeness) .

• (Soundness) .

By starting with Gap-ETH instead of ETH, there is no need to apply the PCP Theorem and hence a polylogarithmic loss in the size of the 3-SAT instance does not occur. As demonstrated in previous works, this allows one to improve the ratio in hardness of approximation results [Din16, MR16, Man17] and, more importantly, it can be used to prove inapproximability results for some parameterized problems [BEKP15, CCK17, CFM17], which are not known to be hard to approximate under ETH. (Footnote 8: While [BEKP15] states that the assumption is the existence of a linear-size PCP, Gap-ETH clearly suffices there.) Specifically, for many parameterized problems, the reduction from the gap version of 3-SAT to the problem has size for some function that grows to infinity with (i.e. ), where is the number of clauses in the 3-CNF formula and is the parameter of the problem. For simplicity, let us focus on the case where . If one wishes to derive a meaningful result starting from ETH, must be subexponential in terms of , the number of clauses in the original (no-gap) 3-CNF formula. This means that the term must dominate the factor blow-up from the PCP Theorem. However, since FPT algorithms are allowed to have running time of the form for any function , we can pick to be . In this case, the algorithm runs in superexponential time in terms of and we cannot deduce anything regarding the algorithm. On the other hand, if we start from Gap-ETH, we can pick to be a large constant independent of , which indeed yields hardness of the form claimed in Theorem 1 and Corollary 1.

Finally, we remark that Gap-ETH would follow from ETH if a linear-size (constant-query) PCP exists. While constructing short PCPs has long been an active area of research [BGH06, BS08, Din07, MR10, BKK16], no linear-size PCP is yet known. On the other hand, there is some supporting evidence for the hypothesis. For instance, it is known that the natural relaxation of 3-SAT in the Sum-of-Squares hierarchy cannot refute Gap-ETH [Gri01, Sch08]. Moreover, Applebaum recently showed that the hypothesis follows from certain cryptographic assumptions [App17]. For a more in-depth discussion on Gap-ETH, please refer to [Din16].

##### Organization of the Paper.

In the next section, we describe our reduction and give an overview of the proof. Then, in Section 3, we define additional notions and state some preliminaries. We proceed to provide the full proof of our main agreement theorem in Section 4. Using this agreement theorem, we deduce the soundness of our reduction in Section 5. We then plug in the parameters and prove the inapproximability results for 2-CSPs in Section 6. In Section 7, we show how the hardness of approximation result for 2-CSPs implies inapproximability for DSN as well. Finally, we conclude our work with some discussions and open questions in Section 8.

## 2 Proof Overview

Like other (Gap-)ETH-hardness of approximation results, our proof is based on a subexponential time reduction from the gap version of 3-SAT to our problem of interest, 2-CSPs. Before we describe our reduction, let us define more notation for 2-CSPs and 3-SAT, to facilitate our explanation.

2-CSPs. For notational convenience, we will modify the definition of 2-CSPs slightly so that each variable is allowed to have a different alphabet; this definition is clearly equivalent to the more common definition used above. Specifically, an instance of 2-CSP now consists of (1) a constraint graph , (2) for each vertex (or variable) , an alphabet set , and, (3) for each edge , a constraint . Additionally, to avoid confusion with 3-SAT, we refrain from using the word assignment for 2-CSPs and instead use labeling, i.e., a labeling of is a tuple such that for all . An edge is said to be satisfied by a labeling if . The value of a labeling , denoted by , is defined as the fraction of edges that it satisfies, i.e., . The goal of 2-CSPs is to find with maximum value; we denote this optimal value by , i.e., .

3-SAT. An instance of 3-SAT consists of a variable set and a clause set where each clause is a disjunction of at most three literals. For any assignment , denotes the fraction of clauses satisfied by . The goal is to find an assignment that satisfies as many clauses as possible; let denote the fraction of clauses satisfied by such an assignment. For each , we use to denote the set of variables whose literals appear in . We extend this notation naturally to sets of clauses, i.e., for every , .

### Our Construction

Before we state our reduction, let us reiterate the objective of our reduction. Roughly speaking, given a 3-SAT instance , we would like to produce a 2-CSP instance such that

• (Completeness) If , then ,

• (Soundness) If , then where is number of variables of ,

• (Reduction Time) The time it takes to produce should be where ,

where is some absolute constant.

Observe that, when plugging a reduction with these properties to Gap-ETH, we directly arrive at the claimed inapproximability for 2-CSPs. However, for ETH, since we start with a decision version of 3-SAT without any gap, we have to first invoke the PCP theorem to produce an instance of the gap version of 3-SAT before we can apply our reduction. Since the shortest known PCP has a polylogarithmic blow-up in the size (see Theorem 1), the running time lower bound for gap 3-SAT will not be exponential anymore, rather it will be of the form instead. Hence, our reduction will need to produce in time. As we shall see later in Section 6, this will also be possible with appropriate settings of parameters.

With the desired properties in place, we now move on to state our reduction. In addition to a 3-CNF formula , the reduction also takes in a collection of subsets of clauses of . For now, the readers should think of the subsets in as random subsets of where each element is included in each subset independently at random with probability , which will be specified later. As we will see below, we only need two simple properties that the subsets in are “well-behaved” enough and we will later give a deterministic construction of such well-behaved subsets. With this in mind, our reduction can be formally described as follows.

[The Reduction] Given a 3-CNF formula and a collection of subsets of , we define a 2-CSP instance as follows:

• The graph is the complete graph where the vertex set is , i.e., and .

• For each , the alphabet set is the set of all partial assignments to that satisfies every clause in , i.e., .

• For every , is included in if and only if they are consistent, i.e., .
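The three bullets above can be sketched in code. This is a minimal illustration under assumed encodings that are not from the source: clauses are tuples of nonzero integers (DIMACS-style literals), each subset is a list of clause indices, and a label is a dictionary from variable to truth value. The function names are hypothetical, and the brute-force enumeration is only meant to mirror the definition.

```python
from itertools import combinations, product


def variables_of(clauses, subset):
    """Variables whose literals appear in the clauses indexed by `subset`."""
    return sorted({abs(lit) for i in subset for lit in clauses[i]})


def satisfies(assignment, clause):
    """True if the (partial) assignment makes at least one literal true."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def build_2csp(clauses, subsets):
    """Build the 2-CSP on the complete graph whose vertices are the subsets.

    Returns (alphabets, constraints): alphabets[i] lists the partial
    assignments allowed at vertex i, and constraints[(i, j)] is the set of
    index pairs (a, b) such that alphabets[i][a] and alphabets[j][b] agree
    on every shared variable.
    """
    alphabets = []
    for subset in subsets:
        variables = variables_of(clauses, subset)
        labels = []
        # Enumerate all assignments to the subset's variables and keep
        # those satisfying every clause of the subset.
        for bits in product([False, True], repeat=len(variables)):
            f = dict(zip(variables, bits))
            if all(satisfies(f, clauses[i]) for i in subset):
                labels.append(f)
        alphabets.append(labels)

    constraints = {}
    for i, j in combinations(range(len(subsets)), 2):
        constraints[(i, j)] = {
            (a, b)
            for a, f in enumerate(alphabets[i])
            for b, g in enumerate(alphabets[j])
            if all(f[x] == g[x] for x in f.keys() & g.keys())
        }
    return alphabets, constraints
```

The alphabet of a vertex can be exponential in the number of variables of its subset, which is why the size of the subsets governs the running time of the reduction.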

Let us now examine the properties of the reduction. The number of vertices in is . For the purpose of the proof overview, should be thought of as whereas should be thought of as much larger than (e.g. ). For such a value of , all random sets in will have size w.h.p., meaning that the reduction time is as desired.

Moreover, when is satisfiable, it is not hard to see that ; more specifically, if is the assignment that satisfies every clause of , then we can label each vertex of by , the restriction of on . Since satisfies all the clauses, satisfies all clauses in , meaning that this is a valid labeling. Moreover, since these are restrictions of the same global assignment , they are all consistent and every edge is satisfied.

Hence, we are only left to show that, if , then ; this is indeed our main technical contribution. We will show this by contrapositive: assuming that , we will “decode” back an assignment to that satisfies fraction of clauses.

### 2.1 Soundness Analysis as an Agreement Theorem

Our task at hand can be viewed as agreement testing. Informally, in agreement testing, the input is a collection of local functions where is a collection of subsets of some universe such that, for many pairs and , and agree, i.e., for all . An agreement theorem says that there must be a global function that coincides (exactly or approximately) with many of the local functions, and thus explains the pairwise “local” agreements. In our case, a labeling with high value is exactly a collection of functions such that, for many pairs of and , and agree. The heart of our soundness proof is an agreement theorem that recovers a global function that approximately coincides with many of the local functions ’s and thus satisfies fraction of clauses of . To discuss the agreement theorem in more detail, let us define some additional notation, starting with that for (approximate) agreements of a pair of functions:

For any universe , let and be any two functions whose domains are subsets of . We use the following notations for (dis)agreements of these two functions:

• Let denote the number of that and disagree on, i.e., .

• For any , we say that and are -consistent if , and we say that the two functions are -inconsistent otherwise. For , we sometimes drop 0 and refer to these simply as consistent and inconsistent (instead of 0-consistent and 0-inconsistent).

• We use and as shorthands for -consistency and -inconsistency respectively. Again, for , we may drop 0 from the notations and simply use and .

Next, we define the notion of agreement probability for any collection of functions:

For any and any collection of functions, the -agreement probability, denoted by is the probability that is -consistent with where and are chosen independently uniformly at random from , i.e., . When , we will drop 0 from the notation and simply use .
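The definitions above can be made concrete with a small sketch. Local functions are represented as dictionaries from universe elements to values, which is an assumption of this illustration, as are the function names. The exact normalization of the consistency threshold did not survive in the text above, so the sketch takes an absolute bound on the number of disagreements (`max_disagreements`) as an explicitly assumed parameter.

```python
from itertools import product


def disagr(f, g):
    """Number of shared domain elements on which f and g take different values."""
    return sum(1 for x in f.keys() & g.keys() if f[x] != g[x])


def is_consistent(f, g, max_disagreements=0):
    """Approximate consistency: at most `max_disagreements` disagreements.

    With the default of 0 this is exact consistency on the intersection.
    """
    return disagr(f, g) <= max_disagreements


def agreement_probability(functions, max_disagreements=0):
    """Probability that two local functions, drawn independently and
    uniformly at random (so a function may be paired with itself), are
    consistent up to the allowed number of disagreements."""
    k = len(functions)
    hits = sum(
        is_consistent(f, g, max_disagreements)
        for f, g in product(functions, repeat=2)
    )
    return hits / (k * k)
```

Because the two functions are drawn independently, every function is also compared with itself, so the agreement probability of a collection of size k is never below 1/k.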

Our main agreement theorem, which works when each is a large “random” subset, says that, if is noticeably large, then there exists a global function that is approximately consistent with many of the local functions in . This is stated more precisely (but still informally) below.

[Informal; See Theorem 4] Let be a collection of independent random -element subsets of . The following holds with high probability: for any and any collection of functions such that , there exists a function and a subcollection of size such that for all .

To see that Theorem 2.1 implies our soundness, let us view a labeling as a collection where and is simply . Now, when is large, is large as well. Moreover, while the sets are not random subsets of variables but rather variable sets of random subsets of clauses, it turns out that these sets are “well-behaved” enough for us to apply Theorem 2.1. This yields a global function that is -consistent with many ’s. Note that, if instead of -consistency we had exact consistency, then we would have been done because must satisfy all clauses that appear in any such that is consistent with ; since there are many such ’s and these are random sets, indeed satisfies almost all clauses. A simple counting argument shows that this remains true even with approximate consistency, provided that most clauses appear in at least a certain fraction of such ’s (an assumption which holds for random subsets). Hence, the soundness of our reduction follows from Theorem 2.1, and we devote the rest of this section to an overview of its proof.

Optimality of the parameters of Theorem 2.1. Before we proceed to the overview, we would like to note that the size of the subcollection in Theorem 2.1 is nearly optimal. This is because we can partition into subcollections each of size and, for each , randomly select a global function and let each be the restriction of to for each . In this way, we have and any global function can be (approximately) consistent with at most local functions. This means that can be of size at most in this case and, up to a multiplicative factor, Theorem 2.1 yields an almost largest possible .

### 2.2 A Simplified Proof: δ ≥ k^{o(1)}/k^{1/2} Regime

We now sketch the proof of Theorem 2.1. Before we describe how we can find when , let us sketch the proof assuming a stronger assumption that . Note that this simplified proof already implies a factor ETH-hardness of approximating 2-CSPs. In the next subsection, we will then proceed to refine the arguments to handle smaller values of .

Let us consider the consistency graph of . This is the graph whose vertex set is and there is an edge between and if and only if and are consistent. Note that the number of edges in is equal to , where the subtraction of comes from the fact that includes the agreement of each set and itself (whereas does not).
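The relation between the edge count and the agreement probability can be checked directly from the definitions. The sketch below assumes local functions are given as dictionaries and uses hypothetical names. Since the agreement probability ranges over ordered pairs and includes each function paired with itself, the number of agreeing ordered pairs among k functions equals twice the number of edges plus k; this identity is derived here from the definitions rather than quoted from the source.

```python
from itertools import combinations


def disagr(f, g):
    """Number of shared domain elements on which f and g take different values."""
    return sum(1 for x in f.keys() & g.keys() if f[x] != g[x])


def consistency_graph(functions):
    """Edges (i, j), i < j, between local functions that agree on their
    whole common domain."""
    return {
        (i, j)
        for i, j in combinations(range(len(functions)), 2)
        if disagr(functions[i], functions[j]) == 0
    }


def agreeing_ordered_pairs(functions):
    """Ordered pairs (including a function with itself) that are consistent.

    Dividing this count by k * k gives the agreement probability.
    """
    return sum(disagr(f, g) == 0 for f in functions for g in functions)


def check_edge_count_identity(functions):
    """Verify: agreeing ordered pairs = 2 * (number of edges) + k."""
    k = len(functions)
    edges = consistency_graph(functions)
    return agreeing_ordered_pairs(functions) == 2 * len(edges) + k
```

Rearranged, the number of edges is half of (agreeing ordered pairs minus k), which matches the subtraction described in the paragraph above.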

Previous works on agreement testers exploit particular structures of the consistency graph to decode a global function. One such property that is relevant to our proof is the notion of almost transitivity defined by Raz and Safra in the analysis of their test [RS97]. More specifically, a graph is said to be -transitive for some if, for every non-edge (i.e. ), and can share at most common neighbors. (Footnote 9: In [RS97], the transitivity parameter is used to denote the fraction of vertices that are neighbors of both and rather than the number of such vertices as defined here. However, the latter notion will be more convenient for us.) Raz and Safra showed that their consistency graph is -transitive where denotes the number of vertices of the graph. They then proved a generic theorem regarding -transitive graphs that, for any such graph, its vertex set can be partitioned so that the subgraph induced by each part is a clique and that the number of edges between different parts is small. Since a sufficiently large clique corresponds to a global function in their setting, they can then immediately deduce their result.
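The transitivity notion can be illustrated by computing, for a given graph, the largest number of common neighbors shared by any two non-adjacent vertices; the graph is transitive with a given parameter exactly when this number does not exceed it. The sketch below is a direct brute-force check with a hypothetical function name and an assumed edge-set representation.

```python
from itertools import combinations


def max_common_neighbors_over_non_edges(num_vertices, edges):
    """Largest number of common neighbors of any pair of distinct,
    non-adjacent vertices; returns 0 if every pair is adjacent.

    `edges` is an iterable of unordered pairs (u, v) over vertices
    0 .. num_vertices - 1.
    """
    adjacency = {v: set() for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    worst = 0
    for u, v in combinations(range(num_vertices), 2):
        if v not in adjacency[u]:
            # u and v form a non-edge: count the vertices adjacent to both.
            worst = max(worst, len(adjacency[u] & adjacency[v]))
    return worst
```

A disjoint union of cliques has value 0, which is the ideal case in which the vertex set splits cleanly into cliques.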

Observe that, in our setting, a large clique also corresponds to a global function that is consistent with many local functions. In particular, suppose that there exists of size sufficiently large such that induces a clique in . Since ’s are perfectly consistent with each other for all , there is a global function that is consistent with all such ’s. Hence, if we could show that our consistency graph is -transitive, then we could use the same argument as Raz and Safra’s to deduce our desired result. Alas, our graph does not necessarily satisfy this transitivity property; for instance, consider any two sets and let be such that they disagree on only one variable, i.e., there is a unique such that . It is possible that, for every that does not contain , agrees with both and ; in other words, every such can be a common neighbor of and . Since each variable appears roughly in only fraction of the sets, there can be as many as common neighbors of and even when there is no edge between and !

Fortunately for us, a weaker statement holds: if and disagree on more than variables (instead of just one variable as above), then and have at most common neighbors in . Here should be thought of as times a small constant which will be specified later. To see why this statement holds, observe that, since every is a random subset that includes each clause with probability , a Chernoff bound implies that, for every subcollection of size , contains all but fraction of variables. Let denote the set of common neighbors of and . It is easy to see that and can only disagree on variables that do not appear in . If is of size , then contains all but fraction of variables, which means that and disagree only on fraction of variables. By selecting the constant appropriately inside , we arrive at the claimed statement.

In other words, while the transitivity property does not hold for every non-edge, it holds for the pairs where and are -inconsistent. This motivates us to define a two-level consistency graph, where the pairs with -inconsistent are referred to as the red edges whereas the original edges in are now referred to as the blue edges. We define this formally below.

[Red/Blue Graph] A red-blue graph is an undirected graph whose edge set is partitioned into two sets: , the set of red edges, and , the set of blue edges. We use the prefixes “blue-” and “red-” to refer to the quantities of the graph and respectively; for instance, is said to be a blue-neighbor of if .

[Two-Level Consistency Graph] Given a collection of functions and a real number , the two-level consistency graph is a red-blue graph defined as follows.

• The vertex set is simply .

• The blue edges are the consistent pairs , i.e., .

• The red edges are the -inconsistent pairs , i.e., .

Note that constitutes neither a blue nor a red edge when .
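To make the definition concrete, the following is a minimal sketch of how such a two-level graph could be built from a small collection of local functions. It is only an illustration under stated assumptions: local functions are represented as Python dicts from variables to values, and the hypothetical parameter `red_threshold` stands in for the red-edge threshold as an absolute number of disagreeing variables. None of these names come from the paper.

```python
from itertools import combinations


def disagreements(f, g):
    """Number of shared variables on which two local functions differ."""
    return sum(1 for x in f.keys() & g.keys() if f[x] != g[x])


def two_level_consistency_graph(local_functions, red_threshold):
    """Return (blue, red) edge sets over the indices of local_functions.

    A pair is blue if the two functions agree on every shared variable,
    and red if they disagree on more than red_threshold shared variables
    (an assumed, absolute-count form of the threshold).
    Pairs in between receive no edge at all.
    """
    blue, red = set(), set()
    for i, j in combinations(range(len(local_functions)), 2):
        d = disagreements(local_functions[i], local_functions[j])
        if d == 0:
            blue.add((i, j))
        elif d > red_threshold:
            red.add((i, j))
    return blue, red


# Toy collection: each dict is a local function on its own subset of variables.
fs = [
    {1: 0, 2: 0, 3: 0, 4: 0},
    {1: 0, 2: 0, 3: 0, 5: 1},
    {1: 1, 2: 1, 3: 1, 4: 0},
    {1: 0, 2: 0, 3: 1, 5: 1},
]
blue, red = two_level_consistency_graph(fs, red_threshold=2)
print("blue:", sorted(blue))
print("red:", sorted(red))
```

In this toy instance the pair (0, 3) disagrees on exactly one shared variable, so it is neither blue nor red, which is the intermediate case noted above.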

Now, the transitivity property we argued above can be stated as follows: for every red-edge of , there are at most different ’s such that both and are blue edges. For brevity, let us call any red-blue graph -red/blue-transitive if, for every red edge , and have at most common blue-neighbors. We will now argue that in any -red/blue-transitive graph of average blue-degree , there exists a subset of size such that only fraction of pairs of vertices in are red edges.

Before we prove this, let us state why this is useful for decoding the desired global function . Observe that such a subset of vertices in the two-level consistency graph translates to a subcollection such that, for all but fraction of pairs of sets , does not form a red edge. Recall from the definition of red edges that, for such , and disagree on at most variables. In other words, is similar to a clique in the (not two-level) consistency graph, except that (1) fraction of pairs are allowed to disagree on as many variables as they like, and (2) even for the rest of the pairs, the guarantee now is that they agree on all but at most variables, instead of total agreement as in the previous case of a clique. Fortunately, this still suffices to find that is -consistent with functions. One way to construct such a global function is to simply assign each according to the majority of for all such that . (This is formalized in Section 4.3.) Note that in our case and . Hence, if we pick and , we indeed get a global function that is -consistent with local functions.
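The majority decoding step just described can be sketched as follows. This is an illustrative sketch only, assuming local functions are given as Python dicts from variables to values; the helper names `majority_decode` and `num_disagreements` are hypothetical, and the formal version of the argument is the one in Section 4.3.

```python
from collections import Counter


def majority_decode(local_functions):
    """Assign each variable the most common value among the local
    functions whose domain contains that variable."""
    votes = {}
    for f in local_functions:
        for x, value in f.items():
            votes.setdefault(x, Counter())[value] += 1
    return {x: counter.most_common(1)[0][0] for x, counter in votes.items()}


def num_disagreements(g, f):
    """Number of variables in the domain of f on which g differs from f."""
    return sum(1 for x, value in f.items() if g[x] != value)


# Three local functions that mostly agree; the third deviates on variable 2.
fs = [
    {1: 0, 2: 1, 3: 0},
    {1: 0, 2: 1, 4: 1},
    {1: 0, 2: 0, 3: 0, 4: 1},
]
g = majority_decode(fs)
errors = [num_disagreements(g, f) for f in fs]
print(g, errors)
```

Here the decoded function agrees exactly with the first two local functions and differs from the third on a single variable, mirroring the "agree on all but a few variables" guarantee.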

We now move on to sketch how one can find such an “almost non-red subgraph”. For simplicity, let us assume that every vertex has the same blue-degree (i.e. is -regular). Let us count the number of red-blue-blue triangles (or rbb triangles), where an rbb triangle is a 3-tuple of vertices in such that are blue edges whereas is a red edge. An illustration of an rbb triangle can be found in Figure 0(a). The red/blue transitivity can be used to bound the number of rbb triangles as follows. For each , since the graph is -red/blue-transitive, there are at most rbb triangles with and . Hence, in total, there can be at most rbb triangles. As a result, there exists such that the number of rbb triangles such that is at most . Let us now consider the set that consists of all blue-neighbors of . There can be at most red edges with both endpoints in because each such edge corresponds to an rbb triangle with . From our assumption that every vertex has blue degree , we indeed have that and that the fraction of pairs of vertices in that are linked by red edges is as desired. This completes our overview for the case .
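The averaging step above (pick the middle vertex that lies in the fewest rbb triangles and take its blue-neighborhood) can be illustrated by a brute-force sketch. The representation (edge sets of vertex pairs) and the function names are assumptions made for this example, and triangles are counted with unordered endpoints.

```python
from itertools import combinations


def blue_neighbors(vertices, blue):
    """Map each vertex to the set of its blue-neighbors."""
    nbrs = {v: set() for v in vertices}
    for u, v in blue:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def red_edges_inside(subset, red):
    """Number of red edges with both endpoints in subset."""
    return sum(1 for u, w in combinations(sorted(subset), 2)
               if (u, w) in red or (w, u) in red)


def least_red_neighborhood(vertices, blue, red):
    """Return (v, N(v), r), where v minimizes the number r of red edges
    inside its blue-neighborhood N(v). Each such red edge is exactly one
    rbb triangle whose middle vertex is v."""
    nbrs = blue_neighbors(vertices, blue)
    best = min(vertices, key=lambda v: red_edges_inside(nbrs[v], red))
    return best, nbrs[best], red_edges_inside(nbrs[best], red)


# Toy red/blue graph: the blue edges form a 5-cycle (so it is blue-2-regular).
vertices = [0, 1, 2, 3, 4]
blue = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
red = {(1, 4), (0, 2), (1, 3)}

nbrs = blue_neighbors(vertices, blue)
total_rbb = sum(red_edges_inside(nbrs[v], red) for v in vertices)
center, neighborhood, red_inside = least_red_neighborhood(vertices, blue, red)
print(total_rbb, center, sorted(neighborhood), red_inside)
```

Since the total number of rbb triangles is 3 over 5 possible middle vertices, some vertex must be the middle of at most 3/5 of a triangle on average, that is, of none; the sketch finds such a vertex and its red-free blue-neighborhood.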

### 2.3 Towards $\delta = k^{o(1)}/k$ Regime

To handle smaller , we need to first understand why the approach above fails to work for . To do so, note that the above proof sketch can be summarized into three main steps:

1. Show that the two-level consistency graph is -red/blue-transitive for some .

2. Use red/blue transitivity to find a large subgraph of with few induced red edges.

3. Decode a global function from such an “almost non-red subgraph”.

The reason that we need , or equivalently , lies in Step 2. Although not stated as such earlier, our argument in this step can be described as follows. We consider all length-2 blue-walks, i.e., all such that and are both blue edges, and, using the red/blue transitivity of the graph, we argue that, for almost all of these walks, is not a red edge (i.e. is not an rbb triangle), which then allows us to find an almost non-red subgraph. For this argument to work, we need the number of length-2 blue-walks to far exceed the number of rbb triangles. The former is whereas the latter is bounded above by in -red/blue-transitive graphs. This means that we need , which implies that .
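The counting behind this requirement can be written out as a short calculation. The symbols here are assumptions introduced for illustration, since they may differ from the paper's notation: $k$ is the number of vertices, $d$ the (regular) blue-degree, $t$ the red/blue-transitivity parameter, and $\delta \approx d/k$ the blue-edge density.

```latex
\begin{align*}
\#\{\text{length-2 blue-walks}\} &= k \cdot d^2
  && \text{(choose the middle vertex, then two blue-neighbors)} \\
\#\{\text{rbb triangles}\} &\le k^2 \cdot t
  && \text{(at most $k^2$ red pairs, each with at most $t$ common blue-neighbors)} \\
k d^2 \gg k^2 t
  &\iff d \gg \sqrt{k t}
  \iff \delta \approx \frac{d}{k} \gg \sqrt{\frac{t}{k}} .
\end{align*}
```

Under these assumptions, taking $t = k^{o(1)}$ gives the threshold $\delta \geq k^{o(1)}/k^{1/2}$, matching the regime treated in the simplified proof.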

To overcome this limitation, we instead consider all length- blue-walks for and we will define a “rbb-triangle-like” structure on these walks. Our goal is again to show that this structure appears rarely in random length- blue-walks and we will then use this to find a subgraph that allows us to decode a good assignment for . Observe that the number of length- blue walks is . We also hope that the number of “rbb-triangle-like” structures is still small; in particular, we will still get a bound for such generalized structures similar to our previous bound for the red-blue-blue triangles. When this is the case, we need , meaning that when it suffices to select , which yields factor inapproximability as desired. To facilitate our discussion, let us define notation for -walks here.

[-Walks] For any red/blue graph and any integer , an -blue-walk in is an -tuple of vertices such that every pair of consecutive vertices are joined by a blue edge, i.e., for every . For brevity, we sometimes refer to -blue walks simply as -walks. We use to denote the set of all -walks in .

Note here that a vertex can appear multiple times in a single -walk.

One detail we have yet to specify in the proof is the structure that generalizes the rbb triangle for -walks where . As before, this structure will force the two endpoints of the walk to be joined by a red edge, i.e., . Additionally, we require every pair of non-consecutive vertices to be joined by a red edge. We call such a walk a red-filled -walk (see Figure 0(b)):

[Red-Filled -Walks] For any red/blue graph , a red-filled -walk is an -walk such that every pair of non-consecutive vertices is joined by a red edge, i.e., for every such that . Let denote the set of all red-filled -walks in . Moreover, for every , let denote the set of all red-filled -walks from to , i.e., .
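The two definitions above can be illustrated by a brute-force enumeration on a tiny red/blue graph. This is a sketch under assumptions: edges are stored as sets of vertex pairs, a walk is a tuple of `length` vertices, and the function names are hypothetical. The enumeration is exponential in the walk length and is meant only for toy inputs.

```python
from itertools import product


def is_edge(edges, u, v):
    """Check membership of an undirected edge stored in either orientation."""
    return (u, v) in edges or (v, u) in edges


def blue_walks(vertices, blue, length):
    """All tuples of `length` vertices whose consecutive pairs are blue edges.
    A vertex may appear several times in the same walk."""
    return [w for w in product(vertices, repeat=length)
            if all(is_edge(blue, w[i], w[i + 1]) for i in range(length - 1))]


def is_red_filled(walk, red):
    """True if every pair of non-consecutive vertices of the walk is a red edge."""
    return all(is_edge(red, walk[i], walk[j])
               for i in range(len(walk))
               for j in range(i + 2, len(walk)))


# Toy graph: a blue path 0-1-2-3 with all non-adjacent pairs coloured red.
vertices = [0, 1, 2, 3]
blue = {(0, 1), (1, 2), (2, 3)}
red = {(0, 2), (1, 3), (0, 3)}

walks = blue_walks(vertices, blue, 4)
red_filled = [w for w in walks if is_red_filled(w, red)]
print(len(walks), red_filled)
```

With `length=3` the red-filled walks are exactly the rbb triangles (as ordered tuples), so the same helper covers the structure used in the simplified proof. Walks that revisit a vertex, such as (0, 1, 0, 1), are never red-filled because a vertex is not joined to itself by a red edge.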

As mentioned earlier, we will need a generalized transitivity property that works not only for rbb triangles but also for our new structure, i.e. the red-filled -walks. This can be defined analogously to -red/blue transitivity as follows.

[-Red/Blue Transitivity] For any positive integers , a red/blue graph is said to be -red/blue-transitive if, for every pair of vertices that are joined by a red edge, there exist at most red-filled -walks starting at and ending at , i.e., .

Using an argument similar to the one before, we can show that, when consists of random subsets where each element is included in a subset with probability , the two-level consistency graph is -red/blue transitive for some parameter that is a function of only and . When and are small enough in terms of , can be made to be . (The full proof can be found in Section 4.1.1.)

Once this is proved, it is not hard (using a similar argument as before) to show that, when , most -walks are not red-filled, i.e., . Even with this, it is still unclear how we can get back a “clique-like” subgraph; in the case of above, this implies that a blue-neighborhood induces few red edges, but the argument does not seem to generalize to larger . Fortunately, it is still quite easy to find a large subgraph in which a non-trivial fraction of pairs of vertices do not form red edges; specifically, we will find two subsets each of size such that for at least fraction of , is not a red edge. To find such sets, observe that, if , then for a random the probability that there exist non-consecutive vertices in the walk that are not joined by a red edge is at least . Since there are fewer than such , the union bound implies that there must be non-consecutive such that the probability that are not joined by a red edge is at least . Let us assume without loss of generality that ; since they are not consecutive, we have .

Let us consider . By a simple averaging argument, there must be and such that, conditioning on and , the probability that is at least . However, this conditional probability is exactly equal to the fraction of such that and are not joined by a red edge. Recall again that is used to denote the set of all blue-neighbors of . Thus, and are the sets with the desired property.

We are not done yet, since we still have to use these sets to decode the global function . This is not obvious: the guarantee we have for our sets is rather weak, since we only know that at least of the pairs of vertices from the two sets do not form red edges. This is in contrast to the case where we have a subgraph such that almost all induced edges are not red.

To see how to overcome this barrier, recall that a pair that does not form a red edge corresponds to . As a thought experiment, let us think of the following scenario: if instead of just -consistency, these pairs satisfy (exact) consistency, then we can consider the collection where . This is a collection of local functions such that . Thus, when , we are in the regime where , meaning that we can apply our earlier argument (for the regime) to recover !

The approach in the previous paragraph of course does not work directly because we only know that fraction of the pairs are -consistent, not exactly consistent. However, we can still try to mimic the proof in the regime and define a red/blue graph in such a way that such -consistent pairs are now blue edges. Naturally, the red edges will now be the -inconsistent pairs for some . In other words, we consider the generalized two-level consistency graph defined as follows.

[Generalized Two-Level Consistency Graph] Given a collection of functions and two real numbers , the generalized two-level consistency graph