
# On the Hardness of Compressing Weights

We investigate computational problems involving large weights through the lens of kernelization, which is a framework of polynomial-time preprocessing aimed at compressing the instance size. Our main focus is the weighted Clique problem, where we are given an edge-weighted graph and the goal is to detect a clique of total weight equal to a prescribed value. We show that the weighted variant, parameterized by the number of vertices $n$, is significantly harder than the unweighted problem by presenting an $O(n^{3-\varepsilon})$ lower bound on the size of the kernel, under the assumption that NP ⊈ coNP/poly. This lower bound is essentially tight: we show that we can reduce the problem to the case with weights bounded by $2^{O(n)}$, which yields a randomized kernel of $O(n^3)$ bits. We generalize these results to the weighted d-Uniform Hyperclique problem, Subset Sum, and weighted variants of Boolean Constraint Satisfaction Problems (CSPs). We also study weighted minimization problems and show that weight compression is easier when we only want to preserve the collection of optimal solutions. Namely, we show that for node-weighted Vertex Cover on bipartite graphs it is possible to maintain the set of optimal solutions using integer weights from the range $[1, n]$, but if we want to maintain the ordering of the weights of all inclusion-minimal solutions, then weights as large as $2^{\Omega(n)}$ are necessary.


## 1 Introduction

A prominent class of problems in algorithmic graph theory consists of finding a subgraph with certain properties in an input graph $G$, if one exists. Some variations of this problem can be solved in polynomial time (detecting a triangle), while the general problem is NP-complete since it generalizes the Clique problem. In recent years, there has been an increasing interest in understanding the complexity of such subgraph detection problems in weighted graphs, where either the vertices or the edges are assigned integral weight values, and the goal is either to find a subgraph of a given form which optimizes the total weight of its elements, or alternatively, to find a subgraph whose total weight matches a prescribed value.

Incorporating weights in the problem definition can have a significant effect on computational complexity. For example, determining whether an unweighted $n$-vertex graph has a triangle can be done in time $O(n^{\omega})$ (where $\omega$ is the exponent of matrix multiplication) [16], while for the analogous weighted problem of finding a triangle of minimum edge-weight, no algorithm of running time $O(n^{3-\varepsilon})$ is known for any $\varepsilon > 0$. Some popular conjectures in fine-grained complexity theory even postulate that no such algorithms exist [31]. Weights also have an effect on the best-possible exponential running times of algorithms solving NP-hard problems: the current-fastest algorithm for the NP-complete Hamiltonian Cycle problem in undirected graphs runs in time $O(1.66^n)$ [4], while for its weighted analogue, Traveling Salesperson, no algorithm with running time $O((2-\varepsilon)^n)$ is known for general undirected graphs (cf. [24]).

In this work we investigate how the presence of weights in a problem formulation affects the compressibility and kernelization complexity of NP-hard problems. Kernelization is a subfield of parameterized complexity [7, 10] that investigates how much a polynomial-time preprocessing algorithm can compress an instance of an NP-hard problem, without changing its answer, in terms of a chosen complexity parameter.

For a motivating example of kernelization, we consider the Vertex Cover problem. For the unweighted variant, a kernelization algorithm based on the Nemhauser-Trotter theorem [26] can efficiently reduce an instance $(G, k)$ of the decision problem, asking whether $G$ has a vertex cover of size at most $k$, to an equivalent one $(G', k')$ consisting of at most $2k$ vertices, which can therefore be encoded in $O(k^2)$ bits via its adjacency matrix. In the language of parameterized complexity, the unweighted Vertex Cover problem parameterized by the solution size $k$ admits a kernelization (self-reduction) to an equivalent instance on $O(k^2)$ bits. For the weighted variant of the problem, where an input additionally specifies a weight threshold $t$ and a weight function $w$ on the vertices, and the question is whether there is a vertex cover of size at most $k$ and weight at most $t$, the guarantee on the encoding size of the reduced instance is weaker. Etscheid et al. [11, Thm. 5] applied a powerful theorem of Frank and Tardos [13] to develop a polynomial-time algorithm to reduce any instance $(G, w, k, t)$ of Weighted Vertex Cover to an equivalent one with $O(k^2)$ edges, which nevertheless needs $O(k^8)$ bits to encode due to potentially large numbers occurring as vertex weights. The Weighted Vertex Cover problem, parameterized by solution size $k$, therefore has a kernel of $O(k^8)$ bits.

The overhead in the kernel size for the weighted problem is purely due to potentially large weights. This led Etscheid et al. [11] to ask in their conclusion whether this overhead in the kernelization sizes of weighted problems is necessary, or whether it can be avoided. As one of the main results of this paper, we will prove a lower bound showing that the kernelization complexity of some weighted problems is strictly larger than their unweighted counterparts.

#### Our results

We consider an edge-weighted variation of the Clique problem, parameterized by the number of vertices $n$:

Exact-Edge-Weight Clique (EEWC) Input: An undirected graph $G$, a weight function $w \colon E(G) \to \mathbb{N}_0$, and a target $t \in \mathbb{N}_0$. Question: Does $G$ have a clique of total edge-weight exactly $t$, i.e., a vertex set $S \subseteq V(G)$ such that $\{x, y\} \in E(G)$ for all distinct $x, y \in S$ and such that $\sum_{\{x,y\} \subseteq S} w(\{x,y\}) = t$?

Our formulation of EEWC does not constrain the cardinality of the clique. This formulation will be convenient for our purposes, but we remark that by adjusting the weight function it is possible to enforce that any solution clique $S$ has a prescribed cardinality. Through such a cardinality restriction we can obtain a simple reduction from the problem with potentially negative weights to equivalent instances with non-negative integer weights, by increasing all weights by a suitably large value and adjusting $t$ according to the prescribed cardinality. Note that an instance of EEWC can be reduced to an equivalent one where $G$ has all possible edges, by simply inserting each non-edge with a weight of $t+1$. Hence the difficulty of the problem stems from achieving the given target weight $t$ as the total weight of the edges spanned by $S$, not from the requirement that $G[S]$ must be a clique.
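To make the problem statement concrete, here is a minimal brute-force sketch in Python. The encoding (vertices labelled `0..n-1`, weights stored in a dictionary keyed by `frozenset` pairs) and all function names are illustrative assumptions, not part of the paper. The helper `complete_with_heavy_non_edges` inserts every non-edge with weight `t + 1`; with non-negative weights, any value exceeding `t` has the same effect of excluding that pair from every solution.

```python
from itertools import combinations

def clique_weights(n, w):
    """Yield (S, total edge weight) for every vertex set S that is a clique.

    n: number of vertices, labelled 0..n-1 (hypothetical encoding).
    w: dict mapping frozenset({u, v}) to a non-negative integer weight;
       a pair that is absent from w is a non-edge.
    """
    for size in range(n + 1):
        for S in combinations(range(n), size):
            pairs = [frozenset(p) for p in combinations(S, 2)]
            if all(p in w for p in pairs):
                yield S, sum(w[p] for p in pairs)

def has_exact_clique(n, w, t):
    """Brute force: is there a clique of total edge weight exactly t?"""
    return any(total == t for _, total in clique_weights(n, w))

def complete_with_heavy_non_edges(n, w, t):
    """Return a complete weighted graph in which each former non-edge weighs t + 1."""
    full = dict(w)
    for p in combinations(range(n), 2):
        full.setdefault(frozenset(p), t + 1)
    return full
```

The search is exponential in `n` and only serves to pin down the semantics; note that the empty set and single vertices count as cliques of weight 0 under this reading of the definition.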

EEWC is a natural extension of Zero-Weight Triangle [1], which has been studied because it inherits fine-grained hardness from both 3-Sum [33] and All Pairs Shortest Paths [30, Footnote 3]. EEWC has previously been considered by Abboud et al. [2] as an intermediate problem in their W[1]-membership reduction from $k$-Sum to $k$-Clique. Vassilevska-Williams and Williams [33] considered a variation of this problem with weights drawn from a finite field. The related problem of detecting a triangle of negative edge weight is central in the field of fine-grained complexity for its subcubic equivalence [32] to All Pairs Shortest Paths. Another example of an edge-weighted subgraph detection problem with an exact requirement on the weight of the target subgraph is Exact-Edge-Weight Perfect Matching, which can be solved using algebraic techniques [23, §6] and has been used as a subroutine in subgraph isomorphism algorithms [22, Proposition 3.1].

The unweighted version of EEWC, obtained by setting all edge weights to $1$, is NP-complete because it is equivalent to the Clique problem. When using the number of vertices $n$ as the complexity parameter, the problem admits a kernelization of size $O(n^2)$ obtained by simply encoding the instance via its adjacency matrix. We prove the following lower bound, showing that the kernelization complexity of the edge-weighted version is a factor $n$ larger. The lower bound even holds against generalized kernelizations (see Definition 2).

The Exact-Edge-Weight Clique problem parameterized by the number of vertices $n$ does not admit a generalized kernelization of $O(n^{3-\varepsilon})$ bits for any $\varepsilon > 0$, unless NP ⊆ coNP/poly.

Intuitively, the lower bound exploits the fact that the weight value of each of the $\Theta(n^2)$ edges in the instance may be a large integer requiring $\Omega(n)$ bits to encode. We also provide a randomized kernelization which matches this lower bound.

There is a randomized polynomial-time algorithm that, given an $n$-vertex instance $(G, w, t)$ of Exact-Edge-Weight Clique, outputs an instance $(G', w', t')$ of bitsize $O(n^3)$, in which each number is bounded by $2^{O(n)}$, that is equivalent to $(G, w, t)$ with probability at least $1 - 2^{-n}$. Moreover, if the input is a YES-instance, then the output is always a YES-instance.

The proof is based on the idea that taking the weight function modulo a random prime preserves the answer to the instance with high probability. We adapt the argument by Harnik and Naor [14] that it suffices to pick a prime of magnitude $2^{O(n)}$. As a result, each weight can be encoded with just $O(n)$ bits.
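The hashing idea can be illustrated with a small Python sketch. It is not the construction from the paper: it merely reduces all weights and the target modulo a prime and then tests, by brute force, whether some clique weight is congruent to the target. How the congruence test is turned back into a compressed instance is left out. The function names, the trial-division primality test, and the 16-bit prime size are assumptions chosen to keep the example short.

```python
import random
from itertools import combinations

def is_prime(p):
    """Trial division; adequate for the small demonstration primes used here."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True

def random_prime(bits, rng):
    """Sample a random prime with exactly `bits` bits by rejection sampling."""
    while True:
        p = rng.randrange(1 << (bits - 1), 1 << bits)
        if is_prime(p):
            return p

def reduce_mod_prime(w, t, p):
    """Replace every edge weight and the target by its residue modulo p."""
    return {e: x % p for e, x in w.items()}, t % p

def has_clique_congruent(n, w, t, p):
    """Brute force: is there a clique whose total weight is congruent to t mod p?"""
    for size in range(n + 1):
        for S in combinations(range(n), size):
            pairs = [frozenset(q) for q in combinations(S, 2)]
            if all(q in w for q in pairs) and (sum(w[q] for q in pairs) - t) % p == 0:
                return True
    return False
```

A clique of weight exactly `t` still satisfies the congruence after reduction, so YES-instances are never lost. A NO-instance can only turn into a false positive when the prime happens to divide the difference between `t` and some clique weight, which is the one-sided error discussed in the text.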

It is noteworthy that the algorithm above can produce only false positives, therefore instead of using randomization we can turn it into a co-nondeterministic algorithm which guesses the correct values of the random bits. The framework of cross-composition excludes not only deterministic kernelization, but also co-nondeterministic [9], thus the lower bound from Theorem 1 indeed makes the presented algorithm tight.

Together, the lower bound and the randomized kernelization above pin down the kernelization complexity of Exact-Edge-Weight Clique, and prove it to be a factor $n$ larger than for the unit-weight case. For Clique, the kernelization of $O(n^2)$ bits due to adjacency-matrix encoding cannot be improved to $O(n^{2-\varepsilon})$ for any $\varepsilon > 0$, as was shown by Dell and van Melkebeek [9].

We extend our results to the hypergraph setting, which is defined as follows: given a $d$-uniform hypergraph ($d \geq 2$) with non-negative integer weights on the hyperedges, and a target value $t$, test if there is a vertex set $S$ for which each size-$d$ subset is a hyperedge (so that $S$ is a hyperclique) such that the sum of the weights of the hyperedges contained in $S$ is exactly $t$. By a bootstrapping reduction using Theorem 1, we prove that Exact-Edge-Weight $d$-Uniform Hyperclique does not admit a generalized kernel of size $O(n^{d+1-\varepsilon})$ for any $\varepsilon > 0$ unless NP ⊆ coNP/poly, while the randomized hashing technique yields a randomized kernelization of size $O(n^{d+1})$.

We can view the edge-weighted ($d$-hyper)clique problem on $(G, w, t)$ as a weighted constraint satisfaction problem (CSP) with integer weights, by introducing a binary variable for each vertex, and a weighted constraint for each subset $X$ of $d$ vertices, which is satisfied precisely when all variables for $X$ are set to true. If $X$ is a (hyper)edge $e \in E(G)$ then the weight of the constraint on $X$ equals the weight of $e$; if $X$ is not a hyperedge of $G$, then the weight of the constraint on $X$ is set to $t+1$ to prevent all its vertices from being simultaneously chosen. Under this definition, $G$ has a (hyper)clique of edge-weight $t$ if and only if there is an assignment to the variables for which the total weight of satisfied constraints is $t$. Via this interpretation, the lower bounds for EEWC yield lower bounds on the kernelization complexity of weighted variants of CSP. We employ a recently introduced framework [18] of reductions among different CSPs whose constraint languages have the same maximum degree $d$ of their characteristic polynomials, to transfer these lower bounds to other CSPs (see Section 3.3 for definitions). We obtain tight kernel bounds when parameterizing the exact-satisfaction-weight version of CSP by the number of variables, again using random prime numbers to obtain upper bounds. Our lower bounds for Exact-Edge-Weight $d$-Uniform Hyperclique transfer to all CSPs with degree $d \geq 2$. In degree-1 CSP each constraint depends on exactly one variable, therefore its exact-weighted variant is equivalent to the Subset Sum problem, for which we also provide a tight lower bound.

(★) Subset Sum parameterized by the number of items $n$ does not admit a generalized kernelization of size $O(n^{2-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ coNP/poly.

The Subset Sum lower bound tightens a result of Etscheid et al. [11, Theorem 14], who ruled out (standard) kernelizations for Subset Sum of size $O(n^{2-\varepsilon})$ assuming the Exponential Time Hypothesis. Our reduction, conditioned on the incomparable assumption NP ⊈ coNP/poly, additionally rules out generalized kernelizations that compress into an instance of a potentially different problem. Note that the new lower bound implies that the input data in Subset Sum cannot be efficiently encoded in a more compact way, whereas the previous lower bound relies on the particular way the input is encoded in the natural formulation of the problem. On the other hand, a randomized kernel of size $O(n^2)$ is known [14].

The results described so far characterize the kernelization complexity of broad classes of weighted constraint satisfaction problems in which the goal is to find a solution for which the total weight of satisfied constraints is exactly equal to a prescribed value. We also broaden our scope and investigate the maximization or minimization setting, in which the question is whether there is a solution whose cost is at least, or at most, a prescribed value. Some of our upper-bound techniques can be adapted to this setting: using a procedure by Nederlof, van Leeuwen and de Zwaan [25] a maximization problem can be reduced to a polynomial number of exact queries. This leads, for example, to a Turing kernelization (cf. [12]) for the weight-maximization version of $d$-Uniform Hyperclique which decides an instance in randomized polynomial time using queries of size $O(n^{d+1})$ to an oracle for an auxiliary problem. We do not have lower bounds in the maximization regime.

In an attempt to understand the relative difficulty of obtaining an exact target weight versus maximizing the target weight, we finally investigate different models of weight reduction for the Weighted Vertex Cover problem studied extensively in earlier works [6, 11, 25]. We consider the problem on bipartite graphs, where an optimal solution can be found in polynomial time, but we investigate whether a weight function can be efficiently compressed while either preserving (a) the collection of minimum-weight vertex covers, or (b) the relative ordering of total weight for all inclusion-minimal vertex covers. We give a polynomial-time algorithm for case (a) which reduces to a weight function with range $[1, n]$ using a relation to $b$-matchings, but show that in general it is impossible to achieve (b) with a weight function with range $[1, 2^{o(n)}]$, by utilizing lower bounds on the number of different threshold functions.

#### Organization

We begin with short preliminaries containing the crucial definitions. We prove our main Theorem 1 in Section 3 by presenting a cross-composition of degree 3 into Exact-Edge-Weight Clique and employing it to obtain kernelization lower bounds for $d$-uniform hypergraphs for $d \geq 2$. This section also contains the kernelization lower bound for Subset Sum as well as the generalization of these results to Boolean CSPs. Next, in Section 4 we focus on bipartite Weighted Vertex Cover and the difficulty of compressing weight functions. The proofs of statements marked with (★) are located in the appendix. The kernel upper bounds, including the proof of the randomized kernelization for Exact-Edge-Weight Clique, together with Turing kernelization for maximization problems, are collected in Appendix B.

## 2 Preliminaries

We denote the set of natural numbers including zero by $\mathbb{N}_0$, and the set of positive natural numbers by $\mathbb{N}_+$. For a positive integer $n$ we define $[n] := \{1, \ldots, n\}$. For a set $U$ and integer $d \geq 1$ we denote by $\binom{U}{d}$ the collection of all size-$d$ subsets of $U$. All logarithms we employ have base $2$. Given a set $U$ and a weight function $w \colon U \to \mathbb{N}_0$, for a subset $S \subseteq U$ we denote $w(S) := \sum_{v \in S} w(v)$.

All graphs we consider are undirected and simple. A (standard) graph $G$ has a vertex set $V(G)$ and edge set $E(G) \subseteq \binom{V(G)}{2}$. For $d \geq 2$, a $d$-uniform hypergraph $G$ consists of a vertex set $V(G)$ and a set of hyperedges $E(G) \subseteq \binom{V(G)}{d}$, that is, each hyperedge is a set of exactly $d$ vertices. Hence a $2$-uniform hypergraph is equivalent to a standard graph. A clique in a $d$-uniform hypergraph $G$ is a vertex set $S \subseteq V(G)$ such that for each $X \in \binom{S}{d}$ we have $X \in E(G)$: each possible hyperedge among the vertices of $S$ is present. A vertex cover for a graph $G$ is a vertex set $S \subseteq V(G)$ containing at least one endpoint of each edge. A vertex cover is inclusion-minimal if no proper subset is a vertex cover.

#### Parameterized complexity

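A parameterized problem $\mathcal{Q}$ is a subset of $\Sigma^* \times \mathbb{N}_+$, where $\Sigma$ is a finite alphabet.

Let $\mathcal{Q}, \mathcal{Q}' \subseteq \Sigma^* \times \mathbb{N}_+$ be parameterized problems and let $h \colon \mathbb{N}_+ \to \mathbb{N}_+$ be a computable function. A generalized kernel for $\mathcal{Q}$ into $\mathcal{Q}'$ of size $h(k)$ is an algorithm that, on input $(x, k) \in \Sigma^* \times \mathbb{N}_+$, takes time polynomial in $|x| + k$ and outputs an instance $(x', k')$ such that:

1. $|x'|$ and $k'$ are bounded by $h(k)$, and

2. $(x', k') \in \mathcal{Q}'$ if and only if $(x, k) \in \mathcal{Q}$.

The algorithm is a kernel for $\mathcal{Q}$ if $\mathcal{Q}' = \mathcal{Q}$. It is a polynomial (generalized) kernel if $h(k)$ is a polynomial.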

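Definition (Linear-parameter transformations). Let $\mathcal{P}$ and $\mathcal{Q}$ be parameterized problems. We say that $\mathcal{P}$ is linear-parameter transformable to $\mathcal{Q}$, if there exists a polynomial-time computable function $f \colon \Sigma^* \times \mathbb{N}_+ \to \Sigma^* \times \mathbb{N}_+$, such that for all $(x, k) \in \Sigma^* \times \mathbb{N}_+$, (a) $(x, k) \in \mathcal{P}$ if and only if $(x', k') = f(x, k) \in \mathcal{Q}$ and (b) $k' \leq O(k)$. The function $f$ is called a linear-parameter transformation.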

We employ a linear-parameter transformation for proving the lower bound for Subset Sum. For other lower bounds we use the framework of cross-composition [5] directly.

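Definition (Polynomial equivalence relation, [5, Def. 3.1]). Given an alphabet $\Sigma$, an equivalence relation $\mathcal{R}$ on $\Sigma^*$ is called a polynomial equivalence relation if the following conditions hold.

1. There is an algorithm that, given two strings $x, y \in \Sigma^*$, decides whether $x$ and $y$ belong to the same equivalence class in time polynomial in $|x| + |y|$.

2. For any finite set $S \subseteq \Sigma^*$ the equivalence relation $\mathcal{R}$ partitions the elements of $S$ into a number of classes that is polynomially bounded in the size of the largest element of $S$.

Definition (Degree-$d$ cross-composition). Let $L \subseteq \Sigma^*$ be a language, let $\mathcal{R}$ be a polynomial equivalence relation on $\Sigma^*$, and let $\mathcal{Q} \subseteq \Sigma^* \times \mathbb{N}_+$ be a parameterized problem. A degree-$d$ OR-cross-composition of $L$ into $\mathcal{Q}$ with respect to $\mathcal{R}$ is an algorithm that, given $z$ instances $x_1, \ldots, x_z \in \Sigma^*$ of $L$ belonging to the same equivalence class of $\mathcal{R}$, takes time polynomial in $\sum_{i=1}^{z} |x_i|$ and outputs an instance $(x', k') \in \Sigma^* \times \mathbb{N}_+$ such that:

1. the parameter $k'$ is bounded by $O(z^{1/d} \cdot (\max_i |x_i|)^c)$, where $c$ is some constant independent of $z$, and

2. $(x', k') \in \mathcal{Q}$ if and only if there is an $i \in [z]$ such that $x_i \in L$.

Theorem ([5, Theorem 3.8]). Let $L \subseteq \Sigma^*$ be a language that is NP-hard under Karp reductions, let $\mathcal{Q} \subseteq \Sigma^* \times \mathbb{N}_+$ be a parameterized problem, and let $\varepsilon > 0$ be a real number. If $L$ has a degree-$d$ OR-cross-composition into $\mathcal{Q}$ and $\mathcal{Q}$ parameterized by $k$ has a polynomial (generalized) kernelization of bitsize $O(k^{d-\varepsilon})$, then NP ⊆ coNP/poly.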

## 3 Kernel lower bounds

### 3.1 Exact-Edge-Weight Clique

In this section we show that Exact-Edge-Weight Clique parameterized by the number of vertices $n$ in the given graph does not admit a generalized kernel of size $O(n^{3-\varepsilon})$, unless NP ⊆ coNP/poly. We use the framework of cross-composition to establish a kernelization lower bound [5]. We will use the NP-hard Red-Blue Dominating Set (RBDS) as a starting problem for the cross-composition. Observe that RBDS is NP-hard because it is equivalent to Set Cover and Hitting Set [20].

Red-Blue Dominating Set (RBDS) Input: A bipartite graph $G$ with a bipartition of $V(G)$ into sets $R$ (red vertices) and $B$ (blue vertices), and a positive integer $d$. Question: Does there exist a set $D \subseteq R$ with $|D| \leq d$ such that every vertex in $B$ has at least one neighbor in $D$?

The following lemma forms the heart of the lower bound. It shows that an instance of EEWC on  vertices can encode the logical OR of a sequence of  instances of size  each. Roughly speaking, this should be interpreted as follows: when , each of the roughly  edge weights of the constructed graph encodes  useful bits of information, in order to allow the instance on  edges to represent all  inputs.

There is a polynomial-time algorithm that, given integers  and a set of  instances  of RBDS such that  and  for each , constructs an undirected graph , integer , and weight function  such that:

1. the graph  contains a clique of total edge-weight exactly  if and only if there exist  such that  has a red-blue dominating set of size at most ,

2. the number of vertices in  is , and

3. the values of  and  depend only on , and .

###### Proof.

We describe the construction of ; it will be easy to see that it can be carried out in polynomial time. Label the vertices in each set  arbitrarily as , and similarly label the vertices in each set  as . We construct a graph  with edge-weight function  and integer such that  has a clique of total edge weight exactly if and only if some  is a YES-instance of RBDS. In the following construction we interpret edge weights as vectors of length written in base , which will be converted to integers later. Starting from an empty graph, we construct  as follows; see Figure 1.

1. For each , create a vertex . The vertices  form an independent set, so that any clique in  contains at most one vertex .

2. For each , create a vertex set  and insert edges of weight  between all possible pairs of .

3. For each , create a vertex . The vertices  form an independent set, so that any clique in  contains at most one vertex .

4. For each , for each , insert an edge between  and  of weight .

The next step is to ensure that the neighborhood of a vertex  in is captured in the weights of the edges which are incident on  in .

5. For each , for each , insert an edge between  and .

6. The weight of each edge  is a vector of length , out of which the least significant positions are divided into blocks of length each, and the most significant position is 1. The numbering of blocks as well as positions within a given block start with the least significant position.

For each , for each , the weight of edge  is defined as follows. For each , for each , the value  represents the value of the position of the block of the weight of . The value is defined based on the neighborhood of vertex in as follows:

$$v_{k,q}(b_i, r_{j,x}) = \begin{cases} 1 & \text{if } \{b_q, r_x\} \in E(G_{i,j,k}) \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Intuitively, the vector representing the weight of edge  is formed by a 1 followed by the concatenation of blocks of length , such that the block is the -incidence vector describing which of the  blue vertices of instance  are adjacent to .

Note that the  blue vertices of an input instance  are represented by a single blue vertex  in . The difference between distinct blue vertices is encoded via different positions of the weight vectors. The most significant position of the weight vectors, which is always set to  for edges of the form , will be used to keep track of the number of red vertices in a solution to RBDS.

The graph constructed so far has a mechanism to select the first index  of an instance  (by choosing a vertex ), to select the second index  (by choosing vertices ), and to select the third index  (by choosing a vertex ). The next step in the construction adds weighted edges , of which a solution clique in  will contain exactly one. The weight vector for this edge is chosen so that the domination requirements from all RBDS instances whose third index differs from  (and which are therefore not selected) can be satisfied “for free”.

7. For each , insert an edge between  and .

8. As in Step 6, the weight of the edge  is a ()-tuple consisting of the most significant position followed by  blocks of length . There is a  at the most significant position, block consists of  zeros, and the other blocks are filled with ones. Hence the weight of the edge  is independent of .

To be able to ensure that  has a clique of exactly weight  if some input instance has a solution, we need to introduce padding numbers which may be used as part of the solution to EEWC.

9. For each position  of a weight vector, add a vertex set  to . Recall that  is the upper bound on the solution size for RBDS.

10. For each , for each , for each , add an edge . The weight of edge  has value 1 at the  position and zeros elsewhere.

11. For each , for each , add an edge  of weight  for all , i.e., for all vertices  which were not already adjacent to .

We define the target weight  to be the -length vector with value  at each position, which satisfies Condition 3. Observe that has vertices: Steps 1 and 3 contribute  vertices, Step 2 contributes , and Step 9 contributes . Hence Condition 2 is satisfied. It remains to verify that has a clique of total edge weight exactly  if and only if some input instance  has a solution of Red-Blue Dominating Set of size at most . Before proving this property, we show the following claim which implies that no carries occur when summing up the weights of the edges of a clique in .

For any clique , for any position  of a weight vector, there are at most  edges of the clique  whose weight vector has a  at position , and all other weight vectors are  at position . By construction, the entries of the vector encoding an edge weight are either  or .

By Steps 1 and 3, a clique  in  contains at most one vertex  and one vertex . Since  does not have edges between vertices in distinct sets  and  by Step 2, any clique in  consists of at most one vertex , one vertex , a subset of one set , and a subset of . For any fixed position , the only edge-weight vectors which can have a  at position  are the  edges from  to , the edge , and the  edges between  and . As this yields  edges that possibly have a  at position , the claim follows.

The preceding claim shows that when we convert each edge-weight vector to an integer by interpreting the vector as its base--representation, then no carries occur when computing the sum of the edge-weights of a clique. Hence the integer edge-weights of a clique  sum to the integer represented by vector , if and only if the edge-weight vectors of the edges in  sum to the vector . In the remainder, it therefore suffices to prove that there is a YES-instance  of RBDS among the inputs if and only if  has a clique whose edge-weight vectors sum to the vector . We prove these two implications.
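As a sanity check of the no-carry argument, the following Python sketch converts digit vectors to integers and compares the sum of the integers with the position-wise sum of the vectors. The base and the example vectors are illustrative assumptions, since the concrete base of the construction is not restated here; the only property used is that every position-wise sum stays below the base.

```python
def vector_to_int(vec, base):
    """Interpret vec[0] as the least significant digit of a base-`base` number."""
    return sum(digit * base**pos for pos, digit in enumerate(vec))

def int_to_vector(x, base, length):
    """Inverse of vector_to_int for numbers that fit into `length` digits."""
    out = []
    for _ in range(length):
        x, digit = divmod(x, base)
        out.append(digit)
    return out

def add_vectors(vectors, length):
    """Position-wise sum of digit vectors, without any carries."""
    return [sum(v[pos] for v in vectors) for pos in range(length)]

def sums_agree(vectors, base, length):
    """Does adding the integers give the same digits as adding position-wise?"""
    total = sum(vector_to_int(v, base) for v in vectors)
    return int_to_vector(total, base, length) == add_vectors(vectors, length)
```

With the three vectors `[1, 0, 1]`, `[1, 1, 0]`, `[0, 1, 1]`, each position sums to 2, so the two notions of addition agree in base 4 but not in base 2, where carries occur.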

If some input graph  has a red-blue dominating set of size at most , then  has a clique of edge-weight exactly . Let of size at most be a dominating set of . We define a vertex set  as follows. Initialize , and for each vertex , add the corresponding vertex  to .

We claim that  is a clique in . To see this, note that  is a clique by Step 2. Vertex  is adjacent to all vertices of  by Step 4. Vertex  is adjacent to all vertices of  by Step 5. By Step 8 there is an edge between  and .

Let us consider the weight of clique . Since is a dominating set of , if we sum up the weight vectors of the edges  for , then by Step 6 we get a value of at least one at each position of block . The most significant position of the resulting sum vector has value . By Step 8 the weight vector of the edge consists of all ones, except for block and the most significant position, where the value is zero. Thus adding the edge weight of to the previous sum ensures that each block has value at least  everywhere, whereas the most significant position has value . All other edges spanned by  have weight . Letting  denote the vector obtained by summing the weights of the edges of clique , we therefore find that  has value  as its most significant position and value at least  everywhere else.

Next we add some additional vertices to the set to get a clique of weight exactly . By Step 11, vertices from the sets  for  are adjacent to all other vertices in the graph and can be added to any clique. All edges incident on a vertex  have weight , except the edges to vertices of the form  whose weight vector has a  at the  position and  elsewhere. Since  contains exactly one such vertex , for any  we can add up to  vertices from  to increase the weight sum at position  from its value of at least  in , to a value of exactly . Hence  has a clique of edge-weight exactly .

If  has a clique of edge-weight exactly , then some input graph  has a red-blue dominating set of size at most . Suppose is a clique whose total edge weight is exactly . Note that only edges for which one of the endpoints is of the form for have positive edge weights. The remaining edges all have weight . Also, by Step 1 there is at most one -vertex in . Hence since  there is exactly one vertex in . By Step 9 and 10, the edges of type for contribute at most to the value of each position of the sum. Hence for each position there is an edge in clique  of the form or which has a  at position . We use this to show there is an input instance with a red-blue dominating set of size at most .

By Step 3, there is at most one -vertex in . Let  if , and otherwise let  be the unique -vertex in . Since the weight of the edge  has zeros in block  by Step 8, our previous argument implies that for each of the positions of block , there is an edge in clique  of the form  whose weight has a  at that position. Hence  contains at least one -vertex, and by Step 2 all -vertices in the clique  are contained in a single set . We show that  has a red-blue dominating set of size at most . Let . Since for each of the  positions of block  there is an edge  in  with a  at that position, by Step 5 each blue vertex of  has a neighbor in . Hence  is a red-blue dominating set. By Step 5, the most significant position of each edge incident on has value . As the most significant position of the target is set to , it follows that , which proves that  has a red-blue dominating set of size at most . This completes the proof of Lemma 3.1. ∎

Lemma 3.1 forms the main ingredient in a cross-composition that proves kernelization lower bounds for Exact-Edge-Weight Clique and its generalization to hypergraphs. For completeness, we formally define the hypergraph version as follows.

Exact-Edge-Weight $d$-Uniform Hyperclique (EEW-$d$-HC) Input: A $d$-uniform hypergraph $G$, weight function $w \colon E(G) \to \mathbb{N}_0$, and a positive integer $t$. Question: Does $G$ have a hyperclique of total edge-weight exactly $t$?

The following theorem generalizes Theorem 1. The case $d = 2$ of the theorem follows almost directly from Lemma 3.1 and Theorem 2, as the construction in the lemma gives the crucial ingredient for a degree-$3$ cross-composition. For larger $d$, we essentially exploit the fact that increasing the size of hyperedges by one allows one additional dimension of freedom, as has previously been exploited for other kernelization lower bounds for $d$-Hitting Set and $d$-Set Cover [8, 9]. The proof is given in Appendix A.1.

(★) For each fixed $d \geq 2$, Exact-Edge-Weight $d$-Uniform Hyperclique parameterized by the number of vertices $n$ does not admit a generalized kernel of size $O(n^{d+1-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ coNP/poly.

### 3.2 Subset Sum

We show that Subset Sum parameterized by the number of items $n$ does not have a generalized kernel of bitsize $O(n^{2-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ coNP/poly. We prove the lower bound by giving a linear-parameter transformation from Exact Red-Blue Dominating Set. We use Exact Red-Blue Dominating Set rather than Red-Blue Dominating Set as our starting problem for this lower bound because it will simplify the construction: it will avoid the need for ‘padding’ to cope with the fact that vertices are dominated multiple times.

The Subset Sum problem is formally defined as follows.

Subset Sum (SS) Parameter: Input: A multiset  of positive integers and a positive integer . Question: Does there exist a subset with ?

We use the following problem as the starting point of the reduction.

Exact Red-Blue Dominating Set (ERBDS) Parameter: Input: A bipartite graph with a bipartition of  into sets  (red vertices) and  (blue vertices), and a positive integer . Question: Does there exist a set of size exactly such that every vertex in has exactly one neighbor in ?

Jansen and Pieterse proved the following lower bound for ERBDS.

([17, Thm. 4.9]) Exact Red-Blue Dominating Set parameterized by the number of vertices  does not admit a generalized kernel of size  unless .

Strictly speaking, the lower bound they proved is for a slightly different variant of ERBDS where the solution is required to have size at most , instead of exactly . Observe that the variant where we demand a solution of size exactly is at least as hard as the at-most version: the latter reduces to the former by inserting  isolated red vertices. Therefore the lower bound by Jansen and Pieterse also holds for the version we use here, which will simplify the presentation.

See 1

###### Proof.

Given a graph  with a bipartition of into and with , , and target value  for ERBDS, we transform it to an equivalent instance of SS such that . We start by defining numbers in base . For each , the number consists of digits. We denote the digits of the number  by , where is the least significant and is the most significant digit. Intuitively, the number  corresponds to the red vertex . See Figure 2 for an illustration.

For each , for each , digit of number is defined as follows:

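$$
N_i[j] =
\begin{cases}
1 & \text{if } j = n_B + 1, \\
1 & \text{if } j \in [n_B] \text{ and } \{r_i, b_j\} \in E(G), \\
0 & \text{otherwise.}
\end{cases}
\tag{2}
$$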

Hence the most significant digit of each number is , and the remaining digits of number  form the -vector indicating to which of the  blue vertices  is adjacent in .

To complete the construction we set and we define as follows:

$$
t = d\,\underbrace{1 1 \ldots 1}_{n_B \text{ times}}
\tag{3}
$$

Observe that under these definitions, there are no carries when adding up a subset of the numbers in , as each digit of each of the  numbers is either  or  and we work in base .

The number of items  in the constructed instance of SS is , linear in the parameter  of ERBDS. It is easy to see that the construction can be carried out in polynomial time. To complete the linear-parameter transformation from ERBDS to SS, it remains to prove that  has a set  of size exactly  such that every vertex in  has exactly one neighbor in , if and only if there exists a set  with .

In the forward direction, suppose that there exists a set of size exactly such that every vertex in has exactly one neighbor in . We claim that  is a solution to SS. The resulting sum has value  at the most significant digit since . All other digits correspond to vertices in . Since each blue vertex is adjacent to exactly one vertex from  it is easy to verify that all remaining digits of the sum are exactly one, implying that the numbers sum to exactly .

For the reverse direction, suppose there is a set  with . Since the most significant digit of  is set to  and each number in  has a  as most significant digit, we have  since there are no carries during addition. Define  as the set of the red vertices corresponding to the numbers in . As  and no carries occur in the summation, we have for each . As the -th digit of all numbers is either or by definition, there is a unique with , so that is the unique neighbor of in . This shows that is an exact red-blue dominating set of size , concluding the linear-parameter transformation.

If there were a generalized kernelization for SS of size , then we would obtain a generalized kernelization for ERBDS of size  by first transforming it to SS, incurring only a constant-factor increase in the parameter, and then applying the generalized kernelization for the latter. Hence by contraposition and Theorem 3.2, the claim follows. ∎
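The construction in the proof can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, red and blue vertices are indexed from 1, and the base is chosen as `max(n_red, d) + 1`, which is merely one choice large enough to rule out carries (the exact base used in the text is not reproduced here). A brute-force Subset Sum check is included only to validate tiny instances.

```python
from itertools import combinations


def erbds_to_subset_sum(n_red, n_blue, edges, d):
    """Build the Subset Sum instance (numbers, target) from an ERBDS instance.

    `edges` is a set of pairs (i, j) meaning red vertex r_i is adjacent to
    blue vertex b_j, with 1 <= i <= n_red and 1 <= j <= n_blue.
    """
    # Assumption: any base exceeding both n_red and d prevents carries.
    base = max(n_red, d) + 1
    numbers = []
    for i in range(1, n_red + 1):
        # Digit n_blue + 1 (the most significant one) is always 1.
        value = base ** n_blue
        # Digit j (1 = least significant) is 1 iff r_i is adjacent to b_j.
        for j in range(1, n_blue + 1):
            if (i, j) in edges:
                value += base ** (j - 1)
        numbers.append(value)
    # Target: most significant digit d, followed by n_blue ones.
    target = d * base ** n_blue + sum(base ** (j - 1) for j in range(1, n_blue + 1))
    return numbers, target


def subset_sum_brute_force(numbers, target):
    """Exponential-time check, for validating small examples only."""
    return any(
        sum(choice) == target
        for size in range(len(numbers) + 1)
        for choice in combinations(numbers, size)
    )
```

Each number has a 1 in the most significant digit, so a subset summing to the target must contain exactly `d` numbers, and the lower digits force every blue vertex to be covered exactly once.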

### 3.3 Constraint Satisfaction Problems

In this section we extend our lower bounds to cover Boolean Constraint Satisfaction Problems (CSPs). We employ the recently introduced framework [18] of reductions among different CSPs to make a connection with EEW--HC. We start with introducing terminology necessary to identify crucial properties of CSPs.

#### Preliminaries on CSPs

A -ary constraint is a function . We refer to as the arity of , denoted . We always assume that the domain is Boolean. A constraint is satisfied by an input if . A constraint language is a finite collection of constraints , potentially with different arities. A constraint application, of a -ary constraint to a set of  Boolean variables, is a triple , where the indices select of the  Boolean variables to which the constraint is applied, and is an integer weight. The variables can repeat in a single application.

A formula of CSP is a set of constraint applications from over a common set of variables. For an assignment , that is, a mapping from the set of variables to , the integer is the sum of weights of the constraint applications satisfied by . The considered decision problems are defined as follows.

Exact-Weight CSP Parameter: Input: A formula of CSP over variables, an integer . Question: Is there an assignment for which ?

Max-Weight CSP Parameter: Input: A formula of CSP over variables, an integer . Question: Is there an assignment for which ?
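The definitions above can be illustrated with a small brute-force sketch. The representation is an assumption made for this example: a constraint is a Python callable on 0/1 arguments, a constraint application is a triple `(constraint, indices, weight)`, and the Max-Weight variant is assumed to ask for weight at least the given threshold. All function names are hypothetical.

```python
from itertools import product


def formula_weight(formula, assignment):
    """Sum the weights of the constraint applications satisfied by `assignment`.

    `formula` is a list of triples (constraint, indices, weight);
    `assignment` is a tuple of 0/1 values, one per variable.
    """
    return sum(
        weight
        for constraint, indices, weight in formula
        if constraint(*(assignment[i] for i in indices))
    )


def exact_weight_csp(formula, n, t):
    """Is there an assignment of n Boolean variables with weight exactly t?"""
    return any(formula_weight(formula, a) == t for a in product((0, 1), repeat=n))


def max_weight_csp(formula, n, t):
    """Is there an assignment with weight at least t? (assumed threshold form)"""
    return any(formula_weight(formula, a) >= t for a in product((0, 1), repeat=n))
```

Note that indices may repeat within one application, for example `(and2, (0, 0), 3)`, matching the remark that variables can repeat in a single application.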

The compressibility of Max-Weight CSP has been studied by Jansen and Włodarczyk [18], who obtained essentially optimal kernel sizes for every in the case where the weights are polynomial with respect to . Even though the upper and lower bounds in [18] are formulated for Max-Weight CSP, they can be adapted to work with Exact-Weight CSP. The crucial idea that allows one to determine the compressibility of is the representation of constraints via multilinear polynomials.

For a -ary constraint its characteristic polynomial is the unique -ary multilinear polynomial over satisfying for any .

It is known that such a polynomial always exists and is unique [27].

The degree of constraint language , denoted , is the maximal degree of a characteristic polynomial over all .
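As an illustration of these two definitions, the sketch below computes the coefficients of the multilinear polynomial that agrees with a constraint on all 0/1 inputs, using inclusion-exclusion (Möbius inversion) over subsets of variables. It assumes the polynomial is taken over the reals and that a constraint is given as a Python callable; the function names are hypothetical.

```python
from itertools import combinations


def characteristic_polynomial(f, k):
    """Return {S: coefficient} for the multilinear polynomial agreeing with f.

    S is a tuple of variable indices denoting the monomial prod_{i in S} x_i.
    The coefficient of S is sum over T subset of S of (-1)^(|S|-|T|) * f(1_T),
    where 1_T is the 0/1 point that is 1 exactly on the indices in T.
    """
    coefficients = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            c = 0
            for t_size in range(size + 1):
                for T in combinations(S, t_size):
                    point = tuple(1 if i in T else 0 for i in range(k))
                    c += (-1) ** (size - t_size) * int(bool(f(*point)))
            if c != 0:
                coefficients[S] = c
    return coefficients


def constraint_degree(f, k):
    """Degree of the characteristic polynomial of a k-ary constraint."""
    return max((len(S) for S in characteristic_polynomial(f, k)), default=0)


def language_degree(language):
    """Maximum degree over a list of (constraint, arity) pairs."""
    return max(constraint_degree(f, k) for f, k in language)
```

For instance, the 2-ary OR constraint yields the polynomial x_1 + x_2 - x_1 x_2, and the k-ary AND constraint yields the single monomial x_1 ... x_k of degree k.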

The main result of Jansen and Włodarczyk [18] states that Max-Weight CSP with polynomial weights admits a kernel of bits and, as long as the problem is NP-hard, it does not admit a kernel of size , for any , unless . It turns out that in the variant when we allow both positive and negative weights the problem is NP-hard whenever  [19]. The lower bounds are obtained via linear-parameter transformations, where the parameter is the number of variables . We shall take advantage of the fact that these transformations still work for an unbounded range of weights.

([18, Lemma 5.4]) For constraint languages such that , there is a polynomial-time algorithm that, given a formula on variables and integer , returns a formula on variables and integer , such that

1. ,

2. ,

3. .

#### Kernel lower bounds for CSP

The lower bound of has been obtained via a reduction from -SAT (with ) to Max-Weight CSP, combined with the fact that Max -SAT does not admit a kernel of size for  [9, 18]. We are going to show that when the weights are arbitrarily large, then the optimal compression size for Exact-Weight CSP becomes essentially , so the exponent is always larger by one compared to the case with polynomial weights. To this end, we are going to combine the aforementioned reduction framework with our lower bound for Exact-Edge-Weight -Uniform Hyperclique.

Consider a constraint language consisting of a single -ary constraint , which is satisfied only if all the arguments equal 1. The characteristic polynomial of is simply , hence the degree of equals . We first translate our lower bounds for the hyperclique problems into a lower bound for Exact-Weight CSP for all , and then extend it to other CSPs.

For all , Exact-Weight CSP does not admit a generalized kernel of size , for any , unless .

The lower bound for Exact-Weight CSP given by Lemma 3.3 yields a lower bound for general Exact-Weight CSP using the reduction framework described above.

For any  with , Exact-Weight CSP does not admit a generalized kernel of size , for any , unless .

###### Proof.

Consider an -variable instance of Exact-Weight CSP, where . It holds that . By Lemma 3.3, there is a linear-parameter transformation that translates into an equivalent instance of Exact-Weight CSP. If we could compress into bits, this would entail the same compression for . The claim follows from Lemma 3.3. ∎

This concludes the discussion of kernelization lower bounds. The kernelization upper bounds discussed in the introduction can be found in Appendix B.

## 4 Node-weighted Vertex Cover in bipartite graphs

#### Preserving all minimum solutions

For a graph  with node-weight function , we denote by  the collection of subsets of  which are minimum-weight vertex covers of . For -vertex bipartite graphs there exists a weight function with range  that preserves the set of minimum-weight vertex covers, which can be computed efficiently.

There is an algorithm that, given an -vertex bipartite graph  and node-weight function , outputs a weight function  such that . The running time of the algorithm is polynomial in  and the binary encoding size of .

The proof of the theorem is given in Appendix C. It relies on the fact that a maximum -matching (the linear-programming dual to Vertex Cover) can be computed in strongly polynomial time in bipartite graphs by a reduction to Max Flow. The structure of a maximum -matching allows two weight-reduction rules to be formulated whose exhaustive application yields the desired weight function. The bound of  on the largest weight in Theorem 4 is best possible, which we prove in Lemma C.1 in Appendix C.
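The property guaranteed by the theorem can be checked on small examples with the brute-force sketch below. It does not implement the weight-reduction algorithm from Appendix C; it only enumerates the collection of minimum-weight vertex covers so that two weight functions can be compared. The function name and the dictionary representation of weights are assumptions made for this illustration.

```python
from itertools import combinations


def min_weight_vertex_covers(vertices, edges, w):
    """Return the set of all minimum-weight vertex covers (as frozensets).

    `edges` is a list of pairs of vertices; `w` maps each vertex to a
    positive integer weight. Runs in exponential time.
    """
    best_cost = None
    best_covers = set()
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            cover = frozenset(subset)
            if not all(u in cover or v in cover for u, v in edges):
                continue
            cost = sum(w[v] for v in cover)
            if best_cost is None or cost < best_cost:
                best_cost, best_covers = cost, {cover}
            elif cost == best_cost:
                best_covers.add(cover)
    return best_covers
```

On the path a-b-c, for example, the weights (100, 150, 100) and the small weights (2, 3, 2) have the same unique minimum-weight vertex cover {b}, whereas (1, 3, 1) does not preserve it.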

#### Preserving the relative weight of solutions

For a graph , we say that two node-weight functions  are vertex-cover equivalent if the ordering of inclusion-minimal vertex covers by total weight is identical under the two weight functions, i.e., for all pairs of inclusion-minimal vertex covers  we have . While a minimum-weight vertex cover of a bipartite graph can be found efficiently, the following theorem shows that nevertheless weight functions with exponentially large coefficients may be needed to preserve the ordering of minimal vertex covers by weight.

For each , there exists a node-weighted bipartite graph  on  vertices with weight function such that for all weight functions  which are vertex-cover equivalent to , we have: .
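Vertex-cover equivalence can be tested by brute force on small graphs, as in the following sketch. It assumes the defining condition is that, for every pair of inclusion-minimal vertex covers, one is at most as heavy as the other under the first weight function exactly when the same holds under the second. The function names are hypothetical.

```python
from itertools import combinations


def minimal_vertex_covers(vertices, edges):
    """Enumerate all inclusion-minimal vertex covers (exponential time)."""
    covers = set()
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            cover = frozenset(subset)
            if all(u in cover or v in cover for u, v in edges):
                covers.add(cover)
    # Supersets of covers are covers, so a cover is inclusion-minimal
    # iff removing any single vertex destroys the cover property.
    return [c for c in covers if not any(c - {v} in covers for v in c)]


def vertex_cover_equivalent(vertices, edges, w1, w2):
    """Check that w1 and w2 order all minimal vertex covers identically."""
    minimal = minimal_vertex_covers(vertices, edges)

    def cost(w, cover):
        return sum(w[v] for v in cover)

    for a in minimal:
        for b in minimal:
            if (cost(w1, a) <= cost(w1, b)) != (cost(w2, a) <= cost(w2, b)):
                return False
    return True
```

Ties matter under this reading: a weight function that makes two minimal covers equally heavy is not equivalent to one that strictly separates them.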

## 5 Conclusions

We have established kernelization lower bounds for Subset Sum, Exact-Edge-Weight -Uniform Hyperclique, and a family of Exact-Weight CSP problems, which make it unlikely that there exists an efficient algorithm to compress a single weight into  bits. This gives a clear separation between the setting involving arbitrarily large weights and the case with polynomially-bounded weights, which can be encoded with bits each. The matching kernel upper bounds are randomized and we leave it as an open question to derandomize them. For Subset Sum parameterized by the number of items , a deterministic kernel of size is known [11].

Kernelization of minimization/maximization problems is so far less understood. We are able to match the same kernel size as for the exact-weight problems, but only through Turing kernels. Using techniques from [11] one can obtain, e.g., a kernel of size for Max-Edge-Weight Clique. Improving upon this bound possibly requires a better understanding of the threshold functions. Our study of weighted Vertex Cover on bipartite graphs indicates that preserving the order between all the solutions might be overly demanding and it could be easier to keep track only of the structure of the optimal solutions. Can we extend the theory of threshold functions so that better bounds are feasible when we just want to maintain a separation between optimal and non-optimal solutions?

## References

• [1] Amir Abboud, Shon Feller, and Oren Weimann. On the fine-grained complexity of parity problems. In Artur Czumaj, Anuj Dawar, and Emanuela Merelli, editors, 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), volume 168 of LIPIcs, pages 5:1–5:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
• [2] Amir Abboud, Kevin Lewi, and Ryan Williams. Losing weight by gaining edges. In Andreas S. Schulz and Dorothea Wagner, editors, Algorithms - ESA 2014 - 22th Annual European Symposium, Wroclaw, Poland, September 8-10, 2014. Proceedings, volume 8737 of Lecture Notes in Computer Science, pages 1–12. Springer, 2014.
• [3] László Babai, Kristoffer Arnsfelt Hansen, Vladimir V. Podolskii, and Xiaoming Sun. Weights of exact threshold functions. In Petr Hlinený and Antonín Kucera, editors, Mathematical Foundations of Computer Science 2010, 35th International Symposium, MFCS 2010, Brno, Czech Republic, August 23-27, 2010. Proceedings, volume 6281 of Lecture Notes in Computer Science, pages 66–77. Springer, 2010.
• [4] Andreas Björklund. Determinant sums for undirected Hamiltonicity. SIAM J. Comput., 43(1):280–299, 2014.
• [5] Hans L. Bodlaender, Bart M. P. Jansen, and Stefan Kratsch. Kernelization lower bounds by cross-composition. SIAM J. Discrete Math., 28(1):277–305, 2014.
• [6] Miroslav Chlebík and Janka Chlebíková. Crown reductions for the minimum weighted vertex cover problem. Discret. Appl. Math., 156(3):292–312, 2008.
• [7] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Dániel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.
• [8] Holger Dell and Dániel Marx. Kernelization of packing problems. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 68–81. SIAM, 2012.
• [9] Holger Dell and Dieter van Melkebeek. Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses. J. ACM, 61(4):23:1–23:27, 2014.
• [10] Rodney G. Downey and Michael R. Fellows. Fundamentals of Parameterized Complexity. Texts in Computer Science. Springer, 2013.
• [11] Michael Etscheid, Stefan Kratsch, Matthias Mnich, and Heiko Röglin. Polynomial kernels for weighted problems. J. Comput. Syst. Sci., 84:1–10, 2017.
• [12] Henning Fernau. Kernelization, Turing kernels. In Encyclopedia of Algorithms, pages 1043–1045. Springer, 2016.
• [13] András Frank and Éva Tardos. An application of simultaneous diophantine approximation in combinatorial optimization. Combinatorica, 7(1):49–65, 1987.
• [14] Danny Harnik and Moni Naor. On the compressibility of NP instances and cryptographic applications. SIAM Journal on Computing, 39(5):1667–1713, 2010.
• [15] Anwar A. Irmatov. Asymptotics of the number of threshold functions and the singularity probability of random -matrices. Doklady Mathematics, 101:247–249, 2020.
• [16] Alon Itai and Michael Rodeh. Finding a minimum circuit in a graph. SIAM J. Comput., 7(4):413–423, 1978.
• [17] Bart M. P. Jansen and Astrid Pieterse. Optimal sparsification for some binary CSPs using low-degree polynomials. TOCT, 11(4):28:1–28:26, 2019.
• [18] Bart M. P. Jansen and Michal Wlodarczyk. Optimal polynomial-time compression for Boolean Max CSP. In Fabrizio Grandoni, Grzegorz Herman, and Peter Sanders, editors, 28th Annual European Symposium on Algorithms, ESA 2020, September 7-9, 2020, Pisa, Italy (Virtual Conference), volume 173 of LIPIcs, pages 63:1–63:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
• [19] Peter Jonsson and Andrei Krokhin. Maximum -colourable subdigraphs and constraint optimization with arbitrary weights. Journal of Computer and System Sciences, 73(5):691–702, 2007.
• [20] Richard M. Karp. Reducibility among combinatorial problems. In Proceedings of a symposium on the Complexity of Computer Computations, held March 20-22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA, The IBM Research Symposia Series, pages 85–103. Plenum Press, New York, 1972.
• [21] Richard M. Karp and Michael O. Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249–260, 1987.
• [22] Dániel Marx and Michal Pilipczuk. Everything you always wanted to know about the parameterized complexity of subgraph isomorphism (but were afraid to ask). CoRR, abs/1307.2187, 2013. arXiv:1307.2187v3.
• [23] Ketan Mulmuley, Umesh V. Vazirani, and Vijay V. Vazirani. Matching is as easy as matrix inversion. Comb., 7(1):105–113, 1987.
• [24] Jesper Nederlof. Bipartite TSP in time, assuming quadratic time matrix multiplication. In Konstantin Makarychev, Yury Makarychev, Madhur Tulsiani, Gautam Kamath, and Julia Chuzhoy, editors, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, pages 40–53. ACM, 2020.
• [25] Jesper Nederlof, Erik Jan van Leeuwen, and Ruben van der Zwaan. Reducing a target interval to a few exact queries. In Branislav Rovan, Vladimiro Sassone, and Peter Widmayer, editors, Mathematical Foundations of Computer Science 2012 - 37th International Symposium, MFCS 2012, Bratislava, Slovakia, August 27-31, 2012. Proceedings, volume 7464 of Lecture Notes in Computer Science, pages 718–727. Springer, 2012.
• [26] G. L. Nemhauser and L. E. Trotter Jr. Vertex packings: structural properties and algorithms. Math. Program., 8:232–248, 1975.
• [27] Noam Nisan and Mario Szegedy. On the degree of Boolean functions as real polynomials. Computational Complexity, 4:301–313, 1994.
• [28] James B. Orlin. Max flows in time, or better. In Symposium on Theory of Computing Conference, STOC’13, pages 765–774. ACM, 2013.
• [29] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency, volume 24. Springer-Verlag, Berlin, 2003.
• [30] Virginia Vassilevska and Ryan Williams. Finding, minimizing, and counting weighted subgraphs. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, pages 455–464. ACM, 2009.
• [31] Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In Thore Husfeldt and Iyad A. Kanj, editors, 10th International Symposium on Parameterized and Exact Computation, IPEC 2015, September 16-18, 2015, Patras, Greece, volume 43 of LIPIcs, pages 17–29. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
• [32] Virginia Vassilevska Williams and R. Ryan Williams. Subcubic equivalences between path, matrix, and triangle problems. J. ACM, 65(5):27:1–27:38, 2018.
• [33] Virginia Vassilevska Williams and Ryan Williams. Finding, minimizing, and counting weighted subgraphs. SIAM J. Comput., 42(3):831–854, 2013.

## Appendix A Kernel lower bounds

### A.1 Omitted proofs for Exact-Edge-Weight Clique

See 3.1

###### Proof.

We give a degree- cross-composition (Definition 2) from RBDS to the weighted hyperclique problem using Lemma 3.1. We start by giving a polynomial equivalence relation on inputs of RBDS. Let two instances of RBDS be equivalent under if they have the same number of red vertices, the same number of blue vertices, and the same target value . It is easy to check that is a polynomial equivalence relation.

Consider inputs of RBDS from the same equivalence class of . If  is not a  power of an integer, then we duplicate one of the input instances until we reach the first number of the form , which is trivially such a power. This increases the number of instances by at most the constant factor  and does not change whether there is a YES-instance among the instances. As all requirements on a cross-composition are oblivious to constant factors, from now on we may assume without loss of generality that  for some integer . By definition of , all instances have the same number  of red vertices, the same number  of blue vertices, and have the same maximum size  of a solution.

For , we can simply invoke Lemma 3.1 for the