Kernelization, data reduction, or preprocessing: all of these refer to the goal of simplifying and reducing (the size of) the input in order to speed up computation of challenging tasks. Many heuristic techniques are applied in practice; however, we seek a theoretical understanding in the form of procedures with guaranteed bounds on the sizes of the reduced data. We use the notion of kernelization from parameterized complexity (cf. [38, 9]), where along with an input instance we get a positive integer expressing the parameter value, which may be the size of the sought solution or some structural limitation of the input. A kernel is an algorithm running in time which returns a reduced instance of the same problem of size bounded in terms of ; we sometimes also refer to as the kernel.
It is well known  that a problem admits a kernel if and only if it has an algorithm running in time for some computable function (i.e., if it is fixed-parameter tractable, or FPT, parameterized by ). The “catch” is that this kernel may be very large (exponential or worse) in terms of , while for many problems, kernels of size polynomial in are known. This raises a fundamental question for any FPT problem: does it have a polynomial kernel? Answering this question typically provides deep insights into a problem and the structure of its solutions.
Parameterized complexity has historically focused primarily on graph problems, but it has been increasingly branching out into other areas. Kernelization, as arguably the most important subfield of parameterized complexity (cf. a recent monograph), follows suit. Scheduling is a fundamental area in combinatorial optimization, with results from parameterized complexity going back to 1995. Arguably the most central problem in scheduling is makespan minimization on identical machines, denoted as , which we shall define soon. It took until the seminal paper of Goemans and Rothvoss  to get an FPT algorithm for parameterized by the number of job types (hence also by the largest job). Yet, the existence of a polynomial kernel for remained open, despite being raised by Mnich and Wiese  and reiterated by van Bevern (the question was asked at the workshop “Scheduling & FPT” at the Lorentz Center, Leiden, in February 2019, as a part of the opening talk for the open problem session). Here, we give an affirmative answer for this problem: There is a polynomial kernel for when parameterized by the longest processing time .
Let us now introduce and define the scheduling problems and . There are jobs and identical machines, and the goal is to find a schedule minimizing an objective. For each job , a processing time and a weight are given; in the case of the weights play no role and can be assumed to be all zero. A schedule is a mapping which to each job assigns some machine and a closed interval of length , such that the intervals assigned to each machine do not overlap except for their endpoints. For each job , denote by its completion time, which is the time when it finishes, i.e., the right end point of the interval assigned to in the schedule. In the makespan minimization () problem, the goal is to find a schedule minimizing the time when the last job finishes , called the makespan. In the minimization of sum of weighted completion times (), the goal is to minimize . (In the rest of the paper we formally deal with decision versions of these problems, where the task is to decide whether there exists a schedule with objective value at most . This is a necessary approach when speaking of kernels and complexity classes like NP and FPT.)
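To make the two objectives concrete, the following minimal Python sketch evaluates a given schedule. It assumes (for illustration only) that a schedule is represented as one ordered job list per machine and that each machine processes its jobs back to back starting at time 0; the names `evaluate_schedule`, `p`, `w`, and `machines` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def evaluate_schedule(p: List[int], w: List[int],
                      machines: List[List[int]]) -> Tuple[int, int]:
    """Return (makespan, sum of weighted completion times).

    p[j] and w[j] are the processing time and weight of job j.
    machines[i] lists the jobs of machine i in processing order,
    executed without idle time from time 0 (an assumption of this sketch).
    """
    makespan = 0
    weighted_sum = 0
    for order in machines:
        t = 0  # current time on this machine
        for j in order:
            t += p[j]                 # completion time of job j
            weighted_sum += w[j] * t
        makespan = max(makespan, t)   # time when the last job finishes
    return makespan, weighted_sum


p = [3, 1, 2, 2]
w = [1, 4, 2, 1]
machines = [[1, 0], [2, 3]]
result = evaluate_schedule(p, w, machines)
```

In the decision versions, one would simply compare the returned value with the given bound.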
In fact, our techniques imply results stronger in three ways, where we handle:
the much more complicated objective function involving possibly large job weights,
the unrelated machines setting (denoted and ), and
allowing the number of jobs and machines to be very large, known as the high-multiplicity setting.
For this, we need further notation to allow for different kinds of machines. For each machine and job , a processing time is given. For a given scheduling instance, say that two jobs are of the same type if for all and , and say that two machines are of the same kind if for all jobs . We denote by and the number of job types and machine kinds, respectively, call this type of encoding the high-multiplicity encoding, and denote the corresponding problems and .
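The passage from an explicit job list to the high-multiplicity encoding can be sketched as follows. This is only an illustration under the assumption that a job is described by its tuple of processing times (one per machine) together with its weight, and that two jobs are of the same type exactly when these descriptions coincide; the function name and data layout are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Job = Tuple[Tuple[int, ...], int]  # (processing times per machine, weight)


def high_multiplicity_encoding(jobs: List[Job]) -> Tuple[List[Job], List[int]]:
    """Group identical jobs into types and return (types, multiplicities)."""
    counts = Counter(jobs)
    types = sorted(counts)                 # fix some order of the job types
    multiplicities = [counts[t] for t in types]
    return types, multiplicities


jobs = [((2, 3), 1), ((2, 3), 1), ((5, 1), 4), ((2, 3), 1)]
types, mult = high_multiplicity_encoding(jobs)
```

The multiplicities are stored as numbers (in binary), so a type with very many jobs costs only logarithmically many bits, which is the source of the succinctness of this encoding.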
Our approach is indirect: taking an instance of scheduling, we produce a small equivalent instance of the so-called huge -fold integer programming problem with a quadratic objective function (see more details below). This is known as compression, i.e., a polynomial time algorithm producing from a small equivalent instance of a different problem: The problems and parameterized by the number of job types , the longest processing time , and the number of machine kinds admit a polynomial compression to quadratic huge -fold IP parameterized by the number of block types , the block dimension , and the largest coefficient . If we can then find a polynomial reduction from quadratic huge -fold IP to our scheduling problems, we are finished. For this, it suffices to show NP membership, as we do in Lemma 3.
Besides giving polynomial kernels for some of the most fundamental scheduling problems, we wish to highlight the technique behind this result, because it is quite unlike most techniques used in kernelization and is of independent interest. Our algorithm essentially works by solving the natural Configuration LP of (and other problems), which can be done in polynomial time when is polynomially bounded, and then using powerful structural insights to reduce the scheduling instance based on the Configuration LP solution. The Configuration LP is a fundamental tool in combinatorial optimization which goes back to the work of Gilmore and Gomory in 1961 . It is known to provide high-quality results in practice; in fact, the “modified integer round-up property (MIRUP)” conjecture states that the integer optimum is always at most one larger than the rounded-up value of the natural Bin Packing Configuration LP . The famous approximation algorithm of Karmarkar and Karp  for Bin Packing is based on rounding the Configuration LP, and many other results in approximation use the Configuration LP for their respective problems as the starting point.
In spite of this centrality and vast importance of the Configuration LP, there are only a few structural results providing deeper insight. Perhaps the most notable is the work of Goemans and Rothvoss  and later Jansen and Klein  who show that there is a certain set of “fundamental configurations” such that in any integer optimum, all but few machines (bins, etc.) will use these fundamental configurations. Our result is based around a theorem which shows a similar yet orthogonal result and can be informally stated as follows: There is an optimum of the Configuration IP where all but few configurations are those discovered by the Configuration LP, and the remaining configurations are not far from those discovered by the Configuration LP. We note that our result, unlike the ones mentioned above [20, 26], also applies to arbitrary separable convex functions. This has a fundamental reason: the idea behind both previous results is to shift weight from the inside of a polytope to its vertices without affecting the objective value, which only works for linear objectives.
Huge -fold IP.
Finally, we highlight that the engine behind our kernels, a conditional kernel for the so-called quadratic huge -fold IP, is of independent interest. Integer programming is a central problem in combinatorial optimization. Its parameterized complexity has been recently intensely studied [11, 32, 33, 10, 6]. However, it turns out that integer programs cannot be kernelized in all but the most restricted cases [34, 35, 26]. We give a positive result about a class of block-structured succinctly encoded IPs with a quadratic objective function, so-called quadratic huge -fold IPs, which was used to obtain many interesting FPT results [29, 32, 3, 17, 31, 4]. However, our result is conditional on having a polynomial algorithm for the so-called separation subproblem of the Configuration LP of the quadratic huge -fold IP, so there is a price to pay for the generality of this fragment of IP. The separation subproblem is to optimize a certain objective function (which varies) over the set of configurations. In the cases considered here, we show that this corresponds to (somewhat involved) variations of the knapsack problem with polynomially bounded numbers; in other problems expressible as -fold IP, the separation subproblem corresponds to a known hard problem. Informally, our result reads as follows: If the separation subproblem can be solved in polynomial time, then quadratic huge -fold IP has a polynomial kernel parameterized by the block dimensions, the number of block types, and the largest coefficient. One aspect of the algorithm above is reducing the quadratic objective function. The standard approach, also used in kernelization of weighted problems [13, 7, 1, 19, 43, 42, 21] is to use a theorem of Frank and Tardos  which “kernelizes” a linear objective function if the dimension is a parameter. However, we deal with
a quadratic convex (non-linear) function,
over a space of large dimension.
We are able to overcome these obstacles by a series of steps which first “linearize” the objective, then “aggregate” variables of the same type, hence shrinking the dimension, then reduce the objective using the algorithm of Frank and Tardos, and then we carefully reverse this process (cf. Lemma 4.2). This result has applications beyond this work: for example, the currently fastest strongly FPT algorithm for (i.e., an algorithm whose number of arithmetic operations does not depend on the weights ) has dependence of on the number of machines ; applying our new result instead of [11, Corollary 69] reduces this dependence to .
Theorem 1 can be used to obtain kernels for other problems which can be modeled as huge -fold IP. First, we may also optimize the norms of times when each machine finishes, a problem known as . Our results (Corollary 2.5) show that also in this setting the separation problem can be solved quickly. Second, the problem is identical to Bin Packing (in their decision form), so our kernel also gives a kernel for Bin Packing parameterized by the largest item size. Moreover, also the Bin Packing with Cardinality Constraints problem has a huge -fold IP model [30, Lemma 54] for which Corollary 2.5 indicates that the separation subproblem can be solved quickly. Third, Knop et al.  give a huge -fold IP model for the Surfing problem, in which many “surfers” make demands on few different “services” provided by few “servers”, where each surfer may have different costs of getting a service from a server; one may think of internet streaming with different content types, providers, and pricing schemes for different customer types. The separation problem there is polynomially solvable for an interesting reason: its constraint matrix is totally unimodular because it is the incidence matrix of the complete bipartite graph. Thus, Theorem 1 gives polynomial kernels for all of the problems above with the given parameters.
Let us finally review related results in the intersection of parameterized complexity and scheduling. To the best of our knowledge, the first to study scheduling problems from the perspective of multivariate complexity were Bodlaender and Fellows . Fellows and McCartin  study scheduling on a single machine of unit-length jobs with (many) different release times and due dates. Single machine scheduling where two agents compete to schedule their private jobs is investigated by Hermelin et al. . There are a few other results [44, 27, 24, 23] focused on identifying tractable scenarios for various scheduling paradigms (such as flow-shop scheduling or structural limitations of the job–machine assignment).
We consider zero to be a natural number, i.e.,
. We write vectors in boldface (e.g.,) and their entries in normal font (e.g., the -th entry of a vector is ). For positive integers we set and , and we extend this notation for vectors: for with , (where we compare component-wise). For two vectors , is defined coordinate-wise, i.e., for all , and similarly for .
If is a matrix, denotes the -th coordinate of the -th row, denotes the -th row and denotes the -th column. We use , i.e., all our logarithms are base . For an integer , we denote by the binary encoding length of ; we extend this notation to vectors, matrices, and tuples of these objects. For example, , and . For a function and two vectors , we define ; if is clear from the context we omit it and write just .
2.1 Kernel and Compression
Let be a parameterized problem. We say that is fixed-parameter tractable (or in FPT for short) if there exists an algorithm that given an instance decides whether in time, where is a computable function. A kernel for is a polynomial time algorithm (that is, an algorithm that stops in time) that given an instance returns an equivalent instance (that is, if and only if ) for which both and are upper-bounded by for some computable function . It is well-known that a parameterized problem is in FPT if and only if there is a kernel for it. Of course, the smaller the size of the instance returned by the kernelization algorithm the better; in particular, we are interested in deciding whether can be a polynomial in and if this is the case, we say there is a polynomial kernel for . A compression is a similar notion to kernel, that is, it is a polynomial time algorithm that given returns an instance with ; however, this time we allow to be an instance of a different parameterized problem and we require if and only if . A problem admits a polynomial compression if the function is a polynomial and we say that the problem admits a polynomial compression into the problem .
[[15, Theorem 1.6]] Let be parameterized problems such that is NP-hard and is in NP. If admits a polynomial compression into , then it admits a polynomial kernel. The above observation is useful when dealing with NP-hard problems. The proof simply follows by pipelining the assumed polynomial compression with a polynomial time (Karp) reduction from to .
2.2 Scheduling Notation
Overloading the convention slightly, for each and , denote by the processing time of a job of type on a machine of kind , by the weight of a job of type , by the number of jobs of type , by the number of machines of kind , and denote , , , , , and . We denote the high multiplicity versions of the previously defined problems and .
For an instance of or , we define its size as , whereas for an instance of or we define its size as . Note that the difference in encoding actually leads to different problems: for example, an instance of with jobs with maximum processing time can be encoded with bits while an equivalent instance of needs bits, which is exponentially more if . The membership of high-multiplicity scheduling problems in NP was open for some time, because it is not obvious whether a compactly encoded instance also has an optimal solution with a compact encoding. This question was considered by Eisenbrand and Shmonin, and we shall use their result. For a set define the integer cone of , denoted , to be the set [Eisenbrand and Shmonin [12, Theorem 2]] Let be a finite set of integer vectors and let . Then there exists a subset such that and the following holds for the cardinality of :
if all vectors of are nonnegative, then ,
if , then .
One can use Proposition 2.2 to show that the decision versions of and have short certificates and thus belong to NP. We will later derive the same result as a corollary of the fact that both of these scheduling problems can be encoded as a certain form of integer programming, which we will show to have short certificates as well.
2.3 Conformal Order and Graver Basis
Let be two vectors. We say that is conformal to (we denote it ) if both and for all . In other words, if they are in the same orthant (the first condition holds) and is component-wise smaller than . For a matrix we define its Graver basis to be the set of all -minimal vectors in . We define and .
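The conformal order can be checked coordinate by coordinate. The sketch below assumes the standard formulation of the two conditions, namely that the product of the corresponding entries is nonnegative (same orthant) and that the absolute value of each entry of the first vector is at most that of the second; the function name `is_conformal` is hypothetical.

```python
from typing import Sequence


def is_conformal(x: Sequence[int], y: Sequence[int]) -> bool:
    """Return True if x is conformal to y.

    Assumed conditions (standard definition): for every coordinate i,
    x[i] * y[i] >= 0 (same orthant) and |x[i]| <= |y[i]|.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return all(xi * yi >= 0 and abs(xi) <= abs(yi) for xi, yi in zip(x, y))


same_orthant_smaller = is_conformal([1, -2, 0], [3, -2, 5])
different_orthant = is_conformal([1, 2], [3, -2])
too_large = is_conformal([4, 0], [3, 0])
```

Note that this is only a partial order: two vectors lying in different orthants are incomparable.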
We say that two functions are equivalent on a polyhedron if if and only if for all . Note that if and are equivalent on , then the set of minimizers of over is the same as the set of minimizers of over . [Frank and Tardos ] Given a rational vector and an integer , there is a polynomial algorithm which finds a such that the linear functions and are equivalent on , and . The dual graph of a matrix has and if rows and contain a non-zero at a common coordinate . The dual treewidth of is . We do not define treewidth here, but we point out that for every tree . [Eisenbrand et al. [11, Theorem 98]] An IP with a constraint matrix can be solved in time , where is the dimension of the IP and is the length of the input. [Eisenbrand et al. [11, Lemma 25]] For an integer matrix , we have .
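As an illustration of the dual graph, the following sketch computes its edge set for a small integer matrix given as a list of rows. It assumes that the vertices are the row indices (0-based here) and that two rows are adjacent when they share a column in which both are non-zero; the function name and the example matrix are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple


def dual_graph_edges(A: List[List[int]]) -> Set[Tuple[int, int]]:
    """Edges {i, k} (stored as pairs with i < k) of the dual graph of A."""
    # Support of a row = set of columns in which the row is non-zero.
    supports = [{j for j, a in enumerate(row) if a != 0} for row in A]
    return {
        (i, k)
        for i, k in combinations(range(len(A)), 2)
        if supports[i] & supports[k]       # rows i and k share a column
    }


# One "linking" row on top of two rows with disjoint supports.
A = [
    [1, 1, 1, 1],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
]
edges = dual_graph_edges(A)
```

In this example the dual graph is a star centered at the linking row, hence a tree.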
Let us use Proposition 2.2 to show that and have short certificates. Here and later we will use the notion of a configuration: a configuration is a vector encoding how many jobs of which type are assigned to some machine. (The decision versions of) and belong to NP.
To show membership in NP, we have to prove the existence of short certificates. More precisely, for a high-multiplicity scheduling instance with a parameter , we have to show that if has an optimum of at most , then there exists a certificate of this fact of length . In both cases ( and ) the certificate will be a collection of configurations together with their multiplicities. However, to use Proposition 2.2 we will need to introduce a more complicated notion of an extended configuration. . Let an instance of together with the value be given. For each machine kind , define , and define the set of its extended configurations of to be . The interpretation is that in any the first coordinates encode a configuration (i.e., an assignment of jobs to a machine) and the remaining coordinates encode the kind of a machine for which this configuration can be processed in time at most . Then any decomposition of the vector with corresponds to a solution of where the last job finishes in time at most . Finally, since all vectors in are nonnegative, Proposition 2.2 (Part 1) applied to says that if such a decomposition exists (i.e., if is a Yes instance), then there exists one with and we are done.
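The certificate for the makespan problem can be verified along the following lines. This is a hedged sketch, not the construction of the proof: it works with plain configurations tagged by a machine kind instead of extended configurations, and it accepts certificates that use at most the available number of machines of each kind (the unused machines can be thought of as receiving the all-zero configuration). All names (`verify_makespan_certificate`, `p`, `n`, `m`, `T`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# (machine kind, configuration = jobs per type, multiplicity)
Entry = Tuple[int, Sequence[int], int]


def verify_makespan_certificate(p: List[List[int]], n: List[int],
                                m: List[int], T: int,
                                certificate: List[Entry]) -> bool:
    """Check that the configurations schedule all jobs with makespan <= T.

    p[i][j]: processing time of a job of type j on a machine of kind i,
    n[j]: number of jobs of type j, m[i]: number of machines of kind i.
    """
    used = [0] * len(m)        # machines of each kind used so far
    scheduled = [0] * len(n)   # jobs of each type scheduled so far
    for kind, config, mult in certificate:
        if mult < 0 or any(c < 0 for c in config):
            return False
        load = sum(p[kind][j] * c for j, c in enumerate(config))
        if load > T:           # this configuration does not fit in time T
            return False
        used[kind] += mult
        for j, c in enumerate(config):
            scheduled[j] += mult * c
    return scheduled == list(n) and all(u <= mi for u, mi in zip(used, m))


p = [[2, 3], [1, 5]]
n = [4, 2]
m = [2, 1]
cert = [(0, (0, 2), 1), (0, (3, 0), 1), (1, (1, 0), 1)]
ok = verify_makespan_certificate(p, n, m, 6, cert)
too_tight = verify_makespan_certificate(p, n, m, 5, cert)
```

The running time is polynomial in the number of distinct configurations and the encoding length of the numbers, which is why a bound on the number of distinct configurations yields a short certificate.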
. Let be an instance of together with the value . It is well known  that on a single machine a schedule minimizing is one which schedules jobs according to their Smith ratios non-increasingly. For each machine kind , we define to be the value of for the aforementioned scheduling of the instance on a single machine of kind . Define , and define the set of extended configurations to be . The difference, as compared with , is that does not define (we only use it to ensure finiteness) but we have an additional coordinate which expresses (an upper bound on) the contribution of each configuration (machine) to the objective. Hence, any decomposition of the vector with corresponds to a solution of of value at most . Proposition 2.2 says if any decomposition exists (i.e., if is a Yes instance), then there exists one where . Because is a quadratic function with coefficients bounded by and  we have and hence there exists a certificate of length and is in NP. ∎
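The single-machine ordering used above can be sketched in a few lines. The code sorts jobs by their Smith ratio (weight divided by processing time) non-increasingly and evaluates the sum of weighted completion times; a brute-force check over all orders is included only as a sanity test for tiny inputs. Processing times are assumed to be positive, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence, Tuple


def weighted_completion(jobs: List[Tuple[int, int]],
                        order: Sequence[int]) -> int:
    """Sum of w_j * C_j when jobs (p_j, w_j) run in the given order."""
    t = total = 0
    for j in order:
        p, w = jobs[j]
        t += p            # completion time of job j
        total += w * t
    return total


def smith_order(jobs: List[Tuple[int, int]]) -> List[int]:
    """Job indices sorted by Smith ratio w_j / p_j, non-increasingly."""
    return sorted(range(len(jobs)),
                  key=lambda j: Fraction(jobs[j][1], jobs[j][0]),
                  reverse=True)


def brute_force_min(jobs: List[Tuple[int, int]]) -> int:
    """Minimum objective over all orders (exponential; tiny inputs only)."""
    return min(weighted_completion(jobs, o)
               for o in permutations(range(len(jobs))))


jobs = [(3, 1), (1, 4), (2, 2)]   # (processing time, weight)
order = smith_order(jobs)
value = weighted_completion(jobs, order)
```

Exact fractions are used for the ratios to avoid floating-point ties being broken incorrectly.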
2.4 -fold Integer Programming
The Integer Programming problem is to solve:
where , , , and .
A generalized -fold IP matrix is defined as
Here, , is an -matrix, and and for all , are integer matrices. Problem (IP) with is known as generalized -fold integer programming (generalized -fold IP). “Regular” -fold IP is the problem where and for all . Recent work indicates that the majority of techniques applicable to “regular” -fold IP also applies to generalized -fold IP .
The structure of allows us to divide any -dimensional object, such as the variables of , bounds , or the objective , into bricks of size , e.g. . We use subscripts to index within a brick and superscripts to denote the index of the brick, i.e., is the -th variable of the -th brick with and . We call a brick integral if all of its coordinates are integral, and fractional otherwise.
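The division into bricks is a plain regrouping of coordinates, as the following sketch illustrates. It uses 0-based indices, whereas the text indexes bricks by superscripts and coordinates within a brick by subscripts; the names `split_into_bricks`, `is_integral_brick`, and the brick size parameter `t` are assumptions of this sketch.

```python
from typing import List, Sequence


def split_into_bricks(x: Sequence[float], t: int) -> List[List[float]]:
    """Split a vector of dimension n*t into n consecutive bricks of size t."""
    if t <= 0 or len(x) % t != 0:
        raise ValueError("dimension must be a positive multiple of t")
    return [list(x[i:i + t]) for i in range(0, len(x), t)]


def is_integral_brick(brick: Sequence[float]) -> bool:
    """A brick is integral if all of its coordinates are integers."""
    return all(v == int(v) for v in brick)


x = [1, 0, 2.5, 3, 4, 4]
bricks = split_into_bricks(x, 2)
flags = [is_integral_brick(b) for b in bricks]
```

Here the second brick is fractional because one of its coordinates is not an integer, while the other two bricks are integral.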
Huge -fold IP.
The huge -fold IP problem is an extension of generalized -fold IP to the high-multiplicity scenario, where blocks come in types and are encoded succinctly by type multiplicities. This means there could be an exponential number of bricks in an instance with a polynomial encoding size. The input to the huge -fold IP problem with types of blocks is defined by matrices and , , vectors , , , , functions satisfying we have and given by evaluation oracles, and integers such that . We say that a brick is of type if its lower and upper bounds are and , its right hand side is , its objective is , and the matrices appearing at the corresponding coordinates are and . Denote by the indices of bricks of type , and note and . The task is to solve (IP) with a matrix which has blocks of type for each . Knop et al.  have shown a fast algorithm solving huge -fold IP. The main idea of their approach is to prove a powerful proximity theorem showing how one can drastically reduce the size of the input instance given that one can solve a corresponding configuration LP (which we shall formally define later). We will build on this approach here. When are restricted to be separable quadratic (and convex) for all , we call the problem quadratic huge -fold IP.
2.5 Configurations LP of Huge -fold IP
Having modeled our scheduling problems as huge -fold IP instances, our next goal is to solve the Configuration LP, which we will now define. Because the results we derive below apply to any quadratic huge -fold IP, we state them generally (and not as claims about the specific instances which we shall apply them to).
Let a huge -fold IP instance with types be fixed. Recall that denotes the number of blocks of type , and let . We define for each the set of configurations of type as
Here we are interested in four instances of convex programming (CP) and convex integer programming (IP) related to huge -fold IP. First, we have the Huge IP
Then, there is the Configuration LP of (HugeIP),
Finally, by observing that implies for all , defining , leads to the Configuration ILP,
The classical way to solve (ConfLP) is by solving its dual using the ellipsoid method and then restricting (ConfLP) to the columns corresponding to the rows encountered while solving the dual, a technique known as column generation. The Dual LP of (ConfLP) in variables , is:
To verify feasibility of for , we need to maximize the left-hand side of (4) over all and check if it is at most . This corresponds to solving the following separation problem: find integer variables which for a given vector solve
Denote by the time needed to solve (-IP). [Knop et al. [30, Lemma 12]] An optimal solution of (ConfLP) with can be found in time. Since (-IP) is an IP, it can be solved using Proposition 2.3 in time . Hence, together with Lemma -IP, we get the following corollary: An optimal solution of (ConfLP) with can be found in time . We later show that for our formulations of and , indeed is polynomial in , and , hence the (ConfLP) optimum can be found in polynomial time.
3 Compressing High Multiplicity Scheduling to Quadratic -fold IP
In this section we are going to prove Theorem 1. To that end, we use the following assumption (which mainly simplifies notation).
From here on, we assume , since both quantities are parameters.
Theorem 1 (repeated).
The problems and parameterized by the number of job types , the longest processing time , and the number of machine kinds admit a polynomial compression to quadratic huge -fold IP parameterized by the number of block types , the block dimension , and the largest coefficient .
Recall that in order to use Theorem 1 to provide kernels for selected scheduling problems (which are NP-hard) we want to utilize Proposition 2.1. Thus, we have to show that the “target problem” quadratic huge -fold IP is in NP. The decision version of quadratic huge -fold IP belongs to NP.
We will use Proposition 2.2 to show that there exists an optimum whose number of distinct configurations is polynomial in the input length. Such a solution can then be encoded by giving those configurations together with their multiplicities, and constitutes a polynomial certificate. Recall that (ConfILP) corresponding to the given instance of huge -fold is
Let be the set of columns of the matrix extended with an additional coordinate which is the coefficient of the objective function corresponding to the given column, that is, for a column (i.e., the objective value of configuration ). Hence and for any . Applying Proposition 2.2, part 2, to , yields that there exists an optimal solution of (ConfILP) with satisfying , hence polynomial in the input length of the original instance. ∎
Clearly Lemma 3 holds for any huge -fold IP whose objective is restricted by some, not necessarily quadratic, polynomial.
Using Theorem 1.
Before we move to the proof of Theorem 1 we first derive two simple yet interesting corollaries. The problems and admit polynomial kernelizations when parameterized by .
Let . We describe a polynomial compression from to quadratic huge -fold IP which, by Lemma 3, yields the sought kernel, since is NP-hard and huge -fold with a quadratic objective is in NP.
We first perform the high-multiplicity encoding of the given instance of , thus obtaining an instance of with the input encoded as . Now, we can apply Theorem 1 and obtain an instance equivalent to with size bounded by a polynomial in . ∎
3.1 Huge -fold IP Models
Denote by the -dimensional vector whose all entries are . It was shown [29, 30] that is modeled as a feasibility instance of huge -fold IP as follows. Recall that we deal with the decision versions and that is the upper bound on the value of the objective(s). We set , the number of block types is , , , , , for , and the multiplicities of blocks are . The meaning is that the first type of constraints expressed by the matrices ensures that every job is scheduled somewhere, and the second type of constraints expressed by the matrices ensures that every machine finishes in time .
In the model of , for each machine kind , we define to be the ordering of jobs by the ratio non-increasingly, and let