1 Introduction
The design of algorithms for solving hard combinatorial optimization problems remains a valuable and challenging task. Practically relevant problems are typically NP-complete or NP-hard. Examples include any kind of search problem through a combinatorial space, such as inference in graphical models (Wainwright et al., 2005), planning (Ono and Williams, 2008), mechanism design (De Vries and Vohra, 2003), program synthesis (Manna and Waldinger, 1971), verification (Bérard et al., 2013), and engineering design (Cui et al., 2006, Mirhoseini et al., 2017), amongst many others.
The widespread importance of solving these hard combinatorial optimization problems has spurred intense research in designing approximation algorithms and heuristics for large classes of combinatorial optimization settings, such as integer programming (Berthold, 2006, Fischetti and Lodi, 2010, Land and Doig, 2010) and satisfiability (Zhang and Malik, 2002, De Moura and Bjørner, 2008, Dilkina et al., 2009a). Historically, the design of such algorithms was done largely manually, requiring a careful understanding of the underlying structure within specific classes of optimization problems. Such approaches are often unappealing due to the need for substantial domain knowledge, and one often desires a more automated approach.
In recent years, there has been increasing interest in automatically learning good (parameters of) algorithms from training data. The most popular paradigm, also referred to as "learning to search", aims to learn good local decisions within a search procedure such as branch-and-bound (He et al., 2014, Khalil et al., 2016, 2017a, Song et al., 2018, 2019, Gasse et al., 2019). While this line of research has shown promise, it falls short of delivering practical impact, especially in improving wall-clock time. A major reason is that most algorithms are implemented on open-source solvers such as SCIP, which, according to recent benchmark results (Mittelmann, 2017, Optimization, 2019), is considerably slower than leading commercial solvers such as Gurobi and CPLEX (usually by a factor of 10 or more). Such learning-to-search approaches also ignore the many other heuristics typically employed by commercial solvers, such as primal presolve heuristics (Achterberg et al., 2019).
Motivated by these drawbacks, in this paper we study how to design abstractions of large-scale combinatorial optimization problems that can leverage existing state-of-the-art solvers as a generic black-box subroutine. Our goal is to arrive at new approaches that can reliably outperform leading commercial solvers in wall-clock time. We are further interested in designing frameworks that are amenable to data-driven methods. We ground our work in two ways. First, we study how to solve integer programs (IPs), which are a common way to represent many combinatorial optimization problems. Second, we leverage the large neighborhood search (LNS) paradigm, which iteratively chooses a subset of variables to optimize while leaving the remainder fixed. A major appeal of LNS is that it can easily use any existing solver as a subroutine. We are furthermore interested in designing a framework that does not require incorporating extensive domain knowledge to apply to various problem domains, e.g., by learning data-driven decision procedures for the framework.
Our contributions can be summarized as:

We propose a general LNS framework for solving largescale IPs. Our framework does not depend on incorporating domain knowledge in order to achieve strong performance. In our experiments, we combine our framework with Gurobi, which is a leading commercial IP solver.

We show that, perhaps surprisingly, even using a random decision procedure within our LNS framework significantly outperforms Gurobi on many problem instances.

We develop a learning-based approach that predicts a partitioning of the variables of an IP, which then serves as a learned decision procedure within our LNS framework. In a sense, this data-driven procedure effectively learns how to decompose the original optimization problem into a series of smaller sub-problems that can be solved much more efficiently using existing solvers.

We perform an extensive empirical validation across several IP benchmarks, and demonstrate superior wall-clock performance compared to Gurobi across all benchmarks. These results suggest that our LNS framework can effectively leverage leading state-of-the-art solvers to reliably achieve substantial speedups in wall-clock time.
2 Related Work on Learning to Optimize
An increasingly popular paradigm for the automated design and tuning of solvers is to use data-driven or learning-based approaches. Broadly speaking, one can categorize most existing "learning to optimize" approaches into three categories: (1) learning search heuristics, such as for branch-and-bound; (2) tuning the hyperparameters of existing algorithms; and (3) learning to identify key substructures that an existing solver can exploit, such as backdoor variables. In this section, we survey these three paradigms.
2.1 Learning to Search
In learning to search, one typically operates within the framework of a search heuristic, and trains a local decision policy from training data. Perhaps the most popular search framework for integer programs is branch-and-bound (Land and Doig, 2010), a complete algorithm for solving integer programs (IPs) to optimality. Branch-and-bound is a general framework that includes many decision points that guide the search process, which historically have been designed using carefully acquired domain knowledge. To arrive at more automated approaches, a collection of recent works explores learning data-driven models to outperform manually designed heuristics, including learning for branching variable selection (Khalil et al., 2016, Gasse et al., 2019) or node selection (He et al., 2014, Song et al., 2018, 2019). Moreover, one can also train a model to decide when to run the primal heuristics built into many IP solvers (Khalil et al., 2017a). Many of these approaches are trained as policies using reinforcement or imitation learning.
Writing highly optimized software implementations is challenging, and so all previous work on learning within branch-and-bound was implemented within existing software frameworks that admit interfaces for custom functions. The most common choice is the open-source solver SCIP (Achterberg, 2009), while some previous work relied on callback methods with CPLEX (Bliek et al., 2014, Khalil et al., 2016). However, in general, one cannot depend on highly optimized solvers being amenable to incorporating learned decision procedures as subroutines. For instance, Gurobi, the leading commercial IP solver according to recent benchmarks (Mittelmann, 2017, Optimization, 2019), has very limited interface capabilities, and to date, none of the learned branch-and-bound implementations can reliably outperform Gurobi.
Beyond branch-and-bound, other search frameworks that are amenable to data-driven design include A* search (Song et al., 2018), direct forward search (Khalil et al., 2017b), and sampling-based planning (Chen et al., 2020). These settings are less directly relevant, since our work is grounded in solving IPs. However, the LNS framework can, in principle, be interpreted more generally to include these other settings as well, which is an interesting direction for future work.
2.2 Algorithm Configuration
Another line of work using learning to speed up optimization solvers is algorithm configuration (Hoos, 2011, Hutter et al., 2011, Ansótegui et al., 2015, Balcan et al., 2018, Kleinberg et al., 2019). Existing solvers tend to have many customizable hyperparameters whose values strongly influence solver behavior. Algorithm configuration aims to optimize those parameters on a problem-by-problem basis to speed up the solver.
Similar to our approach, algorithm configuration approaches leverage existing solvers. One key conceptual difference is that algorithm configuration does not yield fundamentally new approaches, but rather is a process for tuning the hyperparameters of an existing approach. As a consequence, one limitation of algorithm configuration approaches is that they rely on the underlying solver being able to solve problem instances in a reasonable amount of time, which may not be possible for hard problem instances. Our LNS framework can thus be viewed as a complementary paradigm for leveraging existing solvers. In fact, in our experiments, we perform a simple version of algorithm configuration. We defer incorporating more sophisticated algorithm configuration procedures to future work.
2.3 Learning to Identify Substructures
The third category of approaches is learning to predict key substructures of an optimization problem. A canonical example is learning to predict backdoor variables (Dilkina et al., 2009b): a set of variables that, once instantiated, reduces the remaining problem to a tractable form (Dilkina et al., 2009a). Our approach bears some high-level affinity to this paradigm, as we effectively aim to learn decompositions of the original problem into a series of smaller sub-problems. However, our approach makes a much weaker structural assumption, and thus can more readily leverage a broader suite of existing solvers. Other examples of this general paradigm include learning to precondition solvers, such as generating an initial solution to be refined with a downstream solver, which is typically more popular in continuous optimization settings (Kim et al., 2018).
3 A General Large Neighborhood Search Framework for Integer Programs
We now present our large neighborhood search (LNS) framework for solving integer programs (IPs). LNS is a meta-heuristic that generalizes neighborhood search for optimization, which iteratively improves an existing solution via local search. As a concept, LNS has been studied for over two decades (Shaw, 1998, Ahuja et al., 2002, Pisinger and Ropke, 2010). However, previous work studied specialized settings with domain-specific decision procedures. For example, in Shaw (1998), the definition of neighborhoods is highly specific to the vehicle routing problem, so the decision making of how to navigate the neighborhoods is also domain-specific. We instead aim to develop a general framework that avoids requiring domain-specific structure, and whose decision procedures can be designed in a generic and automated way, e.g., via learning as described in Section 4. In particular, our approach can be viewed as a decomposition-based LNS framework that operates on generic IP representations, as described in Section 3.2.
3.1 Background
Formally, let X be the set of all variables in an optimization problem and D be the set of all possible value assignments of X. For a current assignment s, a neighborhood function N(s) is a collection of candidate assignments to replace s; a solver subroutine is then invoked to find the optimal solution within N(s). Traditional neighborhood search approaches define N(s) explicitly, e.g., the 2-opt operation in the traveling salesman problem (Dorigo et al., 2006) and its extension to k-opt operations (Helsgaun, 2009). LNS instead defines N(s) implicitly through a destroy method and a repair method: a destroy method destructs part of the current solution, while a repair method rebuilds the destroyed solution. The number of candidate repairs is potentially exponential in the size of the destroyed neighborhood, which explains the "large" in LNS.
In the context of solving IPs, LNS is also used as a primal heuristic for finding high-quality incumbent solutions (Rothberg, 2007, Helber and Sahling, 2010, Hendel, 2018). In these works, the large neighborhoods are constructed randomly (Rothberg, 2007), defined manually (Helber and Sahling, 2010), or selected from a predefined set by a bandit algorithm (Hendel, 2018). Furthermore, because of the level of decision-making involved, these LNS approaches often require interface access to the underlying solver, which is often undesirable when designing frameworks that offer ease of deployment.
Recently, there has been some work on using learning within LNS (Hottung and Tierney, 2019, Syed et al., 2019). These approaches are designed for specific optimization problems, such as capacitated vehicle routing, and so are not directly comparable with our generic approach for solving IPs. Furthermore, they often focus on learning the underlying solver (rather than relying on existing state-of-the-art solvers), which makes them unappealing from a deployment perspective.
3.2 Decomposition-based Large Neighborhood Search for Integer Programs
We now describe the details of our LNS framework. At a high level, our LNS framework operates on an integer program (IP) by defining decompositions of its integer variables into disjoint subsets. Afterwards, we can select a subset and use an existing solver to optimize the variables in that subset while holding all other variables fixed. The benefit of this framework is that it is completely generic: it applies to any IP instantiation of any combinatorial optimization problem.
Throughout this paper, we consider minimization of the objective value for all problems. We first describe a version of LNS for integer programs based on decompositions of their integer variables, which is a modified version of the evolutionary approach proposed in Rothberg (2007). The algorithm is outlined in Alg 1.
For an integer program P with a set X of integer variables (not necessarily all of its integer variables), we define a decomposition of X as a disjoint union X = X_1 ∪ X_2 ∪ ⋯ ∪ X_k. Assuming we have an existing feasible solution s to P, we view each subset X_i of integer variables as a local neighborhood for search. We fix the integer variables in X \ X_i at their values in the current solution s and optimize over the variables in X_i (referred to as the FIX-AND-OPTIMIZE function in Line 3 of Alg 1). As the resulting optimization is a smaller IP, we can use any off-the-shelf IP solver to carry out the local search; in our experiments, we use Gurobi to optimize each sub-IP. A new solution is obtained, and we repeat the process with the remaining subsets.
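To make the loop concrete, the following is a minimal Python sketch of one pass of the fix-and-optimize loop in Alg 1. All names are our own, and a brute-force search over the unfixed binary variables stands in for the off-the-shelf IP solver (Gurobi in our experiments); any solver that accepts variable fixings could be substituted.

```python
import itertools

def solve_sub_ip(obj, feasible, x, free_idx):
    """Stand-in for an off-the-shelf IP solver: brute-force the binary
    variables in free_idx while all other variables stay fixed."""
    best, best_val = list(x), obj(x)
    for assignment in itertools.product([0, 1], repeat=len(free_idx)):
        cand = list(x)
        for i, v in zip(free_idx, assignment):
            cand[i] = v
        if feasible(cand) and obj(cand) < best_val:
            best, best_val = cand, obj(cand)
    return best

def lns_fix_and_optimize(obj, feasible, x0, decomposition):
    """One iteration of decomposition-based LNS (Alg 1, sketched):
    for each subset X_i, re-optimize its variables with X \\ X_i fixed."""
    x = list(x0)
    for subset in decomposition:
        x = solve_sub_ip(obj, feasible, x, subset)
    return x

# Toy minimum vertex cover on a 4-cycle with unit weights; start from the
# trivial all-ones cover, as in our initialization scheme.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
feasible = lambda x: all(x[u] or x[v] for u, v in edges)
obj = sum  # minimize the number of chosen vertices
x = lns_fix_and_optimize(obj, feasible, [1, 1, 1, 1], [[0, 1], [2, 3]])
```

On this toy instance, a single pass over the two subsets reduces the objective from 4 to the optimal 2.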
Decomposition Decision Procedures. Notice that a different decomposition defines a different series of LNS sub-problems, so the effectiveness of our approach depends on the decompositions used; our framework proceeds with a (possibly different) decomposition in each iteration. The simplest implementation is a random decomposition approach, which we show empirically already delivers very strong performance. We can also consider learning-based approaches that learn a decomposition from training data, discussed further in Section 4.
4 Learning a Decomposition
In this study, we apply data-driven methods, such as reinforcement learning and imitation learning, to learn policies that generate decompositions for the LNS framework described in Section 3.2. We specialize a Markov decision process (MDP) to our setting. For a combinatorial optimization problem instance P with a set of integer variables X, a state s is a vector representing an assignment for the variables in X, i.e., it is an incumbent solution. An action a at a state s is a decomposition of X as described in Section 3.2. After running LNS through the neighborhoods defined by a, we obtain a (new) solution s'. The reward is r(s, a) = J(s) − J(s'), where J(s) is the objective value of P when s is the solution. We restrict ourselves to finite-horizon tasks of length T, so we can set the discount factor to 1.
4.1 Reinforcement Learning
For reinforcement learning, for simplicity, we choose REINFORCE (Sutton et al., 2000), a classical Monte-Carlo policy gradient method for optimizing policies. The goal is to find a policy π_θ, parameterized by some θ, that maximizes J(π_θ), the expected cumulative reward. Policy gradient methods seek to optimize J(π_θ) by updating θ in the direction of ∇_θ J(π_θ) = E_{τ∼π_θ}[Σ_t ∇_θ log π_θ(a_t | s_t) R_t], where R_t is the cumulative reward from step t onward. By sampling trajectories τ from π_θ, one can estimate the gradient.
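As an illustration, the following sketch estimates this gradient for a deliberately simple policy class: each integer variable receives an independent categorical distribution over the k subsets, parameterized by per-variable logits theta. This factorized policy and all names are our own simplifications for exposition, not the model used in our experiments.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    z = sum(e)
    return [v / z for v in e]

def sample_decomposition(theta, k):
    """Sample a subset label in {0, ..., k-1} for every integer variable;
    theta[i] holds the k logits of variable i."""
    return [random.choices(range(k), weights=softmax(t))[0] for t in theta]

def reinforce_gradient(theta, k, reward_fn, n_traj=5):
    """Monte-Carlo REINFORCE estimate of grad_theta J for the factorized
    categorical policy; reward_fn stands in for the objective improvement
    obtained by running LNS with the sampled decomposition."""
    grad = [[0.0] * k for _ in theta]
    for _ in range(n_traj):
        labels = sample_decomposition(theta, k)
        R = reward_fn(labels)
        for i, j in enumerate(labels):
            p = softmax(theta[i])
            for a in range(k):
                # gradient of log softmax: indicator(a == j) - p[a]
                grad[i][a] += ((1.0 if a == j else 0.0) - p[a]) * R / n_traj
    return grad
```

A useful sanity check: for a categorical policy, the score function sums to zero over the k labels of each variable, so each row of the estimated gradient sums to (numerically) zero.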
4.2 Imitation Learning
In imitation learning, demonstrations (from an expert) serve as the learning signal. However, we do not have access to an expert that generates good decompositions. To overcome this issue, we generate demonstrations by sampling random decompositions and taking the ones resulting in the best objective values as demonstrations. This procedure is shown in Alg 2. The core of the algorithm is shown on Lines 7–12, where we repeatedly sample random decompositions and call the decomposition-based LNS algorithm (Alg 1) to evaluate them. In the end, we record the decompositions with the best objective values (Lines 13–16).
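A minimal sketch of this demonstration-collection step (Alg 2) might look as follows; `objective_after_lns` is a hypothetical callback that runs Alg 1 with a given decomposition and returns the resulting objective value.

```python
import random

def random_decomposition(n_vars, k):
    """Assign each of n_vars integer variables uniformly to one of k subsets."""
    labels = [random.randrange(k) for _ in range(n_vars)]
    return [[i for i in range(n_vars) if labels[i] == j] for j in range(k)]

def collect_demonstration(objective_after_lns, n_vars, k, n_samples=5):
    """Alg 2, sketched: sample random decompositions, evaluate each by
    running decomposition-based LNS (Alg 1), and keep the one with the
    best (lowest) resulting objective as the demonstration."""
    best, best_obj = None, float("inf")
    for _ in range(n_samples):
        d = random_decomposition(n_vars, k)
        val = objective_after_lns(d)  # objective value after LNS with d
        if val < best_obj:
            best, best_obj = d, val
    return best, best_obj
```

The returned decomposition then plays the role of the expert action for the state at which it was collected.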
Once we have generated a collection of good decompositions, we apply two imitation learning algorithms. The first is behavior cloning (Pomerleau, 1989). The main idea is to turn each demonstration trajectory into a collection of state-action pairs, then treat policy learning as a supervised learning problem. In our case, the action is a decomposition, which we represent as a vector: each element carries a label indicating which subset the corresponding variable belongs to. Thus, we reduce the learning problem to a supervised classification task.
Behavior cloning suffers from cascading errors (Ross and Bagnell, 2010). We use the forward training algorithm (Ross and Bagnell, 2010) to correct mistakes made at each step. We adapt the forward training algorithm to our use case and present it in Alg 3, which uses Alg 2 as a subroutine. The main difference from behavior cloning is the adaptive demonstration collection step on Line 4: we do not collect all demonstrations beforehand; instead, they are collected conditioned on the decompositions predicted by previous policies.
4.3 Featurization of an Optimization Problem
For training models, it is necessary to define features that contain enough information for learning. In the following paragraphs, we describe the featurization of two classes of combinatorial optimization problems.
Combinatorial Optimization over Graphs.
The first class of problems is defined explicitly over graphs, such as those considered in Khalil et al. (2017b). Examples include the minimum vertex cover, maximum cut, and traveling salesman problems. The (weighted) adjacency matrix of the graph contains all the information needed to define the optimization problem, so we use it as the feature input to a learning model. Notice that for such optimization problems, each vertex in the graph is often associated with an integer variable in the IP formulation.
General Integer Programs.
There are other classes of combinatorial optimization problems that do not originate from explicit graphs. Nevertheless, they can be modeled as integer programs. We construct an incidence matrix A between the integer variables and the constraints: for each integer variable x_i and constraint c_j, the entry A_ij is the coefficient of x_i in c_j if x_i appears in c_j, and 0 otherwise.
Incorporating Current Solution.
As outlined in Section 3.2, we seek to adaptively generate decompositions based on the current solution. Thus, we need to include the solution in the featurization. Regardless of which featurization we use, the feature matrix has one row per integer variable we consider. As a result, we can simply include each variable's value in the current solution as an additional feature.
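Assuming each constraint is given as a sparse map from variable index to coefficient (an illustrative encoding of our own; any sparse format works), the two featurization steps above can be sketched as:

```python
def incidence_features(constraints, n_vars):
    """Variable-by-constraint incidence matrix: row i, column j holds the
    coefficient of variable i in constraint j (0 if it does not appear).
    Each constraint is a {variable_index: coefficient} dict (assumed format)."""
    return [[c.get(i, 0.0) for c in constraints] for i in range(n_vars)]

def add_incumbent_feature(features, solution):
    """Append each variable's value in the incumbent solution as a column."""
    return [row + [val] for row, val in zip(features, solution)]

# Two constraints over two variables, e.g. 2x + 3y <= 6 and y >= 1,
# encoded by their left-hand-side coefficients; incumbent solution (1, 0).
feats = incidence_features([{0: 2.0, 1: 3.0}, {1: 1.0}], n_vars=2)
feats = add_incumbent_feature(feats, [1, 0])
```

Only the coefficients and the incumbent values are needed, so this featurization applies to any IP without problem-specific engineering.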
5 Empirical Validation
We present experimental results on four diverse applications covering both combinatorial optimization over graphs and general IPs. We discuss the design choices of crucial parameters in Section 5.1, and present the main results in Sections 5.2 & 5.3. Finally, we inspect visualizations to interpret predicted decompositions in Section 5.4, and discuss some limitations in Section 5.5.
5.1 Datasets & Setup
Datasets. We evaluate on 4 NP-hard benchmark problems expressed as IPs. The first two, minimum vertex cover (MVC) and maximum cut (MAXCUT), are graph optimization problems. For each problem, we consider two random graph distributions: the Erdős-Rényi (ER) (Erdős and Rényi, 1960) and the Barabási-Albert (BA) (Albert and Barabási, 2002) random graph models. For MVC, we use graphs of size 1000. For MAXCUT, we use graphs of size 500. All graphs are weighted; each vertex weight (for MVC) or edge weight (for MAXCUT) is sampled uniformly from [0, 1]. We also apply our method to combinatorial auctions (Leyton-Brown et al., 2000) and risk-aware path planning (Ono and Williams, 2008), which are not based on graphs. We use the Combinatorial Auction Test Suite (CATS) (Leyton-Brown et al., 2000) to generate auction instances from two distributions: regions and arbitrary. For each distribution, we consider two sizes: 2000 items with 4000 bids, and 4000 items with 8000 bids. For the risk-aware path planning experiment, we use a custom generator to produce obstacle maps with 30 obstacles and 40 obstacles.
Learning a Decomposition. When learning the decomposition procedure, we use 100 instances for training, 10 for validation, and 50 for testing. When using reinforcement learning, we sample 5 trajectories for each problem to estimate the policy gradient. For imitation learning algorithms, we sample 5 random decompositions and use the best one as the demonstration. All experimental results are averaged over 5 random seeds.
[Table 1: objective-improvement-per-second ratio for each configuration of the number of subsets (2 to 5) and the sub-IP time limit (1 to 3 seconds) on one MVC dataset.]
Initialization. To run large neighborhood search, we require an initial feasible solution (typically quite far from optimal). For MVC, MAXCUT, and CATS, we initialize a feasible solution by including all vertices in the cover set, assigning all vertices to one set, and accepting no bids, respectively. For risk-aware path planning, we initialize a feasible solution by running Gurobi for 3 seconds. This time is included when we compare wall-clock time with Gurobi.
Hyperparameter Configuration. We must set two parameters in order to run our LNS approach. The first is k, the number of equally sized subsets to divide the variables into. The second is how long we run the solver on each sub-problem: each sub-IP could still be fairly large, so solving it to optimality can take a long time, and we therefore impose a time limit. We run a parameter sweep over the number of subsets from 2 to 5 and the sub-IP time limit from 1 second to 3 seconds. For each configuration, the wall-clock time for one iteration of LNS will be different. For a fair comparison, we use the ratio r = Δ/t, where Δ is the objective value improvement and t is the time spent, as the selection criterion for the optimal configuration. Table 1 contains the result for one MVC dataset. As shown, the configuration makes a big difference in performance. For this case, one setting is clearly best, and we use it for our experiments on this particular dataset. We perform this procedure for every dataset; see the Appendix for similar tables for other datasets.
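The selection criterion can be sketched in a few lines; the sweep values below are illustrative placeholders, not measured results.

```python
def best_configuration(sweep):
    """Pick the configuration maximizing r = objective improvement / time.
    `sweep` maps (num_subsets, time_limit) -> (improvement, seconds_spent)."""
    return max(sweep, key=lambda cfg: sweep[cfg][0] / sweep[cfg][1])

# Illustrative sweep results (not measured values).
sweep = {(2, 1): (10.0, 5.0),   # r = 2.0
         (3, 2): (9.0, 2.0),    # r = 4.5
         (5, 3): (12.0, 6.0)}   # r = 2.0
cfg = best_configuration(sweep)  # -> (3, 2)
```

Normalizing improvement by time spent keeps the comparison fair even though each configuration runs one LNS iteration in a different amount of wall-clock time.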
5.2 Benchmark Comparisons with Gurobi
We now present our main benchmark evaluations. We instantiate our framework in four ways:

Random-LNS: using random decompositions

BC-LNS: using a decomposition policy trained using behavior cloning

FT-LNS: using a decomposition policy trained using forward training

RL-LNS: using a decomposition policy trained using REINFORCE
We use Gurobi 9.0 as the underlying solver. For learned LNS methods, we generate 10 decompositions in sequence with the model and apply LNS with these decompositions. We use the same time limit for running each sub-problem; as a result, the wall-clock times of the decomposition methods are very close.
When comparing against Gurobi alone, we limit Gurobi's runtime to the longest runtime across all instances from our LNS methods. In other words, Gurobi's runtime is longer than that of all the decomposition-based methods, which gives it more time to find the best solution possible.
Main Results. Tables 2, 3 and 4 show the results across the different benchmarks. We make two observations:

All LNS variants significantly outperform Gurobi in objective value, given the same amount of (or less) wall-clock time. Perhaps surprisingly, this phenomenon holds true even for Random-LNS.

The imitation learning based variants, FT-LNS and BC-LNS, outperform Random-LNS and RL-LNS in most cases.
Overall, these results suggest that our LNS approach can reliably offer substantial improvements over state-of-the-art solvers such as Gurobi. These results also suggest that one can use learning to automatically design strong decomposition approaches, and we provide a preliminary qualitative study of what the policy has learned in Section 5.4. It is possible that a more sophisticated RL method could further improve RL-LNS.
Per-Iteration Comparison. We use a total of 10 iterations of LNS, and it is natural to ask how the solution quality changes after each iteration. Fig 1 shows the objective value progressions of our LNS variants on three datasets. For the two combinatorial auction datasets, BC-LNS and FT-LNS achieve substantial performance gains over Random-LNS after just 2 iterations of LNS, while it takes about 4 iterations in the risk-aware path planning setting. These results show that learning a decomposition method for LNS can establish early advantages over using random decompositions.
Running Time Comparison. Our primary benchmark comparison limited all methods to roughly the same time limit. We now investigate how the objective values improve over time. Figure 2 shows four representative instances. We see that BC-LNS achieves the best performance profile of solution quality vs. wall-clock time.
How Long Does Gurobi Need? Figure 2 also allows us to compare with the performance profile of Gurobi. In all cases, LNS methods find better objective values than Gurobi early on and maintain this advantage even as Gurobi spends significantly more time. Most notably, in Figure 1(d), Gurobi was given 2 hours of wall-clock time and failed to match the solution found by Random-LNS in just under 5 seconds (the time axis is in log scale).
[Table 2: objective values of Gurobi, Random-LNS, BC-LNS, FT-LNS, and RL-LNS on MVC BA 1000, MVC ER 1000, MAXCUT BA 500, and MAXCUT ER 500.]
[Table 3: objective values of Gurobi, Random-LNS, BC-LNS, and FT-LNS on CATS Regions 2000/4000 and CATS Arbitrary 2000/4000.]
[Table 4: objective values of Gurobi, Random-LNS, BC-LNS, and FT-LNS on risk-aware path planning with 30 and 40 obstacles.]
5.3 Comparison with Domain-Specific Heuristics
We also compare with strong domain-specific heuristics for three classes of problems: MVC, MAXCUT, and CATS. We do not compare in the risk-aware path planning domain, as there are no readily available heuristics. Overall, we find that our LNS methods are competitive with specially designed heuristics, and can sometimes substantially outperform them. These results provide evidence that our LNS approach is a promising direction for the automated design of solvers that avoids the need to carefully integrate domain knowledge while achieving competitive or state-of-the-art performance.
[Table 5: objective values of the local-ratio heuristic and the best LNS variant on MVC BA 1000 and MVC ER 1000.]
For MVC, we compare with a 2-approximation heuristic based on the local-ratio technique (Bar-Yehuda and Even, 1983). Table 5 summarizes the results. The best LNS result outperforms this heuristic on both BA and ER graphs.
For MAXCUT, we compare with 3 heuristics. The first is a greedy algorithm that iteratively moves a vertex from one cut set to the other whenever such a move increases the total weight of cut edges. The second, proposed in Burer et al. (2002), is based on a rank-two relaxation of a semidefinite program (SDP). The third is from de Sousa et al. (2013). The results are presented in Table 6. The SDP-based heuristic performs best on both random graph distributions, which shows that a specially designed heuristic can still outperform a general IP solver.
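As an illustration, here is a sketch of the first (greedy) heuristic; the names are our own, and tie-breaking or move ordering may differ from the implementation we benchmarked.

```python
def greedy_maxcut(w):
    """Greedy MAXCUT local search: starting with all vertices on one side,
    move any vertex to the other side whenever the move increases the total
    weight of cut edges, until no single move helps.
    w is a symmetric weight matrix given as a list of lists."""
    n = len(w)
    side = [0] * n
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Gain of moving v: same-side edges become cut (+w),
            # currently cut edges become uncut (-w).
            gain = sum(w[v][u] * (1 if side[u] == side[v] else -1)
                       for u in range(n) if u != v)
            if gain > 0:
                side[v] = 1 - side[v]
                improved = True
    cut = sum(w[u][v] for u in range(n) for v in range(u + 1, n)
              if side[u] != side[v])
    return cut, side

# Unit-weight triangle: the best cut separates one vertex from the other two.
w = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cut, side = greedy_maxcut(w)  # cut weight 2, which is optimal here
```

Each accepted move strictly increases the cut weight, so the loop terminates.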
For CATS, we consider 2 heuristics. The first is greedy: at each step, we accept the highest bid among the remaining bids, remove its desired items, and eliminate other bids that desire any of the removed items. The second is based on LP rounding: we first solve the LP relaxation of the IP formulation of the combinatorial auction problem, then accept bids in decreasing order of their fractional values in the LP solution, removing items and conflicting bids in the same manner. As shown in Table 7, our LNS approach outperforms both methods in objective value.
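A sketch of the greedy bid-acceptance heuristic (the names and the bid format are our own, for illustration):

```python
def greedy_auction(bids):
    """Greedy winner determination: repeatedly accept the highest remaining
    bid, then discard bids that want any item already sold.
    Each bid is a (price, set_of_items) pair."""
    taken, revenue, accepted = set(), 0.0, []
    for price, items in sorted(bids, key=lambda b: -b[0]):
        if taken.isdisjoint(items):
            accepted.append((price, items))
            taken |= items
            revenue += price
    return revenue, accepted

# Three bids; the third conflicts with both winners and is skipped.
bids = [(10.0, {1, 2}), (8.0, {3}), (7.0, {2, 3})]
revenue, accepted = greedy_auction(bids)  # revenue 18.0, two bids accepted
```

The LP-rounding variant follows the same accept-and-eliminate loop, but orders bids by their fractional values in the LP relaxation instead of by price.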
[Table 6: objective values of the greedy, Burer et al. (2002), and de Sousa et al. (2013) heuristics and the best LNS variant on MAXCUT BA 500 and MAXCUT ER 500.]
[Table 7: objective values of the greedy and LP-rounding heuristics and the best LNS variant on CATS Regions 2000/4000 and CATS Arbitrary 2000/4000.]
5.4 Visualization
A natural question is what properties a good decomposition has. Here we provide one interpretation for risk-aware path planning. We use a slightly smaller instance with 20 obstacles for a clearer view. Binary variables in the IP formulation of this problem model relationships between obstacles and waypoints. Thus, we can interpret the neighborhood formed by a subset of binary variables as attention over specific relationships among some obstacles and waypoints.
Figure 3 captures 4 consecutive iterations of LNS with large solution improvements. Each subfigure shows the locations of obstacles (light blue squares) and the waypoint locations after the current iteration of LNS. We highlight the subset of 5 obstacles (red circles) and 5 waypoints (dark blue squares) that appear most frequently in the first neighborhood of the current decomposition. Qualitatively, the top 5 obstacles define some important junctions for waypoint updates, and the highlighted waypoints tend to change substantially between iterations. Thus, a good decomposition focuses on important decision regions and allows for large updates in these regions.
5.5 Limitations
While we have presented positive results for our proposed general LNS method, limitations exist. A major one is that neighborhoods need to be large for LNS to work well. For example, when we compare Random-LNS with Gurobi under the same wall-clock time limit on a combinatorial auction dataset with only 500 items and 1000 bids, Gurobi outperforms Random-LNS on objective value. This means that LNS is most effective for large-scale problem instances, not necessarily small ones. As a result, it is important to evaluate whether a problem is of a scale where the proposed LNS method is useful.
6 Conclusion & Future Work
We have presented a general large neighborhood search framework for solving integer programs. Our extensive benchmarks show that the proposed method consistently outperforms Gurobi in wall-clock time across diverse applications. Our method is also competitive with strong heuristics in specific problem domains.
We believe our current research has many exciting future directions that connect different aspects of learning-to-optimize research. Our framework relies on a good underlying solver; thus, progress from the learning-to-search community can lead to better LNS results. We briefly described a version of the algorithm configuration problem we face in Section 5.1, but a full version that adaptively chooses the number of subsets and the time limit would surely be more powerful. Algorithm configuration could also help optimize the hyperparameters of the solver used for each sub-problem. Finally, our approach is closely tied to efforts to identify substructures in optimization problems. A deeper understanding of such substructures can inform the design of our data-driven methods and define new variants of the learning problem within the proposed framework, e.g., learning to identify backdoor variables.
References
 Presolve reductions in mixed integer programming. INFORMS Journal on Computing. Cited by: §1.
 SCIP: solving constraint integer programs. Mathematical Programming Computation. Cited by: §2.1.
 A survey of very largescale neighborhood search techniques. Discrete Applied Mathematics 123 (13), pp. 75–102. Cited by: §3.
 Statistical mechanics of complex networks. Reviews of modern physics 74 (1), pp. 47. Cited by: §5.1.

Modelbased genetic algorithms for algorithm configuration
. In AISTATS, Cited by: §2.2.  Learning to branch. In ICML, Cited by: §2.2.
 A localratio theorm for approximating the weighted vertex cover problem. Technical report Computer Science Department, Technion. Cited by: §5.3.
 Systems and software verification: modelchecking techniques and tools. Cited by: §1.
 Primal heuristics for mixed integer programs. Cited by: §1.
 Solving mixed-integer quadratic programming problems with IBM-CPLEX: a progress report. In RAMP, Cited by: §2.1.
 Rank-two relaxation heuristics for max-cut and other binary quadratic programs. SIAM Journal on Optimization 12 (2), pp. 503–521. Cited by: §5.3.
 Learning to plan via neural exploration-exploitation trees. In ICLR, Cited by: §2.1.
 Combinatorial search of thermoelastic shape-memory alloys with extremely small hysteresis width. Nature Materials. Cited by: §1.
 Z3: an efficient SMT solver. In TACAS, Cited by: §1.
 Estimation of distribution algorithm for the max-cut problem. In International Workshop on Graph-Based Representations in Pattern Recognition, pp. 244–253. Cited by: §5.3.
 Combinatorial auctions: a survey. INFORMS Journal on Computing. Cited by: §1.
 Backdoors to combinatorial optimization: feasibility and optimality. In CPAIOR, Cited by: §1, §2.3.
 Backdoors in the context of learning. In SAT, Cited by: §2.3.
 Ant colony optimization. IEEE Computational Intelligence Magazine 1 (4), pp. 28–39. Cited by: §3.1.
 On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. Cited by: §5.1.
 Heuristics in mixed integer programming. Wiley Encyclopedia of Operations Research and Management Science. Cited by: §1.
 Exact combinatorial optimization with graph convolutional neural networks. In NeurIPS, Cited by: §1, §2.1.
 Learning to search in branch and bound algorithms. In NeurIPS, Cited by: §1, §2.1.
 A fix-and-optimize approach for the multi-level capacitated lot sizing problem. International Journal of Production Economics 123 (2), pp. 247–256. Cited by: §3.1.
 General k-opt submoves for the Lin–Kernighan TSP heuristic. Mathematical Programming Computation 1 (2-3), pp. 119–163. Cited by: §3.1.
 Adaptive large neighborhood search for mixed integer programming. Cited by: §3.1.
 Automated algorithm configuration and parameter tuning. In Autonomous search, pp. 37–71. Cited by: §2.2.
 Neural large neighborhood search for the capacitated vehicle routing problem. arXiv:1911.09539. Cited by: §3.1.
 Sequential modelbased optimization for general algorithm configuration. In LION, Cited by: §2.2.
 Learning to run heuristics in tree search. In IJCAI, Cited by: §1, §2.1.
 Learning to branch in mixed integer programming. In AAAI, Cited by: §1, §2.1, §2.1.
 Learning combinatorial optimization algorithms over graphs. In NeurIPS, Cited by: §2.1, §4.3.
 Semi-amortized variational autoencoders. In ICML, Cited by: §2.3.
 Procrastinating with confidence: near-optimal, anytime, adaptive algorithm configuration. In NeurIPS, Cited by: §2.2.
 An automatic method for solving discrete programming problems. In 50 Years of Integer Programming 1958–2008, pp. 105–132. Cited by: §1, §2.1.
 Towards a universal test suite for combinatorial auction algorithms. In EC, Cited by: §5.1.
 Toward automatic program synthesis. Communications of the ACM 14 (3), pp. 151–165. Cited by: §1.
 Device placement optimization with reinforcement learning. In ICML, Cited by: §1.
 Latest benchmarks of optimization software. Cited by: §1, §2.1.
 An efficient motion planning algorithm for stochastic dynamic systems with constraints on probability of failure. In AAAI, Cited by: §1, §5.1.
 Gurobi 8 performance benchmarks. Note: https://www.gurobi.com/pdfs/benchmarks.pdf Cited by: §1, §2.1.
 Large neighborhood search. Handbook of metaheuristics. Cited by: §3.
 ALVINN: an autonomous land vehicle in a neural network. In NeurIPS, Cited by: §4.2.
 Efficient reductions for imitation learning. In AISTATS, Cited by: §4.2.
 An evolutionary algorithm for polishing mixed integer programming solutions. INFORMS Journal on Computing 19 (4), pp. 534–541. Cited by: §3.1, §3.2.
 Using constraint programming and local search methods to solve vehicle routing problems. In CP, Cited by: §3.
 Co-training for policy learning. In UAI, Cited by: §1, §2.1.
 Learning to search via retrospective imitation. arXiv:1804.00846. Cited by: §1, §2.1, §2.1.
 Policy gradient methods for reinforcement learning with function approximation. In NeurIPS, Cited by: §4.1.
 Neural network based large neighborhood search algorithm for ride hailing services. In EPIA, Cited by: §3.1.
 MAP estimation via agreement on trees: message-passing and linear programming. IEEE Transactions on Information Theory. Cited by: §1.
 The quest for efficient Boolean satisfiability solvers. In CAV, Cited by: §1.
Appendix A
A.1 More Algorithm Configuration Results
In this section, we present additional algorithm configuration results, similar to those in Section 5.1.