1 Introduction
Neural networks (NNs) have significantly improved the ability of autonomous systems to learn and make decisions for complex tasks such as image recognition [10], speech recognition [4], and natural language processing [3]. As a result of this success, formal methods that represent the decision-making problem with NNs as a mathematical programming model, such as verification of NNs [8, 13] and optimal planning with respect to learned NNs [17], have been studied. In the area of learning and planning, Hybrid Deep MILP Planning (HD-MILP-Plan) [17] has introduced a two-stage data-driven framework that i) learns transition models with continuous action and state spaces using NNs, and ii) plans optimally with respect to the learned NNs using a mixed-integer linear programming (MILP) model. It has been experimentally shown that optimal planning with respect to the learned NNs [17] presents a challenging task for branch-and-bound (B&B) solvers [7] due to the poor linear relaxation of the underlying MILP model, which has a large number of big-M constraints.
In this paper, we focus on the important problem of improving the efficiency of MILP models for decision making with learned NNs. To tackle this challenging problem, we build on potential heuristics [14, 18], which provide an efficient framework for computing a lower bound on the cost of a given state as a function of its features. In this work, we describe the problem of finding optimal potential bounds for learned NN models with continuous inputs and outputs (i.e., continuous action and state spaces) as a bilevel program, and solve it using a novel finite-time constraint generation algorithm. The features of our linear potential heuristic are defined over the hidden units of the learned NN model, providing a rich and expressive candidate feature space. We use our constraint generation algorithm to compute the potential contribution (i.e., reward potential) of each hidden unit to the reward function of the HD-MILP-Plan problem. The precomputed reward potentials are then used to construct linear constraints that bound the reward function of HD-MILP-Plan and provide a tighter linear relaxation, allowing B&B optimization to explore a smaller number of nodes in the search tree.
Experimentally, we show that our constraint generation algorithm efficiently computes reward potentials for learned NNs, and that the computational overhead is justified by the overall strengthening of the underlying MILP model for the task of planning over long horizons. Overall, this work bridges the gap between two seemingly distant literatures: planning heuristics for discrete spaces and decision making with learned NN models in continuous action and state spaces. Specifically, we show that data-driven NN models for planning can benefit from advances in heuristics and their impact on the efficiency of search in B&B optimization.
2 Preliminaries
We review the HD-MILP-Plan framework for optimal planning with learned NN models [17], potential heuristics [14], as well as bilevel programming [1].
2.1 Deterministic Factored Planning Problem Definition
A deterministic factored planning problem is a tuple $\Pi = \langle S, A, C, T, R, I, G \rangle$, where $S$ and $A$ are sets of state and action variables with continuous domains, $C$ is a function that returns true if the action and state variables satisfy the global constraints, $T$ denotes the stationary transition function, and $R$ is the reward function. Finally, $I$ represents the initial state constraints, and $G$ represents the goal constraints. For horizon $H$, a solution to problem $\Pi$ (i.e., a plan for $\Pi$) is a value assignment $\bar{A}^t$ to the action variables for all time steps $t \in \{1, \dots, H\}$ (and $\bar{S}^t$ to the state variables for all time steps $t \in \{1, \dots, H+1\}$) such that $T(\bar{A}^t, \bar{S}^t) = \bar{S}^{t+1}$ and $C(\bar{A}^t, \bar{S}^t)$ hold for all time steps $t \in \{1, \dots, H\}$, and the initial and goal state constraints are satisfied, i.e., $I(\bar{S}^1)$ and $G(\bar{S}^{H+1})$, where $\bar{x}^t$ denotes the value of variable $x$ at time step $t$. Similarly, an optimal solution to $\Pi$ is a plan such that the total reward $\sum_{t=1}^{H} R(\bar{A}^t, \bar{S}^{t+1})$ is maximized. For notational simplicity, we denote the set of values $\{\bar{x}^t \mid x \in X\}$ as $\bar{X}^t$ for a set of variables $X$. Given this notation and the description of the planning problem, we next describe a data-driven planning framework using learned NNs.
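For concreteness, consider a minimal one-dimensional instance of this definition (our own illustrative construction, not one of the benchmark domains), where a single state variable $s$ tracks a position that an action $a$ shifts at each step:
\[
S = \{s\}, \quad A = \{a\}, \quad C: -1 \leq \bar{a}^t \leq 1, \quad T(\bar{a}^t, \bar{s}^t) = \bar{s}^t + \bar{a}^t, \quad R(\bar{a}^t, \bar{s}^{t+1}) = -|\bar{s}^{t+1}|, \quad I: \bar{s}^1 = 10, \quad G: \bar{s}^{H+1} = 0.
\]
For horizon $H = 10$, the plan $\bar{a}^t = -1$ for all $t \in \{1, \dots, 10\}$ is optimal, since no feasible plan can approach the origin faster than one unit per step.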
2.2 Planning with Neural Network Learned Transition Models
Hybrid Deep MILP Planning (HD-MILP-Plan) [17] is a two-stage data-driven framework for learning and solving planning problems. Given samples of state transition data, the first stage of the HD-MILP-Plan process learns the transition function $\tilde{T}$ using a NN with rectified linear units (ReLUs) [12] and linear activation units. In the second stage, the learned transition function $\tilde{T}$ is used to construct the learned planning problem $\tilde{\Pi} = \langle S, A, C, \tilde{T}, R, I, G \rangle$. As shown in Figure 1, the learned transition function $\tilde{T}$ is sequentially chained over the horizon $H$, and compiled into a MILP.
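As a minimal sketch of stage one (our own illustration, not the authors' pipeline), the following fits a ReLU transition model on sampled transition data with scikit-learn; the 32-unit hidden layer mirrors the network sizes later reported in Table 1, while the data and dynamics are synthetic stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(5000, 3))    # sampled states
actions = rng.uniform(-1.0, 1.0, size=(5000, 3))   # sampled actions
next_states = np.tanh(states + 0.1 * actions)      # stand-in for the true dynamics

# Learn T~ : (state, action) -> next state with one 32-unit ReLU hidden layer
# followed by a linear output layer.
model = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                     max_iter=2000).fit(np.hstack([states, actions]), next_states)

# model.coefs_ and model.intercepts_ hold the learned weights and biases that
# the second stage compiles into MILP constraints.
```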
Next, we review the MILP compilation of HD-MILP-Plan.
2.3 Mixed-Integer Linear Programming Compilation of HD-MILP-Plan
We begin with all notation necessary for HD-MILP-Plan.
2.3.1 Parameters
- $U$ is the set of ReLUs in the neural network.
- $B$ is the set of bias units in the neural network.
- $O$ is the set of output units in the neural network.
- $w_{i,j}$ denotes the learned weight of the neural network between units $i$ and $j$.
- $A(u)$ is the set of action variables connected as inputs to unit $u$.
- $S(u)$ is the set of state variables connected as inputs to unit $u$.
- $U(u)$ is the set of ReLUs connected as inputs to unit $u$.
- $O(s)$ specifies the output unit that predicts the value of state variable $s$.
- $M$ is a large constant used in the big-M constraints.
2.3.2 Decision Variables
- $X^t_a$ is a decision variable with continuous domain denoting the value of action variable $a \in A$ at time step $t$.
- $Y^t_s$ is a decision variable with continuous domain denoting the value of state variable $s \in S$ at time step $t$.
- $P^t_u$ is a decision variable with continuous domain denoting the output of ReLU $u \in U$ at time step $t$.
- $Z^t_u = 1$ if ReLU $u \in U$ is activated at time step $t$, 0 otherwise (i.e., $Z^t_u$ is a Boolean decision variable).
2.3.3 MILP Compilation
(1)  $\max \sum_{t=1}^{H} R(\bar{X}^t, \bar{Y}^{t+1})$
subject to
(2)  $I(\bar{Y}^1)$
(3)  $C(\bar{X}^t, \bar{Y}^t)$
(4)  $G(\bar{Y}^{H+1})$
(5)  $P^t_u \geq In^t_u$ for all $u \in U$
(6)  $P^t_u \leq In^t_u + M(1 - Z^t_u)$ for all $u \in U$
(7)  $P^t_u \leq M Z^t_u$ for all $u \in U$
(8)  $P^t_u \geq 0$ for all $u \in U$
(9)  $Y^{t+1}_s = In^t_{O(s)}$ for all $s \in S$
for all time steps $t \in \{1, \dots, H\}$, except constraint (2), which is enforced only for $t = 1$, and constraint (4), which is enforced only for $t = H+1$. The expression $In^t_u$ denotes the total weighted input of unit $u$ at time step $t$, and is equivalent to $In^t_u = \sum_{a \in A(u)} w_{a,u} X^t_a + \sum_{s \in S(u)} w_{s,u} Y^t_s + \sum_{u' \in U(u)} w_{u',u} P^t_{u'} + \sum_{b \in B} w_{b,u}$.
In the above MILP, the objective function (1) maximizes the sum of rewards over a given horizon $H$. Constraints (2-4) ensure that the initial state, global, and goal state constraints are satisfied. Constraints (5-9) model the learned transition function $\tilde{T}$. Note that while constraints (5-9) are sufficient to encode the piecewise-linear activation behaviour of ReLUs, the use of big-M constraints (6-7) can hinder the overall performance of the underlying B&B solvers that rely on the linear relaxation of the MILP. Therefore, we next turn to potential heuristics, which will be used to strengthen the MILP compilation of HD-MILP-Plan.
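To make the compilation concrete, the following is a minimal, self-contained sketch (not the authors' implementation) of objective (1) and constraints (2) and (5-9) for a single-hidden-layer ReLU network, written with the open-source PuLP modeling library. The sizes, weights, bounds, and reward are hypothetical stand-ins, and the global and goal constraints (3-4) are omitted for brevity.

```python
import pulp

H = 3                       # planning horizon
n_s, n_a, n_u = 2, 1, 4     # state variables, action variables, ReLUs
M = 100.0                   # big-M constant (assumed to be a valid bound)
# Hypothetical learned weights (stand-ins for the NN trained in stage one).
W_in = [[0.5, -0.3, 1.0], [0.2, 0.8, -0.6], [-0.7, 0.1, 0.4], [0.3, -0.9, 0.2]]
b_in = [0.1, -0.2, 0.05, 0.0]
W_out = [[0.6, -0.4, 0.3, 0.1], [-0.2, 0.5, 0.7, -0.3]]

m = pulp.LpProblem("hd_milp_plan_sketch", pulp.LpMaximize)
Y = [[pulp.LpVariable(f"y_{s}_{t}", -M, M) for s in range(n_s)] for t in range(H + 1)]
X = [[pulp.LpVariable(f"x_{a}_{t}", -1, 1) for a in range(n_a)] for t in range(H)]
P = [[pulp.LpVariable(f"p_{u}_{t}", 0, M) for u in range(n_u)]   # lower bound encodes (8)
     for t in range(H)]
Z = [[pulp.LpVariable(f"z_{u}_{t}", cat="Binary") for u in range(n_u)] for t in range(H)]

for s in range(n_s):
    m += Y[0][s] == 0.0                                  # constraint (2): initial state
for t in range(H):
    inputs = Y[t] + X[t]                                 # state and action inputs to layer
    for u in range(n_u):
        in_u = pulp.lpSum(W_in[u][i] * inputs[i] for i in range(n_s + n_a)) + b_in[u]
        m += P[t][u] >= in_u                             # constraint (5)
        m += P[t][u] <= in_u + M * (1 - Z[t][u])         # constraint (6), big-M
        m += P[t][u] <= M * Z[t][u]                      # constraint (7), big-M
    for s in range(n_s):                                 # constraint (9): linear outputs
        m += Y[t + 1][s] == pulp.lpSum(W_out[s][u] * P[t][u] for u in range(n_u))

m += pulp.lpSum(Y[t + 1][0] for t in range(H))           # toy reward, objective (1)
m.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[m.status], pulp.value(m.objective))
```

Note how the quality of the linear relaxation degrades as $M$ grows, which is exactly the effect the reward potentials introduced below are designed to counteract.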
2.4 Potential Heuristics
Potential heuristics [14, 18] are a family of heuristics that map a set of features to their numerical potentials. In the context of cost-optimal classical planning, the heuristic value of a state is defined as the sum of the potentials of all features that are true in that state. Potential heuristics provide an efficient method for computing a lower bound on the cost of a given state.
In this paper, we introduce an alternative use of potential functions to tighten the linear relaxation of ReLUs in the HD-MILP-Plan compilation and improve the search efficiency of the underlying B&B solver. We define the features of the learned NN over its set of hidden units (i.e., gray circles in Figure 1), and compute the potential contribution (i.e., reward potential) of each hidden unit to the reward function at any time step $t$. These reward potentials are then used to introduce additional constraints on ReLU activations that help guide B&B search in HD-MILP-Plan. Specifically, we are interested in finding a set of reward potentials, denoted as $V_u$ and $W_u$ and representing the activation (i.e., $Z^t_u = 1$) and the deactivation (i.e., $Z^t_u = 0$) of ReLUs $u \in U$, such that the relation $R(\bar{X}^t, \bar{Y}^{t+1}) \leq \sum_{u \in U} \big( Z^t_u V_u + (1 - Z^t_u) W_u \big)$ holds for all feasible values of $\bar{X}^t$, $\bar{Y}^t$, and $\bar{Z}^t$ at any time step $t$. Once $V_u$ and $W_u$ are computed, we add this relation as a linear constraint to strengthen HD-MILP-Plan. Next, we describe the bilevel programming formulation that we use to model the problem of finding optimal reward potentials.
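As a small numerical illustration (our own numbers): suppose a network has two ReLUs $u_1$ and $u_2$, and the maximum feasible reward is $5$ when both are active, $3$ when only $u_1$ is active, $2$ when only $u_2$ is active, and $0$ when both are inactive. Then the potentials
\[
V_{u_1} = 3, \quad V_{u_2} = 2, \quad W_{u_1} = W_{u_2} = 0
\]
satisfy $R \leq Z_{u_1} V_{u_1} + (1 - Z_{u_1}) W_{u_1} + Z_{u_2} V_{u_2} + (1 - Z_{u_2}) W_{u_2}$ for every activation pattern (e.g., $5 \leq 3 + 2$ when both ReLUs are active), and can therefore be added as a valid linear upper bound on the reward.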
2.5 Bilevel Programming
Bilevel programming [1] is an optimization framework for modeling two-level asymmetrical decision-making problems with a leader and a follower problem, where the leader has complete knowledge of the follower, and the follower only observes the decisions of the leader to make an optimal decision. Therefore, the leader must incorporate the optimal decision of the follower to optimize its objective.
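In its generic form (a standard formulation following [1]; the notation here is ours), a bilevel program with a maximizing follower can be written as
\[
\min_{x \in X} F(x, y^*) \quad \text{subject to} \quad y^* \in \operatorname*{arg\,max}_{y \in Y(x)} f(x, y),
\]
where the leader chooses $x$ while anticipating the follower's best response $y^*$.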
In this work, we use bilevel programming to model the problem of finding the optimal reward potentials. In the bilevel programming description of the optimal reward potentials problem, the leader selects the optimal values of the reward potentials $V_u$ and $W_u$, and the follower selects the values of $\bar{X}$, $\bar{Y}$, and $\bar{Z}$ such that the expression $R(\bar{X}, \bar{Y}) - \sum_{u \in U} \big( Z_u V_u + (1 - Z_u) W_u \big)$ is maximized. That is, the follower tries to find values of $\bar{X}$, $\bar{Y}$, and $\bar{Z}$ that violate the relation $R(\bar{X}, \bar{Y}) \leq \sum_{u \in U} \big( Z_u V_u + (1 - Z_u) W_u \big)$ as much as possible. Therefore, the leader must select the values of the reward potentials $V_u$ and $W_u$ by incorporating the optimal decision-making model of the follower. Given the preliminaries, we now describe the reward potentials for learned NNs.
3 Reward Potentials for Learned Neural Networks
In this section, we present the optimal reward potentials problem and an efficient constraint generation framework for computing reward potentials for learned NNs.
3.1 Optimal Reward Potentials Problem
The problem of finding the optimal reward potentials over the set of ReLUs $U$ for any time step $t$ can be defined as the following bilevel optimization problem (for readability, we drop the time index $t$ in the remainder of this section, since the potentials are time-independent).
3.1.1 Leader Problem
(10)  $\min \sum_{u \in U} (V_u + W_u)$
subject to
(11)  $\sum_{u \in U} \big( \bar{Z}^*_u V_u + (1 - \bar{Z}^*_u) W_u \big) \geq R(\bar{X}^*, \bar{Y}^*)$, where $(\bar{X}^*, \bar{Y}^*, \bar{Z}^*)$ denotes an optimal solution to the follower problem
3.1.2 Follower Problem
(12)  $\max R(\bar{X}, \bar{Y}) - \sum_{u \in U} \big( Z_u V_u + (1 - Z_u) W_u \big)$
subject to constraints (3) and (5-9)
In the above bilevel problem, the leader problem selects the values of the reward potentials $V_u$ and $W_u$ such that their total sum is minimized (i.e., objective function (10)) and their total weighted sum over all ReLU activations is an upper bound on all values of the reward function (i.e., constraint (11) and the follower problem). Given the values of the reward potentials $V_u$ and $W_u$, the follower selects the values of the decision variables $\bar{X}$, $\bar{Y}$, and $\bar{Z}$ such that the difference between the value of the reward function and the sum of the reward potentials is maximized subject to constraints (3) and (5-9). Next, we show the correctness of the optimal reward potentials problem as the bilevel program described by the leader (i.e., objective function (10) and constraint (11)) and the follower (i.e., objective function (12) and constraints (3) and (5-9)) problems.
Theorem 3.1 (Correctness of the Optimal Reward Potentials Problem). Reward potentials $\bar{V}_u$ and $\bar{W}_u$ selected by an optimal solution to the above bilevel program satisfy the relation $R(\bar{X}, \bar{Y}) \leq \sum_{u \in U} \big( \bar{Z}_u \bar{V}_u + (1 - \bar{Z}_u) \bar{W}_u \big)$ for all feasible values of $\bar{X}$, $\bar{Y}$, and $\bar{Z}$.
Proof (by contradiction). Assume that $\bar{V}_u$ and $\bar{W}_u$ denote the values of the reward potentials selected by the leader problem and that they violate the relation for some values of $\bar{X}$, $\bar{Y}$, and $\bar{Z}$, implying $R(\bar{X}, \bar{Y}) - \sum_{u \in U} \big( \bar{Z}_u \bar{V}_u + (1 - \bar{Z}_u) \bar{W}_u \big) > 0$. However, the feasibility of constraint (11) implies that the optimal value of the objective function (12) must be non-positive, so a positive violation would mean the follower problem is not solved to optimality, which yields the desired contradiction. ∎
3.2 Constraint Generation for Computing Reward Potentials
The optimal reward potentials problem can be solved efficiently through the following constraint generation framework, which decomposes the problem into a master problem and a subproblem. The master problem finds the values of the ReLU potentials $V_u$ and $W_u$. The subproblem finds the values of the ReLU variables $\bar{Z}$ that violate constraint (11) the most for given $\bar{V}$ and $\bar{W}$, and also finds the maximum value $\bar{R}$ of the reward function for the given $\bar{Z}$. Intuitively, the master problem selects candidate values of the ReLU potentials $\bar{V}$ and $\bar{W}$, which are checked by the subproblem for the validity of the relation $R(\bar{X}, \bar{Y}) \leq \sum_{u \in U} \big( Z_u \bar{V}_u + (1 - Z_u) \bar{W}_u \big)$ for all feasible values of $\bar{X}$, $\bar{Y}$, and $\bar{Z}$ at any time step. If a violation is found, a linear constraint corresponding to the given $\bar{Z}$ and $\bar{R}$ is added back to the master problem, and the procedure is repeated until no violation is found by the subproblem.
3.2.1 Subproblem $SP(\bar{V}, \bar{W})$
For a complete value assignment $\bar{V}, \bar{W}$ to the ReLU potential variables, the subproblem optimizes the violation (i.e., objective function (12)) with respect to constraints (3) and (5-9) as follows.
(13)  $\max R(\bar{X}, \bar{Y}) - \sum_{u \in U} \big( Z_u \bar{V}_u + (1 - Z_u) \bar{W}_u \big)$
subject to constraints (3) and (5-9)
We denote the optimal values of the ReLU variables $Z_u$ found by solving the subproblem as $\bar{Z}_u$, and the corresponding value of the reward function as $\bar{R}$. Further, we refer to the subproblem as $SP(\bar{V}, \bar{W})$.
3.2.2 Master Problem $MP(K)$
Given the set $K$ of complete value assignments $\bar{Z}^k$ to the ReLU variables and the optimal objective values $\bar{R}^k$ for all $k \in K$, the master problem optimizes the regularized¹ sum of reward potentials (i.e., regularized objective function (10)) with respect to the modified version of constraint (11) as follows.
(14)  $\min \sum_{u \in U} (V_u + W_u) + c \sum_{u \in U} (V_u^2 + W_u^2)$
subject to
(15)  $\sum_{u \in U} \big( \bar{Z}^k_u V_u + (1 - \bar{Z}^k_u) W_u \big) \geq \bar{R}^k$ for all $k \in K$
We denote the optimal values of the ReLU potentials $V_u$ and $W_u$ found by solving the master problem as $\bar{V}_u$ and $\bar{W}_u$, respectively. Further, we refer to the master problem as $MP(K)$.
¹ The squared terms, weighted by the regularizer constant $c$, penalize arbitrarily large values of potentials to avoid numerical issues. A similar numerical issue has been found in the computation of potential heuristics for cost-optimal classical planning problems with dead-ends [18].
3.2.3 Reward Potentials Algorithm
Given the definitions of the master problem $MP(K)$ and the subproblem $SP(\bar{V}, \bar{W})$, the constraint generation algorithm for computing optimal reward potentials is outlined as follows.
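Since the algorithm listing itself is not reproduced here, the following Python sketch captures the loop described below; `solve_master` and `solve_subproblem` are hypothetical stand-ins for solver calls to $MP(K)$ and $SP(\bar{V}, \bar{W})$:

```python
def reward_potentials(solve_master, solve_subproblem, eps=1e-6):
    """Constraint generation for reward potentials (a sketch of Algorithm 1)."""
    K = []  # generated cuts: (activation pattern Z_bar, max reward R_bar) pairs
    while True:
        V, W = solve_master(K)          # line 3: solve MP(K) for candidate potentials
        # lines 4-5: solve SP(V, W); one solve yields the most violating pattern,
        # the maximum reward for that pattern, and the violation value (12).
        Z_bar, R_bar, violation = solve_subproblem(V, W)
        if violation <= eps:            # constraint (11) holds for all patterns
            return V, W                 # optimal reward potentials
        K.append((Z_bar, R_bar))        # lines 6-7: add a constraint (15) cut to MP
```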
Given that constraints (3) and (5-9) are feasible, Algorithm 1 iteratively computes reward potentials $\bar{V}$ and $\bar{W}$ (i.e., line 3), first checks whether there exists an activation pattern, that is, a complete value assignment to the ReLU variables $\bar{Z}$, that violates constraint (11) (i.e., lines 4 and 5), and then returns the optimal reward value $\bar{R}$ for the violating activation pattern. Given the optimal reward value for the violating activation pattern, constraint (15) is updated (i.e., lines 6-7). Since there is a finite number of activation patterns (i.e., at most $2^{|U|}$) and solving $SP(\bar{V}, \bar{W})$ gives the maximum value of the reward function for each pattern $\bar{Z}$, the Reward Potentials Algorithm 1 terminates in at most $2^{|U|}$ iterations with an optimal reward potential for the learned NN.
3.2.4 Increasing the Granularity of the Reward Potentials Algorithm
The feature space of Algorithm 1 can be enhanced to include information on each ReLU's input and/or output. Instead of computing reward potentials only for the activation and deactivation of each ReLU $u \in U$, we (i) introduce an interval parameter $N$ to split the output range of each ReLU into $N$ equal-size intervals, (ii) introduce auxiliary Boolean decision variables $Z^t_{u,i}$ to represent the activation interval of ReLU $u$ such that $Z^t_{u,i} = 1$ if and only if the output of ReLU $u$ is within interval $i \in \{1, \dots, N\}$, and $Z^t_{u,i} = 0$ otherwise, and (iii) compute reward potentials $V_{u,i}$ for each activation interval $i$ of ReLU $u$ and $W_u$ for its deactivation.
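For example (with our own numbers), taking $N = 2$ for a ReLU $u$ whose output is known to lie in $[0, 4]$ yields the two activation intervals $[0, 2]$ and $[2, 4]$:
\[
Z^t_{u,1} = 1 \Rightarrow P^t_u \in [0, 2], \qquad Z^t_{u,2} = 1 \Rightarrow P^t_u \in [2, 4], \qquad Z^t_u = 0 \Rightarrow P^t_u = 0,
\]
with separate potentials $V_{u,1}$ and $V_{u,2}$ for the two intervals and $W_u$ for deactivation, so the bound on the reward can discriminate between weakly and strongly activated units.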
3.3 Strengthening HD-MILP-Plan
Given optimal reward potentials $\bar{V}_{u,i}$ and $\bar{W}_u$, the MILP compilation of HD-MILP-Plan is strengthened through the addition of the following constraints:
(16)  $R(\bar{X}^t, \bar{Y}^{t+1}) \leq \sum_{u \in U} \big( \sum_{i=1}^{N} \bar{V}_{u,i} Z^t_{u,i} + \bar{W}_u (1 - Z^t_u) \big)$
(17)  $\sum_{i=1}^{N} Z^t_{u,i} = Z^t_u$ for all $u \in U$
(18)  $\sum_{i=1}^{N} \frac{(i-1)\,\bar{P}_u}{N} Z^t_{u,i} \leq P^t_u \leq \sum_{i=1}^{N} \frac{i\,\bar{P}_u}{N} Z^t_{u,i}$ for all $u \in U$
for all time steps $t \in \{1, \dots, H\}$, where $\bar{P}_u$ denotes the upper bound obtained from performing forward reachability on the output of each ReLU $u$ in the learned NN (for $N = 1$, constraints (17-18) reduce to the base activation/deactivation case). Next, we present our experimental results to demonstrate the efficiency and the utility of computing reward potentials and strengthening HD-MILP-Plan.
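Continuing the earlier PuLP sketch (again an illustration rather than the authors' code), the potential-bound constraint (16) for the base activation/deactivation case can be added as follows; the potential values `V` and `W` are hypothetical stand-ins for the output of Algorithm 1:

```python
# Hypothetical precomputed reward potentials, one pair per ReLU.
V = [1.5, 0.7, 2.0, 0.0]   # activation potentials
W = [0.3, 0.0, 0.1, 0.4]   # deactivation potentials

for t in range(H):
    reward_t = Y[t + 1][0]                        # toy per-step reward from objective (1)
    m += reward_t <= pulp.lpSum(                  # constraint (16): valid upper bound
        V[u] * Z[t][u] + W[u] * (1 - Z[t][u]) for u in range(n_u))
```

Because the added constraints hold for every feasible assignment, they never cut off an optimal plan; they only tighten the linear relaxation that guides B&B search.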
4 Experimental Results
In this section, we present computational results on (i) the convergence of Algorithm 1, and (ii) the overall strengthening of HD-MILP-Plan with the addition of constraints (16-18) for the task of planning over long horizons. First, we present results on the overall efficiency of Algorithm 1 and the strengthening of HD-MILP-Plan over multiple learned planning instances. Then, we focus on the most computationally expensive domain identified by our experiments to further investigate the convergence behaviour of Algorithm 1 and the overall strengthening of HD-MILP-Plan as a function of time.
4.1 Experimental Setup
The experiments were run on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB of memory. All instances from the HD-MILP-Plan paper [17], namely Navigation, Reservoir Control, and HVAC, were selected. Both the domain instance sizes and their respective learned NN sizes are detailed in Table 1. The range bounds on the action variables for the Navigation domains were constrained to a fixed interval. The CPLEX 12.9.0 solver [7] was used to optimize both Algorithm 1 and HD-MILP-Plan, with a total time limit of 6000 seconds per domain instance. In our experiments, we show results for the values of the interval parameter $N \in \{2, 3\}$.² Finally, in the master problem, we set the regularizer constant $c$ in the objective function (14) as a function of the large constant $M$ used in the big-M constraints of HD-MILP-Plan (i.e., constraints (6-7)).
² The preliminary experimental results for other values of the interval parameter $N$ have not shown significant improvements over the base encoding of HD-MILP-Plan.
Table 1. Domain instances, their learned NN structures (input:hidden:output units), and planning horizons.
Domain Instance  Network Structure  Horizon
Navigation (8-by-8 maze)  4:32:32:2  100
Navigation (10-by-10 maze)  4:32:32:2  100
Reservoir Control (3 reservoirs)  6:32:3  500
Reservoir Control (4 reservoirs)  8:32:4  500
HVAC (3 rooms)  6:32:3  100
HVAC (6 rooms)  12:32:6  100
4.2 Overall Results
In this section, we present the experimental results on (i) the computation of the optimal reward potentials using Algorithm 1, and (ii) the performance of HD-MILP-Plan with the addition of constraints (16-18) over multiple learned planning instances with long horizons. Table 2 summarizes the computational results and highlights the best-performing HD-MILP-Plan settings for each learned planning instance.
Table 2. Computational results per domain setting. A dash (–) marks entries that are not applicable or where no incumbent plan was found; the Navigation and HVAC bounds are negative, as both domains maximize cost-based (negative) rewards.
Domain Setting  Alg. 1 (s)  Cumul. (s)  Primal  Dual  Open  Closed
Nav,8,100,Base  –  6000  –  -261.4408  16536  27622
Nav,8,100,N=2  345  6000  –  -267.1878  6268  15214
Nav,8,100,N=3  1150  6000  –  -267.056  6189  12225
Nav,10,100,Base  –  6000  –  -340.5974  17968  35176
Nav,10,100,N=2  800  6000  –  -340.6856  14435  27651
Nav,10,100,N=3  1700  6000  –  -339.8124  2593  7406
HVAC,3,100,Base  –  260.21  Opt. found  Opt. proved  0  289529
HVAC,3,100,N=2  7  88.21  Opt. found  Opt. proved  0  2501
HVAC,3,100,N=3  9  194.44  Opt. found  Opt. proved  0  10891
HVAC,6,100,Base  –  6000  -1214369.086  -1213152.304  618687  648207
HVAC,6,100,N=2  8  6000  -1214365.427  -1213199.787  554158  567412
HVAC,6,100,N=3  10  6000  -1214364.704  -1213025.189  1011348  1021637
Res,3,500,Base  –  33.01  Opt. found  Opt. proved  0  1
Res,3,500,N=2  1  99.81  Opt. found  Opt. proved  0  714
Res,3,500,N=3  2  90.27  Opt. found  Opt. proved  0  674
Res,4,500,Base  –  300.71  Opt. found  Opt. proved  0  1236
Res,4,500,N=2  7  109.66  Opt. found  Opt. proved  0  1924
Res,4,500,N=3  6  232.19  Opt. found  Opt. proved  0  1294
The first column of Table 2 identifies the domain setting of each row. The second column (i.e., Alg. 1) denotes the runtime of Algorithm 1 in seconds, and the third column (i.e., Cumul.) denotes the cumulative runtime of Algorithm 1 and HD-MILP-Plan in seconds. The remaining columns provide information on the performance of HD-MILP-Plan: the fourth column (i.e., Primal) denotes the value of the incumbent plan found by HD-MILP-Plan, the fifth column (i.e., Dual) denotes the value of the duality bound found by HD-MILP-Plan, and the sixth and seventh columns (i.e., Open and Closed) denote the number of open and closed nodes in the B&B tree, respectively. The bolded values indicate the best-performing HD-MILP-Plan settings for each learned planning instance, where the performance of each setting is evaluated first on runtime (i.e., the Cumul. column), followed by the quality of the incumbent plan (i.e., the Primal column) and the duality bound (i.e., the Dual column).
In five out of six instances, we observe that a strengthened HD-MILP-Plan setting performed the best. The pairwise comparison of the base HD-MILP-Plan and the strengthened HD-MILP-Plan shows that in almost all instances, the strengthened model performed better than the base model. The only instance in which the base model significantly outperformed the other two settings was the Reservoir Control domain with three reservoirs, where the B&B solver was able to find an optimal plan at the root node. Overall, we found that especially in the instances where optimality was hard to prove within the runtime limit of 6000 seconds (i.e., all Navigation instances and the HVAC domain with 6 rooms), the strengthened HD-MILP-Plan generally explored significantly fewer nodes while obtaining either higher-quality incumbent plans or lower (i.e., tighter) dual bounds. We observe that Algorithm 1 terminated with optimal reward potentials in less than 10 seconds in both the Reservoir Control and HVAC domains, and took as much as 1700 seconds in the Navigation domain, highlighting the effect of NN size and complexity (i.e., detailed in Table 1) on the runtime of Algorithm 1. As a result, we next focus on the most computationally expensive domain identified by our experiments, namely Navigation, to get a better understanding of the convergence behaviour of Algorithm 1 and the overall efficiency of HD-MILP-Plan as a function of time.
4.3 Detailed Convergence Results on Navigation Domain
In this section, we inspect the convergence of Algorithm 1 in the Navigation domain for computing an optimal reward potential for the learned NNs.
Figure 2 visualizes the violation of constraint (11) as a function of time over the computation of the optimal reward potentials using the Reward Potentials Algorithm 1 for the learned NNs of both the Navigation 8-by-8 (i.e., top) and Navigation 10-by-10 (i.e., bottom) planning instances. In both cases, we observe that the violation of constraint (11) decreases exponentially as a function of time, showcasing a long-tail runtime behaviour, and the algorithm terminates with optimal reward potentials.
4.4 Detailed Strengthening Results on Navigation Domain
Next, we inspect the overall strengthening of HD-MILP-Plan with respect to its underlying linear relaxation and search efficiency as a result of constraints (16-18), for the task of planning over long horizons in the Navigation domain.
Figures 3 and 4 visualize the overall effect of incorporating constraints (16-18) into HD-MILP-Plan as a function of time for the Navigation domain with (a) 8-by-8 and (b) 10-by-10 maze sizes. In both Figures 3 and 4, the linear relaxation (i.e., top), the number of closed nodes (i.e., middle), and the number of open nodes (i.e., bottom) are displayed as a function of time. The inspection of both Figures 3 and 4 shows that once the reward potentials are computed, the addition of constraints (16-18) allows HD-MILP-Plan to obtain a tighter bound by exploring significantly fewer nodes. In the 8-by-8 maze instance, we observe that HD-MILP-Plan with constraints (16-18) outperforms the base HD-MILP-Plan from 1700 and 3300 seconds onwards with interval parameters $N = 2$ and $N = 3$, respectively. In the 10-by-10 maze instance, we observe that HD-MILP-Plan with constraints (16-18) obtains a tighter bound than the base HD-MILP-Plan from 3750 seconds onwards, and almost reaches the same bound by the time limit (i.e., 6000 seconds), with interval parameters $N = 2$ and $N = 3$, respectively.
The inspection of the top subfigures in Figures 3 and 4 shows that increasing the value of the interval parameter $N$ increases the computation time of Algorithm 1, but can also increase the search efficiency of the underlying B&B solver by improving its exploration and pruning capabilities, as demonstrated by the middle and bottom subfigures in Figures 3 and 4. Overall, from both instances, we conclude that HD-MILP-Plan with constraints (16-18) obtains a linear relaxation that is at least as good as that of the base HD-MILP-Plan while exploring significantly fewer nodes in the B&B search tree.
5 Related Work
In this paper, we have focused on the important problem of improving the efficiency of B&B solvers for optimal planning with learned NN transition models in continuous action and state spaces. In parallel to this work, planning and decision making in discrete action and state spaces [11, 16, 15], verification of learned NNs [8, 5, 6, 13], robustness evaluation of learned NNs [19], and defenses against adversarial attacks on learned NNs [9] have been studied, with the focus of solving very similar decision-making problems. For example, the verification problem solved by Reluplex [8]³ is very similar to the planning problem solved by HD-MILP-Plan [17] without the objective function and the horizon $H$. Interestingly, the verification problem can also be modeled as an optimization problem [2] and could potentially benefit from the findings presented in this paper. For future work, we plan to explore how our findings translate to solving other important tasks for learned NNs.
³ Reluplex [8] is an SMT-based verification tool for learned NNs.
6 Conclusion
In this paper, we have focused on the problem of improving the linear relaxation and the search efficiency of MILP models for decision making with learned NNs. To tackle this problem, we used bilevel programming to correctly model the optimal reward potentials problem. We then introduced a novel finite-time constraint generation algorithm for computing the potential contribution of each hidden unit to the reward function of the planning problem. Given the precomputed values of the reward potentials, we introduced constraints that tighten the bound on the reward function of the planning problem. Experimentally, we have shown that our constraint generation algorithm efficiently computes reward potentials for learned NNs, and that the computational overhead is justified by the overall strengthening of the underlying MILP model, as demonstrated on the task of planning over long horizons. With this paper, we have shown the potential of bridging the gap between two seemingly distant literatures: heuristic search and decision making with learned NN models in continuous action and state spaces.
References
[1] Bard, J.: Practical Bilevel Optimization: Algorithms and Applications. Springer US (2000). https://doi.org/10.1007/978-1-4757-2836-1
[2] Bunel, R., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: A unified view of piecewise linear neural network verification (2017)
[3] Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493–2537 (2011)
[4] Deng, L., Hinton, G.E., Kingsbury, B.: New types of deep neural network learning for speech recognition and related applications: An overview. In: IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 8599–8603 (2013)
[5] Ehlers, R.: Formal verification of piecewise linear feedforward neural networks. In: D'Souza, D., Narayan Kumar, K. (eds.) Automated Technology for Verification and Analysis. pp. 269–286. Springer International Publishing, Cham (2017)
[6] Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) Computer Aided Verification. pp. 3–29. Springer International Publishing, Cham (2017)
[7] IBM: IBM ILOG CPLEX Optimization Studio CPLEX User's Manual (2019)
[8] Katz, G., Barrett, C., Dill, D., Julian, K., Kochenderfer, M.: Reluplex: An efficient SMT solver for verifying deep neural networks. In: Twenty-Ninth International Conference on Computer Aided Verification (CAV) (2017)
[9] Kolter, J.Z., Wong, E.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: Thirty-First Conference on Neural Information Processing Systems (2017)
[10] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Twenty-Fifth Neural Information Processing Systems. pp. 1097–1105 (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257
[11] Lombardi, M., Gualandi, S.: A Lagrangian propagator for artificial neural networks in constraint programming. Constraints 21, 435–462 (2016). https://doi.org/10.1007/s10601-015-9234-6
[12] Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Twenty-Seventh International Conference on Machine Learning. pp. 807–814 (2010). http://www.icml2010.org/papers/432.pdf
[13] Narodytska, N., Kasiviswanathan, S., Ryzhyk, L., Sagiv, M., Walsh, T.: Verifying properties of binarized deep neural networks. In: Thirty-Second AAAI Conference on Artificial Intelligence. pp. 6615–6624 (2018)
[14] Pommerening, F., Helmert, M., Röger, G., Seipp, J.: From non-negative to general operator cost partitioning. In: Twenty-Ninth AAAI Conference on Artificial Intelligence. pp. 3335–3341 (2015)
[15] Say, B., Sanner, S.: Compact and efficient encodings for planning in factored state and action spaces with learned binarized neural network transition models (2018)
[16] Say, B., Sanner, S.: Planning in factored state and action spaces with learned binarized neural network transition models. In: Twenty-Seventh International Joint Conference on Artificial Intelligence. pp. 4815–4821 (2018). https://doi.org/10.24963/ijcai.2018/669
[17] Say, B., Wu, G., Zhou, Y.Q., Sanner, S.: Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. In: Twenty-Sixth International Joint Conference on Artificial Intelligence. pp. 750–756 (2017). https://doi.org/10.24963/ijcai.2017/104
[18] Seipp, J., Pommerening, F., Helmert, M., Röger, G.: New optimization functions for potential heuristics. In: Twenty-Fifth International Conference on Automated Planning and Scheduling. pp. 193–201 (2015)
[19] Tjeng, V., Xiao, K., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming. In: Seventh International Conference on Learning Representations (2019)