# Knowledge engineering mixed-integer linear programming: constraint typology

In this paper, we investigate the constraint typology of mixed-integer linear programming (MILP) formulations. MILP is a commonly used mathematical programming technique for modelling and solving real-life scheduling, routing, planning, resource allocation, and timetabling optimization problems, providing optimized business solutions for industry sectors such as manufacturing, agriculture, defence, healthcare, medicine, energy, finance, and transportation. Despite the numerous real-life Combinatorial Optimization Problems (COPs) found and solved, and the millions yet to be discovered and formulated, the number of types of constraints (the building blocks of an MILP) is much smaller. In search of a suitable machine-readable knowledge representation for MILPs, we propose an optimization modelling tree, built upon an MILP ontology, that can guide automated systems in eliciting an MILP model from end-users for their combinatorial business optimization problems.


## 1 Introduction

Combinatorial Optimization Problems (COPs) arise in many real-life applications such as scheduling [11, 16], planning [1, 4], resource allocation [7, 10], routing [9, 15], and timetabling [5, 18]. See [3, 6, 17] for further examples of mathematical programming applied to real-life business COPs, where millions or even billions of dollars were saved.

There are several commonly employed solution approaches for COPs. The two main branches are exact methods and heuristic methods. Mathematical programming is an exact method that can provide proven optimal solutions, and even when it fails to produce an optimal solution within a predetermined time and memory limit, it can still provide a proven optimality gap. Heuristic approaches (such as trial-and-error, simulation, learning, meta-heuristics, or custom-made problem-specific heuristics), on the other hand, do not come with a solution guarantee. With properly tuned parameters, meta-heuristics or learning methods may be able to provide reasonably good solutions within a much shorter time. Exact algorithms will always be preferred in applications where a proven optimal solution matters. For instance, in Kidney Exchange Optimization, an increment of one unit in the objective function means that one more kidney transplant can be carried out, which undoubtedly has a significant impact on the health outcome of a patient with kidney failure.

When solving a COP, mathematical programming-based exact algorithms essentially implement an exhaustive tree search with smart pruning strategies. In the case of the Integer Programming (IP) family of methods, the theoretical basis is algebra, whereas in the case of Constraint Programming (CP), the theoretical basis is logical inference. CP and IP each have their strengths and weaknesses. The IP family of methods includes Pure Integer Programming, where all decision variables are integers; Binary Integer Programming, where all decision variables are binary; and Mixed-Integer Linear Programming (MILP), where some decision variables are continuous and the rest are binary or general integers. MILP can also model some nonlinear terms (e.g., quadratic, bilinear, and piecewise linear terms), and it is therefore a very practical technique for modelling and solving real-life COPs. This is evidenced by the fact that, in the history of the Franz Edelman Awards, 20% of the finalists applied the IP family of methods [6], and that the top two algorithms in the Second International Nurse Rostering Competition are MILP-based methods [2].

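The formal mathematical specification of an MILP problem is given as follows: $\min \{ c^\top x + d^\top y : Ax + By \le b,\ x \in \mathbb{Z}^n_+,\ y \in \mathbb{R}^p_+ \}$, with $x$ and $y$ the decision variables; $c$ and $d$ the cost coefficients, for $c$ an $n$-vector of $\mathbb{R}$ and $d$ a $p$-vector of $\mathbb{R}$; $A$ and $B$ the constraint coefficient matrices; and $b$ an $m$-vector of $\mathbb{R}$. For a thorough exposition of MILP, see, for example, [12, 14].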

An MILP comprises four main components: a set of decision variables, an objective function that is a linear combination of the decision variables, a set of constraints (each containing a linear combination of decision variables), and the index sets that enumerate the decision variables and constraints. We proposed an MILP ontology in [13]; see Figure 1 below.

Figure 1: The proposed ontology of mixed-integer linear programming models for Combinatorial Optimization Problems. Note: The visualization is courtesy of WebVOWL [8].
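To make the four components concrete, the following is a minimal Python sketch of how they might be held in a data structure and checked against a candidate solution. The class names, the toy instance (two items with continuous quantities `x_a`, `x_b` and binary switches `y_a`, `y_b`), and all coefficients are assumptions made for illustration; they are not taken from the ontology. The sketch only evaluates a given point and does not solve the model.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    kind: str  # "binary", "integer" or "continuous"

@dataclass
class Constraint:
    name: str
    coeffs: dict  # variable name -> coefficient
    sense: str    # "<=", ">=" or "=="
    rhs: float

@dataclass
class Model:
    index_sets: dict   # set name -> list of index values
    variables: list    # list of Variable
    objective: dict    # variable name -> cost coefficient (to be minimised)
    constraints: list  # list of Constraint

def linear(coeffs, point):
    """Evaluate a linear combination of decision variables at a point."""
    return sum(c * point[v] for v, c in coeffs.items())

def holds(con, point, tol=1e-9):
    """Check one constraint at a point, with a small numerical tolerance."""
    value = linear(con.coeffs, point)
    if con.sense == "<=":
        return value <= con.rhs + tol
    if con.sense == ">=":
        return value >= con.rhs - tol
    return abs(value - con.rhs) <= tol

def is_feasible(model, point):
    """Check integrality of each variable, then every constraint."""
    for var in model.variables:
        value = point[var.name]
        if var.kind == "binary" and value not in (0, 1):
            return False
        if var.kind == "integer" and value != int(value):
            return False
    return all(holds(con, point) for con in model.constraints)

# Hypothetical toy instance built over one index set.
items = ["a", "b"]
model = Model(
    index_sets={"items": items},
    variables=[Variable(f"x_{i}", "continuous") for i in items]
    + [Variable(f"y_{i}", "binary") for i in items],
    objective={"x_a": 2, "x_b": 3, "y_a": 10, "y_b": 4},
    constraints=[Constraint("demand", {"x_a": 1, "x_b": 1}, ">=", 5)]
    + [Constraint(f"link_{i}", {f"x_{i}": 1, f"y_{i}": -6}, "<=", 0) for i in items],
)

plan = {"x_a": 0, "x_b": 5, "y_a": 0, "y_b": 1}
print(is_feasible(model, plan), linear(model.objective, plan))
```

The index set drives the generation of both variables and constraints, which is the role the paragraph above assigns to it.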

## 2 Constraint types

Here, we ask a fundamental question: is there a finite number of MILP constraint types, and if so, how many are there? Classic families of COPs such as routing, scheduling, and planning have dozens or hundreds of variations arising from real-life applications. Every COP is different. Numerous MILPs for real-life COPs have been found and solved, and many more are yet to be discovered and formulated. We are not able to examine every single MILP ever developed; however, we performed two simple studies to obtain insights: (1) we examined all constraints used in the production planning problems listed in H. Paul Williams' textbook "Model Building in Mathematical Programming" [17]; (2) we examined the constraints used in a number of publications.

We considered the production planning examples in H. Paul Williams' book "Model Building in Mathematical Programming" [17], and observed that all the production planning problems presented therein are covered by the following: constraints that represent limits (bounds), those that blend raw materials into products, those that balance two quantities, those that govern logic conditions, and the classic binary integer programming constraints such as set partitioning, set packing, and set covering and their weighted variations. In Figure 2, we present a table listing the meaning of the constraints used and the section number in the textbook where the examples are discussed. These constraints are reasonably easy to interpret in the sense that the mathematical specification of each constraint is either very close to, or can be directly translated from, a natural language (NL) description given by a naive end-user (a domain-expert end-user who is not trained to develop MILP models). We call these explicit constraints.

Figure 2: A table of constraints and the sections in which they are used in the H. Paul Williams book [17] for production planning problems.

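There are many ways to classify explicit constraints into different types. For instance, if we classify MILP constraints by their mathematical form, they can only take one of the following forms: $a^\top x \le b$, $a^\top x \ge b$, and $a^\top x = b$ (here, $a$ is a vector of coefficients, $x$ the vector of decision variables, and $b$ a scalar). For simplicity, we will write $a^\top x$ as $ax$ for the rest of the paper.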
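As a brief aside, the three forms are interchangeable up to rewriting, which is why a single system of $\le$ inequalities suffices in a compact matrix statement of an MILP. A sketch, using the shorthand $ax$ for a linear combination of decision variables and $b$ for a scalar right-hand side:

$$
ax \ge b \iff -ax \le -b, \qquad ax = b \iff \big( ax \le b \ \text{and} \ -ax \le -b \big).
$$

Keeping all three forms is still convenient for a typology, since the sense of a constraint usually mirrors how an end-user phrases the requirement (at most, at least, exactly).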

Type I: Bound constraints (demand and supply). Resource limit (supply) constraints and demand constraints are very commonly used in MILPs, particularly in production planning-type problems. For resource limit (supply) constraints, the upper bound on the supply can be fixed (e.g., a Knapsack constraint) or can depend on the value of a decision variable. The same holds for the lower bound on a demand that needs to be satisfied.

Type II: Balancing constraints. Equality constraints have many variations in their usage: to balance (equate) input and output quantities, to balance the flow or quantities over two consecutive time periods, to set initial conditions, to assign values, and so on.

Set packing/partitioning/covering constraints. The set packing/partitioning/covering constraints are subtypes of Type I and Type II constraints, typically used for assignment or allocation. They allow us to model the choice of at most/exactly/at least one out of many. The weighted versions of the set packing/partitioning/covering constraints allow us to model the choice of $k$ out of many.
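The correspondence between the sense of a 0/1 constraint and its packing/partitioning/covering subtype can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that every coefficient is 0 or 1 and that the right-hand side is a positive integer `rhs` (with `rhs` greater than 1 read as a choice of `rhs` out of many).

```python
KINDS = {"<=": "set packing", "==": "set partitioning", ">=": "set covering"}

def classify_selection(coeffs, sense, rhs):
    """Name the constraint sum(coeffs[j] * x[j]) <sense> rhs over binary x.

    Returns None when the row is not a 0/1 selection constraint.
    """
    if sense not in KINDS or rhs < 1 or any(c not in (0, 1) for c in coeffs):
        return None
    kind = KINDS[sense]
    return kind if rhs == 1 else f"{kind}, {rhs} out of many"

def satisfied(coeffs, sense, rhs, x):
    """Check the constraint at a binary point x."""
    chosen = sum(c * v for c, v in zip(coeffs, x))
    return {"<=": chosen <= rhs, "==": chosen == rhs, ">=": chosen >= rhs}[sense]

print(classify_selection([1, 1, 1], "<=", 1))      # at most one out of three
print(classify_selection([1, 1, 0, 1], "==", 1))   # exactly one out of three
print(classify_selection([1, 1, 1, 1], ">=", 2))   # at least two out of four
print(satisfied([1, 1, 1], "<=", 1, [0, 1, 0]))
```

The same left-hand side therefore yields three different requirements depending only on the sense of the constraint.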

Logic constraints. The three main subtypes of logic constraints are the Big-M, If-then, and Either-or constraints, each of which has a number of varieties. (Some of these varieties were discussed in ).
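For illustration, textbook forms of the three subtypes can be written as follows. The symbols are assumptions made for this sketch and are not taken from the OMT: $x \ge 0$ is a continuous quantity, $y$, $y_1$, $y_2$, and $z$ are binary variables, and $M$ is a constant large enough to make a constraint redundant when it is switched off.

$$
\begin{aligned}
&\text{Big-M (linking):} && x \le M y && \text{($x$ can be positive only if $y = 1$)} \\
&\text{If-then:} && y_1 \le y_2 && \text{(if $y_1 = 1$, then $y_2 = 1$)} \\
&\text{Either-or:} && a_1 x \le b_1 + M z, \quad a_2 x \le b_2 + M (1 - z) && \text{(at least one of $a_1 x \le b_1$ and $a_2 x \le b_2$ holds)}
\end{aligned}
$$

In practice, $M$ is chosen as small as validity allows, since a loose $M$ weakens the linear programming relaxation.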

So, the next question is: what about real-life COPs other than the ones in [17], and how do we represent the knowledge in order to enable automatic mapping from business requirements to the mathematical specification (or to that of a general-purpose modelling language such as OPL or MiniZinc)?

## 3 The optimization modelling tree

We designed an Optimization Modelling Tree (OMT) and examined a number of COPs in published journal articles to ascertain whether the OMT is adequate in the sense that by traversing through the tree all elements for the MILPs can be found. The focus of this exercise is to evaluate whether the constraint types and subtypes in the OMT are enough to represent these example COPs.

A chemical production scheduling example. A chemical production scheduling MILP model is presented in [16]. The problem considers a given planning horizon, partitioned into a number of time slots. The decisions to be made are whether a unit (a machine or a piece of equipment) should start processing a task at a particular time slot, what the batch size should be, and what the inventory level of each material should be at each time slot. A basic MILP is presented with four constraint sets. Constraint Set (1) is a set packing constraint ensuring that each unit starts at most one task at a time (Number 11 on the OMT). Constraint Set (2) is a combined logic (Big-M) and upper/lower bound constraint: if a unit starts processing a task at a given time, then the capacity limits on the batch size must be observed; otherwise, the task will not be processed on this machine at this time (Numbers 3 and 9 on the OMT). Constraint Set (3) as presented in the paper should have been two constraints. The first equates the inventory (storage) of a material at a time slot to the inventory at the previous time slot plus the new production and minus the consumption (Number 14 on the OMT). The second is an upper bound on the storage limit (Number 7 on the OMT). These constraints are reasonably straightforward for an end-user to describe, and all requirements can be found in the constraint types on the OMT.
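A generic sketch of how the two parts of such an inventory constraint set could be separated is given below. The notation is hypothetical and is not that of the cited paper: $s_{m,t}$ is the inventory of material $m$ at time slot $t$, $p_{m,t}$ and $c_{m,t}$ are the amounts produced and consumed in slot $t$, and $S_m$ is the storage limit.

$$
s_{m,t} = s_{m,t-1} + p_{m,t} - c_{m,t} \quad \text{(balancing)}, \qquad s_{m,t} \le S_m \quad \text{(upper bound)}.
$$

In an actual model, $p_{m,t}$ and $c_{m,t}$ would themselves be linear expressions in the batch-size variables; the point here is only that the equality and the bound belong to two different constraint types.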

A supply chain production planning example. An MILP model for mid-term production planning for high-tech low-volume supply chains is presented in [4]. The decision variables are mostly general integer variables, and the six constraint sets are as follows. Constraint Set (1) balances two quantities, specifically quantities between two consecutive time slots (Number 12 on the OMT). Constraint Sets (2) and (5) are equality constraints for assigning quantities (Number 13 on the OMT). Constraint Sets (3) and (6) are variable upper bounding and lower bounding constraints (Numbers 2 and 8 on the OMT), whereas Constraint Set (4) has the logic condition that the upper and lower bounds on the decision variables for quantities apply only when the associated binary decision variable is non-zero (Numbers 3 and 9 on the OMT).

A university course timetabling problem example. A university course timetabling problem was modelled as an MILP in [5]. The decision variables are binary. One set of the decision variables represents yes/no answers to whether a particular section of a course should be assigned to a particular professor in a particular time slot. Translating them from NL to formal specifications should be reasonably straightforward. Constraint Sets (2) and (3) are Set Partitioning constraints (for choice of exactly one out of many, Number 17 on the OMT), and Constraint Sets (4) to (9) are Set Packing constraints (for choice of at most one out of many, Number 11 on the OMT). Constraint Sets (11)--(13) and (15) are general if-then constraints stating that if $x$ occurs, then both $y$ and $z$ must occur. The constraints are in the form of $2x \le y + z$ (Number 24 on the OMT), although $x \le y$ and $x \le z$ are better constraints to use. This brings up an important aspect of knowledge engineering MILPs: multiple valid MILP constraints can exist for the same requirement, and some are stronger for computational use than others. The OMT in its current state has some limitations, as we can see from the next example.
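The remark on constraint strength can be checked numerically. The sketch below assumes binary variables `x`, `y`, `z` (names chosen for illustration) and compares an aggregated if-then form, `2x <= y + z`, with the disaggregated pair `x <= y` and `x <= z`. Both admit exactly the same binary points, but the aggregated form also admits fractional points that the disaggregated pair cuts off, which is what makes the latter tighter in a linear programming relaxation.

```python
from itertools import product

def aggregated(x, y, z):
    """Single constraint: 2x <= y + z."""
    return 2 * x <= y + z

def disaggregated(x, y, z):
    """Two constraints: x <= y and x <= z."""
    return x <= y and x <= z

# On all eight binary points, the two formulations agree.
binary_points = list(product((0, 1), repeat=3))
same_on_binaries = all(aggregated(*p) == disaggregated(*p) for p in binary_points)

# A fractional point of the LP relaxation that separates them.
fractional = (0.5, 1.0, 0.0)

print(same_on_binaries)
print(aggregated(*fractional), disaggregated(*fractional))
```

A knowledge representation that maps a requirement to a constraint would ideally record such preferences, so that an automated system emits the tighter form when several valid forms exist.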

A multitrip vehicle routing problem with time windows example. The COP described in [15] is a routing-type problem. An end-user not trained in MILP does not normally describe a yes/no decision as being associated with each pair of locations (e.g., Locations $i$ and $j$, with a yes answer indicating that Location $j$ must be visited immediately after Location $i$). However, commonly used MILPs for routing problems typically use a binary variable for each of these decisions. Once the hurdle in decision variable definition is overcome, the rest of the constraints can be found in the constraint types or subtypes described in the OMT. Constraint Sets (1), (3), and (4) are all Set Partitioning constraints, i.e., to choose exactly one out of many (Number 17 on the OMT). Constraint Set (2) sets to zero the variables that represent impossible decisions (Number 19 on the OMT). Constraint Sets (6) to (8) regulate the time of arrival of a vehicle route at a customer, and these constraints are of an if-then subtype that can be found in the OMT. The last constraint set, (10), is a straightforward upper bounding constraint on the total time used, where the bound itself is a variable (Number 2 on the OMT). Constraint Set (9) is a special type of demand-capacity constraint commonly used in routing-type problems. The requirement is not trivial for an end-user to describe in NL, but the mathematical constraint itself can be found in the OMT (it is in fact a Set Packing constraint, Number 11 on the OMT). Constraint Set (5) is a flow-balance constraint (which is covered by the OMT); its mathematical meaning is that if a customer is visited, then there must be a customer visited before him/her and one visited after him/her. An end-user would not describe the requirement like this. We call these implicit constraints.

## 4 Summary remarks and future research

The OMT contains more than the mathematical specification of the MILP constraints. Mathematically, $ax \le b$, $ax \ge b$, and $ax = b$ are enough to cover all MILP constraints. However, the OMT we designed branches by usage (that is, by the meaning of the constraints in the application). A beginner MILP modeller, for example, can traverse the tree to elicit business requirements from a non-expert end-user. We have tested some COP instances, and the constraints in the OMT do in fact cover all the "explicit" (or straightforward) constraints. The constraints in our test cases that are not straightforward, i.e., the "implicit" constraints, are also covered by the OMT, though the mapping mechanism is not represented on the OMT.

We have the same results for the assignment constraints (ACs) and the subtour elimination constraints (SECs) of an asymmetric travelling salesman problem (ATSP): they can appear in the form of Set Partitioning and Set Covering constraints, respectively, but the mapping is not explicitly represented in the tree.
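A small Python sketch may help to see why these two families take the partitioning and covering forms. It assumes one common way of writing them (other equivalent forms exist): each node has exactly one selected outgoing arc and exactly one selected incoming arc, and for every proper non-empty subset of nodes at least one selected arc leaves the subset. The function names and the four-node instances are hypothetical, and the subset enumeration is exponential, so this is only meant for tiny examples.

```python
from itertools import combinations

def assignment_ok(nodes, arcs):
    """Set-partitioning rows: exactly one arc out of, and one arc into, each node."""
    return all(
        sum(1 for i, j in arcs if i == n) == 1
        and sum(1 for i, j in arcs if j == n) == 1
        for n in nodes
    )

def cut_ok(nodes, arcs):
    """Set-covering rows: every proper non-empty subset has at least one arc leaving it."""
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            inside = set(subset)
            if not any(i in inside and j not in inside for i, j in arcs):
                return False
    return True

nodes = [0, 1, 2, 3]
tour = [(0, 1), (1, 2), (2, 3), (3, 0)]          # one cycle through all nodes
two_subtours = [(0, 1), (1, 0), (2, 3), (3, 2)]  # two disjoint cycles

print(assignment_ok(nodes, tour), cut_ok(nodes, tour))
print(assignment_ok(nodes, two_subtours), cut_ok(nodes, two_subtours))
```

The second instance passes the assignment check but fails the cut check, which is exactly the situation the SECs exist to exclude; an end-user would state only "visit every city once and return home", not either family of constraints.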

In summary, we hypothesize that the OMT is scalable; however, we are unable to prove it in this paper.

### 4.1 Future research

What appears to be required now is the compilation of a list of mappings for commonly used implicit constraints. For example, the knowledge that "visiting each city exactly once and returning to the home city" is equivalent to "each entity has one that precedes it and one that succeeds it" needs to be represented on the OMT, so that the Set Partitioning and Set Covering constraints can be identified as the right ones to use. This will be the subject of our next research project: to survey the literature for commonly used implicit constraints and the usage they map to, and to represent such a mapping on the OMT in an efficient way.

## References

• [1] K. Akartunali, V. Mak-Hau, and T. Tran. A unified mixed-integer programming model for simultaneous fluence weight and aperture optimization in VMAT, Tomotherapy, and CyberKnife. Computers & Operations Research, 56:134--150, 2015.
• [2] S. Ceschia, N. Dang, P. De Causmaecker, S. Haspeslagh, and A. Schaerf. The second international nurse rostering competition. Annals of Operations Research, 274, 2019.
• [3] D. Chen, R. Batson, and Y. Dang. Applied Integer Programming: Modeling and Solution. Wiley, 2010.
• [4] J. T. de Kruijff, C. A. Hurkens, and T. G. de Kok. Integer programming models for mid-term production planning for high-tech low-volume supply chains. European Journal of Operational Research, 269(3):984--997, 2018.
• [5] A. Ghoniem, V. Pereira, and H. Gomes Costa. Linear integer model for the course timetabling problem of a faculty in Rio de Janeiro. Advances in Operations Research, 2016:7597062, 2016.
• [6] M. Gorman, L. Nittala, and J. Alden. Anatomy of the Edelman: Measuring the world's best analytics projects. INFORMS Journal on Applied Analytics, 50(6):343--403, 2020.
• [7] P. Lalbakhsh, V. Mak-Hau, R. Séguin, V. Nguyen, and A. Novak. Capacity analysis for aircrew training schools - estimating optimal manpower flows under time varying policy and resource constraints. In 2018 Winter Simulation Conference (WSC), pages 2285--2296, 2018.
• [8] S. Lohmann, S. Negru, F. Haag, and T. Ertl. Visualizing ontologies with VOWL. Semantic Web, 7(4):399--419, 2016.
• [9] V. Mak and A. Ernst. New cutting-planes for the time- and/or precedence-constrained ATSP and directed VRP. Mathematical Methods of Operations Research, 66:69--98, 2007.
• [10] V. Mak-Hau. On the kidney exchange problem: cardinality constrained cycle and chain problems on directed graphs: a survey of integer programming approaches. Journal of Combinatorial Optimization, 33(1):35--59, 2017.
• [11] V. Mak-Hau, B. Hill, D. Kirszenblat, B. Moran, V. Nguyen, and A. Novak. A simultaneous sequencing and allocation problem for military pilot training: integer programming approaches. Computers & Industrial Engineering, page 107161, 2021.
• [12] G. Nemhauser and L. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, Ltd, 1988.
• [13] B. Ofoghi, V. Mak, and J. Yearwood. A knowledge representation approach to automated mathematical modelling, 2020.
• [14] Y. Pochet and L. Wolsey. Production Planning by Mixed Integer Programming. Springer, 2006.
• [15] M. P. Seixas and A. B. Mendes. Column generation for a multitrip vehicle routing problem with time windows, driver work hours, and heterogeneous fleet. Mathematical Problems in Engineering, 2013.
• [16] S. Velez and C. T. Maravelias. Reformulations and branching methods for mixed-integer programming chemical production scheduling models. Industrial & Engineering Chemistry Research, 52(10):3832--3841, 2013.
• [17] H. Williams. Model Building in Mathematical Programming. 2013.
• [18] W. Zhou, X. You, and W. Fan. A mixed integer linear programming method for simultaneous multi-periodic train timetabling and routing on a high-speed rail network. Sustainability, 12:1131, 2020.