
Scalable Optimal Deployment in the Cloud of Component-based Applications using Optimization Modulo Theory, Mathematical Programming and Symmetry Breaking

The problem of Cloud resource provisioning for component-based applications consists in allocating virtual machine (VM) offers from various Cloud Providers to a set of applications such that the constraints induced by the interactions between components and by the components' hardware/software requirements are satisfied, and the performance objectives are optimized (e.g. costs are minimized). It can be formulated as a constraint optimization problem, hence, in principle, the optimization can be carried out automatically. When the set of VM offers is large (several hundreds), however, the computational requirements are huge, making automatic optimization practically impossible with the current general optimization modulo theories (OMT) and mathematical programming (MP) tools. We overcame this difficulty by methodically analyzing the particularities of the problem with the aim of identifying search space reduction methods. These methods exploit: (1) the symmetries of the general Cloud deployment problem, (2) the graph representation associated with the structural constraints specific to each particular application, and (3) their combination. An extensive experimental analysis was conducted on four classes of real-world problems, using six symmetry breaking strategies and two types of optimization solvers. As a result, the combination of a variable reduction strategy with a column-wise symmetry breaker leads to a scalable deployment solution when OMT is used to solve the resulting optimization problem.





1 Introduction

Efficient resource management in the context of deploying component-based software applications in the Cloud means deciding which virtual machines (VMs) to acquire from the Cloud Providers (CPs) and how to place the software components on them in such a way that the functional architecture is preserved and the deployment cost is minimized. Automated Cloud resource provisioning thus requires solving a selection and an assignment problem: which VMs should be leased from CPs, and how should the components be assigned to them so that the cost is minimized. This is related to the bin-packing problem, a fundamental problem in combinatorial optimization which arises in many challenging problems from diverse application areas. Due to the importance of the bin-packing problem, there has been extensive research on developing mathematical formalisms, efficient algorithms, software systems, and applications (to name a few: [DBLP:journals/toms/MartelloPVBK07; doi:10.1137/0207001; FernandezdelaVega1981; Beloglazov:2010:EER:1844765.1845139; DBLP:journals/eor/Carvalho02]). Bin packing can be formulated as follows [Korte:2012:COT:2190621]: given a set of bins with the same capacity $C$ and a list of $n$ items with sizes $w_1, \dots, w_n$, find (i) the minimum number $k$ of bins and (ii) a $k$-partition of the set of items such that the total size of the items assigned to each bin does not exceed its capacity. The problem can be formulated as a constrained optimization problem (COP) as follows:

Minimize $\sum_{j=1}^{n} y_j$ subject to
$$\sum_{j=1}^{n} x_{ij} = 1 \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} w_i\, x_{ij} \le C\, y_j \quad (j = 1, \dots, n),$$
where $y_j = 1$ if bin $j$ is used ($y_j = 0$ otherwise) and $x_{ij} = 1$ if item $i$ is placed in bin $j$ ($x_{ij} = 0$ otherwise), with $x_{ij}, y_j \in \{0, 1\}$.
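As an illustration (not part of the original formulation), the optimum of this COP can be computed by brute force for tiny instances; the item sizes and capacity below are made-up values:

```python
from itertools import product

def min_bins(sizes, capacity):
    """Exact bin packing by exhaustive search (illustration only, tiny inputs)."""
    n = len(sizes)
    best = n  # n bins always suffice when every item fits alone
    for assign in product(range(n), repeat=n):  # item i -> bin assign[i]
        loads = [0] * n
        ok = True
        for item, b in enumerate(assign):
            loads[b] += sizes[item]
            if loads[b] > capacity:
                ok = False
                break
        if ok:
            best = min(best, sum(1 for load in loads if load > 0))
    return best

print(min_bins([4, 3, 3, 2], 6))  # 2: bins {4, 2} and {3, 3}
```

A production solver would of course use an ILP or OMT encoding instead of enumeration; the snippet only fixes the semantics of the constraints above.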


In a recent project, we studied the problem of Cloud resource provisioning for component-based applications. It consists in allocating virtual machine (VM) offers from various Cloud Providers (CPs) to a set of applications such that the constraints induced by the interactions between components and by the components' hardware/software requirements are satisfied and the performance objectives are optimized (e.g. costs are minimized).

The problem is similar to the bin-packing problem, however:

  1. bins (VMs) can have different capacities, which depend on the VM offers;

  2. the placement of items (components) in bins is limited not only by the capacity constraints, but also by the constraints induced by the components interactions;

  3. the number of items is not known a priori (for component-based applications, several instances of a component can be deployed, depending on specific constraints on the number of instances);

  4. the smallest cost (optimality criterion) is not necessarily obtained by minimizing the number of bins.

It can be formulated as a constraint optimization problem (COP) and solved, in principle, by state-of-the-art mathematical programming (MP) and optimization modulo theories (OMT) tools. While the application of MP techniques for solving COPs has a long tradition, the usage of OMT is recent. Our motivation for using the OMT approach lies in the tremendous advances of methods and tools in this domain in the last decade; applications in artificial intelligence and in formal methods for hardware and software development have greatly benefited from these [8602994; 10.1145/2597809.2597817; DBLP:conf/tacas/NadelR16]. The performance of these tools is highly dependent on the way the problem is formalized, as this determines the size of the search space. When the number of VM offers is large, a naive encoding which does not exploit the symmetries of the underlying problem leads to a huge search space, making the optimization problem intractable. We overcame this issue by reducing the search space by:

  1. systematically analyzing the symmetries which appear in the context of Cloud deployment applications;

  2. designing static symmetry breakers and integrating them with state-of-the-art MP (CPLEX [2016_CPLEX_usermanual]) and OMT (Z3 [DBLP:conf/tacas/BjornerPF15]) tools in order to speed up the solution process.

As a result, the scalability of the optimizers increased, most notably by orders of magnitude in the case of the OMT solver.

This paper extends our previous work [DBLP:conf/iccp2/MicotaEZ18; LPAR-IWIL2018:Influence_of_Variables_Encoding; Erascu_SMT_2019] in the following aspects:

  1. we formalize the Cloud deployment problem (Section 3) by abstracting away the particularities of several realistic case studies (Section 2);

  2. we propose a methodology for analyzing the particularities of the problem with the aim of identifying search space reduction methods; these methods exploit the symmetries of the general Cloud deployment problem, as well as the graph representation (cliques) of each application (Section 4);

  3. we assess and compare the performance of two tools based on different theoretical backgrounds, namely mathematical programming (CPLEX [2016_CPLEX_usermanual]) and computational logic (Z3 [10.1007/978-3-540-78800-3_24]); we identified limits in their scalability and applied search space reduction methods aiming to improve their performance (Section 5).

2 Case Studies

The case studies introduced in this section exemplify the following aspects: (i) different component characteristics and the rich types of interactions between them; (ii) the kind of linear constraints used to express these interactions (see exemplifications in Section 3); (iii) the kind of solution we are searching for (see Section 4).

2.1 Secure Web Container

The Secure Web Container [DBLP:journals/tsc/CasolaBEMR17] (Figure 1) is a service which provides: (i) resilience to attacks and failures, by introducing redundancy and diversity techniques, and (ii) protection from unauthorized and potentially dangerous accesses, by integrating proper intrusion detection tools. Resilience can be implemented by a set of different Web Container components and a Balancer component, which is responsible for dispatching web requests to the active web containers to ensure load balancing. In the simplest scenario, there are two Web Containers (e.g. Apache Tomcat and Nginx). Intrusion detection is ensured by the generation of intrusion detection reports with a certain frequency. It is implemented by deploying an IDSAgent, installed on the resources to be protected, and an IDSServer, which collects data gathered by the IDSAgents and performs the detection activities.

Figure 1: Secure Web Container Application

The constraints between application components are as follows.

  • For Web resilience: (i) any two of the Balancer, Apache and Nginx components cannot be deployed on the same machine (Conflict constraint); (ii) exactly one Balancer component has to be instantiated (Deployment with bounded number of instances constraint, in particular equal bound); (iii) the total number of instances of the Apache and Nginx components must be at least the required level of redundancy (Deployment with bounded number of instances constraint, in particular lower bound).

  • For Web intrusion detection: (i) the IDSServer component needs exclusive use of its machines (Conflict constraint); (ii) one additional IDSServer instance must be deployed for every 10 IDSAgent instances (Require-Provide constraint); (iii) one instance of IDSAgent must be allocated on every acquired machine except those where an IDSServer or a Balancer is deployed (Full Deployment constraint).
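For illustration, the Conflict and Full Deployment rules above can be checked on a candidate assignment matrix. The sketch below is ours, not the paper's encoding: the component order (Balancer, Apache, Nginx, IDSServer, IDSAgent) and the sample matrix are chosen for this example, while the conflict pairs are read off the rules of this section.

```python
def conflicts_ok(a, conflict_pairs):
    """Conflict constraint: conflicting components never share a VM."""
    return all(a[i][k] + a[j][k] <= 1
               for (i, j) in conflict_pairs for k in range(len(a[0])))

def full_deployment_ok(a, i, conflict_pairs):
    """Full Deployment: component i is on every used VM that hosts
    nothing conflicting with i."""
    bad = {y for (x, y) in conflict_pairs if x == i} | \
          {x for (x, y) in conflict_pairs if y == i}
    for k in range(len(a[0])):
        used = any(row[k] for row in a)
        blocked = any(a[j][k] for j in bad)
        if used and not blocked and a[i][k] != 1:
            return False
    return True

# Conflicts: Balancer/Apache/Nginx pairwise, IDSServer with everything,
# IDSAgent additionally with Balancer (indices 0..4 as in the order above).
pairs = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (3, 2), (3, 4), (0, 4)]
a = [[1, 0, 0, 0],   # Balancer on VM 1
     [0, 1, 0, 0],   # Apache on VM 2
     [0, 0, 1, 0],   # Nginx on VM 3
     [0, 0, 0, 1],   # IDSServer on VM 4
     [0, 1, 1, 0]]   # IDSAgent on VMs 2 and 3
print(conflicts_ok(a, pairs), full_deployment_ok(a, 4, pairs))  # True True
```

In the actual formalization these checks become linear constraints over the decision variables rather than a posteriori tests.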

We want to deploy this application in the Cloud with minimal cost. There are multiple Cloud Providers that offer infrastructure services (virtual machines) in multiple heterogeneous configurations, including Amazon, Google Cloud, and Microsoft Azure. In fact, the Crawler Engine we implemented [DBLP:conf/synasc/ErascuIM18] gathered several hundreds of virtual machine offers having different types, i.e. distinct hardware configurations (e.g. number of CPUs, memory, storage) and prices. Therefore, the method used to solve the constraint optimization problem should be scalable with respect to the number of VM offers.

2.2 Secure Billing Email Service

In the context of a web application ensuring a secure billing email service (Figure 2), we consider an architecture consisting of five components: (i) a coding service, (ii) a software manager of the user rights and privileges, (iii) a gateway component, (iv) an SQL server and (v) a load balancer. One of the components requires exclusive use of a virtual machine, thus it can be considered in conflict with all the other components. In such a case, the original optimization problem can be decomposed into two subproblems, one corresponding to this component and the other corresponding to the remaining components. The first problem is trivial: find the VM with the smallest price which satisfies the hardware requirements of the component.

The load balancing component should not be placed on the same machine as the gateway component and the SQL server (Conflict constraint). On the other hand, only a single instance of two of the components should be deployed, while the other three components could have a larger number of instances placed on different virtual machines (Deployment with bounded number of instances constraint, in particular equal bound).

Figure 2: Secure Billing Email Service

2.3 Wordpress

The Wordpress open-source application is frequently used for creating websites, blogs and applications. We chose it in order to compare our approach to the Zephyrus and Zephyrus2 deployment tools [DBLP:conf/kbse/CosmoLTZZEA14; DBLP:conf/setta/AbrahamCJKM16]. In [DBLP:conf/kbse/CosmoLTZZEA14; DBLP:conf/setta/AbrahamCJKM16], the authors present a high-load and fault-tolerant Wordpress (Figure 3) deployment scenario. The two characteristics are ensured by load balancing. One possibility is to balance load at the DNS level using servers like Bind: multiple DNS requests to resolve the website name will result in different IPs from a given pool of machines, on each of which a separate Wordpress instance is running. Alternatively, one can use as website entry point an HTTP reverse proxy capable of load balancing (and caching, for added benefit) such as Varnish. In both cases, the Wordpress instances need to be configured to connect to the same MySQL database, to avoid delivering inconsistent results to users. Also, having redundancy and balancing at the front-end level, one usually expects to have them also at the Database Management System (DBMS) level. One way to achieve that is to use a MySQL cluster, and configure the Wordpress instances with multiple entry points to it.

Figure 3: Wordpress Application

In the deployment scenario considered by us, the following constraints must be fulfilled: (i) DNSLoadBalancer requires at least one instance of Wordpress, and DNSLoadBalancer can serve at most 7 Wordpress instances (Require-Provide constraint). (ii) HTTPLoadBalancer requires at least one Wordpress instance, and HTTPLoadBalancer can serve at most 3 Wordpress instances (Require-Provide constraint). (iii) Wordpress requires at least three instances of MySQL, and MySQL can serve at most 2 Wordpress instances (Require-Provide constraint). (iv) Only one type of Balancer must be deployed; the Balancer components are HTTPLoadBalancer, DNSLoadBalancer and Varnish (Exclusive deployment constraint). (v) Since Varnish exhibits load balancing features, it should not be deployed on the same VM as another type of Balancer (Conflict constraint); moreover, Varnish and MySQL should not be deployed on the same VM, because it is best practice to isolate the DBMS level of an application (Conflict constraint). (vi) If HTTPLoadBalancer is deployed, then at least 2 instances of Varnish must be deployed too (Deployment with bounded number of instances constraint, in particular lower bound). (vii) At least 2 different entry points to the MySQL cluster must be deployed (Deployment with bounded number of instances constraint, in particular lower bound). (viii) No more than 1 DNS server may be deployed in the administrative domain (Deployment with bounded number of instances constraint, in particular upper bound). (ix) Balancer components must be placed on a single VM, so they are considered to be in conflict with all the other components.
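Constraint (iii) admits a simple linear encoding. The reading below (each Wordpress instance consumes 3 MySQL "slots", each MySQL instance offers capacity for at most 2 Wordpress instances) is our interpretation for illustration, not necessarily the paper's exact formula:

```python
def require_provide_ok(n_wordpress, n_mysql, n_required=3, m_served=2):
    """One plausible linear form of the Require-Provide constraint:
    n_required * (#Wordpress instances) <= m_served * (#MySQL instances)."""
    return n_required * n_wordpress <= m_served * n_mysql

print(require_provide_ok(2, 3))  # 6 <= 6 -> True
print(require_provide_ok(3, 3))  # 9 <= 6 -> False
```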

2.4 Oryx2

The Oryx2 application (Figure 4) is a realization of the lambda architecture, featuring speed, batch, and serving tiers, with a focus on applying machine learning models in data analysis, and builds on technologies such as Apache Spark and Apache Kafka. It has a significant number of components interacting with each other and is highly used in practical applications. It consists of several components which can be distributed over thousands of VMs in the case of a full deployment. The main goal of Oryx2 is to take incoming data and use them to create and instantiate predictive models for various use cases, e.g. movie recommendation. It comprises several technologies. Both the batch and the serving layer are based on Apache Spark, which in turn uses Apache Yarn for scheduling and Apache HDFS as a distributed file system. For its processing pipeline, Oryx2 uses Apache Kafka with at least two topics: one for incoming data and one for model updates. Apache Zookeeper is used by Kafka for broker coordination. All of the aforementioned technologies have subservices with minimum system requirements and a recommended deployment as shown in Figure 4.

Figure 4: Oryx2 Application

The constraints corresponding to the interactions between the components are described in the following. (i) Components HDFS.DataNode and Spark.Worker must be deployed on the same VM (Co-location). In this scenario, we also co-located Yarn.NodeManager because we used Yarn as a scheduler for Spark jobs. (ii) The component pairs Kafka and Zookeeper; HDFS.NameNode and HDFS.SecondaryNameNode; YARN.ResourceManagement and HDFS.NameNode, HDFS.SecondaryNameNode, YARN.HistoryService are, respectively, in conflict, that is, they must not be placed on the same VM. (iii) Components HDFS.DataNode, YARN.NodeManager and Spark.Worker must be deployed on all VMs except those hosting conflicting components (Full Deployment). (iv) In our deployment, we consider that for each instance of Kafka exactly 2 instances of Zookeeper must be deployed (Require-Provide constraint). There can be situations, however, when more Zookeeper instances are deployed for higher resilience. (v) A single instance of YARN.HistoryService, respectively of Spark.HistoryService, should be deployed (Deployment with bounded number of instances constraint, in particular equal bound).

3 The Problem

To describe the problem in a formal way, we consider a set of $N$ interacting components, $C = \{C_1, \dots, C_N\}$, to be assigned to a set of $M$ virtual machines, $V = \{V_1, \dots, V_M\}$. Each component $C_i$ is characterized by a set of requirements concerning the hardware resources. Each virtual machine $V_k$ is characterized by a type, which is comprised of hardware/software characteristics and the leasing price. There are also structural constraints describing the interactions between components (see Section 3.2). The problem is to find:

  1. an assignment matrix $a$ with binary entries $a_{ik}$, for $i = 1, \dots, N$, $k = 1, \dots, M$, interpreted as follows: $a_{ik} = 1$ if an instance of component $C_i$ is deployed on $V_k$, and $a_{ik} = 0$ otherwise;

  2. a type selection vector $t$ with integer entries $t_k$, for $k = 1, \dots, M$, representing the type (from a predefined set of offers) of each VM leased,

such that: (i) the structural constraints and (ii) the hardware requirements (capacity constraints) of all components are satisfied, and (iii) the purchasing/leasing price is minimized.

For instance, in the case of the Secure Web Container service (Section 2.1), with its five components and a prior estimation of the number of VMs (see Section 4.1), a solution consists of an assignment matrix (1) and a type selection vector (2).


The structural constraints are application-specific (see Section 3.2) and are derived in accordance with the analysis of the case studies from Section 2. General constraints (see Section 3.1) are always considered in the formalization and are related to: (i) the basic allocation rules, (ii) the occupancy criteria, (iii) the hardware capacity of the VM offers, (iv) the link between the VM offers and the components' hardware/software requirements.

The problem to solve can be stated as a COP as follows (by the definition of a VM type, the type includes the price; $t_k = 0$ means that machine $V_k$ is not occupied and contributes no cost):

Minimize the total leasing price of the occupied VMs, subject to:

General constraints:
  • Basic allocation: each component (except those handled by Exclusive deployment) is placed on at least one VM, $\sum_{k=1}^{M} a_{ik} \ge 1$;
  • Occupancy and capacity: for every occupied VM, the aggregated hardware requirements of the hosted components do not exceed the characteristics of the selected type.

Application-specific constraints:
  • Conflicts: $a_{ik} + a_{jk} \le 1$, for all $k$ and all conflicting pairs $(C_i, C_j)$;
  • Co-location: $a_{ik} = a_{jk}$, for all $k$ and all co-located pairs $(C_i, C_j)$;
  • Exclusive deployment: for a fixed set of components, exactly one of them is deployed;
  • Require-Provide, Full deployment, and Deployment with bounded number of instances, as detailed in Section 3.2.

In the above we denote the following:


  • $R_{ij} = 1$ if components $C_i$ and $C_j$ are in conflict (cannot be placed on the same VM), and $R_{ij} = 0$ otherwise;

  • $D_{ij} = 1$ if components $C_i$ and $C_j$ must be co-located (must be placed on the same VM), and $D_{ij} = 0$ otherwise;

  • $n$ and $m$ are the Require-Provide bounds: $C_i$ requires at least $n$ instances of $C_j$, and $C_j$ can serve at most $m$ instances of $C_i$;

  • $r_{ih}$ is the hardware requirement of type $h$ (e.g. CPU, memory, storage) of the component $C_i$;

  • $F_{th}$ is the hardware characteristic $h$ of a VM of type $t$.
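The whole model can be pictured on a toy instance. Everything below (two VM types with invented capacities and prices, two components, a single conflict) is made up for illustration, and the instance is solved by brute force rather than by CPLEX or Z3:

```python
from itertools import product

# Invented toy data: 2 components in conflict, up to 3 candidate VMs,
# 2 VM types encoded as (capacity, price). Type 0 = machine not leased.
types = {1: (4, 10), 2: (8, 16)}
req = [3, 6]               # aggregated hardware requirement per component
conflict = [(0, 1)]        # components 0 and 1 must not share a VM
M = 3                      # prior upper bound on the number of VMs

best = None
for a_flat in product([0, 1], repeat=2 * M):
    a = [a_flat[:M], a_flat[M:]]                 # assignment matrix
    if any(sum(row) < 1 for row in a):           # basic allocation
        continue
    if any(a[i][k] + a[j][k] > 1 for (i, j) in conflict for k in range(M)):
        continue
    cost, ok = 0, True
    for k in range(M):                           # cheapest feasible type per VM
        load = sum(req[i] * a[i][k] for i in range(2))
        if load == 0:
            continue                             # unoccupied: type 0, no cost
        fitting = [price for (cap, price) in types.values() if cap >= load]
        if not fitting:
            ok = False
            break
        cost += min(fitting)
    if ok and (best is None or cost < best):
        best = cost

print(best)  # 26: one type-1 VM for component 0, one type-2 VM for component 1
```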

3.1 General Constraints

The basic allocation rules specify that each component must be allocated to at least one VM, except those in an Exclusive Deployment relation (see below).

Capacity constraints specify that the total amount of a certain resource type required by the components hosted on a particular VM does not exceed the corresponding resource of the VM offer.


For example, a VM offer of a given type is characterized by its CPU count, memory, storage and price, and is encoded as a tuple of these values. In order to have a sound formalization, one also needs to link a type of VM offer to each of the occupied VMs $V_k$.


Since in our approach $M$ denotes an upper estimation (see Section 4.1) of the number of VMs needed for deployment, this estimation might actually be higher than the optimum. Hence, a constraint is needed ensuring that unoccupied machines receive no type and contribute no cost.

The binary occupancy vector $v$ is defined as:

$v_k = 1$ if machine $V_k$ is used and $v_k = 0$ otherwise.
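Deriving the occupancy vector from the assignment matrix is mechanical; a minimal sketch (the matrix values are made up):

```python
def occupancy(a):
    """v[k] = 1 iff some component instance is assigned to VM k."""
    return [1 if any(row[k] for row in a) else 0 for k in range(len(a[0]))]

a = [[1, 0, 0],
     [0, 1, 0]]
print(occupancy(a))  # [1, 1, 0]
```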

3.2 Application-specific Constraints

We identified two main types of application-specific constraints regarding the components: those concerning the interactions between components (conflict, co-location, exclusive deployment) and those concerning the number of instances (require-provide, full deployment, deployment with a bounded number of instances).

Conflict. This case corresponds to situations when there are conflicting components which cannot be deployed on the same VM. Considering that all conflicts between components are encoded in a matrix $R$ (i.e. $R_{ij} = 1$ if $C_i$ and $C_j$ are conflicting components and $R_{ij} = 0$ otherwise), the constraints can be described as a set of linear inequalities: $a_{ik} + a_{jk} \le 1$, for all $k$, whenever $R_{ij} = 1$.

It should be noted that this type of constraint usually induces an increase in the number of VMs.

For example, for the Wordpress application, the Varnish component exhibits load balancing features. Hence, it should not be deployed on the same VM as HTTPLoadBalancer or DNSLoadBalancer. Moreover, Varnish and MySQL should not be deployed on the same VM, because it is best practice to isolate the DBMS level of an application. Therefore, based on the notations in Figure 3, where each component has an assigned identifier, the corresponding conflict constraints are instances of the inequalities above.

Co-location. This means that the components in a co-location relation should be deployed on the same VM. The co-location relation can be stored in a matrix $D$ (i.e. $D_{ij} = 1$ if $C_i$ and $C_j$ should be co-located and $D_{ij} = 0$ otherwise) and the constraints can be described as a set of equalities: $a_{ik} = a_{jk}$, for all $k$, whenever $D_{ij} = 1$.

The number of VMs needed decreases as the number of co-located components increases.

For example, for the Oryx2 application, components HDFS.DataNode and Spark.Worker must be deployed on the same VM. In this scenario, we also co-located Yarn.NodeManager because we used Yarn as a scheduler for Spark jobs, leading to the corresponding co-location equalities for these three components.

Exclusive deployment. There are cases when, from a set of components, only one should be deployed in a deployment plan. Such a constraint can be described as:

$\sum_{i} H\left(\sum_{k=1}^{M} a_{ik}\right) = 1$, where $H$ is a function defined as: $H(u) = 1$ if $u > 0$ and $H(u) = 0$ if $u = 0$.

For example, for the Wordpress application, only one type of Balancer must be deployed (the Balancer components are HTTPLoadBalancer and DNSLoadBalancer). If HTTPLoadBalancer is deployed, a caching component, in our case Varnish, should also be deployed, leading to a different set of conflicts.

Require-Provide. A special case of interaction between components is when one component requires some functionalities offered by other components. Such an interaction induces constraints on the number of instances of the interacting components as follows: (i) $C_i$ requires (consumes) at least $n$ instances of $C_j$, and (ii) $C_j$ can serve (provides) at most $m$ instances of $C_i$. This can be written as:

$$n \sum_{k=1}^{M} a_{ik} \le m \sum_{k=1}^{M} a_{jk}. \qquad (6)$$


For example, for the Wordpress application, the Wordpress component requires at least three instances of MySQL and MySQL can serve at most 2 Wordpress instances, leading to the constraint $3 \sum_{k} a_{\text{Wordpress},k} \le 2 \sum_{k} a_{\text{MySQL},k}$.

A related case is when, for each group of $n$ instances of component $C_i$, a new instance of $C_j$ should be deployed. This can be described as:


$$\sum_{k=1}^{M} a_{jk} = \left\lceil \frac{1}{n} \sum_{k=1}^{M} a_{ik} \right\rceil. \qquad (7)$$

This constraint cannot be deduced from (6): instantiating (6) only yields that for every $n$ instances of $C_i$ there must be at least one instance of $C_j$ (but there can be more), while (7) requires exactly one instance of $C_j$ per group of $n$ instances of $C_i$.

Full deployment. There can also be cases when a component must be deployed on all leased VMs (except on those which would induce conflicts between components). This can be expressed as:

$a_{ik} = v_k \cdot \left(1 - H\bigl(\sum_{j=1}^{N} R_{ij}\, a_{jk}\bigr)\right)$ for all $k$, where $R$ is the conflicts matrix and $H$ is defined as above.

For example, for the Oryx2 application, components HDFS.DataNode, YARN.NodeManager and Spark.Worker must be deployed on all VMs except those hosting conflicting components, so the above constraint is instantiated for each of these three components.

Note that we do not allow, in the application description, the full deployment of two conflicting components.

Deployment with bounded number of instances. There are situations when the number of instances corresponding to a set $\bar{C}$ of deployed components should be equal to, greater than or less than some given bound $b$. These types of constraints can be described as follows:

$\sum_{C_i \in \bar{C}} \sum_{k=1}^{M} a_{ik} \;\diamond\; b$, with $\diamond \in \{=, \le, \ge\}$.

For example, for the Secure Web Container application, the total number of instances of the Apache and Nginx components must be at least the required redundancy level: $\sum_{k} a_{\text{Apache},k} + \sum_{k} a_{\text{Nginx},k} \ge \text{redundancy level}$.

4 Solving Approach

Our solving approach is based on the following methodology: (i) Estimate an upper bound on the number of VMs needed for the application deployment (Section 4.1). (ii) Analyze the application-specific constraints, in particular co-location and conflicts, and adapt the formalization in such a way that constraints are implicitly satisfied (co-location) or the search space is reduced (conflicts) (Section 4.2). (iii) Analyze the symmetries of the problem and identify symmetry breaking strategies (Section 4.3). (iv) Select the optimization method (Section 4.4).

4.1 Prior Estimation of the Number of Virtual Machines

The number of decision variables (the elements of the assignment matrix $a$ and of the type selection vector $t$) depends on the number of VMs, $M$, taken into account in the deployment process. However, the optimal number of used VMs is unknown, thus a prior estimation of a (tight) upper bound is required. In the case of the traditional bin-packing problem, such an upper bound is given by the number of items. In the case of the resource allocation class of problems addressed in this paper, since the number of instances per component is also unknown, estimating an upper bound for the number of VMs is not trivial. In order to estimate $M$, we solve a surrogate optimization problem which takes into account only the constraints involving numbers of instances (i.e. Require-Provide and Deployment with bounded number of instances) and minimizes the total number of instances:

Minimize $\sum_{i=1}^{N} x_i$ (where $x_i$ denotes the number of instances of component $C_i$)
Subject to: the Require-Provide constraints and the constraints on the bounded numbers of instances, expressed over $x_1, \dots, x_N$.

As $M$ is set to the sum of the numbers of instances estimated by solving the above surrogate problem, it corresponds to the case when each instance is assigned to a distinct machine, which in turn corresponds to the case when all components are in conflict. Since in real-world cases usually not all components are in conflict, it follows that $M$ is an upper-bound estimate of the number of virtual machines required to satisfy all application-specific constraints.
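As a sketch of this surrogate estimation, one can brute-force the instance counts; the bounds below loosely follow the Secure Web Container rules (exactly one Balancer, at least two web containers, one IDSServer per 10 IDSAgents, one IDSAgent per web container), with the search ranges invented for the example:

```python
# x = (balancer, web, ids_server, ids_agent) instance counts.
def feasible(balancer, web, server, agent):
    return (balancer == 1 and web >= 2
            and agent == web                       # one agent per web VM (toy rule)
            and server >= max(1, -(-agent // 10))) # ceil(agent / 10), at least 1

best = min((b + w + s + g, (b, w, s, g))
           for b in range(3) for w in range(6)
           for s in range(3) for g in range(6)
           if feasible(b, w, s, g))
print(best[0])  # M = total number of instances of the smallest feasible plan
```

Here the minimum is 6 instances (1 Balancer, 2 web containers, 1 IDSServer, 2 IDSAgents), so $M = 6$ VMs would be taken as the prior upper bound under these toy rules.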

4.2 Analysis of the Application-specific Constraints

Some constraints can be satisfied by a proper encoding of the decision variables (such as the constraint that at most one instance of a component is deployed on a given VM), while others can be exploited in order to reduce the search space by redefining the models.

4.2.1 Co-location-type Constraints

For instance, the co-location-type constraints can be exploited by combining all co-located components into a hyper-component and redefining the existing constraints. Let us consider that $\bar{C}$ is a set of components which should be co-located, i.e. for each instance of one of the components deployed on a VM, an instance of all the other components should be deployed on the same machine. Thus all components in $\bar{C}$ will have the same number of deployed instances, and the original problem can be reformulated as one with fewer components, as the components in $\bar{C}$ are replaced with one hyper-component. The original constraints involving elements of $\bar{C}$ are reorganized as follows:

Conflicts. All conflict-type constraints involving elements of $\bar{C}$ are replaced with one constraint involving the hyper-component, i.e. if some $C_i \in \bar{C}$ is in conflict with a component $C_j \notin \bar{C}$, then the hyper-component is in conflict with $C_j$.

Exclusive Deployment. If an element of the hyper-component is in an exclusive deployment relation with another component, then that component is excluded, i.e. the hyper-component is preferred.

Full Deployment. If at least one component of the hyper-component appears in a full deployment constraint, then such a constraint is added for the hyper-component.

Require-Provide. In all require-provide constraints involving components from $\bar{C}$, these components are replaced with the hyper-component.

Capacity-related constraints. The capacity constraints (e.g. CPUs, memory size, storage) concerning the components from $\bar{C}$ are aggregated by summation into unique constraints involving the hyper-component.

4.2.2 Conflict-type Constraints

When there are conflicting components, the conflict graph can be used to identify components which must be placed on different machines. This type of constraint can be further exploited by fixing the values of the decision variables which correspond to some of the conflicting components. More specifically, in a first step, all cliques of the conflict graph are identified, i.e. subsets of components which are fully conflicting, meaning that their instances must be deployed on different VMs. Then the clique with the largest deployment size, i.e. the largest total number of instances, is selected. Since the instances of all components belonging to this clique must be assigned to different machines, one can fix the values of the corresponding assignment variables: each instance of each component in the clique is assigned sequentially to the first available machine, offset by the number of machines already occupied by instances of the previously processed components of the clique. Since the instance counts obtained by solving the surrogate problem described in Section 4.1 might be over-estimations (particularly for the components involved in constraints related to bounded numbers of instances), a conservative approach is to fix the variables corresponding to just one instance of each component belonging to the selected clique.

Based on the observations above, as well as on the analysis of our case studies, we grouped the decision variables into three main categories: (i) variables with fixed values (as those set above); (ii) variables bounded by constraints (the variables for which their values are determined by solving some of these constraints); (iii) variables free of constraints, for which the values are mainly controlled through the optimization criterion (e.g. components which fit into any offered VM and are not involved in constraints describing the interaction between components).

Let us consider the Secure Web Container use case (see Figure 1). The surrogate optimization problem presented in Section 4.1 estimates an upper bound on the number of VMs needed to solve the problem. The conflict graph corresponding to the application described in Figure 1 contains two cliques, [Balancer, Apache, Nginx, IDSServer] and [Balancer, IDSServer, IDSAgent], the first one having the largest deployment size, hence playing the role of the selected clique (see Figure 5).

Figure 5: Secure Web Container conflict graph. The components with green background belong to the selected clique.
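The clique identification step can be sketched directly on this example; the conflict edges below are derived by us from the rules of Section 2.1, and the naive enumeration is only suitable for such tiny graphs:

```python
from itertools import combinations

comps = ["Balancer", "Apache", "Nginx", "IDSServer", "IDSAgent"]
# Conflict edges: Balancer/Apache/Nginx pairwise, IDSServer with
# everything, IDSAgent additionally with Balancer.
edges = {frozenset(e) for e in [
    ("Balancer", "Apache"), ("Balancer", "Nginx"), ("Apache", "Nginx"),
    ("IDSServer", "Balancer"), ("IDSServer", "Apache"),
    ("IDSServer", "Nginx"), ("IDSServer", "IDSAgent"),
    ("Balancer", "IDSAgent")]}

def all_cliques(nodes, edges):
    """Enumerate every clique with at least two nodes (tiny graphs only)."""
    return [set(s) for r in range(2, len(nodes) + 1)
            for s in combinations(nodes, r)
            if all(frozenset(p) in edges for p in combinations(s, 2))]

cliques = all_cliques(comps, edges)
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(sorted(sorted(c) for c in maximal))
```

The two maximal cliques found this way are exactly the two listed above; in the full method, ties are broken by the total number of instances (the deployment size), not by clique cardinality.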

Based on the variable fixing rules described above, most of the decision variables can be fixed, as illustrated in Table 1, where framed elements correspond to fixed values and the other elements correspond to variables which are bounded by constraints (e.g. Full deployment in the case of IDSAgent, Deployment with bounded number of instances for the set {Apache, Nginx}). Note that the precise numbers of instances of Apache and Nginx are not known at the moment of variable fixing; we only know that the sum of their instances should be at least the redundancy level. Hence only one instance of each is fixed. In this example, we also exploited the fact that IDSAgent is in conflict with Balancer, although IDSAgent is not part of the selected clique.

1 0 0 0 0 0
0 0 1 0 1 0
0 0 0 1 0 0
0 1 0 0 0 0
0 0 1 1 1 0
Table 1: Fixing variables in the assignment matrix of the Secure Web Container use case (rows correspond to components, columns to VMs)

In this example there are no variables free of constraints, but it would be possible to take into account a component which is not involved in structural constraints (e.g. conflicts, co-location, exclusive or full deployment, require-provide) but is beneficial from a functional point of view (e.g. a monitoring component). In such a case, different deployment plans having the same cost, but different assignments for the constraint-free component, can be generated.

4.3 Symmetries and Symmetry Breakers

Realistic Cloud applications might involve the deployment of a large number of component instances on VMs selected from a large pool of offers. This leads to the necessity of solving optimization problems with a large search space, which, however, might contain equivalent solutions. The search space can be limited by reducing the number of decision variables (as illustrated in Section 4.2.1), by reducing the number of unassigned variables (as illustrated in Section 4.2.2), or by breaking the symmetries related to the decision variables, as discussed in this section.

In the following, symmetries which appear in Cloud deployment problems are described and appropriate symmetry breaking strategies are identified. For a self-contained presentation, we first introduce some theoretical notions.

4.3.1 Preliminaries

A matrix model flener:Reform02 is a constraint program that contains one or more matrices of decision variables. Symmetries in constraint satisfaction/optimization problems, in general, and in matrix models, in particular, are a key problem since search can revisit equivalent states many times. In order to deal with symmetries, one must first define what they are. We use the definition from DBLP:conf/cp/FlenerFHKMPW02, namely a symmetry is a bijection on decision variables that preserves solutions and non-solutions. Two variables are indistinguishable if some symmetry interchanges their roles in all solutions and non-solutions. These are variable symmetries. The definition can be extended also to value symmetries, i.e. symmetries that permute only the values of variables.

For matrix models, symmetry often occurs because groups of objects within a matrix are indistinguishable. This leads to row/column symmetries. Two rows/columns are indistinguishable if their variables are pairwise identical due to a row/column symmetry. A matrix model has row/column symmetry iff all the rows/columns of one of its matrices are indistinguishable. A matrix model has partial row/column symmetry iff strict subset(s) of the rows/columns of one of its matrices are indistinguishable. Partial row/column symmetries are more often encountered in Cloud deployment problems, as explained in this section.

The elimination of equivalent states, a problem known as symmetry breaking, has, most of the time, a positive impact on the computation time of the solution process. Symmetries can be eliminated by using symmetry breaking techniques, which can be categorized as follows (DBLP:reference/fai/2, Chapter 10):

  1. Reformulation means that the problem is remodeled to eliminate some or all symmetries. It has proved to be a very efficient method for breaking symmetry, but unfortunately there is no known systematic procedure for performing the remodeling process in general.

  2. Static symmetry breaking adds the so-called symmetry breaking constraints before search starts, hence making some symmetric solutions unacceptable while leaving at least one solution in each symmetric equivalence class.

  3. Dynamic symmetry breaking removes symmetries dynamically during search, adapting the search procedure appropriately.

In this paper we use static symmetry breaking as detailed in Section 5.

A natural way to break symmetry is to order the symmetric objects. To break row/column symmetry, one can simply order the rows/columns lexicographically. We say that the rows/columns in a matrix are lexicographically ordered if each row/column is lexicographically smaller than the previous.
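The lexicographic ordering condition above is easy to state programmatically; as an illustration, a decreasing lexicographic order on the columns of a 0/1 assignment matrix can be checked with a few lines of Python (the matrix below is a made-up example, not taken from the case studies):

```python
def columns_lex_decreasing(matrix):
    """True iff each column is lexicographically <= its left neighbour."""
    cols = list(zip(*matrix))  # transpose: Python tuples compare lexicographically
    return all(cols[j] >= cols[j + 1] for j in range(len(cols) - 1))

m = [[1, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
print(columns_lex_decreasing(m))  # columns (1,1,0) >= (1,0,0) >= (0,1,0)
```

A symmetry breaker enforces this condition as constraints over the decision variables, so that out of all column permutations of an equivalent solution only the ordered one remains feasible.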

Symmetries of Cloud deployment problems can be eliminated in a similar manner, as explained in the following.

4.3.2 Column Symmetries

The column symmetries for the Cloud deployment problem from Section 3 are determined by the decision variables  and . In defining the column symmetries, one might exploit different points of view on the problem. On one hand, the variables , , representing the types of leased VMs, might be considered indistinguishable, thus exhibiting symmetry. At the same time, but independently, the columns of the assignment matrix  might be considered indistinguishable. These cases correspond to full variable symmetry. Symmetry breakers are based on the idea of ordering the columns, for example:

(i) decreasing by the number of components:

(ii) decreasing by lexicographic order of columns: where denotes the column .

(iii) decreasing by lexicographic order of tuples containing the price and the hardware characteristics of a VM, e.g. CPU number, memory, storage.

On the other hand, after ,  are assigned values, only the VMs of the same type are indistinguishable. This means that only the columns of the assignment matrix corresponding to VMs of the same type are indistinguishable. This is a partial symmetry, determined by the fact that we turn partial value symmetry for  into partial variable symmetry for . Symmetry breakers are also based on the idea of ordering the columns, but with some restrictions, for example:

(i) decreasing by the number of components for columns representing VMs of the same type:


(ii) decreasing by lexicographic order of columns for columns representing VMs of the same type


4.3.3 Row Symmetries

The row symmetries for the Cloud deployment problem from Section 3 are determined by different viewpoints on the assignment matrix . On one hand, the problem has row symmetry if, by applying a permutation on the row labels of  (i.e. on the set of components , , , ), the assignment matrix corresponds to an equivalent solution (all constraints are satisfied and the value of the optimization criterion is the same).

The main application-specific constraints which induce row symmetry are those related to conflicts, as any two components in a conflict relationship are indistinguishable. One can break this kind of symmetry by computing the clique with maximal deployment size, in which all components are pairwise in conflict and hence cannot be placed on the same VM. More details were given in Section 4.2.2.

On the other hand, when all components of an application are identical from the point of view of the hardware requirements and there are no application-specific constraints, the problem has full symmetry with respect to the rows. This means that any permutation of the assignment matrix rows will correspond to an equivalent solution. However, this is rarely the case for Cloud deployment applications; therefore we are dealing with partial rather than full row symmetry.

4.4 Optimization Approaches

Optimization problems originating from resource management problems can be tackled by exact (constraint programming, mathematical programming) or inexact (meta-heuristic) methods. Inexact methods are widely used in the literature because of their low computational time. However, these methods are suboptimal and there are no theoretical results which allow one to estimate how far from the real optimum the solution is. The benefit of exact techniques is that they guarantee the optimal solution, at the price of higher computational time.

In this paper we used mathematical programming (MP) and constraint programming, in particular OMT solving. The main difference between these two approaches lies in their theoretical basis for constructing the solution: algebra in the former case, logical inference in the latter.

4.4.1 Mathematical Programming

Mathematical programming is a branch of operations research comprising methods for various optimization problems, particularly linear and quadratic ones. The problem addressed in this paper belongs to the class of integer linear programming problems, for which there are well-established solving methods.

In our experiments, we used the CPLEX solver 2016_CPLEX_usermanual. It was the first commercial linear optimizer on the market written in the C programming language, and its development started 20 years ago. It implements efficient algorithms for solving integer programming, mixed integer programming, and quadratic programming problems. Distinctive features of CPLEX, which to the best of our knowledge are not available in OMT solvers, are: (i) it allows specifying bounds on variables at the moment of their declaration, which avoids adding additional logical constraints to the model defining these bounds; (ii) it exhibits performance variability, which we exploited by ordering the constraints and by using pre-processing parameters (see Section 5.1.2). A drawback which we encountered when using CPLEX to solve the problem was an explosion of additional variables dynamically generated by the solver, which negatively influences the processing time. This happens because CPLEX introduces a new variable for each sum appearing in constraints (e.g. the sum of decision variables corresponding to the same column, as in Eq. (6)), even if the same sum appears in several constraints, in which case the same variable could have been reused. In order to avoid such an explosion of variables, the set of occupancy decision variables (i.e. the vector  in the problem description) has been explicitly introduced in the CPLEX problem specification.
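The occupancy variables mentioned above can be linked to the assignment matrix by standard MILP constraints; a sketch follows, with hypothetical notation since the paper's original symbols were lost in extraction: $a_{i,k} \in \{0,1\}$ for assigning component instance $i$ to VM $k$, $v_k \in \{0,1\}$ for occupancy of VM $k$, and $N$ the total number of component instances.

```latex
% v_k = 1 iff at least one component instance is placed on VM k:
\sum_{i} a_{i,k} \;\ge\; v_k
\qquad\text{and}\qquad
\sum_{i} a_{i,k} \;\le\; N \cdot v_k .
```

The left constraint prevents leasing an empty VM; the right one forces $v_k = 1$ whenever VM $k$ hosts some instance, so the same pair of constraints is shared by every constraint that mentions the column sum.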

4.4.2 Satisfiability and Optimization Modulo Theories

SMT solving is an extension of satisfiability (SAT) solving which adds the possibility of stating constraints in expressive theories, for example arithmetic, data structures, bit-vector expressions and their valid combinations. The idea behind SMT solving is that, given a formula in a certain theory, the formula is translated into a propositional one, which is checked for satisfiability using a SAT solver. If unsatisfiability cannot be deduced, then a candidate model (variable assignment) is fed to the theory solver. If this candidate model makes the initial formula true, SAT is returned, meaning that the initial formula is true for the respective model. If SAT cannot be deduced, the theory solver backtracks and tries another candidate model. This process is repeated until no more candidate models are found.

For our problem, we needed an SMT solver with optimization features, that is, an Optimization Modulo Theories (OMT) solver. There are not many options in this regard: (i) OptiMathSAT 10.1007/978-3-319-21690-4_27 uses an inline architecture in which the SMT solver MathSAT5 is run only once and its internal SAT solver is modified to handle the search for the optima; (ii) Symba DBLP:conf/popl/LiAKGC14 and  DBLP:conf/tacas/BjornerPF15 are both based on an offline architecture in which the SMT solver Z3 10.1007/978-3-540-78800-3_24 is incrementally called multiple times as a black-box. Since Symba is not actively maintained and, in our previous work Erascu_SMT_2019,  outperformed OptiMathSAT, in this paper we use ; we refer to it as Z3, of which it is an integral part. It is built on top of the mature SMT solver Z3, it has been widely used in various application domains, and it offers a Python API. It takes as input SMT formulas (constraints and objectives) which are simplified to Pseudo-Boolean Optimization (PBO) constraints. The PBO solver implements a wide range of methods for simplification and for generating conflicting clauses, compiling them into small sorting circuits. At the heart of the tool lies the dual simplex algorithm, which is also used to prune branches.

OMT is similar to constraint programming methods, as both use logical inferences to construct the solution. Hence, encoding the problem in the OMT formalism amounted to an almost one-to-one translation of the constraints introduced in Section 3.

5 Experimental Analysis

The goal of the experimental analysis is twofold. On one hand, we want to assess the scalability of state-of-the-art general MP and OMT tools, namely CPLEX 2016_CPLEX_usermanual and Z3 DBLP:conf/tacas/BjornerPF15, respectively, in solving the COPs corresponding to the case studies from Section 2. Tests (see Section 5.2) revealed that the naive application of general MP and OMT techniques is not sufficient to solve realistic Cloud deployment applications. Hence, on the other hand, we evaluate the effectiveness of various static symmetry breaking techniques in improving the computation time for solving these problems (see Section 5.1). Scalability and effectiveness are evaluated from two perspectives: the number of VM offers and the number of deployed component instances. For the Secure Web Container, Secure Billing Email and Oryx2 applications, we considered up to 500 VM offers. Additionally, for the Wordpress application, we considered up to  instances of the Wordpress component to be deployed. The set of offers was crawled from the Amazon Cloud Provider's offers list.

5.1 Experimental Settings

In this section we present and motivate the symmetry breakers tested on the two types of optimization tools, as well as the characteristics of the hardware on which we ran the experiments.

5.1.1 Selected Symmetry Breaking Strategies

Aiming to reduce the size of the search space, a set of strategies has been selected to exploit the particularities of the problem: (i) the VMs needed for application deployment might have different characteristics; (ii) application components might be in conflict, hence conflict-type constraints can be exploited; (iii) the number of instances to be deployed is unknown.

Our approach is incremental and experimental: we start with traditional symmetry breakers that have been used for other problems related to bin-packing and combine them with the aim of further search space reduction.

Price-based ordering (PR). This strategy aims to break symmetry by ordering the vector containing the types of used VMs decreasingly by price. This means that the solution will be characterized by the fact that the columns of the assignment matrix are ordered decreasingly by the price of the corresponding VMs.

Lexicographic ordering (LX). This corresponds to the traditional strategy for breaking column-wise symmetries. The constraints to be added, ensuring that two columns  and  are in decreasing lexicographic order, i.e. , are:


Price-based and lexicographic ordering (PRLX). The columns corresponding to VMs with the same price can be considered as indistinguishable, thus the induced symmetries can be broken by ordering lexicographically the corresponding columns.

Fixed values (FV). The search space can be reduced also by fixing the values of some variables starting from the application specific constraints. The strategy included in the experimental analysis is based on the exploitation of the conflict-type constraints as described in Section 4.2.2, i.e. the constraints given by (8)-(9) have been added to the problem specification.

Fixed values and price ordering (FVPR). This strategy uses FV to fix, on separate VMs, the conflicting components, and PR to order the VMs. The list of VMs is not globally ordered; instead, it is split into sublists which are ordered separately. This splitting is based on the structure of the clique with maximal deployment size (). More specifically, for each component in , the sublist containing the VMs on which its instances are deployed is decreasingly ordered by price. Finally, the VMs which do not contain instances of the components in  are decreasingly ordered (see Algorithm 1).

Fixed values and lexicographic ordering (FVLX). This strategy is similar to FVPR, the only difference being that, instead of imposing a price-based order on the VMs, a lexicographic order is applied to the corresponding columns of the assignment matrix. More specifically, lines 12 and 15 in Algorithm 1 are replaced with  and , respectively.

1:  Find the clique
2:    /* - VM index */
3:    /* CL - constraints list */
4:  for each component  do
6:     for each instance of  do
8:        for each do
10:        end for
13:     end for
14:  end for
16:  return  
Algorithm 1 FVPR Algorithm

Note that Algorithm 1 uses the FV strategy, hence it requires the identification of all cliques in the conflict graph. To this aim, we used an implementation of the Bron-Kerbosch algorithm available in the NetworkX library. Finding cliques in a graph is an NP-hard problem; however, in our case studies the graph size (given by the number of components) is not very large, so this preprocessing step does not significantly increase the execution time.
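As a sketch of this preprocessing step, the following plain-Python Bron-Kerbosch enumeration (the basic variant without pivoting, standing in for the NetworkX implementation used in the paper) recovers the two maximal cliques of the Secure Web Container conflict graph; the edge list follows the cliques described for Figure 5.

```python
# Bron-Kerbosch enumeration of maximal cliques (basic variant, no pivoting).
def bron_kerbosch(R, P, X, adj, out):
    if not P and not X:
        out.append(tuple(sorted(R)))  # R is a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

edges = [("Balancer", "Apache"), ("Balancer", "Nginx"), ("Balancer", "IDSServer"),
         ("Apache", "Nginx"), ("Apache", "IDSServer"), ("Nginx", "IDSServer"),
         ("Balancer", "IDSAgent"), ("IDSServer", "IDSAgent")]
adj = {}
for u, w in edges:
    adj.setdefault(u, set()).add(w)
    adj.setdefault(w, set()).add(u)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
largest = max(cliques, key=len)  # the clique with the largest deployment size
```

On this graph the enumeration yields exactly the two cliques named in Section 4.2.2, with the four-component clique being the one used to fix variables.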

It is worth mentioning that the strategies involved in the experimental analysis belong to several classes: (i) PR, LX and PRLX correspond to column symmetry breakers, as they exploit full and partial symmetries in sets of columns corresponding to groups of VMs with similar characteristics (e.g. same price); (ii) FV is a row symmetry breaker exploiting the conflict-type constraints; (iii) FVPR and FVLX combine column and row symmetry breaking, incorporating the advantages of the individual types.

5.1.2 Software and Hardware Settings

In the case of the OMT solver Z3, as background theory, the formalization uses quantifier-free linear arithmetic, in particular quantifier-free linear integer arithmetic. This was chosen based on the results obtained in Erascu_SMT_2019. Z3 was used with the default values of the parameters.

In the case of CPLEX, besides the binary decision variables corresponding to the elements of the assignment matrix and the integer decision variables corresponding to the vector containing the types of VMs, the binary variables corresponding to the occupancy vector, , have been explicitly included in the problem specification. This has been done in order to limit the number of variables generated by CPLEX for the constraints specified as logical expressions (as stated in Section 4.4.1). All CPLEX experiments have been conducted using primal reduction pre-processing (ensured by setting the option parameters.preprocessing.reduce=1), as the performance in this case was significantly better than in the other cases (no reduction, or primal and dual reduction).

The implementation, as well as the results, are available at

All tests in this paper were performed on a Lenovo ThinkCentre with the following configuration: Intel Core i7 vPro, using CPLEX v12 and Z3 v4.8.7.

5.2 Results

Table 2 includes the results obtained without using symmetry breaking strategies. The estimated number of VMs, reported in the second column, has been obtained using the method described in Section 4.1. The list of offers was crawled from the Amazon site. Each list of VM offers covers the main instance types, for example small, medium and large. The lists of offers form a containment hierarchy (i.e. the list of 20 offers is included in the list of 40 offers, etc.).

The tables include only those cases for which we obtained a result within a 40-minute timeframe, as this was the time limit in SMT-COMP'19. The missing values (-) mean that no solution was returned within this timeframe.

One can observe that Z3 scales better than CPLEX; however, none of the tools scales for problems involving a large number of component instances (e.g. at least  instances of the Wordpress component) and a large number of offers (more than a couple of dozen). This is due to the fact that the number of VM offers influences the number of generated constraints, most notably constraints of types (3) and (4).

Problem estimated #VMs #offers=20 #offers= 40 #offers=250 #offers=500
Oryx2 11
Sec. Web
Sec. Billing
min#inst = 3
min#inst = 4
- - - - -
Table 2: Scalability tests for Z3 and CPLEX tools. Time values are expressed in seconds

To overcome the lack of scalability issue, we applied the symmetry breaking strategies described in Section 5.1. The results are presented in Table 3.

Problem #offers=20 #offers=40 #offers=250 #offers=500
Z3 Solver
Oryx2 0.44 0.33 0.49 0.52 0.38 0.45 1.21 1.27 1.34 2.42 1.04 1.47 3.89 10.75 16.53 5.99 2.34 6.97 10.08 39.02 84.41 19.41 9.83 33.70
Sec. Billing
0.12 0.16 0.09 0.29 0.08 0.20 0.30 0.44 0.27 0.52 0.27 0.38 0.80 2.61 0.76 1.58 0.68 1.68 2.30 7.71 1.92 5.95 1.85 2.68
Sec. Web
0.13 0.16 0.12 0.21 0.11 0.20 0.41 0.60 0.37 0.62 0.39 0.53 0.99 3.35 1.52 1.65 1.54 3.86 3.24 6.70 2.37 4.40 3.48 4.69
0.27 0.32 0.31 0.27 0.23 0.44 0.69 1.15 1.09 1.18 0.71 1.29 1.36 7.10 12.57 5.00 2.46 6.13 4.27 29.45 24.36 10.88 8.05 25.49
0.41 0.59 0.58 0.61 0.35 0.61 1.34 2.09 2.73 1.83 1.11 2.63 3.92 13.65 49.84 6.64 3.63 14.89 12.26 54.50 161.00 21.33 9.15 37.99
0.60 0.88 1.11 0.71 0.48 1.06 2.23 2.98 5.84 2.61 1.63 2.91 12.17 18.12 167.88 8.55 7.26 21.46 22.86 77.51 787.07 33.53 15.25 72.55
0.82 1.17 1.40 1.25 0.49 1.16 2.40 4.50 6.11 3.11 1.70 4.39 11.37 24.25 824.55 15.98 6.36 45.78 25.96 117.47 - 45.50 17.31 59.71
1.15 1.68 2.32 1.36 0.75 1.46 4.78 6.05 19.30 4.49 2.47 7.76 23.93 40.70 - 27.24 8.30 43.81 46.05 140.93 - 55.32 28.13 92.17
1.95 2.56 15.25 2.23 0.99 2.56 15.91 13.06 174.09 6.60 4.47 15.45 139.67 107.48 - 71.43 28.00 161.76 260.44 302.54 - 415.67 79.64 303.21
1.97 3.34 23.96 2.63 1.06 3.10 15.62 16.50 202.34 8.00 4.66 17.69 273.28 111.81 - 139.91 33.64 167.59 375.29 396.45 - 421.62 87.92 253.93
5.95 4.30 71.71 3.24 1.50 5.41 62.44 21.93 1008.30 12.79 6.06 26.24 492.09 126.14 - 127.68 34.33 170.95 - 467.52 - 1187.05 240.36 476.49
27.24 5.45 253.62 3.46 1.91 5.97 216.74 28.96 2328.61 24.74 6.76 41.91 2078.36 155.49 - 414.06 70.77 239.72 - 436.16 - - 338.74 671.25
28.43 5.55 239.09 4.67 1.92 7.14 225.30 32.69 - 17.64 8.41 33.03 1755.92 188.86 - 414.70 33.32 245.59 - 690.17 - - 418.80 623.72
CPLEX Solver
Oryx2 0.05 14.06 0.20 1.36 0.03 1.23 0.14 68.22 0.19 5.45 0.06 12.35 5.25 - 60.08 94.82 3.04 925.00 247.11 - - - 27.30 -
Sec. Billing
0.07 1.32 0.05 0.38 0.06 0.11 0.35 5.03 0.23 1.48 0.21 0.42 58.56 - 29.22 234.97 31.98 37.89 - - - - 376.39 -
Sec. Web
0.31 10.53 0.15 2.40 0.12 0.10 3.60 142.85 0.33 12.78 0.30 0.31 347.47 - 74.70 - 68.86 58.08 - - - - 1316.53 1795.03
0.19 85.58 0.89 0.72 0.24 2.37 1.55 - 42.77 7.88 1.97 39.62 346.77 - - 1588.69 1582.11 - - - - - - -
0.32 910.36 1.84 0.84 0.32 4.93 4.48 - - 10.27 3.63 153.73 1252.09 - - - - - - - - - - -
0.50 - 4.05 4.18 0.54 18.63 7.04 - - 45.68 3.58 - - - - - - - - - - - - -
0.67 - 14.73 5.21 0.68 18.49 14.03 - - 36.24 8.76 - - - - - - - - - - - - -
0.39 - 12.05 7.75 1.92 27.22 29.01 - - 124.74 25.04 - - - - - - - - - - - - -
0.54 - 1554.52 7.53 1.77 38.41 30.07 - - 141.97 23.68 - - - - - - - - - - - - -
0.77 - 25.57 7.46 2.40 39.10 17.16 - - 73.49 26.84 - - - - - - - - - - - - -
3.75 - 1908.48 9.57 1.87 58.52 74.72 - - 168.08 43.22 - - - - - - - - - - - - -
2.55 - 98.84 24.54 2.87 109.82 96.82 - - 320.81 29.20 - - - - - - - - - - - - -
4.70 - - 13.97 2.29 63.03 239.28 - - 580.61 44.26 - - - - - - - - - - - - -
Table 3: Efficiency analysis: time to find an optimal solution (in seconds) in case of applying different symmetry breaking strategies

5.3 Discussion

From the results reported in Table 3, we can draw the following remarks:

  1. Using appropriate symmetry breakers and the OMT solver Z3, all problem instances were solved within the established timeframe. On the other hand, CPLEX shows a lack of scalability, even though for small-size problems it reaches solutions in less time.

  2. The best scalability is obtained by combining traditional column-wise symmetry breakers with search space reduction methods which exploit the graph representation associated to the structural constraints specific to each particular application.

Regarding the first remark, one possible explanation of the poorer scalability of CPLEX is that the number of variables increases significantly when the number of offers increases (see Table 4). This is due to the fact that, in a preprocessing step, CPLEX translates the original problem into an MP formulation; in particular, all logical implications are rewritten and auxiliary variables are introduced.

Problem Binary Integer Total Binary Integer Total Binary Integer Total Binary Integer Total
#offers=20 #offers=40 #offers=250 #offers=500
Oryx2 387 63 450 644 63 707 3339 63 3402 6547 63 6610
Sec. Billing
148 29 177 265 29 294 1490 29 1519 2948 29 2977
Sec. Web
178 34 212 318 34 352 1788 34 1822 3538 34 3572
242 46 288 429 46 475 2389 46 2435 4722 46 4768
302 57 359 535 57 592 2985 57 3042 5902 57 5959
362 69 431 642 69 711 3582 69 3651 7082 69 7151
392 75 467 695 75 770 3880 75 3955 7672 75 7747
452 86 538 802 86 888 4477 86 4563 8852 86 8938
511 98 609 908 98 1006 5073 98 5171 10031 98 10129
541 103 644 961 103 1064 5371 103 5474 10621 103 10724
601 115 716 1068 115 1183 5968 115 6083 11801 115 11916
661 126 787 1174 126 1300 6564 126 6690 13014 126 13140
691 132 823 1227 132 1359 6862 132 6994 13621 132 13753
Table 4: Number and types of variables used by CPLEX (explicit decision variables and auxiliary generated variables) - no symmetry breaking case.

Regarding the second remark, we first notice that, when the Z3 solver is used, among the single-criterion strategies the LX one leads to significantly better results than the PR and FV strategies. This is particularly true for large-size problems (i.e. many components and many offers, as is the case of Wordpress). However, since the LX strategy leads to many implication-type constraints (Eq. 12), it has a negative impact in the case of the CPLEX solver.

We observe that the FV strategy scales worse than the other individual strategies in the case of the Wordpress application with a large number of offers and a high number of Wordpress instances. On the other hand, as expected, the FV strategy is effective for problems in which the number of pairwise conflicting components is large and a significant reduction in the number of variables can be obtained (e.g. Secure Billing Email, Secure Web Container - see Table 5). In these cases, combining FV with other strategies (e.g. FVPR, FVLX) does not bring a benefit, because the search space exhibits no or very few symmetries after applying FV, and the additional symmetry breaking constraints only increase the computational burden.

As a column-wise symmetry breaker, the PR strategy has the disadvantage of not being able to break all symmetries (as the VMs used in the deployment might have the same price), but the advantage of leading to simpler, thus less computationally costly, symmetry breaking constraints. Therefore, in the case of the CPLEX solver, the PR strategy leads to better results than the LX strategy. On the other hand, in the case of the Z3 solver, LX scales better than PR, because it turns out that only a few types of VMs are typically used for deployment, hence ordering based on price does not break many symmetries.

Out of the three combined strategies (PRLX, FVPR, FVLX), the one leading consistently to the best results is FVPR, especially for large-size problem instances. Even in the cases when the CPLEX solver did not reach a solution in the established timeframe for PR and FV, the combined strategy obtained good performance. This is because fixing the values of variables is equivalent to their elimination.

Although one would expect that FVLX outperforms PRLX as variables are fixed, this is the case only for Wordpress with at least deployed instances. In fact, the benefit of combining FV and LX, even when compared with LX strategy, can be observed only for the problem instances mentioned above.

#fixed values
Oryx2 6 110 5% 11 6