Virtual Machines (VM) are a widely used building block of workload management and deployment. They are heavily used in both traditional data center environments and clouds. A hypervisor is usually used to manage all virtual machines on a physical machine. This virtualization technology is quite mature now and can provide good performance and security isolation among VM instances. The isolation among VMs is strong and an individual VM has no awareness of other VMs running on the same physical machine (PM). However, for applications that require higher flexibility at runtime and less isolation, hypervisor based virtualization might not satisfy the entire set of quality of service (QoS) requirements.
Recently there has been increasing interest in container technology, which provides a more lightweight mechanism than VMs by way of operating-system-level virtualization. A container runs on a shared kernel with performance isolation and allocation characteristics similar to those of VMs, but without the expensive VM runtime management overhead. Containerization of applications, that is, the deployment of an application or its components in containers, has become popular in the cloud service industry [3, 4, 5]. For example, Google, Amazon, eBay and Netflix provide many of their popular services through container-based clouds.
A popular containerization engine is Docker, which wraps an application in a complete filesystem that contains everything required to run: code, runtime, system tools, system libraries – anything that can be installed on a VM. This guarantees that the application will always run the same, regardless of its run-time environment. Container-based services are popularly known as microservices (while VM-based services are referred to as macroservices) and are being leveraged by many service providers for a number of reasons: (a) reduced complexity for small services; (b) easy and fast scalability and deployment of applications; (c) improved flexibility through the use of different tools and frameworks; (d) enhanced system reliability. Containers have empowered the use of microservice architectures by being lightweight, providing fast start-up times, and having low overhead. Containers can be used to develop monolithic architectures, where the whole system runs inside a single container, or clustered architectures, where a combination of containers is used.
A flexible computing model combines Infrastructure-as-a-Service (IaaS) with container-based Platform-as-a-Service (PaaS). Leveraging both containers and VMs brings the benefits of both technologies to all stakeholders: the strong isolation of VMs for security and privacy, and the flexibility of containers for performance and management. Platforms such as Tutum, Kubernetes, Nirmata and others [12, 13] offer services for managing microservice environments made of containers while relying on public or private IaaS clouds as the backend resource providers. Application service providers who plan to migrate to a microservice platform need to know how their QoS will behave, particularly at scale. Service availability and service response time are two of the top QoS metrics. Quantifying and characterizing such quality measures requires accurate modeling and coverage of a large parameter space, using models that are tractable and can be solved in a timely manner to assist with runtime decisions. Scalability is a key factor in achieving availability and responsiveness of a distributed application in times of workload fluctuation and component failure. In other words, the way and the time in which an application can provision and deprovision resources determine its service response time and availability.
In this work, we build a microservice platform using the most widely used technology, i.e., Docker, and then leverage this platform to design a tunable analytical performance model that provides what-if analysis and capacity planning at scale. Both microservice platform providers and microservice application owners may leverage the performance model to measure the quality of their elasticity in terms of provisioning and deprovisioning resources. The proposed performance model is comprehensive in that it models both the microservice and the macroservice layers through separate but interacting sub-models. The performance model supports a high degree of virtualization at both layers (i.e., multiple VMs on a PM and multiple containers on a VM), which reflects real use-case scenarios in today's microservice platforms. A preliminary version of this work appeared as a conference paper, which modeled only the microservice layer and treated the back-end IaaS cloud as a black box. In this work, however, we propose a comprehensive fine-grained performance model that captures the details of both layers and their interactions; we discuss its scalability and tractability and also provide an algorithm that presents the internal logic of the performance model.
The rest of the paper is organized as follows. Section 2 describes the virtualization techniques and the new trend of emerging microservices. Section 3 describes the system that we are going to model in this work. Section 4 introduces the stochastic sub-models and their interactions. Section 5 presents the experiments and numerical results obtained from the analytical model. In section 6, we survey related work in cloud performance analysis, and finally, section 7 summarizes our findings and concludes the paper.
2 Virtualization and Microservices
The concept of virtualization was created with the main goal of increasing the efficient use of computing resources. With the advent of cloud computing, virtualization has received even more attention for managing cloud data centers. Cloud providers essentially employ two main components, cloud management systems and hypervisors, to manage their physical stack of resources and make it accessible to a large number of users. Hypervisors provide a well-defined logical view of the physical resources of a single physical machine (PM). Cloud management systems, on the other hand, provide a single management interface to the whole data center for both service providers and cloud users. In this way, cloud resources are provided to users as virtual machines (VMs), highly isolated environments with a full operating system. The most popular hypervisors are VMware, KVM, Xen and Hyper-V; popular open source cloud management systems include OpenStack, CloudStack, and Eucalyptus. Hypervisor-based virtualization provides the most strongly isolated virtual environment. However, the virtualization overhead is high, as each VM has to run its own kernel as the guest OS. Moreover, VM resources are mostly underutilized, as each VM usually hosts one application. These limitations of VM virtualization led to the development of Linux containers, in which only the resources required by applications are used, avoiding the overhead of the redundant virtualized operating systems found in VMs.
A Linux container takes a different approach from a hypervisor and can be used as an alternative to, or a complement for, hypervisor-based virtualization. Containerization is an approach in which many processes run in an isolated fashion. It uses only one kernel (i.e., OS) to create multiple isolated environments. Containers are very lightweight because they do not virtualize the hardware; instead, all containers on a physical host use the single host kernel efficiently via process isolation. Containers are highly portable: applications can be bundled into a single unit and deployed in various environments without making any changes to the container. Because of the standard container format, developers only have to worry about the applications running inside the container, while system administrators take care of deploying the container onto the servers. This well-segregated container management leads to faster application delivery. Moreover, building new containers is fast, because containers are very lightweight and a new container can be built in seconds. This in turn reduces the time for DevOps, including development, test, deployment and run-time operations. All in all, containers not only improve the software development life cycle in the cloud significantly, but can also provide better QoS compared to non-containerized cloud applications. Several management tools are available for Linux containers, including LXC, lmctfy, Warden, and Docker.
Recently, a pattern has been adopted by many software-as-a-service providers in which both VMs and containers are leveraged to provide so-called microservices. Microservices is an approach that allows more complex applications to be configured from basic building blocks, where each building block is deployed in a container and the constituent containers are linked together to form the cohesive application. The application's functionality can then be scaled by deploying more containers of the appropriate building blocks rather than entire new instances of the full application. Microservice platforms (MSPs) such as Nirmata, Docker Cloud and Google Kubernetes facilitate the management of this service paradigm. MSPs automate the deployment, scaling, and operation of application containers across clusters of physical machines in the cloud. MSPs enable software-as-a-service providers to respond quickly and efficiently to customer demand by scaling applications on demand, seamlessly rolling out new features, and optimizing hardware usage by using only the resources that are needed.
Fig. 1 depicts the high-level architecture of MSPs and the way they leverage the backend public or private cloud (ie, infrastructure-as-a-service clouds).
3 System Description
In this section we describe the system under modeling with respect to Fig. 2, which has been derived from the conceptual model shown in Fig. 1. First we describe the microservice platform (MSP), in which users request containers. In the MSP, a request may originate from two sources: first, direct requests from users who want to deploy a new application or service; second, runtime requests, either directly from users or from applications (e.g., adaptive applications), by which applications adapt to runtime conditions, for example by scaling up the number of containers to cope with a traffic peak.
Now consider a user, e.g., John, who wants to provide a new service for his customers. He logs in to his MSP and creates a Host Group with 2 initial VMs for the new service; he sets the maximum size of the host group to 5 VMs and allows each VM to run up to 4 containers. In other words, his host group can scale up to 5 VMs that can accommodate up to 20 containers. He requests 6 containers for the initial setup and deploys 3 containers on each VM. Now John has 2 active VMs, each running 3 containers. His application is adaptive, so that in case of high traffic it scales up by adding one container at a time. Consider a situation in which John's application adds one more container due to high traffic. At this point, one VM is working at full capacity (i.e., running 4 containers) and the other has capacity for only one more container. Since John has configured his cluster application to grow up to 5 VMs, the MSP requests a new VM from the macroservice provider (i.e., IaaS) due to the high utilization in the host group. John's host group now has 3 active VMs and can accommodate 5 more containers. Now consider the reverse scenario, in which traffic is low and the application releases a few containers over some period of time. If the host group utilization reaches a predefined low value, the MSP releases one VM. We capture all these events for every single user in a continuous-time Markov chain that is described in Section 4.1.
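The arithmetic in John's scenario can be checked with a few lines of code. The following is a minimal sketch only; the class name `HostGroup` and its fields are hypothetical names introduced here for illustration and are not part of the platform described in the paper.

```python
from dataclasses import dataclass


@dataclass
class HostGroup:
    """Hypothetical bookkeeping for one user's host group (illustration only)."""
    active_vms: int          # VMs currently running
    max_vms: int             # upper bound on host group size
    containers_per_vm: int   # maximum containers a single VM may run
    running: int = 0         # containers currently running

    @property
    def max_containers(self) -> int:
        # Capacity when the host group is fully scaled up.
        return self.max_vms * self.containers_per_vm

    @property
    def free_slots(self) -> int:
        # Container slots still available on the active VMs.
        return self.active_vms * self.containers_per_vm - self.running

    @property
    def utilization(self) -> float:
        # Ratio of running containers to slots on the active VMs.
        return self.running / (self.active_vms * self.containers_per_vm)
```

With 2 active VMs, 4 containers per VM and 7 running containers, `free_slots` is 1 and `utilization` is 0.875; after a third VM is added, `free_slots` becomes 5, matching the numbers in the scenario.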
The steps incurred in servicing a request in the MSP are shown in the upper part of Fig. 2. User requests for containers are submitted to a global finite queue and then processed on a first-come, first-served (FCFS) basis. A request that finds the queue full is rejected immediately. The request is also rejected if the application has already reached its capacity (in John's scenario, for example, if the host group already has 5 running VMs, each of which is running 4 containers). Once the request is admitted to the queue, it must wait until the VM Assigning Module (VMAM) processes it. The VMAM finds the most appropriate VM in the user's Host Group and then sends the request to that VM, so that the Container Provisioning Module (CPM) initiates and deploys the container (second delay). When a request is processed in the CPM, a pre-built or customized container image is used to create a container instance. These images can be loaded from a public repository such as Docker Hub or from private repositories.
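The admission rule described above (reject when the global queue is full or when the user is already at capacity, otherwise serve in arrival order) can be sketched as follows. This is an illustrative sketch under stated assumptions: the class `GlobalQueue` and the flag `user_at_capacity` are hypothetical names, and the VM assignment and container provisioning steps are not modeled.

```python
from collections import deque


class GlobalQueue:
    """Hypothetical finite FCFS queue with the two rejection rules from the text."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = deque()

    def offer(self, request, user_at_capacity: bool = False) -> bool:
        # Reject if the user has no capacity left or the queue is full.
        if user_at_capacity or len(self._items) >= self.capacity:
            return False
        self._items.append(request)
        return True

    def poll(self):
        # Hand the oldest admitted request to the next stage (FCFS).
        return self._items.popleft() if self._items else None
```

A request is therefore lost in exactly two situations, which correspond to the two rejection conditions in the description; all admitted requests leave the queue in the order they arrived.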
In the macroservice infrastructure (lower part of Fig. 2), when a request is processed, a pre-built or customized disk image is used to create a VM instance. In this work we assume that pre-built images fulfill all user requirements and that PMs and VMs are homogeneous. Each PM has a Virtual Machine Monitor (VMM), also known as a hypervisor, that configures and deploys VMs on the PM. In this work we allow users to submit a request for a single VM at a time. Two types of requests arrive at the macroservice infrastructure: the first type comes from cloud users, who are referred to as macroservice users in Fig. 2; the second type comes from MSPs, through which users or applications ask for VMs to deploy or scale their containers. Self-adaptive applications in an MSP might autonomously and directly request VMs from the backend cloud. Since the number of users and applications in an MSP is relatively high, and each submits requests for new VMs with low probability, the request process can be modeled as a Poisson process [29, 30]. The same holds for the external request processes at both the MSP and the backend cloud.
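The Poisson assumption rests on the classical result that the number of requests from many independent users, each submitting with small probability, is close to Poisson distributed. The sketch below illustrates this numerically by comparing a binomial count with its Poisson limit; the function names and the example values of `n` and `p` are assumptions chosen for illustration, not parameters from the paper.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    # Probability that exactly k of n independent users submit a request.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    # Poisson probability of k requests with mean lam.
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n: int, p: float, k_max: int = 20) -> float:
    # Largest pointwise difference between the two distributions for k <= k_max,
    # with the Poisson mean matched to the binomial mean n * p.
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))
```

For a fixed mean `n * p`, the gap shrinks as the number of users grows and the per-user probability falls, which is the regime the text describes.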
The steps incurred in servicing a request in the Macroservice Infrastructure (MSI) are shown in the lower part of Fig. 2. User requests are submitted to a global finite queue, and the scheduler then processes them on an FCFS basis. A request that finds the queue full is rejected immediately. Once the request is admitted to the queue, it must wait until the scheduler in the Physical Machine Assigning Module (PMAM) processes it (first delay). Once the request is assigned to one of the PMs, it has to wait in that PM's input queue (second delay) until the VM Provisioning Module (VMPM) instantiates and deploys the necessary VM (third delay); then the actual service starts. When a running request finishes, the capacity used by the corresponding VM is released and becomes available for servicing the next request.
|MPM||Microservice Platform Module|
|Mean arrival rate to CSM|
|Mean time that takes to obtain a VM|
|Mean time that takes to release a VM|
|Min number of active VM in user’s Host Group|
|Max size of the user’s host group (VMs)|
|Max number of containers running on a VM|
|Mean value of container lifetime|
|The rate by which a container can be instantiated|
|Utilization in the user’s Host Group|
|Microservice global queue size, =|
|Blocking probability in the microservice global queue|
|Arrival rate of requests for VMs originated from CSM|
|VM release rate imposed by CSM|
|Mean waiting time in microservice global queue|
|PMAM||Physical Machine Assigning Module|
|PMSM||Physical Machine Sub-Model|
|Mean value of external request to PMSM|
|Aggregate arrival rate to PMSM that is equal to|
|Blocking probability due to lack of room in the macroservice global queue|
|Blocking probability due to lack of capacity in the cloud center|
|Size of global input queues|
|Mean look up time in the pool|
|Total probability of blocking ()|
|Probability of successful search in pool|
|Mean waiting time in macroservice global queue|
|Mean look up delay among pools|
|Mean waiting time in a PM queue|
|VMPM||Virtual Machine Provisioning Module|
|VMSM||Virtual Machine Sub-Model|
|Max number of VMs that a PM is set to host; is also the queue size in each PM|
|Mean value of VM lifetime|
|The rate by which a VM can be instantiated|
|Arrival rate to a PM in the pool|
|Number of PMs in the servers pool|
|Mean VM provisioning time|
|Mean total delay imposed on requests in macroservice infrastructure|
To model the behavior of this system, we design three stochastic sub-models: one captures the details of the microservice platform and is called the Microservice Platform Module (MPM), and the other two capture the details of the macroservice infrastructure, namely the Physical Machine Assigning Module (PMAM) and the Virtual Machine Provisioning Module (VMPM). We implement each module with a stochastic sub-model. We combine all three stochastic sub-models and build an overall interactive performance model. Then we solve this model to compute the cloud performance metrics: request rejection probability, probability of immediate service, and mean response delay, as functions of variations in workload (request arrival rate), container lifetime, user quota (i.e., host group size), number of containers per VM, and system capacity (i.e., number of PMs in the cloud center). We describe our analysis in detail in the following sections, using the symbols and acronyms listed in Tables I and II.
4 Layered Analytical Model
In this paper, we implement the sub-models as interacting Continuous Time Markov Chains (CTMCs). The sub-models interact in that the output of one model is an input to the others and vice versa. Table III shows the modules and their corresponding stochastic sub-models, which are discussed in detail in the following sections.
4.1 Microservice Platform Module
The container allocation process in the microservice platform is described by the Container Sub-Model (CSM) shown in Fig. 3. CSM is a 3-dimensional CTMC whose states are labeled by a triple: the first component indicates the number of requests in the Microservice Global Queue, the second denotes the number of running containers in the platform, and the third shows the number of active VMs in the user's Host Group. Being a CTMC, the model assumes that all inter-event periods are exponentially distributed. Each VM can accommodate up to a maximum number of containers that is set by the user. Since the number of potential users is high and a single user typically submits a request at any given time with low probability, request arrivals can be adequately modeled as a Poisson process. The time to deploy a container on a VM and the container lifetime are both exponentially distributed, so the total service rate for each VM is the product of the number of running containers and the per-container completion rate (the reciprocal of the mean container lifetime). The MSP obtains and releases VMs at two further rates. CSM asks for a new VM from the backend cloud (i.e., the macroservice infrastructure) when explicitly ordered by the MSP user or when the utilization of the host group is equal to or greater than a predefined value. For a given state, utilization is defined as the number of running containers divided by the total container capacity of the active VMs, that is, the number of active VMs multiplied by the maximum number of containers that can run on a single VM. Utilization thus indicates the ratio of active containers to all available container slots in each state.
On the other hand, if utilization drops below a predefined value, the CSM releases one VM to reduce cost. A VM can be released only if no containers are running on it, so the VM must be fully decommissioned in advance. The CSM also holds a minimum number of VMs in the Host Group regardless of utilization, in order to maintain the availability of the service. The user may also set a maximum host group size for the application(s), so that the MSP cannot request more than this number of VMs from the macroservice infrastructure on behalf of the user. Thus the application scales up to at most the maximum number of VMs under high traffic and scales down to the minimum number of VMs in times of low utilization. We set the global queue size to the total number of containers that the host group can accommodate at full capacity, that is, the maximum number of VMs multiplied by the maximum number of containers per VM. Note that a request is blocked if the user has reached its capacity, regardless of the state of the Global Queue.
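The utilization definition and the sizing rules above can be written compactly as follows. The symbols are assumptions introduced here for readability, since the original notation is not recoverable from this text: $(i, j, k)$ for a state (queued requests, running containers, active VMs), $M$ for the maximum number of containers per VM, $s$ and $S$ for the minimum and maximum host group size, and $L_q$ for the global queue size.

```latex
u(i,j,k) = \frac{j}{k \cdot M},
\qquad s \le k \le S,
\qquad L_q = S \cdot M .
```

Under these assumed symbols, a scale-up is triggered when $u$ reaches the upper threshold and $k < S$, and a scale-down when $u$ falls to the lower threshold and $k > s$.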
The initial state has no request in the queue, no running container, and a Host Group consisting of the minimum number of VMs that the user maintains. From an arbitrary state, five transitions may happen:
Upon arrival of a new request, the system moves at the arrival rate to the state with one more queued request if the user still has capacity; otherwise the request is blocked and the system stays in the current state.
The CSM instantiates a container, at the instantiation rate, for the request at the head of the Global Queue, and the system moves to the state with one fewer queued request and one more running container.
The lifetime of a container is exponentially distributed; when a container finishes, at a rate proportional to the number of running containers, the system moves to the state with one fewer running container.
If the utilization rises above the threshold, the MSP requests a new VM, and the system moves at the VM acquisition rate to the state with one more active VM.
Or, if the utilization drops below a certain value, the MSP decommissions a VM and the system releases the idle VM, moving at the VM release rate to the state with one fewer active VM.
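The five transitions listed above can be sketched as a function that maps a state to its outgoing transitions. This is a minimal sketch, not the authors' implementation: all parameter names (`lam`, `phi`, `mu`, `alpha`, `beta`, `M`, `s`, `S`, `Lq`, `u_high`, `u_low`) and the example values are hypothetical, and the exact guard conditions (for instance, that the remaining VMs must be able to hold all running containers before a release) are assumptions.

```python
# Hypothetical example parameters (not taken from the paper).
EXAMPLE_PARAMS = dict(
    lam=2.0,     # request arrival rate
    phi=10.0,    # container instantiation rate
    mu=0.5,      # per-container completion rate (1 / mean lifetime)
    alpha=1.0,   # rate of obtaining a VM
    beta=1.0,    # rate of releasing a VM
    M=4, s=2, S=5, Lq=20,
    u_high=0.8, u_low=0.3,
)


def csm_transitions(state, p):
    """Return {next_state: rate} for a CSM-like state (i, j, k)."""
    i, j, k = state  # queued requests, running containers, active VMs
    out = {}
    # 1. Arrival: admitted only if the queue has room and the user has capacity.
    if i < p["Lq"] and j < p["S"] * p["M"]:
        out[(i + 1, j, k)] = p["lam"]
    # 2. Instantiation: head-of-queue request gets a container if a slot is free.
    if i > 0 and j < k * p["M"]:
        out[(i - 1, j + 1, k)] = p["phi"]
    # 3. Completion: each of the j running containers ends at rate mu.
    if j > 0:
        out[(i, j - 1, k)] = j * p["mu"]
    u = j / (k * p["M"])
    # 4. Scale up: high utilization and the host group may still grow.
    if u >= p["u_high"] and k < p["S"]:
        out[(i, j, k + 1)] = p["alpha"]
    # 5. Scale down: low utilization, above the minimum, and one VM can be idle.
    if u <= p["u_low"] and k > p["s"] and j <= (k - 1) * p["M"]:
        out[(i, j, k - 1)] = p["beta"]
    return out
```

Enumerating these transitions over all reachable states yields the generator matrix of the chain, from which steady-state probabilities can be computed numerically.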
Note that the CSM (depicted in Fig. 3) models only one user (or, equivalently, one application); in this work we assume homogeneous users, so we only need to solve one CSM regardless of the number of users at the MSP. Suppose that the steady-state probability of the CSM (Fig. 3) being in each state has been calculated. The blocking probability in the CSM can then be calculated as follows,
We are also interested in the two probabilities with which the MSP requests or releases a VM.
Using these probabilities, the rates at which the microservice platform requests or releases a VM can be calculated.
In order to calculate the mean waiting time in queue, we first calculate the number of requests in the queue as
Applying Little’s law , the mean waiting time in the global queue is given by:
The container instantiation time is then added to the mean waiting time in the global queue to obtain the total delay in the microservice platform.
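The chain of steps used here (steady-state probabilities, then blocking probability and mean queue length, then Little's law) can be illustrated on a much simpler model. The sketch below uses a single-server finite queue (M/M/1/K) as a stand-in, not the CSM itself; the function names are hypothetical, and the effective arrival rate is taken as the offered rate multiplied by one minus the blocking probability.

```python
def mm1k_steady_state(lam: float, mu: float, K: int) -> list:
    # Steady-state probabilities of an M/M/1/K queue (K = system capacity,
    # including the request in service); pi[n] is proportional to (lam/mu)**n.
    rho = lam / mu
    weights = [rho**n for n in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_wait_in_queue(lam: float, mu: float, K: int):
    """Return (mean waiting time in queue, blocking probability)."""
    pi = mm1k_steady_state(lam, mu, K)
    blocking = pi[K]  # an arrival that finds the system full is lost
    # Mean number waiting: n - 1 requests wait when n are in the system.
    mean_queue = sum((n - 1) * pi[n] for n in range(2, K + 1))
    # Little's law with the rate of admitted (non-blocked) requests.
    return mean_queue / (lam * (1.0 - blocking)), blocking
```

The same pattern applies to the sub-models in this paper, with the steady-state vector obtained from the corresponding CTMC instead of the closed form used here.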
CSM interacts with both the physical machine provisioning sub-model (PMSM, described in Section 4.2.1) and the virtual machine provisioning sub-model (VMSM, described in Section 4.2.2). The VM request rate and the VM release rate computed by CSM are used by PMSM and VMSM, respectively. The interactions among sub-models are explained fully in Section 4.3.
4.2 Macroservice Infrastructure Model
4.2.1 PM Provisioning Sub-Model
The resource allocation process is described by the Physical Machine Sub-Model (PMSM) shown in Fig. 4. PMSM is a 2-dimensional CTMC (i.e., inter-event epochs are exponentially distributed) that records the number of requests in the global queue and the latest state of provisioning: one value of the second component indicates that the last provisioning was successful, while the other shows that the last provisioning failed. The model is parameterized by the mean arrival rate and the global queue size. One more request can be at the deployment unit for provisioning; thus, the capacity of the system is the global queue size plus one.
Let be the success probability of finding a PM that can accept the current request in the pool. We assume that , is the mean look up delay for finding an appropriate PM in the pool that can accommodate the request. Upon arrival of first request, system moves to state . Afterwards, depending on the upcoming event, three possible transitions can occur:
Another request has arrived and system transits to state with rate . Note that this request might arrive from two sources; first directly from macroservice users or from microservice platform due to over utilization. See the inputs to the global queue in the lower part of Fig. 2.
Or, a PM in the pool accepts the request so that system moves back to state with rate .
Or, none of the PMs in the pool can accept the request so that the system moves to state with rate . This way, the scheduler gives another chance to the request for provisioning; if the second attempt doesn’t go through, the request will be rejected due to lack of capacity. From states , for two moves are possible: 1) the system goes to if the provisioning was successful or 2) moves to if the provisioning has been failed for the second time. At state the system moves back to regardless of the previous provisioning state. Note that the number of retry attempts, here is set to 2, can be adjusted in the cloud service controller .
In this sub-model the look up rate () and macroservice users’ request are exogenous parameters, microservice users’ requests () is calculated from CSM and success probability () is calculated from the VMSM that will be discussed in next section.
Using steady-state probabilities , blocking probability can be calculated.
requests may experience two kinds of blocking events:
Blocking due to a full global queue occurs with the probability of
Blocking due to insufficient resources (PMs) at server pools occurs with the probability of 
Eq. 10 is the ratio of aggregate rates by which the system blocks requests due to insufficient resources to all other rates over all states. The probability of reject is, then, . In order to calculate the mean waiting time in queue, we first establish the probability generating function (PGF) for the number of requests in the queue, as
The mean number of requests in queue is
Applying Little’s law , the mean waiting time in the global queue is given by (first delay):
Look up time for appropriate PM can be considered as a Coxian distribution with 2 steps (Fig. 5).
Therefore according to , the look-up time (second delay) can be calculated as follows.
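As a hedged reconstruction of the mean of such a two-step Coxian, assume that both steps share the same look-up rate and that the second step is entered only when the first attempt fails. The symbols are assumptions introduced here: $\alpha$ for the look-up rate and $P_s$ for the probability of a successful search in the pool.

```latex
\overline{lut}
  = \frac{1}{\alpha} + (1 - P_s)\,\frac{1}{\alpha}
  = \frac{2 - P_s}{\alpha} .
```

When the first attempt always succeeds ($P_s = 1$) this reduces to a single look-up of mean $1/\alpha$, and when it never succeeds ($P_s = 0$) both steps are always traversed.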
4.2.2 VM Provisioning Sub-Model
Virtual Machine provisioning Sub-Model (VMSM) captures the instantiation, deployment and provisioning of VMs on a PM. VMSM also incorporates the actual servicing of each request (VM) on a PM. Fig. 6 shows the VMSM (a CTMC) for a single PM in the servers’ pool. As we assume homogeneous PMs, all VMSMs for the PMs are identical in terms of arrival and instantiation rates. Consequently, the server pool is modelled with a set of identical VMSMs so that we only need to solve one of them.
Each state in Fig. 6 is labeled by a triple: the first component indicates the number of requests in the PM's queue, the second denotes the number of requests under provisioning by the hypervisor, and the third is the number of VMs already deployed on the PM. Note that we set the queue size at each PM to the maximum number of VMs that can be deployed on a PM. The hypervisor can deploy only one VM at a time to the PM, so the second component is 0 when the instantiation unit is idle and 1 when it is deploying a VM. VM deployment on a PM proceeds at the instantiation rate, and each VM is served at the per-VM service rate, so the total service rate for each PM is the product of the number of running VMs and the per-VM service rate. Note that in VMSM a VM may be released due to two events: first, when the service time of the VM finishes, and second, when the microservice platform explicitly asks for termination of that VM (see Fig. 6).
State indicates that the PM is empty and there is no request either in the queue or in the instantiation unit. Upon arriving a request, model transits to state . The arrival rate to each PM is given by:
in which is the number of PMs in the pool. Note that , used in (9), is obtained from PMSM. The state transition in VMSM can occur due to request arrival, VM instantiation or service completion. From state , system can move to state with rate . From , system can transit to with rate (i.e., instantiation rate) or upon arriving a request moves to state . From , system can move to with rate , transits to with rate (i.e., service completion or explicit request from microservice platform), or again upon arriving a request, system can move to state with rate .
Suppose that is the steady-state probability for the PM model (Fig. 6) to be in the state . Using steady-state probabilities, we can obtain the probability that at least one PM in the pool can accept the request for provisioning. At first we need to compute the probability that a PM cannot admit a request for provisioning ():
Therefore, probability of successful provisioning () in the pool can be obtained as
Note that is used as an input parameter in the PMSM (Fig. 4).
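The two pool-level quantities used in this sub-section can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact equations: it assumes that admitted requests are spread evenly over the identical PMs, and that PMs reject independently with the same probability, so the pool fails only if every PM fails. The function and parameter names are hypothetical.

```python
def per_pm_arrival_rate(aggregate_rate: float, queue_blocking: float,
                        n_pms: int) -> float:
    # Requests that pass the global queue are split evenly over n_pms PMs.
    return aggregate_rate * (1.0 - queue_blocking) / n_pms


def pool_success_probability(p_not_admit: float, n_pms: int) -> float:
    # At least one PM accepts unless all n_pms PMs cannot admit the request.
    return 1.0 - p_not_admit ** n_pms
```

Even a fairly high per-PM rejection probability yields a high pool-level success probability once the pool is large, which is why only one VMSM needs to be solved and then raised to the pool size.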
From VMSM, we can also obtain the mean waiting time at PM’s queue (third delay: ) and mean provisioning time (fourth delay: ) by using the same approach as the one that led to Eq. (13). As a result, the total delay in macroservice infrastructure before having VM ready for service, is given by:
Using Eq. 18 we can calculate and rates that are input parameters for container sub-model (CSM). We assume that, without loss of generality, the amount of time that takes to obtain a VM is equal to the time that is needed to release the VM.
4.3 Interaction among Sub-Models
The interactions among sub-models are depicted in Fig. 7 and described in Table IV. From the container sub-model (CSM), the request rate for new VMs and the release rate of VMs can be calculated (see Eq. 5); these are inputs to the PM assigning sub-model (PMSM) and the VM provisioning sub-model (VMSM), respectively. From the macroservice model, which includes both PMSM and the VMSMs, the times needed for CSM to obtain or release a VM are calculated; these two are input parameters for CSM. The VMSMs compute the steady-state success probability that at least one PM in the pool can accept the request. This success probability is used as an input parameter to the PMSM. The PMSM computes the blocking probability, which is an input parameter to the VM provisioning sub-model. Since the VM acquisition and release times are computed using both PMSM and VMSM, we show them as outputs of the MSI in Fig. 7 as well as in Table IV.
|Model or Sub-model||Input(s)||Output(s)|
As can be seen, there is an inter-dependency among the sub-models. This cyclic dependency is resolved via a fixed-point iterative method using a modified version of the successive substitution approach.
For numerical experiments, the successive substitution method (Algorithm 1) is continued until the difference between the previous and current values of the blocking probability in the global queues of both CSM and PMSM is less than the tolerance max_err. The integrated model usually converges to the solution in fewer than 15 iterations for each while-loop. Once Algorithm 1 has converged, the performance metrics are calculated.
It can be shown that the fixed-point variables can be expressed as functions of one another. For this reason, before starting the fixed-point iteration, we guess an initial value for one of them and check its values in successive iterations to determine convergence of the inner loop. The same procedure is followed for the outer loop, until the fixed-point iteration finally converges. The fixed-point iteration starts by calling the function CSM(·) to obtain the steady-state solution of the container sub-model. The output of CSM(·) is the steady-state quantity for the microservice global queue that governs whether a job is blocked in the microservice platform. Function PMSM(·) uses this updated value as an input parameter and solves the PMSM sub-model for its steady-state solution. The inner loop then iterates until a steady-state solution for the macroservice model is obtained. Then the VM acquisition and release rates are calculated using Eq. 20. The steady-state value for the microservice platform queue is obtained and compared to the previous value. If the difference is less than the threshold, the algorithm has converged and the final steady-state values are calculated. A proof of the existence of a solution for this method in general has been detailed in .
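The successive substitution idea can be sketched generically: feed the output of one model into the other and repeat until the tracked variable changes by less than `max_err`. The sketch below is not Algorithm 1; it shows a single loop, whereas the paper describes nested inner and outer loops, and the two toy "sub-models" are made-up contraction maps used only so that the example runs.

```python
def fixed_point(update, x0: float, max_err: float = 1e-6, max_iter: int = 100):
    """Iterate x <- update(x) until successive values differ by < max_err.

    Returns (value, iterations). `update` stands in for one full pass
    through the coupled sub-models.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        x_next = update(x)
        if abs(x_next - x) < max_err:
            return x_next, iteration
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


# Toy stand-ins for two coupled sub-models (hypothetical, for illustration).
def toy_model_a(y: float) -> float:
    return 0.5 * y + 0.1


def toy_model_b(x: float) -> float:
    return 0.4 * x + 0.2
```

For the toy maps, `fixed_point(lambda x: toy_model_a(toy_model_b(x)), 0.0)` converges to a value close to 0.25; in the integrated model, `update` would instead solve the sub-models in turn and return the new blocking probability.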
4.4 Scalability and flexibility of integrated model
Both the macroservice infrastructure and the microservice platform may scale up or down in terms of global queue sizes, number of PMs in the pool, number of VMs in the user quota, and the degree of virtualization (at both the VM and the container level), which are referred to as design parameters. Table V shows the relationship between the number of states in each sub-model and the associated design parameters.
|Sub-Model||Design Parameters||No. of States: f(i)|
|CSM|| ||f(,,) = -|
|PMSM|| ||f() = 2 + 1|
|VMSM||m||f(m) = 3, if m = 1|
| || ||f(m) = 6, if m = 2|
| || ||f(m) = , if m > 2|
As can be seen from Table V, the number of states in each sub-model has a linear or polynomial dependency on the design parameters, which guarantees the scalability of the integrated model. Note that the VMSMs are identical for all PMs in the pool. Therefore, we only need to solve one VMSM regardless of the number of PMs in the macroservice infrastructure.
5 Numerical Validation
In this section we first describe our experimental setup on Amazon EC2 cloud; then we discuss the results and insights that we obtained from the real implementation and deployment of our microservice platform. Next the analytical model will be tuned and validated against experimental results. Finally, we leverage the analytical model to study and investigate the performance of microservice platform at large scale for various configuration and parameter settings.
5.1 Experimental Setup
Here, we present our microservice platform and discuss the experiments that we have performed on it. For the experiments we could not use available third-party platforms such as Docker Cloud or Nirmata, as we needed full control of the platform for monitoring, parameter setting, and performance measurement. As a result, we have created a microservice platform from scratch based on the conceptual architecture presented in Fig. 1. We employed Docker Swarm as the cluster management system, Docker as the container engine and Amazon EC2 as the backend public cloud. We developed the front-end of the microservice platform in Java; it interacts with the cluster management system (i.e., Swarm) through REST APIs. The microservice platform starts with three VMs, two configured in worker mode and one in master mode to manage the Docker Swarm cluster. All VMs are of type m3.medium (1 virtual CPU with 3.5 GB memory). In our deployment, we have used Consul as the discovery service, which has been installed on the Swarm Manager VM.
For the containers, we have used the Ubuntu 16.04 image available on Docker Hub. Each running container was restricted to 512 MB of memory, which gives each VM a capacity of 7 containers. The Swarm manager strategy for distributing containers on worker nodes was binpack. The advantage of this strategy is that fewer VMs are needed, since Swarm attempts to put as many containers as possible on the current VMs before using another VM. Table VI presents the input values for the experiments.
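The capacity figure and the packing behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, and the placement rule counts containers per VM, whereas Swarm's actual binpack strategy ranks nodes by reserved resources.

```python
from typing import List, Optional

VM_MEMORY_MB = 3.5 * 1024      # m3.medium memory as stated in the text
CONTAINER_LIMIT_MB = 512       # per-container memory limit


def containers_per_vm(vm_memory_mb: float, limit_mb: float) -> int:
    """Number of memory-limited containers that fit on one VM."""
    return int(vm_memory_mb // limit_mb)


def binpack_place(loads: List[int], capacity: int) -> Optional[int]:
    """Pick the most loaded VM that still has a free slot (simplified binpack).

    Returns the VM index, or None when every VM is full.
    """
    candidates = [i for i, n in enumerate(loads) if n < capacity]
    if not candidates:
        return None
    return max(candidates, key=lambda i: loads[i])


capacity = containers_per_vm(VM_MEMORY_MB, CONTAINER_LIMIT_MB)

# Place 9 containers on the two initial worker VMs.
loads = [0, 0]
for _ in range(9):
    target = binpack_place(loads, capacity)
    if target is None:
        break  # cluster full; a real platform would queue or scale out
    loads[target] += 1

print(capacity, loads)
```

With this rule the first VM is filled completely before the second one receives any container, which is why binpack tends to keep the number of active VMs low.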
|Initial cluster size||2||VM|
|Max cluster size||10||VM|
In order to control the cost of the experiment, we have limited the cluster size to a maximum of 10 running VMs for the application, which gives us a maximum capacity of 70 running containers. For the same reason, we set the container lifetime to 2 minutes in the experiment. Under this configuration, our experiment takes up to 640 minutes. The results of our experiment are presented in Fig. 8. Note that the x-axis in Fig. 8 is experiment time, for which we report the average values of the performance indicators in every single minute; hereafter we call each minute of the experiment an iteration. As can be seen in Fig. 8, the results from iteration 60 to 580 have been omitted so that the interesting events and moments during the experiment are shown more clearly on the graphs (footnote 1: another version of Figure 8, including the whole experiment, is available online).
In the experiment, the lower and upper utilization thresholds are set to 70% and 90%, respectively (the shaded areas in the fourth plot of Fig. 8 mark where the cluster is underloaded or overloaded). Requests arrive according to a Poisson process whose mean rate ranges from 20 to 40 requests per minute, shown as the blue line in the second plot of Fig. 8. In the first plot, the red line shows the number of running VMs and the blue line shows the number of running containers.
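A threshold-based scaling rule of this kind can be sketched as below. The exact policy of the autonomic manager is not spelled out in this section, so the function is a hypothetical illustration: it assumes utilization is the ratio of running containers to total container slots, that the cluster grows or shrinks by one VM per decision, and that the cluster size stays between 2 and 10 VMs as in the experiment.

```python
def scaling_decision(
    running_containers: int,
    vms: int,
    containers_per_vm: int = 7,
    low: float = 0.70,
    high: float = 0.90,
    min_vms: int = 2,
    max_vms: int = 10,
) -> int:
    """Return the target number of VMs for the next iteration.

    Assumption: utilization = running containers / total container slots.
    """
    utilization = running_containers / (vms * containers_per_vm)
    if utilization > high and vms < max_vms:
        return vms + 1          # overloaded: add one VM
    if utilization < low and vms > min_vms:
        return vms - 1          # underloaded: release one VM
    return vms                  # inside the normal band, or at a size limit


# Start of the experiment: 2 VMs, all 14 slots in use.
print(scaling_decision(running_containers=14, vms=2))
# End of the experiment: already at the 10-VM limit.
print(scaling_decision(running_containers=70, vms=10))
```

The size limits are checked together with the thresholds, so a fully loaded cluster at the maximum size simply stays at 10 VMs instead of requesting more.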
5.2 Experimental Results
An interesting observation here is the behavior of the system at the beginning of the experiment. The initial capacity of the cluster is 2 VMs, which can support up to 14 containers. The experiment starts with 20 requests per minute, for which this capacity is not enough. Therefore, as can be seen from iteration 0 to 20, the number of VMs and containers increases linearly (first plot in Fig. 8). This is due to the high utilization of the cluster (fourth plot in Fig. 8), which triggers the autonomic manager to scale up the cluster. At the same time, in the second plot in Fig. 8, the response time declines after a sudden jump (i.e., up to approximately 140 s), and we also see many requests rejected due to a full queue (indicated by the legend “req. rej. (fq)”) during the first 20 iterations of the experiment.
In the third plot of Fig. 8, we show the throughput of the system, measured as the number of containers that have been successfully provisioned during each iteration of the experiment. For each iteration we also show, in yellow, the percentage of containers that have been provisioned without queuing. In our experiment, if a request is served in less than 10 ms we categorize it as immediate service, which indicates that the request did not experience any queuing in the system. As can be seen, during the first 20 iterations of the experiment almost all of the requests for containers experienced delay due to queuing.
Around iteration 20, the system reaches a capacity that can handle the workload, so the response time drops to less than a second (approximately 450 ms), no more requests are blocked due to a full queue, and utilization returns to the normal area. We can also see that after iteration 20 most of the requests get immediate service, so that containers are provisioned for customers without any delay (see the yellow bars in the third plot). We let the experiment continue from iteration 60 to 580 while increasing the workload smoothly. During this time, the system adapts to the changing workload and maintains the performance indicators within the desired ranges.
However, after iteration 580 we can see (first plot in Fig. 8) that the cluster is close to its capacity (i.e., 10 VMs and 70 containers). As a result, we notice some request rejections due to lack of capacity in the system (indicated by the legend “req. rej. (nc)” in the second plot). When the workload goes beyond 30 requests per minute (i.e., after iteration 600), the cluster reaches its full capacity, which leads to continuous rejection of requests due to lack of capacity. In spite of the request rejections and running at full capacity, the performance indicators (i.e., response time and immediate service) remain desirable, as opposed to what we witnessed at the beginning of the experiment. In the last 20 iterations of the experiment, since the system is running at capacity, the autonomic manager drops new requests immediately, so the queue is cleared very fast; therefore, when capacity becomes available, a new request gets into service immediately, which results in a very good response time as well. However, at the beginning of the experiment, i.e., during the first 20 iterations, the autonomic manager knows that there is extra capacity, so it lets the requests queue until the new VMs are provisioned; VM provisioning takes around 110 seconds on average, which contributes to the long response times at the beginning of the experiment.
5.3 Validation of the Analytical Model
In this section, we validate the analytical model against the results of the experiments presented in section 5.2. We use the same parameters, outlined in Table VI, for both the experiments and the numerical analysis. The analytical model has been implemented and solved in Python using the NumPy, SciPy and SymPy libraries . Table VII shows the comparison between the results from the analytical model and the experiment. As can be seen, the analytical model and the experimental results agree with an error of less than 10%. Note that in the experiment, we consider requests that get into service in less than 10 ms as immediate service. This value has been approximated based on our experience with the SAVI  cloud in processing requests when there is no queuing. It might be different in other cloud data centers. It should be noted that 10 ms is the request processing time and not the resource provisioning time. Changing this value will directly impact the resulting number of requests with immediate service in both the analytical and the experimental evaluation.
|Mean No of VMs||7.12||7.61|
|Mean No of Containers||39.8||43.8|
|Immediate Service Prob.||0.7415||0.782|
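The agreement between the two columns of Table VII can be checked with a few lines of arithmetic. Since the column headers are not available here, the sketch below makes no assumption about which column is the analytical model and which is the experiment: it divides the absolute difference by the larger of the two values. The helper name `relative_difference` is introduced only for this illustration.

```python
# Rows of Table VII as (first column, second column).
table_vii = {
    "Mean No of VMs": (7.12, 7.61),
    "Mean No of Containers": (39.8, 43.8),
    "Immediate Service Prob.": (0.7415, 0.782),
}


def relative_difference(a: float, b: float) -> float:
    """Absolute difference relative to the larger of the two values."""
    return abs(a - b) / max(abs(a), abs(b))


differences = {
    metric: relative_difference(a, b) for metric, (a, b) in table_vii.items()
}

for metric, diff in differences.items():
    print(f"{metric}: {100 * diff:.1f}%")
```

All three rows stay under the 10% bound stated in the text; the mean number of containers is the row with the largest gap.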
5.4 What-if Analysis and Capacity Planning
Thanks to the minimal cost and runtime associated with the analytical performance model, we leverage it to study interesting scenarios at large scale with different configurations and parameter settings, to shed some light on MSP provisioning performance. More specifically, in this section we show how, under different configurations and parameter settings, we obtain three important performance metrics, namely the rejection probability of requests, the total response delay and the probability of immediate service in both MSP and MSI. Note that due to space limits, we only present a subset of the results in this section. Table VIII presents the range of individual parameter values that we set for the numerical analysis.
|Microservice Platform (MSP)|
|No. of users in MSP||20||N/A|
|Arrival rate per user||request/min|
|No. of containers on each VM||N/A|
|Container provisioning time||millisecond|
|Normal Host Group utilization||N/A|
|Minimum Host Group size||2||VM|
|Macroservice Infrastructure (MSI)|
|No. of PMs in the pool||150||PM|
|No. of VMs on each PM||4||VM|
|Mean look-up rate in the pool||60||search/min|
|VM provisioning time||minute|
|Size of global queue||requests|
|These values have been obtained from experiments.|
In the first scenario we investigate the effects of the container lifetime and the users’ quota on the rejection probability and total delay in both MSP and MSI. For the MSI, we set 2 days as the mean VM lifetime and assume 150 PMs in the server pool, each of which can run up to 4 VMs. The results of this scenario are shown in Fig. 9. As can be noticed, both the container lifetime and the user quota have a significant impact on the rejection probabilities. The impact of quota and lifetime is almost the same in MSP (Fig. 9(a)) as in MSI. In order to keep the rejection probability under 10% in the MSP, at least 4 VMs should be assigned to the user and the container lifetime should be less than 12 minutes. However, as can be seen from Fig. 9(b), if the goal is to maintain less than 10% rejection in MSI, the container lifetime should be less than 12 minutes and at least 5 VMs should be assigned to the user’s host group. From Fig. 9(b), we can identify a very narrow area in which the backend cloud operates in a stable regime (i.e., the narrow dark blue rectangle); it will enter an unstable state with a small fluctuation in either the containers’ lifetimes or the users’ quota.
We also measure the delays in both the micro and macro layers. Note that the delay at the microservice level is the aggregate waiting and processing time before the containers are ready for service; in the macro layer it is the sum of all delays imposed on a request before the VM is up and ready to be used. Fig. 10 indicates that the trends of the delays are consistent with the rejection probabilities. From Fig. 10(a), it can be observed that if there is no need for a new VM, the request is usually processed in less than a second, but if there is no VM that can accommodate the new container, the delay might increase up to 27 seconds. Fig. 10(b) shows that, if the load is low, the macroservice infrastructure can provision a VM in 108 seconds on average (i.e., only processing time), but as the traffic intensity gets high it might take up to 170 seconds on average to get a VM.
For the second scenario, we fix the user quota at 16 containers and study the rejection probabilities and total delays by varying the container lifetime and the arrival rate of requests for containers. For the macroservice infrastructure, we set 2 days as the mean VM lifetime and again assume 150 PMs in the pool. In this scenario we let each PM run 4 VMs. From Fig. 11(a), it can be noticed that if the container lifetime increases linearly, the rejection probability increases exponentially, while with increasing arrival rate it increases sub-linearly. Therefore, if the user does not ask for more than 2 containers per minute and the container lifetime stays below 12 minutes, the rejection probability would be less than 10%.
From the diagram in Fig. 11(b), we can see that in the worst cases the MSP imposes a longer delay on requests than in the first scenario (i.e., Fig. 10(a)). For example, in the microservice platform, when the containers’ lifetime and the arrival rate are high (i.e., 20 minutes and 2 containers per minute, respectively), the delay may rise to 80 seconds. One potential solution for addressing this long delay is to permit the MSP to ask for more than one VM at a time (i.e., to make batch requests).
We also characterize the probability of immediate service for both scenarios at both MSP and MSI. This metric shows how many of the requests get into service without any delay (i.e., queuing delay). As can be seen in Fig. 12, the trends of the immediate service probabilities are inverse to the trends of their corresponding rejection probabilities (i.e., compare Fig. 12(a) with 9(a) and Fig. 12(b) with 11(a)). However, it should be noted that these two probabilities are not true complements, as there are some requests that are neither rejected nor served immediately but rather get into service after some queuing. Note that for the second scenario we do not present the results for MSI due to the page limit.
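The relation between the two metrics can be written out explicitly. The symbols below are introduced here for illustration only and are not the notation of the analytical model: $P_{\mathrm{imm}}$ denotes the probability of immediate service, $P_{\mathrm{rej}}$ the rejection probability, and $P_{\mathrm{q}}$ the probability that a request is served after some queuing. Since every request falls into exactly one of these three cases,

```latex
P_{\mathrm{imm}} + P_{\mathrm{q}} + P_{\mathrm{rej}} = 1
\quad\Longrightarrow\quad
P_{\mathrm{imm}} = 1 - P_{\mathrm{rej}} - P_{\mathrm{q}} \;\le\; 1 - P_{\mathrm{rej}},
```

with equality only when no admitted request has to wait in the queue.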
6 Related Work
Performance analysis of cloud computing has attracted considerable research attention, although most works have considered hypervisor-based virtualization, in which VMs are the sole way of providing an isolated environment for users [37, 38]. Recently, however, container-based virtualization has been gaining momentum due to its advantages over VMs for providing microservices.
Performance analysis of IaaS clouds has been investigated extensively under various configurations and use cases. In [14, 39, 40, 41], monolithic analytical performance models have been proposed and evaluated. An analytical model based on Markov chains to predict the number of cloud instances or VMs needed to satisfy a given SLA performance requirement such as response time, throughput, or request loss probability has been proposed in . In [28, 43], authors proposed a general analytical model for an end-to-end performance analysis of a cloud service. They illustrated their approach using IaaS cloud with three pools of servers: hot, warm and cold, using service availability and provisioning response delays as the key QoS metrics. The proposed approach reduces the complexity of performance analysis of cloud centers by dividing the overall model into sub-models and then obtaining the overall solution by iteration over individual sub-model solutions. Our work in this paper is based on  and .
Performance analysis of cloud services considering containers as a virtualization option is in its infancy. Most works have focused on comparing implementations of various applications deployed either as VMs or as containers. In , the authors carried out a performance comparison involving a front-end application server hosting a Joomla PHP application and a backend server hosting a PostgreSQL database for storing and retrieving data for the Joomla application. They showed that containers outperformed VMs in terms of performance and scalability. The container deployment processed 5x more requests than the VM deployment, and containers also outperformed VMs by 22x in terms of scalability. This work shows promising performance when using containers instead of VMs for service delivery to end users.
The authors in [44, 45] performed a more comprehensive study on the performance evaluation of containers under different deployments. They used various benchmarks to study the performance of native deployment, VM deployment, native Docker and VM Docker. In native deployment, the application is installed in a native OS; in VM deployment, the application is installed in a vSphere VM; in native Docker, the application is installed in a container that runs on a native OS; and finally, in VM Docker, a Docker container including the application is deployed on a vSphere VM that itself is deployed on a native OS. Redis has been used as the backend datastore in this experiment. All in all, they showed that in addition to the well-known security, isolation, and manageability advantages of virtualization, running an application in a Docker container in a vSphere VM adds very little performance overhead compared to running the application in a Docker container on a native OS. Furthermore, they found that a container in a VM delivers near-native performance for Redis and most of the micro-benchmark tests.
These studies reveal a promising future for using both virtualization techniques in order to deliver secure, scalable and high-performance services to the end user [46, 47]. The recent popularity of microservice platforms such as Docker Tutum , Nirmata  and Google Kubernetes  is attributed to the advantages mentioned above. However, to the best of our knowledge, there is no comprehensive performance model that incorporates the details of both the microservice platform and the macroservice infrastructure. In this work, we studied the performance of PaaS and IaaS collaborating with each other to leverage both virtualization techniques for providing fine-grained, secure, scalable and performant services.
7 Conclusions
In this paper, we presented a performance model suitable for analyzing the provisioning quality of microservice platforms while incorporating the details of servicing in the backend IaaS cloud, using interacting stochastic models. We have developed a comprehensive analytical model that captures important aspects including microservices management, the resource assignment process, and virtual machine provisioning. The performance model can assist cloud micro/macro service providers in maintaining their SLAs in a systematic manner. In other words, the performance model proposed and evaluated in this paper provides a systematic approach to studying the elasticity of microservice platforms by evaluating the provisioning performance at both the microservice platform and the back-end cloud.
We have also implemented a microservice platform from scratch to estimate related parameters to be used for calibration of the analytical model. After introducing the measurements provided by the real implementation into the performance model, we carried out extensive numerical analysis to study the effects of various parameters such as request arrival rate for containers, container lifetime, VM lifetime, virtualization degree and the size of the users’ application on the request rejection probability, probability of immediate service and response time. We showed that using the performance model, we can characterize the behavior of the system at scale for given configurations and therefore facilitate the capacity planning and SLA analysis by both micro and macro service providers.
Acknowledgment
We would like to thank Dr. Murray Woodside for his valuable technical comments and inputs. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
[1] A. Khan, “Key characteristics of a container orchestration platform to enable a modern application,” IEEE Cloud Computing, vol. 4, no. 5, pp. 42–48, September 2017.
[2] S. Soltesz, H. Pötzl, M. E. Fiuczynski, A. Bavier, and L. Peterson, “Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors,” in ACM SIGOPS Operating Systems Review, vol. 41, no. 3. ACM, 2007, pp. 275–287.
[3] D. Bernstein, “Containers and cloud: From LXC to Docker to Kubernetes,” IEEE Cloud Computing, vol. 1, no. 3, pp. 81–84, 2014.
[4] J. Beda. (2015, 05) Containers at scale: the Google Cloud Platform and Beyond. [Online]. Available: https://speakerdeck.com/jbeda/containers-at-scale
[5] A. Celesti, D. Mulfari, M. Fazio, M. Villari, and A. Puliafito, “Exploring container virtualization in IoT clouds,” in IEEE International Conference on Smart Computing. IEEE, 2016, pp. 1–6.
[6] B. Burns and D. Oppenheimer, “Design patterns for container-based distributed systems,” in HotCloud, 2016.
[7] D. Merkel, “Docker: lightweight Linux containers for consistent development and deployment,” Linux Journal, vol. 2014, no. 239, p. 2, 2014.
[8] H. Khazaei, C. Barna, N. Beigi-Mohammadi, and M. Litoiu, “Efficiency analysis of provisioning microservices,” in IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2016, pp. 261–268.
[9] Docker Cloud. (2017, 3) The Docker Platform for Dev and Ops. [Online]. Available: https://cloud.docker.com
[10] Google, Inc. (2018, 10) Manage a cluster of Linux containers as a single system to accelerate Dev and simplify Ops. [Online]. Available: http://kubernetes.io/
[11] Nirmata, Inc. (2017, 3) Microservices Operations and Management. [Online]. Available: http://nirmata.com
[12] hook.io, Inc. (2018, 9) Microservices and Webhook Hosting. [Online]. Available: http://hook.io
[13] vamp.io, Inc. (2018, 9) Automation and Controls for Enterprise Devops. [Online]. Available: http://vamp.io
[14] H. Khazaei, J. Mišić, and V. B. Mišić, “Performance analysis of cloud computing centers using queueing systems,” vol. 23, no. 5, p. 1, 2012.
[15] F. Longo, R. Ghosh, V. K. Naik, and K. S. Trivedi, “A scalable availability model for infrastructure-as-a-service cloud,” Dependable Systems and Networks, International Conference on, pp. 335–346, June 2011.
[16] VMware, Inc. (2016, 6) VMware vSphere Hypervisor. Website. [Online]. Available: http://www.vmware.com/products/vsphere-hypervisor
[17] KVM. (2016, 6) Kernel-based Virtual Machine. [Online]. Available: http://www.linux-kvm.org/page/Main_Page
[18] Citrix Systems, Inc. (2016, 6) Xen Hypervisor. Website. [Online]. Available: http://www.xen.org
[19] Microsoft, Inc. (2016, 6) Windows Server Virtualization. [Online]. Available: http://www.microsoft.com/en-ca/server-cloud/solutions/virtualization.aspx
[20] Rackspace Cloud Computing. (2016, 6) Openstack, an open source software for creating private and public clouds. [Online]. Available: https://www.openstack.org
[21] Apache Foundation. (2016, 6) Apache CloudStack. [Online]. Available: https://cloudstack.apache.org/
[22] Eucalyptus Systems, Inc. (2016, 6) HPE Helion Eucalyptus. [Online]. Available: http://www8.hp.com/us/en/cloud/helion-eucalyptus-overview.html
[23] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” technology, vol. 28, p. 32, 2014.
[24] A. M. Joy, “Performance comparison between Linux containers and virtual machines,” in Computer Engineering and Applications (ICACEA), 2015 International Conference on Advances in. IEEE, 2015, pp. 342–346.
[25] S. Graber et al. (2016, 6) LXC-Linux containers. [Online]. Available: https://linuxcontainers.org
[26] V. Marmol et al. (2016, 6) Let me contain that for you. [Online]. Available: https://github.com/google/lmctfy/blob/master/README.md
[27] Warden, Inc. (2016, 6) Cloud Foundry Warden documentation. [Online]. Available: http://docs.cloudfoundry.org/concepts/architecture/warden.html
[28] H. Khazaei, J. Mišić, and V. B. Mišić, “A fine-grained performance model of cloud computing centers,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 11, pp. 2138–2147, November 2013.
[29] K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, 2nd ed. Wiley, July 2001.
[30] V. Mainkar and K. S. Trivedi, “Sufficient conditions for existence of a fixed point in stochastic reward net-based iterative models,” Software Engineering, IEEE Transactions on, vol. 22, no. 9, pp. 640–653, September 1996.
[31] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, Jul 2010.
[32] L. Kleinrock, Queueing Systems, Volume 1, Theory. Wiley-Interscience, 1975.
[33] SAVI. (2018, October) Smart applications on virtual infrastructure. [Online]. Available: http://www.savinetwork.ca
[34] D. P. Heyman and M. J. Sobel, Stochastic Models in Operations Research. Dover, 2004, vol. 1.
[35] R. Ghosh, F. Longo, V. K. Naik, and K. S. Trivedi, “Modeling and performance analysis of large scale IaaS clouds,” Future Generation Computer Systems, July 2012.
[36] SciPy. (2016, 6) A Python-based ecosystem of open-source software for mathematics, science, and engineering. [Online]. Available: http://scipy.org
[37] Y. Rochman, H. Levy, and E. Brosh, “Dynamic placement of resources in cloud computing and network applications,” Performance Evaluation, vol. 115, pp. 1–37, 2017.
[38] H. Raei, N. Yazdani, and R. Shojaee, “Modeling and performance analysis of cloudlet in mobile cloud computing,” Performance Evaluation, vol. 107, pp. 34–53, 2017.
[39] H. Khazaei, J. Mišić, and V. B. Mišić, “Performance of cloud centers with high degree of virtualization under batch task arrivals,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 12, pp. 2429–2438, December 2013.
[40] D. Bruneo, “A stochastic model to investigate data center performance and QoS in IaaS cloud computing systems,” Parallel and Distributed Systems, IEEE Transactions on, vol. 25, no. 3, pp. 560–569, 2014.
[41] S. Vakilinia, M. M. Ali, and D. Qiu, “Modeling of the resource allocation in cloud computing centers,” Computer Networks, vol. 91, pp. 453–470, 2015.
[42] K. Salah, K. Elbadawi, and R. Boutaba, “An analytical model for estimating cloud resources of elastic services,” Journal of Network and Systems Management, pp. 1–24, 2015.
[43] H. Khazaei, J. Mišić, V. B. Mišić, and S. Rashwand, “Analysis of a pool management scheme for cloud computing centers,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 5, pp. 849–861, 2013.
[44] VMware, Inc. (2016, 6) Docker containers performance in VMware vSphere. [Online]. Available: https://blogs.vmware.com/performance/2014/10/docker-containers-performance-vmware-vsphere.html
[45] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and Linux containers,” in Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium On. IEEE, 2015, pp. 171–172.
[46] U. Gupta, “Comparison between security majors in virtual machine and Linux containers,” arXiv preprint arXiv:1507.07816, 2015.
[47] M. Villamizar, O. Garces, H. Castro, M. Verano, L. Salamanca, R. Casallas, and S. Gil, “Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud,” in Computing Colombian Conference. IEEE, 2015, pp. 583–590.