Cloud elasticity and the software-defined paradigm are the enabling building blocks for auto-scalability, which is a vital feature of today's cloud software systems, including infrastructures, platforms and applications. Due to the dynamic nature of cloud environments, workloads, and internal states, cloud software systems need to constantly adapt to new conditions to maintain their service level agreements (SLAs) while utilizing their resources efficiently.
Recently, many software-as-a-service providers have adopted a pattern in which both virtual machines (VMs) and containers are leveraged to offer their solutions as microservices. This way, the strong isolation inherited from hypervisor-based virtualization, preferred for stronger security, is coupled with the flexibility and portability of containers to offer reliable and easy-to-manage services. Scaling algorithms and solutions in such scenarios must provide scalability for both VMs and containers.
So far, we have seen customized and vendor-specific auto-scalability features from cloud service providers such as IBM (IBM, 2017), Amazon (Amazon, 2017) and others. General-purpose solutions in the open source community, which we survey in Section 5, often deal with either microservices (i.e., containers) or macroservices (i.e., VMs) rather than the whole application stack. These shortcomings motivated us to design and implement a general-purpose, cloud- and application-agnostic monitoring and auto-scaling solution for today's cloud software systems that covers both micro and macro services.
The name Elascale is inspired by Elasticsearch (Elasticsearch, 2017), as Elascale leverages the ELKB family, including Elasticsearch, Logstash, Kibana and Beats (Beats, 2017), for monitoring, data processing/storage and visualization. Elascale also leverages a core scalability engine that implements the monitor-analyze-plan-execute-knowledge loop, i.e., the MAPE-K loop, introduced by IBM (IBM, 2005). Elascale automatically instruments the cloud application to collect performance metrics at all layers, and then applies a default threshold-based, reactive scalability algorithm to all of the application's micro and macro services. It discovers all services automatically and lets the user set the parameters of the scaling algorithm, e.g., thresholds, scaling steps, cooling time and the like. Extendability, including support for various applications, cloud service providers, scaling algorithms and visualizations, has been the primary concern in designing and implementing Elascale. Figure 1 shows the high-level architecture of Elascale.
This paper is organized as follows. Section 2 presents the background concepts and technologies for our work. In Section 3 we present Elascale in detail. Section 4 presents the experimental evaluation and Section 5 surveys related work and projects. Finally, Section 6 concludes the paper and highlights the immediate future work for Elascale.
In this section we briefly describe the main related concepts that help us present Elascale effectively.
Microservices are becoming the main trend in cloud-based application development (Turnbull, 2014). The traditional monolithic application is decomposed into small pieces that each provide a single service: the full capabilities of the application emerge from the interaction of these small pieces. Microservices are independent of each other and organized around capabilities, e.g., user interface, front-end, etc. Their decoupling allows developers to use the best technology for each implementation according to the task at hand: the application becomes polyglot, involving different programming languages and technologies. An application composed of microservices is inherently distributed, being divided into hundreds of different microservices, deployed in a large network infrastructure, that communicate either synchronously or asynchronously, using REST or a message-based system respectively. Through the use of microservices, the application can scale efficiently, as it is possible to scale only the microservices that are under heavy load rather than the entire application. The microservices architecture embodies the principles of the DevOps (a clipped compound of “development” and “operations”) movement, promoting the automation of deployment and testing and reducing the burden on management and operations (Florio and Di Nitto, 2016).
Microservices usually run in containers such as Docker (Turnbull, 2014; Bernstein, 2014). A Docker container runs an application as an isolated process on a host machine, including only the application and all its dependencies (the kernel of the operating system is shared with other containers) and providing it only the resources it requires. Docker containers differ from a fully virtualized system such as a virtual machine: a virtual machine contains a full OS that runs in isolation on physical resources that are virtualized by a hypervisor on the basis of those available in the host machine, whereas a Docker container uses the resources available in the host (either a physical or a virtualized one) that are assigned to it by the Docker Engine. As a consequence, Docker containers can share physical resources and are lightweight: it is possible to run multiple containers on the same machine and start them in seconds. Docker allows developers to implement their applications and services using the technology or language that is most suitable for them. Services deployed in a Docker container can be scaled or replaced simply by starting or stopping the container running that specific service. Moreover, Docker containers can be deployed in very different settings, from servers in a cloud computing infrastructure to ARM-based IoT devices.
2.2.1. Docker Swarm
Distributed applications need a distributed system and the compute resources it provides. Docker Swarm is a clustering and scheduling tool that turns a group of Docker systems (nodes) into a single virtual Docker system. It builds a cooperative group of systems that can provide redundancy if one or more nodes fail. Swarm provides workload balancing for containers (Rouse, 2017). It assigns containers to underlying nodes and optimizes resources by automatically scheduling container workloads to run on the most appropriate host with adequate resources while maintaining necessary performance levels (Naik, 2016). An IT administrator or developer controls Swarm using a swarm manager, which organizes and schedules containers. Kubernetes (Authors, 2017) is an alternative to Docker Swarm introduced by Google.
2.2.2. Docker Machine
Docker Machine is a tool that makes it easy to provision and manage multiple Docker hosts remotely from a personal computer. Such servers are commonly referred to as Dockerized hosts that can be used to run Docker containers. Docker Machine supports various backend cloud service providers such as Amazon Web Services, Microsoft Azure, Digital Ocean, Google Compute Engine, Exoscale, Generic, OpenStack, Rackspace, IBM Softlayer and VMware vCloud Air which makes it ideal to manage the macroservices centrally regardless of their location and vendor (Docker.com, 2017). Docker Machine in combination with Docker Swarm can be used to build a virtual system of systems in multiple clouds (Naik, 2016).
2.3. ELK and Beats
Beats are lightweight agents that can send performance metrics data from containers, VMs or a large number of software products to Logstash or Elasticsearch (Beats, 2017). Currently, there are more than 70 different Beats in the Beats family, developed by Elastic and the community, including Dockbeat and Metricbeat, which are used in the initial version of Elascale. Logstash is a dynamic data collection pipeline with an extensible plugin ecosystem and strong Elasticsearch synergy. Elasticsearch is a distributed, JSON-based search and analytics engine designed for horizontal scalability, maximum reliability, and easy management. Kibana gives shape to the data and is the extensible user interface for configuring and managing all aspects of the Elastic Stack (Elasticsearch, 2017).
We now present our approach, Elascale (https://gitlab.com/hamzeh.khazaei/Elascale). The two important components of Elascale are: (i) an ELKB-based monitoring stack, and (ii) an auto-scaling engine. Together, the two components provide auto-scalability for any cloud application that has been deployed through microservices. Currently, Elascale supports only the Docker technology family; the plan is to support non-Docker solutions in future versions, Kubernetes for clustering in particular. Therefore, Elascale V0.9 (the current version) assumes and leverages the following:
Docker as the container engine
Application has been deployed through ‘service’ or ‘stack’ commands
Docker Swarm for clustering
Docker Machine to manage the backend cloud(s)
Elascale itself is deployed as a microservice on Swarm Master node or another machine that has access to the Docker engine on the Swarm Master node. The following command deploys Elascale as a microservice on Swarm Master node which is set to scale both microservices and macroservices for the current application:
The above command triggers scalability for both microservices and macroservices (either can be set to false) and assumes Swarm as the cluster management system. It also assumes that Docker Machine has already been installed and configured. We now look under the hood at what happens after issuing the command:
First, it will create a virtual machine and then deploy Elasticsearch, Logstash and Kibana as “replicated” docker services.
Elascale deploys Metricbeat and Dockbeat on all VMs using a “global” docker service; this way any newly added node to the Swarm cluster will get these two beats automatically.
Elascale then discovers all microservices and creates “microservice.ini”, which contains the list of application microservices along with default parameters to be used by Elascale’s scaling engine. Figure 1(a) shows an excerpt of this file for our sample IoT application. As can be seen, auto-scalability is disabled by default for services; at this point a web user interface is generated from this file so that the user, i.e., the application owner, can change the parameters to suit their needs.
In the same fashion, Elascale discovers macroservices, i.e., virtual machines, and creates the “macroservice.ini” file, which the user can customize in the Elascale web UI just like the microservices file. Figure 1(b) shows an excerpt of this file for our sample IoT application.
After the application owner customizes the configuration files, the Elascale auto-scaling engine starts monitoring the application’s microservices and macroservices based on the active scaling algorithm.
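The discovery step above can be illustrated with a small sketch that writes one INI section per discovered service with auto-scaling disabled by default. Only the “auto_scale” parameter is named in this paper; every other key, every default value, and the service names below are hypothetical and chosen for illustration, and do not reproduce the actual layout of “microservice.ini”.

```python
import configparser
import io

# Hypothetical defaults. Only "auto_scale" appears in the paper; the remaining
# keys and values are illustrative assumptions.
DEFAULTS = {
    "auto_scale": "False",      # disabled until the application owner opts in
    "up_threshold": "70",       # percent
    "down_threshold": "40",     # percent
    "scale_step": "1",          # instances added or removed per action
    "cooldown_seconds": "60",   # minimum time between two scaling actions
    "min_replicas": "1",
    "max_replicas": "10",
}


def build_config(service_names):
    """Return a ConfigParser with one section per discovered service."""
    config = configparser.ConfigParser()
    for name in service_names:
        config[name] = dict(DEFAULTS)
    return config


def render(config):
    """Serialize the configuration to INI text."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


def enabled_services(config):
    """List the services whose owner switched auto-scaling on."""
    return [s for s in config.sections() if config[s].getboolean("auto_scale")]


# Hypothetical service names standing in for the discovered microservices.
config = build_config(["iot_app_kafka", "iot_app_edge_processor", "iot_app_cassandra"])
config["iot_app_kafka"]["auto_scale"] = "True"  # the owner enables one service
ini_text = render(config)
```

Keeping the per-service parameters in a plain INI file means the same data can back both the scaling engine and a generated web form, which is the role the configuration files play in the steps above.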
The default scaling algorithm in Elascale uses several criteria to scale services out and in. We adopt the function $f$ in Eq. 1 as the generic formula for scaling the micro and macro services:

$$f = \alpha \cdot \mathit{cpu}_{util} + \beta \cdot \mathit{mem}_{util} + \gamma \cdot \mathit{net}_{util} + \lambda \cdot \frac{\mathit{Rep}_f}{\mathit{rep}_f} \qquad (1)$$

in which $\alpha + \beta + \gamma + \lambda = 1$. In addition to CPU, memory, and network utilization (in percent), we incorporate a replication factor (i.e., $\mathit{rep}_f$) for services, which is the dependency factor among services. For example, defining the target replication factor (i.e., $\mathit{Rep}_f$) of service $a$ to service $b$ as 50% means that for each instance of service $a$, i.e., for each container, there should be at least 2 containers of $b$ up and running. The main idea of using the ratio of the target replication factor to the current replication factor (i.e., $\mathit{Rep}_f / \mathit{rep}_f$) is to maintain QoS in times of node failure or network partitioning. In other words, if some services go down or become unreachable due to an internal failure, Elascale redeploys the missing services autonomically regardless of resource utilization. The weights of the parameters (i.e., $\alpha$, $\beta$, $\gamma$ and $\lambda$) can be tuned based on a bottleneck analysis performed during application testing. By assigning more weight to $\lambda$, we can guarantee a higher level of reliability for our application. The details of the default algorithm are elaborated in (Khazaei et al., 2017). Figure 3 shows the data flow and command flow between the target application and Elascale.
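A minimal sketch of the weighted scaling function described above is given here, assuming that utilizations are expressed as fractions in [0, 1], that the four weights sum to one, and that the replication term is the plain ratio of target to current replication factor. The class and function names, the weight values, and the sample readings are all hypothetical; the normalization of the replication term in particular is an assumption rather than the paper's definition.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Weights of the four criteria; assumed to sum to 1."""
    cpu: float
    mem: float
    net: float
    rep: float

    def __post_init__(self):
        total = self.cpu + self.mem + self.net + self.rep
        if abs(total - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")


def scaling_score(weights, cpu_util, mem_util, net_util, target_rep, current_rep):
    """Weighted sum of utilizations plus the target/current replication ratio.

    Utilizations are fractions in [0, 1]. Replication factors are ratios of
    instance counts, e.g. 0.5 for one instance per two instances of the
    related service.
    """
    if current_rep <= 0:
        raise ValueError("current replication factor must be positive")
    rep_ratio = target_rep / current_rep
    return (weights.cpu * cpu_util
            + weights.mem * mem_util
            + weights.net * net_util
            + weights.rep * rep_ratio)


# Hypothetical weights for a CPU-bound service.
w = Weights(cpu=0.5, mem=0.2, net=0.2, rep=0.1)

# Same resource utilization in both cases; only the replication factor differs.
healthy = scaling_score(w, 0.8, 0.5, 0.4, target_rep=0.5, current_rep=0.5)
degraded = scaling_score(w, 0.8, 0.5, 0.4, target_rep=0.5, current_rep=0.25)
```

With identical utilization readings, the score rises when the current replication factor falls below its target, which is how a failure of some instances can trigger redeployment independently of load.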
4. Experimental Evaluation
In this section, we elaborate on our experimental setup as well as results. In the experiment, we use the latest stable versions of all software components, as of June 2017, in Elascale and the sample IoT application.
4.1. Experimental Setup
We deployed a sample IoT application using SAVI-IoT (Khazaei et al., 2017) platform on SAVI Cloud (SAVI, 2015). Our application leverages the SAVI Core-Cloud at the University of Toronto and one of the SAVI edges located at the University of Victoria. Using SAVI-IoT platform, we create required VMs on the Core and the Edge cloud; this process includes provisioning of VMs and installing Docker packages (i.e., Docker engine and Swarm). Then, a Swarm cluster is created out of provisioned VMs; the Swarm master will be located at the Core-Cloud. All VMs are labeled and tagged with their roles and locations. Next, the microservices will be deployed on top of macroservices at all layers; related microservices will be linked to constitute the application logic by the SAVI-IoT platform.
In the sample IoT application, we deploy containerized virtual sensors that collect their own performance metrics, including CPU utilization, network load and memory consumption. We refer to these containers as “virtual-sensor-container”. In other words, each virtual-sensor-container embodies three probe sensors that report resource utilization every 15 seconds. Every virtual-sensor-container is automatically attached to a co-located aggregator, i.e., a microservice that is running Kafka (kafka.apache.org). The Kafka service on that aggregator is responsible for forwarding the aggregated data from the virtual sensors to the upper service. Here we set the Kafka service to aggregate sensor data every 60 seconds and then send it up to a service named IoT-Edge-Processor. This streaming service at the Edge-Cloud aggregates all the data streams received from its aggregators and ingests them into the Cassandra datastore (cassandra.apache.org) located at the Core-Cloud. Figure 4 shows the experimental setup.
We examine a normal-shaped workload to evaluate both out-scaling and in-scaling of the application. A client program requests new sensors according to a Poisson process. Each request is translated into a virtual-sensor-container, which embodies 3 virtual sensors for measuring CPU, memory and network load. The new virtual-sensor-container is attached to an aggregator automatically. As more and more virtual sensors are added, the resource utilization at the aggregators increases. If it reaches the upper threshold, here set to 70%, according to the function in Eq. 2, Elascale scales out the aggregator microservice.
Eq. 2 reveals that the aggregator service (i.e., Kafka) is CPU intensive, as the weight of CPU utilization is equal to that of all other terms combined. Here we set the replication factor of the Kafka service relative to the virtual-sensor-containers. We use the functions in Eqs. 3 and 4 for the Edge and Core services.
The IoT-Edge-Processor service is memory intensive, while the Cassandra service is equally sensitive to all resources; these sensitivities are reflected in Eqs. 3 and 4. The replication factor of the IoT-Edge-Processor service is set relative to the Kafka service, and the replication factor of the Cassandra service is set relative to the IoT-Edge-Processor service. Therefore, Eqs. 2, 3 and 4 strive to maintain the balance between each service and its lower-level service. Note that the replication factor can be defined differently depending on the logic of the managed application.
After some time, as more and more sensors are added, the application reaches its upper capacity limit; an upper capacity limit for a cloud application may be set for various reasons (Barna et al., 2017). We let the application run at full capacity for some time and then configure the client program to remove virtual sensors with the same Poisson process to see whether Elascale shrinks the whole application accordingly.
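The Poisson request process used by the client program can be sketched as follows. The paper does not report the arrival rate, so the rate, duration and seed below are hypothetical values chosen only to make the example reproducible; the function names are likewise illustrative.

```python
import random


def poisson_arrival_times(rate_per_minute, duration_minutes, seed=None):
    """Arrival times (in minutes) of a homogeneous Poisson process.

    Inter-arrival gaps of a Poisson process are exponentially distributed
    with mean 1 / rate, so the times are built by accumulating such gaps.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_minute)
    while t < duration_minutes:
        times.append(t)
        t += rng.expovariate(rate_per_minute)
    return times


def requests_per_minute(arrival_times, duration_minutes):
    """Count how many requests fall into each one-minute iteration."""
    counts = [0] * duration_minutes
    for t in arrival_times:
        counts[int(t)] += 1
    return counts


# Hypothetical workload: on average one sensor request every two minutes.
arrivals = poisson_arrival_times(rate_per_minute=0.5, duration_minutes=60, seed=1)
per_minute = requests_per_minute(arrivals, 60)

# Each request becomes one virtual-sensor-container holding 3 virtual sensors.
total_virtual_sensors = 3 * len(arrivals)
```

The same generator can drive the removal phase by interpreting each generated time as a request to remove one virtual-sensor-container.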
| Type | Network | RAM | CPU Quota of VM |
|---|---|---|---|
| Type_a | dedicated overlay | 512 MB | 25.0% |
| Type_b | dedicated overlay | 1250 MB | 33.0% |
| Type_c | dedicated overlay | 3 GB | 50.0% |
4.2. Results and Discussion
We set the upper capacity limit for our application as shown in Table 3. Elascale was deployed on the Swarm Manager node using the command described in Section 3. The top panel in Figure 5 shows the number of containers and VMs for the sample IoT application during the experiment. The solid lines represent the number of VMs and the dashed lines show the number of containers in the application during the experiment, which took around 150 minutes; we refer to each minute of the experiment as an iteration. For the first 10 minutes, the application works with the initial sensors, so no scaling is initiated. It can be seen that the initial configuration for the edge is one virtual-sensor-container, one Kafka container, and one IoT-Edge-Processor container (see the first 10 minutes in Figure 5).
| Service | VM (#) | Container per VM | Container (#) |
|---|---|---|---|
At iteration 9, we turn the client program on to request new sensors. For each request, the application provisions a virtual-sensor-container that includes 3 virtual probe sensors. As can be seen in the top plot of Figure 5, Elascale first scales the Kafka microservice by adding more containers, as the 70% threshold has been reached according to Eq. 2. Consequently, after some time, around iteration 13, Elascale scales the IoT-Edge-Processor service as well. The scaling out of microservices continues until around iteration 25, at which point Elascale scales the macroservice, i.e., adds one VM, for the aggregator service, as the existing VM is filled with containers. We can see that macroservice scaling also happens for the IoT-Edge-Processor service at iteration 40. This cascading scaling continues until the application reaches its capacity limit around iteration 55. Afterwards, requests for new sensors are rejected by the application.
| Name | Microservice | Macroservice | Application Agnostic | Cloud Agnostic | as-a-service | Extendability |
|---|---|---|---|---|---|---|
| Ladder (Larrakoetxea, 2017) | No | Yes | N/A | Could be | Could be | High |
| Autoscaler (Wielgus, 2017) | No | Yes | N/A | Yes | No | Medium |
| Orbiter (Arbezzano, 2017) | Yes | Yes | Partially | Partially | Could be | High |
| Cluster-autoscaler (Zheng, 2017) | Yes | No | Yes | N/A | No | Low |
| App-autoscaler (Yang, 2017) | No | Yes | Yes | Yes | Yes | Medium |
| Zenscaler (Richer, 2017) | Yes | No | Yes | N/A | No | Medium |
| k8s-kapacitor-autoscale (Cook, 2017) | Yes | No | Yes | N/A | No | Low |
Around iteration 100, we set the client program to remove the virtual sensors with the same process. As can be seen, Elascale shrinks both microservices and macroservices to maintain optimized resource utilization. Around iteration 130, Elascale has scaled the application back in to its initial state, as all the added sensors have been removed by the client program. The lower threshold for in-scaling is set to 40% for all services.
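The reactive behaviour observed in this experiment, scaling out at the 70% upper threshold and scaling in at the 40% lower threshold, can be sketched as a simple decision rule. This is an illustrative sketch rather than the Elascale engine itself: the cooling time, step size, replica bounds, and all names are hypothetical, and the score is assumed to be a fraction in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class ScalerState:
    """Replica count of one service and the time of its last scaling action."""
    replicas: int
    last_action_time: float = float("-inf")


def decide(state, score, now, upper=0.70, lower=0.40, step=1,
           cooldown=60.0, min_replicas=1, max_replicas=10):
    """Return the change in replicas (+, -, or 0) and update the state.

    No action is taken during the cooling period, between the two
    thresholds, or when the replica bounds would be violated.
    """
    if now - state.last_action_time < cooldown:
        return 0
    if score >= upper and state.replicas < max_replicas:
        delta = min(step, max_replicas - state.replicas)
    elif score <= lower and state.replicas > min_replicas:
        delta = -min(step, state.replicas - min_replicas)
    else:
        return 0
    state.replicas += delta
    state.last_action_time = now
    return delta


# Hypothetical trace: (time in seconds, score of the monitored service).
state = ScalerState(replicas=1)
trace = [(0, 0.75), (30, 0.90), (100, 0.50), (200, 0.30), (400, 0.10)]
actions = [decide(state, score, now) for now, score in trace]
```

The gap between the two thresholds acts as a dead band that, together with the cooling time, keeps the service from oscillating when the load hovers near a single threshold.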
As the application owner, we set the “auto_scale” parameter for the Cassandra micro and macro services to “False” because, based on our experiments, it is not beneficial to scale the Cassandra datastore at high load. Adding more nodes or storage capacity to the Cassandra datastore at high load has negative effects on application performance for a long time (even hours) due to background data replication and synchronization processes.
The bottom plot in Figure 5 shows the provisioning time for both macroservices and microservices. At the macroservice level, provisioning means a) creating the VM at the backend cloud, b) installing the Docker services, c) joining the application Swarm cluster, and d) labeling the node based on its role in the application. As can be seen in the bottom plot (i.e., blue bars), provisioning macroservices takes 50 to 150 seconds depending on the VM specifications. For microservices, the provisioning time is on the order of milliseconds. Provisioning at the microservice level includes loading the Docker image (images are available locally after the first instantiation) and configuring the container to be part of the target service.
It is worth noting that provisioning time is different from “contribution time”. We refer to contribution time as the amount of time needed for the new resources (i.e., VMs or containers) to actually contribute to the application.
Contribution time depends on the application logic and the nature of the service that is being scaled. As a general rule, stateless services (e.g., load balancers) have a contribution time close to their provisioning time, while stateful services (e.g., distributed datastores) have a much longer contribution time compared to their provisioning time. As a result, elasticity can be quantified based on provisioning time, while scalability is more related to contribution time (Khazaei et al., 2017).
5. Related Work
Autoscaling has received significant attention from academia and industry, especially with the emphasis placed on sustainable computing and the emergence of cloud computing and its elastic resources. Most approaches to autoscaling are application/platform and infrastructure specific, and rely on expert knowledge of the application or on exhaustive experimentation to derive such knowledge. Both reactive and predictive solutions have been proposed for autoscaling; a survey of existing solutions can be found in (Lorido-Botran et al., 2014). Such solutions are orthogonal to ours and are not discussed here.
Leveraging both macroservices and microservices is a promising approach to developing and deploying cloud applications, as each technology brings different features to the table (Barna et al., 2017; Khazaei et al., 2016) for software engineers and application providers. Both VMs and containers can be directly controlled and managed by the application in an autonomous manner to maintain the SLAs as well as operation costs. However, most of the solutions proposed in the literature focus on either the application (i.e., containers) or the infrastructure (i.e., VMs). As mentioned above, most of them are also application or cloud specific. To our knowledge, no existing solution is both application and cloud agnostic and considers macro and microservices at the same time.
In the open source community there have been some efforts to provide general-purpose solutions and frameworks that offer auto-scalability and monitoring as a service, but none of them is truly application and cloud agnostic while considering both micro and macro services. Table 4 summarizes the active projects in the open source community to date. We also include Elascale in this table to facilitate a straightforward comparison.
In Table 4, “Extendability” refers to the potential of the solution for supporting other types of micro or macro services; the amount of development effort is the primary criterion for this quality. As can be seen, Elascale can scale both containers and VMs regardless of the application or cloud service provider.
In this paper we presented and evaluated Elascale, a cloud- and application-agnostic auto-scaling engine and monitoring system. Elascale itself may be deployed as a microservice on the application cluster manager node and establishes monitoring and auto-scalability automatically. More specifically, it first discovers microservices and macroservices, then incorporates the user's final customizations through a web UI, and finally monitors the whole application stack and scales it in or out when necessary. Elascale also provides a monitoring dashboard in which the application owner can see the live status of the whole application stack. Elascale has been designed to be highly extendable, particularly with respect to incorporating new scaling algorithms.
As future work, we plan to add Kubernetes support and generic, sophisticated predictive scaling algorithms to Elascale for various types of cloud applications.
Acknowledgements. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Ontario Research Fund for Research Excellence under the Connected Vehicles and Smart Transportation (CVST) project.
- Amazon (2017) Amazon. 2017. AWS Auto Scalability. https://aws.amazon.com/autoscaling. (2017). Accessed: 2017-07-05.
- Arbezzano (2017) Gianluca Arbezzano. 2017. Orbiter. https://github.com/gianarb/orbiter. GitHub repository (2017).
- Authors (2017) The Kubernetes Authors. 2017. Production-Grade Container Orchestration. https://kubernetes.io. (2017). Accessed: 2017-07-05.
- Barna et al. (2017) Cornel Barna, Hamzeh Khazaei, Marios Fokaefs, and Marin Litoiu. 2017. Delivering elastic containerized cloud applications to enable DevOps. In Proceedings of the 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE Press, 65–75.
- Beats (2017) Beats. 2017. Lightweight Data Shippers. https://www.elastic.co/products/beats. (2017). Accessed: 2017-07-05.
- Bernstein (2014) David Bernstein. 2014. Containers and cloud: From lxc to docker to kubernetes. IEEE Cloud Computing 1, 3 (2014), 81–84.
- Cook (2017) Nathaniel Cook. 2017. Kapacitor + Kubernetes Autoscaling. https://github.com/influxdata/k8s-kapacitor-autoscale. GitHub repository (2017).
- Docker.com (2017) Docker.com. 2017. Docker Machine. https://docs.docker.com/machine. (2017). Accessed: 2017-07-05.
- Elasticsearch (2017) Elasticsearch. 2017. The Open Source Elastic Stack. https://www.elastic.co. (2017). Accessed: 2017-07-05.
- Florio and Di Nitto (2016) Luca Florio and Elisabetta Di Nitto. 2016. Gru: An Approach to Introduce Decentralized Autonomic Behavior in Microservices Architectures. In Autonomic Computing (ICAC), 2016 IEEE International Conference on. IEEE, 357–362.
- IBM (2005) IBM. 2005. An Architectural Blueprint for Autonomic Computing. Technical Report. IBM.
- IBM (2017) IBM. 2017. Bluemix Auto Scaling. https://www.ibm.com/cloud-computing/bluemix/auto-scale. (2017). Accessed: 2017-07-05.
- Khazaei et al. (2017) Hamzeh Khazaei, Hadi Bannazadeh, and Alberto Leon-Garcia. 2017. SAVI-IoT: A Self-Managing Containerized IoT Platform. In the 5th IEEE International Conference on Future Internet of Things and Cloud (FiCloud). IEEE.
- Khazaei et al. (2016) Hamzeh Khazaei, Cornel Barna, Nasim Beigi-Mohammadi, and Marin Litoiu. 2016. Efficiency Analysis of Provisioning Microservices. In Cloud Computing Technology and Science (CloudCom), 2016 IEEE International Conference on. IEEE, 261–268.
- Larrakoetxea (2017) Xabier Larrakoetxea. 2017. Ladder. https://github.com/themotion/ladder. GitHub repository (2017).
- Lorido-Botran et al. (2014) Tania Lorido-Botran, Jose Miguel-Alonso, and Jose Antonio Lozano. 2014. A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments. J. Grid Comput. 12, 4 (2014), 559–592.
- Naik (2016) Nitin Naik. 2016. Building a virtual system of systems using Docker Swarm in multiple clouds. In Systems Engineering (ISSE), 2016 IEEE International Symposium on. IEEE, 1–3.
- Richer (2017) Maximilien Richer. 2017. Zenscaler. https://github.com/Zenika/zenscaler. GitHub repository (2017).
- Rouse (2017) Margaret Rouse. 2017. Docker Swarm. http://searchitoperations.techtarget.com/definition/Docker-Swarm. (2017). Accessed: 2017-07-05.
- SAVI (2015) SAVI. 2015. Smart Applications on Virtual Infrastructure, http://www.savinetwork.ca. Cloud platform. (June 2015).
- Turnbull (2014) James Turnbull. 2014. The Docker Book: Containerization is the new virtualization. James Turnbull.
- Wielgus (2017) Marcin Wielgus. 2017. Autoscaler. https://github.com/kubernetes/autoscaler. GitHub repository (2017).
- Yang (2017) Bo Yang. 2017. Appautoscaler. https://github.com/cloudfoundry-incubator/app-autoscaler. GitHub repository (2017).
- Zheng (2017) Zihong Zheng. 2017. Cluster Proportional Autoscaler. https://github.com/kubernetes-incubator/cluster-proportional-autoscaler. GitHub repository (2017).