Software is increasingly delivered as a service through clouds and other scalable platforms. Microservices allow a software application to be developed in a distributed way while increasing its resilience and scalability. They typically interact with each other using REST APIs (Dragoni2017, ), message queues or service meshes. Separate teams of developers can work on individual component services of a much larger application (7030212, ), and some of the resulting software artefacts are reusable and can thus be placed in a marketplace for other developers to integrate into other projects. Beyond existing repositories specific to programming languages such as Maven Central (maven, ), RubyGems (ruby, ) and the Python Package Index (pypi, ), microservice-specific repositories, hubs and marketplaces have shown growth over recent months. These include Docker Hub (docker, ), Helm Hub (helm, ) and the Amazon Serverless Application Repository (awssar, ) among others.
The widespread adoption of these marketplaces also causes concern, as little quantitative data is available to developers, leading to opportunistic design decisions with potentially unpredictable results.
We envision the globally operated Microservice Artefact Observatory (MAO (mao, )) as a scientific community effort to monitor and analyze these artefact marketplaces and provide insights through a combination of metadata monitoring, static checks and dynamic testing. For this purpose, we contribute a first working prototype of the corresponding federated research infrastructure for resilient tracking and analysis of marketplaces and artefacts. More importantly, we contribute an outlook of how researchers and developers can benefit from such an approach. A key feature of the infrastructure is a novel method of continuously generating ground truth data resulting from consensus voting over the individual observations and insights. This ground truth is usable by the software engineering and distributed systems communities for further studies.
2. Related Work
Various works have delved into the monitoring and testing of quality aspects of microservice architectures. Some approaches focus on the metadata and logs generated by the microservices (8103476, ), rather than benchmarking specific architectures (DBLP:journals/corr/abs-1807-10792, ; 7133548, ) or testing dependencies and service interactions (8377834, ). Researchers have also looked at runtime monitoring of microservice-based applications (8377902, ). More recently, there have been attempts (Gan:2019:OBS:3297858.3304013, ) at benchmarking architectural models based on microservices to study the implications of the design pattern on real-world applications. Sieve (DBLP:journals/corr/abs-1709-06686, ) extracts usable metrics for developers by monitoring these software artefacts. A similar approach was proposed in (7958458, ) for corporate managers. Finally, there are efforts to define a standard set of requirements for the orchestration of microservices (10.1007/978-3-319-74781-1_16, ) as well as surveys to detect trends in microservice development (DBLP:journals/corr/abs-1808-04836, ).
We observe the lack of an approach to systematic monitoring of microservice-related marketplaces. Current tools and platforms are aimed at analyzing and monitoring specific case studies and architectures to a significant depth, but the publicly available reusable artefacts are, for the most part, not scrutinized for their quality properties nor tracked with respect to potential improvements or regressions. We argue that such an approach is necessary for providing preemptive quality metrics to developers that aim to reuse publicly available artefacts, as well as researchers studying the evolution of the field and emerging trends and issues.
Prior work includes data crawlers for Helm charts on the Kube Apps Hub (DBLP:journals/corr/abs-1901-00644, ), for the Serverless Application Repository (DBLP:journals/corr/abs-1905-04800, ) for QA and preservation purposes, and for DApps across multiple marketplaces (dapps, ). The current output of this effort includes static analysis software tools, as well as experimental datasets (sardata, ; helmdata, ; dappsdata, ) in addition to the research papers and preprints. We intend to integrate and upstream the efforts and vision presented in this paper into these open initiatives with the goal that future metrics collections no longer rely on standalone, brittle and centralised tools.
A clear gap we have identified in the state of the art is a large scale monitoring system with the capability to track the evolution of these marketplaces and the artefacts within, to provide both broad and deep insight on the state of the ecosystem.
3.1. Problem Statement
Treating reusable microservice artefacts as black or grey boxes when designing a complex application can lead to wrong assumptions about their properties and thus unpredictable behavior of the application itself. Small mistakes in a microservice implementation or configuration, even in third-party dependency artefacts, might endanger the entire application as shown in Fig. 1. The quality of compositions is after all limited by their weakest point. Providing information in the form of metadata analysis, code quality, performance benchmarking and security evaluation for these artefacts can increase the predictability of their behaviour in production by allowing the developers to make informed design decisions and be aware of potential issues ahead of time. This would also give researchers in the field access to insights on development trends and emerging anti-patterns in microservice architectures.
Additionally, current methods used to monitor software repositories lack infrastructure to enable large scale collaborations and robust historical tracking. Yet, we argue that such infrastructure is a requirement for monitoring projects of this scale, especially since an international consortium (albeit an informal one in this case) is involved. We identify two key features of the proposed infrastructure: the capability of reliable continuous tracking within a decentralized system, and an algorithm or set of algorithms to establish ground truth when multiple data snapshots are submitted by peers within the collaboration.
Our approach to this problem is focused around the following research questions:
RQ 1: How can a distributed, federated system enable more efficient and resilient monitoring and analysis of microservice artefacts at the scale of a marketplace?
RQ 2: How can cluster consensus be utilised within the federation to establish ground truth about artefact quality metrics?
We envision the work to be pursued along two main dimensions: (1) the establishment and engineering of the proposed observatory infrastructure, and (2) the exploitation of said infrastructure within our primary use case of collecting artefact metrics.
The engineering component involves providing the functionality needed to assist and automate all aspects of the data management pipeline, from scheduling data acquisition tools to comparing snapshots of the data from different nodes to reach a ground truth measurement. Additionally, a main requirement is to provide resilience and reliability, to protect from hardware outages and corrupt data files.
Fig. 2 shows the current architecture for the orchestration system. It primarily consists of an orchestration/scheduling service that manages the data acquisition tools. Nodes access an etcd (etcd, ) cluster to share registry information such as available tools to deploy, dataset repositories and notifications. An additional gateway service can also be deployed to connect external tools if needed.
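The role of the shared registry can be illustrated with a minimal sketch. It does not reproduce the MAO orchestrator: a plain in-memory dictionary stands in for the etcd cluster, and the key layout (`tools/`, `datasets/`, `notifications/`), the class name `InMemoryRegistry` and all example values are assumptions made purely for illustration.

```python
import json
from typing import Dict, List, Tuple


class InMemoryRegistry:
    """Toy stand-in for the shared etcd cluster (plain dict, no networking)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, key: str, value: dict) -> None:
        # Values are JSON-encoded strings, mimicking an opaque key-value store.
        self._store[key] = json.dumps(value, sort_keys=True)

    def get_prefix(self, prefix: str) -> List[Tuple[str, dict]]:
        # Return every entry whose key starts with the prefix, sorted by key.
        return [
            (key, json.loads(raw))
            for key, raw in sorted(self._store.items())
            if key.startswith(prefix)
        ]


# Hypothetical key layout: one prefix per kind of registry information.
registry = InMemoryRegistry()
registry.put("tools/dockerhub-collector", {"image": "example/dockerhub-collector:latest"})
registry.put("datasets/dockerhub-collector", {"repo": "https://example.org/datasets/dockerhub.git"})
registry.put("notifications/0001", {"tool": "dockerhub-collector", "event": "new-snapshot"})

# A node lists the tools it could deploy by scanning the "tools/" prefix.
available_tools = [key.split("/", 1)[1] for key, _ in registry.get_prefix("tools/")]
print(available_tools)
```

Grouping entries under prefixes keeps tool discovery, dataset lookup and notifications independent of each other, which is one plausible way to organise such a registry.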
Apart from data acquisition automation the orchestration system aims to assist in collaborations by partially automating the replication of experiments and the verification of results. Fig. 3 shows the current concept algorithm that will be implemented in the next stages. A node that runs an experiment can announce its results to other nodes (Fig. 3-➋). If the nodes have not executed this data acquisition tool (Fig. 3-➌), they will simply accept this measurement. However, if they already have a measurement (Fig. 3-➍), they will respond to the announcement, indicating a comparable data snapshot in the etcd registry. The first node will then retrieve these snapshots from their respective repositories (e.g., git) and run the verification algorithm (Fig. 3-➎).
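The announcement exchange described above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the `Node` class, the `announce` function and the string snapshot references are hypothetical, and a real deployment would exchange messages through the registry rather than via direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    # Tool name -> reference to the locally held snapshot (hypothetical: an id or URL).
    snapshots: Dict[str, str] = field(default_factory=dict)

    def on_announcement(self, tool: str, snapshot_ref: str) -> Optional[str]:
        """React to a peer's announcement.

        Returns None when the announced snapshot is simply accepted, or the
        reference of a comparable local snapshot when this node has already
        run the same tool.
        """
        if tool not in self.snapshots:
            self.snapshots[tool] = snapshot_ref  # no own measurement: accept
            return None
        return self.snapshots[tool]  # own measurement exists: point to it


def announce(origin: Node, peers: List[Node], tool: str) -> List[str]:
    """Announce a result and collect all snapshots that need verification."""
    own_ref = origin.snapshots[tool]
    to_verify = [own_ref]
    for peer in peers:
        response = peer.on_announcement(tool, own_ref)
        if response is not None:
            to_verify.append(response)
    return to_verify


a = Node("a", {"collector": "snap-a"})
b = Node("b")                            # has never run the tool
c = Node("c", {"collector": "snap-c"})   # holds a comparable snapshot
to_verify = announce(a, [b, c], "collector")
print(to_verify)
```

In this run node `b` adopts the announced snapshot, while node `c` answers with its own, so the announcing node ends up with two snapshots to feed into the verification step.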
The verification makes a distinction between performance metrics and quality or security metrics. In the case of benchmarking or other performance metrics (as labeled in the dataset itself) the verification will produce average, minimum and maximum values as the ground truth measurement. For metrics such as metadata or vulnerability characteristics, where average values would be meaningless, the ground truth measurement will be arrived at via clustering.
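A minimal sketch of the two verification paths is given below. The function names and the `kind` labels are assumptions for illustration. Note also that the clustering step is replaced here by its simplest possible form, grouping identical observations and picking the largest group (a majority vote); the text does not specify which clustering method is intended.

```python
from collections import Counter
from statistics import mean
from typing import Dict, Hashable, Sequence, Union


def verify_performance(values: Sequence[float]) -> Dict[str, float]:
    """Summarise numeric measurements from several nodes."""
    return {"avg": mean(values), "min": min(values), "max": max(values)}


def verify_categorical(observations: Sequence[Hashable]) -> Hashable:
    """Group identical observations and return the most frequent one.

    Ties are resolved in favour of the value that was observed first.
    """
    value, _count = Counter(observations).most_common(1)[0]
    return value


def ground_truth(kind: str, samples: Sequence) -> Union[Dict[str, float], Hashable]:
    # 'kind' is assumed to come from a label stored in the dataset itself.
    if kind == "performance":
        return verify_performance(samples)
    return verify_categorical(samples)


print(ground_truth("performance", [10.0, 12.0, 14.0]))
print(ground_truth("metadata", ["amd64", "amd64", "arm64"]))
```

Averaging suits latency-like measurements, whereas for values such as a reported CPU architecture or a vulnerability flag only agreement between nodes is meaningful, which is why the second path votes instead of averaging.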
Once the verification is complete and the ground truth data snapshot is established, it will once again be announced to all nodes (see Fig. 3-➏), this time marked as verified so nodes can accept it unless a more recent measurement has surfaced.
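One possible reading of this acceptance rule is sketched below. The `Snapshot` fields and the `accept` function are hypothetical, and comparing integer timestamps is an assumption about how "more recent" would be decided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    ref: str
    taken_at: int           # hypothetical: Unix timestamp of the measurement
    verified: bool = False


def accept(current: Optional[Snapshot], announced: Snapshot) -> Snapshot:
    """Return the snapshot a node should hold after an announcement."""
    if current is None:
        return announced    # nothing held locally: take what is announced
    if announced.verified and current.taken_at <= announced.taken_at:
        return announced    # verified and not older than the local data
    return current          # a more recent local measurement takes precedence


old_local = Snapshot("local-old", taken_at=100)
new_local = Snapshot("local-new", taken_at=300)
verified = Snapshot("ground-truth", taken_at=200, verified=True)

print(accept(old_local, verified).ref)
print(accept(new_local, verified).ref)
```

Here a node holding older data switches to the verified snapshot, while a node that has measured again in the meantime keeps its newer result, which would then be a candidate for the next verification round.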
The second dimension will focus on the exploitation of the new architecture within the MAO use case in order to both further the goals of MAO of reliable tracking of the evolution of microservice artefact repositories, as well as evaluate the improvement this federated system brings to the current state-of-the-art research practice. As the observatory grows, with more tools and metrics and more collaborators, so will our view of the system’s effectiveness increase, allowing us to better gauge how it scales when applied to a large real-world scenario. For this purpose, we have initiated a free collaborative network of researchers and pilot software engineers (MAO collaboration: https://mao-mao-research.github.io/).
4. Preliminary Results
Early contributions in this work have been divided between improving the infrastructure of the experiments, and developing additional tooling for monitoring and testing artefacts.
We developed a distributed monitoring architecture based on a geo-distributed cluster that uses etcd for metadata exchange among peers and a Docker-based scheduler-orchestrator application for members to run the monitoring tools (orchestrator, ). The system is currently operated on our own servers in Switzerland, with a pilot deployment in Argentina and additional deployments being discussed with MAO-member researchers.
The first monitoring tool deployed on the system is an automatic crawler of Dockerhub’s public API. It collects image metadata and provides basic insights on the development trends within the ecosystem (collector, ). The first version focused on OS support and CPU architecture for each image, to better understand the extent of Docker’s support for heterogeneous ecosystems. Figure 4 shows the current set of data produced by the crawler. We tracked the evolution of support for different CPU architectures over time. The graph shows the number of images for ARM, x86-64 and IBM Z architectures for each date of the tracking period, along with a trend line to highlight increase/decrease over time.
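The kind of aggregation behind such a graph can be sketched as follows. This is not the collector's code: the record shape (a dictionary with an `architecture` field), the architecture labels and the sample numbers are assumptions for illustration, and no network request is made. The trend line is computed as an ordinary least-squares slope.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def count_architectures(images: Iterable[dict]) -> Counter:
    """Count image records per CPU architecture label."""
    return Counter(image.get("architecture", "unknown") for image in images)


def trend_slope(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (day, image count) points.

    Assumes at least two points with distinct day values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


# Hypothetical metadata records, as a crawler might store them for one day.
sample_images = [
    {"architecture": "amd64", "os": "linux"},
    {"architecture": "arm64", "os": "linux"},
    {"architecture": "amd64", "os": "linux"},
    {"os": "linux"},  # record without an architecture field
]
counts = count_architectures(sample_images)
print(counts["amd64"], counts["arm64"], counts["unknown"])

# Made-up daily counts for one architecture: (day index, number of images).
slope = trend_slope([(0, 10), (1, 12), (2, 14)])
print(slope)
```

A positive slope corresponds to growing support for an architecture over the tracking period, a negative one to a decline.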
A concurrent experiment focuses on extending the existing study of the AWS Serverless Application Repository with a benchmarking tool, to test the collected artefacts. To that end, an emulation based on LocalStack (localstack, ) and sam local (samlocal, ) is currently under development. The aim is to emulate the behaviour of AWS’s serverless offering to provide more accurate metrics than current unit-test tools. We also quantitatively assess Dockerfiles with multiple linters on the source level.
5. Conclusion and Future Work
We proposed an architecture to analyze data from monitored artefact marketplaces. This architecture is currently being extended to support dynamic execution of the gathered artefacts, and to allow on-demand comparison of data between nodes, to establish ground truth data via cluster consensus.
The analysis will be extended towards a more diverse set of artefact types, such as Kubernetes Helm Charts, Docker images and Compose files, Kubernetes Operators and others, furthering the goals of the MAO project while simultaneously allowing us to evaluate the observatory architecture at scale.
Additionally, we aim to integrate the knowledge base we build over time into data-driven QA tooling that can be used within CI/CD pipelines, thus providing real-time quality analysis and feedback to developers.
We thank Paolo Costa, John Wilkes and reviewers of the original paper for feedback and suggestions leading to this extended presentation of our position and approach.
- (1) AWS Serverless Application Repository. https://aws.amazon.com/serverless/serverlessrepo/. Accessed 02.06.2020.
- (2) Docker Hub. https://hub.docker.com/. Accessed 02.06.2020.
- (3) etcd. https://etcd.io/. Accessed 02.06.2020.
- (4) Helm Hub. https://hub.helm.sh/. Accessed 02.06.2020.
- (5) Maven Repository: Central. https://mvnrepository.com/repos/central. Accessed 02.06.2020.
- (6) PyPi. The Python Package Index. https://pypi.org/. Accessed 02.06.2020.
- (7) RubyGems. https://rubygems.org/. Accessed 02.06.2020.
- (8) LocalStack. https://localstack.cloud/, 2019. Accessed: 29.02.2020.
- (9) Dockerhub Collector. https://github.com/EcePanos/Dockerhub-Collector, 2020. Accessed: 29.02.2020.
- (10) MAO-MAO: Microservice Artefact Observatory. https://mao-mao-research.github.io/, 2020. Accessed: 14.02.2020.
- (11) MAO Orchestrator. https://github.com/serviceprototypinglab/mao-orchestrator, 2020. Accessed: 29.02.2020.
- (12) Sam Local. https://github.com/thoeni/aws-sam-local, 2020. Accessed: 29.02.2020.
- (13) Brogi, A., Canciani, A., Neri, D., Rinaldi, L., and Soldani, J. Towards a reference dataset of microservice-based applications. In Software Engineering and Formal Methods (Cham, 2018), A. Cerone and M. Roveri, Eds., Springer International Publishing, pp. 219–229.
- (14) Chang, K. S., and Fink, S. J. Visualizing serverless cloud application logs for program understanding. In 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) (Oct 2017), pp. 261–265.
- (15) Dragoni, N., Giallorenzo, S., Lafuente, A. L., Mazzara, M., Montesi, F., Mustafin, R., and Safina, L. Microservices: Yesterday, Today, and Tomorrow. Springer International Publishing, Cham, 2017, pp. 195–216.
- (16) Gan, Y., Zhang, Y., Cheng, D., Shetty, A., Rathi, P., Katarki, N., Bruno, A., Hu, J., Ritchken, B., Jackson, B., Hu, K., Pancholi, M., He, Y., Clancy, B., Colen, C., Wen, F., Leung, C., Wang, S., Zaruvinsky, L., Espinosa, M., Lin, R., Liu, Z., Padilla, J., and Delimitrou, C. An open-source benchmark suite for microservices and their hardware-software implications for cloud & edge systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (New York, NY, USA, 2019), ASPLOS ’19, ACM, pp. 3–18.
- (17) Ma, S., Fan, C., Chuang, Y., Lee, W., Lee, S., and Hsueh, N. Using service dependency graph to analyze and test microservices. In 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) (July 2018), vol. 02, pp. 81–86.
- (18) Mayer, B., and Weinreich, R. A dashboard for microservice monitoring and management. In 2017 IEEE International Conference on Software Architecture Workshops (ICSAW) (April 2017), pp. 66–69.
- (19) Papapanagiotou, I., and Chella, V. Ndbench: Benchmarking microservices at scale. CoRR abs/1807.10792 (2018).
- (20) Phipathananunth, C., and Bunyakiati, P. Synthetic runtime monitoring of microservices software architecture. In 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) (July 2018), vol. 02, pp. 448–453.
- (21) Qasse, I., and Spillner, J. DApps Quality Characteristics Dataset. Zenodo dataset at https://doi.org/10.5281/zenodo.3382126, August 2019.
- (22) Qasse, I. A., Spillner, J., Talib, M. A., and Nasir, Q. A Study on DApps Characteristics. In 2nd IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS) (August 2020). To appear - conference shifted due to coronavirus.
- (23) Rahman, M., and Gao, J. A reusable automated acceptance testing architecture for microservices in behavior-driven development. In 2015 IEEE Symposium on Service-Oriented System Engineering (March 2015), pp. 321–325.
- (24) Spillner, J. AWS-SAR Dataset. https://github.com/serviceprototypinglab/aws-sar-dataset, 2019. Accessed 13.05.2019.
- (25) Spillner, J. Duplicate reduction in Helm Charts. https://osf.io/5gkxq/, 2019. Accessed 13.05.2019.
- (26) Spillner, J. Quality assessment and improvement of helm charts for kubernetes-based cloud applications. CoRR abs/1901.00644 (2019).
- (27) Spillner, J. Quantitative analysis of cloud function evolution in the AWS serverless application repository. CoRR abs/1905.04800 (2019).
- (28) Thalheim, J., Rodrigues, A., Akkus, I. E., Bhatotia, P., Chen, R., Viswanath, B., Jiao, L., and Fetzer, C. Sieve: Actionable insights from monitored metrics in microservices. CoRR abs/1709.06686 (2017).
- (29) Thönes, J. Microservices. IEEE Software 32, 1 (Jan 2015), 116–116.
- (30) Viggiato, M., Terra, R., Rocha, H., Valente, M. T., and Figueiredo, E. Microservices in Practice: A Survey Study. CoRR abs/1808.04836 (2018).