Automated Probe Life-Cycle Management for Monitoring-as-a-Service

09/21/2023
by Alessandro Tundo, et al.

Cloud services must be continuously monitored so that misbehaviors can be revealed, compensated, and fixed in a timely manner. While simple applications can be easily monitored and controlled, monitoring non-trivial cloud systems with dynamic behavior requires operators to rapidly adapt the set of collected indicators. Although currently available monitoring frameworks are equipped with a rich set of probes that can collect virtually any indicator, they do not provide the automation capabilities required to quickly and easily change (i.e., deploy and undeploy) the probes used to monitor a target system. Indeed, changing the collected indicators beyond standard platform-level indicators can be an error-prone and expensive process that often requires manual intervention. This paper presents a Monitoring-as-a-Service framework that automatically deploys and undeploys arbitrary probes based on a user-provided set of indicators to be collected. The life-cycle of the probes is fully governed by the framework, including the detection and resolution of erroneous states at deployment time. The framework can be used jointly with existing monitoring technologies, without requiring the adoption of a specific probing technology. We evaluated our framework on cloud systems based on containers and virtual machines, obtaining evidence of the efficiency and effectiveness of the proposed solution.
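
To make the idea of indicator-driven probe management concrete, the following is a minimal sketch, not the framework described in the paper. It assumes a hypothetical `ProbeManager` class whose `deploy` and `undeploy` callbacks stand in for whatever probing technology is in use; the indicator names, the retry count, and the cleanup-then-retry policy for failed deployments are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ProbeManager:
    """Hypothetical manager keeping deployed probes in sync with a desired indicator set."""

    deploy: Callable[[str], bool]    # assumed to return True when the probe is healthy
    undeploy: Callable[[str], None]  # assumed to be safe to call on a half-deployed probe
    max_retries: int = 2             # illustrative retry budget per indicator
    deployed: Set[str] = field(default_factory=set)

    def reconcile(self, desired: Set[str]) -> Dict[str, Set[str]]:
        """Deploy missing probes, undeploy unwanted ones, and report failures."""
        to_add = desired - self.deployed
        to_remove = self.deployed - desired
        failed: Set[str] = set()

        # Remove probes for indicators the user no longer wants.
        for indicator in sorted(to_remove):
            self.undeploy(indicator)
            self.deployed.discard(indicator)

        # Add probes for newly requested indicators.
        for indicator in sorted(to_add):
            healthy = False
            for _ in range(1 + self.max_retries):
                if self.deploy(indicator):
                    healthy = True
                    break
                # Erroneous state at deployment time: clean up before retrying.
                self.undeploy(indicator)
            if healthy:
                self.deployed.add(indicator)
            else:
                failed.add(indicator)

        return {"added": to_add - failed, "removed": to_remove, "failed": failed}
```

Under these assumptions, an operator would only edit the desired set (for example `{"cpu", "latency"}`) and call `reconcile`; the set differences determine which probes are deployed or undeployed, and any indicator whose probe never becomes healthy is reported rather than silently dropped.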
