Site Reliability Engineering: Application of Item Response Theory to Application Deployment Practices and Controls

08/15/2020
by Kiran Mahesh ND, et al.

The reliability of an application or solution in a production environment is one of the fundamental features on which every SRE team is critically focused. At the same time, achieving extreme reliability comes at a cost, which includes but is not limited to a slower pace of new feature deployments, operations cost, and opportunity cost. One earlier effort to provide an objective metric for striking a fine balance between acceptable reliability and product velocity is the error budget and its associated policy. Organizations also maintain contemporary deployment guidelines and controls to ascertain the reliability of an application version deployed into customer-facing or production environments. This work proposes a new objective metric called the Application Deployment Score, estimated using a dichotomous Item Response Theory model. This score is used to assess the improvement trend of each application version deployed into a customer-facing environment; to identify the scope for improvement of each application deployment in each area of the deployment guidelines and controls; and to adjust the error budget (i.e., a soft error budget) of an interdependent application in an application mesh by assigning soft collective responsibility. Finally, the work defines a new metric called the deployment index, which helps assess the effectiveness of these contemporary deployment guidelines and controls in upholding the agreed SLOs of the application in customer-facing environments. This study opens a new field of research in developing new underlying latent indexes (i.e., new objective metrics) in the SRE and DevOps space.
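
To make the idea of a latent score estimated from dichotomous (pass/fail) responses concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and it treats each deployment control as an "item" with hypothetical, already-known discrimination and difficulty values. The function names, the item parameters, and the grid-search estimator are all illustrative assumptions.

```python
import math


def item_response_probability(theta, discrimination, difficulty):
    """2PL item response function (assumed model): probability that a
    deployment with latent score `theta` passes a given control."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of the observed pass/fail responses at `theta`."""
    total = 0.0
    for passed, (a, b) in zip(responses, items):
        p = item_response_probability(theta, a, b)
        total += math.log(p) if passed else math.log(1.0 - p)
    return total


def estimate_score(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood estimate of the latent score by grid search
    over [lo, hi]; a simple stand-in for a proper IRT fitting routine."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical controls as (discrimination, difficulty) pairs.
controls = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]

# Two hypothetical deployment versions: 1 = control satisfied, 0 = not.
score_v1 = estimate_score([1, 0, 0], controls)
score_v2 = estimate_score([1, 1, 0], controls)
print(score_v1, score_v2)
```

Under these assumptions, a version that satisfies more (or more discriminating) controls receives a higher estimated score, which is the kind of per-version trend the abstract describes. In practice the item parameters would themselves be estimated from many deployments rather than fixed by hand.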

research
08/15/2023

The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models

When deploying machine learning models in production for any product/app...
research
02/22/2019

MPP: Model Performance Predictor

Operations is a key challenge in the domain of machine learning pipeline...
research
01/26/2021

Using a Balanced Scorecard to Identify Opportunities to Improve Code Review Effectiveness: An Industrial Experience Report

Peer code review is a widely adopted software engineering practice to en...
research
07/19/2022

Balancing the trade-off between cost and reliability for wireless sensor networks: a multi-objective optimized deployment method

The deployment of the sensor nodes (SNs) always plays a decisive role in...
research
02/09/2018

Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data

Accurately predicting customer churn using large scale time-series data ...
research
08/17/2021

On the equivalence of holding cost and response time for evaluating performance of queues

This self-contained discussion relates the long-run average holding cost...
