Optimizing Waiting Thresholds Within A State Machine

10/08/2018
by Rohit Pandey, et al.

Azure (the cloud service provided by Microsoft) is composed of physical computing units called nodes. These nodes are controlled by a software component called the Fabric Controller (FC), which can consider a node to be in one of many states, such as Ready, Unhealthy, or Booting. Some of these states correspond to a node being unresponsive to the FC's requests. When a node stays unresponsive for longer than a set threshold, the FC intervenes and reboots the node. We minimized the downtime attributable to this intervention threshold for nodes that switch to the Unhealthy state by fitting various heavy-tailed probability distributions. We also consider using features of the node to customize the organic recovery model to the individual nodes that go unhealthy. This regression approach allows us to use information about the node, such as hardware, software versions, and historical performance indicators, to inform the organic recovery model and hence the optimal threshold. In another direction, we consider generalizing this to an arbitrary number of thresholds within the node state machine (or Markov chain). When the states are intertwined so that different thresholds affect one another, we cannot simply optimize each of them in isolation. For best results, we must treat this as an optimization problem in many variables (one per threshold). We no longer have a closed-form solution for this more complex problem, as we did with one threshold, but we can still solve it with numerical techniques (gradient descent).
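To make the single-threshold idea concrete, here is a minimal sketch in Python. It is not the authors' model: it assumes, purely for illustration, that organic recovery time follows a Lomax (Pareto type II) distribution with shape `alpha` and scale `lam`, that a reboot costs a fixed `reboot_cost` of downtime, and that the expected downtime for threshold `tau` is E[min(T, tau)] + P(T > tau) * reboot_cost. All function and parameter names are hypothetical. Under these assumptions the optimum sits where the hazard rate equals 1 / reboot_cost, which gives a closed form; plain gradient descent is included as the numerical counterpart.

```python
import math


def lomax_survival(t, alpha, lam):
    """P(T > t) for a Lomax distribution (assumed recovery-time model)."""
    return (1.0 + t / lam) ** (-alpha)


def expected_downtime(tau, alpha, lam, reboot_cost):
    """E[min(T, tau)] + P(T > tau) * reboot_cost under the assumed model."""
    # E[min(T, tau)] equals the integral of the survival function on [0, tau].
    if alpha == 1.0:
        waited = lam * math.log(1.0 + tau / lam)
    else:
        waited = lam / (alpha - 1.0) * (1.0 - (1.0 + tau / lam) ** (1.0 - alpha))
    return waited + lomax_survival(tau, alpha, lam) * reboot_cost


def downtime_gradient(tau, alpha, lam, reboot_cost):
    """d/dtau of expected_downtime: S(tau) * (1 - hazard(tau) * reboot_cost)."""
    hazard = alpha / (lam + tau)  # Lomax hazard rate, decreasing in tau
    return lomax_survival(tau, alpha, lam) * (1.0 - hazard * reboot_cost)


def optimal_threshold_closed_form(alpha, lam, reboot_cost):
    """Solve hazard(tau) = 1 / reboot_cost, clipped at zero."""
    return max(0.0, alpha * reboot_cost - lam)


def optimal_threshold_gradient_descent(alpha, lam, reboot_cost,
                                       tau0=0.0, learning_rate=2.0,
                                       steps=20000):
    """Projected gradient descent on tau >= 0 (step size chosen by hand)."""
    tau = tau0
    for _ in range(steps):
        grad = downtime_gradient(tau, alpha, lam, reboot_cost)
        tau = max(0.0, tau - learning_rate * grad)
    return tau


if __name__ == "__main__":
    alpha, lam, reboot_cost = 2.0, 5.0, 10.0  # illustrative values only
    tau_exact = optimal_threshold_closed_form(alpha, lam, reboot_cost)
    tau_numeric = optimal_threshold_gradient_descent(alpha, lam, reboot_cost)
    print("closed form:", tau_exact)
    print("gradient descent:", tau_numeric)
    print("expected downtime at optimum:",
          expected_downtime(tau_exact, alpha, lam, reboot_cost))
```

With the illustrative values above, the closed form gives a threshold of `alpha * reboot_cost - lam = 15`. The heavy tail makes the objective very flat to the right of the optimum, so the step size and starting point for gradient descent are tuned by hand here; a multi-threshold version would replace the scalar `tau` with a vector and would need a correspondingly more careful optimizer.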
