 # On Optimal Operational Sequence of Components in a Warm Standby System

We consider an open problem of optimal operational sequence for the 1-out-of-n system with warm standby. Using the virtual age concept and the cumulative exposure model, we show that the components should be activated in accordance with the increasing sequence of their lifetimes. Lifetimes of the components and the system are compared with respect to the stochastic precedence order. Only specific cases of this optimal problem were considered in the literature previously.


## 1 Introduction

As an introductory reasoning, consider first one component that starts operating at $t=0$. Assume that in the process of production it had acquired an initial unobserved random resource $R$ (Finkelstein). For mechanical or electronic items, for instance, it can be a ‘distance’ between the initial value of the key parameter and the boundary that defines a failure of the component. It is natural to assume that $R$ is a continuous random variable with the Cdf

 F(r)=P(R≤r). (1.1)

A similar notion of a random resource (hazard potential) was considered in Singpurwalla. Suppose that, for each realization of $R$, the component’s remaining resource is monotonically decreasing with time. Therefore, the run-out resource, to be called wear, monotonically increases. The wear in $[0,t)$ can be defined as

$$W(t)=\int_0^t w(u)\,du, \tag{1.2}$$

where $w(t)$ denotes the rate of wear. Thus the value of $R$ is an intrinsic property of a manufactured item, whereas the rate $w(t)$ defines the ‘consumption’ of $R$ in a given environment. A larger rate corresponds to a more severe environment, whereas $w(t)\equiv 1$ can often be considered as the baseline one. The failure occurs when the wear reaches $R$. Denote the corresponding random time by $T$. Then

 P(T≤t)≡P(R≤W(t))=F(W(t)). (1.3)

Therefore, the described survival model can be interpreted in terms of the accelerated life model (ALM) (Nelson; Bagdonavicius and Nikulin). Our reasoning in what follows will be based on the ALM (1.3), whereas the discussion above can be considered as a useful interpretation.
In applications, the most common specific case is the cumulative exposure model (Nelson), which corresponds to the case when the scale transformation in (1.3) is linear, i.e.,

 P(T≤t)≡P(R≤wt)=F(wt). (1.4)
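The linear model (1.4) can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: the function names, the sample size and the choice of an exponentially distributed resource in the usage note are assumptions made only for illustration.

```python
import random


def failure_time(resource: float, w: float) -> float:
    """Time at which the linear wear w*t reaches the resource R, as in (1.4)."""
    if w <= 0:
        raise ValueError("the rate of wear w must be positive")
    return resource / w


def empirical_cdf_at(t, w, sample_resource, n=100_000, seed=0):
    """Monte Carlo estimate of P(T <= t), where T = R / w.

    sample_resource is a hypothetical callable that takes a random.Random
    instance and returns one realization of the resource R.
    """
    rng = random.Random(seed)
    hits = sum(failure_time(sample_resource(rng), w) <= t for _ in range(n))
    return hits / n
```

For instance, with a unit-rate exponential resource, `empirical_cdf_at(2.0, 0.5, lambda rng: rng.expovariate(1.0))` should be close to $F(wt)=1-e^{-1}$, which is what (1.4) predicts.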

Engineering systems, especially those that are used in mission-critical applications such as aerospace, power generation, flight control and computing, are often designed with redundancies in order to meet the stringent safety and reliability requirements (Levitin et al. [10, 11]). One of the widely-applied redundancy techniques in various applications is the standby redundancy, when one or several components operate and redundant components serve as the standby spares. In the case of failure of an operating component, a replacement procedure is initiated to activate a standby component and to replace the failed one so that a system continues to operate.
According to its failure characteristics before the activation, a standby component can be categorized as ‘hot’, ‘cold’, or ‘warm’. A hot standby component works concurrently with the online primary component and thus is ready to take over at any time for fast recovery. In this case, the standby component is fully exposed to the operating stress and is characterized by the same failure rate as the online one. A cold standby component is unpowered and shielded from operational and environmental stresses. As a more general option that, e.g., can take into account non-ideal standby conditions and/or partial loading, a warm standby component is characterized by a failure rate that is smaller than that of the fully operational component (Yun and Cha; Levitin et al. [10, 11]; Zhang et al.; Hazra and Nanda). Obviously, the former two types of loading are special cases of the warm standby mode.
Reliability analysis of warm standby systems is much more challenging than that for cold and hot standby. Indeed, the lifetime of a cold standby system is just the sum of the lifetimes of all components and the lifetime of a hot standby system is just the maximum of the individual lifetimes, whereas in the warm standby case a switch of regimes from the warm standby to the operational mode should be taken into account. In accordance with the linear cumulative exposure model based on the scale transformation (1.4) with $w < 1$, the equivalent lifetime (virtual age) of a warm standby component that had spent some time in this mode before switching to the active mode is this time reduced by the lifetime deceleration factor $w$ plus the lifetime spent in the active mode afterwards. More general models, not restricted to the case of a linear scale transformation, are usually based on the notion of the ‘virtual age’. See, e.g., Cha et al. and Finkelstein for applications of the virtual age concept to regime switching.
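The virtual age recalculation at the moment of switching can be written down in a few lines. This is a minimal sketch under the linear model (1.4); the function names are hypothetical, and the convention that a component that has already failed in standby has zero remaining lifetime is an assumption of the sketch.

```python
def virtual_age(time_in_standby: float, w: float) -> float:
    """Equivalent active-mode age after time_in_standby spent in warm standby."""
    return w * time_in_standby


def remaining_active_lifetime(active_lifetime: float,
                              time_in_standby: float,
                              w: float) -> float:
    """Remaining lifetime in active mode after switching from warm standby.

    active_lifetime is the realization of the lifetime the component would
    have if it operated in active mode from time 0. The result is 0.0 when
    the component has already failed in standby (assumed convention).
    """
    return max(active_lifetime - virtual_age(time_in_standby, w), 0.0)
```

Under this sketch, `w = 0` reproduces cold standby (no ageing before activation) and `w = 1` reproduces hot standby (full ageing before activation).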

###### Remark 1.1

Note that we can arrive at (1.3) formally without employing the notion of resource. Indeed, let a more severe environment be the baseline and denote the corresponding lifetime in it by $T$, with the Cdf $F(t)$. The lifetime $T_m$ of a component in a milder environment should be larger. Assume that this is in the sense of the usual stochastic ordering, i.e., $T\le_{st}T_m$, which implies that

$$F_m(t)=F(W(t)),$$

where $F_m(t)=P(T_m\le t)$ and the time-dependent scale transformation function $W(t)$ is increasing and $W(t)\le t$ for all $t\ge 0$.

An optimal (in terms of maximizing reliability characteristics of a system) activation sequence for components obviously does not exist in a hot standby system, is trivial (no difference) for a cold standby system, and is meaningful for a general warm standby system. Only some special cases (see Cha et al. and Zhai et al.) of the latter were considered in the literature. In this note, we consider the problem in much greater generality and therefore, under certain assumptions, solve an open problem of theoretical reliability.

## 2 Problem formulation

We want to obtain an optimal sequence of activation of the standby components for a heterogeneous system of $n$ components, with one active component and the others in a warm standby mode. We assume that in the standby mode all components are characterized by the same deceleration factor $w$. Generalization to the general case will also be discussed. Intuitive reasoning based on the notion of resource of the components suggests that we must first activate the weakest component, then the weakest of the remaining ones, etc. Specific cases in the literature support this intuition. However, the stochastic sense in which the components are ordered and the other assumptions of the model are crucial for the corresponding proof.
Denote the lifetimes of the components of the system in the active (operational) regime by $T_i$, $i=1,2,\dots,n$. Assume that they are ordered in some stochastic sense, not specified for now, i.e.,

$$T_1\le T_2\le\cdots\le T_n. \tag{2.1}$$

For definitions of various stochastic orders see, e.g., Shaked and Shanthikumar. If the operating component fails, the next operable one (that did not fail in the warm standby mode) is activated, etc. The question is to define a sequence of activation for standby components that will maximize the lifetime of the whole system (in some stochastic sense). Some important specific cases were studied in Cha et al. and Zhai et al., where

• The hazard rate ordering was considered for the lifetimes of two components. Then it was proved that one should start with the weaker in this sense component, which results in the maximum expected lifetime of a system.

• For the $1$-out-of-$n$ system, only the specific case of exponentially distributed lifetimes and the linear model (1.4) was considered. Then, under the assumption of the hazard rate ordering, it was proved that if activation starts with the weakest component, and the next weakest is chosen from the remaining components, etc., reliability of the system will be maximal in the sense of the usual stochastic order.

The goal of the current study is to consider this problem in more generality, for arbitrary lifetime distributions, which is a challenging open problem. We think that the choice of stochastic ordering in the previous work prevented the authors from obtaining more general results. In what follows, we use the stochastic precedence order (to be defined in the next section), which is natural in many reliability settings and, in spite of this, not sufficiently explored in the literature so far.
The problem to be considered is based on the definition of the warm standby mode via the general model (1.3) or its specific case (1.4). It should be noted that this is an assumption in itself (note that all previous specific studies of reliability of warm standby systems relied on these or similar expressions). However, in order to consider switching from one regime to another, one must have a stochastic model for that. The virtual age concept based on the ALM (1.3)–(1.4) is a way of dealing with this that is well established in the literature.

## 3 Two components

Let us consider first the system with two components with lifetimes $T_1$, $T_2$ in an operational mode, ordered as $T_1\le T_2$ in some stochastic sense to be defined below. Let $Z=T_2-T_1$, and let $t_1$, $t_2$ be the realizations of $T_1$, $T_2$, and $z=t_2-t_1$ be the corresponding realization of $Z$. Then

$$P(Z\ge 0)=P(T_2\ge T_1). \tag{3.1}$$

Denote by $Y_{12}$ the lifetime of a system when the first component is activated first and by $Y_{21}$ when the second is activated first, and by $y_{12}$ and $y_{21}$ the corresponding realizations. We will show later that under the given assumptions

$$z\ge 0\implies y_{12}-y_{21}\ge 0,$$

which, as each realization of $Z$ corresponds to a realization of $Y_{12}-Y_{21}$, implies that

$$Z\ge 0\implies Y_{12}-Y_{21}\ge 0. \tag{3.2}$$

Thus, specifically, if

$$P(Z\ge 0)\ge 0.5,\ \text{then}\ P\big((Y_{12}-Y_{21})\ge 0\big)\ge 0.5, \tag{3.3}$$

which, in fact, is the definition of the stochastic precedence (sp) order for the components and for the variants of the system as well (Boland et al.; Finkelstein):

$$T_2\ge_{sp}T_1\implies Y_{12}\ge_{sp}Y_{21}.$$

Thus the stochastic precedence order for two random variables says that $T_2\ge_{sp}T_1$ if $P(T_2\ge T_1)\ge 0.5$, and it seems to be natural in many reliability settings, e.g., for stress-strength reliability modeling (Finkelstein). It is also consistent for the current problem, as the components and the variants of the system will be ordered only in the sense of this order. Note that the stochastic precedence order is weaker than the usual stochastic order (Boland et al.). On the other hand, its relation to the ordering of expectations depends on the parameters involved (Finkelstein).
In spite of its obvious attractiveness, the stochastic precedence order has attracted much less attention in the literature and only a few papers are devoted to it (Boland et al.; Finkelstein). However, it may be the most natural one in many reliability settings (e.g., stress/strength problems). In fact, it was suggested in Finkelstein to call it (at least in some instances) the stress-strength order, which naturally compares two random variables as in structural reliability. For recent advances, see Santis et al., and Montes and Montes.
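The defining probability of the stochastic precedence order is easy to estimate by simulation. The sketch below is illustrative only: it assumes that the two lifetimes are sampled independently (the definition itself does not require this), and the function names and sampler interface are hypothetical.

```python
import random


def precedence_probability(sample_x, sample_y, n=100_000, seed=0):
    """Monte Carlo estimate of P(X >= Y) for independently sampled X and Y.

    sample_x and sample_y are callables taking a random.Random instance and
    returning one realization each (an assumed interface).
    """
    rng = random.Random(seed)
    return sum(sample_x(rng) >= sample_y(rng) for _ in range(n)) / n


def sp_greater_or_equal(sample_x, sample_y, n=100_000, seed=0):
    """True if the estimate of P(X >= Y) is at least 0.5, i.e. X >=_sp Y."""
    return precedence_probability(sample_x, sample_y, n=n, seed=seed) >= 0.5
```

As a check, for independent exponential lifetimes with rates 1 and 2 the exact value of $P(X\ge Y)$ is $2/3$, so the component with rate 1 dominates the one with rate 2 in the sp sense.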
We will first prove the following result.

###### Theorem 3.1

Let the following stochastic precedence order hold for the two-component system described above:

$$T_2\ge_{sp}T_1.$$

Then the corresponding order of components achieves the maximum lifetime of a system in the sense of the stochastic precedence order, i.e.,

$$Y_{12}\ge_{sp}Y_{21}.$$

Proof: Let $t_1$, $t_2$ be the realizations of $T_1$, $T_2$, and let $t_2\ge t_1$. If the first component starts first, then the corresponding realization of the lifetime of a system, in accordance with the linear cumulative exposure model (1.4) with $w < 1$ for a milder regime, is

$$t_1+(t_2-wt_1)=t_2+(1-w)t_1>t_2, \tag{3.4}$$

where $wt_1$ is the virtual (equivalent) age of the second component just after switching to activation (from a warm standby mode) and, therefore, the remaining lifetime in this realization is $t_2-wt_1$.
Let now the second (better) component start first. We have two specific cases:
Case I: $t_2\ge\alpha t_1$ (where $\alpha=1/w$), which means that the first component (in a warm mode) will fail before the active second component. Note that as $t_1$ is the age of the first component at failure (in an active mode), in accordance with the model, $\alpha t_1$ is the age of the first component at failure if it operates all the time in the warm standby mode. Thus the lifetime of a system in this case is just $t_2$.
Case II: $t_2 < \alpha t_1$. This means that the active second component fails before the warm standby one and that the switching should be performed at $t_2$. Then the lifetime of a system in this realization is the sum

$$t_2+\frac{\alpha t_1-t_2}{\alpha}=t_1+t_2(1-w), \tag{3.5}$$

where $\alpha t_1-t_2$ is the time that the first component should operate (after $t_2$), if it were operating in the warm standby mode. However, it was switched to the active mode and this time should be recalculated as $(\alpha t_1-t_2)/\alpha$.
In Case I, (3.4) already exceeds $t_2$. In Case II, we must compare (3.4) with (3.5), i.e., to check that

$$t_2-wt_1\ge t_2(1-w),$$

which is true as $t_2\ge t_1$.
Thus it is most beneficial to activate first the first component, with the smaller lifetime, in each realization. The result then follows from (3.3).
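The realization-wise comparison of (3.4) with Cases I and II can be checked numerically. The following is a minimal sketch under the linear model (1.4) with $0 < w < 1$; the function names and the random test ranges are illustrative assumptions, not part of the proof.

```python
import random


def system_lifetime_two(first_active: float, second_active: float, w: float) -> float:
    """Lifetime of a two-component warm standby system in one realization.

    first_active:  active-mode lifetime of the component activated at time 0.
    second_active: active-mode lifetime of the component kept in warm standby.
    """
    switch_time = first_active
    # Virtual age of the standby component at the switch is w * switch_time.
    remaining = second_active - w * switch_time
    if remaining <= 0:
        # The standby component failed before the switch (Case I).
        return switch_time
    return switch_time + remaining


def weaker_first_is_never_worse(trials: int = 10_000, seed: int = 0) -> bool:
    """Check y12 >= y21 on random realizations with t2 >= t1."""
    rng = random.Random(seed)
    for _ in range(trials):
        t1 = rng.uniform(0.1, 10.0)
        t2 = t1 + rng.uniform(0.0, 10.0)
        w = rng.uniform(0.01, 0.99)
        y12 = system_lifetime_two(t1, t2, w)  # weaker component first
        y21 = system_lifetime_two(t2, t1, w)  # stronger component first
        if y12 < y21 - 1e-12:
            return False
    return True
```

With `t1 = 2`, `t2 = 3` and `w = 0.5`, the weaker-first order gives 4.0 by (3.4), whereas the reverse order gives 3.5 by (3.5).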

###### Remark 3.1

As the virtual age concept is well-defined for the general model (1.2)–(1.3) and the function $W(t)$ is monotonically increasing (therefore, the inverse function exists), Theorem 3.1 can be generalized to this case. Indeed, let us compare the relations that correspond to (3.4) and (3.5) in this case. Relationship (3.4) turns into

$$t_1+(t_2-W(t_1)),$$

whereas (3.5) can now be written as

$$t_2+W(W^{-1}(t_1)-t_2), \tag{3.6}$$

where $W^{-1}(\cdot)$ denotes the inverse function, which exists due to monotonicity of $W(t)$. Assume additionally that $W(t)$ is concave, i.e., , which means that the rate of wear in (1.2) is decreasing (non-increasing). Then we can proceed with (3.6), which results in the following inequalities:

$$t_2+W(W^{-1}(t_1)-t_2)\le t_2+t_1-W(t_2)\le t_1+t_2-W(t_1).$$

The first one obviously follows from our sufficient condition , whereas the second follows from monotonicity of $W(t)$ and $t_2\ge t_1$. It seems that the assumption of concavity is essential for the stochastic precedence order in this case, as it is easy to see via the corresponding counterexample () that the corresponding ordering for the system does not always hold.

## 4 n components

Consider the $1$-out-of-$n$ warm standby system. It is a coherent system, meaning that each component is relevant and its structure function is monotone. It is well known (Barlow and Proschan) that in this case improving the reliability of any of the components will improve the reliability of the system. Thus this statement is with respect to the usual stochastic order both on the level of components and of the system. On the other hand, it can also easily be seen that increasing the mean lifetime of a component does not necessarily lead to increasing the mean lifetime of the system. Similarly, if we decrease the failure rate of a component, it does not always imply that the system failure rate will also decrease. This means that the result is sensitive to the employed type of stochastic order. The relevant order in our discussion is the stochastic precedence order. Therefore, the corresponding monotonicity problem should be addressed specifically, as we need this result in what follows.

###### Lemma 4.1

If the lifetime of a component in a coherent system is improved in the sense of the stochastic precedence order, then the lifetime of the coherent system will also be improved in the same sense.

Proof: Denote the lifetime of a coherent system of $n+1$ components by $\tau(T_1,T_2,\dots,T_n,T)$ where, for convenience of further notation, the lifetime of the $(n+1)$th component is denoted just by $T$. Let us replace this component with another one with lifetime $T^*$, whereas all other lifetimes stay the same, and denote the resulting system lifetime by $\tau(T_1,T_2,\dots,T_n,T^*)$. For convenience, we will call these the original and the modified system, respectively. Since the modified system is the same as the original one except that $T$ is replaced by $T^*$, the set of all minimal path sets for both systems will be the same (for a given system, a minimal path set is a minimal set of components whose functioning ensures the functioning of the system). Let $\{P_1,P_2,\dots,P_m\}$ be the set of all minimal path sets. Further, let $T_{P_i}$ denote the lifetime of the minimal path set $P_i$, for $i=1,2,\dots,m$.
For $1\le k\le m$ and $\{j_1,j_2,\dots,j_k\}\subseteq\{1,2,\dots,m\}$, let $\{P_{j_1},P_{j_2},\dots,P_{j_k}\}$ be the set of minimal path sets that contain the component $T$ (for convenience we denote the component and its lifetime by the same letter). Similarly, let $\{P^*_{j_1},P^*_{j_2},\dots,P^*_{j_k}\}$ be the set of minimal path sets that contain the component $T^*$. Note that, for $r=1,2,\dots,k$, $T_{P_{j_r}}$ and $T_{P^*_{j_r}}$ may not be the same even though the remaining components of the two path sets coincide. In fact, for $r=1,2,\dots,k$,

$$T_{P_{j_r}}=\min\{S_r,T\},\qquad T_{P^*_{j_r}}=\min\{S_r,T^*\},$$

where

$$S_r=\min_{l\in P_{j_r}}\{T_l\}=\min_{l\in P^*_{j_r}}\{T_l\}.$$

As previously, denote by lower-case letters the realizations of the corresponding random variables. Let us assume that $t\le t^*$, meaning that the realization of the lifetime of the replaced component is larger than that of the initial component. Then, for $r=1,2,\dots,k$,

$$t_{P_{j_r}}=\min\{s_r,t\}\le\min\{s_r,t^*\}=t_{P^*_{j_r}},$$

which implies that

$$\max\{t_{P_{j_1}},t_{P_{j_2}},\dots,t_{P_{j_k}}\}\le\max\{t_{P^*_{j_1}},t_{P^*_{j_2}},\dots,t_{P^*_{j_k}}\}. \tag{4.1}$$

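Let $\tau(t_1,t_2,\dots,t_n,t)$ and $\tau(t_1,t_2,\dots,t_n,t^*)$ be the realizations of $\tau(T_1,T_2,\dots,T_n,T)$ and $\tau(T_1,T_2,\dots,T_n,T^*)$, respectively. Then,

$$\begin{aligned}
\tau(t_1,t_2,\dots,t_n,t) &= \max\{t_{P_1},t_{P_2},\dots,t_{P_m}\}\\
&= \max\Big\{\max_{1\le r\le k}\{t_{P_{j_r}}\},\ \max_{z\in\{1,2,\dots,m\}\setminus\{j_1,j_2,\dots,j_k\}}\{t_{P_z}\}\Big\}\\
&\le \max\Big\{\max_{1\le r\le k}\{t_{P^*_{j_r}}\},\ \max_{z\in\{1,2,\dots,m\}\setminus\{j_1,j_2,\dots,j_k\}}\{t_{P_z}\}\Big\}\\
&= \tau(t_1,t_2,\dots,t_n,t^*),
\end{aligned}$$

where the inequality follows from (4.1). Thus, in realizations,

$$t\le t^*\implies\tau(t_1,t_2,\dots,t_n,t)\le\tau(t_1,t_2,\dots,t_n,t^*),$$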

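which is similar to the results of the previous section, and hence

$$P(T\le T^*)\ge 0.5\implies P\big(\tau(T_1,T_2,\dots,T_n,T)\le\tau(T_1,T_2,\dots,T_n,T^*)\big)\ge 0.5.$$

###### Remark 4.1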

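The proof of the above lemma can intuitively be explained as follows. Consider the realizations of the state functions ($0$ or $1$) of the original and of the modified system at time $u$, and, similarly, the realizations of the state functions of their components at time $u$. It is clear that the states of the two systems coincide for $u < t$ and for $u \ge t^*$, whereas for $t \le u < t^*$ the state of the modified system is not smaller than that of the original one, as the system is coherent and the state function of the $(n+1)$th component has been improved in this interval. Thus, the lifetime of a system with $t^*$ in each realization is not smaller than that with $t$ if $t^*\ge t$.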
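The path-set representation used in the proof of Lemma 4.1 can be sketched directly in code: in each realization the system lifetime is the maximum over minimal path sets of the minimum component lifetime in the set. The function names and the 2-out-of-3 example below are assumptions chosen only for illustration.

```python
import random


def coherent_lifetime(lifetimes, min_path_sets):
    """System lifetime in one realization: max over minimal path sets of the
    min of the component lifetimes in that path set."""
    return max(min(lifetimes[i] for i in path) for path in min_path_sets)


def improving_one_component_never_hurts(min_path_sets, n_components,
                                        trials=5_000, seed=0):
    """Realization-wise check: t <= t* implies tau(..., t) <= tau(..., t*)."""
    rng = random.Random(seed)
    for _ in range(trials):
        base = [rng.uniform(0.0, 10.0) for _ in range(n_components)]
        j = rng.randrange(n_components)
        improved = list(base)
        improved[j] = base[j] + rng.uniform(0.0, 5.0)  # t* >= t
        if coherent_lifetime(improved, min_path_sets) < coherent_lifetime(base, min_path_sets):
            return False
    return True


# Hypothetical example: a 2-out-of-3 system with components 0, 1, 2.
TWO_OUT_OF_THREE = [(0, 1), (0, 2), (1, 2)]
```

For the realization `[1, 5, 3]` the 2-out-of-3 lifetime is 3; improving component 0 from 1 to 4 raises it to 4, in line with the realization-wise inequality derived above.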

Let us specify now the ordering in (2.1) as

$$T_1\le_{sp}T_2\le_{sp}\cdots\le_{sp}T_n. \tag{4.2}$$

Now we can formulate the following theorem.

###### Theorem 4.1

Let the stochastic precedence order (4.2) hold for the $1$-out-of-$n$ warm standby system described above. Then the corresponding order of components achieves the maximum lifetime of a system in the sense of the stochastic precedence order.

Proof: Assume that we had improved the lifetime , , in the sense of the stochastic precedence order, i.e., . We start with the first component (with the smallest lifetime) in an active mode. Assume that the other components are in an arbitrary, non-ordered sequence. Consider the $i$th and the $(i+1)$th components, and combine them in one aggregated component. If $T_i\le_{sp}T_{i+1}$, we do nothing, and we change the sequence of these two components otherwise. By this change, as follows from Lemma 4.1, we increase the lifetime of this pair (similar to Theorem 3.1) and, therefore, the lifetime of the system. We can do this with all ‘non-properly’ ordered components and eventually arrive at (4.2), which maximizes the lifetime of the system in the sense of the stochastic precedence order.
The rationale behind this operation is similar to the above case of two components. The difference to be considered, however, is that the initial activation time in the case of only two components was $t=0$ and now it is some arbitrary $t_a$. Let $t_i\le t_{i+1}$ and let the $i$th component start first if activated. We emphasize once more the fact that $t_i$, $t_{i+1}$ are realizations of $T_i$, $T_{i+1}$, which are the lifetimes in the activated mode. The event $t_{i+1}\le wt_a$ means that both components had failed before the prospective activation and the corresponding comparison is irrelevant. Another possibility is that the $i$th component had failed before the activation whereas the $(i+1)$th had not. In this case, the lifetime of the pair (after activation) is, in accordance with the cumulative exposure model, $t_{i+1}-wt_a$ ($t_i\le wt_a < t_{i+1}$). The last possibility is when both of them did not fail before activation. In this case, the lifetime of the pair after activation is (compare with (3.4), which corresponds to the case $t_a=0$):

$$t_i-wt_a+(t_{i+1}-w(t_i-wt_a)), \tag{4.3}$$

where $wt_a$ is the virtual age of the $i$th component just after activation and, therefore, its remaining lifetime in this realization is $t_i-wt_a$ ($t_i > wt_a$). As the $(i+1)$th component was operating in the warm standby mode during the time $t_i-wt_a$ from the activation until the failure of the $i$th component, this time should be recalculated to end up with the remaining lifetime of the $(i+1)$th component after its activation as $t_{i+1}-w(t_i-wt_a)$.
Let now the $(i+1)$th component start first. Reasoning similar to the above results in a smaller (in realizations) lifetime of the pair as compared with the initial sequence. For instance, obviously, the term $t_{i+1}-wt_a$, which corresponds to the case when the $i$th component fails before the activation whereas the $(i+1)$th does not, stays the same. We now also have two specific cases for the situation when the components did not fail (in the warm standby mode) before $t_a$ (see Cases I and II of the previous section). But we can just properly adjust our previous reasoning by considering the remaining lifetimes after activation, which are $t_i-wt_a$ and $t_{i+1}-wt_a$ ($t_i > wt_a$); then the reasoning and comparison with (4.3) will be exactly the same as the comparison of (3.5) with (3.4).
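The pairwise-exchange argument can be explored numerically for a single realization. The sketch below is an illustration under stated assumptions rather than a reproduction of (4.3): it applies the linear cumulative exposure model directly, so a component activated at system time `clock` has virtual age `w * clock`, and it skips components that have already failed in standby. The function names are hypothetical.

```python
from itertools import permutations


def system_lifetime(active_lifetimes, order, w):
    """Lifetime of a 1-out-of-n warm standby system in one realization.

    active_lifetimes[i] is the realized active-mode lifetime of component i;
    order is the activation sequence (a permutation of the indices).
    """
    clock = 0.0
    for idx in order:
        remaining = active_lifetimes[idx] - w * clock
        if remaining > 0:  # otherwise the component already failed in standby
            clock += remaining
    return clock


def best_orders(active_lifetimes, w):
    """Brute-force search over all activation sequences (small n only)."""
    scores = {order: system_lifetime(active_lifetimes, order, w)
              for order in permutations(range(len(active_lifetimes)))}
    best = max(scores.values())
    return best, [o for o, v in scores.items() if abs(v - best) < 1e-12]
```

For the realization `[2.0, 3.0, 1.0]` with `w = 0.5`, the search returns the lifetime 4.25, attained by activating the components in increasing order of their realized lifetimes, `(2, 0, 1)`; the decreasing order `(1, 0, 2)` gives only 3.5.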

###### Remark 4.2

Generalization to the model (1.2)–(1.3) can be performed using reasoning similar to that in Remark 3.1.

## 5 Concluding remarks

In this paper, we show that the optimal operational sequence for the $1$-out-of-$n$ system with warm standby is the one in which the components are activated in accordance with the increasing sequence of their lifetimes. It turns out from our reasoning that the natural stochastic ordering for this problem is the stochastic precedence order.
When a warm standby component is activated, its age should be ‘re-calculated’. This recalculation is performed using the virtual age concept and the cumulative exposure model.
The proofs are performed for the linear cumulative exposure model. Generalization to the time-dependent case is also discussed.
Previously, only specific cases of the problem were considered in the literature. In Cha et al. and Zhai et al. the case of two components was considered and the sequence was justified (in terms of expected lifetimes of a system) for the case when the components were ordered in the sense of the hazard rate ordering. Moreover, the corresponding sequence was justified in Zhai et al. for the $1$-out-of-$n$ system, but only for exponentially distributed lifetimes of components.
Our result is general and, crucially, it employs the stochastic precedence ordering, which is natural for this setting, both for the component and for the system lifetimes.

Acknowledgments

The first author was supported by the NRF (National Research Foundation of South Africa) grant No 103613. The work of the second author was supported by the Claude Leon Foundation, South Africa. The work of the third author was supported by Priority Research Centers Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0093827). The work of the third author was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2016R1A2B2014211).

## References

•  Barlow, R.E. and Proschan, F. (1975). Statistical Theory of Reliability and Life Testing. Holt, Rinehart and Winston, New York.
•  Bagdonavicius, V. and Nikulin, M. (2002). Accelerated Life Models. Modelling and Statistical Analysis. Chapman & Hall.
•  Boland, P.J., Singh, H., and Cukic, B. (2004). The stochastic precedence ordering with applications in sampling and testing. Journal of Applied Probability, 41, 73–82.
•  Cha, J.H., Mi, J., and Yun, W.Y. (2008). Modeling of a general standby system and evaluation of its performance. Applied Stochastic Models in Business and Industry, 24, 159–169.
•  Finkelstein, M. (2007). On statistical and information-based virtual age of degrading systems. Reliability Engineering and System Safety, 92, 676–681.
•  Finkelstein, M. (2008). Failure Rate Modelling for Reliability and Risk. Springer, London.
•  Finkelstein, M. (2013). On some comparisons of lifetimes for reliability analysis. Reliability Engineering and System Safety, 119, 300–304.
•  Finkelstein, M. and Cha, J.H. (2013). Stochastic Modelling for Reliability: Shocks, Burn-in, and Heterogeneous Populations. Springer, London.
•  Hazra, N.K. and Nanda, A.K. (2017). General standby allocation in series and parallel systems. Communications in Statistics-Theory and Methods, 46, 9842–9858.
•  Levitin, G., Xing, L., and Dai, Y. (2013). Optimal sequencing of warm standby components. Computers & Industrial Engineering, 65, 570–576.
•  Levitin, G., Xing, L., and Dai, Y. (2014). Cold versus hot standby mission operation cost minimization for 1-out-of-N systems. European Journal of Operational Research, 234, 155–162.
•  Montes, I. and Montes, S. (2016). Stochastic dominance and statistical preference for random variables coupled by an Archimedean copula or by the Fréchet-Hoeffding upper bound. Journal of Multivariate Analysis, 143, 275–298.
•  Nelson, W. (1990). Accelerated Testing: Statistical Models, Test Plans, and Data Analysis. Wiley Series in Probability and Statistics, Wiley & Sons, New York.
•  Ruiz-Castro, J. E. and Fernández-Villodre, G. (2012). A complex discrete warm standby system with loss of units. European Journal of Operational Research, 218, 456–469.
•  Santis, E.D., Fantozzi, F., and Spizzichino, F. (2015). Relations between stochastic orderings and generalized stochastic precedence. Probability in the Engineering and Informational Sciences, 29(3), 329–343.
•  Shaked, M. and Shanthikumar, J. (2007). Stochastic Orders. Springer, New York.
•  Singpurwalla, N. D. (2006). The hazard potential: introduction and overview. Journal of American Statistical Association, 101, 1705–1717.
•  Yun, W.Y. and Cha, J.H. (2010). Optimal design of a general warm standby system. Reliability Engineering and System Safety, 95, 880–886.
•  Zhai, Q., Yang, J., Peng, R., and Zhao, Y. (2015). A study of optimal component order in a general 1-out-of-n warm standby system. IEEE Transactions on Reliability, 64, 349–358.
•  Zhang, T., Xie, M., and Horigome, M. (2006). Availability and reliability of k-out-of-(M+N):G warm standby systems. Reliability Engineering and System Safety, 91, 381–387.