1 Introduction
Reliability assessment and optimization of engineered systems have received growing attention in a broad range of sectors, such as power grids dehghani2021adaptive; al2012novel, transportation systems edrissi2015transportation; zhang2017game; zhang2021bayesian, computing systems mo2015performability; mo2017performability, and electrical and mechanical systems xu2019reliability; zafar2020efficient; moustafa2021system; vohra2020fast. Over the last few years, the increasing occurrence of extreme events has created a pressing need for highly reliable infrastructure systems that can still operate at a desirable performance level under extreme natural conditions. The malfunction and failure of these critical systems can lead to catastrophic consequences in terms of economic loss and human fatalities. Take the 2021 Texas power crisis as an example: the inadequate winterization of power equipment significantly compromised the reliability of the power transmission system and resulted in the partial failure of the power grid, which eventually led to a massive power outage and left 4.5 million homes and businesses without power for several days 2021_texas_crisis.
In the context of reliability assessment and optimization of engineered systems, one popular means of characterizing an engineered system is to model it as a multi-state system (MSS) rausand2003system; lisnianski2003multi; yingkui2012multi; xie2004computing. Differing from binary-state reliability models, which assume that a system and its components have only two states (i.e., perfectly operational or complete failure), MSS introduces a finite number of intermediate states for each system component to indicate a wide range of performance levels that lie between the perfectly functioning state and the completely failed state lisnianski2003multi. The rich intermediate states in MSS models enable the representation of the deterioration behavior of engineered systems at a finer granularity than traditional binary-state system models. As a consequence, MSS models have become an appealing tool for modeling and assessing system reliability in a broad array of industrial applications. For example, qiu2019reliability modeled a power distribution system as a MSS and developed a universal generating function (UGF)-based approach to quantify its reliability, where power transmission loss was taken into consideration. liu2019reliability modeled the stochastic dependency among state transitions of a MSS or component via copula functions and studied the reliability assessment of MSS with state transition dependency. iscioglu2021reliability evaluated the reliability of a MSS consisting of identical independent units with two different types of dependency among components. mi2018reliability developed an evidential network-based method to analyze the reliability of complex MSS with epistemic uncertainty.
In general, the methods used for reliability assessment of MSS can be roughly grouped into five classes: stochastic process methods, particularly Markov process methods lisnianski2012multi; liu2006reliability; extensions of conventional binary reliability models, such as the multi-state fault tree method janan1985multistate; the Monte Carlo simulation (MCS) method ramirez2005composite; zeng2021resilience; the universal generating function (UGF) method levitin2005universal; and Bayesian networks si2010integrated. Among them, MCS is one of the most popular approaches for system reliability assessment owing to its ease of implementation, its advantages in uncertainty representation and propagation, and its flexibility in characterizing complex system behavior and interactions among system components. For example, zio2004estimation exploited the flexibility of MCS and developed quantitative measures to estimate the importance of components in a multi-state series-parallel system.
echard2011ak developed an efficient active learning method that combines Kriging with Monte Carlo simulation to perform reliability assessment of structural systems.
schneider2013social treated a social network as a multi-state commodity and applied reliability measures commonly used in MSS to quantify the influence of a given actor in the social network. In principle, the reliability of MSS can be estimated to arbitrary accuracy with the standard MCS method, as long as sufficient MCS samples are generated. However, despite the popularity of MCS, its computational effort grows exponentially with the number of components and component-wise states in the MSS. The number of MCS samples needed to estimate the reliability of a large-scale MSS at high precision easily becomes computationally unaffordable. This flaw makes MCS inapplicable in time-sensitive applications that require real-time decision-making support. One alternative approach is to build data-driven surrogate models for the concerned MSS by taking advantage of recent advances in deep learning. Unfortunately, deep learning faces similar issues when dealing with MSS in a high-dimensional space. Specifically, a considerable amount of data needs to be collected to represent the MSS in a wide range of scenarios (e.g., different degradation conditions and deterioration trajectories) in order to train a deep learning model. The collection of such a representative training dataset for MSS can take a long time and might incur unaffordable costs in some cases.
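To make the cost argument concrete, a bare-bones MCS reliability estimator for a multi-state system can be sketched as follows. All component counts, performance levels, state probabilities, and the demand threshold below are illustrative assumptions, not values from the paper; the point is only that the estimate's precision is governed by the number of samples drawn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-component MSS: each component has 3 states with
# performance levels 0 (failed), 50, and 100 (nominal), and a
# per-component state probability vector (illustrative values).
perf_levels = np.array([0.0, 50.0, 100.0])
state_probs = np.array([
    [0.05, 0.15, 0.80],
    [0.10, 0.20, 0.70],
    [0.02, 0.08, 0.90],
])
demand = 120.0            # system functions if total performance >= demand
n_samples = 200_000

# Draw a state index for every component in every MCS trial.
states = np.stack([rng.choice(3, size=n_samples, p=p) for p in state_probs])
total_perf = perf_levels[states].sum(axis=0)   # shape: (n_samples,)

reliability = float((total_perf >= demand).mean())
print(f"estimated reliability ~ {reliability:.4f}")
```

With more components and more states per component, the state space and the sample count needed for a fixed precision grow rapidly, which is the bottleneck the paragraph above describes.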
A promising direction to address the aforementioned issues is to encode physical laws (or empirical laws) in the development of machine learning models, which is referred to as Physics-Informed Machine Learning (PIML) in the literature raissi2019physics; karniadakis2021physics; lu2021deepxde. A representative example along this front is the family of Physics-Informed Neural Networks (PINNs) raissi2019physics. The physical laws (e.g., conservation laws) governing system behaviors, in the form of partial differential equations (PDEs) or ordinary differential equations (ODEs), are usually rigorously derived from first principles. In PINN, physical laws are typically incorporated as a soft loss term in the objective function of the deep learning model. The incorporation of physical laws substantially prunes the parameter search space, as solutions violating the physical laws are discarded immediately. As a result, encoding physical laws in machine learning models essentially reduces the number of training points required to tune a deep learning model. The benefits of exploiting physical laws in building efficient deep learning models have been showcased in several recent studies kapusuzoglu2021information; zhao2021physics; chao2022fusing; zobeiry2021physics; zhou2021physics; cofre2021remaining. It is worth noting that the loss function of a PINN is complicated and involves multiple terms, which can compete with each other during the training process karniadakis2021physics. Hence, since training a PINN is a highly non-convex optimization problem, it is essential to ensure stability and robustness during training, which remains an active research topic wang2022and.

The application of PINN to MSS reliability assessment has rarely been studied in the literature, even though several features of MSS make it a natural fit to be formulated and solved as a PINN-type problem. First, the stochastic behavior of component state transitions in MSS is commonly characterized as a Markov process dui2015semi; lisnianski2017recent; barbu2017semi; eryilmaz2015dynamic, for which analytical solutions are usually difficult to derive. Numerical methods are typically adopted instead, such as differential equation solvers and Monte Carlo simulation. These numerical methods are computationally expensive, and they easily become unaffordable when extensive uncertainty and sensitivity analyses are needed. The ODEs governing Markov processes freidlin1996markov make PINN a viable solution for MSS reliability assessment. Second, existing approaches often discretize the life span of the MSS into multiple equally-sized time intervals. The side effect of doing so is that the reliability of the MSS can only be assessed at the pre-specified discrete time instants. In contrast, a PINN developed for MSS reliability assessment frees us from discretizing the MSS life span, and it allows the reliability of the MSS to be estimated at any time instant in a mesh-free fashion.
To address the above issues, in this paper we develop a generic framework that casts reliability assessment of MSS as a machine learning problem by exploiting the power of PINN. Toward this goal, one common pain point in adopting PINN is that the original formulation by raissi2019physics often struggles to approximate the exact solution of PDEs with high precision, due to the extremely imbalanced gradients arising during the training of PINN via backpropagation karniadakis2021physics; wang2022and; wang2021understanding. To address the imbalanced gradients among loss terms in PINN, we treat each loss term as an individual task and tackle the problem from a multi-task learning perspective, following the approach proposed by yu2020gradient. The key idea is to project a task's gradient onto the normal plane of the gradient of any other task with which it conflicts. Compared to previous studies, we make the following contributions:

Formulation of a generic physics-informed neural network-based framework to tackle the system reliability assessment problem in MSS. The developed PINN-based framework provides a novel and effective paradigm for assessing the reliability of complex MSS.

To address the issue of extremely imbalanced loss terms in PINNs, we integrate the gradient surgery method with PINN to de-conflict gradients during training via backpropagation. The incorporation of the gradient surgery approach significantly accelerates the convergence of PINN and substantially improves the solution quality in MSS reliability assessment.

We investigate the applications of the PINN-based framework in several different scenarios in terms of system state transition rates (e.g., homogeneous and non-homogeneous continuous-time Markov chains) and system scales (e.g., small-scale and medium-scale MSS). In addition, we examine the quality of the solutions from PINN by comparing them with those of a Matlab solver.
The rest of the paper is structured as follows. Section 2 provides a brief introduction to multi-state systems (MSS) and describes the technical background of PINN. Section 3 develops the proposed methodology for building a PINN for MSS reliability assessment. Section 4 presents the applications of PINN in MSS reliability assessment and compares its performance with two alternatives. Section 5 concludes the paper and discusses future research directions.
2 Background
In this section, we briefly introduce the technical background of reliability modeling of multi-state systems and the mathematical formulation of physics-informed neural networks.
2.1 MSS Reliability Model
Traditional binary reliability models allow only two operational states: perfectly functioning or complete failure. In contrast, MSS reliability assessment associates the system and its components with multiple intermediate states, indicated by either performance capacity or damage severity during performance degradation. Suppose the performance of a MSS is characterized by $M+1$ discrete ordered states rausand2003system, represented by the following set:

$\mathcal{S} = \{0, 1, \ldots, M\}$ (1)

where $0$ denotes the worst state and $M$ denotes the best state. The others are intermediate states between the worst and the best states.
Suppose the probabilities associated with the $M+1$ states of the MSS at time $t$ are denoted by the following vector:

$\mathbf{p}(t) = \left[ p_0(t), p_1(t), \ldots, p_M(t) \right]$ (2)
As the probability vector covers the exhaustive set of all MSS states, it needs to satisfy the following constraint:

$\sum_{i=0}^{M} p_i(t) = 1, \quad \forall t \in [0, T]$ (3)

where $T$ denotes the MSS operation period.
In general, the system dynamics of a MSS at each time instant are characterized by a state-transition diagram, as shown in Fig. 1. Each node in Fig. 1 represents the probability $p_i(t)$ associated with state $i$, and each branch labeled $a_{ij}(t)$ denotes the corresponding one-step transition rate from state $i$ to state $j$ at time instant $t$. Mathematically, the state transitions among all the states at time instant $t$ can be represented by the following matrix:

$\mathbf{A}(t) = \begin{bmatrix} a_{00}(t) & a_{01}(t) & \cdots & a_{0M}(t) \\ a_{10}(t) & a_{11}(t) & \cdots & a_{1M}(t) \\ \vdots & \vdots & \ddots & \vdots \\ a_{M0}(t) & a_{M1}(t) & \cdots & a_{MM}(t) \end{bmatrix}$ (4)

where $a_{ii}(t) = -\sum_{j \neq i} a_{ij}(t)$; hence, the sum of all the elements in each row is zero.
With a properly defined state transition matrix $\mathbf{A}(t)$, the evolution of the MSS states over time can be described by the Kolmogorov forward equation:

$\dfrac{d\mathbf{p}(t)}{dt} = \mathbf{p}(t)\,\mathbf{A}(t), \quad \mathbf{p}(0) = \mathbf{p}_0$ (5)

where $d\mathbf{p}(t)/dt$ refers to the first-order derivative of $\mathbf{p}(t)$ at time instant $t$, and $\mathbf{p}_0$ denotes the initial system state at time instant $t = 0$.
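As a concrete illustration of Eq. (5), the forward equation can be integrated numerically for a small MSS. The sketch below uses a simple forward-Euler scheme on a hypothetical three-state generator matrix; the rate values are illustrative assumptions, not taken from the paper's case studies.

```python
import numpy as np

# Generator matrix A for a hypothetical 3-state MSS; each row sums to zero.
A = np.array([
    [-0.30,  0.20,  0.10],
    [ 0.00, -0.15,  0.15],
    [ 0.00,  0.00,  0.00],   # state 2 is absorbing (complete failure)
])
p = np.array([1.0, 0.0, 0.0])   # system starts in the best state

# Forward-Euler integration of dp/dt = p A; in practice an adaptive
# solver (e.g., MATLAB's ode45 or scipy.integrate.solve_ivp) would be used.
n_steps, dt = 1000, 0.01         # integrate up to t = 10
for _ in range(n_steps):
    p = p + dt * (p @ A)

print(p)          # state probabilities at t = 10
print(p.sum())    # the vector remains a probability distribution
```

Because each row of the generator sums to zero, the Euler update preserves the total probability mass, which mirrors the constraint in Eq. (3).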
Given the initial system state $\mathbf{p}_0$, the MSS evolves over time following the state transition matrix $\mathbf{A}(t)$. The MSS reliability can be derived by aggregating the probabilities of the system states in which the system performs its desired function during the mission time. Mathematically, it is formulated as below:

$R(t) = \sum_{i=0}^{M} \mathbb{1}_i \, p_i(t)$ (6)

where $\mathbb{1}_i$ is a binary variable indicating whether state $i$ satisfies the desired property at the system level. If $\mathbb{1}_i = 1$, then state $i$ meets the intended function; otherwise, state $i$ does not meet the requirement. $p_i(t)$ denotes the probability of state $i$ at time instant $t$.

2.2 Physics-Informed Neural Networks
In this section, we explain the underlying architecture of physics-informed neural networks (PINNs) and describe their mathematical formulation.
In many industrial applications, the behavior of dynamical systems is described by general nonlinear partial differential equations (PDEs) karniadakis2021physics; pang2019fpinns. Consider a PDE in the general form of Eq. (7):

$u_t + \mathcal{N}[u] = 0, \quad \mathbf{x} \in \Omega, \; t \in [0, T]$ (7)

where $u(t, \mathbf{x})$ denotes the latent solution, $\mathcal{N}[\cdot]$ is a nonlinear differential operator, $\mathbf{x}$ denotes a vector of space coordinates, and $t$ denotes time. The domain $\Omega$ of the PDE is bounded based on prior knowledge of the dynamical system, and $[0, T]$ is the time interval within which the system evolves.
It is well known that neural networks are universal function approximators capable of learning the unknown relationship between inputs and outputs. As a result, a neural network can be used to approximate the solution to the PDE in Eq. (7). Suppose we denote the left-hand side of Eq. (7) as $f(t, \mathbf{x})$:

$f := u_t + \mathcal{N}[u]$ (8)

$f(t, \mathbf{x})$ now acts as a constraint modeling the physical law described by the PDE in Eq. (7). The latent solution $u(t, \mathbf{x})$ can be approximated by a neural network, where $\mathbf{x}$ and $t$ are the inputs to the neural network. The neural network approximating $u(t, \mathbf{x})$, together with Eq. (8) (which acts as an equality constraint), results in a physics-informed neural network. Regarding the nonlinear differential operator $\mathcal{N}[\cdot]$, its value can be derived using the same neural network that approximates $u(t, \mathbf{x})$, where automatic differentiation is applied to differentiate compositions of functions following the chain rule baydin2018automatic. The network approximating $f(t, \mathbf{x})$ has the same parameters as the network representing $u(t, \mathbf{x})$. The weights of the neural network can be optimized by minimizing the following loss function:
$\mathcal{L}(\boldsymbol{\theta}) = \mathcal{L}_u(\boldsymbol{\theta}) + w_f \, \mathcal{L}_f(\boldsymbol{\theta})$ (9)

where $w_f$ is a factor denoting the weight associated with the loss term $\mathcal{L}_f$, and

$\mathcal{L}_u(\boldsymbol{\theta}) = \dfrac{1}{N_u} \sum_{i=1}^{N_u} \left| \hat{u}\left(t_u^i, \mathbf{x}_u^i; \boldsymbol{\theta}\right) - u^i \right|^2$ (10)

and

$\mathcal{L}_f(\boldsymbol{\theta}) = \dfrac{1}{N_f} \sum_{i=1}^{N_f} \left| f\left(t_f^i, \mathbf{x}_f^i; \boldsymbol{\theta}\right) \right|^2$ (11)

where $\{(t_u^i, \mathbf{x}_u^i, u^i)\}_{i=1}^{N_u}$ denotes the initial and boundary data points of $u(t, \mathbf{x})$, $\hat{u}(t_u^i, \mathbf{x}_u^i; \boldsymbol{\theta})$ denotes the prediction of the neural network on the inputs $(t_u^i, \mathbf{x}_u^i)$, $\boldsymbol{\theta}$ refers to the weights of the neural network, $\{(t_f^i, \mathbf{x}_f^i)\}_{i=1}^{N_f}$ represents the collocation points for $f(t, \mathbf{x})$, and $N_u$ and $N_f$ represent the number of points generated for $\mathcal{L}_u$ and $\mathcal{L}_f$, respectively.
In Eq. (9), $\mathcal{L}_u$ measures the loss of the neural network when approximating the function $u(t, \mathbf{x})$, while $\mathcal{L}_f$ enforces the physical law imposed by Eq. (7) on the neural network via a series of collocation points. With the training data consisting of boundary and collocation points, we seek to minimize the loss function in Eq. (9) by optimizing the weights of the neural network via gradient descent algorithms. As illustrated in Eq. (9), PINNs provide a rigorous way to seamlessly integrate the information from both measurement data and physical laws, where the physical laws are encoded into the loss function of the neural network via automatic differentiation. Accounting for the underlying physical laws prunes the feasible space of neural network parameters, and thus significantly reduces the number of training points as well as the size of the neural network (e.g., the number of layers and the number of hidden nodes per layer) required during model training.
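The structure of the composite loss in Eq. (9) can be illustrated on a toy ODE. In the sketch below, the "network" is replaced by a simple two-parameter model for $u_t + u = 0$, $u(0) = 1$ (exact solution $e^{-t}$), and the derivative $u_t$ is obtained by a central finite difference as a stand-in for automatic differentiation; everything here is illustrative rather than the paper's actual implementation.

```python
import numpy as np

def u_hat(t, theta):
    # Two-parameter surrogate playing the role of the neural network.
    a, b = theta
    return a * np.exp(b * t)

def pinn_loss(theta, t_col, w_f=1.0, eps=1e-5):
    # L_u: mismatch at the initial/boundary point u(0) = 1  (Eq. (10))
    loss_u = (u_hat(0.0, theta) - 1.0) ** 2
    # L_f: squared ODE residual f = u_t + u at collocation points (Eq. (11)),
    # with u_t approximated by a central finite difference.
    u_t = (u_hat(t_col + eps, theta) - u_hat(t_col - eps, theta)) / (2 * eps)
    residual = u_t + u_hat(t_col, theta)
    loss_f = np.mean(residual ** 2)
    return loss_u + w_f * loss_f            # Eq. (9)

t_col = np.linspace(0.0, 2.0, 50)
print(pinn_loss((1.0, -1.0), t_col))   # exact solution -> loss near zero
print(pinn_loss((1.0, -0.5), t_col))   # wrong decay rate -> larger loss
```

A parameter choice that satisfies both the initial condition and the governing equation drives both loss terms toward zero, which is exactly the mechanism that prunes the feasible parameter space.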
3 Proposed Framework
In this section, we introduce the proposed PINN-based framework for MSS reliability assessment in detail. The proposed framework consists of two major steps. In the first step, we recast MSS reliability assessment as a machine learning problem in the framework of PINN. Next, we outline the gradient surgery approach for minimizing gradient conflicts among multiple tasks during the training of the PINN.
3.1 PINNs for MSS Reliability Assessment
As introduced in Section 2.1, there are two key components in the reliability assessment of a MSS that need to be appropriately characterized in the framework of PINN. The first is the initial state $\mathbf{p}_0$, denoting the state probabilities of the MSS at time instant $t = 0$. The second is the state transition behavior, which is described by the Kolmogorov forward equation in Eq. (5).
For the sake of demonstration, Fig. 2 illustrates the configuration of a PINN composed of two hidden layers, each with a fixed number of hidden units, for reliability assessment in MSS. In practice, a PINN can consist of as many hidden layers and hidden units as needed. Suppose we discretize the operation horizon of the MSS into $N_c$ time steps, which are typically referred to as collocation points. Each time we feed a specific time step into the PINN, we obtain the probability associated with each state of the MSS. With automatic differentiation, we derive the first-order derivative of the probability of each state with respect to the time instant $t$. Next, we compute the loss between this first-order derivative and the right-hand side of the governing equation, as illustrated in Eq. (5).
In addition to the differential equations characterizing the state transitions, another constraint that needs to be modeled is the initial state. At time instant $t = 0$, the system starts from a specific condition characterized by the probability associated with each state of the MSS. Suppose $p_i(0)$ denotes the probability of state $i$ at time instant $t = 0$. Then, in conjunction with the loss term for state transition modeling, we obtain the loss function for MSS reliability assessment as below:

$\mathcal{L}(\boldsymbol{\theta}) = \underbrace{\sum_{i=0}^{M} \left( \hat{p}_i(0; \boldsymbol{\theta}) - p_i(0) \right)^2}_{\mathcal{L}_0} + w \, \underbrace{\dfrac{1}{N_c} \sum_{k=1}^{N_c} \sum_{i=0}^{M} \left( \dfrac{d\hat{p}_i(t_k; \boldsymbol{\theta})}{dt} - \left[ \hat{\mathbf{p}}(t_k; \boldsymbol{\theta})\,\mathbf{A}(t_k) \right]_i \right)^2}_{\mathcal{L}_f}$ (12)

where $M+1$ refers to the number of states in the MSS, $N_c$ is the number of collocation time steps, $\hat{p}_i(\cdot; \boldsymbol{\theta})$ denotes the output of the neural network with weights $\boldsymbol{\theta}$ for state $i$, $\hat{p}_i(t_k; \boldsymbol{\theta})$ indicates the predicted probability of state $i$ at time instant $t_k$, $w$ is a weighting factor, and $d\hat{p}_i(t_k; \boldsymbol{\theta})/dt$ is the time derivative estimated by the neural network via automatic differentiation.
As shown in Eq. (12), the loss function $\mathcal{L}$ has two key components. The first uses a mean squared error metric to evaluate the loss associated with the initial states of the MSS, while the second enforces the structure of the state transitions in the MSS by measuring the residual of the governing equation. Combining the two loss terms, the goal is to minimize the loss function by optimizing the parameters $\boldsymbol{\theta}$. With a set of training data, we reduce the loss of the neural network iteratively via backpropagation using gradient descent algorithms, such as Adam jais2019adam.
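The two loss groups of Eq. (12) can be sketched numerically as follows. Here a known analytic trajectory (built from a truncated matrix exponential) stands in for the neural network's output, and central finite differences stand in for automatic differentiation; the generator matrix values are illustrative assumptions, not the paper's case-study rates.

```python
import numpy as np

A = np.array([                      # hypothetical 3-state generator
    [-0.30,  0.20,  0.10],
    [ 0.00, -0.15,  0.15],
    [ 0.00,  0.00,  0.00],
])
p0 = np.array([1.0, 0.0, 0.0])      # initial state: best state

def expm(M, terms=30):
    """Truncated Taylor series for the matrix exponential (small matrices)."""
    out, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

def p_hat(t, B):
    # Candidate trajectory generated from generator B; when B == A it is
    # the exact solution of dp/dt = p A, so both loss groups vanish.
    return p0 @ expm(B * t)

def mss_loss(B, t_col, w=1.0, eps=1e-4):
    loss_0 = np.sum((p_hat(0.0, B) - p0) ** 2)   # initial-state group L_0
    res = [np.sum(((p_hat(t + eps, B) - p_hat(t - eps, B)) / (2 * eps)
                   - p_hat(t, B) @ A) ** 2) for t in t_col]
    return loss_0 + w * np.mean(res)             # residual group L_f

t_col = np.linspace(0.0, 5.0, 20)
print(mss_loss(A, t_col))           # exact dynamics -> near-zero loss
print(mss_loss(1.5 * A, t_col))     # wrong dynamics -> larger loss
```

Note that both candidates satisfy the initial condition, so the residual group alone discriminates between them; this is why enforcing Eq. (5) at collocation points carries most of the information during training.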
After the PINN is properly trained, it can be used to estimate the probability of any state at any given time instant $t$. Once the probability associated with each state is accurately predicted, the reliability of the MSS at the system level can be inferred following Eq. (6). Different from existing methods, a significant advantage of PINNs is that they allow the reliability of the MSS to be estimated in a continuous manner. Most existing approaches can only estimate the probability associated with each state at predetermined time instants, whereas PINN is mesh-free and can tackle MSS reliability assessment at arbitrary time instants.
3.2 Conflicting Gradients Projection for Physics-Informed Neural Networks
As reported by wang2021understanding, PINN faces a fundamental mode of failure that is closely related to the stiffness of the backpropagated gradient flows, because the loss terms $\mathcal{L}_0$ and $\mathcal{L}_f$ are highly imbalanced in magnitude. In particular, the loss term $\mathcal{L}_f$ characterizing the PDE residual dominates the loss function and, consequently, the optimization algorithm is heavily biased toward minimizing $\mathcal{L}_f$. As a result, the PINN performs poorly in fitting the initial conditions, leading to unstable and erroneous predictions raissi2018deep; sun2020surrogate.
In this study, we tackle this problem from a multi-task learning perspective, because multi-task learning shares several features with PINN. In multi-task learning, the ultimate goal is to train a network on all tasks jointly. Toward this goal, multi-task learning faces the same problem arising from mismatched gradients: the overall gradient might be dominated by one task at the cost of degrading the performance of another. In addition to imbalanced gradient magnitudes, the gradients corresponding to different tasks (or loss terms in PINN) might conflict with one another along the descent direction in a way that is detrimental to the progress of the optimization. Together, these factors mean that the optimizer struggles to make progress in optimizing the weights of the network, because a reduction in the loss of one task leads to oscillation of the losses of the other tasks (see the demonstration in Section 4).
To resolve this issue, yu2020gradient proposed the projecting conflicting gradients (PCGrad) approach to minimize gradient interference, which consists in projecting the gradient of a task onto the normal plane of the gradient of any other task with which it conflicts. In this paper, we adopt the PCGrad method to de-conflict gradients during the training of the PINN, where we treat each loss term as an individual task in the learning process. Specifically, consider two gradients $\mathbf{g}_i$ and $\mathbf{g}_j$ corresponding to the $i$th and $j$th loss terms of the PINN. PCGrad first checks whether $\mathbf{g}_i$ and $\mathbf{g}_j$ conflict using the cosine similarity defined in Eq. (13):

$\cos \phi_{ij} = \dfrac{\mathbf{g}_i \cdot \mathbf{g}_j}{\lVert \mathbf{g}_i \rVert \, \lVert \mathbf{g}_j \rVert}$ (13)

where $\lVert \cdot \rVert$ denotes the norm of the corresponding vector.
The cosine similarity takes values in the range $[-1, 1]$, where $-1$ denotes exactly opposite directions, $1$ means exactly the same direction, and $0$ indicates orthogonality (decorrelation). If the cosine similarity between $\mathbf{g}_i$ and $\mathbf{g}_j$ is negative, PCGrad projects $\mathbf{g}_i$ onto the normal plane of $\mathbf{g}_j$, or the other way around. If the cosine similarity is non-negative, the original gradients $\mathbf{g}_i$ and $\mathbf{g}_j$ remain unchanged. Suppose we project $\mathbf{g}_i$ onto the normal plane of $\mathbf{g}_j$; we then have the gradient after the projection as:

$\mathbf{g}_i^{PC} = \mathbf{g}_i - \dfrac{\mathbf{g}_i \cdot \mathbf{g}_j}{\lVert \mathbf{g}_j \rVert^2} \, \mathbf{g}_j$ (14)

where $\mathbf{g}_i^{PC}$ denotes the gradient after the projection.
Fig. 3 demonstrates the core idea of PCGrad. As can be observed in Fig. 3(a), there is a high degree of conflict between the two gradients $\mathbf{g}_i$ and $\mathbf{g}_j$. PCGrad either projects the gradient $\mathbf{g}_i$ onto the normal plane of the gradient $\mathbf{g}_j$, as illustrated in Fig. 3(b), or projects $\mathbf{g}_j$ onto the normal plane of $\mathbf{g}_i$, as shown in Fig. 3(c). This operation amounts to removing the conflicting component from the gradient, thus mitigating the destructive gradient interference among different tasks. In a similar way, PCGrad repeats the same procedure for all the other tasks following a randomly sampled order. Algorithm 1 summarizes the steps of PCGrad for projecting conflicting gradients in PINNs.
As the gradient projection operation accounts for the gradient information of all tasks in a holistic manner, it significantly mitigates the conflicts among different tasks and results in a set of gradients with minimal gradient interference.
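The projection step of Eq. (14) and the random task ordering described above can be sketched in a few lines of numpy. This is a minimal illustration of the PCGrad idea from yu2020gradient, not the paper's training code; the two example gradients are arbitrary.

```python
import numpy as np

def pcgrad(grads, seed=0):
    """Project each task gradient onto the normal plane of every other
    task gradient it conflicts with, then sum the results (Algorithm 1)."""
    rng = np.random.default_rng(seed)
    grads = [np.asarray(g, dtype=float) for g in grads]
    projected = []
    for i, g_i in enumerate(grads):
        g = g_i.copy()
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)                       # randomly sampled task order
        for j in others:
            g_j = grads[j]
            dot = g @ g_j
            if dot < 0.0:                         # conflict: cosine similarity < 0
                g = g - (dot / (g_j @ g_j)) * g_j # projection of Eq. (14)
        projected.append(g)
    return sum(projected)                         # combined update direction

g1 = np.array([1.0, 0.0])
g2 = np.array([-1.0, 1.0])     # conflicts with g1 (negative dot product)
print(pcgrad([g1, g2]))        # -> [0.5, 1.5]
```

After surgery, each altered gradient is orthogonal to (rather than opposed to) the gradient it conflicted with, so one loss group can no longer undo the progress of the other.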
4 Numerical Examples
In this section, we demonstrate the proposed framework for MSS reliability assessment using a small-scale MSS, a single propulsion module in a railway system, under both time-independent and time-dependent state transitions. We also illustrate its performance in assessing the reliability of a medium-scale MSS, a flow transmission system. The performance of the proposed framework is examined against the solution derived by the differential equation solver implemented in Matlab.
4.1 Example 1
Consider a single propulsion module in a multi-voltage propulsion system designed for the Italian high-speed railway. The propulsion module consists of four components in series (transformer, filter, inverter, and motor) and two parallel converters trivedi2017reliability. The propulsion module can be represented by a three-state Markov model, as illustrated in Fig. 4. The three states correspond to three different levels of power delivery, as described below:
Full operation: State 0 denotes the fully operational state, in which the propulsion module delivers the maximum power (2200 kW) with all components working.

Degraded state: State 1 indicates a degraded state, in which only half of the power (1100 kW) is delivered because all the series components and only one of the two converters work normally.

Failed state: State 2 is the state of complete failure, with no power delivered.
The transition rates across the three states are described by the following matrix:

$\mathbf{A} = \begin{bmatrix} -(\lambda_s + 2\lambda_c) & 2\lambda_c & \lambda_s \\ 0 & -(\lambda_s + \lambda_c) & \lambda_s + \lambda_c \\ 0 & 0 & 0 \end{bmatrix}$ (15)

where $\lambda_s$ denotes the sum of the failure rates of the series components, and $\lambda_c$ denotes the failure rate associated with each converter.
As can be seen from Eq. (15), the transition rates are time-independent, thus leading to a homogeneous continuous-time Markov chain (CTMC). The propulsion module starts at full capacity, and we denote its initial state as below:

$\mathbf{p}(0) = [1, 0, 0]$ (16)
In this MSS, we are interested in estimating the probabilities associated with the three states over the operation time horizon. Following the methodology described in Section 3, together with Eq. (5), we derive four loss terms that belong to two individual groups in the PINN, as formulated below:

$\mathcal{L}_0 = \sum_{i=0}^{2} \left( \hat{p}_i(0) - p_i(0) \right)^2, \qquad \mathcal{L}_{f,i} = \dfrac{1}{N_c} \sum_{k=1}^{N_c} \left( \dfrac{d\hat{p}_i(t_k)}{dt} - \left[ \hat{\mathbf{p}}(t_k)\,\mathbf{A} \right]_i \right)^2, \quad i = 0, 1, 2$ (17)

where $\hat{p}_i(t)$ denotes the prediction of the neural network for the $i$th state of the three-state MSS at time instant $t$.
Ideally, the value of each term in the loss group $\mathcal{L}_f$ should be strictly zero. In PINN, we approximate these equations by embedding soft loss terms in the objective function. In this example, the PINN consists of two hidden layers, each with 50 units and the tanh activation function. The last layer of the PINN is a fully-connected layer with three outputs, each corresponding to one state of the MSS; softmax is employed as the activation function in the fully-connected layer to ensure that the outputs lie within the range $[0, 1]$. (In this paper, the PINN has the same architecture across the three numerical examples; the only difference is the number of outputs in the fully-connected layer.) To train the neural network, we generate 5000 equally spaced collocation points representing Eq. (5). Next, we use the Adam algorithm with a learning rate of 0.001 to optimize the weights of the neural network. The two PINNs trained using Adam with and without PCGrad have the same architecture and initial weights. In the PINN with PCGrad, the two loss groups $\mathcal{L}_0$ and $\mathcal{L}_f$ are treated as two individual tasks. In the PINN without PCGrad, the two loss groups are combined with equal weights (the losses are combined in the same way in the subsequent two examples). The paradigm of adopting the proposed PINN-based framework to model MSS reliability assessment clearly differs from existing methods for MSS reliability analysis, and it provides an insightful point of view for harnessing the power of neural networks to tackle this challenging problem.

Fig. 5 illustrates a snapshot of the convergence of the two loss groups $\mathcal{L}_0$ and $\mathcal{L}_f$ during PINN training. Without PCGrad, the decrease in the dominant loss group $\mathcal{L}_f$ leads to severe oscillation of the other loss group $\mathcal{L}_0$, which eventually translates into slow convergence of the PINN. In contrast, the situation is completely different after PCGrad is applied: the values of both loss groups drop steadily, and PCGrad converges to a solution with a loss value that is lower than that of the PINN without PCGrad by one order of magnitude.
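The forward pass of the architecture used in this example (two tanh hidden layers of 50 units and a softmax output over the three states) can be sketched as follows. The weights here are random placeholders rather than trained values; the point is that the softmax head guarantees valid state probabilities by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random placeholder weights: 1 input (time) -> 50 -> 50 -> 3 outputs.
W1, b1 = rng.normal(0, 0.5, (1, 50)), np.zeros(50)
W2, b2 = rng.normal(0, 0.5, (50, 50)), np.zeros(50)
W3, b3 = rng.normal(0, 0.5, (50, 3)), np.zeros(3)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(t):
    t = np.atleast_1d(t).reshape(-1, 1)     # inputs: time instants
    h = np.tanh(t @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    return softmax(h @ W3 + b3)             # predicted state probabilities

p = forward(np.linspace(0.0, 1.0, 5))
print(p.sum(axis=1))                        # each row sums to 1
```

Because each output row is non-negative and sums to one, the constraint of Eq. (3) is satisfied automatically at every queried time instant, and only Eq. (5) and the initial condition need to be enforced through the loss.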
Fig. 6 compares the convergence of the two loss groups after the PINN is trained for 80,000 iterations. Both $\mathcal{L}_0$ and $\mathcal{L}_f$ converge to much lower values in the PINN with PCGrad than in the PINN without PCGrad. In particular, the PINN with PCGrad achieves much better performance in approximating the transition equations, as partially reflected by the significant gap in the early iterations indicated in Fig. 6. To examine the solution quality of the PINN trained with PCGrad, we derive reference solutions to the governing equations using the Runge–Kutta method in Matlab at 5000 equally spaced time instants. Next, we use the root mean squared error (RMSE) with respect to the Runge–Kutta results, averaged over the three states, as the performance metric to compare the state probabilities generated by the PINN with those of the Runge–Kutta method. Note that the PINN is trained with data from only a subset of this time range; in other words, we examine the performance of the PINN in both interpolating and extrapolating the state probabilities.
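The state-averaged RMSE metric described above can be computed as in the sketch below; the arrays are illustrative placeholders, not the paper's actual results.

```python
import numpy as np

def rmse_per_state_avg(p_pred, p_ref):
    """RMSE between predicted and reference state probabilities,
    computed per state and then averaged over the states.
    p_pred, p_ref: arrays of shape (n_times, n_states)."""
    per_state = np.sqrt(np.mean((p_pred - p_ref) ** 2, axis=0))
    return float(per_state.mean())

# Illustrative reference trajectory (e.g., from a Runge-Kutta solver)
# and a prediction with a uniform 0.01 offset.
p_ref = np.array([[1.0, 0.00, 0.00],
                  [0.8, 0.15, 0.05],
                  [0.6, 0.25, 0.15]])
p_pred = p_ref + 0.01
print(rmse_per_state_avg(p_pred, p_ref))   # -> 0.01 for a uniform 0.01 offset
```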
Table 1: RMSE of the PINN state probability estimates relative to the Matlab (Runge–Kutta) solutions over the three time ranges.

Methods  Time Range
RMSE of PINN without PCGrad  0.111139  0.1461463  0.0062300
RMSE of PINN with PCGrad  0.008277  0.0108075  0.0006953
Fig. 7 compares the predictions of the two PINN models with the Matlab solver. It can be observed that PCGrad outperforms the optimization without PCGrad when estimating the probabilities of state 1 and state 2. At the same time, PCGrad overestimates the probability associated with state 0. Another interesting observation is that both optimization methods, with and without PCGrad, perform well in extrapolation beyond the training time range, thanks to the incorporation of the governing equations. Fig. 8 displays the histogram of the mean absolute error between the PINN predictions and the Matlab solver with respect to the three system states. Not surprisingly, the PINN with PCGrad achieves a substantially lower mean absolute error for all three system states than the PINN without PCGrad. In other words, PCGrad significantly improves the quality of the state estimation in MSS. Table 1 summarizes the performance comparison quantitatively: the PINN with PCGrad achieves an RMSE that is lower than that of the PINN without PCGrad by at least one order of magnitude across the three time ranges.
4.2 Example 2
In this example, we extend Example 4.1 by imposing time-inhomogeneity on the component failures. Differing from Example 4.1, the transition rates are time-dependent, thus leading to a non-homogeneous CTMC. In particular, we assume that the transition rates follow a Weibull-type hazard, and the corresponding transition rates are defined as below:
$\lambda_s(t) = \lambda_s \, \beta t^{\beta - 1}, \qquad \lambda_c(t) = \lambda_c \, \beta t^{\beta - 1}$ (18)

where $\lambda_s$ and $\lambda_c$ denote the initial failure rates, and their values are the same as in the example of Section 4.1.
For the sake of illustration, we set the parameters in the Weibull distribution as and . The corresponding transition matrix of this problem is shown in Eq. (19).
(19) 
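The exact form of the Weibull-type rate in Eq. (18) is not reproduced above. A common parameterization, assumed here for illustration, scales the initial rate by the Weibull hazard factor, i.e., lambda(t) = lambda_0 * beta * t**(beta - 1); the function name and argument names below are hypothetical.

```python
import numpy as np

def weibull_rate(t, lam0, beta):
    """Time-dependent transition rate lambda(t) = lam0 * beta * t**(beta - 1).

    lam0 : initial failure rate for the transition (as in Section 4.1)
    beta : Weibull shape parameter (beta > 1 gives an increasing rate,
           beta = 1 recovers the constant-rate homogeneous CTMC)
    """
    t = np.asarray(t, dtype=float)
    return lam0 * beta * t ** (beta - 1.0)
```

With beta = 1 the rate stays constant at lam0, which is consistent with Example 4.1 being the homogeneous special case.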
In the nonhomogeneous CTMC, one difference worth mentioning is that each loss term in the loss group now involves the input time when constructing the ODE residual. Following similar steps, the problem can be formulated using the proposed PINN-based framework. To train the PINN, we generate 300 collocation points within the time range with equal intervals. The initial state and the architecture of the PINN are exactly the same as in Example 4.1. As the gradients during backpropagation are complex in the nonhomogeneous CTMC case, we impose a monotonically decreasing learning rate that follows a polynomial decay schedule with an initial learning rate of and a final learning rate of . Fig. 9 compares the convergence of the two loss groups over the 150000 iterations. Clearly, PINN with PCGrad converges to lower loss values in both loss groups than PINN without PCGrad.
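The projection step at the heart of PCGrad can be sketched compactly. The version below is a simplified deterministic variant (the published method iterates over tasks in random order): each task gradient that conflicts with another, i.e., has a negative inner product with it, is projected onto the normal plane of the conflicting gradient before the gradients are summed into a single update direction.

```python
import numpy as np

def pcgrad(grads):
    """Project conflicting gradients (PCGrad).

    grads : list of flattened per-task gradient vectors, one per loss group.
    If g_i conflicts with g_j (g_i . g_j < 0), the component of g_i along
    g_j is removed, i.e., g_i is projected onto the normal plane of g_j.
    Returns the sum of the projected gradients.
    """
    projected = []
    for i, g_i in enumerate(grads):
        g = np.array(g_i, dtype=float)
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = float(g @ g_j)
            if dot < 0.0:  # conflicting gradient: remove component along g_j
                g = g - dot / float(g_j @ g_j) * g_j
        projected.append(g)
    return np.sum(projected, axis=0)
```

For two orthogonal gradients nothing changes; for two conflicting gradients each is deflected so that it no longer opposes the other, which is what suppresses the loss oscillations observed without PCGrad.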
To test the performance of the trained PINN model, we generate another 301 points within the time range with equal intervals. Fig. 10 illustrates the histogram of the mean absolute error between PINN’s predictions and Matlab’s solutions with respect to the three system states. As can be observed, PINN with PCGrad performs similarly to PINN without PCGrad when estimating the probability corresponding to state 0, while achieving slightly better performance when predicting the probabilities associated with state 1 and state 2.
Next, we evaluate the quality of the PINN solutions quantitatively by comparing them with the solutions derived by the Matlab solver, as shown in Fig. 11. As can be seen, PINN accurately captures the changing trend of the probability associated with each system state over time. Table 2 summarizes the RMSE between the solutions of PINN with and without PCGrad and the Matlab solutions. PINN with PCGrad achieves an 8.86% reduction in RMSE in comparison with PINN without PCGrad.
Methods  Time Range 
RMSE of PINN without PCGrad  0.00048846 
RMSE of PINN with PCGrad  0.00044518 
4.3 Example 3
Consider a flow transmission system consisting of three pipes. Fig. 12 shows the MSS structure of the flow transmission system lisnianski2010multi, where the oil flows from point C to point E. The performance of the flow transmission system is measured by its capacity in tons per minute. Both element 1 and element 2 have two states: an operational state and a failed state. In the operational state, element 1 and element 2 have capacities of 1.5 and 2 tons per minute, respectively, whereas their capacities degrade to zero in the state of total failure. Differing from elements 1 and 2, element 3 has three states: a state of total failure corresponding to a capacity of 0, a state of partial failure corresponding to a capacity of 1.8 tons per minute, and a fully operational state with a capacity of 4 tons per minute.
Fig. 13 illustrates the state transitions of each component in the flow transmission system, where and denote the failure rate and repair rate associated with the th element when it transitions between state and state . The specific values of the failure rates and repair rates are given below:
(20) 
In this application, we are interested in the system-level performance, which is measured by the maximum flow that can be transmitted from point C to point E. At the system level, there are 12 states () in total. The state transition diagram at the system level is shown in Fig. 14, where the corresponding system performance is presented in the lower part of each circle, and the label along each arc denotes the transition rate from one state to another. For details on deriving the system-level state transition diagram, refer to page 83 in chapter 2 of Ref. lisnianski2010multi.
The differential equations governing the transitions among different system performance rates are shown in Eq. (21).
(21) 
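Eq. (21) is a linear system of Kolmogorov forward equations, dp/dt = p Q, where Q is the generator matrix assembled from the failure and repair rates. As a cross-check against the Matlab solver, such a system can be integrated with any standard ODE scheme. The sketch below is a generic fourth-order Runge-Kutta integrator; the 12-state generator of the flow transmission system is not reproduced here, so the usage note illustrates it on a hypothetical two-state chain.

```python
import numpy as np

def solve_ctmc(Q, p0, t_end, n_steps=1000):
    """Integrate the Kolmogorov forward equations dp/dt = p @ Q
    with a classical fourth-order Runge-Kutta scheme.

    Q  : generator matrix of the CTMC (each row sums to zero)
    p0 : initial state probability vector
    Returns the state probability vector at time t_end.
    """
    Q = np.asarray(Q, dtype=float)
    p = np.asarray(p0, dtype=float)
    h = t_end / n_steps
    f = lambda v: v @ Q   # right-hand side of the forward equations
    for _ in range(n_steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p = p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p
```

Because each row of Q sums to zero, the scheme preserves the total probability; for a two-state chain with failure rate 1 and repair rate 2, the solution approaches the stationary distribution [2/3, 1/3] at large t.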
From the state transition diagram in Fig. 14, we observe that there are five unique performance rates at the system level, namely: state 1: 3.5; state 2: 2.0; states 4 and 6: 1.8; states 3 and 7: 1.5; states 5, 8, 9, 10, 11, and 12: 0. Hence, the reliability of the system-level performance rate is formulated as follows:
(22) 
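Eq. (22) amounts to summing, at each time instant, the probabilities of the states whose performance rate meets the demand. A minimal sketch of this aggregation is given below; the probability values in the usage note are hypothetical and serve only to illustrate the bookkeeping.

```python
def reliability(state_probs, perf_rates, demand):
    """System reliability for a constant demand level:
    R(t) = sum of P_i(t) over states i whose performance rate
    is at least the demand.
    """
    return sum(p for p, g in zip(state_probs, perf_rates) if g >= demand)
```

For the five system-level performance rates [3.5, 2.0, 1.8, 1.5, 0.0] and hypothetical probabilities [0.4, 0.2, 0.1, 0.1, 0.2], a demand of 1.8 tons per minute yields a reliability of 0.7.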
Following the framework proposed in Section 3, we reformulate this problem in the context of PINN. Specifically, we generate 500 collocation points within the time range with equal intervals, and initialize the state of the system at the initial time instant as follows:
(23) 
In terms of the architecture of the neural network, the PINN used for reliability assessment in this numerical example has the same configuration as in Example 4.1, including the network architecture and activation function. The PINN is trained for 40000 iterations using the Adam algorithm with a learning rate of . To compare the model performance, we generate 501 points within the time range with an equal step size of 0.0004. Fig. 15 shows the histogram of the mean absolute error between PINN’s predictions and Matlab’s solutions regarding the five system-level performance rates. PINN with PCGrad outperforms the case of no PCGrad when the system-level performance rates are 3.5, 1.8, and 0. When the system-level performance rates are 2.0 and 1.5, PCGrad maintains almost the same level of performance as the case of no PCGrad.
Fig. 16 visualizes the results of PINN when estimating the probabilities associated with each system performance rate in comparison with the solutions derived from the Matlab solver. Again, PINN with PCGrad achieves much better performance than the case of no PCGrad. In particular, when is less than 0.05, PINN without PCGrad fails to capture the changing trend of the probability associated with each system performance rate while PINN with PCGrad matches with the Matlab solver consistently when estimating the probability corresponding to each system performance rate.
Next, we compute the RMSE of the differences between the solutions derived by PINN and the Matlab solver. Table 3 summarizes the RMSE between PINN and the Matlab solver in terms of the five system performance rates. Clearly, the RMSE of PINN with PCGrad is lower than that without PCGrad by a factor of roughly 30. In other words, PCGrad substantially improves the solution quality when performing reliability assessment in MSS.
Methods  Time Range 
RMSE of PINN without PCGrad  0.013388 
RMSE of PINN with PCGrad  0.000454 
4.4 Summary
As demonstrated in the previous three numerical examples, the benefits of deconflicting gradients using PCGrad in PINN are manifold. First of all, it substantially reduces the number of iterations and the amount of training data needed to tune the PINN, thus yielding a data- and computation-efficient PINN. It also alleviates the oscillation of the loss values during training, and allows the PINN to converge to a better solution with a much lower RMSE, relative to the solutions derived using the Matlab solver, than PINN without PCGrad. Last but not least, the introduction of gradient projection frees us from tuning the weight parameter shown in Eq. (12), because all tasks are treated independently in PINN with PCGrad and the weight parameter is no longer needed.
5 Conclusion
Reliability assessment of multistate systems is of significant concern in a broad range of areas. In this paper, we exploit the power of physics-informed neural networks and formulate a generic PINN-based framework for MSS reliability assessment. The developed framework tackles the problem of MSS reliability assessment from a machine learning perspective, and provides a viable paradigm for effective reliability modeling. The proposed methodology follows a two-step procedure. In the first step, MSS reliability assessment is reformulated as a machine learning problem in the framework of PINN, where loss functions are constructed to characterize the constraints associated with the initial condition and state transitions in MSS. Afterwards, to mitigate the severe imbalance in the magnitudes of gradients during the training of PINN, we leverage the projecting conflicting gradients (PCGrad) method to project the gradient of one task onto the normal plane of another task with a conflicting gradient. Embedding PCGrad into the optimization algorithm significantly speeds up the convergence of the PINN to high-quality solutions. The proposed PINN-based framework demonstrates promising performance in evaluating the reliability of MSS in a variety of scenarios.
Future work can be carried out along the following directions. First of all, this paper investigated PINN’s applications in MSS whose state transitions are characterized by either homogeneous or nonhomogeneous CTMCs; it is worth exploring how to adapt PINN to analyze the reliability of semi-Markov MSS. Another direction worthy of investigation is to explore more effective ways of incorporating the equations governing state transitions in MSS into the neural network. In this paper, the ODE residuals are embedded into the neural network in a soft manner by appropriately penalizing the loss function; the drawback of this approach is that PINN might still violate the state transition equations in some scenarios. Thus, it is meaningful to investigate alternative approaches that guarantee PINN strictly complies with the equations governing the underlying state transitions in MSS. Last but not least, the training points representing the ODE residuals are currently added in one shot; it is essential to develop methods that add ODE residual points adaptively, for example, by adding training points in batch mode at the locations with the largest expected reduction in the ODE loss.
Acknowledgement
The work described in this paper is partially supported by a grant from the Research Committee of The Hong Kong Polytechnic University under project code 1BE6V.