Fault detection (FD) for industrial processes by multivariate statistical process monitoring (MSPM) methods has been a hot topic over the past few decades (9; 30). MSPM methods use various control charts to check the statistical properties of process variables. Among them, the Hotelling's $T^2$ control chart is one of the most effective, since Hotelling's $T^2$ statistic is admissible and powerful in certain classes of hypothesis tests (39). Existing MSPM methods in the literature mainly focus on permanent faults (PFs), i.e., they assume that once a fault occurs, it remains in effect permanently unless removed by external intervention. More importantly, several studies (11; 28; 37) have shown that, in practice, many kinds of PFs evolve gradually from intermittent faults (IFs); that is, an IF is often a prelude to a PF. This implies that if faults are detected at this early stage, severe damage caused by PFs, such as system disruptions, plant shutdowns and even safety accidents, can be effectively avoided. IFs have recently attracted considerable interest (6; 45; 36), and a review of their current research status has been published (46).
An IF is a kind of non-permanent fault that lasts for a limited period of time and then disappears without any treatment (46). IF detection (IFD) and detectability problems in discrete event systems have been studied in (10; 14; 6) based on automata, and in (21) based on Petri nets. Timed failure propagation graphs (TFPGs) can model the dynamic evolution of failure propagation over time in practical systems; in (1), the TFPG model has been modified to enable the detection of IFs. For a class of linear stochastic systems with IFs, a set of sliding-window-based residuals has been designed and a robust detection scheme has been adopted (42; 41). Additionally, the detection of IFs has been studied for linear time-varying systems subject to stochastic parameter uncertainty and limited resolution (45). However, these methods require prior knowledge of the system model. Among data-driven methods, the wavelet transform, the short-time Fourier transform and the undecimated discrete wavelet transform have been utilized to detect intermittent electrical and mechanical faults in synchronous motors (43; 26). In addition, by combining the spectral kurtosis of vibration signals with $k$-nearest neighbor distance analysis, a new method (38) has been developed to detect intermittent bearing faults in electric motors. Note that these signal-analysis-based methods are suited to one-dimensional signals that possess periodicity. Moreover, decision forests (36) and dynamic Bayesian networks (5) have also been used to detect IFs in industrial systems, but they require historical data on various faults.
So far, IFD problems have not been fully investigated in the MSPM framework, in which high-dimensional and correlated variables are easy to handle and historical fault data are not necessary. Generally speaking, IFs have small magnitudes and short durations (11), which makes them even more difficult to detect than incipient faults. Moreover, system dynamics and multi-level closed-loop control make industrial data autocorrelated. Because capturing IFs requires high-speed sampling, the serial dependence in the data is stronger and cannot be ignored during IFD.
As a result, existing MSPM methods have the following problems that limit their application to IFD. On the one hand, static MSPM methods such as principal component analysis (PCA) and partial least squares (PLS) have been found (16; 17; 35) to be inefficient for small shifts, let alone IFs. Although their moving-average (MA) and moving-window (MW) based extensions, such as MA-PCA (16), MA-PLS (17), exponentially weighted MA-PCA/PLS (40), multivariate -term sum PCA (7) and MW-HMM (23), are sensitive to small shifts, they cannot handle autocorrelation in the data. Several studies (20; 19; 18) have indicated that monitoring dynamic data with static MSPM methods can produce excessive false alarms. On the other hand, dynamic MSPM methods such as dynamic PCA (DPCA) and canonical variate analysis (CVA) consider a time sequence of measurements and can capture process dynamics (i.e., handle autocorrelation). However, their time lags are chosen only according to system orders, without considering the characteristics of IFs (i.e., the fault duration and magnitude). Therefore, they may not be sufficiently sensitive to IFs, and their efficiency in detecting intermittent small shifts still needs further study. These issues constitute the main motivations of the present study.
This paper investigates the IFD problem in a stationary Gaussian process. A time window and a weight vector are employed to increase the sensitivity to IFs, and the window length is selected by considering the characteristics of IFs. The main contributions of the paper are summarized as follows: 1) A weighted moving average $T^2$ control chart (WMA-TCC) with stationary Gaussian observations is proposed. Unlike existing methods that put equal weights on the samples within a time window, the WMA-TCC uses correlation (autocorrelation and cross-correlation) information to find an optimal weight vector. 2) The concept of IF detectability is defined and the corresponding detectability conditions are provided, which further serve as selection criteria for the optimal weight. 3) The optimal weight is given as the solution of a set of nonlinear equations, whose existence is proven with the help of the Brouwer fixed-point theorem. Moreover, the uniqueness of the optimal weight is proven in several special cases. 4) We reveal that the optimal weight possesses a symmetrical structure, and that an equal-weight scheme is optimal when the data are independent, which provides further justification for existing MA-based methods. 5) Comprehensive comparative studies with existing static and dynamic MSPM methods, such as PCA, MA-PCA, DPCA and CVA, are carried out on a numerical example and the benchmark continuous stirred tank reactor (CSTR) process, and they illustrate the superior IFD performance of the WMA-TCC.
The remainder of this paper is organized as follows. In Section 2, the WMA-TCC with stationary Gaussian observations is introduced for the IFD problem. Then, the detectability of IFs by the WMA-TCC is analyzed in Section 3. The detectability conditions are further utilized to determine the optimal weight in Section 4. Simulation results are presented in Section 5, and conclusions are given in Section 6.
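Notation: Except where otherwise stated, the notation used throughout the paper is standard. $\mathcal{N}_m(\boldsymbol{\mu},\boldsymbol{\Sigma})$ represents an $m$-dimensional normal distribution with expectation $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. $\mathcal{W}_m(\boldsymbol{\Sigma},n)$ represents an $m$-dimensional Wishart distribution with $n$ degrees of freedom. $F(m,n)$ is a central $F$ distribution with $m$ and $n$ degrees of freedom, and $F_{\alpha}(m,n)$ is its percentile at significance level $\alpha$. $\mathbb{R}^{m}$ and $\mathbb{R}^{m\times n}$ denote the $m$-dimensional Euclidean space and the set of all $m\times n$ real matrices. $\|\mathbf{x}\|$ and $\|\mathbf{x}\|_{\infty}$ denote the Euclidean norm and the infinity norm of a vector $\mathbf{x}$, respectively. $\mathbf{A}^{\top}$, $\mathbf{A}^{-1}$, $\det(\mathbf{A})$ and $\operatorname{adj}(\mathbf{A})$ stand for the transpose, the inverse, the determinant and the adjoint of a matrix $\mathbf{A}$, respectively. $\nabla_{\mathbf{x}} f$ is the gradient of $f$ with respect to $\mathbf{x}$, and $\nabla^{2}_{\mathbf{x}} f$ is the Hessian matrix of $f$ with respect to $\mathbf{x}$. Scalars $a_1,\dots,a_n$ form a row vector $[a_1,\dots,a_n]$ and a column vector $[a_1;\dots;a_n]$. The symbol $:=$ means "is defined as". $a_{ij}$ or $[\mathbf{A}]_{ij}$ is the element of matrix $\mathbf{A}$ located in the $i$th row and $j$th column. $\mathbf{A}_{i,:}$ and $\mathbf{A}_{:,j}$ are the $i$th row and $j$th column of matrix $\mathbf{A}$, respectively. $\mathbf{A}_{(ij)}$ is the matrix obtained from $\mathbf{A}$ by deleting the row and column containing $a_{ij}$. $\mathbf{I}_m$ and $\mathbf{e}_i$ denote the $m$-dimensional identity matrix and its $i$th column, respectively; $\mathbf{1}_m$ and $\mathbf{0}_m$ denote the $m$-dimensional column vectors with all entries equal to one and zero, respectively. The symbol $\otimes$ denotes the Kronecker product and $\delta_{ij}$ is the Kronecker delta. $\lambda_{\min}(\mathbf{A})$ and $\lambda_{\max}(\mathbf{A})$ are the minimum and maximum eigenvalues of matrix $\mathbf{A}$, respectively. $\mathbf{A}\prec 0$ and $\mathbf{A}\preceq 0$ mean that $\mathbf{A}$ is negative definite and negative semidefinite, respectively.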
In this section, the WMA-TCC is proposed for the purpose of FD in stationary Gaussian processes.
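The following lemma is the key result regarding Hotelling's $T^2$ distribution; see (3).
Lemma 1. Let $T^2 = \mathbf{y}^{\top}\mathbf{S}^{-1}\mathbf{y}$, where $\mathbf{y}$ and $n\mathbf{S}$ are independently distributed random variables with $\mathbf{y}\sim\mathcal{N}_m(\boldsymbol{\nu},\boldsymbol{\Sigma})$ and $n\mathbf{S}\sim\mathcal{W}_m(\boldsymbol{\Sigma},n)$, where $n\geq m$. Then
$$\frac{n-m+1}{nm}\,T^2 \sim F(m,\,n-m+1;\,\delta),$$
where $F(m,n-m+1;\delta)$ is the noncentral $F$ distribution with noncentrality parameter $\delta = \boldsymbol{\nu}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\nu}$.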
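The lemma can be sanity-checked numerically. The sketch below is a Monte Carlo illustration of the central case ($\delta = 0$) only, under a setup we assume here: an arbitrary positive definite covariance $\boldsymbol{\Sigma}$, a vector $\mathbf{y}\sim\mathcal{N}_m(\mathbf{0},\boldsymbol{\Sigma})$, a Wishart matrix $\mathbf{W}$ built as the sum of $n$ outer products of independent $\mathcal{N}_m(\mathbf{0},\boldsymbol{\Sigma})$ vectors, and $T^2 = n\,\mathbf{y}^{\top}\mathbf{W}^{-1}\mathbf{y}$. The helper name `scaled_t2_samples` is hypothetical. If the lemma holds, the sample mean of the scaled statistic should be close to the mean $(n-m+1)/(n-m-1)$ of $F(m,n-m+1)$.

```python
import numpy as np


def scaled_t2_samples(m: int, n: int, num_draws: int, seed: int = 0) -> np.ndarray:
    """Monte Carlo draws of (n - m + 1) / (n m) * T^2 in the central case.

    Assumed setup (illustration only): y ~ N_m(0, Sigma), W = sum_{i=1}^n z_i z_i^T
    with z_i ~ N_m(0, Sigma) independent of y, so W ~ W_m(Sigma, n), and
    T^2 = y^T (W / n)^{-1} y = n * y^T W^{-1} y.
    """
    rng = np.random.default_rng(seed)
    # Arbitrary positive definite covariance; the F law should not depend on it.
    a = rng.standard_normal((m, m))
    sigma = a @ a.T + m * np.eye(m)
    chol = np.linalg.cholesky(sigma)

    # Rows of e @ chol.T have covariance chol @ chol.T = sigma.
    y = rng.standard_normal((num_draws, m)) @ chol.T
    z = rng.standard_normal((num_draws, n, m)) @ chol.T
    w = np.einsum("kni,knj->kij", z, z)             # one Wishart matrix per draw
    sol = np.linalg.solve(w, y[..., None])[..., 0]  # W^{-1} y for each draw
    t2 = n * np.sum(y * sol, axis=1)
    return (n - m + 1) / (n * m) * t2


m, n = 3, 30
stats = scaled_t2_samples(m, n, num_draws=20_000)
f_mean = (n - m + 1) / (n - m - 1)  # mean of F(m, n - m + 1)
print(f"simulated mean {stats.mean():.3f} vs F mean {f_mean:.3f}")
```

The check only targets the first moment; a histogram or a quantile comparison against an $F$ table would be a stricter test.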
2.2 Weighted moving average control chart
The IFD task with stationary Gaussian observations consists of analyzing the latest process data at each time instance to determine whether the process is statistically fault-free. Unlike existing MA- or MW-based MSPM methods (16; 17; 40; 7; 23), which ordinarily assume independent and identically Gaussian-distributed observations, we only assume that the system's normal operation follows a stationary Gaussian process whose autocovariance function decays to nearly zero for large time lags. That is, the mean is the same at all time instances, the autocovariance function depends only on the time lag, and it is approximately zero for large lags.
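Whether recorded normal-operation data are consistent with the decaying-autocovariance assumption can be inspected empirically. The sketch below estimates the lag-$\tau$ autocovariance matrix $\operatorname{Cov}(\mathbf{x}(k),\mathbf{x}(k+\tau))$ from a multivariate record; the function name and the toy VAR(1) data are assumptions made for illustration, not part of the method.

```python
import numpy as np


def sample_autocovariance(x: np.ndarray, lag: int) -> np.ndarray:
    """Estimate Gamma(lag) = Cov(x(k), x(k + lag)) from a (T, m) stationary record.

    Uses the biased 1/T normalization that is customary for autocovariance sequences.
    """
    t = x.shape[0]
    centered = x - x.mean(axis=0)
    return centered[: t - lag].T @ centered[lag:] / t


# Toy fault-free data (assumption): a 2-D VAR(1) process x(k+1) = 0.8 x(k) + e(k),
# e(k) ~ N(0, I), whose true autocovariance is 0.8**lag / (1 - 0.64) * I.
rng = np.random.default_rng(1)
num_steps, a = 20_000, 0.8
x = np.zeros((num_steps, 2))
for k in range(1, num_steps):
    x[k] = a * x[k - 1] + rng.standard_normal(2)

gamma0 = sample_autocovariance(x, 0)
gamma30 = sample_autocovariance(x, 30)
print(np.round(gamma0, 2))
print(np.round(gamma30, 2))
```

For this toy process the lag-30 estimate should be close to zero, which is the behavior the assumption requires of real normal-operation data.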
To construct the WMA-TCC, we collect several sets of consecutive observations from the stationary Gaussian process as training data, which represent the statistical characteristics of the system's normal operating conditions. Moreover, the different sets are independent and identically distributed. This can be achieved by leaving sufficiently long intervals between different sets, so that the autocovariance between samples in different sets is negligible. Note that, within each set, the sampling rate of the training data should equal that of the current process data. In summary, the sampling strategy for the training data is shown in (2.2), with a sufficiently long interval between consecutive sets.
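The sampling strategy can be mimicked on a long fault-free record by cutting it into sets of consecutive samples separated by skipped stretches. This is a minimal sketch that assumes the record is already sampled at the same rate as the current process data; `collect_training_windows`, `window` and `gap` are hypothetical names, and choosing `gap` large enough is left to the user.

```python
import numpy as np


def collect_training_windows(series: np.ndarray, window: int, gap: int) -> np.ndarray:
    """Cut a (T, m) fault-free record into sets of `window` consecutive samples.

    Consecutive sets are separated by `gap` skipped samples, so that (under the
    decaying-autocovariance assumption) different sets are nearly independent.
    The sampling rate inside each set is that of the record itself.
    Returns an array of shape (num_sets, window, m).
    """
    if window <= 0 or gap < 0:
        raise ValueError("window must be positive and gap must be non-negative")
    t, m = series.shape
    starts = range(0, t - window + 1, window + gap)
    if len(starts) == 0:
        return np.empty((0, window, m))
    return np.stack([series[s : s + window] for s in starts])


record = np.arange(200.0).reshape(100, 2)  # stand-in for 100 samples of 2 variables
sets = collect_training_windows(record, window=5, gap=20)
print(sets.shape)  # (4, 5, 2): sets start at samples 0, 25, 50 and 75
```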
The IFD problem is equivalent to a hypothesis testing problem: testing the null hypothesis that the process is fault-free against the alternative hypothesis that it is faulty. For the WMA-TCC, a weight vector is introduced and different weights are put on the samples in the time window, as shown in (2.2) and (2.2).
In practice, the parameters are unknown; only the sample means and the sample covariance matrix are available:
Here, are abbreviations for , , , respectively, since they are actually matrix- or vector-valued functions of . We also know that the sample means and the sample covariance matrix are independently distributed, with
where is an abbreviation for .
According to Lemma 1, the WMA-TCC with window length , denoted as WMA-TCC(), with stationary observations at time instance is then
Here, we assume that is nonsingular for any weight vector ; detailed explanations are given in Assumption 1 and Proposition 1 of Section 4. For a given significance level , the process is considered normal at time instance , i.e., the null hypothesis is accepted, if
where is the control limit of the WMA-TCC(). Otherwise, an alarm occurs at time instance . Inequality (7) gives the acceptance region of the hypothesis test.
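The exact expressions for the WMA-TCC statistic and its control limit do not survive in this text, so the following is only a hypothetical sketch of the idea. It assumes the statistic is built from the weighted sum $\mathbf{z}=\sum_i w_i\mathbf{x}_i$ of the samples in a window: the $N$ training sets provide a sample mean and an unbiased sample covariance of these weighted sums, and the current window's weighted sum is compared with them. Because each training set contains consecutive samples at the current sampling rate, the covariance of the weighted sums automatically reflects the autocorrelation and cross-correlation inside the window. Applying Lemma 1 with $N-1$ degrees of freedom gives a scaled statistic that follows $F(m, N-m)$ for a fault-free window. The function name and the $N/(N+1)$ scaling are our assumptions, not the paper's formula.

```python
import numpy as np


def weighted_window_t2(train_sets: np.ndarray, current: np.ndarray, weights: np.ndarray):
    """Hypothetical weighted-window T^2 statistic (a sketch, not the paper's exact formula).

    train_sets: (N, L, m) fault-free sets of L consecutive samples.
    current:    (L, m) latest L samples, oldest first.
    weights:    (L,) weight vector for the samples in the window.
    Returns (t2, f_stat); for a fault-free current window, f_stat ~ F(m, N - m).
    """
    n_sets, _, m = train_sets.shape
    z_train = np.einsum("jlk,l->jk", train_sets, weights)  # weighted sum of each set
    z_now = current.T @ weights
    z_bar = z_train.mean(axis=0)
    s = np.atleast_2d(np.cov(z_train, rowvar=False))      # (N - 1) S ~ W_m(Sigma_w, N - 1)
    d = z_now - z_bar                                     # ~ N(0, (1 + 1/N) Sigma_w) if fault-free
    t2 = n_sets / (n_sets + 1) * float(d @ np.linalg.solve(s, d))
    dof = n_sets - 1
    f_stat = (dof - m + 1) / (dof * m) * t2               # Lemma 1 scaling
    return t2, f_stat


rng = np.random.default_rng(2)
train = rng.standard_normal((200, 5, 3))  # toy training data: 200 sets, L = 5, m = 3
w = np.full(5, 1 / 5)                     # equal weights as a baseline
normal_window = rng.standard_normal((5, 3))
faulty_window = normal_window + 3.0       # constant bias on every variable
print(weighted_window_t2(train, normal_window, w))
print(weighted_window_t2(train, faulty_window, w))
```

Under these assumptions an alarm would be raised when `f_stat` exceeds the upper $\alpha$ quantile of $F(m, N-m)$, which can be obtained, for example, from `scipy.stats.f.ppf(1 - alpha, m, N - m)`.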
3 Detectability analyses
For the WMA-TCC, the window length and the weight vector are crucial parameters that can directly affect the IFD performance. They should be carefully selected so that the detection capability for IFs is maximized. Thus, in this section, we analyze the IF detectability.
3.1 Guaranteed detectability
where represents the process fluctuation under normal conditions, is the direction of the fault at time instance , and is its magnitude. By introducing the time window, we have
where is the effect of all faults in the time window, and . When we analyze the fault detectability, we make the following additional assumption:
Inequality (10) is commonly assumed by literature addressing fault detectability problems in the MSPM framework (29; 2; 25; 17).
The assumption means that the fault-free process fluctuates within its acceptance region (7). Since a small significance level is always selected, this assumption holds with high probability. Note that this assumption is introduced only to analyze detectability, and thus imposes no limitation on the practical application of the method.
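Why equal weights are natural for independent data, as stated in contribution 4), can be seen in a special case that we assume here purely for illustration: the samples in the window are independent with common covariance $\boldsymbol{\Sigma}$, and the same bias $\mathbf{f}$ affects every sample in a window of length $L$. The weighted sum then has mean shift $(\mathbf{1}^{\top}\mathbf{w})\mathbf{f}$ and covariance $\|\mathbf{w}\|^2\boldsymbol{\Sigma}$, so its noncentrality is $\delta(\mathbf{w}) = \frac{(\mathbf{1}^{\top}\mathbf{w})^2}{\|\mathbf{w}\|^2}\,\mathbf{f}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{f} \le L\,\mathbf{f}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{f}$ by the Cauchy-Schwarz inequality, with equality exactly when $\mathbf{w}$ is proportional to $\mathbf{1}$. The check below is a sketch of this special case only, not the paper's general optimality result; the function name is hypothetical.

```python
import numpy as np


def noncentrality_independent(weights: np.ndarray, bias: np.ndarray, sigma: np.ndarray) -> float:
    """delta(w) for the weighted sum sum_i w_i x_i, assuming independent samples with
    covariance sigma and the same bias vector on every sample in the window."""
    shift = weights.sum() * bias         # mean shift of the weighted sum
    cov_w = (weights @ weights) * sigma  # covariance of the weighted sum
    return float(shift @ np.linalg.solve(cov_w, shift))


L = 6
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
bias = np.array([0.5, -0.2])
bound = L * float(bias @ np.linalg.solve(sigma, bias))  # Cauchy-Schwarz upper bound

delta_equal = noncentrality_independent(np.full(L, 1.0 / L), bias, sigma)
rng = np.random.default_rng(3)
delta_random = [noncentrality_independent(rng.uniform(0.0, 1.0, L), bias, sigma)
                for _ in range(1000)]
print(delta_equal, bound, max(delta_random))
```

With autocorrelated samples the covariance of the weighted sum is no longer $\|\mathbf{w}\|^2\boldsymbol{\Sigma}$, which is why the general case calls for the optimization in Section 4.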
where is the step function; , represent the appearance and disappearance times of the th IF, satisfying ; and , are the direction and magnitude of the th IF, satisfying . Moreover, the active and inactive durations of the th IF are and , respectively. Thus, the th IF can be described by five parameters, i.e., IF.
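The intermittent-bias model can be simulated directly. The sketch below assumes a step function with $H(0)=1$, so that each IF is active on the half-open interval from its appearance time to its disappearance time, and it takes the inactive duration of an IF to be the time from its disappearance to the next appearance. Both conventions, and the names `IntermittentFault`, `fault_signal` and `durations`, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class IntermittentFault:
    appear: int            # time instance at which the IF becomes active
    disappear: int         # time instance at which the IF becomes inactive
    direction: np.ndarray  # fault direction (unit norm assumed)
    magnitude: float       # fault magnitude


def fault_signal(faults: list[IntermittentFault], num_steps: int, dim: int) -> np.ndarray:
    """f(k) = sum_i magnitude_i * direction_i * [H(k - appear_i) - H(k - disappear_i)]."""
    k = np.arange(num_steps)
    f = np.zeros((num_steps, dim))
    for fault in faults:
        active = (k >= fault.appear) & (k < fault.disappear)  # step difference with H(0) = 1
        f[active] += fault.magnitude * fault.direction
    return f


def durations(faults: list[IntermittentFault]) -> list[tuple[int, Optional[int]]]:
    """(active, inactive) duration of each IF; the last IF has no next appearance."""
    out = []
    for i, fault in enumerate(faults):
        active = fault.disappear - fault.appear
        inactive = faults[i + 1].appear - fault.disappear if i + 1 < len(faults) else None
        out.append((active, inactive))
    return out


direction = np.array([1.0, 0.0])
faults = [IntermittentFault(10, 15, direction, 0.5), IntermittentFault(20, 26, direction, 0.5)]
f = fault_signal(faults, num_steps=40, dim=2)
print(durations(faults))  # [(5, 5), (6, None)]
```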
Recall that IFs are characterized by small magnitudes and short durations. In most cases, since the fault magnitude is small, when an IF becomes active, the system exhibits a short transient and is then soon driven to another steady state by the closed-loop control, instead of fluctuating sharply or going out of control. Similarly, when the IF becomes inactive, the closed-loop control soon drives the system back to its normal steady state after a short transition. Moreover, since the fault duration is short, the fault direction and magnitude within each IF can be assumed to be constant. Therefore, IFs can be represented in the form of intermittent biases, as in (11). This statement will be confirmed by a realistic simulation of the practical CSTR benchmark in Section 5.
The fault detectability concept was first defined within the MSPM framework in (12; 13), and it has been widely adopted by a variety of MSPM methods (29; 2; 31; 25; 17) to study FD performance. However, the concept has mainly been concerned with PFs. Compared with PF detection, IFD carries an additional requirement (4; 41; 46): each appearance (disappearance) of an IF must be detected before its subsequent disappearance (appearance); otherwise, missed or false alarms occur. Following these considerations, this paper extends and generalizes the original fault detectability concept (12) to make it suitable for both PFs and IFs.
For a given significance level , the disappearance of the th IF is said to be guaranteed detectable (DPG-detectable) by the WMA-TCC(), if there exists a time instance such that for each , the detection statistic is guaranteed for all values of in (10).
For a given significance level , the appearance of the th IF is said to be guaranteed detectable (APG-detectable) by the WMA-TCC(), if the disappearance of the th IF is guaranteed detectable, and there exists a time instance such that for each , the detection statistic is guaranteed for all values of in (10).
For a given significance level , the th IF is said to be guaranteed detectable (G-detectable) by the WMA-TCC(), if both the th appearance and disappearance of the IF are guaranteed detectable.
3.2 Detectability conditions
Intuitively, to detect the disappearance/appearance of an IF, we can choose a window length that is no more than the IF's inactive/active duration, so that, after some delay, the WMA-TCC() is free from the interference of previous faulty/fault-free samples.
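This intuition can be checked mechanically on a Boolean fault indicator: a time instance whose whole window lies inside an inactive (active) period exists only if the window length does not exceed that period's duration. The helper below is a hypothetical illustration of this counting argument; it does not evaluate the magnitude-dependent condition (12) that appearance detectability additionally requires.

```python
def uniform_window_instances(mask: list[bool], window: int, faulty: bool) -> list[int]:
    """Time instances k whose window [k - window + 1, k] is entirely faulty (faulty=True)
    or entirely fault-free (faulty=False), given a per-sample fault indicator."""
    return [
        k
        for k in range(window - 1, len(mask))
        if all(sample == faulty for sample in mask[k - window + 1 : k + 1])
    ]


def in_gap(instances: list[int]) -> list[int]:
    """Keep only instances inside the inactive period [15, 20) of the example below."""
    return [k for k in instances if 15 <= k < 20]


# Fault indicator: IF active on [10, 15) and [20, 26); the inactive duration between them is 5.
mask = [10 <= k < 15 or 20 <= k < 26 for k in range(40)]
print(in_gap(uniform_window_instances(mask, 5, faulty=False)))  # [19]
print(in_gap(uniform_window_instances(mask, 6, faulty=False)))  # []
```

With a window of length 6, no window fits inside the 5-sample inactive period, so the WMA-TCC would always see some faulty samples there.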
For the WMA-TCC() and a given significance level , when , the disappearance of the th IF is guaranteed detectable (DPG-detectable).
Proof. According to the IF model (11), when , there exists a time instance , such that for each , all current process samples within the time window are fault-free. Then we have and
Thus, for each , the detection statistic is guaranteed for all values of in (10). ∎
For the WMA-TCC() and a given significance level , when , the appearance of the th IF is guaranteed detectable (APG-detectable) if and only if
Proof. According to Lemma 2, when , the disappearance of the th IF is guaranteed detectable. Moreover, there exists a time instance , such that for each , all current process samples within the time window are faulty. Then we have and
We now prove the necessity by contraposition. The contrapositive of the necessity statement is as follows: when , if , then either the disappearance of the th IF is not guaranteed detectable, or for any time instance , there exist a time instance and a value of in (10) for which holds. This contrapositive statement can be proven as follows. For any given , we consider a time instance that satisfies . We further consider the following value of : , which satisfies (10) if . Note that at time instance , we have and consequently . Having proven the contrapositive, we infer the original statement, and the proof of necessity is complete. ∎
For the WMA-TCC() and a given significance level , when , the th IF is guaranteed detectable (G-detectable) if and only if inequality (12) holds.
4 Determination of the weight and window length
In this section, methods to determine the weight vector and window length are provided, along with discussions on the existence, symmetry and uniqueness of the optimal weight.
4.1 Problem formulation and main results
We are now in a position to find the optimal weight vector based on the detectability conditions derived above, and we present the main problem as follows.
For the WMA-TCC(), , find the optimal weight that
Proof. For this nonlinear constrained optimization problem, we can construct a Lagrange function given by
where is a Lagrange multiplier. According to the Karush-Kuhn-Tucker conditions (first-order necessary conditions) (24), the optimal weight should satisfy
By setting the above derivative of with respect to to zero, the following equations are obtained.
When satisfies (16), it is a stationary point (an extremum point or a saddle point) of function (14) subject to constraint (15). According to (8), the second-order necessary conditions for to be a maximum point are that the leading principal minors of of order () have sign or are equal to zero, where
is a bordered Hessian matrix and
Thus, the second-order necessary conditions for the optimization problem are derived as (17). ∎
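Because the objective (14) and the constraint (15) are not reproduced in this text, the sketch below only illustrates the generic second-order test with a single equality constraint. It uses the common convention of placing the constraint gradient in the first row and column of the bordered Hessian; at a constrained maximum of a function of $n$ variables, the leading principal minors of order $r = 3,\dots,n+1$ then have sign $(-1)^{r-1}$ or vanish. The helper names are hypothetical, and the two toy problems are chosen only because their answers are known.

```python
import numpy as np


def bordered_hessian(grad_g: np.ndarray, hess_lagrangian: np.ndarray) -> np.ndarray:
    """[[0, grad_g^T], [grad_g, Hessian of the Lagrangian]] for one equality constraint."""
    n = grad_g.shape[0]
    bh = np.zeros((n + 1, n + 1))
    bh[0, 1:] = grad_g
    bh[1:, 0] = grad_g
    bh[1:, 1:] = hess_lagrangian
    return bh


def passes_max_necessary_test(bh: np.ndarray, tol: float = 1e-10) -> bool:
    """Leading principal minors of order r = 3, ..., n + 1 must have sign (-1)^(r-1) or be 0."""
    for r in range(3, bh.shape[0] + 1):
        minor = np.linalg.det(bh[:r, :r])
        if (-1) ** (r - 1) * minor < -tol:
            return False
    return True


# Linear constraint x + y = 2, so the Hessian of the Lagrangian equals that of the objective.
# max x*y: the stationary point (1, 1) is a maximum.
bh_max = bordered_hessian(np.array([1.0, 1.0]), np.array([[0.0, 1.0], [1.0, 0.0]]))
# x^2 + y^2: the stationary point (1, 1) is a minimum, not a maximum.
bh_min = bordered_hessian(np.array([1.0, 1.0]), np.array([[2.0, 0.0], [0.0, 2.0]]))
print(passes_max_necessary_test(bh_max), passes_max_necessary_test(bh_min))  # True False
```

Since the test is only necessary, passing it does not by itself certify a maximum; strict signs would be needed for the usual sufficient condition.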
4.2 Existence of the solution
In this subsection, we prove the existence of a solution to the nonlinear equations (16) with the help of the well-known Brouwer fixed-point theorem. We begin with the following assumption; the main result is given in Theorem 3. Additionally, methods to obtain the optimal weight are discussed and a bound on the optimal weight is given.
is nonsingular, where
Suppose Assumption 1 holds, then and are nonsingular for any .
Proof. By following a few reformulations, we have
For any , the matrix has full column rank. Then, by Assumption 1, is nonsingular. Let be the abbreviation of , and define
Then, it follows from Assumption 1 that
is nonsingular and positive definite. By following a few reformulations, we can rewrite , where
Thus, is nonsingular if and only if is nonsingular. Suppose that is singular; then there exists , such that
Multiplying both sides by on the right, we have . Since is positive definite, we obtain . This means that the first rows of are linearly dependent, which contradicts the fact that has full row rank. Thus, is nonsingular and the proof is complete. ∎
According to Proposition 1, we can rewrite
It can be seen that is a fixed point of function . In our practical experience, can be obtained by successive approximation within 1000 iterations, as follows:
Since , Proposition 1 further guarantees that this iteration is always implementable.
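The successive-approximation scheme can be written generically. The map whose fixed point is the optimal weight is not reproduced in this text, so the sketch below uses a hypothetical affine contraction as a stand-in. The stopping tolerance, the infinity-norm stopping rule and the name `fixed_point` are our assumptions; the cap of 1000 iterations follows the text.

```python
from typing import Callable

import numpy as np


def fixed_point(g: Callable[[np.ndarray], np.ndarray], w0: np.ndarray,
                tol: float = 1e-10, max_iter: int = 1000) -> tuple[np.ndarray, int]:
    """Iterate w <- g(w) until successive iterates differ by less than tol (infinity norm)."""
    w = np.asarray(w0, dtype=float)
    for it in range(1, max_iter + 1):
        w_next = g(w)
        if np.linalg.norm(w_next - w, ord=np.inf) < tol:
            return w_next, it
        w = w_next
    raise RuntimeError(f"no convergence within {max_iter} iterations")


# Stand-in map (assumption): g(w) = M w + c with ||M||_inf = 0.6 < 1, hence a contraction.
M = np.array([[0.5, 0.1], [0.2, 0.3]])
c = np.array([1.0, 2.0])
w_star, iterations = fixed_point(lambda w: M @ w + c, np.zeros(2))
print(w_star, iterations)
```

Convergence is guaranteed here by the contraction property; for the actual weight map, the existence argument of this subsection does not by itself imply that plain iteration converges, which is consistent with the text presenting the 1000-iteration figure as practical experience.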
For any column vectors and matrix , the following inequality holds:
Proof. Directly derived from and . ∎
Suppose Assumption 1 holds, then and , for any .
Proof. Let , according to (28), we have
with , where
Note that . Thus, . Moreover, by following a few reformulations, we can rewrite