# Hybrid variable monitoring: An unsupervised process monitoring framework

Traditional process monitoring methods, such as PCA, PLS, ICA, and MD, depend strongly on continuous variables because most of them inevitably involve Euclidean or Mahalanobis distance. As industrial processes become more complex and integrated, binary variables appear among the monitored variables alongside continuous ones, which makes process monitoring more challenging. The aforementioned traditional approaches cannot mine the information in binary variables, so the useful information they contain is usually discarded during data preprocessing. To solve this problem, this paper focuses on the issue of hybrid variable monitoring (HVM) and proposes a novel unsupervised framework for process monitoring with hybrid variables. HVM is addressed in a probabilistic framework, which can effectively exploit the process information implicit in both continuous and binary variables at the same time. In HVM, the statistics and the monitoring strategy suitable for hybrid variables, using only healthy-state data, are defined, and the physical explanation behind the framework is elaborated. In addition, the estimation of the parameters required in HVM is derived in detail and the detectable condition of the proposed method is analyzed. Finally, the superiority of HVM is fully demonstrated, first on a numerical simulation and then on an actual case from a thermal power plant.

## 1 Introduction

Process monitoring is indispensable because it is the premise of and guarantee for the safe and stable running of industrial systems [12, 2, 14, 3, 36, 30]. In recent decades, a large number of data-driven approaches have been proposed for process monitoring [11, 20, 5, 23, 10, 35, 29, 4, 33, 38]. However, most of them rely heavily on continuous variables because they inevitably involve Euclidean or Mahalanobis distance and therefore cannot be applied to hybrid variables (containing both continuous and binary variables) [33].

Among data-driven methods, principal component analysis (PCA) has received continuous attention since it was first applied to process monitoring, owing to its effectiveness for dimensionality reduction [19, 11]. Based on PCA, dynamic PCA (DPCA) adopts time-lag shifting to construct an augmented matrix that mines time-related information [20]. Considering slow changes in the normal process, recursive PCA (RPCA) was proposed for adaptive process monitoring [24]. To capture nonlinearity, kernel PCA (KPCA) was developed [28, 5]. Unlike PCA, partial least squares (PLS) and its variants pay particular attention to quality-related faults [26, 23]. To weaken the Gaussian hypothesis, independent component analysis (ICA) was proposed for process monitoring [22]. The Mahalanobis distance (MD) can also be used directly for process monitoring [18]. As understanding of fault initiation has become more thorough, moving-window methods have been proposed for incipient fault detection [17, 29, 27]. Considering practical applicability in industrial processes, a large number of improved methods have been developed for multimode and nonstationary monitoring [37, 39, 16, 34].

The aforementioned methods have achieved remarkable results in process monitoring, but almost all of them are based on Euclidean or Mahalanobis distance and are highly dependent on continuous variables. However, practical industrial processes sometimes involve not only continuous variables but also binary variables, which may carry useful information for process monitoring [33] yet are usually deleted during data preprocessing [13]. For hybrid variables, Langseth et al. used hybrid Bayesian networks to estimate human reliability [21]. Aguilera et al. developed naïve Bayes (NB) and tree-augmented naïve Bayes (TAN) models and applied them to species distribution [1]. Zhu et al. considered the mixture of continuous and discrete variables in a semantic model [40]. Talvitie et al. introduced a related model that employs an adaptive discretization approach for structure learning in Bayesian networks containing both continuous and discrete variables [31]. Recently, Wang et al. utilized continuous and binary (two-valued) variables to detect abnormalities in a thermal power plant for the first time [33]. A more effective anomaly monitoring model, the feature weighted mixed naive Bayes model (FWMNBM), was then developed [32].

However, the hybrid variable approaches mentioned above are supervised methods and require both normal and fault data during training. Unfortunately, the systems in actual industrial processes run without faults most of the time, and the determination of fault samples requires repeated study and careful discussion by experts, which is time-consuming and costly. Consequently, healthy-state samples are usually available while sufficient fault instances are difficult to collect, which is one reason why monitoring methods based only on normal-condition data, such as PCA, PLS, and ICA, have attracted so much attention. Process monitoring methods for hybrid variables based only on healthy-state data are therefore urgently needed. This paper thus focuses on hybrid variable process monitoring and proposes a novel unsupervised framework of process monitoring with hybrid variables, named HVM, which can simultaneously capture the process information of both continuous and binary variables. The main contributions are summarized as follows:

1. This article is the first to focus on hybrid variable monitoring based only on healthy-state data, and a novel unsupervised framework of process monitoring with hybrid variables, named HVM, is proposed.

2. Under the unsupervised framework, the statistics and the monitoring strategy suitable for hybrid variables are defined for the first time and the physical explanation behind the framework is elaborated. In addition, the estimation of the parameters is derived in detail and the detectable condition is analyzed.

3. The effectiveness and efficiency of the proposed method are fully demonstrated, first on a numerical simulation and then on a practical fan system of an ultra-supercritical power plant.

The remainder of the paper is organized as follows. The problem formulation and motivation are described in detail in Section 2. The framework of hybrid variable monitoring is introduced in Section 3. In Section 4, parameter learning and the corresponding derivation are described. The fault form of hybrid variables is defined and the detectable condition is analyzed in Section 5. In Section 6, the effectiveness and efficiency of the proposed framework are verified. Finally, conclusions are given in Section 7.

## 2 Problem formulation and motivation

As industrial processes become more complex and integrated, binary variables appear among the monitored variables. For example, in Zhejiang Zheneng Zhongmei Zhoushan Coal and Electricity Co., Ltd. (Zhoushan Power Plant), Zhejiang Province, China, a large number of variables are monitored in the No.1 power unit, many of which are binary [33]. In the fan system of the No.1 power unit, both continuous and binary variables are collected, and the binary variables outnumber the continuous ones [32]. The appearance of binary variables makes traditional monitoring approaches no longer applicable and makes process monitoring with hybrid variables more intractable. Binary variables are usually discarded during data preprocessing because traditional approaches mostly apply Euclidean or Mahalanobis distance, which cannot describe binary variables [13]. However, binary variables may carry useful information for process monitoring [33, 32].

The issue of supervised classification with hybrid variables has attracted attention and been investigated in other fields [21, 1, 40, 31]. In process monitoring, Wang et al. have utilized continuous and binary variables for anomaly detection in a thermal power plant [33, 32]. However, these approaches are supervised methods, which require both normal samples and fault instances to train the model. In practical processes, a lot of healthy-state samples can be collected, whereas it is difficult to obtain sufficient faulty samples. Therefore, this paper proposes a novel unsupervised framework for process monitoring with hybrid variables named HVM. HVM can simultaneously mine the information of both continuous and binary variables through a probabilistic framework. In HVM, the statistics of the hybrid variables are computed with healthy-state data and the control limit is determined by kernel density estimation (KDE) [25]. Then, for each arriving sample $x_a$, the statistic is computed in the same way as in training, and the state of $x_a$ is determined through the monitoring strategy. Finally, the superiority of HVM is demonstrated through a numerical simulation and an actual case from the fan system of a thermal power plant.

## 3 Hybrid variable monitoring framework

### 3.1 Off-line statistics

Training data $X=\{x_1,\cdots,x_n\}$ are sampled under the normal operating condition with $n$ samples. $x_i$ is the $i$-th instance and contains $d$ features, where $d_b$ binary features and $d_c$ continuous features are respectively collected. Let $x_j$ be the $j$-th variable. When the system is running in a steady state, the monitoring data tend to be stationary, with no trends [38]. Then the following assumptions are introduced. If $x_j$ is a continuous variable, we suppose it obeys a Gaussian distribution under the normal condition $N$, that is [33]

$$P_c(x_j|N)=\mathcal{N}(x_j|N;\mu_j,\sigma_j), \tag{1}$$

where $\mathcal{N}(x_j|N;\mu_j,\sigma_j)$ is the probability density function (pdf), defined as $\mathcal{N}(x_j|N;\mu_j,\sigma_j)=(2\pi)^{-1/2}(\sigma_j)^{-1}\exp\{-(x_j-\mu_j)^2/(2\sigma_j^2)\}$, and $\mu_j$ and $\sigma_j$ are the mean and corresponding standard deviation of the $j$-th variable.

If $x_j$ is a binary variable, the Bernoulli distribution is introduced as follows [8]:

$$P_b(x_j|N)=(\eta_j)^{x_j}(1-\eta_j)^{1-x_j}, \tag{2}$$

where $P_b(x_j|N)$ is the distribution series (ds) and $\eta_j$ is the response probability, defined as $\eta_j=P(x_j=1|N)$.

###### Definition 1.

The occurrence probability of $x_i$ under the normal condition is defined as

$$P(x_i|N)=\prod_{j_c=1}^{d_c}P_c((x_i)_{j_c}|N)^{\varphi_{j_c}}\prod_{j_b=1}^{d_b}P_b((x_i)_{j_b}|N)^{\varphi_{j_b}}, \tag{3}$$

where $(x_i)_{j_c}$ and $(x_i)_{j_b}$ denote the $j_c$-th continuous variable and the $j_b$-th binary variable of $x_i$ respectively, and $\varphi_{j_c}$ and $\varphi_{j_b}$ are the weights of the corresponding variables.

Affected by noise, there may be some outliers in the data sampled under the normal operating condition. The probability that $x_i$ belongs to the normal condition $N$ can then be obtained by

$$P(N,x_i)=P(N)P(x_i|N), \tag{4}$$

where $P(N)$ is the prior normal probability, which represents the confidence level of the healthy-state data and equals $\tilde{\delta}=1-\delta$, where $\delta$ is the significance level [9].
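Equations (3)-(4) can be sketched in a few lines of code. The following is a minimal illustration only, not the authors' implementation; the `params` layout (per-variable `(mu, sigma, w)` and `(eta, w)` tuples) and the function names are assumed conventions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Pc(x|N) of equation (1): Gaussian pdf with mean mu and std sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bernoulli_pmf(x, eta):
    # Pb(x|N) of equation (2): distribution series with response probability eta
    return eta ** x * (1 - eta) ** (1 - x)

def occurrence_probability(x_cont, x_bin, params, delta=0.01):
    """P(N, x_i) of equation (4): the prior (1 - delta) times the weighted
    product of per-variable probabilities in equation (3)."""
    p = 1.0
    for x, (mu, sigma, w) in zip(x_cont, params["cont"]):
        p *= gaussian_pdf(x, mu, sigma) ** w
    for x, (eta, w) in zip(x_bin, params["bin"]):
        p *= bernoulli_pmf(x, eta) ** w
    return (1 - delta) * p
```

As expected from Remark 1 below equation (4), a sample far from the training statistics yields a much smaller occurrence probability than a typical one.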

###### Theorem 1.

$\forall x_i\in X$, there exists a positive decimal $\tilde{\varrho}$ that satisfies $0<\tilde{\varrho}\leq P(N,x_i)<1$.

Proof. For $\forall x_i\in X$, suppose Assumption 3.1 holds; then

$$0<P_c((x_i)_{j_c}|N)^{\varphi_{j_c}}<1,\qquad 0<P_b((x_i)_{j_b}|N)^{\varphi_{j_b}}<1. \tag{5}$$

Since the number of training samples is an integer less than infinity and Assumption 3.1 is introduced, we have

$$0<P(x_i|N)=\prod_{j_c=1}^{d_c}P_c((x_i)_{j_c}|N)^{\varphi_{j_c}}\prod_{j_b=1}^{d_b}P_b((x_i)_{j_b}|N)^{\varphi_{j_b}}<1. \tag{6}$$

Then, for any $x_i\in X$, there must be a positive value $\varrho$ that satisfies

$$0<\varrho\leq P(x_i|N)<1. \tag{7}$$

The prior normal probability satisfies $0<P(N)=\tilde{\delta}<1$, so a positive decimal $\tilde{\varrho}$ can be found that satisfies $0<\tilde{\varrho}\leq P(N,x_i)<1$, where $\tilde{\varrho}=\tilde{\delta}\varrho$. ∎

###### Remark 1.

$P_c(x_j|N)$ and $P_b(x_j|N)$ are probability distributions (pdf or ds) fitted by the training data. Thus, the more $x_i$ deviates from the statistical characteristics of $X$, the smaller $P(x_i|N)$ is and the smaller $P(N,x_i)$ is.

When $P(N,x_i)$ of $x_i$ is obtained, $f(x_i)$ is computed as

$$f(x_i)=\ln(P(N,x_i)), \tag{8}$$

where $\ln(\cdot)$ is the natural logarithmic function.

###### Proposition 1.

Compared with $P(N,x_i)$, the $f(x_i)$ obtained in equation (8) is more sensitive to faulty instances.

Proof. According to Theorem 1, there must be a lower bound $\tilde{\varrho}$ that satisfies $P(N,x_i)\geq\tilde{\varrho}$ for normal data $x_i\in X$. When $0<z<1$, the natural logarithmic function $\ln z$ monotonically increases and its derivative always satisfies $\mathrm{d}\ln z/\mathrm{d}z=1/z>1$. Fault data often deviate more from the statistical characteristics of $X$, so $P(N,x_i)<\tilde{\varrho}$ for a faulty instance. The detection performance is mainly reflected in the ability to recognize faults in the neighborhood of $\tilde{\varrho}$, where $0<\tilde{\varrho}<1$. Since a change in $P(N,x_i)$ there produces a larger change in $f(x_i)$, $f(x_i)$ is more sensitive to faulty instances than $P(N,x_i)$. ∎

According to equations (3), (4) and (8), $f(x_i)$ can be written as

$$f(x_i)=\ln(P(N)P(x_i|N))=\ln(P(N))+\ln(P(x_i|N))=\ln\tilde{\delta}+\ln\Big(\prod_{j_c=1}^{d_c}P_c(x_{j_c}|N)^{\varphi_{j_c}}\prod_{j_b=1}^{d_b}P_b(x_{j_b}|N)^{\varphi_{j_b}}\Big). \tag{9}$$

Let $\Psi=\ln(P(x_i|N))$; it can be learned that

$$\Psi=\sum_{j_b=1}^{d_b}\varphi_{j_b}\ln P_b(x_{j_b}|N)+\sum_{j_c=1}^{d_c}\varphi_{j_c}\ln P_c(x_{j_c}|N). \tag{10}$$

Considering equation (2), we have

$$\sum_{j_b=1}^{d_b}\varphi_{j_b}\ln P_b(x_{j_b}|N)=\sum_{j_b=1}^{d_b}\varphi_{j_b}\ln\big[(\eta_{j_b})^{x_{j_b}}(1-\eta_{j_b})^{1-x_{j_b}}\big]=\sum_{j_b=1}^{d_b}\varphi_{j_b}\big[x_{j_b}\ln(\eta_{j_b})+(1-x_{j_b})\ln(\tilde{\eta}_{j_b})\big]=\sum_{j_b=1}^{d_b}\Big[\varphi_{j_b}x_{j_b}\ln\frac{\eta_{j_b}}{\tilde{\eta}_{j_b}}\Big]+\sum_{j_b=1}^{d_b}\varphi_{j_b}\ln\tilde{\eta}_{j_b}, \tag{11}$$

where $\tilde{\eta}_{j_b}=1-\eta_{j_b}$. According to equation (1), the following can be obtained:

$$\sum_{j_c=1}^{d_c}\varphi_{j_c}\ln P_c(x_{j_c}|N)=\sum_{j_c=1}^{d_c}\varphi_{j_c}\ln\mathcal{N}(x_{j_c}|N;\mu_{j_c},\sigma_{j_c})=\sum_{j_c=1}^{d_c}\varphi_{j_c}\ln\big[(2\pi)^{-1/2}(\sigma_{j_c})^{-1}\big]+\sum_{j_c=1}^{d_c}\varphi_{j_c}\Big[-\frac{(x_{j_c}-\mu_{j_c})^2}{2(\sigma_{j_c})^{2}}\Big]. \tag{12}$$

Substituting equations (11) and (12) into equation (10), $f(x_i)$ is learned as

$$f(x_i)=\tau_i\cdot\tilde{x}_i^{T}+\xi_i+\varepsilon_i, \tag{13}$$

where the coefficient vector $\tau_i$, the augmented sample vector $\tilde{x}_i$, the constant term $\xi_i$, and the quadratic term $\varepsilon_i$ are obtained by grouping the corresponding terms of equations (9), (11) and (12).

For the collected training samples $X$, the monitoring statistics are computed as

$$s=[s_1,\cdots,s_i,\cdots,s_n]=[f^2(x_1),\cdots,f^2(x_i),\cdots,f^2(x_n)], \tag{14}$$

where $s_i$ is the statistic of $x_i$.
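Equations (8)-(10) reduce the statistic to a weighted sum of log-probabilities, and equation (14) squares it. A minimal sketch follows; the function names and the `params` layout (per-variable `(mu, sigma, w)` and `(eta, w)` tuples) are illustrative assumptions, not the paper's notation:

```python
import math

def log_statistic(x_cont, x_bin, params, delta=0.01):
    # f(x_i) of equations (8)-(12): ln(1 - delta) plus the weighted
    # sum of Bernoulli and Gaussian log-probabilities
    f = math.log(1 - delta)
    for x, (eta, w) in zip(x_bin, params["bin"]):
        f += w * (x * math.log(eta) + (1 - x) * math.log(1 - eta))
    for x, (mu, sigma, w) in zip(x_cont, params["cont"]):
        f += w * (-0.5 * math.log(2 * math.pi) - math.log(sigma)
                  - (x - mu) ** 2 / (2 * sigma ** 2))
    return f

def monitoring_statistic(x_cont, x_bin, params):
    # s_i = f^2(x_i) as in equation (14)
    return log_statistic(x_cont, x_bin, params) ** 2
```

A sample deviating from the fitted statistics produces a much larger squared statistic than a typical one, which is what the control limit in Section 3.2 exploits.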

### 3.2 On-line monitoring strategy

When the statistics $s$ of $X$ are obtained, the control limit $s_{\lim}$ can be determined with the significance level $\delta$ by KDE [25]. In online detection, the statistic $s_a$ of an arriving sample $x_a$ is computed by equations (13) and (14). Then the state of $x_a$ is determined through the monitoring strategy:

$$\begin{cases}x_a\ \text{is normal}, & \text{if}\ s_a\leq s_{\lim},\\ x_a\ \text{is faulty}, & \text{otherwise}.\end{cases} \tag{15}$$
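The on-line strategy can be sketched as below. For brevity the sketch uses an empirical quantile of the training statistics as a simple stand-in for the KDE-based control limit used in the paper; the function names are illustrative:

```python
def control_limit(train_stats, delta=0.01):
    # (1 - delta) empirical quantile of the training statistics; a crude
    # stand-in for the KDE-based limit (the paper uses KDE [25])
    s = sorted(train_stats)
    idx = min(len(s) - 1, int((1 - delta) * len(s)))
    return s[idx]

def monitor(s_a, s_lim):
    # Monitoring strategy of equation (15): compare the arriving
    # sample's statistic with the control limit
    return "normal" if s_a <= s_lim else "faulty"
```

In practice the limit would be read off the KDE-estimated density of the training statistics rather than a raw quantile, but the decision rule itself is the same comparison.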

## 4 Parameters learning

The model described in Section 3.1 mainly involves the estimation of the parameters $\mu_j$, $\sigma_j$, $\eta_j$, and $\varphi_j$. $\mu_j$, $\sigma_j$, and $\eta_j$ can be obtained through maximum likelihood estimation (MLE) [6].

$$\mu_j=\sum_{i=1}^{n}x_{ji}/n, \tag{16}$$

$$\sigma_j=\Big\{\sum_{i=1}^{n}(x_{ji}-\mu_j)^2\Big\}^{1/2}(n-1)^{-1/2}, \tag{17}$$

$$\eta_j=\sum_{i=1}^{n}x_{ji}/n,\quad (x_{ji}\in\{0,1\}). \tag{18}$$
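The MLE formulas (16)-(18) translate directly into code. A minimal sketch, with `estimate_params` as a hypothetical helper name:

```python
def estimate_params(column, binary):
    """MLE of equations (16)-(18): sample mean and standard deviation
    for a continuous variable, response probability for a binary one."""
    n = len(column)
    mu = sum(column) / n
    if binary:
        return {"eta": mu}                  # equation (18)
    var = sum((x - mu) ** 2 for x in column) / (n - 1)
    return {"mu": mu, "sigma": var ** 0.5}  # equations (16)-(17)
```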

Practical data are usually correlated, and variables that are more related to other variables are usually more sensitive when abnormalities occur [33]. Therefore, each variable is assigned a different weight $\varphi_j$. The calculation of $\varphi_j$ involves the mutual information (MI) of continuous and binary variables. In order to estimate the MI of hybrid variables, $x'_j$ is constructed as follows if $x_j$ is a continuous variable.

###### Definition 2.

If $x_j$ is a continuous variable, $x'_{ji}$ is constructed as

$$x'_{ji}=[x_{ji}>\mu_j], \tag{19}$$

where $[\cdot]$ is the Iverson bracket: if the condition is true, it returns $1$; otherwise it returns $0$.

Definition 2 makes it possible to characterize the correlation between hybrid variables. Then $x'_j$ is used instead of $x_j$ to compute MI. However, the transformation of equation (19) introduces a calculation error into the MI between continuous variables.
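Equation (19) amounts to thresholding each continuous variable at its mean. A minimal sketch, with `binarize` as an assumed helper name:

```python
def binarize(column):
    # Equation (19): x'_ji = [x_ji > mu_j], the Iverson bracket
    # applied against the sample mean of the variable
    mu = sum(column) / len(column)
    return [1 if x > mu else 0 for x in column]
```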

###### Lemma 1.

For continuous variables $x_j$ and $x_{j'}$, assume $x_j$ and $x_{j'}$ obey the Gaussian distributions $\mathcal{N}(\mu_j,\sigma_j^2)$ and $\mathcal{N}(\mu_{j'},\sigma_{j'}^2)$ respectively, and $x'_j$ and $x'_{j'}$ are constructed by equation (19). Let $\rho$ be the Pearson correlation coefficient between $x_j$ and $x_{j'}$, and denote the Pearson correlation coefficient between $x'_j$ and $x'_{j'}$ as $\rho'$. Then

$$\rho'=\frac{2}{\pi}\arcsin\rho. \tag{20}$$

Proof. For the continuous variables $x_j$ and $x_{j'}$, the joint probability density of $x_j$ and $x_{j'}$ is

$$f(x_j,x_{j'})=(2\pi\sigma_j\sigma_{j'})^{-1}(1-\rho^2)^{-1/2}\exp\Big\{-(2-2\rho^2)^{-1}\Big[\frac{(x_j-\mu_j)^2}{\sigma_j^{2}}+\frac{(x_{j'}-\mu_{j'})^2}{\sigma_{j'}^{2}}-\frac{2\rho(x_j-\mu_j)(x_{j'}-\mu_{j'})}{\sigma_j\sigma_{j'}}\Big]\Big\}. \tag{21}$$

The expectation of $x'_jx'_{j'}$ is

$$E(x'_jx'_{j'})=P(x'_jx'_{j'}=1)=P(x_j>\mu_j,x_{j'}>\mu_{j'})=\int_{\mu_j}^{\infty}\int_{\mu_{j'}}^{\infty}f(x_j,x_{j'})\,\mathrm{d}x_{j'}\,\mathrm{d}x_j=\int_{0}^{\infty}\int_{0}^{\infty}f(y_j,y_{j'})\,\mathrm{d}y_{j'}\,\mathrm{d}y_j, \tag{22}$$

where $y_j=(x_j-\mu_j)/\sigma_j$ and $y_{j'}=(x_{j'}-\mu_{j'})/\sigma_{j'}$, with

$$f(y_j,y_{j'})=(2\pi)^{-1}(1-\rho^2)^{-1/2}\exp\big\{-(2-2\rho^2)^{-1}\big[y_j^2+y_{j'}^2-2\rho y_jy_{j'}\big]\big\}. \tag{23}$$

Switching to polar coordinates $y_j=r\cos\alpha$, $y_{j'}=r\sin\alpha$ and then substituting $t=\tan\alpha$, $E(x'_jx'_{j'})$ can be written as

$$\begin{aligned}E(x'_jx'_{j'})&=\int_{0}^{\pi/2}\int_{0}^{\infty}(2\pi)^{-1}(1-\rho^2)^{-1/2}\,r\exp\big\{-(2-2\rho^2)^{-1}r^2(1-\rho\sin 2\alpha)\big\}\,\mathrm{d}r\,\mathrm{d}\alpha\\&=\int_{0}^{\pi/2}(2\pi)^{-1}(1-\rho^2)^{1/2}(1-\rho\sin 2\alpha)^{-1}\,\mathrm{d}\alpha\\&=\int_{0}^{\infty}(2\pi)^{-1}(1-\rho^2)^{1/2}(1+t^2-2\rho t)^{-1}\,\mathrm{d}t\\&=\frac{1}{2\pi}\arcsin\rho+0.25. \tag{24}\end{aligned}$$

$E(x'_jx'_{j'})$ can also be obtained by

$$E(x'_jx'_{j'})=P(x'_j=1,x'_{j'}=1)=P(x'_{j'}=1)P(x'_j=1|x'_{j'}=1)=0.5\,P(x'_j=1|x'_{j'}=1)=\mathrm{Cov}(x'_j,x'_{j'})+0.25. \tag{25}$$

Since $D(x'_j)=D(x'_{j'})=0.25$, the Pearson correlation coefficient between $x'_j$ and $x'_{j'}$ is

$$\rho'=\mathrm{Cov}(x'_j,x'_{j'})\big(D(x'_j)D(x'_{j'})\big)^{-1/2}=4E(x'_jx'_{j'})-1=\frac{2}{\pi}\arcsin\rho. \tag{26}$$

∎

###### Lemma 2.

[7] Let $M(x_j,x_{j'})$ be the MI of the Gaussian variables $x_j$ and $x_{j'}$, and let $\rho$ be the Pearson correlation coefficient between $x_j$ and $x_{j'}$. Then

$$M(x_j,x_{j'})=-0.5\log(1-\rho^2). \tag{27}$$
###### Theorem 2.

If $x_j$ and $x_{j'}$ are continuous variables, $x_j$ and $x_{j'}$ obey the Gaussian distributions $\mathcal{N}(\mu_j,\sigma_j^2)$ and $\mathcal{N}(\mu_{j'},\sigma_{j'}^2)$ respectively, and $x'_j$ and $x'_{j'}$ are constructed by equation (19), then the calculation error of MI can be compensated through

$$M(x_j,x_{j'})=-\frac{1}{2}\log\big[1-M'(x'_j,x'_{j'})\big], \tag{28}$$

where $M'(x'_j,x'_{j'})=\sin^2(\frac{\pi}{2}\rho')$.

Proof. According to Lemma 1, we have

$$\rho'=\frac{2}{\pi}\arcsin\rho. \tag{29}$$

Substituting equation (29) into equation (27), it can be learned that

$$M(x_j,x_{j'})=-\frac{1}{2}\log\big[1-\sin^2\big(\tfrac{\pi}{2}\rho'\big)\big]=-\frac{1}{2}\log\big[1-M'(x'_j,x'_{j'})\big], \tag{30}$$

where $M'(x'_j,x'_{j'})=\sin^2(\frac{\pi}{2}\rho')$. Hence, Theorem 2 is proved. ∎

With Definition 2, the MI computation of hybrid variables is transformed into that of binary variables (or constructed binary variables). $M(x_j,x_{j'})$ is defined as

$$M(x_j,x_{j'})=\sum_{x_j,x_{j'}}P(x_j,x_{j'})\log\frac{P(x_j,x_{j'})}{P(x_j)P(x_{j'})}, \tag{31}$$

where $x_j$ and $x_{j'}$ are binary variables or constructed binary variables, $P(x_j)$ is the probability of $x_j$, $\psi_{x_j}$ is the indicative coefficient ($\psi_{x_j}=1$ when $P(x_j=1)$ is being computed, and $\psi_{x_j}=0$ otherwise), and $P(x_j,x_{j'})$ is the joint probability of $x_j$ and $x_{j'}$. $P(x_j)$ (and $P(x_{j'})$, obtained in the same way) can be computed by

$$P(x_j)=\psi_{x_j}\sum_{i=1}^{n}\frac{x_{ji}}{n}+(1-\psi_{x_j})\Big(1-\sum_{i=1}^{n}\frac{x_{ji}}{n}\Big), \tag{32}$$

where $x_{ji}$ is the value at time $i$ of $x_j$.
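Equation (31) for binary (or constructed binary) variables can be sketched with plug-in probability estimates; the function name is illustrative and the edge case $P(x_j,x_{j'})=0$ is simply skipped, following the convention $0\log 0=0$:

```python
import math

def binary_mutual_information(xs, ys):
    """Mutual information of equation (31) for two binary sequences,
    using empirical (plug-in) estimates of P(xj), P(xj') and P(xj, xj')."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        p_a = sum(1 for x in xs if x == a) / n        # equation (32)
        for b in (0, 1):
            p_b = sum(1 for y in ys if y == b) / n
            p_ab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi
```

Two identical balanced binary sequences give MI equal to $\ln 2$, while independent sequences give MI $0$, matching the definition.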

###### Proposition 2.

For binary variables $x_j$ and $x_{j'}$, $P(x_j,x_{j'})$ can be denoted as

$$P(x_j,x_{j'})=P(x_{j'}=\psi_{x_{j'}})\,\theta^{\psi_{x_j}\psi_{x_{j'}}}(1-\theta)^{\psi_{x_{j'}}-\psi_{x_j}\psi_{x_{j'}}}\,\theta'^{\,\psi_{x_j}-\psi_{x_j}\psi_{x_{j'}}}(1-\theta')^{1+\psi_{x_j}\psi_{x_{j'}}-\psi_{x_j}-\psi_{x_{j'}}}, \tag{33}$$

where $\theta=P(x_j=1|x_{j'}=1)$ and $\theta'=P(x_j=1|x_{j'}=0)$.

Proof. Let

$$P(x_j=1|x_{j'}=1)=\theta,\quad P(x_j=1|x_{j'}=0)=\theta', \tag{34}$$

so that

$$P(x_j=0|x_{j'}=1)=1-\theta, \tag{35}$$

$$P(x_j=0|x_{j'}=0)=1-\theta'. \tag{36}$$

Then we have

$$P(x_j=\psi_{x_j}|x_{j'}=\psi_{x_{j'}})=\theta^{\psi_{x_j}\psi_{x_{j'}}}(1-\theta)^{\psi_{x_{j'}}-\psi_{x_j}\psi_{x_{j'}}}\,\theta'^{\,\psi_{x_j}-\psi_{x_j}\psi_{x_{j'}}}(1-\theta')^{1+\psi_{x_j}\psi_{x_{j'}}-\psi_{x_j}-\psi_{x_{j'}}}. \tag{37}$$

Since

$$P(x_j,x_{j'})=P(x_j=\psi_{x_j},x_{j'}=\psi_{x_{j'}})=P(x_{j'}=\psi_{x_{j'}})P(x_j=\psi_{x_j}|x_{j'}=\psi_{x_{j'}}), \tag{38}$$

Proposition 2 is proved. ∎

###### Theorem 3.

For binary variables $x_j$ and $x_{j'}$, $P(x_j,x_{j'})$ is obtained as

$$P(x_j,x_{j'})=P(x_{j'}=\psi_{x_{j'}})\big\{1-\psi_{x_j}+(2\psi_{x_j}-1)\big[\psi_{x_{j'}}\theta+(1-\psi_{x_{j'}})\theta'\big]\big\}, \tag{39}$$

where $\theta=\sum_{i=1}^{n}x_{ji}x_{j'i}\big/\sum_{i=1}^{n}x_{j'i}$ and $\theta'=\big(\sum_{i=1}^{n}x_{ji}-\sum_{i=1}^{n}x_{ji}x_{j'i}\big)\big/\big(n-\sum_{i=1}^{n}x_{j'i}\big)$.

Proof. Since

$$P(x_j=x_{j1},x_{j'}=x_{j'1})\cdots P(x_j=x_{jn},x_{j'}=x_{j'n})=\prod_{i=1}^{n}P(x_{j'}=x_{j'i})P(x_j=x_{ji}|x_{j'}=x_{j'i}), \tag{40}$$

the likelihood function is

$$\ell(\theta,\theta')=\prod_{i=1}^{n}P(x_{j'}=x_{j'i})P(x_j=x_{ji}|x_{j'}=x_{j'i})=\omega\,\theta^{\sum_{i=1}^{n}x_{ji}x_{j'i}}(1-\theta)^{\sum_{i=1}^{n}x_{j'i}-\sum_{i=1}^{n}x_{ji}x_{j'i}}\,\theta'^{\,\sum_{i=1}^{n}x_{ji}-\sum_{i=1}^{n}x_{ji}x_{j'i}}(1-\theta')^{n+\sum_{i=1}^{n}x_{ji}x_{j'i}-\sum_{i=1}^{n}(x_{ji}+x_{j'i})}, \tag{41}$$

where $\omega=\prod_{i=1}^{n}P(x_{j'}=x_{j'i})$