Neural Component Analysis for Fault Detection

12/12/2017 · by Haitao Zhao, et al.

Principal component analysis (PCA) is widely adopted for chemical process monitoring, and numerous PCA-based systems have been developed to solve various fault detection and diagnosis problems. Since PCA-based methods assume that the monitored process is linear, nonlinear PCA models, such as autoencoder models and kernel principal component analysis (KPCA), have been proposed and applied to nonlinear process monitoring. However, KPCA-based methods need to perform eigen-decomposition (ED) on the kernel Gram matrix, whose dimensions depend on the number of training data. Moreover, kernel parameters fixed in advance cannot be optimal for every fault, since different faults may need different parameters to maximize their respective detection performance. Autoencoder models lack the orthogonal constraints which are crucial for PCA-based algorithms. To address these problems, this paper proposes a novel nonlinear method, called neural component analysis (NCA), which trains a feedforward neural network with orthogonal constraints such as those used in PCA. NCA can adaptively learn its parameters through backpropagation, and the dimensionality of the nonlinear features has no relationship with the number of training samples. Extensive experimental results on the Tennessee Eastman (TE) benchmark process show the superiority of NCA in terms of missed detection rate (MDR) and false alarm rate (FAR). The source code of NCA can be found in




I Introduction

Monitoring process conditions is crucial to the normal operation of an industrial process [1]. Over the last decades, data-driven multivariate statistical process monitoring (MSPM) has been widely applied to fault diagnosis for industrial process operations and production results [2, 3]. Due to the data-based nature of MSPM, it is relatively convenient to apply to real large-scale processes compared to other methods based on theoretical modelling or rigorous derivation of process systems [4, 5].

The task of MSPM is challenging mainly due to the “curse of dimensionality” problem and the “data rich but information poor” problem. Many methods have been proposed to transform original high-dimensional process data into a lower-dimensional feature space and then perform fault detection or fault diagnosis in that feature space [6]. Principal component analysis (PCA) [7, 8, 9] is one of the most widely used linear techniques for fault detection. Due to its orthogonal linear projection, PCA separates data information into two subspaces: a significant subspace which contains most of the variation in the training data and a residual subspace which includes noises or outliers in the training data.
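The orthogonal split into a significant (principal) subspace and a residual subspace can be sketched numerically. The helper below is an illustrative sketch, not the paper's implementation; the names `pca_subspaces`, `X`, and `n_components` are assumptions.

```python
# Sketch: splitting data into principal and residual subspaces with PCA.
import numpy as np

def pca_subspaces(X, n_components):
    """Return loading matrix P (principal) and residual projector I - P P^T."""
    Xc = X - X.mean(axis=0)            # center each variable
    cov = Xc.T @ Xc / (len(Xc) - 1)    # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    P = eigvecs[:, order[:n_components]]
    return P, np.eye(X.shape[1]) - P @ P.T

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
P, R = pca_subspaces(X, 2)
# The principal and residual projections are mutually orthogonal:
print(np.allclose(P.T @ (R @ P), 0))
```

Because the loading vectors are orthonormal eigenvectors, the two subspaces carry no shared variance, which is the property the paper builds on.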

Applying PCA-based methods to inherently nonlinear processes may lead to unreliable and inefficient fault detection, since a linear transformation can hardly capture the nonlinear relationships between different process variables [10, 11]. To deal with this problem, various nonlinear extensions of PCA have been proposed for fault detection. These extensions can be divided into three categories.

The first category is kernel approaches. Kernel PCA (KPCA) is one of the most widely used kernel approaches for fault detection [12]. KPCA implicitly maps the data from an input space into some high-dimensional nonlinear feature space, where linear PCA can be applied. KPCA needs to perform eigen-decomposition (ED) on the kernel Gram matrix, whose size is the square of the number of data points. When there are too many data points, performing ED becomes computationally intractable [13]. Moreover, KPCA needs to determine the kernel and the associated parameters in advance.

The second category is based on linear approximation of nonlinear processes. In the linear approximation, several local linear models are constructed and then integrated by Bayesian inference [14]. Linear approximation is simple and easy to realize, but it may not be able to handle strong nonlinearities in the process.

The third category is neural-network-based models, such as the robust autoencoder (RAE) [13] and the autoassociative neural network [15]. These models train a feedforward neural network to perform identity encoding, where the inputs and the outputs of the network are the same. The network contains an internal “bottleneck” layer (containing fewer nodes than the input and output layers) for feature extraction. In the autoassociative neural network [15], the mapping from the input layer to the “bottleneck” layer can be considered as encoding, while the de-mapping from the “bottleneck” layer to the output layer can be considered as decoding. Although the encoding can deal with the nonlinearities present in the data, it takes no account of the orthogonal constraints used in PCA.

In recent years, a new trend in neural-network-based techniques known as deep learning has become popular in artificial intelligence and machine learning [16]. Deep-learning-based models are widely used in unsupervised training to learn representations of original data. Although these models are often derived and viewed as extensions of PCA, all of them lack the orthogonal constraints used in PCA. The orthogonal constraints are quite important, since they can largely reduce the correlations between extracted features. Figure 1 shows simple plots of the features of a vector obtained by orthogonal projections and non-orthogonal projections respectively. From Figure 1(b) it is easy to see that the two features obtained by non-orthogonal projections are largely correlated. This means that the extracted features contain redundant information and may distort the reconstruction of the original vector [17].

(a) Illustration of the projection of a vector on two orthogonal directions.
(b) Illustration of the projection of a vector on two non-orthogonal directions.
Figure 1: Projections of a vector on orthogonal and non-orthogonal directions respectively.

Motivated by the above analysis, this paper proposes a novel unified model, called neural component analysis (NCA), for fault detection. NCA first utilizes a nonlinear neural network as an encoder to extract features. Then a linear orthogonal transformation is adopted to decode the features back to the original data space. Finally, this unified model is trained by minimizing the reconstruction error between the original data and the decoded data. After training, NCA can be used as an unsupervised learning method to extract the key features of process data. In this paper, the Hotelling T^2 statistic and the squared prediction error (SPE) statistic are used for fault detection. The merits of the proposed NCA method are demonstrated by both theoretical analysis and case studies on the Tennessee Eastman (TE) benchmark process.

II Autoencoder and PCA

An autoencoder model is an artificial neural network adopted for unsupervised feature extraction [18]. An autoencoder model tries to learn a representation (encoding) of original data, specifically for the purpose of dimensionality reduction. Recently, due to research works in deep learning, the autoencoder concept has been widely accepted for generative models of data [19].

We assume that a set of input samples is given. The simplest structure of an autoencoder model is a feedforward neural network which consists of one input layer, one hidden layer, and one output layer with the same number of nodes as the input layer (see Figure 2). The purpose of this structure is to reconstruct its own inputs; therefore, an autoencoder model belongs to unsupervised learning.

Figure 2: The autoencoder model.

An autoencoder model includes both an encoder part and a decoder part, which can be defined as transitions between the data space and the feature space. In this paper, we only consider the case where the hidden layer has fewer units than the input layer. The encoder part takes the input x and maps it to the feature h:

h = σ(Wx + b).

This feature is usually referred to as the latent code, latent feature or latent representation. Here, σ is an element-wise activation function such as a sigmoid function or a hyperbolic tangent function, W is a parameter matrix and b is a bias vector.

The decoder part maps the feature vector h to a reconstruction x′ of the same dimensionality as x:

x′ = σ′(W′h + b′),

where σ′, W′ and b′ of the decoder part may be different from the corresponding σ, W and b of the encoder part, depending on the application of the autoencoder model.
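The encode/decode pass described above can be sketched numerically. This is a minimal illustration with sigmoid activations and random weights; the names `W`, `b`, `W2`, `b2` are stand-ins for the encoder/decoder parameters, not values from the paper.

```python
# Minimal sketch of an autoencoder forward pass with a bottleneck layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, k = 10, 3                      # input dimension m, bottleneck size k < m
W, b = rng.normal(size=(k, m)), np.zeros(k)     # encoder parameters
W2, b2 = rng.normal(size=(m, k)), np.zeros(m)   # decoder parameters

x = rng.normal(size=m)
h = sigmoid(W @ x + b)            # encoding: latent feature
x_hat = sigmoid(W2 @ h + b2)      # decoding: reconstruction
err = np.sum((x - x_hat) ** 2)    # squared reconstruction error
print(h.shape, x_hat.shape)
```

Training would adjust all four parameter arrays to drive `err` down over the training set, which is exactly the reconstruction objective discussed next.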

In order to learn the parameters of the encoder and the decoder, an autoencoder model is often trained to minimize the reconstruction error between each input and its reconstruction:

min Σ_i ||x_i − σ′(W′σ(Wx_i + b) + b′)||^2. (1)

If linear activation functions are used, the optimal solution to an autoencoder is strongly related to PCA [20]. In this case, Equation (1) can be written as

min ||X − XAB^T||_F^2, (2)

where X is the matrix of training samples, ||·||_F is the Frobenius norm, and A and B are two linear transformation matrices. Baldi and Hornik [21] showed that, if the covariance matrix associated with the data is invertible, the unique local and global minimum of Equation (2) corresponds to an orthogonal projection onto the subspace spanned by the first principal eigenvectors of the covariance matrix. More precisely, the optimal solutions to (2) can be obtained by taking the transformation matrices to be composed of the eigenvectors corresponding to the largest eigenvalues of the covariance matrix. In PCA, these eigenvectors are defined as loading vectors or principal components, and the projected data form the score matrix corresponding to the loading matrix.

Although there are no orthogonal constraints on the transformation matrices in Equation (2), its solutions are composed of orthonormal bases. Due to the orthogonal decomposition, PCA can transform an original data space into two orthogonal subspaces: a principal subspace which contains most of the variation in the original data and a residual subspace which includes noises or outliers. The T^2 statistic and the SPE statistic are often adopted as fault detection indicators corresponding to the principal subspace and the residual subspace respectively. The orthogonal decomposition minimizes the correlations between these two subspaces and yields the linear reconstruction of the original data with the least distortion [22, 23]. Because of this orthogonal property, PCA is widely used in process monitoring, and many other fault detection methods can be considered as extensions of PCA [1].

III Neural Component Analysis

A nonlinear autoencoder model can be trained to extract latent features. However, due to the lack of the orthogonal property, the significant information of the original data and the information of noises or outliers are largely mixed in this model. An autoencoder model tends to overfit the original data and learns to capture as much information as possible rather than reducing correlations in the original data and extracting the significant information. Because of this problem, nonlinear autoencoder models are not used as widely as PCA.

In this section, we propose a novel unified model, called neural component analysis (NCA), for fault detection. NCA first utilizes a nonlinear neural network as an encoder to extract features. Then a linear orthogonal transformation is adopted to decode the features back to the original data space. In this way, NCA can be considered as a combination of nonlinear and linear models.

In NCA, we use a linear transformation P instead of the nonlinear transition used in Equation (1). The optimization problem of NCA is

min ||X − F(X)P^T||_F^2  s.t.  P^T P = I, (3)

where F is a neural network whose outputs are the extracted features. Assume the feature matrix is

Z = F(X), (4)

then the orthonormal constraint, P^T P = I, means that the columns of P form an orthonormal basis. Figure 3 illustrates the structure of NCA. Here P contains three orthogonal bases (plotted in red, blue, and green).

Figure 3: Illustration of neural component analysis (NCA).

Equation (3) shows the difference between NCA and the autoencoder. First, NCA is a unified model of nonlinear encoding and linear decoding. For the linear decoding, it will be shown later that the computation of the transformation matrix P is quite simple and no gradient-descent-based optimization is needed. Second, orthonormal constraints are added in NCA. This means that the decoding from the latent features is an orthogonal reconstruction, which can largely reduce the correlations between different variables.



The optimization problem of NCA in Equation (3) can also be written as

min ||X − ZP^T||_F^2  s.t.  Z = F(X), P^T P = I. (5)

Matrix Z, corresponding to the score matrix of PCA, contains the key features which we want to obtain for further analysis, such as fault detection and diagnosis. However, it is difficult to compute the optimal F, Z and P simultaneously, since the optimization problem in Equation (5) is nonconvex. In this paper, we compute them iteratively as follows. We first fix the encoder F and obtain Z = F(X); then P can be computed by optimizing

min ||X − ZP^T||_F^2  s.t.  P^T P = I. (6)

Once P is obtained, the parameters of F can be updated by solving the following optimization problem:

min ||X − F(X)P^T||_F^2. (7)
The solution to the optimization problem in Equation (7) can be obtained by the backpropagation algorithm [24], which is widely adopted in training feedforward neural networks. The solution to Equation (6) is an orthogonal matrix that rotates the feature matrix to fit the original data matrix. In linear algebra and statistics [25], Procrustes analysis is a standard technique for geometric transformations between two matrices. The orthogonal Procrustes problem can be viewed as a matrix approximation problem which tries to find the optimal rotation or reflection for the transformation of one matrix with respect to the other. Theorem 1 shows how to solve the reduced rank Procrustes rotation problem.

Theorem 1.

[26] Reduced Rank Procrustes Rotation. Let X and Z be two matrices. Consider the constrained minimization problem

min ||X − ZP^T||_F^2  s.t.  P^T P = I.

Suppose the singular value decomposition (SVD) of X^T Z is UDV^T; then the optimal solution is P = UV^T.
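Theorem 1 is easy to verify numerically. The sketch below uses random stand-in data; the function name `procrustes_rotation` is illustrative, not from the paper.

```python
# Numerical check of the Reduced Rank Procrustes Rotation: for
# min ||X - Z P^T||_F with P^T P = I, the SVD X^T Z = U D V^T gives P = U V^T.
import numpy as np

def procrustes_rotation(X, Z):
    U, _, Vt = np.linalg.svd(X.T @ Z, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Z = rng.normal(size=(50, 2))
P = procrustes_rotation(X, Z)
print(np.allclose(P.T @ P, np.eye(2)))   # columns are orthonormal
```

Note that no iterative optimization is needed here: one SVD of a small matrix (variables × features) yields the constrained minimizer directly, which is why the decoder update in NCA is cheap.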

According to Theorem 1, we can design the iterative procedure of NCA to obtain F, Z and P as follows:

  1. Perform PCA on the original data to obtain the loading matrix and use it as the initial P.

  2. Fix P and solve the optimization problem in Equation (7) by the backpropagation algorithm.

  3. Form the feature matrix Z by Equation (4) and perform SVD on X^T Z, i.e. X^T Z = UDV^T.

  4. Compute P = UV^T.

  5. If the reconstruction error has converged, break;

    else, update P and go to Step 2.

  6. Output F, Z and P.

After training, the encoder F can be used for further feature extraction.
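The alternating loop above can be sketched as follows. This is a deliberately abbreviated illustration: the encoder update of Step 2 is replaced by a fixed tanh map on linear weights rather than full backpropagation, and all names (`nca_sketch`, `iters`, etc.) are assumptions.

```python
# High-level sketch of the alternating NCA procedure (Steps 1-6 above).
import numpy as np

def procrustes(X, Z):
    """Step 3-4: decoder update via the Reduced Rank Procrustes Rotation."""
    U, _, Vt = np.linalg.svd(X.T @ Z, full_matrices=False)
    return U @ Vt

def nca_sketch(X, k, iters=20):
    # Step 1: initialize the decoder with the PCA loading matrix.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T
    W = P.copy()                       # encoder weights (linear stand-in)
    for _ in range(iters):
        Z = np.tanh(Xc @ W)            # encode (a real F would be a trained net)
        P = procrustes(Xc, Z)          # Steps 3-4: closed-form decoder update
        # Step 2 would refit W by backpropagation here; omitted in this sketch.
    return W, P

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
W, P = nca_sketch(X, 2)
print(np.allclose(P.T @ P, np.eye(2)))
```

The key structural point survives the simplification: the decoder stays exactly orthonormal at every iteration, while only the encoder needs gradient-based training.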

IV Fault detection based on NCA

A novel fault detection method based on NCA is developed in this section. The implementation procedures are given as follows. First, in the modeling stage, the process data are collected under normal process conditions and scaled by each variable; then NCA is performed to obtain the neural network for nonlinear feature extraction. Finally, the Hotelling T^2 and squared prediction error (SPE) statistics are used for fault detection.

Let f = F(x) be the nonlinear latent features of a sample x and S the covariance matrix associated with the features. The T^2 statistic of f is computed as follows:

T^2 = f^T S^{-1} f. (8)


The SPE statistic of the feature f can be calculated as follows:

SPE = ||x − Pf||^2. (9)
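The two indicators can be sketched numerically. The block below is illustrative: the "features" come from a plain orthonormal projection rather than a trained NCA encoder, and the names `t2_statistic`, `spe_statistic`, `Q` are assumptions.

```python
# Sketch of the T^2 and SPE fault indicators computed from extracted features.
import numpy as np

def t2_statistic(f, S_inv):
    """Hotelling T^2 of one feature vector given the inverse feature covariance."""
    return float(f @ S_inv @ f)

def spe_statistic(x, f, Q):
    """Squared prediction error between x and its linear reconstruction Q f."""
    r = x - Q @ f
    return float(r @ r)

rng = np.random.default_rng(1)
n, m, k = 200, 6, 2
X = rng.normal(size=(n, m))
Q, _ = np.linalg.qr(rng.normal(size=(m, k)))   # stand-in orthonormal decoder
F = X @ Q                                      # stand-in features (linear here)
S_inv = np.linalg.inv(np.cov(F, rowvar=False))

t2 = t2_statistic(F[0], S_inv)
spe = spe_statistic(X[0], F[0], Q)
print(t2 >= 0.0, spe >= 0.0)
```

T^2 monitors variation inside the feature subspace, while SPE monitors what the orthonormal reconstruction fails to explain; a fault can show up in either.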


Because no prior information is available about the distribution of the statistics, we compute the confidence limits for the T^2 and SPE statistics approximately by kernel density estimation (KDE) [27]. Let the T^2 statistics of the training samples be drawn from an unknown density. The kernel density estimator of the T^2 statistic is a sum of kernels centered at the training statistics, where the kernel K is a non-negative function that integrates to one and has zero mean, scaled by a bandwidth parameter h. In this paper, we take the RBF (Gaussian) kernel for density estimation.

After estimating the density, for a testing T^2 statistic the following condition is checked: if the statistic is below the confidence limit, the sample is considered normal; otherwise it is considered abnormal. The threshold is assigned globally and can be adjusted in order to lower the percentage of false alarms. Practically, the significance level is often set to 0.01.

Similarly, we can obtain the confidence limit for the SPE statistic, treating the SPE statistics of the training samples as draws from an unknown density. For a testing SPE statistic, the same condition is checked: if the statistic is below its confidence limit, the sample is considered normal; otherwise it is considered abnormal.
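A rough numerical sketch of a KDE-based control limit is given below. This is not the paper's exact procedure: Silverman's rule for the bandwidth and the grid-based tail search are assumptions, and `kde_control_limit` is an illustrative name.

```python
# Sketch: control limit as the value whose estimated upper-tail probability
# equals the significance level, using a Gaussian (RBF) kernel.
import numpy as np

def kde_control_limit(stats, alpha=0.01, bandwidth=None, grid_size=2000):
    s = np.asarray(stats, dtype=float)
    h = bandwidth or 1.06 * s.std() * len(s) ** (-0.2)   # Silverman's rule
    grid = np.linspace(s.min(), s.max() + 5 * h, grid_size)
    # Gaussian KDE evaluated on the grid
    dens = np.exp(-0.5 * ((grid[:, None] - s[None, :]) / h) ** 2).mean(axis=1)
    dens /= dens.sum()                                   # normalize on the grid
    cdf = np.cumsum(dens)
    return grid[np.searchsorted(cdf, 1.0 - alpha)]

rng = np.random.default_rng(4)
t2_train = rng.chisquare(df=3, size=1000)    # stand-in training T^2 values
limit = kde_control_limit(t2_train, alpha=0.01)
print(limit > np.median(t2_train))
```

The same routine would be applied separately to the training T^2 values and the training SPE values to obtain the two control limits.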

The offline modeling and online monitoring flow charts are shown in Figure 4. The procedures of offline modeling and online monitoring are as follows:

Figure 4: The steps of the proposed NCA method for fault detection.
  • Offline modeling:

  1. Collect normal process data as the training data.

  2. Normalize the training data by each variable to zero mean and unit variance.

  3. Initialize the decoding matrix as the PCA loading matrix, which contains the leading eigenvectors of the covariance matrix of the training data.

  4. Fix the decoding matrix and compute the encoder parameters by training a neural network to optimize Equation (7).

  5. Fix the encoder and compute the decoding matrix by the Reduced Rank Procrustes Rotation, solving Equation (6).

  6. Compute the T^2 and SPE statistics of the training data by Equations (8) and (9) respectively.

  7. Determine the control limits of T^2 and SPE by KDE respectively.

  • Online monitoring:

  1. Sample a new testing data point and normalize it according to the parameters of the training data.

  2. Extract its feature with the trained encoder.

  3. Compute the T^2 and SPE statistics of the feature.

  4. Alarm if the T^2 (or SPE) statistic of the extracted feature exceeds the control limit; otherwise, view the sample as normal.

V Simulation and discussion

The Tennessee Eastman process (TEP) has been widely used by the process monitoring community as a source of publicly available data for comparing different algorithms. The simulated TEP is mainly based on a practical industrial process in which the kinetics, operation and units have been altered for specific reasons. The data generated by TEP are nonlinear, strongly coupled and dynamic [28, 29]. There are five major units in TEP: a chemical reactor, a condenser, a recycle compressor, a vapor/liquid separator, and a stripper. A flow sheet of TEP with its implemented control structure is shown in Figure 5, and the MATLAB codes of the simulator are publicly available. Besides normal data, the simulator of TEP can also generate 21 different types of faults in order to test process monitoring algorithms.

Figure 5: A diagram of the TEP simulator.

A total of 52 variables, including 22 continuous process measurements, 19 compositions and 11 manipulated variables (the agitation speed was not included because it was not manipulated), were selected as the monitoring variables in our experiments. The training data set contained 500 normal samples. Twenty-one different faults were generated and 960 samples for each fault were chosen for testing, in which the fault occurs from the 161st sample to the end of the data. The 21 fault modes are listed in Table I.

Fault Description Type
1 A/C Feed ratio, B composition constant (Stream 4) Step
2 B composition, A/C ratio constant (Stream 4) Step
3 D feed temperature (Stream 2) Step
4 Reactor cooling water inlet temperature Step
5 Condenser cooling water inlet temperature Step
6 A feed loss (Stream 1) Step
7 C header pressure loss (Stream 4) Step
8 A, B, C feed composition (Stream 4) Random variation
9 D feed temperature (Stream 2) Random variation
10 C feed temperature (Stream 4) Random variation
11 Reactor cooling water inlet temperature Random variation
12 Condenser cooling water inlet temperature Random variation
13 Reaction kinetics Slow drift
14 Reactor cooling water valve Sticking
15 Condenser cooling water valve Sticking
16 Unknown Unknown
17 Unknown Unknown
18 Unknown Unknown
19 Unknown Unknown
20 Unknown Unknown
21 Valve (Stream 4) Constant position
Table I: TEP fault modes.

In this paper, we compare our proposed NCA method with PCA, KPCA and the autoencoder. For KPCA, we use the most widely used Gaussian kernel and select the kernel parameter based on the mean of the standard deviations of the different variables.


(a) Visualization plot of PCA.
(b) Visualization plot of KPCA.
(c) Visualization plot of autoencoder.
(d) Visualization plot of NCA.
Figure 6: Visualization of the normal samples and samples of Fault 1 on the first two dimensions of 4 different methods.

V-A Visualization of data of different methods

In order to intuitively visualize the features of different methods, we extract 2 features for each method and plot them in Figure 6. In Figure 6, the blue markers indicate normal data, while the red markers indicate fault data from Fault 1. It can be seen that normal data and fault data largely overlap in Figures 6(a), 6(b), and 6(c). In this case, PCA, KPCA and the autoencoder cannot find the significant information for fault detection. However, in Figure 6(d), the overlap between normal samples and fault samples is much smaller than in Figures 6(a), 6(b), and 6(c). Through nonlinear feature extraction with orthogonal constraints, NCA directly gives the key features in a two-dimensional space which include significant information for fault detection.

Figure 7 shows the difference between the orthogonal decoding matrix of NCA and the loading matrix of PCA. Figure 7(a) shows that the decoding matrix contains orthonormal columns: the inner product of any two distinct columns is zero, while each column has unit norm. Moreover, the correlations between the columns of the NCA decoding matrix and those of the PCA loading matrix are plotted in Figure 7(b). Obviously, with the nonlinear feature extraction, the optimal solution of NCA is utterly different from the loading matrix of PCA.

(a) Illustration of the correlations among the columns of the NCA decoding matrix.
(b) Illustration of the correlations between the columns of the NCA decoding matrix and those of the PCA loading matrix.
Figure 7: Illustration of the correlations of the columns of the NCA decoding matrix and the correlations between its columns and those of the PCA loading matrix.

Figure 8 plots the reconstruction errors in the training stage. It can be seen that NCA converges very fast. Thanks to GPU-accelerated computing, the total training time of NCA is 2.58 seconds. We performed the experiment on a computer with an Intel Core i7 3.4GHz CPU, 16GB RAM, and an NVIDIA GeForce GTX 1080Ti GPU. For comparison, KPCA was performed on the same computer and its training time was 1.53 seconds.

Figure 8: Convergence plot of NCA.

V-B Case studies

In this subsection, we investigate the performance of our proposed NCA and compare it with PCA, KPCA, and the autoencoder. The source codes of NCA and the other methods can be found in

According to the cumulative percentage variance (CPV) rule, the reduced dimensionality was determined as 27 for PCA, such that 85% of the energy in the eigenspectrum (computed as the sum of eigenvalues) was retained. In order to give a fair comparison, the same value was used for KPCA, the autoencoder and NCA.
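The CPV rule can be sketched as follows; `cpv_components` is an illustrative name and the data are random stand-ins, not TEP data.

```python
# Sketch of the cumulative percentage variance (CPV) rule: keep the smallest
# number of components whose eigenvalues sum to at least 85% of the total.
import numpy as np

def cpv_components(X, threshold=0.85):
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    cpv = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cpv, threshold) + 1)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))  # correlated data
k = cpv_components(X)
print(1 <= k <= 10)
```

Applied to the 52-variable TEP training set with an 85% threshold, this rule is what yields the reduced dimensionality of 27 reported above.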

No. PCA (T^2 / SPE) KPCA (T^2 / SPE) autoencoder (T^2 / SPE) NCA (T^2 / SPE)
1 0.50(4.38) 0.13(20.6) 0.5(7.50) 0.63(5.63) 0.50(3.13) 0.75(0.00) 0.50(0.00) 0.75(0.00)
2 1.63(2.50) 0.75(18.1) 1.63(5.00) 1.75(5.00) 1.50(1.88) 2.00(0.00) 1.75(0.00) 1.50(0.00)
3 92.0(0.63) 71.9(30.0) 88.5(1.25) 93.8(0.63) 96.4(1.88) 99.9(0.00) 98.4(0.00) 97.9(0.63)
4 39.1(2.50) 0.00(24.4) 51.1(5.00) 81.8(3.75) 55.8(0.63) 96.3(0.00) 63.3(1.25) 5.75(0.63)
5 73.8(0.63) 50.3(22.5) 70.5(5.00) 79.3(3.75) 73.3(0.63) 77.9(0.00) 75.4(1.25) 71.1(1.88)
6 1.00(0.00) 0.00(10.6) 0.88(2.50) 2.88(3.13) 1.00(0.00) 1.13(0.00) 0.50(0.00) 0.88(0.00)
7 0.00(0.00) 0.00(16.3) 0.00(0.63) 2.0(0.63) 0.00(0.00) 30.1(0.63) 0.00(0.00) 0.00(1.88)
8 2.50(0.63) 1.75(17.5) 2.50(2.50) 4.75(3.13) 2.50(1.88) 4.38(0.00) 2.75(0.00) 2.50(2.50)
9 96.4(5.00) 76.9(23.8) 89.0(15.0) 94.1(5.63) 96.0(5.00) 99.38(1.88) 97.9(2.50) 94.0(5.63)
10 58.4(0.00) 24.13(15) 48.9(1.88) 81.6(0.63) 64.3(0.63) 77.38(0.00) 66.3(0.00) 72.8(1.88)
11 47.9(0.63) 19.0(20.0) 38.6(4.38) 52.4(1.88) 49.6(1.25) 71.9(0.00) 52.0(0.63) 30.1(8.13)
12 1.25(0.63) 1.13(22.5) 1.00(3.13) 10.4(0.63) 1.00(1.88) 2.63(0.00) 1.00(1.25) 5.50(7.5)
13 4.88(0.00) 3.75(12.5) 4.63(1.25) 18.0(0.63) 5.75(0.63) 5.88(1.25) 5.13(1.25) 4.88(3.75)
14 0.13(0.63) 0.0(23.13) 0.00(1.25) 9.88(0.63) 0.38(0.63) 4.88(0.00) 0.38(0.63) 0.00(5.00)
15 95.1(0.00) 74.4(16.9) 86.9(3.13) 94.1(2.50) 94.3(0.63) 98.1(0.00) 97.5(0.00) 90.9(3.75)
16 76.8(6.88) 30.8(22.5) 64.4(20.6) 86.9(6.25) 82.1(5.00) 91.0(1.88) 85.6(1.88) 76.6(1.25)
17 19.9(1.25) 2.50(25.6) 13.5(0.63) 28.0(0.00) 20.1(1.88) 27.4(0.00) 15.4(0.00) 53.3(0.00)
18 10.9(0.0) 6.25(20.6) 10.0(2.50) 14.4(2.50) 11.0(3.75) 11.63(0.63) 9.88(0.63) 12.1(0.00)
19 93.25(0.0) 41.8(14.4) 82.5(1.88) 87.3(0.63) 91.8(0.00) 99.8(0.00) 100(0.00) 99.4(0.00)
20 62.63(0.0) 26.9(15.6) 52.1(2.50) 90.0(2.50) 64.1(0.00) 78.1(0.00) 69.1(0.00) 52.0(0.00)
21 62.3(1.25) 33.5(31.3) 59.5(5.63) 71.0(3.75) 61.6(1.25) 79.6(0.63) 64.8(0.00) 78.0(0.00)
Table II: Missed detection rate (%) and false alarm rate (%) (shown in parentheses) of PCA, KPCA, autoencoder and NCA on TEP.

Missed detection rate (MDR) refers to the rate of abnormal events being falsely identified as normal events in the monitoring process, which only applies to the fault detection situation. For all 21 faults, the MDRs are recorded in Table II, where smaller values indicate better performance. The false alarm rate (FAR), which refers to the rate of normal samples being falsely flagged as faults, is shown in parentheses for PCA, KPCA, autoencoder and NCA. Smaller FARs also indicate better performance. In Table II, the best achieved performance for each fault is highlighted in bold. In this study, we only consider the fault cases where MDR < 50% and FAR < 5%, because if MDR ≥ 50%, the detection performance would be even worse than a random guess, whose MDR is 50%. Moreover, we adopt the threshold value of 5% for FAR, which is commonly used in fault detection.

NCA outperforms PCA, KPCA and the autoencoder in 11 cases with lower MDRs. The autoencoder gives the best performance in 5 cases, KPCA in 7 cases and PCA in 4 cases. The fact that the autoencoder, KPCA, and NCA provide better results than PCA in more fault cases indicates that nonlinear extensions can generally obtain better fault detection results than linear PCA. Although both KPCA and NCA include orthogonal constraints in their feature extraction, the performance of NCA is much better than that of KPCA. The reason is that NCA adaptively learns the parameters of a neural network, while KPCA uses a prefixed kernel and its associated parameter. NCA obtains different parameters for different variables with nonlinear combinations through the backpropagation strategy and is thus much more suitable for nonlinear feature extraction and the subsequent fault detection tasks.

Figures 9, 10 and 11 illustrate the detailed fault detection results for Faults 2, 4 and 7. In these figures, the blue points indicate the first 160 normal samples, while the red points represent the following 800 fault samples. The black dashed lines are the control limits according to the threshold. Blue points above the control limits lead to false alarms, while red points below the control limits cause missed detections.

According to Table II, all four methods can successfully detect Fault 2. For PCA, the FARs are 2.5% and 18.1% for the T^2 statistic and the SPE statistic respectively. The FARs of KPCA are 5% for both the T^2 statistic and the SPE statistic. The autoencoder achieves no false alarms for the SPE statistic, and its FAR for the T^2 statistic is 1.88%. However, for our proposed NCA, the FARs are zero for both statistics. Figure 9 plots the results of the four methods on Fault 2. For each method, the upper subplot shows the results of the T^2 statistic and the lower subplot shows the results of the SPE statistic. The small overlay plots clearly show the results of the first 160 normal samples in each subplot. Based on Table II and Figure 9, for Fault 2, we can see that NCA has the lowest MDR with no false alarms.

(a) PCA
(b) KPCA
(c) Autoencoder
(d) NCA
Figure 9: Monitoring results of 4 different methods for Fault 2.

Figure 10 illustrates the results for Fault 4. In this experiment, neither KPCA nor the autoencoder can detect this fault. The MDR of PCA is 39.1% using the T^2 statistic. This means that PCA, KPCA, and the autoencoder are not suitable for the detection of Fault 4. However, using the SPE statistic, NCA obtains an MDR of 5.75% and an FAR of 0.63%. Obviously, NCA is much more appropriate for detecting Fault 4.

(a) PCA
(b) KPCA
(c) Autoencoder
(d) NCA
Figure 10: Monitoring results of 4 different methods for Fault 4.

Figure 11 shows the detection results of the four methods for Fault 7. For the SPE statistic, the MDR of PCA is 0.00%, but the FAR is 16.3%, which is greater than 5%. The reason for the high FAR is that the linear decomposition of PCA cannot obtain an appropriate residual subspace to determine the right control limit. KPCA can detect this fault with an MDR of 2.0% for the SPE statistic and an FAR of 0.63%. As for the autoencoder, its MDR for the SPE statistic is 30.1%, which is much higher than that of KPCA. NCA outperforms these two nonlinear methods with no missed detections for the SPE statistic and an FAR of 1.88%. The cause of the high MDR of the autoencoder for the SPE statistic should be that the autoencoder overfits the training data and becomes more prone to accepting noises and outliers in the normal data when constructing the significant subspace and the residual subspace. Under this condition, the features of the residual subspace may not contain enough information to detect the fault samples. Thanks to the orthogonal constraints in KPCA and NCA, both of them can successfully detect the fault data.

(a) PCA
(b) KPCA
(c) Autoencoder
(d) NCA
Figure 11: Monitoring results of 4 different methods for Fault 7.

We summarize the case studies below:

  1. Although no single method gives optimal performance for all fault cases, NCA outperforms PCA, KPCA, and the autoencoder with regard to the number of best performances and emerges as the clear winner.

  2. Due to the incorporation of orthogonal constraints, NCA is less prone to overfit training data and performs better than autoencoder.

  3. Since NCA considers both nonlinear feature extraction and orthogonal constraints, it is effective in fault detection. Although NCA is an iterative method and needs more time for training, given its superior performance, the trade-off between training time and performance seems justified.

VI Conclusion

In this paper, we propose a nonlinear feature extraction method, neural component analysis (NCA), for fault detection. NCA is a unified model which includes a nonlinear encoder part and a linear decoder part. Orthogonal constraints are adopted in NCA, which can alleviate the overfitting problem that occurs in autoencoder models and improve fault detection performance. NCA takes advantage of both the backpropagation technique and eigenvalue-based techniques. The convergence of the iteration scheme of NCA is very fast. The idea behind NCA is general and can potentially be extended to other detection or diagnosis problems in process monitoring.

We compared NCA with other linear and nonlinear fault detection methods, namely PCA, KPCA, and the autoencoder. Based on the case studies, it is clear that NCA outperforms PCA, KPCA, and the autoencoder. NCA can be considered as an alternative to the prevalent data-driven fault detection techniques.

Future work will be devoted to the design of new regularization terms for the optimization problem of NCA in Equation (5). Reconstruction-based feature extraction methods, such as PCA, KPCA, the autoencoder, and NCA, mainly focus on the global Euclidean structure of process data and overlook the latent local correlations of process data. In the future, we will try to design new constraints to ensure that local information is considered in nonlinear feature extraction.


This research is sponsored by the National Natural Science Foundation of China (61375007) and the Basic Research Programs of Science and Technology Commission Foundation of Shanghai (15JC1400600).


  • [1] S. J. Qin, “Survey on data-driven industrial process monitoring and diagnosis,” Annual Reviews in Control, vol. 36, no. 2, pp. 220–234, 2012.
  • [2] J. MacGregor and A. Cinar, “Monitoring, fault diagnosis, fault-tolerant control and optimization: Data driven methods,” Computers & Chemical Engineering, vol. 47, pp. 111–120, 2012.
  • [3] S. Yin, S. X. Ding, A. Haghani, H. Hao, and P. Zhang, “A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process,” Journal of Process Control, vol. 22, no. 9, pp. 1567–1581, 2012.
  • [4] Z. Q. Ge, Z. H. Song, and F. Gao, “Review of recent research on data-based process monitoring,” Industrial and Engineering Chemistry Research, vol. 52, no. 10, pp. 3543–3562, 2013.
  • [5] T. Feital, U. Kruger, J. Dutra, J. C. Dutra, and E. L. Lima, “Modeling and performance monitoring of multivariate multimodal processes,” AIChE Journal, vol. 59, no. 5, pp. 1557 – 1569, 2013.
  • [6] M. Askarian, G. Escudero, M. Graells, R. Zarghami, F. Jalali-Farahani, and N. Mostoufi, “Fault diagnosis of chemical processes with incomplete observations: A comparative study,” Computers & Chemical Engineering, vol. 84, pp. 104–116, 2016.
  • [7] X. Gao and J. Hou, “An improved SVM integrated GS-PCA fault diagnosis approach of Tennessee Eastman process,” Neurocomputing, vol. 174, pp. 906–911, 2016.
  • [8] Y. Wang, F. Sun, and B. Li, “Multiscale neighborhood normalization-based multiple dynamic PCA monitoring method for batch processes with frequent operations,” IEEE Transactions on Automation Science and Engineering, vol. PP, no. 99, pp. 1–12, 2017.
  • [9] T. J. Rato, J. Blue, J. Pinaton, and M. S. Reis, “Translation-invariant multiscale energy-based PCA for monitoring batch processes in semiconductor manufacturing,” IEEE Transactions on Automation Science and Engineering, vol. 14, pp. 894–904, April 2017.
  • [10] L. Luo, S. Bao, J. Mao, and D. Tang, “Nonlinear process monitoring based on kernel global-local preserving projections,” Journal of Process Control, vol. 38, no. Supplement C, pp. 11–21, 2016.
  • [11] N. Sheng, Q. Liu, S. J. Qin, and T. Chai, “Comprehensive monitoring of nonlinear processes based on concurrent kernel projection to latent structures,” IEEE Transactions on Automation Science and Engineering, vol. 13, pp. 1129–1137, April 2016.
  • [12] M. Mansouri, M. Nounou, H. Nounou, and N. Karim, “Kernel PCA-based GLRT for nonlinear fault detection of chemical processes,” Journal of Loss Prevention in the Process Industries, vol. 40, no. Supplement C, pp. 334–347, 2016.
  • [13] L. Jiang, Z. Song, Z. Ge, and J. Chen, “Robust self-supervised model and its application for fault detection,” Industrial & Engineering Chemistry Research, vol. 56, no. 26, pp. 7503–7515, 2017.
  • [14] Z. Ge, M. Zhang, and Z. Song, “Nonlinear process monitoring based on linear subspace and Bayesian inference,” Journal of Process Control, vol. 20, no. 5, pp. 676–688, 2010.
  • [15] M. A. Kramer, “Nonlinear principal component analysis using autoassociative neural networks,” AIChE Journal, vol. 37, no. 2, pp. 233–243, 1991.
  • [16] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
  • [17] D. Cai, X. He, J. Han, and H. J. Zhang, “Orthogonal Laplacianfaces for face recognition,” IEEE Transactions on Image Processing, vol. 15, pp. 3608–3614, Nov 2006.
  • [18] C.-Y. Liou, W.-C. Cheng, J.-W. Liou, and D.-R. Liou, “Autoencoder for words,” Neurocomputing, vol. 139, no. Supplement C, pp. 84–96, 2014.
  • [19] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” ArXiv e-prints, Dec. 2013.
  • [20] D. Chicco, P. Sadowski, and P. Baldi, “Deep autoencoder neural networks for gene ontology annotation predictions,” in Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB ’14, (New York, NY, USA), pp. 533–540, ACM, 2014.
  • [21] P. Baldi and K. Hornik, “Neural networks and principal component analysis: Learning from examples without local minima,” Neural Networks, vol. 2, no. 1, pp. 53–58, 1989.
  • [22] I. Jolliffe, Principal Component Analysis. Springer Verlag, 1986.
  • [23] I. T. Jolliffe, “A note on the use of principal components in regression,” Journal of the Royal Statistical Society, vol. 31, no. 3, pp. 300–303, 1982.
  • [24] Y. Li, Y. Fu, H. Li, and S. W. Zhang, “The improved training algorithm of back propagation neural network with self-adaptive learning rate,” in 2009 International Conference on Computational Intelligence and Natural Computing, vol. 1, pp. 73–76, June 2009.
  • [25] D. G. Kendall, “A survey of the statistical theory of shape,” Statistical Science, vol. 4, no. 2, pp. 87–99, 1989.
  • [26] J. M. F. Ten Berge, “Orthogonal procrustes rotation for two or more matrices,” Psychometrika, vol. 42, pp. 267–276, Jun 1977.
  • [27] R. T. Samuel and Y. Cao, “Nonlinear process fault detection and identification using kernel PCA and kernel density estimation,” Systems Science & Control Engineering, vol. 4, no. 1, pp. 165–174, 2016.
  • [28] L. H. Chiang, R. D. Braatz, and E. L. Russell, Fault detection and diagnosis in industrial systems. Springer Science & Business Media, 2001.
  • [29] P. Lyman and C. Georgakis, “Plant-wide control of the Tennessee Eastman problem,” Computers and Chemical Engineering, vol. 19, no. 3, pp. 321–331, 1995.
  • [30] H. Zhao, P. C. Yuen, and J. T. Kwok, “A novel incremental principal component analysis and its application for face recognition,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, pp. 873–886, Aug 2006.