A Novel Method For Designing Transferable Soft Sensors And Its Application

08/05/2020 ∙ by Hossein Shahabadi Farahani, et al. ∙ Synacor, Inc.

In this paper, a new approach is proposed for designing transferable soft sensors. Soft sensing is one of the significant applications of data-driven methods in the condition monitoring of plants. While hard sensors can easily be used in various plants, soft sensors are confined to the specific plant they are designed for and cannot be used in a new plant, or even under new working conditions in the same plant. In this paper, a solution is proposed for this underlying obstacle in data-driven condition monitoring systems. Data-driven methods suffer from the fact that the distribution of the data with which the models are constructed may not be the same as the distribution of the data to which the models will be applied. This ultimately leads to a decline in model accuracy. We propose a new transfer learning (TL) based regression method, called Domain Adversarial Neural Network Regression (DANN-R), and employ it for designing transferable soft sensors. We use data collected from the SCADA system of an industrial power plant to comprehensively investigate the functionality of the proposed method. The results reveal that the proposed transferable soft sensor can successfully adapt to new plants and new working conditions.




I Introduction

Intelligent condition monitoring plays a vital role in modern automation systems, which are leading to a new industrial revolution [yin2014data]. Using methods for discovering information in huge amounts of data, together with the facilities of Internet of Things (IoT) technology, many industries tend to deploy data-driven methods in order to gain better insight into the operating condition of their systems. In this regard, process industries have taken advantage of data-driven condition monitoring systems by utilizing SCADA systems to collect huge amounts of data from their process operations. Soft sensing is a remarkable application of data-driven condition monitoring systems in these industries.

In general, a soft sensor, or virtual sensor, is used to infer information from observed process variables whenever hardware measurements are not feasible [kadlec2011review, fortuna2007soft]. In practice, a soft sensor is software that processes the signals of several measurements together in order to estimate the value of another variable of the system. It has the advantages of fast response and low cost. Such software can be utilized to tackle a wide variety of industrial problems. It is used for reducing the cost of procurement and maintenance of hardware measurements [kadlec2011review], fault detection and diagnosis [serpas2013fault], real-time estimation for monitoring or control [etien2013modeling], sensor validation [upadhyaya1992application] and normal behaviour modeling [schlechtingen2011comparative, makaremi2009abnormal].

Basically, soft sensors can be categorized into two types, model-driven and data-driven, which are also called white-box and black-box, respectively [fortuna2007soft]. The former, which is out of the scope of this paper, employs first-principles modeling based on physical knowledge of the system. In contrast, the latter relies on information extracted from the system's historical data. A wide array of machine learning techniques has been used for designing data-driven soft sensors. Most of them treat soft sensing as a regression problem, using, for example, support vector regression [kaneko2014adaptive], Artificial Neural Networks (ANN) [wang2019two], Gaussian regression [yuan2017probabilistic], partial least squares [shao2015adaptive] and so forth. ANNs have been widely used as reliable tools for training regression models for soft sensing, since they are highly capable of capturing nonlinear dependencies in sensor data. Moreover, researchers have recently been inclined to employ Deep Neural Networks (DNN) to construct better soft sensors [shang2014data, yao2017deep, yan2016data, yuan2020deep, yuan2019deep].

Soft sensors, as data-driven models, perform well under the general assumption that the training data distribution and the test data distribution are the same [ben2010theory]. Unfortunately, this assumption is not satisfied in many industrial machine learning applications, since the data distribution is altered by practical issues. For instance, the data collected from each plant is slightly different from that of other plants of the same type [yang2019intelligent]; thus, models trained using the data collected from one plant cannot be directly used for prediction in another plant. Another issue is that after years of operation, plants behave differently, a phenomenon usually known as aging or concept drift [kadlec2011review]. As a result, the model's performance is likely to decline over time. Lastly, process models are not robust to variations in working conditions [chen2019intelligent]. Consequently, models do not work properly when the plant meets new working conditions. From a machine learning perspective, predicting labels under different data distributions is regarded as solving different tasks [pan2009survey]. Transfer Learning (TL) algorithms aim to utilize the knowledge collected while learning a task to learn a new but related task more efficiently [weiss2016survey]. TL can therefore be used to handle problems caused by inconsistent data distributions in industrial condition monitoring systems.
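The effect of a distribution mismatch can be illustrated with a toy example that is not taken from the paper. In the sketch below, the quadratic "plant" `true_process`, the input ranges and the linear model are all assumptions made for illustration: a linear soft sensor is fitted on inputs drawn from one range (the source) and evaluated on inputs drawn from another range (the target). The input-output relation is identical in both sets; only the input distribution changes.

```python
import random

random.seed(0)


def true_process(x):
    # Hypothetical nonlinear plant response, used only for this illustration
    return x * x


def fit_line(xs, ys):
    # Ordinary least squares fit of the line y = a + b * x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Source inputs come from [0, 1], target inputs from [2, 3] (assumed ranges)
xs_src = [random.uniform(0.0, 1.0) for _ in range(200)]
xs_tgt = [random.uniform(2.0, 3.0) for _ in range(200)]
ys_src = [true_process(x) for x in xs_src]
ys_tgt = [true_process(x) for x in xs_tgt]

model = fit_line(xs_src, ys_src)  # trained on source data only
print(f"source MSE: {mse(model, xs_src, ys_src):.4f}")
print(f"target MSE: {mse(model, xs_tgt, ys_tgt):.4f}")
```

The source MSE is a few thousandths, while the target MSE is orders of magnitude larger, even though the underlying process has not changed.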

TL has drawn huge attention in the machine learning community, since inconsistency of data distributions is a common barrier to many machine learning applications, especially image processing [csurka2017domain, wang2018deep]. As a result, TL methods have been extensively improved by researchers in recent years. Nevertheless, little attention has been paid to the application of TL methods to improving soft sensors. A few studies have addressed the application of TL to regression problems for condition monitoring purposes [zhang2018wind, cai2019probabilistic, qureshi2017wind], yet none of them is related to a process system. Some researchers have also deployed TL methods in the fault detection and diagnosis of industrial systems. Yet again, these studies have rarely addressed the TL problem in process systems. They have mainly focused on fault detection of components like bearings and gearboxes in vibration systems using acceleration sensor data. For instance, [shao2018highly] and [han2020deep] introduced general frameworks for the fault diagnosis problem and achieved promising results in this area. To the best of the authors' knowledge, only a couple of studies related to the application of TL in fault diagnosis of gas turbines, a problem which is based on classification, are available [zhong2019novel, tang2019transfer].

As mentioned before, the application of TL methods for improving soft sensor systems has not been much discussed so far. On the other hand, results and conclusions drawn from studies on the fault diagnosis of vibration systems cannot be generalized to process systems, since the nature of vibration systems is far from that of process plants. Consequently, the effectiveness of TL methods in the domain of process data, and for designing soft sensors in process plants, still remains unclear. In this paper, we propose a novel TL-based regression method, named Domain Adversarial Neural Network Regression (DANN-R), and employ it for designing transferable soft sensors. The proposed method is comprehensively examined using data collected from a real-world power plant. Our studies are divided into two groups of industrial scenarios: adaptation of a model to another gas turbine and adaptation of a model to another working condition. To the best of the authors' knowledge, this is the first time that a transferable soft sensor is designed for a process system, and also the first time that TL is used between different working conditions of a process. Besides, the proposed method requires no labeled data from the target domain, which enables it to be employed even in very restricted situations.

The remainder of this paper is structured as follows. The concept of TL and its formulation are elaborated in Section II, which also provides a brief review of different categories of TL approaches. In Section III, the structure of the neural network used for DANN-R and its training algorithm are introduced. Section IV is devoted to the implementation of DANN-R for training transferable soft sensors; it consists of several parts in which the data sets, results and recipes for using DANN-R are discussed. The conclusions of this research are drawn in Section V. Finally, possible future work is mentioned in Section VI.

II Transfer Learning

Human beings are able to utilize the knowledge collected by learning a task to learn a new but related task more efficiently. The idea of TL is to actualize such a transfer of knowledge in machine learning problems [pan2009survey]. The definition of TL is usually given in terms of a source domain and a target domain [pan2009survey]. The source domain is the domain from which the knowledge is collected, and the target domain is the domain to which this knowledge is applied. In general, a domain of data is characterised by a specific data distribution.

The mathematical formulation of TL is as follows. The source domain is defined as $\mathcal{D}_s = \{\mathcal{X}_s, P_s(X)\}$, where $\mathcal{X}_s$ is the feature space, $X_s = \{x_i^s\}_{i=1}^{n_s}$, $x_i^s \in \mathcal{X}_s$, is the data and $P_s(X)$ is the marginal distribution from which the source data is drawn. The corresponding ground truth of the source data is denoted by $Y_s = \{y_i^s\}_{i=1}^{n_s}$, $y_i^s \in \mathcal{Y}$, where $\mathcal{Y}$ is the output space. The assumption is that enough labeled data from the source domain is available to train a predictive function $f_s: \mathcal{X}_s \to \mathcal{Y}$ that estimates the output $y^s$ based on $x^s$. In other words, $f_s$ is an approximation of the optimal function in the source domain, $f_s^*$.

Similar to the source domain, the target domain is defined as $\mathcal{D}_t = \{\mathcal{X}_t, P_t(X)\}$, with data $X_t = \{x_i^t\}_{i=1}^{n_t}$. It is assumed that the feature spaces of both domains are the same,

$$\mathcal{X}_s = \mathcal{X}_t = \mathcal{X},$$

but their probability distributions are different,

$$P_s(X) \neq P_t(X).$$

However, no labeled data is available in the target domain. Therefore, it is not possible to train a predictive model dedicated to the target domain that predicts the labels of the target data, $Y_t$, based on $X_t$. On the other hand, the model trained with the source data, $f_s$, might not be an appropriate approximation of the optimal function of the target domain, $f_t^*$, since the data distributions in the two domains are different. In this regard, the goal of TL methods is to find $f_t$ by using $X_s$, $Y_s$ and $X_t$, so that $f_t$ predicts the output in the target domain more accurately than a model trained only on source data, $f_s$.

In our problems, $\mathcal{X}$ is the space of the gas turbine's sensors, $\mathcal{Y}$ is the space of the estimated variable and $f$ is the soft sensor model. The source domain is related to data sampled from a limited number of gas turbines in a specific working condition. The target domain refers to data sampled from a gas turbine or a working condition other than the one in the source domain.

The methods and algorithms developed for TL can be categorized into three main groups [pan2009survey]. The first group comprises instance-based methods, which assign weights to source instances based on their resemblance to the target data. The similarity of instances is measured in a probabilistic sense. These weights are used to train a model that can make more accurate predictions with respect to the target data distribution [zadrozny2004learning].
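The instance-based idea can be sketched in a few lines. This is a hypothetical example, not the method of [zadrozny2004learning]: the source and target input densities are assumed to be known Gaussians, so the importance weight of each source sample, $p_t(x)/p_s(x)$, can be computed exactly, whereas in practice it has to be estimated from data. Weighted least squares then concentrates the fit on the region where the target inputs lie.

```python
import math
import random

random.seed(1)


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_weighted_line(xs, ys, ws):
    # Weighted least squares fit of y = a + b * x
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical domains: source x ~ N(0, 1), target x ~ N(1, 0.5^2), process y = x^2
xs_src = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys_src = [x * x for x in xs_src]
xs_tgt = [random.gauss(1.0, 0.5) for _ in range(2000)]
ys_tgt = [x * x for x in xs_tgt]  # target labels are used only for evaluation

# Importance weight of each source instance: p_t(x) / p_s(x), known here by assumption
weights = [gauss_pdf(x, 1.0, 0.5) / gauss_pdf(x, 0.0, 1.0) for x in xs_src]

plain = fit_weighted_line(xs_src, ys_src, [1.0] * len(xs_src))
weighted = fit_weighted_line(xs_src, ys_src, weights)
print(f"target MSE, unweighted:          {mse(plain, xs_tgt, ys_tgt):.3f}")
print(f"target MSE, importance-weighted: {mse(weighted, xs_tgt, ys_tgt):.3f}")
```

With these settings the importance-weighted fit should give a clearly lower target MSE, because the unweighted line is fitted mostly to source samples far from the target region.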

Another group of methods, usually referred to as parameter-based TL methods, collects the knowledge from the source domain via the parameters of the model trained in the source domain. It has been demonstrated that the features extracted by the deep layers of a neural network are transferable across domains [donahue2014decaf]. In other words, these layers can extract abstract features that are meaningful for prediction in new, related domains. In this group of methods, the last layers of the neural network are usually fine-tuned with the limited target samples available.

Finally, the last group is representation-learning-based methods, also called feature-based methods. The main idea of representation-learning-based TL techniques is to learn a mapping that minimizes a notion of distance between domains along with the label prediction risk in the source domain [ben2007analysis]. For example, some studies use Maximum Mean Discrepancy (MMD) as a criterion to measure the distance between the source and target data [tzeng2014deep]. It has also been shown that the error of a classifier trained to distinguish source samples from target samples can be interpreted as a measure of the divergence between the domains [ben2010theory]. Based on this idea, domain adversarial training methods attempt to learn a representation in which the source and target data are indistinguishable, a goal that is achieved through adversarial training between a feature extractor and a domain classifier.
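As an illustration of the MMD criterion mentioned above, the following sketch computes the biased empirical estimate of the squared MMD between two samples with a Gaussian (RBF) kernel. The kernel width `gamma` and the synthetic data are assumptions; deep TL methods such as [tzeng2014deep] compute a similar quantity on learned features and add it to the training loss, whereas here it is only evaluated.

```python
import math
import random


def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel between two vectors
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd2(xs, xt, gamma=1.0):
    """Biased empirical estimate of the squared MMD between samples xs and xt."""
    k_ss = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_tt = sum(rbf(a, b, gamma) for a in xt for b in xt) / len(xt) ** 2
    k_st = sum(rbf(a, b, gamma) for a in xs for b in xt) / (len(xs) * len(xt))
    return k_ss + k_tt - 2.0 * k_st


random.seed(0)
src = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
same = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
shifted = [[random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
print(f"MMD^2, same distribution:    {mmd2(src, same):.4f}")
print(f"MMD^2, shifted distribution: {mmd2(src, shifted):.4f}")
```

The estimate stays close to zero for two samples from the same distribution and becomes clearly positive when one coordinate of the second sample is shifted.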


The idea of Domain Adversarial Training of Neural Networks (DANN) was first presented in [ganin2016domain]. Afterwards, other studies inspired by this paper introduced new TL algorithms based on the idea of domain adversarial training of neural networks [long2017deep], [hoffman2017cycada]. Besides, domain adversarial training has been used successfully in a wide array of TL applications. For example, domain adversarial training is employed in a fully convolutional network for medical image segmentation [javanmardi2018domain]. Also, policies learnt in a simulation environment are transferred to the real world in a robotic application using adversarial training of neural networks [zhang2019adversarial].

III Proposed Method

Although TL has drawn considerable attention in the machine learning community, the main focus of studies has been on classification problems. Consequently, few studies have paid attention to the application of TL to regression problems. In this section, we propose a neural network structure for learning transferable regression models based on DANN, called DANN-R. It is successfully employed for designing transferable soft sensors that can adapt to new plants and new working conditions.

Fig. 1: The structure of the neural networks used for DANN-R.

Fig. 1 illustrates the architecture of DANN-R. This neural network consists of three major parts: a feature extractor, a regression model and a domain discriminator. The input is an m-dimensional vector, $x \in \mathbb{R}^m$, which is fed into the feature extractor, $G_f$, with model parameters $\theta_f$. The feature extractor is a neural network that maps the input vector into an l-dimensional feature representation, $z = G_f(x; \theta_f) \in \mathbb{R}^l$. Using this feature representation, a regression model, $G_y$, with model parameters $\theta_y$, maps $z$ into a 1-dimensional space, $\hat{y} = G_y(z; \theta_y) \in \mathbb{R}$, which represents the output value corresponding to the input sample $x$. Moreover, $z$ is also fed to the domain discriminator, $G_d$, with model parameters $\theta_d$. $G_d$ is a classifier that maps the features into a 1-dimensional binary space, $\hat{d} = G_d(z; \theta_d) \in \{0, 1\}$, which represents the domain label of the input sample. In other words, the domain discriminator tries to detect whether input instances come from the source domain or the target domain. When an instance is from the source domain or the target domain, the output of the domain discriminator is expected to be 0 or 1, respectively.
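The forward pass of the three parts can be sketched in pure Python. The sizes follow the text where possible (m = 4 inputs as in Table I, l = 60 sigmoid neurons as in Section IV-B); the dictionary layout, the Gaussian initialization and the function names are hypothetical. The discriminator returns a probability, which is thresholded at 0.5 when a hard 0/1 domain label is needed.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def init_dann_r(m, l, seed=0):
    """Hypothetical parameter layout: one sigmoid layer plus two output heads."""
    rng = random.Random(seed)
    return {
        "W_f": [[rng.gauss(0.0, 0.5) for _ in range(m)] for _ in range(l)],  # theta_f
        "b_f": [0.0] * l,
        "w_y": [rng.gauss(0.0, 0.5) for _ in range(l)],  # theta_y
        "b_y": 0.0,
        "w_d": [rng.gauss(0.0, 0.5) for _ in range(l)],  # theta_d
        "b_d": 0.0,
    }


def forward(p, x):
    # G_f: m-dimensional input -> l-dimensional sigmoid features z
    z = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W_f"], p["b_f"])]
    # G_y: regression head -> scalar estimate y_hat
    y_hat = sum(w * zi for w, zi in zip(p["w_y"], z)) + p["b_y"]
    # G_d: logistic head -> probability that x comes from the target domain (label 1)
    d_hat = sigmoid(sum(w * zi for w, zi in zip(p["w_d"], z)) + p["b_d"])
    return z, y_hat, d_hat


params = init_dann_r(m=4, l=60)
z, y_hat, d_hat = forward(params, [0.1, 0.2, 0.3, 0.4])
print(len(z), round(y_hat, 3), round(d_hat, 3))
```

Both heads read the same feature vector `z`, which is what allows the domain discriminator to shape the representation used for regression.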

The training procedure is designed so that, as the features extracted from the source and target data by $G_f$ become more indistinguishable, $G_y$, which is trained using only the source data, can predict the output value of the target data more accurately. In DANN-R, such a feature extractor is found via adversarial training of $G_f$ and $G_d$: $G_d$ is trained to classify the domain labels of the extracted features, while $G_f$ is trained to extract features that cannot be classified as source or target. This adversarial game between the two neural networks helps both of them gradually learn their desired functions during training [goodfellow2014generative]. Consequently, the features extracted from the source and target data become indistinguishable in terms of domain label.

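The regression loss for estimating the output value of the $i$-th sample is defined as follows:

$$\mathcal{L}_y^i(\theta_f, \theta_y) = \Big( G_y\big(G_f(x_i; \theta_f); \theta_y\big) - y_i \Big)^2. \tag{1}$$

Also, the domain classification loss is defined as follows:

$$\mathcal{L}_d^i(\theta_f, \theta_d) = -d_i \log G_d\big(G_f(x_i; \theta_f); \theta_d\big) - (1 - d_i) \log\Big(1 - G_d\big(G_f(x_i; \theta_f); \theta_d\big)\Big), \tag{2}$$

where $d_i = 0$ for source samples and $d_i = 1$ for target samples.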
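Our optimization goal is to find a feature extractor such that no domain discriminator can accurately classify the features extracted from source and target samples. At the same time, the feature extractor must provide an appropriate representation for the regression model to predict the output value accurately. To find a feature extractor with these properties, we formulate an optimization problem whose goal is to find a saddle point of the cost function

$$E(\theta_f, \theta_y, \theta_d) = \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_y^i(\theta_f, \theta_y) - \lambda \left( \frac{1}{n_s} \sum_{i=1}^{n_s} \mathcal{L}_d^i(\theta_f, \theta_d) + \frac{1}{n_t} \sum_{i=n_s+1}^{n_s+n_t} \mathcal{L}_d^i(\theta_f, \theta_d) \right), \tag{3}$$

where $n_s$ and $n_t$ are the numbers of source and target samples (the first $n_s$ samples are from the source domain and the remaining $n_t$ from the target domain), and $\lambda > 0$ weights the domain classification term. As a result, the optimal parameters of the networks are obtained as

$$(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \qquad \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d). \tag{4}$$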
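The cost function of the saddle-point problem can be evaluated directly from network outputs. The sketch below assumes the squared-error regression loss and binary cross-entropy domain loss used in DANN, with domain label 0 for source and 1 for target samples: it averages the regression error over the $n_s$ labeled source samples and subtracts $\lambda$ times the domain cross-entropy, averaged separately over the source and the target samples. The function name and argument layout are hypothetical.

```python
import math


def cost_E(y_hat_src, y_src, d_hat_src, d_hat_tgt, lam, eps=1e-12):
    """Mean squared source error minus lam times the domain cross-entropy terms.

    d_hat_* are discriminator probabilities of the target label; source samples
    have domain label 0 and target samples domain label 1.
    """
    def bce(prob, label):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

    n_s, n_t = len(y_src), len(d_hat_tgt)
    regression = sum((yh - y) ** 2 for yh, y in zip(y_hat_src, y_src)) / n_s
    domain = (sum(bce(p, 0) for p in d_hat_src) / n_s
              + sum(bce(p, 1) for p in d_hat_tgt) / n_t)
    return regression - lam * domain, regression, domain


# A perfect regressor and a discriminator at chance level (0.5 for every sample)
E, reg, dom = cost_E([1.0, 2.0], [1.0, 2.0], [0.5, 0.5], [0.5], lam=0.1)
print(f"E = {E:.4f}, regression = {reg:.4f}, domain = {dom:.4f}")
```

When the discriminator outputs 0.5 for every sample, the domain term equals $2\ln 2 \approx 1.386$, the value it ideally approaches when source and target features can no longer be told apart.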
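Adversarial training of neural networks, as a technique for optimizing ANN parameters, is employed to tackle problem (4). In DANN-R, $G_f$ and $G_d$ are trained with gradient descent in an adversarial procedure, such that in each gradient step one is updated to minimize the cost function and the other is updated to maximize it. As shown in Fig. 1, the sign of the error back-propagated from the domain discriminator is reversed after the feature layer. Meanwhile, in each update step, both $G_f$ and $G_y$ are trained to minimize the prediction error by back-propagating the gradients of the regression loss. Fig. 1 also illustrates how the update of $\theta_f$ is influenced by gradient flows from both $G_y$ and $G_d$. Accordingly, the parameter updates are calculated as follows:

$$\theta_f \leftarrow \theta_f - \mu \left( \frac{\partial \mathcal{L}_y^i}{\partial \theta_f} - \lambda \frac{\partial \mathcal{L}_d^i}{\partial \theta_f} \right), \qquad \theta_y \leftarrow \theta_y - \mu \frac{\partial \mathcal{L}_y^i}{\partial \theta_y}, \qquad \theta_d \leftarrow \theta_d - \mu \lambda \frac{\partial \mathcal{L}_d^i}{\partial \theta_d}, \tag{5}$$

where $\mu$ is the learning rate and, for target samples, which have no label, the gradients of $\mathcal{L}_y^i$ are taken as zero. The hyper-parameter $\lambda$ is the feature extractor weighting parameter; its value is changed from one training epoch to the next.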

Since the parameters of $G_f$ are updated by the gradients propagated from $G_d$, their update is influenced by the parameters $\theta_d$ during training. This means that as $G_d$ improves during training, it pushes $G_f$ to extract increasingly indistinguishable features. This is discussed further in Section IV-D.
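A per-sample version of this update can be written out by hand for the single-layer networks used in Section IV-B. The sketch below illustrates the idea under stated assumptions and is not the authors' code: it assumes a squared-error regression loss, a binary cross-entropy domain loss, and the DANN update rule of [ganin2016domain], in which $\theta_f$ descends on the regression gradient minus $\lambda$ times the domain gradient (the reversed sign), $\theta_y$ descends on the regression gradient, and $\theta_d$ descends on $\lambda$ times the domain gradient. Target samples carry no label, so they contribute only to the domain branch. All names are hypothetical; a practical implementation would rely on an automatic-differentiation framework with a gradient reversal layer.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def init_params(m, l, seed=0):
    rng = random.Random(seed)
    return {
        "W_f": [[rng.gauss(0.0, 0.5) for _ in range(m)] for _ in range(l)],
        "b_f": [0.0] * l,
        "w_y": [rng.gauss(0.0, 0.5) for _ in range(l)], "b_y": 0.0,
        "w_d": [rng.gauss(0.0, 0.5) for _ in range(l)], "b_d": 0.0,
    }


def features(p, x):
    # Single sigmoid layer G_f
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(p["W_f"], p["b_f"])]


def sgd_step(p, x, d, y=None, mu=0.01, lam=0.1):
    """Update p in place for one sample x with domain label d (0 = source, 1 = target).

    Source samples pass their label y; target samples pass y=None. All gradients
    are evaluated at the parameters held before this step.
    """
    z = features(p, x)
    g_z = [0.0] * len(z)  # gradient reaching the features z from both heads

    if y is not None:  # regression branch: squared error, source samples only
        g_y = 2.0 * (sum(w * zi for w, zi in zip(p["w_y"], z)) + p["b_y"] - y)
        g_z = [gz + g_y * w for gz, w in zip(g_z, p["w_y"])]
        p["w_y"] = [w - mu * g_y * zi for w, zi in zip(p["w_y"], z)]
        p["b_y"] -= mu * g_y

    # Domain branch: logistic discriminator with binary cross-entropy
    prob = sigmoid(sum(w * zi for w, zi in zip(p["w_d"], z)) + p["b_d"])
    g_d = prob - d  # derivative of the cross-entropy with respect to the logit
    # Gradient reversal: the features receive -lam times the domain gradient
    g_z = [gz - lam * g_d * w for gz, w in zip(g_z, p["w_d"])]
    p["w_d"] = [w - mu * lam * g_d * zi for w, zi in zip(p["w_d"], z)]
    p["b_d"] -= mu * lam * g_d

    # Back-propagate through the sigmoid feature layer and update theta_f
    for j, zj in enumerate(z):
        g_s = g_z[j] * zj * (1.0 - zj)
        p["W_f"][j] = [w - mu * g_s * xi for w, xi in zip(p["W_f"][j], x)]
        p["b_f"][j] -= mu * g_s


# Hypothetical usage: alternate a labeled source sample and an unlabeled target sample
params = init_params(m=4, l=60)
for _ in range(100):
    sgd_step(params, [0.2, -0.1, 0.4, 0.3], d=0, y=1.5)
    sgd_step(params, [0.5, 0.1, -0.2, 0.6], d=1)
```

With `lam=0` the update reduces to ordinary gradient descent on the regression loss of the source samples and the discriminator is frozen; increasing `lam` strengthens the reversed domain gradient that reaches the feature layer.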

IV Results and Discussion

In this section, the proposed approach for designing transferable regression models, DANN-R, is used to design soft sensors that can deal with the inconsistency of data distributions in industrial gas turbines. These soft sensors are designed to adapt to new working conditions and new plants. Besides, in this section we discuss how to find better training hyper-parameters for DANN-R by monitoring the accuracy of the network's predictions during training.

IV-A Data Set

We used an industrial process data set collected from the SCADA system of a natural gas power plant. The power plant consists of five power units. Each of these units utilizes a Siemens™ heavy-duty class E gas turbine [Siemens].

Gas turbines experience various operation modes during their lifetime. Activation of a mode depends on the condition of the turbine operation. Most of these modes are related to transients; thus, they occur only during very short periods of time. Additionally, the behaviour of gas turbines differs from one operating point to another. Consequently, data-driven condition monitoring based on the system's historical data during many of these modes is not meaningful. We select SCADA data from the load control and frequency control modes, which are the dominant operation modes of the system and account for roughly 95% of the gas turbines' data. The aim of the former mode is mainly to generate the desired active power with the gas turbine, while in the latter, the control goal is to keep the turbine's shaft rotational speed close to the desired reference.

The data collected from different units of the power plant are used to study the ability of the DANN-R algorithm to learn transferable soft sensors. Two groups of TL scenarios are studied. In the first scenario, TL between data sets sampled from different units of the power plant is studied. In the second scenario, TL between data sets with different ambient temperatures is studied. In both problems, the target data sets are assumed to be unlabeled. However, to evaluate the designed soft sensors, the error between the soft sensor outputs and the actual outputs in the target domain is calculated.

IV-B Scenario 1: TL between the plant units

In practice, the distributions of data collected from machines or plants of the same type differ from each other. This discrepancy stems from the different maintenance events that each unit experiences, measurement settings, mechanical behavior, and so forth. In this part, the aim is to evaluate the capability of DANN-R for TL between different machines. The source and target data are collected from the gas turbines of different units of the power plant. Therefore, all environmental conditions, such as temperature, humidity and site altitude, are the same.

In this part, we design soft sensors that predict the value of the active power. The input variables of the soft sensor models are listed in Table I. This set of variables is selected according to the performance analysis of gas turbines [hanachi2018performance], [bartolini2011application]. Single-layer neural networks of appropriate dimensions are used for the feature extractor, the domain discriminator and the regression model. Unlike the original DANN [ganin2016domain], which was used for classification with a ReLU activation function in the feature extractor, we use a sigmoid activation function in the feature extractor, since we found it more effective for capturing nonlinear relations in the data sets when the networks are not deep. The feature extractor consists of 60 neurons with the sigmoid activation function. The regression model and the domain discriminator are a regression layer and a logistic regression layer, respectively.

Fig. 2 depicts the results of applying DANN-R to design a transferable soft sensor in the case of TL between two different plants. The figure includes plots of the model predictions in both the source and target domains. The real value of the active power, the prediction of the model trained without TL and the prediction of our transferable soft sensor are shown in red, green and blue, respectively. The model trained without TL is the model trained using only source data. In the target domain, the green plot is unable to follow the ground truth properly. On the other hand, the prediction of the proposed DANN-R transferable soft sensor, i.e., the blue plot, provides a far more accurate estimate of the real value of the active power.

Input sensors: Ambient temperature, Ambient humidity, IGV angle, Fuel flow
Estimated sensor: Active power
TABLE I: Input and output of models studied in the case of transfer learning between different machines.
Fig. 2: Transfer learning from a gas-turbine to another gas-turbine.

Fig. 2 also shows the performance of the trained models in the source domain. Usually, in the source domain, the predictions of models trained without TL are better than those of models trained using DANN-R. Indeed, using DANN-R for designing transferable soft sensors lowers the performance of models in the source domain, which is caused by the multi-objective training procedure of DANN-R. The adversarial term in (3), in a sense, interferes with training the model for regression in the source domain. Actually, the adversarial game between the feature extractor and the domain discriminator prevents the feature extractor from providing the best possible features for regression in the source domain. However, in TL problems, the accuracy of models in the source domain is not a concern, because the main focus is to enhance the model's performance in the target domain, where the intended model operates.

Table II and Table III provide quantitative results for this scenario. We define the transfer ratio score to evaluate the performance of transferable soft sensors. It is calculated as follows:

$$\text{transfer ratio} = \frac{\text{target MSE without TL}}{\text{target MSE using TL}}. \tag{6}$$

This score shows how much the transferable soft sensor improves the estimation of the output variable in the target domain; a value greater than 1 means that TL reduced the target-domain error. With industrial applications in mind, the transfer ratios in both tables are encouraging.
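The ratios reported in Table II are consistent with this definition, i.e. target-domain MSE without TL divided by target-domain MSE with DANN-R. The following sketch recomputes them from the rounded table entries; only the numbers are taken from the table, the code itself is illustrative.

```python
# Target-domain MSE from Table II: (without TL, using TL)
table_ii = {
    "Unit 1": (0.0262, 0.0219),
    "Unit 2": (0.0061, 0.0022),
    "Unit 3": (0.0235, 0.0116),
    "Unit 4": (0.0112, 0.0031),
    "Unit 5": (0.0314, 0.0062),
}


def transfer_ratio(mse_without_tl, mse_with_tl):
    """A ratio above 1 means TL reduced the target-domain MSE."""
    return mse_without_tl / mse_with_tl


ratios = {unit: transfer_ratio(*mse) for unit, mse in table_ii.items()}
average_ratio = sum(ratios.values()) / len(ratios)
for unit, r in ratios.items():
    print(f"{unit}: {r:.2f}")
print(f"Average: {average_ratio:.2f}")  # 2.93, as in Table II
```

Note that the table averages the per-unit ratios rather than dividing the two average MSE columns, which would give a slightly different number (0.0197 / 0.0090 ≈ 2.19).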

Target | Source | Source MSE without TL | Source MSE using TL | Target MSE without TL | Target MSE using TL | Transfer ratio
Unit 1 | Another turbine | 0.0016 | 0.0013 | 0.0262 | 0.0219 | 1.20
Unit 2 | Another turbine | 0.0017 | 0.0059 | 0.0061 | 0.0022 | 2.77
Unit 3 | Another turbine | 0.0059 | 0.0030 | 0.0235 | 0.0116 | 2.03
Unit 4 | Another turbine | 0.0054 | 0.0062 | 0.0112 | 0.0031 | 3.61
Unit 5 | Another turbine | 0.0015 | 0.0072 | 0.0314 | 0.0062 | 5.06
Average | - | 0.0032 | 0.0047 | 0.0197 | 0.0090 | 2.93
TABLE II: MSE of soft sensor predictions in cases of transfer learning from one machine to another machine
Target | Source | Source MSE without TL | Source MSE using TL | Target MSE without TL | Target MSE using TL | Transfer ratio
Unit 1 | All other turbines | 0.0019 | 0.0018 | 0.0144 | 0.0136 | 1.06
Unit 2 | All other turbines | 0.0032 | 0.0026 | 0.0111 | 0.0071 | 1.56
Unit 3 | All other turbines | 0.0047 | 0.0044 | 0.0025 | 0.0008 | 3.13
Unit 4 | All other turbines | 0.0043 | 0.0041 | 0.0017 | 0.0014 | 1.21
Unit 5 | All other turbines | 0.0063 | 0.0064 | 0.0038 | 0.0018 | 2.11
Average | - | 0.0041 | 0.0038 | 0.0067 | 0.0049 | 1.81
TABLE III: MSE of soft sensor predictions in cases of transfer learning from multiple machines to one machine

In both Table II and Table III, the target domain is a single gas turbine. Table II presents the results of implementations in which only one other gas turbine is selected as the source domain, while Table III presents the results of implementations in which the source domain includes all other gas turbines of the power plant.

The MSE in the target domain without TL in Table II is on average about 5.18 times higher than that in Table III, but the source MSE without TL in Table II is on average 1.25 times lower. These results are not far from expectations, since each source domain in Table III includes data collected from the other four turbines. In other words, the source domains in Table III have richer data distributions; thus, the models trained in these source domains generalize better and, even without TL, make relatively accurate predictions in the target domain. On the other hand, it is more difficult to train an accurate regression model in the source domains of the experiments in Table III because of the high diversity of the data, which increases the MSE in the source domain.

Comparing the target-domain MSE with and without TL in Table II and Table III reveals that the proposed method enhances the generalization of the trained model to the target gas turbine by remarkable ratios, decreasing the MSE of the models in the target domain by factors of 2.93 and 1.81 on average, respectively. Moreover, the improvement in Table III suggests that even when the training set consists of rich data and might be believed to generalize well, DANN-R is still likely to considerably improve the performance of the models in the domain in which they will be used.

The results in Table II also indicate that TL degrades the performance of the regression model in the source domain (the average source MSE increases from 0.0032 to 0.0047), whereas in Table III the average source MSE is almost unchanged (0.0041 without TL and 0.0038 with TL). As mentioned before, the reason for this degradation is that during the training of models using DANN-R, along with predicting the output variable in the source domain, the models are also trained to make the source and target domains indistinguishable. Therefore, they are not optimally trained to perform the regression task in the source domain. The accuracy of the models in the source domain is discussed in more detail in Section IV-D.

IV-C Scenario 2: TL between working conditions of the same unit

Input sensors: Ambient temperature, Ambient humidity, IGV angle, Active power, Compressor outlet temperature
Estimated sensor: Fuel flow
TABLE IV: Input and output of the models studied in the case of transfer learning between different working conditions.

Unit | Case | Source amb. temp. range (°C) | Target amb. temp. range (°C) | Target MSE without TL | Target MSE using TL | Source MSE without TL | Source MSE using TL | TL ratio | Impact of wider source | Impact of wider target
1 | 1 | 15 to 25 | 30 to 35 | 0.0239 | 0.0095 | 0.0113 | 0.0107 | 2.52 | 0.72 | -
1 | 2 | 20 to 25 | 30 to 35 | 0.0326 | 0.0131 | 0.0166 | 0.0158 | 2.48 | - | -
1 | 3 | 20 to 25 | 30 to 40 | 0.0230 | 0.0162 | 0.0182 | 0.0195 | 1.42 | - | 1.23
2 | 1 | 15 to 25 | 30 to 35 | 0.0265 | 0.0042 | 0.0086 | 0.0093 | 6.36 | 0.51 | -
2 | 2 | 20 to 25 | 30 to 35 | 0.0204 | 0.0081 | 0.0118 | 0.0125 | 2.52 | - | -
2 | 3 | 20 to 25 | 30 to 40 | 0.0407 | 0.0088 | 0.0114 | 0.0117 | 4.61 | - | 1.09
3 | 1 | 15 to 25 | 30 to 35 | 0.0044 | 0.0034 | 0.0027 | 0.0031 | 1.27 | 0.52 | -
3 | 2 | 20 to 25 | 30 to 35 | 0.0081 | 0.0066 | 0.0056 | 0.0059 | 1.24 | - | -
3 | 3 | 20 to 25 | 30 to 40 | 0.0084 | 0.0070 | 0.0059 | 0.0058 | 1.19 | - | 1.07
4 | 1 | 15 to 25 | 30 to 35 | 0.0064 | 0.0058 | 0.0062 | 0.0065 | 1.11 | 0.73 | -
4 | 2 | 20 to 25 | 30 to 35 | 0.0118 | 0.0079 | 0.0092 | 0.0086 | 1.49 | - | -
4 | 3 | 20 to 25 | 30 to 40 | 0.0115 | 0.0095 | 0.0085 | 0.0090 | 1.22 | - | 1.20
5 | 1 | 15 to 25 | 30 to 35 | 0.0063 | 0.0052 | 0.0064 | 0.0065 | 1.20 | 0.51 | -
5 | 2 | 20 to 25 | 30 to 35 | 0.0211 | 0.0102 | 0.0104 | 0.0112 | 2.06 | - | -
5 | 3 | 20 to 25 | 30 to 40 | 0.0154 | 0.0135 | 0.0108 | 0.0109 | 1.14 | - | 1.32
Average | - | - | - | 0.0174 | 0.0086 | 0.0096 | 0.0098 | 2.12 | 0.60 | 1.18
TABLE V: MSE of soft sensor predictions in cases of transfer learning between different working conditions
Fig. 3: Transfer learning between different working conditions

Models trained under particular working conditions of process plants might not be able to adapt to new working conditions. We employ DANN-R to tackle this challenge and investigate the applicability of our method to such problems. In this scenario, we design soft sensors that predict the value of the fuel flow. The input variables of the soft sensors are presented in Table IV. The structure of the models used in this section is identical to that used in the previous part.

In this scenario, the source and target samples are selected based on their ambient temperature, whose variation strongly affects the distribution of the collected data. In other words, the source and target samples each correspond to a specific range of the ambient temperature. The ambient temperature is correlated with the operating point of the units. The average value of the active power in the source domain is about 111.5 MW, while this value is about 132 MW in the target domain, which means that the source and target data are related to different operating points of the power plant. This is because the demand for electric power is higher on the warmer days of the year.
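Selecting source and target samples by ambient temperature amounts to a simple filter. In the sketch below, the record layout, the field names (`amb_temp`, `active_power`, `fuel_flow`), the inclusive range boundaries and the sample values are all hypothetical; the fuel-flow label is removed from the target set because the target domain is assumed to be unlabeled. The ranges correspond to case 2 of Table V.

```python
def split_by_ambient_temperature(records, source_range, target_range, label="fuel_flow"):
    """Split hypothetical SCADA records into a labeled source set and an unlabeled target set.

    Each range is an inclusive (low, high) tuple in degrees Celsius.
    """
    def in_range(record, bounds):
        return bounds[0] <= record["amb_temp"] <= bounds[1]

    source = [dict(r) for r in records if in_range(r, source_range)]
    # The target domain is treated as unlabeled, so the label field is dropped
    target = [{k: v for k, v in r.items() if k != label}
              for r in records if in_range(r, target_range)]
    return source, target


def mean_of(rows, key):
    return sum(r[key] for r in rows) / len(rows)


# Made-up records, for illustration only
records = [
    {"amb_temp": 21.0, "active_power": 100.0, "fuel_flow": 6.9},
    {"amb_temp": 24.5, "active_power": 104.0, "fuel_flow": 7.1},
    {"amb_temp": 28.0, "active_power": 120.0, "fuel_flow": 7.7},  # in neither range
    {"amb_temp": 31.0, "active_power": 140.0, "fuel_flow": 8.4},
    {"amb_temp": 34.0, "active_power": 144.0, "fuel_flow": 8.6},
]
source, target = split_by_ambient_temperature(records, (20, 25), (30, 35))
print(len(source), len(target))  # 2 2
print(mean_of(source, "active_power"), mean_of(target, "active_power"))  # 102.0 142.0
```

Comparing a statistic such as the mean active power of the two sets, as done in the text, is a quick check that the split really separates different operating points.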

Fig. 3 compares the performance of the models trained using DANN-R and without transfer learning. It can be seen that, even though the range of the fuel flow is remarkably different between the source and target domains, the soft sensor designed with DANN-R can transfer the knowledge collected at one operating point to another operating point. In other words, the blue plot approximates the ground truth more closely than the green plot.

Table V provides the quantitative results for this scenario. The proposed method is applied to three different working conditions of every unit. The TL ratio is greater than 1 in all cases presented in this table (ranging from 1.11 to 6.36), which indicates that the designed transferable soft sensors improve the accuracy of the models despite the inconsistency of data distributions resulting from changing working conditions. As shown in Table V, the average TL ratio in these cases is about 2.12.

In this scenario, we also investigate the effect of the range of working conditions covered by the source or target samples. In Table V, three cases of TL problems are studied for each unit. These cases differ from each other in the ambient temperature ranges of the source and target domains. Case 2 is used as the reference, in which the ambient temperature ranges of the source and target domains are 20 to 25 °C and 30 to 35 °C, respectively. Case 3 has a wider temperature range in the target domain (30 to 40 °C), while case 1 has a wider temperature range in the source domain (15 to 25 °C). The ratio of the target MSE using TL in each of cases 1 and 3 to that of case 2 shows the impact of changing the domains on TL. As presented in Table V, in case 3 the target MSE using TL is on average 1.18 times that of the reference case. On the other hand, in case 1 the target MSE using TL is on average 0.60 times that of the reference case. These results suggest that TL becomes more challenging as the target domain includes a wider range of operating points, while a more diverse source domain can improve TL. The latter is consistent with the result of scenario 1, in which including data from all units improves TL efficiency.
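The two impact columns of Table V can be reproduced from the target MSE using TL: case 1 divided by case 2 for a wider source range, and case 3 divided by case 2 for a wider target range. The sketch below uses the rounded values from the table, so individual ratios may differ from the table in the last digit, but the averages match.

```python
# Target MSE using TL from Table V, per unit: (case 1, case 2 = reference, case 3)
target_mse_tl = {
    1: (0.0095, 0.0131, 0.0162),
    2: (0.0042, 0.0081, 0.0088),
    3: (0.0034, 0.0066, 0.0070),
    4: (0.0058, 0.0079, 0.0095),
    5: (0.0052, 0.0102, 0.0135),
}

# Impact of a wider source range: case 1 relative to case 2
wider_source = [c1 / c2 for c1, c2, _ in target_mse_tl.values()]
# Impact of a wider target range: case 3 relative to case 2
wider_target = [c3 / c2 for _, c2, c3 in target_mse_tl.values()]

avg_source = sum(wider_source) / len(wider_source)
avg_target = sum(wider_target) / len(wider_target)
print(f"average impact of wider source: {avg_source:.2f}")  # 0.60, below 1: TL improves
print(f"average impact of wider target: {avg_target:.2f}")  # 1.18, above 1: TL gets harder
```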

IV-D Recipes for finding better training hyper-parameters in DANN-R

Fig. 4: The trend of the domain discriminator accuracy and the regression model MSE during training for three different settings of the training hyper-parameters.

The training hyper-parameters have always been critical factors in training neural networks, especially when the networks are trained with respect to multiple objectives. In this part, we discuss how to build an intuition for setting the learning hyper-parameters of DANN-R by monitoring the accuracy of the domain discriminator and the regression model during training.

Based on (5), two weighting hyper-parameters affect the training of DANN-R, and their values must be determined carefully. A naive approach for finding appropriate weights is exhaustive search, but it is almost impractical because it is extremely time-consuming. One solution is to vary the learning rates during learning. We used formula (7), suggested by [ganin2016domain].


In this formula, the training progress increases linearly from 0 to 1 as learning goes on. It is important to keep in mind that these weights are not necessarily appropriate for every data set.
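As a sketch, assuming formula (7) follows the schedules reported in [ganin2016domain], the learning rate decays as μ_p = μ_0 / (1 + α p)^β and the domain-adaptation weight grows as λ_p = 2 / (1 + exp(−γ p)) − 1, where p is the training progress from 0 to 1. The default constants below (μ_0 = 0.01, α = 10, β = 0.75, γ = 10) are the values given by Ganin et al., not necessarily those used for DANN-R.

```python
import math


def learning_rate(p, mu0=0.01, alpha=10.0, beta=0.75):
    """Annealed learning rate mu_p = mu0 / (1 + alpha * p) ** beta."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("training progress p must lie in [0, 1]")
    return mu0 / (1.0 + alpha * p) ** beta


def adaptation_weight(p, gamma=10.0):
    """Domain-loss weight lambda_p = 2 / (1 + exp(-gamma * p)) - 1, rising from 0 toward 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("training progress p must lie in [0, 1]")
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


def training_progress(step, total_steps):
    """Progress p that increases linearly from 0 (first step) to 1 (last step)."""
    return step / max(total_steps - 1, 1)
```

Evaluated at p = 0, 0.5 and 1, the learning rate shrinks while the domain weight ramps up, which keeps the adversarial term small early on, while the regressor is still learning.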

During the training of DANN-R, the feature extractor is trained with respect to two terms of the cost function (3): the MSE and the cross-entropy. The ratio of the average weights of these two terms in updating the parameters of the feature extractor is very important. To build an intuition for setting the learning hyper-parameters, a rule of thumb is to monitor the regression and domain classification accuracy during learning. Fig. 4 plots the MSE of the regression model and the classification accuracy of the domain discriminator during training. To make the behaviour of these plots easier to interpret, they are smoothed with a moving average window of length 3 samples. Three different cases are compared in this figure, each characterized by the average value of the weighting hyper-parameter over training. The red plot corresponds to the case with the largest average and the green plot to the case with the smallest.
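The smoothing applied to the curves in Fig. 4 can be sketched as an unweighted moving average over three consecutive logged values. How the edges were handled is not stated; this sketch assumes the 'valid' variant, which returns two fewer points than it receives.

```python
import numpy as np


def moving_average(values, window=3):
    """Smooth a training curve with a simple moving average of length `window`.

    Uses 'valid' convolution, so the output has window - 1 fewer points than the input.
    """
    values = np.asarray(values, dtype=float)
    if window < 1 or window > values.size:
        raise ValueError("window must be between 1 and the number of samples")
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")
```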

The red plots correspond to an experiment in which the feature extractor is updated predominantly by the gradients propagated from the domain discriminator. In such cases, the regression model hardly learns to predict the output, and the MSE oscillates during training. The classification accuracy oscillates as well, and the discriminator does not succeed in classifying the samples properly. This means that the feature extractor and the domain discriminator are preoccupied with competing against each other throughout training. The domains are then perfectly aligned by the feature extractor, but the extracted features are not appropriate for regression. Consequently, the trained model can perform regression in neither the source domain nor the target domain.
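To see why a large weight on the domain term lets the discriminator dominate the feature extractor, consider the gradient-reversal update of [ganin2016domain], which this sketch assumes DANN-R inherits: the feature-extractor parameters descend the regression gradient while ascending the domain cross-entropy gradient scaled by λ. The helper `domain_share` is a hypothetical diagnostic, not part of the method, that measures how much of the update magnitude comes from the domain term.

```python
import numpy as np


def feature_extractor_step(theta_f, grad_mse, grad_domain, lam, lr):
    """One gradient-reversal update of the feature-extractor parameters.

    The MSE gradient is descended, while the domain cross-entropy gradient is
    ascended with weight `lam`, pushing the features toward domain invariance.
    """
    return theta_f - lr * (grad_mse - lam * grad_domain)


def domain_share(grad_mse, grad_domain, lam):
    """Fraction of the summed gradient norms contributed by the weighted domain term."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    reg_norm = np.linalg.norm(grad_mse)
    dom_norm = lam * np.linalg.norm(grad_domain)
    total = reg_norm + dom_norm
    return 0.0 if total == 0 else float(dom_norm / total)
```

A share that stays close to 1 throughout training would correspond to the red regime, and a share close to 0 to the green regime described next.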

The green plots in Fig. 4 correspond to an example in which the feature extractor is updated predominantly by the gradients propagated from the regression model. Therefore, the MSE decreases continuously during learning, and the classification accuracy rarely oscillates. In some data sets, the domain discriminator may even maintain perfect classification accuracy throughout training, since the feature extractor is hardly involved in the adversarial game. In these cases, although the model is well trained for regression, the domains are not aligned, so the trained model is not adapted to the target domain.

Usually, the optimal proportion between the weights of the MSE and cross-entropy losses is observed to be a compromise between the regression accuracy in the source domain and the classification accuracy of the domain discriminator. The case shown by the blue plots in Fig. 4 is an example in which the training hyper-parameters are selected appropriately. In this situation, the MSE decreases during training, although the regression model still struggles to reduce it. Meanwhile, the accuracy of the domain discriminator is less consistent than in the green plot, but it does not fluctuate as much as in the red plot.
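The rule of thumb in this subsection can be turned into a rough check on logged training curves. The thresholds (mean discriminator accuracy above 0.95 read as "domains not aligned", accuracy standard deviation above 0.15 read as "fluctuating") and the comparison of mean MSE in the first and last thirds of training are hypothetical placeholders to be tuned per data set, not values from the paper.

```python
import numpy as np


def diagnose_training(mse_curve, disc_acc_curve, acc_high=0.95, acc_std_high=0.15):
    """Heuristically map a DANN-R run onto the three regimes described for Fig. 4.

    Thresholds are hypothetical and should be tuned per data set.
    """
    mse = np.asarray(mse_curve, dtype=float)
    acc = np.asarray(disc_acc_curve, dtype=float)
    if mse.size < 3 or acc.size < 3:
        raise ValueError("need at least three logged points per curve")
    third = mse.size // 3
    # Compare the mean MSE at the start and at the end of training.
    mse_decreasing = mse[-third:].mean() < mse[:third].mean()
    if acc.mean() > acc_high:
        # Discriminator still separates the domains: features are not aligned (green).
        return "domain term too weak"
    if not mse_decreasing or acc.std() > acc_std_high:
        # Adversarial game dominates and regression does not converge (red).
        return "domain term too strong"
    # MSE falls while the discriminator is moderately challenged (blue).
    return "balanced"
```

The curves can be smoothed before this check, as was done for Fig. 4, so that single noisy epochs do not flip the verdict.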

V Conclusion

In this paper, we proposed a solution to a crucial obstacle in data-driven condition monitoring of process systems: the inconsistency of data distributions. We proposed a Transfer Learning (TL) based regression method for designing transferable soft sensors, Domain Adversarial Neural Network Regression (DANN-R), and extensively examined it using data collected from the SCADA system of a power plant.

We demonstrated that our method can extract knowledge from the historical data of a gas turbine in a specific working condition and transfer it either to another turbine or to another working condition. Both scenarios are likely to challenge condition monitoring systems in industrial digitization practice. In the case of TL between machines, we demonstrated that although a model trained with a rich data set, consisting of samples from a fleet of several gas turbines, generalizes well by itself, our method still decreases the MSE of predictions on the target domain by a factor of 1.82 on average. In the case of TL between working conditions, we showed that our method works despite remarkable shifts in the operating points. We also found that a source data set covering a wider range of working conditions is likely to work better for TL.

The effectiveness of TL methods for soft sensor problems in process systems has been uncertain so far. By providing an approach for designing transferable soft sensors, this paper shows that TL can dramatically enhance the performance of models in these problems. With our transferable soft sensor, it is possible to predict the values of sensors that are defective or not installed in a power plant using knowledge transferred from other gas turbine fleets. One example is the Lower Heating Value (LHV) sensor [Bhatia2012], whose hardware is hard to maintain and expensive to operate. Furthermore, some sensors are installed only for limited periods, for example during the Performance Guarantee Test (commonly known as the PG Test), and their values are very useful for condition monitoring. A model trained with data gathered during such a limited period might not make accurate predictions in all working conditions; in such cases, too, our transferable soft sensor can be useful.

VI Future Works

Studying how well DANN-R copes with the change in gas-turbine behaviour over time, i.e., the aging phenomenon, has been left for the future due to lack of data. Future work also concerns a deeper analysis of data collected from more power plants and larger data sets, which may lead to new findings.