A General Data Renewal Model for Prediction Algorithms in Industrial Data Analytics

08/22/2019 ∙ by Hongzhi Wang, et al. ∙ Harbin Institute of Technology

In industrial data analytics, one of the fundamental problems is to utilize the temporal correlation of the industrial data to make timely predictions in the production process, such as fault prediction and yield prediction. However, the traditional prediction models are fixed while the conditions of the machines change over time, thus making the errors of predictions increase with the lapse of time. In this paper, we propose a general data renewal model to deal with it. Combined with the similarity function and the loss function, it estimates the time of updating the existing prediction model, then updates it according to the evaluation function iteratively and adaptively. We have applied the data renewal model to two prediction algorithms. The experiments demonstrate that the data renewal model can effectively identify the changes of data, update and optimize the prediction model so as to improve the accuracy of prediction.




I Introduction

In industrial processes, although strict regulations and stable operating conditions are required, even the most sophisticated machines cannot avoid runtime exceptions [1]. Because the industrial manufacturing process involves a large number of human interactions, human errors may cause abnormalities in the overall process. According to statistics, the maintenance cost of various industrial enterprises accounts for about 15%-70% of the total production cost [2]. Therefore, how to conduct malfunction analysis, dynamic yield prediction and timely fault detection for industrial processes, so as to ensure the effectiveness and efficiency of production, has received much attention in both academia and industry.

Because industrial processes involve numerous sensors with high sampling frequencies, the devices accumulate large amounts of data in a short time. As time goes on, some parameters related to production prediction and fault detection necessarily change as the equipment ages and wears. However, the currently known prediction algorithms in industry, mainly artificial intelligence and data-driven statistical methods [3] [4], are all constrained by time to some extent. In other words, these prediction models can only accurately reflect the state of the industrial equipment in a certain period, and their inaccuracy increases over time.

Additionally, in malfunction diagnosis and prediction based on transfer learning [14][15], although there are various kinds of faults in industrial processes, their similarities can be utilized to transfer what is learned about one malfunction to another, so as to predict faults efficiently and effectively. We outline a transfer-learning-based fault prediction algorithm in Section 3. However, in our experiments, we find that the transferability between two different types of equipment in the same technological process decreases considerably if the data used come from different periods. This shows that a crucial precondition for applying transfer learning to industrial time-series data is that the data of the two types of industrial equipment in the same production process also come from the same period. Therefore, the prediction model is required to recognize changes in the industrial data stream and update its parameters automatically over time.

This paper focuses on the automatic update and replacement of the model based on industrial time-series data, which are characterized by periodicity and complex correlation. To address these problems, we propose a general data renewal model based on lifelong machine learning [16][17][18]. It can be applied to some prediction algorithms to improve their accuracy. The main idea of the model is to assess the freshness of the industrial time-series data according to the existing prediction model and the new data stream. It then decides, according to the similarity function and the loss function, whether to invalidate the old model and retrain a new one.

In our previous work, we attempted to establish a time-series forecasting model system that could solve the problems of both discrete and continuous variable prediction. We have proposed a time-series yield prediction algorithm and a transfer-learning-based fault prediction algorithm, which are briefly described in Section 3. However, their practical effect was limited for the reasons mentioned above. Therefore, in our experiments, we apply the data renewal model to them and verify its effectiveness by comparing the learning models with and without the data renewal model.

This paper makes the following contributions.

  • We propose a general data renewal model that combines the similarity function and the loss function. It can be applied to some industrial prediction algorithms to find the rules for renewing the prediction model based on industrial time series, so as to update the prediction model promptly and iteratively.

  • Through self-learning and automatic updating, the prediction algorithms equipped with the data renewal model improve over time, thus reducing human intervention and improving algorithm performance in industrial processes.

  • We evaluate the proposed model on three datasets and two prediction algorithms in industry. The results demonstrate that the model can be updated effectively according to the industrial data stream, and the accuracy of the predictions can be increased by at least 33%, which is a significant improvement.

The rest of the paper is organized as follows. In Section 2, we first define the problem and our target, then describe the model and approach in detail and apply the data renewal model to two prediction algorithms. The tuning process, a brief introduction to the two prediction algorithms and the experimental results are presented in Section 3. Section 4 summarizes the paper and discusses future work.

II Prediction Algorithms Combined with Data Renewal Model

II-A Task Definition and Overview

In a prediction algorithm, we build a renewal model based on the data, then continuously update the prediction model according to the data similarity and the loss function. In practical problems, the inputs are often time series. The input is a sequence of data points measured at a fixed time interval together with the existing prediction model; the output is the updated prediction model.

The problem is formalized as follows. Given the data of a piece of industrial equipment acquired over an initial period, the existing model corresponds to a function fitted to those data.

In the meantime, the equipment keeps on running. Given the data of the same equipment acquired over the following period, a model fitted to them corresponds to a second function.

If the two functions coincide, the model does not need to be updated. If they differ, the model needs further analysis to determine whether it should be updated. If it should, the new model replaces the existing one.

In practical industrial problems, the inputs are time series. Therefore, we need to choose the model according to their features. Since the model often has a complicated mechanism and the correlation may be nonlinear, it is difficult to find an analytical solution. To achieve better performance, we often use neural network algorithms, such as the BP Neural Network [24], the Convolutional Neural Network [25] and the Recurrent Neural Network (LSTM) [26].


In most cases, the loss function alone is enough to determine whether a model needs to be updated. However, since industrial data are often discrete and are continuously collected and transferred by the sensors, the traditional mechanism model is ineffective. As a result, the model should be based on data instead of mechanism. For accurate estimation, the similarity function and the loss function are combined to estimate automatically when the existing model should be updated, and the model is then retrained at the estimated times iteratively.

II-B Data Renewal Model

Fig. 1: The Components of the Data Renewal Model

The primary problem of the updating algorithm is to build a data renewal model that predicts when the model ought to be updated. As shown in Figure 1, the model bases this on two considerations. One is the similarity between the original data and the new data. If they are similar enough, the model does not have to be updated or retrained. This is measured by the similarity computation, described in Section II-B1. The other is the applicability of the existing model. If the model loses efficacy on the new data stream, it should be updated or even retrained. This is measured by the loss function, described in Section II-B2.

II-B1 Similarity Computation

The data similarity computation begins by analyzing how much the unprocessed data of the previous moment differ from the data of the next moment. The data are classified into two main types, binary attribute data and numeric data, which are discussed in turn in this section.

For the binary attribute data, the similarity is measured by the counting method. That is, for two time series whose corresponding dimensions are both binary attributes, count the number of data pairs in which both values belong to the first state and the number of data pairs in which both values belong to the second state. The similarity, equation (3), is the sum of these two counts divided by the total amount of data.

For a specific example, if the 2nd and 3rd numbers of the two series are both 0, the first count is 2; if the 5th, 7th and 8th numbers of the two series are both 1, the second count is 3.
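The counting method can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `binary_similarity`, the encoding of the two states as 0 and 1, and the two 8-point series are assumptions chosen so that the counts match the example (two pairs both 0, three pairs both 1).

```python
from typing import Sequence


def binary_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of positions where two binary series are in the same state."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("series must be non-empty and of equal length")
    # Pairs in which both values are in the first state (assumed to be 0).
    both_first = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    # Pairs in which both values are in the second state (assumed to be 1).
    both_second = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    return (both_first + both_second) / len(x)


# Hypothetical series: positions 2 and 3 are both 0, and positions
# 5, 7 and 8 are both 1 (1-based), as in the example in the text.
x = [0, 0, 0, 1, 1, 0, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 1]
similarity = binary_similarity(x, y)
```

With these assumed series the two counts are 2 and 3 over 8 data points, so the similarity is 5/8.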

In the case of the numerical data, there are generally two methods to measure the degree of similarity: the similarity coefficient and the similarity measurement. The Pearson Correlation Coefficient, as shown in equation (4), can be used to avoid the similarity-measurement error that results from the severe dispersion of industrial data. It has a value between -1 and +1, where +1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative linear correlation.


Data in the same set but from different periods also have a certain degree of similarity. To measure it better, we take the absolute value of the coefficient and ignore whether the correlation is positive or negative. In conclusion, to achieve better performance, we adopt the similarity coefficient method and modify the Pearson Correlation accordingly: the new coefficient is the absolute value of the Pearson Correlation Coefficient.
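A minimal sketch of this modified coefficient follows, using only the standard definition of the Pearson correlation. The function name `abs_pearson` and the choice to return 0 for a constant series (where the coefficient is undefined) are assumptions made for illustration.

```python
import math
from typing import Sequence


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute value of the Pearson correlation of two numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant series is treated as having no correlation.
        return 0.0
    return abs(cov / math.sqrt(var_x * var_y))


# A perfectly negative linear relation still counts as fully similar.
similarity = abs_pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

Taking the absolute value maps the coefficient from the range -1 to +1 onto the range 0 to 1, which makes it comparable with the binary similarity.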


With respect to different data vectors in the same data set, the similarity is obtained by calculating the mean value of their similarities and is adjusted overall by a parameter, as shown in (5). The parameter takes one value in most cases and another value if the data set does not contain the corresponding attribute.


According to (3), (4) and (5), the value of the similarity lies in the range [0, 1]. The higher the value, the more similar the two data sets are. When the value of the similarity function is lower than a threshold, the model needs to be updated.

II-B2 Loss Function

The loss function aims to determine whether to update the old model or to abandon it and train a new one. Updating means adjusting the parameters of the existing model. Therefore, it is necessary to estimate the effect of the model, which is generally measured by the loss function.


The estimate combines the loss function with a regularization term, or penalty term. For specific problems, we use a more specific loss function for analysis. One typical problem is the updating of the industrial big data model, which mainly involves two aspects: the model for continuous data, such as yield prediction, and the model for categorical attribute data, such as fault prediction. According to the features of the different models, we define the corresponding loss function for estimation.

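For continuous data prediction, we use the RMSE (Root Mean Square Error), that is, the square root of the mean squared difference between the predicted and the observed values, to evaluate the loss, since it gave better estimation results than the quadratic loss function in our experiments.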
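For reference, the RMSE can be computed as in the following sketch, which uses the standard definition; the function name `rmse` and the sample values are illustrative only.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values: a single error of 3 over three points.
loss = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
```

Because the RMSE is expressed in the same unit as the predicted quantity, the values reported in the experiments can be read directly on the scale of the yield variable.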


For classification prediction problems, the 0-1 loss function, the Log loss function, the Hinge loss and the perceptual loss function [27] [28] [29] are generally used to estimate the quality of the model. Here, we use the perceptual loss function.


For the loss function, two questions should be taken into account: whether to update the existing model and whether to abolish the old one. The loss of the existing model is recorded first; the new data are then collected and put into the model for training, which gives the loss of the new model. The change rate of the loss is then computed from these two losses.

Fig. 2: The State-determining Schematic Diagram of the Loss Function Module

In the formula, the change rate is greater than 0. Threshold values need to be set to determine when to update the model and when to discard it (retrain a new one). They are applied as follows: if the change rate exceeds the upper threshold, the original model is discarded and a new model is built through retraining. If the change rate lies between the lower and the upper threshold, the model is updated: new data are added to the model for training, so the model parameters change. If the change rate is below the lower threshold, the original model state is retained with no update. The standards are shown in Figure 2. The thresholds will be adjusted according to the experiments.

Algorithm 1 describes the procedure of the data renewal model. Line 1 analyzes the data similarity of the two periods. Lines 2-11 decide whether to update the model according to the data similarity. Lines 3-9 calculate the value of the loss function in the two periods and output the flag bit according to the state-determination rules.

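Input: The data set of the earlier period; the data set of the later period
Output: The updating flag of the model
1: compute the similarity of the two data sets;
2: if the similarity is below the similarity threshold then
3:       compute the change rate of the loss;
4:       if the change rate exceeds the upper threshold then
5:             return 2;   // Discard the model and retrain a new one
6:       else if the change rate lies between the lower and the upper threshold then
7:             return 1;   // Add the new data and update the model
8:       else
9:             return 0;   // Retain the model
10: else
11:       return 0;  // Retain the model
Algorithm 1 update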
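The decision logic of Algorithm 1 can be sketched as follows. Several details are assumptions rather than the paper's definitions: the change rate is taken to be the absolute difference of the two losses divided by the old loss, the interval for updating is treated as inclusive, and the name `update_flag` and its parameters are hypothetical. The default thresholds (similarity 0.5, loss rate 0.9/0.3) are the values selected in the tuning experiments of Section III-B.

```python
def update_flag(
    similarity: float,
    old_loss: float,
    new_loss: float,
    sim_threshold: float = 0.5,
    upper: float = 0.9,
    lower: float = 0.3,
) -> int:
    """Return 0 (retain), 1 (update) or 2 (discard and retrain)."""
    if old_loss <= 0:
        raise ValueError("old_loss must be positive")
    # Data of the two periods are similar enough: keep the model as it is.
    if similarity >= sim_threshold:
        return 0
    # Assumed definition of the change rate of the loss.
    rate = abs(new_loss - old_loss) / old_loss
    if rate > upper:
        return 2  # discard the model and retrain a new one
    if rate >= lower:
        return 1  # add the new data and update the model
    return 0  # change too small: retain the model


flags = [
    update_flag(0.89, 10.0, 10.0),  # high similarity
    update_flag(0.44, 10.0, 2.0),   # low similarity, moderate change
    update_flag(0.20, 10.0, 25.0),  # low similarity, large change
]
```

Checking the similarity first means the loss only has to be evaluated when the data have actually drifted, which keeps the routine cheap in the common case.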

II-C Algorithm Description

The data renewal model is based on the data similarity and the loss function. Furthermore, to control the updating frequency, the size of the new data should be controlled. That is, the similarity is not calculated until a certain amount of data has been accumulated. Only when the threshold of the data size is reached is the loss function of the data computed and the corresponding processing of the model performed.

Algorithm 2 describes the procedure of updating according to Algorithm 1 and the algorithm flow discussed above. Lines 1-4 decide whether enough new data have been accumulated, and Lines 5-12 decide whether to update the model referring to the data renewal model in Algorithm 1.

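Input: The data set of the earlier period; the data set of the later period; the minimum (threshold) amount of data; the model
Output: The model
1: measure the amount of data in the later period;
2: if the amount is below the minimum then
3:       return the existing model;
4: else
5:       obtain the flag from Algorithm 1;
6:       if the flag is 0 then
7:             retain the existing model;
8:       if the flag is 1 then
9:             update the model with the new data;
10:       if the flag is 2 then
11:             discard the model and retrain a new one;
12: return the model;
Algorithm 2 lifelong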
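The control flow of Algorithm 2 can be sketched as below. The callables `decide`, `update` and `retrain` are hypothetical placeholders for the flag routine of Algorithm 1 and for the training code of the underlying prediction algorithm; retraining on the new data only is an assumption. The toy "model" in the usage lines is just a mean and serves purely to make the sketch executable.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")


def lifelong(
    old_data: Sequence[float],
    new_data: Sequence[float],
    min_size: int,
    model: Model,
    decide: Callable[[Sequence[float], Sequence[float], Model], int],
    update: Callable[[Model, Sequence[float]], Model],
    retrain: Callable[[Sequence[float]], Model],
) -> Model:
    """Return the model to use after seeing new_data."""
    # Not enough new data accumulated yet: do nothing.
    if len(new_data) < min_size:
        return model
    flag = decide(old_data, new_data, model)
    if flag == 1:
        return update(model, new_data)  # keep the model, adjust parameters
    if flag == 2:
        return retrain(new_data)  # assumption: retrain on the new data only
    return model  # flag 0: retain the model


def mean(data: Sequence[float]) -> float:
    return sum(data) / len(data)


def always_discard(old: Sequence[float], new: Sequence[float], m: float) -> int:
    return 2


def keep(m: float, new: Sequence[float]) -> float:
    return m


kept = lifelong([1.0, 2.0], [3.0], 2, 1.5, always_discard, keep, mean)
retrained = lifelong([1.0, 2.0], [3.0, 5.0], 2, 1.5, always_discard, keep, mean)
```

Passing the three operations as callables keeps the renewal logic independent of the prediction algorithm, which matches the intent of a general model.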

The complexity analysis of the algorithm is focused on the data renewal model.

Time Complexity Analysis. The cost of the similarity computation depends on the data sizes of the two periods. The loss function adds the data of the later period into the existing model and compares the new loss with the previous one, which needs only one round of calculation.

Space Complexity Analysis. Space is needed for the data of the two periods and for the similarity matrix. Additional space is required to store the intermediate results calculated from the data of the later period in order to compute its loss function.

III Experiments and Evaluation

III-A Experimental Environment and Datasets

The experimental environment and datasets are shown in Table 1. Among them, the industrial boiler dataset and the generator dataset were generated by the real-time production system of the Third Power Plant of Harbin. The boiler dataset includes more than 400,000 pieces of data with up to 70 dimensions; the main attributes involve time, flow, pressure and temperature. The generator dataset includes more than 80,000 pieces of data with up to 38 dimensions; the major attributes include time, speed, power, pressure and temperature.

In this section, the data renewal model is respectively applied to the time-series yield prediction algorithm and the transfer-learning-based fault prediction algorithm to verify its effectiveness in model updating and optimization.

Machine Configuration | 2.7 GHz Intel Core i5; 8 GB 1867 MHz DDR3
Experimental Environment | Python 3.6.0; TensorFlow
Datasets | Industrial Boiler Dataset; Industrial Generator Dataset; Synthetic Industrial Generator Dataset
Algorithms | The Time-series Yield Prediction Algorithm; The Transfer-learning-based Fault Prediction Algorithm
TABLE I: The Experimental Environment and Datasets

III-B The Optimization of the Prediction Algorithm Based on Data Renewal Model

The data renewal model has two critical parameters, the similarity and the change rate of loss. Since their thresholds may affect accuracy, it is imperative to test their impact.

We first test the impact of the similarity. The similarity estimates how alike the data at different times are: the higher the value, the more similar the data. Therefore, the similarity threshold can be neither too low nor too high. Initially, it is set to 0.3, 0.5 and 0.7 in turn, while the loss rate threshold is set to 1/0.4. To observe the changes under different thresholds more intuitively, we set the original number of tuples to 10,000. Each time 10,000 pieces of data are added, the flag bit determines whether the model is updated.

Fig. 3: The Impact of the Similarity for Algorithm 1

As shown in Figure 3, when the similarity threshold is 0.3, the model is updated rarely, while it is updated too often when the threshold is 0.7. To keep the updating frequency at an appropriate level, the similarity threshold is set to 0.5 in the following experiments.

Fig. 4: The Impact of the Change Rate for Algorithm 1

Then we test the impact of the change rate of loss. We vary the loss rate thresholds over 1/0.4, 0.9/0.4, 0.8/0.4, 1/0.3, 0.9/0.3 and 0.8/0.3. The results are shown in Figure 4. When the flag is 2, the model is discarded. When the flag is 1, the model is updated, and it is retained if the flag is 0. The discarding and updating of the model occur with a more balanced frequency when the threshold is 0.9/0.3.

Fig. 5: The Impact of the Similarity for Algorithm 2
Fig. 6: The Impact of the Change Rate for Algorithm 2

The similarity and the loss rate are tuned in a similar process for Algorithm 2. The experimental results are shown in Figures 5 and 6. We observe that the updating is more effective when the similarity is 0.5 and the loss rate is 0.9/0.3. When the thresholds are moderate, the model executes the three actions (update, discard or retain) with a proper and balanced frequency.

Based on the experimental results described above, after tuning the prediction algorithm based on the data renewal model, the similarity threshold is fixed at 0.5 and the loss rate at 0.9/0.3. The results show that the algorithm does not need to be tuned frequently and that the fixed parameters also achieve good experimental results.

III-C Results for the Time-series Yield Prediction Algorithm

We test the effectiveness of the proposed model renewal strategy on the time-series yield prediction algorithm. It is an LSTM algorithm based on multi-variable tuning. The algorithm improves the traditional LSTM algorithm by exploiting the periodicity of the time-series data to convert them into supervised learning sequences, so as to improve the prediction accuracy.

Fig. 7: The Procedure of the Time-series Yield Prediction Algorithm

The LSTM algorithm based on multi-variable tuning is divided into three modules: a data transform module, an LSTM modeling module and a tuning module. The data transform module converts the time-series data into a supervised learning sequence and simultaneously searches for the variable sets that are most relevant and have the highest Y regression coefficient. The LSTM modeling module connects multiple LSTM units to form an LSTM network. The tuning module adjusts the parameters according to the RMSE in each round and returns the adjusted parameters to the data transform module for iterative training. Through continuous iteration, an approximately optimal solution of the algorithm is obtained. The algorithm process is shown in Figure 7. The following is the result of applying the data renewal model to the time-series yield prediction algorithm.
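The conversion of a time series into a supervised learning sequence is commonly done with a sliding window. The sketch below shows that generic technique, not the paper's data transform module: the function name `to_supervised` and the fixed window length `n_lags` are assumptions, and the variable selection step is not covered.

```python
from typing import List, Sequence, Tuple


def to_supervised(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of n_lags past values with the value that follows."""
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("need 1 <= n_lags < len(series)")
    n_samples = len(series) - n_lags
    inputs = [list(series[i:i + n_lags]) for i in range(n_samples)]
    targets = [series[i + n_lags] for i in range(n_samples)]
    return inputs, targets


# Five observations with a window of two give three training pairs.
inputs, targets = to_supervised([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
```

For periodic data, choosing the window length close to the period is one natural option, since each input then covers a full cycle.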

After determining the optimal thresholds, we selected different update frequencies, 10,000, 50,000 and 100,000 pieces of data for each batch, to verify the effectiveness of the model.

Fig. 8: The Result at a Frequency of 10000 pieces/batch

First, we set the data stream updating frequency to 10,000 pieces/batch. The experimental results are shown in Figure 8. The RMSE changes as the data stream is continuously updated. During this process, the model’s RMSE remains unchanged (the flag is 0) or decreases (the flag is 1 or 2). As time goes by and the data accumulate, the RMSE of the model keeps decreasing at irregular intervals.

Fig. 9: The Result at a Frequency of 50000 pieces/batch

When the data updating frequency is 50,000 pieces/batch, the experimental results are shown in Figure 9. When the updating frequency of the data is reduced, the RMSE is reduced to different degrees, which indicates that different data update frequencies affect prediction accuracy. Nevertheless, once the amount of data becomes consistent, the RMSE overall can be reduced to a specific constant with less error.

We then set the frequency for data updating to 100,000 pieces/batch. Let the original data be 100,000 pieces, and take the relative reduction of the RMSE as the updating accuracy. As shown in Table 2, the algorithm can perform effective model updating at various frequencies. Moreover, the updating accuracy is up to 63.94%.

No. | Data Volume | Similarity | Loss Rate | RMSE | Updating Accuracy
1 | 100000 | 0.34 | 0.592957366 | 30.17 | -
2 | 100000 | 0.44 | 0.795650783 | 10.88 | 63.94%
3 | 100000 | 0.89 | 0.122410221 | 10.88 | -
TABLE II: Experimental Parameters of the Yield Prediction Algorithm

III-D Results for Transfer-learning-based Fault Prediction Algorithm

In the problems of malfunction diagnosis and predictions, though there are various kinds of faults, the similarities of them can be utilized to adopt the transfer learning of malfunctions, so as to predict the errors efficiently and effectively.

Fig. 10: The Schematic Diagram of the Transfer Learning Based Fault Prediction Algorithm

The transfer-learning-based fault prediction algorithm mainly consists of three modules: the time window module, the mapping network module and the model transfer module. The time window module preprocesses the data and conducts similarity analysis based on the time window. The mapping network module mainly uses the step of the similar region of time-series data to transfer the data, and constructs the mapping network with the converted data. The model transfer module mainly utilizes the mapping network to transfer the trained model, which is prepared with the neural network method to construct a deep learning network. Figure 10 shows the structure of the algorithm.

In the experiment, we suppose that there are two devices: one with labeled data and one with unlabeled data. The output is the fault detection model of the device with unlabeled data. The update of the algorithm is mainly composed of two parts. For the device with labeled data, the update target is the fault prediction model. For the device with unlabeled data, if the data change, it is the mapping network module that should be updated. Therefore, our experiments are also divided into two parts.

For the prediction of the device with labeled data, the original model is evaluated using the AUC (Area Under the Curve) [30]. When the update frequency of the data is 10,000 or 50,000 pieces/batch, the AUC is about 0.97. When the frequency is 100,000 pieces/batch, the AUC is about 0.958, and the loss is less than 1%, so the model need not be updated after the initial modeling.
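As a reminder of what the AUC measures, the sketch below computes it from its pairwise interpretation: the probability that a randomly chosen faulty sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The function name `auc`, the 0/1 label encoding and the sample scores are illustrative assumptions; the quadratic loop is meant for clarity, not for data sets of the size used in the experiments.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


# Three of the four positive/negative pairs are ranked correctly.
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values around 0.97 indicate that the original model already ranks faults very reliably.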

For the device with unlabeled data, which receives new data, the updating process is more complicated. The mapping network module is updated first. Then the prediction model of that device is trained through the mapping network. Finally, the resulting prediction model is analyzed. The experimental results are similar to those of the time-series yield prediction algorithm. We set the update frequency to 10,000, 50,000 and 100,000 pieces/batch for the verification.

Fig. 11: The Result at a Frequency of 10000 pieces/batch

During updating, the model’s AUC remains unchanged (the flag is 0) or increases (the flag is 1 or 2). The experimental results with a frequency of 10,000 pieces/batch are shown in Figure 11. With the accumulation of data over time, the AUC of the model continually increases, which indicates better performance.

Fig. 12: The Result at a Frequency of 50000 pieces/batch

When the data update frequency is 50,000 pieces/batch, the experimental results are shown in Figure 12. The degree of updating varies with the updating frequency. When the amount of data becomes consistent, the values of the AUC stabilize.

We then set the frequency for data update to 100,000 pieces/batch and take the AUC as its accuracy. As shown in Table 3, when the data similarity is low, the degree of updating is more considerable. This demonstrates that the data renewal model can effectively update the existing model with the data stream at different updating frequencies.

No. | Data Volume | Similarity | Loss Rate | AUC | Updating Accuracy
4 | 100000 | 0.33 | 0.874255365 | 0.68 | -
5 | 100000 | 0.64 | 0.571217426 | 0.68 | -
6 | 100000 | 0.23 | 0.427696411 | 0.91 | 33.82%
TABLE III: Experimental Results for the Transfer-learning-based Fault Prediction Algorithm
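The accuracy figures in Tables II and III are consistent with a relative change of the metric between two consecutive rows. The short check below reproduces both percentages from the tabulated RMSE and AUC values; the helper name `relative_change` is hypothetical, and the interpretation as a relative change is inferred from the numbers rather than stated as a formula in the text.

```python
def relative_change(before: float, after: float) -> float:
    """Absolute change of a metric relative to its earlier value."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return abs(after - before) / abs(before)


# Table II: the RMSE falls from 30.17 to 10.88.
rmse_gain = round(100 * relative_change(30.17, 10.88), 2)
# Table III: the AUC rises from 0.68 to 0.91.
auc_gain = round(100 * relative_change(0.68, 0.91), 2)
```

Under this reading, the 33.82% gain of the fault prediction algorithm is the smaller of the two improvements, which matches the "at least 33%" stated in the contributions.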

IV Conclusions and Future Work

In this paper, we propose a general data renewal model that assesses the industrial data stream and sets thresholds to update the prediction model adaptively. It can be applied to some prediction algorithms to improve their performance. The effectiveness of the model and its significance for improving the accuracy of prediction are demonstrated by experiments on real-world industrial datasets and prediction algorithms. However, since it is a general model, the tuned parameters cannot adapt to every kind of problem, so self-adaptive tuning may be considered in future work. In addition, we have applied the model to only two industrial prediction algorithms; it remains to be examined with more algorithms.


  • [1] K. McKee, G. Forbes, I. Mazhar, R. Entwistle, and I. Howard, “A review of machinery diagnostics and prognostics implemented on a centrifugal pump,” in Engineering Asset Management 2011.   Springer, 2014, pp. 593–614.
  • [2] M. Bevilacqua and M. Braglia, “The analytic hierarchy process applied to maintenance strategy selection,” Reliability Engineering & System Safety, vol. 70, no. 1, pp. 71–83, 2000.
  • [3] M. Pecht, “Prognostics and health management of electronics,” Encyclopedia of Structural Health Monitoring, 2009.
  • [4] J. Sikorska, M. Hodkiewicz, and L. Ma, “Prognostic modelling options for remaining useful life estimation by industry,” Mechanical Systems and Signal Processing, vol. 25, no. 5, pp. 1803–1836, 2011.
  • [5] Z. Huang, Z. Xu, X. Ke, W. Wang, and Y. Sun, “Remaining useful life prediction for an adaptive skew-wiener process model,” Mechanical Systems and Signal Processing, vol. 87, pp. 294–306, 2017.
  • [6] X. Wang, N. Balakrishnan, and B. Guo, “Residual life estimation based on a generalized wiener process with skew-normal random effects,” Communications in Statistics-Simulation and Computation, vol. 45, no. 6, pp. 2158–2181, 2016.
  • [7] X.-S. Si, W. Wang, C.-H. Hu, D.-H. Zhou, and M. G. Pecht, “Remaining useful life estimation based on a nonlinear diffusion degradation process,” IEEE Transactions on Reliability, vol. 61, no. 1, pp. 50–67, 2012.
  • [8] G. Zhuang, Y. Li, and Z. Li, “Fault detection for a class of uncertain nonlinear markovian jump stochastic systems with mode-dependent time delays and sensor saturation,” International Journal of Systems Science, vol. 47, no. 7, pp. 1514–1532, 2016.
  • [9] J. Wan, S. Tang, D. Li, S. Wang, C. Liu, H. Abbas, and A. V. Vasilakos, “A manufacturing big data solution for active preventive maintenance,” IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 2039–2047, 2017.
  • [10] R. M. Hasani, G. Wang, and R. Grosu, “An automated auto-encoder correlation-based health-monitoring and prognostic method for machine bearings,” arXiv preprint arXiv:1703.06272, 2017.
  • [11] C. Zhang, W. Gao, S. Guo, Y. Li, and T. Yang, “Opportunistic maintenance for wind turbines considering imperfect, reliability-based maintenance,” Renewable energy, vol. 103, pp. 606–612, 2017.
  • [12] T. D. Batzel and D. C. Swanson, “Prognostic health management of aircraft power generators,” IEEE Transactions on Aerospace and Electronic Systems, vol. 45, no. 2, 2009.
  • [13] J. P. Kharoufeh, S. M. Cox, and M. E. Oxley, “Reliability of manufacturing equipment in complex environments,” Annals of Operations Research, vol. 209, no. 1, pp. 231–254, 2013.
  • [14] M. Long, C. Yue, J. Wang, and M. I. Jordan, “Learning transferable features with deep adaptation networks,” 2015.
  • [15] Z. Ran, H. Tao, L. Wu, and G. Yong, “Transfer learning with neural networks for bearing fault diagnosis in changing working conditions,” IEEE Access, vol. PP, no. 99, pp. 1–1, 2017.
  • [16] Z. Chen and B. Liu, “Lifelong machine learning,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 10, no. 3, pp. 1–145, 2016.
  • [17] B. Liu, “Lifelong machine learning: a paradigm for continuous learning,” Frontiers of Computer Science, vol. 11, no. 3, pp. 359–361, 2017.
  • [18] D. L. Silver, Q. Yang, and L. Li, “Lifelong machine learning systems: Beyond learning algorithms.” in AAAI Spring Symposium: Lifelong Machine Learning, vol. 13, 2013, p. 05.
  • [19] M.-F. Balcan, A. Blum, and S. Vempala, “Efficient representations for lifelong learning and autoencoding,” in Conference on Learning Theory, 2015, pp. 191–210.
  • [20] N. R. Verstaevel, J. Boes, J. Nigon, D. d’Amico, and M.-P. Gleizes, “Lifelong machine learning with adaptive multi-agent systems,” 2017.
  • [21] D. L. Silver and R. E. Mercer, “The parallel transfer of task knowledge using dynamic learning rates based on a measure of relatedness,” in Learning to learn.   Springer, 1996, pp. 213–233.
  • [22] S. P. Singh, “Transfer of learning by composing solutions of elemental sequential tasks,” Machine Learning, vol. 8, no. 3-4, pp. 323–339, 1992.
  • [23] D. L. Silver and R. E. Mercer, “The task rehearsal method of life-long learning: Overcoming impoverished data,” in Conference of the Canadian Society for Computational Studies of Intelligence.   Springer, 2002, pp. 90–101.
  • [24] Q. Liu, Z. Feng, L. Min, and W. Shen, “A fault prediction method based on modified genetic algorithm using bp neural network algorithm,” in IEEE International Conference on Systems, 2017.
  • [25] H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
  • [26] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memory neural network for traffic speed prediction using remote microwave sensor data,” Transportation Research Part C, vol. 54, pp. 187–197, 2015.
  • [27] L. Rosasco, E. D. Vito, A. Caponnetto, M. Piana, and A. Verri, “Are loss functions all the same?” Neural Computation, vol. 16, no. 5, pp. 1063–1076, 2004.
  • [28] Y. Shen, “Loss functions for binary classification and class probability estimation,” Ph.D. dissertation, University of Pennsylvania, 2005.
  • [29] H. Masnadi-Shirazi and N. Vasconcelos, “On the design of loss functions for classification: theory, robustness to outliers, and savageboost,” in Advances in neural information processing systems, 2009, pp. 1049–1056.
  • [30] T. Fawcett, “An introduction to roc analysis,” Pattern recognition letters, vol. 27, no. 8, pp. 861–874, 2006.