Predicting the wear-out of components is pivotal in domains such as the automotive, health and aerospace industries [2, 22, 23]. Robust and accurate predictions have great potential for preventing unanticipated equipment failures and increasing productivity. With the recent widespread adoption of the Internet of Things (IoT), many sensor signals are now readily available for predicting the wear-out of components.
At Bosch, we often encounter datasets with several hundred sensor measurements and other calculated values from vehicles. These are used to predict the health-state of a component. For example, in automotive applications, we can predict the wear-out of an engine-coolant system using signals from sensors such as torque, pressure, temperature and speed. Traditional approaches select a small and predictive subset of these measurements (or attributes) by evaluating their relevance to the target (health-state) prediction [6, 16]. Several off-the-shelf algorithms, viz., Decision Trees [5], Gaussian processes and Support Vector Machines (SVMs), were applied to our fuel system data from different vehicles. Overall, we observed that all of the aforementioned algorithms selected a similar subset of attributes as the most relevant ones.
A problem arises when one or more of these selected (relevant) attributes are invalid due to malfunctioning sensors. During a malfunction, the sensor measurements are stuck at a constant value, e.g., zero; such cases are denoted as the stuck-at-zero condition of the sensor. If such a malfunctioning sensor represents a relevant attribute for the target prediction, it leads to unreliable predictions. It is therefore essential to train a model that does not rely on a fixed subset of attributes. Additionally, sensors are electrical devices that are prone to noise. For example, the magnetic field generated by the ignition system of a vehicle can affect other sensors. Noisy sensors generate a few distorted measurements amidst valid values. Using these distorted sensor readings can lead to erroneous predictions and raise false alarms in the wear-out prediction model. Industries spend millions of dollars to remove the noise from these signals. However, the manual data-cleansing process is laborious, time-consuming and error-prone.
The first challenge is to generate a prediction model that is robust to missing attributes, i.e., the stuck-at-zero condition. The second challenge is to ensure that the prediction model is robust against noisy attributes. Solving these two problems is among the foremost challenges that Bosch faces when predicting the health-state of a vehicle's components. To address these challenges, we propose:
-  A technique for building prediction models that are robust to faulty or missing attributes.
-  A strategy, built upon the data augmentation technique, for handling noise in the input attributes.
To enhance the robustness of the predictions in spite of faulty attributes, we propose using prediction models that do not rely on a small set of signals. Our approach is founded upon Dropout, a well-known regularization technique used in the training of Artificial Neural Networks (ANNs). Dropout randomly removes a few attributes during training. This forces the ANN to use more attributes during the training phase instead of relying on a single small subset. Moreover, randomly dropping ANN units during training simulates the real-world situation of sensor failure. To address the second challenge of noisy inputs, we train ANNs with a certain magnitude of synthetically generated noise in the training data. By replacing the values of attributes in the training data with random values from a Gaussian distribution, we indirectly simulate the noisy behavior of the sensors. This allows the ANN to learn the contribution of each feature to the output prediction amidst distorted inputs. Bosch provided a labeled dataset related to the health-state of the fuel system. Using this automotive data, we tested the robustness of our framework in a real-world scenario.
2 Related Work
As elaborated in the previous section, we first aim to perform predictions based on a large subset of attributes to avoid incorrect predictions during sensor failure. Second, we aim to augment the training data to enhance the network's ability to identify relevant patterns amidst noisy input data.
Preprocessing techniques for handling noisy and missing input attributes have been of great interest in the data mining community [20, 32, 29, 14]. The aforementioned methods have their own strengths and weaknesses. However, in real-world applications, we do not know the type of noise that can interfere with the sensor measurements. As mentioned in Section 1, valid sensor measurements can be stuck-at-zero in case of a malfunction. Applying imputation techniques to extrapolate these values, as in a missing-value problem, is not desirable. Hence, it is not pragmatic to apply these data preprocessing techniques in real-world applications.
Feature selection algorithms predominantly focus on selecting a set of attributes relevant to the prediction task [16, 6, 23]. The recent Relevance and Redundancy ranking framework has experimentally been shown to be robust amidst noisy target labels. However, we focus on building a prediction model using a large number of attributes to enhance the robustness of predictions. Moreover, our application scenario involves noisy input attributes, not noisy target labels.
Multi-view learning algorithms perform predictions based on multiple attribute subsets. If an attribute fails in one subset, the predictions can be supported by attributes from other subsets. However, existing multi-view approaches [24, 17] do not discuss the effect of faulty input attributes, nor are they resistant to multiple sensor failures that can occur across all of the attribute subsets.
Pruning of decision trees was introduced to avoid overfitting to noisy training data. As classifiers learned from noisy data have lower accuracy, pruning may have a very limited effect in enhancing the system's performance, especially when the noise level is relatively high.
The dropout technique in ANNs is similar to the idea of pruning in decision trees. Dropout is a regularization technique that eliminates random units of the neural network to avoid overfitting. However, in this work we use this regularization technique because performing dropout on the inputs is analogous to the real-world scenario of sensor failure.
Adding noise to the training data is reported to enhance the generalization of ANNs by forcing more hidden units to be used. Hence, to address the second problem of noisy input attributes, we inject artificially generated noise into the training data. By training the prediction model with this injected noise, we aim to enhance its ability to identify relevant patterns amidst noise in real-world scenarios.
Hence, in contrast to the preprocessing techniques, our work challenges the prediction model during the training phase by forcing it to learn relevant patterns amidst noise.
3 Problem Definition
As explained in Section 1, we first address the problem of building prediction models with inputs obtained from malfunctioning sensors. Hence, we begin with a formal definition of a faulty sensor.
Malfunction of sensors
Assume a -dimensional attribute space , in which a subset of the sensors is defective. This means that each affected attribute is stuck at zero and continuously generates null values.
The second problem concerns noise in the sensor data, so we formally define the behavior of a noisy sensor.
Assume a subset of sensors that is subjected to intermittent deviations or disturbances. This means that random instances of the affected attributes fluctuate to absurd values and deviate from the actual measurements.
We denote the accuracy of a prediction model trained using the attribute space as . We focus on enhancing the robustness of the predictions such that, in the event of a sensor failure, we aim to obtain an accuracy greater than or equal to that of a prediction model with all valid measurements.
An analogous requirement holds in the case of a noisy sensor.
4 Artificial Neural Networks
To obtain a deeper understanding of the dropout technique, it is necessary to revisit the basics of ANNs. ANNs are machine learning algorithms inspired by the biological nervous system and are capable of identifying complex non-linear relationships. Information is processed using a set of highly interconnected nodes, also referred to as neurons. A network of weighted nodes is stacked into multiple layers. At each node, an activation function combines the weighted inputs into a single value. This can effectively limit the signal propagation to the next layers. These weights, therefore, enforce or inhibit the activation of the network's nodes. This process is comparable to feature selection. Additionally, ANNs require minimal attribute engineering for classification [4, 31] and regression problems. This enables ANNs to autonomously identify distinct patterns in the input attributes amidst noise. Hence, with embedded feature selection and the ability to identify distinct patterns with minimal preprocessing, we chose ANNs as an ideal candidate for our experiments. The ANN architecture is typically split into three types of layers: one input layer, one or more hidden layers, and one output layer (c.f. Figure 1). The input layer consumes the data. This layer connects to the first hidden layer, which in turn connects either to the next hidden layer (and so on) or to the output layer. The output layer returns the ANN's predictions.
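As a concrete illustration, the layer-by-layer signal propagation described above can be sketched in a few lines of NumPy. The layer sizes, the ReLU hidden activation and the softmax output below are illustrative assumptions for this sketch, not necessarily the exact setup used in our experiments:

```python
import numpy as np

def relu(z):
    # Hidden-layer activation: negative inputs are suppressed,
    # which limits signal propagation to the next layer.
    return np.maximum(0.0, z)

def softmax(z):
    # Output-layer activation: turns raw scores into class probabilities.
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stabilised
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, layers):
    """Propagate an input vector through a list of (weights, bias)
    pairs: ReLU on the hidden layers, softmax on the output layer."""
    a = x
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W, b = layers[-1]
    return softmax(a @ W + b)
```

For instance, a toy network with a 5-attribute input, one hidden layer of 4 units and 3 output classes returns a probability vector over the classes that sums to one.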
4.1 Dropout
Dropout has proven to be an effective regularization technique for ANNs. Technically, it prevents the units from co-adapting too much and consequently avoids over-fitting while training the network. Dropping or removing a unit implies that both the input and output connections of the neuron are disconnected. In Figure 2, we provide an illustration of networks with fully connected and dropped-out units. The principal idea of dropout involves removing random units from a layer (both hidden and visible) by setting their activations to zero. That is, when applied to the input layer, the activations of the selected neurons are nullified. Therefore, applying dropout to the input layer is analogous to sensor failure in a real-world scenario (c.f. Definition 1). By training the ANNs with dropout, we indirectly aim to make the network aware of these failures.
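The nullification of input activations can be sketched with the textbook "inverted dropout" mask in NumPy. Note that this variant zeroes each attribute with a fixed probability and rescales the survivors, whereas our experiments drop a fixed number of input attributes, so treat the function below as an illustrative assumption rather than our exact implementation:

```python
import numpy as np

def input_dropout(x, p, rng):
    """Zero each input attribute with probability p, mimicking a
    failed sensor; surviving inputs are rescaled by 1/(1-p) so the
    expected activation seen by the first hidden layer is unchanged."""
    mask = rng.random(x.shape) >= p
    return np.where(mask, x / (1.0 - p), 0.0)
```

Applied to an all-ones input vector with p = 0.3, roughly 30% of the entries come out as zero and the rest as 1/0.7.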
The abstract concept of Dropout sounds very similar to the ensemble technique used by Random Forests. A Random Forest aggregates predictions from multiple views of the data based on a number of decision trees that use randomly selected subsets of attributes. Similarly, dropout networks essentially train different networks on multiple subsets of the attributes. However, on closer inspection, there are considerable differences between the two (c.f. Table 1).
|Random Forest||Dropout Network|
|All data samples are used.||A single sample is used to train a model.|
|Each tree has independent parameters.||The models share their parameters.|
|Arithmetic mean to combine the results.||Equally weighted geometric mean to combine the results.|
Dropping random neurons in each iteration enables every hidden unit to learn to identify relevant patterns from a randomly chosen sample of neurons in the preceding layer. This makes each hidden layer robust and drives it to create useful features on its own without requiring the subsequent layers to correct its mistakes. A recent study also shows that dropout networks are comparatively more accurate than Random Forests for multi-class classification problems.
4.2 Data Augmentation
As explained in Section 1, in automotive applications, exposing sensors to harsh environmental conditions over a prolonged period of time can cause the sensor values to be distorted due to electrical or magnetic interference. Hence, training machine learning models to identify relevant patterns irrespective of noisy attributes is of paramount importance. To mimic the problem of noisy sensors (c.f. Definition 2) in real-world applications, we performed data augmentation on our training data. Data augmentation is a concept introduced in the image classification literature. It involves transforming the original data (e.g., rotation, zoom, rescaling and cropping) to avoid over-fitting. For example, to build text-to-speech models, data is collected from unfiltered Web pages that contain errors. Rather than using the large unstructured data for learning useful patterns, a small corpus of structured data is extracted and augmented. It is then used to train the machine learning model. This technique has also proven to be effective on unfiltered data that contain errors.
We adopt the concept of data augmentation and tailor it to address our second challenge (c.f. Section 1), i.e., noisy attributes. We replace random attributes in the dataset with noise. That is, we deliberately introduce noise into the original training data and then train our models using this transformed dataset. In practical terms, the values of a randomly selected subset of attributes in each instance are replaced with random values drawn from a Gaussian distribution with mean zero and standard deviation one, i.e., N(0, 1). Hence, by training the models with certain levels of noise, we enhance their robustness against sensor failures in the real world.
In Sections 4.1 and 4.2 we justified the use of dropout and data augmentation to address the problems we are confronted with (c.f. Section 3). The theoretical concepts of dropout and data augmentation emulate the real-life situations of sensor failure and noise, respectively. However, their practical application raises two major questions:
-  What is the magnitude of dropout to be used?
-  What is the level of augmentation to be applied for the transformation of the training data?
To answer these questions, we train multiple models with different levels of input dropout and data augmentation. These models are evaluated on test data, and we use the prediction accuracy as a quality measure. We explain the finer details based on the dataset we use.
In this work, we apply the proposed methodology to an automotive dataset. We are provided with a high-dimensional attribute space of 149 attributes and 4 million instances. The attributes are obtained from various sensor sources present in the vehicles. They also include signals that are calculated in the vehicle hardware using the sensor measurements. The goal is to predict the target classes that represent the health-state of an automotive fuel system. Therefore, we are provided with target labels () of nominal values, and the dataset is denoted as . Code and data are available at https://figshare.com/s/d5bcd9b4269afa642e53.
Table 2 shows the distribution of the different classes in the dataset. As the data for each health-state was obtained from different vehicles, each instance can be seen as a snapshot of the fuel system. In other words, the dataset is not a time series, and the health-states are therefore not correlated in time. For such stationary datasets, feedforward neural networks (FNNs) are preferable to recurrent neural networks (RNNs).
|Class||Health state||Class distribution|
The dataset is split into training and testing parts based on the chronology of the data collection. That is, training is performed using data collected at a specific time of the year (e.g., January) and testing is performed on data collected at a different time (e.g., August). Both the train and test datasets were standardized by subtracting the mean and dividing by the standard deviation. This is also referred to as the z-score or standard score.
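Concretely, the standardization can be sketched as below. Applying the training-set statistics to the test split is our assumption of the usual convention, since the text does not specify which statistics were used for the test data:

```python
import numpy as np

def zscore(X_train, X_test):
    """Standardize both splits with the training-set mean and
    standard deviation (z-score / standard score)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # guard constant attributes
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```

After the transformation, every attribute of the training split has zero mean and unit standard deviation.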
The training dataset is used to train 7 different networks, each with a different magnitude of input dropout. For example, denotes an ANN model with a dropout of 20 nodes in the input layer. Similarly, we instantiate multiple networks () with varying dropout levels of attributes, respectively.
Given an ANN architecture and a dropout level, dropout can be applied between any two consecutive layers. Nevertheless, we aim to study the influence of dropout between the input and the first hidden layer. This implicitly means that each model is trained to predict with a different number of faulty sensors. However, a constant dropout rate of was still used in the hidden layers for regularization purposes. Dropping a neuron technically means setting its activations to zero. Hence, we transform the original dataset to mimic the dropout process in the input layer by setting the corresponding attribute values to zero. The reason for setting attribute values to zero instead of using dropout in the input layer of the ANNs is that it allows us to simulate an equivalent dropout in the test dataset as well. The corresponding test datasets are denoted as . Moreover, this experimental setting is comparable to the problem of a failed sensor that is stuck-at-zero (c.f. Definition 1). For simplicity, we refer to the original train and test datasets as and , respectively. The goal of the experiment is to identify the level of dropout that has the maximal accuracy on the unseen test data.
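The stuck-at-zero transformation of a dataset can be sketched as follows; this is a minimal NumPy illustration of Definition 1, with names of our own choosing:

```python
import numpy as np

def stuck_at_zero(X, failed_sensors):
    """Simulate the stuck-at-zero condition: the listed sensor
    columns deliver a constant zero for every instance."""
    X_failed = X.copy()
    X_failed[:, failed_sensors] = 0.0
    return X_failed
```

The original array is left untouched, so the same dataset can be transformed with several different failure patterns.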
In the case of augmentation, injecting noise into all instances of a single subset of attributes is not challenging for the network, because the ANN will simply neglect these attributes during training by inhibiting the corresponding network nodes. Hence, for each instance of the attribute space , a random attribute subset of size (where ) is selected and replaced with random values from a Gaussian distribution (c.f. Algorithm 1). In our experiments, denote different variants of the training data with , respectively. For example, represents a dataset where, for each instance, 20 random attributes of the training data are replaced by random numbers from a Gaussian distribution. By applying this transformation, our goal is to imitate the real-world scenario of noisy sensors and analyze the influence of different noise levels in the input attributes. The corresponding transformation is also applied to the test data and is denoted as .
In electrical applications, white noise is also a commonly observed anomaly in sensor measurements. Hence, we also generate test datasets with white noise, i.e., . For the generation of data with white noise, we follow the same sequence of steps explained in Algorithm 1. However, instead of replacing the values (c.f. Line 5 in Algorithm 1), we add random values from to the valid measurements in an instance. As a rule of thumb, all experiments in the forthcoming section use an FNN architecture with an input layer of 149 neurons, three hidden layers of 128, 256 and 128 neurons, and an output layer of 7 neurons.
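The additive white-noise variant differs from the replacement step only in Line 5 of Algorithm 1. A minimal sketch, where the names and the per-instance subset size k are our assumptions:

```python
import numpy as np

def add_white_noise(X, k, rng):
    """For every instance, add N(0, 1) draws to k randomly chosen
    attributes instead of replacing their values."""
    X_noisy = X.copy()
    n_instances, n_attrs = X.shape
    for i in range(n_instances):
        idx = rng.choice(n_attrs, size=k, replace=False)
        X_noisy[i, idx] += rng.standard_normal(k)
    return X_noisy
```

Unlike the replacement variant, the corrupted attributes keep a trace of the valid measurement, which matches the intuition of white noise superimposed on a signal.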
6 Experimental Results
As described in Section 5, we have 4 types of data: training data with dropout, test data with dropout, training data with noise and test data with noise. To test the influence of dropout and noisy attributes on the test accuracy, we begin with an individual analysis of each technique.
6.1 Input drop
In this section, we experiment with ANNs trained with different levels of input dropout. In the first experiment, we trained multiple networks with the datasets . Each of these models was then evaluated on all test datasets that were subjected to the same input-drop process, denoted as , respectively. The results are illustrated in Figure 3. The network trained on the original data, i.e., , is accurate when tested on datasets with low or no dropout, i.e., and . From this point onwards, its accuracy declines steeply with an increasing number of dropped inputs in the test dataset, until it reaches an accuracy of 0.5 for . Interestingly, we observe that the models trained on datasets with a larger number of dropped inputs are comparatively more robust to test data with a large number of dropped inputs.
Moreover, they also maintain a high accuracy on test datasets that have more dropped inputs than the one used for training. From the experimental analysis, we observe that the average accuracy over all test datasets using is higher than for the other models. It is therefore much more robust than the model with no dropped units. Let us assume is used in a real-world scenario to predict the health of the fuel system. In spite of the failure of 100 sensors () that are used as input attributes for the prediction model, the predictions will still have an approximate accuracy of 0.85. Hence, the idea of dropout helps us tackle the problem of failed sensors in real-world prediction systems (c.f. Section 3).
6.2 Input noise
The above dropout experiment does not solve our problem completely, because a noisy sensor will not be seen as missing data. Instead, it will give us a wrong measurement. For this reason, we performed a second experiment in which we test the input-dropout models, i.e., , on scenarios where the data has faulty measurements. That is, we tested the dropout models on test data obtained from the input-noise approach, viz., . The behavior of the models is visually represented in Figure 4. In comparison to the previous experiment (c.f. Figure 3), all the models perform worse, as the decline in accuracy happens much earlier in Figure 4. This is not surprising, because training was performed with the dropout technique without noise, while testing was performed on noisy data. Hence, the network is unaware of the noise in the test data. Nevertheless, by comparing the behavior of with and , we observe that training models with input drop helps them to be more robust to noisy measurements, and had the best performance in terms of accuracy.
To make the network aware of noisy attributes, we performed a third experiment, in which we trained our models with the augmented dataset variants that include different levels of noise in the input data, i.e., . The corresponding networks trained using these datasets are denoted as . These models were validated on test data that underwent a similar transformation (c.f. Algorithm 1). The results are plotted in Figure 5.
In Figure 5 we observe that and have very similar behaviors. For example, is able to predict with an accuracy of 0.88 even when 40 sensor measurements are noisy. This represents around of the entire set of inputs. On the other hand, on test datasets with higher levels of noise, such as , and are unable to predict with high accuracy.
Moreover, when comparing Figures 4 and 5, the results indicate that the best way to deal with noisy sensors is to train the ANN with reasonable levels of noise. This makes the models more robust to defective sensor data in the real world.
Practically, our idea of injecting noise involves replacing instances of the attribute space with random values from a Gaussian distribution. This also includes zeros. For this reason, the noise models trained on the data also perform with high accuracy on test datasets with input dropouts (c.f. Figure 6). Here too, we observe that and have the best quality in comparison to the model trained with no random noise ().
Similarly, these models were robust on test data with white noise. For example, in Figure 7, for test data with extreme levels of white noise, i.e., , the accuracy of the models trained with our random noise (e.g., ) is better than that of the model trained using the original data ().
Overall, we observe that our proposed idea of injecting random noise into the instances of random features (c.f. Algorithm 1) enhances the robustness of the prediction model when malfunctioning and noisy sensors serve as inputs.
7 Conclusions and Future works
Bosch faces the challenge of generating prediction models with noisy and defective input attributes for applications such as predictive diagnostics. The models initially developed by Bosch using different classification algorithms produced very accurate results. However, a closer analysis showed that all these different prediction models relied on the same set of sensors. Predictions based on a single set of relevant sensors were not robust in the presence of faulty sensor data. Hence, we proposed and tested two approaches to tackle this problem. The first approach (input drop) uses the dropout technique from ANNs in the input layer to make the model more robust against defective sensors. The second approach (input noise) introduces noise into the training datasets, which can be seen as a way of simulating noisy sensors.
Based on our observations, the best level of dropout is between 60 and 80 attributes (i.e., between and of the attributes). As for the right level of augmentation, the results indicate that the model (i.e., around of the attributes) is ideal in terms of noisy and missing sensor data.
While the major advantages of ANNs are the effective and efficient modeling of complex non-linear systems, one downside is that training a model usually incurs high computational and storage costs. On the other hand, once an ANN is trained, it requires little effort to process the data. This way, such a system could be implemented in vehicles in a simple way. As future work, we intend to study whether this approach can be generalized to other application domains where sensor data are partially missing or faulty.
This research has obtained funding from the Electronic Components and Systems for European Leadership (ECSEL) Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under grant agreement number 662189-MANTIS-2014-1. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
-  J. B. Ali, B. Chebel-Morello, L. Saidi, S. Malinowski, and F. Fnaiech. Accurate bearing remaining useful life prediction based on weibull distribution and artificial neural network. Mechanical Systems and Signal Processing, 56–57:150 – 172, 2015.
-  D. Allred, J. M. Harvey, M. Berardo, and G. M. Clark. Prognostic and predictive factors in breast cancer by immunohistochemical analysis. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc, 11(2):155–168, 1998.
-  R. Arandjelović and A. Zisserman. Three things everyone should know to improve object retrieval. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2911–2918. IEEE, 2012.
-  W. G. Baxt. Use of an artificial neural network for data analysis in clinical decision-making: The diagnosis of acute coronary occlusion. Neural Computation, 2(4):480–489, 1990.
-  L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
-  G. Chandrashekar and F. Sahin. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28, 2014.
-  K. Cho, B. van Merrienboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pages 1724–1734, 2014.
-  M. Dziubiński, A. Drozd, M. Adamiec, and E. Siemionek. Electromagnetic interference in electrical systems of motor vehicles. In IOP Conference Series: Materials Science and Engineering, volume 148, page 012036. IOP Publishing, 2016.
-  K. Elleithy and T. Sobh. Innovations and advances in computer, information, systems sciences, and engineering, volume 152. Springer Science & Business Media, 2012.
-  P. Haeusser. How computers learn to understand our world, 2018.
-  N. Jaques and J. Nutini. A comparison of random forests and dropout nets for sign language recognition with the kinect.
-  M. Lázaro Gredilla. Sparse gaussian processes for large-scale machine learning. 2010.
-  H. R. Maier and G. C. Dandy. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling & Software, 15(1):101 – 124, 2000.
-  J. I. Maletic and A. Marcus. Data cleansing: Beyond integrity analysis. In Iq, pages 200–209. Citeseer, 2000.
-  A. H. Moghaddam, M. H. Moghaddam, and M. Esfandyari. Stock market index prediction using artificial neural network. Journal of Economics, Finance and Administrative Science, 21(41):89 – 93, 2016.
-  L. C. Molina, L. Belanche, and À. Nebot. Feature selection algorithms: A survey and experimental evaluation. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), 9-12 December 2002, Maebashi City, Japan, pages 306–313, 2002.
-  N. C. Oza, K. Tumer, and P. Norvig. Dimensionality reduction through classifier ensembles. 1999.
-  L. Perez and J. Wang. The effectiveness of data augmentation in image classification using deep learning. ArXiv e-prints, Dec. 2017.
-  J. R. Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.
-  T. C. Redman and A. Blanton. Data quality for the information age. Artech House, Inc., 1997.
-  A. N. Refenes, A. Zapranis, and G. Francis. Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks, 7(2):375–388, 1994.
-  P. Reuss, R. Stram, K. Althoff, W. Henkel, and F. Henning. Knowledge engineering for decision support on diagnosis and maintenance in the aircraft domain. In Synergies Between Knowledge Engineering and Software Engineering, pages 173–196. 2018.
-  A. K. Shekar, T. Bocklisch, P. I. Sánchez, C. N. Straehle, and E. Müller. Including multi-feature interactions and redundancy for feature ranking in mixed datasets. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Skopje, Macedonia, September 18-22, 2017, Proceedings, Part I, pages 239–255, 2017.
-  A. K. Shekar, P. I. Sánchez, and E. Müller. Diverse selection of feature subsets for ensemble regression. In Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Lyon, France, August 28-31, 2017, Proceedings, pages 259–273, 2017.
-  J. Sietsma and R. J. Dow. Creating artificial neural networks that generalize. Neural networks, 4(1):67–79, 1991.
-  A. Smola and V. Vapnik. Support vector regression machines. Advances in neural information processing systems, 9:155–161, 1997.
-  N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
-  J. V. Tu. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49(11):1225–1231, 1996.
-  R. Y. Wang, V. C. Storey, and C. P. Firth. A framework for analysis of data quality research. IEEE transactions on knowledge and data engineering, 7(4):623–640, 1995.
-  D. Warde-Farley, I. J. Goodfellow, A. Courville, and Y. Bengio. An empirical analysis of dropout in piecewise linear networks. arXiv preprint arXiv:1312.6197, 2013.
-  B. Widrow, D. E. Rumelhart, and M. A. Lehr. Neural networks: Applications in industry, business and science. Commun. ACM, 37(3):93–105, 1994.
-  X. Zhu and X. Wu. Class noise vs. attribute noise: A quantitative study. Artificial intelligence review, 22(3):177–210, 2004.