WaveletFCNN: A Deep Time Series Classification Model for Wind Turbine Blade Icing Detection

Wind power, as an alternative to burning fossil fuels, is plentiful and renewable. Data-driven approaches are increasingly popular for inspecting wind turbine failures. In this paper, we propose a novel classification-based anomaly detection system for icing detection of wind turbine blades. We effectively combine deep neural networks and the wavelet transform to identify such failures sequentially over time. In the training phase, we present a wavelet-based fully convolutional neural network (FCNN), namely WaveletFCNN, for time series classification. We improve the original FCNN by augmenting its features with wavelet coefficients. WaveletFCNN outperforms the state-of-the-art FCNN for univariate time series classification on the UCR time series archive benchmarks. In the detecting phase, we combine sliding-window and majority-vote algorithms to provide timely monitoring of anomalies. The system has been successfully implemented on a real-world dataset from Goldwind Inc., where the classifier is trained on a multivariate time series dataset and the monitoring algorithm is deployed to capture abnormal conditions in signals from a wind farm.



1. Introduction

Time series data is becoming ubiquitous due to the rapid development of the Internet of Things (IoT). Diversified sensors collect abundant data for further analysis in various domains, such as health monitoring (Hossain and Muhammad, 2016), smart manufacturing (Wang et al., 2018), and energy management (Shahriar and Rahman, 2015).

Wind energy, as an alternative to burning fossil fuels, is clean, renewable, and widely distributed. Therefore, wind energy is increasingly popular for generating electric power (Fthenakis and Kim, 2009). However, many wind farms are located in areas with a high probability of ice occurrence. Figure 1 illustrates a wind turbine facing an icing condition. Blade icing may cause serious problems, such as measurement errors, power losses, overproduction, mechanical failures, and electrical failures (Parent and Ilinca, 2011). As a consequence, icing detection becomes a priority in order to avoid such problems.

Traditionally, applied physics and mechanical engineering research resolves this problem by designing and installing new physical detectors. Various techniques, such as damping of ultrasonic waves (Luukkala, 1995), measurement of the resonance frequency (Carlsson, 2010), thermal infrared radiometry (Muñoz et al., 2016), optical measurement techniques (Pegau et al., 1992), and ultrasonic guided waves (Muñoz et al., 2018), have been applied to icing detection. However, such techniques are limited by high costs and energy demands. Besides, ice sensors may provide inaccurate estimates of icing risks for wind turbines due to their internal unreliability (Parent and Ilinca, 2011). Worse still, the installation of such detectors may require mechanical changes to the wind turbine and considerable manpower. As a result, engineers in practice usually monitor the power curve of the wind turbine to estimate the blades' icing condition in order to determine whether to trigger the deicing procedure. So here is the fundamental question we want to explore in this paper: Is it possible to analyze only the signals from the standard pre-installed sensors (e.g., from SCADA) in the wind turbine in order to design a deployable system for blade icing detection?

To this end, we propose a data-driven approach to detect blade icing precisely on real-time signals so that the deicing procedure can be started automatically with a very short response time. We formalize this anomaly detection task as two phases: (i) a training phase to obtain a time series classifier, where the input sequences are generated by the currently installed general-purpose sensors that record the weather and turbine conditions, such as wind speed, internal temperature, yaw position, pitch angle, power output, etc.; (ii) a detecting phase, in which we propose an algorithm based on a sliding window and majority vote that applies the trained classifier to provide accurate and robust detection of blade icing situations.

Time series classification is a classical time series analysis task. Famous approaches include dynamic time warping (DTW) (Berndt and Clifford, 1994), Bag-of-Words (BoW) (Lin et al., 2007), and the time series bag-of-features (TSBF) (Baydogan et al., 2013), where a group of features is extracted to feed classifiers such as the nearest neighbor (NN) and support vector machine (SVM). Additionally, spectral features also play an important role in feature engineering for time series classification. Spectral analysis first obtains frequency-based features by converting the time-based signal to the frequency domain using mathematical tools such as the Fourier transform or the wavelet transform, so that information about the periodicity and volatility of the signals appears. Such spectral features have been widely used in time series classification applications, such as speech recognition (Bou-Ghazale and Hansen, 2000).

Recently, deep neural networks have achieved great success in a variety of areas, such as computer vision and text mining. Deep neural networks have also been applied to time series classification with great success. For example, (Wang et al., 2017) proposes strong deep baselines in which deep multilayer perceptrons (MLP), fully convolutional neural networks (FCNN), and residual networks (ResNet) are trained on raw sequences for univariate time series classification. (Karim et al., 2018) provides an augmentation of FCNN by including sub-modules of long short-term memory (LSTM) (Sak et al., 2014) recurrent neural network units. One attractive claim about deep learning is that the knowledge about the task can be automatically captured by tuning a large number of parameters, so that heavy crafting in data preprocessing and feature engineering can be avoided. While this is true in some scenarios, we argue that carefully selected features combined with carefully designed neural network architectures provide further improvement.

In this paper, in order to design a more accurate classifier, we improve fully convolutional neural networks by augmenting the input features with orthonormal discrete wavelet transform coefficients, which represent the variance of the sequence across multiple scales. The decomposition of the original signal reveals the information embedded at multiple resolutions. Unlike the discrete Fourier transform, which keeps only the spectral information, the discrete wavelet transform preserves information in both the time and frequency domains. Wavelet multi-resolution analysis can provide helpful information to fully convolutional neural networks, since an ordinary convolutional layer can learn information only within a fixed region constrained by the filter size. On the other hand, even with the enhanced classifier, it is still challenging to provide robust predictions in real time. In an attempt to address this issue, we design an anomaly monitoring algorithm combining a sliding window and majority vote, so that the inaccuracy and instability that are inevitable in the classifier's output are minimized.

Our contribution. Specific contributions of this work are to:

  • Combine convolutional neural networks with wavelet multi-resolution analysis as an improved model for sequence classification.

  • Achieve enhanced accuracy on the UCR archive (Chen et al., 2015) for univariate time series classification.

  • Design a novel anomaly monitoring algorithm to provide accurate and robust real-time ice detection.

  • Obtain promising results in detecting frozen blades on wind turbines by applying the anomaly monitoring algorithm to real-life signals from a wind farm.

2. Related Work

2.1. Wind Turbine Anomaly Detection

In the data mining community, anomaly detection is an important problem that has been researched within diverse application domains (Chandola et al., 2009). Based on the availability of labeled data, anomaly detection techniques fall into three main categories: a supervised mode, where labeled instances of both the normal and anomaly classes are accessible; a semi-supervised mode, where the training set includes only normal samples; and an unsupervised mode, which does not require any training data.

Wind farms are usually located in remote mountainous or rough sea regions, which makes monitoring and maintenance more challenging. A fault detection system helps to avoid premature breakdown, to reduce maintenance costs, and to support the further development of a wind turbine (Ciang et al., 2008; Lu et al., 2009). Traditionally, anomaly detection research from the applied physics and mechanical engineering communities focuses on designing new physical detectors. For example, various detectors with advanced techniques have been proposed (Luukkala, 1995; Carlsson, 2010; Muñoz et al., 2016; Pegau et al., 1992; Muñoz et al., 2018). Another general approach to wind turbine icing detection is to design physical models that infer ice accretion from power output signals; however, such approaches are usually validated by software simulations without testing in real-world scenarios (Saleh et al., 2012; Corradini et al., 2016).

On the other hand, data-driven approaches for wind turbine state monitoring and failure detection have gained more attention due to their ease of deployment compared to installing complicated detectors. Most research efforts in detecting various wind turbine failures are based on supervised anomaly detection techniques. For example, (Regan et al., 2016) utilizes logistic regression and support vector machines (SVM) to conduct acoustics-based damage detection of wind turbines' enclosed cavities; (Kusiak and Verma, 2012) and (Kuo, 1995) apply traditional neural networks to identify bearing faults and the existence of an unbalanced or loose blade in wind turbines; (Malik and Mishra, 2015) implements a 3-layer probabilistic neural network (PNN) to diagnose wind turbine imbalance faults based on generator signals; (Zhang et al., 2012) includes both time-domain and frequency-domain information and tries variant classifiers to detect changes in gearbox vibration excitement. Very recently, (Chen et al., 2018) also applied deep neural networks for blade icing detection of wind turbines, where a deep feature representation is learned by clustering and k-nearest neighbor is then used for prediction. However, such nonparametric learning algorithms require much more data to train and suffer seriously from overfitting. Worse still, the inference is more compute-intensive, which makes deployment on real-time systems more difficult.

2.2. Spectral Features in Deep Neural Networks

Spectral analysis transforms signals into the frequency domain to reveal information invisible in the time domain. Among spectral approaches, the wavelet transform has been widely applied in data mining research (Li et al., 2002). The wavelet transform keeps both time and frequency information to construct a multi-resolution analysis of the signal. The signal is adaptively filtered by short windows at high frequencies and long windows at low frequencies, so that the wavelet spectrogram is able to capture the interactions between time and frequency information, which has shown strength in signal processing and data mining.

Although one romanticized advertisement for deep neural networks is that heavily crafted feature engineering becomes unessential, since the models are advanced enough to work this out by themselves, plenty of works illustrate that feature engineering still plays an important role in obtaining promising results. In theory, a recurrent neural network (RNN) (Mikolov et al., 2010) can use its internal memory to learn arbitrary connections within a sequence. However, in practice, elaborately designed neural network architectures combined with spectral features show an advantage in plenty of time series applications. For instance, (Zhang et al., 2017) introduces a state frequency memory (SFM) recurrent network to capture the multi-frequency patterns in previous trading data to make long- and short-term stock price predictions over time; (Han et al., 2018) proposes a multi-frequency decomposition (MFD) method based on the real fast Fourier transform (RFFT), which can be added to neural networks as a layer, in order to enhance the performance of fully convolutional neural networks; (Zhao et al., 2018) applies the wavelet transform to capture time-frequency features and combines these features with other neural network architectures, such as CNN, LSTM, and the attention mechanism, to improve the prediction accuracy in time series forecasting. Besides their wide usage in time series applications, neural networks enhanced by spectral features also demonstrate good performance in computer vision applications (Chen et al., 2016; Jin et al., 2017; Fujieda et al., 2018), which we will not enumerate exhaustively due to the space limit.

3. WaveletFCNN Architecture

Figure 2. The architecture of WaveletFCNN.

In this section, we first review some important concepts and properties of the discrete wavelet transform, then introduce the design of the WaveletFCNN architecture, which takes advantage of the good properties of the discrete wavelet transform.

3.1. Discrete Wavelet Transform

The discrete wavelet transform decomposes a discrete time signal into a discrete wavelet representation (Chun-Lin, 2010). Formally, given $x[n]$, $n = 0, 1, \ldots, N-1$, representing a length-$N$ signal, and basis functions of the form $\phi_{j_0,k}[n]$ (scaling functions) and $\psi_{j,k}[n]$ (wavelet functions), the coefficients for each translation (indexed by $k$) at each scale level (indexed by $j_0$ or $j$) are projections of the signal onto each of the basis functions:

$$W_\phi[j_0, k] = \frac{1}{\sqrt{N}} \sum_{n} x[n]\, \phi_{j_0,k}[n], \qquad W_\psi[j, k] = \frac{1}{\sqrt{N}} \sum_{n} x[n]\, \psi_{j,k}[n], \quad j \ge j_0,$$

where $W_\phi[j_0, k]$ is called an approximation coefficient, and $W_\psi[j, k]$ is called a detail coefficient.

The detail coefficients at different levels reveal variances of the signal on different scales, while the approximation coefficients yield the smoothed average on that scale. One important property of the discrete wavelet transform is that the detail coefficients at each level are orthogonal; that is, for any pair of detail basis functions not at the same level, the inner product is $0$:

$$\langle \psi_{j,k}, \psi_{j',k'} \rangle = 0 \quad \text{for } j \ne j'.$$

As a result, we can interpret the detail coefficients as an additive decomposition of the signal called multi-resolution analysis. There are abundant reasons to believe that this wavelet spectrum, which represents the variance decomposition, can provide useful information for time series classifiers that is difficult to learn in an end-to-end neural network.
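To make the multi-resolution analysis concrete, the following sketch uses the PyWavelets library to run the pyramid decomposition on a synthetic signal and verify the additive-decomposition property; the `db4` wavelet, the decomposition level, and the test signal are our own illustrative choices, not taken from the paper.

```python
import numpy as np
import pywt

# A noisy synthetic signal standing in for one sensor channel.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.1 * rng.standard_normal(512)

# Pyramid algorithm: decompose to level 3 -> [cA3, cD3, cD2, cD1].
level = 3
coeffs = pywt.wavedec(signal, "db4", level=level)

# Multi-resolution analysis: reconstruct each coefficient band in isolation;
# the resulting components form an additive decomposition of the signal.
bands = []
for i in range(len(coeffs)):
    isolated = [np.zeros_like(c) for c in coeffs]
    isolated[i] = coeffs[i]
    bands.append(pywt.waverec(isolated, "db4"))

# The components sum back to the original signal (perfect reconstruction).
reconstruction = np.sum(bands, axis=0)
assert np.allclose(reconstruction[: len(signal)], signal)
```

The detail bands (`cD1`–`cD3`) are what WaveletFCNN feeds to its sub-networks; the approximation band (`cA3`) is discarded, as discussed in the next subsection.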

3.2. Enhanced Fully Convolutional Neural Networks

In an attempt to leverage the good properties of the discrete wavelet coefficients, we modify the original fully convolutional neural network as illustrated in Figure 2. In a nutshell, we first compute the discrete wavelet coefficients of the input signal down to a specific level, which can be viewed as a hyper-parameter of WaveletFCNN, according to the famous pyramid algorithm (Vishwanath, 1994); then we feed the original signal and the detail coefficients at each level into separate sub convolutional neural networks in order to capture the knowledge embedded at different scales of the wavelet spectrum; lastly, the global pooling outputs from each sub network are concatenated for generating the final classification outcome. It is worth noting that we do not feed the approximation coefficients into the neural network, for two reasons: (i) the approximation coefficients represent smoothed averages of the input signal, and such knowledge should be easily learned by the convolutional layers when processing the original signal; (ii) unlike the detail coefficients, the approximation coefficients are not orthogonal to each other, and the redundancy of the input would enlarge the parameter space of the neural network, which sets obstacles for both model training and inference.

Concretely, the detailed description about each layer in WaveletFCNN is enumerated below:

  • Input: The input of WaveletFCNN is a time series sample of size $N \times D$, where $N$ is the length and $D$ is the dimension of the signal. For univariate sequences, $D$ is 1. The detail coefficients are then computed from the input signal by the pyramid algorithm until reaching the target level $L$ specified as a hyper-parameter. The size of the level-$i$ detail coefficient layer is $(N/2^i) \times D$, where $i = 1, \ldots, L$.

  • Conv1D: The convolutional layer is the essential building block of the neural network; it computes the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and the small region they are connected to in the input volume.

  • BN and ReLU: The batch normalization layer (Ioffe and Szegedy, 2015), as an alternative to dropout, reduces the amount by which the hidden unit values shift around and acts as a regularizer to prevent overfitting. The ReLU layer (Nair and Hinton, 2010) applies an element-wise activation function that thresholds activations at zero.

  • Global Average Pooling: At the end of each sub network, a global average pooling layer (Zhou et al., 2016) is applied to reduce the temporal dimension by computing the average within each convolutional channel. Compared to a fully connected layer, the global average pooling layer can minimize overfitting by reducing the total number of parameters in the model.

  • Concatenate: The concatenate layer merges the outcomes from each sub neural network into a single tensor.

  • Softmax: The softmax layer is the most widely applied activation function for multi-class classification, where the softmax function computes a categorical probability distribution giving the probability that each of the classes is true.

This proposed architecture effectively combines the spectral features and the fully convolutional neural networks, which generally enhances the classifier’s accuracy as we will show in the experimental results.
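The architecture of Figure 2 can be sketched in PyTorch as follows. This is a minimal sketch under our own assumptions, not the paper's exact configuration: we use a Haar wavelet, three stacked Conv-BN-ReLU stages per branch with the common FCN channel sizes (128/256/128, kernel sizes 8/5/3), and leave the softmax to the cross-entropy loss.

```python
import pywt
import torch
import torch.nn as nn


class SubFCN(nn.Module):
    """One FCN branch: Conv1D -> BN -> ReLU stages plus global average pooling."""

    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 128, kernel_size=8, padding=4),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, padding=2),
            nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x).mean(dim=-1)  # global average pooling over time


class WaveletFCNN(nn.Module):
    """Raw signal plus each detail-coefficient level feeds its own branch;
    the pooled outputs are concatenated and linearly classified."""

    def __init__(self, in_channels, num_classes, level=2, wavelet="haar"):
        super().__init__()
        self.level, self.wavelet = level, wavelet
        self.branches = nn.ModuleList([SubFCN(in_channels) for _ in range(level + 1)])
        self.fc = nn.Linear(128 * (level + 1), num_classes)

    def forward(self, x):  # x: (batch, channels, time)
        # Pyramid algorithm; coeffs = [cA_L, cD_L, ..., cD_1].
        coeffs = pywt.wavedec(x.detach().numpy(), self.wavelet, level=self.level, axis=-1)
        details = [torch.from_numpy(d).float() for d in coeffs[1:]]  # drop cA_L
        pooled = [b(inp) for b, inp in zip(self.branches, [x] + details)]
        return self.fc(torch.cat(pooled, dim=-1))
```

Because global average pooling removes the temporal dimension, each branch can accept inputs of different lengths, which is what lets the raw signal and the shorter detail-coefficient sequences share one architecture.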

4. Anomaly Monitoring Algorithm

In order to generate accurate and robust anomaly detection in real time, we propose an algorithm that uses a sliding window and majority vote. We first define two variables: a window size $w$ and a step size $s$, where $s \le w$ and $s$ divides $w$. Imagine that the time series is partitioned into blocks of length $s$; our algorithm lets an active window of length $w$ move along the input time series in steps of size $s$. The trained classifier provides a prediction for the sequence within the active window. Each time the active window moves, a prediction is made, so that every block except the first block in the last active window gets a new prediction. In this way, each block accumulates predictions as the sliding window moves along the signal. As a result, we can use a majority vote to determine whether the current block is an anomaly or not. In our design, the majority vote can be made even more flexible by setting a threshold $\theta$: if the ratio of positive predictions is larger than or equal to the threshold, we generate a positive prediction, otherwise a negative one.
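The monitoring loop described above can be sketched in plain Python as follows; the function and variable names are ours, and `classify` stands in for the trained WaveletFCNN classifier applied to one window of signal.

```python
from collections import defaultdict


def monitor(signal, classify, window, step, threshold=0.5):
    """Slide a window of length `window` along `signal` in steps of `step`
    (with `step` dividing `window`), collect one classifier prediction per
    window position, and decide each block of length `step` by majority vote:
    positive iff the fraction of positive predictions reaches `threshold`."""
    assert window % step == 0
    votes = defaultdict(list)  # block index -> predictions covering that block
    for start in range(0, len(signal) - window + 1, step):
        pred = classify(signal[start:start + window])
        # Every block inside the active window receives this prediction.
        for block in range(start // step, (start + window) // step):
            votes[block].append(pred)
    return {b: int(sum(p) / len(p) >= threshold) for b, p in sorted(votes.items())}
```

A toy usage: with a classifier that flags windows whose mean exceeds 0.5, `monitor([0] * 40 + [1] * 40, lambda w: int(sum(w) / len(w) > 0.5), window=20, step=10)` marks the early blocks normal and the late blocks anomalous, with the vote smoothing out the transition region.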

Figure 3. An illustration of the online anomaly monitoring algorithm.

Figure 3 illustrates a detailed example of how this algorithm works. Suppose the signal is first split into blocks along the time axis, where for simplicity each block has length $s$ and the sliding window size is $4s$. Let us focus on the green block: when the sliding window (represented by the black frame) covers the green block for the first time, the classifier generates a prediction $p_1$. In the next three steps, as the window moves along the signal, the classifier sequentially generates predictions $p_2$, $p_3$, and $p_4$. Finally, we use a majority vote based on $p_1$, $p_2$, $p_3$, and $p_4$ to decide whether the green block indicates an anomaly due to frozen blades or not.

Design considerations. The whole design of this algorithm targets generating accurate and robust monitoring for blade icing detection. Since the decisions are based on majority votes, this scheme effectively reduces the risk of misclassifying any small region within the signal. One may worry that this approach demands intense computation for neural network inference; however, the SCADA system in reality provides signals at a very low sample rate. In fact, the sampling interval for turbine state monitoring is seven seconds. That is, even if we set the step size to exactly the sampling interval, there is sufficient time for model inference on any modern hardware. A detailed evaluation of this algorithm combined with the WaveletFCNN classifier is presented in the next section.

5. Evaluation

In this section, we describe our experimental evaluation of the WaveletFCNN classifier and the anomaly monitoring algorithm (all the source code and datasets are available in this GitHub repository: https://github.com/BinhangYuan/WaveletFCNN). The aim is to answer the following questions:

  • Do the discrete wavelet coefficients and the corresponding change of the convolutional neural network architecture generally enhance the classifier in terms of accuracy?

  • How accurately and robustly does the anomaly monitoring algorithm equipped with the improved classifier conduct anomaly detection of frozen blades on real-life signals from a wind farm?

In an attempt to answer these two questions, we conduct two sets of experiments. In the first experiment, we benchmark WaveletFCNN on the UCR time series archive to illustrate the improvement over the original FCNN. In the second experiment, we first train WaveletFCNN on labeled multivariate signals and then test the anomaly monitoring algorithm on real-world signals from a wind farm.

5.1. UCR Univariate Time Series Classification

5.1.1. Motivation

In order to examine our WaveletFCNN architecture, we first conduct univariate time series classification on the UCR datasets (Chen et al., 2015). The UCR time series archive includes 85 different datasets covering various applications. We take advantage of the datasets' diversity to verify the general improvement of WaveletFCNN compared to the original FCNN model. It should be noted that we are not claiming WaveletFCNN is the best model for the UCR archive so far (although we do obtain the best accuracy on some datasets, as we will show later). Instead, we attempt to validate the statement that the wavelet multi-resolution spectrum is a helpful augmentation of the feature space, and that the corresponding neural network architecture is capable of leveraging such information to improve classification accuracy.

5.1.2. Setup

We test WaveletFCNN on all 85 UCR time series datasets and compare the results with the original FCNN. To be fair, we follow the experimental settings described in (Wang et al., 2017) and briefly repeat the setup here: all the datasets are split into training and testing sets using the defaults from (Chen et al., 2015); the loss function for all tested models is categorical cross entropy; the model that achieves the lowest training loss is preserved, and the performance of this model on the test set is reported. We also use the same hyper-parameters as (Wang et al., 2017) during training, such as batch size, learning rate, and number of epochs. Additionally, as introduced before, WaveletFCNN has one more hyper-parameter, the wavelet decomposition level. In this experiment, we examine three candidate values of this hyper-parameter and report the best result as representing WaveletFCNN.

5.1.3. Experimental Results

Table 2 gives the complete comparison between WaveletFCNN and the original FCNN (the original FCNN's results are obtained from the GitHub repository, https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline, published with (Wang et al., 2017)). Briefly, on 64 out of 85 datasets, WaveletFCNN obtains better or equivalent accuracy (most of the equivalent cases occur when both methods make all correct predictions on that dataset). Some highlighted results of WaveletFCNN are summarized in Table 1. As far as we know, WaveletFCNN outperforms all the state-of-the-art models (Wang et al., 2017; Smith and Williams, 2018; Bagnall et al., 2015; Schäfer, 2015; Karim et al., 2018; Bostrom and Bagnall, 2015; Schäfer and Leser, 2017; Lines and Bagnall, 2015) on these 7 datasets, where WaveletFCNN's accuracy is strictly better than that of the state of the art (again, ties are not counted here).

Dataset State of the art WaveletFCNN
fish 0.011 (Wang et al., 2017) 0.006
ProxPhalanxOutlineAgeGroup 0.107 (Karim et al., 2018) 0.102
RefrigerationDevices 0.405 (Karim et al., 2018) 0.389
ShapesAll 0.078 (Karim et al., 2018) 0.067
StarLightCurves 0.020 (Bagnall et al., 2015) 0.018
ToeSegmentation2 0.038 (Schäfer, 2015) 0.031
Wine 0.092 (Karim et al., 2018) 0.074
Table 1. Highlighted comparison of test errors between WaveletFCNN and the state of the art.
Dataset FCNN WaveletFCNN Dataset FCNN WaveletFCNN
50words 0.321 0.191 Adiac 0.143 0.156
ArrowHead 0.12 0.103 Beef 0.25 0.033
BeetleFly 0.05 0 BirdChicken 0.05 0
Car 0.083 0.083 CBF 0 0
ChlorineConcentration 0.157 0.159 CinCECGtorso 0.187 0.069
Coffee 0 0 Computers 0.152 0.204
CricketX 0.185 0.264 CricketY 0.208 0.267
CricketZ 0.187 0.236 DiatomSizeR 0.07 0.415
DistalPhalanxOutlineAgeGroup 0.165 0.153 DistalPhalanxOutlineCorrect 0.188 0.177
DistalPhalanxTW 0.21 0.205 Earthquakes 0.199 0.177
ECG200 0.1 0.12 ECG5000 0.059 0.05
ECGFiveDays 0.015 0.005 ElectricDevices 0.277 0.237
FaceAll 0.071 0.057 FaceFour 0.068 0.046
FacesUCR 0.052 0.095 fish 0.029 0.006
FordA 0.094 0.049 FordB 0.117 0.068
GunPoint 0 0 Ham 0.238 0.257
HandOutlines 0.224 0.094 Haptics 0.449 0.445
Herring 0.297 0.297 InlineSkate 0.589 0.498
InsectWingbeatSound 0.598 0.378 ItalyPower 0.03 0.032
LargeKitchenAppliances 0.104 0.096 Lighting2 0.197 0.262
Lighting7 0.137 0.288 MALLAT 0.02 0.031
Meat 0.033 0.017 MedicalImages 0.208 0.234
MiddlePhalanxOutlineAgeGroup 0.232 0.205 MiddlePhalanxOutlineCorrect 0.205 0.192
MiddlePhalanxTW 0.388 0.368 MoteStrain 0.05 0.071
NonInvThorax1 0.039 0.048 NonInvThorax2 0.045 0.048
OliveOil 0.167 0.033 OSULeaf 0.012 0.004
PhalangesOutlinesCorrect 0.174 0.172 Phoneme 0.655 0.689
Plane 0 0 ProxPhalanxOutlineAgeGroup 0.151 0.102
ProxPhalanxOutlineCorrect 0.1 0.093 ProxPhalanxTW 0.19 0.175
RefrigerationDevices 0.467 0.389 ScreenType 0.333 0.397
ShapeletSim 0.133 0.044 ShapesAll 0.102 0.067
SmallKitchenAppliances 0.197 0.197 SonyAIBORobot 0.032 0.031
SonyAIBORobotII 0.038 0.057 StarLightCurves 0.033 0.018
Strawberry 0.031 0.015 SwedishLeaf 0.034 0.034
Symbols 0.038 0.026 SyntheticControl 0.01 0.003
ToeSegmentation1 0.031 0.022 ToeSegmentation2 0.085 0.031
Trace 0 0 TwoLeadECG 0 0
TwoPatterns 0.103 0.004 UWaveGestureLibraryAll 0.174 0.078
UWaveX 0.246 0.158 UWaveY 0.275 0.251
UWaveZ 0.271 0.23 wafer 0.003 0.001
Wine 0.111 0.074 WordSynonyms 0.42 0.284
Worms 0.331 0.298 WormsTwoClass 0.271 0.21
yoga 0.155 0.118
Table 2. The comparison of the test errors between WaveletFCNN and the original FCNN (Wang et al., 2017).

5.1.4. Statistical Analysis and Discussion

In order to obtain a precise conclusion about the comparison between WaveletFCNN and the original FCNN, we conduct a proper statistical analysis. Based on the paired results on each dataset, we apply the classical paired t-test (David, 1963) with the null hypothesis that "the expected test error of the original FCNN is not any higher than that of WaveletFCNN" at the standard significance level. In this case, the p-value is small enough that we can strongly reject this null hypothesis. Through this analysis, we draw a statistically significant conclusion that our WaveletFCNN model is a general improvement over the original FCNN for time series classification.
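As a sketch of this test, the snippet below runs the one-sided paired t-test with SciPy on eight of the (FCNN, WaveletFCNN) error pairs from Table 2; the paper's actual analysis pairs all 85 UCR datasets, and the use of `scipy.stats.ttest_rel` is our own illustrative choice.

```python
from scipy import stats

# Eight (FCNN, WaveletFCNN) test-error pairs taken from Table 2 for illustration;
# the reported analysis uses all 85 UCR datasets.
fcnn    = [0.321, 0.120, 0.250, 0.187, 0.165, 0.094, 0.467, 0.012]
wavelet = [0.191, 0.103, 0.033, 0.069, 0.153, 0.049, 0.389, 0.004]

# One-sided paired t-test; H0: FCNN's expected error is not higher
# than WaveletFCNN's, i.e. mean(fcnn - wavelet) <= 0.
t_stat, p_value = stats.ttest_rel(fcnn, wavelet, alternative="greater")
reject_h0 = p_value < 0.05
```

A small p-value here rejects H0, i.e. it supports the claim that WaveletFCNN's expected error is lower; the pairing matters because errors on the same dataset are far from independent across models.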

To be cautious, we also want to point out one potential drawback of WaveletFCNN: the new hyper-parameter, the wavelet decomposition level, requires some tuning effort for different datasets. We agree that an automatic scheme for determining the optimal wavelet decomposition level would improve the current WaveletFCNN architecture. On the other hand, we can also view this question as a general model selection task in neural networks, which is itself an active research area; for instance, recent research shows that, counter-intuitively, overparameterized neural networks have advantages in generalization as well (Li and Liang, 2018). In short, we agree that the increased complexity of hyper-parameter selection is one temporary drawback of WaveletFCNN, and model selection for WaveletFCNN can be interesting future work.

5.2. Blade Icing Detection

5.2.1. Task Overview

In this work, the ultimate goal of designing the advanced classifier is to provide accurate anomaly monitoring in the icing detection scenario. We evaluate how the anomaly monitoring algorithm equipped with the improved classifier accomplishes this task.

Thanks to the engineers from Goldwind Inc., one of the world's largest wind turbine manufacturers, we have access to real data from wind farms that deploy Goldwind's turbines. Monitoring data from three wind turbines were obtained, representing 305.77 hours of running time for Machine 1, 695.59 hours for Machine 2, and 329.28 hours for Machine 3. It is worth mentioning that only the log for Machine 3 is continuous, while there are interruptions in the records for Machine 1 and Machine 2. With this in mind, we take a portion of the signal from Machine 3 as the test data in the detecting phase; the remaining data are processed, as we will introduce later, to create the training and validation sets for the training phase. The raw data are collected from the supervisory control and data acquisition (SCADA) system and include hundreds of dimensions. According to the engineers' domain-specific knowledge, 28 continuous variables relevant to frozen blades are preserved as the input multivariate signals. A detailed description of these variables is also available online (https://github.com/BinhangYuan/WaveletFCNN). Further, the engineers also helped us label the time ranges during which blade icing occurs. It should be carefully noted that including these labels makes our approach fundamentally different from unsupervised change point detection or time series segmentation algorithms (e.g., (Matsubara et al., 2014; Gharghabi et al., 2018)).

As mentioned in the previous sections, our anomaly detection framework includes two parts: an offline learning phase and an online detecting phase. During the learning process, we split the raw signals into a collection of fixed-length fragments, each with a binary label indicating whether the blades are frozen during that period of time. This collection is further split into training and validation sets for learning a WaveletFCNN classifier. In the online detecting phase, we use our anomaly monitoring algorithm along with the WaveletFCNN classifier to generate accurate and robust detection of blade icing situations. We set the two parameters of the anomaly monitoring algorithm as follows: the window length is set to the fragments’ length in the training set; the step size is set to , where L is the wavelet decomposition level of the WaveletFCNN classifier.
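The online detecting stage described above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the `classify` callable stands in for the trained WaveletFCNN, and the function and variable names are ours.

```python
from collections import deque

def monitor(blocks, classify, window_blocks, vote_size, threshold=0.5):
    """Simulated online monitoring: accumulate fixed-length signal blocks
    in a sliding window, classify the window each time a new block
    arrives, and raise an alarm when the fraction of positive predictions
    among the last `vote_size` cached outputs reaches `threshold`."""
    window = deque(maxlen=window_blocks)   # latest blocks; oldest dropped
    votes = deque(maxlen=vote_size)        # cached predictions for voting
    alarms = []
    for block in blocks:
        window.append(block)
        if len(window) < window_blocks:
            continue                       # window not yet full
        votes.append(classify(list(window)))
        alarms.append(sum(votes) / len(votes) >= threshold)
    return alarms
```

A plain majority vote corresponds to `threshold=0.5`; the flexible voting schema evaluated later simply varies this threshold.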

The training details of the WaveletFCNN classifier are described below. We then present experimental evaluations and relevant discussion of deploying the anomaly monitoring algorithm in a simulated real-time setup.

5.2.2. WaveletFCNN Training Details

In order to train the WaveletFCNN classifier on the multivariate SCADA signals, we first partition the signals into fixed-length fragments. The SCADA system fetches signals from the different sensors every seven seconds. We cut the input multivariate time series into a group of fragments of length 512, so that each fragment approximately represents the wind turbine’s status within one hour. One risk of directly using this collection for training is that the dataset will be strongly biased according to the labeled ranges, since the turbines function properly most of the time. In order to address this issue, we augment the number of positive samples (samples representing anomalies) by generating overlapping positive fragments, while negative fragments are generated without overlap. For example, suppose a wind turbine functions properly for eight hours and then malfunctions due to blade icing for two hours. We cut the normal range without overlap, so that 8 negative fragments, each representing the state within one hour, are created; on the other hand, we apply a sliding window of one hour with a step size of 10 minutes to the abnormal region and generate 7 positive fragments (the first covering the first hour of the icing region, the next starting ten minutes later, and so on). In reality, we set the step size to a length of 16 samples (representing 112 seconds of signal) to create a label-balanced dataset. We further split the augmented collection into training and validation sets to train the WaveletFCNN classifier.
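The oversampling scheme above can be sketched as follows. This is an illustrative reconstruction, not the authors’ code: the function name is ours, and we adopt the half-or-more overlap rule (used later in the simulation) to assign fragment labels.

```python
import numpy as np

def make_fragments(signal, is_anomaly, frag_len=512, pos_step=16):
    """Cut a (T, D) multivariate signal into labeled length-`frag_len`
    fragments.  Normal regions are cut without overlap, while anomalous
    regions are oversampled with a dense stride of `pos_step` samples
    (16 samples ~= 112 s at the 7 s SCADA sampling rate) to balance the
    classes.  A fragment is positive when half or more of it lies inside
    a labeled anomaly range."""
    fragments, labels = [], []
    t = 0
    while t + frag_len <= len(signal):
        positive = is_anomaly[t:t + frag_len].mean() >= 0.5
        fragments.append(signal[t:t + frag_len])
        labels.append(int(positive))
        # overlap positive cuts densely; keep negative cuts disjoint
        t += pos_step if positive else frag_len
    return np.stack(fragments), np.array(labels)
```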

The change from univariate to multivariate sequences for WaveletFCNN is straightforward: we conduct the discrete wavelet transform for each signal dimension independently to construct the input layer, and change the first convolutional layer’s input dimension to 26 in each sub-module; the rest of the model is untouched.

Finally, we obtain a training set with 1221 sequences and a validation set with 308 sequences. The accuracy of the WaveletFCNN classifier is ; by contrast, the original FCNN classifier only achieves .

5.2.3. Anomaly Monitoring Experimental Setup and Results

Simulation setup. In order to evaluate the performance of our anomaly monitoring algorithm, we use a labeled continuous time series (unseen during the training phase) to simulate a real-time setting. The signal is split into small blocks of length 16 (112 seconds of signal); for simplicity, each small block carries a single label indicating whether it is anomalous, determined by whether half or more of the block falls into the labeled anomaly regions. As the simulation begins, we accumulate blocks into the sliding window. Once the blocks fill up the sliding window, WaveletFCNN classifies the signal within the current window. When the next block arrives, the sliding window moves one step forward: the earliest block is discarded, the latest block is appended to the end of the window, and the classifier makes another prediction on the updated time series. This simulation mimics a scenario in which the monitoring center fetches a signal block every 112 seconds and combines it with the previous blocks to make a prediction; the predictions are cached for majority voting, and the voting results indicate whether a blade icing situation is detected.

Simulation results. We present the simulation results by answering the following questions. Before that, we first review a few general classification metrics in our setting:

  • Accuracy is the number of correct predictions made by the model divided by the total number of predictions. Since the input signal is strongly biased in this simulation, accuracy alone is not very informative.

  • Precision measures what proportion of the positions diagnosed as anomalies actually are anomalies.

  • Recall measures what proportion of the positions that actually are anomalies the algorithm diagnoses as anomalies.

  • F1 score is the harmonic mean of precision and recall, which provides an overall evaluation of the classifier.
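The four metrics above can be computed directly from the confusion-matrix counts; the following self-contained helper (our own naming) makes the definitions concrete:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```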

How does WaveletFCNN classifier perform in this simulation?

Since this signal is strongly biased, unlike the balanced training set, it is important to verify that WaveletFCNN can still make robust predictions. To this end, we record the prediction for each signal fragment that ever stays in the sliding window. Again, for simplicity, each fragment is assigned a single anomaly label, determined by whether half or more of the fragment falls into the labeled anomaly regions. The classification performance metrics for both WaveletFCNN and the original FCNN are shown in Table 3. From the table, we can see that WaveletFCNN outperforms FCNN in terms of accuracy, precision, and F1 score. We also observe that FCNN tends to produce far more positive predictions, so that its recall is higher than that of WaveletFCNN; however, we cannot take this alone as a good property of a classifier, since that recall is obtained at the cost of a large number of false alarms, as the precision column shows.

Measurement WaveletFCNN FCNN
Accuracy 0.858 0.619
Precision 0.215 0.096
Recall 0.664 0.766
F1 score 0.325 0.171
Table 3. Classification performance metrics for WaveletFCNN and FCNN for the signal in the simulation.

How do the sliding window and majority voting schemas help?

With the anomaly monitoring algorithm carefully designed, it is critical to find out whether it actually improves the accuracy and robustness of detection. To this end, we compare our proposed algorithm with a straw-man monitoring algorithm that uses a non-overlapping sliding window to process the signal and simply takes the outcome of the WaveletFCNN classifier as the monitoring result. As the results in Table 4 show, the proposed algorithm significantly improves accuracy, precision, and F1 score; its recall is slightly worse than that of the straw-man, but, as discussed above, the practical significance of this metric alone is limited.

Measurement Straw-man Proposed algorithm
Accuracy 0.846 0.882
Precision 0.181 0.251
Recall 0.666 0.654
F1 score 0.286 0.363
Table 4. Comparison between a straw-man approach and our proposed algorithm.

Will the flexible voting schema provide further improvement?

We investigate the relationship between the voting threshold and the evaluation metrics, and record the results in Table 5. Although accuracy increases as the threshold increases, which is intuitive since the signal is biased toward the negative class, the F1 score reaches its maximum at a threshold of 0.4. We believe tuning this variable can further enhance the detection performance in practice.

Threshold Accuracy Precision Recall F1 score
0.1 0.756 0.169 0.963 0.288
0.2 0.816 0.210 0.935 0.343
0.3 0.847 0.239 0.907 0.378
0.4 0.880 0.284 0.879 0.429
0.5 0.882 0.251 0.654 0.363
0.6 0.892 0.255 0.570 0.353
0.7 0.895 0.242 0.486 0.323
0.8 0.904 0.240 0.402 0.301
0.9 0.906 0.209 0.299 0.246
Table 5. Relationship between the voting threshold and the evaluation measurements.
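A sweep like the one in Table 5 can be reproduced schematically as follows, assuming the positive-vote fractions from the monitoring stage have been cached; the function name and interface are illustrative.

```python
def sweep_thresholds(vote_fracs, y_true, thresholds):
    """For each candidate voting threshold, turn cached positive-vote
    fractions into alarms, score the F1, and return the best pair."""
    best = None
    for th in thresholds:
        y_pred = [int(f >= th) for f in vote_fracs]
        tp = sum(p and t for p, t in zip(y_pred, y_true))
        fp = sum(p and not t for p, t in zip(y_pred, y_true))
        fn = sum((not p) and t for p, t in zip(y_pred, y_true))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if best is None or f1 > best[1]:
            best = (th, f1)
    return best  # (threshold, best F1)
```

In deployment, the threshold would be chosen on held-out data rather than on the test signal itself.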

5.2.4. Discussion

Although our algorithm has achieved promising results in both the training and detecting phases, there are several reasons to be cautious about asserting that no further improvement of the framework can be made.

First, one can imagine that labeling more than 1200 hours of monitoring data requires substantial manpower; however, deep learning models can still easily overfit a dataset of the current scale. Based on our observation, WaveletFCNN ends up with near-perfect accuracy on the training set even though we already include batch normalization to mitigate overfitting. One straightforward option to address this issue is to collect a larger dataset in the future.

Likewise, in the training phase, we include signals from three different machines in the training set. A more reasonable design would be to train a separate model for each machine, since the microclimate and working conditions of each turbine may vary. However, due to the limited scale of the dataset, we do not adopt this setting.

Finally, our framework can benefit from plenty of other machine learning techniques. For example, we can leverage active learning (Settles, 2014) to reduce the human effort of labeling data. Additionally, we can first train a general classifier on signals from multiple wind turbines, and then use transfer learning (Pan et al., 2010) to quickly adapt the general model with a small amount of data from a target turbine and deploy the adapted model to monitor that turbine.

6. Conclusion

We present a classification-based anomaly detection technique to monitor the blade icing issue of wind turbines. In the training phase, WaveletFCNN, a fully convolutional neural network augmented with discrete wavelet transform coefficients, is proposed to improve on the original FCNN’s performance. We also design a novel anomaly monitoring algorithm for the detecting phase to provide accurate, robust, and deployable detection in a real-world scenario. Experimental results show that WaveletFCNN outperforms the original FCNN not only on this frozen blade monitoring application but also on extensive benchmarks from the UCR time series archive. The anomaly monitoring algorithm is also validated in a simulated real-time setup and shows promising results. We plan to deploy this prototype system in real-world wind farms in the near future.


  • gol ([n. d.]) [n. d.]. Top 10 Wind Turbine Manufacturers in the World. https://www.bizvibe.com/blog/top-10-wind-turbine-manufacturers-world/. Accessed: 2019-02-03.
  • Bagnall et al. (2015) Anthony Bagnall, Jason Lines, Jon Hills, and Aaron Bostrom. 2015. Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Transactions on Knowledge and Data Engineering 27, 9 (2015), 2522–2535.
  • Baydogan et al. (2013) Mustafa Gokce Baydogan, George Runger, and Eugene Tuv. 2013. A bag-of-features framework to classify time series. IEEE transactions on pattern analysis and machine intelligence 35, 11 (2013), 2796–2802.
  • Berndt and Clifford (1994) Donald J Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series.. In KDD workshop, Vol. 10. Seattle, WA, 359–370.
  • Bostrom and Bagnall (2015) Aaron Bostrom and Anthony Bagnall. 2015. Binary shapelet transform for multiclass time series classification. In International Conference on Big Data Analytics and Knowledge Discovery. Springer, 257–269.
  • Bou-Ghazale and Hansen (2000) Sahar E Bou-Ghazale and John HL Hansen. 2000. A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE Transactions on speech and audio processing 8, 4 (2000), 429–442.
  • Carlsson (2010) Viktor Carlsson. 2010. Measuring routines of ice accretion for Wind Turbine applications: The correlation of production losses and detection of ice.
  • Chandola et al. (2009) Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM computing surveys (CSUR) 41, 3 (2009), 15.
  • Chen et al. (2018) Longting Chen, Guanghua Xu, Lin Liang, Qing Zhang, and SiCong Zhang. 2018. Learning Deep Representation for Blades Icing Fault Detection of Wind Turbines. In 2018 IEEE International Conference on Prognostics and Health Management (ICPHM). IEEE, 1–8.
  • Chen et al. (2016) Yushi Chen, Hanlu Jiang, Chunyang Li, Xiuping Jia, and Pedram Ghamisi. 2016. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing 54, 10 (2016), 6232–6251.
  • Chen et al. (2015) Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. 2015. The UCR Time Series Classification Archive. www.cs.ucr.edu/~eamonn/time_series_data/.
  • Chun-Lin (2010) Liu Chun-Lin. 2010. A tutorial of the wavelet transform. NTUEE, Taiwan (2010).
  • Ciang et al. (2008) Chia Chen Ciang, Jung-Ryul Lee, and Hyung-Joon Bang. 2008. Structural health monitoring for a wind turbine system: a review of damage detection methods. Measurement Science and Technology 19, 12 (2008), 122001.
  • Corradini et al. (2016) Maria Letizia Corradini, Andrea Cristofaro, and Silvia Pettinari. 2016. A model-based robust icing detection and estimation scheme for wind turbines. In Control Conference (ECC), 2016 European. IEEE, 1451–1456.
  • David (1963) Herbert Aron David. 1963. The method of paired comparisons. Vol. 12. London.
  • Fthenakis and Kim (2009) Vasilis Fthenakis and Hyung Chul Kim. 2009. Land use and electricity generation: A life-cycle analysis. Renewable and Sustainable Energy Reviews 13, 6 (2009), 1465–1474.
  • Fujieda et al. (2018) Shin Fujieda, Kohei Takayama, and Toshiya Hachisuka. 2018. Wavelet Convolutional Neural Networks. arXiv preprint arXiv:1805.08620 (2018).
  • Gharghabi et al. (2018) Shaghayegh Gharghabi, Chin-Chia Michael Yeh, Yifei Ding, Wei Ding, Paul Hibbing, Samuel LaMunion, Andrew Kaplan, Scott E Crouter, and Eamonn Keogh. 2018. Domain agnostic online semantic segmentation for multi-dimensional time series. Data Mining and Knowledge Discovery (2018), 1–35.
  • Han et al. (2018) Yongming Han, Shuheng Zhang, and Zhiqiang Geng. 2018. Multi-Frequency Decomposition with Fully Convolutional Neural Network for Time Series Classification. In 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, 284–289.
  • Hossain and Muhammad (2016) M Shamim Hossain and Ghulam Muhammad. 2016. Cloud-assisted industrial internet of things (iiot)–enabled framework for health monitoring. Computer Networks 101 (2016), 192–202.
  • Ioffe and Szegedy (2015) Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015).
  • Jin et al. (2017) Kyong Hwan Jin, Michael T McCann, Emmanuel Froustey, and Michael Unser. 2017. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing 26, 9 (2017), 4509–4522.
  • Karim et al. (2018) Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Shun Chen. 2018. LSTM fully convolutional networks for time series classification. IEEE Access 6 (2018), 1662–1669.
  • Kuo (1995) RJ Kuo. 1995. Intelligent diagnosis for turbine blade faults using artificial neural networks and fuzzy logic. Engineering Applications of Artificial Intelligence 8, 1 (1995), 25–34.
  • Kusiak and Verma (2012) Andrew Kusiak and Anoop Verma. 2012. Analyzing bearing faults in wind turbines: A data-mining approach. Renewable Energy 48 (2012), 110–116.
  • Li et al. (2002) Tao Li, Qi Li, Shenghuo Zhu, and Mitsunori Ogihara. 2002. A survey on wavelet applications in data mining. ACM SIGKDD Explorations Newsletter 4, 2 (2002), 49–68.
  • Li and Liang (2018) Yuanzhi Li and Yingyu Liang. 2018. Learning overparameterized neural networks via stochastic gradient descent on structured data. In Advances in Neural Information Processing Systems. 8168–8177.
  • Lin et al. (2007) Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. 2007. Experiencing SAX: a novel symbolic representation of time series. Data Mining and knowledge discovery 15, 2 (2007), 107–144.
  • Lines and Bagnall (2015) Jason Lines and Anthony Bagnall. 2015. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery 29, 3 (2015), 565–592.
  • Lu et al. (2009) Bin Lu, Yaoyu Li, Xin Wu, and Zhongzhou Yang. 2009. A review of recent advances in wind turbine condition monitoring and fault diagnosis. In Power Electronics and Machines in Wind Applications, 2009. PEMWA 2009. IEEE. IEEE, 1–7.
  • Luukkala (1995) Mauri Luukkala. 1995. Detector for indicating ice formation on the wing of an aircraft. US Patent 5,467,944.
  • Malik and Mishra (2015) Hasmat Malik and Sukumar Mishra. 2015. Application of Probabilistic Neural Network in Fault Diagnosis of Wind Turbine Using FAST, TurbSim and Simulink. Procedia Computer Science 58 (2015), 186–193.
  • Matsubara et al. (2014) Yasuko Matsubara, Yasushi Sakurai, and Christos Faloutsos. 2014. Autoplait: Automatic mining of co-evolving time sequences. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 193–204.
  • Mikolov et al. (2010) Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh Annual Conference of the International Speech Communication Association.
  • Muñoz et al. (2018) Carlos Quiterio Gómez Muñoz, Alfredo Arcos Jiménez, and Fausto Pedro García Márquez. 2018. Wavelet transforms and pattern recognition on ultrasonic guides waves for frozen surface state diagnosis. Renewable Energy 116 (2018), 42–54.
  • Muñoz et al. (2016) Carlos Quiterio Gómez Muñoz, Fausto Pedro García Márquez, and Juan Manuel Sánchez Tomás. 2016. Ice detection using thermal infrared radiometry on wind turbine blades. Measurement 93 (2016), 157–163.
  • Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10). 807–814.
  • Pan et al. (2010) Sinno Jialin Pan, Qiang Yang, et al. 2010. A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22, 10 (2010), 1345–1359.
  • Parent and Ilinca (2011) Olivier Parent and Adrian Ilinca. 2011. Anti-icing and de-icing techniques for wind turbines: Critical review. Cold regions science and technology 65, 1 (2011), 88–96.
  • Pegau et al. (1992) W Scott Pegau, Clayton A Paulson, and J Ronald V Zaneveld. 1992. Optical techniques for the measurement of frazil ice. In Ocean Optics XI, Vol. 1750. International Society for Optics and Photonics, 498–508.
  • Regan et al. (2016) Taylor Regan, Rukiye Canturk, Elizabeth Slavkovsky, Christopher Niezrecki, and Murat Inalpolat. 2016. Wind Turbine Blade Damage Detection Using Various Machine Learning Algorithms. In ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, V008T10A040–V008T10A040.
  • Sak et al. (2014) Haşim Sak, Andrew Senior, and Françoise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth annual conference of the international speech communication association.
  • Saleh et al. (2012) SA Saleh, R Ahshan, and CR Moloney. 2012. Wavelet-based signal processing method for detecting ice accretion on wind turbines. IEEE Transactions on Sustainable Energy 3, 3 (2012), 585–597.
  • Schäfer (2015) Patrick Schäfer. 2015. The BOSS is concerned with time series classification in the presence of noise. Data Mining and Knowledge Discovery 29, 6 (2015), 1505–1530.
  • Schäfer and Leser (2017) Patrick Schäfer and Ulf Leser. 2017. Fast and accurate time series classification with weasel. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 637–646.
  • Settles (2014) Burr Settles. 2014. Active learning literature survey. Computer Sciences Technical Report 1648 (2014).
  • Shahriar and Rahman (2015) Md Sumon Shahriar and M Sabbir Rahman. 2015. Urban sensing and smart home energy optimisations: A machine learning approach. In Proceedings of the 2015 International Workshop on Internet of Things towards Applications. ACM, 19–22.
  • Shoja ([n. d.]) Siavash Shoja. [n. d.]. Guided Wave Propagation in Composite Structures.
  • Smith and Williams (2018) Kaleb E Smith and Phillip Williams. 2018. Time series classification with shallow learning shepard interpolation neural networks. In International Conference on Image and Signal Processing. Springer, 329–338.
  • Vishwanath (1994) Mohan Vishwanath. 1994. The recursive pyramid algorithm for the discrete wavelet transform. IEEE Transactions on Signal Processing 42, 3 (1994), 673–676.
  • Wang et al. (2018) Jinjiang Wang, Yulin Ma, Laibin Zhang, Robert X Gao, and Dazhong Wu. 2018. Deep learning for smart manufacturing: Methods and applications. Journal of Manufacturing Systems (2018).
  • Wang et al. (2017) Zhiguang Wang, Weizhong Yan, and Tim Oates. 2017. Time series classification from scratch with deep neural networks: A strong baseline. In Neural Networks (IJCNN), 2017 International Joint Conference on. IEEE, 1578–1585.
  • Zhang et al. (2017) Liheng Zhang, Charu Aggarwal, and Guo-Jun Qi. 2017. Stock Price Prediction via Discovering Multi-Frequency Trading Patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2141–2149.
  • Zhang et al. (2012) Zijun Zhang, Anoop Verma, and Andrew Kusiak. 2012. Fault analysis and condition monitoring of the wind turbine gearbox. IEEE transactions on energy conversion 27, 2 (2012), 526–535.
  • Zhao et al. (2018) Yi Zhao, Yanyan Shen, Yanmin Zhu, and Junjie Yao. 2018. Forecasting Wavelet Transformed Time Series with Attentive Neural Networks. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 1452–1457.
  • Zhou et al. (2016) Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. 2016. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2921–2929.