WaveletFCNN
Wind power, as an alternative to burning fossil fuels, is plentiful and renewable. Data-driven approaches are increasingly popular for inspecting wind turbine failures. In this paper, we propose a novel classification-based anomaly detection system for icing detection of wind turbine blades. We effectively combine deep neural networks and the wavelet transform to identify such failures sequentially over time. In the training phase, we present a wavelet-based fully convolutional neural network, namely WaveletFCNN, for time series classification. We improve the original FCNN by augmenting its features with wavelet coefficients. WaveletFCNN outperforms the state-of-the-art FCNN for univariate time series classification on the UCR time series archive benchmarks. In the detecting phase, we combine sliding-window and majority-vote algorithms to provide timely monitoring of anomalies. The system has been successfully implemented on a real-world dataset from Goldwind Inc., where the classifier is trained on a multivariate time series dataset and the monitoring algorithm is applied to capture abnormal conditions in signals from a wind farm.
Time series data is becoming ubiquitous due to the rapid development of the Internet of Things (IoT). Diversified sensors collect abundant data for further analysis in various domains, such as health monitoring (Hossain and Muhammad, 2016), smart manufacturing (Wang et al., 2018), and energy management (Shahriar and Rahman, 2015).
Wind energy, as an alternative to burning fossil fuels, is clean, renewable, and widely distributed. Therefore, wind energy is increasingly popular for generating electric power (Fthenakis and Kim, 2009). However, many wind farms are located in areas with a high probability of ice occurrence. Figure 1 illustrates a wind turbine facing icing conditions. Blade icing may cause serious problems, such as measurement errors, power losses, overproduction, mechanical failures, and electrical failures (Parent and Ilinca, 2011). As a consequence, icing detection becomes a priority in order to avoid such problems.

Traditionally, applied physics and mechanical engineering research resolves this problem by designing and installing new physical detectors. Various techniques, such as damping of ultrasonic waves (Luukkala, 1995), measurement of the resonance frequency (Carlsson, 2010), thermal infrared radiometry (Muñoz et al., 2016), optical measurement techniques (Pegau et al., 1992), and ultrasonic guided waves (Muñoz et al., 2018), have been applied for icing detection. However, such techniques are limited by high costs and energy demands. Besides, ice sensors may provide inaccurate estimates of icing risks for wind turbines due to their internal unreliability (Parent and Ilinca, 2011). Worse still, the installation of such detectors may require mechanical changes to the wind turbine and substantial manpower. As a result, engineers in practice usually monitor the power curve of the wind turbine to estimate the blades' icing conditions in order to determine whether to trigger the deicing procedure. So here is one fundamental question we want to explore in this paper: Is it possible to analyze only the signals from the standard pre-installed sensors (e.g., from SCADA) in the wind turbine in an attempt to design a deployable system for blade icing detection?

To this end, we propose a data-driven approach to inspect blade icing precisely on real-time signals so that the deicing procedure can be started automatically with a very short response time. We formalize this anomaly detection task as two phases: (i) a training phase in order to obtain a time series classifier, where the input sequences are generated by the currently installed general-purpose sensors that measure the weather and turbine conditions, such as wind speed, internal temperature, yaw positions, pitch angles, power output, etc.; (ii) a detecting phase, in which we propose an algorithm based on sliding window and majority vote to apply the trained classifier, attempting to provide accurate and robust detection of blade icing situations.
Time series classification is a classical time series analysis task. Famous approaches include dynamic time warping (DTW) (Berndt and Clifford, 1994), Bag-of-Words (BoW) (Lin et al., 2007), and Bag-of-Features (TSBF) (Baydogan et al., 2013), where a group of features is extracted to feed classifiers such as nearest neighbor (NN) and support vector machine (SVM). Additionally, spectral features also play an important role in feature engineering for time series classification. Spectrum analysis first obtains frequency-based features by converting the time-based signal to the frequency domain using mathematical tools like the Fourier transform or the wavelet transform, so that information about the periodicity and volatility of the signals appears. Such spectral features have been widely used in time series classification applications, like speech recognition (Bou-Ghazale and Hansen, 2000).

Recently, deep neural networks have achieved great success in a variety of areas, such as computer vision and text mining. Deep neural networks have also been applied to time series classification with great success. For example, (Wang et al., 2017) proposes fully convolutional neural networks (FCNN), where deep multilayer perceptrons (MLP), fully convolutional neural networks, and residual networks (ResNet) are adapted to raw sequences for univariate time series classification. (Karim et al., 2018) provides an augmentation of FCNN by including sub-modules of long short-term memory (LSTM) (Sak et al., 2014) recurrent neural network units. One attractive claim about deep learning is that the knowledge about the task can be automatically captured by tuning a large number of parameters, so that heavy crafting in data preprocessing and feature engineering can be avoided. While this is true in some scenarios, we argue that carefully selected features combined with carefully designed neural network architectures can provide further improvement.

In this paper, in order to design a more accurate classifier, we improve the fully convolutional neural networks by augmenting input features with orthonormal discrete wavelet transform coefficients, which represent the variance of the sequence across multiple scales. The decomposition of the original signal reveals the information embedded among multiple resolutions. Unlike the discrete Fourier transform, which keeps only the spectral information, the discrete wavelet transform preserves information in both the time and frequency domains. Wavelet multi-resolution analysis can provide helpful information for fully convolutional neural networks, since a convolutional layer alone can learn information only within a fixed region constrained by the filter size. On the other hand, even with the enhanced classifier, it is still challenging to provide robust predictions in real time. In an attempt to address this issue, we design an anomaly monitoring algorithm combining sliding window and majority vote, so that the inaccuracy and instability, which are inevitable for any classifier, are minimized.
Our contribution. Specific contributions of this work are to:
Combine convolutional neural networks with wavelet multi-resolution analysis as an improved model for sequence classification.
Achieve enhanced accuracy on the UCR (Chen et al., 2015) datasets for univariate time series classification.
Design a novel anomaly monitoring algorithm to provide accurate and robust real-time ice detection.
Obtain promising results in detecting frozen blades in wind turbines by applying the anomaly monitoring algorithm to real-life signals from a wind farm.
In the data mining community, anomaly detection is an important problem that has been researched within diverse application domains (Chandola et al., 2009). Based on the availability of labeled data, anomaly detection techniques fall into three main categories: the supervised mode, where labeled instances for both the normal and anomaly classes are accessible; the semi-supervised mode, where the training set only includes normal samples; and the unsupervised mode, which does not require any training data.
Wind farms are usually located in remote mountainous or rough sea regions, which makes monitoring and maintenance more challenging. A fault detection system helps to avoid premature breakdown, to reduce maintenance costs, and to support further development of a wind turbine (Ciang et al., 2008; Lu et al., 2009). Traditionally, anomaly detection research from the applied physics and mechanical engineering communities focuses on designing new physical detectors. For example, various detectors with advanced techniques have been proposed (Luukkala, 1995; Carlsson, 2010; Muñoz et al., 2016; Pegau et al., 1992; Muñoz et al., 2018). Another general approach for wind turbine icing detection is to design physical models that infer ice accretion from power output signals; however, such approaches are usually validated by software simulations without testing in real-world scenarios (Saleh et al., 2012; Corradini et al., 2016).
On the other hand, data-driven approaches for wind turbine state monitoring and failure detection have gained more attention due to their ease of deployment compared to installing complicated detectors. Most of the research efforts in detecting various failures in wind turbines are based on supervised anomaly detection techniques. For example, (Regan et al., 2016) utilizes logistic regression and support vector machines (SVM) to conduct acoustics-based damage detection of wind turbines' enclosed cavities; (Kusiak and Verma, 2012) and (Kuo, 1995) apply traditional neural networks to identify bearing faults and the existence of an unbalanced or loose blade in wind turbines; (Malik and Mishra, 2015) implements a 3-layer probabilistic neural network (PNN) to diagnose a wind turbine's imbalance faults based on generator signals; (Zhang et al., 2012) includes both time-domain and frequency-domain information and tries various classifiers to detect changes in the gearbox vibration excitement. Very recently, (Chen et al., 2018) also applies deep neural networks for blade icing detection of wind turbines, where a deep feature representation is learned by clustering, and k-nearest neighbor is then used for prediction. However, such nonparametric learning algorithms require much more data to train and suffer seriously from overfitting. Worse still, the inference is more compute-intensive, which makes deployment on real-time systems more difficult.
Spectral analysis transforms signals into the frequency domain to reveal information invisible in the time domain. Among these spectral approaches, the wavelet transform has been widely applied in data mining research (Li et al., 2002). The wavelet transform keeps both the time and frequency information to construct a multi-resolution analysis of signals. The signal is adaptively filtered by short windows at high frequencies and long windows at low frequencies, so that the wavelet spectrogram is able to capture the interactions between time and frequency information, which shows strength in signal processing and data mining.
Although one romanticized advertisement for deep neural networks is that heavy crafted feature engineering will become unessential since the models are advanced enough to work this out by themselves, there are plenty of works illustrating that feature engineering still plays an important role for obtaining promising results. In theory, recurrent neural network (RNN) (Mikolov et al., 2010) can use their internal memory to learn arbitrary connection within the sequence. However, in practice, elaborately designed neural network architectures combined with spectral features show advantage in plenty of time series applications. For instance, (Zhang et al., 2017) introduces a state frequency memory (SFM) recurrent network to capture the multi-frequency patterns from previous trading data to make long and short term stock price predictions over time; (Han et al., 2018) proposes a multi-frequency decomposition (MFD) method based on the real fast Fourier transform (RFFT), which can be added to neural networks as a layer, in order to enhance the performance of the fully convolutional neural networks; (Zhao et al., 2018) applies wavelet transform to capture the time-frequency features and combines these features with other neural network architectures, such as CNN, LSTM, and attention mechanism to improve the prediction accuracy in time series forecasting. Besides the wide usage in time series application, neural networks enhanced by spectral features also demonstrate good performance in computer vision applications (Chen et al., 2016; Jin et al., 2017; Fujieda et al., 2018), which we will not enumerate exhaustively due to the space limit.
In this section, we first review some important concepts and properties of the discrete wavelet transform, then introduce the design of the WaveletFCNN architecture, which takes advantage of the good properties of the discrete wavelet transform.
The discrete wavelet transform decomposes a discrete time signal into a discrete wavelet representation (Chun-Lin, 2010). Formally, given $x[n]$ that represents a length-$N$ signal, and basis functions of the form $\phi_{j_0,k}[n]$ and $\psi_{j,k}[n]$, the coefficients for each translation (indexed by $k$) in each scale level (indexed by $j_0$ or $j$) are projections of the signal onto each of the basis functions:

$$W_\phi[j_0, k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n]\, \phi_{j_0,k}[n], \qquad W_\psi[j, k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n]\, \psi_{j,k}[n] \qquad (1)$$

where $W_\phi[j_0, k]$ is called the approximation coefficient, and $W_\psi[j, k]$ is called the detail coefficient.
The detail coefficients at different levels reveal variances of the signal on different scales, while the approximation coefficients yield the smoothed average on each scale. One important property of the discrete wavelet transform is that detail coefficients at different levels are orthogonal; that is, for any pair of detail basis functions not in the same level, the inner product is $0$:

$$\langle \psi_{j,k}, \psi_{j',k'} \rangle = \sum_{n} \psi_{j,k}[n]\, \psi_{j',k'}[n] = 0, \quad j \neq j' \qquad (2)$$

As a result, we can interpret the detail coefficients as an additive decomposition of the signal, called multi-resolution analysis. There is ample reason to believe that this wavelet spectrum, which represents the variance decomposition, can provide information useful to time series classifiers that is difficult to learn in an end-to-end neural network.
In an attempt to leverage the good properties of the discrete wavelet coefficients, we modify the original fully convolutional neural networks as illustrated in Figure 2. In a nutshell, we first compute the discrete wavelet coefficients of the input signal up to a specific level, which can be viewed as a hyper-parameter of WaveletFCNN, according to the famous pyramid algorithm (Vishwanath, 1994); then we feed the original signal and the detail coefficients at each level into separate sub convolutional neural networks, in order to capture the knowledge embedded in different scales of the wavelet spectrum; lastly, the global pooling outputs from each sub-network are concatenated to generate the final classification outcome. It is worth noting that we do not feed the approximation coefficients into the neural network, for two reasons: (i) the approximation coefficients represent smoothed averages of the input signal, and such knowledge should be easily learned by the convolutional layers when processing the original signal; (ii) unlike the detail coefficients, the approximation coefficients are not orthogonal to each other, and this redundancy in the input would enlarge the parameter space of the neural network, which sets obstacles for both model training and inference.
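To make the decomposition step concrete, the following sketch implements one common instance of the pyramid algorithm using the Haar wavelet (the paper does not fix a specific mother wavelet, so Haar is an assumption chosen here for clarity). Each pass filters and downsamples by two, so the level-$i$ detail coefficients have length $T/2^i$:

```python
import numpy as np

def haar_pyramid(x, level):
    """Decompose a signal into Haar detail coefficients via the pyramid algorithm.

    Returns (final_approximation, [details_level_1, ..., details_level_L]).
    Each pass applies the Haar low-pass (average) and high-pass (difference)
    filters and downsamples by a factor of 2.
    """
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(level):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
    return a, details

# Toy input: a length-512 noisy sine wave (stand-in for one SCADA channel).
signal = np.sin(np.linspace(0, 8 * np.pi, 512)) \
    + 0.1 * np.random.default_rng(0).normal(size=512)
approx, details = haar_pyramid(signal, level=3)
# Detail lengths halve at every level (256, 128, 64); because the Haar basis
# is orthonormal, total energy is preserved across the decomposition.
```

The orthonormality property from equation (2) is what makes the detail coefficients an additive, non-redundant decomposition: the energies of the coefficient vectors sum exactly to the energy of the input signal.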
Concretely, each layer of WaveletFCNN is described in detail below:
Input: The input of WaveletFCNN is a time series sample of size $T \times D$, where $T$ is the length and $D$ is the dimension of the signal. For univariate sequences, $D$ is 1. The detail coefficients are then computed from the input signal by the pyramid algorithm until reaching the target level $L$ specified as a hyper-parameter. The size of the wavelet coefficient layer at level $i$ is $(T/2^i) \times D$, where $i \in \{1, \dots, L\}$.
Conv1D: The convolutional layer is the essential building block of the neural network; it computes the outputs of neurons that are connected to local regions in the input, each computing a dot product between its weights and the small region it is connected to in the input volume.

BN and ReLU: The batch normalization layer (Ioffe and Szegedy, 2015), as an alternative to dropout, reduces the amount by which the hidden unit values shift around and acts as a regularizer to prevent overfitting. The ReLU layer (Nair and Hinton, 2010) applies an element-wise activation function that thresholds activations at zero.
Global Average Pooling: At the end of each sub-network, a global average pooling layer (Zhou et al., 2016) is applied to reduce the temporal dimension by computing the average within each convolutional channel. Compared to a fully connected layer, the global average pooling layer minimizes overfitting by reducing the total number of parameters in the model.
Concatenate: The concatenate layer merges the outputs from each sub neural network into a single tensor.

Softmax: The softmax layer is the most widely applied output activation for multi-class classification; the softmax function computes a categorical probability distribution giving the probability of each class.
This proposed architecture effectively combines the spectral features and the fully convolutional neural networks, which generally enhances the classifier’s accuracy as we will show in the experimental results.
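To make the data flow through the architecture concrete, here is a minimal numpy sketch of one forward pass through a WaveletFCNN-style model. The random untrained weights, the single Conv1D block per branch, the Haar decomposition, and the binary output are all simplifying assumptions; the actual model stacks several Conv1D + BN + ReLU blocks per sub-network:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    # 'valid' 1-D convolution of a univariate signal with each filter row.
    return np.stack([np.convolve(x, w_c[::-1], mode="valid") for w_c in w])

def sub_network(x, n_filters=8, kernel=5):
    # One Conv1D -> ReLU -> global average pooling branch (random weights here).
    w = rng.normal(scale=0.1, size=(n_filters, kernel))
    return np.maximum(conv1d(x, w), 0.0).mean(axis=1)  # shape: (n_filters,)

def haar_details(x, level):
    # Detail coefficients of an orthonormal Haar decomposition, one array per level.
    details, a = [], np.asarray(x, dtype=float)
    for _ in range(level):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
    return details

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

signal = rng.normal(size=512)
# One branch for the raw signal plus one branch per detail-coefficient level.
branches = [signal] + haar_details(signal, level=3)
features = np.concatenate([sub_network(b) for b in branches])  # concatenate layer
w_out = rng.normal(scale=0.1, size=(2, features.size))
probs = softmax(w_out @ features)  # softmax output: icing vs. normal
```

The key design point this illustrates is that each resolution gets its own convolutional branch, so filters at level $i$ see variance structure at scale $2^i$ that a single fixed-size filter on the raw signal could not cover.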
In order to generate accurate and robust anomaly detection in real time, we propose an algorithm that uses a sliding window and majority vote. We first define two variables: a window size $w$ and a step size $s$, where $s \le w$ and $w$ is a multiple of $s$. Imagine that the time series is partitioned into blocks of length $s$; our algorithm lets an active window of length $w$ move along the input time series in steps of size $s$. The trained classifier provides a prediction for the sequence within the active window. Each time the active window moves, a prediction is made, so that every block covered by the window receives a new prediction. In this way, each block accumulates up to $w/s$ predictions as the sliding window moves along the signal. As a result, we can use a majority vote to determine whether the current block is an anomaly or not. In our design, the majority vote can be made even more flexible by setting a threshold $\theta$: if the ratio of positive predictions is larger than or equal to the threshold, we generate a positive prediction; otherwise, a negative one.
Figure 3 illustrates a detailed example of how this algorithm works. Suppose the signal is first split into blocks along the time axis, where for simplicity the length of each block equals the step size $s$ and the sliding window size is $4s$. Let us focus on the green block: when the sliding window (represented by the black frame) covers the green block for the first time, the classifier generates a prediction $p_1$. In the next three steps, as the window moves along the signal, the classifier sequentially generates predictions $p_2$, $p_3$, and $p_4$. Finally, we use a majority vote based on $p_1$, $p_2$, $p_3$, and $p_4$ to decide whether the green block indicates an anomaly due to frozen blades or not.
Design considerations. The design of this algorithm targets accurate and robust monitoring for blade icing detection. Since the decisions are based on majority votes, this scheme effectively reduces the risk of misclassifying any small region within the signal. One may be concerned that this approach demands intense computation for neural network inference; in reality, however, the SCADA system provides signals at a very low sampling rate. In fact, the sampling interval is seven seconds for turbine state monitoring. That is, even if we set the step size literally the same as the sampling interval, there would be sufficient time for model inference on any modern hardware. A detailed evaluation of this algorithm combined with the WaveletFCNN classifier is presented in the next section.
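The monitoring loop described above can be sketched compactly as follows. The block/window bookkeeping follows the text; the parameter names (`blocks_per_window`, `theta`) and the toy stand-in classifier are illustrative assumptions:

```python
from collections import deque

def monitor(blocks, classify, blocks_per_window=4, theta=0.5):
    """Sliding-window majority vote over fixed-length signal blocks.

    blocks: iterable of equal-length blocks (block length = step size s).
    classify: callable mapping a window (list of blocks) to 0 (normal) / 1 (anomaly).
    Returns a dict: block index -> 0/1 decision by thresholded majority vote.
    """
    votes = {}                               # block index -> list of predictions
    window = deque(maxlen=blocks_per_window)
    for i, block in enumerate(blocks):
        window.append((i, block))
        if len(window) == blocks_per_window:
            pred = classify([b for _, b in window])
            for j, _ in window:              # every covered block receives this vote
                votes.setdefault(j, []).append(pred)
    # A block is flagged when the ratio of positive votes reaches the threshold.
    return {j: int(sum(v) / len(v) >= theta) for j, v in votes.items()}

# Toy stand-in for WaveletFCNN: flag windows whose mean value exceeds 1.
toy_classify = lambda win: int(
    sum(sum(b) for b in win) / sum(len(b) for b in win) > 1)
blocks = [[0.0] * 4] * 6 + [[3.0] * 4] * 6   # normal half, then "icing" half
decisions = monitor(blocks, toy_classify)
```

Note that blocks near the boundary of an anomaly region accumulate votes from windows on both sides, which is exactly the smoothing effect that makes the scheme robust to a single misclassified window.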
In this section, we describe our experimental evaluation of the WaveletFCNN classifier and the anomaly monitoring algorithm. (All the source code and datasets are available in this GitHub repository: https://github.com/BinhangYuan/WaveletFCNN.) The aim is to answer the following questions:
Do the discrete wavelet coefficients and the corresponding change of the convolutional neural network architecture generally enhance the classifier in terms of accuracy?
How accurately and robustly does the anomaly monitoring algorithm equipped with the improved classifier conduct the anomaly detection of the frozen blades for real life signals from a wind farm?
In an attempt to answer these two questions, we conduct two sets of experiments. In the first, we benchmark WaveletFCNN on the UCR time series archive to illustrate the improvement over the original FCNN. In the second, we first train WaveletFCNN on labeled multivariate signals and then test the anomaly monitoring algorithm on real-world signals from a wind farm.
In order to examine our WaveletFCNN architecture, we first conduct univariate time series classification on the UCR datasets (Chen et al., 2015). The UCR time series archive includes 85 different datasets covering various applications. We take advantage of the datasets' diversity to verify the general improvement of WaveletFCNN compared to the original FCNN model. It should be noted that we are not claiming WaveletFCNN is the best model so far for the UCR archive (although we do obtain the best accuracy on some datasets, as we will show later). Instead, we attempt to validate the statement that the wavelet multi-resolution spectrum is a helpful augmentation of the feature space, and that the corresponding neural network architecture is capable of leveraging such information to improve classification accuracy.
We test WaveletFCNN on all 85 UCR time series datasets and compare the results with the original FCNN. To be fair, we follow the experimental settings described in (Wang et al., 2017) and briefly repeat the setup here: all the datasets are split into training and testing sets by default from (Chen et al., 2015); the loss function for all tested models is categorical cross-entropy; the model that achieves the lowest training loss is preserved, and the performance of this model on the test set is reported. We also use the same hyper-parameters, such as batch size, learning rate, and number of epochs, as (Wang et al., 2017) during training. Additionally, as introduced before, WaveletFCNN has one more hyper-parameter, the wavelet decomposition level. In this experiment, we try several values of this hyper-parameter and report the best result for WaveletFCNN.

Table 2 gives the complete comparison between WaveletFCNN and the original FCNN. (The original FCNN's results are obtained from the GitHub repository https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline published with (Wang et al., 2017).) Briefly, on 64 out of 85 datasets, WaveletFCNN obtains better or equivalent accuracy (most of the equivalent cases happen when both methods make all correct predictions on that dataset). Some highlighted results of WaveletFCNN are summarized in Table 1. As far as we know, WaveletFCNN outperforms all the state-of-the-art models (Wang et al., 2017; Smith and Williams, 2018; Bagnall et al., 2015; Schäfer, 2015; Karim et al., 2018; Bostrom and Bagnall, 2015; Schäfer and Leser, 2017; Lines and Bagnall, 2015) on these 7 datasets, where WaveletFCNN's accuracy is strictly higher than that of the state of the art (equivalent cases are again excluded here).
Dataset | State of the art | WaveletFCNN
---|---|---
fish | 0.011 (Wang et al., 2017) | 0.006
ProxPhalanxOutlineAgeGroup | 0.107 (Karim et al., 2018) | 0.102
RefrigerationDevices | 0.405 (Karim et al., 2018) | 0.389
ShapesAll | 0.078 (Karim et al., 2018) | 0.067
StarLightCurves | 0.020 (Bagnall et al., 2015) | 0.018
ToeSegmentation2 | 0.038 (Schäfer, 2015) | 0.031
Wine | 0.092 (Karim et al., 2018) | 0.074
Dataset | FCNN | WaveletFCNN | Dataset | FCNN | WaveletFCNN |
---|---|---|---|---|---
50words | 0.321 | 0.191 | Adiac | 0.143 | 0.156 |
ArrowHead | 0.12 | 0.103 | Beef | 0.25 | 0.033 |
BeetleFly | 0.05 | 0 | BirdChicken | 0.05 | 0 |
Car | 0.083 | 0.083 | CBF | 0 | 0 |
ChlorineConcentration | 0.157 | 0.159 | CinCECGtorso | 0.187 | 0.069 |
Coffee | 0 | 0 | Computers | 0.152 | 0.204 |
CricketX | 0.185 | 0.264 | CricketY | 0.208 | 0.267 |
CricketZ | 0.187 | 0.236 | DiatomSizeR | 0.07 | 0.415 |
DistalPhalanxOutlineAgeGroup | 0.165 | 0.153 | DistalPhalanxOutlineCorrect | 0.188 | 0.177 |
DistalPhalanxTW | 0.21 | 0.205 | Earthquakes | 0.199 | 0.177 |
ECG200 | 0.1 | 0.12 | ECG5000 | 0.059 | 0.05 |
ECGFiveDays | 0.015 | 0.005 | ElectricDevices | 0.277 | 0.237 |
FaceAll | 0.071 | 0.057 | FaceFour | 0.068 | 0.046 |
FacesUCR | 0.052 | 0.095 | fish | 0.029 | 0.006 |
FordA | 0.094 | 0.049 | FordB | 0.117 | 0.068 |
GunPoint | 0 | 0 | Ham | 0.238 | 0.257 |
HandOutlines | 0.224 | 0.094 | Haptics | 0.449 | 0.445 |
Herring | 0.297 | 0.297 | InlineSkate | 0.589 | 0.498 |
InsectWingbeatSound | 0.598 | 0.378 | ItalyPower | 0.03 | 0.032 |
LargeKitchenAppliances | 0.104 | 0.096 | Lighting2 | 0.197 | 0.262 |
Lighting7 | 0.137 | 0.288 | MALLAT | 0.02 | 0.031 |
Meat | 0.033 | 0.017 | MedicalImages | 0.208 | 0.234 |
MiddlePhalanxOutlineAgeGroup | 0.232 | 0.205 | MiddlePhalanxOutlineCorrect | 0.205 | 0.192 |
MiddlePhalanxTW | 0.388 | 0.368 | MoteStrain | 0.05 | 0.071 |
NonInvThorax1 | 0.039 | 0.048 | NonInvThorax2 | 0.045 | 0.048 |
OliveOil | 0.167 | 0.033 | OSULeaf | 0.012 | 0.004 |
PhalangesOutlinesCorrect | 0.174 | 0.172 | Phoneme | 0.655 | 0.689 |
Plane | 0 | 0 | ProxPhalanxOutlineAgeGroup | 0.151 | 0.102 |
ProxPhalanxOutlineCorrect | 0.1 | 0.093 | ProxPhalanxTW | 0.19 | 0.175 |
RefrigerationDevices | 0.467 | 0.389 | ScreenType | 0.333 | 0.397 |
ShapeletSim | 0.133 | 0.044 | ShapesAll | 0.102 | 0.067 |
SmallKitchenAppliances | 0.197 | 0.197 | SonyAIBORobot | 0.032 | 0.031 |
SonyAIBORobotII | 0.038 | 0.057 | StarLightCurves | 0.033 | 0.018 |
Strawberry | 0.031 | 0.015 | SwedishLeaf | 0.034 | 0.034 |
Symbols | 0.038 | 0.026 | SyntheticControl | 0.01 | 0.003 |
ToeSegmentation1 | 0.031 | 0.022 | ToeSegmentation2 | 0.085 | 0.031 |
Trace | 0 | 0 | TwoLeadECG | 0 | 0 |
TwoPatterns | 0.103 | 0.004 | UWaveGestureLibraryAll | 0.174 | 0.078 |
UWaveX | 0.246 | 0.158 | UWaveY | 0.275 | 0.251 |
UWaveZ | 0.271 | 0.23 | wafer | 0.003 | 0.001 |
Wine | 0.111 | 0.074 | WordSynonyms | 0.42 | 0.284 |
Worms | 0.331 | 0.298 | WormsTwoClass | 0.271 | 0.21 |
yoga | 0.155 | 0.118 |
In order to draw a precise conclusion about the comparison between WaveletFCNN and the original FCNN, we conduct a proper statistical analysis. Based on the paired results on each dataset, we apply the classical paired t-test (David, 1963) with the null hypothesis that "the expected test error from the original FCNN is not any higher than that of WaveletFCNN". The resulting p-value is small enough to strongly reject this hypothesis at the standard significance level. Through this analysis, we can draw a statistically significant conclusion that our WaveletFCNN model is a general improvement over the original FCNN for time series classification.

To be cautious, we also want to point out one potential drawback of WaveletFCNN: the new hyper-parameter, the wavelet decomposition level, requires some tuning effort for different datasets. We agree that the current WaveletFCNN architecture would be improved by an automatic scheme for determining the optimal wavelet decomposition level. On the other hand, we can also view this question as a general model selection task for neural networks, which is itself an active research area; for instance, recent research shows, counter-intuitively, that overparameterized neural networks also have advantages in generalization (Li and Liang, 2018). In short, we agree that the increased complexity of hyper-parameter selection is one temporary drawback of WaveletFCNN, and model selection for WaveletFCNN is an interesting direction for future work.
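For reference, the statistic used in a paired t-test is the standard t over per-dataset error differences. A minimal sketch (the five error pairs below are taken from Table 2 purely for illustration; the paper's actual test uses all 85 datasets, and the p-value then comes from the t-distribution with $n-1$ degrees of freedom):

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Paired t-statistic for H0: mean(errors_a - errors_b) <= 0.

    A large positive value is evidence that method A's expected error is
    higher than method B's, i.e., that B is the better classifier.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Five example error pairs from Table 2 (FCNN vs. WaveletFCNN):
# 50words, ArrowHead, Beef, BeetleFly, CinCECGtorso.
fcnn_errors = [0.321, 0.12, 0.25, 0.05, 0.187]
wavelet_errors = [0.191, 0.103, 0.033, 0.0, 0.069]
t = paired_t_statistic(fcnn_errors, wavelet_errors)  # positive favors WaveletFCNN
```

Pairing by dataset matters here: the two models' errors on the same dataset are strongly correlated, so testing the differences removes per-dataset difficulty as a confounder.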
In this work, the ultimate goal of designing the advanced classifier is to provide accurate anomaly monitoring in the icing detection scenario. We now evaluate how the anomaly monitoring algorithm equipped with the improved classifier accomplishes this task.
Thanks to engineers from Goldwind Inc., one of the world's largest wind turbine manufacturers, we have access to real data from wind farms that deploy Goldwind's turbines. Monitoring data from three wind turbines were obtained, representing 305.77 hours of running time for Machine 1, 695.59 hours for Machine 2, and 329.28 hours for Machine 3. It is worth mentioning that only the log for Machine 3 is continuous, while there are interruptions in the records for Machine 1 and Machine 2. With this in mind, we take a portion of the signal from Machine 3 as the test data for the detecting phase; the remaining data are processed, as we will describe later, to create the training and validation sets for the training phase. The raw data are collected from the supervisory control and data acquisition (SCADA) system and include hundreds of dimensions. Based on the engineers' domain-specific knowledge, 28 continuous variables relevant to frozen blades are preserved as the input multivariate signals. A detailed description of these variables is also available online (https://github.com/BinhangYuan/WaveletFCNN). Further, the engineers also helped us label the ranges during which blade icing occurs. It should be carefully noted that including these labels makes our approach fundamentally different from unsupervised change point detection or time series segmentation algorithms (e.g., (Matsubara et al., 2014; Gharghabi et al., 2018)).
As mentioned in the previous sections, our anomaly detection framework includes two parts: an offline learning phase and an online detecting stage. During the learning process, we split the raw signals into a collection of fixed-length fragments, each with a binary label indicating whether the blades were frozen during that period of time. This collection is further split into training and validation sets for learning a WaveletFCNN classifier. In the online detecting stage, we use our anomaly monitoring algorithm along with the WaveletFCNN classifier to generate accurate and robust detection of blade icing situations. We set the two parameters of the anomaly monitoring algorithm as follows: the window size is set the same as the fragments' length in the training set; the step size is chosen as a function of the wavelet decomposition level $L$ of the WaveletFCNN classifier, so that the window contents stay aligned with the decomposition.
The training details of the WaveletFCNN classifier are described below. We then present experimental evaluations and relevant discussion of deploying the anomaly monitoring algorithm in a simulated real-time setup.
In order to train the WaveletFCNN classifier on the multivariate SCADA signals, we first partition the signal into fixed-length fragments. The SCADA system fetches signals from the different sensors every seven seconds. We cut the input multivariate time series into a group of fragments of length 512, where each fragment approximately represents the wind turbine's status within one hour. One risk of directly using this collection for training is that the dataset will be strongly biased toward the labeled ranges, since the turbines function properly most of the time. To address this issue, we augment the number of positive samples (samples representing anomalies) by generating overlapping positive fragments, while the negative fragments are generated without overlap. For example, suppose a wind turbine functions properly for eight hours and then malfunctions for two hours due to blade icing: we cut the normal range without overlap, so that 8 negative fragments, each representing the state within one hour, are created; on the other hand, we can apply a step size of 10 minutes with a sliding window of one hour along the abnormal region and generate 7 overlapping positive fragments. In reality, we set the step size to a length of 16 samples (representing 112 seconds of signal) to create a label-balanced dataset. We further split the augmented collection into training and validation sets to train the WaveletFCNN classifier.
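The fragment-generation scheme above can be sketched as follows. The eight-hour normal range and two-hour abnormal region are assumed example durations consistent with the fragment counts in the text; units are minutes:

```python
def fragment_starts(region_minutes, window_minutes, step_minutes):
    """Start offsets of every fragment of length `window_minutes`
    that fits inside the region when advancing by `step_minutes`."""
    return list(range(0, region_minutes - window_minutes + 1, step_minutes))

WINDOW = 60  # each fragment covers roughly one hour of SCADA signal

# Normal range (e.g., 8 hours): non-overlapping negative fragments.
negative = fragment_starts(8 * 60, WINDOW, step_minutes=WINDOW)

# Abnormal region (e.g., 2 hours): overlapping positive fragments,
# advancing by 10 minutes to augment the minority (icing) class.
positive = fragment_starts(2 * 60, WINDOW, step_minutes=10)
```

Shrinking the step only inside labeled anomaly regions is what rebalances the classes without duplicating any normal-range data.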
The change from univariate to multivariate sequences for WaveletFCNN is straightforward: we conduct the discrete wavelet transform on each signal dimension independently to construct the input layer and change the first convolutional layer’s input dimension to 26 in each sub-module; the rest of the model is untouched.
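As a minimal illustration of the per-channel transform, the following sketch applies a multilevel Haar DWT independently to each dimension. The Haar basis and the return layout here are our own simplifications for clarity, not necessarily the wavelet used in the paper:

```python
import math

def haar_dwt(x):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return approx, detail

def multilevel_dwt(x, level):
    """Return [detail_1, ..., detail_L, approx_L] for one channel."""
    coeffs = []
    for _ in range(level):
        x, d = haar_dwt(x)
        coeffs.append(d)
    coeffs.append(x)  # final approximation coefficients
    return coeffs

def transform_multivariate(series, level):
    """Apply the DWT independently to each signal dimension (26 channels)."""
    return [multilevel_dwt(channel, level) for channel in series]
```

Each decomposition level halves the sequence length, which is why the monitoring step size earlier is expressed in terms of the decomposition level L.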
Finally, we obtain a training set with 1221 sequences and a validation set with 308 sequences. The accuracy of the WaveletFCNN classifier is , whereas the original FCNN classifier only achieves .
Simulation setup. In order to evaluate the performance of our anomaly monitoring algorithm, we use a labeled continuous time series (unseen during the training phase) to simulate a real-time setting. The signal is split into small blocks of length ; for simplicity, each small block carries a single label indicating whether it is anomalous or not, determined by whether half or more of the block falls into the anomaly regions. As the simulation begins, we accumulate blocks into the sliding window. Once the blocks fill up the sliding window, WaveletFCNN classifies the signal within the current window. When the next block arrives, the sliding window moves one step forward, so that the earliest block is discarded and the latest block is appended to the end of the window; the classifier then makes another prediction based on the updated time series within the sliding window. This simulation mimics the scenario in which the monitoring center fetches a signal block every 112 seconds and combines it with the previous blocks to make a prediction. The prediction results are cached for the majority vote, and the voting results indicate whether a blade icing situation is detected.

Simulation results. We present the simulation results by answering the following questions. Before that, we first review a few general metrics for classification in our setting:
Accuracy is the number of correct predictions made by the model over all predictions made. Since the input signal is strongly biased in this simulation, accuracy is not very informative.
Precision measures what proportion of the positions we diagnose as anomalies are actually anomalies.
Recall measures what proportion of the positions that are actually anomalies are diagnosed by the algorithm as anomalies.
F1 score is the harmonic mean of precision and recall, which provides an overall evaluation of the classifier.
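For concreteness, the four metrics above can be computed from binary predictions as follows; this is a self-contained sketch, not the paper's evaluation code:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

Note that under strong class imbalance, as in this simulation, accuracy can stay high while precision collapses, which is why the tables below report all four metrics.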
How does WaveletFCNN classifier perform in this simulation?
Since this signal is strongly biased, unlike the training set, it is important to verify that WaveletFCNN can still make robust predictions. To this end, we record the prediction for each signal fragment that ever stays in the sliding window. Again, for simplicity, we use a single label indicating whether each fragment is anomalous, determined by whether half or more of the fragment falls into the labeled anomaly regions. The classification performance metrics for both WaveletFCNN and the original FCNN are shown in Table 3. From the table, we can see that WaveletFCNN outperforms FCNN in terms of accuracy, precision, and F1 score. We also observe that FCNN tends to produce far more positive predictions, so that its recall is higher than that of WaveletFCNN. However, although false negative predictions are far more dangerous than false positives in this application, a recall inflated by indiscriminate positive predictions cannot by itself be taken as a good property of a classifier.
Measurement | WaveletFCNN | FCNN |
---|---|---|
Accuracy | 0.858 | 0.619 |
Precision | 0.215 | 0.096 |
Recall | 0.664 | 0.766 |
F1 score | 0.325 | 0.171 |
How do the sliding window and majority voting schemes help?
Given the careful design of the anomaly monitoring algorithm, it is important to verify whether it improves the accuracy and robustness of the detection. To this end, we compare our proposed algorithm with a straw-man monitoring algorithm that uses a non-overlapping sliding window to process the signal and simply takes the outcome of the WaveletFCNN classifier as the monitoring result. As the results in Table 4 show, the proposed algorithm yields significant improvements in accuracy, precision, and F1 score; its recall is slightly worse than that of the straw-man, but, as discussed above, the practical value of this metric alone is limited.
Measurement | Straw-man | Proposed algorithm |
---|---|---|
Accuracy | 0.846 | 0.882 |
Precision | 0.181 | 0.251 |
Recall | 0.666 | 0.654 |
F1 score | 0.286 | 0.363 |
Will the flexible voting scheme provide further improvement?
We investigate the relationship between the threshold and the evaluation measurements, and record the results in Table 5. Although the accuracy increases as the threshold increases, which is intuitive since the signal is biased toward the negative class, the F1 score reaches its maximum at an intermediate threshold. We believe tuning this variable can further enhance the detection performance in practice.
Threshold | Accuracy | Precision | Recall | F1 score |
---|---|---|---|---|
0.1 | 0.756 | 0.169 | 0.963 | 0.288 |
0.2 | 0.816 | 0.210 | 0.935 | 0.343 |
0.3 | 0.847 | 0.239 | 0.907 | 0.378 |
0.4 | 0.880 | 0.284 | 0.879 | 0.429 |
0.5 | 0.882 | 0.251 | 0.654 | 0.363 |
0.6 | 0.892 | 0.255 | 0.570 | 0.353 |
0.7 | 0.895 | 0.242 | 0.486 | 0.323 |
0.8 | 0.904 | 0.240 | 0.402 | 0.301 |
0.9 | 0.906 | 0.209 | 0.299 | 0.246 |
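A sweep like the one behind Table 5 can be sketched as follows, assuming the cached fraction of positive votes at each position and the ground-truth labels are available; all names here are hypothetical:

```python
def sweep_thresholds(vote_fractions, labels, thresholds):
    """F1 score of the flexible voting scheme at each candidate threshold.

    vote_fractions: fraction of positive predictions cached per position.
    labels: 0/1 ground truth per position.
    """
    results = {}
    for t in thresholds:
        # Declare an anomaly when the positive-vote fraction reaches t.
        preds = [int(v >= t) for v in vote_fractions]
        tp = sum(p and l for p, l in zip(preds, labels))
        fp = sum(p and not l for p, l in zip(preds, labels))
        fn = sum((not p) and l for p, l in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[t] = (2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    return results
```

Picking the threshold that maximizes F1 on a held-out signal is one reasonable tuning rule; other costs (e.g., weighting false negatives more heavily) could be substituted for F1 in the same loop.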
Although our algorithm achieves promising results in both the training and detecting phases, there are several reasons to be cautious in asserting that no further improvement of the framework can be made.
First, one can imagine that labeling more than 1200 hours of monitoring data requires enormous manpower; however, deep learning models can still easily overfit a dataset of the current scale. Based on our observations, WaveletFCNN ends up with an accuracy of on the training set, although we already include batch normalization to prevent overfitting. One straightforward option to address this issue is to collect a larger dataset in the future.
Likewise, in the training phase, we include signals from 3 different machines in the training set. A more reasonable design would be to train a separate model for each machine, since the microclimate and working status of each turbine may vary. However, due to the limited scale of the dataset, we do not apply this setting.
Finally, our framework can benefit from plenty of other machine learning techniques. For example, we can leverage active learning (Settles, 2014) in order to reduce the human effort of labeling the data. Additionally, we can first train a general classifier on signals from multiple wind turbines, and then use transfer learning (Pan et al., 2010) to quickly update the general model with a small amount of data from a target turbine and deploy the new model to monitor that turbine.

We present a classification-based anomaly detection technique to monitor the wind turbine blade icing issue. In the training phase, WaveletFCNN, a fully convolutional neural network augmented with discrete wavelet transform coefficients, is proposed to improve on the original FCNN's performance. We also design a novel anomaly monitoring algorithm for the detecting phase in order to provide accurate, robust, and deployable detection in a real-world scenario. Experimental results show that WaveletFCNN outperforms the original FCNN not only for this frozen blade monitoring application but also across the applications in the UCR time series archive. The anomaly monitoring algorithm is also verified in the simulated real-time setup and shows promising results. We plan to deploy this prototyped system in real-world wind farms in the near future.