Non-stationarity in correlation matrices for wind turbine SCADA-data and implications for failure detection

by   Henrik M. Bette, et al.
Universität Duisburg-Essen

Modern utility-scale wind turbines are equipped with a Supervisory Control And Data Acquisition (SCADA) system gathering vast amounts of operational data that can be used for failure analysis and prediction to improve operation and maintenance of turbines. We analyse high freqeuency SCADA-data from an offshore windpark and evaluate Pearson correlation matrices for a variety of observables with a moving time window. This renders possible an asessment of non-stationarity in mutual dependcies of different types of data. Drawing from our experience in other complex systems, such as financial markets and traffic, we show this by employing a hierarchichal k-means clustering algorithm on the correlation matrices. The different clusters exhibit distinct typical correlation structures to which we refer as states. Looking first at only one and later at multiple turbines, the main dependence of these states is shown to be on wind speed. In accordance, we identify them as operational states arising from different settings in the turbine control system based on the available wind speed. We model the boundary wind speeds seperating the states based on the clustering solution. This allows the usage of our methodology for failure analysis or prediction by sorting new data based on wind speed and comparing it to the respective operational state, thereby taking the non-stationarity into account.



There are no comments yet.


page 7

page 8

page 11

page 12


Uncertainty Set Prediction of Aggregated Wind Power Generation based on Bayesian LSTM and Spatio-Temporal Analysis

Aggregated stochastic characteristics of geographically distributed wind...

Integrative Probabilistic Short-term Prediction and Uncertainty Quantification of Wind Power Generation

We develop an integrative framework to predict the wind power output, co...

Data-based wind disaster climate identification algorithm and extreme wind speed prediction

An extreme wind speed estimation method that considers wind hazard clima...

Transferability of Operational Status Classification Models Among Different Wind Turbine Typesq

A detailed understanding of wind turbine performance status classificati...

Bivariate Gaussian models for wind vectors in a distributional regression framework

A new probabilistic post-processing method for wind vectors is presented...

Damage detection in operational wind turbine blades using a new approach based on machine learning

The application of reliable structural health monitoring (SHM) technolog...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Renewable energies are indispensable to respond to the temperature rise caused by climate change. Wind turbines are one key technology paving the way towards a green and emission-less energy production. The Global Wind Energy Council (GWEC) reports a growth of 93 in wind installations for 2020, bringing the total up to 650 with an increase of the growth rate by compared to 2019. Offshore installations are on the rise, their comparatively low total increased from by more than a fifth to 35.3 in 2019 [Lee2021]. The GWEC also forecasts more than 205 new offshore wind capacity by 2030 in its 2020 Global Offshore Wind Report [Lee2020]. They point out a growing acceptance of the fact that the price of offshore wind power can out-compete fossil and nuclear fuels. The various reports agree that wind power cannot be the only clean energy used to reach a net zero in emissions by 2050, but onshore and especially offshore wind turbines are crucial to reach this goal.
While offshore locations usually provide steadier and higher wind speeds, this comes at the cost of harsh environmental conditions and increased difficulties for operation and maintenance. Reference [Maldonado-Correa2020] lists various authors claiming that Operation and Maintenance (O&M) costs account for 20-35% of the total expenditure for offshore wind farms. The corresponding numbers are lower but still significant (approx. 10-15%) for onshore turbines [Zhao2017, Tavner2012].
Not surprisingly, among the many topics to be studied in the wind turbine field [Veers2019], O&M is on focus for researchers. They follow a general trend in most industries, aiming at moving from scheduled maintenance towards condition-based maintenance to reduce costs and efforts [Li2020]. Researchers and industry alike undertake increased efforts to effectively optimize O&M procedures for wind turbines and thereby reduce this cost factor (e.g. Ref. [Dao2021, Ulmer2020, Yan2019, Martin-del-Campo2020, Chen2021]). A key to achieving this goal is the prediction of failures in wind turbines with sufficient lead time to react and carry out preemptive maintenance instead of correctional maintenance. This reduces not only the money lost in turbine downtime, but also enables cheaper maintenance. The idea is to optimize assets by replacing components exactly when needed [Colone2018]. Furthermore, similar methods could not only be used to detect component failures, but also to control problems such as, for example, yaw misalignment or under-performance [Pandit2018, Meyer2020].
One possibility for such a Condition Monitoring System (CMS) is to install additional vibrational sensors into the wind turbine. See Ref. [Stetco2019] for a review of these methods. However, the goal is to reduce costs and installing additional sensors has its own inherent costs. This has led many researchers to develop methods for failure detection and prediction based on data from Supervisory Control and Data Acquisition (SCADA) systems, which are installed in all major wind farms since commission. This SCADA data contains many variables, usually averaged over 10

intervals. The corresponding standard deviations are often included. As these are usually the only available data, most developed methods try to employ them for failure detection. The reader is referred to ref.

[Maldonado-Correa2020] and [Tautz-Weinert2017]

for reviews. Common approaches include neural networks, physical models and statistical analysis. Usually historical data are used to define a

normal state of the system for these methods. The process is commonly called training. This concept is very clearly outlined in ref. [Jin2021]. An easily understood example is modelling the power curve [Kolumban2017, Gottschall2008] and then testing for deviations [Aziz2021]. For a neural network, it might mean to predict one (or more) variables based on other measurements and then compare with the actual measurement. An explicit example for this can be found in ref. [Ulmer2020], while ref. [Helbing2018] reviews multiple neural network approaches. These authors also raise two important points: First, it is often complicated or impossible to reliably label failure events in the data due to scarcity of available log and maintenance data. Second, early detection of failures is shown to be possible with 10 SCADA data in ref. [Ulmer2020]

by applying convolutional neural networks. However, it is found that the averaging process naturally leads to a loss of information. Other researchers have tried to avoid the data availability problem by using simulated high frequency data with labelled faults for the development of their models, e.g. ref.

[Pozo2018] uses a benchmark implemented by ref. [Odgaard2013].
In ref. [Pozo2018]principal component analysis (PCA) [Denis2021] is used to reduce the dimensionality of the data by projecting onto the first three principal components. Novelty (i.e. failure) is then detected by applying Hotelling’s -statistics to compare datasets in this principal component space. PCA is a common tool for statistical analysis also applied by others to wind turbines [Yan2019, Vidal2018] and also to the aforementioned vibration data [Daga2020]

, see also the aforementioned reviews. The composition of the principal components of a time series dataset is equal to the structure of the eigenvectors one obtains from spectrally decomposing the correlation matrix of that dataset. While PCA is one major statistical tool based on correlations, it is not the only one. Correlations themselves are used on industry scale to monitor condition

[Siemens2021]. Research around failure detection often uses the Mahalanobis distance [Jin2021, RuizdelaHermosaGonzalez-Carrato2018], which also involves the correlation matrix.
This, however, poses an interesting question: The above approaches rely in one way or the other on the assumption of stationarity, but are the correlation matrices for SCADA data obtained from a wind turbine stationary in time? Generally, this assumption does not hold true for complex systems containing many different variables, mechanisms and external influences. The stability of correlations in financial stock market data was analysed, for example, in ref. [Buccheri2013]. Reference [Munnix2012]

shows that the correlation matrix of this data inhabits different states over time by using cluster analysis. The stability of these states was further analysed in ref.

[Rinn2015, Stepanov2015]. Similar studies were done for traffic data in ref. [Wang2020]. Some general correlation analysis for wind turbine data was carried out in ref. [Braun2020].
Here, we want to show that the correlations indeed change over time when looking at high frequency SCADA data from real wind turbines. To this end, we will apply cluster analysis to the correlation matrices of different SCADA signals. Furthermore, we show that one of the prime influences leading to significantly different structures in the correlation matrix stems from the turbine’s own control mechanisms.

Such a distinction into different operational states in the correlation structure of wind turbine data will help improve CMS systems relying directly on correlations (such as PCA and Mahalanobis distance). Furthermore, the establishment of multiple normal states might also be beneficial for machine learning techniques like neural networks or methods tackling under performance in general.

This paper is structured as follows: In section 2 we will introduce the dataset we work with before moving on to the theoretical background of correlation matrices and clustering in section 3. Then we will present our clustering results for a single turbine in section 4. We find proof of non-stationarity and identify the turbine control mechanism as the prime influence. Afterwards, we model the boundary wind speeds between the states in section 5. Finally, we will show that the established method works for multiple turbines without further problems and can even be improved by the increase in available data in section 6. We present our conclusions in section 7.

2 Data

Our dataset includes 100 offshore wind turbines. It contains observables that are measured at approximately 5 time intervals. To obtain aligned, synchronized data, we average the time series on 10 time intervals resulting in a data frequency . This ensures continuity in the data even when the actual measurement frequency fluctuates around . This does not hinder the calculation of correlations for our purposes. In fact, it is rather similar to any measurement process: Every sensor will in reality average over a short time span to obtain a value. Taking the mean over 10 seems therefore more natural than, for example, using only the last value from each interval. If at some time the deviation from the 5 interval becomes stronger or if there are actually missing data points, data points will be missing in our averaged data as well. This might occur if there was a problem with, for example, sensors or communications. Another reason is simply a decreased measurement frequency when the turbine is switched off. This is underlined by the majority of missing values occurring in the very low wind speed regime beneath the turn-on wind speed. As the turbines are not running during these times, this wind range is not of interest for us. These missing values do not pose a problem for our analysis. For these reasons, we decided to transfer missing values from the original dataset into the 10 data instead of replacing them by other means.
We are interested in identifying changes in the correlation structure, which emerge while the turbine is operating normally, in contrast to changes caused by failures. Therefore, we look at the following basic observables:

  • generated active power (ActivePower)

  • generated current (CurrentL1)

  • rotation per minute of the rotor (RotorRPM)

  • rotation per minute of the high speed shaft at the generator (GeneratorRPM)

  • wind speed (WindSpeed)

These observables provide a good picture of the main turbine systems. Wind speed makes the rotor move. Its rotation is transmitted via gears to the rotation of the high speed shaft at the generator. This, in turn, generates electrical current and power. While measurements of temperatures are very common and useful for failure analysis [Jin2021, Tautz-Weinert2017, Maldonado-Correa2020], it seems reasonable to assume that their behaviour in normal states is always coupled to mechanical variables, e.g. higher rotation speeds will lead to increased bearing temperatures. Excluded from the study at hand, further analysis of temperature correlations is nevertheless desirable for failure detection.
We used approximately three weeks of data from 5th to 24th March 2017. The data from such a time span are still easy enough to handle while providing enough data points to obtain reasonable clustering results. In view of possible practical applications identifying normal states for failure detection, three weeks is a short enough time span to make it easily usable. There would be no need to collect huge amounts of operational data beforehand. However, it is of course necessary that the data used for identifying different operational states covers all possible states. In practice it turns out later that this means, we need a wide range of wind speeds in our data. The actual time span was chosen, because for at least one wind turbine there are no manual or automatic alarms or services during this period (cp. section 4). Two turbines have no recorded data for this time span, effectively reducing our data set to 98 turbines.
Due to confidentiality agreements we will never show absolute values of any observable. In fact, only wind speed is shown directly and is then presented in units of the nominal wind speed at which the turbine starts to produce its nominal power output according to the manufacturer. The tilde is introduced to mark it as the rated value provided by the manufacturer as we will later on try to infer this value also from the data.
Each of the measured signals for each turbine yields a time series of data points , , , . Here, we assume that all times are given as unit free steps. In the case of our dataset we have , and - assuming complete data - . The data is arranged into rectangular data matrices


where each row is the time series of signal .

3 Theoretical background

Our analysis to distinguish different states is based on identifying differences in the correlation matrix of observables listed in section 2. In section 3.1 we define the way in which we are calculating the correlation matrices.
To identify non-stationarity in the time series of these matrices we will use a distance measure and a clustering algorithm. These are introduced in section 3.2.

3.1 Correlation matrices

To identify changes over time in the correlation structure, the correlation matrices are calculated with a moving time window of . The time intervals do not overlap. We effectively create a time series of matrices. To this end, the signal time series are divided into disjoint intervals of , i.e. the lengths of the intervals is

in our dimensionless time variable. We refer to these intervals as epochs. Hence, we have

epochs. To avoid notational confusion, we introduce the new time variable labelling the epochs. We reserve the notation for the original time series and write for the data matrix containing the different time series for turbine from to . The length of represents a compromise. Longer time spans would provide more data points per correlation matrix and would thereby decrease noise. However, we want to distinguish different states in time. Considering external conditions, e.g. wind, changing on short time scales of several minutes to hours, we have to choose relatively short epochs to ensure resolution of the non-stationarity . Such compromises are common when dealing with correlation matrix time series [Marti2021, Heckens2020a].
The first step towards calculating correlation matrices is the normalization of to zero mean and standard deviation one in each epoch. With the mean value


and the standard deviation


of the epoch , the normalized time series for signal and turbine is given by


The normalized data matrix for each epoch is then defined analogous to .
The Pearson correlation matrix for each turbine during the time is then easily calculated as


where denotes the transpose of . In the matrix each element is the Pearson correlation coefficient of the signals and for turbine during the epoch from to . The diagonal values are one by definition.

3.2 Clustering

We will now introduce the clustering, which allows us to sort the correlation matrices into groups (clusters) and check, whether different typical states do exist. If we can identify these, we will refer to them as operational states. An integer will be assigned to each of them and the algorithm will label each matrix in the time series with one such integer. Instead of a time series of correlation matrices, we then have a new integer time series with the range when is the number of clusters created.
The first outcome will then be as follows: If any decent clustering solution can be found, it is proof that typical states of the correlation structure exist. Then, analysing the resulting integer time series can much easier reveal dependencies of the state on time or other factors.
Any method separating objects into groups needs a distance measure defined between those objects. For the correlation matrices we choose the euclidean distance [Heckens2020a]

. The reader can imagine that all matrix entries are written into a vector, effectively arranging the columns of the matrix underneath each other, so that the standard euclidean distance between vectors can be applied. The distance between the correlation matrices for the epochs starting at

and of turbine is then


We choose the bisecting -means algorithm to perform our clustering [Steinbach, Tan2019]. This is the algorithm that was also used to determine states in the financial markets ref. [Munnix2012, Heckens2020a]. It can be described as a hybrid of standard -means clustering [Lloyd1982, Jain2010]

and hierarchical clustering

[Kaufman1990]. While the former directly divides the whole set of objects into groups (see appendix A), the latter is performed step-wise. In each step either two groups are merged (agglomerative) or one group is divided into two (divisive). Bisecting -means is a divisive clustering algorithm, meaning that at the start all objects belong to one big cluster and during each step one of the existing clusters is split into two new clusters. This bisection is performed by running a standard -means on all objects within the group to be split with . Which of the currently existing clusters is split during a step, is decided based on the average internal distance of all objects in a cluster


Here, denotes the number of objects in cluster and is the centroid of cluster defined by the element-wise mean:


Each step the cluster with the largest average internal distance is bisected. The algorithm is terminated when a set number of clusters is reached.
This is slightly different from the approach used by [Munnix2012, Rinn2015, Stepanov2015, Heckens2020a], where a threshold is introduced and all clusters are bisected until no single existing cluster has an average internal distance larger than the threshold. However, the threshold is then set based on the number of clusters wanted. It can easily be understood that the resulting clustering will be the same if one either uses our approach to produce clusters or chooses the threshold in such a way that clusters are produced.
Applying this algorithm, we split the set of all correlation matrices into subsets . The centroid of each cluster according to eq. (8) is interpreted as the mean correlation matrix of a cluster representing its typical correlation structure. Thereby, we only need to look at matrices and a series of integers instead of as many matrices.
Later on we will see that the emerging typical correlation matrices correspond to different control settings of the turbines. This explains in a simplified way, why the hierarchical -means works better than a normal -means in our case. Approximately, we can describe the controller of a wind turbine as a mechanism fixing certain signals to a fixed value. This means the correlation of that signal with other non-fixed signals should vanish. The divisive clustering will first extract a group where signal might be fixed. Then this group might be further divided into subgroups where signal is either fixed or not. And in theory this could go on. Such a problem is very well suited for divisive clustering.
In order to check, if our clustering is sensible, we will do two things. Firstly, we will just look at the cluster centroids and see, if we can interpret them and if they are substantially different from each other. Secondly, we will calculate silhouette coefficients [Rousseeuw1987]


with the average distance to all other matrices in the same cluster


and the smallest average distance to a single other cluster


This coefficient will take values between and with larger positive values representing matrices that are well clustered and negative values showing matrices that are closer to another cluster than to their own. To get an indicator for the overall clustering we will use the average silhouette coefficient


4 State identification via clustering

In the following, we analyse the correlation matrix time series of one turbine, which will henceforth be referred to as turbine 1 (WT1). We are singling out this wind turbine, because the time frame of the total dataset was selected in such a way that for WT1 no problems were listed in the automatic alarm logs and manual service reports. The idea is that this will make analysis and definition of normal states easier as there was no (reported) unusual behaviour.

The correlation matrices are calculated for non-overlapping epochs of 30 minutes each. This results in 960 matrices per turbine. However, due to several reasons there might be missing data in the time series. In such a case, any time stamp missing one or more of the measured variables is excluded from the calculation of the correlation matrix. Hereby, no estimation of values, which could influence the actual correlation coefficient, is necessary. More data could be used by calculating the correlation coefficients pair-wise, i.e. for any two observables just remove the time stamps where one of them is missing data. However, this does not result in a well defined positive semi-definite correlation matrix. For further analysis only those matrices, for which at least half of the expected data points (90 out of 180) exist, are considered. This is not a problem as missing measurements usually stem from turbines being operational but switched off during times of very low wind speeds

smaller than turn-on wind speed . For WT1 the average wind speed for 746 epochs with enough data is , while the average for 214 epochs where no correlation matrix could be calculated is only . It is obvious that these times where a wind turbine is not operating are unsuited for an identification of operational states. Of course, there might also be other reasons causing the missing data, e.g. a problem with the measurement of a signal. However, as for WT1 there are no alarms or services logged, we would not know what happened in those cases anyway. Any estimation of missing values would therefore need considerable guessing. As our results show that using only the epochs with enough data points is sufficient to reach a good differentiation of operational states, we are confident that just excluding missing values instead of estimating them is a good approach for our purposes.
When applying the hierarchical -means algorithm described in the previous section to the set of matrices, the first step is to decide how many clusters provide a good solution. To this end, we calculate the silhouette coefficients for solutions with 2-5 clusters. The silhouette plots can be found in fig. 1

. The fifth cluster is almost imperceptible as it consists only of 3 matrices. Some descriptive statistics for these silhouette coefficients are shown in table

1. The mean corresponds to the average silhouette coefficient from eq. (12

). All statistics provided decrease with increasing cluster number, implying that a few different states are sufficient to describe the dynamics of the analysed correlation matrices. In the plots we can see some negative coefficients implying elements that would fit better into a different cluster. Such imperfections are to be expected when using heuristics like clustering algorithms. It is however clear, that all solutions provide a good grouping with largely positive silhouette coefficients. This is a strong indication that the correlation matrix is not stationary, rather different states can be detected via clustering.

Figure 1: Silhouette plots for clustering solutions with 2-5 clusters. Each clustered element (matrix) is represented by a horizontal line the length of which is the silhouette coefficient for that element. Different clusters are colour coded.
clusters min 1st Qu. median mean 3rd Qu. max
2 -0.046 0.473 0.571 0.540 0.664 0.734
3 -0.157 0.379 0.537 0.491 0.640 0.718
4 -0.262 0.343 0.508 0.465 0.631 0.716
5 -0.352 0.314 0.479 0.439 0.626 0.707
Table 1:

Minimum, first quartile, median, mean, third quartile and maximum of silhouette coefficients for the clustering solutions with 2-5 clusters for correlation matrices of WT1.

Figure 2: Cluster centroids as calculated in eq. (8) for WT1 for different numbers of clusters. The colour indicates the value of the correlation coefficient. Black lines connect child and parent clusters of the hierarchical algorithm and the number of cluster elements is given as .

As mentioned before, we also look at the cluster centroids to see if the matrices show indeed different behaviour and if this distinction facilitates clear identification. Figure 2 shows the matrices calculated via eq. (8

) in a dendogram for the hierarchical clustering. The solutions for two and three clusters show distinctly different structures in the matrices, whereas the fourth cluster stems from cluster three, but is structurally very similar to cluster one, only differing in the strength of the mean correlation. The introduction of a fifth cluster only produces a very small cluster with only three elements. The algorithm does not identify new groups, but rather starts to classify outliers. While the average silhouette coefficient favours two clusters, we continue our analysis with three clusters as we have seen that up to this point structural differences in the matrices occur and we will later see that these can be interpreted very reasonably. Here, we also point out that structural differences in the matrix have a stronger influence on the structure of the eigenvectors, i.e. principal components, than differences in average correlation strength. They are therefore more important to distinguish when using methods like PCA and Mahalanobis distance.

The classification of the matrices for three clusters is shown as an integer time series in fig. 3. All three states appear to have a certain stability. Consecutive epochs often belong to the same cluster. However, there is no obvious behaviour in dependency of the time. State 3 appears far less often than states 1 and 2. To get a better idea what each state might represent, we look at the matrices for the cluster centroids calculated according to eq. (8) once more. They are seen in the third row of fig. 2. Generally, as the differences between the matrices are quite clear we can conclude - in accordance with the silhouette coefficient - that the clustering does indeed separate the matrix time series into meaningful groups. The correlation matrices are non-stationary.
In every cluster the strongest correlations are clearly visible between the observable pairs ActivePower-CurrentL1 and RotorRPM-GeneratorRPM. This was to be expected as these pairs are very directly linked. Apart from this we can see that for cluster 1 both of these pairs and the WindSpeed are all correlated with each other. Put differently, higher wind speed leads to faster rotation and thus to higher power. In cluster 2 this changes and the observables RotorRPM and GeneratorRPM decouple from the others. Cross-correlations between these two and any other measurement vanish. The remaining cross-correlation between WindSpeed and the two observables ActivePower and CurrentL1 also vanish in cluster 3. Clearly, the three different states of the correlation structure identified via clustering are meaningful: They show distinctly different correlation behaviour between different observables.
To interpret the meaning of the clustering solution, it is helpful to look at turbine control systems. The basic functionality of such a system can, for example, be found in [Schutt2014]. The specific functionality varies for individual turbine types, so it is likely that not all turbine types will show the same operational states. The turbine control system of the turbines analysed in this paper is one with variable pitch and variable speed. We can connect the three clusters to different operational states of this turbine type, which are seperated by boundary wind speeds and . For very low wind speeds just above the turn-on wind speed the generator rotation is kept constant at the lowest possible value defined by the maximum slip in the generator. This results in a correlation structure as seen in cluster 2. Already for slightly higher wind speeds the system controls the rpm proportional to the wind speed to operate at maximum aerodynamic efficiency of the rotor. This corresponds to cluster 1. With even more wind, but still not enough to reach nominal power output the turbine operates at fixed nominal rpm by controlling the torque. The rotational variables decouple again as for very low wind speeds and the correlation structure corresponds to cluster 2 again. Of course, fluctuations in wind speed will cause the rotation to fluctuate around the nominal value leading to some noise in the correlation structure. If wind speed is high enough to allow full power output the nominal power output is reached and therefore kept constant alongside the rpm. This results in correlations as seen in cluster 3. All boundary wind speeds are turbine dependent and usually not public knowledge.

Figure 3: Cluster identifier over time for WT1. Each dot represents a epoch.
Figure 4: Cluster identifier over wind speed for WT1. Each dot represents a epoch.

We can see that our reasoning for the turbine at hand is correct by plotting the cluster state over the mean wind speed of the epoch instead of the time stamp in fig. 4. For now disregarding the interval due to lack of enough data, cluster 1 represents low wind speeds, cluster 2 intermediate wind speeds and cluster 3 high wind speeds, where nominal power output can be reached. Some exceptions are to be expected due to either wrong sorting during the clustering or wind speeds changing during the epoch. Also, the response time of a wind turbine controller is finite in the range of seconds or minutes and therefore the correlation structure does not respond instantly to fluctuations and changes in wind speed [Behnken2020]. One example are the epochs sorted into cluster 3 whose wind speeds seem to lie in the range of cluster 1. They are all moved into the fourth cluster if we take another step in the algorithm. Its centroid is shown in the dendogram fig. 2 and exhibits a structure very close to cluster 1. Such mismatches occur due to the heuristic nature of the algorithm and noise and fluctuations in the data, which results in matrices lying on the edge between two clusters. We will see in the following section that we can use the silhouette coefficient to identify them. The dependency on wind speed is also in accordance with the stability in time as periods with stable wind speeds are common [Weber2019].
We have seen that the correlation matrix is non-stationary in time. The clustering has revealed a primary influence of the control strategies in dependence of the wind speed. Further analysis of matrices with more and different variables as well as an attempt to distinguish other influencing factors could be attempted in the future.

5 Cluster prediction by wind speed

Having established a strong influence of the control system onto the structure of the analysed correlation matrices, we will now try to predict which correlation matrix state, i.e. operational state, the turbine should be in based on the wind speed. Being able to do this is crucial to use different states for failure prediction. New (live) data can be matched to the expected normal state based on wind speed.
We look at the distribution of wind speeds in the different states, analyse them and then predict the boundary wind speeds that separate the distinct groups.

Figure 5: Probability density functions for the epoch mean wind speed per cluster. The width of bins is .
Figure 6: Probability density functions for the

epoch mean wind speed per cluster using only epochs with a silhouette coefficient above the first quartile of all silhouette coefficients. For each cluster a normal distribution was fitted. Then black vertical lines indicate the intersections of these distributions and thereby the boundary wind speeds. The width of bins is


In fig. 5 one can see the empirical probability density functions (pdf) for wind speeds per cluster state. As already expected from fig. 4, we can clearly distinguish the different regimes. However, we identify much more clearly what we are calling mismatches: epochs that are sorted into cluster 3, but have mean wind speeds in the range associated with cluster 1. They make up the left peak of the distribution for cluster 3. Furthermore we can see a small peak in the probability density function for cluster 2 lying at very low wind speeds beneath the distribution for cluster 1. These could be reasonable as the rotation of the generator shaft is kept at a minimum rotational speed needed for operation of the turbine for very low wind speeds as discussed in the previous section. However, the data in the very low wind regime is sparse and not as reliable as the turbines often move in and out of operation during these times due to shutting off below a certain minimal wind speed, therefore we will disregard this first boundary for now.
Before modelling the boundaries between the distributions, we try to compensate for mismatches due to matrices lying at the edge of two clusters, matrices being wrongly sorted, or singular outliers. This can easily be done by using the silhouette coefficient we have introduced before. It gives an indication of how good a member of a cluster fits into this cluster compared to the other clusters. This means that any epoch that has been sorted into cluster but should rather be in cluster will have a very small or negative silhouette coefficient. We can use this fact and remove from the calculation of the probability density function all epochs with a silhouette coefficient below the first quartile of all silhouette coefficients, which can be seen in table 1. The resulting probability density functions can be seen in fig. 6. The second peak at low wind speeds for cluster 3 disappears. This indicates that our reasoning of a mismatch was correct. The persistence of the small peak at very low wind speeds for cluster 2 on the other hand shows that it indeed points toward a control of the rotational speeds in this regime.
The empirical distributions are noisy due to the finite amount of data points. This is especially true for cluster 3, which contains the least epochs. However, it is very clear that every cluster is representing a wind speed regime. There are now two basic approaches to defining the boundaries between these regimes. One can simply look at the empirical data and define for each value of wind speed the maximum likelihood state based on the empirical probability density function. Secondly, one can fit a distribution to the data and calculate the intersections of these, which represent the boundaries. For now, we choose to fit distributions as it turns out that a normal distribution is a good choice for each wind speed regime (see fig. 6) and the other method could be heavily influenced by noisiness in the empirical data. Of course, this will dismiss the smaller peak of cluster 2. It should be taken into account if enough data exists in this regime (cp. section 6). For now, cluster 1 has a mean of and a standard deviation of . Clusters 2 and 3 are centred at and with standard deviations of and respectively. These values lead to boundaries at and . Interestingly, the last value shows that as calculated in our analysis is larger than . We want to point out that it is not proven that a normal distribution will always provide the best fit. For example, if much larger wind speeds exist in the data set, the distribution for cluster 3 would have much heavier tails in the large wind speed regime.
In the current case for WT1 the model fitted works very well. Compared to the clustering solution we have 9.9% of changes in group assignment. If one only looks at epochs with silhouettes above the first quartile this number reduces to 3.7%. This is clear as the epochs previously characterized as mismatches obviously change their cluster allocation when applying the model to all epochs. The mean correlation matrices of the states as sorted by the model are shown in fig. 7. They clearly exhibit the different structures discussed above produced by the control system of the turbine showing that our state prediction works.

Figure 7: Matrices corresponding to the group centroids after sorting with the epochs according to the calculated boundaries for WT1. The mean matrices were calculated for all epochs, not only those used to determine the boundaries. The colour indicates the value of the correlation coefficient.

6 Application to multiple turbines

For a single wind turbine we identified different operational states in the correlation structure and presented a model to distinguish these states based on wind speed. To be useful for failure analysis and prediction, our findings need to be general characteristics and not be specific for one turbine. We proceed and test our methodology for all turbines in the data set. We want to emphasize this does not mean applying the previously fitted model to all turbines, rather performing a cluster analysis and fitting the boundaries for each turbine separately.
An easily comparable indicator for the quality of the proposed methodology is the relative numbers of cluster allocation changes from the model compared to the clustering itself. We already discussed that for WT1 at the end of the previous section. This number will drastically increase if either of the two steps in the calculation does not work well on a turbine: If the clustering algorithm returns a solution that is not grouped by wind speed, sorting on the basis of wind speed will change the allocation of many epochs. If they are clustered according to wind speed, but the boundaries are less sharp than for WT1, it will again result in many changed allocations.
A histogram over all calculated allocation changes for the 98 turbines is shown in fig. 9. We can see that the changes for WT1 lie in the lower end of changes, as expected due to it not showing any alarms or failures in the chosen time span. This does not hold true for the other turbines, most of which exhibit a few more changes. However, there are multiple turbines which show no more changes than WT1. Also, for those that do the fitted boundaries still work remarkably well. Some allocation changes are always expected for the reasons discussed above and especially near the boundaries the distinction between two states is not always perfect. Concluding that our proposed method works well for all turbines in our dataset, we proceed with an optimization.

Figure 8: Histogram counts showing the frequency of relative changes in state allocation when comparing clustering and individual models per turbine.
Figure 9: Histogram counts showing the frequency of relative changes in state allocation per turbine when comparing the clustering solution and the maximum likelihood model based on data from all turbines.

Looking at multiple turbines it stands to reason that the probability density functions for wind speeds per cluster should be much smoother than for one turbine as we have, in our case, approximately 98 times as many data points. This is indeed so as can be seen in fig. 10

. Furthermore, we can see that the previously assumed Gaussian fit would not work well anymore. Especially the distribution shown for cluster 1 is skewed. Also, the peak in the distribution for cluster 2 beneath wind speeds of

becomes more explicit. This second point is explained in sections 4 and 5 and actually underlines our reasoning: For very low wind speeds rotation is kept constant and with more data from all turbines we can distinguish this regime more clearly than before.

At a first glance, the skewness of the probability density functions does not fit our theory so well. If the controller would work perfectly and instantly and the wind speeds were constant during each epoch, we would expect the distinctions between the operational states to be represented by rectangular functions as probability densities. In non-perfect conditions these would overlap and smooth out to be something similar to a gaussian distribution as assumed before, but they should not become skewed. However, the reason can be found in the underlying distribution of wind speeds in the inlay in fig.

10. It follows roughly the expected Weibull distribution. The deviations could be explained by combinations of influences in the environment of the wind farm, overlying of different Weibull distributions for different wind directions or maybe even measurement effects due to the sensor being placed behind the rotor. Some differences might also be introduced by the removal of low silhouette coefficients as these will appear often in the regimes of the boundaries between states where two correlation structures might be mixed during an epoch. This is of interest for future studies. For now, we can take away that the skewness of this underlying distributions might lead to the skewness in the cluster probability density functions. To check this we replace the histogram count of epochs for wind speed and cluster with the rescaled count

by dividing with the total histogram count of epochs for that wind speed and all clusters. This basically removes the effect of the underlying wind speed distribution by transforming it into an equal distribution. The resulting probability density functions for each cluster can be seen in fig. 11. Indeed, we can now see symmetric areas, showing behaviour very similar to rectangular functions for cluster 1 and 3 with cluster 2 being smoothed out to a more Gaussian curve, because its wind regime is quite narrowly bounded by overlaps with the other states. This strongly underlines the existence of three regimes corresponding to operational states of the turbines.
As the functions for all turbines are much smoother and the bin size can be reduced, we can apply the direct maximum likelihood method instead of fitting a continuous curve to decide cluster allocation based on the epoch wind speed. This leads to three instead of the previous two boundary wind speeds to account for the appearance of cluster 2 in the very low wind regime. The values of the boundaries are , and . The resulting histogram of changes due to model allocation compared to the clustering is presented in fig. 9 and shows less changes compared to fig. 9. One reason for this is the taking into account of the very low wind speed regime in cluster 2. Such a method without need for fitting could be easily transferred to other data and turbine types providing high usability for failure prediction.
Overall we conclude that the results formerly shown for WT1 are easily transferred to multiple wind turbines. Furthermore, the model to decide state allocation based on wind speed can be optimized by taking into account more turbines and thereby more data.

Figure 10: Probability density functions for the epoch mean wind speed per cluster considering all turbines. Only those epochs with a silhouette coefficient above the first quartile of all silhouette coefficients for each turbine were used (calculated seperately per turbine). The underlying wind speed distribution without cluster seperation is shown as inlay. The width of bins is .
Figure 11: Probability density functions for the epoch mean wind speed per cluster considering all turbines after dividing by the total number of counts per wind speeds to rescale the underlying distribution to an equal one. Only those epochs with a silhouette coefficient above the first quartile of all silhouette coefficients for each turbine were used (calculated seperately per turbine). The width of bins is .

7 Conclusions

Using a matrix distance measure and clustering algorithm formerly applied to other complex systems we were able to identify different operational states in -correlation matrices of high frequency wind turbine data without prior knowledge of the control system. This demonstrates the non-stationarity of the correlation matrix of wind turbines. While the states quite often exhibit stability over a certain time period, the real dependency was identified to lie with wind speed. Furthermore, it was possible to model the boundary wind speeds between the different states - again without knowledge about the actual parameters used in the control system. This allowed us to recreate the cluster allocation solely based on the average wind speeds. Being developed on one turbine, the method transferred easily onto multiple turbines. Results were even improved by an increased amount of data.
Our method enables the definition of multiple operational states for wind turbines, whereby an optimization for failure analysis and prediction could be undertaken. New or live data could be compared to the respective operational states based on wind speed after modelling the boundary wind speeds with historical data. While all techniques used for failure analysis might benefit from a more confined parameter space when using multiple normal states compared to just one normal behaviour, those techniques relying on the correlation matrix - such as principal component analysis or Mahalanobis distance - are likely to benefit most. Charmingly, the proposed ansatz does not require changes in established techniques, it just requires their application to multiple subgroups and is therefore easily implemented.
We aim to quantify the possible improvement in future studies using SCADA-data with labelled failure events. This will also allow training of the model based on all intervals without failures instead of a continuous time span thereby increasing the amount of available data per turbine. Furthermore, it will then be possible to study long-term non-stationarity indicated by ref. [Jin2021] where failure prediction works better if the training period of the model is a moving window. These are all interesting aspects to study in future works.
Alongside this, we will analyse the non-stationarity for other wind turbine types and different observables.
Our study so far has made it quite clear that the assumption of stationarity in the correlation structure of wind turbine SCADA-data is questionable and non-stationarity effects have to be considered when developing failure analysis and prediction methods.


We acknowledge fruitful conversations with Prof. Dr. J. Peinke, Dr. M. Wächter, C. Phillipp and Dr. T. Lichtenstein. This study was carried out in the project Wind farm virtual Site Assistant for O&M decision support – advanced methods for big data analysis (WiSAbigdata) funded by the Federal Ministry of Economics Affairs and Energy, Germany (BMWi). One of us (H.M.B.) thanks for financial support in this project. We are very grateful to Vattenfall AB for providing the data.

Appendix A Standard -means clustering

The standard -means algorithm sorts every object from a set of objects into subsets . Every subset is called a cluster. Generally, the optimal number has to be determined by a separate method [Jain2010, Kaufman1990].
The input for the algorithm are the objects and a distance measure as well as a method to compute the centroid of any cluster . Note, that the distance measure must also be defined for the centroids. Then the algorithm works as follows:

  1. Select objects as starting cluster centroids.

  2. Assign every object to the nearest cluster based on the distance from the object to the cluster centroid.

  3. Calculate the new cluster centroids.

  4. Repeat steps 2 and 3 until no allocation changes occur.

In this paper the objects are the correlation matrices of the subset we wish to split into two in the hierarchical approach, therefore always. The cluster centroids are calculated according to eq. (8) and the distance is calculated by eq. (6).