The Error is the Feature: how to Forecast Lightning using a Model Prediction Error

11/23/2018 ∙ by Christian Schön, et al.

Despite the progress made over the last decades, weather forecasting is still a challenging and computationally expensive task. Most models currently operated by meteorological services around the world rely on numerical weather prediction, a system based on mathematical algorithms describing physical effects. Recent progress in artificial intelligence, however, demonstrates that machine learning can be successfully applied to many research fields, especially areas dealing with big data that can be used for training. Current approaches to predicting thunderstorms often focus on indices describing temperature differences in the atmosphere. If these indices reach a critical threshold, the forecast system emits a thunderstorm warning. Other meteorological systems such as radar and lightning detection systems are added for a more precise prediction. This paper describes a new approach to the prediction of lightning based on machine learning rather than complex numerical computations. The error of optical flow algorithms applied to images of meteorological satellites is interpreted as a sign of convection potentially leading to thunderstorms. These results are used as the basis for the feature generation phase, which incorporates different convolution steps. Tree classifier models are then trained on these features to predict lightning within the next few hours (called nowcasting). The evaluation section compares the predictive power of the different models and the impact of different features on the classification result.




1 Introduction

Weather forecasting is still a challenging task that requires extremely complex models running on large supercomputers. Besides delivering forecasts for parameters such as the temperature, one key task for meteorological services is the detection and prediction of severe weather conditions. Thunderstorms are one such phenomenon and are often accompanied by heavy rainfall, hail, and strong wind. However, detecting them and giving precise information about their severity and direction of movement is a hard task.

Current state-of-the-art systems operated by weather services to forecast thunderstorms usually include indices based on temperature differences between different levels of the atmosphere. Differences which exceed a certain threshold are interpreted as a sign of critical conditions potentially leading to a thunderstorm. Radar systems are used to detect frozen water particles in the atmosphere and clouds potentially developing towards thunderstorms. Finally, lightning detection systems allow us to localize thunderstorms by measuring electromagnetic waves. State-of-the-art systems such as NowCastMIX [13], the system operated by the Deutscher Wetterdienst, include all these measurements to generate warnings for severe weather conditions such as thunderstorms.

Satellite data, although nowadays part of many weather forecasting products, are not yet used to predict thunderstorms directly. Our approach therefore investigates a new way to predict lightning based on satellite data. Our core idea is to interpret the error resulting from the application of optical flow algorithms to satellite images as a sign of convection leading to thunderstorms. The resulting features are then used to train tree-based machine learning models in order to predict the occurrence of lightning.

Our main contributions to the problem of forecasting thunderstorms are the following:

We present a new set of features based on the error of optical flow algorithms applied to satellite images in different channels which can be used to predict thunderstorms. These features differ from previously used approaches mainly based on temperature differences as presented in Section 2. Based on this feature set, we then apply different tree based machine learning algorithms in order to automatically predict the occurrence of lightnings. This process is described in more detail in Section 3.

Considering the immediate future, i.e. the next 15 minutes, our model achieves accuracy values of more than 90%. Features based on convolution with large kernel sizes in particular have shown a positive impact on the predictive power of the tree classifiers. Even for longer forecast periods of up to five hours, the accuracy remains above 70%, showing promising results for applications in nowcasting. A detailed description of the results can be found in Section 5.

2 Thunderstorms

Thunderstorms belong to a class of weather phenomena called convective systems. The main characteristic of this class is an updraft of warm air from lower levels of the atmosphere towards higher and colder levels, accompanied by a downdraft allowing the cool air to flow back towards the ground. This system of up- and downdraft is called convection in meteorology. Thunderstorms essentially form a special case of convective systems which are characterized by a strong updraft of warm, moist air which freezes in the upper atmosphere, leading to charge imbalances and lightning. Based on the number and size of such systems of up- and downdrafts, thunderstorms are usually subdivided into single cell, multi cell, and supercell systems. A basic understanding of convection and the emergence of thunderstorms is therefore crucial to identify potential features that can be used in machine learning algorithms. Subsection 2.1 gives a short introduction to the basic effects associated with thunderstorms, followed by Subsection 2.2 containing an overview of currently used methods to identify and predict thunderstorms.

2.1 Emergence

The emergence of thunderstorms is usually divided into three separate phases: the developing stage, the mature stage, and finally the dissipating stage [7] [16].

Developing Stage.

In the first stage of a thunderstorm's emergence, warm and moist air rises to upper levels of the atmosphere. This so-called updraft can be caused by geographic reasons such as air rising at the slopes of mountains, but also by thermal reasons such as large temperature differences between the ground and the upper atmosphere, especially in summer. Given that the convection is strong enough, the warm and moist air will eventually pass the dew point where it starts to condense, forming a so-called cumulus congestus which is characterized by strong updrafts offering a fresh supply of warm, moist air. During this first phase, the condensed air might form first precipitation particles which, however, do not yet fall to the ground.

Mature Stage.

In the second stage of the thunderstorm's emergence, the air cools down as it rises through the troposphere towards its upper levels. If the system is strong enough, the moist air will eventually pass the point where it starts to freeze and sinks back to the ground at the sides of the updraft. This leads to a so-called downdraft, a strong wind formed by falling (and potentially evaporating) precipitation. The horizontal divergence at the top of the cloud leads to the typical anvil form called cumulonimbus. This process of rising and finally freezing moist air is also responsible for the emergence of lightning. Cold, frozen particles might split into smaller, negatively charged particles and larger ones with a positive charge. The smaller the particle, the faster it will rise in the updraft and finally fall back to the ground. This potentially leads to a separation of negative charge in the downdraft and positive charge in the updraft, resulting in lightning that balances this charge imbalance.

Dissipating Stage.

In the last stage of a thunderstorm, the updraft loses strength and finally stops. Due to the missing supply of warm, moist air, the system becomes unstable and eventually breaks down. Precipitation loses intensity, mainly consisting of the remaining condensed air at the top of the cloud.

2.2 Forecast

The forecast of thunderstorms is often based on indices describing the state of the atmosphere and its tendency to become a thunderstorm. The detection of strong convection eventually leading to thunderstorms is mainly based on temperature differences which are used as a sign for the potential energy of a system. Common indices computed to measure these differences include the Convective Available Potential Energy (CAPE) [17], the Lifted Index (LI) [11] or the KO Index used by the German meteorological service [2].

All these indices essentially consider the potential temperature at different levels of the atmosphere (described by their pressure). The greater these differences are, the more likely the atmosphere will become unstable and develop a convective system potentially leading to a thunderstorm.

Besides using such index-based forecasting methods, state-of-the-art systems such as NowCastMIX include additional data sources. Radar systems are used to detect precipitation and frozen water particles in the atmosphere. Specialized systems such as KONRAD [14] and CellMOS [12] try to detect thunderstorm cells and predict their movement. Lightning detection systems such as LINET [6] measure electromagnetic waves in the atmosphere to identify lightning strikes and their location. General weather forecasting models such as the COSMO-DE [5] model are used to identify potentially interesting areas in advance. A combination of these systems finally allows us to predict thunderstorms and follow their movement.

Although machine learning algorithms are not yet used widely in practice, there are already first attempts to apply them to weather prediction and partially also to thunderstorm forecast. Ruiz and Villa tried to distinguish convective and non-convective systems based on logistic regression and Random Forest models [18]. The dataset used in this paper was extremely imbalanced and consisted of different features derived from satellite images, especially exact values and differences for temperatures and gradients. Similar approaches have been presented by Williams et al., trying to classify convective systems with a forecast period of one hour [20] [21]. They trained Random Forests with datasets consisting of different kinds of features, covering raw satellite image values, derived fields such as CAPE or LI and radar data.

Although these approaches use similar machine learning algorithms, they differ from our approach, especially in the feature set used for training. To the best of our knowledge, there has been no approach so far which is based on using the error of nowcasting algorithms for satellite images such as optical flow to predict thunderstorms.

3 Error-Based Approach

State of the art models which are currently operated by weather services to predict thunderstorms already include a multitude of different data sources. However, they do not yet include satellite data although these data have shown significant impact on the forecast accuracy in many other areas. Using the current, second generation of Meteosat satellites, multichannel images of Europe, Africa and the surrounding seas are available every 15 minutes which can be used for weather forecasting. Instead of directly deriving features from these images, our approach is based on the error resulting from forecasting the next image based on the previous images using optical flow algorithms.

A high-level overview of our approach is shown in Figure 1. The core idea can be formulated as follows: The movement of air within the atmosphere is a three dimensional phenomenon. Optical flow algorithms which can be used to predict future images based on past observations however can only detect and predict two dimensional movements. Using the prediction and the actual image at the same point in time, we can calculate the error value between these two. The interpretation of this error is not trivial: It might result from a bad prediction of the two dimensional movement of clouds, essentially meaning that our optical flow is not accurate enough. It might however also result from a vertical movement within the atmosphere, leading to brightness differences in the images. This vertical movement might be interpreted as a sign for potential convection leading to thunderstorms. Machine Learning algorithms can then be used to learn the relation between these error values and the occurrence of lightnings, allowing these models to predict future thunderstorms.

The following subsections present our approach in more detail, covering necessary preprocessing steps, the feature generation and the model creation.

Figure 1: Overview of our approach: In a first step, two consecutive satellite images at $t-30$ and $t-15$ minutes are read to predict the next image at $t$ using TV-L1. The error, computed by taking the absolute difference between the predicted image and the original image at $t$, is then used for lightning prediction based on tree classifiers.

3.1 Data Preprocessing

The raw input used for our approach essentially consists of two data sources: binary files containing images from the Meteosat satellite for the different channels and CSV files containing a list of lightning strikes as detected by the LINET system. As LINET only covers lightning detected in Europe, satellite data representing surrounding areas can be ignored. The first step in our pipeline was therefore a projection of the satellite images to the area covered by LINET. We decided to use the first nine channels of the satellite, covering a spectrum from visible light at a wavelength of 0.6 µm (VIS0.6) up to infrared light at 10.8 µm (IR10.8) [19].

In the second step, the error values are computed in the following way: Two consecutive satellite images are fed into the TV-L1 algorithm [22] to compute the next image of this sequence. The implementation was based on the OpenCV Python library [1]. As the satellite images are available every 15 minutes, we essentially take the images at $t-30$ and $t-15$ minutes to generate the image at $t$. In addition, we read the original image at $t$ and compute the absolute difference between these two images, which we consider the error of our nowcasting.
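The error computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that a dense flow field `flow` (shape H x W x 2, holding the per-pixel displacement in x and y between the two input images) has already been estimated, for example by OpenCV's TV-L1 implementation, and that the motion stays constant for one more 15-minute step. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def extrapolate_next(img_prev, flow):
    """Warp img_prev one step forward along flow (assumed constant motion).

    img_prev: 2-D array, the latest observed image (time t-15).
    flow:     array of shape (H, W, 2) with [dx, dy] displacement per pixel.
    """
    h, w = img_prev.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(np.float64)
    # Backward warping: the predicted pixel (r, c) is sampled from the
    # position it is assumed to have come from.
    src_rows = rows - flow[..., 1]
    src_cols = cols - flow[..., 0]
    return map_coordinates(img_prev, [src_rows, src_cols],
                           order=1, mode="nearest")


def nowcast_error(img_prev, img_actual, flow):
    """Absolute difference between the extrapolated and the observed image."""
    predicted = extrapolate_next(img_prev, flow)
    return np.abs(img_actual - predicted)
```

If the clouds only move horizontally and the flow is estimated well, this error stays close to zero; brightness changes that the two-dimensional motion model cannot explain show up as large values.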

The lightning data extracted from the LINET network were supplied as simple CSV files containing information about the time each lightning strike occurred, its location as well as information about its charge and height. Applying the same projection as in the previous step, these lists can be transformed to maps where each pixel stores the number of lightning strikes occurring at this location. As the corresponding satellite images were only available in steps of 15 minutes, the lightning data were transformed into a similar shape by assigning all lightning strikes occurring between $t$ and $t+15$ minutes to the map at time $t$. Each such map therefore represents the lightning occurring in the immediate future. Based on this information, we can then simply divide all pixels into two classes: the lightning class and the no-lightning class, indicating the presence or absence of lightning at this pixel.
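A minimal sketch of this rasterization step is shown below. It assumes the lightning records have already been projected to pixel coordinates and that the time window is half-open, i.e. it includes $t$ but excludes $t+15$; the function and parameter names are hypothetical and do not come from the original implementation.

```python
import numpy as np


def lightning_count_map(times_min, rows, cols, t, shape, window=15):
    """Count lightning strikes per pixel within [t, t + window) minutes.

    times_min:  strike times in minutes (any common reference point).
    rows, cols: pixel coordinates of each strike after projection.
    """
    times_min = np.asarray(times_min)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    in_window = (times_min >= t) & (times_min < t + window)
    counts = np.zeros(shape, dtype=np.int64)
    # np.add.at accumulates correctly when several strikes hit one pixel.
    np.add.at(counts, (rows[in_window], cols[in_window]), 1)
    return counts


def lightning_mask(counts):
    """Binary class map: True = lightning class, False = no-lightning class."""
    return counts > 0
```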

For performance reasons, the images containing the error values as well as the maps containing the lightning information were stored as intermediate results to avoid the repetition of the costly projection operation before each training step.

Figure 2: Example images resulting from the preprocessing steps described in this section. The right image shows the error in channel WV6.2 at 2017-06-01 13:15, the left image shows the lightning data for the following 15 minutes as a binary mask.

3.2 Feature Generation

Based on our general assumption that high error values indicate the presence of lightnings, we use the absolute error values in each of the first nine channels of the Meteosat satellite as our basic features.

As clouds and thunderstorms move over time, we also include features covering information about surrounding pixels, allowing us to identify thunderstorms entering a pixel from one of its neighbours. This information was created using convolution on the images containing the error values. We decided to use kernels of size (3x3), (5x5), (7x7), and (9x9) to cover local lightning events as well as larger areas.

For each kernel size, we applied different convolution operations: Based on the assumption that high error values correlate with lightning, we decided to use the maximum and minimum operations over the window specified by the kernel size to identify areas with extreme values. In addition, we used the average value over the different kernel sizes to cover information about the overall state within a certain area. The last convolution operation applied was a normalized Gaussian kernel, which weights error values near the centre of the kernel higher than values at the border, corresponding to the assumption that geographically near locations should influence the result more than locations at a larger distance. The implementations were based on the following filters defined in the ndimage package of the SciPy Python library [4]: maximum_filter, minimum_filter, uniform_filter and a combination of gaussian_kernel and convolve for the last operation.

This combination of four convolution operations using four kernel sizes applied to nine channels plus the raw error values in each channel led to a total of 4 · 4 · 9 + 9 = 153 features.
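The feature generation can be sketched with the SciPy filters named above. The sketch is an assumption-laden illustration rather than the original code: the Gaussian kernel is built by hand, its standard deviation (`size / 4`) is an arbitrary choice not stated in the text, and the border handling uses SciPy's defaults. The helper names are hypothetical.

```python
import numpy as np
from scipy import ndimage

KERNEL_SIZES = (3, 5, 7, 9)


def normalized_gaussian_kernel(size, sigma):
    """2-D Gaussian kernel of shape (size, size) whose entries sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def pixel_features(error):
    """Build per-pixel features from an error stack of shape (channels, H, W).

    Returns an array of shape (H * W, channels * (1 + 4 * len(KERNEL_SIZES))).
    """
    feats = [error]  # raw error values (the "1x1" features)
    for size in KERNEL_SIZES:
        window = (1, size, size)  # filter each channel separately
        feats.append(ndimage.maximum_filter(error, size=window))
        feats.append(ndimage.minimum_filter(error, size=window))
        feats.append(ndimage.uniform_filter(error, size=window))
        # sigma = size / 4 is an assumption made for this sketch only.
        kernel = normalized_gaussian_kernel(size, sigma=size / 4.0)
        feats.append(ndimage.convolve(error, kernel[np.newaxis]))
    stacked = np.concatenate(feats, axis=0)
    return stacked.reshape(stacked.shape[0], -1).T
```

For an error stack with nine channels, the resulting matrix has 153 columns, one row per pixel, which is the shape expected by per-pixel classifiers.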

3.3 Models

The final step in our approach was the training of existing machine learning algorithms in order to predict lightning based on the features described in the previous section. We decided to use tree-based classifiers as they allow an easier understanding of how a decision was taken by the model compared to neural networks. The input for all models consisted of the 153 features on a per-pixel basis, i.e. the models were not trained on two-dimensional images, but on single pixels and their corresponding feature values. Training was done as classification, i.e. the target values were simply binary values indicating the presence or absence of lightning at this pixel given a specified offset compared to the time stamp of the error values.

The models chosen for evaluation are simple Decision Trees, Random Forests and Decision Trees using either Gradient Boosting or AdaBoost. Decision Trees are simple binary trees which at each node split a given dataset into two subtrees based on the value of one of the features. The implementation used in this paper is based on the CART (Classification and Regression Trees) algorithm by Breiman et al. [9]. Random Forests are an extension to this model again introduced by Breiman [8] which builds an ensemble of trees where each single tree is built upon a randomly chosen subset of features and samples. The remaining two models also build ensembles based on simple Decision Trees, but try to minimize the prediction error by applying a special boosting function after each iteration. AdaBoost was first introduced by Freund and Schapire [10] and uses the exponential loss function to measure the error of the prediction. Gradient Boosting introduced by Mason et al. [15] is a slightly more general approach abstracting from a concrete loss function by using gradients instead. All implementations were based on the corresponding classes in the Scikit-Learn Python library [3]: DecisionTreeClassifier, RandomForestClassifier, AdaBoostClassifier and GradientBoostingClassifier.

Most parameters of the different models were left at their default values as specified in the documentation; some were adapted to avoid overfitting. The most important parameters are listed in Table 1.

| Parameter | Decision Tree | Random Forest | AdaBoost | Gradient Boosting |
|---|---|---|---|---|
| Max depth | 12 | 12 | 6 | 7 |
| Criterion | gini-index | gini-index | gini-index | gini-index |
| Min % samples per leaf | 0.01% | 0.01% | 0.01% | 0.01% |
| # estimators | - | 200 | 50 | 100 |
| # jobs | - | 16 | - | - |
| Learning rate | - | - | 1.0 | 0.1 |
| Boosting criterion | - | - | - | Friedman MSE |

Table 1: Parameters of the different tree classifiers. A dash means this parameter is not present for the specific model.
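As an illustration, the settings of Table 1 could be expressed with the Scikit-Learn classes named above roughly as follows. This is a sketch, not the authors' configuration: in particular, mapping "Min % samples per leaf" to `min_samples_leaf=0.0001` (a fraction of the training samples) and applying the AdaBoost depth and leaf settings to its base tree are assumptions, and `build_models` is a hypothetical helper.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.tree import DecisionTreeClassifier

# Assumption: 0.01% of the samples, passed as a fraction.
MIN_LEAF_FRACTION = 0.0001


def build_models():
    """Return the four tree classifiers configured after Table 1."""
    return {
        "Decision Tree": DecisionTreeClassifier(
            max_depth=12, criterion="gini",
            min_samples_leaf=MIN_LEAF_FRACTION),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, max_depth=12, criterion="gini",
            min_samples_leaf=MIN_LEAF_FRACTION, n_jobs=16),
        # The base tree is passed positionally because the keyword name
        # differs between Scikit-Learn versions.
        "AdaBoost": AdaBoostClassifier(
            DecisionTreeClassifier(
                max_depth=6, criterion="gini",
                min_samples_leaf=MIN_LEAF_FRACTION),
            n_estimators=50, learning_rate=1.0),
        "Gradient Boosting": GradientBoostingClassifier(
            n_estimators=100, max_depth=7, learning_rate=0.1,
            min_samples_leaf=MIN_LEAF_FRACTION, criterion="friedman_mse"),
    }
```

Each model would then be trained with `fit(X, y)` on the per-pixel feature matrix and the binary lightning labels.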

4 Experimental Setup

All experiments for this paper were conducted on a computer equipped with an Intel Xeon E5-2600 v4 processor, 32 GB of RAM and two Nvidia GTX 1080 Ti graphics cards. The implementation was based on Python 2.7.14 using Satpy 0.9.2 and Pyproj to read and project the raw satellite data. The optical flow computation was based on OpenCV and the models on Scikit-Learn in version 0.19.1.

Our training was based on four-fold cross-validation, meaning that the complete set of available data points was divided into four sets, three of which were combined to form the training data while the remaining set was used to test the performance of the trained model. Our dataset approximately covers one month, from 2018-06-01 00:30 to 2018-07-04 06:30. Simply splitting this time range into training and test set at some point in time $t_s$ would lead to the problem that data points taken from the last image before $t_s$ and data points taken from the first image after $t_s$ could be highly correlated, as the weather conditions did not change much during these 15 minutes. To avoid such problems, the cross-validation sets were designed such that the start and end points of the different sets are separated by a twelve-hour margin. Taking into account all available pixels would lead to highly imbalanced sets, as more than 99.9% of all pixels belong to the no-lightning class (compare Table 4). Models trained with such imbalanced data would tend to always predict the absence of lightning, achieving an accuracy of more than 99.9% this way. To avoid this problem, we decided to balance the training and test sets in the following way: Balancing was done on a per-image basis, meaning that we took all pixels with a lightning event present on an image and chose the same number of pixels without a lightning event at random.
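The per-image balancing can be sketched as follows. This is a minimal illustration assuming a binary lightning mask per image; the function name is hypothetical, and the guard for images with more lightning pixels than no-lightning pixels is an addition of this sketch that the text does not discuss.

```python
import numpy as np


def balanced_pixel_sample(lightning_mask, rng):
    """Pick all lightning pixels of one image and equally many others.

    Returns two arrays of flat pixel indices (lightning, no-lightning).
    """
    flat = np.asarray(lightning_mask, dtype=bool).ravel()
    positives = np.flatnonzero(flat)
    negatives = np.flatnonzero(~flat)
    n = min(len(positives), len(negatives))
    # Draw the no-lightning pixels at random without replacement.
    neg_sample = rng.choice(negatives, size=n, replace=False)
    pos_sample = positives[:n]
    return pos_sample, neg_sample
```

A typical call would be `balanced_pixel_sample(mask, np.random.default_rng(seed))`, repeated for every image before the selected pixels are concatenated into one dataset.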

Due to various errors during the extraction of the raw, binary satellite data, not all images within this timespan of about one month could be taken into account. The resulting cross-validation sets and their sizes are shown in Table 2. Combining three of these cross-validation sets into one large training set while using the remaining set for testing purposes, we created four different combinations of training and test sets depicted in Table 3.

| # | Time Range | Nr. of data points for each class |
|---|---|---|
| 0 | 2017-06-01 00:30:00 to 2017-06-08 23:00:00 | 624,877 |
| 1 | 2017-06-09 11:00:00 to 2017-06-17 09:45:00 | 65,261 |
| 2 | 2017-06-17 21:45:00 to 2017-06-25 20:15:00 | 256,456 |
| 3 | 2017-06-26 08:15:00 to 2017-07-04 06:30:00 | 291,956 |

Table 2: Main characteristics of the cross-validation sets.

| Training Sets | # Samples | Test Set | # Samples |
|---|---|---|---|
| 1, 2, 3 | 1,227,346 | 0 | 1,249,754 |
| 0, 2, 3 | 2,346,578 | 1 | 130,522 |
| 0, 1, 3 | 1,964,188 | 2 | 512,912 |
| 0, 1, 2 | 1,893,188 | 3 | 583,912 |

Table 3: Main characteristics of the training and test sets created for lightning forecast. The table shows the cross-validation sets belonging to each training or test set as well as the total number of samples.
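The relation between Table 2 and Table 3 can be checked with a few lines of Python. The sketch below only uses the values listed in Table 2; since both classes contribute the same number of data points, each set contains twice the per-class count. The helper names are hypothetical.

```python
from datetime import datetime

# (start, end, data points per class) for each cross-validation set, Table 2.
FOLDS = [
    ("2017-06-01 00:30:00", "2017-06-08 23:00:00", 624877),
    ("2017-06-09 11:00:00", "2017-06-17 09:45:00", 65261),
    ("2017-06-17 21:45:00", "2017-06-25 20:15:00", 256456),
    ("2017-06-26 08:15:00", "2017-07-04 06:30:00", 291956),
]


def parse(timestamp):
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")


def fold_gaps(folds):
    """Time margin between the end of one set and the start of the next."""
    return [parse(folds[i + 1][0]) - parse(folds[i][1])
            for i in range(len(folds) - 1)]


def leave_one_out(folds):
    """Yield (training sets, # samples, test set, # samples) per split."""
    rows = []
    for test in range(len(folds)):
        train = [i for i in range(len(folds)) if i != test]
        n_train = sum(2 * folds[i][2] for i in train)  # two balanced classes
        n_test = 2 * folds[test][2]
        rows.append((train, n_train, test, n_test))
    return rows


for gap in fold_gaps(FOLDS):
    print(gap)
for row in leave_one_out(FOLDS):
    print(row)
```

The printed gaps confirm the twelve-hour margin between consecutive sets, and the sample counts match the rows of Table 3.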

5 Results

The main assumption of our approach is the correlation between high error values and the existence of lightning. A simple statistical evaluation gives evidence for this assumption: Table 4 shows the distribution of the error values in channel WV6.2 for the timespan 2018-06-01 00:30 to 2018-06-04 00:00. The first column refers to all pixels on all images within this time range while the following two columns represent the two classes, namely the no-lightning and the lightning class with respect to the next 15 minutes. Comparing these values, we can state that the mean error for the lightning class is about four times higher than the one for the no-lightning class. Simply assuming that a high error value necessarily indicates the presence of lightning is however wrong, as the overall maximum error value belongs to the no-lightning class. Nevertheless, we can still conclude that high error values more likely indicate the presence of lightning than its absence.

The comparison of the number of samples also shows the extreme imbalance of the dataset. Only 272,757 samples out of more than 309 million belong to the lightning class, resulting in a fraction of only 0.088%. Lightning can therefore be considered a rare event, which makes training with unbalanced, raw datasets difficult.

| | All Pixels | No-Lightning Class | Lightning Class |
|---|---|---|---|
| # samples | 309,657,000 | 309,384,200 | 272,757 |
| mean error | 0.5461644 | 0.5446506 | 2.263237 |
| standard deviation | 0.8155396 | 0.8102507 | 2.732701 |
| minimum error | 0.0 | 0.0 | 0.0 |
| 25% quantile | 0.1420135 | 0.1420135 | 0.537567 |
| 50% quantile | 0.3040619 | 0.3036652 | 1.335831 |
| 75% quantile | 0.6234589 | 0.6223297 | 2.884674 |
| maximum error | 30.83765 | 30.83765 | 30.035492 |

Table 4: Comparison of the error distribution resulting from nowcasting for the timespan 2018-06-01 00:30 to 2018-06-04 00:00 in channel WV6.2. The distribution is shown for all pixels as well as for the two individual classes with respect to lightning within the next 15 minutes.
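The imbalance figures quoted above follow directly from the sample counts in Table 4, as the short calculation below illustrates. The accuracy of a trivial model that always predicts the no-lightning class is included to show why plain accuracy on the unbalanced data is not informative; the variable names are chosen for this sketch only.

```python
# Sample counts as listed in Table 4.
N_ALL = 309_657_000
N_NO_LIGHTNING = 309_384_200
N_LIGHTNING = 272_757

# Share of pixels that belong to the lightning class.
lightning_fraction = N_LIGHTNING / N_ALL

# Accuracy of a model that always answers "no lightning".
baseline_accuracy = N_NO_LIGHTNING / N_ALL

print(f"lightning fraction: {lightning_fraction:.3%}")
print(f"always-no-lightning accuracy: {baseline_accuracy:.3%}")
```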

5.1 Predicting the Immediate Future

To check whether our assumption is correct and we can indeed train machine learning models to predict lightning, we first conducted an experiment to forecast the immediate future: Given the error resulting from the satellite images at $t-30$, $t-15$ and $t$, can we predict whether there will be lightning within the next 15 minutes, i.e. between $t$ and $t+15$?

Table 5 shows the resulting accuracy values for the different models and test sets. As expected, the simple Decision Tree shows the worst performance in each test set, reaching values between 84% and 89%. The Gradient Boosting model performed best with accuracy results between 89% and 92%. Random Forest and AdaBoost rank in between these two. The last row of the table shows the weighted average of the different models with respect to the size of the test sets. This average underlines the above rankings, showing a strong Gradient Boosting model, a weak Decision Tree model and the Random Forest as well as AdaBoost models with similar results in between.

Considering the computational effort, we must however state that the improvement of about 2 percentage points between Random Forests and Gradient Boosting comes at the cost of a much higher training time. Considering the largest training set consisting of the cross-validation sets 0, 2 and 3, the training phase for the Random Forest, which could be conducted in parallel on all CPU cores, took about 14 minutes. The Gradient Boosting model, which builds its trees sequentially and therefore cannot be parallelized in the same way, took about 818 minutes, which is more than 54 times the duration of the Random Forest training.

| Test Set | Decision Tree | Random Forest | AdaBoost | Gradient Boosting |
|---|---|---|---|---|
| 0 | 87.494% | 87.661% | 90.203% | 90.973% |
| 1 | 86.549% | 87.219% | 88.648% | 89.090% |
| 2 | 89.946% | 91.724% | 90.744% | 92.714% |
| 3 | 84.118% | 90.050% | 85.483% | 90.500% |
| weighted average | 87.156% | 89.042% | 89.120% | 91.123% |

Table 5: The accuracy for each model and test set, rounded to three decimals. The last row shows the weighted average based on the size of the test sets. The best and worst performance per test set are highlighted in green and red, respectively.
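The weighted averages in the last row can be reproduced from the per-set accuracies and the test set sizes of Table 3. The following sketch uses only values from these two tables; the names are hypothetical.

```python
# Number of samples in test sets 0 to 3 (Table 3).
TEST_SET_SIZES = [1249754, 130522, 512912, 583912]

# Accuracy in percent per model for test sets 0 to 3 (Table 5).
ACCURACY = {
    "Decision Tree": [87.494, 86.549, 89.946, 84.118],
    "Random Forest": [87.661, 87.219, 91.724, 90.050],
    "AdaBoost": [90.203, 88.648, 90.744, 85.483],
    "Gradient Boosting": [90.973, 89.090, 92.714, 90.500],
}


def weighted_average(values, weights):
    """Average of values where each value counts proportionally to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


averages = {
    model: round(weighted_average(acc, TEST_SET_SIZES), 3)
    for model, acc in ACCURACY.items()
}
print(averages)
```

Because test set 0 holds about half of all samples, its accuracy dominates the average of every model.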

A high overall accuracy does not, however, necessarily indicate an equally good performance on both classes. We therefore also present the overall confusion matrices for each model in Figure 3, created by summing up the results on the individual test sets. The ideal result would be a dark blue diagonal in each plot, indicating very high accuracies for both classes. The Gradient Boosting model at the bottom right in particular shows a strong performance on both classes, reaching accuracies of 89% (no-lightning class) and 93% (lightning class). The Random Forest model even slightly outperforms the Gradient Boosting model for the lightning class, but at the cost of a much worse performance for those pixels without lightning. All models except AdaBoost tend to achieve better results for the lightning class, which seems slightly easier to learn for machine learning models.

Figure 3: Comparison of the confusion matrices over all test sets: Decision Tree (top left), Random Forest (top right), AdaBoost (bottom left) and Gradient Boosting (bottom right). The first number in each square depicts the absolute number of samples, the number in parentheses denotes the corresponding fraction with respect to the true label.

The models chosen for this paper all belong to the class of tree classifier algorithms which essentially all work in a very similar way: In each step, the model chooses a feature and a corresponding threshold and performs a binary split of the set of samples into two distinct subsets assigned to the left and right subtree. The decision which feature and threshold to choose is taken according to the gini index which measures the impurity of a node where lower values indicate purer nodes. Given the relative frequencies $p_i$ of each class $i$, the gini index is defined as:

$$G = 1 - \sum_{i} p_i^2$$

This gini index can also be used to estimate the impact of a decision in the following way: Let $G_p$ be the gini index of some parent node in a tree with $n_p$ samples and a class distribution of $p_{p,i}$. Let $G_l$ and $G_r$ be the gini indices of its left and right subtree with $n_l$ and $n_r$ samples and class distributions of $p_{l,i}$ and $p_{r,i}$, respectively. Then we can define the gini importance of the parent node as follows:

$$I_p = G_p - \frac{n_l}{n_p} G_l - \frac{n_r}{n_p} G_r$$

The higher the importance value, the higher the discriminative power of the decision taken in the parent node. These gini importances allow us to measure the impact of each feature considering its ability to distinguish the two classes. To compute this feature importance $FI(f)$ of a feature $f$, we simply sum up the gini importances $I_k$ for each node $k$ where $f$ is used as discriminative feature and weight them based on the number of samples $n_k$ reaching node $k$, relative to the total number of samples $n$:

$$FI(f) = \sum_{k \,:\, f_k = f} \frac{n_k}{n} I_k$$

For the case of Random Forests, AdaBoost and Gradient Boosting which use an ensemble of multiple Decision Trees, the results are obtained by computing them for each tree $t$ individually and then summing them up using the same weights $w_t$ as for the prediction process:

$$FI(f) = \sum_{t} w_t \, FI_t(f)$$
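The gini index and the gini importance of a single split can be illustrated with a small example. The sketch below assumes the standard definitions (impurity as one minus the sum of squared class frequencies, importance as the impurity of the parent minus the sample-weighted impurities of its children) and works on class counts instead of relative frequencies; the function names are hypothetical.

```python
def gini(class_counts):
    """Gini index of a node given the number of samples per class."""
    n = sum(class_counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in class_counts)


def gini_importance(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n_p, n_l, n_r = sum(parent), sum(left), sum(right)
    return gini(parent) - n_l / n_p * gini(left) - n_r / n_p * gini(right)


# A balanced two-class node has the maximum impurity for two classes.
print(gini([50, 50]))
# A split that separates the classes perfectly removes all impurity.
print(gini_importance([50, 50], [50, 0], [0, 50]))
# A split that leaves both children balanced gains nothing.
print(gini_importance([50, 50], [25, 25], [25, 25]))
```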

Figure 4 shows an example plot including the top 35 features of the Gradient Boosting model trained on the cross-validation sets 0, 2 and 3 which together form the largest training set. According to this evaluation, the maximum value within a 9x9 pixel window of channel WV6.2 is the most important feature, reaching a total gini decrease of about 0.14. The second ranked feature uses a maximum convolution with the largest kernel in channel IR3.9, however reaching just half the importance of the first feature. The third and following features keep losing importance with gini decreases between 0.4 and 0.1. Considering the feature importances for the other three training sets yields very similar results: The maximum values for a 9x9 kernel in the channels WV6.2 and IR3.9 remain the prominent features ranking either first or second. The feature on the first rank roughly reaches twice the gini decrease of the second ranked one, followed by slightly decreasing importances for the following features.

Comparing Figure 4 to equivalent plots of the other models, we can state that Random Forests and AdaBoost produce very similar results. The Decision Tree model however relies much more on a small subset of features. The top ranked feature for the Decision Tree has a gini decrease of 0.35 to 0.6, more than twice the value of the top ranked feature for the Gradient Boosting model. The features on the following ranks show a fast drop of the gini decrease values, indicating that the model weights them much lower considering their discriminative power. Models based on multiple Decision Trees such as Random Forests, AdaBoost and Gradient Boosting obviously favour a broad range of comparably strong features over a small subset of extremely discriminative features which potentially explains their improved accuracy.

Figure 4: The top 35 features and their impact for the decisions taken by the Gradient Boosting model trained on the sets 0, 2 and 3

Table 6 shows the number of occurrences for channels, kernel sizes as well as convolution types within the top 35 features of the Gradient Boosting model for the different training sets. Considering the channels, we can clearly state that IR3.9 belongs to the most prominent channels, appearing six to eight times among the top features, followed by IR9.7 and VIS0.8. There is however no clear concentration on a specific subset of channels chosen by the model. Usually all channels appear in all training sets, except for one set where IR8.7 does not appear within the top 35 features.

Considering the different kernel sizes used as features, the model clearly favours larger kernels over smaller ones. The 1x1 column, which corresponds to the exact error value of a single pixel, shows no usage at all. The largest kernel size of 9x9, in contrast, is always used at least 20 times, i.e. in roughly 57% to 71% of the top 35 features.

A similar observation holds true for the convolution types, where the maximum convolution clearly dominates with 15 to 21 appearances within the top features. In contrast, the Gaussian convolution does not appear at all, indicating that this convolution type provides no information that helps the model to discriminate the samples.
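The four convolution types and five kernel sizes discussed here can be illustrated with the Scipy ndimage package. The sketch below is an assumption about how such features might be computed, not the authors' implementation: the function name, the dictionary keys and in particular the choice of sigma for the Gaussian filter are hypothetical.

```python
import numpy as np
from scipy import ndimage


def convolution_features(error_map, sizes=(3, 5, 7, 9)):
    """Compute max, min, average and Gaussian filters of a 2D error map."""
    features = {"raw_1x1": error_map}  # 1x1: the error value of the pixel itself
    for s in sizes:
        features[f"max_{s}x{s}"] = ndimage.maximum_filter(error_map, size=s)
        features[f"min_{s}x{s}"] = ndimage.minimum_filter(error_map, size=s)
        features[f"avg_{s}x{s}"] = ndimage.uniform_filter(error_map, size=s)
        # Assumption: with scipy's default truncate=4.0, sigma=(s-1)/8
        # yields a Gaussian kernel that is exactly s pixels wide.
        features[f"gauss_{s}x{s}"] = ndimage.gaussian_filter(error_map, sigma=(s - 1) / 8)
    return features


# Toy error map with a single strong optical flow error.
demo = np.zeros((20, 20))
demo[10, 10] = 1.0
demo_features = convolution_features(demo)
print(demo_features["max_9x9"][6, 14], demo_features["avg_3x3"][10, 10])
```

In this toy example the 9x9 maximum filter spreads the single error peak over its whole neighbourhood, which gives an intuition for why a classifier working on individual pixels may profit from large maximum kernels.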

training sets  VIS0.6  VIS0.8  NIR1.6  IR3.9  WV6.2  WV7.3  IR8.7  IR9.7  IR10.8
1, 2, 3        4       5       2       6      4      2      3      5      4
0, 2, 3        4       5       2       8      4      2      2      5      3
0, 1, 3        4       5       2       8      4      4      0      5      3
0, 1, 2        4       3       2       8      4      3      1      7      3
(a) Channels

training sets  1x1  3x3  5x5  7x7  9x9
1, 2, 3        0    0    2    8    25
0, 2, 3        0    2    3    7    23
0, 1, 3        0    3    4    8    20
0, 1, 2        0    2    4    9    20
(b) Kernels

training sets  Max  Min  Avg  Gaussian
1, 2, 3        15   12   8    0
0, 2, 3        18   9    8    0
0, 1, 3        21   6    8    0
0, 1, 2        20   8    7    0
(c) Convolutions

Table 6: Number of occurrences of channels, kernel sizes and convolution types within the top 35 features of the Gradient Boosting model for each training set
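Counts like those in Table 6 can be derived directly from the names of the top-ranked features. The following sketch assumes a hypothetical naming scheme `<channel>_<operation>_<kernel>` (for example `WV6.2_max_9x9`); the paper does not state how the features are named.

```python
from collections import Counter


def count_components(feature_names):
    """Count channels, convolution types and kernel sizes in a list of feature names."""
    channels, convolutions, kernels = Counter(), Counter(), Counter()
    for name in feature_names:
        channel, convolution, kernel = name.split("_")  # hypothetical naming scheme
        channels[channel] += 1
        convolutions[convolution] += 1
        kernels[kernel] += 1
    return channels, convolutions, kernels


# Hypothetical excerpt of a top-feature list.
top = ["WV6.2_max_9x9", "IR3.9_max_9x9", "IR3.9_min_7x7", "VIS0.8_avg_9x9"]
channels, convolutions, kernels = count_components(top)
print(channels.most_common(1), convolutions["max"], kernels["9x9"])
```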

5.2 Increasing the Forecast Period

Forecasting lightnings just for the next 15 minutes is probably the easiest task, but not necessarily the most useful one, as it leaves little to no time to react to the forecast. We therefore conducted additional experiments to evaluate whether our approach also allows the forecast period to be extended to up to five hours.

The basic setup remains the same, especially the data preprocessing and feature generation steps. Instead of taking the maps which include the lightnings for the next 15 minutes, we now use a specific offset to determine the maps used as target values for our machine learning models. An offset of +0:00:00 corresponds to the setting of the previous section, namely considering the lightnings for the next 15 minutes. An offset of +5:00:00, in contrast, means that the model is trained to learn the presence or absence of lightnings five hours in the future, i.e. between +5:00:00 and +5:15:00 relative to the last available satellite image.
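The offset mechanism can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes that feature maps and lightning maps are stored in dictionaries keyed by timestamp, that a lightning map keyed by `t` covers the 15 minutes starting at `t`, and that the helper names `target_window` and `build_pairs` are free to choose.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # length of the target window (assumed map spacing)


def target_window(last_image_time, offset):
    """Return the [start, end) interval whose lightnings serve as the label."""
    start = last_image_time + offset
    return start, start + STEP


def build_pairs(feature_maps, lightning_maps, offset):
    """Pair each feature map with the lightning map that starts `offset` later."""
    pairs = []
    for t in sorted(feature_maps):
        start, _ = target_window(t, offset)
        if start in lightning_maps:  # skip samples without a matching label map
            pairs.append((feature_maps[t], lightning_maps[start]))
    return pairs


t0 = datetime(2018, 6, 1, 12, 0)  # arbitrary example timestamp
print(target_window(t0, timedelta(0)))        # window for offset +0:00:00
print(target_window(t0, timedelta(hours=5)))  # window for offset +5:00:00
```

Because only the label lookup changes, the preprocessing and feature generation steps can be reused for every offset.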

Table 7 shows the results of the corresponding experiments for the Random Forest model. This model was chosen because it offered the best trade-off between accuracy and training time. Considering the performance achieved, we must state that the model loses about 3 to 6 percentage points of accuracy within the first hour, depending on the test set. This loss of accuracy continues with increasing offset, which can be explained by the fact that the weather is a very chaotic phenomenon, so accurate forecasts become more and more difficult as the forecast period grows. But even with an offset of five hours, the Random Forest model still achieves an average accuracy of more than 70%. This is notably better than the 50% one would expect for a balanced test set if the model simply guessed the results.

Test Set          +0:00:00  +1:00:00  +2:00:00  +3:00:00  +4:00:00  +5:00:00
0                 87.661%   83.680%   79.304%   76.365%   73.417%   71.409%
1                 87.219%   82.788%   79.813%   76.368%   73.491%   73.827%
2                 91.724%   85.334%   82.902%   78.328%   73.650%   70.993%
3                 90.050%   87.471%   82.191%   76.999%   74.631%   70.893%
weighted average  89.042%   84.607%   80.285%   76.672%   73.682%   71.391%

Table 7: Comparison of the forecast accuracy of the Random Forest model on all test sets for different offsets from 0 hours to 5 hours.
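The accuracy loss per test set can be recomputed from the values in Table 7. The short script below only performs this arithmetic; the variable and function names are chosen for illustration, and the weighted average is not recomputed because the test set sizes used as weights are not given here.

```python
# Accuracies in percent, copied from Table 7 (offsets +0:00:00 to +5:00:00).
ACCURACY = {
    0: [87.661, 83.680, 79.304, 76.365, 73.417, 71.409],
    1: [87.219, 82.788, 79.813, 76.368, 73.491, 73.827],
    2: [91.724, 85.334, 82.902, 78.328, 73.650, 70.993],
    3: [90.050, 87.471, 82.191, 76.999, 74.631, 70.893],
}


def drop(values, start=0, end=1):
    """Accuracy loss in percentage points between two offsets (given in hours)."""
    return values[start] - values[end]


for test_set, values in ACCURACY.items():
    print(f"test set {test_set}: "
          f"first hour -{drop(values):.3f}, five hours -{drop(values, 0, 5):.3f}")
# test set 0: first hour -3.981, five hours -16.252
# test set 1: first hour -4.431, five hours -13.392
# test set 2: first hour -6.390, five hours -20.731
# test set 3: first hour -2.579, five hours -19.157
```

The loss within the first hour therefore ranges from about 2.6 to 6.4 percentage points. Test set 1 is the only one where the accuracy rises slightly between the offsets +4:00:00 and +5:00:00.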

Comparing the feature importances of the Random Forest model without offset and with the maximum offset of 5:00:00, we see a clear difference. While the features based on the water vapour channels WV6.2 and WV7.3 clearly dominate the prediction of the immediate future in Figure 5(a), the picture changes if the forecast period is extended to five hours: Figure 5(b) shows a clear preference for the infrared channels with longer wavelengths, IR8.7 to IR10.8. The dominance of large kernel sizes as well as the preference for the maximum convolution operation remain. In contrast to the Gradient Boosting model, the Random Forest also uses features based on convolution with a Gaussian kernel, although only to a very limited extent.

(a) Offset 0:00:00
(b) Offset 5:00:00
Figure 5: The top 35 features and their impact for the decisions taken by the Random Forest model trained on the sets 0, 2 and 3

6 Conclusion

The results of our approach seem very promising. Using just the error values resulting from the nowcast of satellite images based on optical flow, different tree classifiers could be trained to predict lightnings in the immediate future with an accuracy of more than 90%. A comparison of the per-class performance shows that the models correctly predict both pixels with a lightning event and pixels without one. Features using convolution with larger kernel sizes show the greatest impact on the accuracy of the models. The results of our models are based on a broad range of channels, all of which are present within the top 35 features. However, the ranking of these channels changes depending on the forecast period. While water vapour channels were the most prominent features for the immediate future, an increase of the forecast period led to an increased importance of channels with a larger wavelength. Even for the largest forecast period of five hours considered in this paper, the accuracy of the models remains above 70%, which is clearly better than the 50% one would expect from random guessing.

7 Future Work

The approach presented in this paper shows promising first results. However, there is still room for further improvement. Our approach currently relies only on tree classifiers and a limited set of features consisting of the raw nowcasting error and different forms of convolution. Neural networks have shown impressive results in different applications throughout the last years, making them an interesting alternative to the tree classifiers in our approach.

Considering the features used, we see different ways to potentially improve the results further. The evaluation of the feature importances showed a clear preference for large kernel sizes, which are currently limited to a 9x9 kernel in our approach. Further increasing this size could potentially increase the accuracy, especially for longer forecast periods, as these depend more on cloud movement than on local events in the immediate future. Adding further data sources such as the already existing indices CAPE or LI might also have a positive impact on the accuracy of the system. To avoid an exploding number of features and the extreme computational effort resulting from it, the feature importances could also be used to rank the features according to their influence, making it possible to choose the most important subset of features without losing too much accuracy. Especially the boosted models such as AdaBoost and Gradient Boosting, which do not profit much from modern multi-core processors, would benefit from this dimensionality reduction, potentially allowing the generation of deeper trees without additional computation time.
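The importance-based feature selection proposed above could, for instance, be sketched with scikit-learn's `SelectFromModel`. This is one possible realisation under assumptions, not a method evaluated in this paper: the data is synthetic, a Random Forest is assumed as the ranking model because it trains in parallel, and the number of retained features (10) and the tree depth (5) are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (assumption); the real features are convolved error maps.
X, y = make_classification(n_samples=500, n_features=40, n_informative=8, random_state=0)

# Rank all features with a model that profits from multiple cores.
ranker = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# threshold=-inf disables the importance threshold, so exactly
# the max_features highest-ranked features are kept.
selector = SelectFromModel(ranker, max_features=10, threshold=-np.inf).fit(X, y)
X_reduced = selector.transform(X)

# Train the boosted model, here with deeper trees, on the reduced feature set only.
boosted = GradientBoostingClassifier(n_estimators=50, max_depth=5, random_state=0)
boosted.fit(X_reduced, y)
print(X.shape, "->", X_reduced.shape)
```

Whether the reduced feature set preserves the accuracy would have to be checked on the held-out test sets.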


We would like to thank Deutscher Wetterdienst and nowcast GmbH for providing the data necessary for our research. We would also like to thank Stephane Haussler from DWD for providing the Python scripts to read the binary satellite data.


  • [1] OpenCV. Accessed: 2018-11-21.
  • [2] Potential instability and KO-index. Accessed: 2018-11-21.
  • [3] Scikit-learn library. Accessed: 2018-11-21.
  • [4] Scipy ndimage package. Accessed: 2018-11-21.
  • [5] M. Baldauf, C. Gebhardt, S. Theis, B. Ritter, and C. Schraf. Beschreibung des operationellen Kürzestfristvorhersagemodells COSMO-D2 und COSMO-D2-EPS und seiner Ausgabe in die Datenbanken des DWD. Deutscher Wetterdienst (DWD), 2018.
  • [6] H. D. Betz, K. Schmidt, P. Laroche, P. Blanchet, W. P. Oettinger, E. Defer, Z. Dziewit, and J. Konarski. Linet—an international lightning detection network in europe. Atmospheric Research, 91(2):564 – 573, 2009. 13th International Conference on Atmospheric Electricity.
  • [7] A. Bott. Synoptische Meteorologie: Methoden der Wetteranalyse und-prognose. Springer-Verlag, 2 edition, 2016.
  • [8] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
  • [9] L. Breiman, J. Friedman, C. Stone, and R. Olshen. Classification and Regression Trees. The Wadsworth and Brooks-Cole statistics-probability series. Taylor & Francis, 1984.
  • [10] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.
  • [11] J. G. Galway. The lifted index as a predictor of latent instability. Bulletin of the American Meteorological Society, 37(10):528–529, 1956.
  • [12] J. Hoffmann. Entwicklung und Anwendung von statistischen Vorhersage- Interpretationsverfahren für Gewitternowcasting und Unwetterwarnungen unter Einbeziehung von Fernerkundungsdaten. Ph.D. dissertation, Freie Universität Berlin, 2008.
  • [13] P. M. James, B. K. Reichert, and D. Heizenreder. NowCastMIX: Automatic integrated warnings for severe convection on nowcasting time scales at the german weather service. Weather and Forecasting, 33(5):1413–1433, 2018.
  • [14] P. Lang. Cell tracking and warning indicators derived from operational radar products. In Proceedings of the 30th International Conference on Radar Meteorology, Munich, Germany, pages 245–247, 2001.
  • [15] L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean. Boosting algorithms as gradient descent. In Advances in neural information processing systems, pages 512–518, 2000.
  • [16] H. Mogil. Extreme weather : understanding the science of hurricanes, tornadoes, floods, heat waves, snow storms, global warming and other atmospheric disturbances. Black Dog & Leventhal Publishers, New York, NY, 2007.
  • [17] M. Moncrieff and M. Miller. The dynamics and simulation of tropical cumulonimbus and squall lines. Quarterly Journal of the Royal Meteorological Society, 102(432):373–394, 1976.
  • [18] A. Ruiz and N. Villa. Storms prediction: Logistic regression vs random forest for unbalanced data. arXiv preprint arXiv:0804.0650, 2008.
  • [19] J. Schmetz, P. Pili, S. Tjemkes, D. Just, J. Kerkmann, S. Rota, and A. Ratier. An introduction to meteosat second generation (MSG). Bulletin of the American Meteorological Society, 83(7):977–992, 2002.
  • [20] J. K. Williams, D. Ahijevych, S. Dettling, and M. Steiner. Combining observations and model data for short-term storm forecasting. In Remote sensing applications for aviation weather hazard detection and decision support, volume 7088, page 708805. International Society for Optics and Photonics, 2008.
  • [21] J. K. Williams, D. Ahijevych, C. Kessinger, T. Saxen, M. Steiner, and S. Dettling. A machine learning approach to finding weather regimes and skillful predictor combinations for short-term storm forecasting. In AMS 6th Conference on Artificial Intelligence Applications to Environmental Science and 13th Conference on Aviation, Range and Aerospace Meteorology, 2008.
  • [22] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime TV-L 1 optical flow. In

    Joint Pattern Recognition Symposium

    , pages 214–223. Springer, 2007.