Predicting Solar Flares Using a Long Short-Term Memory Network

05/17/2019 ∙ by Hao Liu, et al. ∙ New Jersey Institute of Technology 0

We present a long short-term memory (LSTM) network for predicting whether an active region (AR) would produce a gamma-class flare within the next 24 hours. We consider three gamma classes, namely >=M5.0 class, >=M class, and >=C class, and build three LSTM models separately, each corresponding to a gamma class. Each LSTM model is used to make predictions of its corresponding gamma-class flares. The essence of our approach is to model data samples in an AR as time series and use LSTMs to capture temporal information of the data samples. Each data sample has 40 features including 25 magnetic parameters obtained from the Space-weather HMI Active Region Patches (SHARP) and related data products as well as 15 flare history parameters. We survey the flare events that occurred from 2010 May to 2018 May, using the GOES X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI), and select flares with identified ARs in the NCEI flare catalogs. These flare events are used to build the labels (positive vs. negative) of the data samples. Experimental results show that (i) using only 14-22 most important features including both flare history and magnetic parameters can achieve better performance than using all the 40 features together; (ii) our LSTM network outperforms related machine learning methods in predicting the labels of the data samples. To our knowledge, this is the first time that LSTMs have been used for solar flare prediction.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 6

page 12

page 13

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Solar flares, the largest explosive events in our solar system, are intense bursts of radiation that occur in the Sun’s atmosphere and release massive amounts of energy into space. They last from minutes to hours and are often seen as bright chromospheric ribbons and hot coronal loops on the Sun. Some flares are small and innocent while others can be tremendous and violent. Powerful flares and the often accompanied coronal mass ejections (CMEs) can cause severe influences on the near-Earth environment, resulting in potentially life-threatening consequences (Daglis et al., 2004). Therefore, substantial efforts are being invested on solar flare research including forecast and mitigation plans.

The triggering mechanism of solar flares is far from being fully understood. Many studies have shown that flares and CMEs could be powered by the free magnetic energy accumulated in the coronal field, which can be impulsively released by magnetic reconnection (Priest & Forbes, 2002; Shibata & Magara, 2011). Since the buildup process of coronal free energy is driven by long-term evolution of the magnetic field on the photosphere (Takasao et al., 2015)

, the features of the photospheric magnetic field, which can be directly observed and derived from photospheric vector magnetograms, may be crucial indicators for the energy transportation and triggering processes of flares/CMEs. These features include the size and complexity of sunspots, unsigned magnetic flux, gradient of the magnetic field, magnetic energy dissipation, vertical electric currents, integrated Lorentz forces, magnetic shear, magnetic helicity injection, and so on

(Park et al., 2008; Song et al., 2009; Yu et al., 2009; Steward et al., 2010). With the recent development of instruments and techniques, it becomes easier to obtain extensive measurements of these features.

Many researchers have demonstrated that using photospheric vector magnetograms in combination with machine learning can predict solar flares effectively. Bobra & Couvidat (2015) described 25 features, or predictive parameters, derived from vector magnetograms provided by the Helioseismic and Magnetic Imager (HMI; Schou et al., 2012) on board the Solar Dynamics Observatory (SDO; Pesnell et al., 2012). The authors considered flares of M1.0 class or higher, as defined by the peak 1–8 Å flux measured by the Geostationary Operational Environmental Satellite system (GOES). Liu et al. (2017) took 13 parameters out of the 25 features and used them to perform multiclass predictions of solar flares. Nishizuka et al. (2017) employed both photospheric vector-magnetic field data and chromospheric data to predict prominent flares. The authors observed that pre-flare events such as ultraviolet brightening are associated with trigger mechanisms of solar flares. They counted the number of previous flares in an active region (AR) and showed that both the previous flare activity information and ultraviolet brightening are crucial for flare prediction. The authors later extended their study to include more features such as the X-ray intensity to further improve flare prediction performance (Nishizuka et al., 2018). Jonas et al. (2018) carried out solar flare prediction by utilizing photospheric vector-magnetic field data, flaring history, as well as multiple wavelengths of image data from the chromosphere, transition region, and corona. In contrast to the flare history used by Nishizuka et al. (2017, 2018), Jonas et al. (2018) constructed flare time series for each AR by taking the list of associated flares in the GOES solar-flare catalogs. The constructed time series are then convolved with exponentially decaying windows of varying length for flare prediction.

Machine learning is a subfield of artificial intelligence, which grants computers abilities to learn from the past data and make predictions on unseen future data

(Alpaydin, 2009)

. Commonly used machine learning methods for flare prediction include decision trees

(Yu et al., 2009, 2010)

, random forests

(Barnes et al., 2016; Liu et al., 2017; Florios et al., 2018; Breiman, 2001), k-nearest neighbors (Li et al., 2008; Huang et al., 2013; Winter & Balasubramaniam, 2015; Nishizuka et al., 2017)

, support vector machines

(Qahwaji & Colak, 2007; Yuan et al., 2010; Bobra & Couvidat, 2015; Boucheron et al., 2015; Muranushi et al., 2015; Florios et al., 2018)

, ordinal logistic regression

(Song et al., 2009), the least absolute shrinkage and selection operator (LASSO) (Benvenuto et al., 2018; Jonas et al., 2018), extremely randomized trees (Nishizuka et al., 2017)

, and neural networks

(Qahwaji & Colak, 2007; Wang et al., 2008; Colak & Qahwaji, 2009; Higgins et al., 2011; Ahmed et al., 2013). Recently, Nishizuka et al. (2018) adopted a deep neural network, named Deep Flare Net, for flare prediction.

In this paper, we attempt to use SDO/HMI vector magnetic field data together with flaring history to predict solar flares that would occur in an AR within 24 hours of a given time point, with a deep learning method, named long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997)

. An LSTM network is a special kind of recurrent neural networks (RNNs)

(Hopfield, 1982) that can learn the order dependence between samples in a sequence. LSTMs have been widely used in a variety of applications such as speech recognition (Graves & Schmidhuber, 2005; Fernández et al., 2007; Graves et al., 2013), handwriting recognition (Graves et al., 2007; Graves & Schmidhuber, 2008), time series forecasting (Schmidhuber et al., 2005) among others. In a solar flare prediction task, the observations in each AR form time series data, and hence LSTMs are suitable for this prediction task. To our knowledge, this is the first time that LSTMs are used for solar flare prediction.

The rest of this paper is organized as follows. Section describes our data collection scheme and predictive parameters used in the study presented here. Section 3 details our LSTM architecture and algorithm. Section 4 reports experimental results. Section 5 concludes the paper.

2 Data and Predictive Parameters

In this study we adopt the data product, named Space-weather HMI Active Region Patches (SHARP; Bobra et al., 2014), produced by the SDO/HMI team. These data were released at the end of 2012 (Bobra et al., 2014) and can be found as the hmi.sharp data series at the Joint Science Operations Center (JSOC).111http://jsoc.stanford.edu/ The SHARP data encompass automatically identified and tracked ARs in map patches and provide many physical parameters suitable for flare prediction. Another useful data series, produced based on SHARP data, is cgem.Lorentz

. This data series includes estimations of integrated Lorentz forces (Sun et al. 2014) which help diagnose the dynamic process of each AR

(Fisher et al., 2012).

Our deep learning method requires training samples. We surveyed flares that occurred in the period between 2010 May and 2018 May, using the GOES X-ray flare catalogs provided by the National Centers for Environmental Information (NCEI), and selected flares with identified ARs in the NCEI flare catalogs. This yielded a database of 4,203 B-class flares, 6,768 C-class flares, 704 M-class flares, and 49 X-class flares. We used both the hmi.sharp data series and cgem.Lorentz data series, which were queried from the JSOC website by using SunPy (SunPy Community et al., 2015). The data samples were collected at a cadence of 1 hour.

We adopt two groups of predictive parameters for flare prediction. The first group contains the 25 physical parameters described in Bobra & Couvidat (2015) that characterize AR magnetic field properties. The second group contains 15 features related to flaring history. Six of the 15 features are related to time decay values and are calculated based on the formula described in Jonas et al. (2018). Specifically, for the data sample observed at time point in an AR, the time decay value of with respect to B-class (C-class, M-class, X-class, respectively) flares, denoted Bdec() (Cdec(), Mdec(), Xdec(), respectively), is computed respectively as

(1)
(2)
(3)
(4)

where (, , , respectively) represents the set of B-class (C-class, M-class, X-class, respectively) flares that occurred in the same AR before the data observation time point , denotes the occurrence time of flare , and is a decay constant that is set to 12 as suggested by Jonas et al. (2018). Figure 1 illustrates how to calculate Mdec() for a data sample .

Figure 1: Calculation of the time decay value Mdec() in an AR. The white rectangular box represents the data sample observed and collected at time point in the AR. There are M-class flares that occurred in the same AR prior to time point , so contains flares , . Red vertical lines represent the occurrence times of the flares in where the th red vertical line indicates the starting time of the th M-class flare .

The other two time decay values for the data sample observed at time point in an AR are computed by considering all flares, regardless of their classes, that occurred in the same AR before the time point as follows:

(5)
(6)

where and is the magnitude of flare .

In addition, the second group contains 9 flare history features for the data sample in the AR as described in Nishizuka et al. (2017). These nine features include Bhis (Chis, Mhis, Xhis, respectively) representing the total number of B-class (C-class, M-class, X-class, respectively) flares that occurred in the same AR before the data observation time point , Bhis1d (Chis1d, Mhis1d, Xhis1d, respectively) representing the total number of B-class (C-class, M-class, X-class, respectively) flares that occurred in the same AR during the 24 hours (i.e., 1 day) prior to the time point , and Xmax1d representing the maximum X-ray intensity in the same AR during the 24 hours prior to the time point . In total, we use 40 features, including 25 physical features and 15 flare history features, which are summarized in Table 1.

Keyword Description Formula
TOTUSJH Total unsigned current helicity
TOTBSQ Total magnitude of Lorentz force
TOTPOT Total photospheric magnetic free energy density
TOTUSJZ Total unsigned vertical current
ABSNJZH Absolute value of the net current helicity
SAVNCPP Sum of the modulus of the net current per polarity
USFLUX Total unsigned flux
AREA_ACR Area of strong field pixels in the active region
MEANPOT Mean photospheric magnetic free energy
R_VALUE Sum of flux near polarity inversion line
SHRGT45 Fraction of area with shear Area with shear total area
MEANSHR Mean shear angle
MEANGAM Mean angle of field from radial
MEANGBT Mean gradient of total field
MEANGBZ Mean gradient of vertical field
MEANGBH Mean gradient of horizontal field
MEANJZH Mean current helicity
MEANJZD Mean vertical current density
MEANALP Mean characteristic twist parameter,
TOTFX Sum of x-component of Lorentz force
TOTFY Sum of y-component of Lorentz force
TOTFZ Sum of z-component of Lorentz force
EPSX Sum of x-component of normalized Lorentz force
EPSY Sum of y-component of normalized Lorentz force
EPSZ Sum of z-component of normalized Lorentz force
Bdec Time decay value based on the previous B-class flares only
Cdec Time decay value based on the previous C-class flares only
Mdec Time decay value based on the previous M-class flares only
Xdec Time decay value based on the previous X-class flares only
Edec Time decay value based on the magnitudes of all previous flares
logEdec Time decay value based on the log-magnitudes of all previous flares
Bhis Total history of B-class flares in an AR -
Chis Total history of C-class flares in an AR -
Mhis Total history of M-class flares in an AR -
Xhis Total history of X-class flares in an AR -
Bhis1d 1-day history of B-class flares in an AR -
Chis1d 1-day history of C-class flares in an AR -
Mhis1d 1-day history of M-class flares in an AR -
Xhis1d 1-day history of X-class flares in an AR -
Xmax1d Maximum X-ray intensity one day before -
Table 1: Overview of the 40 Features Including 25 SDO/HMI Magnetic Parameters and 15 Flare History Features

Because the features have different units and scales, we normalize the feature values as follows. For the 25 physical features, let denote the normalized value of the feature of the data sample. Then

(7)

where is the original value of the feature of the data sample, is the mean of the feature, and

is the standard deviation of the

feature. For the 15 flare history features, we have

(8)

where and are the maximum and minimum value of the feature, respectively.

C class M class M5.0 class
Positive Negative Positive Negative Positive Negative
Training 18,266 66,311 2,710 81,867 633 83,944
Validation 7,055 19,418 1,347 25,126 292 26,181
Testing 8,732 35,957 1,278 43,411 180 44,509
Table 2: Numbers of Positive and Negative Samples for Each Flare Class

3 Methodology

3.1 Prediction Task

Following Bobra & Couvidat (2015), Jonas et al. (2018) and Nishizuka et al. (2018), we intend to use past observations of an AR to predict its future flaring activity. Specifically, we want to solve the following binary classification problem: will this AR produce a -class flare within the next 24 hours? We consider three classes separately: M5.0 class, M class, and C class. The importance of these classes has been discussed in recent works (Jonas et al., 2018; Nishizuka et al., 2018). Both the M class and C class were studied in Nishizuka et al. (2018). Also, the M class was discussed in Liu et al. (2017) and the C class was analyzed in Jonas et al. (2018). In addition, we consider the M5.0 class due to the few X-class flares in our dataset where a M5.0-class flare means the GOES X-ray flux value of the flare is above Wm. A flare in the M5.0 class is generally considered a major flare.

As in Bobra & Couvidat (2015), observation data whose AR is outside of the center meridian or whose features are incomplete are ignored. Data samples collected in years 2010–2013 are used for training, those in year 2014 are used for validation, and those in years 2015–2018 are used for testing. The training set and testing set are disjoint, and hence our algorithm will make predictions on ARs that it has never seen before. Figure 2 illustrates how we construct positive samples and negative samples used by the proposed deep learning method. For the C class, data samples collected 24 hours prior to an X-class, M-class, or C-class flare in an AR are positive and all other data samples in the AR are negative. For the M class, data samples collected 24 hours prior to an X-class or M-class flare in an AR are positive and all other data samples in the AR are negative. For the M5.0 class, data samples collected 24 hours prior to a M5.0-class flare in an AR are positive and all other data samples in the AR are negative. Notice that, if a data sample is missing at some time point or if there are insufficient data samples during the 24 hours prior to a

-class flare, we adopt a zero-padding strategy by adding synthetic data samples with zeros for all feature values to yield a complete non-gapped time-series dataset. This zero-padding method is used after applying the normalization procedures described in Equations (

7) and (8). Therefore, the zero-padding method does not affect the normalization procedures. Table 2 shows the numbers of positive and negative samples for each flare class where there are 1,269 ARs in total. Because most ARs do not produce flares, our approach yields an imbalanced dataset in which negative samples greatly outnumber positive samples.

Figure 2: Construction of positive and negative samples used in the prediction task. Each rectangular box represents a data sample, and corresponds to 1 hour in time. The red vertical line indicates the starting time of a -class flare. Data samples collected 24 hours prior to the flare, represented by blue rectangular boxes, belong to the positive class. The other data samples, represented by green rectangular boxes, belong to the negative class. The white rectangular box, in which the flare occurs, is not included in our dataset.

For a given time point and an AR, the proposed deep learning method predicts whether the AR will produce a -class flare within the next 24 hours of . There are three classes, namely M5.0 class, M class, and C class. Therefore, we build three deep learning models separately, referred to as the M5.0 model, M model, and C model respectively, each corresponding to a class of flares.

3.2 Prediction Method

Our deep learning models employ a long short-term memory (LSTM) network. An LSTM unit contains four interactive parts including a memory cell, an input gate, an output gate and a forget gate, as illustrated in Figure 3.

Figure 3: Illustration of an LSTM unit. Here, is the forget gate, is the input gate, is the output gate, is the cell state, is the input vector to the LSTM unit, and is the output vector of the LSTM unit.

The key to LSTMs is the cell state, which is represented by the horizontal line at the top of the diagram in Figure 3. Specifically, the new cell state is updated by the old cell state and the candidate cell state as follows:

(9)

where the forget gate that controls the extent to which a value remains in the cell is calculated as:

(10)

and the input gate that controls the extent to which a new value flows into the cell is computed as:

(11)

Here represents the input vector at time step and represents the output vector at time step . The candidate cell state is computed as:

(12)

Finally, the output vector at time step , which is based on the new cell state , is computed as:

(13)

where

(14)

In the above equations, and contain weights and biases respectively, which need to be learned during training; denotes the concatenation of two vectors;

is the sigmoid function, i.e.,

; is the hyperbolic tangent function, i.e., ; denotes the Hadamard product (element-wise multiplication).

Our deep learning architecture contains an LSTM layer with LSTM units (in the study presented here, is set to 10). Motivated by the previous work in language translation (Bahdanau et al., 2014), where attention mechanism was applied to allow a model to automatically search for parts of a source sentence that are related to the prediction of a target word, we add an attention layer with neurons above the LSTM layer to focus on information in relevant time steps. The attention layer would take the states in all time steps into account and assign a weight to each state, which indicates the importance of information that state has. The weight for state is derived by comparing the target state with each source state, which is computed as:

(15)

Here, is the state at the last time step and is a content-based function. We adopt the function used by Luong et al. (2015), which is defined as

(16)

where contains learnable parameters. After is obtained, a context vector can be computed as:

(17)

The final attention vector of the input sequence is derived by concatenating the context vector and last hidden state , and then being activated by a hyperbolic tangent layer as follows:

(18)

This activation vector is then sent to two fully connected layers, the first one having 200 neurons and the second one having 500 neurons. Finally the output layer with 2 neurons, which is activated by the softmax function, produces predicted values. Figure 4 shows the overall architecture of our LSTM network.

Figure 4: Architecture of the proposed LSTM network. This network is mainly comprised of an LSTM layer, an attention layer, two fully connected layers and an output layer. Each gray box in the LSTM layer is an LSTM unit as shown in Figure 3. There are LSTM units in the LSTM layer, neurons in the attention layer, 200 neurons in the first fully connected layer, 500 neurons in the second fully connected layer, and 2 neurons in the output layer activated by the softmax function. The LSTM network takes as input the sequence and produces as output a 2-dimension vector with a value of or

which is determined by the probability calculated by the softmax function in the output layer.

Let represent the data sample collected at time point . During training, for each time point , we take consecutive data samples from the training set and use the consecutive data samples to train the LSTM network. The label of these consecutive data samples is defined to be the label of the last data sample . Thus, if belongs to the positive class, then the input sequence is defined as positive; otherwise the sequence is defined as negative. Because the data samples are collected continuously at a cadence of 1 hour and missing values are filled up by our zero-padding strategy, the input sequence spans hours.

Because the dataset at hand is imbalanced where negative samples outnumber positive samples (see Table 2), we use a weighted cross entropy cost function for optimizing model parameters during training. The cost function is computed as:

(19)

Here, is the total number of sequences each having consecutive data samples in the training set, is the number of classes, which is 2 in our case since we have only positive and negative classes, is the weight of the th class, which is derived by the ratio of the sizes of the positive and negative classes with more weight given to the minority (i.e., positive) class, and denote the observed probability (which is equal to if the th sequence belongs to the th class) and the estimated probability of being in the th class of the th sequence, respectively.222We model data samples in ARs as time series. However, we do not bind the many time series to a single one. Instead, we process ARs separately as follows. Our LSTM network accepts as input =10 data samples at a time. Assuming there are data samples on an active region , with our zero-padding strategy, we generate sequences each having 10 data samples from , and feed these sequences, one sequence at a time, to the LSTM network. Next, assuming there are data samples on another active region , we generate sequences each having 10 data samples from , and feed these sequences, one sequence at a time, to the LSTM network. Although and may have overlapping time points, we process the active regions separately, one at a time. in Equation (19) represents the total number of sequences we generate from the training set.

The proposed LSTM network is implemented in Python, TensorFlow and Keras. A mini-batch strategy

(Goodfellow et al., 2016)

is used to achieve faster convergence during backpropagation. The validation dataset is used for tuning model hyperparameters. The optimizer used is Adam

(Kingma & Ba, 2014)

, which is a method for stochastic gradient descent, where the learning rate is set to 0.001,

is set to 0.9, and

is set to 0.999. The batch size is set to 256 and the number of epochs is set to 7 . The length of each input sequence,

, is set to 10, meaning that every time 10 consecutive data samples are used as input to our LSTM network.

During testing, to predict whether an AR will produce a -class flare within the next 24 hours of a time point , we take and its preceding data samples, and then feed the consecutive testing data samples into the trained LSTM network. Here, the class refers to the M5.0 class, M class, and C class, respectively. The output of the LSTM network, i.e., the predicted result, is a 2-dimension vector with a value of or , indicating is positive (i.e., the AR will produce a -class flare within the next 24 hours of ) or is negative (i.e., the AR will not produce a -class flare within the next 24 hours of ). This value is determined by comparing the probability calculated by the softmax function with a threshold. If the probability is greater than or equal to the threshold, then is predicted to be positive; otherwise is predicted to be negative. It should be pointed out that, the way we use the consecutive testing data samples to predict whether there is a M5.0-class (M-class, C-class, respectively) flare within the next 24 hours of the time point is totally different from the previously published machine learning methods for solar flare prediction (Bobra & Couvidat, 2015; Jonas et al., 2018; Nishizuka et al., 2018), which used only the testing data sample to make the prediction.

4 Results

4.1 Performance Metrics

Given an AR and a data sample observed at time point , we define to be a true positive (TP) if the M5.0 (M, C, respectively) model predicts that is positive, i.e., the AR will produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours of , and is indeed positive. We define to be a false positive (FP) if the M5.0 (M, C, respectively) model predicts that is positive while is actually negative i.e., the AR will not produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours of . We say is a true negative (TN) if the M5.0 (M, C, respectively) model predicts to be negative and is indeed negative; is a false negative (FN) if the M5.0 (M, C, respectively) model predicts to be negative while is actually positive. We also use TP (FP, TN, FN, respectively) to represent the total number of true positives (false positives, true negatives, false negatives, respectively) produced by a model.

The performance metrics used in this study include the following:

(20)
(21)
(22)
(23)
(24)
(25)

ACC is not suitable for imbalanced classification (He & Garcia, 2009)

. The reason is that a naive classifier predicting all instances in the minority class to belong to the majority class would still get a high ACC value. Instead, BACC is suggested for imbalanced classification

(He & Garcia, 2009). Because of its unbiasedness over class-imbalance ratios, we also follow the suggestion of Bloomfield et al. (2012) to use the TSS score, which is the recall subtracted by the false alarm rate. The Heidke Skill Score (HSS) (Heidke, 1926) is used to measure the fractional improvement of our prediction over the random prediction (Florios et al., 2018). The larger BACC, HSS, and TSS score a method has, the better performance the method achieves.

4.2 Model Evaluation

We first conduct an ablation study to analyze the proposed LSTM framework by considering three alternative architectures, denoted by LSTM, LSTM and LSTM, respectively. LSTM (LSTM, LSTM, respectively) is obtained by removing the attention layer (the two fully connected layers, both the attention layer and the two fully connected layers, respectively) from the LSTM architecture in Figure 4. Table 3 shows the prediction results of the four architectures for the three classes, namely M5.0 class, M class, and C class, of flares. It can be seen from Table 3 that the proposed LSTM architecture outperforms the ablations LSTM, LSTM and LSTM in terms of BACC, HSS and TSS scores. By including the attention layer and the two fully connected layers, the performance of the LSTM framework improves. Our zero-padding strategy does not negatively affect the prediction performance. Although some normalized feature values of a data sample may become zeros due to the normalization procedures described in Equations (7) and (8), the attention layer of our LSTM model is able to distinguish between this data sample and those synthetic data samples added by our zero-padding method whose feature values are all zeros. The attention layer pays little attention to the synthetic data samples whose feature values are all zeros.

M5.0 class M class C class
Recall LSTM 0.944 0.888 0.743
LSTM 0.939 0.899 0.747
LSTM 0.956 0.876 0.750
LSTM 0.978 0.881 0.762
Precision LSTM 0.042 0.181 0.543
LSTM 0.039 0.184 0.537
LSTM 0.041 0.216 0.536
LSTM 0.038 0.222 0.544
ACC LSTM 0.914 0.882 0.828
LSTM 0.904 0.883 0.825
LSTM 0.910 0.906 0.824
LSTM 0.899 0.909 0.829
BACC LSTM 0.929 0.885 0.796
LSTM 0.933 0.891 0.795
LSTM 0.933 0.891 0.796
LSTM 0.938 0.895 0.803
HSS LSTM 0.074 0.267 0.519
LSTM 0.068 0.271 0.515
LSTM 0.071 0.316 0.514
LSTM 0.074 0.347 0.539
TSS LSTM 0.858 0.770 0.591
LSTM 0.865 0.782 0.591
LSTM 0.865 0.783 0.592
LSTM 0.877 0.790 0.607
Table 3: Flare Prediction Results (within 24 hours) of Four LSTM Architectures

We next compare our LSTM framework with five closely related machine learning methods including multilayer perceptrons (MLP)

(Haykin & Network, 2004; Florios et al., 2018), Jordan network (JN) (Jordan, 1997), support vector machines (SVM) (Qahwaji & Colak, 2007; Yuan et al., 2010; Bobra & Couvidat, 2015; Boucheron et al., 2015; Muranushi et al., 2015; Florios et al., 2018), random forests (RF) (Barnes et al., 2016; Liu et al., 2017; Florios et al., 2018), and a recently published deep learning-based method, Deep Flare Net (DeFN; Nishizuka et al., 2018). All these methods including ours (LSTM) can be used as a binary classification model (Nishizuka et al., 2018; Jonas et al., 2018) or a probabilistic forecasting model (Florios et al., 2018). A binary classification model predicts whether or not an AR will produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours. A probabilistic forecasting model predicts the probability for an AR to produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours. A probabilistic forecasting model can be converted into a binary classification model by using a probability threshold to make predictions as follows. If the predicted probability is greater than or equal to the threshold, then the AR will produce a flare within the next 24 hours; otherwise the AR will not produce a flare within the next 24 hours.

The MLP method consists of an input layer, an output layer and two hidden layers with 200 neurons and 500 neurons respectively. For the JN method, its output dimension is set to 10. SVM uses the radial basis function (RBF) kernel. RF has two parameters: mtry (number of features randomly selected to split a node) and ntree (number of trees to grow in the forest). We vary the values of ntree

{300, 500, 1,000} and mtry [2, 8], and set ntree to 500 and mtry to 3 since these two parameter values yield the maximum TSS for RF. Table 4 compares our LSTM with the five related methods. All the methods are treated as binary classification models where the probability thresholds are chosen to maximize their respective TSS values, and the same threshold is used to calculate all performance metrics including BACC, HSS and TSS for each method with respect to each flare class. It can be seen from Table 4 that LSTM and RF are the two best methods. The probability thresholds used by LSTM in Table 4 are 50%, 60% and 50% for M5.0, M and C models respectively (the same thresholds are used in Table 3). The probability thresholds used by RF in Table 4 are 0.5%, 5% and 25% for M5.0, M and C models respectively.

M5.0 class M class C class
Recall MLP 0.944 0.812 0.637
SVM 0.644 0.692 0.746
JN 0.923 0.851 0.701
DeFN 0.889 0.891 0.761
RF 1.000 0.850 0.727
LSTM 0.978 0.881 0.762
Precision MLP 0.037 0.143 0.451
SVM 0.014 0.106 0.497
JN 0.033 0.178 0.543
DeFN 0.037 0.173 0.497
RF 0.034 0.252 0.532
LSTM 0.038 0.222 0.544
ACC MLP 0.901 0.855 0.778
SVM 0.818 0.824 0.803
JN 0.882 0.884 0.826
DeFN 0.907 0.872 0.801
RF 0.886 0.924 0.822
LSTM 0.899 0.909 0.829
BACC MLP 0.922 0.834 0.725
SVM 0.732 0.760 0.781
JN 0.903 0.868 0.779
DeFN 0.898 0.881 0.786
RF 0.943 0.888 0.786
LSTM 0.938 0.895 0.803
HSS MLP 0.064 0.204 0.389
SVM 0.020 0.141 0.472
JN 0.056 0.260 0.502
DeFN 0.064 0.253 0.476
RF 0.059 0.361 0.502
LSTM 0.074 0.347 0.539
TSS MLP 0.845 0.669 0.449
SVM 0.464 0.520 0.562
JN 0.806 0.736 0.558
DeFN 0.796 0.763 0.572
RF 0.886 0.776 0.572
LSTM 0.877 0.790 0.607
Table 4: Flare Prediction Results (within 24 hours) of Our LSTM and Five Related Machine Learning Methods

To further understand the behavior of the proposed LSTM method, we conduct a cross-validation study as follows. Refer to Table 2. For each of the M5.0, M and C models, we partition its training (testing, respectively) set into 10 equal-sized folds. For every two training (testing, respectively) folds and , , fold and fold are disjoint; furthermore, fold and fold contain approximately the same number of positive training (testing, respectively) data samples and approximately the same number of negative training (testing, respectively) data samples. In run (, ), , , all training samples except those in training fold are used to train a model, and the trained model is used to make predictions on all testing samples except those in testing fold . We calculate the performance metric values based on the predictions made in run (, ). There are 100 runs. The means and standard deviations over the 100 runs are calculated and recorded.

4.3 Feature Assessment

Motivated by RF which uses only 3 features to split a node when constructing a tree, we wonder whether using fewer features can also achieve better performance for our LSTM method. We thus analyze the importance of each of the 40 features studied in the paper with respect to the M5.0, M, and C models respectively based on the LSTM architecture in Figure 4 using the cross-validation methodology described above. Each time only one feature is used by the M5.0 (M, C, respectively) model to predict whether a given AR will produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours. The probability threshold is set to maximize the TSS score in each test. The corresponding mean TSS score is recorded. There are 40 features, so 40 mean individual TSS scores are recorded. These 40 mean individual TSS scores are sorted in descending order, and the 40 corresponding features are ranked from the most important to the least important accordingly. The sorted, mean individual TSS scores and ranked features are plotted in a chart for each model. Then, according to the ranked features, mean cumulative TSS scores are calculated and plotted in the chart. Specifically, the mean cumulative TSS score of the top , , most important features with respect to the M5.0 (M, C, respectively) model is equal to the mean TSS score of the M5.0 (M, C, respectively) model that uses the top most important features altogether for flare prediction.

Figure 5 (6, 7, respectively) presents the feature importance chart for the M5.0 (M, C, respectively) model. In each figure, blue bars represent mean individual TSS scores of the 40 features and the red polygonal line represents mean cumulative TSS scores of the top , , most important features. Error bars, representing standard deviations, are also plotted. It can be seen from the figures that predictive parameters that are consistently ranked in the top 20 list for all the three models include the following 16 features: TOTUSJH, TOTBSQ, TOTPOT, TOTUSJZ, ABSNJZH, SAVNCPP, USFLUX, AREA_ACR, MEANPOT, TOTFX, Cdec, Chis, Chis1d, Edec, Mhis and Xmax1d. Among these 16 features, there are 10 physical parameters, or SDO/HMI magnetic parameters, including TOTUSJH, TOTBSQ, TOTPOT, TOTUSJZ, ABSNJZH, SAVNCPP, USFLUX, AREA_ACR, MEANPOT and TOTFX. All these 10 physical parameters except TOTFX are among the 13 magnetic parameters that are also considered important in Bobra & Couvidat (2015) and Liu et al. (2017), which used different methods for assessing the importance of features. Thus, our findings are consistent with those reported in the literature. There are 4 magnetic parameters, TOTFZ, R_VALUE, SHRGT45 and EPSZ, which are considered important in Bobra & Couvidat (2015) and Liu et al. (2017), but are not ranked high in our list; some flare history features are ranked higher than these four magnetic parameters. It is worth noting that TOTUSJH plays the most important role, i.e., is ranked the top one, for all the three models. Moreover, some features including TOTUSJH, TOTUSJZ, TOTPOT, TOTBSQ, USFLUX, SAVNCPP, Cdec, Chis and Chis1d show strong predictive power and are consistently ranked in the top 10 list for all the three models.


Figure 5: Assessment of feature importance for predicting M5.0-class flares. Blue bars represent mean individual TSS scores and the red polygonal line represents mean cumulative TSS scores of the features.

Figure 6: Assessment of feature importance for predicting M-class flares. Blue bars represent mean individual TSS scores and the red polygonal line represents mean cumulative TSS scores of the features.
Figure 7: Assessment of feature importance for predicting C-class flares. Blue bars represent mean individual TSS scores and the red polygonal line represents mean cumulative TSS scores of the features.

We note that the history of C-class flares has a high impact on flare prediction. Chis and Chis1d from Nishizuka et al. (2017) count the number of previous C-class flares. Cdec from Jonas et al. (2018) is the time decay value based on previous C-class flares. These three flare history features are ranked high in our list. The history of M-class flares plays a more important role for the M5.0 and M models than for the C model. Other flare history features such as the histories of B-class and X-class flares are relatively unimportant for predicting M5.0- (M-, C-, respectively) class flares.

By carefully examining Figures 5, 6 and 7, we find that using all the 40 features together does not yield the highest mean cumulative TSS scores. In fact, using roughly the top 14-22 most important features together yields the highest mean cumulative TSS scores, achieving the best performance. Specifically, using the top 20 (22, 14, respectively) most important features yields the highest mean cumulative TSS score for the M5.0 (M, C, respectively) model. This happens probably because low ranked features are noisy features, and using them may deteriorate the performance of the models. In subsequent experiments, we use the best features for each model.

4.4 Comparison between RF and LSTM

Table 4 shows that RF and LSTM are the two best methods. In this subsection, we further compare RF and LSTM using the cross-validation methodology described above. Table 5 shows their performance metric values where standard deviations are enclosed in parentheses. The probability thresholds used by RF are set to 0.5%, 5%, 25% for M5.0, M, C class respectively where the thresholds are chosen to maximize TSS. The probability thresholds used by LSTM are set to 75%, 60%, 50% for M5.0, M, C class respectively. These optimal thresholds are slightly different from those used in Table 4. This happens because the performance metric values in Table 4 are obtained from a single dataset whereas those in Table 5 are obtained from cross validation. Furthermore, LSTM in Table 4 uses all 40 features together whereas LSTM in Table 5 uses only the (14-22) best features. According to Table 5 and a Wilcoxon signed rank test (Wilcoxon, 1947), our LSTM method is significantly better than RF (p 0.05) in all three flare classes in terms of BACC, HSS and TSS.333The source code of LSTM and dataset can be downloaded from https://web.njit.edu/~wangj/LSTMpredict/. These results indicate that LSTM outperforms RF when the methods are used as binary classification models.

M5.0 class M class C class
Recall RF 0.960 (0.027) 0.888 (0.006) 0.730 (0.002)
LSTM 0.960 (0.017) 0.885 (0.017) 0.773 (0.027)
Precision RF 0.026 (0.001) 0.179 (0.003) 0.499 (0.003)
LSTM 0.048 (0.008) 0.222 (0.023) 0.541 (0.030)
ACC RF 0.853 (0.003) 0.880 (0.002) 0.804 (0.001)
LSTM 0.921 (0.014) 0.907 (0.013) 0.826 (0.015)
BACC RF 0.906 (0.014) 0.884 (0.003) 0.776 (0.001)
LSTM 0.940 (0.007) 0.896 (0.004) 0.806 (0.004)
HSS RF 0.042 (0.002) 0.262 (0.004) 0.469 (0.003)
LSTM 0.084 (0.015) 0.323 (0.030) 0.526 (0.021)
TSS RF 0.812 (0.027) 0.768 (0.005) 0.552 (0.003)
LSTM 0.881 (0.014) 0.792 (0.008) 0.612 (0.009)
Table 5: Flare Prediction Results (within 24 hours) of Our LSTM and RF Obtained from Cross Validation

We next compare RF and LSTM using (i) skill scores profiles (SSP) of BACC, HSS and TSS as functions of the probability threshold, (ii) Receiver Operating Characteristic (ROC) curves, and (iii) Reliability Diagrams (RD). ROC curves describe the relationship between the true positive rate and false positive rate. The Area Under the Curve (AUC) in the ROC represents the degree of separability, indicating how well a model is capable of distinguishing between two classes with the ideal value of one (Marzban, 2004). The RD describes the relationship between the actual observed frequencies of flares of interest and the probabilities predicted by a model. A bin diagram, presented as an inset in the RD, is used to show the distribution of the predicted probabilities. The X-axis of the bin diagram represents the predicted probabilities and the Y-axis represents the numbers of data samples. As in Florios et al. (2018) we use 20 bins of length 0.05 each. Thus, for example, the Y value of the first bin shows the number of data samples whose predicted probabilities of having a flare of interest in the next 24 hours are within 0.05. The ideal situation is represented by the diagonal line in the RD; a point on the diagonal indicates that among those data samples whose predicted probability is , the ratio of the data samples that actually have a flare of interest within the next 24 hours is also . In addition, we use the Brier Score (BS) (Brier, 1950) and Brier Skill Score (BSS) (Wilks, 2010, 2011) to quantitatively assess the performance of probabilistic forecasting models. The values of BS range from 0 to 1 with the perfect score being 0. The values of BSS range from minus infinity to 1 with the perfect score being 1.

Figure 8 presents SSP, ROC and RD plots for RF and LSTM respectively when the methods are used in M5.0-class flare prediction. Refer to the SSP plots. For RF, the maximum TSS=0.812 is obtained with a probability threshold of 0.5%. With this threshold, BACC=0.9060.014 and HSS=0.0420.002. For LSTM, the maximum TSS=0.881 is obtained with a probability threshold of 75%. With this threshold, BACC=0.9400.007 and HSS=0.0840.015. The ROC curve of LSTM is better than that of RF. LSTM has an AUC of 0.9840.003, which is better than RF with an AUC of 0.9480.011. Refer to the RD plots. The curves of RF and LSTM are far away from the diagonal lines in the RD plots. The BS and BSS achieved by RF are 0.0040.001 and 0.0530.008 respectively. The BS and BSS achieved by LSTM are 0.0900.011 and -21.5762.956 respectively. In terms of BS and BSS, RF is better than LSTM.

Figure 8: Comparison between RF (top) and LSTM (bottom) for M5.0-class flare prediction where the corresponding SSP, ROC and RD are displayed from left to right.

Figure 9 presents SSP, ROC and RD plots for RF and LSTM respectively when the methods are used in M-class flare prediction. Refer to the SSP plots. For RF, the maximum TSS=0.768 is obtained with a probability threshold of 5%. With this threshold, BACC=0.8840.003 and HSS=0.2620.004. For LSTM, the maximum TSS=0.792 is obtained with a probability threshold of 60%. With this threshold, BACC=0.8960.004 and HSS=0.3230.030. The ROC curve of LSTM is slightly better than that of RF. LSTM has an AUC of 0.9480.003, which is better than RF with an AUC of 0.9350.002. Refer to the RD plots. The curve of RF is closer to the diagonal than the curve of LSTM. The BS and BSS achieved by RF are 0.0210.001 and 0.2600.006 respectively. The BS and BSS achieved by LSTM are 0.0900.009 and -2.2410.319 respectively. These results suggest that RF be a better probabilistic forecasting model than LSTM.

Figure 9: Comparison between RF (top) and LSTM (bottom) for M-class flare prediction where the corresponding SSP, ROC and RD are displayed from left to right.

Figure 10 presents SSP, ROC and RD plots for RF and LSTM respectively when the methods are used in C-class flare prediction. Refer to the SSP plots. For RF, the maximum TSS=0.552 is obtained with a probability threshold of 25%. With this threshold, BACC=0.7760.001 and HSS=0.4690.003. For LSTM, the maximum TSS=0.612 is obtained with a probability threshold of 50%. With this threshold, BACC=0.8060.004 and HSS=0.5260.021. The ROC curve of LSTM is better than that of RF. LSTM has an AUC of 0.8710.002, which is better than RF with an AUC of 0.8510.001. Refer to the RD plots. The curve of RF almost overlaps the diagonal. The BS and BSS achieved by RF are 0.1030.001 and 0.3440.002 respectively. The BS and BSS achieved by LSTM are 0.133 0.007 and 0.1520.047 respectively. These results indicate that RF is better than LSTM when the methods are used as probabilistic forecasting models.

Figure 10: Comparison between RF (top) and LSTM (bottom) for C-class flare prediction where the corresponding SSP, ROC and RD are displayed from left to right.

5 Discussion and Conclusions

We develop a long short-term memory (LSTM) network to predict whether an AR would produce a -class flare within the next 24 hours. We consider three classes, namely M5.0 class, M class, and C class, and build three LSTM models separately, each corresponding to a class. Each LSTM model is used to make predictions of its corresponding -class flares. We build a dataset containing the data in the period from 2010 May to 2018 May, gathered from the JSOC website. Each sample in the dataset has 40 features, including 25 magnetic parameters provided by SHARP and related data products as well as 15 flare history parameters. We divide the dataset into three subsets: the subset covering 2010–2013 for training, the subset covering 2014 for validation, and the subset covering 2015–2018 for testing. The training subset and testing subset are disjoint, and hence our proposed method will make predictions on ARs that it has never seen before. With extensive experiments, we evaluate the performance of all three LSTM models and compare them with closely related machine learning methods using different performance metrics. The main results are summarized as follows.

1. Solar data samples in an AR are considered as time series in this study. Although some researchers (Nishizuka et al., 2017; Jonas et al., 2018) utilize information concerning flaring history for solar flare forecasts, none of the previous studies model the data samples as time series and adopt LSTMs to capture dependencies in the temporal domain of the data samples. To our knowledge, this is the first attempt of using LSTMs for solar flare prediction.
2. We evaluate the importance of each of the 40 features used in this study. Our experimental results show that, among these 40 features, 10 SDO/HMI magnetic parameters including TOTUSJH, TOTBSQ, TOTPOT, TOTUSJZ, ABSNJZH, SAVNCPP, USFLUX, AREA_ACR, MEANPOT, TOTFX, and 6 flare history parameters including Cdec, Chis, Chis1d, Edec, Mhis and Xmax1d are more important than the other features for flare prediction. Our findings on the SDO/HMI magnetic parameters are mostly consistent with those reported in Bobra & Couvidat (2015). It was also observed that the history of C-class flares contributes the most to flare prediction among all the flare history parameters. Although the rankings of the features are not the same for the three LSTM models, some features such as TOTUSJH, TOTUSJZ, TOTPOT, TOTBSQ, USFLUX, SAVNCPP, Cdec, Chis and Chis1d exhibit great predictive power for all the three models. Furthermore, using only 14-22 most important features including both flare history and magnetic parameters can achieve better performance than using all the 40 features together.
3. Our LSTM-based approach achieves better performance than related machine learning methods such as multilayer perceptrons (MLP) (Haykin & Network, 2004; Florios et al., 2018), Jordan network (JN) (Jordan, 1997), support vector machines (SVM) (Qahwaji & Colak, 2007; Yuan et al., 2010; Bobra & Couvidat, 2015; Boucheron et al., 2015; Muranushi et al., 2015; Florios et al., 2018), and a recently published deep learning-based method, Deep Flare Net (DeFN; Nishizuka et al., 2018). In addition, we conduct an ablation study by considering three alternative architectures (ablations) LSTM, LSTM and LSTM. Our experimental results show that the proposed LSTM architecture achieves better performance than the three ablations, demonstrating the effectiveness of adding the attention layer and fully connected layers to LSTM units.
4. A related machine learning algorithm, namely random forests (RF) (Barnes et al., 2016; Liu et al., 2017; Florios et al., 2018), is comparable to our LSTM method. If one is interested in getting a probabilistic estimate of how likely an AR will produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours, then RF would be the best choice. On the other hand, if one is interested in getting a firm answer regarding whether or not an AR will produce a M5.0- (M-, C-, respectively) class flare within the next 24 hours, then our LSTM method is significantly better than RF and is recommended.

Based on our experimental results, we conclude that the proposed LSTM-based framework is a valid method for solar flare prediction. Considering flare history parameters, besides SDO/HMI magnetic parameters, helps improve prediction performance, a finding consistent with that reported in Nishizuka et al. (2017) and Jonas et al. (2018). As solar data from various instruments are gathered at an unprecedented rate, some researchers attempt to employ additional features including solar images for flare prediction (Jonas et al., 2018). In future work, we plan to incorporate the image data into deep learning methods for predicting solar flares and other events (e.g., filament eruptions and CMEs).

We thank the referee for very helpful and thoughtful comments. We also thank the team of SDO/HMI for producing vector magnetic field data products. The X-ray flare catalogs were prepared by and made available through NOAA NCEI. The related machine learning methods studied in this work were implemented in Python. C.L. and H.W. acknowledge the support of NASA under grants NNX16AF72G, 80NSSC17K0016, 80NSSC18K0673 and 80NSSC18K1705.

References

  • Ahmed et al. (2013) Ahmed, O. W., Qahwaji, R., Colak, T., et al. 2013, Sol. Phys., 283, 157, doi: 10.1007/s11207-011-9896-1
  • Alpaydin (2009) Alpaydin, E. 2009, Introduction to machine learning (MIT press)
  • Bahdanau et al. (2014) Bahdanau, D., Cho, K., & Bengio, Y. 2014, CoRR, abs/1409.0473. https://arxiv.org/abs/1409.0473
  • Barnes et al. (2016) Barnes, G., Leka, K. D., Schrijver, C. J., et al. 2016, ApJ, 829, 89, doi: 10.3847/0004-637X/829/2/89
  • Benvenuto et al. (2018) Benvenuto, F., Piana, M., Campi, C., & Massone, A. M. 2018, ApJ, 853, 90, doi: 10.3847/1538-4357/aaa23c
  • Bloomfield et al. (2012) Bloomfield, D. S., Higgins, P. A., McAteer, R. T. J., & Gallagher, P. T. 2012, ApJ, 747, L41, doi: 10.1088/2041-8205/747/2/L41
  • Bobra & Couvidat (2015) Bobra, M. G., & Couvidat, S. 2015, ApJ, 798, 135, doi: 10.1088/0004-637X/798/2/135
  • Bobra et al. (2014) Bobra, M. G., Sun, X., Hoeksema, J. T., et al. 2014, Sol. Phys., 289, 3549, doi: 10.1007/s11207-014-0529-3
  • Boucheron et al. (2015) Boucheron, L. E., Al-Ghraibah, A., & McAteer, R. T. J. 2015, ApJ, 812, 51, doi: 10.1088/0004-637X/812/1/51
  • Breiman (2001) Breiman, L. 2001, Machine learning, 45, 5
  • Brier (1950) Brier, G. W. 1950, Monthly Weather Review, 78, 1
  • Colak & Qahwaji (2009) Colak, T., & Qahwaji, R. 2009, Space Weather, 7, S06001, doi: 10.1029/2008SW000401
  • Daglis et al. (2004) Daglis, I., Baker, D., Kappenman, J., Panasyuk, M., & Daly, E. 2004, Space Weather, 2, S02004, doi: 10.1029/2003SW000044
  • Fernández et al. (2007) Fernández, S., Graves, A., & Schmidhuber, J. 2007, in Proceedings of the 17th International Conference on Artificial Neural Networks, Part II, 220–229. https://doi.org/10.1007/978-3-540-74695-9_23
  • Fisher et al. (2012) Fisher, G. H., Bercik, D. J., Welsch, B. T., & Hudson, H. S. 2012, Sol. Phys., 277, 59, doi: 10.1007/s11207-011-9907-2
  • Florios et al. (2018) Florios, K., Kontogiannis, I., Park, S.-H., et al. 2018, Sol. Phys., 293, 28, doi: 10.1007/s11207-018-1250-4
  • Goodfellow et al. (2016) Goodfellow, I. J., Bengio, Y., & Courville, A. C. 2016, Deep Learning, Adaptive computation and machine learning (MIT Press). http://www.deeplearningbook.org/
  • Graves et al. (2007) Graves, A., Fernández, S., Liwicki, M., Bunke, H., & Schmidhuber, J. 2007, in Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 577–584
  • Graves et al. (2013) Graves, A., Mohamed, A., & Hinton, G. E. 2013, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 6645–6649. https://doi.org/10.1109/ICASSP.2013.6638947
  • Graves & Schmidhuber (2005) Graves, A., & Schmidhuber, J. 2005, Neural Networks, 18, 602, doi: 10.1016/j.neunet.2005.06.042
  • Graves & Schmidhuber (2008) Graves, A., & Schmidhuber, J. 2008, in Proceedings of the 22nd Annual Conference on Neural Information Processing Systems, 545–552
  • Haykin & Network (2004) Haykin, S., & Network, N. 2004, Neural networks, 2, 41
  • He & Garcia (2009) He, H., & Garcia, E. A. 2009, IEEE Trans. Knowl. Data Eng., 21, 1263, doi: 10.1109/TKDE.2008.239
  • Heidke (1926) Heidke, P. 1926, Geografiska Annaler, 8, 301
  • Higgins et al. (2011) Higgins, P. A., Gallagher, P. T., McAteer, R. T. J., & Bloomfield, D. S. 2011, Advances in Space Research, 47, 2105, doi: 10.1016/j.asr.2010.06.024
  • Hochreiter & Schmidhuber (1997) Hochreiter, S., & Schmidhuber, J. 1997, Neural Computation, 9, 1735, doi: 10.1162/neco.1997.9.8.1735
  • Hopfield (1982) Hopfield, J. J. 1982, Proceedings of the National Academy of Sciences, 79, 2554, doi: 10.1073/pnas.79.8.2554
  • Huang et al. (2013) Huang, X., Zhang, L., Wang, H., & Li, L. 2013, A&A, 549, A127, doi: 10.1051/0004-6361/201219742
  • Jonas et al. (2018) Jonas, E., Bobra, M., Shankar, V., Todd Hoeksema, J., & Recht, B. 2018, Sol. Phys., 293, 48, doi: 10.1007/s11207-018-1258-9
  • Jordan (1997) Jordan, M. I. 1997, in Advances in psychology, Vol. 121 (Elsevier), 471–495
  • Kingma & Ba (2014) Kingma, D. P., & Ba, J. 2014, CoRR, abs/1412.6980. https://arxiv.org/abs/1412.6980
  • Li et al. (2008) Li, R., Cui, Y., He, H., & Wang, H. 2008, Advances in Space Research, 42, 1469, doi: 10.1016/j.asr.2007.12.015
  • Liu et al. (2017) Liu, C., Deng, N., Wang, J. T. L., & Wang, H. 2017, ApJ, 843, 104, doi: 10.3847/1538-4357/aa789b
  • Luong et al. (2015) Luong, M., Pham, H., & Manning, C. D. 2015, CoRR, abs/1508.04025. https://arxiv.org/abs/1508.04025
  • Marzban (2004) Marzban, C. 2004, Weather and Forecasting, 19, 1106
  • Muranushi et al. (2015) Muranushi, T., Shibayama, T., Muranushi, Y. H., et al. 2015, Space Weather, 13, 778, doi: 10.1002/2015SW001257
  • Nishizuka et al. (2018) Nishizuka, N., Sugiura, K., Kubo, Y., Den, M., & Ishii, M. 2018, ApJ, 858, 113, doi: 10.3847/1538-4357/aab9a7
  • Nishizuka et al. (2017) Nishizuka, N., Sugiura, K., Kubo, Y., et al. 2017, ApJ, 835, 156, doi: 10.3847/1538-4357/835/2/156
  • Park et al. (2008) Park, S.-H., Lee, J., Choe, G. S., et al. 2008, ApJ, 686, 1397, doi: 10.1086/591117
  • Pesnell et al. (2012) Pesnell, W. D., Thompson, B. J., & Chamberlin, P. C. 2012, Sol. Phys., 275, 3, doi: 10.1007/s11207-011-9841-3
  • Priest & Forbes (2002) Priest, E. R., & Forbes, T. G. 2002, A&A Rev., 10, 313, doi: 10.1007/s001590100013
  • Qahwaji & Colak (2007) Qahwaji, R., & Colak, T. 2007, Sol. Phys., 241, 195, doi: 10.1007/s11207-006-0272-5
  • Schmidhuber et al. (2005) Schmidhuber, J., Wierstra, D., & Gomez, F. J. 2005, in Proceedings of the 19th International Joint Conference on Artificial Intelligence, 853–858. http://ijcai.org/Proceedings/05/Papers/1452.pdf
  • Schou et al. (2012) Schou, J., Scherrer, P. H., Bush, R. I., et al. 2012, Sol. Phys., 275, 229, doi: 10.1007/s11207-011-9842-2
  • Shibata & Magara (2011) Shibata, K., & Magara, T. 2011, Living Reviews in Solar Physics, 8, 6, doi: 10.12942/lrsp-2011-6
  • Song et al. (2009) Song, H., Tan, C., Jing, J., et al. 2009, Sol. Phys., 254, 101, doi: 10.1007/s11207-008-9288-3
  • Steward et al. (2010) Steward, G., Lobzin, V., & Wilkinson, P. J. 2010, AGU Fall Meeting Abstracts, SH11B
  • Sunet al. (2014) Sun, X., & for the CGEM Team 2014, arXiv:1405.7353
  • SunPy Community et al. (2015) SunPy Community, Mumford, S. J., Christe, S., et al. 2015, Computational Science and Discovery, 8, 014009, doi: 10.1088/1749-4699/8/1/014009
  • Takasao et al. (2015) Takasao, S., Fan, Y., Cheung, M. C. M., & Shibata, K. 2015, ApJ, 813, 112, doi: 10.1088/0004-637X/813/2/112
  • Wang et al. (2008) Wang, H. N., Cui, Y. M., Li, R., Zhang, L. Y., & Han, H. 2008, Advances in Space Research, 42, 1464, doi: 10.1016/j.asr.2007.06.070
  • Wilcoxon (1947) Wilcoxon, F. 1947, Biometrics, 3, 119
  • Wilks (2010) Wilks, D. S. 2010, Quarterly Journal of the Royal Meteorological Society, 136, 2109
  • Wilks (2011) —. 2011, Statistical Methods in the Atmospheric Sciences, Volume 100 (Academic Press)
  • Winter & Balasubramaniam (2015) Winter, L. M., & Balasubramaniam, K. 2015, Space Weather, 13, 286, doi: 10.1002/2015SW001170
  • Yu et al. (2010) Yu, D., Huang, X., Hu, Q., et al. 2010, ApJ, 709, 321, doi: 10.1088/0004-637X/709/1/321
  • Yu et al. (2009) Yu, D., Huang, X., Wang, H., & Cui, Y. 2009, Sol. Phys., 255, 91, doi: 10.1007/s11207-009-9318-9
  • Yuan et al. (2010) Yuan, Y., Shih, F. Y., Jing, J., & Wang, H.-M. 2010, Research in Astronomy and Astrophysics, 10, 785, doi: 10.1088/1674-4527/10/8/008