## 1 Introduction

Weather forecasting is a difficult problem owing to the complex interplay of many causal factors. Accurate tropical cyclone intensity prediction is one such problem, of great importance due to its vast social and economic impact. Cyclones are among the most devastating natural phenomena and frequently occur in tropical regions. Indian coastal regions, in particular, are frequently affected by tropical cyclones [chaudhuri2012appraisal] that originate in the Arabian Sea (AS) and the Bay of Bengal (BOB), which are parts of the North Indian Ocean (NIO). With the increasing frequency of cyclones in the NIO [frank1999effects], it has become crucial to develop a model that can forecast the intensity of a cyclone over a long period from observations covering only a short period. Various statistical and numerical methods have been developed to predict cyclone intensity [jarvinen1979statistical, demaria1999updated, baik1998tropical, chaudhuri2009severity, dvorak1984tropical], but these methods lack effectiveness in terms of accuracy and computation time.

The India Meteorological Department (IMD) keeps track of tropical cyclones originating in the North Indian Ocean between E and E. Typically, the intensity of a tropical cyclone is stated in terms of its "Grade", which is derived directly from ranges of the Maximum Sustained Surface Wind Speed (MSWS); see Table 1 [imd]. MSWS, being a continuous variable, is therefore a better choice for intensity prediction [chaudhuri2017swarm], and we adopt it for this work. We use latitude, longitude, MSWS, estimated central pressure, distance, direction, and sea surface temperature as input features.

| Grade | Low pressure system | MSWS (in knots) |
| --- | --- | --- |
| 0 | Low Pressure Area (LP) | < 17 |
| 1 | Depression (D) | 17-27 |
| 2 | Deep Depression (DD) | 28-33 |
| 3 | Cyclonic Storm (CS) | 34-47 |
| 4 | Severe Cyclonic Storm (SCS) | 48-63 |
| 5 | Very Severe Cyclonic Storm (VSCS) | 64-89 |
| 6 | Extremely Severe Cyclonic Storm (ESCS) | 90-119 |
| 7 | Super Cyclonic Storm (SS) | ≥ 120 |

Recently, Artificial Neural Networks (ANNs) have been successful in capturing complex non-linear relationships between input and output variables [APHY2012, ASSA2012, KRC2009]. ANNs have also been explored to predict cyclone intensity [chaudhuri2015track, chaudhuri2017swarm, roy2012tropical, mohapatra2013evaluation]. These studies use various recordings of weather conditions at a particular time point and predict the intensity of a cyclone at a particular future time point. However, this does not fully utilise the available time series data. Instead, we use a Long Short Term Memory (LSTM) network to forecast the intensity of tropical cyclones in the NIO. Using an LSTM, we can feed in the weather characteristics for a certain number of consecutive time points and forecast the cyclone intensity for a certain number of immediately succeeding time points. In related works [chaudhuri2015track, chaudhuri2017swarm, mohapatra2013evaluation] for tropical cyclones in the NIO, the MSWS has been predicted for a single future time point. Although it is important to know the intensity at a particular time point, the change in intensity as the tropical cyclone progresses is equally important. As mentioned earlier, this continuous variation cannot be captured effectively using ANNs. Our work is unique in that we use an LSTM-based model for the first time to predict cyclone intensity for multiple successive future time points and report the combined accuracy over these time points. The reported combined accuracy outperforms the single-point accuracy reported in other works. Our model works consistently well even for a large number of future time points, with the error increasing only gradually as the number of future time points grows. In Section 2, we present a brief description of our model; Section 3 describes the dataset used; Section 4 covers model training and implementation; Section 5 presents the results and their analysis; and Section 6 concludes with future directions.

## 2 Methodology

### 2.1 Artificial Neural Networks

An Artificial Neural Network (ANN) is a connected network inspired by the biological neurons of the human brain. An ANN has an input layer, an output layer, and possibly multiple hidden layers. Each hidden layer contains several artificial neurons, at which incoming information is first combined linearly using weights and a bias and then acted upon by an activation function. Mathematically, for an input $x$, the output of the $j$th neuron in the $i$th layer can be written as

$$a_j^{(i)} = g\left(W_j^{(i)} x + b_j^{(i)}\right), \tag{1}$$

where $W_j^{(i)}$ and $b_j^{(i)}$ are the weight matrix and bias vector of the corresponding neuron, respectively, and $g$ is the activation function. The weights and biases at each neuron are updated using the gradient descent algorithm to make the final loss as small as possible. An example of a fully connected ANN with two hidden layers is given in Figure 1. A type of ANN that can handle time series or sequential data is the Recurrent Neural Network, which we discuss next.
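As a concrete illustration of Eq. (1), the forward pass of one fully connected layer can be sketched in plain Python. This is a toy example with made-up weights, using the sigmoid as the activation $g$; it is not the paper's implementation.

```python
import math

def neuron(x, w, b):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    passed through the sigmoid activation g(z) = 1 / (1 + e^{-z})."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(x, W, b):
    """A fully connected layer: one neuron per (weight row, bias) pair."""
    return [neuron(x, w_row, b_j) for w_row, b_j in zip(W, b)]

# A layer of two neurons acting on a three-dimensional input.
out = dense_layer([1.0, 0.5, -0.5],
                  W=[[0.2, -0.1, 0.4], [0.0, 0.3, 0.1]],
                  b=[0.1, -0.2])
```

With the sigmoid, each output lies in (0, 1); in practice the activation and the weight initialisation are chosen by the training framework.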

### 2.2 Recurrent Neural Network

Recurrent Neural Networks (RNNs) can take a sequence of inputs and produce a sequence of outputs, where the outputs are influenced not just by weights applied to the inputs, as in a regular ANN, but also by a hidden state vector representing the information learned from prior inputs and outputs [rnn1, rnn2, rnn3, rnn4]. An RNN can be represented by a chain-like structure, see Figure 2, where the lower, middle, and upper chains represent the sequence of inputs, the hidden state vectors, and the sequence of outputs, respectively. Mathematically, a simple RNN can be written as

$$h_t = g\left(W_{hh} h_{t-1} + W_{xh} x_t + b_h\right), \tag{2}$$

$$y_t = W_{hy} h_t + b_y, \tag{3}$$

where $g$ is the activation function, $x_t$ is the input vector at timestamp $t$, $h_t$ is the hidden state vector at timestamp $t$, $y_t$ is the output vector at timestamp $t$, $W_{hh}$, $W_{xh}$, and $W_{hy}$ are weight matrices, and $b_h$ and $b_y$ are the biases. The gradient vector of an RNN can increase or decrease exponentially during training, which leads to the exploding or vanishing gradient problem, because of which an RNN cannot retain a very long history of the past. This problem is solved by Long Short Term Memory networks.
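The recurrence above can be sketched as follows; scalar weights with hypothetical values are used purely to keep the example short, and tanh plays the role of the activation $g$.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)."""
    return math.tanh(W_hh * h_prev + W_xh * x_t + b_h)

def run_rnn(xs, W_xh=0.5, W_hh=0.8, b_h=0.0):
    """Unroll the recurrence over an input sequence, collecting the hidden
    states; each state depends on the whole prefix of the sequence."""
    h, states = 0.0, []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states

states = run_rnn([1.0, -0.5, 0.25])
```

The repeated multiplication by `W_hh` during back-propagation is exactly what makes the gradient shrink or blow up over long sequences.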

### 2.3 Long Short Term Memory Networks

The Long Short Term Memory network (LSTM) was first introduced in 1997 by Hochreiter and Schmidhuber [10.1162/neco.1997.9.8.1735], and several improved versions were proposed later [lstm1, lstm2, lstm3]. The LSTM is designed to remember long-term dependencies from the past and thus to overcome the vanishing and exploding gradient problems of the RNN. An LSTM network works on the philosophy of selectively forgetting, selectively reading, and selectively writing. An LSTM can add information to, or remove information from, the cell state through structures called gates. An LSTM has three gates, usually known as the input, forget, and output gates. The equations for the three LSTM gates are

$$i_t = \sigma\left(W_i h_{t-1} + U_i x_t + b_i\right), \tag{4}$$

$$f_t = \sigma\left(W_f h_{t-1} + U_f x_t + b_f\right), \tag{5}$$

$$o_t = \sigma\left(W_o h_{t-1} + U_o x_t + b_o\right), \tag{6}$$

and the equations for the candidate cell state, cell state, and output are

$$\tilde{c}_t = \tanh\left(W_c h_{t-1} + U_c x_t + b_c\right), \quad c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{7}$$

$$h_t = o_t \odot \tanh(c_t), \tag{8}$$

where $i_t$ represents the input gate, $f_t$ the forget gate, $o_t$ the output gate, $h_{t-1}$ is the output from the previous LSTM block at timestamp $t-1$, $x_t$ is the input at the current timestamp, $c_t$ represents the cell state at timestamp $t$, $\tilde{c}_t$ is the candidate cell state at timestamp $t$, $W$ and $U$ are weight matrices, $b$ are bias vectors, and $\sigma$ is the sigmoid activation function; see Figure 3.
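A minimal sketch of one LSTM step following the gate equations above; the scalar weights in `params` are made up for illustration and do not correspond to the trained model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds the (scalar) weights. Each gate sees the
    previous hidden state and the current input."""
    i = sigmoid(p["W_i"] * h_prev + p["U_i"] * x_t + p["b_i"])   # input gate
    f = sigmoid(p["W_f"] * h_prev + p["U_f"] * x_t + p["b_f"])   # forget gate
    o = sigmoid(p["W_o"] * h_prev + p["U_o"] * x_t + p["b_o"])   # output gate
    c_tilde = math.tanh(p["W_c"] * h_prev + p["U_c"] * x_t + p["b_c"])
    c = f * c_prev + i * c_tilde     # selectively forget, selectively write
    h = o * math.tanh(c)             # selectively read
    return h, c

params = {k: 0.5 for k in
          ["W_i", "U_i", "b_i", "W_f", "U_f", "b_f",
           "W_o", "U_o", "b_o", "W_c", "U_c", "b_c"]}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, p=params)
```

Because the cell state is updated additively rather than by repeated multiplication, gradients can flow through many timestamps without vanishing.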

### 2.4 Stacked LSTM

Stacked LSTM is an extended version of the LSTM model. In this model, there are multiple hidden layers, where each layer is stacked on top of the previous one, and each layer contains multiple LSTM cells. Stacking layers makes the model deeper and helps it learn patterns in sequence-learning and time-series problems more accurately.

### 2.5 Bidirectional LSTM

A further refinement of the LSTM is the Bidirectional LSTM (BiLSTM) [BiLstm]. An LSTM is trained to learn in one direction; a BiLSTM, on the other hand, learns in two directions, one from past to future and the other from future to past. A BiLSTM has two separate LSTM layers running in opposite directions. The input sequence is fed into one layer in the forward direction and into the other in the backward direction. Both layers are connected to the same output layer, which thus collects information from the past and the future simultaneously.

### 2.6 Dropout

A model learns complex and dynamic behaviour more accurately if the hidden units are independent of each other while learning features. It has been observed that in neural network models some neurons become highly correlated and dependent on each other, while the rest are independent. This can significantly affect model performance and may lead to overfitting. The phenomenon is called co-adaptation, and it is a major issue in large networks. The problem is addressed by the dropout method [dropout]. Dropout is a regularisation technique used in neural network models to prevent overfitting. During training, it randomly ignores each neuron with probability $p$ and keeps it with probability $1-p$. This helps the model learn more robust features and patterns from the data.

## 3 Data

Various organisations around the world keep records of all tropical cyclones in their region; such records are generally known as Best Track Datasets (BTDs). In this study, we have used the BTD of tropical cyclones in the North Indian Ocean provided by the Regional Specialized Meteorological Centre, New Delhi [^1].

[^1]: http://www.rsmcnewdelhi.imd.gov.in/index.php?option=com_content&view=article&id=48&Itemid=194&lang=en

The dataset contains three-hourly records of 341 tropical cyclones from 1982 to 2018 in the NIO. There are a total of 7662 recordings, and each recording contains information about the cyclone's basin of origin (AS or BOB), name (if any), date and time of occurrence, latitude, longitude, estimated central pressure (ECP), MSWS, pressure drop (PD), T.No., and grade. The trajectories (with MSWS) of all the cyclones in the NIO, along with the trajectories (with MSWS) of the two recent devastating cyclones Vayu and Fani, are shown in Figure 4.

After processing the dataset for possible errors, we obtained a dataset of 341 cyclones with an average of 27 recordings per cyclone; the longest-tracked cyclone has 90 recordings. The dataset has many missing values, which are handled by imputation; we used pandas' linear interpolation to fill them.

We have used latitude, longitude, ECP, and MSWS from the BTD. The sea surface temperature (SST) is an important factor in cyclone generation and governs its intensity; SST is obtained from the NOAA dataset [^2]. We generated two new features, distance and direction, from the latitudes and longitudes [^3]. The features latitude, longitude, MSWS, ECP, distance, direction, and SST are used in our model as input variables.

[^2]: http://apdrc.soest.hawaii.edu/erddap/griddap/hawaii_soest_afc8_9785_907e.html
[^3]: https://www.movable-type.co.uk/scripts/latlong.html

It has been observed that an RNN learns equally well from its inputs when the data are scaled. We kept MSWS on its original scale and re-scaled latitude, longitude, ECP, SST, distance, and direction to the range $[-1, 1]$. We used scikit-learn's MinMaxScaler transformation for feature re-scaling [scikit-learn]. The MinMaxScaler maps the interval $[\min(x), \max(x)]$ one-to-one onto the interval $[a, b]$ and is defined as

$$f(x) = a + \frac{\left(x - \min(x)\right)(b - a)}{\max(x) - \min(x)}, \tag{9}$$

under the assumption that $\max(x) > \min(x)$ and $b > a$, where $f$ is the transformation.
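For illustration, the transformation of Eq. (9) can be written out directly. This is a sketch with the target interval $[a, b] = [-1, 1]$ as a default; in practice we used scikit-learn's `MinMaxScaler`.

```python
def min_max_scale(values, a=-1.0, b=1.0):
    """Map values one-to-one from [min(values), max(values)] onto [a, b]
    via f(x) = a + (x - min)(b - a) / (max - min), as in Eq. (9)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling requires max(x) > min(x)")
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]

# e.g. three hypothetical ECP readings in hPa
scaled = min_max_scale([990.0, 1000.0, 1010.0])
```

The minimum maps to $a$ and the maximum to $b$, so features with very different units (degrees, hPa, km) end up on a common scale.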

## 4 Training and Proposed Model implementation

### 4.1 Model training

The model weights and biases are trained and updated via the back-propagation algorithm. We use the Mean Square Error (MSE) as the loss function and evaluate model performance using the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), defined as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{11}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \tag{12}$$

where $y_i$ is the actual value and $\hat{y}_i$ is the model's predicted value.
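The three metrics can be computed directly; the sample values below are made up for illustration and are not model outputs.

```python
import math

def mse(y, y_hat):
    """Mean squared error, Eq. (10)."""
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error, Eq. (11)."""
    return math.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    """Mean absolute error, Eq. (12)."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)

actual    = [34.0, 40.0, 47.0]   # e.g. MSWS in knots
predicted = [32.0, 41.0, 50.0]
```

MSE penalises large deviations quadratically, which is why it is used as the training loss, while MAE stays in the original units (knots) and is easier to interpret.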

### 4.2 Model Implementation

We consider a total of four stacked BiLSTM layers (one input, one output, and two hidden layers) for our proposed model architecture. Latitude, longitude, distance, direction, SST, ECP, and MSWS are fed as input tuples to the input layer. We implemented the model using the Keras API [chollet2015keras]; Keras runs on top of the TensorFlow framework [tensorflow2015-whitepaper] developed by Google. We use a learning rate of 0.01 to update the weights at each BiLSTM layer, optimised with Adaptive Moment Estimation (Adam)

[adam], which helps to minimise the loss function. We set the dropout rate to 0.02 in the internal layers. Our proposed model, generated using the Keras API, is shown in Figure 5.

| Training Points (hours) | Predict Points (hours) | Training data size (no. of cyclones) | 5-fold MAE on Validation Data | MAE on Cyclone Vayu | MAE on Cyclone Fani | 5-fold RMSE on Validation Data | RMSE on Cyclone Vayu | RMSE on Cyclone Fani |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4 (12) | 1 (3) | 7211 (320) | 1.54 | 3.95 | 3.20 | 3.12 | 4.47 | 4.51 |
| | 4 (12) | 6263 (306) | 3.66 | 9.81 | 7.12 | 6.72 | 11.52 | 9.17 |
| | 8 (24) | 5109 (266) | 5.88 | 11.31 | 13.52 | 9.95 | 13.98 | 18.48 |
| | 12 (36) | 4110 (228) | 7.42 | 17.52 | 20.04 | 12.64 | 21.03 | 26.05 |
| | 16 (48) | 3244 (195) | 8.96 | 15.50 | 21.32 | 14.82 | 18.22 | 27.12 |
| | 20 (60) | 2512 (165) | 10.15 | 16.98 | 22.72 | 16.07 | 21.34 | 28.87 |
| | 24 (72) | 1888 (138) | 11.92 | 15.01 | 22.97 | 17.87 | 20.83 | 28.14 |
| 6 (18) | 1 (3) | 6574 (311) | 1.55 | 2.98 | 2.32 | 3.20 | 3.31 | 3.56 |
| | 4 (12) | 5657 (277) | 3.72 | 8.01 | 7.03 | 6.37 | 10.36 | 9.42 |
| | 8 (24) | 4588 (243) | 6.19 | 13.24 | 12.53 | 10.62 | 15.84 | 16.23 |
| | 12 (36) | 3657 (211) | 7.92 | 17.32 | 20.54 | 13.23 | 20.01 | 26.12 |
| | 16 (48) | 2864 (179) | 9.31 | 10.82 | 27.74 | 15.35 | 13.05 | 27.52 |
| | 20 (60) | 2185 (151) | 10.22 | 11.82 | 22.56 | 16.42 | 15.01 | 28.52 |
| | 24 (72) | 1627 (117) | 11.52 | 16.34 | 19.82 | 17.85 | 28.98 | 35.86 |
| 8 (24) | 1 (3) | 5957 (300) | 1.52 | 2.32 | 2.52 | 3.32 | 2.98 | 3.56 |
| | 4 (12) | 5109 (266) | 3.82 | 7.08 | 6.82 | 6.88 | 9.01 | 9.02 |
| | 8 (24) | 4110 (228) | 5.98 | 9.92 | 13.46 | 10.82 | 12.32 | 18.57 |
| | 12 (36) | 3244 (195) | 8.12 | 10.66 | 21.62 | 13.18 | 13.03 | 30.05 |
| | 16 (48) | 2512 (165) | 9.91 | 10.02 | 25.11 | 15.82 | 11.82 | 30.98 |
| | 20 (60) | 1888 (138) | 11.52 | 14.02 | 27.12 | 17.56 | 16.52 | 33.26 |
| | 24 (72) | 1399 (104) | 12.01 | 18.82 | 32.52 | 18.52 | 21.98 | 39.80 |
| 12 (36) | 1 (3) | 4843 (255) | 1.72 | 2.22 | 3.45 | 3.44 | 2.52 | 5.41 |
| | 4 (12) | 4110 (228) | 3.96 | 4.52 | 5.37 | 7.52 | 6.01 | 8.12 |
| | 8 (24) | 3244 (195) | 6.52 | 8.37 | 17.85 | 11.32 | 9.75 | 20.33 |
| | 12 (36) | 2512 (165) | 8.62 | 8.01 | 21.10 | 14.31 | 10.51 | 27.33 |
| | 16 (48) | 1888 (138) | 10.32 | 8.51 | 29.31 | 16.57 | 10.85 | 27.82 |
| | 20 (60) | 1399 (104) | 10.98 | 14.02 | 21.27 | 17.63 | 17.07 | 28.02 |
| | 24 (72) | 1016 (80) | 12.01 | 15.55 | 34.34 | 18.57 | 19.52 | 39.37 |

## 5 Result and Analysis

In this study, we are interested in predicting a cyclone's intensity, in the form of MSWS, for a number of future time points (say $p$) at a regular interval, using the cyclone information for a number of time points (say $t$), again given at a regular interval. For example, if $t$ is 4 and $p$ is 6, then we train a model that uses cyclone data from 4 regular time points and predicts the intensity of the cyclone for the immediately succeeding 6 regular time points. We report results for different combinations of $t$ and $p$. To compare our results with existing related work, the model performance is reported in terms of RMSE and MAE. We report our model's performance for the two recent cyclones Vayu and Fani, along with 5-fold cross-validation scores, in Table 2. These two named cyclones are not part of the training data.
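How training samples could be cut from a single cyclone's time series for a given $(t, p)$ pair can be sketched as follows; `make_windows` is a hypothetical helper for illustration, not the paper's actual preprocessing code.

```python
def make_windows(series, t_points, p_points):
    """Slice one cyclone's record into (input, target) pairs: each sample
    uses t_points consecutive observations to predict the intensities of
    the next p_points time points."""
    samples = []
    for start in range(len(series) - t_points - p_points + 1):
        x = series[start : start + t_points]
        y = series[start + t_points : start + t_points + p_points]
        samples.append((x, y))
    return samples

# Ten 3-hourly MSWS recordings, with t = 4 training points and p = 2
# predicted points, yield 10 - 4 - 2 + 1 = 5 samples.
pairs = make_windows(list(range(10)), t_points=4, p_points=2)
```

This also explains why the training data size in Table 2 shrinks as $t$ and $p$ grow: longer windows fit fewer times into each cyclone's record, and short cyclones drop out entirely.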

The predicted intensities of cyclones Vayu and Fani for different combinations of $t$ and $p$ are shown in Figure 6, along with the actual intensities over the predicted time points. These figures show that our model effectively captures the future evolution of cyclone intensity from the given training data. The graphs for cyclone Fani in Figure 6 show that even though the intensity of the cyclone remains almost constant during the training window, our model still predicts the intensity quite accurately for the future time points. This demonstrates that our model is learning effectively from its input features latitude, longitude, direction, distance, ECP, and SST. A similar but reverse behaviour can be observed in the graph of cyclone Fani in Figure 6: although the intensity is continuously increasing during the training window, our model accurately captures the almost constant intensity at the future time points. Thus, we can conclude that our model learns the complex non-linear relationship of the input features with the output effectively and successfully predicts sudden changes in cyclone intensity. Graphs such as those in Figure 6, produced for any future cyclone, can be used to take preventive measures well in time.

From Table 2, it is clear that our model learns MSWS with good accuracy, in terms of MAE and RMSE, for different combinations of $t$ and $p$. If we fix $t$, Table 2 shows that the error grows as $p$ increases. This is expected, because we are predicting MSWS over a longer duration from the same set of training points. Moreover, as $t$ increases, there is no significant difference between the errors. For example, for $t=4$ and $p=1$, the MAE is 1.54, while for $t=8$ and $p=1$, it is 1.52. A similar trend can be observed for other combinations of $t$ and $p$. So, even though we doubled the number of training points, the change in MAE is not significant. This indicates that the model learns most from the first few data points. For practical purposes, this is very important, as it reduces both the waiting time and the computational time.

The model can predict MSWS up to 24 h ahead to well within 6 knots. From Table 1, the range of MSWS for the higher grades (Grade $\geq 3$) spans at least 15 knots. This means that, with high probability, the model will predict grades greater than or equal to 3 accurately up to 24 h ahead.

## 6 Conclusion

We presented a BiLSTM model that predicts the intensity of tropical cyclones in the NIO for many consecutive future time points with high accuracy. To our knowledge, this is the first use of a BiLSTM model for intensity forecasting. Since we achieve high accuracy for longer-period intensity forecasts, our model successfully captures the complex relationships among the various causal factors of cyclone formation that govern the evolution of cyclone intensity.

The most crucial time point at which to predict cyclone intensity is landfall. In future work, one could first predict the time of landfall and then train a model that accurately predicts the intensity at that instant. This would make the forecast more practical and effective, as government agencies could take more precise preventive steps, both in terms of the region to focus on and the level of preparedness.

## Acknowledgment

The authors would like to thank the India Meteorological Department (IMD), New Delhi, for providing BTD for this study. The authors acknowledge NOAA for providing the SST data.
