I. Introduction
Trains have been a prominent mode of long-distance travel for decades, especially in countries with a significant land area and a large population. India, with a population of over a billion people in 2016, operates a railway network spanning tens of thousands of route kilometers across thousands of stations, serving billions of passenger journeys annually [7]. The Indian railway system is the fourth largest in the world in terms of network size. However, its trains are plagued with endemic delays that can be attributed to (a) obsolete technology, e.g., dated rail engines, (b) size, e.g., a large network structure and high railway traffic, and (c) weather, e.g., fog in the winter months in north India and rains during the summer monsoons countrywide.
In this paper, we take the initial steps in understanding and predicting train delays. Specifically, we focus on the delays of 135 trains which pass through the busy Mughalsarai station (Station Code: MGS), over a two-year period. We build an N-Order Markov Late Minutes Prediction Framework (N-OMLMPF) which, as we show, predicts the late minutes at the stations a train travels to with good accuracy. To the best of our knowledge, this is the first effort to predict train delays for the Indian rail network. The closest prior work is by Ghosh et al. [4], [5], who study the structure and evolution of the Indian Railway network but do not estimate delays. Our analysis is complementary and agrees with the characteristics of the busiest train stations that they find. We now define the problem, outline our contributions, and present our approach.
Problem Statement: Given a train and its route information, predict the delay in minutes at an inline station during its journey on a valid date.
I-A. Contributions
Our main contributions are that we:

as a first, present a dataset of 135 Indian trains' running-status information (which captures delays at stations), collected over two years. We plan to make it public.

build a scalable, train-agnostic, and Zero-Shot-competent framework for predicting train arrival delays, learning from a fixed set of trains and transferring that knowledge to an unseen set of trains.

study delays using N-Order Markov Process Regression models and perform Akaike Information Criterion (AIC) and Schwarz Bayesian Information Criterion (BIC) analyses to find the correct order of the Markov Process. Most of the 135 trains follow a 1st-Order Markov Process.

discuss how the train-agnostic framework can leverage different types of trained models and be deployed in real time to predict the late minutes at an inline station.
The rest of the paper is arranged as follows. We first discuss the data about train operation and its analysis in Section II, and then present the proposed model in Section III. Next, in Section IV, we outline the experiments conducted with two different regression models, Random Forest Regression and Ridge Regression, and give an exhaustive analysis of our results. Finally, we conclude with pointers for future research.
II. Data Preprocessing and Analysis
This section gives details of the train information we collected over a span of two years from [10]. Table I gives the statistics.
Total number of trains considered  135
Total number of unique stations covered  819
Maximum number of journeys made by a train  334
Average number of journeys made by a train  48
Maximum number of stations in a train's route  129
Average number of stations in a train's route  30
II-A. Data Collection and Segregation
We considered 135 trains that pass through Mughalsarai Station (MGS), one of the busiest stations in India. For them, we collected train running-status information (Train Data) over the period March 2016 to February 2018. A train's Train Data consists of multiple instances of journeys, where each journey has the same set of inline stations that the train plies through. Table II lists the important fields of interest in Train Data.
Due to the infrequent running of some trains, the amount of data collected for each train varied greatly. Using file size as the criterion, we selected the Train Data of 52 frequent trains (henceforth Known Trains), out of 135, as training data. The data of the remaining 83 trains (henceforth Unknown Trains) were used for testing and for evaluating the transfer of knowledge through the trained models. Figure 1 pictorially illustrates the segregation of the collected Train Data from March 2016 to February 2018 for the 135 trains. Recall that in traditional machine learning, the training and test data are drawn from the same set (or class). In contrast, we train our models on a seen set of Known Trains and test them on an unseen set of Unknown Trains, employing zero data of Unknown Trains for training, hence the term Zero-Shot. This problem setting is similar to Zero-Shot Learning [8], where the training and test set classes' data are disjoint. Figure 2 shows a train journey and the related notations used in this paper.

Field Name  Description

actarr_date  Actual arrival date of the train at a station
station_code  Station code name (acronym) for a station
latemin  Late minutes (arrival delay) at a station
distance  Distance of a station from the source in kilometers
month  Jan, Feb, Mar, … extracted from actarr_date
weekday  Mon, Tue, Wed, … extracted from actarr_date
II-B. Data Preparation
We define a dataframe as a collection of multiple rows with a fixed number of columns. For our experiments we prepared two types of dataframes. The first type is a dataframe (Table III) for each station falling on the journey route of Known Trains (henceforth Known Stations, totaling 621 out of 819), built by extracting the required information from the Train Data (Table II) of the respective trains in whose routes the station fell; these are used to train the models. The second type is a single dataframe (Table IV) capturing certain information of all 819 stations, irrespective of whether they are inline to Known Trains or Unknown Trains. We divided the journey data in 52TrnsTrCv Data in a 4:1 ratio to train and cross-validate the models, and prepared dataframes (Table III) for the chosen 80% of the journey data. We did not prepare dataframes (Table III) for the remaining 20% of 52TrnsTrCv Data, 52TrnsTe Data, or 83TrnsTe Data, thereby leaving them in their native Train Data format (Table II).
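As an illustration of the 4:1 journey-level split described above, the following sketch divides a train's journey records so that whole journeys stay together; the column name `journey_id` and the helper itself are our illustrative assumptions, not the authors' exact code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_journeys(journeys, train_frac=0.8, seed=42):
    """Split one train's journey records 4:1 into training and
    cross-validation sets, keeping each journey's rows together.
    `journeys` is a DataFrame with a (hypothetical) journey_id column."""
    journey_ids = sorted(journeys["journey_id"].unique())
    train_ids, cv_ids = train_test_split(
        journey_ids, train_size=train_frac, random_state=seed)
    train = journeys[journeys["journey_id"].isin(train_ids)]
    cv = journeys[journeys["journey_id"].isin(cv_ids)]
    return train, cv
```

Splitting by journey rather than by row avoids leaking stations of a cross-validation journey into training.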
II-C. Data Analysis
Here we analyze the most important factors which drive our learning and prediction algorithm. As observed in Figures 3, 4, and 5, the spikes in each month signify that the mean late minutes at a station vary monthly (the colored dots are the individual late minutes during the month). This premise was verified with similar graphs obtained for other trains and their inline stations. In Figures 6, 7, and 8, the dots represent the mean of the late minutes at each inline station during a train's journey in a particular month. In Figure 6 we can see that the mean late minutes increase during the journey up to a certain station and decrease thereafter. We observed similar graphs for other trains and found that partial sequences of consecutive inline stations characterize the delays during a train's journey.
train_type  zone  is_superfast  month  weekday
Is it Special, Express, or Other?  Railway zone of the train  Is it superfast?  Month in which the journey is made  Day of the week of the journey
Obtained from [2] through the train number (e.g., 13050 for Train 13050)  Obtained from actarr_date (Table II)
Stn_{1}_code … Stn_{n}_code  |  late_mins_Stn_{1} … late_mins_Stn_{n}  |  db_Stn_{0}_Stn_{1} … db_Stn_{n-1}_Stn_{n}
Station codes of the previous stations  |  Late minutes at the previous stations  |  Distances between consecutive stations
Obtained from station_code (Table II)  |  Obtained from latemin (Table II)  |  Obtained from distance (Table II)
Stn_{1}_dfs … Stn_{n}_dfs  |  tfc_of_Stn_{1} … tfc_of_Stn_{n}  |  deg_of_Stn_{1} … deg_of_Stn_{n}
Distance of each previous station from the source  |  Traffic strength of each previous station  |  Degree strength of each previous station
Obtained from distance (Table II)  |  Obtained from Open Government Data (OGD) [3]
Stn_{0}_dfs  |  Stn_{0}_tfc  |  Stn_{0}_deg  |  Stn_{0}_late_minutes
Distance from the source station  |  Traffic strength  |  Degree strength  |  Current station's target late minutes to be predicted
Obtained from distance (Table II)  |  Obtained from OGD [3]  |  Obtained from latemin (Table II)
The bold-font texts are the columns in the dataframe prepared for each Known Station. We assert that Stn_{0}_late_minutes depends on the values in the other columns. tfc_of_Stn_{i} and deg_of_Stn_{i} are, respectively, the total number of trains passing through Stn_{i} and the total number of direct connections of Stn_{i} to other stations. Such a dataframe is called the N dataframe of the target station (Stn_{0}) for which it is prepared, where N depends on the number of previous stations (a partial sequence of consecutive stations) considered.
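As a concrete sketch of how one row of such an N dataframe could be assembled, the snippet below uses dict-based inputs whose field names mirror Table III; the helper, its signature, and the dict layout are our illustrative assumptions, not the authors' code.

```python
import pandas as pd


def make_n_row(train_feats, prev_stns, target_stn, station_info):
    """Assemble one row of an N dataframe (Table III sketch).
    prev_stns: the N previous stations, nearest first; each is a dict
    with 'code', 'late_mins', and 'dfs' (distance from source).
    station_info: {code: {'tfc': ..., 'deg': ...}}, a Table IV-style lookup."""
    row = dict(train_feats)          # train_type, zone, is_superfast, month, weekday
    stns = [target_stn] + prev_stns  # Stn_0 is the target station
    for i, s in enumerate(stns[1:], start=1):
        row[f"Stn_{i}_code"] = s["code"]
        row[f"late_mins_Stn_{i}"] = s["late_mins"]
        row[f"Stn_{i}_dfs"] = s["dfs"]
        row[f"tfc_of_Stn_{i}"] = station_info[s["code"]]["tfc"]
        row[f"deg_of_Stn_{i}"] = station_info[s["code"]]["deg"]
    for i in range(len(stns) - 1):   # distances between consecutive stations
        row[f"db_Stn_{i}_Stn_{i+1}"] = abs(stns[i]["dfs"] - stns[i + 1]["dfs"])
    row["Stn_0_dfs"] = target_stn["dfs"]
    row["Stn_0_tfc"] = station_info[target_stn["code"]]["tfc"]
    row["Stn_0_deg"] = station_info[target_stn["code"]]["deg"]
    row["Stn_0_late_minutes"] = target_stn.get("late_mins")  # target variable
    return pd.DataFrame([row])
```

At training time the target column is filled from latemin; at prediction time it is left empty and the previous stations' late minutes are the feed-forwarded predictions.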
station  latitude  longitude  stn_tfc  stn_deg
Station code  Latitude  Longitude  Traffic strength  Degree strength
Obtained from OGD [3]
The bold-font texts are the columns in our prepared dataframe covering, collectively, all 819 stations of Known Trains and Unknown Trains. station is used as a key to obtain the remaining features, on which kNN is run. This dataframe helps determine the semantically nearest station to a given station.
III. Proposed Model
In this section, we explain our proposed regression-based N-OMLMPF algorithm and its components. Regression is the task of analyzing the effects of independent variables (in multivariate data) on a dependent continuous variable and predicting it. In our setting, the independent variables are the ones mentioned in Table III, and the dependent continuous variable to be predicted is the target late minutes (Stn_{0}_late_minutes). Our regression experiments, with low RMSE and significant accuracy under a 95% Confidence Interval, back our hypothesis of casting this as a regression problem. We used Random Forest Regressors (RFRs) and Ridge Regressors (RRs) as the two types of individual regression models in N-OMLMPF to learn, predict, evaluate, and compare results. For real-time deployment and scalability, we avoided building train-specific models. Hence we looked for entities that would help us frame a train-agnostic algorithm and enable knowledge transfer from Known Trains to Unknown Trains. A train's route is composed of, and characterized by, the stations inline in its journey. Greater delays can be expected along a route with more busy stations than along one with fewer busy stations.
Through the analysis of multiple figures similar to those in Section II-C, we observed the following details about the delay at inline stations during a journey:

Partial routes of consecutive stations can be identified during a journey which either increase or decrease the delay at subsequent stations (see Fig. 6).
The above points suggest that multiple deciding factors (e.g., the month of travel, the sequence of stations during a journey, etc.) determine the late minutes at a given station. Since we sought to use stations to frame a train-agnostic late-minutes prediction algorithm and to transfer knowledge, we prepare a dataframe (Table III) for each Known Station capturing the details mentioned. Later, we train N-Order Markov Process Regression models for each Known Station, as described next.
III-A. N-Order Markov Process Regression (OMPR) Models
The Markov Process asserts that the outcome at the current state depends only on the outcome of the immediately previous state. If the current state's outcome depends on the N previous states, we call it an N-Order Markov Process. Here we assert that the late minutes at the current target station depend on the details of its N previous stations. This notion is effectively captured in the dataframe of Table III, where we record the general features of a train, the day and month of a journey, and the characteristics of the N previous stations along with those of the current target station. The idea is to learn OMPR models (Random Forest Regressors and Ridge Regressors) for each Known Station using Algorithm 1, and later use those trained models to frame a train-agnostic late-minutes prediction algorithm (N-OMLMPF, Algorithm 2). Regression models are trained on each Known Station's corresponding N dataframe (Table III), with the values of N depending on the number of stations previous to it, subject to its positions during the journeys of multiple trains. This design is clarified in Section III-C. We used the Python sklearn.ensemble and sklearn.linear_model libraries [9] for learning the Random Forest Regressor and Ridge Regressor models, respectively.
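A minimal sketch of the per-station training step (the core loop of Algorithm 1) follows. It assumes the N dataframes have already been reduced to numeric columns (categorical columns such as station codes encoded or dropped, as in the experiments later); the dictionary keyed by (station code, N) is our assumption about how the models could be organized, not the authors' exact design.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def train_ompr_models(station_dataframes):
    """Train an OMPR model pair (RFR and RR) per Known Station and order N.
    station_dataframes: {(station_code, N): numeric DataFrame shaped like
    Table III, with target column 'Stn_0_late_minutes'}.
    Returns the trained models plus the sorted list of modeled station
    codes, mirroring the station lists maintained by Algorithm 1."""
    models, modeled = {}, set()
    for (code, n), df in station_dataframes.items():
        X = df.drop(columns=["Stn_0_late_minutes"])
        y = df["Stn_0_late_minutes"]
        rfr = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        rr = Ridge(alpha=1.0).fit(X, y)
        models[(code, n)] = (rfr, rr)
        modeled.add(code)
    return models, sorted(modeled)
```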
III-B. Nearest Neighbor (NN) Search
Unknown Stations (USs) are those which, along with Known Stations (KSs), make up the journey routes of Unknown Trains. Since we keep the Unknown Trains' data Zero-Shot, the dataframe of Table III is not prepared for USs, so we do not have OMPR models for them. Hence, we look for a Known Station that is most similar to the current target station with respect to the features stated in Table IV, whose model can be used to approximate the predicted late minutes at the target. We employ an NN search algorithm (Algorithm 3) to fulfill this objective. A two-step NN search is applied since latitude and longitude data are semantically different from traffic and degree strength data. We used the Python sklearn.neighbors library [9] with default options.
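The two-step search could be sketched as follows, under our assumption (the paper does not spell out the exact sequencing) that the first step shortlists geographically close Known Stations by (latitude, longitude) and the second picks the shortlisted station with the closest (traffic, degree) profile.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def nearest_known_station(target, candidates, k_geo=5):
    """Two-step NN search (sketch of Algorithm 3). `candidates` is a list
    of Table IV-style dicts with keys 'code', 'lat', 'lon', 'tfc', 'deg';
    `target` is a dict with the same feature keys."""
    # Step 1: shortlist the k_geo geographically closest Known Stations.
    geo = np.array([[c["lat"], c["lon"]] for c in candidates])
    nn_geo = NearestNeighbors(n_neighbors=min(k_geo, len(candidates))).fit(geo)
    _, idx = nn_geo.kneighbors([[target["lat"], target["lon"]]])
    shortlist = [candidates[i] for i in idx[0]]
    # Step 2: among the shortlist, pick the closest traffic/degree profile.
    prof = np.array([[c["tfc"], c["deg"]] for c in shortlist])
    nn_prof = NearestNeighbors(n_neighbors=1).fit(prof)
    _, j = nn_prof.kneighbors([[target["tfc"], target["deg"]]])
    return shortlist[j[0][0]]["code"]
```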
III-C. Example
In our example, we use five Known Train routes and two Unknown Train routes over dummy stations, where stations a..q are Known Stations and r..w are Unknown Stations, to explain our proposed framework. Figure 9 shows the train route map, with the seven journeys' station sequences; source stations are colored green.
III-C1. Data Preparation and Training
We collect Train Data (Table II) for each of the seven trains and divide them into two categories, Known Trains and Unknown Trains, based on the amount of data collected for each train. After segregating the collected data as shown in Fig. 1, we prepare an N dataframe (Table III) for each Known Station using the respective trains' Table II data.

Preparation of N dataframes (Table III) for the first example station:
We prepare a 1 dataframe for it owing to one Known Train only, since that train has one station previous to it. Another train also navigates it, but there it is the source station and thus has zero stations previous to it.
Preparation of N dataframes (Table III) for the second example station:
We prepare a 1 dataframe for it owing to the trains that have a valid set of one station previous to it, and a 2 dataframe owing to the train that has two stations previous to it.
Preparation of N dataframes (Table III) for the third example station:
We prepare a 1 dataframe for it owing to the trains that have a valid one station previous to it during the journey, a 2 dataframe owing to other trains, and a 3 dataframe owing to one more train.
Similarly, for each of the Known Stations, we prepare valid N dataframes (Table III), depending on the number of stations previous to them during the journeys of Known Trains. Later we use those N dataframes to train OMPR models (RFR and RR) for each Known Station, as explained in Algorithm 1. While training the models, we also maintain, for each order N, a list of stations (station codes) that have N-OMPR models. For example, in the context of the five Known Trains here, one list holds the stations that have one valid station previous to them during the journeys of the various Known Trains, and another holds the stations that have a valid set of two previous stations.
III-C2. Prediction of Late Minutes for Train Journeys
We explain the N-OMLMPF algorithm (Algorithm 2) here with the help of the above train examples. We employ a feed-forward method for late-minutes prediction at each inline station, where the late minutes predicted for the previous stations, along with their other details, are incorporated into the current target station's row dataframe. (A row dataframe consists of only one row of Table III.)
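The feed-forward loop can be sketched as below. The helpers make_row and find_nearest stand in for the Table III row construction and Algorithm 3, and the keying of models by (station code, N) is our illustrative assumption.

```python
def predict_journey(route, models, known_stations, make_row, find_nearest):
    """Feed-forward late-minutes prediction (sketch of Algorithm 2).
    route: station codes in journey order.
    models: {(station_code, N): trained regressor}.
    make_row(code, prev_codes, prev_late): builds the one-row Table III
    dataframe for the target. find_nearest: maps an unmodeled station to
    a similar Known Station (Algorithm 3)."""
    late = [0]                               # the source station is assumed on time
    for pos in range(1, len(route)):
        code = route[pos]
        if code not in known_stations:       # Unknown Station: borrow the
            code = find_nearest(route[pos])  # nearest Known Station's model
        # largest order N for which a model exists at this journey position
        n = max(n for (c, n) in models if c == code and n <= pos)
        row = make_row(code, route[pos - n:pos], late[-n:])
        late.append(float(models[(code, n)].predict(row)[0]))
    return late
```

Each predicted value is appended to the running list and fed into the row dataframes of the stations further down the route.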
Known Trains Late Minutes Prediction
Stations inline during the journeys of the cross-validation set and the test set of Known Trains consist only of Known Stations, for which we have trained models saved from Algorithm 1. The column entries in the row dataframe (Table III) for the current station, at which the late minutes are to be predicted, are filled as explained in the table, except Stn_{0}_late_minutes, which is what we aim to predict. Say for one train's cross-validation or test data we predict the late minutes at each station. As per the execution steps of Algorithm 2, the late minutes at:

the source station are assumed to be 0, so the late-minutes list is initialized to [0].

the second station are predicted through its 1-OMPR model, trained over the 1 training dataframe for it. We fill its row dataframe with the previous station's late minutes set to the first entry in the list, i.e., 0. Say the predicted late minutes there are 5; the list then extends to [0, 5].

the third station are predicted through its 2-OMPR model. The first and second entries in the list (0 and 5) are used as the late minutes at the two previous stations in its row dataframe; the predicted value is then appended, extending the list further.

In a similar fashion, we keep feeding forward the predicted late minutes at previous stations to predict the late minutes at the remaining inline stations through their respective OMPR models.
Unknown Trains Late Minutes Prediction
We choose one of the Unknown Trains to explain Algorithm 2 for predicting late minutes at Unknown Trains' inline stations. The late minutes at:

the source station are assumed to be 0 since it is the source station; thus the late-minutes list is initialized to [0].

the second station are predicted as follows. We do not have a trained OMPR model (neither RFR nor RR) for it since it is an Unknown Station and thus not in the list of modeled stations. Hence, via Algorithm 3, we find the Known Station nearest to it among those that have a 1-OMPR model. Next, the row dataframe prepared for the target, with the previous station's late minutes taken from the list, is fed to the chosen station's model to predict the late minutes, and the list extends accordingly.

the third station are predicted through its own 2-OMPR model, with the late minutes at its two previous stations taken from the list; the predicted value is appended to the list.

the fourth station are predicted as follows. Notice from the above set of Known Trains' journeys that we do not have a valid trained model of the required order, despite the current target being a Known Station, since no dataframe of that order could be prepared for it from any of the Known Trains. So, through Algorithm 3, we choose the Known Station most similar to it among those with models of the required order, and use that station's model on the row dataframe, with the previous stations and their (predicted) late minutes filled from the list.

the remaining stations are predicted likewise: through the OMPR model of a station obtained via Algorithm 3 when no own model of the required order exists, or through the station's own OMPR model otherwise, with the row dataframe filled from the previous stations and the feed-forwarded late-minutes list.
IV. Experiments and Result Analysis
The N-OMLMPF algorithm (Algorithm 2) was executed on three sets of data, namely the cross-validation data of Known Trains, the test data of Known Trains, and the test data of Unknown Trains, as shown in Figure 1, for different values of N (in N-OMLMPF). We enumerate the four detailed experiments below, each conducted with both RFR and RR models individually:

Exp 1: We ignored the tfc_of_Stn_{i}, deg_of_Stn_{i}, and Stn_{i}_dfs columns of the dataframe (Table III), since these features are implicitly captured in Stn_{i}_code. The experiment was conducted on the 52TrnsTrCv dataset.

Exp 2: We ignored the Stn_{i}_code columns of the dataframe (Table III), as tfc_of_Stn_{i}, deg_of_Stn_{i}, and Stn_{i}_dfs numerically capture the properties of the station codes. This was done for the Unknown Trains case because, the test data being Zero-Shot, we did not have the partial consecutive inline station paths (hence no Stn_{i}_codes). The experiment was conducted on the 83TrnsTe data after learning the prediction models from the 52TrnsTrCv data, to assess the transfer of knowledge from Known Trains to Unknown Trains.

Exp 3: We repeated Exp 2 on the 52TrnsTrCv data; results similar to those obtained in Exp 1 for the cross-validation data endorse our notion that the two representations of stations (as in Exp 1 and Exp 2) are interchangeable.

Exp 4: We conducted Exp 2 on 52TrnsTe data with prediction models learned from 52TrnsTrCv data.
After conducting the experiments, we analyzed the results to evaluate the performance of the trained models and to determine the optimum value of N (in N-OMLMPF). For brevity, we do not present the detailed results for all 135 trains, but we present N-OMLMPF output on the test data of a few trains in Tables V, VI, and VII (negative numbers in the tables indicate that the train arrived early by that many minutes).
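Since Exps 1 and 2 differ only in which Table III columns are dropped, the ablation can be sketched as below; the column-name patterns follow Table III, while the helper itself is illustrative rather than the authors' code.

```python
import re

import pandas as pd


def ablate(df, experiment):
    """Column ablation used in the experiments (sketch). Exp 1 keeps the
    station-code columns and drops the numeric station features they
    implicitly capture; Exps 2-4 do the reverse, so Zero-Shot test rows
    need no station-code columns."""
    if experiment == 1:
        drop = [c for c in df.columns
                if re.match(r"(tfc_of_Stn|deg_of_Stn|Stn_\d+_dfs)", c)]
    else:
        drop = [c for c in df.columns if re.fullmatch(r"Stn_\d+_code", c)]
    return df.drop(columns=drop)
```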
Stations:  BBS  CTC  JJKR  BHC  BLS  KGP  BQA  ADRA  GMO  KQR  GAYA  MGS  CNB  NDLS
Actual late mins:  0  2  8  1  13  25  19  18  2  9  21  5  6  15
Predicted late mins:  0  2.75  6.83  0.01  17.44  16.52  11.22  17.65  1.94  16.01  8.77  0.25  12.26  23.10

Stations:  NLDM  ANSB  RPAR  SIR  UMB  SRE  MB  BE  LKO  BSB  MGS  PNBE  KIUL  JAJ  JSME  ASN  KOAA
Actual late mins:  0  3  4  11  0  6  15  55  30  10  18  10  11  0  7  3  5
Predicted late mins:  0  9.38  7.87  2.43  3.61  0.50  26.13  36.14  29.42  32.14  20.38  3.296  6.87  3.80  17.55  14.30  13.91

Stations:  JAT  PTKC  JRC  LDH  UMB  SRE  MB  BE  LKO  RBL  JAIS  AME  PBH  BOY  BSB  MGS  DNR  PNBE  RJPB
Actual late mins:  0  8  3  0  5  15  10  1  30  41  51  57  74  111  75  123  130  120  120
Predicted late mins:  0  10.19  10.74  10.17  11.60  11.97  27.24  34.63  28.45  40.15  41.29  42.94  60.71  72.51  75.25  70.50  74.45  67.95  71.80
Random Forest Regressor (RFR) Models: Exp 1 | Exp 2 | Exp 3 | Exp 4; Ridge Regressor (RR) Models: Exp 2 | Exp 4
Each experiment column lists CI68, CI95, CI99 (Avg %age)
1-OMLMPF  34.65 61.37 70.47 | 5.90 14.73 18.51 | 33.67 61.05 70.21 | 27.60 55.41 65.57 | 4.97 12.87 17.29 | 22.34 44.30 55.71
2-OMLMPF  35.28 61.36 70.85 | 5.72 14.17 18.41 | 33.72 61.03 70.65 | 27.51 56.32 66.87 | 5.34 12.65 16.80 | 22.81 43.67 56.59
3-OMLMPF  33.86 62.31 71.42 | 6.00 14.79 18.81 | 33.80 62.13 71.58 | 27.81 55.89 66.98 | 4.89 12.46 16.76 | 22.21 44.05 55.67
4-OMLMPF  34.39 62.53 71.74 | 5.66 14.96 18.97 | 33.67 61.57 71.49 | 27.82 55.80 66.82 | 4.66 12.35 16.35 | 21.85 43.89 55.83
5-OMLMPF  34.77 62.70 72.10 | 5.51 14.52 18.75 | 33.45 62.03 71.96 | 27.93 56.20 67.07 | 4.61 12.43 16.16 | 21.85 43.87 55.18
CI68, CI95, and CI99 respectively stand for the 68%, 95%, and 99% CI; Avg %age stands for average percentage.
Known Trains  Unknown Trains  
Trains  12305  12361  12815  12307  13131  13151  22811  22409  18612  13119  15635  03210  04401  04821  12141  12295  22308  12439  18311 
Number of Journeys  16  14  39  84  19  83  28  14  47  25  13  2  1  6  3  4  28  2  3 
Mean RMSE  87.12  89.38  96.61  88.26  62.84  82.34  53.71  44.72  29.42  80.66  80.22  57.37  23.86  31.97  53.38  68.49  44.83  11.75  36.20 
Trains row consists of unique Train Numbers. Number of Journeys row denotes the number of journeys undertaken by the corresponding train in its Test Data. Mean RMSE row presents the average of the RMSEs of all journeys. For example, Train 12305 covered 16 journeys with a mean RMSE of 87.12.
IV-A. Performance Evaluation of Models
We begin by noting again that a train's Train Data consists of multiple instances of journeys, where each journey has the same set of stations that the train plies through. For each inline station during a train's journey, we calculated monthly 68%, 95%, and 99% Confidence Intervals (CI) around the mean of the late minutes in a month, considering the train's complete Train Data with outlier late minutes removed by Tukey's Rule [6]. For each train's cross-validation/test Train Data, we calculated the percentage of times the predicted late minutes for an inline station fell within each matching CI. We then averaged all the percentages (calculated per train) across the experiments enumerated above; Table VIII shows the corresponding figures. In Table IX we present the mean Root Mean Square Error (RMSE) values for a few Known Trains and Unknown Trains obtained from their Test Data, where the RMSE for a journey was calculated between the predicted and actual late minutes. Note that the results reported in Tables VIII and IX include journeys where the train was already late at the source station; our models could not capture these details due to their scarce occurrence.

Preliminary analysis of the CI and mean RMSE observations showed that RFR models outperformed RR models. For the sake of completeness, however, we present CI observations of RR models for selected experiments in Table VIII. The scattering of individual late minutes at a station during a month, as observed in Figures 3, 4, and 5, suggests considering CI95 (or higher), since the late minutes are not closely centered around the mean but cover a wider distribution around it. Under the RFR Models column in Table VIII, the figures in the CI95 columns for Exp 1 and Exp 3 show that, on average, we were able to predict late minutes at inline stations for the cross-validation journey data of Known Trains approximately 62% of the time within the 95% CI (say, an accuracy of 62%). The figures in Exp 2 under both the RFR and RR Models columns for Unknown Trains' test data do not seem promising, but since these results are for Zero-Shot trains, for which no training data is available, the observations are appreciable. One should also note the low mean RMSE values for Unknown Trains in Table IX. The higher accuracies (around 56% and 66% for CI95 and CI99) for Known Trains' test data in the Exp 4 column under RFR Models, compared to those under RR Models, signify a very important conclusion: Random Forest Regressors (an ensemble of multiple decision trees) model the deciding factors (Table III) far better than Ridge Regressors, indicating that the prediction of late minutes is effectively a decision-based regression task.

IV-B. Determination of the Optimum Value of N in N-OMLMPF
We executed Algorithm 2 with values of N from 1 to 5, but which one truly captures the Markov Process property of delays along a train's journey? To answer this we employ two common model selection criteria [1], the Akaike Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC), to choose the statistically best regression model.
AIC = n ln(SSE / n) + 2p  (1)
BIC = n ln(SSE / n) + p ln(n)  (2)
where n stands for the number of observations used to train a model, SSE is the squared sum of errors (between the predicted and actual late minutes), and p is the number of parameters in the model (the number of columns in the formatted dataframe of Table III). The lower the score, the better the model. The count of the number of times a run of N-OMLMPF (for a particular value of N) yielded the least AIC and BIC scores among all five runs, for each train in all four experiments, is noted in Table X. In Table X we see that the delays along the journeys of 40.38% to 67.30% of Known Trains under the related experiments follow a 1st-Order Markov Process, since 1-OMLMPF scores the minimum AIC and BIC among the other frameworks. Similarly, 71.08% to 81.93% of Unknown Trains follow a 1st-Order Markov Process. The rest of the trains follow a higher-order Markov Process with diminishing indications. However, the lower cumulative RMSE scores (summed over all trains) obtained for the lower-order N-OMLMPF runs under different experimental settings suggest using them for real-time deployment.
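The AIC/BIC computation of Eqs. (1) and (2) is straightforward to implement; a minimal sketch:

```python
import numpy as np


def aic_bic(y_true, y_pred, p):
    """AIC and BIC for a regression model in the SSE form of Eqs. (1)-(2):
    n observations, p parameters (columns of the N dataframe);
    lower scores indicate a better model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    sse = float(np.sum((y_true - y_pred) ** 2))
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)
    return aic, bic
```

Running this per value of N and counting which run minimizes the scores reproduces the kind of tally reported in Table X.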
Random Forest Regressor Models  

BIC Analysis  AIC Analysis  
Exp 1  Exp 2  Exp 3  Exp 4  Exp 1  Exp 2  Exp 3  Exp 4  
1-OMLMPF  32  68  35  29  |  21  59  31  23
2-OMLMPF  7  7  9  14  |  9  12  9  10
3-OMLMPF  9  5  6  5  |  12  7  7  11
4-OMLMPF  4  3  1  4  |  8  2  3  6
5-OMLMPF  0  0  1  0  |  2  3  2  2
The figures in each cell denote the number of times an N-OMLMPF run scored the minimum among all five runs; e.g., in the BIC Analysis column for Exp 1, 1-OMLMPF scored the minimum BIC for 32 trains.
V. Conclusion and Future Work
Our objective was to predict the late minutes at an inline station given a train's route information and a valid date. The significant accuracy results in Table VIII for Known Trains' and Unknown Trains' data demonstrate the efficacy of our proposed algorithm for a highly dynamic problem. We also determined, experimentally and statistically, that the delays along the journeys of most trains follow a 1st-Order Markov Process, while a few others follow a higher-order Markov Process. The reasonably low RMSE results obtained for Unknown Trains in Table IX also show that we were able to transfer knowledge from Known Trains to Unknown Trains. The N-OMLMPF algorithm is designed so that it can leverage different types of prediction models and predict delays at stations for any train; it is thus train-agnostic. With only a small percentage of the total trains in India, our approach was able to cover a large share of stations, thereby illustrating scalability. There are many avenues for future work: (a) one can expand the data collection and extend the analysis to trains India-wide; (b) one can also explore other approaches such as time-series prediction and neural networks. In particular, Recurrent Neural Networks (RNNs) have the property of memorizing past details and predicting the next state. The prediction of delays along stations is inherently dynamic, which implicitly calls for an online learning algorithm to continuously learn the changing behavior of the railway network and its delays; thus one could attempt to develop an online RNN algorithm for it. One could also consider predicting the delays of trains in other countries.
VI. Acknowledgment
We would like to thank Debarun Bhattacharjya for his help in statistically discovering the order of Markovian delays through mathematical equations. We also thank Nutanix Technologies India Pvt Ltd for the computational resources.
References

[1] D. Beal, "Information criteria methods in SAS for multiple linear regression models," SESUG Proceedings, Paper SA05, 2007.
[2] Indian Railways Fan Club, "FAQs about Indian railway numbers," https://www.irfca.org/faq/faqnumber.html, 2016.
[3] Open Government Data (India), "Indian railway time table," https://data.gov.in/resources/indianrailwaystimetabletrainsavailablereservation03082015, 2016.
[4] S. Ghosh, A. Banerjee, N. Sharma, S. Agarwal, N. Ganguly, S. Bhattacharya, and A. Mukherjee, "Statistical analysis of the Indian railway network: a complex network approach," Acta Physica Polonica B Proceedings Supplement, vol. 4, no. 2, pp. 123–138, 2011.
[5] S. Ghosh, A. Banerjee, N. Sharma, S. Agarwal, A. Mukherjee, and N. Ganguly, "Structure and evolution of the Indian railway network," in Summer Solstice International Conference on Discrete Models of Complex Systems, 2010.
[6] D. C. Hoaglin, B. Iglewicz, and J. W. Tukey, "Performance of some resistant rules for outlier labeling," Journal of the American Statistical Association, 1986.
[7] Ministry of Railways (Railway Board), India, "Indian Railways yearbook 2015–2016," 2015.
[8] C. H. Lampert, H. Nickisch, and S. Harmeling, "Attribute-based classification for zero-shot visual object categorization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 3, pp. 453–465, Mar. 2014. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.2013.140
[9] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[10] RailApi, "Indian railway APIs," https://railwayapi.com, 2016.