Outages are fairly common in power distribution networks [1, 2], and their frequency is increasing in some countries because of aging infrastructure and changing weather patterns [3, 4]. While good design and maintenance reduce the number of outages, they cannot eliminate them completely. When an outage is required to perform maintenance or upgrade equipment, the utility can minimize the disruption of service to customers by carefully planning the deployment of crews and the sequence of operations. On the other hand, a fault in the system usually causes an unplanned outage, which can lead to long service interruptions and significant inconvenience to customers. Therefore, reducing the number of unplanned outages and better managing their duration is a priority for most utilities.
The first step towards mitigating the negative consequences of unplanned outages is to gain a better understanding of their number and duration, as well as the number of customers affected. However, by definition, unplanned outages are irregular and difficult to predict. Most existing studies focus on predicting the number of outages and use the weather as the only explanatory variable. Of these, [7, 8, 9, 10] attempt to predict the number of outages during extreme weather events (e.g., hurricanes and ice storms). Other authors [11, 12] try to predict the average number of outages over a given period of time under normal weather conditions. Another line of work aims to rank various components of the power system in terms of their “susceptibility to failure” using different machine learning techniques and data sources [13, 14, 15].
Compared with predicting the number of outages, relatively little work has been done to predict the duration or the total customer-hours lost for a given outage. However, this is arguably the most relevant information from the customers’ perspective. When an outage occurs and customers ask when the power will be back on, utilities typically provide an estimate of the restoration time over the phone, on a website, or using social media. For example, Seattle City Light maintains a real-time outage map with the estimated time until restoration (http://www.seattle.gov/light/sysstat/map.asp). Since these estimates usually stem from the “best educated guess” that operators can produce based on their experience and other factors, the difference between the estimated and the actual outage duration can be quite large.
To improve the prediction of outage duration, a number of studies have used statistical methods to quantify the relation between various features of outages and their duration. Earlier work tested the statistical significance of a number of features for outage duration, but did not provide a specific forecasting algorithm. Adibi and Milanicz developed an estimation method based on the restoration procedure that required detailed knowledge of the nature of the outage and of the steps that would need to be taken to restore power. This is not a practical approach when the goal is to give customers an early estimate of how long the outage is likely to last, because a detailed repair plan is rarely available at that stage. Rodriguez and Vargas designed a fuzzy logic technique that requires less detailed knowledge of the repair process but relies on human experts to determine the relative importance of the possible features. This approach combines statistical prediction with some human engineering knowledge, but it is somewhat difficult to calibrate and is used as a subroutine in larger restoration optimization problems rather than directly as a reporting tool for customers.
In this paper, a principled framework is proposed for predicting the duration of outages. It takes into account exogenous environmental factors (e.g., wind speed and other weather conditions, time of day), physical features (e.g., overhead or underground distribution), and engineering knowledge as intrinsically captured in historical outage reports and the associated repair logs. Information about the ongoing repair process is incorporated incrementally, as it becomes available in the form of repair log entries based on reports from the field.
To this end, a prediction algorithm is trained using a collection of outage reports and repair logs that most utilities keep, which contain a wealth of information about historical outages. These records typically include the time and location of the outage, the number of customers affected, its cause, the steps taken to restore service and the outage duration. While some of this data is stored in tabular form, the repair logs are often written in a “free writing” style using a combination of colloquial language and very specialized terms. Table I provides an example of such a repair log (to protect privacy, addresses, names, and other sensitive information have been replaced by generic labels). This example shows that these records can be difficult to read even for engineers with domain knowledge.
To systematically process these logs, recent advances in machine learning and natural language processing are leveraged to develop an algorithm for real-time prediction of outage duration. An initial outage duration (or repair time) prediction is made based on the environmental factors and physical features available at the start of the outage. As each repair log entry is received, it is summarized using a recurrent neural network (RNN) to provide a vector-space representation that can easily be integrated with physical features for predicting outage duration. Another RNN is used to incrementally update the predicted outage duration, incorporating the repair log summary and updating a state vector characterizing the outage. Experiments on a large collection of outage reports demonstrate good performance for the initial predictions and improved results with incremental updates.
This paper is organized as follows. Section II defines the problem. Section III describes in detail the machine learning methods. Section IV reports on a case study based on Seattle City Light outage data. Section V provides examples to illustrate what the model learns from the language of the repair logs. Finally, Section VI draws conclusions.
II Problem Definition
Utilities measure the impact of an outage in terms of the total customer-hours lost. Using a distribution management system, the number of customers affected by a given outage can be determined fairly easily and accurately [5, 2, 17]. While this metric is useful for regulatory reporting, from an individual customer’s perspective what matters is the expected duration of the outage. To maintain good customer relations, many utilities try to provide such estimates. Simply providing an average duration calculated over all outages is inaccurate and unhelpful. Accurately predicting the duration of an outage is difficult because the repair and restoration process is complex, dynamic and affected by many variables. To illustrate this point, Fig. 1 shows the distribution of outage durations associated with three different causes, based on 15 years of records of unplanned outages provided by Seattle City Light (SCL) and used in this study. As Fig. 1 shows, even faults with the same cause can lead to very different outage durations.
The actual cause of an outage is typically not known when it is first reported, but time of day, season, weather and other factors can provide information that is predictive of the cause. For example, the SCL outage records show that most damage from crows occurs during the summer in the early morning or late afternoon (see Fig. 2), so knowing this information can lead to good early predictions for bird-related outages.
In addition to giving their customers an expected repair time, utilities may also wish to provide more detailed probabilistic information, such as the 80% confidence interval of the outage duration. Therefore, this paper provides a probabilistic forecast of the duration of an outage. Specifically, at a particular time, the proposed method provides a Gamma distribution describing the duration, using the information collected up to that time. The Gamma distribution describes a non-negative random variable; its probability density function can be written as

$$f(y; k, \theta) = \frac{1}{\Gamma(k)\,\theta^{k}}\, y^{k-1} e^{-y/\theta}, \qquad y > 0,$$

where $\Gamma(\cdot)$ is the Gamma function, $k > 0$ is the shape parameter and $\theta > 0$ is the scale parameter. By setting $k$ and $\theta$ to different values, the Gamma distribution includes the exponential ($k = 1$) and Chi-squared ($k = \nu/2$, $\theta = 2$ for $\nu$ degrees of freedom) distributions as special cases, and it is commonly used to describe waiting times in many applications [20, 21].
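As a quick numerical check of the density above and of its two special cases, the following Python sketch evaluates the Gamma density in log space using only the standard library. The function names (`gamma_pdf`, `exponential_pdf`, `chi2_pdf`) are illustrative and not part of the authors' code.

```python
import math


def gamma_pdf(y: float, k: float, theta: float) -> float:
    """Gamma density f(y; k, theta) with shape k > 0 and scale theta > 0."""
    if y <= 0:
        return 0.0
    # Evaluate in log space so that Gamma(k) and theta**k cannot overflow.
    log_f = (k - 1) * math.log(y) - y / theta - math.lgamma(k) - k * math.log(theta)
    return math.exp(log_f)


def exponential_pdf(y: float, theta: float) -> float:
    """Exponential density with mean theta."""
    return math.exp(-y / theta) / theta


def chi2_pdf(y: float, nu: int) -> float:
    """Chi-squared density with nu degrees of freedom."""
    return y ** (nu / 2 - 1) * math.exp(-y / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))


# k = 1 recovers the exponential distribution ...
print(gamma_pdf(1.5, k=1.0, theta=2.0), exponential_pdf(1.5, theta=2.0))
# ... and k = nu / 2, theta = 2 recovers the chi-squared distribution.
print(gamma_pdf(3.0, k=2.5, theta=2.0), chi2_pdf(3.0, nu=5))
```

Working in log space matters in practice: for large shape values, $\Gamma(k)$ and $\theta^k$ individually overflow long before their ratio does.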
The initial prediction should ideally be refined to take into account new information about an outage as it becomes available from field reports. This paper develops and tests such a forecasting approach using the SCL records, which include both repair logs and outage reports, together with hourly historical weather information for downtown Seattle. The logs contain 15 years of records with over 8,000 unplanned outage events and over 40,000 repair logs. (Interested readers can request the data from Seattle City Light at https://data.seattle.gov.) Since the repair logs are written in “free-form” technical English, natural language processing is needed to extract information from them so that it can be combined with the weather information to predict the duration of the outage.
III Outage Duration Predictors
At a high level, the approach is based on the assumption that the conditional distribution of outage duration (or repair time) $y$ given a set of variables $\mathbf{x}$ (e.g., weather, time of day) is reasonably well modeled by a Gamma distribution: $p(y \mid \mathbf{x}) = \mathrm{Gamma}(y; k, \theta)$. A neural network provides a non-linear mapping from the feature vector $\mathbf{x}$ to the parameters $(k, \theta)$. The estimated parameters can be used with the Gamma distribution assumption to provide a point estimate of the outage duration, such as the mean $k\theta$, or to estimate, for example, the 90th percentile of the outage duration.
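A minimal sketch of turning predicted parameters $(k, \theta)$ into the summaries mentioned above, assuming only the Python standard library: the mean $k\theta$, the mode $\max(k-1, 0)\,\theta$, and a percentile found by bisection on the cumulative distribution function (CDF). The CDF uses the power series of the regularized lower incomplete gamma function, which is adequate for the moderate values of $y/\theta$ that arise here; a production system would more likely call a library routine such as `scipy.stats.gamma.ppf`. The helper names are hypothetical.

```python
import math


def gamma_cdf(y: float, k: float, theta: float, tol: float = 1e-14) -> float:
    """P(Y <= y) for Y ~ Gamma(k, theta), via the series for the regularized
    lower incomplete gamma function P(k, y / theta)."""
    if y <= 0:
        return 0.0
    x = y / theta
    term = 1.0 / k  # n = 0 term of sum_n x^n / (k (k+1) ... (k+n))
    total = term
    n = 0
    while term > tol * total and n < 100_000:
        n += 1
        term *= x / (k + n)
        total += term
    return min(1.0, math.exp(k * math.log(x) - x - math.lgamma(k)) * total)


def gamma_quantile(p: float, k: float, theta: float, iters: int = 100) -> float:
    """Approximate p-quantile of Gamma(k, theta), found by bisection."""
    lo, hi = 0.0, k * theta
    while gamma_cdf(hi, k, theta) < p:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, k, theta) < p:
            lo = mid
        else:
            hi = mid
    return hi


def summarize(k: float, theta: float) -> dict:
    return {
        "mean": k * theta,                     # minimizes expected squared error
        "mode": max(k - 1.0, 0.0) * theta,     # most likely duration
        "p90": gamma_quantile(0.9, k, theta),  # 90th percentile
    }


print(summarize(k=1.8, theta=1.2))
```

Which summary to report is a design choice: the mode is the single most likely value, the mean is optimal under squared error, and a high percentile gives a conservative promise to customers.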
The real-time prediction model provides a sequence of estimates, leveraging multiple neural networks. The initial estimate uses a feedforward neural network to predict the Gamma distribution parameters (Sec. III-A) using only features available at the onset of the outage (Sec. III-B). The real-time prediction update model is a Recurrent Neural Network (RNN) that integrates the onset features with a continuous, vector space representation of the incoming repair logs (referred to as an ‘embedding’) and iteratively updates an outage state vector (Sec. III-C). The embedding of a repair log is generated using a bi-directional (forward and backward) RNN leveraging an attention mechanism (Sec. III-D).
III-A Initial Outage Duration Predictor
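A feedforward neural network is used to predict a distribution of repair times, parameterized as a Gamma distribution. The input to the neural network is $\mathbf{x}_i$, the feature vector for the $i$-th outage. The neural network uses two layers with ReLU activations to compute a hidden state vector:

$$\mathbf{h}_i = \mathrm{ReLU}\big(\mathbf{W}_2\, \mathrm{ReLU}(\mathbf{W}_1 \mathbf{x}_i)\big).$$

The ReLU function, $\mathrm{ReLU}(z) = \max(0, z)$, applied elementwise, has been shown to be effective in multi-layer neural networks.

The two parameters of the Gamma distribution are predicted directly from $\mathbf{h}_i$:

$$[k_i, \theta_i] = \mathrm{softplus}(\mathbf{W}_3 \mathbf{h}_i).$$

The softplus activation function, $\mathrm{softplus}(z) = \log(1 + e^{z})$, is used on the output layer to ensure that $k_i > 0$ and $\theta_i > 0$, as required by the Gamma distribution. The log-likelihood of the outage duration $y_i$, given the features $\mathbf{x}_i$, is computed using $k_i$ and $\theta_i$:

$$\log p(y_i \mid \mathbf{x}_i) = (k_i - 1)\log y_i - \frac{y_i}{\theta_i} - \log\Gamma(k_i) - k_i \log\theta_i.$$

The objective is to minimize the total negative log-likelihood over the $N$ training outages:

$$\mathcal{L} = -\sum_{i=1}^{N} \log p(y_i \mid \mathbf{x}_i),$$

and the model parameters $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{W}_3$ are all learned via backpropagation toward that objective.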
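The forward pass and loss of this predictor can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' TensorFlow implementation: biases are omitted to mirror the equations, the weights are random rather than trained, the hidden size is arbitrary, and the training step (backpropagation, e.g. via automatic differentiation) is left out.

```python
import math

import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softplus(z):
    # Numerically stable log(1 + exp(z)); always > 0.
    return np.logaddexp(0.0, z)


def predict_gamma_params(x, W1, W2, W3):
    """h = ReLU(W2 ReLU(W1 x)); [k, theta] = softplus(W3 h)."""
    h = relu(W2 @ relu(W1 @ x))
    k, theta = softplus(W3 @ h)
    return float(k), float(theta)


def gamma_nll(y, k, theta):
    """Negative of log p(y | x) for y ~ Gamma(k, theta)."""
    return -((k - 1) * math.log(y) - y / theta - math.lgamma(k) - k * math.log(theta))


rng = np.random.default_rng(0)
n_features, n_hidden = 19, 32          # 19 onset features; hidden size arbitrary
W1 = rng.normal(0.0, 0.1, (n_hidden, n_features))
W2 = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
W3 = rng.normal(0.0, 0.1, (2, n_hidden))

X = rng.normal(size=(5, n_features))   # five made-up outages
y = [0.5, 1.2, 3.0, 0.8, 6.5]          # made-up durations in hours
total_nll = sum(gamma_nll(yi, *predict_gamma_params(xi, W1, W2, W3)) for xi, yi in zip(X, y))
print(total_nll)
```

Softplus rather than, say, an exponential keeps the output growth linear for large pre-activations, which tends to make the Gamma parameters less prone to blowing up early in training.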
III-B Onset Features
A total of 19 features is available at the onset of the outage. They are grouped into related categories to help explain their motivation.
Five features relate to the date and time of the outage: month, day of the week, day of the year, hour of the day and a binary feature indicating whether or not it is a weekend. Certain outage types tend to be correlated with the season. Wind and trees are more of a problem in the winter, and bird-related outages are much more frequent during the summer in the early morning or late afternoon. The time-of-day and day-of-week features can also help the model identify the cause of other outages; e.g., cars colliding with poles happen mainly late at night on weekends.
There are nine weather features: temperature, apparent temperature, cloud cover, dew point, humidity, precipitation intensity, precipitation probability, atmospheric pressure, and wind speed. It is likely that not all of these are useful, but including them does little harm, since regularization techniques are used to avoid overfitting (see Sec. IV-B).
Two features are used to indicate the difficulty of repairing outages at each location. The first is a binary feature indicating if the distribution is overhead or underground. The second is a smoothed average of historical repair times for outages from that feeder, where smoothing is a weighted combination of the average for that feeder and the average for all feeders depending on the number of outages observed for that feeder. The last three features provide information about the size of the outage and the busyness of the repair crews. They are the logarithm of the number of customers affected by the outage, the total number of outages in the last three hours, and the total number in the last eight hours.
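The less obvious onset features can be computed as in the following sketch. The shrinkage rule used for the smoothed feeder average, $w\,\bar{y}_f + (1-w)\,\bar{y}$ with $w = n_f/(n_f + m)$, where $n_f$ is the number of past outages on the feeder, is one common way to realize the weighted combination described above; the pseudo-count $m$ (`prior_count`) and all function names are hypothetical, since the text does not give the exact weighting.

```python
import math
from datetime import datetime, timedelta


def time_features(start: datetime) -> list:
    """Month, day of week (Mon=0), day of year, hour, weekend flag."""
    return [start.month, start.weekday(), start.timetuple().tm_yday,
            start.hour, int(start.weekday() >= 5)]


def smoothed_feeder_average(feeder_durations, global_mean, prior_count=10.0):
    """Shrink a feeder's mean repair time toward the mean over all feeders.

    prior_count (hypothetical) acts as a pseudo-count: feeders with few past
    outages lean on the global mean, well-observed feeders on their own mean.
    Only training-period outages should be passed in, to avoid leakage.
    """
    n = len(feeder_durations)
    if n == 0:
        return global_mean
    w = n / (n + prior_count)
    return w * (sum(feeder_durations) / n) + (1.0 - w) * global_mean


def recent_outage_count(start, previous_starts, hours):
    """Number of outages that began in the `hours` before `start`."""
    window_start = start - timedelta(hours=hours)
    return sum(window_start <= t < start for t in previous_starts)


def size_and_busyness(start, customers_affected, previous_starts):
    """log(customers affected), outages in the last 3 hours, in the last 8 hours."""
    return [math.log(customers_affected),
            recent_outage_count(start, previous_starts, 3),
            recent_outage_count(start, previous_starts, 8)]
```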
These features are selected because they are all immediately available at the start of the outage and have been used in prior work, as discussed earlier. This paper also experiments with an oracle feature, i.e., a feature that is generally not known until the repair is underway: in this case, the cause of the outage. Optionally including this feature allows us to test how well our other features implicitly capture the cause.
III-C Real-time predictions with repair logs
Our real-time prediction model makes use of the repair logs to update its predictions during the outage. As described above, an initial prediction is made at the start of the outage. Thereafter, each time a repair log is received, the system extracts relevant information about the progress of the repair and issues a new prediction. This procedure is depicted in the flowchart in Fig. 3.
The updated predictions are driven by a recurrent neural network. At each time step $t$, the RNN takes two vector inputs and produces a vector output:

$$\mathbf{s}_t = \mathrm{RNN}(\mathbf{s}_{t-1}, \mathbf{u}_t).$$

The first input is the output from the RNN at the previous time step, $\mathbf{s}_{t-1}$, which functions as a summary of the state of the outage up to that point. The second, $\mathbf{u}_t$, is a vector that concatenates the onset features $\mathbf{x}$ (Section III-B) with a vector summary $\mathbf{r}_t$ of the latest repair log, plus an additional log-transformed feature indicating the amount of time $\tau_t$ elapsed since the beginning of the outage:

$$\mathbf{u}_t = [\mathbf{x}; \mathbf{r}_t; \log \tau_t].$$

The method of creating the repair log embedding vector $\mathbf{r}_t$ is described in Section III-D.

To compute $\mathbf{s}_1$ for the first repair log, the initial state $\mathbf{s}_0 = \mathbf{P}\mathbf{x}$, a projection of the onset features, is used. The matrix $\mathbf{P}$ is learned jointly with the other parameters.
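A sketch of the update loop in NumPy, assuming a plain GRU cell (layer normalization omitted) and random weights. How the state $\mathbf{s}_t$ is turned into Gamma parameters is not spelled out above; here a hypothetical output layer `W_out` followed by softplus is assumed, mirroring the initial predictor. Elapsed time $\tau_t$ is taken in hours, and all sizes are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softplus(z):
    return np.logaddexp(0.0, z)


class GRUCell:
    """Plain GRU cell (layer normalization omitted), random weights for illustration."""

    def __init__(self, input_size, state_size, rng):
        shape_u, shape_s = (state_size, input_size), (state_size, state_size)
        self.Wz, self.Uz = rng.normal(0, 0.1, shape_u), rng.normal(0, 0.1, shape_s)
        self.Wr, self.Ur = rng.normal(0, 0.1, shape_u), rng.normal(0, 0.1, shape_s)
        self.Wh, self.Uh = rng.normal(0, 0.1, shape_u), rng.normal(0, 0.1, shape_s)

    def __call__(self, s_prev, u):
        z = sigmoid(self.Wz @ u + self.Uz @ s_prev)  # update gate
        r = sigmoid(self.Wr @ u + self.Ur @ s_prev)  # reset gate
        candidate = np.tanh(self.Wh @ u + self.Uh @ (r * s_prev))
        return (1.0 - z) * s_prev + z * candidate


def real_time_predictions(x, log_vectors, elapsed_hours, cell, P, W_out):
    """Gamma (k, theta) after each repair log; W_out is a hypothetical output layer."""
    s = P @ x                                        # s_0: projection of the onset features
    predictions = []
    for r_t, tau_t in zip(log_vectors, elapsed_hours):
        u_t = np.concatenate([x, r_t, [np.log(tau_t)]])  # u_t = [x; r_t; log tau_t]
        s = cell(s, u_t)                             # s_t = GRU(s_{t-1}, u_t)
        k, theta = softplus(W_out @ s)
        predictions.append((float(k), float(theta)))
    return predictions


rng = np.random.default_rng(0)
n_onset, n_log, n_state = 19, 16, 24
cell = GRUCell(n_onset + n_log + 1, n_state, rng)
P = rng.normal(0, 0.1, (n_state, n_onset))
W_out = rng.normal(0, 0.1, (2, n_state))
x = rng.normal(size=n_onset)
logs = [rng.normal(size=n_log) for _ in range(3)]    # stand-ins for embeddings r_t
preds = real_time_predictions(x, logs, [0.25, 0.9, 1.6], cell, P, W_out)
```

Because the state is carried forward, each new prediction depends on all logs seen so far, not only the latest one.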
III-D Repair Log Embedding Vector
To create the vector $\mathbf{r}_t$ that summarizes the $t$-th repair log, we leverage techniques from natural language processing for creating embeddings of short texts such as sentences or paragraphs, namely bi-directional RNNs with attention. The bi-directional RNN builds representations that capture the meaning of each word in its local context. Attention is a method of collapsing the per-word representations from the bi-RNN into a single summary vector by taking a weighted combination.
A vocabulary $V$ is defined by selecting all words (unique white-space-separated tokens, after the preprocessing described in Sec. IV-B) with counts greater than some threshold, and adding an out-of-vocabulary token for other words. Having defined the vocabulary, each word is mapped to a $|V|$-dimensional indicator vector with a single 1 and all remaining elements equal to zero (referred to as a one-hot vector). Thus, the input sequence of $n$ words in the repair log is represented as a sequence of one-hot encoded vectors:

$$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n.$$

These vectors are projected to a low-dimensional embedding space using a matrix $\mathbf{E}$, resulting in the sequence

$$\mathbf{e}_j = \mathbf{E}\mathbf{w}_j, \qquad j = 1, \ldots, n.$$

The sequence is input to two RNNs: one that processes the input from left to right and another that processes it from right to left. The combination of these two RNNs is referred to as a bi-directional RNN. The outputs at each position from each direction are concatenated to create a representation for each word in the log:

$$\mathbf{c}_j = [\overrightarrow{\mathbf{c}}_j ; \overleftarrow{\mathbf{c}}_j].$$

The sequence of all the vectors $\mathbf{c}_j$ across the repair log forms a matrix $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_n]$.
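The following NumPy sketch shows the one-hot encoding, the projection by $\mathbf{E}$ (which amounts to selecting one column of $\mathbf{E}$ per word), and the concatenation of forward and backward states into $\mathbf{C}$. For brevity it uses a simple tanh RNN in place of the GRUs used in the paper; the toy vocabulary, the sizes and the function names are hypothetical.

```python
import numpy as np


def one_hot(tokens, vocab, oov="<oov>"):
    """|V| x n matrix whose j-th column is the indicator vector of token j."""
    W = np.zeros((len(vocab), len(tokens)))
    for j, w in enumerate(tokens):
        W[vocab.get(w, vocab[oov]), j] = 1.0
    return W


def rnn(columns, Wx, Wh):
    """Simple tanh RNN (a stand-in for the GRU) returning one state per input."""
    h = np.zeros(Wh.shape[0])
    states = []
    for e in columns:
        h = np.tanh(Wx @ e + Wh @ h)
        states.append(h)
    return states


def bi_rnn(E_words, fwd, bwd):
    """Concatenate left-to-right and right-to-left states at each word position."""
    cols = list(E_words.T)
    forward = rnn(cols, *fwd)
    backward = rnn(cols[::-1], *bwd)[::-1]  # re-reverse so position j lines up
    return np.stack([np.concatenate([f, b]) for f, b in zip(forward, backward)], axis=1)


vocab = {"<oov>": 0, "wire": 1, "down": 2, "requests": 3, "clearance": 4}
tokens = ["26kv", "wire", "down", "requests", "clearance"]  # "26kv" is out of vocabulary
rng = np.random.default_rng(0)
d, m = 8, 6                                  # embedding and RNN sizes (arbitrary)
E = rng.normal(0, 0.1, (d, len(vocab)))
W = one_hot(tokens, vocab)
E_words = E @ W                              # same as E[:, ids]: a column lookup
fwd = (rng.normal(0, 0.1, (m, d)), rng.normal(0, 0.1, (m, m)))
bwd = (rng.normal(0, 0.1, (m, d)), rng.normal(0, 0.1, (m, m)))
C = bi_rnn(E_words, fwd, bwd)                # shape (2m, n)
```

In practice the product $\mathbf{E}\mathbf{w}_j$ is never formed explicitly; deep learning libraries implement it as an embedding lookup by word index.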
Neural attention is a widely used technique that allows the model to summarize a sequence using a weighted average, where the weights are predicted by the model to focus on (or attend to) pieces of information that it judges to be relevant [23, 24]. We apply attention to the matrix $\mathbf{C}$ of bi-directional RNN state vectors to summarize the repair log in a vector $\mathbf{r}_t$. The weights in the average are computed from $\mathbf{C}$ using two parameter matrices and the softmax function, which produces a normalized distribution over word positions.

The above process can be duplicated, with parallel computation of different sets of attention weights using the same $\mathbf{C}$ but different sets of parameters. The resulting vectors are then concatenated to form a higher-dimensional $\mathbf{r}_t$. This is referred to as multi-head attention. Our best performing models use two-headed attention. The motivation for multi-head attention is that it allows each attention head to have a specialized purpose. For instance, one head might focus on the cause of the outage and the other on which team will be responding.
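A minimal NumPy sketch of attention pooling over $\mathbf{C}$ and of its multi-head extension. The scoring form $\boldsymbol{\alpha} = \mathrm{softmax}(\mathbf{v}^{\top}\tanh(\mathbf{W}\mathbf{C}))$ is a common two-parameter choice and is an assumption here; the text only states that two parameter matrices and a softmax are used. The vector `v` is written as a 1-D array, and all sizes are arbitrary.

```python
import numpy as np


def softmax(a):
    e = np.exp(a - a.max())  # subtract the max for numerical stability
    return e / e.sum()


def attention_head(C, W, v):
    """Weights over the n columns of C, then their weighted average."""
    alpha = softmax(v @ np.tanh(W @ C))  # shape (n,), sums to one
    return C @ alpha, alpha


def multi_head_attention(C, heads):
    """Concatenate one summary vector per (W, v) head; all heads share C."""
    return np.concatenate([attention_head(C, W, v)[0] for W, v in heads])


rng = np.random.default_rng(1)
dim_c, n_words, dim_a = 12, 5, 10
C = rng.normal(size=(dim_c, n_words))
heads = [(rng.normal(size=(dim_a, dim_c)), rng.normal(size=dim_a)) for _ in range(2)]
r_t = multi_head_attention(C, heads)  # shape (2 * dim_c,)
```

Since each head output is a convex combination of the columns of $\mathbf{C}$, the returned `alpha` weights can be plotted directly as the per-word heat maps discussed in Section V.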
Three sources of data are available: outage reports, repair logs, and weather information. The outage reports were provided by Seattle City Light and span 15 years of data. The repair logs cover the same period and contain more than 30,000 textual records. (Refer to Table I for examples.)
The outages are divided into training, validation, and test sets based on the date of the outage. Outages occurring before March 15, 2014 are assigned to training, those between March 15, 2014 and March 15, 2015 are used for validation, and those after March 15, 2015 are for testing. Outages lasting more than 24 hours (which tend to be associated with major storms), planned outages and outages lasting less than 5 minutes (for which predictions are not needed) are not included in our data. There are 6,172 outages in the training set, 740 in the validation set, and 851 in the test set.
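The date-based split and the duration filters can be written as a short Python sketch. The record layout (dictionaries with `start`, `end` and `planned` keys) is hypothetical, and the treatment of outages falling exactly on a boundary is a choice made here, since the text does not specify it.

```python
from datetime import datetime, timedelta

VALID_START = datetime(2014, 3, 15)
TEST_START = datetime(2015, 3, 15)


def keep(outage) -> bool:
    """Drop planned outages and those shorter than 5 minutes or longer than 24 hours."""
    duration = outage["end"] - outage["start"]
    return (not outage["planned"]
            and timedelta(minutes=5) <= duration <= timedelta(hours=24))


def split(outages):
    """Chronological train / validation / test split on the outage start time."""
    train, valid, test = [], [], []
    for o in filter(keep, outages):
        if o["start"] < VALID_START:
            train.append(o)
        elif o["start"] < TEST_START:
            valid.append(o)
        else:
            test.append(o)
    return train, valid, test
```

Splitting by date rather than at random keeps later outages out of training, which mirrors how the model would be deployed.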
The outage data is supplemented with hourly historical weather information (weather data from darksky.net). The weather information is provided for a single location: downtown Seattle. We align the weather data with each outage by selecting the weather report closest in time to the start of the outage.
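Nearest-in-time alignment with hourly weather reports reduces to a binary search over sorted timestamps, as in this sketch (hypothetical function and argument names; ties go to the earlier report; the weather values are made up).

```python
import bisect
from datetime import datetime


def nearest_weather(outage_start, weather_times, weather_rows):
    """Return the weather row whose timestamp is closest to outage_start.

    weather_times must be sorted ascending; ties go to the earlier report.
    """
    i = bisect.bisect_left(weather_times, outage_start)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(weather_times)]
    best = min(candidates, key=lambda j: abs(weather_times[j] - outage_start))
    return weather_rows[best]


times = [datetime(2016, 1, 5, h) for h in (10, 11, 12)]
rows = [{"wind_speed": 3.1}, {"wind_speed": 7.4}, {"wind_speed": 12.0}]  # made-up values
print(nearest_weather(datetime(2016, 1, 5, 11, 20), times, rows))
```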
Since the outage reports and repair logs were collected over a period of several years, without anticipating the use of language processing, some work is required to format and clean the data. The repair logs are aligned with the outages by selecting the log entries that were made between the start and end times of the outage on the same feeder. The alignment is not exact as there may be more than one outage at the same feeder at the same time, but such cases are rare. Approximately 20% of the outage events do not align with any repair logs for a variety of reasons, e.g. transmission-level outages which are handled through a separate process. These outages are not included in the experiments that make use of repair logs. In some cases, a repair log will be made to note the conclusion of the outage. Logs that occur in the last 2.5% of an outage duration are removed because it is not useful to make further predictions at that point and these logs interfere with the fit of the model. For the real-time predictions there are 19,182 repair logs in the training data, 2,403 in the validation set and 3,155 in the test data.
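A sketch of the log-to-outage alignment and of the removal of logs in the final 2.5% of the outage, assuming hypothetical record fields `feeder`, `time`, `start` and `end`. Outages for which the returned list is empty would be the ones excluded from the repair-log experiments.

```python
from datetime import datetime


def align_logs(outage, logs, tail_fraction=0.025):
    """Logs on the outage's feeder made between its start and end, excluding
    those in the final `tail_fraction` of the outage duration."""
    start, end = outage["start"], outage["end"]
    cutoff = end - (end - start) * tail_fraction  # timedelta * float -> timedelta
    kept = [log for log in logs
            if log["feeder"] == outage["feeder"] and start <= log["time"] < cutoff]
    return sorted(kept, key=lambda log: log["time"])
```

For a two-hour outage the cutoff falls three minutes before the end, so a closing note written at the moment of restoration is dropped.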
IV-B Implementation Details
The model is implemented using the TensorFlow library. Fitting is done using the Adam optimizer with a learning rate of and a batch size of one. All of our recurrent neural networks are Gated Recurrent Units (GRUs) with layer normalization. There are two regularization strategies: early stopping and variational dropout on the GRUs. Model code is available on GitHub (http://github.com/ajaech/outageduration).
Punctuation is removed using a simple regular expression. The repair log text is preprocessed by lower-casing and by replacing ID numbers with tokens for their types, such as transformers, feeders, or poles. (An example is found in Fig. 8, where the telephone pole identifier is replaced with <tp>.) The vocabulary consists of all words that appear in the training data more than a certain number of times, where the cutoff is selected during tuning. The vocabulary size of the best models ranges from two thousand to four thousand words.
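A sketch of the preprocessing and vocabulary cutoff. The ID patterns below (and every type token other than `<tp>`) are invented for illustration, because the actual formats of SCL identifiers are not given; punctuation is stripped after ID replacement so that hyphenated IDs are still matched.

```python
import re
from collections import Counter

# Hypothetical ID formats; the real SCL identifier formats are not given in the text.
ID_PATTERNS = [
    (re.compile(r"\btp[-\s]?\d+\b"), " <tp> "),      # telephone pole
    (re.compile(r"\bxfmr[-\s]?\d+\b"), " <xfmr> "),  # transformer (made-up token)
    (re.compile(r"\bfdr[-\s]?\d+\b"), " <fdr> "),    # feeder (made-up token)
]
PUNCT = re.compile(r"[^\w\s<>]")  # keep the angle brackets of the type tokens


def preprocess(text: str) -> list:
    text = text.lower()
    for pattern, token in ID_PATTERNS:
        text = pattern.sub(token, text)
    text = PUNCT.sub(" ", text)
    return text.split()


def vocabulary(train_texts, cutoff: int) -> set:
    """Words seen more than `cutoff` times in training; others map to an OOV token."""
    counts = Counter(w for t in train_texts for w in preprocess(t))
    return {w for w, c in counts.items() if c > cutoff}


print(preprocess("Crew found wire down at TP-4521; requests clearance."))
```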
Hyperparameter tuning is done using a random search strategy, selecting the model that assigns the highest likelihood to the validation data. The hyperparameters are the vocabulary cutoff, the word embedding size, the GRU cell size, the bi-directional GRU cell size, the dropout rate, the number of training epochs, the number of attention heads (one or two), and whether or not to use layer normalization. We find that early stopping is a better regularizer than variational dropout, that layer normalization is helpful, and that two attention heads are better than one.
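Random search itself is simple; this sketch samples configurations from a hypothetical search space (the actual ranges are not reported) and keeps the one with the lowest validation negative log-likelihood, i.e. the highest validation likelihood. The `evaluate` callback stands in for training a model and scoring it on the validation set.

```python
import random

# Hypothetical search space; the actual ranges are not reported in the text.
SEARCH_SPACE = {
    "vocab_cutoff": [2, 5, 10, 20],
    "word_embedding_size": [32, 64, 128],
    "gru_size": [32, 64, 128],
    "bigru_size": [32, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "epochs": list(range(1, 21)),
    "attention_heads": [1, 2],
    "layer_norm": [True, False],
}


def random_search(evaluate, n_trials, seed=0):
    """evaluate(config) must return the validation negative log-likelihood."""
    rng = random.Random(seed)
    best_config, best_nll = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        nll = evaluate(config)
        if nll < best_nll:
            best_config, best_nll = config, nll
    return best_config, best_nll
```

Treating the number of epochs as a searched hyperparameter is one way to implement early stopping within the same search loop.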
IV-C Initial Outage Duration Prediction Results
The metrics are negative log likelihood (the training objective), root mean squared error (RMSE), and Pearson’s correlation. The negative log likelihood is a measure of both how well the model is able to predict the true duration and also how well it is able to reduce the uncertainty of its predictions. Since the model is trained based on a negative log likelihood objective, improvements to the model are best observed with this measure, but it is less interpretable from an applications perspective. For comparison, a linear regression model was trained to optimize for mean squared error. The linear regression gives no uncertainty information and is slightly worse in terms of RMSE (4.3 hours) and correlation (28.7) for the all onset features condition. Results for other feature sets with this model are similar.
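The RMSE and correlation figures require a point prediction from each Gamma distribution. A sketch assuming the Gamma mean $k\theta$ is used as that point prediction (it minimizes expected squared error); the helper names are hypothetical, and the negative log-likelihood is computed exactly as in the training objective, so it is not repeated here.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def point_metrics(durations, params):
    """RMSE (same units as durations) and Pearson correlation, using the
    Gamma mean k * theta as the point prediction for each outage."""
    preds = [k * theta for k, theta in params]
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, durations)) / len(durations))
    return rmse, pearson(preds, durations)
```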
Table II presents the experimental results. The case with no features corresponds to using a single Gamma distribution for all outages. As more features are added, the model achieves a better negative log likelihood (i.e., a better fit to the observed test data), a lower RMSE, and a higher correlation. The last two lines serve as oracle experiments, since they include the true cause of the outage as a feature, which is not usually known at the onset. As expected, knowing the true cause improves performance for all metrics. We hypothesized that the onset features would give us some information about the true cause, and this appears to be the case: a classifier trained to predict the outage cause from the onset features has an accuracy of 70%, compared with 44% for always predicting the majority class ‘Equipment Failure’.
| Features | NLL | RMSE (hours) | Correlation |
|---|---|---|---|
| Time + Weather | 2.70 | 4.40 | 18.0 |
| All Onset Features | 2.66 | 4.25 | 32.0 |
| Cause + Onset Features | 2.57 | 4.02 | 45.1 |
Using a gradient boosted regression tree to assess feature importance, the top five features are the average outage duration for the feeder, the number of customers affected, the hour of the day, the day of the year, and the air pressure. The day-of-year feature is helpful because of the seasonality of different outage types. When binary oracle indicator features are added for each outage cause category, the feature importances are similar, except that the Bird/Animal cause indicator is ranked as the sixth most important feature.
IV-D Real-time Prediction Results
Figure 5 demonstrates the performance of the real-time prediction system by showing the root mean squared error and the negative log likelihood for the initial prediction and after receiving one to three repair logs. These metrics are computed on the subset of the test data where there were at least three repair logs for each outage. Because of this, the results for the real-time prediction are not comparable with our previous experiments. Both metrics show a trend of increased prediction accuracy as more information from the repair logs becomes available.
V Analysis and Case Studies
Figures 6 and 7 illustrate an example where successive repair logs progressively improve the prediction. In Fig. 6, a heat map visualization shows where in the repair logs the model places its attention. Since the vector associated with a word in the text is a concatenation of the forward and backward passes of the bi-RNN, it encodes information from the surrounding phrase. To make this clearer in the heat maps, the word attention weights are smoothed over the sequence. Observe that the model identifies the cause “1-26kv wire down” and the phrase “requests clearance”, which tends to be associated with a speedy repair from that time point. Figure 7 shows how the predicted distributions of outage duration evolve as field reports are received, as well as the actual outage duration. The distributions correspond to the time remaining, so each one starts at the arrival time of the corresponding report. The “no report” condition uses only the onset features.
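The smoothing method is not specified beyond being applied over the sequence; a centered moving average is one simple possibility, sketched here with a hypothetical window of three words (the function name is also hypothetical).

```python
def smooth(weights, window=3):
    """Centered moving average of per-word attention weights; the window
    shrinks at the ends of the sequence."""
    half = window // 2
    out = []
    for j in range(len(weights)):
        lo, hi = max(0, j - half), min(len(weights), j + half + 1)
        out.append(sum(weights[lo:hi]) / (hi - lo))
    return out


print(smooth([0.0, 0.0, 1.0, 0.0, 0.0]))
```

Spreading a peak over its neighbors reflects the point made above: each word vector already carries context from the surrounding phrase.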
Figures 8 and 9 illustrate a case where the first and second reports do not improve the prediction. They indicate uncertainty about the cause of the outage, which increases the expected duration. Because the cause (a dead crow) is identified in the third report, a speedy repair can then be predicted. The probability distribution for the final prediction extends beyond the top of the figure and is truncated to improve readability.
The predicted distributions could be used in a variety of ways to update customers on the status of repairs. For example, the reported time could be made more or less conservative; e.g., Table III shows the mode, the mean (minimum MSE), and the 80% confidence estimates of the time remaining until power is restored for the three-report example above. In addition, the attention weights could be used to report a cause when it is reliably identified, such as the bird-related fault identified in report 3. The specific strategy used should be assessed with customer studies.
| Head #1 bigram | Count | Head #2 bigram | Count |
|---|---|---|---|
| duty supervisor | 139 | to investigate | 120 |
| 26kv line | 109 | lights out | 73 |
| need nurd | 92 | need nurd | 47 |
| lights out | 84 | to respond | 44 |
| part out | 51 | slsvc to | 39 |
It is informative to analyze the most common phrases attended to by the two attention heads. Table IV summarizes the most frequent bigrams (adjacent word pairs) for each head. To create this table, we find the word $w_j$ in each report that is given the highest weight and count the two bigrams associated with that word: $(w_{j-1}, w_j)$ and $(w_j, w_{j+1})$. The two heads specialize in different concepts, but there is some overlap. The first head frequently identifies mentions of 26kV cables. The second head frequently identifies the inclusion of the term <CL>, a marker for the ID number of a report that is typically created in the log towards the end of an outage.
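The counting procedure behind Table IV can be sketched as follows (hypothetical function name; `weights_per_report` would hold one head's attention weights for each tokenized report, and the example reports are made up).

```python
from collections import Counter


def top_bigrams(reports, weights_per_report):
    """Count the bigrams (w[j-1], w[j]) and (w[j], w[j+1]) around the
    highest-weighted word w[j] of each report."""
    counts = Counter()
    for tokens, weights in zip(reports, weights_per_report):
        j = max(range(len(tokens)), key=lambda i: weights[i])
        if j > 0:
            counts[(tokens[j - 1], tokens[j])] += 1
        if j + 1 < len(tokens):
            counts[(tokens[j], tokens[j + 1])] += 1
    return counts


reports = [["crew", "to", "investigate", "lights", "out"],
           ["lights", "out", "on", "26kv", "line"],
           ["crew", "to", "investigate", "lights", "out"]]
weights = [[0.1, 0.2, 0.5, 0.1, 0.1],
           [0.1, 0.1, 0.1, 0.6, 0.1],
           [0.1, 0.2, 0.5, 0.1, 0.1]]
print(top_bigrams(reports, weights).most_common(2))
```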
This paper introduces an approach for predicting outage duration by learning from historical outage records. It also shows how natural language processing can be used to provide additional features allowing real-time updates of duration estimates. Experiments with a large collection of outages show that good results can be obtained from environmental features alone, since there is good correlation between these features and some causes. In addition, improvements are possible by using text analysis of incoming repair logs that provide information related to the outage cause and repair steps.
The model proposed here was developed to predict a distribution of duration times, from which one could predict either the expected time until service is restored or a time within which there is a certain level of confidence that service will be restored. The framework could just as easily be used to predict the estimated time to repair directly, by replacing the final neural network layer (the Gamma parameter estimators) with an expected duration prediction layer and changing the training objective to mean squared error. Experiments found that the RMSE results are only slightly better when optimizing directly for that objective than when using the Gamma distribution.
The model proposed here advances on prior work but is also complementary to it. For example, prior work has shown that an ensemble of neural networks is an effective strategy for predicting the number of outages, specifically looking at weather-related (wind and lightning) factors. In contrast, the work here addresses prediction of outage duration and considers all types of outages (excluding major storms), but the benefits of ensembling may extend to duration prediction as well. Work on outage duration prediction that relies on environmental factors (vs. post-hoc knowledge of the cause) has investigated the importance of different factors, which motivated many of the factors explored here. However, prior work did not integrate these in a unified model. Because different factors interact (e.g., time of day and season for bird-related outages), it is useful to explore integrated models.
A constraint of the approach described here is that it requires historical distribution data for the region covered. As learned in this study, weather patterns impact the prediction, as does the type of infrastructure. If such historical data were available from several cities, it would be possible for the initial prediction model to learn to generalize to a new urban area. The text-based updates will be more sensitive to the idiosyncrasies of reporting in a particular region.
There are a number of opportunities that the use of historical records and natural language processing could enable in future studies. For example, the data could be used to predict the likelihood of failure for particular types of equipment in the next few years. Further analysis of the attended words could provide guidance as to what sort of information should be included in field reports and provide automated suggestions about outage causes and repairs to engineers in the field.
This work was supported by NSF Award #1509880. The authors thank Seattle City Light for providing the outage and repair logs that made this research possible, and Ruchira Kulkarni for the initial work on processing the repair logs. The views, opinions and positions expressed by the authors are theirs alone, and do not necessarily reflect the views, opinions or positions of NSF or Seattle City Light.
-  W. H. Kersting, Distribution system modeling and analysis. CRC press, 2012.
-  W. Kersting and R. Dugan, “Recommended practices for distribution system analysis,” in Power Systems Conference and Exposition, 2006. PSCE’06. 2006 IEEE PES. IEEE, 2006, pp. 499–504.
-  H. C. Caswell, V. J. Forte, J. C. Fraser, A. Pahwa, T. Short, M. Thatcher, and V. G. Werner, “Weather Normalization of Reliability Indices,” IEEE Transactions on Power Delivery, vol. 26, no. 2, pp. 1273–1279, 2011.
-  A. Pahwa, M. Hopkins, and T. Gaunt, “Evaluation of Outages in Overhead Distribution Systems of South Africa and of Manhattan, Kansas, USA,” in Proceedings of International Conference on Power Systems Operation and Planning, Cape Town, South Africa, 2007.
-  H. M. Rustebakke, Electric Utility Systems and Practices. General Electric Company, 1983.
-  P. Kankanala, S. Das, and A. Pahwa, “AdaBoost+: An Ensemble Learning Approach for Estimating Weather-Related Outages in Distribution Systems,” IEEE Transactions on Power Systems, vol. 29, no. 1, pp. 359–367, 2014.
-  D. Zhu, D. Cheng, R. P. Broadwater, and C. Scirbona, “Storm Modeling for Prediction of Power Distribution System Outages,” Electric Power Systems Research, vol. 77, no. 8, pp. 973–979, 2007.
-  Y. Zhou, A. Pahwa, and S. S. Yang, “Modeling Weather-Related Failures of Overhead Distribution Lines,” IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1683–1690, 2006-11.
-  H. Liu, R. A. Davidson, and T. V. Apanasovich, “Spatial Generalized Linear Mixed Models of Electric Power Outages Due to Hurricanes and Ice Storms,” Reliability Engineering & System Safety, vol. 93, no. 6, pp. 897–912, 2008.
-  K. Alvehag and L. Soder, “A Reliability Model for Distribution Systems Incorporating Seasonal Variations in Severe Weather,” IEEE Transactions on Power Delivery, vol. 26, no. 2, pp. 910–919, 2011.
-  A. Domijan Jr, R. Matavalam, A. Montenegro, W. Wilcox, Y. Joo, L. Delforn, J. Diaz, L. Davis, and J. D’Agostini, “Effects of Normal Weather Conditions on Interruptions in Distribution Systems,” International Journal of Power and Energy Systems, vol. 25, pp. 54–61, 2005.
-  P. Kankanala, “Estimation of Overhead Distribution System Outages Caused by Wind and Lightning Using an Artificial Neural Network,” in International Conference on Power System Operation & Planning, 2012.
-  P. Gross, A. Boulanger, M. Arias, D. Waltz, P. M. Long, C. Lawson, R. Anderson, M. Koenig, M. Mastrocinque, W. Fairechio, J. A. Johnson, S. Lee, F. Doherty, and A. Kressner, “Predicting Electricity Distribution Feeder Failures Using Machine Learning Susceptibility Analysis,” in Proceedings of the Conference on Innovative Applications of Artificial Intelligence, Volume 2, ser. IAAI’06. AAAI Press, 2006, pp. 1705–1711.
-  Texas A&M Engineering, “Researchers develop model to predict and prevent power outages using big data,” 2017. [Online]. Available: http://engineering.tamu.edu/news/2017/07/26/researchers-develop-model-to-predict-and-prevent-power-outages-using-big-data
-  C. Rudin, D. Waltz, R. N. Anderson, A. Boulanger, A. Salleb-Aouissi, M. Chow, H. Dutta, P. N. Gross, B. Huang, S. Ierome, D. F. Isaac, A. Kressner, R. J. Passonneau, A. Radeva, and L. Wu, “Machine Learning for the New York City Power Grid,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 2, pp. 328–345, 2012.
-  M.-Y. Chow, L. S. Taylor, and M.-S. Chow, “Time of outage restoration analysis in distribution systems,” IEEE Transactions on Power Delivery, vol. 11, no. 3, pp. 1652–1658, 1996.
-  M. M. Adibi and D. P. Milanicz, “Estimating restoration duration,” IEEE Transactions on Power Systems, vol. 14, no. 4, pp. 1493–1498, 1999.
-  J. R. A. Rodriguez and A. Vargas, “Fuzzy-heuristic methodology to estimate the load restoration time in MV networks,” IEEE Transactions on Power Systems, vol. 20, no. 2, pp. 1095–1102, 2005.
-  R. B. Duffey and T. Ha, “The probability and timing of power system restoration,” IEEE Transactions on Power Systems, vol. 28, no. 1, pp. 3–9, 2013.
-  R. Ramakumar, Engineering reliability: fundamentals and applications. Prentice Hall, 1993.
-  D. C. Montgomery, G. C. Runger, and N. F. Hubele, Engineering statistics. John Wiley & Sons, 2009.
-  X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 315–323.
-  D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proceedings of the International Conference on Learning Representations (ICLR), 2015.
-  C. N. dos Santos, M. Tan, B. Xiang, and B. Zhou, “Attentive pooling networks,” CoRR, vol. abs/1602.03609, 2016.
-  A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 6000–6010.
-  M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. [Online]. Available: https://www.tensorflow.org/
-  D. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.
-  K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
-  J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
-  R. Caruana, S. Lawrence, and C. L. Giles, “Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping,” in Advances in neural information processing systems, 2001, pp. 402–408.
-  D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” in Advances in Neural Information Processing Systems, 2015, pp. 2575–2583.