
How long is a resilience event in a transmission system?: Metrics and models driven by utility data

08/15/2022
by   Ian Dobson, et al.

We discuss ways to measure duration in a power transmission system resilience event by modeling outage and restore processes from utility data. We introduce novel Poisson process models that describe how resilience events progress and verify that they are typical using extensive outage data collected across North America. Some usual duration metrics show impractically high statistical variability, and we recommend new duration metrics that perform better.



I Introduction

Much of the analysis of electric power system resilience relies on describing the duration and magnitude of resilience events with quantitative metrics [1, 2, 3, 4, 5, 6, 7, 8]. The resilience events correspond to conditions of unusually high stress, such as extreme weather or cascading outages, and are either simulated [2, 3, 4] or extracted from historical data [5, 6, 7, 8]. The metrics of duration and extent describe the performance of the power system as it responds to the high stress and, sometimes indirectly, the impact of the event on our society. The metrics are broadly useful in improving the engineering of power system resilience. This paper addresses electric power transmission system metrics for the duration of resilience events and the durations of the outage and restore processes occurring within resilience events. Here “outage” refers to a component being removed from service, and “restore” refers to re-energizing a component to return it to service.

The duration of a resilience event would appear to be straightforward: the event starts with the first transmission outage, ends with the last restore, and the event duration metric is simply the time from the first outage to the last restore. However, we will show that the timing of the last restore is so highly statistically variable that it is not meaningfully representative of the power system restoration. (A metric is highly statistically variable if its value is likely to be much different from its estimated value, and we quantify this by the size of a confidence interval containing the estimate.) Moreover, given the redundancy that is designed into power transmission systems, the last restore may have little or no impact on the power flowing to the distribution system and then to the customers. Therefore we analyze a variety of duration metrics to find new metrics which are less variable and more representative.

Our main approach is to develop new Poisson process models for the outage and restore processes. The new models are driven by seven years of automatic outage data collected across North America by the North American Electric Reliability Corporation (NERC) in its Transmission Availability Data System (TADS). These statistical models enable the variability of the metrics to be quantified. Moreover, parameters of the new models are closely related to some of the duration metrics.

I-A Literature review

Much of the previous work on statistical models of power system resilience events addresses distribution systems. Zapata [9] models distribution system reliability with outages as a power-law Poisson process arriving at a queue that is serviced by a power-law repair process to produce a restore process. Wei and Ji [6] analyze distribution system resilience to particular severe hurricanes with a Poisson outage process arriving at a queue that repairs the outages to produce a restore process. Both the outage process rate and the repair time distribution vary in time as the hurricane progresses. Carrington [8] shows how to extract outage and restore processes from standard distribution utility data.

Both [6] and [9] statistically model the outage process and the component repair process, and then calculate the restore process with a first-in-first-out queue model, whereas we follow the insight of [8] in extracting and directly modeling the outage and restore processes. Modeling the restore process directly from the data avoids the complexities in queuing models of explicitly modeling the component repair and assuming an order of component repair. While [8] fits the mean and standard deviation of the distribution system outage and restore processes to give a gamma distribution of restore times, it does not give statistical process models as we do in this paper. Moreover, the forms of the outage and restore processes are quite different: for transmission systems the restore process dramatically slows over time and typically extends well beyond the end of the outage process, whereas in distribution systems the outage and restore processes overlap during most of the event [8, 6].

Previous work also estimates individual component repair times from distribution utility data. For example, Jaech [11] predicts a gamma distribution of individual component outage restoration times and customer hours lost with a neural network, and Liu [12] fits generalized additive accelerated failure time models to hurricane and ice storm data.

I-B Summary of paper contributions

This paper:

  1. proposes new statistical models of outage and restore processes in transmission systems, and shows that the new models describe typical North American data.

  2. analyzes statistical variability and interpretation of a variety of duration metrics.

  3. recommends novel and more useful duration metrics.

  4. reports typical values for model parameters and duration metrics for North American transmission resilience events.

Our previous conference papers [13, 14] extract events and outage and restore processes from transmission system outage data and calculate resilience metrics. These methods are also applied to quantify resilience for the largest events in NERC reports [15, 16]. Outage and restore processes are extracted from utility data for a distribution system in [8]. We deploy these previous methods to extract events and outage and restore processes from the data as reviewed in section II. Otherwise, the paper overlaps with the previous work only in part of contribution (4), and only for some conventional metrics applied to fewer and larger events; this paper proposes, analyzes, and recommends new and improved duration metrics.

II Resilience events and processes

To obtain resilience metrics from utility outage data, we first need to automatically extract resilience events and the outage and restore processes for each event. This section explains how to do this based on previous work [8, 13, 14] and establishes the notation needed for the paper.

II-A Utility data and extracting resilience events

The detailed North American outage data from NERC’s TADS are the automatic outage data for the following bulk electric system transmission system elements: AC circuits, transformers, AC/DC back-to-back converters, and DC circuits [14]. The data include the outage and restore time to the nearest minute, the initiating cause code for each outage, and the sustaining cause code for sustained outages. In this paper we analyze the approximately 62 000 automatic outages for all elements reported in TADS from 2015 to 2021 for the Eastern, Western, and ERCOT interconnections.

A key step in resilience analysis of real data is automatically extracting resilience events. For each interconnection, the automatic outages are grouped together into resilience events based on the bunching and overlaps of their starting times and durations. We quote from [14] the algorithm used: “Every outage in an event has to either start within five minutes of a previous outage in the event or overlap in duration with at least one previous outage in the event that has a difference in starting time not exceeding one hour. In applying this algorithm, repeated momentary outages of the same element are neglected if they occur within 5 minutes of each other.” We use this algorithm to automatically group outages into resilience events (their sizes vary from 1 to 352 outages) and then analyze all the resilience events with 10 or more outages. An event that contains at least one outage with a weather-related initiating or sustaining cause code is defined as a weather-related event. The weather-related TADS cause codes are lightning, weather excluding lightning, fire, and environmental. This procedure identified 352 transmission events with 10 or more outages, 329 of which are weather-related.
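To make the grouping rule concrete, here is a minimal sketch in Python. It assumes each automatic outage is given as a (start, end) pair of datetimes and omits the de-duplication of repeated momentary outages of the same element; the function and variable names are illustrative, not the authors' implementation.

```python
from datetime import timedelta

def group_into_events(outages, start_gap=timedelta(minutes=5),
                      overlap_gap=timedelta(hours=1)):
    """Group outages into resilience events using the rule quoted from [14]:
    an outage joins an event if it starts within 5 minutes of a previous
    outage in that event, or if it overlaps in duration with a previous
    outage in that event whose start time differs by at most one hour."""
    events = []
    for start, end in sorted(outages):          # process outages by start time
        placed = False
        for event in events:
            for s, e in event:
                close_start = start - s <= start_gap
                overlapping = start <= e and start - s <= overlap_gap
                if close_start or overlapping:
                    event.append((start, end))
                    placed = True
                    break
            if placed:
                break
        if not placed:
            events.append([(start, end)])       # start a new event
    return events
```

Events returned by such a sketch could then be filtered to those with 10 or more outages before computing metrics.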

II-B Outage, restore, and performance processes

Suppose that the resilience event has $n$ outages at times $o_1 \le o_2 \le \dots \le o_n$ and $n$ restores at times $r_1 \le r_2 \le \dots \le r_n$. Note that the outage times are sorted into the order in which the outages occur, and the restore times are sorted into the order in which the restores occur. This sorting implies that the $i$th restore time $r_i$ is not usually the restore of the $i$th outage $o_i$.

For each event, the outage process $O(t)$ is the cumulative number of outages at time $t$ and the restore process $R(t)$ is the cumulative number of restores at time $t$:

$O(t) = \text{number of outage times } o_i \le t \qquad (1)$

$R(t) = \text{number of restore times } r_i \le t \qquad (2)$

Both processes start at zero at the beginning of the event and increase to the total number of outages $n$, as can be seen in the example in Fig. 1.

Fig. 1: Processes for a transmission system resilience event with 12 outages.

Resilience studies [1, 2, 3, 4] often define for each event a performance (or resilience) curve $P(t)$, which is the negative of the cumulative number of unrestored outages at time $t$. The performance curve decrements for each outage and increments for each restore as shown in Fig. 1. Indeed, the performance curve is related to the outage and restore processes by $P(t) = R(t) - O(t)$. The performance curve can be uniquely decomposed into its outage and restore processes, and it contains the same information as the outage and restore processes [8].
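As a concrete illustration of these definitions, the following short Python sketch evaluates the outage process, restore process, and performance curve for one event at any time; the names and the toy event are illustrative assumptions, not the paper's data.

```python
import bisect

def event_processes(outage_times, restore_times):
    """Return O(t), R(t), P(t) for one event, with times measured in hours
    from the start of the event and sorted into ascending order."""
    o = sorted(outage_times)
    r = sorted(restore_times)

    def O(t):   # cumulative number of outages at time t, as in (1)
        return bisect.bisect_right(o, t)

    def R(t):   # cumulative number of restores at time t, as in (2)
        return bisect.bisect_right(r, t)

    def P(t):   # performance curve: minus the number of unrestored outages
        return R(t) - O(t)

    return O, R, P

# toy event with 4 outages: at t = 1 h, 4 outages have occurred and 1 restore
O, R, P = event_processes([0.0, 0.1, 0.3, 0.5], [0.4, 2.0, 5.0, 30.0])
print(O(1.0), R(1.0), P(1.0))   # prints: 4 1 -3
```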

The outage and restore processes, while straightforward, are fundamental to analyzing real outage data, and they have several distinctive features [8]: (a) The outage and restore processes routinely overlap in time in real data; this differs from the customary idealized outage and restore phases of resilience that are separated in time [1, 2, 3, 4, 7]. (b) The analysis is at a systems level and is not focused on tracking individual elements: it only counts the numbers of outages and restores and it does not track which outaged element restored when or the order in which elements restore. (c) The forms of the outage and restore processes and performance curve readily lead to resilience metrics that describe each process; in particular, it is useful to have separate metrics describing the outage process and the restore process.

III Poisson process models of outage and restore

This section introduces new Poisson process models that describe typical outage and restore processes in our transmission system data. The mean values of these Poisson processes are a useful approximation of the outage and restore processes. Moreover, parameters of the Poisson process models yield resilience metrics, and section VIII uses the Poisson process models to quantify the variability of the metrics. The fit of the Poisson models with the data is discussed in section VII, where it is shown that the model with a lognormal rate typically fits the restore process better than the model with an exponential rate.

III-A Poisson process of outage times with constant rate

The data for each event specify that there are $n$ outages in the event and that the outages start at time $o_1$ and end at time $o_n$. Given this information, and assuming a constant rate Poisson process, we model the outage times as occurring randomly and at a constant rate $\lambda$ in the time interval $[o_1, o_n]$. In particular, given that there are $n$ outages in $[o_1, o_n]$, the outage times are independent samples from a uniform distribution on $[o_1, o_n]$ sorted into ascending order.¹

¹ One well known property of a constant rate Poisson process is that, if there are a given number of outages in an interval, then these outage times are uniformly distributed in that interval [18, Thm. 4A, Ex. 4A], [19, Thm. 5.2].

A metric characterizing the outages is their rate $\lambda$, which is estimated for each event as²

$\hat\lambda = \dfrac{n-1}{o_n - o_1} \qquad (3)$

² Since there are $n-1$ time differences between the $n$ outages, the estimated average time difference between successive outages is $(o_n - o_1)/(n-1)$, and then the estimated rate is the reciprocal of the average time difference.

The average or expected cumulative number of outages at time $t$ is

$\bar O(t) = 1 + \lambda\,(t - o_1), \qquad o_1 \le t \le o_n \qquad (4)$

$\bar O(t)$ approximates the outage process $O(t)$ as shown in Fig. 2. We see in Fig. 4 some typical examples in which the cumulative number of outages increases in the linear way given by (4). With $\lambda$ estimated by (3), the total number of outages is $\bar O(o_n) = n$. For each event, $\lambda$ can be estimated from (3), and then the averaged outage process (4) approximates and describes the outage process $O(t)$.
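A small sketch of this estimate, under the reconstruction of (3) and (4) used in this text (the offset making $\bar O(o_1)=1$ is an assumption of this presentation):

```python
import numpy as np

def fit_outage_process(outage_times):
    """Estimate the constant outage rate (3) for one event and return the
    averaged outage process (4). outage_times are in hours."""
    o = np.sort(np.asarray(outage_times, dtype=float))
    n = len(o)
    lam = (n - 1) / (o[-1] - o[0])                 # outages per hour, eq. (3)

    def O_bar(t):                                  # averaged outage process, eq. (4)
        return 1.0 + lam * (np.clip(t, o[0], o[-1]) - o[0])

    return lam, O_bar

lam, O_bar = fit_outage_process([0.0, 0.2, 0.5, 0.9, 1.4, 2.0])
print(round(lam, 2), O_bar(2.0))   # 2.5 outages per hour; O_bar(o_n) = 6 = n
```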

Fig. 2: Horizontal axis ticks show eight outage times $o_1, o_2, \dots, o_8$ produced by a Poisson process with constant rate $\lambda$. The resulting outage process $O(t)$ is the dark blue stepped line. $O(t)$ is approximated by the average outage process $\bar O(t)$, which is the dashed line of slope $\lambda$.
Fig. 3: Horizontal axis ticks show eight restore times $r_1, r_2, \dots, r_8$ produced by a Poisson process with lognormal rate. The resulting restore process $R(t)$ is the red stepped line. $R(t)$ is approximated by the average restore process $\bar R(t)$, which is the dashed curve. $\bar R(t)$ increases in proportion to the CDF of the lognormal distribution and its slope is the Poisson process rate.
Fig. 4: Examples of outage processes (dark blue) and restore processes (red) for events. The red dashed line is the lognormal restore approximation, and the gray dashed line is the exponential restore approximation. The p-value is from an Anderson-Darling test of the lognormal fit to the restore process.

III-B Poisson process of restore times with lognormal rate

The data for each event specify that there are $n$ restores in the event and that the restores start at time $r_1$. We work with the restore times relative to $r_1$; that is, $s_i = r_i - r_1$, $i = 1, \dots, n$. The first restore time relative to $r_1$, and any other simultaneous restores at $r_1$, become zero. Suppose that the first restore that occurs at a time greater than zero is $s_m$. Usually only the first restore coincides with $r_1$, so that $m = 2$.

The restore times typically happen with a rate that varies, as can be seen in the examples in Fig. 4. In particular, the rate of restores typically slows dramatically for the final restores. We model the positive restore times $s_m, \dots, s_n$ as occurring randomly in a nonhomogeneous Poisson process at a rate proportional to a lognormal distribution. In particular, given that there are $n - m + 1$ restores in the time interval $(0, \infty)$, the restore times $s_m, \dots, s_n$ are independent samples from a lognormal distribution sorted into ascending order. There are some extremely long restore times in the data (up to a year is recorded), and this is reflected in modeling the process as unbounded in time.

Let the lognormal distribution have parameters $\mu$ and $\sigma$ and probability density function

$f_{\mathrm{LN}}(t) = \dfrac{1}{t\,\sigma\sqrt{2\pi}}\,\exp\!\left[-\dfrac{(\ln t - \mu)^2}{2\sigma^2}\right], \qquad t > 0.$

Then the Poisson process rate is proportional to the probability density function:

$\text{restore rate at time } t = (n - m + 1)\, f_{\mathrm{LN}}(t) \qquad (5)$

By definition of the lognormal distribution, since the restore times are independent samples from a lognormal distribution, the natural logarithms of the restore times $\ln s_m, \dots, \ln s_n$ are independent samples from a normal distribution. The standard parameters characterizing the lognormal distribution are the mean $\mu$ and standard deviation $\sigma$ of that normal distribution. Therefore we estimate $\mu$ and $\sigma$ for each event by

$\hat\mu = \dfrac{1}{n - m + 1}\sum_{i=m}^{n} \ln s_i \qquad (6)$

$\hat\sigma = \left[\dfrac{1}{n - m}\sum_{i=m}^{n} \left(\ln s_i - \hat\mu\right)^2\right]^{1/2} \qquad (7)$

The Poisson process restore rate is proportional to the lognormal distribution as shown in (5). Then the average or expected cumulative number of restores at time $s$ after the first restore is

$\bar R(s) = (m - 1) + (n - m + 1)\,F_{\mathrm{LN}}(s) \qquad (8)$

$\bar R(s) = (m - 1) + (n - m + 1)\,\Phi\!\left(\dfrac{\ln s - \mu}{\sigma}\right) \qquad (9)$

where $F_{\mathrm{LN}}$ is the CDF of the lognormal distribution and $\Phi$ is the CDF of the standard normal distribution. Equation (8) shows that $\bar R$ increases in proportion to the CDF of the lognormal distribution, and (9) expresses $\bar R$ in terms of the parameters $\mu$ and $\sigma$. $\bar R$ approximates the restore process $R$ as shown in Fig. 3.

The lognormal model has parameters $n$, $r_1$, $\mu$, and $\sigma$. For each event, $\mu$ and $\sigma$ can be estimated from (6) and (7), and then the averaged restore process (9) approximates and describes the restore process $R$. Examples of the approximating restore curves are shown by red dashed lines in Fig. 4.
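A minimal sketch of the lognormal fit, following the estimates (6)-(7) and the reconstruction of (9) used in this text; scipy is assumed, and how the restores coinciding with $r_1$ are counted is a presentational assumption.

```python
import numpy as np
from scipy.stats import norm

def fit_lognormal_restores(restore_times):
    """Fit the lognormal restore model to one event. restore_times are in
    hours; they are shifted to be relative to the first restore, and only
    the positive relative times are used, as in section III-B."""
    r = np.sort(np.asarray(restore_times, dtype=float))
    s = r - r[0]                      # restore times relative to the first restore
    s_pos = s[s > 0]                  # positive relative restore times s_m, ..., s_n
    logs = np.log(s_pos)
    mu = logs.mean()                  # eq. (6)
    sigma = logs.std(ddof=1)          # eq. (7)
    n_zero = len(s) - len(s_pos)      # restores coinciding with r_1

    def R_bar(t):                     # averaged restore process, eq. (9); t in hours after r_1
        t = np.maximum(np.asarray(t, dtype=float), 1e-9)
        return n_zero + len(s_pos) * norm.cdf((np.log(t) - mu) / sigma)

    return mu, sigma, R_bar
```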

III-C Poisson process of restore times with exponential rate

We can substitute the exponential distribution for the lognormal distribution of subsection III-B to obtain a Poisson restore process with exponential rate. That is, given that there are $n - m + 1$ restores in $(0, \infty)$, the restore times $s_m, \dots, s_n$ are independent samples from an exponential distribution sorted into ascending order. Let the exponential distribution have time constant $\tau$ and probability density function $f_{\mathrm{E}}(t) = \tau^{-1} e^{-t/\tau}$ for $t \ge 0$. Then the Poisson process rate is

$\text{restore rate at time } t = (n - m + 1)\, f_{\mathrm{E}}(t) = \dfrac{n - m + 1}{\tau}\, e^{-t/\tau} \qquad (10)$

and the expected cumulative number of restores at time $s$ after the first restore is

$\bar R_{\mathrm{E}}(s) = (m - 1) + (n - m + 1)\,F_{\mathrm{E}}(s) \qquad (11)$

$\bar R_{\mathrm{E}}(s) = (m - 1) + (n - m + 1)\left(1 - e^{-s/\tau}\right) \qquad (12)$

where $F_{\mathrm{E}}$ is the CDF of the exponential distribution.

We estimate the exponential time constant by

$\hat\tau = \dfrac{1}{n - m + 1}\sum_{i=m}^{n} s_i \qquad (13)$

That is, $\hat\tau$ is the arithmetic mean of the positive restore times relative to $r_1$. The exponential model has parameters $n$, $r_1$, and $\tau$. For each event, $\tau$ can be estimated from (13), and then the averaged restore process in (12) approximates and describes the restore process $R$. Examples of the approximating restore curves are shown by gray dashed lines in Fig. 4.
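For comparison with the lognormal sketch above, the exponential fit reduces to a few lines; again a sketch under the same assumptions about the restores coinciding with $r_1$.

```python
import numpy as np

def fit_exponential_restores(restore_times):
    """Estimate the exponential time constant (13) for one event and return
    the averaged restore process (12). Times are in hours."""
    s = np.sort(np.asarray(restore_times, dtype=float)) - min(restore_times)
    s_pos = s[s > 0]                  # positive relative restore times
    tau = s_pos.mean()                # eq. (13)
    n_zero = len(s) - len(s_pos)
    R_bar = lambda t: n_zero + len(s_pos) * (1.0 - np.exp(-np.asarray(t, float) / tau))
    return tau, R_bar
```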

IV Duration metrics

There are many possible metrics describing durations in resilience events. This section defines and describes a variety of these metrics.

Iv-a Straightforward duration metrics

outage duration

time to first restore

restore duration

restore time to th restore

event duration

The outage process starts at the first outage and ends at so that the outage duration . The first restore is at time and the time to the first restore is . That is, quantifies how much the start of the restore process is delayed. The restore process starts at and ends at the last restore so that the restore duration . The event starts at time and ends at time . The event duration can be split into the time to the first restore and the restore duration:

(14)

This section discusses restore duration, but the corresponding metrics describing event duration are easily obtained from the metrics for restore duration by adding the time to first restore as in (14). The outage duration and time to first restore are useful metrics, but section V explains that the restore duration and the event duration suffer from high variability.

IV-B Restore metrics based on quantiles

It is of interest to quantify the time to reach a given percentage of restoration, or, equivalently, a quantile of the restore times $s_1, \dots, s_n$. There are many different definitions of quantiles ([17] analyzes 10 definitions used in statistics), and correspondingly many ways to define restore metrics based on quantiles. This subsection discusses two metrics of restore duration based on quantiles; the first metric quantizes to a restore time while the second metric interpolates between restore times.

time to first restore with at least $p$% restoration:

$\tilde T_{p\%} = s_{\lceil np/100 \rceil} \qquad (15)$

The ceiling function $\lceil x \rceil$ is the smallest integer $\ge x$. For example, $\tilde T_{95\%}$ is the time between the first restore and the first restore at which at least 95% of the restores are completed. It follows that $\tilde T_{95\%} = s_n$ for $n \le 19$, $\tilde T_{95\%} = s_{n-1}$ for $20 \le n \le 39$, and $\tilde T_{95\%} = s_{n-2}$ for $40 \le n \le 59$. These quantum jumps in $\tilde T_{95\%}$ as $n$ varies, which also occur as $p$ varies, are unsatisfactory when analyzing a range of events. This can be fixed with the following more elaborate quantile definition.

restore time to $p$% of restoration:

$T_{p\%} = s_{\lfloor h \rfloor} + \left(h - \lfloor h \rfloor\right)\left(s_{\lceil h \rceil} - s_{\lfloor h \rfloor}\right) \qquad (16)$

$\text{where } h = \min\left\{\left(n + \tfrac{1}{3}\right)\tfrac{p}{100} + \tfrac{1}{3},\; n\right\} \qquad (17)$

The ceiling function $\lceil h \rceil$ is the smallest integer $\ge h$, the floor function $\lfloor h \rfloor$ is the largest integer $\le h$, and $h - \lfloor h \rfloor$ is the fractional part of $h$.

Eqn. (16) shows that $T_{p\%}$ linearly interpolates between the restore times $s_{\lfloor h \rfloor}$ and $s_{\lceil h \rceil}$. $T_{p\%}$ uses the median-based quantile definition³, but also limits $h$ to a maximum of $n$ in (17). When the limiting applies, $T_{p\%} = s_n$.

³ implemented in R as quantile type 8, and in Mathematica by Quantile with the parameters recommended by [17]

In contrast to $\tilde T_{p\%}$, $T_{p\%}$ changes continuously as $p$ varies and has much smaller jumps as $n$ varies. For this reason, we strongly prefer $T_{p\%}$ to $\tilde T_{p\%}$.

$T_{50\%}$ evaluated with (16) reduces to the usual median. That is, letting $p = 50$,

$T_{50\%} = \begin{cases} s_{(n+1)/2}, & n \text{ odd} \\ \tfrac12\left(s_{n/2} + s_{n/2+1}\right), & n \text{ even} \end{cases} \qquad (18)$
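The interpolated quantile (16)-(17) is, apart from the cap at $n$, the Hyndman-Fan type 8 sample quantile. A short sketch, with the 1-based indices of the formulas mapped to 0-based array indices (an illustrative implementation, not the authors' code):

```python
import numpy as np

def restore_time_quantile(restore_times, p):
    """Interpolated restore time to p% of restoration, eqs. (16)-(17),
    computed from one event's restore times (hours) taken relative to the
    first restore."""
    s = np.sort(np.asarray(restore_times, dtype=float)) - min(restore_times)
    n = len(s)
    h = min((n + 1.0 / 3.0) * (p / 100.0) + 1.0 / 3.0, n)   # eq. (17)
    lo, hi = int(np.floor(h)), int(np.ceil(h))
    return s[lo - 1] + (h - lo) * (s[hi - 1] - s[lo - 1])   # eq. (16)

s_example = [0.0, 0.5, 1.2, 2.0, 3.5, 6.0, 11.0, 20.0, 45.0, 300.0]
print(restore_time_quantile(s_example, 50))   # median restore time: 4.75
print(restore_time_quantile(s_example, 95))   # 95% restoration time: 300.0 (h capped at n)
```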

IV-C Metrics related to restore process models

These metrics work with the positive restore times relative to $r_1$; that is, $s_i = r_i - r_1$, $i = m, \dots, n$.⁴

⁴ The following metric definitions require a positive restore duration $r_n - r_1 > 0$ so that $m \le n$. If $r_n = r_1$, we define the metric to be zero. Usually $m = 2$, as explained in section III-B.

geometric mean of positive restore times: $T_{\mathrm{geo}} = \left(\prod_{i=m}^{n} s_i\right)^{1/(n-m+1)} = e^{\hat\mu}$

arithmetic mean of log restore times: $\hat\mu$, given by (6)

standard deviation of log restore times: $\hat\sigma$, given by (7)

restore time to 95% restoration assuming lognormal restores: $T^{\mathrm{lgn}}_{95\%}$ satisfies $\bar R\!\left(T^{\mathrm{lgn}}_{95\%}\right) = 0.95\,n$ and hence $\Phi\!\left(\dfrac{\ln T^{\mathrm{lgn}}_{95\%} - \mu}{\sigma}\right) = \dfrac{0.95\,n - m + 1}{n - m + 1}$,
so that

$T^{\mathrm{lgn}}_{95\%} = \exp\!\left[\mu + \sigma\,\Phi^{-1}\!\left(\dfrac{0.95\,n - m + 1}{n - m + 1}\right)\right] \qquad (19)$

Note that $T^{\mathrm{lgn}}_{50\%} \approx e^{\mu} = T_{\mathrm{geo}}$.

restore time to 95% restoration assuming exponential restores: $T^{\mathrm{exp}}_{95\%}$ satisfies $\bar R_{\mathrm{E}}\!\left(T^{\mathrm{exp}}_{95\%}\right) = 0.95\,n$ and hence $e^{-T^{\mathrm{exp}}_{95\%}/\tau} = \dfrac{0.05\,n}{n - m + 1}$,
so that

$T^{\mathrm{exp}}_{95\%} = \tau\,\ln\!\left(\dfrac{n - m + 1}{0.05\,n}\right)$

The average restoring half life $\tau \ln 2$ is the average time for the number of unrestored outages to halve, averaged over the restore process assuming exponential decay.

There are variants of $T^{\mathrm{lgn}}_{95\%}$ and $T^{\mathrm{exp}}_{95\%}$ with slightly simpler formulas that describe the time to restoration of 95% of the nonzero restore times. For these variants, $\Phi^{-1}\!\left(\tfrac{0.95\,n - m + 1}{n - m + 1}\right)$ becomes $\Phi^{-1}(0.95) = 1.645$ and $\ln\!\left(\tfrac{n - m + 1}{0.05\,n}\right)$ becomes $\ln 20 \approx 3.0$. We prefer the definitions of $T^{\mathrm{lgn}}_{95\%}$ and $T^{\mathrm{exp}}_{95\%}$ above because the time to restoration of 95% of all restore times seems more straightforward.

All the duration metrics in the paper (labelled with $T$) are given in hours, so that the time unit is one hour. We now discuss the units of $\hat\mu$ and $\hat\sigma$. A more precise version of (6) is $\hat\mu = \frac{1}{n-m+1}\sum_{i=m}^{n} \ln\left(s_i/T_{\mathrm{unit}}\right)$, where $T_{\mathrm{unit}}$ is one hour. Dividing $s_i$ in hours by $T_{\mathrm{unit}}$ in hours gives the required nondimensional argument of the logarithm [20]. Changing $T_{\mathrm{unit}}$ will cause a change in the value of $\hat\mu$. $\hat\sigma$ does not depend on the units used and gives the same value for any choice of $T_{\mathrm{unit}}$.

V Discussing the restore metrics

metric | recommend? | comment | median
number of outages/restores $n$ | Yes | useful measure of event size | 13.5
outage duration $o_n - o_1$ | Yes | useful description of outage process | 2.69
outage rate $\lambda$ | Yes | useful description of outage process | 5.45
time to first restore $r_1 - o_1$ | Yes | useful description of delay in start of restores | 0.52
event duration $r_n - o_1$ | No | approximately equal to restore duration; extreme variability | 69.8
restore duration (time to last restore) $r_n - r_1$ | No | extreme variability | 69.1
restore time to $(n{-}1)$th restore $r_{n-1} - r_1$ | No | quantile metrics preferred | 31.4
first restore time with at least 95% restored $\tilde T_{95\%}$ | No | $T_{95\%}$ preferred since continuous | 55.4
restore time to 90% quantile $T_{90\%}$ | Yes | | 39.2
restore time to 95% quantile $T_{95\%}$ | Yes | $T^{\mathrm{lgn}}_{95\%}$ is an alternative | 65.2
mean of log restore times $\hat\mu$ | | $T_{\mathrm{geo}} = e^{\hat\mu}$ is recommended | 1.64
standard deviation of log restore times $\hat\sigma$ | | | 1.56
restore time to 95% with lognormal $T^{\mathrm{lgn}}_{95\%}$ | | $T_{95\%}$ slightly preferred; lognormal fit only typical | 67.7
exponential time constant $\hat\tau$ | No | exponential fit poorer; arithmetic mean of positive restores | 16.4
restore time to 95% with exponential $T^{\mathrm{exp}}_{95\%}$ | No | exponential fit poorer | 47.8
median restore time $T_{50\%}$ | | $T_{\mathrm{geo}}$ preferred | 4.27
geometric mean of restore times $T_{\mathrm{geo}}$ | Yes | best, least variable restore performance metric; also estimates median of restores | 5.15
all durations in hours, $\lambda$ in per hour
TABLE I: Summary of metrics, recommendations, and typical values

All duration metrics of the restore process are subject to substantial statistical variability that can undermine their usefulness, especially for smaller values of the event size $n$. The variabilities of the restore metrics are analyzed in section VIII by calculating the size of their confidence intervals, and only the conclusions about their variability are stated here.

The restore duration metric $r_n - r_1$ is straightforward, but it is typically too highly variable to be a reliable estimate. Moreover, it depends strongly on the last or last few restores, preventing it from describing the performance throughout the entire restore process. This dependence also makes the restore duration relate poorly to transmission performance, because these last restores may be unimportant for customers, or may be excessively delayed by factors out of the control of the utility, such as the difficulty of repairing transmission lines in the mountains in the winter or structural damage caused by a hurricane or tornado.

The geometric mean $T_{\mathrm{geo}}$ of the positive restore times is the best estimate of restore performance in terms of having the least variability. It is also clear that $T_{\mathrm{geo}}$ depends on all the restores throughout the restore process. We now discuss how $T_{\mathrm{geo}}$ also estimates a median of the restore process. Since the normal distribution is symmetrical about its mean value, the mean $\hat\mu$ also estimates the median of the normal distribution, and therefore $T_{\mathrm{geo}} = e^{\hat\mu}$ estimates the median of the lognormal distribution.⁵ In fact, $T_{\mathrm{geo}}$ is a better estimate (less variance) of the median than applying the standard formula (18) for the median. The detailed correspondence is that $T_{\mathrm{geo}}$ estimates the median of the positive restore times $s_m, \dots, s_n$, which is modestly greater than the median of all the restore times $s_1, \dots, s_n$ calculated in (18). That is, under the lognormal model, $T_{\mathrm{geo}}$ is a good estimate of the median of the positive restore times relative to $r_1$, and approximates from above the median of all restore times relative to $r_1$.

⁵ Only the symmetry of the distribution of the logarithm of the nonzero restore times relative to $r_1$ is needed here.

While $T_{\mathrm{geo}}$ is an informative metric with the lowest variability, $T_{95\%}$ and $T^{\mathrm{lgn}}_{95\%}$ can be used as more representative of the almost complete duration of the restore process, with the compromise of higher variability than $T_{\mathrm{geo}}$. $T_{95\%}$ is a more smoothly varying quantile metric indicating the 95% completion of the restore process. $T^{\mathrm{lgn}}_{95\%}$ is also smoothly varying. $T_{95\%}$ is a bit more variable than $T^{\mathrm{lgn}}_{95\%}$, particularly for small $n$. Overall, we slightly prefer $T_{95\%}$ to $T^{\mathrm{lgn}}_{95\%}$ because the quantile approach is less model dependent, whereas $T^{\mathrm{lgn}}_{95\%}$ will work best in the typical lognormal restore case.

Table I summarizes the metrics and our recommendations.

VI Typical values of metrics & model parameters

Typical values of metrics and parameters are given for all the data in Table I and for each interconnection in Table II; these values are expected to be useful for modeling and assessing interconnection-specific transmission events. Due to the heavy tails in their distributions, some quantities in Table II, such as the restore duration, have mean values that greatly exceed the median and have large standard deviations. In these cases, the estimated mean has substantial statistical variation and poorly indicates a typical value; the median is a better typical value. The large standard deviations arise from both the metric statistical variability and the metric variation between events.

Metric | Eastern: mean SD median | ERCOT: mean SD median | Western: mean SD median
number of outages/restores $n$ | 23.2 38.2 13 | 16.9 10.0 13 | 20.1 17.7 14
outage duration $o_n - o_1$ | 3.5 3.6 2.8 | 2.6 2.1 2.3 | 2.8 2.3 2.5
outage rate $\lambda$ | 7.3 8.6 5.1 | 6.5 3.7 5.2 | 24.0 99.0 6.4
time to first restore $r_1 - o_1$ | 0.78 1.07 0.53 | 1.28 1.34 0.95 | 0.65 0.80 0.43
event duration $r_n - o_1$ | 379 1088 73 | 154 227 53 | 219 494 62
restore duration $r_n - r_1$ | 379 1088 72 | 153 228 50 | 218 494 61
restore time to $(n{-}1)$th restore $r_{n-1} - r_1$ | 126 332 32 | 75 81 36 | 81 210 26
first restore time with at least 95% restored $\tilde T_{95\%}$ | 305 1000 62 | 128 204 49 | 170 438 46
restore time to 90% quantile $T_{90\%}$ | 151 471 44 | 78 83 39 | 103 262 32
restore time to 95% quantile $T_{95\%}$ | 294 945 67 | 122 182 49 | 180 442 48
mean of log restore times $\hat\mu$ | 1.68 1.17 1.76 | 1.48 1.62 2.19 | 1.23 1.11 1.10
standard deviation of log restore times $\hat\sigma$ | 1.64 0.57 1.59 | 1.56 0.59 1.67 | 1.57 0.65 1.46
restore time to 95% with lognormal $T^{\mathrm{lgn}}_{95\%}$ | 397 2740 77 | 199 327 55 | 132 362 46
exponential time constant $\hat\tau$ | 49.8 154 17.6 | 28.6 30.8 15.0 | 28.5 57.6 12.5
restore time to 95% with exponential $T^{\mathrm{exp}}_{95\%}$ | 145 449 52 | 84 90 44 | 83 167 37
median restore time $T_{50\%}$ | 15.3 65.3 4.8 | 18.1 28.8 5.3 | 5.6 6.1 2.6
geometric mean of restore times $T_{\mathrm{geo}}$ | 12.8 51.6 5.8 | 10.0 9.5 8.9 | 5.8 5.6 3.0
all durations in hours, $\lambda$ in per hour
TABLE II: Typical values of metrics by interconnection

On average, events in the Eastern interconnection are larger than in the West and ERCOT. This can be explained by the fact that the largest transmission events were caused by hurricanes, and all of these events occurred in the East. For all interconnections, the mean and median outage process durations are similar, and very short compared to the event durations. The mean outage rate $\lambda$ in the West is much higher due to several events (wildfires and a lightning storm) for which all outages started almost simultaneously. This extremely short outage duration results in huge outage rates (see (3)).

The restoration usually starts very quickly after the event starts, as the time to first restore $r_1 - o_1$ indicates. In ERCOT the average time to a first restore, 1 hour 17 minutes, is statistically significantly larger than in the East and in the West, where restoration typically starts within one hour. Overall, the time to first restore is negligible compared to the event duration; this makes the event duration and the restore process duration effectively equal. In contrast, the time between the $(n{-}1)$th and $n$th restores, $r_n - r_{n-1}$, is sizeable and often comprises a substantial share (41% on average) of the restore duration. This observation again underscores the impact of the last few restores on the event and restore durations.

The geometric mean of the positive restore times, $T_{\mathrm{geo}}$, is a simple and stable metric. $T_{\mathrm{geo}}$ is also an approximate estimate of the time to restore one half of the outages for events with lognormal restore times. The largest difference between these metrics, observed for the ERCOT events, can be attributed to the poorer lognormal fit for the ERCOT events. On average, $T_{\mathrm{geo}}$ is 12% of the entire restore process duration $r_n - r_1$.

It is interesting to compare in Table II the sample quantile restore time $T_{95\%}$ with the lognormal and exponential quantiles $T^{\mathrm{lgn}}_{95\%}$ and $T^{\mathrm{exp}}_{95\%}$. $T^{\mathrm{lgn}}_{95\%}$ often overestimates $T_{95\%}$ due to the heavy tail of the lognormal distribution, whereas $T^{\mathrm{exp}}_{95\%}$ often underestimates $T_{95\%}$ due to the light tail of the exponential distribution.

The parameters $\hat\mu$ and $\hat\sigma$ for the fitted lognormal distributions and $\hat\tau$ for the fitted exponential distributions are consistent within each interconnection and across interconnections. Table V shows that $\mu$ increases and $\sigma$ decreases with the event size $n$.

Only 23 of the 352 resilience events in the dataset are not weather-related. These 23 events vary in size from 10 to 26 outages. With one exception, the medians of the duration metrics in Table III are statistically significantly higher⁷ for weather-related events than for non weather-related events. Table III also shows for each weather type the median metrics for the 95 weather-related events with at least 18 outages. There are some statistically significant differences among the extreme weather types: several metric medians for hurricanes are greater than for the other weather types, and some for hurricanes and tornadoes are greater than for the other weather types. The means of the times to first restore are similar for all weather types except tornadoes; the mean for tornadoes is 1.7 hours, which is at least double the mean for the other weather types.

⁷ confirmed with a nonparametric one-way ANOVA test for medians [21]

Type (# cases) | $n$ | outage duration | time to first restore | event duration | $T_{\mathrm{geo}}$ | $T^{\mathrm{lgn}}_{95\%}$ | $\hat\mu$ | $\hat\sigma$
fire 3(4) | 21 | 1.51 | 0.33 | 33.4 | 2.63 | 30.8 | 0.96 | 1.89
hurricane (17) | 55 | 6.53 | 0.58 | 257 | 20.4 | 109 | 3.02 | 1.50
wind, thunder (36) | 25 | 4.04 | 0.44 | 122 | 6.75 | 82.3 | 1.90 | 1.44
tornado (15) | 24.5 | 5.04 | 0.96 | 174 | 12.7 | 93.4 | 2.54 | 1.47
winter (23) | 32 | 4.37 | 0.60 | 49.5 | 4.73 | 41.5 | 1.55 | 1.32
all weather (329) | 14 | 2.80 | 0.52 | 73.4 | 5.76 | 67.7 | 1.75 | 1.56
non-weather (23) | 11 | 1.00 | 0.65 | 19.1 | 1.10 | 19.1 | 0.09 | 1.58
all durations in hours
TABLE III: Median value of metrics by type of weather

VII Fit of Poisson process models to utility data

This section discusses the fit of the Poisson models to the observed utility data by goodness of fit tests, which allow for analysis of each of the 352 events, and by probability plots for the combined normalized data, which also show where the fit deviates. For the goodness of fit tests, there is some arbitrariness in the threshold amount of deviation corresponding to the significance level, as well as some dependence on the event size $n$, but they do give an indication of fit.

VII-A Outage process fit with uniform distribution

The Poisson process model with constant outage rate implies that for each event the outage times $o_1, \dots, o_n$ should be independent samples from a uniform distribution on the interval $[o_1, o_n]$. We evaluated the fit of these outage times for each event to the uniform distribution as shown in Table IV. Satisfying a test means that the ideal model is not rejected at the significance level $\alpha$. Table IV shows that a majority of events satisfy the model.

test (satisfies if p-value $\ge \alpha$) | all | Eastern | Western | ERCOT
percent of events satisfying uniform outages
Kolmogorov-Smirnov | 69 | 70 | 72 | 50
Cramér-von Mises | 72 | 73 | 71 | 56
Anderson-Darling | 63 | 63 | 66 | 50
percent of events satisfying lognormal restores
Kolmogorov-Smirnov | 63 | 63 | 66 | 44
Cramér-von Mises | 60 | 61 | 64 | 38
Anderson-Darling | 59 | 59 | 64 | 38
percent of events satisfying exponential restores
Kolmogorov-Smirnov | 35 | 33 | 42 | 25
Cramér-von Mises | 35 | 33 | 45 | 25
Anderson-Darling | 32 | 31 | 40 | 13
TABLE IV: Percent of events satisfying outage and restore models

The normalized outage times $(o_i - o_1)/(o_n - o_1)$, $i = 1, \dots, n$, should be independent samples from the standard uniform distribution on the interval $[0, 1]$. The fit of the normalized outage times for all of the events to the standard uniform distribution is shown by the QQ plot in Fig. 5. The fit in Fig. 5 is quite close over the middle range, and the main deviations occur at the ends of the distribution and correspond to simultaneous multiple outages recorded at the beginning or end of the outage process.⁸

⁸ While it is plausible that some outage processes start or end with outages occurring in the same minute, it is not clear that the records accurately reflect the outage timing in all these cases.

The fits of this subsection indicate that the Poisson model with constant rate is a typical case (a majority of all events) usefully approximating the outage process.
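A sketch of how such a per-event test can be run, here with the Kolmogorov-Smirnov test from scipy; normalizing by the observed first and last outage times is a simplification of whatever exact procedure the authors used, and the significance level is an assumption for illustration.

```python
import numpy as np
from scipy import stats

def outage_uniform_test(outage_times, alpha=0.05):
    """Test one event's outage times against the constant-rate model of III-A:
    the normalized outage times should be uniform on [0, 1]."""
    o = np.sort(np.asarray(outage_times, dtype=float))
    u = (o - o[0]) / (o[-1] - o[0])          # normalized outage times
    statistic, p_value = stats.kstest(u, "uniform")
    return p_value, p_value >= alpha          # (p-value, event satisfies the model?)
```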

Fig. 5: Fit of normalized outage data to standard uniform distribution on QQ plot.
Fig. 6: Fit of normalized log restore data to standard normal distribution. Above compares CDFs; below is QQ plot.
Fig. 7: Fit of normalized restore data to standard exponential distribution. Above compares log survival functions; below is QQ plot.

VII-B Restore process fit with lognormal distribution

As explained in section III-B, the Poisson process model with lognormal rate for the restores implies that for each event the restore times $s_m, \dots, s_n$ should be independent samples from a lognormal distribution. We evaluated the fit of these restore times for each event to the lognormal distribution, with parameters estimated using (6), (7), at the significance level $\alpha$ as shown in Table IV. Table IV shows that a majority of all events satisfy the model, and this also holds for the East and West interconnections.

For each event, the normalized log restore times $(\ln s_i - \hat\mu)/\hat\sigma$, $i = m, \dots, n$, should be independent samples from the standard normal distribution. The fit of the normalized log restore times for all events to the standard normal distribution is shown by the CDF and QQ plots in Fig. 6, which show a reasonably good fit with some modest deviations.

The fits described in this subsection indicate that the Poisson process model with lognormal rate is a typical case usefully approximating the restore process. The typical lognormal case has a heavy tail that can describe some extremely delayed final restores.
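Because a lognormal fit of the positive relative restore times is equivalent to a normal fit of their logarithms, a per-event check can be sketched with the Anderson-Darling normality test (one of the three tests reported in Table IV); this is an illustrative recipe, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def restore_lognormal_test(restore_times):
    """Anderson-Darling check of the lognormal restore model of III-B for one
    event: apply the normality test to the logs of the positive restore times
    relative to the first restore. Returns the statistic and its 5% critical
    value; the model is not rejected at 5% if statistic < critical value."""
    s = np.sort(np.asarray(restore_times, dtype=float)) - min(restore_times)
    logs = np.log(s[s > 0])
    result = stats.anderson(logs, dist="norm")
    idx = list(result.significance_level).index(5.0)
    return result.statistic, result.critical_values[idx]
```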

VII-C Restore process fit with exponential distribution

As explained in section III-C, the Poisson process model with exponential rate for the restores implies that for each event the restore times $s_m, \dots, s_n$ should be independent samples from an exponential distribution with time constant $\tau$. We evaluate the fit of the restore times for each event to the exponential distribution, with the time constant estimated using (13), as shown in Table IV. Table IV shows that a minority of events satisfy the model.

For each event, the normalized restore times $s_i/\hat\tau$, $i = m, \dots, n$, should be independent samples from the standard exponential distribution with time constant 1. The fit of the normalized restore times for all events to the standard exponential distribution is shown by the survival function and QQ plots in Fig. 7. There is a clear discrepancy between the exponential model and the data for the initial portion and tail of the distribution. The tail in the data is much heavier than exponential, and this discrepancy in the tail is particularly significant for our purpose here of estimating restore durations.

The fits described in this subsection indicate that the Poisson process model with exponential rate only fits a minority of the events and is a noticeably poorer approximation of the typical restore process than the model with lognormal rate.

VIII Stochastic variability of restore metrics

The restore duration metrics vary due to variation of the restore processes between events (and this of course is what we want to quantify) but also due to the inherent statistical variability of the metric used (which we want to minimize by selecting a better metric). The statistical variability makes the metric vary between events, even if the events have the same characteristics, because of random variations in the progress of the restores.

We assess the inherent statistical variability of the metrics by assuming the lognormal Poisson model with typical values of $\mu$ and $\sigma$, which vary as functions of the event size $n$ and are estimated using (6) and (7). In this section we assume that $m = 2$.

VIII-A Variability of $T_{\mathrm{geo}}$ and $T^{\mathrm{lgn}}_{95\%}$

Since $m = 2$ is assumed, $\mu$ and $\sigma$ are estimated with $n - 1$ samples. The sample mean $\hat\mu$ of $n - 1$ samples from a normal distribution with mean $\mu$ and standard deviation $\sigma$ has normal distribution $N\!\left(\mu, \sigma^2/(n-1)\right)$. Therefore $\hat\mu$ has a two-sided confidence interval with end points $\mu \pm z\,\sigma/\sqrt{n-1}$, where $z = \Phi^{-1}\!\left(\tfrac{1+\gamma}{2}\right)$ for confidence level $\gamma$ and $\Phi$ is the CDF of the standard normal distribution. It follows that the geometric mean $T_{\mathrm{geo}} = e^{\hat\mu}$ of $n - 1$ samples from a lognormal distribution with parameters $\mu$ and $\sigma$ has a two-sided confidence interval with endpoints $e^{\mu}e^{\pm z\sigma/\sqrt{n-1}}$, or

$\left[\; e^{\mu}\,e^{-z\sigma/\sqrt{n-1}},\;\; e^{\mu}\,e^{z\sigma/\sqrt{n-1}} \;\right] \qquad (20)$

We measure the size of the confidence interval (20) by the multiplicative factor $e^{z\sigma/\sqrt{n-1}}$, which we call the “multiplicative half-width” of the confidence interval. More generally, we define the size of a confidence interval with endpoints $a$ and $b$ as the multiplicative half-width

$\kappa = \sqrt{b/a} \qquad (21)$
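For the geometric mean this half-width has a closed form; a small sketch (assuming $m = 2$, so $n-1$ positive restore times):

```python
import numpy as np
from scipy.stats import norm

def halfwidth_T_geo(n, sigma, conf=0.90):
    """Multiplicative half-width (21) of the confidence interval (20) for the
    geometric mean of the restore times under the lognormal model."""
    z = norm.ppf(0.5 + conf / 2.0)
    return float(np.exp(z * sigma / np.sqrt(n - 1)))

# e.g. n = 10 with sigma = 1.72 gives a half-width of about 2.6,
# consistent with the least variable column of Table V
print(round(halfwidth_T_geo(10, 1.72), 2))
```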

Now we obtain the size of the confidence interval for $T^{\mathrm{lgn}}_{95\%}$. From (19), taking logarithms and writing $c = \Phi^{-1}\!\left(\tfrac{0.95\,n - 1}{n - 1}\right)$ for $m = 2$,

$\ln \hat T^{\mathrm{lgn}}_{95\%} = \hat\mu + c\,\hat\sigma \qquad (22)$

The sample standard deviation $\hat\sigma$ has distribution $\tfrac{\sigma}{\sqrt{n-2}}\,\chi_{n-2}$, where $\chi_{n-2}$ is the chi distribution with $n - 2$ degrees of freedom.⁹

⁹ the definition of $\hat\sigma$ in (7) uses the divisor $n - 2$ when $m = 2$, so that the number of degrees of freedom is one fewer than the number of samples $n - 1$.

Using (22) and the independence of $\hat\mu$ and $\hat\sigma$, the probability density function of $\ln \hat T^{\mathrm{lgn}}_{95\%}$ is the convolution

$f_{\ln \hat T}(x) = \int_0^{\infty} f_{\hat\mu}(x - c\,u)\, f_{\hat\sigma}(u)\, du \qquad (23)$

and the CDF of $\hat T^{\mathrm{lgn}}_{95\%}$ is

$F_{\hat T}(t) = \int_{-\infty}^{\ln t} f_{\ln \hat T}(x)\, dx \qquad (24)$

We use (24), numerically integrating to evaluate the convolution, to find the 90% confidence interval for $\hat T^{\mathrm{lgn}}_{95\%}$ as $\left[F_{\hat T}^{-1}(0.05),\, F_{\hat T}^{-1}(0.95)\right]$, then use (21) to find the multiplicative half-width of the confidence interval for $\hat T^{\mathrm{lgn}}_{95\%}$.
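The convolution can be evaluated by numerical integration; an equivalent quick check is to simulate the sampling distribution in (22) directly. A Monte Carlo sketch under the reconstruction of (19) and (22) used in this text ($m = 2$ assumed):

```python
import numpy as np
from scipy.stats import norm

def halfwidth_T95_lgn(n, mu, sigma, conf=0.90, trials=200_000, seed=0):
    """Monte Carlo estimate of the multiplicative half-width (21) of the
    confidence interval for the lognormal 95% restoration time: simulate
    mu_hat + c*sigma_hat from (22) using n-1 normal samples per trial."""
    rng = np.random.default_rng(seed)
    k = n - 1                                    # number of positive restore times
    c = norm.ppf((0.95 * n - 1) / (n - 1))       # constant in eq. (22)
    x = rng.normal(mu, sigma, size=(trials, k))  # logs of simulated restore times
    mu_hat = x.mean(axis=1)
    sigma_hat = x.std(axis=1, ddof=1)
    T95 = np.exp(mu_hat + c * sigma_hat)
    lo, hi = np.quantile(T95, [(1 - conf) / 2, (1 + conf) / 2])
    return float(np.sqrt(hi / lo))               # multiplicative half-width, eq. (21)

print(round(halfwidth_T95_lgn(10, 1.18, 1.72), 2))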

VIII-B Variability of the restore order statistics and $T_{95\%}$

Since the restore times are sorted into increasing order, $s_k$ corresponds to the $(n - k + 1)$th largest restore time and, assuming that $m = 2$ and $k \ge 2$, is the $(k - 1)$th order statistic of the $n - 1$ lognormally distributed restore times $s_2, \dots, s_n$. We evaluate in Mathematica the inverse CDF of the $(k - 1)$th order statistic of $n - 1$ samples of the lognormal distribution with parameters $\mu$ and $\sigma$. Then we find the confidence interval for $s_k$ and its multiplicative half-width from (21).

To evaluate the variability of $T_{95\%}$, we approximate its inverse CDF with the linear interpolation between the inverse CDFs of the order statistics $s_{\lfloor h \rfloor}$ and $s_{\lceil h \rceil}$, where $h$ is given by (17). We then obtain the confidence interval and use (21) to obtain its multiplicative half-width.

VIII-C Results for variability of metrics

The size of the 90% confidence interval, measured by the multiplicative half-width (21), indicates the inherent statistical variability of the metrics. For example, a multiplicative half-width of 2 indicates that the interval spans from half to double of a point inside the interval. Table V shows results for metric variability, and there are some overall trends: All the metrics become much more variable as the event size $n$ decreases. Metrics estimating a larger fraction of the entire restore duration are much more variable (consider the sequence $T_{\mathrm{geo}}$, $T^{\mathrm{lgn}}_{90\%}$, $T^{\mathrm{lgn}}_{95\%}$, or the sequence $T_{50\%}$, $T_{90\%}$, $T_{95\%}$, $r_n - r_1$). The quantile metrics ($T_{50\%}$, $T_{90\%}$, $T_{95\%}$) are always more variable than the corresponding metrics related to lognormal restore ($T_{\mathrm{geo}}$, $T^{\mathrm{lgn}}_{90\%}$, $T^{\mathrm{lgn}}_{95\%}$), but the increase in variability is modest or small for $n \ge 50$.

Metric variability is worst and unacceptably large for the restore duration $r_n - r_1$, which always has a confidence interval size of more than a factor of 2. The high variability of the last restore and of $r_{n-1} - r_1$ is expected due to the heavy tail of the lognormal distribution. Fig. 8 shows that the variability of the order statistics is sharply reduced for the $(n{-}1)$th restore, at least for larger $n$, and further reduced for the $(n{-}2)$th restore. This motivates avoiding $r_n - r_1$ and considering the use of $T_{90\%}$, $T_{95\%}$, $T^{\mathrm{lgn}}_{90\%}$, and $T^{\mathrm{lgn}}_{95\%}$, which have confidence intervals with size less than a factor of 2 for $n \ge 50$ and which perform more continuously by interpolation.

The pervasive problem of duration metric variability is best mitigated by $T_{\mathrm{geo}}$, which has a confidence interval size of less than a factor of 2 for $n \ge 20$.

This section assesses metric variability assuming the lognormal model of restores. This is a good assumption for a majority of cases, and can be regarded as a stringent assumption for the remaining minority of cases due to the heavy tail of the lognormal distribution.

$n$ | $\mu$ | $\sigma$ | $T_{\mathrm{geo}}$ | $T_{50\%}$ | $T^{\mathrm{lgn}}_{90\%}$ | $T_{90\%}$ | $T^{\mathrm{lgn}}_{95\%}$ | $T_{95\%}$ | $r_{n-1}-r_1$ | $r_n-r_1$
10 | 1.18 | 1.72 | 2.57 | 3.19 | 3.56 | 5.09 | 4.29 | 5.40 | 3.83 | 5.40
20 | 1.60 | 1.58 | 1.82 | 2.10 | 2.24 | 2.76 | 2.51 | 3.69 | 2.85 | 3.93
50 | 2.20 | 1.35 | 1.37 | 1.49 | 1.54 | 1.72 | 1.63 | 1.96 | 2.14 | 2.79
100 | 2.52 | 1.35 | 1.25 | 1.32 | 1.35 | 1.46 | 1.41 | 1.60 | 1.98 | 2.56
200 | 3.15 | 1.33 | 1.17 | 1.21 | 1.23 | 1.30 | 1.27 | 1.39 | 1.85 | 2.37
TABLE V: Multiplicative half-width of the 90% confidence interval for each metric; $\mu$ and $\sigma$ are the assumed typical lognormal parameters for each event size $n$
Fig. 8: Size of the 90% confidence interval for the $k$th order statistic as the fraction $k/n$ varies. $n$ is the number of restores. Confidence interval size is the multiplicative half-width. Lognormal restores are assumed with parameters $\mu$ and $\sigma$ from Table V.

IX Conclusions

We use extensive North American transmission system data to analyze the statistical variability and interpretations of a variety of metrics for the duration of processes in resilience events. Some metrics, such as the outage duration $o_n - o_1$, the outage rate $\lambda$, and the time delay before the first restore $r_1 - o_1$, are useful. Other duration metrics can suffer from excessive statistical variability, in which their estimated values are contained in confidence intervals that are so large that the estimated values of the metric are not representative. This variability is quantified using new Poisson models for outage and restore processes. The variability is worse for small events.

The apparently straightforward metrics of the restore process duration $r_n - r_1$ and the event duration $r_n - o_1$ are extremely statistically variable and do not adequately describe the restore process, so we recommend new duration metrics $T_{\mathrm{geo}}$ and $T_{95\%}$ (or $T^{\mathrm{lgn}}_{95\%}$) with better performance. In particular, the geometric mean of restore times $T_{\mathrm{geo}}$ has the least statistical variability, summarizes all of the restore process, and approximates a time at which half the restores are completed. The quantile-based metric $T_{95\%}$ indicates the time at which restoration is 95% complete, but has greater variability than $T_{\mathrm{geo}}$. $T_{95\%}$ uses interpolation to vary more continuously as the data changes. Table I summarizes the metrics and their recommendations, and Tables II and III give typical values for the metrics for three interconnections and different weather conditions.

We introduce novel Poisson process models for the outage and restore processes in resilience events. These new stochastic models describe how resilience events progress in North American transmission systems, and are verified with extensive utility data to be good approximations for the majority of cases. The outages occur uniformly over a short time interval, whereas the restores occur at a lognormal rate that slows to produce the long delays often observed for the last few restores. The lognormal model for the restores is a noticeably better fit than an exponential model for the restores. We give typical values of the model parameters for three interconnections and for different weather conditions to make the new models more specific and useful to other researchers.

The Poisson process models describe probabilistic outages and restores occurring according to specified rates. Averaging the Poisson process models produces formulas for smooth, deterministic curves that approximate typical outage and restore processes. These deterministic averaged models are also of considerable interest in describing how resilience events progress in transmission systems.

References

  • [1] A. Stankovic et al., Methods for analysis and quantification of power system resilience, IEEE PES working group publication submitted to IEEE Trans. Power Systems (currently responding to first review).
  • [2] M. Panteli, D.N. Trakas, P. Mancarella, N.D. Hatziargyriou, Power systems resilience assessment: hardening and smart operational enhancement, Proc. IEEE, vol. 105, no. 7, July 2017, pp. 1202–1213.
  • [3] C. Nan, G. Sansavini, A quantitative method for assessing resilience of interdependent infrastructures, Reliab. Eng. Syst. Safety, vol. 157, Jan. 2017, pp. 35–53.
  • [4] S. Poudel, A. Dubey, A. Bose, Risk-based probabilistic quantification of power distribution system operational resilience, IEEE Systems Journal, vol. 14, no. 3, Sept. 2020, pp. 3506–3517.
  • [5] D.A. Reed, K.C. Kapur, R.D. Christie, Methodology for assessing the resilience of networked infrastructure, IEEE Systems Journal, vol. 3, no. 2, June 2009, pp. 174–180.
  • [6] Y. Wei, C. Ji, F. Galvan, S. Couvillon, G. Orellana, J. Momoh, Non-stationary random process for large-scale failure and recovery of power distribution, Applied Mathematics, vol. 7, no. 3, 2016, pp. 233–249.
  • [7] M.R. Kelly-Gorham, P.D.H. Hines, K. Zhou, I. Dobson, Using utility outage statistics to quantify improvements in bulk power system resilience, Electric Power Systems Research, vol. 189, 106676, Dec. 2020.
  • [8] N.K. Carrington, I. Dobson, Z. Wang, Extracting resilience metrics from distribution utility data using outage and restore process statistics, IEEE Trans. Power Systems, vol. 36, no. 2, Nov. 2021, pp. 5814-5823.
  • [9] C.J. Zapata, S.C. Silva, H.I. Gonzalez, O.L. Burbano, J.A. Hernandez, Modeling the repair process of a power distribution system, IEEE/PES T&D Conf. & Exp.: Latin America, Bogotá, Colombia, 2008.
  • [10] H. Li, L.A. Treinish, J.R.M. Hosking, A statistical model for risk management of electric outage forecasts, IBM Journal Research and Development, vol. 54, no. 3, paper 8, May/Jun. 2010.
  • [11] A. Jaech, B. Zhang, M. Ostendorf, D.S. Kirschen, Real-time prediction of the duration of distribution system outages, IEEE Trans. Power Systems, vol. 34, Jan. 2019, pp. 773–781.
  • [12] H. Liu, R.A. Davidson, T.V. Apanasovich, Spatial generalized linear mixed models of electric power outages due to hurricanes and ice storms, Reliab. Eng. Syst. Safety, vol. 93, no. 6, 2008, pp. 897–912.

  • [13] S. Ekisheva, R. Rieder, J. Norris, M. Lauby, I. Dobson, Impact of extreme weather on North American transmission system outages, IEEE PES General Meeting, Washington DC USA, July 2021.
  • [14] S. Ekisheva, I. Dobson, J. Norris, R. Rieder, Assessing transmission resilience during extreme weather with outage and restore processes, Probability Methods Applied to Power Sys., Manchester UK, June 2022.
  • [15] NERC, 2021 State of reliability, An assessment of 2020 bulk power system performance, July 2021. Available: www.nerc.com.
  • [16] NERC, 2022 State of reliability, An assessment of 2021 bulk power system performance, July 2022. Available: www.nerc.com.
  • [17] R.J. Hyndman, Y. Fan, Sample quantiles in statistical packages, The American Statistician, vol. 50, no. 4. November 1996, pp. 361-365.
  • [18] E. Parzen, Stochastic Processes, Dover NY, 2015.
  • [19] S.M. Ross, Introduction to Probability Models, 9th ed., Academic Press, MA 2007.
  • [20] C.F. Matta, L. Massa, A.V. Gubskaya, E. Knoll, Can one take the logarithm or the sine of a dimensioned quantity or a unit? Dimensional analysis involving transcendental functions, J. Chemical Education, vol. 88, no. 1, January 2011.
  • [21] M. Hollander, D.A. Wolfe, Nonparametric Statistical Methods, 2nd ed., John Wiley, 1999.