1 Introduction
With the rollout of advanced metering infrastructure (AMI) [AR5], the power system is undergoing rapid evolution. The Office of Gas and Electricity Markets (Ofgem) has announced that the UK plans to install more than 50 million smart meters by 2020 [AR6]. On the one hand, the installation of smart meters enables real-time information exchange between power suppliers and end-users, and therefore increases the efficiency of the electric power supply and encourages smart energy applications such as demand response and demand side management [Areview1]. On the other hand, high-temporal-resolution energy consumption data, coupled with intermittent energy resources such as wind energy, make electricity demand present unprecedented diversity and complexity.
Different electricity generation units have been adopted in power plants to meet specific electrical demand/load types. Among all the units, peak load units have the lowest efficiency and the highest cost. It is estimated that a 5% to 15% reduction in peak load would bring substantial benefits in saving resources and decreasing real-time electricity tariffs [reason6], which calls for effective peak load management strategies. To realize that, it is necessary to accurately predict the magnitude and occurring time of peak load/demand, which not only gives power plants sufficient start-up time to avoid grid congestion but is also fundamental to ensuring the economic benefits, security, and stability of the power grid. The increasing penetration of large-scale intermittent energy such as wind and solar, as well as energy storage power stations, has given rise to new characteristics of peak loads and made peak load/demand forecast a more challenging task.
In such a context, it is evident that accurate peak load demand forecast becomes an essential element of power grid operations [reason10]. Although optimizing smart grid operation based on standard load forecast has been a long-held principle [reason4], the new digital and smart grid era calls for more attention to building flexible peak load forecast frameworks that adapt to the rapid development of the power system.
1.1 Motivation and contributions of this review
Unlike units that generate power continuously and stably, peak load power plants only run for a short time over a year, which is neither economical nor environmentally friendly. Peak load management strategies were therefore proposed to reduce peak load generation costs based on incentive and punishment mechanisms and programs, such as interruptible load control, demand-side bidding, and emergency demand response [saebi2010demand] [cappers2010demand]. Moreover, it is important to know the future peak load demand reasonably well in order to plan and trigger relevant peak demand management strategies and mechanisms. Therefore, accurate and reliable peak load forecast is crucial for materializing any peak demand management strategy.
In general, a mature peak load forecast can help the system operator to manage the peak load demand effectively in advance, and thus to achieve demand response that helps reduce greenhouse gas emissions and decrease reliance on nonrenewable fuel. Moreover, the ultimate goal of peak demand management and forecast is to balance electricity supply and demand to maximize the benefits of the system. Therefore, to further highlight the motivation of the peak demand forecast, TABLE 1 lists key stakeholders (grid operators, electricity retailers, electricity end-users, government) and the impact of peak load demand forecast on them [koh2015evaluating] [sardi2017multiple] [mao2009short].
Electricity market stakeholders | Importance of peak load demand forecast
Grid operators |
Electricity retailers |
End-users | Save electricity bills and improve their living standard
Government |
Based on the above analysis, the objective of this review is to provide a clear and comprehensive overview of the peak load demand forecast methods in the literature. To the best of our knowledge, this should be a significant review on the topic of peak load forecast. In particular, the key contributions of this review are described as follows.

We give precise definitions of key factors relevant to the peak load forecast framework, which could serve as a useful standardization and guidance for future research in this area.

We conduct a thorough review of peak load demand forecast methods and explore hybrid forecast models from a historical and systematic point of view.

We provide a comparative analysis based on existing studies and discuss potential methods for improving peak load demand forecast. A comprehensive summary of the application scopes of the reviewed methods is also presented, which could provide useful insights for future research directions.
1.2 Literature retrieval strategy
Before a detailed overview of peak load demand forecast methods, a necessary initial step is to follow the standard criteria and protocols to select highly related and high-quality sources and publications.
The literature retrieval databases selected are ScienceDirect (SD) and the Institute of Electrical and Electronics Engineers (IEEE). SD is a well-known academic database provided by Elsevier, in which more than a billion articles are downloaded every year, making it the most downloaded academic search platform among academic databases [SD]. IEEE publishes a wide range of peer-reviewed journals, and the criteria it defines are recognized as authoritative in the field of electrical power analysis [7IEEE].
The following key phrases are used during the literature retrieval process (search range of publication years: 1872-2020):

Peak load forecasting/estimation/prediction

Peak load demand forecast/estimation/prediction

Maximum load forecasting/estimation/prediction
The keywords in each key phrase utilize the Boolean operator ’AND’; each key phrase is connected with the Boolean operator ’OR’.
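The Boolean combination described above can be sketched as a small query builder. This is only an illustration: the `KEY_PHRASES` structure and the `build_query` helper are hypothetical names, and the exact query syntax accepted by each database is an assumption.

```python
from itertools import product

# Hypothetical encoding of the key phrases listed above: each inner list
# holds the alternatives for one keyword slot of a phrase.
KEY_PHRASES = [
    [["peak"], ["load"], ["forecasting", "estimation", "prediction"]],
    [["peak"], ["load"], ["demand"], ["forecast", "estimation", "prediction"]],
    [["maximum"], ["load"], ["forecasting", "estimation", "prediction"]],
]


def build_query(key_phrases):
    """Join keywords inside a phrase with AND and join phrases with OR."""
    clauses = []
    for slots in key_phrases:
        # Expand the slash-separated alternatives into individual phrases.
        for words in product(*slots):
            clauses.append("(" + " AND ".join(f'"{w}"' for w in words) + ")")
    return " OR ".join(clauses)


query = build_query(KEY_PHRASES)
print(query)
```

Each of the three listed lines expands into three phrases, so the resulting query contains nine AND-clauses joined by OR.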
After excluding low-relevance articles without key phrases in the title and abstract, a total of 139 highly related and high-quality papers form the basis of this review through a preliminary analysis. The obtained studies consist of 67 journal papers and 72 conference papers. The subsequent discussions of peak load demand forecast are all based on the literature obtained in this section.
1.3 Systematic overview of literature based on time line
To understand the historical development trend of peak load demand forecast, it is important to follow the timeline to conduct a systematic review. Figure 1 shows the number of total publications and journal publications published every year from 1956 to 2020.
Built on the exponential trend of obtained publications, the exploration of peak load forecast can be roughly categorized into three stages following the timeline: the initial stage, the developing stage, and the developed stage. The initial stage was from 1956 to 1990, during which the research on peak load demand forecast was in its infancy with a small number of publications. The developing stage was from 1991 to 2003, during which the number of publications began to increase gradually, with three or four articles published every year. The developed stage is from 2004 to 2020, with a large number of publications on peak load forecast.
The large number of publications over the past decade reveals an increasing interest in peak demand forecast. This could be explained by the fact that electricity consumption increases with economic development. As a result, peak load forecast becomes increasingly important for safe and reliable energy systems operation. Moreover, considering the increasing integration of modern and clean energy technologies such as electric vehicles (EVs) and wind energy, peak demand forecast will become more challenging and research interest in the topic will continue to grow in the future.
It should be noted that the number of journal publications on peak load forecast each year over the last decade is usually within the range of 1 to 4, which may indicate that research efforts are still insufficient but, on the other hand, also indicates more research opportunities. This paper provides a timely review on the important topic of peak load forecast with a clear definition of the research problem, a comprehensive review of existing methods, and a comparative and forward-looking analysis of future research directions.
1.4 Structure of the review
The remainder of this paper is organized as follows: Section 2 provides comprehensive summaries and precise definitions for the peak load forecast problem including the forecast period, influential variables, general outputs, and evaluation metrics. Section 3 describes peak load demand forecast methods following the timeline by dividing them into the manual/human expert stage, the classic peak load demand forecast stage, and the advanced peak load demand forecast stage. Section 4 first gives a comparative analysis and explores possible methods for improving the peak load demand forecast framework. Then, a comprehensive summary of existing studies on the peak load forecast is presented. In Section 5, a conclusion is given and possible future research directions are discussed.
2 Peak load demand forecast problem definition
A general peak load demand forecast framework is shown in Figure 2. Intuitively, the general framework for peak load demand forecast is similar to that of standard load forecast. However, peak load demand forecast has its particularity when it comes to specific sub-processes, such as input variables and output results. To the best of our knowledge, many terms that have been defined in the standard load forecast have not been well defined in the peak load demand forecast, which leads to different understandings of the same terms in different studies. To this end, for the first time, this paper provides a unification of relevant terms to accurately define peak load demand forecast methods and to provide generalized guidance for future research on this topic.
The following subsections will first summarize the commonly used time horizons for short-term, medium-term, and long-term peak load demand forecast according to the reviewed literature. Secondly, influential variables used in peak load demand forecast models will be discussed. Thirdly, the outputs of the forecasting model will be summarised. Finally, special evaluation indicators for peak load demand forecast results are presented.
2.1 Peak load demand forecast time period
Although the forecast horizon of standard load forecast is well known, there is no such summary for peak load demand forecast. Therefore, by analysing the reviewed literature, we classify the time horizon of peak load demand forecast into the following categories:

Short-term peak load demand forecast (STPLF), to forecast peak load from several hours to days ahead (days $\leq 7$) [462005Probabilistic, 56newd2006Developed, 702008Density, 816121762].

Medium-term peak load demand forecast (MTPLF), to forecast peak load from a week to months ahead (months $\leq 12$) [46news2005Short, 46new1, 816121762].

Long-term peak load demand forecast (LTPLF), to forecast peak load more than a year ahead [46news2005Short, 46new1, 702008Density, 816121762].
It is worth noting that based on the reviewed papers: 1) daily peak load demand forecast is mainly studied among STPLF; 2) weekly and monthly peak load demand forecast is mainly studied among MTPLF; and 3) annual peak load demand forecast is mainly studied for LTPLF.
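As a minimal sketch of this classification, the function below maps a forecast lead time to one of the three categories. The function name `classify_horizon` and the exact boundaries (7 days and 365 days) are assumptions made for illustration; the reviewed studies do not draw the boundaries identically.

```python
def classify_horizon(days_ahead):
    """Map a forecast lead time in days to STPLF, MTPLF, or LTPLF.

    The thresholds are illustrative assumptions: up to 7 days is treated
    as short-term, up to 365 days as medium-term, and beyond as long-term.
    """
    if days_ahead <= 0:
        raise ValueError("lead time must be positive")
    if days_ahead <= 7:
        return "STPLF"
    if days_ahead <= 365:
        return "MTPLF"
    return "LTPLF"


print(classify_horizon(1))    # daily peak, one day ahead
print(classify_horizon(30))   # monthly peak, one month ahead
print(classify_horizon(730))  # annual peak, two years ahead
```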
2.2 Influential variables of peak load demand forecast
2.2.1 Endogenous variables
The endogenous variables used in peak load demand forecast differ from those used in standard load forecast. For example, assume that the training data are hourly load consumption for one year. A standard load demand forecast model will use the hourly load data, i.e. $24 \times 365 = 8760$ data points, as the endogenous variables. However, a peak load demand forecast model will use the daily peak load value (sometimes also with the daily peak time), i.e. 365 data points (365 data pairs if the daily peak time is included), as the endogenous variables.
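The reduction from hourly load data to daily peak pairs can be sketched as follows. The helper name `daily_peaks` and the synthetic load profile are assumptions used only for illustration.

```python
from datetime import datetime, timedelta


def daily_peaks(hourly_load):
    """Reduce (timestamp, load) samples to one (peak value, peak time) pair per day."""
    peaks = {}
    for stamp, load in hourly_load:
        day = stamp.date()
        # Keep the largest load seen so far for this calendar day.
        if day not in peaks or load > peaks[day][0]:
            peaks[day] = (load, stamp)
    return peaks


# Synthetic hourly data for one non-leap year; the load peaks at 18:00 every day.
start = datetime(2021, 1, 1)
hourly = []
for h in range(24 * 365):
    stamp = start + timedelta(hours=h)
    hourly.append((stamp, 100 - abs(stamp.hour - 18)))

peaks = daily_peaks(hourly)
print(len(hourly), len(peaks))
```

In this example the endogenous input shrinks from one value per hour to one pair per day, which is the dimensionality reduction that distinguishes peak load demand forecast from standard load forecast.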
The endogenous variables used by peak load demand forecast models are generally peak load data in similar days, which can often reflect the internal structure similar to the peak load in the forecast period, making it easier for the algorithm to capture the characteristics of the predicted target. [1262019Deep] proposed novel algorithms to identify the recent days that are similar to the days before the forecast, and the peak load close to the predicted date is then deduced as historical training data by analogy with the rule of thumb to improve the prediction accuracy.
Furthermore, since the input data only need the peak load in a specific period, the input variable dimension of peak load demand forecast is much lower than that of standard load forecast. The advantage of this is reflected in the high computational efficiency of peak load demand forecast model. [30Amjady2001Short] compared both the number of input features and the computation time of hourly load forecast and daily peak load forecast based on the same historical data. The statistical analysis showed that only six input features were needed for the daily peak load demand forecast, while 171 input features were necessary for the hourly load forecast.
2.2.2 Exogenous variables
TABLE 2 summarizes the exogenous variables that are frequently used in peak load demand forecast models.
Detail  











The selection of exogenous variables for peak load demand forecast models is similar to, yet different from, that for standard load forecast. According to the table, the input variables of peak load forecast are similar to those of load prediction on the macro level, namely temperature, humidity, etc. Moreover, the selection of input variables is closely related to the forecasting period, which is also similar to standard load forecast. The commonly used variables of STPLF are weather and calendar factors. For MTPLF and LTPLF, in addition to the weather and calendar variables, it is necessary to capture socioeconomic development and population growth trends. Besides, the acquisition method and accuracy of long-term weather data are also thorny problems that must be considered carefully for MTPLF and LTPLF.
On the other hand, the difference between peak load demand forecast and standard load forecast is that, since the prediction target is a series of extreme values under most conditions, the variables that are most closely related to a peak load demand forecast model are the extreme variables that indicate the degree of change in the weather, such as the maximum and minimum temperature. In addition, some weather variables are internally related and can affect each other. For example, [564075966] pointed out that high relative humidity in months with apparent seasonality (summer/winter) would lead to an increase in demand for refrigeration or heating, thus affecting the forecast accuracy of peak load demand. Therefore, in their study, relative humidity was quantified as a temperature change to correct the inaccurate input variables, which significantly improved the forecasting accuracy of peak load demand.
Calendar variables have a significant influence on areas with rare special events and regular holidays. In [362002Artificial], the influence of lunar calendar festivals in Egypt on peak load was considered, and the influence of Ramadan was quantified as a weight factor and input into the expert system. The prediction results showed that models considering special festivals had better performance than others. Moreover, electricity consumption of the commercial and industrial sectors on weekends and holidays differs considerably from that on working days, and most of the time the peak load may not even occur in these sectors during non-working days. Therefore, some of the reviewed works modeled historical data separately based on these calendar factors to improve the forecast performance. [26news1997Short] trained models separately for each hour of a day, and weekends and weekdays were also considered as criteria for model training, which resulted in 48 independent models to predict the morning peak and afternoon peak in a day. This time-division modeling method distinguishes between working days and non-working days, which significantly overcomes the defect of the traditional model in predicting peak load on weekends.
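The time-division idea can be sketched by partitioning the training samples into 24 x 2 = 48 groups, one per combination of hour and day type, so that a separate model can be fitted on each group. The helper names are hypothetical, and treating only Saturday and Sunday as non-working days (ignoring public holidays) is a simplifying assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def model_key(stamp):
    """Return the (hour, day type) group a sample belongs to."""
    # Assumption: Saturday and Sunday are the only non-working days.
    day_type = "weekend" if stamp.weekday() >= 5 else "weekday"
    return (stamp.hour, day_type)


def partition_samples(samples):
    """Group (timestamp, load) samples by hour of day and day type."""
    groups = defaultdict(list)
    for stamp, load in samples:
        groups[model_key(stamp)].append(load)
    return groups


# Two weeks of synthetic hourly samples starting on a Monday.
start = datetime(2021, 1, 4)
samples = [(start + timedelta(hours=h), 50.0 + h % 24) for h in range(24 * 14)]
groups = partition_samples(samples)
print(len(groups))
```

Each group would then feed its own forecasting model; the fitting step is omitted here because it depends on the chosen method.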
2.3 Outputs of peak load demand forecast
The main difference between peak load demand forecast and standard load forecast is that the output is usually one value or a pair of values (e.g. a peak load value with its occurring time/date), whereas the output of standard load forecast is generally a set of load values (time series). Existing studies have not given a unified definition for the output of the peak load demand forecast. By reviewing relevant literature, the outputs of peak load forecast are summarized as follows:

Forecast the total peak consumption on a given peak day [1087893595].

Forecast load usage pattern during a given peak period [622008Electricity].

Forecast peak time [1288791587].

Forecast peak value [73aaGOIA2010700].

Forecast peak value and its occurring time simultaneously [85aab2012Building].

Forecast peak value and forecast its occurring time separately [20].

Forecast peak (or together with valley) value as an additional input to produce load profile [26newc1997Cascaded].
2.4 Evaluation indicators for peak load demand forecast
The evaluation indicators of peak load demand forecast models are partly the same as those of standard load forecast models, such as mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE). Since these indicators are well known in standard load forecast, this section will not describe them in detail. Instead, to highlight the particularity of peak load demand forecast accuracy metrics compared with standard load forecast, this section lists the special evaluation indicators used in the existing studies.

Evaluation indicators for peak value
Assuming that $\hat{y}$ is the predicted peak value, $y$ is the actual peak value, and $N$ is the number of the training samples:
(1) (2)
Peak absolute percentage error (PAPE) [85aab2012Building] is defined as:
$$\mathrm{PAPE} = \frac{|\hat{y} - y|}{y} \times 100\% \tag{3}$$
The range of PAPE is $[0, +\infty)$. A PAPE of 0% represents a perfectly trained model, while a PAPE greater than 100% indicates an unacceptable model.

Evaluation indicators for peak occurring time
Assuming that $\hat{t}_i$ is the predicted peak occurring time, $t_i$ is the actual peak occurring time, and $N$ is the number of the training samples:
(4) (5)
By using $\delta$ to represent the tolerance residual for the peak occurring time, and $h_i$ as a flag to represent whether the predicted occurring time hits the tolerance interval $[t_i - \delta, t_i + \delta]$, the hit rate (HR) [85aab2012Building] is defined as:
$$\mathrm{HR} = \frac{1}{N} \sum_{i=1}^{N} h_i \times 100\% \tag{6}$$
where
$$h_i = \begin{cases} 1, & |\hat{t}_i - t_i| \leq \delta \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
The peak time forecasting is usually measured by HR [85aab2012Building][20], which specifies a period before and after the peak load occurring as the forecast error tolerance range. The prediction is considered to be correct as long as the predicted time falls within the tolerance interval.
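The two peak-specific indicators described in this section can be sketched in a few lines. This is an illustrative reading of the definitions rather than the reference implementation of [85aab2012Building]: the function names are hypothetical, times are assumed to be expressed in hours, and the hit rate is assumed to be reported as a percentage.

```python
def pape(predicted_peak, actual_peak):
    """Peak absolute percentage error of a single peak value, in percent."""
    return abs(predicted_peak - actual_peak) / actual_peak * 100.0


def hit_rate(predicted_times, actual_times, tolerance):
    """Percentage of predicted peak times within +/- tolerance of the actual times."""
    hits = [
        1 if abs(p - a) <= tolerance else 0
        for p, a in zip(predicted_times, actual_times)
    ]
    return sum(hits) / len(hits) * 100.0


# A predicted peak of 110 against an actual peak of 100 gives a PAPE near 10%.
error = pape(110.0, 100.0)

# Peak hours predicted for three days, with a one-hour tolerance.
rate = hit_rate([18, 17, 20], [18, 18, 18], tolerance=1)
print(error, rate)
```

In this example two of the three predicted peak times fall inside the tolerance interval, so the hit rate is two thirds expressed as a percentage.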
3 Peak load demand forecast methods
Traditional forecasting methods can be roughly divided into qualitative and quantitative analysis [730]. Qualitative analysis refers to the use of expert opinions to develop theoretical insights for prediction, such as curve fitting and extrapolation techniques, on the premise that historical data are not available or technical experiments are not feasible [730]. Quantitative analysis, on the other hand, presumes historical data are available and the future development of data still follows the changing trend of historical data within a reasonable range. In quantitative analysis, mathematical or statistical methods are usually used.
Following the literature obtained in Section 1.2, the number of publications for each method is summarized in Figure 3 while Figure 4 shows the timeline of each method being first used for peak load forecast. Finally, the peak load demand forecast methods are categorized according to the development timeline and types in Figure 5.
According to Figures 3, 4 and 5, the development of peak load demand forecast methods can be broadly classified into three stages: the manual/human expert stage starting in the late 1950s, the classic stage starting in the early 1970s, and the advanced stage starting in the early 1990s. There are few studies in the manual/human expert stage. In the classic stage, regression is the most popular method for peak load demand forecast, followed by time series decomposition, stochastic time series models, and exponential smoothing. In the advanced stage, artificial neural network (ANN) based methods are the most favored choice.
In the following we will provide a detailed review of methods in each stage.
3.1 Manual/Human expert peak load demand forecast stage
A manual peak load demand forecast method that transforms the forecasted weather into daylight illumination parameters was proposed in [11956A]. The obtained results were then combined with the peak load demand table to calculate the peak load increment, which was then applied to load curves to forecast daily peak load demand one day ahead. However, before 1971, most studies estimated peak load demand based on simple calculations of variable relations or human experts’ opinions. Most of the results predicted by over-relying on calculation and human experience were unsatisfactory, due to the special characteristics of peak load demand.
3.2 Classic peak load demand forecast stage
3.2.1 Regression
Regression analysis determines model parameters by fitting a function expression to the historical data, and thus obtains the causal relationship between the explanatory variables $x_1, \ldots, x_p$ and the response variable $y$ (peak loads) to produce peak load demand forecast [8Turner2012Regression]. Mathematically, a general regression model can be represented by Eq. (8):
$$y = f(x_1, \ldots, x_p; \boldsymbol{\beta}) + \varepsilon \tag{8}$$
where $\boldsymbol{\beta}$ represents the model coefficients to be learned from data and $\varepsilon$ denotes the residual left unexplained by this model.
The above model is said to be univariate regression when $y$ describes a univariate random variable; otherwise, it is multivariate regression. Besides, if there is only one explanatory variable (i.e., $p = 1$), the model is simple regression; otherwise, it is multiple regression. A parametric regression model is based on a known form of $f$, in which the parameters $\boldsymbol{\beta}$ need to be estimated. When there is a linear relationship between the parameters $\boldsymbol{\beta}$ and the explanatory variables $x_1, \ldots, x_p$, the model is said to be linear; otherwise, the model is nonlinear [8Turner2012Regression].
Most of the obtained papers used univariate regression and selected multiple explanatory variables to get precise forecast results. [9a1990A] proposed multiple regression-based approaches that took calendar effects into account to forecast short-term system load. The approach produced forecasts using four models: an initial peak forecast regression model, an initial hourly forecast regression model, an adjusted peak model, and an adjusted hourly model. All four models were based on regression, and the forecasted initial peak load and the maximum hourly load were combined with past errors in the adjusted peak model to produce the new peak forecast. Then the new peak forecast was used as a constraint in the adjusted hourly model to produce the final hourly forecasts. The proposed model was more flexible in handling the effects of special days and avoided causing iterative residuals for multiple-day forecasts since past forecast errors were considered as one of the influential variables. A regression-based model was proposed in [946934706], which considered econometric effects such as GDP, consumer price index, and population as extra explanatory variables to perform LTPLF for Zimbabwe. [28newr1999reg] included a new variable, average wind chill, for the winter season, and considered the holidays’ effects by using transformation and reflection techniques to produce better daily peak forecasts. [192002Regression] adopted a multiple regression model to linearise the load trend for Tokyo Electric Power Company. The model is simple and has a promising peak load demand forecast performance with a MAPE of 1.43%. However, the performance of the model on a dataset with more fluctuating load patterns was not satisfactory.
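To make the regression setting concrete, the sketch below fits the simplest case of the general model, a simple linear regression with one explanatory variable, using ordinary least squares. The choice of daily maximum temperature as the explanatory variable, the synthetic noiseless data, and the function names are all assumptions made for illustration; the reviewed studies use multiple explanatory variables.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares estimates for y = b0 + b1 * x + residual."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Forecast the peak load for a new value of the explanatory variable."""
    return intercept + slope * x_new


# Synthetic data: daily maximum temperature and a noiseless daily peak load.
max_temp = [28.0, 30.0, 32.0, 34.0, 36.0]
peak_load = [500.0 + 12.0 * t for t in max_temp]

b0, b1 = fit_simple_regression(max_temp, peak_load)
forecast = predict(b0, b1, 33.0)
print(b0, b1, forecast)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients; with real load data the residual term would be non-zero.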
A few papers utilize probabilistic forecasts in the regression framework for peak load demand forecast. Instead of providing a point estimate of the peak load demand, a probabilistic forecast predicts distribution intervals, through which the uncertainty inherent in the demand can be quantified. [702008Density] presented a new methodology to forecast annual and weekly peak load demand. This method adopted semiparametric additive models to estimate the correlations between predictors and the peak load demand. As in [946934706], weather, calendar, economic, and population variables were considered in the paper, and the results showed a remarkable improvement in the forecast accuracy. [902014Long] modeled monthly peak load demand as a function by considering the monthly peak time and the monthly peak load as two key variables. In this paper, the Gaussian process was used to forecast peak load demand. It also proposed a method to optimize the hyperparameters in the kernel function, which was vital to improving forecast accuracy. [952014Non] utilized Alternating Conditional Expectation (ACE) to model hourly peak load during a month. Unlike most multiple linear regression models, the seasonal and trend components in this model did not require a priori decomposition, and the nonparametric transformed functions could be obtained through ACE. In the model, weather variables such as temperature and humidity were analyzed and used as input to perform a probability density peak load demand forecast.
Regression models are also often combined with other techniques to improve forecast accuracy. [27newdaaHaida1998Peak] extended the model in [192002Regression] by using trend cancellation and estimation techniques to minimize the effects of transitional seasons. [9sBarakat1990Short] performed monthly peak load demand forecast for the central region of Saudi Arabia, where three time-series methods (Census II multiplicative decomposition, seasonal autoregressive integrated moving average (SARIMA), Winters’ seasonal smoothing) were combined with the regression model. [73aaGOIA2010700] considered the intra-daily seasonality effect and proposed a new method to forecast daily peak load based on functional data analysis. They first introduced functional clustering to obtain groups that contain similar load usage patterns. Then each group was assigned a specialized functional regression model. Finally, functional linear discriminant analysis was applied to assign new curves to the classified groups to perform peak load demand forecast. The proposed method demonstrated promising performance.
Some representative regression methods for peak load forecast are summarized in TABLE 3. As aforementioned, the advantage of regression analysis is that the model is usually simple and easy to understand, with fewer parameters and higher forecast efficiency. However, regression analysis usually makes assumptions on the historical data and does not consider the correlation between different time trends, which could limit its applications in some cases. In the following sections, we will review time series models for peak demand forecast that consider the potential correlation between historical data at different time points.
Reference | Model detail | Input variable(s) | Forecast horizon | Geographic scope | Forecast output(s) | Performance
[8Barakat1989Forecasting] | | | | | |
[9a1990A] | | | | | |
[192002Regression] | | | | | |
[26news1997Short] | Multiple linear regression | | | | |
[281999The] | | | | Region (substations) | |
[73aaGOIA2010700] | | | | | |
3.2.2 Time series decomposition
There are different time series decomposition methods, such as Fourier series analysis, wavelet methods, and empirical mode decomposition (EMD).
A general time-series decomposition model usually adopts the addition or multiplication model to split the original time series into four sub-parts: secular trend (T), seasonal variation (S), cyclical variation (C), and irregular variation (I). In the context of peak load demand forecast:

The secular trend refers to the continuous change of peak load demand in a long period.

The seasonal variation refers to the regular seasonal change of peak load demand.

The cyclical variation is the periodic change in peak load demand over years.

The irregular variation refers to the unexpected change of the peak load demand caused by many random factors.
When predicting the future peak load, each component is calculated separately first, and then the forecasted value of each sub-part is passed to the addition model or the multiplication model to obtain the final prediction.
The addition model of time series decomposition is defined as:
$$Y_t = T_t + S_t + C_t + I_t \tag{9}$$
where $Y_t$ is the peak load demand at time $t$, and the four components in the addition model are independent of each other, all of which are expressed in absolute quantities and of the same order of magnitude.
On the other hand, the multiplication model of time series decomposition is:
$$Y_t = T_t \times S_t \times C_t \times I_t \tag{10}$$
Different from the addition model, the four components of the multiplication model are dependent on each other. In general, the secular trend in the multiplication model is expressed in absolute quantity, while other components are expressed in relative quantity.
When $S_t$, $C_t$, and $I_t$ do not change over time, the addition model is usually selected. Otherwise, the multiplication model could be a better choice. It should, however, be noted that there is a convertible relationship between the addition and multiplication models, where the log function is one of the effective converting methods [730].
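The addition model and its log-based conversion to the multiplication model can be illustrated with a classical moving-average decomposition. This is a simplified sketch rather than the procedure of any reviewed paper: the cyclical component is absorbed into the trend estimate, the function names are hypothetical, and the series is assumed to cover at least two full seasonal periods.

```python
import math


def centered_moving_average(series, period):
    """Trend estimate; None where the averaging window does not fit."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points each receive half weight.
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend


def additive_decompose(series, period):
    """Split a series into trend, seasonal, and irregular parts (Y = T + S + I)."""
    trend = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # center the seasonal pattern on zero
    seasonal = [raw[t % period] - offset for t in range(len(series))]
    irregular = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, irregular


def multiplicative_decompose(series, period):
    """Decompose log(Y) additively and map back (Y = T * S * I); needs Y > 0."""
    parts = additive_decompose([math.log(y) for y in series], period)
    return tuple(
        [None if v is None else math.exp(v) for v in part] for part in parts
    )


# Synthetic monthly peaks: linear trend plus a zero-mean seasonal pattern.
pattern = [-6, -4, -2, 0, 2, 4, 6, 4, 2, 0, -2, -4]
series = [100 + 2 * t + pattern[t % 12] for t in range(36)]
trend, seasonal, irregular = additive_decompose(series, 12)
m_trend, m_seasonal, m_irregular = multiplicative_decompose(series, 12)
print(trend[6], seasonal[:12])
```

For this noiseless series the additive decomposition recovers the linear trend and the seasonal pattern, leaving an irregular part close to zero.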
In [2Gupta1971A], the addition model was utilized to produce a monthly peak load demand probabilistic forecast. It also utilized Fourier transformation to reduce the nonstationary time series to a stationary series. Moreover, Monte Carlo simulation was adopted to simplify the computation process.
[5Fong2011The] designed a comparative experiment to compare the time-series model using Fourier series with the autoregressive model. Weekly peak load demand forecasts for one year ahead were produced, which showed that although over-forecasts existed, the model using Fourier expansion could track the dynamic behavior of peak load demand and produce better results than the autoregressive model. [682009The] utilized a multiplicative model to forecast the monthly peak load on a regional power grid. [8Turner2012Regression] utilized the decomposition method to develop a multi regression-decomposition model. The proposed method aims to forecast monthly peak loads for one year ahead, and the result was promising with a MAPE of 7.88%. The advantage of this method is that it related the historical load trend to diverse influencing factors, and could simulate additional cyclic effects. [22Choi1996A] used real-world load data from Korea Electric Power Corporation to perform one-day-ahead daily peak load demand forecast. In this paper, Fourier transformation was adopted to identify the chaotic characteristics of the time series, and the optimal embedding dimension and delay time were determined and used as inputs to train an artificial neural network (ANN) model. The MAPE for daily peak load demand forecast of the proposed model was close to 1.4%.
As aforementioned, wavelet transformation is also a traditional time series decomposition method. It can transform the information from the time domain to the frequency domain, and thus capture the low-frequency and high-frequency components of the peak load signal. In [88aaa2013A], wavelet decomposition was combined with other advanced methods to build a hybrid model. Daily peak load demand forecast for Iran National Grid was conducted based on the proposed model.
Empirical mode decomposition, as a more recent time series decomposition method, is also used in peak demand forecast. In [662009Long], empirical mode decomposition was proposed to capture long-run seasonality, short-run effects, and trend effects for the daily peak load. The load decomposition results given by EMD contain physical meaning related to time series characteristics, and thus can improve the forecast accuracy.
Note that time series decomposition uses deterministic functions to extract information. It therefore often ignores the stochastic factors of the original time series, resulting in insufficient information extraction. Stochastic time series models can compensate for this, as discussed in the next subsection.
3.2.3 Stochastic time series models
Stochastic time series models can be generally divided into: the autoregression model AR($p$), where $p$ denotes the order of autoregression; the moving average model MA($q$), where $q$ is the order of the moving average; the autoregression moving average model ARMA($p$, $q$); the autoregression integrated moving average model ARIMA($p$, $d$, $q$), where $d$ denotes the order of integration; and the seasonal autoregression integrated moving average model SARIMA($p$, $d$, $q$)($P$, $D$, $Q$)$_s$, where $P$, $D$, $Q$ are the seasonal parts of the model corresponding to $p$, $d$, $q$.
A SARIMA model may be written as [802011Discrete]:
$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d \nabla_s^D\, y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \tag{11}$$
where $y_t$ is the peak load demand observed at time $t$. $\nabla^d = (1-B)^d$ and $\nabla_s^D = (1-B^s)^D$ denote the nonseasonal and seasonal difference operators, respectively ($s$ is the seasonal length), which transform $y_t$ into a stationary time series. $B$ is the backshift operator, which represents a shift backward in time. When $B$ is applied to $y_t$, it shifts $y_t$ back by one unit of time ($B y_t = y_{t-1}$). For monthly data, $B^{12} y_t = y_{t-12}$ represents data of the same month of the last year. $\phi_p(B)$ and $\Phi_P(B^s)$ are the nonseasonal and seasonal autoregression operators, respectively. $\theta_q(B)$ and $\Theta_Q(B^s)$ are the nonseasonal and seasonal moving average operators, respectively. $p$, $P$, $q$, and $Q$ denote the maximum backshift orders of the nonseasonal autoregression, seasonal autoregression, nonseasonal moving average, and seasonal moving average operators, respectively. $\varepsilon_t$ is the white noise at time $t$. The above model can be represented by SARIMA($p$, $d$, $q$)($P$, $D$, $Q$)$_s$, where, for example, $s = 4$ corresponds to a quarterly (seasonal) time series and $s = 12$ to a monthly time series. When $P = D = Q = 0$, the SARIMA model degenerates into an ARIMA model, and when $p = d = q = P = D = Q = 0$, the SARIMA model degenerates into the white noise process.
It is worth pointing out that AR, MA, and ARMA models are suitable for weakly (wide-sense) stationary time series. ARIMA is used for nonstationary time series, and SARIMA can deal with time series that are both nonstationary and seasonal.
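The role of the nonseasonal and seasonal difference operators in a SARIMA model can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the series is synthetic (a linear trend plus a pattern repeating every four steps), and the helper name `difference` is hypothetical rather than part of any library.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): subtract the value observed `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical data: linear trend plus a pattern with seasonal length s = 4.
season = [5.0, -2.0, 3.0, -6.0]
series = [10.0 + 0.5 * t + season[t % 4] for t in range(16)]

# Seasonal differencing (lag = s) removes the repeating pattern.
seasonal_diff = difference(series, lag=4)

# Nonseasonal differencing (lag = 1) removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)
```

After both differences the synthetic series is constant, which is the kind of stationary input that the autoregression and moving average operators are then fitted to. Real peak load data would of course retain a stochastic remainder.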
[5El1986Weekly] developed two models to forecast weekly peak load one year ahead, in which an MA model is used for the seasonal-cyclic component. In [121992New], ARMA was used for monthly peak load demand forecast by considering the seasonal patterns and load fluctuations. Hybrid methods combining ARMA with other methods, such as regression models [63newm5398613] and ANN [1332020The], have also been used in peak demand forecast.
TABLE 4: Representative stochastic time series methods for peak load forecast
Reference  Model detail  Input variable  Forecast horizon  Geographic scope  Forecast output  Performance
[30Amjady2001Short]
[121992New]
[82aap2011Prediction]
[84aaf2012Finding]  Daily peak load demand
When practical conditions such as economic and cultural factors are considered, they often lead to nonstationary time series problems. For such cases, ARIMA and SARIMA are widely used. [30Amjady2001Short] utilized ARIMA models to forecast hourly loads and daily peak load demand, incorporating human experience within the model. The proposed approach adopted ARIMA to produce a raw output as the initial input of the modified model, and also took temperature into account to distinguish hot and cold days in a regression-based analysis. The results of the proposed models were compared with ANN, standard ARIMA, and human operators, and the modified ARIMA models were more accurate than the other models. [57aatarticle] used load data from Dubai to build models based on ARIMA and dynamic regression to forecast monthly peak load, with a reported goodness-of-fit value of 0.997. [57aam2007Monthly] developed a model based on SARIMA to forecast monthly peak load for Sulaimany Governorate in northern Iraq. The authors identified an adequate SARIMA specification, and the forecast results gave power planners a better basis for determining the maximum generating capacity for peak load demand. The paper also pointed out that ARIMA was suitable for short-term forecasting since it prioritizes the most recent observations. [83aaf2012Forecasting] utilized SARIMA to forecast monthly peak load demand for India. The SARIMA model outperformed the official load forecasts provided by the Central Electricity Authority (CEA) for both static and dynamic horizons in all five regional grids in India. [878] compared SARIMA with Holt-Winters multiplicative exponential smoothing, and showed that the SARIMA model produced better forecast results.
Although ARIMA and SARIMA models have produced promising forecast results, such as in [30Amjady2001Short][57aam2007Monthly][735697708][997117006][1027399017], they usually do not take into account trend fluctuations in the data. Variations of ARIMA and SARIMA have therefore been proposed in some studies to overcome this limitation and to further improve the forecast accuracy. [56aad4202249] combined generalized autoregressive conditional heteroskedastic errors (GARCH) with ARIMA to define the maximum peak load demand level by considering the unexpected randomness of the load series. [82aap2011Prediction] presented a regression-SARIMA model with generalized autoregressive conditional heteroskedastic errors (Reg-SARIMA-GARCH), which could accommodate the volatility of the daily peak load demand and the multiple seasonality of the mid- and long-term peak demand. The proposed model was used to conduct daily peak forecast for South Africa, and the comparative experiment showed that it produced better prediction accuracy than the piecewise linear regression model, the SARIMA model, and the SARIMA-GARCH model.
Some representative stochastic time series methods for peak load forecast are summarized in TABLE 4.
As aforementioned, the historical load data for peak load demand forecast are characterized by high randomness. The stochastic time series model is therefore widely used in peak load demand forecast as an effective method to deal with random sequences. However, stochastic time series methods have some limitations. For instance, the MA model gives the same weight to all the time series data, which does not necessarily reflect the actual situation, so a more flexible assignment of weights needs to be considered. In the next subsection, we discuss exponential smoothing, which can overcome this limitation.
3.2.4 Exponential smoothing
Exponential smoothing is a time series analysis method developed based on MA. Exponential smoothing predicts the future peak load according to the weighted average of the historical time series. The recent data are given a larger weight whereas the previous data are given a smaller weight. This is based on the principle that the influence of a certain variable on subsequent behavior is gradually attenuating [764incollection].
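A general exponential smoothing model for peak load demand forecast can be written as [730]:
$$\hat{y}_{t+1} = \alpha\, y_t + (1-\alpha)\,\hat{y}_t \tag{12}$$
where $\hat{y}_{t+1}$ is the forecasted peak load demand at time $t+1$, $y_t$ is the peak load demand observed at time $t$, and $\alpha \in (0, 1)$ is the smoothing parameter that controls how quickly the weights decrease (exponentially).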
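As a minimal sketch, the usual simple exponential smoothing recursion, in which the next forecast equals `alpha` times the latest observation plus `(1 - alpha)` times the previous forecast, can be coded as follows. The initialization (first forecast set to the first observation) and the sample values are assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Return one-step-ahead forecasts; forecasts[t] is the forecast for series[t]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Assumed initialization: the first forecast equals the first observation.
    forecasts = [series[0]]
    for observed in series[:-1]:
        forecasts.append(alpha * observed + (1.0 - alpha) * forecasts[-1])
    return forecasts

# Hypothetical daily peak loads (MW).
peaks = [100.0, 110.0, 120.0, 130.0]
forecasts = exponential_smoothing(peaks, alpha=0.5)

# Forecast for the next, not yet observed, period.
next_forecast = 0.5 * peaks[-1] + 0.5 * forecasts[-1]
```

A larger `alpha` makes the forecast react faster to recent peaks, while a smaller `alpha` averages over a longer history.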
Exponential smoothing can be divided into several different forms. [26aa27] provides a comprehensive review of exponential smoothing methods, in which 17 basic methods and some extensions based on these methods are described in detail. In general, single exponential smoothing is applied to sequences without trends or seasonality, and double exponential smoothing is applied to time series that only have trends. Triple exponential smoothing (also known as Holt-Winters) targets sequences with both trends and seasonality. When modeling seasonal data, a Holt-Winters model consists of three smoothing equations, one each for the level, trend, and seasonality components, and each with its own smoothing parameter [17du12Yaffee2000Introduction]. When the seasonal variations are constant and independent of the level of the time series, the additive Holt-Winters model can be used. However, if the seasonal variations change proportionally with the level of the time series, the multiplicative model can be chosen to predict the seasonal data.
Triple exponential smoothing is the most commonly used method in the reviewed papers. [9a1990A] used exponential smoothing to correct forecast values that were consistently too high or too low, and the adjusted model showed good capability to track the fast-changing load demand and produce hourly forecasts with higher accuracy. [26aaMasood1997EDSSF] proposed a decision support system based on a variety of time series techniques. Near-optimal monthly peak forecast models were built with exponential smoothing, Box-Jenkins vector, and dynamic regression methods to perform short-term peak load demand forecast. Moreover, a comprehensive assessment of the models was provided by using several evaluation indicators such as MAPE, Akaike information criterion (AIC), Bayesian information criterion (BIC), and MSE. The results of the proposed system showed that different models performed differently for different regions of the country.
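Two of the evaluation indicators named above, MAPE and MSE, follow directly from their standard definitions. The sketch below uses hypothetical actual and forecast values; it is meant only to make the definitions concrete.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

def mse(actual, forecast):
    """Mean squared error, in squared units of the load."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly peak loads (MW) and their forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

mape_value = mape(actual, forecast)
mse_value = mse(actual, forecast)
```

MAPE is scale-free, which is why it is the figure most often quoted when comparing studies on different grids, whereas MSE penalizes large misses more heavily.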
When trends and seasonal variations dominate the time series, Holt-Winters exponential smoothing usually outperforms ARIMA. This was confirmed by [38aaJ2017Short], which used Holt-Winters smoothing to forecast peak load demand for England and Wales. The model described the intra-daily and intra-weekly seasonal cycles, and the comparative results showed that this approach achieved better accuracy than the ARIMA model.
Exponential smoothing is often used in combination with other methods to build composite models. [9a1990A] applied exponential smoothing to a regression model and compared it with the regression model with ARIMA, and the experiment revealed that most of the initial forecasts were corrected after applying exponential smoothing. However, the smoothing coefficient needs to be selected manually, and if the time series fluctuates wildly (as the peak load does), exponential smoothing will produce unsatisfactory prediction results [billah2006exponential].
3.2.5 Kalman filter and grey prediction
The Kalman filter (KF) is a recursive algorithm, built on a linear state-space model, that optimally estimates the system state from the input and output observations of the system. Since the observed data include noise, the optimal estimation can also be regarded as a filtering process. KF comprises two steps: prediction and correction. During prediction, the filter forecasts the current state using the state estimate from the previous time step. During correction, observations of the current state are used to correct the predicted value and obtain an improved estimate. Although KF is best known as a recursive state estimator for linear systems, some KF variants can also handle nonlinear systems [Kalman1].
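The prediction and correction steps can be sketched for the simplest scalar case. The example assumes a random-walk state model (the state is expected to stay where it was) and hypothetical noise variances; it illustrates the two-step recursion only and is not the formulation used in any of the reviewed papers.

```python
def kalman_1d(observations, process_var, obs_var, init_state, init_var):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    state, var = init_state, init_var
    estimates = []
    for z in observations:
        # Prediction: the random-walk model keeps the state, uncertainty grows.
        var = var + process_var
        # Correction: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        state = state + gain * (z - state)
        var = (1.0 - gain) * var
        estimates.append(state)
    return estimates

# Hypothetical noisy readings of a peak load level (MW).
readings = [50.0, 50.0, 50.0, 50.0, 50.0]
estimates = kalman_1d(readings, process_var=1.0, obs_var=4.0,
                      init_state=0.0, init_var=100.0)
```

With a poor initial guess and a large initial variance, the gain starts close to one, so the first observations pull the estimate strongly toward the data. As the variance shrinks, later observations are weighted less.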
[211995Short] presented a hybrid learning scheme that consists of unsupervised and supervised learning phases to forecast daily and weekly peak/average load profiles. The KF-based learning algorithm was used to find the optimum parameters and functions in the supervised learning phase. [85aab2012Building] used KF as one of the benchmarks when evaluating an integrated hybrid model for STPLF.
The grey system lies between a white-box model and a black-box model: it focuses on learning the internal structure, parameters, and general characteristics, and tries to exploit known information as much as possible. A grey time series prediction model is constructed from the observed historical time series that reflects the characteristics of the peak load to be predicted. [612008The] used grey correlation theory in sensitivity analysis to select relevant meteorological variables for daily peak load demand forecast. [622008Electricity] developed a variable-weight combination forecasting model by combining the grey model and the ARIMA model, which was used to forecast load consumption in the peak load month for MTPLF. The hybrid model proved reliable in handling the non-smooth characteristics of monthly load data and achieved satisfactory forecast accuracy. [20] proposed a hybrid grey model to forecast the yearly peak load and its occurring date simultaneously for LTPLF. The model needed only a small amount of historical annual peak load data to produce the forecast results, and it was claimed to be highly adaptive to dynamic changes of the yearly peak load.
3.3 Advanced peak load demand forecast stage
With the emergence of artificial intelligence (AI) and big data, traditional AI-based techniques such as fuzzy logic (FL), expert system (ES), and genetic algorithm (GA), as well as modern AI and machine learning based methods such as artificial neural network (ANN) and deep learning, support vector machines (SVMs), and ensemble models, have been adopted for peak load demand forecast.
3.3.1 Traditional AI-based methods
As a bridging stage between the classic and advanced methods, a few studies in the reviewed literature use traditional AI-based methods for peak load forecast.
Fuzzy logic imitates the way the human brain judges and reasons under uncertainty. It applies fuzzy rules to reason about systems with uncertain models, and can thus deal with fuzzy information that is difficult to handle by conventional methods. [27newdaaHaida1998Peak] adopted separate fuzzy models to predict the peak and valley load, and the simulation results showed good prediction accuracy. [281999The] proposed a fuzzy regression approach to peak load estimation. The effectiveness of the proposed method was demonstrated by forecasting daily load consumption and daily peak load demand at the distribution level.
Expert systems have also been used in peak load forecast. An ES is a knowledge-based computer system that absorbs the domain knowledge and experience of experts and makes intelligent decisions by reasoning over such knowledge and experience. A complete ES consists of the knowledge base, the reasoning machine, the knowledge acquisition part, and the interpretation interface. [362002Artificial] implemented a knowledge-based ES to forecast yearly peak load for both a typical fast-developing system and a regular developing system, and the knowledge base of this system was composed of both static and dynamic variables. The results showed that the knowledge-based ES yielded the best performance among all considered models (time series model, traditional ES model, econometric model).
Fuzzy logic has been combined with ES to produce better prediction results. [13aaHsu1992Fuzzy] built an ES based on fuzzy set theory to forecast hourly load in Taiwan by improving the estimation accuracy of the peak and trough loads. The proposed ES could handle uncertain weather variables and heuristic rules, and it could update peak and trough loads iteratively to produce a more accurate forecast. [30newa30Kiartzis2000A] proposed a fuzzy ES to forecast the morning and afternoon valleys and the noon and evening peaks based on weather information and historical load data from the Greek power system. The results showed that the fuzzy ES could forecast daily peak and valley loads reasonably well compared with neural networks.
Genetic algorithm (GA) has also been adopted in building models for peak load forecast. GA is based on natural selection and population genetics, and makes the population evolve toward the optimal region of the search space through selection, crossover, mutation, evaluation, and other operations. GA can also be used to optimize the parameters of forecast models, such as the initial connection weights and the node threshold values of neural networks. In [85aab2012Building], 13 years of regional data from France were utilized for training a real-valued genetic algorithm (RGA)-based neural network with support vector machine (NNSVM) model. Daily load profile forecast and monthly peak load demand forecast were generated, and the comparative experiments showed that the proposed model was suitable for forecasting long-term peak load. [87aaaEl2013Electric] implemented a comparative analysis for yearly peak load demand forecast based on the unified Egyptian network data. In the experiment, models based on GA, least-squares, and least absolute value filtering were trained separately. The results showed that the model based on GA gave the best performance, with the lowest forecast error of 0.70%.
3.3.2 Artificial neural network and deep learning-based methods
In the reviewed literature, ANN was first applied to peak load demand forecast in 1991, and it has attracted much attention since. Many advanced methods based on ANN, such as deep learning methods, have subsequently been proposed with good performance.
Artificial neural network (ANN) is inspired by the anatomy of the human brain, and it consists of artificial neurons arranged in multiple layers for information communication. An example structure of ANN is shown in Figure 6. A typical ANN consists of the input layer, the hidden layer, and the output layer. Except for the input layer, each neuron in ANN is connected to the neurons of the preceding layer (i.e., its input neurons), with each connection corresponding to a weight. The sum of the products of all inputs and the corresponding connection weights is passed to an activation function to calculate each neuron’s final value, as shown in Figure 7. The activation function needs to be selected according to data characteristics, and the sigmoid function is the most commonly used activation function of ANN models [17du20Agatonovic2000Basic]. One well-known ANN is the backpropagation (BP) neural network, a multilayer neural network with error backward propagation. BP is widely used for its satisfying performance on prediction tasks. It, however, suffers from high computational cost and low computational efficiency, and the radial basis function network (RBFN) was therefore introduced to address this. The input variables of RBFN pass directly to the hidden layer without additional weights, and RBFN has been shown to be less time-consuming than the traditional multilayer neural network [692010Peak].
[11eSaeed1991Electric] collected hourly temperature and load data from Seattle to build a model based on ANN, and the trained ANN was then used to forecast daily peak load, hourly load, and total daily load. The mean error of the peak load forecast model ranged between 1.55% and 2.60%. This ANN model allowed a more flexible relationship between weather variables and load patterns. However, the model produced higher errors when people have specific startup activities, which indicates that the use of additional calendar variables should yield better results. [37aa2003Regional] utilized ANN to perform annual regional peak load demand forecast for Taiwan. The proposed model had three input neurons corresponding to economic, demographic, and weather variables, two hidden neurons, and one output neuron representing the regional peak load to be estimated. The effectiveness of the proposed model was demonstrated by comparing its forecast accuracy with that of a regression-based model. [63newpSaini2008Peak] forecast daily peak load up to seven days ahead based on a feedforward neural network with the steepest descent, Bayesian regularization, resilient, and adaptive backpropagation learning methods. [108aaa2016Artificial] presented a new ANN-based model that employs a Bayesian regularized neural network with the Levenberg-Marquardt (LM) backpropagation algorithm to forecast daily peak load demand for a commercial building complex. [74aaarticle] forecast annual peak load demand five years ahead for Iran based on RBF. The paper selected variables related to the yearly incremental growth rate and pointed out that long-term forecasting should pay more attention to economic factors than to weather conditions.
A few studies combine multiple ANNs to improve the forecast accuracy, in which the peak load is usually generated as a byproduct to further enhance the forecast performance. [27] developed a model based on cascaded ANNs (CANNs) to forecast the load profile one day ahead, where the daily peak, valley, and total load consumption were first estimated by an ANN, and these forecasted values were then used as additional input data for the next day’s load profile forecast. The results revealed that the cascaded structure of ANN could produce more satisfactory forecasting results. [89aaiHern2013Improved] proposed a multistage ANN to forecast load demand in two stages. First, the daily peak and valley values were generated by ANN. Second, based on the peak and valley load, the whole electricity demand curve was produced. This method outperformed the standard ANN, significantly reducing the MAPE. Although multiple ANNs can provide promising results for load forecasting, [89aaiHern2013Improved] revealed that the multistage ANN suffers from higher computational complexity than a single ANN.
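The neuron computation described earlier in this subsection (a weighted sum of inputs passed through a sigmoid activation) can be sketched as a forward pass through a small network. The 3-2-1 layout mirrors the three-input, two-hidden-neuron, one-output structure mentioned above; the weights, the bias terms, and the scaled input values are hypothetical and are not taken from any reviewed paper.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each neuron outputs sigmoid(sum_j w_j * x_j + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def network_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)

# Hypothetical, untrained parameters (bias terms are an added assumption).
hidden_w = [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]
hidden_b = [0.0, 0.1]
output_w = [[0.7, -0.6]]
output_b = [0.05]

# Scaled economic, demographic, and weather inputs (illustrative values).
inputs = [0.5, 0.8, 0.3]
peak_estimate = network_forward(inputs, hidden_w, hidden_b, output_w, output_b)
```

Because the sigmoid output lies in (0, 1), the target peak load would have to be scaled to that range during training and scaled back afterwards. Training itself (e.g., by backpropagation) is outside this sketch.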
In general, ANN has clear advantages in its adaptive learning and function approximation capabilities. Since ANN can deal well with the high randomness and uncertainty of the time series, it is recommended for STPLF. However, ANN models suffer from long training times and easily fall into local optima. Researchers have therefore mostly focused on optimizing the neural network structure, for example by combining it with fuzzy logic [63newm5398613][25Mandal1997Fuzzy][99aaa], to further improve the training efficiency and forecasting accuracy.
Many modern and advanced methods have emerged based on ANN, such as self-organizing map (SOM), recurrent neural network (RNN), and convolutional neural network (CNN). In particular, long short-term memory (LSTM) is one variation of RNN [1328985197].
Owing to its unsupervised learning characteristics, SOM is often used as an enhancement step together with other forecast methods to produce the final results. In [10d1991Design, 101991Design], SOM was adopted to cluster days with similar load consumption patterns. Then, based on a feedforward multilayer neural network, daily peak load and valley load were estimated to compute the desired hourly load. [63newcAMINNASERI20081302] adopted SOM to cluster load profiles and principal component analysis (PCA) to reduce the dimensions of the data. A separate feedforward neural network was then trained for each cluster. The comparative analysis demonstrated the superiority of the proposed method.
RNN and CNN are commonly used deep learning methods, which have more complex network structures, such as more hidden layers and recurrent connections. Deep learning models can better capture the dynamic characteristics of peak load to provide more accurate and stable predictions, and they have more robust learning and generalization ability than the standard ANN, especially in the big data era. [1262019Deep] proposed a method that combines RNN with dynamic time warping (DTW) for short-term peak load demand forecast. DTW was introduced to identify load curves with similar trends, and a bespoke gated RNN was trained to forecast daily peak load demand one month ahead based on half-hourly load data. The proposed method achieved a satisfactory MAPE of 1.01%. In addition, comparative analysis suggested that the DTW distance could adapt to the dynamic change of the nonstationary daily peak load series. In [1348994442], an LSTM layer was adopted to forecast weekly peak load in Korea. In this study, input variables including the weekly peak load, weekly temperature, and weekly GDP of the previous year were used. The LSTM layer was shown to capture more useful characteristics of the load data, and the results showed good forecast accuracy with the lowest forecast error of 2.16%.
About 60 studies have employed ANN and deep learning based methods to perform peak load demand forecasting, which shows their dominant popularity in peak load forecast. Some representative references related to these methods are listed in TABLE 5.
TABLE 5: Representative ANN and deep learning based methods for peak load demand forecast
Reference  Model detail  Input variable  Forecast horizon  Geographic scope  Forecast output  Performance
[11eSaeed1991Electric]
[13Ho1992Short]
[26newc1997Cascaded]
[362002Artificial]
[37Saini2002Artificial]
[63newcAMINNASERI20081302]
[1262019Deep]
3.3.3 Support Vector Machines
As a popular machine learning method, support vector machines (SVMs) seek to minimize the actual risk by following the structural risk minimization principle, so as to achieve satisfactory forecasting performance. The variant of SVMs for regression problems is known as support vector regression (SVR) [8521], which is efficient for large-scale regression problems [855].
Consider a training dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^n$ denotes the $i$th observation ($n$-dimensional input vector), $y_i$ is the output corresponding to $x_i$, and $N$ denotes the size of the training set. For nonlinear SVMs, the basic idea is to introduce a kernel-induced mapping as below:
$$f(x) = w^{T}\varphi(x) + b \tag{13}$$
where $\varphi(x)$ maps $x$ into the hypothetical higher dimensional feature space. The coefficients $w$ and $b$ need to be estimated based on the structural risk minimization principle.
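The idea of mapping inputs into a higher dimensional feature space can be made concrete with a small sketch. It assumes a degree-2 polynomial kernel on two-dimensional inputs, for which the feature map is known in closed form; the kernel choice, the function names, and the numbers are illustrative assumptions rather than the setup of any reviewed study.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Kernel value computed in the original 2-D space."""
    return dot(x, z) ** 2

def decision(x, w, b):
    """Linear model in feature space: f(x) = w . phi(x) + b."""
    return dot(w, phi(x)) + b

x, z = (1.0, 2.0), (3.0, 0.5)
kernel_value = poly2_kernel(x, z)
feature_space_value = dot(phi(x), phi(z))
```

Both values agree up to rounding, which is the point of the kernel: the inner product in the three-dimensional feature space is obtained without ever constructing `phi` explicitly. A model that is linear in `phi(x)` is nonlinear in the original inputs.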
[642009Forecasting] introduced local prediction based on SVM for daily electric peak load forecast. Local prediction finds the approximation function in the reconstructed embedded space. Each subdomain of the partitioned inputs was assigned its own SVM model, and local prediction could thus make better forecasts than a single global model. [1042016Peak] developed a novel online-SVM model based on the standard SVM. The proposed model was used to forecast daily peak load for a residential building in Surrey, and the results showed that the model could be a more intelligent tool for smart grid systems. [1158319611] adopted SVM to build a model for monthly peak load prediction. First, feature selection was implemented based on correlation analysis. Then the training set was reconstructed by the topology network and the random walk with restart (RWR) algorithm. Moreover, a feedforward correlation was utilized to minimize the effect of unknown errors. Finally, the preprocessed training data were fed into SVM to train a model with higher accuracy.
Similar to ANN, SVMs are more suitable for STPLF and can cope with nonlinear and high dimensional data [1042016Peak][1158319611]. SVMs also share a disadvantage with ANN: they suffer from long training times on large data sets. Besides, the hyperparameters of SVMs need to be selected manually, which is itself a complex step that needs to be optimized.
3.3.4 Ensemble Learning
Ensemble learning trains multiple learners and aggregates each learner’s predicted results to obtain the final output through combining strategies, which generally involve averaging, voting, and stacking [EnsemblePolikar:2009]. According to the dependencies between learners, one possible classification of the popular ensemble learning methods is as follows:

(1) Learners have to be generated in sequence to satisfy the strong dependency between them (boosting).
(2) Learners can be generated simultaneously since there is no strong dependence between them (bagging and random forest).
Ensemble learning has been widely used in peak load demand forecast in recent years. The ensemble models used in the reviewed studies are mainly boosting, bagging, and random forest.
Boosting adjusts the sample distribution according to the performance of the initial learner so that samples with wrong predictions get more attention than others, and then trains the next learner on the adjusted sample. The process is iterated until a specified number of learners is generated, or the aggregated learning criterion reaches the stop threshold [EnsemblePolikar:2009]. Commonly used boosting algorithms in the reviewed papers are adaptive boosting (AdaBoost), boosting tree, gradient boosting (GB), and extreme gradient boosting (XGBoost).
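The sequential nature of boosting can be sketched with a small gradient boosting example. Note the swap: instead of reweighting samples as described above, gradient boosting with squared loss fits each new learner to the residuals left by the learners before it. The base learner (a one-split decision stump), the learning rate, and the temperature and load figures are all illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump: one threshold, two constant leaves."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then add stumps fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: daily maximum temperature (deg C) vs. daily peak load (MW).
temps = [18.0, 21.0, 24.0, 27.0, 30.0, 33.0, 36.0, 39.0]
peaks = [510.0, 505.0, 520.0, 560.0, 610.0, 680.0, 740.0, 790.0]

mean_peak = sum(peaks) / len(peaks)
baseline_error = mse(lambda x: mean_peak, temps, peaks)
boosted = gradient_boost(temps, peaks)
boosted_error = mse(boosted, temps, peaks)
```

Each stump depends on the residuals of all earlier stumps, so the learners cannot be trained in parallel. This is the strong dependency that distinguishes boosting from bagging.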
[boosting2018AHMAD20181008] adopted three machine learning models (ANN with nonlinear autoregressive exogenous multivariable inputs, multivariate linear regression, and AdaBoost) to predict the load profile one month, one season, and one year ahead at the district level. During training, datasets of different sizes were used to train models for the different prediction intervals. The paper also adopted feature extraction to select essential variables, and the results showed that AdaBoost significantly outperformed the other models for all prediction intervals. Moreover, for seasonal forecasting, the error range of AdaBoost was relatively narrow, which indicated that the model trained with AdaBoost was more capable of capturing the dynamic change of load curves. [boosting2019ZHANG2019116358] conducted short-term load forecasting for southern California. In this study, different models were adopted (multivariate linear regression, random forest, and GB), and the installed solar capacity was identified as an important feature for the forecast. The comparative experiment revealed two insights: (1) the importance of the installed solar capacity suggests that new and clean energy resources are important components of the system that researchers need to pay more attention to; (2) the different forecasting accuracy in different periods indicates that capturing the fluctuation of load curves is important for the forecast. [boosting2020LU2020117756] combined complete ensemble empirical mode decomposition with XGBoost to predict daily load consumption, daily peak load, and daily water delivery. Compared to traditional XGBoost, the hybrid model showed a lower MAPE of 5.99% for the daily peak load demand forecast.
Bagging is based on bootstrap sampling. It draws multiple samples with replacement from a given dataset and trains learners simultaneously on the resulting sample sets. When bagging is applied to a regression task, a simple mean or median can be adopted to obtain the final output [bagging201853]. [bagging2018DEOLIVEIRA2018776], for the first time, utilized bagging to forecast monthly load demand for countries at different development stages. The paper combined bagging with exponential smoothing and SARIMA, and then used the simple mean and median to aggregate the results from the single learners. A new variation of bagging, Remainder Sieve Bootstrap (RSB), was also proposed to enhance the forecasting results, and the proposed method yielded the best MAPE for both developed and developing countries.
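The bagging procedure (bootstrap sampling followed by mean or median aggregation) can be sketched as follows. This is a deliberately simplified illustration: the base learner merely predicts the sample mean and ignores temporal order, whereas the reviewed study bags exponential smoothing and SARIMA models. All names and data are hypothetical.

```python
import random
import statistics

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def mean_learner(sample):
    """Hypothetical base learner: predicts the mean of its training sample."""
    return statistics.mean(sample)

def bagged_forecast(history, learner, n_learners=25,
                    aggregate=statistics.mean, seed=0):
    """Train one learner per bootstrap sample and aggregate their predictions."""
    rng = random.Random(seed)
    predictions = [learner(bootstrap_sample(history, rng))
                   for _ in range(n_learners)]
    return aggregate(predictions)

# Hypothetical monthly peak loads (MW).
history = [95.0, 102.0, 98.0, 110.0, 105.0, 99.0]

bagged_mean = bagged_forecast(history, mean_learner)
bagged_median = bagged_forecast(history, mean_learner,
                                aggregate=statistics.median)
```

Since the learners do not depend on one another, the list comprehension in `bagged_forecast` could be run in parallel. The median aggregate is less sensitive than the mean to a single badly fitted learner.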
Random forest (RF) can be seen as an extension of bagging, which further introduces random feature selection when constructing the individual decision tree learners. RF first uses the bootstrap to generate its training sets, and a decision tree is then constructed for each training set. Features are randomly selected, and an optimization criterion is used to guide the splitting of nodes in each decision tree learner. The prediction strategies of RF are voting for classification tasks and averaging for regression tasks [EnsemblePolikar:2009].
As the number of learners increases, RF generally converges to a smaller generalization error than bagging. Moreover, the training efficiency of RF is often superior to that of bagging, benefiting from the randomness in constructing single learners. In [97newdFAN20141], an ensemble method combining eight popular forecasting algorithms (including multiple linear regression (MLR), ARIMA, SVM, RF, multilayer perceptron, boosting tree, and multivariate adaptive regression splines) was proposed for peak load forecast. Each model in the ensemble was assigned a weight by GA. The results showed that SVM and RF had the largest weights, which indicates that these two algorithms contributed the most to enhancing peak load demand forecast accuracy. [RF2018WANG201811] adopted RF to predict hourly load usage patterns for two educational buildings in North Central Florida, and the feature importance distribution was produced as a byproduct. The proposed model was compared with the regression tree and SVM, and the results showed that RF performed best among all the trained models. Moreover, the feature importance distribution showed that the influential features changed across education periods, which indicated that the load usage behavior of educational buildings is closely related to the semester.
3.3.5 Hybrid techniques
Many novel hybrid models with satisfactory forecast performance have been proposed in the reviewed papers, and some of them have already been discussed in the previous sections. TABLE 6 summarises the papers that use hybrid models, grouped by the development stages of the methods they combine.
Combination of the stages  Hybrid models with references
Manual stage + Classic stage  Human knowledge + ARIMA [30Amjady2001Short]
Classic stage + Classic stage
Classic stage + Advanced stage
Advanced stage + Advanced stage
Manual + Classic stage. A few papers proposed hybrid models that combine the manual stage with the classic stage. Among the obtained papers, [30Amjady2001Short] was the earliest work to combine classic forecasting methods with human experience. In this paper, human experts' opinions were selected as one of the initial input variables for the daily peak load demand forecast. The proposed modified ARIMA was compared with standard ARIMA, and the results revealed that the former performed better, with the lowest MAPE of 1.01% for predicting the daily peak load of cold Sunday to cold Wednesday.
Classic + Advanced stage. Some papers combined methods from the classic stage with methods from the advanced stage. Among them, [281999The] combined fuzzy logic with a regression model. Fuzzy set theory is good at representing the uncertainty of the data, which allows additional customer information to be used as inputs to the forecast model and can lead to more accurate forecasts. [926867500] used the combination of PCA and MLR to forecast weekly peak load at the distribution level. First, correlation analysis was used to select the important features, and PCA was adopted to reduce the redundancy of the input dimensions. Finally, the output from PCA was fed to MLR to perform mid-term peak load prediction. This hybrid model was simpler than many advanced AI-based methods, yet could still achieve satisfactory forecast accuracy.
Advanced + Advanced stage. TABLE 6 shows that most of the proposed hybrid models combine methods from the advanced stage. Among them, [88aaa2013A] proposed a hybrid method to forecast daily peak load for Iran. The model was built using the combination of wavelet decomposition, NN, and GA. Historical load data and weather variables from three different cities were used to train the model. The proposed model was also compared with other advanced models, and the results showed that it outperformed most of them. [30newa30Kiartzis2000A] proposed a hybrid model combining fuzzy logic with an expert system. In this study, fuzzy logic has the advantage of extracting uncertain and incomplete information from real-world data, which is then used as the input of the expert system, so that the hybrid model can make more accurate predictions based on the acquired knowledge. [63newm5398613] and [99aaa] both combined fuzzy logic with a neural network. The advantage of this hybrid model is that the neural network has a strong self-learning ability and can make good use of the representation provided by fuzzy logic to produce forecasts with higher accuracy. Moreover, the fuzzy neural network is effective when handling peak loads with strong fluctuations, and it is better at capturing the calendar effect than other advanced models. In [85aab2012Building], a real-valued genetic algorithm (RGA) based neural network-SVM model was proposed. In the model, the neural network was responsible for producing the growth index for the forecast target, SVM was adopted to output the deviation value, and the RGA was adopted to select optimal parameters for the neural network and SVM. The experiment demonstrated that the proposed hybrid model performed well on both short-term and mid-term load demand forecast.
4 Discussion and summary
This section will first give a comparative analysis of the peak demand forecast methods. Then, improving methods for peak load forecast models are discussed. Finally, a comprehensive summary and discussion of the papers reviewed will be presented.
4.1 Comparative studies of different models
Each forecasting method has its advantages and disadvantages. It is therefore necessary to compare the performance of various forecasting methods to understand their strengths and limitations. To this end, we summarize the existing comparative studies in the literature. Some representative comparative studies are listed in TABLE 7, including composite/combined models and comparison analyses. A composite model can refer to an inter-method composite model (i.e. a hybrid model) or a result-weighted composite model. Based on the development stages of forecast methods (classic stage and advanced stage), existing comparative studies can be categorized into intra-comparison (e.g. methods within the classic stage) and inter-comparison (e.g. methods from both the classic and the advanced stage).
When compared with human expert opinions, the methods in the classic stage showed good performance, as presented in [26aaMasood1997EDSSF]. In the classic stage, regression, as the most popular method in the reviewed papers, is often selected as the benchmark for building hybrid models [9a1990A] [37aa2003Regional] [502005Comparison] [82aap2011Prediction]. For instance, [9a1990A] combined exponential smoothing with regression to forecast the daily peak load demand, and compared the results with the combination of ARIMA and regression. Results showed that the former model could alleviate the bias caused by the latter. [82aap2011Prediction] also used regression as a benchmark and compared its performance with that of the hybrid model (Reg-SARIMA-GARCH).
In comparing methods in the classic stage with methods in the advanced stage, [37aa2003Regional] utilized ANN and regression to forecast annual peak load demand for Taiwan, and the results showed that ANN could achieve better performance. In addition, [502005Comparison] combined GA with symbolic regression to build an STPLF framework, and the results showed that the hybrid model could achieve performance comparable to that of an ANN model.
Reference  Model/Experiment type  Detail  Forecast contents  Performance
[26aaMasood1997EDSSF]  Composite model
[37aa2003Regional]  Comparative analysis  ANN vs Regression
[9a1990A]
[502005Comparison]
[82aap2011Prediction]
[30newa30Kiartzis2000A]
[63newm5398613]  ANFIS (Fuzzy logic + NN) vs ARMA
[85aab2012Building]
[1288791587]  Comparative analysis
[boosting2020LU2020117756]
Depending on the distribution and diversity of the data and the problem, hybrid methods do not always achieve satisfactory performance and can sometimes be counterproductive. [63newm5398613] compared ARMA with a hybrid model (FL + ANN) for daily peak load demand forecast. The results showed that ARMA performed better when the training samples included weekends, whereas the proposed hybrid model gained better forecast accuracy when weekends were excluded. [85aab2012Building] combined real-valued GA with SVM and ANN, and the hybrid model was then used to forecast the daily peak load and its occurrence time. The experiment compared the proposed model with other models (real-valued GA-SVM, KF, RBFN). Under the same experimental conditions, the real-valued GA-SVM model surprisingly produced the worst results.
There are also a few studies that conducted comparative analysis based on methods in the advanced stage only. For example, [1288791587] formulated peak load forecast as a classification problem and compared several methods including LSTM, SVM, RF, CNN, and AdaBoost. The results showed that LSTM had the best performance, followed by SVM, RF, and CNN, whereas AdaBoost produced unsatisfactory forecast results. [boosting2020LU2020117756] proposed a hybrid model (complete ensemble empirical mode decomposition (CEEMDAN) combined with XGBoost), which was then compared with other models such as CEEMDAN-RF and RBFN. The results revealed that the proposed model generated the best performance, whereas RBFN had the largest MAPE among these three models.
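Many of the comparisons above are reported in terms of MAPE (mean absolute percentage error). As a reminder of what that figure measures, the following minimal Python sketch computes it from actual and forecast values. The function name and the example numbers are illustrative assumptions and are not taken from any of the reviewed papers.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same non-zero length and that no
    actual value is zero (the percentage error is undefined there).
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and equally long")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up daily peak loads and forecasts (arbitrary units).
actual_peaks = [100.0, 200.0]
forecast_peaks = [110.0, 180.0]
print(mape(actual_peaks, forecast_peaks))
```

Because each error is divided by the actual value, MAPE values are comparable across systems of different sizes, which is one reason it is a common reporting metric in load forecasting studies.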
4.2 Improving methods for peak load demand forecast models
As mentioned above, hybrid methods that combine different methods are one option for improving forecast accuracy. Other measures can further improve the forecast performance, such as optimizing the model inputs (data normalization, feature selection and transformation) and improving the models/algorithms (e.g. by integrating clustering methods).
4.2.1 Data
Differences in magnitude between data sets and between variables are likely to bias the predictions of the training algorithm. Many training algorithms, such as SVR, require input variables of a similar order of magnitude. Besides, in real scenarios, load data often need to be normalized due to privacy requests [73aaGOIA2010700]. Therefore, data normalization is a necessary preprocessing step for training the model. Among the reviewed papers, the commonly used data normalization methods are zero-mean normalization (Z-score normalization) [63newcAMINNASERI20081302][1042016Peak] and Min-Max normalization [892014Linguistic][108newlJulio2016Linear].
The training data size is another important factor that can affect the output accuracy of the model. If the training set is too small, the information learned by the model will be insufficient, and the performance will be poor as a consequence. On the other hand, too much training data will lead to low computational efficiency. Therefore, a good trade-off between the training size and the computation time is worth investigating. [108aaa2016Artificial] considered training data of four different lengths (one week to four weeks) in forecasting sub-hourly load usage and daily peak load. The results showed that the larger the training size, the higher the training accuracy of the neural network model. In [84aaf2012Finding], different training sizes were used to predict the peak load two days to one week ahead. Specifically, the training data were the hourly load from New South Wales over the past three months, six months, nine months, and one year, respectively. The results showed that the model trained with six months of historical data was the best at predicting the peak load in the coming days.
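The two normalization schemes named in this subsection, Z-score and Min-Max, can be written down in a few lines. The sketch below is a minimal illustration with made-up load values; it uses the population standard deviation for the Z-score, which is an assumption, since the reviewed papers may use the sample standard deviation instead.

```python
import statistics


def z_score(values):
    """Zero-mean normalization: subtract the mean, divide by the std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (an assumption)
    if std == 0:
        raise ValueError("all values are identical; Z-score is undefined")
    return [(v - mean) / std for v in values]


def min_max(values):
    """Min-Max normalization: rescale the values linearly to [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all values are identical; Min-Max is undefined")
    return [(v - lowest) / (highest - lowest) for v in values]


# Made-up hourly loads (arbitrary units).
loads = [2.0, 4.0, 6.0]
print(z_score(loads))
print(min_max(loads))  # [0.0, 0.5, 1.0]
```

In practice the mean, standard deviation, minimum and maximum would be estimated on the training data only and then reused to transform the test data, so that no information from the forecast period leaks into the model inputs.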
4.2.2 Feature transformation/Feature selection
As mentioned above, the input variables for training a forecasting model are often numerous, especially in the big data era. However, many variables may be unrelated to the target/response variable, and variables may also be interdependent, which can easily lead to long training times and decreased forecast performance. Feature transformation and feature selection are usually adopted to address this problem [FSTescolano2009feature]. Feature transformation obtains transformed features by creating a new feature space; commonly used methods include PCA, independent component analysis (ICA), and linear discriminant analysis (LDA). Feature selection [fsDASH1997131] selects a subset of the original feature space; commonly used approaches include filter, wrapper and embedded methods.
Most of the reviewed studies that applied feature selection to peak load forecast adopted filter and wrapper methods [108aaa2016Artificial], while those using feature transformation utilized PCA. For instance, PCA was compared with correlation analysis in [97newdFAN20141], in which the original pattern matrix of the training data is 281095. Through correlation analysis, variables with correlation factors greater than 0.95 were selected. By applying PCA, the dimension of the input matrix was reduced to 11100. After combining each approach with a user-defined neural network to train the model for daily peak load demand forecast, the results showed that the model trained using PCA was superior to the one using correlation analysis in both computational time and training accuracy.
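As a concrete example of a filter-type feature selection, the sketch below keeps only the candidate inputs whose absolute Pearson correlation with the target reaches a threshold. It is a minimal illustration: the feature names, the data and the threshold of 0.8 are assumptions made for the example, and the sketch is not the correlation analysis procedure of any specific reviewed paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)


def filter_features(features, target, threshold):
    """Keep the features whose |correlation| with the target >= threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


# Made-up data: daily peak load and two hypothetical candidate inputs.
peak_load = [10.0, 12.0, 14.0, 16.0, 18.0]
candidates = {
    "temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "unrelated_signal": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(filter_features(candidates, peak_load, threshold=0.8))
```

A filter of this kind scores each variable independently of the forecasting model, which makes it cheap but blind to interactions between inputs; wrapper methods address that by evaluating candidate subsets with the model itself.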
4.2.3 Clustering methods
With the installation of smart meters, high-resolution distributed energy consumption data (e.g. at building level) have become available, which provides opportunities to study how forecast models behave for different buildings. For instance, [dai2020energy] compared the performance of different forecast models on different buildings and concluded that buildings should be clustered based on their historical load usage patterns, rather than their predefined building use types, to produce more meaningful insights (e.g. to improve forecast accuracy). Clustering methods divide the data into different clusters according to certain standards, such as a distance criterion. After clustering, the data within the same cluster are highly similar, while the data belonging to different clusters differ greatly [32008A]. The accuracy of peak load forecast can be improved by training different models for different clusters and then aggregating their outputs into the final forecasts.
Clustering methods can be divided into partition-based clustering (e.g. K-means), hierarchical clustering, density-based clustering (e.g. density-based spatial clustering of applications with noise (DBSCAN)), and model-based clustering (e.g. Gaussian mixture models) [42013The]. Some studies employing clustering in peak load forecast are summarized in TABLE 8, where the commonly used clustering algorithms are K-means, hierarchical clustering, SOM, and fuzzy clustering (FC).
Clustering methods  References
K-means and its extensions  [97newdFAN20141],[532007A],[108newlJulio2016Linear]
Hierarchical clustering  [552006Peak],[582007Load]
Fuzzy clustering  [108newdLaouafi2016Daily],[892014Linguistic],[412004Peak],[492005An]
SOM  [56newd2006Developed],[10d1991Design],[63newcAMINNASERI20081302]
K-means is a classical partition-based clustering algorithm, which is highly efficient when handling large-scale data. Some variants of K-means, such as the entropy weighted K-means [97newdFAN20141], have also been used in peak load forecast. Hierarchical clustering can be classified into agglomerative hierarchical clustering and divisive hierarchical clustering [Cluster]. For instance, [552006Peak] adopted hierarchical clustering to optimize the daily input data of a feedforward neural network (FNN) used to predict load usage during a peak period, and the results demonstrated that the FNN could converge more quickly and produce more accurate results. SOM is a commonly used clustering method owing to its unsupervised nature. [63newcAMINNASERI20081302] first utilized SOM to cluster peak loads, and then trained a separate FNN on each cluster to obtain a cluster-specific model. Results showed that the proposed hybrid method is effective for daily peak load forecast.
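To make the partition-based idea concrete, the following is a minimal one-dimensional K-means sketch in Python that groups daily peak loads into k clusters. It is an illustration under stated assumptions: the data are made up, the initial centroids are chosen deterministically from evenly spaced order statistics (real implementations typically use random or k-means++ initialization), and it is not the entropy weighted variant mentioned above.

```python
def k_means_1d(values, k, n_iter=100):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic initialization (an assumption made for reproducibility).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Made-up daily peak loads with two visibly different regimes.
peaks = [300.0, 310.0, 305.0, 520.0, 515.0, 530.0]
centres, groups = k_means_1d(peaks, k=2)
print(centres)
print(groups)
```

Once the clusters are available, a separate forecast model can be trained on each group, which is the pattern followed by the cluster-then-forecast studies discussed in this subsection.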
The above methods belong to hard clustering, since each data point can only be assigned to a single cluster. In contrast, fuzzy clustering, such as Fuzzy C-means, is a soft clustering method in which each observation can belong to multiple clusters with corresponding membership coefficients [bezdek2013pattern]. For instance, [412004Peak] adopted fuzzy clustering to cluster peak load patterns according to working/non-working days. [492005An] combined fuzzy clustering with FNN to forecast load curves during the peak load period.
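The membership coefficients that distinguish soft from hard clustering can be computed with the standard Fuzzy C-means membership rule, in which the membership of a point in cluster j is the inverse of the sum over all clusters k of (d_j / d_k)^(2/(m-1)), where d_j is the distance to centre j and m > 1 is the fuzzifier. The Python sketch below applies this rule to a scalar value with fixed centres; the centres, the value and the choice m = 2 are assumptions for illustration, and the full algorithm would also update the centres iteratively.

```python
def fuzzy_memberships(value, centres, m=2.0):
    """Fuzzy C-means membership of a scalar value in each cluster centre."""
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    distances = [abs(value - c) for c in centres]
    if 0.0 in distances:
        # The value coincides with one or more centres: share full membership.
        hits = distances.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


# Hypothetical centres for "low" and "high" peak days (arbitrary units).
centres = [300.0, 500.0]
print(fuzzy_memberships(400.0, centres))  # halfway between the two centres
print(fuzzy_memberships(350.0, centres))  # closer to the "low" centre
```

The memberships of each observation sum to one, so a day halfway between two centres is shared equally between them instead of being forced into a single cluster.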
4.3 Summary of the reviewed studies
TABLE 9 gives a comprehensive summary of the reviewed papers including their forecasting periods, forecast outputs, input variables, improving methods and geographical scope.
It is worth mentioning that the classification of peak load forecast methods into three stages (i.e. manual, classic and advanced) generally aligns with the evolution of power systems. In the manual and classic stages, the traditional energy-intensive industry dominated the electricity market, with relatively stable peak demand patterns, and peak load forecasting based on statistical methods was commonly used. With the development of smart grids and the changing energy landscape on both the demand side (e.g. demand side management and electric vehicles) and the supply side (e.g. intermittent renewable energy supply at both transmission and distribution level), peak demand patterns have become more random and less predictable. As such, more advanced methods that can better take advantage of big data and capture complex patterns, such as deep learning and hybrid machine learning methods, are preferable choices.
In addition, unlike traditional load forecasting, peak load forecasting treats the occurrence time and the magnitude of the peak demand as equally important. Because peaks are often driven by rare events, forecasting the peak occurrence time may be more closely related to extreme value theory or quantile regression. Moreover, given the uncertainty of peak load, it is also effective to treat the peak load as anomalous data and to quantify the probability of its occurrence and of its magnitude [ProbabilisticLFFB].
Methods  Forecasting period  Improving methods  Input variables  Geographic scope  Output  References
(Improving methods: Clustering, FS/FT; Input variables: H, W, C, E/O; Geographic scope: Region, Country, City; Output: V, V+T, LP)
Regression  STPLF
MTPLF  [8Turner2012Regression],[926867500],[932014Peak],[1248671682],[902014Long]
LTPLF  [27],[1062016Robust],[1170Long],[702008Density]

STPLF
MTPLF  [622008Electricity],[57aam2007Monthly],[121992New],[7IEEE],[5Fong2011The],[802011Discrete],[83aaf2012Forecasting]
LTPLF

STPLF  [662009Long],[88aaa2013A],[22Choi1996A],[1087893595]
MTPLF  [2Gupta1971A],[5Fong2011The],[8Barakat1989Forecasting],[682009The],[802011Discrete]
LTPLF

STPLF  [9a1990A],[38aaJ2017Short],[108newdLaouafi2016Daily]
MTPLF  [9sBarakat1990Short],[121992New],[26aaMasood1997EDSSF]
LTPLF

STPLF  [211995Short],[85aab2012Building]
MTPLF
LTPLF

STPLF  [612008The]
MTPLF  [622008Electricity]
LTPLF  [20]
ANN  STPLF
MTPLF  [57aalOtavio2007Long],[32008A],[401372805],[441412874],[451414771],[481556396]
LTPLF  [37Saini2002Artificial]

STPLF  [13aaHsu1992Fuzzy],[30newa30Kiartzis2000A]
MTPLF
LTPLF  [36newL2002Long]

STPLF
MTPLF  [720The]
LTPLF

STPLF  [34],[502005Comparison],[720The],[816121762],[85aab2012Building],[88aaa2013A],[97newdFAN20141]
MTPLF
LTPLF  [87aaaEl2013Electric]
SVMs  STPLF  [602008Special],[612008The],[642009Forecasting],[1042016Peak],[1112017Combining],[1158319611],[1168330143],[1288791587]
MTPLF
LTPLF
Boosting  STPLF  [boosting2019ZHANG2019116358],[boosting2020LU2020117756]
MTPLF  [boosting2018AHMAD20181008]
LTPLF  [boosting2018AHMAD20181008]
Bagging  STPLF
MTPLF  [bagging2018DEOLIVEIRA2018776]
LTPLF
RF  STPLF  [97newdFAN20141],[RF2018WANG201811],[RF2020SATREMELOY2020114246]
MTPLF
LTPLF
CNN  STPLF  [1288791587],[1298881305]
MTPLF
LTPLF
RNN  STPLF  [1262019Deep],[1272019Evolutionary],[1288791587]
MTPLF  [1328985197],[1348994442]
LTPLF
For the input variables, historical load data, weather variables, and calendar variables were usually used in STPLF. On the other hand, economic and other variables such as the population growth rate were frequently used in MTPLF and LTPLF. Moreover, due to the high randomness of peak load, the forecast model is greatly affected by low-probability events such as extreme weather and accidental events. Accidental events vary among individuals/entities, and their impact is often limited to a small range and is therefore difficult to forecast. Although extreme weather is also a low-probability event, its impact is usually well studied. For instance, it is necessary to focus on climatic factors such as the maximum (minimum) temperature, the duration of the high (low) temperature, and the humidity. The maximum (minimum) temperature determines the peak load value. The duration of the high (low) temperature affects the time range in which the peak occurs, and the humidity further aggravates the difference between the physical and the actual temperature, thus affecting consumers' electricity usage decisions.
Most of the forecast methods utilizing improving techniques (e.g. clustering) belong to the advanced stage of peak load demand forecast, which indicates that research attention to this field is increasing. However, it is worth pointing out that, although many clustering methods are available, current load curve-based clustering relies heavily on additional physical and socio-economic information from entities/users (in other words, domain knowledge) in order to interpret the clustering results properly. More effort is needed on how to incorporate domain knowledge effectively into the forecast methods and improving techniques.
As for the geographic scope of the forecast, researchers usually considered peak load forecast over wide regions or whole countries during the classic stage. With the development of smart grids and the installation of smart meters at the local level, more data with high temporal and spatial resolution are becoming available [42013The]. In addition, the increasing penetration of distributed energy resources (e.g. electric vehicles and microgrids), coupled with distributed intelligence and local energy applications [5Fong2011The], brings the operation and maintenance of power systems into a new era of disaggregated environments. From the perspective of peak load forecast, highly random human activities have a greater impact on the forecast performance in small geographic areas (e.g. community level) than at the aggregated level (e.g. region/country) [1072016Distribution]. Therefore, more research effort on the interaction between the electricity usage decisions of end users and disaggregated load forecasting is needed in the future.
5 Conclusion and future work
Motivated by the importance of peak load demand forecast to electricity market stakeholders, this paper carried out a systematic review of peak load demand forecast, which aims to summarize existing studies on the topic and provide guidance for future research. First, a unified problem definition for peak load demand forecast was provided. Then, the peak load demand forecast methods were categorized into three stages based on their development timeline, and a thorough review of relevant methods in each stage was conducted. Moreover, a comparative analysis of different forecast methods was presented, and useful improving techniques for enhancing the forecast performance were discussed. Finally, a comprehensive summary of the reviewed papers on the peak load forecast framework was presented, with possible future research directions.
With high-resolution load data (e.g. residential smart meter data) becoming increasingly available, data privacy is an important issue that needs to be addressed. In the new digital era, using private encryption algorithms to protect consumers' data has become an essential task that researchers must deal with [6Lim2016Security]. There are challenges in terms of the compliance of electricity data transmission and storage, security and privacy protection [privacy1]. In addition, it is well known that data size and quality usually determine the training quality of machine learning models. However, in practice, the data deemed necessary for a forecast task might be owned by different organizations. To make accurate predictions, it is necessary to combine diverse data sources from different organizations in building the model. This could be achieved by aggregating all the data sources into a third-party central database; however, this may face inevitable security risks because the data are held centrally [privacy2]. Therefore, designing the forecasting framework under the premise of meeting data privacy, security, and regulatory requirements (e.g. through federated learning [yang2019federated]) is an important future research trend in peak load demand forecast.