I Introduction
Demand response (DR) has proven to be a successful paradigm that has emerged in the context of the smart grid [1, 2, 3]. DR enables the participation of demand-side resources, including commercial buildings [4, 5], industrial loads [6], and electric vehicles [7], in maintaining the supply-demand balance in power systems through economic means such as incentives or price signals [1]. A variety of DR programs have been developed so far at both the wholesale market level and the retail market level. At the wholesale level, qualified demand-side resources can offer in almost all markets, including the capacity, energy, and ancillary service markets, and are cleared by the independent system operator (ISO) in the same way as conventional generating resources. DR programs at the retail level, however, show more variety than those at the wholesale level. Still, retail DR programs can, by and large, be categorized into two classes: incentive based and price based. The former includes, e.g., direct load control and some interruptible programs, while the latter includes real-time pricing (RTP), time-of-use pricing, and critical peak pricing.
In price-based DR programs, the load serving entity (LSE), the operator of the retail market, needs to design effective pricing schemes such that certain objectives, such as minimizing load fluctuation [8] and maximizing profit [9] or utility [10], can be achieved. Particularly in RTP, the LSE sends out a price signal for every time interval, and the end use customers (EUCs) then respond to the price signal by adjusting their energy consumption. It is therefore critical for the LSE to know the price response characteristics of the EUCs in order to make informed decisions on the price signal sent to the EUCs. Conventionally, price response characteristics are modeled using a static function such as a linear, exponential, logarithmic, or potential function [11]. These static functions, however, neglect the inherent temporal correlation of EUC behaviors and, as we will see later in the paper, may result in large errors when predicting the actual responses of EUCs in RTP programs.
To explicitly capture the temporal behavior of the EUCs, we propose a dynamical DR model whose states evolve over time. The states in the proposed dynamical DR model keep the necessary information from previous time intervals and allow more accurate prediction of the EUCs’ responses. These states can be chosen explicitly, in which case the model can be represented by a linear function or a multilayer feedforward neural network (FNN), or implicitly, in which case the model can be represented by a recurrent neural network (RNN) or a long short-term memory (LSTM) network. The proposed dynamical DR model can be learned from historical price and energy consumption data. In particular, the states can be determined empirically from the historical data. We will also show through numerical simulation that the proposed dynamical DR model significantly outperforms the conventional static models.
The remainder of this paper is organized as follows. Section II reviews the decision process of EUCs and proposes a dynamical DR model that is based on available data. The neural network representations of the dynamical DR model are developed in Section III, and results from numerical simulations are presented in Section IV. Some concluding remarks are made in Section V.
II Demand Response Model
In this section, we first briefly review how a myopic EUC may determine its energy consumption at a given electricity price. We then propose a dynamical DR model that addresses the deficiencies of static DR models. Throughout this paper, we use a subscript to denote the time interval, and assume one day is decomposed into segments.
II-A Decision Process of End Use Customers
In an RTP program, the LSE sends out an energy price for time interval , denoted by , before the beginning of that time interval. Each EUC then responds to the price signal by optimizing its energy consumption in a way that maximizes its overall benefit or, equivalently, minimizes its overall cost. While there exist many detailed models for EUCs, a generic model that is agnostic to the underlying components treats the energy demand and the energy consumption separately, where the energy demand is the needed energy while the energy consumption is the actually consumed energy [12, 10].
Let denote the set of EUCs served by the LSE. Let and denote the energy demand and energy consumption of EUC at time interval , respectively. A myopic EUC finds its optimal energy consumption by solving the following utility maximization problem [10]:
(1a)  
subject to  
(1b)  
(1c) 
where is the benefit function, which gives the benefit of the EUC at a certain energy demand and energy consumption, is the backlog rate that represents the percentage of unmet energy demand that is carried over to the next time interval, is a random variable that models the new demand, and is the feasible set of energy consumption.

From the perspective of the LSE, the collective behavior of all EUCs is of more interest, since its total profit depends on the aggregate energy consumption. The DR model that is of interest to the LSE is essentially a mapping from the price to the aggregate energy consumption as follows:
(2) 
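As a concrete illustration of the demand dynamics described above, the sketch below simulates one myopic EUC over a day. The update rule, the greedy consumption rule, and all names and numeric values (`beta`, the consumption cap of 0.8, the demand range) are our assumptions for illustration; the exact form of (1) is not reproduced here.

```python
import random

def simulate_euc(T=24, beta=0.5, seed=0):
    """Sketch of a myopic EUC's demand evolution: unmet demand is
    carried over at backlog rate beta, plus random new demand each
    interval. The update rule d <- beta * (d - e) + a is an assumption."""
    rng = random.Random(seed)
    d, history = 0.0, []
    for t in range(T):
        a = rng.uniform(0.0, 1.0)   # new demand (random variable)
        d = d + a                   # total demand in this interval
        e = min(d, 0.8)             # consumption limited by the feasible set (assumed cap)
        history.append((d, e))
        d = beta * (d - e)          # backlog carried over to the next interval
    return history
```

The consumption rule here is a stand-in for the actual utility maximization in (1); in the paper, each EUC would instead solve (1) given the price.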
II-B Dynamical Demand Response Model
The mapping from price to energy consumption is conventionally modeled using a static function such as linear, exponential, logarithmic, and potential functions [11], which can be conceptually expressed as:
(3) 
These static functions generally work well at the wholesale level, since the price response characteristics of DR resources in the wholesale market are mainly determined by their offers. Once the energy demands are cleared for each time interval, the DR resources will be controlled such that their energy consumptions follow the schedule, since they would otherwise incur costs. Therefore, given a certain price, the energy consumption can be determined directly from the relatively static offer stacks, i.e., the mapping is relatively static.
However, in an RTP program, the EUCs have no obligation to follow any energy consumption profile. Indeed, it is clear from (1) that, given a price, the energy consumption of an EUC will depend on its current energy demand, which itself depends on the previous energy consumption. Therefore, the mapping in (2) is dynamical.
In light of this observation, we propose a dynamical DR model for the EUCs in an RTP program as follows:
(4) 
where
is a state vector that captures all the factors besides the price that impact the aggregate energy consumption, and evolves over time given .

While it is difficult to identify the “perfect” state vector, we may still be able to construct one that is good enough in the sense that it gives good prediction accuracy for the aggregate demand. From a practical view, the state vector can be composed of elements from the set of information that is available to the LSE at time interval , denoted by . There are two potential approaches to constructing such a state vector: a direct approach and an indirect approach. In the direct approach, we select the state vector to be
(5) 
where , referred to as the order of the DR model, is a parameter that can be determined from the historical data. We adopt the convention that when . In the indirect approach, the state vector in (5) is further transformed by a series of nonlinear operations. With the state vector and the price information, we can predict the aggregate energy consumption using supervised learning techniques.
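In the direct approach, the state vector in (5) is assembled from lagged prices and aggregate consumptions. A minimal sketch follows, assuming the zero convention for unavailable history; the function and array names are ours:

```python
import numpy as np

def build_lagged_states(prices, consumptions, k):
    """Direct approach: form state vectors from the k most recent
    prices and aggregate consumptions, where k is the order of the
    DR model. Unavailable history (t < k) is taken as zero, matching
    the paper's convention."""
    T = len(prices)
    p = np.concatenate([np.zeros(k), np.asarray(prices, float)])
    e = np.concatenate([np.zeros(k), np.asarray(consumptions, float)])
    # For each t, stack the k previous prices and k previous consumptions.
    X = [np.concatenate([p[t:t + k], e[t:t + k]]) for t in range(T)]
    return np.stack(X)  # shape (T, 2k)
```

Each row of the returned matrix is a candidate state vector that, together with the current price, forms the input of the learned model.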
III Neural Network Representations of the Dynamical Demand Response Model
In this section, we develop neural network representations for the dynamical DR model. A multilayer FNN is applied in the direct approach, and a multilayer RNN is applied in the indirect approach.
III-A Direct Approach Using a Feedforward Neural Network
When using the direct approach, the state vector is manually selected from the set of available information. Therefore, the input vector and the output of the DR model are both readily available. We can thus fit the dynamical DR model in (4) using, for example, a linear function as follows:
(6) 
where is a weight vector, is the bias, and the superscript denotes the transpose of a vector or matrix.
Also, we may represent with a nonlinear function such as a multilayer FNN, which consists of one input layer, hidden layers, and one output layer, as illustrated in the left part of Fig. 1. Each hidden layer takes an input vector and computes a (hidden) output vector according to
(7) 
where
denotes a rectified linear unit function that is applied elementwise, is a weight matrix, and is a bias vector. Note that the output vector of one hidden layer is the input vector of the next hidden layer, i.e., , except for the last hidden layer, the output of which is mapped to the output through a fully connected unit as follows:

(8)
where is a weight matrix, and is a bias vector. Note that . The multilayer FNN can be trained using backpropagation such that the mean squared error between the predicted output and the true value is minimized, i.e., by minimizing the following loss function:
(9) 
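A minimal forward pass and loss computation matching the structure of (7)-(9) can be sketched in pure NumPy; the parameter packing and function names are our choices, not the paper's notation:

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise as in (7)."""
    return np.maximum(z, 0.0)

def fnn_predict(x, params):
    """Forward pass of a multilayer FNN: each hidden layer applies
    h = relu(W h + b) as in (7); the output layer is affine as in (8).
    params is a list of (weight, bias) pairs, output layer last."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = params[-1]
    return float(W_out @ h + b_out)

def mse_loss(params, X, y):
    """Loss (9): mean squared error over a set of training samples."""
    preds = np.array([fnn_predict(x, params) for x in X])
    return float(np.mean((preds - np.asarray(y)) ** 2))
```

In practice this loss would be minimized by backpropagation with an optimizer such as Adam, as the paper does with TensorFlow in Section IV.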
III-B Indirect Approach Using a Recurrent Neural Network
Alternatively, the states can be constructed implicitly within the neural network, which leads to RNNs [13]. The right part of Fig. 1 illustrates a multilayer RNN with one input layer, hidden layers, and one output layer. The hidden state of the RNN unit in layer , denoted by , is the input to the RNN unit in the next layer as well as the input to itself at the next time step, as indicated by the arrows in Fig. 1. This RNN takes a sequence as the input and outputs a sequence . Meanwhile, sequences of hidden states , , are generated along the trajectory based on the following equations:
(10) 
where is applied elementwise, and are weight matrices, and is a bias vector. Note that the hidden states are initialized to zeros, i.e., for , and . The hidden states in the RNN are dynamical, since their values also depend on their previous values, while those in the FNN are static, since their values depend purely on the inputs. The last hidden state vector is mapped to the output through a fully connected unit, as in the case of the multilayer FNN. The RNN can be trained by minimizing the same loss function as in (9) using the backpropagation through time technique (see, e.g., [14]). The input vector only has to include the most recent information, i.e., when the RNN is used, and .

The key difference between the RNN and the FNN in representing the dynamical DR model is that the FNN captures the temporal impacts by explicitly specifying a set of historical data as inputs, while the RNN keeps the temporal impacts by implicitly computing a dynamical hidden state.
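The recurrence in (10) can be sketched as follows. The activation symbol in (10) is elided in the text above, so the ReLU used here (for consistency with the FNN sketch) is an assumption, as are the function and parameter names:

```python
import numpy as np

def rnn_step(x, h_prev, Wx, Wh, b):
    """One recurrence of (10): the new hidden state depends on both the
    current input and the previous hidden state, which is what makes
    the learned DR model dynamical. Activation (ReLU) is assumed."""
    return np.maximum(Wx @ x + Wh @ h_prev + b, 0.0)

def rnn_forward(xs, Wx, Wh, b):
    """Unroll the RNN over an input sequence; h_0 is initialized to
    zeros, as in the paper."""
    h = np.zeros(Wh.shape[0])
    hs = []
    for x in xs:
        h = rnn_step(x, h, Wx, Wh, b)
        hs.append(h)
    return hs
```

Unlike the FNN, feeding the same input twice yields different hidden states here, because the state carries over between steps.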
One deficiency of the basic RNN unit is its inability to model long-term dependencies. As a significant improvement over the basic RNN unit, the LSTM was proposed in [16]. The structure of an LSTM unit is illustrated in Fig. 2, in which denotes a sigmoid function. For simplicity, we drop the superscript that indicates the layer and focus on the structure inside one LSTM unit. The LSTM unit introduces a new hidden state vector , which is used to keep long-term memories. The LSTM unit works as follows. First, a forget gate vector , an information gate vector , and an output gate vector are computed from the previous hidden state and the new input vector as follows:

(11)  
(12)  
(13) 
Then, the two hidden state vectors are updated as follows:
(14)  
(15)  
(16) 
where represents elementwise multiplication. This structure has proven to be very effective in capturing long-term temporal dependencies and is therefore expected to outperform the basic RNN unit when representing the dynamical DR model. The multilayer LSTM network is similar to the RNN in Fig. 1, with the RNN units simply replaced by LSTM units.
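One step of the gate computations (11)-(16) can be sketched in a few lines. Packing all four gates into one stacked weight matrix is a common implementation convenience, not the paper's notation, and the names are ours:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step per (11)-(16): forget gate f, information gate i,
    output gate o, and candidate g are computed from [h_prev; x];
    c keeps the long-term memory. W stacks the four gate weight
    matrices row-wise (a common packing; the paper keeps them separate)."""
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b  # all four gates at once
    f = sigmoid(z[0:n])          # forget gate
    i = sigmoid(z[n:2 * n])      # information (input) gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate state
    c = f * c_prev + i * g       # long-term state update
    h = o * np.tanh(c)           # new hidden state
    return h, c
```

The elementwise products correspond to the operation in (14)-(16); the long-term state `c` is what lets the unit retain information over many intervals.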
IV Simulation Results
In this section, we numerically investigate the performance of the proposed dynamical DR model under the direct and indirect approaches.
IV-A Simulation Setup
Assume , i.e., each time interval covers one hour. We assume the benefit function has a quadratic form in which the parameter is a constant, i.e.,
(17) 
The feasible set is . The backlog rate is uniformly sampled from . The new demand is generated according to the following procedure. First, the peak demand of each EUC is sampled uniformly from MW; then the new demand of the EUC is computed as the product of the peak demand, the normalized annual load profile from one of the zones in PJM in 2017 [17], and a Gaussian random variable with a mean of and a standard deviation of . The value of is taken as the ratio of $/MWh to the EUC’s peak demand. It is chosen in this way such that the responses from the EUCs are reasonable. In reality, the value of is a choice of the EUCs. The number of EUCs served by the LSE is . The historical data are simulated using the parameters described above and a time series of prices that are sampled uniformly from $/MWh, which is shown in Fig. 3. The first sets of data (corresponding to January to October) are used for training, and the last sets of data (corresponding to November and December) are used for testing. The neural networks are implemented using TensorFlow [18] and trained with the Adam optimizer with a learning rate of and training steps.

IV-B Performance Comparison
The mean absolute percentage error (MAPE) and the standard deviation of the absolute percentage error (SDAPE) are used to evaluate the performance of the learned model.
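The two error metrics can be computed as follows; reading SDAPE as the standard deviation of the absolute percentage errors is our interpretation of the acronym:

```python
import numpy as np

def mape_sdape(y_true, y_pred):
    """MAPE: mean of the absolute percentage errors.
    SDAPE: their standard deviation (our reading of the acronym).
    Both are reported in percent, as in the tables below."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ape = 100.0 * np.abs((y_pred - y_true) / y_true)
    return float(np.mean(ape)), float(np.std(ape))
```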
IV-B1 Linear Function
We first test the performance of the dynamical DR model under the direct approach. Table I shows the results when the linear function is used. Note that when the order is , i.e., no information on previous time intervals is utilized, both the training error and the testing error are high, with MAPEs of and , respectively. The errors drop significantly when , which verifies the claim that the price response characteristics of the EUCs depend on their states in previous time intervals rather than being static over time. Increasing the order significantly improves the model performance when ; when , however, the improvement becomes negligible. Therefore, when the linear function is used, an appropriate order of the dynamical DR model would be . Intuitively, this means the response of the aggregate EUC demand to the price signal depends on the previous hours rather than being static.
TABLE I
order             0      1      2      3      4      5
Train MAPE (%)   19.67   6.19   4.46   3.99   3.95   3.82
Train SDAPE (%)  15.22   4.97   3.89   3.61   3.56   3.46
Test  MAPE (%)   16.46   6.02   4.35   4.16   4.29   4.10
Test  SDAPE (%)  13.91   4.80   3.55   3.38   3.43   3.31
IV-B2 FNN
An FNN with hidden layers, each with neurons, is also used to represent the dynamical DR model. Table II shows the results when the FNN is used in the direct approach. It is clear that the FNN outperforms the linear function when the order of the model is the same, which indicates that the linear function underfits the price and energy consumption data. In addition, when the FNN is used in the direct approach, the order of the dynamical DR model can be much lower than in the case of the linear function. For example, the testing MAPE of the FNN is when , much better than that of the linear function when , which is .
TABLE II
order             0      1      2      3      4      5
Train MAPE (%)   17.54   4.73   2.98   2.95   2.92   2.87
Train SDAPE (%)  12.45   3.64   2.55   2.46   2.45   2.38
Test  MAPE (%)   15.48   5.26   3.46   3.47   3.48   3.34
Test  SDAPE (%)  11.74   3.77   2.74   2.64   2.64   2.63
IV-B3 RNN/LSTM
In the indirect approach, an RNN/LSTM with hidden layers and neurons is adopted. The results are presented in Table III. In the indirect approach, only the most recent price and energy consumption are fed into the RNN/LSTM for prediction. The testing MAPEs of the indirect approach, which are with the RNN and with the LSTM, are better than that of the direct approach with the FNN of order , which is , and the LSTM performs better than the RNN by a small margin.
TABLE III
                  RNN   LSTM
Train MAPE (%)   3.02   2.83
Train SDAPE (%)  2.68   2.56
Test  MAPE (%)   3.25   3.16
Test  SDAPE (%)  2.55   2.47
The testing MAPEs obtained using the linear function with order , the FNN with order , the RNN, and the LSTM are presented in the violin plot in Fig. 4. To sum up, both the direct and the indirect approach can be applied to learn a good dynamical DR model. The FNN, RNN, and LSTM outperform the linear function, at the cost of higher model complexity. In particular, when the RNN or LSTM is used, no manual selection of states is required.
IV-C Discussion
When the neural networks are used to represent the dynamical DR model, we also investigated the impacts of hyperparameters such as the order of the dynamical DR model, the number of layers, and the number of neurons in each layer. Among all factors, the order of the dynamical DR model has the most significant impact on the performance, as discussed in detail in the preceding section. The impacts of the other parameters are relatively small.
Also, we would like to emphasize that the learned dynamical DR model can find application in a variety of scenarios. For example, it can be used to predict the energy consumption profile over several time intervals given the prices over the same intervals. It can also be used in an agent that simulates the collective behavior of a set of EUCs without necessarily modeling all the underlying components; such an agent can be very helpful in developing certain algorithms.
V Concluding Remarks
In this paper, we proposed a dynamical DR model that captures the temporal behavior of EUCs. A key element of the proposed model is the choice of state, which can be determined using either a direct approach that selects the state manually from the available information, or an indirect approach that computes the states from the input information. The dynamical DR model can be represented by a linear function or an FNN when the direct approach is used, and by an RNN or LSTM network when the indirect approach is used. Both approaches can achieve small MAPEs when predicting the response of the aggregate energy consumption to the price. Numerical simulation results validated that dynamical DR models are indeed necessary to model the price response characteristics of the EUCs, which are inherently temporally correlated.
Future research will utilize the proposed dynamical DR model to learn a pricing policy for the LSE purely from historical data using agent-based algorithms.
References
 [1] F. Wang, H. Xu, T. Xu, K. Li, M. Shafie-Khah, and J. P. Catalão, “The values of market-based demand response on improving power system reliability under extreme circumstances,” Applied Energy, vol. 193, pp. 220–231, 2017.
 [2] F. Rahimi and A. Ipakchi, “Demand response as a market resource under the smart grid paradigm,” IEEE Transactions on Smart Grid, vol. 1, no. 1, pp. 82–88, 2010.
 [3] J. Medina, N. Muller, and I. Roytelman, “Demand response and distribution grid operations: Opportunities and challenges,” IEEE Transactions on Smart Grid, vol. 1, no. 2, pp. 193–198, 2010.
 [4] X. Zhang, M. Pipattanasomporn, M. Kuzlu, and S. Rahman, “Conceptual framework for a multi-building peak load management system,” in 2016 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe). IEEE, 2016, pp. 1–5.
 [5] X. Zhang, M. Pipattanasomporn, and S. Rahman, “A self-learning algorithm for coordinated control of rooftop units in small- and medium-sized commercial buildings,” Applied Energy, vol. 205, pp. 1034–1049, 2017.
 [6] X. Zhang, G. Hug, J. Z. Kolter, and I. Harjunkoski, “Demand response of ancillary service from industrial loads coordinated with energy storage,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 951–961, 2018.
 [7] K. Zhang, L. Lu, C. Lei, H. Zhu, and Y. Ouyang, “Dynamic operations and pricing of electric unmanned aerial vehicle systems and power networks,” Transportation Research Part C: Emerging Technologies, vol. 92, pp. 472–485, 2018.
 [8] H. Sun, A. Minot, D. Nikovski, H. Hashimoto, T. Takano, and Y. Takaguchi, “Mitigating substation demand fluctuations using decoupled price schemes for demand response,” in Innovative Smart Grid Technologies Conference (ISGT), 2016 IEEE Power & Energy Society. IEEE, 2016, pp. 1–5.
 [9] H. Xu, K. Zhang, and J. Zhang, “Optimal joint bidding and pricing of profit-seeking load serving entity,” IEEE Transactions on Power Systems, 2018.
 [10] B.-G. Kim, Y. Zhang, M. Van Der Schaar, and J.-W. Lee, “Dynamic pricing and energy consumption scheduling with reinforcement learning,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2187–2198, 2016.
 [11] S. Yousefi, M. P. Moghaddam, and V. J. Majd, “Optimal real time pricing in an agent-based retail market using a comprehensive demand response model,” Energy, vol. 36, no. 9, pp. 5716–5727, 2011.
 [12] A. J. Conejo, J. M. Morales, and L. Baringo, “Real-time demand response model,” IEEE Transactions on Smart Grid, vol. 1, no. 3, pp. 236–242, 2010.
 [13] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning. MIT press Cambridge, 2016, vol. 1.
 [14] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
 [15] “Understanding LSTM networks,” http://colah.github.io/posts/201508UnderstandingLSTMs/, accessed: 2018-07-03.
 [16] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [17] “PJM metered load data,” http://www.pjm.com/marketsandoperations/opsanalysis/historicalloaddata.aspx, accessed: 2018-07-23.
 [18] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “TensorFlow: a system for large-scale machine learning,” in OSDI, vol. 16, 2016, pp. 265–283.