
Advancing Renewable Electricity Consumption With Reinforcement Learning

by Filip Tolovski, et al.

As the share of renewable energy sources in the present electric energy mix rises, their intermittency proves to be the biggest challenge to carbon-free electricity generation. To address this challenge, we propose an electricity pricing agent which sends price signals to the customers and thereby contributes to shifting the customer demand to periods of high renewable energy generation. We propose implementing the pricing agent with a reinforcement learning approach, where the environment is represented by the customers, the electricity generation utilities and the weather conditions.




1 Introduction

The intermittency of renewable energy sources poses an issue to present electric utilities and electricity markets, which are built on the notion that supply follows demand (Denholm et al., 2015). Given the lack of scalable storage solutions (de Sisternes et al., 2016), their integration into the present energy markets can be most efficiently addressed by demand response (Bird et al., 2013). Demand response denotes changes in the electric usage of consumers from their normal consumption patterns in response to changes in electricity prices or to other incentives (Agency et al., 2003). With demand response through real-time pricing (Agency et al., 2003), utilities can shift the customer load demand to periods of excess renewable energy generation, i.e. periods when renewable generation is high while load demand is low. Shifting the customer load demand in this way can reduce the greenhouse gas emissions from the natural gas and coal plants used at times of peak demand or for backup generation (Bird et al., 2013).

2 Related work

Previous research on using reinforcement learning for dynamic pricing in hierarchical markets has shown promising results, both in balancing supply and demand and in cutting costs for customers and electric utilities alike (Lu et al., 2018). When a reinforcement learning model for customer load scheduling is also included, costs are reduced on both sides compared to myopic optimization (Kim et al., 2016). However, these methods have not yet been applied in a physical environment. This is due to the lack of a simulation environment which can provide a reliable estimate of the safety and cost of the proposed method. To this end, we propose a simulation environment, and we add weather data as an input to our proposed agent.

3 Reinforcement Learning Approach

To model this problem as a reinforcement learning problem (Sutton and Barto, 2018), we choose the electric utility to represent the pricing agent. The state s_t is represented by the momentary and future renewable electricity supply to the electric utility, the momentary customer load demand, and the momentary and future real-world features used for electricity demand forecasting (Gajowniczek and Zabkowski, 2017; Paterakis et al., 2017; Mocanu et al., 2016). At each timestep t, the electric utility, as the agent, selects an action a_t, represented by the momentary and future electricity prices. This action is transmitted to the customers, which, as part of the environment, respond with a load demand. This load demand is then used to calculate the reward r_t. The pricing agent is now in a new state s_{t+1}, and the whole process is repeated for the duration of a previously chosen simulation period. The environment and the agent are shown in Fig. 1.
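This interaction loop can be sketched in Python. The environment dynamics below are toy placeholders: the customer price response, the supply process and all names are our illustrative assumptions, not the paper's model.

```python
import random

class PricingEnvironment:
    """Toy stand-in for the environment of customers, generation
    utilities and weather; all dynamics here are illustrative."""

    def __init__(self, episode_length=24, horizon=6, seed=0):
        self.rng = random.Random(seed)
        self.episode_length = episode_length
        self.horizon = horizon
        self.t = 0

    def _observe(self):
        # State s_t: momentary demand, momentary and future renewable
        # supply, and exogenous features used for demand forecasting.
        return {
            "demand": self.rng.uniform(0.2, 0.8),
            "supply_forecast": [self.rng.random() for _ in range(self.horizon)],
            "weather": [self.rng.gauss(0.0, 1.0) for _ in range(4)],
        }

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, price):
        # The customers, as part of the environment, respond to the
        # price signal with a load demand (here: higher price, lower demand).
        demand = max(0.0, 1.0 - 0.5 * price + 0.1 * self.rng.gauss(0.0, 1.0))
        supply = self.rng.random()
        reward = -(supply - demand) ** 2      # placeholder reward r_t
        self.t += 1
        done = self.t >= self.episode_length
        return self._observe(), reward, done

env = PricingEnvironment()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    price = 0.5                               # fixed-price baseline "agent"
    state, reward, done = env.step(price)
    total_reward += reward
```

A learning agent would replace the fixed price with a policy conditioned on the observed state.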

Since the renewable electricity sources are often not in the vicinity of the customers, we set the future supply to be given as an input. The momentary demand, the momentary and future renewable supply, its price, the weather data and the temporal data are vectors whose dimension is set by the forecast horizon, while the energy selling price, i.e. the action, is of dimension 1. This way, the pricing agent formulates the price with the expected future states as input. The size of the timestep and the length of the forecast horizon are treated as hyperparameters of the learning problem.

Figure 1: Diagram of the reinforcement learning setting. The pricing agent provides the new electricity selling price to the customers based on the received input data.

3.1 Reward function

The two objectives of the proposed pricing agent are to decrease the difference between the supply of renewable energy and the demand, and to keep the energy utility profitable. We propose the global reward function to be a linear combination of two sub-rewards, as in the multi-objective analysis of return shown in Eq. (1), proposed by Dulac-Arnold et al. (2019).

r_t = c_1 r_t^(1) + c_2 r_t^(2)    (1)

r_t^(1) = (p_t^sell − p_t^buy) d_t    (2)

r_t^(2) = −(s_t − d_t)^2    (3)

where p_t^sell is the selling price, p_t^buy the purchase price, d_t the customer load demand and s_t the available renewable supply at timestep t.
The coefficients in (1) are hyperparameters of the learning problem and are initially set to 1. The sub-reward function shown in (2) calculates the profit for the pricing agent from the difference between the selling price and the purchase price. The sub-reward function shown in (3) calculates the negative square of the difference between the renewable energy available and the demand. The negative sign and the squared form of (3) reflect the objective of reducing both the positive and the negative difference between renewable supply and demand.
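A minimal Python sketch of this reward, assuming sub-reward (2) is the price margin times the delivered demand and sub-reward (3) the negative squared supply-demand gap; the function and argument names are ours:

```python
def profit_reward(selling_price, purchase_price, demand):
    # Sub-reward (2): the utility's margin times the delivered demand.
    return (selling_price - purchase_price) * demand

def balance_reward(renewable_supply, demand):
    # Sub-reward (3): negative squared mismatch, so both over- and
    # under-supply of renewable energy are penalised.
    return -(renewable_supply - demand) ** 2

def global_reward(selling_price, purchase_price, renewable_supply, demand,
                  c_profit=1.0, c_balance=1.0):
    # Eq. (1): linear combination of the two sub-rewards; the
    # coefficients are hyperparameters, initially set to 1.
    return (c_profit * profit_reward(selling_price, purchase_price, demand)
            + c_balance * balance_reward(renewable_supply, demand))
```

For example, with a selling price of 0.3, a purchase price of 0.2, and supply exactly matching a demand of 1.0, the reward reduces to the profit term alone.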

4 Implementation

4.1 Simulation environment

Since real-time testing of the proposed model is costly and no dataset is available for batch offline training, we propose building a new simulation environment. In the proposed environment, the customers are represented by a number of previously trained demand response agents. As a training ground for these agents, we propose CityLearn (Vázquez-Canteli et al., 2019), an OpenAI Gym environment (Brockman et al., 2016). CityLearn enables training and evaluation of autonomous and collaborative reinforcement learning models for demand response in residential buildings. In order to model buildings which are not fitted with energy storage, we remove the storage capabilities of some of the buildings in CityLearn. The cost function for the customer agents is chosen to minimize both the peak energy demand and the cumulative energy cost. To improve the generalization of the pricing agent to emerging smart cities and neighbourhoods, we propose training both independent and cooperative customer agents. An independent agent is not aware of the actions of other agents, while a cooperative agent values its actions in conjunction with the actions of other agents (Claus and Boutilier, 1998). After evaluating the performance of the customer agents, a number of different simulation environments are built by combining trained customer agents. The distribution of customer agents in an environment is set so that the pool of simulation environments properly models the momentary and future price responsiveness of the physical environment.
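Assembling such a pool might look as follows in Python. The uniform sampling scheme and all names are illustrative placeholders for however the distribution of customer agents is actually matched to the physical environment:

```python
import random

def build_environment_pool(trained_agents, n_envs=8, agents_per_env=5, seed=0):
    """Sketch: build simulation environments as mixes of pre-trained
    customer agents (independent and cooperative, with and without
    storage). `trained_agents` maps a label to a trained agent object."""
    rng = random.Random(seed)
    labels = list(trained_agents)
    pool = []
    for _ in range(n_envs):
        # Placeholder: sample agent types uniformly; in practice the mix
        # would be chosen to match the physical price responsiveness.
        chosen = [rng.choice(labels) for _ in range(agents_per_env)]
        pool.append([trained_agents[label] for label in chosen])
    return pool

pool = build_environment_pool({"independent": "agent_a", "cooperative": "agent_b"})
```

Each entry of `pool` is then one simulation environment's roster of customer agents.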

4.2 Training

The training of the pricing agent is done in two phases. In the first phase, the pricing agent is trained using model-agnostic meta-learning (Finn et al., 2017) across all simulation environments. The second phase, in which the pricing agent is trained and run in the physical environment, is started after a certain performance threshold is reached in the simulation environments. The training in the first phase should increase the sample efficiency of the pricing agent in the second phase, and should reduce the costs and risks of training in the physical environment. Training on multiple environments with different distributions of customer agents should also increase the robustness of the pricing agent in the second phase (Dulac-Arnold et al., 2019).
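The first phase can be sketched with a first-order stand-in for MAML (a Reptile-style update, which avoids the second-order gradients of full MAML and is named plainly as a substitute here). The scalar parameter and the per-environment gradient callables are illustrative assumptions:

```python
import random

def meta_train(env_grads, meta_steps=200, inner_steps=3,
               inner_lr=0.1, meta_lr=0.05, seed=0):
    """First-order sketch of phase one: adapt to a sampled simulation
    environment with a few gradient steps, then move the meta-parameter
    toward the adapted solution (Reptile-style stand-in for MAML)."""
    rng = random.Random(seed)
    theta = 0.0                               # meta-parameter
    for _ in range(meta_steps):
        grad = rng.choice(env_grads)          # sample a simulation env
        phi = theta
        for _ in range(inner_steps):          # inner-loop adaptation
            phi -= inner_lr * grad(phi)
        theta += meta_lr * (phi - theta)      # outer-loop meta-update
    return theta

# Toy quadratic "environments" with optima at 1.0, 2.0 and 3.0: the
# gradient of (theta - c)^2 is 2 * (theta - c).
grads = [lambda p, c=c: 2.0 * (p - c) for c in (1.0, 2.0, 3.0)]
theta = meta_train(grads)
```

After meta-training, `theta` sits near the centre of the environments' optima, so a few inner steps suffice to adapt to any one of them.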

4.3 Safety and Explainability

To ensure the safety of the pricing agent operating in a physical environment, we propose using constraints on the price it signals to the customers. The values of the constraints should be set by the electricity utility, according to its pricing policy and to market regulations. In order to evaluate the safety of the algorithm with respect to the constraints, we propose using a summary of all constraint violations, as in Dalal et al. (2018). To evaluate the impact these constraints have on the performance of the pricing agent, we propose learning a policy as a function of the constraint level, as in Boutilier and Lu (2016) and Carrara et al. (2018). This should provide the human operators of the pricing agent with information about the trade-offs between the constraint level and the expected return (Dulac-Arnold et al., 2019). To further improve the explainability of the pricing agent, we propose tracking the performance on the two objectives of the reward function separately, so that the human operators have insight into the performance of the policy in use.
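A minimal sketch of such a price constraint with violation logging; the bounds and names are illustrative, not prescribed by the paper:

```python
def constrain_price(raw_price, floor, cap, violations):
    """Clip the agent's proposed price to utility-set bounds and record
    any violation, so safety can be summarised over a run as a list of
    out-of-bounds proposals."""
    if raw_price < floor or raw_price > cap:
        violations.append(raw_price)
    return min(max(raw_price, floor), cap)

violations = []
safe_prices = [constrain_price(p, floor=0.05, cap=0.40, violations=violations)
               for p in (0.02, 0.25, 0.55)]
```

The length of `violations` after an episode is one simple summary statistic of how often the unconstrained policy would have breached the utility's pricing limits.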

5 Conclusion and Further Work

In this paper, we propose a pricing agent and an appropriate simulation environment which can be used for its training and evaluation. We address the challenges of safety, robustness and sample efficiency of the pricing agent, which can otherwise increase the cost of deployment in a physical environment. After implementing the proposed pricing agent and evaluating the results, we propose further training of the customer agents from the simulation environment. They would keep their original reward function, but would now be trained in an environment where the price signal is responsive to their actions. This could further improve their performance in reducing the peak energy demand. These additionally trained customer agents could then constitute an environment for evaluating the pricing agent when the customers are fully responsive to the price signals.


  • I. E. Agency, O. for Economic Co-operation, and Development (2003) The power to choose: demand response in liberalised electricity markets. Energy market reform, OECD/IEA. External Links: ISBN 9789264105034, LCCN 04392452 Cited by: §1.
  • L. Bird, M. Milligan, and D. Lew (2013) Integrating variable renewable energy: challenges and solutions. NREL Technical Report. External Links: Document Cited by: §1.
  • C. Boutilier and T. Lu (2016) Budget allocation using weakly coupled, constrained Markov decision processes. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI-16), New York, NY, pp. 52–61. Cited by: §4.3.
  • G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba (2016) OpenAI gym. External Links: arXiv:1606.01540 Cited by: §4.1.
  • N. Carrara, O. Pietquin, R. Laroche, T. Urvoy, and J. Bouraoui (2018) A fitted-q algorithm for budgeted mdps. In European Workshop on Reinforcement Learning (EWRL), External Links: Link Cited by: §4.3.
  • C. Claus and C. Boutilier (1998) The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence, AAAI ’98/IAAI ’98, USA, pp. 746–752. External Links: ISBN 0262510987 Cited by: §4.1.
  • G. Dalal, K. Dvijotham, M. Vecerík, T. Hester, C. Paduraru, and Y. Tassa (2018) Safe exploration in continuous action spaces. CoRR abs/1801.08757. External Links: Link, 1801.08757 Cited by: §4.3.
  • F. J. de Sisternes, J. D. Jenkins, and A. Botterud (2016) The value of energy storage in decarbonizing the electricity sector. Applied Energy 175, pp. 368 – 379. External Links: ISSN 0306-2619, Document Cited by: §1.
  • P. Denholm, M. O’Connell, G. Brinkman, and J. Jorgenson (2015) Overgeneration from solar energy in california. a field guide to the duck chart. NREL Technical Report. External Links: Document Cited by: §1.
  • G. Dulac-Arnold, D. J. Mankowitz, and T. Hester (2019) Challenges of real-world reinforcement learning. CoRR abs/1904.12901. External Links: Link, 1904.12901 Cited by: §3.1, §4.2, §4.3.
  • C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. CoRR abs/1703.03400. External Links: Link, 1703.03400 Cited by: §4.2.
  • K. Gajowniczek and T. Zabkowski (2017) Two-stage electricity demand modeling using machine learning algorithms. Energies 2017, pp. 1547–1571. External Links: Document Cited by: §3.
  • B. Kim, Y. Zhang, M. van der Schaar, and J. Lee (2016) Dynamic pricing and energy consumption scheduling with reinforcement learning. IEEE Transactions on Smart Grid 7 (5), pp. 2187–2198. External Links: Document, ISSN Cited by: §2.
  • R. Lu, S. H. Hong, and X. Zhang (2018) A Dynamic pricing demand response algorithm for smart grid: Reinforcement learning approach. Applied Energy 220 (C), pp. 220–230. External Links: Document Cited by: §2.
  • E. Mocanu, P. H. Nguyen, M. Gibescu, E. M. Larsen, and P. Pinson (2016) Demand forecasting at low aggregation levels using factored conditional restricted Boltzmann machine. In 2016 Power Systems Computation Conference (PSCC), pp. 1–7. External Links: Document, ISSN Cited by: §3.
  • N. G. Paterakis, E. Mocanu, M. Gibescu, B. Stappers, and W. van Alst (2017) Deep learning versus traditional machine learning methods for aggregated energy demand prediction. In 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Vol. , pp. 1–6. External Links: Document, ISSN Cited by: §3.
  • R.S. Sutton and A.G. Barto (2018) Reinforcement learning: an introduction. Adaptive Computation and Machine Learning series, MIT Press. External Links: ISBN 9780262039246, LCCN 2018023826 Cited by: §3.
  • J. R. Vázquez-Canteli, J. Kämpf, G. Henze, and Z. Nagy (2019) CityLearn v1.0: an openai gym environment for demand response with deep reinforcement learning. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, BuildSys ’19, New York, NY, USA, pp. 356–357. External Links: ISBN 9781450370059, Link, Document Cited by: §4.1.