Machine Learning on EPEX Order Books: Insights and Forecasts

06/14/2019 ∙ by Simon Schnürch, et al. ∙ Fraunhofer

This paper employs machine learning algorithms to forecast German electricity spot market prices. The forecasts utilize in particular bid and ask order book data from the spot market but also fundamental market data like renewable infeed and expected demand. Appropriate feature extraction for the order book data is developed. Using cross-validation to optimise hyperparameters, neural networks and random forests are proposed and compared to statistical reference models. The machine learning models outperform traditional approaches.




1 Introduction

Forecasting electricity prices is an important task in an energy utility and needed not only for proprietary trading but also for the optimisation of power plant production schedules and other technical issues. A promising approach in power price forecasting is based on a recalculation of the order book using forecasts on market fundamentals like demand or renewable infeed. However, this approach requires extensive statistical analysis of market data. In this paper, we examine if and how this statistical work can be reduced using machine learning. Our paper focuses on two research questions:

  • How can order books from electricity markets be included in machine learning algorithms?

  • How can order-book-based spot price forecasts be improved using machine learning?

Figure 1: Example electricity price time series on different time scales.

We consider the German/Austrian EPEX spot market for electricity. There is a daily auction for electricity with delivery the next day. All 24 hours of the day are traded as separate products. Figure 1 shows auction results on different time scales. The pronounced seasonality of prices is visible as well as their high volatility. (Footnote 1: Another interesting property is that, in contrast to price series of other commodities or stocks, electricity prices may become negative.)

Figure 2: Order book data for a particular hour on different scales.

In the following, we briefly explain the idea of order-book-based price forecasts. Each price is the result of an auction, which can be represented as a bid and an ask curve. For a particular hour, those curves are shown in Figure 2. The intersection of the bid (purchase, demand) and ask (sell, supply) curves is the market clearing price (MCP). In the magnified figure, it is clearly visible that the bid and ask curves are step functions. The width of each step is the cumulated volume which market participants have put into the auction at a certain price. Price levels correspond to the marginal production costs of different power plants. Due to the regulatory environment, renewables in particular bid at negative prices in the auction. Moreover, in contrast to a classical power plant, the produced amount of renewable energy is stochastic and the total expected production is sold on the exchange. Relying on these economic circumstances, order-book-based forecasting modifies the volumes at different price levels in the bid and ask curves. The modifications correspond to the forecasted wind and solar power infeed. An important issue is which price levels are influenced by the renewable infeed. Usually, energy utilities use exhaustive statistical analysis of historical data to identify the price levels and the impact of the renewable forecasts. There are also other fundamental factors which influence the market price, first of all the expected electricity demand. This paper focuses on machine learning methods to reduce the effort for building a forecast model.
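To make the clearing mechanism concrete, the following minimal sketch computes a clearing price from price-volume orders. It assumes a simplified rule (the lowest price on the grid at which cumulative supply covers cumulative demand) and ignores complex orders; the function name `market_clearing_price` and the toy order book are hypothetical and not taken from the paper or from EPEX.

```python
from typing import Dict, Optional


def market_clearing_price(ask: Dict[float, float],
                          bid: Dict[float, float]) -> Optional[float]:
    """Return the lowest grid price at which supply covers demand.

    ask[p] is the volume offered for sale at limit price p,
    bid[p] is the volume buyers want to purchase at limit price p.
    """
    for p in sorted(set(ask) | set(bid)):
        # Supply curve: all volume offered at or below p would sell at p.
        supply = sum(v for q, v in ask.items() if q <= p)
        # Demand curve: all volume bid at or above p would buy at p.
        demand = sum(v for q, v in bid.items() if q >= p)
        if supply >= demand:
            return p
    return None  # the two step curves do not cross on the price grid


# Hypothetical toy order book (prices in EUR/MWh, volumes in MWh).
ask = {-50.0: 3000.0, 10.0: 2000.0, 30.0: 2500.0, 60.0: 1500.0}
bid = {80.0: 4000.0, 35.0: 1500.0, 20.0: 1000.0, 5.0: 500.0}
mcp = market_clearing_price(ask, bid)
print(mcp)
```

In this toy book the negative-price ask volume plays the role of renewable infeed: increasing it shifts the supply curve to the right and can lower the resulting clearing price, which is the effect that order-book-based forecasting exploits.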

In the following section we give an overview of the existing literature on the economics of electricity markets, order-book-based models and the use of machine learning in price forecasting. In Section 3 we detail our methodology. Section 4 is devoted to numerical results and a comparison to other models from the literature. Section 5 concludes.

2 Existing literature on price forecasting and machine learning in electricity markets

Solar and wind energy are playing an increasingly prominent role in today’s electricity markets. Empirical studies show that renewable electricity generation is highly volatile and has a substantial impact on the day-ahead electricity price (Wagner (2014)). Using multivariate regression methods, various authors have quantified the influence renewable infeed has on the price (Cludius et al. (2014); Würzburg et al. (2013)). This influence can easily be seen graphically, cf. Figure 3. Therefore, we also use expected solar and wind infeed as features for the price forecasts.

Figure 3: Influence of wind and solar infeed on price.

There is a vast body of literature on electricity price forecasting, over which Aggarwal et al. (2009) give an early overview. Their survey covers 47 papers published between 1997 and 2006 with topics ranging from game theoretic to time series and machine learning models. A more recent extensive literature overview is given by Weron (2014), in which the author distinguishes and describes five model classes for electricity price forecasting, namely game-theoretic, fundamental, reduced-form, statistical and machine learning models. In an empirical study he finds the latter two to yield the best results. The article closes with a discussion of future challenges in the field, including the issues of feature selection, probabilistic forecasts, combined estimators, model comparability and multivariate factor models. Regarding this last aspect, Ziel and Weron (2018) conduct an empirical comparison of different univariate and multivariate model structures for price forecasting. Comparing a total of 58 models on several datasets, they find that there is no single modelling framework that consistently achieves the best results.

Statistical methods which have been applied to price forecasting include, for example, dynamic regression and transfer functions (Nogales et al. (2002)), wavelet transformation followed by an ARIMA model (Conejo et al. (2005)) and weighted nearest neighbor techniques (Troncoso et al. (2007)). There are many applications of machine learning methods in electricity price forecasting. Amjady (2006) compares the performance of a fuzzy neural network with one hidden layer to ARIMA, wavelet-ARIMA, multilayer perceptron and radial basis function network models for the Spanish market. Chen et al. (2012) also use a neural network with one hidden layer and a special training technique called extreme learning machine on Australian data. On the same market, Mosbah and El-Hawary (2016) train a multilayer neural network on temperature, total demand, gas price and electricity price data of the year 2005 to predict hourly electricity prices for January 2006. In order to show the superior performance of neural networks compared to time series approaches, Keles et al. (2016) conduct an extensive study focussing on the important topics of variable selection and hyperparameter optimisation. They select the most predictive features via a k-nearest neighbor backward elimination approach and employ 6-fold cross-validation to optimise forecasting performance over several hyperparameters of the neural network. The resulting network is found to outperform the benchmark models substantially. Recently, more sophisticated types of neural networks have been used: In a benchmark study, Lago et al. (2018) compare feed-forward neural networks with up to 2 hidden layers, radial basis function networks, deep belief networks, convolutional neural networks, simple recurrent neural networks, LSTM and GRU networks to several statistical and also to other machine learning methods like random forests and gradient boosting. Using the Diebold-Mariano test, they show the deep feed-forward, GRU and LSTM network approaches to perform significantly better than most of the other methods on Belgian market data. Marcjasz et al. (2018) consider a non-linear autoregressive (NARX) neural network-type model which especially accounts for the long-term price seasonality. Also using the Diebold-Mariano test, they show that this approach can improve the accuracy of day-ahead forecasts relative to the corresponding ARX benchmark.

Among the features considered in the aforementioned studies, historical electricity prices, total demand series, total demand prognoses, renewable infeed forecasts, weather data and calendar information appear on a regular basis. On the other hand, to the best of our knowledge, the first to use supply and demand curves for price prediction are Ziel and Steinert (2016). Their goal is to fill the gap between time series analysis and structural analysis by setting up a time series model for these curves and then forecasting the future market clearing price as the intersection of the corresponding forecasted curves. They compare multiple time series prediction methods based on this approach. However, they do not investigate whether the performance of their model can be enhanced by machine learning techniques.

3 Methodology

Data preparation and feature extraction from order book

Our dataset ranges from 1/2/2015 to 18/9/2018 (31823 single auctions) and includes order book data from the EPEX German/Austrian electricity spot market, transparency data from EEX on expected wind and solar power infeed, and expected total demand data from ENTSO-E. To avoid data dredging, 20% (about 9 months) of the available data at the end of the time period are held back for an out-of-sample model evaluation (see Section 4).

For feature extraction, i.e., translating the order book into a vector of numbers, we use ideas from Coulon et al. (2014) and Ziel and Steinert (2016). Let $\mathcal{P}$ be the set of possible prices and $\mathcal{T}$ the set of time points for which there are data available. Each $t \in \mathcal{T}$ is a tuple $t = (d, h)$ consisting of a date $d$ and an hour $h$. We represent the supply and demand data at time $t$ as vectors $(V^S_t(p))_{p \in \mathcal{P}}$ and $(V^D_t(p))_{p \in \mathcal{P}}$, where $V^S_t(p)$ and $V^D_t(p)$ denote the supply and demand volume, respectively, bid at price level $p$. The market clearing price at time $t$ is determined by EPEX via the EUPHEMIA algorithm, which also considers complex orders. There is no information about such orders in our dataset, so it would be unreasonable to expect any learning algorithm to incorporate them into its price prediction. Therefore, we calculate the market clearing price $p_t$ that would result from considering only the available supply and demand data and use this as the target value for price prediction. To this end, we define the so-called supply and demand curves

$$S_t(p) = \sum_{q \in \mathcal{P},\, q \le p} V^S_t(q), \qquad D_t(p) = \sum_{q \in \mathcal{P},\, q \ge p} V^D_t(q).$$

The MCP lies at the intersection of the supply and demand curves. As $S_t$ and $D_t$ are step functions, explicit formulae for $p_t$ are quite technical and therefore omitted. We refer to Figure 2 for a graphical illustration. To reduce the dimensionality, we partition $\mathcal{P}$ into price classes and use the volumes per price class as features. To determine the price classes we use a heuristic which aims to achieve that all price intervals contain the same amount of volume on average. This algorithm ensures that there are more price classes at the interesting parts of the curve, i.e., in the price regions with many bids. We begin by averaging the supply and demand curves over all time points. Then, we fix a volume $V^*$ that each price class is supposed to contain on average and choose price class boundaries $c^S_0 < \dots < c^S_{M_S}$ for supply and $c^D_0 < \dots < c^D_{M_D}$ for demand accordingly.

Figure 4: Averaged supply curve with price classes of a fixed average volume, figure taken from Ziel and Steinert (2016).

Again, the mathematical details are rather technical (see also Ziel and Steinert (2016)). However, the graphical illustration in Figure 4 should make the idea intuitively clear. Analogously to the original supply and demand curves, one can calculate the price $\tilde{p}_t$ that results from the price classes and which, in general, does not exactly coincide with the actual market clearing price $p_t$.
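The equal-volume heuristic can be sketched as follows. This is an illustration under simplifying assumptions, not the exact procedure of the paper or of Ziel and Steinert (2016): order books are dictionaries from price to volume, a class is closed as soon as the accumulated average volume reaches the target, and the names `average_volumes`, `price_class_boundaries` and `class_volumes` are hypothetical.

```python
from typing import Dict, List


def average_volumes(books: List[Dict[float, float]]) -> Dict[float, float]:
    """Average the volume placed at each price level over all time points."""
    totals: Dict[float, float] = {}
    for book in books:
        for price, volume in book.items():
            totals[price] = totals.get(price, 0.0) + volume
    return {price: total / len(books) for price, total in totals.items()}


def price_class_boundaries(avg: Dict[float, float],
                           class_volume: float) -> List[float]:
    """Upper boundaries of classes holding about class_volume on average."""
    boundaries: List[float] = []
    accumulated = 0.0
    for price in sorted(avg):
        accumulated += avg[price]
        if accumulated >= class_volume:
            boundaries.append(price)  # close the current class here
            accumulated = 0.0
    highest = max(avg)
    if not boundaries or boundaries[-1] != highest:
        boundaries.append(highest)  # last class collects the remainder
    return boundaries


def class_volumes(book: Dict[float, float],
                  boundaries: List[float]) -> List[float]:
    """Volume features: volume of a single order book per price class."""
    features = [0.0] * len(boundaries)
    for price, volume in book.items():
        # First class whose upper boundary covers the price; else the last.
        index = next((i for i, upper in enumerate(boundaries)
                      if price <= upper), len(boundaries) - 1)
        features[index] += volume
    return features


# Hypothetical supply books for two auctions (EUR/MWh -> MWh).
books = [
    {0.0: 100.0, 10.0: 300.0, 20.0: 50.0, 30.0: 150.0, 40.0: 400.0},
    {0.0: 300.0, 10.0: 100.0, 20.0: 150.0, 30.0: 50.0, 40.0: 200.0},
]
boundaries = price_class_boundaries(average_volumes(books), 400.0)
features = class_volumes(books[0], boundaries)
print(boundaries, features)
```

Price regions with a lot of average volume reach the target quickly and therefore receive narrow classes, which is the stated purpose of the heuristic.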

Finally, in order to simplify both implementation and interpretation without losing any essential information, we transform the supply and demand features into a so-called price curve. For this, let $c_0 < c_1 < \dots < c_M$ be the ascendingly ordered union of the supply and demand price class boundaries. Now, we define new price classes

$$C_i = (c_{i-1}, c_i], \quad i = 1, \dots, M,$$

and volume features

$$X_{t,i} = \sum_{p \in C_i} \left( V^S_t(p) + V^D_t(p) \right), \quad i = 1, \dots, M,$$

where $V^S_t(p)$ and $V^D_t(p)$ are the supply and demand volumes at price level $p$ and time $t$. We use these price curve features and additionally the total demand as inputs for the price prediction. Figure 5 shows an example of such a price curve calculated from given supply and demand curves.
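The following sketch illustrates why such a transformation need not lose the information required for pricing. It assumes that the price curve volume of a class is the sum of the supply and demand volumes in that class, that the inelastic demand is the total demand volume, and that a class clears once the supply up to it covers the demand willing to pay more; these conventions and all names are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Optional, Tuple


def to_price_curve(supply: List[float],
                   demand: List[float]) -> Tuple[List[float], float]:
    """Merge per-class supply and demand volumes into one price curve.

    Both lists are ordered by ascending price class. Demand that drops out
    at a price level is treated like additional supply at that level.
    """
    curve = [s + d for s, d in zip(supply, demand)]
    inelastic_demand = sum(demand)
    return curve, inelastic_demand


def clearing_class_from_price_curve(curve: List[float],
                                    inelastic_demand: float) -> Optional[int]:
    """First class where the cumulative price curve meets the demand line."""
    for i, cumulative in enumerate(accumulate(curve)):
        if cumulative >= inelastic_demand:
            return i
    return None


def clearing_class_direct(supply: List[float],
                          demand: List[float]) -> Optional[int]:
    """First class where supply up to it covers demand in higher classes."""
    for i in range(len(supply)):
        if sum(supply[: i + 1]) >= sum(demand[i + 1:]):
            return i
    return None


# Hypothetical volumes per price class (MWh), lowest price class first.
supply = [3000.0, 2000.0, 2500.0, 1500.0]
demand = [500.0, 1000.0, 1500.0, 4000.0]
curve, inelastic = to_price_curve(supply, demand)
print(curve, inelastic)
print(clearing_class_from_price_curve(curve, inelastic),
      clearing_class_direct(supply, demand))
```

Under these assumptions both routes select the same clearing class, because adding the demand volume below a price level to both sides of the supply-demand comparison turns the demand side into a constant.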

Figure 5: Transformed supply and demand curves to price curve (dotted black) and inelastic demand (dotted green).

There is also an economic interpretation for this transformation: In fact, electricity demand is highly price-inelastic, so the constant inelastic demand is the expected total demand for electricity at that hour. The price curve is the so-called merit order, which represents the electricity production units sorted by their variable production costs. For more details, we refer to standard literature on electricity markets like Burger et al. (2014). Note that the price curve still contains the information that is necessary to calculate the resulting price: The MCP lies at the intersection of the cumulative price curve and the constant inelastic demand. In addition to the price curve and inelastic demand, we use renewable infeed and total demand forecasts as features as well as some calendar information, namely year, daylight saving time, type of day, month and hour.

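To account for the periodicity of months and hours, we project these values on a circle and use the two-dimensional projections as features. For example, if date $d$ lies in month $m \in \{1, \dots, 12\}$, this is encoded as

$$\left( \sin\left(\frac{2 \pi m}{12}\right), \cos\left(\frac{2 \pi m}{12}\right) \right).$$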
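A minimal sketch of such a circular encoding, assuming the common sine and cosine projection; the helper name `encode_cyclic` is hypothetical.

```python
import math
from typing import Tuple


def encode_cyclic(value: int, period: int) -> Tuple[float, float]:
    """Project a periodic value (month, hour) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return (math.sin(angle), math.cos(angle))


december = encode_cyclic(12, 12)
january = encode_cyclic(1, 12)
february = encode_cyclic(2, 12)

# Neighbouring months are equally far apart, also across the year boundary.
gap_dec_jan = math.dist(december, january)
gap_jan_feb = math.dist(january, february)
print(gap_dec_jan, gap_jan_feb)
```

With a plain integer encoding, December (12) and January (1) would be eleven units apart, although they are adjacent in the seasonal cycle. The circular encoding removes this artificial jump.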


For prediction, we use the price curve features of a preceding day, the so-called reference date $r(d)$. For notational convenience, we write $r(t) = (r(d), h)$ for a time point $t = (d, h)$. As a reference date for $d$ we use the nearest day before $d$ which is of the same type of day as $d$. This is a simple but efficient technique in energy economics. More sophisticated methods to define a reference date may incorporate similarities in renewable infeed and demand profile.
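The reference date rule can be sketched as below. The paper does not spell out its day types at this point, so the classification into workday, Saturday and Sunday (ignoring public holidays) is an assumption, and both function names are hypothetical.

```python
from datetime import date, timedelta


def day_type(d: date) -> str:
    """Assumed day-type classification; public holidays are ignored."""
    if d.weekday() == 5:
        return "saturday"
    if d.weekday() == 6:
        return "sunday"
    return "workday"


def reference_date(d: date) -> date:
    """Nearest day before d that has the same day type as d."""
    candidate = d - timedelta(days=1)
    while day_type(candidate) != day_type(d):
        candidate -= timedelta(days=1)
    return candidate


print(reference_date(date(2018, 9, 18)))  # a Tuesday
print(reference_date(date(2018, 9, 17)))  # a Monday
print(reference_date(date(2018, 9, 16)))  # a Sunday
```

Under this assumed classification a Monday refers back to the preceding Friday. A finer classification that treats Monday as its own type would instead refer back one week.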

Training of learning algorithms

We employ ordinary linear regression, random forests and feed-forward neural networks to predict hourly electricity prices. Note that we use the prices which are implied by the volume features as target values, which means that the prices we aim to forecast attain only values from the set of price class boundaries. On the whole dataset, the absolute difference between these price approximations and the real prices is EUR/MWh on average (corresponding to a median absolute percentage deviation of ). While we assume ordinary linear regression to be well-known, we give a brief description of the machine learning algorithms we consider. In each case, our goal is to approximate the function $f$ which maps the features described above to the corresponding electricity price. To this end, we assume to be given a set of training data $(x_i, y_i)$, $i = 1, \dots, n$, where

$$y_i = f(x_i) + \varepsilon_i$$

and $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ is a vector of realizations of independent random variables with zero expectation and equal variance.

Random forests

Random forests are based on a simpler machine learning method called decision trees (Hastie et al. (2001), chapter 9.2). While decision trees are easy to understand, they often perform rather poorly because of their high dependence on the training data. Random forests aim to overcome this drawback by averaging the predictions of several decision trees that are trained in a randomized way proceeding from the same data (Breiman (2001)). As part of their training process, random forests offer a convenient way to assess the influence of each feature on the output. Therefore, they can deliver a ranking of the features according to their relevance for electricity price prediction. While this ranking is quite interesting in its own right, we also use it for feature selection, i.e., for training a feed-forward neural network only on the most important features (e.g., the 10 or 20 highest-ranked ones).
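The averaging idea can be illustrated with bagged regression stumps, i.e., trees with a single split that are each fitted on a bootstrap sample. This sketch covers only the bootstrap-averaging part of a random forest: it uses one feature, so the random feature subsets of Breiman (2001) and the feature ranking are not shown, and all names and data are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Sample = List[Tuple[float, float]]  # (feature value, target) pairs


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def fit_stump(sample: Sample) -> Callable[[float], float]:
    """Fit a one-split regression tree by minimising the squared error."""
    best = None  # (error, threshold, left mean, right mean)
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        if not right:
            continue  # no valid split above the largest feature value
        lm, rm = mean(left), mean(right)
        error = (sum((y - lm) ** 2 for y in left)
                 + sum((y - rm) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, lm, rm)
    if best is None:  # all feature values identical: predict the mean
        constant = mean([y for _, y in sample])
        return lambda x: constant
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_bagged_stumps(data: Sample, n_trees: int,
                      seed: int = 0) -> Callable[[float], float]:
    """Average stumps that are each trained on a bootstrap sample."""
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(data) for _ in data])
              for _ in range(n_trees)]
    return lambda x: mean([stump(x) for stump in stumps])


# Hypothetical data: the target jumps from 0 to 10 at a feature value of 5.
data = [(float(x), 0.0 if x < 5 else 10.0) for x in range(10)]
model = fit_bagged_stumps(data, n_trees=25)
print(model(0.0), model(9.0))
```

Each stump depends strongly on its bootstrap sample, but the average over many stumps varies much less, which is the motivation given above.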

Feed-forward neural networks

Feed-forward neural networks can be viewed as a far-reaching non-linear extension to ordinary linear regression. They consist of several layers, through which the input is fed via the composition of non-linear activation functions and weighted sums in order to generate the output. The smallest unit (one vector component) of such a layer is called a neuron. A central result in the theory of neural networks states that, using a non-constant, bounded and continuous activation function, a neural network with just one hidden layer can in principle approximate any continuous function arbitrarily well when there are sufficiently many neurons and appropriate weights are chosen (Hornik (1991)). In practice, a higher number of layers has been found to improve performance for many applications (deep learning). Besides the number of hidden layers and the number of neurons per layer, there are other so-called hyperparameters on which forecasting performance can critically depend. For instance, the optimisation algorithm that is used to train the network has to be chosen. Typically, some variant of stochastic gradient descent (SGD) like rmsprop (Tieleman and Hinton (2012)) or Adam (Kingma and Ba (2015)) is used. Furthermore, SGD-type algorithms work with batches of training data. The batch size can be varied in order to improve performance. Other hyperparameters which we consider include the number of epochs, i.e., the number of times the training data are fed into the optimisation algorithm, the activation function (hyperbolic tangent, rectified linear unit, identity) and whether or not to employ dropout to avoid overfitting (Srivastava et al. (2014)) and batch normalization to avoid internal covariate shift (Ioffe and Szegedy (2015)).
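As an illustration of the layer composition described above, the following sketch evaluates a small feed-forward network in plain Python. The weights are arbitrary hypothetical numbers rather than trained values, and training, dropout and batch normalization are deliberately left out.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector, Callable[[float], float]]


def relu(x: float) -> float:
    return max(0.0, x)


def identity(x: float) -> float:
    return x


def dense(inputs: Vector, weights: Matrix, biases: Vector,
          activation: Callable[[float], float]) -> Vector:
    """One layer: a weighted sum per neuron, followed by the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs: Vector, layers: List[Layer]) -> Vector:
    """Feed the input through all layers in order."""
    output = inputs
    for weights, biases, activation in layers:
        output = dense(output, weights, biases, activation)
    return output


# Hypothetical network: 2 inputs, one hidden layer with 2 neurons, 1 output.
layers: List[Layer] = [
    ([[0.5, -0.25], [-1.0, 0.5]], [0.0, 0.5], relu),
    ([[2.0, 4.0]], [1.0], identity),
]
prediction = forward([1.0, 2.0], layers)
print(prediction)

# The same architecture with tanh as hidden activation.
tanh_layers: List[Layer] = [(layers[0][0], layers[0][1], math.tanh), layers[1]]
print(forward([1.0, 2.0], tanh_layers))
```

With the identity as the only activation the network collapses to ordinary linear regression, which is the sense in which the text calls neural networks a non-linear extension of it.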

Hyperparameter optimisation via cross-validation

We choose the hyperparameter values for the neural networks and random forests using five-fold cross-validation. First, we define a grid of hyperparameter combinations to be evaluated. Then, for every combination of hyperparameters in the grid, we split our training dataset into five parts or folds of equal size, train a model with these values on four of the folds and evaluate its performance on the remaining one. After repeating this five times, each time with a different validation fold, we average performances. Finally, once the whole grid has been evaluated, we choose the hyperparameter combination that performs best on average.
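The grid search with five-fold cross-validation can be sketched generically as follows. The folds here are contiguous blocks, the model is a deliberately trivial stand-in (a shrunken and shifted mean with hypothetical hyperparameters `shrink` and `offset`), and all function names are assumptions; the paper applies the same procedure to random forests and neural networks.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
                     / len(y_true))


def grid_search_cv(xs: List[float], ys: List[float],
                   grid: Dict[str, List[float]],
                   fit: Callable[..., Callable[[float], float]],
                   k: int = 5) -> Tuple[Dict[str, float], float]:
    """Return the hyperparameters with the lowest mean validation RMSE."""
    folds = k_fold_indices(len(xs), k)
    keys = sorted(grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            train_x = [x for i, x in enumerate(xs) if i not in held_out]
            train_y = [y for i, y in enumerate(ys) if i not in held_out]
            predict = fit(train_x, train_y, **params)
            scores.append(rmse([ys[i] for i in fold],
                               [predict(xs[i]) for i in fold]))
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_shifted_mean(train_x, train_y, shrink, offset):
    """Stand-in model: predicts a shrunken, shifted training mean."""
    level = (1.0 - shrink) * sum(train_y) / len(train_y) + offset
    return lambda x: level


xs = [float(i) for i in range(20)]
ys = [9.0 if i % 2 == 0 else 11.0 for i in range(20)]
grid = {"shrink": [0.0, 0.5], "offset": [0.0, 1.0]}
best_params, best_score = grid_search_cv(xs, ys, grid, fit_shifted_mean, k=5)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes and the number of folds, which is why evaluating more than 1000 network configurations is resource-intensive.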

Summarising, the features we use to forecast the spot price of a time point $t = (d, h)$ with reference time point $r(t)$ are

  • the total demand and the price curve features of the same hour on the reference day, i.e., those observed at $r(t)$,

  • the solar and wind infeed forecasts as well as the total demand forecast for the time points $t$ and $r(t)$,

  • the calendar features year, daylight saving time, type of day, month and hour for the time points $t$ and $r(t)$.

We considered about 100 different parameter combinations for the random forests with the number of trees equal to , , , , or . For the neural networks, we tested over 1000 parameter combinations with about 20 different network sizes ranging from one hidden layer with 5 neurons to hidden layers with 25 neurons each.

4 Results

To evaluate model performance, we primarily use the root-mean-square error

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2},$$

where $\hat{y}_i$ are the predictions, $y_i$ are the true target values and $n$ is the number of observations for which a prediction is made. Furthermore, we consider the mean absolute error

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

as a more interpretable measure of how far off the prediction is on average. The RMSE is the error measure which the machine learning algorithms aim to minimize during training. Accordingly, we select the model architecture that performs best in the 5-fold cross-validation with respect to the RMSE. In the electricity forecasting literature, sometimes the mean absolute percentage error (MAPE) is used. This is unsuitable for the German market, as often the MCP is at or close to zero. Therefore, we report the median absolute percentage error

$$\mathrm{MdAPE} = \operatorname{median}_{i = 1, \dots, n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$

for comparison.
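The three error measures can be computed as in the following sketch, which uses the standard definitions. Treating an observation with a true price of exactly zero as an infinite percentage error is an assumption made here for illustration; the price values are hypothetical.

```python
import math
import statistics
from typing import List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
                     / len(y_true))


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mdape(y_true: List[float], y_pred: List[float]) -> float:
    """Median absolute percentage error (as a fraction, not in percent)."""
    ratios = [abs((p - y) / y) if y != 0 else math.inf
              for y, p in zip(y_true, y_pred)]
    return statistics.median(ratios)


# Hypothetical prices in EUR/MWh, including a negative price.
y_true = [20.0, 40.0, -10.0, 50.0]
y_pred = [22.0, 36.0, -5.0, 50.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), mdape(y_true, y_pred))
```

A single price close to zero can dominate a mean of percentage errors, whereas the median is unaffected as long as fewer than half of the observations are extreme.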

Aside from the methods which were described in Section 3, we consider two benchmarks. The first one is called the naive benchmark (Nogales et al. (2002)). Its forecast for hour $h$ of date $d$ is the price at hour $h$ of the previous day if $d$ is a workday other than Monday and the price at hour $h$ of the same type of day in the previous week otherwise.
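A minimal sketch of the naive benchmark, assuming that prices are stored in a dictionary keyed by date and hour, that "same type of day in the previous week" means the same weekday seven days earlier, and that public holidays are ignored; the names and prices are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

PriceSeries = Dict[Tuple[date, int], float]


def naive_forecast(prices: PriceSeries, d: date, h: int) -> float:
    """Previous day's price for Tuesday to Friday, last week's otherwise."""
    # date.weekday(): Monday is 0, Sunday is 6.
    is_tuesday_to_friday = 1 <= d.weekday() <= 4
    lag_days = 1 if is_tuesday_to_friday else 7
    return prices[(d - timedelta(days=lag_days), h)]


# Hypothetical noon prices in EUR/MWh.
prices: PriceSeries = {
    (date(2018, 9, 10), 12): 48.0,  # Monday
    (date(2018, 9, 17), 12): 55.0,  # Monday
}
tuesday_forecast = naive_forecast(prices, date(2018, 9, 18), 12)
monday_forecast = naive_forecast(prices, date(2018, 9, 17), 12)
print(tuesday_forecast, monday_forecast)
```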

The second benchmark is based on a different market, the Energy Exchange Austria (EXAA), where the electricity price is fixed two hours before the EPEX auction takes place. Therefore, the EXAA price at a time point can directly be used as a predictor for the EPEX price at the same time point. In fact, Ziel et al. (2015) show this benchmark to be highly competitive. However, note that it is not really appropriate to compare the remaining forecasting methods to the EXAA benchmark because they are based on different information (see also Ziel and Steinert (2016)). Nonetheless, the EXAA benchmark can provide some orientation on how well other models perform and how much improvement could be expected.

The best-performing random forest consists of 1000 decision trees where at each step in the training of the underlying decision trees a randomly chosen subset of size 23 (corresponding to ) of all available features is used and where a tree node is only split further if it contains at least of all training data. We also use the random forest to support feature selection for the following neural network approach.

For the neural networks under consideration we use different feature vector realizations:

  • all available features,

  • all but the price curve features of the reference date,

  • the 10 most influential features according to the best-performing random forest,

  • the 20 most influential features according to the best-performing random forest.

For each case we use different network architectures, which we determine by hyperparameter optimisation as described above. These are reported in Table 1 where each column corresponds to a different set of features and each row corresponds to a hyperparameter. The notation [5, 5, 5] for the network architecture means that a 3-layer network consisting of 5 nodes per layer is used. For the networks that are trained on the selected features, we find a deeper architecture to perform best: [25] * 25 denotes a 25-layer network with 25 nodes per layer. Analogously, in the dropout row, [0, 0.25, 0] means that dropout is employed with a probability of 0.25 after the second layer and [0.1] * 25 means that dropout is employed after each of the 25 layers with a probability of 0.1. It is noteworthy that the best-performing network when using all features is rather small. Thus, as an additional plausibility check, we also consider the network architectures proposed by Keles et al. (2016) (their network size, sigmoid activation function, no dropout) and Lago et al. (2018) (their network size, relu activation function, no dropout) as a reference. Note that their models do not consider price curve features, i.e., order book data.

| Hyperparameter | All features | Without curve | Selected features (10) | Selected features (20) |
| --- | --- | --- | --- | --- |
| Network architecture | [5, 5, 5] | [5, 5] | [25] * 25 | [25] * 25 |
| Optimiser | rmsprop | Adam | Adam | Adam |
| Number of epochs | 100 | 100 | 100 | 100 |
| Batch size | 128 | 64 | 128 | 128 |
| Activation function | tanh | relu | relu | relu |
| Dropout | [0, 0.25, 0] | [0, 0.25, 0] | [0.1] * 25 | [0.1] * 25 |
| Batch normalization | no | yes | yes | yes |

Table 1: The hyperparameters which were used when training feed-forward neural networks with different features (all, without curve features, only with the 10 or 20 most influential features as chosen by the best-performing random forest).

The results of the chosen model configurations are shown in Table 2. The errors we report are measured both on the training set (in-sample error) to evaluate how well the model describes the given data and on the test set (out-of-sample error) to assess model performance on previously unseen data (20% of our whole dataset).

| Forecasting technique | In-sample RMSE | In-sample MAE | In-sample MdAPE | Out-of-sample RMSE | Out-of-sample MAE | Out-of-sample MdAPE |
| --- | --- | --- | --- | --- | --- | --- |
| Naive model | 13.55 | 7.87 | 15.31% | 12.68 | 7.71 | 11.61% |
| Ordinary linear regression | 6.85 | 4.25 | 10.93% | 9.60 | 7.52 | 16.95% |
| Random forest | 6.77 | 4.17 | 9.73% | 11.92 | 9.32 | 19.9% |
| Feed-forward neural network with architecture from Keles et al. (2016) | 6.72 | 4.51 | 11.49% | 14.87 | 12.81 | 30.63% |
| Feed-forward neural network with architecture from Lago et al. (2018) | 2.27 | 1.65 | 4.45% | 21.05 | 8.94 | 15.22% |
| Feed-forward neural network | 5.45 | 3.57 | 8.89% | 9.59 | 7.08 | 14.18% |
| Feed-forward neural network without curve features | 6.63 | 4.41 | 11.22% | 10.11 | 7.85 | 16.12% |
| Feed-forward neural network with feature selection (10 features) | 7.69 | 5.06 | 11.68% | 9.41 | 7.34 | 15.57% |
| Feed-forward neural network with feature selection (20 features) | 7.71 | 4.95 | 11.27% | 13.65 | 10.18 | 21.48% |
| EXAA | 6.47 | 3.53 | 7.56% | 5.58 | 3.92 | 7.22% |

Table 2: Comparison of in-sample and out-of-sample errors in EUR per MWh (RMSE, MAE) or % (MdAPE) for various price forecasting techniques.

Alternative: More sophisticated neural network architectures

Apart from feed-forward neural networks we also analysed recurrent neural networks. As electricity spot prices can be expected to exhibit a strong dependence on previous days’ features and prices, it seems reasonable to model them as a multivariate time series. While classical approaches like ARIMA or GARCH models are possible, this is also a typical application for recurrent neural networks because they explicitly incorporate the sequential structure of the inputs. In this case, the goal was to predict the 24-dimensional vector of spot prices at some date $d$ based on information available up to date $d - 1$. For each date $d' < d$ this information consists of the curve features for date $d'$ as well as the calendar features and expected renewable infeed and total demand for date $d' + 1$. We implemented this approach using the long short-term memory (LSTM) architecture that allows for efficient training of recurrent neural networks (Hochreiter and Schmidhuber (1997)), but the results were not as convincing as with the other methods. This might be due to the high dimensionality of the multivariate time series under consideration. Therefore, we focused on the random forest and feed-forward neural network approaches where the temporal dependence structure is more explicitly incorporated as a feature by means of the reference day.

5 Conclusion

Our results show that neural networks can indeed provide order-book-based price forecasts with competitive results. However, they do not perform significantly better than simpler methods like ordinary linear regression. Whereas the classical order-book-based forecasting technique requires a lot of statistical analysis, the network architecture optimisation also demands significant resources. We also found that reducing the number of features generally improves results. In regard to the RMSE, we find that the feed-forward neural network with only 10 features as selected by the random forest performs best. Considering the MAE (a measure directly linked to revenues from financial trading), the feed-forward neural network without feature selection is in the lead. However, the naive model shows good results as well, supporting this traditional and often applied heuristic in energy economics. The neural network architectures from literature show competitive in-sample results, but their performance drops significantly in an out-of-sample analysis. This indicates overfitting.

The posed research questions have been answered. We have shown how to incorporate order book features using volume-based partitioning, a transformation to price curves and feature selection based on random forests. We have also shown that machine learning cannot significantly reduce the work effort needed in the model set-up, but gives competitive results.

The models do have a lot of potential for improvement. For instance, there are much more accurate wind and solar infeed forecasts available in the market compared to the data from EEX transparency (unfortunately they are not free of charge). We see the largest potential in a daily recalibration of the models including an updated feature selection which allows the model to react to fundamental changes in the market (coal and gas prices, power plant outages, …).

In addition, we also analysed different applications of machine learning on EPEX order books, which are not outlined in detail: We employed neural networks to reconstruct renewable infeed from the order book and used the networks to generate price forward curves.