1 Introduction
Climate change has become a primary concern that needs to be promptly addressed. The continuously rising temperatures across the globe are causing an increase in the frequency of extreme weather conditions, such as heavy rainfall, which have lead to changing habitat ranges for plants and animals [temp]. It is understood that greenhouse gas emissions are the primary reason for the rise in temperatures, for example, we can see that the electricity sector contributed to 27% of the total U.S. emissions in 2018 [emission]. Of major concern is also the fact that due to socioeconomic developments and a warmer globe the energy required by all populations across the world is likely to increase [socioeco].
Renewable energies use natural flows that currently exist in our environment and produce minimal waste which makes them a suitable alternative to fossil fuel based energy. With the desire to implement its commitments under the Paris Agreement [paris], the European Union has proposed a new set of targets for 2030. Among these targets, there is the desire to reach at least a 32% share of renewable energy. This is a clear signal that clean energy will receive more attention in the upcoming years.
One of the renewable energies that has increased in popularity is wind energy. This is due to the fact that wind energy is amongst the cleanest source of electricity given the very low greenhouse gas emission and low water consumption [evans]. One of the drawbacks of windgenerated energy is the large space necessary to install a wind farm; this arises from the fact that placing wind turbines in a limited area would hinder their individual productivity and therefore not be economically convenient. Because of this, the optimisation of the layout plays an important role in energy yield. The complexity of this optimisation problem is given by the computational power required to reproduce accurate simulations of the wind across a wind farm, of major complexity is the computation of the wind behind the turbine usually referred to as the wake. Furthermore, it can be difficult to account for the uncertainty in the wind speed and direction.
In addition to the computational cost and uncertainty in wind speed and directions, the objective function evaluations rely on the design sets instead of design (or decision) vectors. For instance, maximising power as an objective function is an output of the whole wind farm and not of one turbine. Therefore, to design an optimal wind farm, we need to consider the correlations between wind farms, which can have different number of turbines. In other words, the sets can have different number of design vectors. In this paper, we use a setbased kernel in Gaussian process model [gpoversets] and embed it in the Bayesian multiobjective optimisation framework. To the best of our knowledge, setbased multiobjective Bayesian optimisation has never been used to solve a wind farm layout optimisation problem. To be summarised, the contributions of the paper are:

Handling uncertainty in wind speed and direction by building a probabilistic model.

Handling computationally expensive multiple conflicting objectives by using Bayesian multiobjective optimisation.

Handling correlations between different wind farms (as different design sets) by using a set based kernel.
The rest of the article is structured as follows. In Section 2, we provide an overview of the wind farm layout problem and define the optimisation problem. In Section 3, we explain the Bayesian multiobjective optimisation framework with Gaussian over sets as surrogate models. In Section 4, we provide the results and finally we conclude and provide future research directions in Section 5.
2 Wind farm layout Optimisation
2.1 Background
The literature behind the wind farm layout optimisation problem is extremely vast. Some of the early attempts to solve the problem date back to 1994, where Mosetti [Mosetti1994]
used a genetic algorithm to optimise the layout. In their work, the wake model used was limited to three types of wind cases: single direction, constant intensity with variable direction, and variable intensity with variable direction. To date, there have been numerous papers that have improved on Mosetti’s work, since then we have seen the development of more accurate wake models which were not limited to specific wind scenarios. Some of these improvements include a Gaussian wake model, Parada et al.
[wakegauss], which resulted in notable improvements only when tested under constant wind scenarios.Furthermore, multiobjective optimisation including NSGAII with a simplified wake model has also been used in wind farm layout optimisation [Mittal]. One more rigorous attempt at the wind farm layout optimisation problems is based on computational fluid dynamics (CFD) simulations. These are models that are applied to accurately simulate the wake effect of the turbines often using Reynoldsaveraged Navier–Stokes [RANS] and Large Eddy Simulation [LES] turbulence models. The use of such simulations allows for a better understanding of what the true power output of a wind farm would be. The drawback of the CFD based approach lies in the fact that the evaluations of these models are computationally expensive. Attempts to limit the computational cost of such models include deterministic optimisation (e.g. mixedinteger programming [cdfmix]), where an effort was made to reduce the number of CFD simulations required to find optimal solutions. One of the advantages of these mathematical approaches is the fact that they are very robust and can deal with different types of terrains.
In this paper, we apply a Bayesian model over sets that aim to reduce the computational time to obtain a set of approximated Pareto optimal solutions. For the demonstration of the potential of the approach, we used a simplified Jensen wake model to consider the fluid dynamic aspects. Moreover, we use a joint probability distribution of wind speed and direction with the Jensen model to evaluate the power output.
2.2 Wake model
The wake effect considers the wind speeds facing different turbines in a wind farm. The wind speeds for different turbines can be different and depend on the coordinates of turbines and the incoming wind speed (or downstream wind speed) and direction. A simplified wellknown wake models is Jensen model [jensen]
(also known as park model), which can be used to estimate the wind speed for each turbine. This model allows for fast computations of the wake effect of turbines since it assumes that the wake behind the rotor expands linearly and that it is only affected by the distance from the turbine. An example to estimate the wind speed
at a distance from a turbine is shown in Figure 1. In the figure, , and are downstream wind speed, turbine rotor radius and radius of cone, respectively. The radius of the cone can be estimated using the equation below:(1) 
where is the decay constant and determines the expansion of wake with distance. It is usually calculated with an analytical expression:
(2) 
where is height of the turbine and is the surface roughness of the wind farm. In many cases, the is kept constant. After applying the conservation of momentum, the reduced wind speed is given by:
(3) 
where is the thrust coefficient. For more details about wake models, see [jensen].
2.3 Objective Functions
One of the main objectives in designing a wind farm is to maximise the power output. In addition, the cost of installing turbines needs to be minimised. The power output depends on the distribution of wind speed () and direction () and the wind speed at different turbines. The distribution
can be modelled with the historic data of wind speed and direction. In this way, the uncertainty in wind speed and direction can be handled and quantified in estimating the power output. In this work, we used an open source data set provide by Engie
[mydata]from 20132016. One can use another data set to build a joint distribution of wind speed and direction.
The input to the wake model is the incoming wind speed, direction and location of wind turbines in the wind farm and the output is the wind speed at different turbines. This wind speed is then used to estimate the power curve using the following equation [kusiak2009line]:
(4) 
where are the parameters (values are taken from [kusiak2009line]) and is the wind speed at which wind turbine becomes productive. The represents the coordinate or location of turbines and is an input to the wake model. Once the joint distribution of wind speed and direction and wind speed at different turbines is known, the power output of wind farm is defined as:
(5) 
As can be seen in the equation above, the expected power of the wind farm depends on the number of turbines. Two wind farms with different number of turbines may give two different output energy. Therefore, to model the objective function, we need to find the correlation between wind farms, which can have different number of turbines. In other words, the correlation must be quantified between design sets (or wind farms) with different number of design vectors (or wind turbines). This work uses the Gaussian process over sets to quantify the correlation between wind farms and embed it into the Bayesian multiobjective optimisation framework. The details are provided in the next section. The second objective is the cost, which can be computed as nonlinear function of number of wind turbines [Mosetti1994]:
(6) 
The cost as the objective function depends only on the number of turbines and does not use computationally expensive wake simulation models.
3 Multiobjective Bayesian Optimisation over Sets
In MultiObjective Bayesian Optimisation, we start with a data set: , where is the number of variables, is the number of objectives and is the number of instances in the data. With this data, we want to fit a Gaussian Process () model that helps predict objective functions without direct evaluation. When dealing with more than one objective, there are two approaches to build surrogate models. The first consists of building a on a scalarized version of objective functions [scalarizing, Rahat2017, Knowles2006], with this procedure we are changing the nature of the problem into a singleobjective problem; this approach however might not be suited for every type of problem since the relationship between objectives might not always be trivial or easily expressed. The second alternative consists in building a Gaussian Process for each objective[mobopt, Emmerich2006, Emmerich2011, Yang2019]. This latter option is what will be used in this paper. Once we have fitted the surrogate models we aim to use them to search for a new decision vector that maximises an acquisition function. By acquisition function we mean a function that can help us understand if a decision vector is providing a good balance of exploitation and exploration. Various are the options for an acquisition function for example expected improvement [Mockus1975, Jones1998]probability of improvement [prob_of_improvement, Kushner1964] or lowerconfidence bound [upper_conf_bound]. In this work, given the surrogates for multiple objectives we will be using the expected hypervolume improvement (EHVI) [EHVI, Emmerich2006, Yang2019, Daulton2020].
Once the new decision vector () has been selected after maximising the acquisition function, we compute an expensive evaluation (, where and and add to the existing data and repeat the process. The optimisation finishes when a predefined termination criterion is reached (e.g. number of expensive evaluations). An outline of the algorithm is shown in Algorithm 1, where is the expected hypervolume improvement.
To deal with the type of objective function in the wind layout optimisation problem, the traditional formulation of Gaussian Processes cannot be used. Two are the main reasons for this: Firstly, we have data such that in each instance (layout) we have a collection of solutions (turbines); this means that each data point is represented as a matrix . Secondly, we need to take into account the fact that layouts can have a different number of turbines. Therefore, we apply the Gaussian over sets as the Bayesian model and find a new set by maximising the acquisition function.
3.1 Gaussian Processes Over Sets
With we aim to build a surrogate model that can predict the output of an objective function without an explicit evaluation. One of the major advantages of the Gaussian process as the Bayesian model is that it provides uncertainty in predictions. This uncertainty can help in finding the promising decision vectors in the subsequent iterations of the Bayesian optimisation.
A
can be described as a multivariate normal distribution with mean
and covariance matrix [Rasmussen2006] such that given a function we have:The mean and covariance matrix
are both dependent on a kernel function which quantifies the correlation between instances in the data. For simplicity in calculations, we assume the zero mean. A variety of kernel functions can be used e.g. Radial basis function (RBF), Matern or linear kernel. In this work, we use a RBF (also known as squared exponential or Gaussian kernel):
(7) 
where
is the vector of hyperparameters to be estimated when building the model,
is the squared Euclidean distance between two decision vectors and and is the Kronecker delta function. The hyperparameters can be estimated by maximising the following likelihood function:(8) 
where with representing rows of the data matrix and . Once the optimal parameters are found, we can predict the value of the objective function of a new instance
using the following posterior predictive distribution:
(9) 
where is the covariance vector between and the training data . To deal with the problem of finding the correlation between sets, we utilise Gaussian Process Over Sets [gpoversets] which is based on the idea of using a set of decision vectors. A design set can be written as:
(10) 
where is the number of decision variables and is the number of decision vectors. In wind farm layout optimisation, the xcoordinate and ycoordinate are the decision variables and therefore, and the number of turbines vary for different wind farms. The Over Sets requires adjustments to the traditional Gaussian Process algorithm, namely the way that the correlation between sets is calculated. For Over Sets, we compute correlation between sets as follows:
(11) 
Two important advantages of using the kernel over sets mentioned above is the resulting covariance matrix is positivedefinite and the order of the decision vectors in the set do not effect the correlations between sets. The data set for building the model with sets is:
(12) 
The posterior predictive distribution (equation (9)) becomes:
(13) 
where and is the correlation vector between the new set and the training data . This formulation is compatible with the objective function formulation that is used in this paper since it allows to compute the correlation of instances with different lengths, where the length corresponds to the number of decision vectors in a set.
3.2 Problem Encoding
In order to find a new design set, we use the expected hypervolume improvement as the acquisition function. The hypervolume is the dimensional Lebesgue measure of the space dominated by a finite approximation of the nondominated solutions in the objective space , which is bounded by a reference point :
(14) 
The hypervolume improvement of a vector is:
(15) 
and expected hypervolume improvement contribution of a new solution is:
(16) 
where and
are the approximated mean and standard deviations from the Bayesian model. An optimiser e.g. an evolutionary algorithm can be used to maximise the EHVI. To have the same number of decision variables in maximising EHVI, we impose an encoding. We represent a wind farm with a finite number of grid points (where turbines can be installed). In this layout, the
represent a turbine being present in a specific grid point, and represent an empty space in the grid. Note that, the Gaussian process as the Bayesian model is trained on two dimensional real coordinates, which are then encoded to 0 and 1 to maximise the acquisition function. An illustration of encoding representation of a wind farm (with grid points) is shown in Figure 2. This work uses the binary genetic algorithm to maximise the EHVI in finding a new vector of 0s and 1s. This vector is then decoded back to real coordinates to get a design set. This design set is then used to calculate expensive objective functions and the model is retrained. This encoding allows to easily switch between the real (for training the models) and binary coordinates (for maximising the acquisition function).4 Results and discussion
We applied the Bayesian optimisation on a wind farm with maximum of 400 turbines spaced in grid^{1}^{1}1We used the EHVI implementation available athttps://liacs.leidenuniv.nl/~csmoda/index.php?page=code. The rest of experimental settings are as follows:

Initial number of sets (or wind farms) = 20 (as ), generated randomly with minimum of two and maximum of 400 turbines

Kernel function in Gaussian process: Radial basis (or squared exponential) with noise

Algorithm to maximise the likelihood function: Differential evolution from SciPy [scipy]

Algorithm to maximise the EHVI as the acquisition function: Binary Genetic algorithm from PyMOO [pymoo]

Maximum number of function evaluations: 100 including 20 initial sets

Maximum number of independent runs = 3

Diameter of rotor of the turbine: 82 meters (standard diameter of a turbine)

Data set to find the joint distribution of wind speed and wind direction: Engie Open Data (an open source data set) [mydata]

Method to find the joint distribution between wind speed and direction: kernel density estimation from Scipy with bandwidth selection as per the Scott’s rule
[Scott1992] 
Minimum distance between turbines =

Wake model: Jensen from the PyWake library [pywake]
The joint distribution of wind speed and direction estimated with kernel density estimation is shown in Figure 3. As can be seen, there are two major peaks in the distribution, which are at the Southwest and Northeast directions as also can be seen in the polar histogram plots of wind direction in Figure 4.
The joint distribution of wind speed and direction and wind speed estimations facing the turbines via wake model were used in getting the power of the wind farm. After running the algorithm for 100 expensive evaluations, we obtained an approximated Pareto front and is shown in Figure 5 (for the run with median hypervolume value). As can be seen, the algorithm was able to explore in different regions of the Pareto front and obtained a set of tradeoff solutions.
Each point on the approximated Pareto front represents a wind farm with a different number of turbines. The set based kernel in Gaussian process consider the correlation between wind farms. In other words, two wind farms with similar x and y coordinates have a large correlation compared to two wind farms with different x and y coordinates. For instance, consider six wind farms shown in Figure 6. These wind farms have different number of turbines. Some of the wind farms e.g. (‘a’ and ‘b’), (‘c’ and ‘d’) and (‘e’ and ‘f’) have turbines in the similar locations. Therefore, we expect to see a higher correlation between these pairs compared to e.g. (‘a’ and ‘e’) or (‘b’ and ‘d’). For a length scale, , amplitude
, and noise variance
, we plot the covariance matrix representing the correlation between these wind farms in Figure 7. As can been seen, obviously the correlation values are the largest among diagonal elements. After the diagonal elements, the correlation values follow the kernel calculations. For example, the wind farms in the decreasing order of correlation with wind farm ‘a’ are (‘a’, ‘b’, ‘d’, ‘e’, ‘c’, ‘f’) and with wind farm ‘e’ are (‘e’, ‘f’, ‘a’, ‘c’, ‘d’, ‘b’).Figure 8
shows the hypervolume with the number of function evaluations. The solid line is the median and shaded region represents the 95% confidence interval. As can be seen, the hypervolume increased with the number of function evaluations.
Figure 9 shows three wind farms (or sets) from the approximated Pareto front. The first two wind farms are the extreme points on the approximated Pareto front with 2 and 228 turbines. The third wind farm has 97 turbines. We did not select one wind farm as the final solution as it depends on the expert or decision maker to select a wind farm based on the objective function values and their preferences. Presenting wind farms with their objective function values can provide useful insights to decision makers to make an informed decision for wind farm layout design.
5 Conclusions
mIn this work, we applied Multiobjective Bayesian optimisation with expected hypervolume improvement as the acquisition function to handle the computationally expensive optimisation problem. To handle the uncertainty in wind direction and wind speed, we modelled the data with kernel density estimation and used their joint distribution. As the power output of the wind farm relied on design sets, we utilised the kernel over sets in Gaussian process and embedded it into the multiobjective Bayesian optimisation framework. As shown in the results, the method was able to obtain a set of approximated Pareto optimal solutions. The method was able to handle and quantify the correlation between different wind farms with different number of turbines. The hypervolume as the performance metric showed the validity of the method.
This paper presented the first attempt at trying to solve the wind farm layout optimisation using Multiobjective Bayesian optimisation over sets. We believe that there are particular aspects of the model that can be further investigated and improved. For example, finding correlations between wind farms could be improved with a different set based kernel. Moreover, the computational complexity of set based kernel is , which is higher than the computational complexity of traditional Gaussian process, ( is the number of sets (or vectors in traditional Gaussian process), is the cardinality of the set and is the number of decision variables. Further research could provide ways of reducing the computational cost of the model. In this work, we built a surrogate model for the cost function which was not computationally expensive and was a function of the number of turbines. This modelling of the cost function may not be required to maximise the expected hypervolume improvement. Working on such heterogeneous objective functions in Bayesian optimisation is also a future research direction. Utilising different wake models including computational fluid dynamic (CFD) solvers will also be considered as one of the main future research directions. We believe that this work can lead to an increase interest in set based optimisation especially in Multiobjective Bayesian optimisation which will widen its application to other research areas, especially where the objective functions rely on a set of solutions.
Acknowledgments
The authors would like to thank Prof. Kishalay Mitra, Prof. Jonathan Fieldsend, Prof. Richard Eversion, Dr Srinivas Soumitri Miriyala and Dr Priyanka D Pantula for initial discussions on optimising wind farms. This research was partially supported by South Asia Development Fund at the University of Exeter, UK.