 # The perils of automated fitting of datasets: the case of a wind turbine cost model

Rinne et al. conduct an interesting analysis of the impact of wind turbine technology and land use on wind power potentials, which allows profound insights into each factor's contribution to overall potentials. The paper presents a detailed model of site-specific wind turbine investment cost (i.e. road and grid access costs), complemented by a model used to estimate site-independent costs. We believe that Rinne et al. propose a cutting-edge model of site-specific investment costs. However, the site-independent cost model is flawed in our opinion. This flaw most likely does not impact the results presented in the paper, although we expect a considerable generalization error. Thus the application of the wind turbine cost model in other contexts may lead to unreasonable results. More generally, the derivation of the wind turbine cost model serves as an example of how applications of automated regression analysis can go wrong.

## 1 Intro

Rinne et al. conduct an interesting analysis of the impact of wind turbine technology and land use on wind power potentials, which allows profound insights into each factor's contribution to overall potentials. The paper presents a detailed model of site-specific wind turbine investment cost (i.e. road and grid access costs), complemented by a model used to estimate site-independent costs. We believe that Rinne et al. propose a cutting-edge model of site-specific investment costs. However, the site-independent cost model is flawed in our opinion. This flaw most likely does not impact the results presented in the paper, although we expect a considerable generalization error. Thus the application of the wind turbine cost model in other contexts may lead to unreasonable results.
More generally, the derivation of the wind turbine cost model serves as an example of how applications of automated regression analysis can go wrong.

## 2 The Rinne et al. model of wind turbine cost

The functional form of the model is described by equation (8) in Rinne et al., which we restate as

$$y = \beta_1 f_1(x_1) + \beta_2 f_2(x_2) + \beta_3 f_3(x_3) + C. \tag{1}$$

The model is a linear combination of basis functions $f_i$ of the inputs $x_i$; the final model in equation (2) uses a logarithm, a linear term and a square root. All possible combinations of basis functions are considered as candidate models. Subsequently, linear regressions are run for all models and the one with the lowest RMSE is selected. In accordance with the final wind turbine cost model presented in Rinne et al., we write their equation (8) in its explicit form

$$\mathrm{SpecificCost}(hh, p, r, age) = 620 \ln(hh) - 1.68\,\frac{p}{r^2 \pi} + 182 \sqrt{age} - 1005. \tag{2}$$

We inserted all constants as in the paper and renamed the parameters to make their meaning easier to identify. Parameter $hh$ should be read as hub height, $p$ is the rated capacity, $r$ is the rotor radius and $age$ indicates when the turbine came to the market. Note that we employ four parameters instead of three, since we replaced the composite input parameter specific power from Rinne et al. with its elementary constituents $p$ and $r$. Analytically, the two formulations are identical.
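As a minimal sketch, equation (2) can be written as a short Python function. The units are our assumption, since they are not stated above: `hh` and `r` in metres, `p_kw` in kW, `age` in years, and specific power converted to W/m². The `age=10` value for the example turbine is a placeholder, not a value taken from Rinne et al.

```python
import math


def specific_cost(hh, p_kw, r, age):
    """Site-independent specific cost following equation (2).

    Assumed units (not stated in the text): hh and r in metres,
    p_kw in kW, age in years, specific power in W/m^2.
    """
    specific_power = 1000.0 * p_kw / (math.pi * r**2)  # W/m^2
    return (620.0 * math.log(hh)
            - 1.68 * specific_power
            + 182.0 * math.sqrt(age)
            - 1005.0)


# 3 MW turbine with 45 m rotor radius at 75 m hub height.
# age=10.0 is a placeholder for illustration only.
example = specific_cost(hh=75.0, p_kw=3000.0, r=45.0, age=10.0)
print(f"specific cost: {example:.1f} per kW")
```

Writing the function in terms of `p_kw` and `r` rather than specific power mirrors the reparametrization above and makes the dependence on rated capacity explicit.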

## 3 Where the model fails

Equation (2) implies that a wind turbine’s total costs are

$$\mathrm{TotalCost}(hh, p, r, age) = p \left( 620 \ln(hh) - 1.68\,\frac{p}{r^2 \pi} + 182 \sqrt{age} - 1005 \right). \tag{3}$$

Our main concerns are (i) the scaling behavior of (3) as well as the choice of (ii) input parameters and (iii) basis functions.

### 3.1 Scaling

In essence, a consistent model should mirror realistic scaling behavior in all parameters. For example, a large turbine needs to be more expensive in total cost than a small turbine, although specific costs may decrease with size. Figure 1 shows that basic scaling relations of the rated power are violated in equation (3) for certain parameter regimes. It depicts the scaling of investment cost with respect to rated power $p$ at fixed $hh$, $r$ and $age$ for one of the four turbines discussed in Rinne et al., assuming a hub height of 75 meters.
Note that equation (3) has a negative quadratic term in $p$; the function therefore has exactly one maximum. For $p$ above that maximum, total turbine costs decrease in absolute terms while the rated capacity $p$ increases. This is counter-intuitive and cannot be explained by economies of scale, which should at best lead to declining marginal increases in total cost.
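The location of the maximum follows directly from equation (3). Writing the total cost as $p\,(A - k\,p)$ with $A = 620 \ln(hh) + 182 \sqrt{age} - 1005$ and $k = 1.68 / (r^2 \pi)$, the derivative $A - 2kp$ vanishes at $p^* = A / (2k)$, and predicted costs become negative beyond $2p^*$. The sketch below evaluates this under assumed units (`p_kw` in kW, `r` in metres, specific power in W/m², hence the factor 1000 in `k`) and with a placeholder age of 10 years; the helper names are ours.

```python
import math


def cost_terms(hh, r, age):
    """Return (A, k) so that total cost = p_kw * (A - k * p_kw)."""
    a = 620.0 * math.log(hh) + 182.0 * math.sqrt(age) - 1005.0
    k = 1.68 * 1000.0 / (math.pi * r**2)  # assumes p in kW, W/m^2 inside
    return a, k


def total_cost(hh, p_kw, r, age):
    """Total cost following equation (3)."""
    a, k = cost_terms(hh, r, age)
    return p_kw * (a - k * p_kw)


def peak_power(hh, r, age):
    """Rated power at which predicted total cost is maximal."""
    a, k = cost_terms(hh, r, age)
    return a / (2.0 * k)


# 75 m hub height, 45 m rotor radius; age=10.0 is a placeholder.
p_star = peak_power(hh=75.0, r=45.0, age=10.0)
for p_kw in (3000.0, p_star, 6000.0, 2.0 * p_star + 500.0):
    print(f"p = {p_kw:8.0f} kW -> total cost {total_cost(75.0, p_kw, 45.0, 10.0):14.0f}")
```

Under these assumptions the 3 MW example lies below `p_star`, while the last evaluation point lies beyond the zero crossing and yields a negative predicted cost.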

We refer to values of $p$ below the cost maximum as the plausible region (shown in turquoise) and values above it as the implausible region (shown in red). Here, plausible means that a point on the curve may replicate real cost behaviour, even though we make no judgement on the correctness of the result in this regime. In contrast, points in the implausible region seem incorrect a priori due to their implausible scaling: increasing turbine capacity, everything else being equal, decreases the cost predicted by the model. For some values, turbine costs even become negative. However, the turbine models in Rinne et al. are all in the plausible range.

Figure 1: Regions of plausibility (turquoise) and implausibility (red) for the Vestas V90-3.0 MW with a hub height of 75 meters, taken from Table 1 in Rinne et al., while varying its rated capacity. X denotes the example turbine from Rinne et al. The turbine is in the plausible region; at higher rated powers, however, costs decrease and even become negative. Calculations are performed with the R script from the GitHub repository of Klöckl et al.

One could presume that any reasonable choice of turbine parameters remains in the plausible region. This, however, is not the case. Building on the US Wind Turbine Database of Hoen et al. (which includes hub height, rotor length, rated capacity and date of installation), we have determined whether all 650 distinct turbine types are within the plausible region. Market age is approximated by the time of installation of the turbines. Figure 2 indicates that several real turbines are actually outside the plausible region, i.e. their costs are most likely underestimated by the Rinne et al. model (3). The real turbines that fall outside the model's regime of validity tend to be new and have relatively high specific power.

Figure 2: Assessment of all turbines in the US Wind Turbine Database of Hoen et al. Red turbines are in the plausible regime, while blue turbines are in the implausible region. Calculations are performed with the R script from the GitHub repository of Klöckl et al.
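The classification behind such an assessment can be sketched as follows. This is not the R script referred to above: it is a hedged Python illustration, and the three turbines are made-up rows chosen for illustration, not records from the US Wind Turbine Database. Units are assumed as `p_kw` in kW, `hh` and `r` in metres and `age` in years.

```python
import math


def peak_power(hh, r, age):
    """Rated power (kW) at which equation (3) reaches its maximum."""
    a = 620.0 * math.log(hh) + 182.0 * math.sqrt(age) - 1005.0
    k = 1.68 * 1000.0 / (math.pi * r**2)  # assumes p in kW, W/m^2 inside
    return a / (2.0 * k)


def is_plausible(hh, p_kw, r, age):
    """True if the turbine lies at or below the cost maximum."""
    return p_kw <= peak_power(hh, r, age)


# Hypothetical turbines: (label, hh [m], p [kW], r [m], age [years]).
turbines = [
    ("older mid-size", 80.0, 1500.0, 38.5, 12.0),
    ("recent low specific power", 100.0, 2000.0, 55.0, 3.0),
    ("recent high specific power", 60.0, 3000.0, 25.0, 1.0),
]

labels = {name: is_plausible(hh, p_kw, r, age)
          for name, hh, p_kw, r, age in turbines}
for name, ok in labels.items():
    print(f"{name:28s} {'plausible' if ok else 'implausible'}")
```

In this toy table only the new turbine with high specific power falls outside the plausible region, which is the same qualitative pattern described for the real data.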

### 3.2 Choice of variables in the model

The selection of variables in Rinne et al. is neither motivated nor validated. We understand that hub height, and age as a proxy of technological progress, may have an impact on costs. However, the parameter specific power (power density) is not necessarily directly related to investment costs. While the authors convincingly argue that the specific power of recent turbine models has been decreasing over time and cite related work showing that decreased specific power leads to decreased LCOE, this does not directly indicate that a lower specific power reduces investment costs.
Observe that specific power is a ratio of two parameters. Thus, high specific power is attained in two different ways: either by installing larger generators or smaller blades. While smaller rotors certainly lead to lower cost, a higher rated capacity requires a larger generator, increasing costs. This illustrates that varying two different parameters may both cause a high specific power, even though both processes have opposite effects on cost. Figure 2 shows that this is empirically relevant.
We agree that there may be a correlation between investment cost and specific power. Still, it cannot necessarily be asserted that a causal relationship between the two parameters exists. This may be related to a confounding variable, i.e. a variable the two parameters are both related to.
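The ambiguity of the ratio can be made concrete with a small sketch. Starting from a 3 MW turbine with a 45 m rotor radius, doubling the generator size and shrinking the rotor radius by a factor of $\sqrt{2}$ both double the specific power. Since equation (2) depends on $p$ and $r$ only through this ratio, the model assigns both variants the same specific cost. Units (kW, metres, W/m²) are assumed.

```python
import math


def specific_power(p_kw, r):
    """Specific power in W/m^2 for rated power in kW and radius in m."""
    return 1000.0 * p_kw / (math.pi * r**2)


base = specific_power(3000.0, 45.0)
bigger_generator = specific_power(6000.0, 45.0)               # double p
smaller_rotor = specific_power(3000.0, 45.0 / math.sqrt(2.0))  # halve area

print(f"base:             {base:7.1f} W/m^2")
print(f"bigger generator: {bigger_generator:7.1f} W/m^2")
print(f"smaller rotor:    {smaller_rotor:7.1f} W/m^2")
```

The two modified turbines are indistinguishable to the model, although one adds hardware and the other removes it.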

### 3.3 Choice of fitting functions

The authors do not motivate why the particular basis functions were chosen. It is unclear why polynomials of other degrees, the exponential function or any other functions are disregarded. For example, Ryberg et al. model the same problem through an exponential function. The well-known NREL cost model (Fingersh et al.) uses linear combinations of polynomials with fractional exponents between 2 and 3. It is not self-evident which functional form is the most natural one, and thus the authors should motivate their choice. Furthermore, picking a model by minimizing RMSE depends on the allowed basis functions. It is well known that $n$ data points with distinct abscissae can be interpolated exactly by a polynomial of degree $n-1$. Therefore a polynomial model of sufficiently high degree is guaranteed to attain an RMSE of 0. In addition, the choice to model age (in their definition, years before 2016) by a square root leads to the undesired side effect that predictions for turbines built after 2016 are impossible, due to negative terms under the square root.
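Both points can be illustrated with a short sketch. The first part interpolates six made-up data points (not taken from any dataset) with the degree-5 Lagrange polynomial and reports an in-sample RMSE of zero, together with the value the same polynomial produces outside the sampled range. The second part shows that a square-root age term, written here with a hypothetical `reference_year` argument, cannot be evaluated for a turbine entering the market after 2016.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the degree len(xs)-1 interpolating polynomial at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def rmse(predicted, observed):
    squared = [(a - b) ** 2 for a, b in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up sample: six points, so a degree-5 polynomial fits exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1510.0, 1420.0, 1480.0, 1390.0, 1450.0, 1400.0]

in_sample_rmse = rmse([lagrange_eval(xs, ys, x) for x in xs], ys)
outside = lagrange_eval(xs, ys, 8.0)
print(f"in-sample RMSE: {in_sample_rmse}")
print(f"prediction at x = 8 (outside the data): {outside:.0f}")


def age_term(year_on_market, reference_year=2016):
    """Square-root age term of equation (2)."""
    return 182.0 * math.sqrt(reference_year - year_on_market)


try:
    age_term(2018)
    post_2016_ok = True
except ValueError:  # math.sqrt rejects negative arguments
    post_2016_ok = False
print(f"age term defined for 2018: {post_2016_ok}")
```

A perfect in-sample fit therefore says little about the quality of a model once the set of admissible basis functions is large enough.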

## 4 Conclusions

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk," quipped John von Neumann, illustrating the perils of curve fitting. In summary, we have shown that Rinne et al. use an investment cost model that exhibits unrealistic scaling in certain, empirically relevant, parameter regimes: the costs of several turbine models employed in the US are very likely misrepresented by the Rinne et al. model. Rinne et al.'s cost model works for their particular parameter space, but it is not valid for a wider range. We therefore caution against a naive utilization in other studies.
Rinne et al. need to explicitly limit the parameter space of their model to ensure the validity of their results, or replace the model by a different specification. Figure 1's scaling in the regime of implausibility shows that, after a certain point, generators are predicted to be cheaper in absolute numbers if installed with a higher output. This behaviour is concerning, since the field of power system analysis relies heavily on numerically solving optimization problems. If (2) is passed to a numerical optimization solver without further precautions that restrict the solver to the plausible region, optimal solutions may be implausible.
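A toy example illustrates the risk. The sketch below is a deliberately naive grid search, not a power system model: it picks the rated power with the lowest predicted total cost from a menu of candidate ratings, once without restriction and once restricted to the plausible region. Units (kW, metres), the placeholder age of 10 years and the candidate grid are assumptions made for illustration.

```python
import math


def cost_terms(hh, r, age):
    a = 620.0 * math.log(hh) + 182.0 * math.sqrt(age) - 1005.0
    k = 1.68 * 1000.0 / (math.pi * r**2)  # assumes p in kW, W/m^2 inside
    return a, k


def total_cost(hh, p_kw, r, age):
    """Total cost following equation (3)."""
    a, k = cost_terms(hh, r, age)
    return p_kw * (a - k * p_kw)


HH, R, AGE = 75.0, 45.0, 10.0           # AGE is a placeholder
a, k = cost_terms(HH, R, AGE)
p_star = a / (2.0 * k)                   # boundary of the plausible region

candidates = [float(p) for p in range(1000, 10001, 500)]  # kW


def cheapest(options):
    return min(options, key=lambda p_kw: total_cost(HH, p_kw, R, AGE))


naive_choice = cheapest(candidates)
restricted_choice = cheapest([p for p in candidates if p <= p_star])

print(f"unrestricted: {naive_choice:.0f} kW, "
      f"cost {total_cost(HH, naive_choice, R, AGE):.0f}")
print(f"restricted:   {restricted_choice:.0f} kW, "
      f"cost {total_cost(HH, restricted_choice, R, AGE):.0f}")
```

Without the restriction, the search selects the largest rating on the menu at a negative predicted cost; with the bound `p <= p_star` it stays within the regime where the model scales sensibly.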

## 5 Discussion of Typo

There is an error in Table 3 of Rinne et al. that most likely stems from a typo or a simple copy-and-paste error. The values in Table 3 are not reproducible from equation (8) in Rinne et al. when the corresponding parameter values are inserted. In the following we restate example parts of Table 3 and contrast them with our results.

Specifically, if the age of the turbines in Table 3 is inserted into (8), the resulting values differ from those given in Table 3. However, we were able to reproduce the values in Table 3 by inserting 0 for the parameter age. This suggests that 0 has been inserted by mistake. We point out that this error should be corrected, since the values may otherwise be cited wrongly by subsequent publications.

## References

• Erkka Rinne, Hannele Holttinen, Juha Kiviluoma and Simo Rissanen. Effects of turbine technology and land use on wind power resource potential. Nature Energy 3, 494–500 (2018).
• B.D. Hoen, J.E. Diffendorfer, J.T. Rand, L.A. Kramer, C.P. Garrity and H.E. Hunt. United States Wind Turbine Database. U.S. Geological Survey, American Wind Energy Association, and Lawrence Berkeley National Laboratory data release: USWTDB V2.0 (April 2019).
• David Severin Ryberg, Dilara Gulcin Caglayan, Sabrina Schmitt, Jochen Linßen, Detlef Stolten and Martin Robinius. The Future of European Onshore Wind Energy Potential: Detailed Distribution and Simulation of Advanced Turbine Designs. Preprints 2018, 2018120196 (doi: 10.20944/preprints201812.0196.v1).
• L. Fingersh, M. Hand and A. Laxson. Wind turbine design cost and scaling model. Technical Report NREL/TP-500-40566, National Renewable Energy Lab. (NREL), Golden, CO (United States), 2006.
• Claude Klöckl, Katharina Gruber, Peter Regner, Sebastian Wehrle and Johannes Schmidt. GitHub repository, DOI: 10.5281/zenodo.3066230 (2019).