The coupling of models for the different components of the Soil-Vegetation-Atmosphere-System is required to investigate component interactions and feedback processes. However, the component models for atmosphere, land-surface and subsurface are usually operated at different resolutions in space and time owing to the different dominant processes. The computationally often more expensive atmospheric models, for instance, are typically employed at a coarser resolution than land-surface and subsurface models. Thus, up- and downscaling procedures are required at the interface between the atmospheric model and the land-surface/subsurface models. We apply multi-objective Genetic Programming (GP) to a training data set of high-resolution atmospheric model runs to learn equations or short programs that reconstruct the fine-scale fields (e.g., 400 m resolution) of the near-surface atmospheric state variables from the coarse atmospheric model output (e.g., 2.8 km resolution). Like artificial neural networks, GP can flexibly incorporate multivariate and nonlinear relations, but offers the advantage that the solutions are human readable and thus can be checked for physical consistency. Using the Strength Pareto Approach for multi-objective fitness assignment allows us to consider multiple characteristics of the fine-scale fields during the learning procedure.
The mass and energy fluxes at the interface between soil and atmosphere significantly impact processes in the atmosphere, at the land-surface and in the subsurface.
The Transregional Collaborative Research Centre 32 on ’Patterns in Soil-Vegetation-Atmosphere-Systems’ aims at a better understanding of the interplay of processes, patterns, and structures between the different components at different scales in space and time [Vereecken et al., 2010].
To account for interactions and feedbacks between the different components of the soil-vegetation-atmosphere system, coupled modeling systems are increasingly used (e.g., TerrSysMP by Shrestha et al. ).
When coupling different component models a scale gap occurs:
modeling land-surface and subsurface typically requires higher resolutions, while atmospheric models are often computationally too expensive to be employed at the same high resolution.
The lower atmospheric boundary layer (ABL) is usually very heterogeneous due to the production of turbulent kinetic energy by shear and buoyancy effects, even more so over heterogeneous terrain. The representation of the spatio-temporal variability in the lower ABL is very important, because the fluxes of matter and energy are driven primarily by the local differences of state variables at the surface and in the lowermost atmosphere. Many processes at the earth’s surface are nonlinear, e.g., runoff production or snow melt, which are threshold dependent, or the turbulent exchange coefficients, which are nonlinear functions of the near-surface atmospheric stability [Schomburg et al., 2010]. Modeling exchange processes using averaged parameters or state variables can introduce biases [Schlünzen and Katzfey, 2003]. To avoid such biases, a downscaling algorithm is needed that reproduces the atmospheric variability near the surface in order to provide reasonable driving data for land-surface and hydrological models.
Downscaling methods have initially been developed for general circulation models (GCMs). Various review articles summarize the large number of downscaling methods existing nowadays [Wilby and Wigley, 1997; Xu, 1999; von Storch et al., 2000; Fowler et al., 2007; Maraun et al., 2010].
The general idea behind downscaling is to model the variables on a smaller scale based on the large-scale model output. In dynamical downscaling a limited-area model is run at high resolution. Dynamical downscaling is computationally expensive, but it provides physically consistent spatial fields on the smaller scale. Statistical downscaling is computationally less demanding: regression models, probably the computationally least expensive, employ a so-called transfer function between the large-scale predictors and the local-scale predictand. Regression approaches for downscaling include multiple linear regression, principal component analysis (e.g., Wagner et al.  or Simon et al. ), canonical correlation analysis (e.g., Friederichs and Paeth ), artificial neural networks (e.g., Schoof and Pryor  or Carreau and Vrac ) or kriging (e.g., Biau et al. ). For precipitation downscaling also fractals have been used (e.g., Deidda ). Weather generators (WGs) can be seen as an extension of the regression models, as WGs estimate not only the mean but also the variance. Finally, weather typing or classification schemes relate synoptically or statistically defined weather classes to the local climate; the local variables are modeled conditioned on the weather classes.
Fiddes and Gruber  recently introduced TopoSCALE to downscale GCM data to scales of 100 m. TopoSCALE is a physically based downscaling scheme designed for regions with complex terrain, where vertical gradients of the atmospheric variables often dominate over horizontal gradients. A recent study by Malone et al.  presents a downscaling methodology where the uncertainty of the target variable is taken into account when fitting a nonlinear regression model.
In general, downscaling of meso-scale model output is much less common than the downscaling of GCM output, but most concepts on the regional climate scale can be adapted to the smaller scales. As we aim at a downscaling algorithm applicable within a coupled modeling system, methods with low computational costs are preferable. Schomburg et al. [2010, 2012]
have presented a three-step downscaling scheme combining a bi-quadratic spline-interpolation, deterministic rules based on linear regression and autocorrelated noise. The scheme works well for certain variables and weather conditions, but a linear regression appears to be insufficient under more complex conditions. In this study we apply an advanced regression approach taking into account multivariate and nonlinear relations. Artificial neural networks (ANNs) are widely used for downscaling, but we take an alternative approach, Genetic Programming (GP). Like ANNs, GP can handle multivariate and nonlinear relations, but with the advantage that the solutions take the form of an equation or program code, which is human readable and thus can be checked for physical consistency.
GP is a method from the field of evolutionary computation [Banzhaf et al., 1997; Koza, 1992]. As the name implies, such methods are inspired by the concept of natural evolution. Given a training data set and a fitness measure that quantifies the quality of a candidate solution (i.e., a downscaling rule), GP automatically generates solutions. GP typically starts by randomly generating an initial population of candidate solutions. Each following generation is created by mutating and recombining the candidate solutions from the previous generation, with the best solutions most likely to contribute to the new generation.
In atmospheric and related sciences GP is rarely applied. Liong et al. , for instance, applied GP for rainfall-runoff modeling. Parasuraman et al.  and Izadifar and Elshorbagy  employed GP for modeling evapotranspiration or latent heat from meteorological variables. Ghorbani et al.  used GP for sea water level forecasting. In these four studies GP performed reasonably well. Three of these studies compared the performance of GP and ANNs. Both methods performed about equally well, which underlines that GP is an eligible alternative to ANNs, which are more popular in the geosciences.
Only very few studies apply methods from evolutionary computation to atmospheric downscaling. To our current knowledge all of them deal with downscaling of GCM output to a station or catchment mean. Coulibaly  utilized GP for simulating daily extreme temperature at a weather station located in the Chute-du-Diable basin in Northeastern Canada. GP yielded good results, outperforming the widely used Statistical Down-Scaling Model (SDSM) by Wilby et al. . Hashmi et al. 
employed Gene Expression Programming (GEP), which is a variant of GP, for downscaling watershed precipitation in the Clutha River watershed on New Zealand’s South Island. In both studies, GP/GEP not only performed better than the SDSM, but also required fewer predictor variables. Liu et al.  compared the performance of a GP based method (evolutionary polynomial regression) with a feed-forward neural net and the SDSM for downscaling daily total precipitation and daily maximum and minimum temperature in the Chute-du-Diable basin. Both nonlinear machine learning methods performed about equally well and better than the SDSM, especially for precipitation.
We aim at simulating the temporal evolution of spatial fields rather than a time series at a single grid point, i.e., we have to consider three dimensions, two in space and one in time. We derive deterministic rules relating the fine-scale patterns to coarse model output and high-resolution information on surface properties. Initially, we attempted to reproduce the training data fields as exactly as possible, but we quickly realized that this approach might not be appropriate for our problem. An exact fit of the high-resolution fields is not possible because of the inevitable unpredictable component that a downscaling procedure cannot reproduce. Furthermore, discretization, i.e., the representation of a continuous system on a discrete grid, always induces uncertainty. However, for an improved representation of the mass and energy fluxes at the surface, modeling the sub-grid scale properties statistically might be sufficient. A fuzzy approach for defining the fitness measure seems more appropriate than taking the exact resemblance to the high-resolution model output fields as criterion. We thus adjusted our objective and seek to retrieve realistic spatio-temporal patterns on the fine scale rather than reproducing the exact model output fields. This task is very similar to the verification of spatial fields.
For the verification of forecasts from high-resolution numerical models, fuzzy, neighborhood-based methods have been developed during the last decades (e.g., Ebert ). In the fuzzy approach a model forecast is rewarded when it is close to the observation in patterns and structures, which can be achieved by loosening the requirement of exact matches between forecast and observation, e.g., by comparing neighborhoods of forecast and observation points. The Strength Pareto Approach allows the use of more than a single measure to quantify the quality of a potential downscaling rule and offers the liberty to choose the best compromise between the different objectives in hindsight.
This article is structured as follows. After introducing the data (Section 2), the downscaling methodology based on multi-objective GP is explained in detail (Section 3) and tested for the disaggregation of near-surface temperature (Section 4). Finally, we conclude and give an outlook on future work (Section 5).
2 Background and Data
Our GP based downscaling is developed in order to be applied in a modeling platform, such as TerrSysMP by Shrestha et al. , which couples land-surface and atmospheric models operating on different scales.
TerrSysMP is composed of the atmospheric model COSMO [Doms and Schättler, 2002; Baldauf et al., 2011], land-surface model CLM [Oleson et al., 2004] and hydrological model ParFlow [Kollet and Maxwell, 2006].
The three component models are coupled via the external coupler OASIS, which contains the downscaling scheme by Schomburg et al. .
The scheme consists of three steps:
(1) a bi-quadratic spline interpolation interpolates the coarse-resolution atmospheric data to a higher resolution while conserving the coarse field’s mean and gradients;
(2) so-called ’deterministic’ downscaling rules are applied;
(3) temporal autoregressive noise is added to restore the fine-scale variance missing after step 1 and 2.
The deterministic rules applied in step 2 exploit empirical relations between atmospheric variables and surface properties. Two rules were derived from physical considerations: near-surface pressure is downscaled by exploiting the hydrostatic equation; under cloud-free conditions, shortwave net radiation can be downscaled using surface albedo as predictor.
Where no physical relation is known, the training data set was examined for possible statistical correlations between high-resolution near-surface atmospheric fields and land-surface properties. A rule detection algorithm that calculates Pearson correlation coefficients for subsets of the training data was implemented, with the subsets selected based on indicators, e.g., cloud cover, to distinguish clear-sky conditions from others. The algorithm detected two applicable rules: for unstable atmospheres, near-surface temperature can be downscaled using orographic height as a linear predictor; for situations with cloud cover, longwave net radiation can be disaggregated using ground temperature as a linear predictor. Under more complex conditions and for other near-surface variables (specific humidity, wind speed and precipitation) the algorithm could not find any deterministic downscaling rules applicable to a sufficiently large subset of the training data.
The algorithm by Schomburg et al.  can only detect linear relations and is limited to dependencies between two variables. The aim of this study is to improve the downscaling using a more flexible approach, which is able to model more complex nonlinear and multivariate relations. This flexibility is achieved employing Genetic Programming (GP).
We use the same training data set as Schomburg et al. , which consists of high-resolution (400 m) COSMO simulations on a domain of 168 km × 168 km in western Germany. The domain is centered over the Rur catchment, which is the main investigation area of TR 32. The data set contains hourly output for 8 simulation periods of 1 to 2 days with different weather conditions (Table 2). From the simulations, we extract the inner 112 km × 112 km, i.e., 280 × 280 grid points, to exclude boundary effects. The downscaling scheme by Schomburg et al.  has been trained for disaggregation from 2.8 km resolution down to 400 m. In this study we consider the same scales, i.e., we train on a downscaling by a factor of 7.
As motivated in the introduction, we seek to reproduce realistic spatio-temporal patterns of the near-surface atmospheric state variables, not necessarily the exact high-resolution model output.
To this end, we formulate the downscaling problem as a multi-objective optimization problem, in order to consider multiple characteristics of the fine-scale fields, for instance the spatially distributed variance.
Summing up the different objectives to be minimized is, however, problematic since the objectives may have different units and ranges.
A scaling procedure would be required, but the risk of treating the objectives unequally or getting trapped in a local minimum would remain.
The Strength Pareto Approach (SPEA) by Zitzler and Thiele  provides a solution to this problem.
After giving a general introduction to GP, we introduce the concept of Pareto optimality on which SPEA is based. Finally, we introduce multi-objective GP step by step. Large parts of our implementation are based on the GPLAB package for Matlab by Silva and Almeida .
3.1 Genetic Programming
’Genetic Programming addresses the problem of automatic programming, namely, the problem of how to enable a computer to do useful things without instructing it, step by step on how to do it’ (J. Koza in Banzhaf et al. ).
GP is one of several methods within the area of evolutionary computation.
As the name implies these methods are inspired by the concept of natural evolution.
In nature an individual is exposed to environmental pressure.
Its chance to survive, reproduce and consequently contribute to the next generation is dependent on its fitness with respect to the environmental conditions.
In evolutionary computation the problem to be solved corresponds to environmental pressure. GP evolves a solution to the problem in a similar manner as evolution happens in nature, i.e., a solution is developed over several generations, each consisting of a large number of candidate solutions. The candidate solutions, also called individuals in analogy to the evolution terminology, are composed of program code.
The candidate solutions forming the initial generation are automatically and often randomly generated and then tested on the given problem. Each succeeding generation is evolved by applying so-called genetic operators, which recombine and modify individuals from the preceding (parent) generation. The better a candidate solution solves the given problem, the greater its chance to contribute to the new generation [Eiben and Smith, 2003].
Genetic Programming differs from other methods of evolutionary computation in the representation of the individuals. In GP the individuals are traditionally represented by parse trees. Figure 1 shows a simple example of a parse tree consisting of 7 nodes arranged on 4 levels. The parse tree in Figure 1 embodies an equation consisting of arithmetic functions, variables and one constant. A parse tree is evaluated starting from the bottom: in Figure 1 the terminals serve as input arguments to the subtraction, its output in turn serves as an input argument to the multiplication, and so on.
In general a parse tree consists of functions and so-called terminals. The set of functions and terminals used in a GP run is typically defined by the user and adapted to the problem to be solved. A function set can contain all kinds of functions, for instance arithmetic functions (addition, subtraction, multiplication, division), transcendental functions (exponential, trigonometric functions, …), or conditional statements (if-then-else, …) [Banzhaf et al., 1997]. The terminals have their name because they do not have input arguments and thus terminate the branches of the tree. Terminals serve as fundamental input to the functions. Thus, a terminal set can include variables, constants and zero-argument functions, such as a random number generator.
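The parse-tree representation and its bottom-up evaluation can be sketched in a few lines of Python. This is a minimal illustration with our own node layout and names, not a specific GP implementation:

```python
import operator

# Function set: symbol -> Python function (arithmetic subset only here)
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Recursively evaluate a parse tree from the bottom up.

    tree: a constant, a variable name, or a tuple (function_symbol, *children)
    env:  dict mapping variable (terminal) names to their values
    """
    if isinstance(tree, tuple):                       # function node
        fn = FUNCTIONS[tree[0]]
        return fn(*(evaluate(child, env) for child in tree[1:]))
    if isinstance(tree, str):                         # variable terminal
        return env[tree]
    return tree                                       # constant terminal

# A 7-node tree embodying the equation (x - 2) * (y + y)
tree = ("*", ("-", "x", 2.0), ("+", "y", "y"))
print(evaluate(tree, {"x": 5.0, "y": 1.5}))           # → 9.0
```

The terminals (`x`, `y`, `2.0`) terminate the branches; the internal nodes apply the functions to the evaluated subtrees.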
3.2 Pareto Optimality
The term Pareto optimality originates in economics.
The state of an economic system is called Pareto optimal when economic resources are distributed in such a way that it is impossible to improve the situation of one person without deteriorating the situation of at least one other person.
For optimization problems that involve multiple, sometimes conflicting objectives there is in general no single optimal solution. Usually there exists a set of alternative solutions in which no solution is superior to all other solutions when considering all objectives.
The multiple objectives correspond to different quality criteria of the desired solution. We denote the objective space containing all objective functions as $O$ and the solution space containing all potential solutions as $S$. An objective $o_i$ is in general calculated by comparing the prediction (downscaled fields) $\hat{y}$ and the reference (high-resolution model output) $y$. The prediction $\hat{y}$ results from the solution (downscaling rule) $s$
being applied to the vector of predictors $\mathbf{x}$. Thus, incorporating all dependencies, we can write $o_i = o_i(\hat{y}(s, \mathbf{x}), y)$. For simplicity, in the following we only include the dependency on the solution explicitly, i.e., $o_i(s)$.
Let $\mathbf{o}(s) = (o_1(s), \ldots, o_n(s))$ be the objective vector, i.e., the vector containing all objectives. The multi-objective minimization problem can then be written as
$$\min_{s \in S} \mathbf{o}(s) = \min_{s \in S} \left( o_1(s), \ldots, o_n(s) \right).$$
Let us consider two solutions $s_1, s_2 \in S$. The solution $s_1$ is said to dominate $s_2$ ($s_1 \succ s_2$) if and only if
$$\forall i \in \{1, \ldots, n\}: \; o_i(s_1) \leq o_i(s_2) \quad \wedge \quad \exists j \in \{1, \ldots, n\}: \; o_j(s_1) < o_j(s_2).$$
Or in words, $s_1$ dominates $s_2$ if $s_1$ is at least as good as $s_2$ with respect to all objectives, and there exists at least one objective for which $s_1$ is better than $s_2$.
The solution $s_1$ is said to cover $s_2$ ($s_1 \succeq s_2$) if $s_1 \succ s_2$ or $\mathbf{o}(s_1) = \mathbf{o}(s_2)$, i.e., either $s_1$ dominates $s_2$ or they both perform equally well concerning all objectives. The solutions that are not dominated by any of the elements in the solution space are called Pareto optimal. Figure 3 shows an example of a minimization problem with two objectives. The squares constitute the set of Pareto-optimal solutions, the circles correspond to the non-optimal solutions.
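The dominance definition translates directly into code. The following sketch (function names are our own) filters a set of objective vectors down to the Pareto-optimal ones under minimization:

```python
def dominates(o1, o2):
    """True if objective vector o1 dominates o2 under minimization:
    o1 is at least as good in every objective and strictly better in one."""
    return (all(a <= b for a, b in zip(o1, o2))
            and any(a < b for a, b in zip(o1, o2)))

def pareto_set(objective_vectors):
    """Return the vectors not dominated by any other vector."""
    return [s for s in objective_vectors
            if not any(dominates(t, s) for t in objective_vectors if t is not s)]

points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_set(points))   # (3.0, 3.0) is dominated by (2.0, 2.0)
```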
3.3 Multi-objective Genetic Programming
For multi-objective fitness assignment we have implemented the Strength Pareto Approach (SPEA) by Zitzler and Thiele . SPEA requires two main changes to traditional GP. Firstly, each generation is split into two sets, called populations $P$ and $P'$. The population $P$ evolves over time as in traditional GP, whereas the second population $P'$, the so-called Pareto set, contains the Pareto-optimal solutions. Secondly, the fitness calculation for the individuals in both populations $P$ and $P'$ is based on a comparison between the individuals rather than on absolute values.
GP requires a training data set to learn from, function and terminal sets to build the candidate solutions, and fitness measures (objectives) to quantify the quality of the candidate solutions. Additionally, the user can set controlling parameters, such as population size, maximum size of the Pareto set, maximum tree size or the probability to select the various genetic operators. Finally, a termination criterion that stops the evolution must be provided, for instance a maximum number of generations.
Figure 2 shows a basic flowchart of the GP algorithm incorporating SPEA, which we explain in the following.
(1) An initial population of candidate solutions (individuals) is generated. The initial population can be created randomly or include known approximate solutions of the given problem.
(2) Each candidate solution (each potential downscaling rule) is applied to the training data set.
(3) The result of each candidate solution is evaluated according to the objectives.
(4) The Pareto set is updated: all individuals in population $P$ that are nondominated within $P \cup P'$ are moved to the Pareto set $P'$. The individuals in $P'$ that are covered by another member of $P'$ are removed from the Pareto set. In case the number of individuals stored in $P'$ exceeds the given maximum,
$P'$ is pruned by hierarchical clustering. A clustering procedure helps to preserve the solution diversity while shrinking the Pareto set. To make sure that all objectives are considered equally, we scale the objective values before applying the clustering procedure. As in our case each objective $o_i$ is positively defined, we calculate the scaled objective as $\tilde{o}_i = (o_i - \min o_i)/\max o_i$, i.e., from the objective we subtract the minimum and divide the result by the maximum occurring in the current Pareto set. The fitness results from a greater-than/smaller-than comparison, i.e., the scaling does not affect the fitness. Note that the scaling is only applied within the clustering step.
(5) The fitness of each individual in and is calculated according to SPEA by comparing the individual’s performances. (For details see Section 3.3.3)
(6) If the termination criterion is met, the final Pareto set is returned. If the termination criterion is not met, the algorithm continues with (7).
(7) The next generation is created by combining and mutating individuals from the current populations $P$ and $P'$. The creation of the new generation can be split into two steps. First, a sampling procedure is applied to determine the parents. Second, genetic operators (crossover, mutation) are applied to create new individuals.
(7.1) For sampling we use lexicographic parsimony pressure [Luke et al., 2002] as it is implemented in GPLAB. A number of individuals is randomly drawn from the current populations. The drawn individual with the best fitness becomes a parent. In case several individuals are equally fit, the smallest one is chosen.
(7.2) The genetic operators are applied to the parent individuals. Crossover recombines two parents: the parent parse trees are cut at a randomly chosen node and the separated subtrees are exchanged. (Subtree) mutation cuts a randomly chosen subtree from the parent and replaces it by a new, randomly created subtree. Parent selection and application of genetic operators are repeated until enough individuals for the new generation have been created.
Starting from (2) the succeeding steps are iteratively repeated until the termination criterion is met (see (6)).
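The subtree crossover of step (7.2) can be sketched as follows. This is an illustrative toy version with our own list-based tree representation and helper names, not GPLAB's implementation:

```python
import copy
import random

# Parse trees as nested lists ["fn", child, child]; strings/numbers are terminals.

def paths(tree, prefix=()):
    """Yield the index path of every node in the tree (root = ())."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_node(tree, path, subtree):
    for i in path[:-1]:
        tree = tree[i]
    tree[path[-1]] = subtree

def crossover(parent1, parent2, rng):
    """Cut each parent at a randomly chosen node and swap the subtrees."""
    c1, c2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    p1 = rng.choice([p for p in paths(c1) if p])   # exclude the root node
    p2 = rng.choice([p for p in paths(c2) if p])
    s1, s2 = get_node(c1, p1), get_node(c2, p2)
    set_node(c1, p1, s2)
    set_node(c2, p2, s1)
    return c1, c2

parent_a = ["*", ["-", "x", 2.0], "y"]
parent_b = ["+", "z", ["+", "y", "y"]]
child_a, child_b = crossover(parent_a, parent_b, random.Random(1))
print(child_a, child_b)
```

Subtree mutation works analogously, except that the randomly chosen subtree is replaced by a newly generated random subtree instead of one taken from a second parent.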
3.3.3 Fitness Assignment
In SPEA the fitness assignment consists of two steps.
(1) Each solution in the Pareto set, $s' \in P'$, is assigned a real value called fitness $f'$. The fitness is proportional to the number of individuals $n$ in the population $P$ that are covered by $s'$. Let $N$ be the total number of individuals in $P$. Then $f'$ is defined as $f' = n/(N+1)$. To clearly distinguish between the fitness of individuals in $P$ and $P'$, the fitness of the individuals in $P'$ is also called strength, hence Strength Pareto Approach.
(2) The fitness $f$ of an individual $s$ in the population $P$ is calculated as the sum over the strengths of all individuals in the Pareto set, $s' \in P'$, that cover $s$, i.e., $f = 1 + \sum_{s' \succeq s} f'$, where $f \in [1, N)$. One is added to the sum to ensure that the individuals in the Pareto set have better fitness than those in $P$.
Figure 3 shows one possible scenario of a minimization problem with two objectives $o_1$ and $o_2$. The values indicate the fitness. The circles correspond to the individuals in $P$, the squares to the individuals in $P'$. The lowest point in Figure 3 shows an individual contained in the Pareto set that dominates 3 out of the 7 individuals in $P$. Therefore its fitness equals $3/8$. The next lowest point represents an individual from the population $P$, which is dominated only by this one individual with a fitness of $3/8$. Hence its fitness calculates as $1 + 3/8 = 11/8$.
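The two-step fitness assignment can be sketched as follows (a minimal sketch with our own function names; objective vectors stand in for individuals). The example reproduces the arithmetic of the scenario above: a Pareto member covering 3 of 7 population members gets strength 3/8, and a population member covered only by it gets fitness 1 + 3/8:

```python
def dominates(o1, o2):
    """o1 dominates o2 (minimization): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(o1, o2)) and any(a < b for a, b in zip(o1, o2))

def covers(o1, o2):
    """o1 covers o2: o1 dominates o2 or both are equal in all objectives."""
    return dominates(o1, o2) or tuple(o1) == tuple(o2)

def spea_fitness(pareto, population):
    """Two-step SPEA fitness assignment (lower fitness = better).

    pareto, population: lists of objective vectors.
    Returns (strengths of Pareto members, fitnesses of population members).
    """
    n = len(population)
    # (1) strength: covered fraction of the population, in [0, 1)
    strengths = [sum(covers(p, q) for q in population) / (n + 1) for p in pareto]
    # (2) population fitness: 1 + strengths of all covering Pareto members
    fitnesses = [1.0 + sum(s for p, s in zip(pareto, strengths) if covers(p, q))
                 for q in population]
    return strengths, fitnesses

pareto = [(1.0, 1.0)]                      # one Pareto-optimal point
population = [(2.0, 2.0), (3.0, 1.5), (1.0, 2.0), (0.5, 5.0),
              (5.0, 0.5), (0.9, 3.0), (4.0, 0.8)]
strengths, fitnesses = spea_fitness(pareto, population)
print(strengths)       # → [0.375]  (covers 3 of 7, i.e., 3/8)
print(fitnesses[0])    # → 1.375    (covered only by that member: 1 + 3/8)
```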
Our downscaling aims at reproducing the fine-scale patterns, i.e., reproducing as much of the deterministic part as possible, while accounting for spatial noise or uncertainty.
Combining both aspects makes the original and the downscaled fields ’look similar’.
Note that we aim at predicting anomalies, i.e., the difference between the high-resolution and the spline-interpolated fields.
In the following this is not always explicitly mentioned.
One possibility to make two fields look similar is to make their values similar in distribution. Thus our first objective is to minimize the difference in distribution between the downscaled and the reference values (anomalies), thereby completely ignoring the location of the values. This difference is measured by the integrated quadratic distance ($\mathrm{IQD}$). The $\mathrm{IQD}$ quantifies the distance between two cumulative distribution functions (CDFs) $F$ and $G$ [Thorarinsdottir et al., 2013],
$$\mathrm{IQD} = \int_{-\infty}^{\infty} \left( F(t) - G(t) \right)^2 \, dt.$$
We apply the $\mathrm{IQD}$ to the discrete histogram distributions. The histogram distributions are calculated using a bin width of 0.25 K. Let $F_t$ be the normalized histogram distribution for the high-resolution reference field at time step $t$, and $G_t$ accordingly the normalized histogram distribution for the corresponding downscaled field. A single histogram bin is denoted by $b$. From the histogram distributions, we calculate the $\mathrm{IQD}$ for each field (time step) separately (Figure 4) and take the mean over all time steps as objective, i.e.,
$$o_1 = \frac{1}{T} \sum_{t=1}^{T} \mathrm{IQD}_t,$$
where $T$ denotes the number of fields contained in the reference data set.
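A histogram-based IQD for a single time step can be sketched as follows. This is an illustrative computation under the assumptions stated above (shared bins of 0.25 K width, CDFs from normalized histograms); the function name and the synthetic test field are our own:

```python
import numpy as np

def iqd(ref, pred, bin_width=0.25):
    """Integrated quadratic distance between the value distributions of two
    anomaly fields: the discretely integrated squared difference of the
    empirical CDFs built from histograms with the given bin width."""
    lo = min(ref.min(), pred.min())
    hi = max(ref.max(), pred.max())
    edges = np.arange(lo, hi + 2 * bin_width, bin_width)   # shared bin edges
    f = np.histogram(ref, bins=edges)[0] / ref.size        # normalized histograms
    g = np.histogram(pred, bins=edges)[0] / pred.size
    F, G = np.cumsum(f), np.cumsum(g)                      # discrete CDFs
    return float(np.sum((F - G) ** 2) * bin_width)

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(280, 280))    # one synthetic anomaly field
print(iqd(ref, ref))               # identical distributions → 0.0
print(iqd(ref, ref + 0.5) > 0.0)   # shifted distribution    → True
```

Because the measure only compares distributions, rearranging the values in space leaves the IQD unchanged, which is exactly the intended location-blindness of this objective.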
The deterministic part is accounted for by using the root mean square error ($\mathrm{RMSE}$) as objective. The $\mathrm{RMSE}$ provides an estimate of the expected value and smooths out those patterns of the fine-scale fields that are not deterministically reproducible. To account for some uncertainty we introduce a fuzzy method. In our application we account for uncertainty in the location of the fine-scale structures by defining a neighborhood $\mathcal{N}(p)$ for each pixel $p$. This neighborhood contains the four direct neighbors of $p$ (Figure 4). We incorporate the neighborhood in the calculation of the $\mathrm{RMSE}$ as follows:
$$o_2 = \sqrt{ \frac{1}{N_x N_y} \sum_{p} \; \min_{q \in \mathcal{N}(p) \cup \{p\}} \left( \hat{y}_p - y_q \right)^2 }.$$
Here, $N_x$ and $N_y$ denote the total number of pixels in x- and y-direction.
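One plausible implementation of such a fuzzy RMSE is sketched below, assuming the neighborhood enters as the minimum squared difference over the pixel itself and its four direct neighbors (our reading of the scheme; the exact formulation may differ). Function name and boundary handling are our own simplifications:

```python
import numpy as np

def neighborhood_rmse(pred, ref):
    """RMSE with a fuzzy neighborhood: for each pixel, take the smallest
    squared difference between the prediction there and the reference at
    the pixel itself or one of its four direct neighbors. np.roll wraps
    around at the domain edges, a simplification of proper boundary handling."""
    candidates = [ref,
                  np.roll(ref, 1, axis=0), np.roll(ref, -1, axis=0),
                  np.roll(ref, 1, axis=1), np.roll(ref, -1, axis=1)]
    sq_err = np.min([(pred - c) ** 2 for c in candidates], axis=0)
    return float(np.sqrt(sq_err.mean()))

field = np.arange(16.0).reshape(4, 4)
print(neighborhood_rmse(field, field))                      # → 0.0
print(neighborhood_rmse(np.roll(field, 1, axis=0), field))  # → 0.0 (one-pixel shift forgiven)
```

The second call illustrates the fuzziness: a field shifted by one pixel still scores a perfect match, since every predicted value finds its reference counterpart within the neighborhood.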
As noted, the spatial variability of the non-deterministic part is smoothed out by minimizing the $\mathrm{RMSE}$. In order to recover this variability, we use the mean error of the coarse-pixel standard deviation as third objective. Let $P$ denote a pixel on the coarse scale containing $n = 7 \times 7 = 49$ pixels on the finer scale (Figure 4). Note that unlike the neighborhood $\mathcal{N}$, $P$ is not defined as a sliding neighborhood, but via a fixed grid, i.e., the grid of the coarse model output to be downscaled, for computational reasons.
The coarse-pixel standard deviation of the reference field calculates as
$$\sigma_P(y) = \sqrt{ \frac{1}{n} \sum_{p \in P} \left( y_p - \bar{y}_P \right)^2 },$$
with $\bar{y}_P$ denoting the coarse-pixel mean. Since we are dealing with anomalies, $\bar{y}_P = 0$. We can now define the third objective as
$$o_3 = \frac{1}{T N_P} \sum_{t=1}^{T} \sum_{P} \left| \sigma_P(\hat{y}) - \sigma_P(y) \right|,$$
with $N_P$ the number of coarse pixels per field.
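A sketch of this objective for a single time step, assuming the mean absolute error of block-wise standard deviations over a fixed coarse grid (names and the synthetic field are illustrative):

```python
import numpy as np

def coarse_sd(anom, factor=7):
    """Standard deviation of the fine-scale anomalies within each coarse
    pixel, i.e., within each factor x factor block of fine pixels."""
    ny, nx = anom.shape
    blocks = anom.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.std(axis=(1, 3))

def sd_objective(pred, ref, factor=7):
    """Mean absolute error of the coarse-pixel standard deviation."""
    return float(np.abs(coarse_sd(pred, factor) - coarse_sd(ref, factor)).mean())

rng = np.random.default_rng(42)
anom = rng.normal(size=(28, 28))        # 4 x 4 coarse pixels of 7 x 7 fine pixels
print(coarse_sd(anom).shape)            # → (4, 4)
print(sd_objective(anom, anom))         # → 0.0
```

A downscaled field that is too smooth (e.g., the plain spline interpolation, with zero anomalies) yields a positive objective value, since the reference blocks carry nonzero variability.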
The size of the solution’s parse trees, i.e., the number of nodes, serves as the fourth objective.
Smaller parse trees can be checked for physical consistency more easily and are computationally less expensive.
Incorporating the size as objective gives furthermore information on a potential dependency between the quality and the complexity of the solutions.
In summary, we have three objectives quantifying the quality of the potential downscaling rules by considering different characteristics of the fine-scale fields at different scales, and one objective incorporating the solution’s complexity.
4 Downscaling near-surface temperature
We apply multi-objective GP to downscaling of near-surface temperature at 10 m height, which is the center of the lowest atmospheric layer in the COSMO model. High-resolution temperature fields can exhibit very different fine-scale patterns. In a well mixed ABL, temperature anomalies are in good approximation proportional to orographic height anomalies [Schomburg et al., 2010]. Under clear sky conditions during night time, temperature inversions cause cold air to drain into the valleys [Barr and Orgill, 1989], which leads to pronounced channel structures in the temperature field. The anomalies caused by cold air drainage can grow very large compared to the anomalies in a well-mixed boundary layer (Figure 5). Therefore, it is important to also capture this more complex process in the downscaling algorithm.
To test multi-objective GP for atmospheric downscaling, we use a cross-validation approach (i.e., leave one out) for three reasons.
First, it allows us to use a training data set that is as big as possible in each GP run.
Second, it allows us to check for overfitting, a well known problem especially when training on a small data set.
With a cross-validation approach, we can identify weather conditions for which our solutions can not be generalized.
Third, we gain information on the impact of the training data choice on the solutions by considering the similarity of GP solutions resulting from different training data sets.
As mentioned in Section 2, the complete data set contains 8 simulation periods (Table 2). For computational reasons we have extracted single time steps for training and validating GP. We have carried out 8 GP runs in total. In each run a different day is left out for validation.
The potential predictors included in the terminal set are preselected from the COSMO model output based on our understanding of atmospheric processes, which influence near-surface temperature. The coarse-scale fields we pick are near-surface temperature, vertical temperature gradients of the lowest 25 m, 60 m, and 110 m as indicators for atmospheric stability, horizontal and vertical wind speed, and net radiation at the land surface. Fine-scale information is contained in the parameters describing the surface properties. We pick topographic height, plant cover, surface roughness length and surface albedo, as well as a few parameters derived from the topography field. The latter give information on local topography relative to its direct surroundings. All predictors are listed in Table 1.
The essential GP settings are summarized in Table 3. We run 200 generations with 100 individuals each, i.e., each run evaluates 20000 potential downscaling rules. The maximum Pareto set size is set to 50. For computational reasons and to keep the solutions readable, we furthermore limit the tree size to 5 levels. Besides the predictors described above, the terminal set also contains a random number generator. The function set contains the arithmetic functions with 2 input arguments each and an if-statement with four input arguments (i.e., if $a > b$ then $c$ else $d$).
Figure 6 shows the relative reduction of objectives for the solutions from the Pareto sets returned by the 8 GP runs.
Only the first three objectives are included in Figure 6.
These objectives are formulated as penalties, i.e., the smaller the objective the better.
We calculate the relative improvement $I_i$ of a downscaling rule with respect to objective $O_i$ as $I_i = 1 - O_i / O_i^{0}$,
where $O_i^{0}$ is the value of the objective when predicting no anomalies, i.e., zero everywhere.
A prediction of no anomalies corresponds to the spline-interpolated field.
The definition of the relative improvement is analogous to that of a skill score, with the objective for a perfect solution being equal to zero.
A positive $I_i$ indicates that the downscaled field is better than the spline-interpolated field with respect to objective $i$: for a perfect downscaling $I_i = 1$; when the objective is halved $I_i = 0.5$; for a downscaling that is as good as the interpolated field $I_i = 0$; and for a downscaling worse than the interpolated field $I_i < 0$.
Note that $I_i \leq 1$, i.e., $I_i$ cannot exceed 1 in the positive direction, whereas it can grow very large in the negative direction.
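As a minimal sketch, this skill-score-like measure can be computed as follows, assuming the form I = 1 - O/O0, where O0 is the objective value for the zero-anomaly (spline-interpolated) prediction:

```python
# Relative improvement of a downscaling rule for one penalty-type objective.

def relative_improvement(objective, objective_no_anomaly):
    """Skill-score-like relative improvement: 1 - O / O0."""
    return 1.0 - objective / objective_no_anomaly

assert relative_improvement(0.0, 2.0) == 1.0   # perfect downscaling
assert relative_improvement(1.0, 2.0) == 0.5   # objective halved
assert relative_improvement(2.0, 2.0) == 0.0   # as good as interpolation
assert relative_improvement(4.0, 2.0) < 0.0    # worse than interpolation
```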
Looking at Figure 6, we see very different improvements for the three objectives. The relative improvement of the first objective spans a wide range from 0 up to about 0.7, with an average of about 0.4-0.5. For the second objective, the relative improvement is in general smaller, with an average of about 0.1-0.2; unlike for the first objective, there are also solutions that perform worse than the interpolated field. The third objective is on average even increased relative to the interpolated field, and only very few solutions achieve an improvement with respect to it. This shows the difficulty of predicting the actual reference fields exactly. Including the third objective, however, serves as a constraint not to deviate too strongly from the reference fields. The results confirm that predicting realistic spatial variability is more attainable than an exact fit.
To study potential overfitting, Figure 7 shows the difference of the relative improvement between the training and validation data sets. For the majority of cases the median is close to zero. With the exception of May 9th 2008, the medians are spread about equally in the positive and negative directions, which indicates that no serious overfitting takes place and that the downscaling rules discovered by GP are adaptable. For most of the 8 cases there are very few outliers (2-6 out of 50) for which the performance on the validation data set is clearly worse than on the training data set. Most of these outliers already show a less satisfactory performance on the training data sets and tend to stand out in the scatter plots (e.g., Figure 6 (a) or (c)). These solutions are part of the Pareto set because of their good performance in the fourth objective, i.e., the solution size: they are very small solutions and often consist of only one node (Figure 8). Since we seek a reasonable trade-off between the objectives, and hence between quality and complexity of the solutions, such extreme outliers would not be chosen from the Pareto set as the final solution.
Overfitting could be a problem for the case in which May 9th 2008 is excluded from training. The clear-sky conditions on May 9th 2008 led to very pronounced fine-scale structures, and the exclusion of such an extreme situation probably caused the poor performance for this case: the solutions need to extrapolate and thus in general perform worse on the validation day than on the training data set.
Apart from the discussed outliers, the solution quality depends only slightly on the solution size. Figure 8 shows scatter plots for all four objectives for the run validated on October 14th 2007; the solution size corresponds to the abscissa in Figure 8 (d)-(f). There is only a weak correlation between solution size and the other objectives. The subjective rating of the solutions, indicated by the color of the points in Figure 8, is explained in the discussion.
We examine the results of a single solution from the run validated on October 14th 2007, a clear-sky day. Figures 9 and 10 show the performance of one downscaling rule on the validation day, at 11:00 UTC and 23:00 UTC. The daytime temperature field (Figure 9) is predicted well by both the GP-based rule and the (thresholded) linear-regression-based rule of Schomburg et al. On closer inspection, however, the fine-scale structures seem better captured by the GP-based rule. For the nighttime field (Figure 10) the improvement by GP is more impressive: the pattern formed by nightly cold-air drainage is captured much better by the GP-based rule. We note that the minima and maxima are localized more strongly in the reference field than in the downscaled field, which might be at least partly due to the unpredictable noisy component of the fine-scale fields.
The GP derived rule shown in Figures 9 and 10 is:
In the first part, the surface height anomaly is multiplied by a vertical temperature gradient, which is physically intuitive and reasonable. The second part adds either the vertical wind speed or one of the topography-based parameters, depending on the atmospheric layering.
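A rule of this shape can be sketched in code; the discovered rule itself is not reproduced here, and the specific condition and inputs below are purely illustrative assumptions:

```python
# Illustrative downscaling rule of the described structure: height anomaly
# times a vertical temperature gradient, plus a second term that depends on
# the atmospheric layering (hypothetical stability condition dT/dz > 0).

def example_rule(h_anom, dTdz, w, topo_param):
    """Predict a fine-scale temperature anomaly at one grid point."""
    second_term = topo_param if dTdz > 0.0 else w  # layering-dependent term
    return h_anom * dTdz + second_term

# Stable stratification: the topography-derived term is added.
assert abs(example_rule(10.0, 0.01, 0.0, 0.2) - 0.3) < 1e-9
# Unstable stratification: the vertical wind speed is added instead.
assert abs(example_rule(10.0, -0.01, 0.5, 0.2) - 0.4) < 1e-9
```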
Note that the example solution shows some evidence of overfitting. The relative improvements of the objectives for the training data set and the validation day are listed in Table 4. Still, the performance on the validation day appears reasonable, since no unrealistically high or low values occur.
Multi-objective GP has advanced downscaling in two directions compared to the linear regression approach of Schomburg et al. [2010, 2012].
First, it allows incorporating multivariate and nonlinear relations in the search for downscaling rules; second, uncertainties contained in the training data set are considered by defining multiple objectives.
Nonlinear regression via rather complex equations or code can induce overfitting, especially when applied to small training data sets.
In the GP based methodology the tendency to overfit is rather small.
It is, however, important to span a wide range of weather conditions including extreme days to prevent overfitting.
Incorporating SPEA, GP does not return one single downscaling rule but 50 rules of quite different quality when downscaled and reference fields are compared visually. The colors of the points in Figure 8 indicate the results of a subjective evaluation of the different solutions by examination of the downscaled fields. Good, average and bad solutions obviously cannot be clustered easily, but some systematics are visible. Figure 8 (d) indicates that a certain level of the third objective must be accepted in order to reproduce the fine-scale variability reasonably, because subjectively good solutions tend to perform worse with respect to this objective than visually bad solutions. This raises the question of whether it is an appropriate objective for our aim, or whether other measures are better suited. The first objective appears to be more important: the better a solution performs with respect to it, the better it is also in the subjective evaluation. For the second objective the situation is less clear, but a similar tendency exists. When subjectively rating the solutions, more complex (larger) solutions tend to perform better; thus, a minimum solution size is required to account for the complexity of the processes involved in building the fine-scale structures. However, the best solutions do not seem to be detectable from the objectives alone. Figure 11 shows a solution we visually rate as average, a conclusion we could not draw from the objectives alone: judging only from the objectives, the two solutions seem very similar. Thus it might be necessary to reconsider the objectives we currently use, to find a better way of quantifying the similarity between the fine-scale patterns of downscaled and reference fields.
We tested the errors of temporal and spatial correlation as additional objectives; however, this caused the solution size to increase considerably. Temporal correlation between consecutive time steps, as well as nearest-neighbor correlation, might depend too strongly on the noisy component of the fields and therefore might be deterministically unpredictable. The coarse-scale predictors tend to be rather smooth, and thus the correlation between consecutive time steps tends to be overestimated. Reproducing spatio-temporal correlations might be more appropriately achieved by adding spatially and temporally correlated noise to the deterministic fields.
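The temporally correlated noise mentioned above could, for instance, be generated with an AR(1) process; this is a sketch of the idea, and the correlation parameter is an illustrative assumption:

```python
import random

# AR(1) noise: each value is a damped copy of the previous one plus a fresh
# innovation, giving a lag-1 autocorrelation of approximately phi.

def ar1_noise(n_steps, phi=0.8, sigma=1.0, seed=0):
    """Generate temporally correlated Gaussian noise of length n_steps."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma)]
    for _ in range(n_steps - 1):
        # Scale the innovation so the marginal variance stays sigma**2.
        innovation = rng.gauss(0.0, sigma * (1 - phi ** 2) ** 0.5)
        noise.append(phi * noise[-1] + innovation)
    return noise

series = ar1_noise(100)
assert len(series) == 100
```

A spatially correlated component would be constructed analogously over the grid dimensions.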
5 Conclusion and Outlook
We have introduced multi-objective GP for the discovery of downscaling rules to disaggregate meso-scale atmospheric model output to the high resolution required for driving land-surface and hydrological models.
Applying multi-objective GP to downscaling near-surface temperature, we have shown that the more complex method is an advance over linear regression conditioned on indicators.
We are now capable of reproducing fine-scale patterns resulting from different atmospheric processes, e.g., patterns produced in a well-mixed boundary layer or during nighttime temperature inversions.
In general, we can recommend GP as a good machine learning method for the geosciences, especially for complex problems with multiple objectives where linear methods do not give satisfactory results.
Next steps will include the optimization of the set of objectives, as discussed, and the expansion of the training data set, especially to include more clear-sky days and extreme weather conditions. We will apply multi-objective GP to the remaining variables required as atmospheric input to land-surface and subsurface models, i.e., precipitation, near-surface specific humidity, near-surface wind speed, long-wave and short-wave radiation. The discovered downscaling rules will be implemented in the coupled soil-vegetation-atmosphere model TerrSysMP. Finally we will extend the system for different scale gaps, e.g., for disaggregating from 1 km to 100 m.
Multi-objective GP generates a set of Pareto-optimal solutions rather than just one, which suggests implementing a downscaling ensemble by either using different rules in each ensemble run or by randomly switching between different rules while taking temporal consistency into account. An ensemble approach could also help to estimate the uncertainty induced by the downscaling procedure.
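The rule-switching variant of such an ensemble can be sketched as follows; the rule names, switching probability, and persistence mechanism are hypothetical choices, not a published design:

```python
import random

# Each ensemble member switches between Pareto-set rules with a small
# probability per time step, so the chosen rule persists for a while and
# temporal consistency is loosely preserved.

def ensemble_member(rules, n_steps, switch_prob=0.1, seed=0):
    """Return the sequence of rules used by one ensemble member."""
    rng = random.Random(seed)
    current = rng.choice(rules)
    sequence = [current]
    for _ in range(n_steps - 1):
        if rng.random() < switch_prob:   # occasional switch between rules
            current = rng.choice(rules)
        sequence.append(current)
    return sequence

seq = ensemble_member(["rule_a", "rule_b", "rule_c"], 24)
assert len(seq) == 24
assert set(seq) <= {"rule_a", "rule_b", "rule_c"}
```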
This work has been carried out in the framework of the Transregional Collaborative Research Centre TR 32 ’Patterns in Soil-Vegetation-Atmosphere-Systems’ funded by the Deutsche Forschungsgemeinschaft (DFG).
- Baldauf et al.  Baldauf, M., Seifert, A., Förstner, J., Majewski, D., Raschendorfer, M. and Reinhardt, T. , ‘Operational convective-scale numerical weather prediction with the COSMO model: description and sensitivities’, Mon. Weather Rev. 139(12), 3887–3905.
- Banzhaf et al.  Banzhaf, W., Nordin, P., Keller, R. and Francone, F. , Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications (The Morgan Kaufmann Series in Artificial Intelligence), Morgan Kaufmann Publishers, San Francisco, CA, USA.
- Barr and Orgill  Barr, S. and Orgill, M. M. , ‘Influence of external meteorology on nocturnal valley drainage winds’, J. App. Meteorol. 28(6), 497–517.
- Biau et al.  Biau, G., Zorita, E., von Storch, H. and Wackernagel, H. , ‘Estimation of precipitation by kriging in the EOF space of the sea level pressure field’, J. Climate 12(4), 1070–1085.
- Carreau and Vrac  Carreau, J. and Vrac, M. , ‘Stochastic downscaling of precipitation with neural network conditional mixture models’, Water. Resour. Res. 47(10).
- Coulibaly  Coulibaly, P. , ‘Downscaling daily extreme temperatures with genetic programming’, Geophys. Res. Lett. 31(16).
- Deidda  Deidda, R. , ‘Rainfall downscaling in a space-time multifractal framework’, Water Resour. Res. 36(7), 1779–1794.
- Doms and Schättler  Doms, G. and Schättler, U. , A description of the nonhydrostatic regional model LM, Technical report, Deutscher Wetterdienst.
- Ebert  Ebert, E. E. , ‘Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework’, Meteorol. Appl. 15(1), 51–64.
- Eiben and Smith  Eiben, A. and Smith, J. , Introduction to Evolutionary Computing, Springer, Berlin, Germany.
- Fiddes and Gruber  Fiddes, J. and Gruber, S. , ‘TopoSCALE v. 1.0: downscaling gridded climate data in complex terrain’, Geosci. Model Dev. 7(1), 387–405.
- Fowler et al.  Fowler, H. J., Blenkinsop, S. and Tebaldi, C. , ‘Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling’, Int. J. Climatol. 27(12), 1547–1578.
- Friederichs and Paeth  Friederichs, P. and Paeth, H. , ‘Seasonal prediction of african precipitation with ECHAM4-T42 ensemble simulations using a multivariate MOS re-calibration scheme’, Clim. Dynam. 27(7-8), 761–786.
- Ghorbani et al.  Ghorbani, M. A., Khatibi, R., Aytek, A., Makarynskyy, O. and Shiri, J. , ‘Sea water level forecasting using genetic programming and comparing the performance with artificial neural networks’, Comput. Geosci. 36(5), 620–627.
- Hashmi et al.  Hashmi, M. Z., Shamseldin, A. Y. and Melville, B. W. , ‘Statistical downscaling of watershed precipitation using gene expression programming (gep)’, Environ. Modell. Softw. 26(12), 1639–1646.
- Izadifar and Elshorbagy  Izadifar, Z. and Elshorbagy, A. , ‘Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models’, Hydrol. Process. 24(23), 3413–3425.
- Kollet and Maxwell  Kollet, S. J. and Maxwell, R. M. , ‘Integrated surface–groundwater flow modeling: A free-surface overland flow boundary condition in a parallel groundwater flow model’, Adv. Water Resour. 29(7), 945–958.
- Koza  Koza, J. , Genetic Programming, On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, USA.
- Liong et al.  Liong, S.-Y., Gautam, T. R., Khu, S. T., Babovic, V., Keijzer, M. and Muttil, N. , ‘Genetic programming: A new paradigm in rainfall runoff modeling1’, J. Am. Water Resour. As. 38(3), 705–718.
- Liu et al.  Liu, X., Coulibaly, P. and Evora, N. , ‘Comparison of data-driven methods for downscaling ensemble weather forecasts’, Hydrol. Earth Syst. Sci. 12(2).
- Luke et al.  Luke, S., Panait, L. et al. , Lexicographic parsimony pressure., in ‘GECCO’, Vol. 2, pp. 829–836.
- Malone et al.  Malone, B. P., McBratney, A. B., Minasny, B. and Wheeler, I. , ‘A general method for downscaling earth resource information’, Comput. Geosci. 41, 119–125.
- Maraun et al.  Maraun, D., Wetterhall, F., Ireson, A. M., Chandler, R. E., Kendon, E. J., Widmann, M., Brienen, S., Rust, H. W., Sauter, T., Themeßl, M. et al. , ‘Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user’, Rev. Geophy. 48(3).
- Oleson et al.  Oleson, K. W., Dai, Y., Bonan, G., Bosilovich, M., Dickinson, R., Dirmeyer, P., Hoffman, F., Houser, P., Levis, S., Niu, G.-Y. et al. , Technical description of the community land model (CLM), Technical report, NCAR Technical Note, National Center for Atmospheric Research, Boulder, CO, USA.
- Parasuraman et al.  Parasuraman, K., Elshorbagy, A. and Carey, S. K. , ‘Modelling the dynamics of the evapotranspiration process using genetic programming’, Hydrolog. Sci. J. 52(3), 563–578.
- Schlünzen and Katzfey  Schlünzen, K. H. and Katzfey, J. J. , ‘Relevance of sub-grid-scale land-use effects for mesoscale models’, Tellus A 55(3), 232–246.
- Schomburg et al.  Schomburg, A., Venema, V., Ament, F. and Simmer, C. , ‘Disaggregation of screen-level variables in a numerical weather prediction model with an explicit simulation of subgrid-scale land-surface heterogeneity’, Meteorol. Atmos. Phys. 116(3-4), 81–94.
- Schomburg et al.  Schomburg, A., Venema, V., Lindau, R., Ament, F. and Simmer, C. , ‘A downscaling scheme for atmospheric variables to drive soil–vegetation–atmosphere transfer models’, Tellus B 62(4), 242–258.
- Schoof and Pryor  Schoof, J. T. and Pryor, S. , ‘Downscaling temperature and precipitation: A comparison of regression-based methods and artificial neural networks’, Int. J. of Climatol. 21(7), 773–790.
- Shrestha et al.  Shrestha, P., Sulis, M., Masbou, M., Kollet, S. and Simmer, C. , ‘A scale-consistent terrestrial systems modeling platform based on COSMO, CLM and ParFlow’, Mon. Weather Rev. (2014).
- Silva and Almeida  Silva, S. and Almeida, J. , GPLAB-a genetic programming toolbox for MATLAB, in ‘Proceedings of the Nordic MATLAB conference’, Citeseer, pp. 273–278.
- Simon et al.  Simon, T., Hense, A., Su, B., Jiang, T., Simmer, C. and Ohlwein, C. , ‘Pattern-based statistical downscaling of east asian summer monsoon precipitation’, Tellus A 65.
- Thorarinsdottir et al.  Thorarinsdottir, T. L., Gneiting, T. and Gissibl, N. , ‘Using proper divergence functions to evaluate climate models’, Journal on Uncertainty Quantification 1(1), 522–534.
- Vereecken et al.  Vereecken, H., Kollet, S. and Simmer, C. , ‘Patterns in soil-vegetation-atmosphere systems: Monitoring, modeling, and data assimilation’, Vadose Zone J. 9(4), 821–827.
- von Storch et al.  von Storch, H., Hewitson, B. and Mearns, L. , Review of empirical downscaling techniques, Technical report, Regional climate development under global warming.
- Wagner et al.  Wagner, D., Ruprecht, E. and Simmer, C. , ‘A combination of microwave observations from satellites and an eof analysis to retrieve vertical humidity profiles over the ocean’, J. App. Meteorol. 29(11), 1142–1157.
- Wilby et al.  Wilby, R. L., Dawson, C. W. and Barrow, E. M. , ‘SDSM a decision support tool for the assessment of regional climate change impacts’, Environ. Model. Softw. 17(2), 145–157.
- Wilby and Wigley  Wilby, R. L. and Wigley, T. M. L. , ‘Downscaling general circulation model output: a review of methods and limitations’, Prog. Phys. Geog. 21(4), 530–548.
- Xu  Xu, C.-Y. , ‘From GCMs to river flow: a review of downscaling methods and hydrologic modelling approaches’, Prog. Phys. Geog. 23(2), 229–249.
- Zitzler and Thiele  Zitzler, E. and Thiele, L. , ‘Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach’, IEEE Trans. Evol. Comput. 3(4), 257–271.