Abstract
The coupling of models for the different components of the soil-vegetation-atmosphere system is required to investigate component interactions and feedback processes. However, the component models for atmosphere, land surface and subsurface are usually operated at different resolutions in space and time owing to the dominant processes. The computationally often more expensive atmospheric models, for instance, are typically employed at a coarser resolution than land-surface and subsurface models. Thus up- and downscaling procedures are required at the interface between the atmospheric model and the land-surface/subsurface models. We apply multiobjective Genetic Programming (GP) to a training data set of high-resolution atmospheric model runs to learn equations or short programs that reconstruct the fine-scale fields (e.g., 400 m resolution) of the near-surface atmospheric state variables from the coarse atmospheric model output (e.g., 2.8 km resolution). Like artificial neural networks, GP can flexibly incorporate multivariate and nonlinear relations, but offers the advantage that the solutions are human-readable and can thus be checked for physical consistency. Using the Strength Pareto Approach for multiobjective fitness assignment allows us to consider multiple characteristics of the fine-scale fields during the learning procedure.
1 Introduction
The mass and energy fluxes at the interface between soil and atmosphere significantly impact processes in the atmosphere, at the land surface and in the subsurface.
The Transregional Collaborative Research Centre 32 on 'Patterns in Soil-Vegetation-Atmosphere Systems' aims at a better understanding of the interplay of processes, patterns, and structures between the different components at different scales in space and time [Vereecken et al., 2010].
To account for interactions and feedbacks between the different components of the soil-vegetation-atmosphere system, coupled modeling systems are increasingly used (e.g., TerrSysMP by Shrestha et al. [2014]).
When coupling different component models a scale gap occurs:
modeling the land surface and the subsurface typically requires higher resolutions, while atmospheric models are often computationally too expensive to be employed at the same high resolution.
The lower atmospheric boundary layer (ABL) is usually very heterogeneous due to the production of turbulent kinetic energy by shear and buoyancy effects, even more so over heterogeneous terrain.
The representation of the spatiotemporal variability in the lower ABL is very important, because the fluxes of matter and energy are driven primarily by the local differences of the state variables at the surface and in the lowermost atmosphere.
Many processes at the earth's surface are nonlinear, e.g., runoff production or snow melt, which are threshold dependent, or the turbulent exchange coefficients, which are nonlinear functions of the near-surface atmospheric stability [Schomburg et al., 2010].
Modeling exchange processes using averaged parameters or state variables can introduce biases [Schlünzen and Katzfey, 2003].
To avoid such biases, a downscaling algorithm is needed that reproduces the atmospheric variability near the surface in order to provide reasonable driving data for land-surface and hydrological models.
Downscaling methods have initially been developed for general circulation models (GCMs).
Various review articles summarize the large number of downscaling methods existing nowadays [Wilby and Wigley, 1997; Xu, 1999; von Storch et al., 2000; Fowler et al., 2007; Maraun et al., 2010].
The general idea behind downscaling is to model the variables on a smaller scale based on the large-scale model output. In dynamical downscaling a limited-area model is applied at high resolution. Dynamical downscaling is computationally expensive, but it provides physically consistent spatial fields on the smaller scale. Statistical downscaling is computationally less demanding: regression models, probably the computationally least expensive, employ a so-called transfer function between the large-scale predictors and the local-scale predictand. Examples of regression approaches for downscaling include multiple linear regression, principal component analysis (e.g., Wagner et al. [1990] or Simon et al. [2013]), canonical correlation analysis (e.g., Friederichs and Paeth [2006]), artificial neural networks (e.g., Schoof and Pryor [2001] or Carreau and Vrac [2011]) and kriging (e.g., Biau et al. [1999]). Fractals are also used for precipitation downscaling (e.g., Deidda [2000]). Weather generators (WGs) can be seen as an extension of the regression models, as WGs try to estimate not only the mean but also the variance. Finally, weather typing or classification schemes relate synoptically or statistically defined weather classes to the local climate. The local variables are modeled conditioned on the weather classes.
Fiddes and Gruber [2014] recently introduced TopoSCALE to downscale GCM data to scales of 100 m. TopoSCALE is a physically based downscaling scheme designed for regions with complex terrain, where vertical gradients of the atmospheric variables often dominate over horizontal gradients. A recent study by Malone et al. [2012] presents a downscaling methodology where the uncertainty of the target variable is taken into account when fitting a nonlinear regression model.
In general, downscaling of mesoscale model output is much less common than downscaling of GCM output, but most concepts from the regional climate scale can be adapted to smaller scales. As we aim at a downscaling algorithm applicable within a coupled modeling system, methods with low computational costs are preferable. Schomburg et al. [2010, 2012] have presented a three-step downscaling scheme combining a biquadratic spline interpolation, deterministic rules based on linear regression, and autocorrelated noise. The scheme works well for certain variables and weather conditions, but a linear regression appears to be insufficient under more complex conditions. In this study we apply an advanced regression approach taking into account multivariate and nonlinear relations. Artificial neural networks (ANNs) are widely used for downscaling, but we take an alternative approach, Genetic Programming (GP). Like ANNs, GP can handle multivariate and nonlinear relations, but with the advantage that the solutions take the form of an equation or program code, which is human-readable and can thus be checked for physical consistency.
GP is a machine learning method from the area of evolutionary computation [Banzhaf et al., 1997; Koza, 1992]. As the name implies, such methods are inspired by the concept of natural evolution. Given a training data set and a fitness measure that quantifies the quality of a candidate solution (i.e., a downscaling rule), GP automatically generates solutions. GP typically starts by randomly generating an initial population of candidate solutions. Each following generation is created by mutating and recombining the candidate solutions from the previous generation, with the best solutions most likely to contribute to the new generation.
In atmospheric and related sciences GP is applied rather rarely. Liong et al. [2002], for instance, applied GP to rainfall-runoff modeling. Parasuraman et al. [2007] and Izadifar and Elshorbagy [2010] employed GP for modeling evapotranspiration or latent heat from meteorological variables. Ghorbani et al. [2010] used GP for sea water level forecasting. In these four studies GP performed reasonably well. Three of these studies compared the performance of GP and ANNs. Both methods performed about equally well, which underlines that GP is an eligible alternative to ANNs, which are more popular in the geosciences.
Only very few studies apply methods from evolutionary computation to atmospheric downscaling. To our current knowledge, all of them deal with downscaling of GCM output to a station or catchment mean. Coulibaly [2004] utilized GP for simulating daily extreme temperature at a weather station located in the Chute-du-Diable basin in northeastern Canada. GP yielded good results, outperforming the widely used Statistical DownScaling Model (SDSM) by Wilby et al. [2002]. Hashmi et al. [2011] employed Gene Expression Programming (GEP), a variant of GP, for downscaling watershed precipitation in the Clutha River watershed on New Zealand's South Island. In both studies, GP/GEP not only performed better than the SDSM, but also required fewer predictor variables.
Liu et al. [2008] compared the performance of a GP-based method (evolutionary polynomial regression) with a feed-forward neural net and the SDSM for downscaling daily total precipitation and daily maximum and minimum temperature in the Chute-du-Diable basin. Both nonlinear machine learning methods performed about equally well and better than the SDSM, especially for precipitation.
We aim at simulating the temporal evolution of spatial fields, not a time series at a single grid point, i.e., we have to consider three dimensions, two in space and one in time. We derive deterministic rules relating the fine-scale patterns to coarse model output and high-resolution information on surface properties. Initially, we attempted to reproduce the training data fields as exactly as possible, but we quickly realized that this approach might not be appropriate for our problem. An exact fit of the high-resolution fields is not possible because of the inevitable unpredictable component that a downscaling procedure cannot reproduce. Furthermore, discretization, i.e., the representation of a continuous system on a discrete grid, always induces uncertainty. However, for an improved representation of the mass and energy fluxes at the surface, modeling the subgrid-scale properties statistically might be sufficient. A fuzzy approach for defining the fitness measure thus seems more appropriate than taking the exact resemblance to the high-resolution model output fields as criterion. We therefore adjusted our objective and seek to retrieve realistic spatiotemporal patterns on the fine scale rather than reproducing the exact model output fields. This task is very similar to the verification of spatial fields.
For the verification of forecasts from high-resolution numerical models, fuzzy, neighborhood-based methods have been developed during the last decades (e.g., Ebert [2008]). In the fuzzy approach a model forecast is rewarded when it is close to the observation in patterns and structures, which is achieved by loosening the requirement of exact matches and instead comparing neighborhoods of forecast and observation points. The Strength Pareto Approach allows the use of more than a single measure to quantify the quality of a potential downscaling rule and offers the liberty to choose the best compromise between the different objectives in hindsight.
This article is structured as follows. After introducing the data (Section 2), the downscaling methodology based on multiobjective GP is explained in detail (Section 3) and tested for the disaggregation of near-surface temperature (Section 4). Finally, we conclude and give an outlook on future work (Section 5).
2 Background and Data
Our GP-based downscaling is developed to be applied in a modeling platform, such as TerrSysMP by Shrestha et al. [2014], which couples land-surface and atmospheric models operating on different scales.
TerrSysMP is composed of the atmospheric model COSMO [Doms and Schättler, 2002; Baldauf et al., 2011], the land-surface model CLM [Oleson et al., 2004] and the hydrological model ParFlow [Kollet and Maxwell, 2006].
The three component models are coupled via the external coupler OASIS, which contains the downscaling scheme by Schomburg et al. [2010].
The scheme consists of three steps:
(1) a biquadratic spline interpolation interpolates the coarse-resolution atmospheric data to a higher resolution while conserving the coarse field's mean and gradients;
(2) so-called 'deterministic' downscaling rules are applied;
(3) temporal autoregressive noise is added to restore the fine-scale variance missing after steps 1 and 2.
The deterministic rules applied in step 2 exploit empirical relations between atmospheric variables and surface properties.
Two rules were derived from physical considerations:
near-surface pressure is downscaled by exploiting the hydrostatic equation;
under cloud-free conditions, shortwave net radiation can be downscaled using surface albedo as predictor.
In case there is no known physical relation, the training data set was examined for possible statistical correlations between high-resolution near-surface atmospheric fields and land-surface properties.
A rule detection algorithm that calculates Pearson correlation coefficients for subsets of the training data was implemented, with the subsets selected based on indicators, e.g., cloud cover, to distinguish clear-sky conditions from others.
The algorithm detected two applicable rules:
for unstable atmospheres, near-surface temperature can be downscaled using orographic height as a linear predictor;
for situations with cloud cover, longwave net radiation can be disaggregated using ground temperature as a linear predictor.
Under more complex conditions and for other near-surface variables (specific humidity, wind speed and precipitation), the algorithm could not find any deterministic downscaling rules applicable to a sufficiently large subset of the training data.
The algorithm by Schomburg et al. [2010] can only detect linear relations and is limited to dependencies between two variables.
The aim of this study is to improve the downscaling using a more flexible approach, which is able to model more complex nonlinear and multivariate relations.
This flexibility is achieved employing Genetic Programming (GP).
We use the same training data set as Schomburg et al. [2010], which consists of high-resolution (400 m) COSMO simulations on a domain of 168 km × 168 km in western Germany.
The domain is centered over the Rur catchment, which is the main investigation area of TR 32.
The data set contains hourly output for 8 simulation periods of 1 to 2 days with different weather conditions (Table 2).
From the simulations, we extract the inner 112 km × 112 km, i.e., 280 × 280 grid points, to exclude boundary effects.
The downscaling scheme by Schomburg et al. [2010] has been trained for disaggregation from 2.8 km resolution down to 400 m and consists of three steps.
In this study we consider the same scales, i.e., we train on a downscaling by a factor of 7.
3 Methods
As motivated in the introduction, we seek to reproduce realistic spatiotemporal patterns of the near-surface atmospheric state variables, not necessarily the exact high-resolution model output.
To this end, we formulate the downscaling as a multiobjective optimization problem, in order to consider multiple characteristics of the fine-scale fields, for instance the spatially distributed variance.
Summing up the different objectives to be minimized is, however, problematic since the objectives may have different units and ranges.
A scaling procedure would be required, but the risk of treating the objectives unequally or getting trapped in a local minimum would remain.
The Strength Pareto Approach (SPEA) by Zitzler and Thiele [1999] provides a solution to this problem.
After giving a general introduction to GP, we introduce the concept of Pareto optimality on which SPEA is based.
Finally, we introduce multiobjective GP step by step.
Large parts of our implementation are based on the GPLAB package for Matlab by Silva and Almeida [2003].
3.1 Genetic Programming
’Genetic Programming addresses the problem of automatic programming, namely, the problem of how to enable a computer to do useful things without instructing it, step by step on how to do it’ (J. Koza in Banzhaf et al. [1997]).
GP is one of several methods within the area of evolutionary computation.
As the name implies these methods are inspired by the concept of natural evolution.
In nature an individual is exposed to environmental pressure.
Its chance to survive, reproduce and consequently contribute to the next generation is dependent on its fitness with respect to the environmental conditions.
In evolutionary computation the problem to be solved corresponds to environmental pressure.
GP evolves a solution to the problem in a similar manner as evolution happens in nature, i.e., a solution is developed over several generations, each consisting of a large number of candidate solutions.
The candidate solutions, also called individuals in analogy to the evolution terminology, are composed of program code.
The candidate solutions forming the initial generation are automatically and often randomly generated and then tested on the given problem.
Each succeeding generation is evolved by applying socalled genetic operators, which recombine and modify individuals from the preceding (parent) generation.
The better a candidate solution solves the given problem, the greater its chance to contribute to the new generation [Eiben and Smith, 2003].
Genetic Programming differs from other methods of evolutionary computing by the representation of the individuals.
In GP the individuals are traditionally represented by parse trees.
Figure 1 shows a simple example of a parse tree consisting of 7 nodes arranged on 4 levels.
The parse tree in Figure 1 embodies an equation consisting of arithmetic functions, variables and one constant.
A parse tree is read starting from the bottom: in Figure 1, the two lowest terminals serve as input arguments to the subtraction, whose output, together with the next terminal, serves as input argument to the multiplication, and so on.
In general a parse tree consists of functions and socalled terminals.
The set of functions and terminals used in a GP run is typically defined by the user and adapted to the problem to be solved.
A function set can contain all kinds of functions, for instance
arithmetic functions ($+$, $-$, $\times$, $/$),
transcendental functions ($\exp$, $\log$, $\sin$, …),
or conditional statements (if-then-else, …) [Banzhaf et al., 1997].
The terminals are so named because they do not have input arguments and thus terminate the branches of the tree.
Terminals serve as fundamental input to the functions.
Thus, a terminal set can include variables, constants and zeroargument functions, such as a random number generator.
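As an illustration, a parse tree of this kind might be represented and evaluated as follows. This is a minimal sketch, not the GPLAB representation; the function set, variable names and example tree are our own.

```python
# Minimal sketch of a GP parse tree: function nodes are tuples of the form
# (name, arg1, arg2), terminals are variable names (strings) or constants.
import operator

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Recursively evaluate a parse tree given variable values in env."""
    if isinstance(tree, tuple):        # function node: (name, arg1, arg2)
        fn = FUNCTIONS[tree[0]]
        return fn(evaluate(tree[1], env), evaluate(tree[2], env))
    if isinstance(tree, str):          # variable terminal
        return env[tree]
    return tree                        # constant terminal

# 7 nodes: one multiplication, one subtraction, one addition,
# three variables and one constant
tree = ("*", ("-", "x", "y"), ("+", "z", 2.0))
print(evaluate(tree, {"x": 5.0, "y": 1.0, "z": 3.0}))  # (5-1)*(3+2) = 20.0
```

Mutating or recombining individuals then amounts to replacing or exchanging subtrees of such nested structures.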
3.2 Pareto Optimality
The term Pareto optimality originates in economics.
The state of an economic system is called Pareto optimal when economic resources are distributed in such a way that it is impossible to improve the situation of one person without deteriorating the situation of at least one other person.
For optimization problems that involve multiple, sometimes conflicting objectives, there is generally no single optimal solution.
Usually there exists a set of alternative solutions in which no solution is optimal in the sense that it is superior to all other solutions when considering all objectives.
The multiple objectives correspond to different quality criteria of the desired solution.
We denote the objective space containing all objective functions as $F$, and the solution space containing all potential solutions as $S$.
An objective $f_i \in F$ is in general calculated by comparing the prediction $\hat{y}$ (downscaled fields) and the reference $y$ (high-resolution model output).
The prediction results from the solution (downscaling rule) $s \in S$ being applied to the vector of predictors $\mathbf{x}$. Thus, incorporating all dependencies, we can write $f_i = f_i(\hat{y}(s, \mathbf{x}), y)$. For simplicity, in the following we only include the dependency on the solution explicitly. Let $\mathbf{f}(s) = (f_1(s), \ldots, f_n(s))$ be the objective vector, i.e., the vector containing all objectives. The multiobjective minimization problem can then be written as

(1)  $\min_{s \in S} \mathbf{f}(s) = \min_{s \in S} \left( f_1(s), \ldots, f_n(s) \right)$
Let us consider two solutions $s_1, s_2 \in S$. The solution $s_1$ is said to dominate $s_2$ ($s_1 \succ s_2$) if and only if

(2)  $\forall i \in \{1, \ldots, n\}: f_i(s_1) \le f_i(s_2) \;\wedge\; \exists j \in \{1, \ldots, n\}: f_j(s_1) < f_j(s_2)$

Or in words, $s_1$ dominates $s_2$ if $s_1$ is at least as good as $s_2$ with respect to all objectives, and there exists at least one objective for which $s_1$ is better than $s_2$.
The solution $s_1$ is said to cover $s_2$ ($s_1 \succeq s_2$) if $s_1 \succ s_2$ or $\mathbf{f}(s_1) = \mathbf{f}(s_2)$, i.e., either $s_1$ dominates $s_2$ or they both perform equally well concerning all objectives.
The solutions that are not dominated by any of the elements in the solution space are called Pareto optimal.
Figure 3 shows an example of a minimization problem with two objectives.
The squares constitute the set of Pareto-optimal solutions, the circles correspond to the non-optimal solutions.
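These relations translate directly into code. A minimal sketch for a minimization problem, with objective vectors as plain tuples (function names are ours):

```python
# Dominance, cover and Pareto-optimal filtering for minimization problems.
def dominates(f_a, f_b):
    """f_a dominates f_b: at least as good everywhere, strictly better once."""
    return all(a <= b for a, b in zip(f_a, f_b)) and \
           any(a < b for a, b in zip(f_a, f_b))

def covers(f_a, f_b):
    """f_a covers f_b: f_a dominates f_b or both perform equally well."""
    return dominates(f_a, f_b) or tuple(f_a) == tuple(f_b)

def pareto_optimal(objs):
    """Return the objective vectors not dominated by any other element."""
    return [f for f in objs if not any(dominates(g, f) for g in objs)]

points = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
print(pareto_optimal(points))  # [(1, 4), (2, 2), (4, 1)]
```

Here (3, 3) and (4, 4) are both dominated by (2, 2), while the remaining three vectors are mutually non-dominated.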
3.3 Multiobjective Genetic Programming
For multiobjective fitness assignment we have implemented the Strength Pareto Approach (SPEA) by Zitzler and Thiele [1999]. SPEA requires two main changes to traditional GP. Firstly, each generation is split into two sets, called populations $P$ and $P'$. The population $P$ evolves over time as in traditional GP, whereas the second population $P'$, the so-called Pareto set, contains all Pareto-optimal solutions. Secondly, the fitness calculation for individuals in both populations $P$ and $P'$ is based on a comparison between the individuals rather than on absolute values.
3.3.1 Preparation
GP requires a training data set to learn from, function and terminal sets to build the candidate solutions, and fitness measures (objectives) to quantify the quality of the candidate solutions. Additionally, the user can set controlling parameters, such as population size, maximum size of the Pareto set, maximum tree size or the probability to select the various genetic operators. Finally, a termination criterion that stops the evolution must be provided, for instance a maximum number of generations.
3.3.2 Execution
Figure 2 shows a basic flowchart of the GP algorithm incorporating SPEA, which we explain in the following.
(1) An initial population of candidate solutions (individuals) is generated.
The initial population can be created randomly or include known approximate solutions of the given problem.
(2) Each candidate solution (each potential downscaling rule) is applied to the training data set.
(3) The result of each candidate solution is evaluated according to the objectives.
(4) The Pareto set is updated:
All individuals in population $P$ that are non-dominated within $P$ are moved to the Pareto set $P'$.
The individuals in $P'$ that are covered by another member of $P'$ are removed from the Pareto set.
In case the number of individuals stored in $P'$ exceeds the given maximum, $P'$ is pruned by hierarchical clustering. A clustering procedure helps to preserve the solution diversity while shrinking the Pareto set. To make sure that all objectives are considered equally, we scale the objective values before applying the clustering procedure. As in our case each objective $f_i$ is positively defined, we calculate the scaled objective as $\tilde{f}_i = (f_i - f_i^{\min}) / f_i^{\max}$, i.e., from the objective we subtract the minimum and divide the result by the maximum occurring in the current Pareto set. The fitness results from greater-than/smaller-than comparisons, i.e., the scaling does not affect the fitness. Note that the scaling is only applied within the clustering step.
(5) The fitness of each individual in $P$ and $P'$ is calculated according to SPEA by comparing the individuals' performances (for details see Section 3.3.3).
(6) If the termination criterion is met, the final Pareto set is returned. If the termination criterion is not met, the algorithm continues with (7).
(7) The next generation is created by combining and mutating individuals from the current populations $P$ and $P'$. The creation of the new generation can be split into two steps. First, a sampling procedure is applied to determine the parents. Second, genetic operators (crossover, mutation) are applied to create new individuals.
(7.1) For sampling we use lexicographic parsimony pressure [Luke et al., 2002] as implemented in GPLAB. A number of individuals is randomly drawn from the current populations. Among those drawn, the individual with the best fitness becomes a parent. In case several individuals are equally fit, the smallest one is chosen.
(7.2) The genetic operators are applied to the selected individuals. Crossover recombines two parents: the parent parse trees are cut at a randomly chosen node and the separated subtrees are exchanged. (Subtree) mutation cuts a randomly chosen subtree from the parent and replaces it by a new, randomly created subtree. Parent selection and application of genetic operators are repeated until enough individuals for the new generation have been created.
Starting from (2) the succeeding steps are iteratively repeated until the termination criterion is met (see (6)).
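The selection step with lexicographic parsimony pressure can be sketched as follows. The attribute names and pool size are illustrative, not those used by GPLAB:

```python
# Parent selection with lexicographic parsimony pressure: among a randomly
# drawn pool, the fittest individual wins; ties in fitness are broken in
# favor of the smaller parse tree.
import random
from dataclasses import dataclass

@dataclass
class Individual:
    fitness: float   # SPEA fitness, lower is better
    size: int        # number of parse-tree nodes

def select_parent(population, pool_size=4, rng=random):
    pool = rng.sample(population, pool_size)
    # tuple comparison: fitness first, tree size as tie-breaker
    return min(pool, key=lambda ind: (ind.fitness, ind.size))

pop = [Individual(2.0, 10), Individual(1.0, 8),
       Individual(1.0, 3), Individual(3.0, 2)]
# Drawing the whole population makes the outcome deterministic here:
print(select_parent(pop, pool_size=4))  # Individual(fitness=1.0, size=3)
```

The tuple key implements the lexicographic ordering: size only matters when fitness values are exactly equal, so the parsimony pressure never overrides quality.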
3.3.3 Fitness Assignment
In SPEA the fitness assignment consists of two steps.
(1) Each solution in the Pareto set, $p \in P'$, is assigned a real value called fitness $s_p$.
The fitness is proportional to the number $n_p$ of individuals in $P$ that are covered by $p$, i.e., $n_p = |\{q \in P \mid p \succeq q\}|$.
Let $N$ be the total number of individuals in $P$.
Then $s_p$ is defined as $s_p = n_p / (N + 1)$.
To clearly separate between the fitness of individuals in $P$ and $P'$, the fitness of the individuals in $P'$ is also called strength, hence Strength Pareto Approach.
(2) The fitness $f_q$ of an individual $q$ in the population $P$ is calculated as the sum over the fitness (strength) of all individuals in the Pareto set $P'$ that cover $q$,
$f_q = 1 + \sum_{p \in P',\, p \succeq q} s_p$, where $f_q \in [1, N+1)$.
One is added to the sum to ensure that the individuals in the Pareto set have better (lower) fitness than those in $P$.
Figure 3 shows one possible scenario of a minimization problem with two objectives $f_1$ and $f_2$.
The values indicate the fitness.
The circles correspond to the individuals in $P$, the squares to the individuals in $P'$.
The lowest point in Figure 3 shows an individual contained in the Pareto set that dominates 3 out of the 7 individuals in $P$.
Therefore its fitness equals $3/8$.
The next lowest point represents an individual from the population $P$, which is dominated only by one individual with a fitness of $3/8$.
Hence its fitness calculates as $1 + 3/8 = 11/8$.
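The two-step fitness assignment can be sketched as follows (minimization; objective vectors as tuples, function names ours):

```python
# Sketch of SPEA fitness assignment: pareto and pop are lists of objective
# tuples for the Pareto set P' and the population P, respectively.
def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def covers(a, b):
    return dominates(a, b) or tuple(a) == tuple(b)

def spea_fitness(pareto, pop):
    n = len(pop)
    # strength of each Pareto member: covered individuals / (N + 1)
    strengths = [sum(covers(p, q) for q in pop) / (n + 1) for p in pareto]
    # fitness of each population member: 1 + strengths of covering members
    fitnesses = [1 + sum(s for p, s in zip(pareto, strengths) if covers(p, q))
                 for q in pop]
    return strengths, fitnesses

s, f = spea_fitness([(1, 1)], [(2, 2), (3, 1), (1, 2)])
print(s, f)  # [0.75] [1.75, 1.75, 1.75]
```

With one Pareto member covering all three population members, its strength is 3/4, and every population member receives fitness 1 + 3/4, consistent with the scheme described above.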
3.4 Objectives
Our downscaling aims at reproducing the fine-scale patterns, i.e., reproducing as much of the deterministic part as possible, while accounting for spatial noise or uncertainty.
Combining both aspects makes the original and the downscaled fields 'look similar'.
Note that we aim at predicting anomalies, i.e., the difference between the high-resolution and the spline-interpolated fields.
In the following this is not always explicitly mentioned.
One possibility to make two fields look similar is to make their values similar in distribution.
Thus our first objective is to minimize the difference in distribution between the downscaled and the reference values (anomalies), thereby completely ignoring the location of the values.
This difference is measured by the integrated quadratic distance (IQD). The IQD quantifies the distance between two cumulative distribution functions (CDFs) $F$ and $G$ [Thorarinsdottir et al., 2013],

(3)  $\mathrm{IQD}(F, G) = \int_{-\infty}^{\infty} \left( F(t) - G(t) \right)^2 \, \mathrm{d}t$

We apply the IQD to discrete histogram distributions, calculated using a bin width of 0.25 K. Let $F_t$ denote the normalized cumulative histogram distribution of the high-resolution reference field at time step $t$, and $G_t$ accordingly that of the corresponding downscaled field. A single histogram bin is denoted by $b$; the integral in Eq. (3) then becomes a sum over the bins. From the histogram distributions, we calculate the IQD for each field (time step) separately (Figure 4) and take the mean over all time steps as objective, i.e.,

(4)  $O_{\mathrm{IQD}} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{IQD}(F_t, G_t)$

where $T$ denotes the number of fields contained in the reference data set.
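A sketch of the IQD computation on histogram-based empirical CDFs; the shared-grid construction and the function name are our simplifications:

```python
# Integrated quadratic distance between the empirical CDFs of two samples,
# approximated on a shared histogram grid (bin width in K).
import numpy as np

def iqd(ref, pred, bin_width=0.25):
    lo = min(ref.min(), pred.min())
    hi = max(ref.max(), pred.max())
    edges = np.arange(lo, hi + 2 * bin_width, bin_width)
    # cumulative normalized histograms approximate the two CDFs
    cdf_ref = np.cumsum(np.histogram(ref, bins=edges)[0]) / ref.size
    cdf_pred = np.cumsum(np.histogram(pred, bins=edges)[0]) / pred.size
    # discrete version of Eq. (3): sum over bins times the bin width
    return float(np.sum((cdf_ref - cdf_pred) ** 2) * bin_width)

anoms = np.array([-0.5, -0.25, 0.0, 0.25, 0.5])
print(iqd(anoms, anoms))            # 0.0  (identical distributions)
print(iqd(anoms, anoms + 0.5) > 0)  # True (shifted distribution is penalized)
```

Because only the distributions enter, spatially rearranging the downscaled values leaves this objective unchanged, which is exactly the location-blindness described above.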
The deterministic part is accounted for by using the root mean square error (RMSE) as second objective.
The RMSE provides an estimate of the expected value, and smooths out those patterns of the fine-scale fields that are not deterministically reproducible.
To account for some uncertainty we introduce a fuzzy method: we allow for uncertainty in the location of the fine-scale structures by defining a neighborhood $\mathcal{N}(i,j)$ for each pixel $(i,j)$.
This neighborhood contains the four direct neighbors of $(i,j)$ (Figure 4).
We incorporate the neighborhood in the calculation of the RMSE as follows:

(5)  $\mathrm{RMSE} = \sqrt{ \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \; \min_{(k,l) \in \{(i,j)\} \cup \mathcal{N}(i,j)} \left( \hat{y}_{ij} - y_{kl} \right)^2 }$

Here, $N_x$ and $N_y$ denote the total number of pixels in x- and y-direction, $\hat{y}$ the downscaled field and $y$ the reference field.
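The neighborhood-based RMSE can be sketched as follows. We assume the minimum squared difference over the pixel itself and its four direct neighbors is taken, which may differ in detail from the exact formulation; the plain double loop is for clarity, not speed:

```python
# Fuzzy, neighborhood-based RMSE: each downscaled pixel is compared with the
# best-matching reference value among the pixel and its four direct
# neighbors (edge pixels use only the neighbors that exist).
import numpy as np

def fuzzy_rmse(pred, ref):
    ny, nx = ref.shape
    total = 0.0
    for i in range(ny):
        for j in range(nx):
            candidates = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            total += min((pred[i, j] - ref[k, l]) ** 2
                         for k, l in candidates if 0 <= k < ny and 0 <= l < nx)
    return (total / (ny * nx)) ** 0.5

ref = np.arange(16.0).reshape(4, 4)
print(fuzzy_rmse(ref, ref))         # 0.0
print(fuzzy_rmse(ref + 10.0, ref))  # smaller than the plain RMSE of 10
```

A fine-scale structure displaced by one pixel is thus penalized far less than under a strict point-by-point RMSE.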
As noted, the spatial variability of the non-deterministic part is smoothed out by minimizing the RMSE. In order to recover this variability, we use the mean error of the coarse-pixel standard deviation (SDE) as third objective. Let $C$ denote a pixel on the coarse scale containing $n_C$ pixels on the finer scale (Figure 4). Note that unlike $\mathcal{N}$, $C$ is not defined as a sliding neighborhood, but via a fixed grid, i.e., the grid of the coarse model output to be downscaled, for computational reasons. The coarse-pixel standard deviation of the reference field calculates as

(6)  $\sigma_C = \sqrt{ \frac{1}{n_C} \sum_{(i,j) \in C} \left( y_{ij} - \bar{y}_C \right)^2 }$

with $\bar{y}_C$ denoting the coarse-pixel mean. Since we are dealing with anomalies, $\bar{y}_C = 0$. We can now define the third objective as

(7)  $\mathrm{SDE} = \frac{1}{N_C} \sum_{C=1}^{N_C} \left| \sigma_C - \hat{\sigma}_C \right|$

where $N_C$ is the number of coarse pixels, $\hat{\sigma}_C$ is the coarse-pixel standard deviation of the downscaled field, and, as for the IQD, we average over all time steps.
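The coarse-pixel standard-deviation objective can be sketched for a downscaling factor of 7 (2.8 km to 400 m). The mean-absolute-error comparison of the standard deviations and the function names are our assumptions:

```python
# Coarse-pixel standard deviation and its mean absolute error between
# reference and downscaled fields; field dimensions must be multiples of
# the block size.
import numpy as np

def coarse_sd(field, block=7):
    """Standard deviation within each block x block coarse pixel."""
    ny, nx = field.shape
    blocks = field.reshape(ny // block, block, nx // block, block)
    return blocks.std(axis=(1, 3))

def sde(ref, pred, block=7):
    """Mean absolute error of the coarse-pixel standard deviations."""
    return float(np.mean(np.abs(coarse_sd(ref, block) - coarse_sd(pred, block))))

ref = np.random.default_rng(0).normal(size=(14, 14))
print(sde(ref, ref))                 # 0.0
print(sde(ref, np.zeros((14, 14))))  # > 0: a flat field has no subgrid variance
```

A downscaling rule that simply returns the interpolated field (zero anomalies) is therefore penalized by this objective even if its RMSE is competitive.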
The size of the solution’s parse trees, i.e., the number of nodes, serves as the fourth objective.
Smaller parse trees can be checked for physical consistency more easily and are computationally less expensive.
Incorporating the size as an objective furthermore gives information on a potential dependency between the quality and the complexity of the solutions.
In summary, we have three objectives quantifying the quality of the potential downscaling rules by considering different characteristics of the fine-scale fields at different scales, and one objective incorporating the solution's complexity.
4 Downscaling nearsurface temperature
We apply multiobjective GP to the downscaling of near-surface temperature at 10 m height, which is the center of the lowest atmospheric layer in the COSMO model. High-resolution temperature fields can exhibit very different fine-scale patterns. In a well-mixed ABL, temperature anomalies are to a good approximation proportional to orographic height anomalies [Schomburg et al., 2010]. Under clear-sky conditions during night time, temperature inversions cause cold air to drain into the valleys [Barr and Orgill, 1989], which leads to pronounced channel structures in the temperature field. The anomalies caused by cold-air drainage can grow very large compared to the anomalies in a well-mixed boundary layer (Figure 5). Therefore, it is important to also capture this more complex process in the downscaling algorithm.
4.1 Setup
To test multiobjective GP for atmospheric downscaling, we use a cross-validation approach (i.e., leave one out) for three reasons.
First, it allows us to use a training data set that is as large as possible in each GP run.
Second, it allows us to check for overfitting, a well-known problem especially when training on a small data set.
With a cross-validation approach, we can identify weather conditions for which our solutions cannot be generalized.
Third, we gain information on the impact of the training data choice on the solutions by considering the similarity of GP solutions resulting from different training data sets.
As mentioned in Section 2 the complete data set contains 8 simulation periods (Table 2).
For computational reasons we have extracted single time steps for training and validating GP.
We have carried out 8 GP runs in total.
In each run another day is left out for validation.
The potential predictors included in the terminal set are preselected from the COSMO model output based on our understanding of the atmospheric processes that influence near-surface temperature.
The coarse-scale fields we pick are near-surface temperature, vertical temperature gradients over the lowest 25 m, 60 m, and 110 m as indicators of atmospheric stability, horizontal and vertical wind speed, and net radiation at the land surface.
Fine-scale information is contained in the parameters describing the surface properties.
We pick topographic height, plant cover, surface roughness length and surface albedo, as well as a few parameters derived from the topography field.
The latter give information on local topography relative to its direct surroundings.
All predictors are listed in Table 1.
The essential GP settings are summarized in Table 3.
We run 200 generations with 100 individuals each, i.e., each run evaluates 20,000 potential downscaling rules.
The maximum Pareto set size is set to 50.
For computational reasons, and to keep the solutions readable, we furthermore limit the tree size to 5 levels.
Besides the predictors described above, the terminal set also contains a random number generator.
The function set contains the arithmetic functions with two input arguments each and an if-statement with four input arguments (a comparison of the first two arguments selects either the then- or the else-branch).
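Such a function set can be sketched as follows. This is a generic illustration, assuming the common GP convention that the four-argument if compares its first two inputs with a less-than test; the protected division and the example rule are likewise illustrative choices, not the settings actually used:

```python
import operator

def pdiv(a, b):
    """Protected division: a common GP convention to avoid division by zero."""
    return a / b if b != 0 else 1.0

def if_lt(a, b, then_val, else_val):
    """Four-argument if: compare the first two inputs, pick a branch."""
    return then_val if a < b else else_val

FUNCTION_SET = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": pdiv,
    "if": if_lt,
}

def evaluate(tree, terminals):
    """Recursively evaluate a GP tree given as nested tuples, e.g. ('+', 'x', 1.0)."""
    if not isinstance(tree, tuple):
        return terminals.get(tree, tree)  # terminal: named predictor or constant
    func, *args = tree
    return FUNCTION_SET[func](*(evaluate(a, terminals) for a in args))

# Toy rule: if dT < 0 then w * 2 else dh
rule = ("if", "dT", 0.0, ("*", "w", 2.0), "dh")
print(evaluate(rule, {"dT": -0.5, "w": 1.5, "dh": 10.0}))  # → 3.0
```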
4.2 Results
Figure 6 shows the relative improvement of the objectives for the solutions from the Pareto sets returned by the eight GP runs.
Only the first three objectives are included in Figure 6.
These objectives are formulated as penalties, i.e., the smaller the objective the better.
We calculate the relative improvement $I_j$ of a downscaling rule concerning objective $O_j$ as
$I_j = 1 - O_j / O_j^{0}$,
where $O_j^{0}$ is the value of the objective when predicting no anomalies, i.e., zero everywhere.
A prediction of no anomalies corresponds to the spline-interpolated field.
The definition of the relative improvement is analogous to that of a skill score, with the objective for a perfect solution being equal to zero.
A positive $I_j$ indicates that the downscaled field is better than the spline-interpolated field concerning objective $O_j$: for a perfect downscaling $I_j = 1$; when the objective is halved $I_j = 0.5$; for a downscaling that is as good as the interpolated field $I_j = 0$; and for a downscaling worse than the interpolated field $I_j < 0$.
Note that $I_j \leq 1$, i.e., $I_j$ cannot exceed 1 in the positive direction, whereas it can grow very large in the negative direction.
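The skill-score-like definition can be computed directly from an objective value and its no-anomaly reference, as in this short sketch:

```python
def relative_improvement(objective, objective_ref):
    """1 for a perfect solution (objective 0), 0 when as good as the
    interpolated field (the no-anomaly reference), negative when worse."""
    return 1.0 - objective / objective_ref

# Reference: objective value when predicting no anomalies (interpolated field).
ref = 2.0
assert relative_improvement(0.0, ref) == 1.0   # perfect downscaling
assert relative_improvement(1.0, ref) == 0.5   # objective halved
assert relative_improvement(2.0, ref) == 0.0   # as good as interpolation
assert relative_improvement(3.0, ref) < 0.0    # worse than interpolation
```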
Looking at Figure 6, we see that the achieved improvements differ strongly between the three objectives.
The relative improvement of the first objective spans a wide range from 0 up to about 0.7, with an average of about 0.4–0.5.
For the second objective the relative improvement is in general smaller, with an average of about 0.1–0.2.
Unlike for the first objective, there are also solutions that perform worse than the interpolated field with respect to the second.
The third objective is on average even increased, and only very few solutions achieve an improvement concerning it.
This shows the difficulty of predicting the actual reference fields exactly.
Including this quantity as an objective, however, serves as a constraint not to deviate too strongly from the reference fields.
The results confirm that predicting realistic spatial variability is more attainable than an exact fit.
To study potential overfitting, Figure 7 shows the difference of the relative improvement between training and validation data set.
For the majority of cases the median is close to zero.
With the exception of May 9th 2008, the medians are spread about equally in the positive and negative directions, which indicates that no serious overfitting takes place and that the downscaling rules discovered by GP are adaptable.
For most of the eight cases there are very few outliers (2–6 out of 50) for which the performance on the validation data set is clearly worse than on the training data set.
The outliers are most apparent for certain objectives.
Most outliers already show a less satisfying performance on the training data sets and tend to stick out in the scatter plots (e.g., Figure 6 (a) or (c)).
These solutions are part of the Pareto set due to their good performance in the fourth objective, i.e., the solution size.
They correspond to very small solutions and often consist of only one node (Figure 8).
We seek a reasonable trade-off between the objectives and hence also between quality and complexity of the solutions.
Thus, such extreme outliers would not be chosen from the Pareto set as the final solution.
Overfitting could be a problem for the case where May 9th 2008 is excluded from training.
The clear-sky conditions on May 9th 2008 led to very pronounced fine-scale structures.
Probably, the exclusion of such an extreme situation led to the bad performance for this case: the solutions need to extrapolate and thus in general perform worse on the validation day than on the training data set.
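The overfitting check amounts to comparing the relative improvement on the training and validation data and flagging rules whose performance degrades clearly. A minimal sketch with made-up improvement values and an arbitrary illustrative threshold:

```python
def generalization_gap(improvement_train, improvement_val):
    """Positive gap: the rule performs worse on the validation day than in training."""
    return improvement_train - improvement_val

def flag_outliers(gaps, threshold=0.3):
    """Indices of Pareto-set members whose validation performance clearly degrades.
    The threshold value is an arbitrary illustrative choice."""
    return [i for i, g in enumerate(gaps) if g > threshold]

# Made-up relative improvements for a small Pareto set.
train = [0.52, 0.48, 0.10, 0.45]
val = [0.50, 0.44, -0.60, 0.47]
gaps = [generalization_gap(t, v) for t, v in zip(train, val)]
print(flag_outliers(gaps))  # → [2]: only the third rule sticks out
```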
Besides the discussed outliers, the solution quality depends only slightly on the solution size. Figure 8 shows scatter plots for all four objectives for the run validated on October 14th 2007; the solution size corresponds to the abscissa in Figure 8 (d)–(f). There is only a weak correlation between solution size and the other objectives. The subjective rating of the solutions, indicated by the color of the points in Figure 8, is explained in the discussion.
We examine a single solution from the run validated on October 14th 2007, a clear-sky day. Figures 9 and 10 show the performance of one downscaling rule on the validation day, at 11:00 UTC and 23:00 UTC. The daytime temperature field (Figure 9) is predicted well by both the GP-based rule and the (thresholded) linear-regression-based rule from Schomburg et al. [2010]. However, looking more closely and zooming in, the fine-scale structures are captured better by the GP-based rule. For the nighttime field (Figure 10) the improvement by GP is more impressive: the pattern formed by nightly cold-air drainage is captured much better by the GP-based rule. We notice that the minima and maxima are localized more strongly in the reference field than in the downscaled field, which might be at least partly due to the unpredictable noisy component of the fine-scale fields.
The GP-derived rule shown in Figures 9 and 10 has the form:

if ⟨condition⟩ then ⟨expression⟩ else ⟨expression⟩
+ if ⟨condition⟩ then ( if ⟨condition⟩ then ⟨expression⟩ else ⟨expression⟩ ) else ⟨expression⟩
In the first part, the surface height anomaly is multiplied by the temperature gradients, which is physically intuitive and reasonable. The second part adds either the vertical wind speed or one of the topography-based parameters, depending on the atmospheric layering.
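The described structure can be paraphrased in code. This is an illustrative sketch only: the variable names, the stability switch, and the combination are placeholders for the actual terminals and conditions of the GP solution, which are not reproduced here:

```python
def downscale_anomaly(dh, dT, w, topo_param, stable):
    """Illustrative paraphrase of the rule's structure: a height-anomaly term
    scaled by a temperature gradient, plus a second term that switches between
    vertical wind speed and a topography-based parameter depending on the
    atmospheric layering. All names and the condition are placeholders."""
    first_part = dh * dT                            # height anomaly x temperature gradient
    second_part = topo_param if stable else w       # choice depends on layering
    return first_part + second_part

print(downscale_anomaly(10.0, 0.5, 1.5, 0.2, stable=False))  # → 6.5
```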
Note that the example solution shows some evidence of overfitting. The relative improvements of the objectives for the training data set and the validation day are listed in Table 4. Still, the performance on the validation day appears reasonable, since no unrealistically high or low values occur.
4.3 Discussion
Multiobjective GP has advanced the downscaling in two directions compared to the linear regression approach by Schomburg et al. [2010, 2012].
Firstly, it allows us to incorporate multivariate and nonlinear relations in the search for downscaling rules; secondly, uncertainties contained in the training data set are considered by defining multiple objectives.
Nonlinear regression via rather complex equations or code can induce overfitting, especially when applied to small training data sets.
In the GP-based methodology, the tendency to overfit is rather small.
It is, however, important that the training data span a wide range of weather conditions, including extreme days, to prevent overfitting.
Incorporating SPEA, GP does not return one single downscaling rule but 50 rules of quite different quality, as becomes apparent when visually comparing downscaled and reference fields.
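For objectives formulated as penalties ("the smaller the better"), membership in such a set can be sketched as a simple non-domination filter. This is a generic illustration of Pareto optimality, not the SPEA fitness assignment actually used:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective (penalties)
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(solutions):
    """Keep only the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Toy example with two penalty objectives per solution.
sols = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.2), (0.6, 0.6)]
print(pareto_set(sols))  # (0.6, 0.6) is dominated by (0.5, 0.5)
```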
The colors of the points in Figure 8 indicate the results of a subjective evaluation of the different solutions by examination of the downscaled fields.
Good, average and bad solutions obviously cannot be clustered easily, but some systematics are visible.
Figure 8 (d) indicates that a certain level of the third objective must be accepted in order to reproduce the fine-scale variability reasonably, because subjectively good solutions tend to perform worse concerning this objective than visually bad solutions.
This prompts the question of whether it is an appropriate objective for our aim, or whether other measures are better suited.
The first objective appears to be more important: the better a solution performs concerning it, the better it is also rated in the subjective evaluation.
Concerning the second objective the situation is less clear, but a similar tendency exists as for the first.
In the subjective rating, more complex (larger) solutions tend to perform better.
Thus, a minimum solution size seems to be required to account for the complexity of the processes involved in building the fine-scale structures.
However, the best solutions seem not to be detectable from the objectives alone.
Figure 11 shows a solution we visually rate only as average, a conclusion we could not draw from the objectives alone: judging only from the objectives, it appears very similar to the good solution discussed above.
Thus it might be necessary to reconsider the objectives we currently use and to find a better way of quantifying the similarity between the fine-scale patterns of downscaled and reference fields.
We tested the errors of temporal and spatial correlation as additional objectives.
However, this caused the solution size to increase considerably.
Temporal correlation between consecutive time steps, as well as nearest neighbor correlation might be too strongly dependent on the noisy component of the fields and therefore might be unpredictable deterministically.
The coarse scale predictors tend to be rather smooth and thus the correlation between consecutive time steps tends to be overestimated.
Reproduction of spatiotemporal correlations might be more appropriately achieved by adding spatially and temporally correlated noise to the deterministic fields.
5 Conclusion and Outlook
We have introduced multiobjective GP for the discovery of downscaling rules to disaggregate mesoscale atmospheric model output to the high resolution required for driving land-surface and hydrological models.
Applying multiobjective GP to the downscaling of near-surface temperature, we have shown that the more complex method is an advancement compared to linear regression conditioned on indicators.
We are now capable of reproducing fine-scale patterns resulting from different atmospheric processes, e.g., patterns produced in a well-mixed boundary layer or during nighttime temperature inversions.
In general we can recommend GP as a good machine learning method for the geosciences, especially for complex problems incorporating multiple objectives where linear methods do not give satisfactory results.
Next steps will include the optimization of the set of objectives, as discussed, and the expansion of the training data set, especially to include more clearsky days and extreme weather conditions.
We will apply multiobjective GP to the remaining variables required as atmospheric input to land-surface and subsurface models, i.e., precipitation, near-surface specific humidity, near-surface wind speed, and longwave and shortwave radiation.
The discovered downscaling rules will be implemented in the coupled soil-vegetation-atmosphere model TerrSysMP.
Finally we will extend the system for different scale gaps, e.g., for disaggregating from 1 km to 100 m.
Multiobjective GP generates a set of Pareto-optimal solutions rather than just one, which suggests implementing a downscaling ensemble, either by using a different rule in each ensemble run or by randomly switching between different rules while taking temporal consistency into account.
An ensemble approach could also help to estimate the uncertainty induced by the downscaling procedure.
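The ensemble idea could be sketched as drawing one Pareto-set rule per member. This is a hypothetical sketch: the rule functions are toy stand-ins, and temporal consistency of the switching is not addressed here:

```python
import random

def downscaling_ensemble(rules, predictors, n_members, seed=0):
    """Apply a randomly chosen Pareto-set rule in each ensemble member.
    A fixed seed makes the member-to-rule assignment reproducible."""
    rng = random.Random(seed)
    return [rng.choice(rules)(predictors) for _ in range(n_members)]

# Two toy rules standing in for Pareto-set members.
rules = [lambda p: 2.0 * p["dh"], lambda p: p["dh"] + p["w"]]
members = downscaling_ensemble(rules, {"dh": 1.0, "w": 0.5}, n_members=4)
assert len(members) == 4
assert all(m in (2.0, 1.5) for m in members)  # each member used one of the rules
```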
Acknowledgments
This work has been carried out in the framework of the Transregional Collaborative Research Centre TR 32 ‘Patterns in Soil-Vegetation-Atmosphere Systems’ funded by the Deutsche Forschungsgemeinschaft (DFG).
References
Baldauf et al. [2011] Baldauf, M., Seifert, A., Förstner, J., Majewski, D., Raschendorfer, M. and Reinhardt, T. [2011], ‘Operational convective-scale numerical weather prediction with the COSMO model: description and sensitivities’, Mon. Weather Rev. 139(12), 3887–3905.
Banzhaf et al. [1997] Banzhaf, W., Nordin, P., Keller, R. and Francone, F. [1997], Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications, Morgan Kaufmann Publishers, San Francisco, CA, USA.
Barr and Orgill [1989] Barr, S. and Orgill, M. M. [1989], ‘Influence of external meteorology on nocturnal valley drainage winds’, J. App. Meteorol. 28(6), 497–517.
 Biau et al. [1999] Biau, G., Zorita, E., von Storch, H. and Wackernagel, H. [1999], ‘Estimation of precipitation by kriging in the EOF space of the sea level pressure field’, J. Climate 12(4), 1070–1085.
 Carreau and Vrac [2011] Carreau, J. and Vrac, M. [2011], ‘Stochastic downscaling of precipitation with neural network conditional mixture models’, Water. Resour. Res. 47(10).
 Coulibaly [2004] Coulibaly, P. [2004], ‘Downscaling daily extreme temperatures with genetic programming’, Geophys. Res. Lett. 31(16).
 Deidda, R. [2000] Deidda, R. [2000], ‘Rainfall downscaling in a space-time multifractal framework’, Water Resour. Res. 36(7), 1779–1794.
 Doms and Schättler [2002] Doms, G. and Schättler, U. [2002], A description of the nonhydrostatic regional model LM, Technical report, Deutscher Wetterdienst.
 Ebert [2008] Ebert, E. E. [2008], ‘Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework’, Meteorol. Appl. 15(1), 51–64.
 Eiben and Smith [2003] Eiben, A. and Smith, J. [2003], Introduction to Evolutionary Computing, Springer, Berlin, Germany.
 Fiddes and Gruber [2014] Fiddes, J. and Gruber, S. [2014], ‘TopoSCALE v. 1.0: downscaling gridded climate data in complex terrain’, Geosci. Model Dev. 7(1), 387–405.
 Fowler et al. [2007] Fowler, H. J., Blenkinsop, S. and Tebaldi, C. [2007], ‘Linking climate change modelling to impacts studies: recent advances in downscaling techniques for hydrological modelling’, Int. J. Climatol. 27(12), 1547–1578.
 Friederichs and Paeth [2006] Friederichs, P. and Paeth, H. [2006], ‘Seasonal prediction of African precipitation with ECHAM4T42 ensemble simulations using a multivariate MOS recalibration scheme’, Clim. Dynam. 27(78), 761–786.
 Ghorbani et al. [2010] Ghorbani, M. A., Khatibi, R., Aytek, A., Makarynskyy, O. and Shiri, J. [2010], ‘Sea water level forecasting using genetic programming and comparing the performance with artificial neural networks’, Comput. Geosci. 36(5), 620–627.
 Hashmi et al. [2011] Hashmi, M. Z., Shamseldin, A. Y. and Melville, B. W. [2011], ‘Statistical downscaling of watershed precipitation using gene expression programming (GEP)’, Environ. Modell. Softw. 26(12), 1639–1646.
 Izadifar and Elshorbagy [2010] Izadifar, Z. and Elshorbagy, A. [2010], ‘Prediction of hourly actual evapotranspiration using neural networks, genetic programming, and statistical models’, Hydrol. Process. 24(23), 3413–3425.
 Kollet and Maxwell [2006] Kollet, S. J. and Maxwell, R. M. [2006], ‘Integrated surface–groundwater flow modeling: A freesurface overland flow boundary condition in a parallel groundwater flow model’, Adv. Water Resour. 29(7), 945–958.
 Koza [1992] Koza, J. [1992], Genetic Programming, On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA, USA.
 Liong et al. [2002] Liong, S.-Y., Gautam, T. R., Khu, S. T., Babovic, V., Keijzer, M. and Muttil, N. [2002], ‘Genetic programming: A new paradigm in rainfall-runoff modeling’, J. Am. Water Resour. As. 38(3), 705–718.
 Liu et al. [2008] Liu, X., Coulibaly, P. and Evora, N. [2008], ‘Comparison of data-driven methods for downscaling ensemble weather forecasts’, Hydrol. Earth Syst. Sci. 12(2).
 Luke et al. [2002] Luke, S., Panait, L. et al. [2002], Lexicographic parsimony pressure, in ‘GECCO’, Vol. 2, pp. 829–836.
 Malone et al. [2012] Malone, B. P., McBratney, A. B., Minasny, B. and Wheeler, I. [2012], ‘A general method for downscaling earth resource information’, Comput. Geosci. 41, 119–125.
 Maraun et al. [2010] Maraun, D., Wetterhall, F., Ireson, A. M., Chandler, R. E., Kendon, E. J., Widmann, M., Brienen, S., Rust, H. W., Sauter, T., Themeßl, M. et al. [2010], ‘Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user’, Rev. Geophy. 48(3).
 Oleson et al. [2004] Oleson, K. W., Dai, Y., Bonan, G., Bosilovich, M., Dickinson, R., Dirmeyer, P., Hoffman, F., Houser, P., Levis, S., Niu, G.Y. et al. [2004], Technical description of the community land model (CLM), Technical report, NCAR Technical Note, National Center for Atmospheric Research, Boulder, CO, USA.
 Parasuraman et al. [2007] Parasuraman, K., Elshorbagy, A. and Carey, S. K. [2007], ‘Modelling the dynamics of the evapotranspiration process using genetic programming’, Hydrolog. Sci. J. 52(3), 563–578.
 Schlünzen and Katzfey [2003] Schlünzen, K. H. and Katzfey, J. J. [2003], ‘Relevance of subgrid-scale land-use effects for mesoscale models’, Tellus A 55(3), 232–246.
 Schomburg et al. [2012] Schomburg, A., Venema, V., Ament, F. and Simmer, C. [2012], ‘Disaggregation of screen-level variables in a numerical weather prediction model with an explicit simulation of subgrid-scale land-surface heterogeneity’, Meteorol. Atmos. Phys. 116(34), 81–94.
 Schomburg et al. [2010] Schomburg, A., Venema, V., Lindau, R., Ament, F. and Simmer, C. [2010], ‘A downscaling scheme for atmospheric variables to drive soil–vegetation–atmosphere transfer models’, Tellus B 62(4), 242–258.
 Schoof and Pryor [2001] Schoof, J. T. and Pryor, S. [2001], ‘Downscaling temperature and precipitation: A comparison of regression-based methods and artificial neural networks’, Int. J. of Climatol. 21(7), 773–790.
 Shrestha et al. [2014] Shrestha, P., Sulis, M., Masbou, M., Kollet, S. and Simmer, C. [2014], ‘A scale-consistent terrestrial systems modeling platform based on COSMO, CLM and ParFlow’, Mon. Weather Rev. (2014).
 Silva and Almeida [2003] Silva, S. and Almeida, J. [2003], GPLAB: a genetic programming toolbox for MATLAB, in ‘Proceedings of the Nordic MATLAB conference’, Citeseer, pp. 273–278.
 Simon et al. [2013] Simon, T., Hense, A., Su, B., Jiang, T., Simmer, C. and Ohlwein, C. [2013], ‘Pattern-based statistical downscaling of East Asian summer monsoon precipitation’, Tellus A 65.
 Thorarinsdottir et al. [2013] Thorarinsdottir, T. L., Gneiting, T. and Gissibl, N. [2013], ‘Using proper divergence functions to evaluate climate models’, Journal on Uncertainty Quantification 1(1), 522–534.
 Vereecken et al. [2010] Vereecken, H., Kollet, S. and Simmer, C. [2010], ‘Patterns in soilvegetationatmosphere systems: Monitoring, modeling, and data assimilation’, Vadose Zone J. 9(4), 821–827.
 von Storch et al. [2000] von Storch, H., Hewitson, B. and Mearns, L. [2000], Review of empirical downscaling techniques, Technical report, Regional climate development under global warming.
 Wagner et al. [1990] Wagner, D., Ruprecht, E. and Simmer, C. [1990], ‘A combination of microwave observations from satellites and an eof analysis to retrieve vertical humidity profiles over the ocean’, J. App. Meteorol. 29(11), 1142–1157.
 Wilby et al. [2002] Wilby, R. L., Dawson, C. W. and Barrow, E. M. [2002], ‘SDSM a decision support tool for the assessment of regional climate change impacts’, Environ. Model. Softw. 17(2), 145–157.
 Wilby and Wigley [1997] Wilby, R. L. and Wigley, T. M. L. [1997], ‘Downscaling general circulation model output: a review of methods and limitations’, Prog. Phys. Geog. 21(4), 530–548.
 Xu [1999] Xu, C.Y. [1999], ‘From GCMs to river flow: a review of downscaling methods and hydrologic modelling approaches’, Prog. Phys. Geog. 23(2), 229–249.

Zitzler and Thiele [1999] Zitzler, E. and Thiele, L. [1999], ‘Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach’, Evol. Comput. 3(4), 257–271.