1 Introduction
One of the primary objectives of supervised learning is to provide an estimation of the conditional expectation $\mathbb{E}[Y \mid X = x]$
for some underlying one-dimensional objective $Y$ and a multidimensional covariate $X$, given a dataset of observations. However, in many real-world applications, it is also important to extract the additional information encoded in the conditional distribution $\mathcal{L}(Y \mid X = x)$. This is particularly the case in the field of Heterogeneous Treatment Effects (HTE) estimation problems, which represent the primary motivation of this work.
1.1 Motivation
In HTE problems, the traditional object of interest is the Conditional Average Treatment Effect (CATE) function, defined by
(1) $\tau(x) = \mathbb{E}\big[ Y(1) - Y(0) \mid X = x \big],$
where $Y(1)$ (resp. $Y(0)$) denotes the potential outcome (e.g., Rubin (1974) and Imbens and Rubin (2015)) under treatment (resp. no treatment). The data are usually of the form $(X_i, W_i, Y_i)_{1 \le i \le n}$, where $W_i$ denotes the treatment assignment indicator. Recently, many approaches based on modern statistical learning techniques have been investigated to estimate the CATE function (e.g., Künzel et al. (2019); Athey and Wager (2019); Nie and Wager (2017)). Typically, assuming unconfoundedness, that is,
(2) $\big( Y(0), Y(1) \big) \perp\!\!\!\perp W \mid X,$
one is able to estimate $\mathcal{L}(Y(1) \mid X = x)$ and $\mathcal{L}(Y(0) \mid X = x)$ respectively with the two subgroups
(3) $\mathcal{D}_1 = \{ (X_i, Y_i) : W_i = 1 \} \quad \text{and} \quad \mathcal{D}_0 = \{ (X_i, Y_i) : W_i = 0 \}.$
The classical approach in the HTE context is to design the causal inference procedure around the estimation of the CATE function defined in (1), and to test whether there is a notable difference between $\mathbb{E}[Y(1) \mid X = x]$ and $\mathbb{E}[Y(0) \mid X = x]$ for each new individual $x$. It is important to note that this is already a difficult task for certain datasets, due to the imbalance between treatment and control groups or other practical reasons. For instance, the X-learner Künzel et al. (2019) is proposed to deal with unbalanced designs by making efficient use of the structural information about the CATE function, and the R-learner Nie and Wager (2017) is introduced to improve the accuracy and robustness of the CATE function estimation by formulating it as a standard loss-minimization problem. However, a simple inference based on the CATE function, or on other key features built from average treatment effects (for example, the sorted Group Average Treatment Effects (GATES) proposed in Chernozhukov et al. (2018)), may be hazardous in some situations because of the lack of information on the fluctuations, or multimodality, of the conditional laws of $Y(0)$ and $Y(1)$ given $X = x$. In practice, this phenomenon can arise when the potential outcomes depend on some additional unconfounding factors that are not collected in the study but greatly affect their behaviors. From another point of view, being aware of the existence of such problems may also help to fix flaws in the data collection procedure, or the lack of subgroup analysis, for future studies.
Ideally, one is interested in estimating the joint conditional distribution $\mathcal{L}\big( (Y(0), Y(1)) \mid X = x \big)$. Unfortunately, a major difficulty of HTE estimation lies in the fact that it is in general impossible to collect $Y_i(0)$ and $Y_i(1)$ at the same time for the same point $X_i$. Unlike the difference of the conditional expectations, which is linear, the dependence between $Y(0)$ and $Y(1)$ given $X = x$ is much more complex and difficult to track. Hence, due to the lack of information in the collectable dataset, the estimation of the conditional covariance between $Y(0)$ and $Y(1)$ is usually unavailable, let alone the conditional joint distribution. A possible route to address this shortcoming is to concentrate on a weaker problem: instead of estimating the joint distribution, we are interested in the estimation of the conditional marginal distributions $\mathcal{L}(Y(0) \mid X = x)$ and $\mathcal{L}(Y(1) \mid X = x)$. By considering the two subgroups (3) of the dataset, the problem thus enters a more classical supervised learning context, similar to the design of T-learners Künzel et al. (2019), while the objective is replaced by the estimation of conditional distributions. In some scenarios, even a simple raw visualization of the marginal conditional distributions, as a complement to the CATE function estimation, may greatly help the decision-making process for practitioners.
Another motivation comes from the need to set up statistically sound decision procedures for multivariate objectives in the context of HTE. For example, a treatment is often related to a cost, which is also collectable and sometimes essential to the final treatment decisions. In this context, a simple extension of the CATE function will clearly not be able to capture the dependencies between the treatment effects and the cost. A statistical tool that allows conditional distribution estimation with a multivariate objective will therefore be useful for more complex inferences involving both treatment effects and costs at the same time. In general, the traditional nonparametric methods for conditional distribution inference (e.g., Hall et al. (1999); Hall and Yao (2005)) are less effective when it comes to flexibility of implementation, parallelization, and the ability to handle high-dimensional noisy data. This is why we turn to the available modern machine/statistical learning tools.
1.2 Random Forests for conditional distribution estimation
In order to address the issues described in the subsection above, our idea is to propose an adaptation of the Random Forests (RF) algorithm Breiman (2001), so that it can be applied to conditional distribution estimation problems in the HTE context. RF have proven to be successful in many real-world applications; the reader is referred to Biau and Scornet (2015) and the references therein for a general introduction. The final prediction at each point $x$ provided by the RF algorithm can be regarded as a weighted average of the observations $Y_i$, where the random weights depend upon the training dataset and the stochastic mechanism of the forests. Therefore, a very natural idea is to use this weighted empirical measure to approximate the target conditional distribution. This is also the driving force in the construction of Quantile Regression Forests Meinshausen (2006); Athey et al. (2019) and of other approaches that combine kernel density estimation and Random Forests (e.g., Pospisil and Lee (2019)). In the present article, instead of studying the quantile or density function of the target conditional distribution, we focus directly on the (weighted) empirical measures output by the forests and the associated Wasserstein distances. This also makes further inferences based on Monte Carlo methods or smoothing more convenient and straightforward. To make this clearer, let us denote by $\mu(x)$ the probability measure associated with the conditional distribution $\mathcal{L}(Y \mid X = x)$. Heuristically speaking, if the Wasserstein distance between $\mu(x)$ and $\mu(x')$ is dominated, in some sense, by the distance between $x$ and $x'$, then the data points whose covariates fall into a “neighborhood” of $x$ are expected to provide a reliable approximation of the conditional measure $\mu(x)$. In the RF context, the role of each tree in the ensemble is to build a wisely created partition of the domain, so that the “neighborhood” mentioned above can be defined accordingly. As such, the random weights come from the averaging procedure over multiple trees. As Breiman’s original RF are primarily designed for conditional expectation estimation, we first provide in Section 2 a reformulation that gives new insights into Breiman’s original splitting criterion, by exploiting a simple relation between the empirical variance and the Wasserstein distance between empirical measures. This reformulation allows a new interpretation of the RF algorithm in the context of conditional distribution estimation, which, in turn, can be used to handle multivariate objectives with a computational cost that grows linearly with the dimension of the output. We also investigate in this section several dedicated modifications of Breiman’s splitting rule and present some preliminary theoretical connections between their constructions. With a slight abuse of language, all these RF variants aiming at conditional distribution estimation are referred to as Wasserstein Random Forests (WRF) in this article. Finally, we return in Section 3 to the HTE problem and illustrate through various numerical experiments how WRF may help to design more transparent causal inferences in this context.
2 Wasserstein Random Forests
In order to simplify the introduction of WRF, we temporarily limit the discussion to the classical supervised learning setting. Let $X$ and $Y$ be, respectively, the canonical random variables of the covariate and the objective. Our goal is to estimate the conditional measure $\mu(x)$ associated with $\mathcal{L}(Y \mid X = x)$ using the dataset $\mathcal{D}_n = (X_i, Y_i)_{1 \le i \le n}$.
2.1 Mechanism of Random Forests
A Random Forest is an ensemble method that aggregates a collection of randomized decision trees. Denote by $M$ the number of trees and, for $1 \le m \le M$, let $\Theta_m$ be the canonical random variable that captures the randomness of the $m$th tree. Each decision tree is trained on a randomly selected subdataset $\mathcal{D}_n^{(m)}$ of the same cardinality $a_n$, sampled uniformly in $\mathcal{D}_n$ with or without replacement. More concretely, for each tree, a sequence of axis-aligned splits is made recursively by maximizing some fixed splitting rule. At each iteration, mtry directions are explored and the splits are always performed in the middle of two consecutive data points, in order to remove possible ties. The splitting stops when the current cell contains fewer points than a threshold nodesize, or when all the data points are identical. In this way, a binary hierarchical partition of the covariate space is constructed. For any $x$, we denote by $A_m(x)$ the cell in the $m$th tree that contains $x$ and by $N_m(x)$ the number of data points of $\mathcal{D}_n^{(m)}$ that fall into $A_m(x)$. The core of our approach relies on the fact that the prediction of the conditional distribution at point $x$ given by the $m$th tree is simply the empirical measure associated with the observations that fall into the same cell as $x$, that is,

$\hat{\mu}_m(x) = \frac{1}{N_m(x)} \sum_{(X_i, Y_i) \in \mathcal{D}_n^{(m)} : \, X_i \in A_m(x)} \delta_{Y_i},$

where $\delta_y$ is the Dirac measure at $y$. As such, the final estimation provided by the forest is but the average of the $\hat{\mu}_m(x)$, $1 \le m \le M$, over the trees, i.e.,

$\hat{\mu}_{M,n}(x) = \frac{1}{M} \sum_{m=1}^{M} \hat{\mu}_m(x).$

Equivalently,

$\hat{\mu}_{M,n}(x) = \sum_{i=1}^{n} w_i(x) \, \delta_{Y_i}, \quad \text{where} \quad w_i(x) = \frac{1}{M} \sum_{m=1}^{M} \frac{\mathbf{1}_{(X_i, Y_i) \in \mathcal{D}_n^{(m)}, \, X_i \in A_m(x)}}{N_m(x)}$

is the random weight associated with $(X_i, Y_i)$. It is readily checked that $\sum_{i=1}^{n} w_i(x) = 1$ for any $x$. Thus, the final prediction is a weighted empirical measure with random weights naturally given by the tree aggregation mechanism. Our notation is compatible with Biau and Scornet (2015), where a more detailed introduction to RF is provided. It should be stressed again that we are interested in learning the conditional distribution $\mathcal{L}(Y \mid X = x)$, not in inferring the conditional expectation $\mathbb{E}[Y \mid X = x]$ as in traditional forests. This is of course a more complex task, insofar as the expectation is just a feature, albeit essential, of the distribution.
As an illustration, consider in Figure 1 the partition provided by a decision tree trained on a bidimensional subdataset of size 8. The estimation of the conditional distribution at a point $x$ is simply the empirical measure associated with the cell to which it belongs, say $\hat{\mu}_1(x)$. Mutatis mutandis, suppose that there is another decision tree that gives the measure $\hat{\mu}_2(x)$ as the estimation at $x$. Then the final estimation of the conditional distribution output by the forest that contains these two trees is the mixture $\frac{1}{2}\big( \hat{\mu}_1(x) + \hat{\mu}_2(x) \big)$. On the other hand, the classical RF outputs the average of the corresponding observations as the scalar estimation of the conditional expectation.
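The aggregation step described above can be sketched in a few lines. The helper below is a toy illustration of ours (not the package API), restricted to univariate objectives: each tree contributes total mass $1/M$, split evenly over the points of its cell.

```python
import numpy as np

def forest_measure(cell_samples):
    """Average per-tree empirical measures into one weighted empirical
    measure.

    cell_samples: one 1-D array per tree, holding the Y-values of the
    training points that share a leaf with the query point x.  Each
    tree carries total mass 1/M, split evenly over its own points.
    """
    M = len(cell_samples)
    atoms = np.concatenate([np.asarray(ys, dtype=float) for ys in cell_samples])
    weights = np.concatenate(
        [np.full(len(ys), 1.0 / (M * len(ys))) for ys in cell_samples])
    # merge duplicated atoms so each support point gets its total weight
    support, inv = np.unique(atoms, return_inverse=True)
    merged = np.zeros(len(support))
    np.add.at(merged, inv, weights)
    return support, merged
```

For the two-tree example above, the returned weights sum to 1 and each observation's weight reflects how many cells it appears in.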
2.2 Splitting criteria and Wasserstein distances
Now, let us take a closer look at the splitting criteria that are maximized at each cell in the construction of the trees. For a cell $A$ that consists in a subset of data points of $\mathcal{D}_n$, an axis-aligned cut along the $j$th coordinate at position $z$ defines a partition of $A$. More precisely,

$A_L = \{ (X_i, Y_i) \in A : X_i^{(j)} < z \} \quad \text{and} \quad A_R = \{ (X_i, Y_i) \in A : X_i^{(j)} \ge z \}.$

With a slight abuse of notation, we write $Y_i \in A$ when $(X_i, Y_i) \in A$. Recall that Breiman’s original splitting criterion Breiman (2001) takes the following form:
(4) $L(j, z) = \frac{1}{N_A} \sum_{Y_i \in A} (Y_i - \bar{Y}_A)^2 - \frac{1}{N_A} \sum_{Y_i \in A_L} (Y_i - \bar{Y}_{A_L})^2 - \frac{1}{N_A} \sum_{Y_i \in A_R} (Y_i - \bar{Y}_{A_R})^2,$
where $\bar{Y}_A$ (resp. $\bar{Y}_{A_L}$, $\bar{Y}_{A_R}$) is the average of the $Y_i$ that fall into $A$ (resp. $A_L$, $A_R$), and $N_A$ (resp. $N_{A_L}$, $N_{A_R}$) is the cardinality of $A$ (resp. $A_L$, $A_R$). This criterion is maximized at each node of each tree over $z$ and the mtry randomly chosen coordinates (see, e.g., Section 2.2 of Biau and Scornet (2015)). The quantity $L(j, z)$ can be interpreted as the difference between the total variance and the intra-class variance within the subgroups defined by the split, which, thanks to the total variance decomposition, turns out to be the associated inter-class variance, i.e.,
(5) $L(j, z) = \frac{N_{A_L}}{N_A} (\bar{Y}_{A_L} - \bar{Y}_A)^2 + \frac{N_{A_R}}{N_A} (\bar{Y}_{A_R} - \bar{Y}_A)^2.$
Therefore, when seeking possible extensions, Breiman’s splitting criterion should be understood in a twofold manner: minimization of the intra-class variance or maximization of the inter-class variance induced by the split. Regardless of the choice of interpretation, since there is only a finite number of cuts to be evaluated at each iteration, a decision tree can therefore be built in a greedy manner. Without loss of generality, when bootstrap is involved (i.e., the subdataset is sampled with replacement), one may consider multisets/bags in order to deal with duplicated data in the formal definitions discussed above.
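The equivalence between the two readings of (4) and (5) can be checked numerically. The sketch below (the helper name is ours) computes the criterion in its intra-class form; the test verifies it against the inter-class form.

```python
import numpy as np

def breiman_criterion(y, mask_left):
    """Variance-reduction score L(j, z) of a candidate split, as in (4):
    total variance of the cell minus the size-weighted intra-class
    variances of the two children."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    yl, yr = y[mask_left], y[~mask_left]
    intra = (len(yl) * np.var(yl) + len(yr) * np.var(yr)) / n
    return np.var(y) - intra
```

By the total variance decomposition, this score equals the weighted squared deviations of the child means from the parent mean, which is exactly the inter-class form (5).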
Before proceeding further, we recall some basic properties of Wasserstein distances between empirical measures. If not mentioned otherwise, $p$ and $d$ denote respectively the dimension of the covariate $X$ and the dimension of the objective $Y$. For two empirical measures of the same size $n$ on $\mathbb{R}^d$, say $\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{a_i}$ and $\nu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{b_i}$, by considering the original form of the Monge problem (see, e.g., Section 2.2 of Peyré and Cuturi (2018)), the quadratic Wasserstein distance is defined by

$\mathcal{W}_2^2(\mu_n, \nu_n) = \min_{\sigma \in \mathcal{S}_n} \frac{1}{n} \sum_{i=1}^{n} \| a_i - b_{\sigma(i)} \|^2,$

where $\sigma$ denotes a permutation in $\mathcal{S}_n$, the collection of all permutations on $\{1, \dots, n\}$, and $\| \cdot \|$ is the Euclidean norm on $\mathbb{R}^d$. When $\mu_n$ and $\nu_n$ are probability measures on $\mathbb{R}$ (i.e., $d = 1$), it is easily checked that

(6) $\mathcal{W}_2^2(\mu_n, \nu_n) = \frac{1}{n} \sum_{i=1}^{n} \big( a_{(i)} - b_{(i)} \big)^2,$

where $a_{(i)}$ (resp. $b_{(i)}$) denotes the $i$th order statistic of the sample $(a_1, \dots, a_n)$ (resp. $(b_1, \dots, b_n)$).
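The order-statistics identity (6) can be verified against the brute-force Monge formulation on small samples (function names are ours; the brute force is only feasible for tiny $n$):

```python
import numpy as np
from itertools import permutations

def w2_equal(a, b):
    """Squared 2-Wasserstein distance between two same-size empirical
    measures on the real line: match order statistics, as in (6)."""
    return np.mean((np.sort(a) - np.sort(b)) ** 2)

def w2_bruteforce(a, b):
    """Monge formulation: minimize over all permutations (tiny n only)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return min(np.mean((a - b[list(p)]) ** 2)
               for p in permutations(range(len(b))))
```

On the real line, the monotone (sorted) coupling is optimal, so both functions agree.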
Intra-class interpretation
We focus on the representation of $L(j, z)$ given in (4). The key observation is encapsulated in the following proposition, which expresses Breiman’s splitting criterion in terms of Wasserstein distances between empirical measures.
Proposition 2.1.
Assume that $d = 1$, i.e., $Y \in \mathbb{R}$. Denote by $\mu_A$ (resp. $\mu_{A_L}$, $\mu_{A_R}$) the empirical measure associated with $A$ (resp. $A_L$, $A_R$). Then

$L(j, z) = \mathcal{W}_2^2\big( \mu_A, \delta_{\bar{Y}_A} \big) - \frac{N_{A_L}}{N_A} \mathcal{W}_2^2\big( \mu_{A_L}, \delta_{\bar{Y}_{A_L}} \big) - \frac{N_{A_R}}{N_A} \mathcal{W}_2^2\big( \mu_{A_R}, \delta_{\bar{Y}_{A_R}} \big).$
Proof.
First, in the same spirit as Kantorovich’s relaxation (see, e.g., Ambrosio and Gigli (2013); Peyré and Cuturi (2018)), we provide a permutation-based representation of the Wasserstein distance between empirical measures of potentially different sizes $n$ and $m$, say $\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{a_i}$ and $\nu_m = \frac{1}{m} \sum_{j=1}^{m} \delta_{b_j}$. Denote by $\lceil \cdot \rceil$ the ceiling function. Observe that, by exploiting the following alternative forms

(7) $\mu_n = \frac{1}{nm} \sum_{\ell=1}^{nm} \delta_{a_{\lceil \ell/m \rceil}} \quad \text{and} \quad \nu_m = \frac{1}{nm} \sum_{\ell=1}^{nm} \delta_{b_{\lceil \ell/n \rceil}},$

one has

$\mathcal{W}_2^2(\mu_n, \nu_m) = \min_{\sigma \in \mathcal{S}_{nm}} \frac{1}{nm} \sum_{\ell=1}^{nm} \big\| a_{\lceil \ell/m \rceil} - b_{\lceil \sigma(\ell)/n \rceil} \big\|^2.$

When $m = 1$, one obtains

(8) $\mathcal{W}_2^2(\mu_n, \delta_b) = \frac{1}{n} \sum_{i=1}^{n} \| a_i - b \|^2.$

In particular, when $b = \bar{a}_n := \frac{1}{n} \sum_{i=1}^{n} a_i$ and the sample is univariate, we have

$\mathcal{W}_2^2\big( \mu_n, \delta_{\bar{a}_n} \big) = \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a}_n)^2,$

that is, the empirical variance of the sample. Applying this identity to the three terms of (4) yields

(9) $L(j, z) = \mathcal{W}_2^2\big( \mu_A, \delta_{\bar{Y}_A} \big) - \frac{N_{A_L}}{N_A} \mathcal{W}_2^2\big( \mu_{A_L}, \delta_{\bar{Y}_{A_L}} \big) - \frac{N_{A_R}}{N_A} \mathcal{W}_2^2\big( \mu_{A_R}, \delta_{\bar{Y}_{A_R}} \big).$
Proposition 2.1 heuristically indicates that RF are well-adapted to estimate conditional distributions, by offering a new interpretation of Breiman’s splitting criterion in terms of intra-class quadratic Wasserstein distances between empirical measures. An important consequence of this result is that it allows a natural generalization of Breiman’s criterion to outputs with a dimension greater than 1. Indeed, the extension of RF to multivariate outputs is not straightforward, even in the context of conditional expectation estimation (e.g., Segal and Xiao (2011); Miller et al. (2014)). The dependence between the different coordinates of the objective is usually dealt with using additional tuning or supplementary prior knowledge. Such modeling is not necessary in our approach since dependencies in the vector features are captured by the Wasserstein distances. (Note however that some appropriate normalization should be considered when there are noticeable differences of scale between the coordinates of the objective.) Besides, this extension is also computationally efficient, as the complexity of the evaluation at each cell increases linearly w.r.t. the dimension $d$ of the objective $Y$. To see this, just note that, according to (8), standard calculations give

$\mathcal{W}_2^2\big( \mu_A, \delta_{\bar{Y}_A} \big) = \frac{1}{N_A} \sum_{Y_i \in A} \| Y_i - \bar{Y}_A \|^2 = \sum_{k=1}^{d} \frac{1}{N_A} \sum_{Y_i \in A} \big( Y_i^{(k)} - \bar{Y}_A^{(k)} \big)^2,$

where $Y_i^{(k)}$ denotes the $k$th coordinate of $Y_i$. In the sequel, to increase the clarity of our presentation, we use the notation $L_{\mathcal{W}_2}$ instead of $L$ for the criterion defined in Proposition 2.1, keeping in mind that this quantity is valid for any dimension $d$ of $Y$. It is interesting to note that $L_{\mathcal{W}_2}$ can be naturally extended to orders $q$ not necessarily equal to 2, by putting

$L_{\mathcal{W}_q}(j, z) = \mathcal{W}_q^q\big( \mu_A, \delta_{\bar{Y}_A} \big) - \frac{N_{A_L}}{N_A} \mathcal{W}_q^q\big( \mu_{A_L}, \delta_{\bar{Y}_{A_L}} \big) - \frac{N_{A_R}}{N_A} \mathcal{W}_q^q\big( \mu_{A_R}, \delta_{\bar{Y}_{A_R}} \big).$

However, finding an efficient implementation of $L_{\mathcal{W}_q}$ is not trivial in terms of computational complexity. This requires a further analysis, which is beyond the scope of the present article.
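The linear-in-$d$ cost comes from the coordinate-wise decomposition discussed above: the criterion is the sum of the per-coordinate variance reductions. A minimal sketch of this multivariate criterion (our own reading, not the reference implementation):

```python
import numpy as np

def multivariate_criterion(Y, mask_left):
    """Per-coordinate variance reduction summed over the d output
    coordinates; Y has shape (n, d), so the cost grows linearly in d."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    Yl, Yr = Y[mask_left], Y[~mask_left]
    total = np.var(Y, axis=0).sum()
    intra = (len(Yl) * np.var(Yl, axis=0).sum()
             + len(Yr) * np.var(Yr, axis=0).sum()) / n
    return total - intra
```

For $d = 1$ this reduces to Breiman's scalar criterion, and a $d$-dimensional objective with identical coordinates simply scales the score by $d$.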
Inter-class interpretation
Let us now give another look at the representation of $L(j, z)$ in the form of (5). Unlike the intra-class interpretation provided above, there is no free representation in this situation. However, by replacing the squared error between averages with the Wasserstein distance between empirical measures, in accordance with the goal of conditional distribution estimation, it is natural to consider the following splitting criterion:

$L^{\mathrm{inter}}(j, z) = \frac{N_{A_L}}{N_A} \mathcal{W}_2^2\big( \mu_{A_L}, \mu_A \big) + \frac{N_{A_R}}{N_A} \mathcal{W}_2^2\big( \mu_{A_R}, \mu_A \big).$

In the univariate case (i.e., $d = 1$), thanks to (6) and the same technique involving the alternative form (7) used in the proof of Proposition 2.1, it is easily checked that $L^{\mathrm{inter}}$ can be computed with complexity $O(N_A \log N_A)$ at each cell $A$. This rate can be achieved by considering a Quicksort algorithm in order to deal with the order statistics in (6). The implementation is tractable, although slightly worse than $O(N_A)$, the complexity of $L_{\mathcal{W}_2}$. However, the computation of the Wasserstein distance is not trivial when $d \ge 2$, where an exact computation is supercubic in the number of points (cf. Section 2 of Peyré and Cuturi (2018)). A possible relaxation is to consider an entropic regularized approximation such as the Sinkhorn distance (cf. Cuturi (2013); Genevay et al. (2019); Lin et al. (2019)), whose complexity is quadratic in the number of points, up to a factor depending on the tolerance $\varepsilon$. Nevertheless, since the number of evaluations of the criterion is enormous during the construction of RF, we only recommend using this variant of the splitting criterion for univariate problems at the moment. The details of efficient implementations and possible relaxations for multivariate cases will be left for future research.
Since the principle of minimizing (resp. maximizing) the intra-class (resp. inter-class) variance is a universal strategy beyond its implementation in the RF context, we hope that these Wasserstein-based interpretations may inspire a wider range of future applications. It is however now time to put our splitting analysis to good use and return to the HTE conditional distribution estimation problem.
3 Applications
Our primary interest is the improvement that WRF can bring to causal inference under the potential outcomes framework. To be clearer, let us recall that $X$ is a random variable on $\mathbb{R}^p$ and that $W$ denotes the treatment assignment indicator, i.e., a Bernoulli random variable that may depend on $X$. For now, the potential outcomes $Y(0)$ and $Y(1)$ are assumed to be univariate random variables; the extension to the multivariate case will be discussed a little later. During the observational study, the i.i.d. dataset $(X_i, W_i, Y_i)_{1 \le i \le n}$ is collected, where $Y_i$ is an abbreviation of $Y_i(W_i)$. Under the unconfoundedness assumption (2), our goal is to estimate the probability distribution $\mu_{(w)}(x)$ associated with the conditional marginal distribution $\mathcal{L}(Y(w) \mid X = x)$ for $w \in \{0, 1\}$, based on the dataset. To this aim, we take advantage of the WRF framework presented in Section 2 and discuss some possible ways to exploit the additional information that can be collected by the estimation of $\mu_{(0)}(x)$ and $\mu_{(1)}(x)$.
3.1 Wasserstein Random Forests for HTE
Before discussing the usage of these conditional marginal distribution estimations, we would like to stress again that we have no intention to “replace” the Average Treatment Effect-based causal inference strategy. On the contrary, our primary motivation is to provide a complementary tool so that a more transparent inference can be conducted, by maximizing the usage of the available data. More precisely, we train WRF respectively on the treatment and control groups (3), to estimate respectively the conditional measures $\mu_{(1)}(x)$ and $\mu_{(0)}(x)$. These estimations are denoted by $\hat{\mu}_{(1)}(x)$ and $\hat{\mu}_{(0)}(x)$.
First, since the potential outcomes are assumed to be univariate, a raw visualization of $\hat{\mu}_{(0)}(x)$ and $\hat{\mu}_{(1)}(x)$ is always accessible and informative. In this way, causality can be visualized through the change of shape of the marginal distributions. Next, following a philosophy similar to the CATE function, we propose to assess the changes in the conditional distribution in terms of Wasserstein distance using the criterion

$\Delta(x) := \mathcal{W}_2\big( \mu_{(0)}(x), \mu_{(1)}(x) \big).$

Intuitively speaking, $\Delta$ is more sensitive than the CATE function $\tau$, in the sense that $\Delta$ is capable of capturing certain causal effects that are less noticeable in terms of $\tau$. An estimation $\hat{\Delta}(x)$ can be obtained as a byproduct of the estimation of $\mu_{(0)}(x)$ and $\mu_{(1)}(x)$. Finally, regarding the multivariate output case, we would like to mention that when the cost of the treatment is also collected in the dataset, WRF can then be used, as we have seen in Subsection 2.2 and without further effort, to provide an estimation of the joint conditional distribution of outcome and cost, in order to conduct more complex inferences involving the costs and the treatment effects at the same time. The same idea also applies to the case where the treatment effects themselves are multivariate.
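Given the weighted empirical measures output by the two forests, the Wasserstein criterion above can be estimated by Monte Carlo: sample from each measure and match order statistics. The helper below is a sketch of ours, not the package API.

```python
import numpy as np

def delta_hat(atoms1, w1, atoms0, w0, n_mc=10_000, seed=0):
    """Monte-Carlo estimate of the W2 distance between the two weighted
    empirical measures (treatment vs. control) at a given point x.

    atoms*/w* : support points and weights of each forest's output.
    """
    rng = np.random.default_rng(seed)
    s1 = rng.choice(atoms1, size=n_mc, p=w1)   # sample from treated measure
    s0 = rng.choice(atoms0, size=n_mc, p=w0)   # sample from control measure
    # optimal univariate coupling: match sorted samples
    return np.sqrt(np.mean((np.sort(s1) - np.sort(s0)) ** 2))
```

For two Dirac measures at 0 and 3 the estimate is exactly 3, and more generally the Monte-Carlo error vanishes as `n_mc` grows.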
3.2 Experimental results
Since the conditional distribution is in general inaccessible in real-world datasets, we present here a brief simulation study based on synthetic data to illustrate the performance of WRF in the context of HTE.
A Python package is accessible at the following github repository:
We also provide in the Appendices a more complete numerical analysis with tuning suggestions and the influence of the propensity score function (i.e., the probability of treatment), along with a discussion of possible extensions. All the detailed algorithms can be found in Appendix D.
We consider a synthetic model in which, conditionally on the covariate $X$, the potential outcome $Y(0)$ is Gaussian while $Y(1)$ is a mixture of two Gaussians (the symbol $\mathcal{N}$ stands for the Gaussian distribution). To summarize, the conditional measure $\mu_{(0)}(x)$ is unimodal, while $\mu_{(1)}(x)$ is bimodal, composed of two independent Gaussians. The mixture parameter in $\mu_{(1)}(x)$ can be interpreted as an unconfounding factor that is not collected in the study. The four functions involved in the model have been designed to implement a complex dependence between the covariate and the potential outcomes. We note however that the CATE function takes a simple form.
We trained the models on a simulated dataset of size $n = 5000$, which is reasonably small considering the complexity of the conditional distribution estimation problem. The treatment and control groups are balanced due to the symmetric form of the propensity score, and the size of the training data for each group is therefore around 2500. An illustration for a given individual $x$ can be found in Figure 2 ((a)-(c) and (d)-(e) for the two WRF variants). This visualization highlights the good quality of the conditional inference performed by our WRF methods. More importantly, it stresses the pertinence of studying conditional distributions in the HTE context, since the CATE function, as is the case here, is not always capable of providing insights regarding causality. For example, according to the trained WRF model, the estimated $\hat{\Delta}(x)$ is close to its reference value and is a much more noticeable indicator of causality than the CATE function in this situation. We also note that we are able to approximate a joint conditional distribution given $X = x$, which illustrates the potential of our approach in the multivariate case (Figure 2 (c)). The results for the two considered settings are rather similar. A finer comparison based on the average Wasserstein distance is shown in Table 1, where each score denotes the average distance between the estimated and the true conditional measures (the latter approximated by empirical measures of size 10 000), tested on points randomly sampled in the covariate space.
Methods  

WRF  0.6325  0.7831  1.1511  1.7993 
WRF  0.6618  0.8317  1.1462  1.7752 
WRF  0.6493  0.7906  1.0757  1.6445 
4 Conclusion
We have proposed a new approach based on WRF that can help HTE inference through the estimation of some key conditional distributions. From a theoretical perspective, the challenge is to prove the consistency of WRF when the sample size tends to infinity, in the spirit of works such as Scornet et al. (2015); Wager (2014). For example, a first goal would be to show that, under appropriate assumptions, $\mathbb{E}\big[ \mathcal{W}_2^2\big( \hat{\mu}_{M,n}(x), \mu(x) \big) \big] \to 0$ as $n \to \infty$, where $\hat{\mu}_{M,n}(x)$ denotes the output of WRF and the expectation is taken w.r.t. both the randomness of the forest and the sample. A deeper understanding of the relations between the inter- and intra-class interpretations is also an interesting topic for future research.
Broader impact
With the introduction of Wasserstein Random Forests and their applications in the Heterogeneous Treatment Effects context, we hope the present paper will be useful in a twofold manner. First, for the causal inference community, it provides complementary tools that allow the design of more transparent inference strategies. Second, for the Random Forests community, it opens an interesting perspective that connects forests, Wasserstein distances, and conditional distribution estimation problems. We do not expect any specific negative outcomes from the broader impact of this article.
Acknowledgements
This work is funded by the Agence Nationale de la Recherche, under grant agreement no. ANR-18-CE36-0010-01. François Petit acknowledges the support of the Idex “Université de Paris 2019”.
Appendix A On the parameter tuning of WRF
We discuss in this section the influence of the choice of the parameters of WRF (i.e., mtry, $M$, $a_n$, and nodesize) and try to provide some suggestions on algorithm tuning. We stick to the model provided in Section 3.2 of the main text and compare the average Wasserstein distances respectively for $\mu_{(0)}$ and $\mu_{(1)}$ to illustrate the performance of our method in unimodal and multimodal situations. Unlike conditional expectation estimation, a cross-validation-based tuning strategy is not straightforward to implement for conditional distribution estimation. Indeed, we have only a single sample $Y_i$ at each point $X_i$, and it does not provide enough information about the conditional distribution. Therefore, we also track the performance of the associated conditional expectation estimations in terms of Mean Squared Error (MSE). The conditional expectations of $Y(0)$ and $Y(1)$ given $X = x$ are denoted respectively by $m_0(x)$ and $m_1(x)$. Our goal is to illustrate whether the tuning for the conditional expectation can be exploited to guide the tuning for the conditional distribution estimation problem. We also note that, since each tree is constructed using only part of the data, the out-of-bag errors of the forest can be obtained by averaging the empirical error of each tree on the unused subdataset (see, e.g., Section 2.4 of Biau and Scornet (2015)) when an independent test dataset is not available.
First, it is well known that in the classical RF context the number of trees $M$ should be taken as large as possible, according to the available computing budget, in order to reduce the variance of the forest. Although the goal in the WRF framework has changed to conditional distribution estimation, it is still suggested to use a large $M$ if possible.
Second, let us investigate mtry, the number of directions to be explored at each cell. The result is illustrated in Figure 3 ((a)-(d) for the average Wasserstein loss and (e)-(f) for the MSE of the conditional expectation estimation). Roughly speaking, the value of mtry reflects the strength of the greedy optimization at each cell during the construction of the decision trees. A conservative approach is to choose mtry as large as possible according to the available computing resources.
Then, let us see the influence brought by the change of nodesize. The illustration can be found in Figure 4 ((a)-(d) for the average Wasserstein loss and (e)-(f) for the MSE of the conditional expectation estimation). In the classical RF context, this choice can be interpreted as introducing a local averaging procedure at each cell in order to deal with the variance or noise of the sample. Here, as discussed in the main text, we are interested in conditional distribution estimation in the HTE context, where the variance or other fluctuations of the conditional distribution are part of the information to be estimated. Hence, the interpretation of the choice should be adapted accordingly, as the minimum sample size that is used to describe the conditional distribution at each cell. This interpretation is better suited when it comes to the estimation of multimodal conditional distributions. As shown in Figure 4 (a)-(d), there are some optimal intermediate choices of nodesize. In simple cases, such as the estimation of $\mu_{(0)}$ (unimodal), the MSE of the associated conditional expectation (Figure 4 (e)) can be used, accordingly, to tune the algorithm for conditional distribution estimation. However, in more complex cases such as the estimation of $\mu_{(1)}$ (bimodal), the MSE of the conditional expectation estimation is not as stable (Figure 4 (f)). Nevertheless, it is also recommended to use a small nodesize in this situation as a conservative choice.
Finally, we discuss the size $a_n$ of the subdataset used to construct each decision tree. Note that the choice of $a_n$ is still not well understood, even in the classical RF context (see, e.g., Biau and Scornet (2015); Scornet et al. (2015)). When the computing budget allows one to implement $a_n = n$ (with replacement, which corresponds to the classical bootstrap), we recommend this choice. Otherwise, we recommend fixing $a_n$ between one fifth and one third of the whole data size in order to maintain a reasonably good performance without heavy computations.
Suggestions on the parameter tuning
The take-home message for the parameter tuning of WRF is simple: we recommend using large $M$ and mtry according to the available computing resources. The parameter nodesize can be tuned via a cross-validation-based strategy using the MSE of the associated conditional expectation estimation. In addition, we suggest choosing a smaller nodesize when there is an abnormal fluctuation of the MSE score. It is also proposed to use the classical bootstrap (i.e., $a_n = n$ with replacement) when possible. Otherwise, we suggest fixing a smaller $a_n$ according to the computing budget. Finally, although there is no theoretical guarantee, we advocate the variant of WRF based on the inter-class criterion for univariate objectives, since it has a better overall accuracy with a reasonable additional computational cost.
Appendix B On the propensity score function
The propensity score function $e(x) := \mathbb{P}(W = 1 \mid X = x)$ measures the probability that the treatment is assigned to a certain individual, which basically determines the distribution of the data available for the estimation of $\mu_{(0)}$ and $\mu_{(1)}$ in the population. More precisely, imagine that $x$ is an individual such that, in the neighbourhood of $x$, the value of $e$ is close to 0. Then it is expected that only very few training data for the estimation of $\mu_{(1)}(x)$ can be collected during the observational study. As a consequence, the estimation at such a point is expected to be of rather poor quality. For example, returning to the model provided in Section 3.2 of the main text, the propensity score function admits an explicit form, and one can exhibit an individual $x$ at which it is extreme. As shown in Figure 6, the estimation of one conditional measure at such a point is very accurate (see Figure 6 (a)-(b)), while the estimation of the other is of poor quality (see Figure 6 (c)-(d)).
From a theoretical perspective, one may suppose that the propensity score function is bounded away from 0 and 1 uniformly for all $x$ (see, e.g., Künzel et al. (2019); Nie and Wager (2017)). However, it is unfortunately not possible to control the propensity score during an observational study. As a consequence, it is usually very difficult to verify such an assumption in practice.
Therefore, a more meaningful question is how to detect whether our estimation is reliable for a certain individual. A straightforward strategy is to estimate the propensity score function independently, as done for example in Athey and Wager (2019), and to test whether the value of this score is close to 0 or 1. Another approach is to exploit the information encoded in the splits/weights of the forest to detect whether enough data has been collected for the prediction at the target individual. The details are left for future research.
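The first strategy can be sketched with any calibrated classifier; below, a plain logistic regression fitted by gradient ascent serves as a minimal stand-in (helper names and thresholds are ours, not a prescribed procedure):

```python
import numpy as np

def fit_propensity(X, W, lr=0.1, n_iter=2000):
    """Logistic regression e(x) = P(W = 1 | X = x) fitted by gradient
    ascent on the log-likelihood; returns a callable score function."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    W = np.asarray(W, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (W - p) / len(W)   # log-likelihood gradient
    return lambda x: 1.0 / (1.0 + np.exp(-(beta[0] + np.dot(beta[1:], x))))

def flag_unreliable(e_hat, lo=0.05, hi=0.95):
    """Flag individuals whose estimated propensity is too extreme."""
    e_hat = np.asarray(e_hat, dtype=float)
    return (e_hat < lo) | (e_hat > hi)
```

Individuals flagged this way are those for which one of the two marginal estimations is expected to be unreliable.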
Finally, let us mention that if the goal is to estimate the function $\Delta$ defined in Section 3.1 of the main text, we expect that more dedicated variants of WRF can be constructed, in the same spirit as the Causal Forests introduced in Athey and Wager (2019).
Appendix C Possible extensions
In this section, we discuss two natural extensions of WRF that we did not investigate in detail.
First, inspired by the Random Rotation Ensembles introduced in Blaser and Fryzlewicz (2016), it is natural to consider the implementation of oblique splits, i.e., splits that are not necessarily axis-aligned. More precisely, for each tree, by sampling a uniformly distributed rotation matrix (e.g., Section 3 of Blaser and Fryzlewicz (2016)), we are able to construct the decision tree using the rotated subdataset (or, equivalently, one can implement randomly rotated cuts in the tree’s construction). Intuitively speaking, the rotation variants of WRF should be more consistent when it comes to performance, while additional computing resources are required for both training and prediction.
Another direction is to replace the Dirac mass in the empirical measures by some kernel $K$, as proposed in Pospisil and Lee (2019). For instance, the WRF splitting criteria can be modified by replacing each Dirac mass $\delta_{Y_i}$ with $K(Y_i, \cdot)$, where the kernel $K$ is chosen according to prior knowledge of the problem. At the same time, the final prediction is replaced by

$\tilde{\mu}_{M,n}(x) = \sum_{i=1}^{n} w_i(x) \, K(Y_i, \cdot),$

where $w_i(x)$ remains the same as defined in Section 2.1 of the main text. When the associated distance is easy to compute, we expect that this extension will be more accurate for small datasets. Nevertheless, the performances of these natural extensions remain unclear. The details are therefore left for future research.
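One simple way to realize such a kernel-smoothed prediction is to sample from the smoothed measure directly; the sketch below assumes a Gaussian kernel (our choice for illustration, not a prescription):

```python
import numpy as np

def smoothed_sample(atoms, weights, bandwidth, size=1000, rng=None):
    """Draw from the kernel-smoothed forest output: a Gaussian kernel
    of width `bandwidth` replaces each Dirac atom of the weighted
    empirical measure (atoms, weights)."""
    rng = np.random.default_rng(rng)
    centers = rng.choice(atoms, size=size, p=weights)  # pick atoms by weight
    return centers + bandwidth * rng.standard_normal(size)
```

The resulting sample has the weighted mean of the atoms, with the kernel bandwidth added to the spread, so the bandwidth should be tuned jointly with nodesize.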
Appendix D Algorithms
The detailed algorithms for the WRF used in the main text are provided below.