1 Introduction
Deep reinforcement learning (RL) algorithms have fueled many of the most publicized achievements in modern machine learning
(Silver et al., 2017; OpenAI, 2018; Abbeel & Schulman, 2016; Mnih et al., 2013). However, despite these accomplishments, deep RL methods are still not nearly as reliable as their (deep) supervised learning counterparts. Indeed, recent research has found existing deep RL methods to be brittle
(Henderson et al., 2017; Zhang et al., 2018), hard to reproduce (Henderson et al., 2017; Tucker et al., 2018), unreliable across runs (Henderson et al., 2017, 2018), and sometimes outperformed by simple baselines (Mania et al., 2018). The prevalence of these issues points to a broader problem: we do not understand how the parts comprising deep RL algorithms impact agent training, either separately or as a whole. This unsatisfactory understanding suggests that we should reevaluate the inner workings of our algorithms. Indeed, the overall question motivating our work is: how do the multitude of mechanisms used in deep RL training algorithms impact agent behavior?
Our contributions.
We analyze the underpinnings of agent behavior—both through the traditional metric of cumulative reward, and by measuring more fine-grained algorithmic properties. As a first step, we conduct a case study of two of the most popular deep policy-gradient methods: Trust Region Policy Optimization (TRPO) (Schulman et al., 2015a) and Proximal Policy Optimization (PPO) (Schulman et al., 2017). These two methods are closely related: PPO was originally developed as a refinement of TRPO.
We find that much of the observed improvement in reward brought by PPO may come from seemingly small modifications to the core algorithm, which we call code-level optimizations. These optimizations are either found only in implementations of PPO, or are described as auxiliary details and are not present in the corresponding TRPO baselines.^1

^1 Note that these code-level optimizations are separate from "implementation choices" like the choice of PyTorch versus TensorFlow, in that they intentionally change the training algorithm's operation.

We pinpoint these modifications and perform an ablation study demonstrating that they are instrumental to PPO's performance. This observation prompts us to study how code-level optimizations change agent training dynamics, and whether we can truly think of these optimizations as merely auxiliary improvements. Our results indicate that these optimizations fundamentally change algorithms' operation, and go even beyond improvements in agent reward. We find that they significantly impact a key algorithmic principle behind TRPO and PPO's operation: trust region enforcement.
Ultimately, we discover that the PPO code-level optimizations are more important in terms of final reward achieved than the choice of general training algorithm (TRPO vs. PPO). This result stands in stark contrast to the previous view that the central PPO clipping mechanism drives the gains seen in Schulman et al. (2017). In demonstrating this, we show that the algorithmic changes imposed by such optimizations make rigorous comparisons of algorithms difficult. Without a rigorous understanding of the full impact of code-level optimizations, we cannot hope to gain reliable insight from comparing algorithms on benchmark tasks.
Our results emphasize the importance of building RL methods in a modular manner. To progress towards more performant and reliable algorithms, we need to understand each component’s impact on agents’ behavior and performance—both individually, and as part of a whole.
Code for all the results shown in this work is available at https://github.com/MadryLab/implementation-matters.
2 Related Work
The idea of using gradient estimates to update neural network–based RL agents dates back at least to the work of
Williams (1992), who proposed the REINFORCE algorithm. Later, Sutton et al. (1999) established a unifying framework that casts these algorithms as instances of the policy gradient method.

Our work focuses on proximal policy optimization (PPO) (Schulman et al., 2017) and trust region policy optimization (TRPO) (Schulman et al., 2015a), two of the most prominent policy gradient algorithms used in deep RL. Much of the original inspiration for the use of trust regions stems from the conservative policy update of Kakade (2001). This policy update, similarly to TRPO, uses a natural-gradient-based greedy policy update. TRPO also bears similarity to the relative entropy policy search method of Peters et al. (2010), which constrains the distance between marginal action distributions (whereas TRPO constrains the conditionals of such action distributions).
Notably, Henderson et al. (2017) point out a number of brittleness, reproducibility, and experimental-practice issues in deep RL algorithms. Importantly, we build on their observation that the final reward achieved by a given algorithm depends heavily on the code base used. Rajeswaran et al. (2017) and Mania et al. (2018) also demonstrate that on many benchmark tasks, the performance of PPO and TRPO can be matched by fairly elementary randomized search approaches. Additionally, Tucker et al. (2018) showed that one of the recently proposed extensions of the policy gradient framework, namely the use of baseline functions that are action-dependent in addition to being state-dependent, might not lead to better policies after all.
3 Attributing Success in Proximal Policy Optimization
Our overarching goal is to better understand the underpinnings of the behavior of deep policy gradient methods. We thus perform a careful study of two tightly linked algorithms: TRPO and PPO (recall that PPO is motivated as TRPO with a different trust region enforcement mechanism). To better understand these methods, we start by thoroughly investigating their implementations in practice. We find that in comparison to TRPO, the PPO implementation contains many non-trivial optimizations that are not (or only barely) described in its corresponding paper. Indeed, the standard implementation of PPO^2 contains the following additional optimizations:

^2 From the OpenAI baselines GitHub repository: https://github.com/openai/baselines

Value function clipping: Schulman et al. (2017) originally suggest fitting the value network via regression to target values:

$$L^{V} = \left(V_{\theta_t} - V_{\text{targ}}\right)^2,$$

but the standard implementation instead fits the value network with a PPO-like objective:

$$L^{V} = \max\left[\left(V_{\theta_t} - V_{\text{targ}}\right)^2,\;\left(\mathrm{clip}\left(V_{\theta_t},\, V_{\theta_{t-1}} - \varepsilon,\, V_{\theta_{t-1}} + \varepsilon\right) - V_{\text{targ}}\right)^2\right],$$

where $V_{\theta_t}$ is clipped around the previous value estimates, and $\varepsilon$ is fixed to the same value used to clip probability ratios in the PPO loss function (cf. Eq. (2) in Section 4).
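As a concrete illustration, the clipped value loss can be sketched for a single sample as follows. This is a minimal plain-Python sketch: the function name and scalar interface are ours, and taking the elementwise max of the clipped and unclipped squared errors follows the baselines implementation (a pessimistic combination, since the loss is minimized).

```python
def clipped_value_loss(v_new, v_old, v_target, eps):
    """Hypothetical sketch of the PPO-like clipped value loss for one
    sample: max of the unclipped and clipped squared errors."""
    unclipped = (v_new - v_target) ** 2
    # Clip the new value prediction to stay within eps of the old one.
    v_clipped = max(v_old - eps, min(v_old + eps, v_new))
    clipped = (v_clipped - v_target) ** 2
    return max(unclipped, clipped)
```

When the new prediction moves more than `eps` past the old one, the clipped term dominates, so the loss no longer rewards moving the prediction further toward the target.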
Reward scaling:
Rather than feeding the rewards directly from the environment into the objective, the PPO implementation applies a discount-based scaling scheme. In this scheme, the rewards are divided by the standard deviation of a rolling discounted sum of the rewards (without subtracting and re-adding the mean)—see Algorithm 1 in Appendix A.2.
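This scheme can be sketched as follows. The class and method names are illustrative, and the running variance is tracked with a Welford-style accumulator (an implementation detail we assume, not something specified in the text).

```python
import math

class RewardScaler:
    """Sketch of discount-based reward scaling: divide each reward by
    the running standard deviation of a rolling discounted return
    (note: no mean subtraction is applied to the reward itself)."""
    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                             # rolling discounted return
        self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford accumulators

    def scale(self, reward):
        self.ret = self.gamma * self.ret + reward
        # Update the running variance of the discounted return.
        self.n += 1
        delta = self.ret - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (self.ret - self.mean)
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0
        return reward / (std + self.eps)
```

In practice one such scaler is kept per environment and updated at every step.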
Orthogonal initialization and layer scaling: Instead of using the default weight initialization scheme for the policy and value networks, the implementation uses an orthogonal initialization scheme with scaling that varies from layer to layer.
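The initialization scheme can be sketched as follows. The QR-based construction is the standard way to draw an orthogonal matrix; the specific per-layer gains (e.g., a larger gain for hidden layers and a small gain for the policy output layer) are conventions of the reference implementation, and the function name is ours.

```python
import numpy as np

def orthogonal_init(shape, gain=1.0, rng=None):
    """Sketch of orthogonal weight initialization: draw a Gaussian
    matrix, orthogonalize it via QR, and scale by a per-layer gain."""
    rng = np.random.default_rng(rng)
    a = rng.standard_normal(shape)
    # QR-decompose the tall orientation so columns are orthonormal.
    q, r = np.linalg.qr(a if shape[0] >= shape[1] else a.T)
    q = q * np.sign(np.diag(r))  # fix signs to make the result unique
    if shape[0] < shape[1]:
        q = q.T
    return gain * q[: shape[0], : shape[1]]
```

A layer's weight matrix is then initialized as, e.g., `orthogonal_init((n_out, n_in), gain=...)` with a gain chosen per layer.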

Adam learning rate annealing: Depending on the task, the implementation sometimes anneals the learning rate of Adam (Kingma & Ba, 2014) (an already adaptive method) for optimization.

Reward clipping: The implementation also clips the rewards within a preset range.

Observation clipping: Analogously to rewards, the observations are also clipped within a preset range.

Hyperbolic tangent activations: As observed by Henderson et al. (2017), implementations of policy gradient algorithms also use hyperbolic tangent activations between layers in the policy and value networks.

Global gradient clipping: After computing the gradient with respect to the policy and the value networks, the implementation clips the gradients such that the "global norm" (i.e., the norm of the concatenated gradients of all parameters) does not exceed a preset threshold.
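Global-norm clipping can be sketched in plain Python as follows (parameter gradients are represented as lists of floats here; real implementations operate on framework tensors, and the function name is ours).

```python
import math

def clip_global_norm(grads, max_norm):
    """Sketch of global gradient clipping: rescale all gradients by a
    single factor so that the norm of their concatenation does not
    exceed max_norm (the threshold is a hyperparameter)."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return grads
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads]
```

Because a single scale factor is applied to every parameter group, the direction of the overall update is preserved; only its magnitude is capped.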
[Figure 1 caption, excerpt] …agents), and plot histograms in which agents are partitioned based on whether each optimization is on or off. Our results show that reward normalization, Adam annealing, and network initialization each significantly impact the reward landscape with respect to hyperparameters, and were necessary for attaining the highest PPO reward within the tested hyperparameter grid. We detail our experimental setup in Appendix A.1.

These optimizations may appear to be merely surface-level or insignificant algorithmic changes to the core policy gradient method at hand. However, we find that they dramatically impact the performance of PPO. Specifically, we perform a full ablation study on the four optimizations mentioned above.^3 Figure 1 shows a histogram of the final rewards of agents trained with every possible configuration of these optimizations: for each configuration, a grid search for the optimal learning rate is performed, and we measure the reward of randomly initialized agents trained using the identified learning rate. Our findings suggest that many code-level optimizations are necessary for PPO to attain its claimed performance.

^3 Due to restrictions on computational resources, we could only perform a full ablation on the first four of the identified optimizations.
The above findings show that our ability to understand PPO from an algorithmic perspective hinges on our ability to distill its fundamental principles from such algorithm-independent optimizations (algorithm-independent in the sense that these optimizations can be implemented for any policy gradient method). We thus consider a variant of PPO called PPO-Minimal (PPO-M), which implements only the core of the algorithm: PPO-M uses the standard value network loss, no reward scaling, the default network initialization, and Adam with a fixed learning rate. Importantly, PPO-M ignores all the code-level optimizations listed at the beginning of Section 3. We explore PPO-M alongside PPO and TRPO. We list all the algorithms we study and their defining properties in Table 1.
Table 1: The algorithms studied and their defining properties.

| Algorithm | Section | Step method | Uses PPO clipping? | Uses PPO optimizations? |
| --- | --- | --- | --- | --- |
| PPO | — | PPO | ✓ | As in (Dhariwal et al., 2017) |
| PPO-M | Sec. 3 | PPO | ✓ | ✗ |
| PPO-NoClip | Sec. 4 | PPO | ✗ | Found via grid search |
| TRPO | — | TRPO | — | ✗ |
| TRPO+ | Sec. 5 | TRPO | — | Found via grid search |
Overall, our results on the importance of these optimizations both corroborate earlier findings on the brittleness of deep policy gradient methods, and demonstrate that even beyond environmental brittleness, the algorithms themselves exhibit high sensitivity to implementation choices.^4

^4 This might also explain the differences between code bases observed in Henderson et al. (2017).
4 Code-Level Optimizations Have Algorithmic Effects
The seemingly disproportionate effect of code-level optimizations identified in our ablation study leads us to ask: how do these seemingly superficial optimizations impact underlying agent behavior? In this section, we demonstrate that code-level optimizations fundamentally alter agent behavior. Rather than merely improving ultimate cumulative reward, such optimizations directly impact the principles motivating the core algorithms.
Trust Region Optimization.
A key property of policy gradient algorithms is that update steps computed at any specific policy $\pi_\theta$ are only guaranteed to be predictive in a neighborhood of $\pi_\theta$. Thus, to ensure that the update steps we derive remain predictive, many policy gradient algorithms ensure that these steps stay in the vicinity of the current policy. The resulting "trust region" methods (Kakade, 2001; Schulman et al., 2015a, 2017) try to constrain the local variation of the parameters in policy space by restricting the distributional distance between successive policies.
A popular method in this class is trust region policy optimization (TRPO; Schulman et al., 2015a). TRPO constrains the KL divergence between successive policies on the optimization trajectory, leading to the following optimization problem:

$$\max_{\theta} \;\; \mathbb{E}_{(s,a) \sim \pi_{\theta_{\text{old}}}}\!\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, \hat{A}_{\pi_{\theta_{\text{old}}}}(s, a) \right] \quad \text{s.t.} \quad D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\big\|\, \pi_{\theta}(\cdot \mid s) \right) \le \delta \;\;\text{for all } s. \tag{1}$$
In practice, we maximize this objective using a second-order approximation of the KL divergence and natural gradient descent, and replace the worst-case KL constraint over all possible states with an approximation of the mean KL based on the states observed in the current trajectory.
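To make the last approximation concrete, here is a toy sketch of the mean-KL estimate over visited states, assuming a discrete action space for simplicity; the function names and the categorical representation of policies are our illustration.

```python
import math

def mean_kl(policy_old, policy_new, states):
    """Sketch of the practical TRPO constraint: average
    D_KL(pi_old(.|s) || pi_new(.|s)) over the states visited in the
    current trajectory. A policy maps a state to a list of action
    probabilities (discrete actions, for illustration)."""
    def kl(p, q):
        # KL divergence between two categorical distributions.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return sum(kl(policy_old(s), policy_new(s)) for s in states) / len(states)
```

Continuous-action implementations compute the same quantity in closed form for, e.g., diagonal Gaussian policies instead of summing over actions.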
Proximal policy optimization.
One disadvantage of TRPO is that it can be computationally costly: the step direction is estimated with nonlinear conjugate gradients, which requires the computation of multiple Hessian-vector products. To address this issue, Schulman et al. (2017) propose proximal policy optimization (PPO), which tries to enforce a trust region with a different objective that does not require computing a projection. Concretely, PPO replaces the KL-constrained objective (1) of TRPO by clipping the objective function directly:

$$\max_{\theta} \;\; \mathbb{E}_{(s_t, a_t) \sim \pi_{\theta_{\text{old}}}}\!\left[ \min\!\left( r_t(\theta)\, \hat{A}_t,\;\; \mathrm{clip}\!\left( r_t(\theta),\, 1 - \varepsilon,\, 1 + \varepsilon \right) \hat{A}_t \right) \right] \tag{2}$$

where

$$r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}. \tag{3}$$

Note that this objective can be optimized without an explicit projection step, leading to a simpler parameter update during training. In addition to its simplicity, PPO is intended to be faster and more sample-efficient than TRPO (Schulman et al., 2017).
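Eq. (2) translates almost directly into code; a minimal sketch over per-sample log-probabilities (the function name and interface are ours):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Minimal sketch of the clipped PPO surrogate (Eq. 2): average of
    min(r * A, clip(r, 1 - eps, 1 + eps) * A) over samples, with the
    probability ratio r computed from per-sample log-probabilities."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        r = math.exp(ln - lo)                       # Eq. (3)
        r_clip = max(1.0 - eps, min(1.0 + eps, r))  # clipped ratio
        total += min(r * adv, r_clip * adv)
    return total / len(advantages)
```

Note that the min makes the objective pessimistic: clipping only ever removes incentive to move the ratio further from 1, it never adds incentive.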
Trust regions in TRPO and PPO.
Enforcing a trust region is a core algorithmic property of different policy gradient methods. However, whether or not a trust region is enforced is not directly observable from the final rewards. So, how does this algorithmic property vary across state-of-the-art policy gradient methods?
In Figure 2 we measure the mean KL divergence between successive policies in a training run of both TRPO and PPO-M (PPO without code-level optimizations). Recall that TRPO is designed specifically to constrain this KL-based trust region, while the clipping mechanism of PPO attempts to approximate it. Indeed, we find that TRPO precisely enforces this trust region (this is unsurprising, and holds nearly by construction).
We thus turn our attention to the trust regions induced by training with PPO and PPO-M. First, we consider mathematically the contribution of a single state-action pair to the gradient of the PPO objective, which is given by

$$\nabla_{\theta} \mathcal{L}^{PPO}(s, a) = \nabla_{\theta} \mathcal{L}(s, a) \cdot \mathbb{1}\!\left[ \mathcal{L}(s, a) \le \mathcal{L}^{clip}(s, a) \right],$$

where $\mathcal{L}$ and $\mathcal{L}^{clip}$ are respectively the standard and clipped versions of the surrogate objective. As a result, since we initialize $\pi_\theta$ as $\pi_{\theta_{\text{old}}}$ (and thus all the ratios start equal to one), the first step we take is identical to a maximization step over the unclipped surrogate objective. It thus stands to reason that the nature of the trust region enforced is heavily dependent on the method with which the clipped PPO objective is optimized, rather than on the objective itself. Therefore, the size of the step we take is determined solely by the steepness of the surrogate landscape (i.e., the Lipschitz constant of the optimization problem we solve), and we can end up moving arbitrarily far from the trust region. We hypothesize that this dependence of PPO on properties of the optimizer rather than of the optimization objective contributes to the brittleness of the algorithm to hyperparameters such as learning rate and momentum, as observed by Henderson et al. (2018) and others.
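A toy one-parameter example (our construction, not from the paper) makes this concrete: the clipped objective itself places no bound on the final ratio, which instead depends on the optimizer's step size.

```python
import math

def final_ratio(lr, steps=50, eps=0.2, adv=1.0):
    """Toy illustration: a single-parameter 'policy' whose probability
    ratio is exp(theta), with theta_old = 0. Gradient ascent on the
    clipped surrogate min(r*A, clip(r, 1-eps, 1+eps)*A) yields zero
    gradient once r exceeds 1+eps (for A > 0), but a single large step
    can still jump far past the clip boundary."""
    theta = 0.0
    for _ in range(steps):
        r = math.exp(theta)
        # d/dtheta of min(r*A, clip(r)*A): zero once the ratio is clipped.
        grad = r * adv if r <= 1.0 + eps else 0.0
        theta += lr * grad
    return math.exp(theta)
```

With a small learning rate the final ratio lands just past 1 + eps, but with a large learning rate it overshoots well beyond it: the trust region that emerges is a property of the optimizer, not of the objective.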
The results we observe (shown in Figure 2) corroborate this intuition. For agents trained with optimal parameters, all three algorithms maintain a KL-based trust region. However, all three algorithms fail to maintain a ratio-based trust region, despite PPO and PPO-M being trained directly with a ratio-clipping objective. Furthermore, the nature of the KL trust region enforced differs between PPO and PPO-M, despite the core algorithm remaining constant between the two methods: while the PPO-M KL trends upward as the number of iterations increases, the PPO KL peaks halfway through training before trending down again.
The findings from this experiment and the corresponding calculations demonstrate that a key factor in the behavior of PPO-trained agents, even from an algorithmic viewpoint, may be the auxiliary optimizations rather than the core methodology.
5 Identifying Roots of Algorithmic Progress
State-of-the-art deep policy gradient methods comprise many interacting components. At what is generally described as their core, these methods incorporate mechanisms like trust-region-enforcing steps, time-dependent value predictors, and advantage estimation methods for controlling the exploitation/exploration tradeoff (Schulman et al., 2015b). However, these algorithms also incorporate many less-often-discussed optimizations (cf. Section 3) that ultimately dictate much of agent behavior (cf. Section 4). Given the need to improve on these algorithms, the fact that such optimizations are so important raises the question: how do we identify the true roots of algorithmic progress in deep policy gradient methods?
Unfortunately, answering this question is not easy. Returning to our study of PPO and TRPO, it is widely believed (and claimed) that the key innovation of PPO responsible for its improved performance over the TRPO baseline is the ratio clipping mechanism discussed in Section 4. However, we have already shown that this clipping mechanism is theoretically insufficient to maintain a trust region, and also that the method by which the objective is optimized appears to have a significant effect on the resulting trust region. If code-level optimizations are thus partially responsible for algorithmic properties of PPO, is it possible that they are also a key factor in PPO's improved performance?
To address this question, we set out to further disentangle the impact of PPO's core clipping mechanism from its code-level optimizations by once again considering variations on the PPO and TRPO algorithms. Specifically, we examine how employing the core PPO and TRPO steps changes model performance while controlling for the effect of the code-level optimizations identified in standard implementations of PPO (in particular, we focus on those covered in Section 3). These code-level optimizations are largely algorithm-independent, and so they can be straightforwardly applied or lightly adapted to any policy gradient method. The previously introduced PPO-M algorithm corresponds to PPO without these optimizations. To further account for their effects, we study an additional algorithm, which we denote TRPO+, consisting of the core algorithmic contribution of TRPO in combination with PPO's code-level optimizations as identified in Section 3.^5 We note that TRPO+ together with the other three algorithms introduced (PPO, PPO-M, and TRPO; all listed in Table 1) now captures all combinations of core algorithms and code-level optimizations, allowing us to study the impact of each in a fine-grained manner.

^5 We also add a new code-level optimization, a KL decay, inapplicable to PPO but meant to serve as the analog of Adam learning rate annealing.
As our results in Table 2 show, it turns out that code-level optimizations often contribute significantly more to algorithms' increased performance than the choice of algorithm (i.e., using PPO vs. TRPO). For example, on Hopper-v2, PPO and TRPO see 17% and 21% improvements (respectively) when equipped with code-level optimizations. At the same time, for all tasks, after fixing the choice to use or not use optimizations, the core algorithm employed does not seem to have a significant impact on reward. In Table 2, we quantify this contrast through the following two metrics, which we denote average algorithmic improvement (AAI) and average code-level improvement (ACLI):

$$\text{AAI} = \max\left\{ \left| R_{\text{PPO}} - R_{\text{TRPO+}} \right|,\; \left| R_{\text{PPO-M}} - R_{\text{TRPO}} \right| \right\}, \qquad \text{ACLI} = \max\left\{ \left| R_{\text{PPO}} - R_{\text{PPO-M}} \right|,\; \left| R_{\text{TRPO+}} - R_{\text{TRPO}} \right| \right\},$$

where $R_A$ denotes the mean final reward attained by algorithm $A$ on the task. In short, AAI measures the maximal effect of switching step algorithms, whereas ACLI measures the maximal effect of adding code-level optimizations for a fixed choice of step algorithm.
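Using the rounded means from Table 2, the two metrics can be computed as follows. The dictionary layout is ours; small discrepancies with the table (e.g., 423 vs. 421 for Hopper-v2 ACLI) come from the table's rounding of the underlying means.

```python
# Mean final rewards from Table 2 (rounded).
rewards = {
    "Walker2d-v2": {"PPO": 3292, "PPO-M": 2735, "TRPO": 2791, "TRPO+": 3050},
    "Hopper-v2":   {"PPO": 2513, "PPO-M": 2142, "TRPO": 2043, "TRPO+": 2466},
    "Humanoid-v2": {"PPO": 806,  "PPO-M": 674,  "TRPO": 586,  "TRPO+": 1030},
}

def aai(r):
    """Max effect of switching the step algorithm, holding the use of
    code-level optimizations fixed."""
    return max(abs(r["PPO"] - r["TRPO+"]), abs(r["PPO-M"] - r["TRPO"]))

def acli(r):
    """Max effect of toggling code-level optimizations, holding the
    step algorithm fixed."""
    return max(abs(r["PPO"] - r["PPO-M"]), abs(r["TRPO+"] - r["TRPO"]))
```

On every task, ACLI exceeds AAI, reflecting that toggling the optimizations moves reward more than swapping the step algorithm does.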
Table 2 (caption, excerpt; rows are step methods, columns are MuJoCo tasks): …We train at least 80 agents for each estimate (more for some high-variance cases). We present 95% confidence intervals computed via a 1000-sample bootstrap. We also present the AAI and ACLI metrics discussed in Section 5, which attempt to quantify the relative contribution of algorithmic choice vs. use of code-level optimizations, respectively.

| Step | Walker2d-v2 | Hopper-v2 | Humanoid-v2 |
| --- | --- | --- | --- |
| PPO | 3292 [3157, 3426] | 2513 [2391, 2632] | 806 [785, 827] |
| PPO-M | 2735 [2602, 2866] | 2142 [2008, 2279] | 674 [656, 695] |
| TRPO | 2791 [2709, 2873] | 2043 [1948, 2136] | 586 [576, 596] |
| TRPO+ | 3050 [2976, 3126] | 2466 [2381, 2549] | 1030 [979, 1083] |
| AAI | 242 | 99 | 224 |
| ACLI | 557 | 421 | 444 |

PPO without clipping.
Given the relative insignificance of the step mechanism compared to the use of code-level optimizations, we are prompted to ask: to what extent is the clipping mechanism of PPO actually responsible for the algorithm's success? In Table 3, we assess this by considering a PPO-NoClip algorithm, which makes use of common code-level optimizations (by gridding over the best possible combination of such optimizations) but does not employ a clipping mechanism (this is the same algorithm we studied in Section 4 in the context of trust region enforcement); recall that we list all the algorithms studied in Table 1.
It turns out that the clipping mechanism is not necessary to achieve high performance: we find that PPO-NoClip performs uniformly better than PPO-M, despite the latter employing the core PPO clipping mechanism. Moreover, introducing code-level optimizations seems to outweigh even the core PPO algorithm in terms of effect on rewards. In fact, we find that with sufficient hyperparameter tuning, PPO-NoClip often matches the performance of standard PPO, which includes a standard configuration of code-level optimizations.^6 We also include benchmark PPO numbers from the OpenAI baselines repository (Dhariwal et al., 2017) (where available) to put the results into context.

^6 Note that it is possible that further refinement of the code-level optimizations could be added on top of PPO to improve its performance to an even greater extent (after all, PPO-NoClip can only express a subset of the training algorithms covered by PPO, as the latter leaves the clipping severity as a free parameter).
Table 3: Mean final rewards with 95% confidence intervals.

| | Walker2d-v2 | Hopper-v2 | Humanoid-v2 |
| --- | --- | --- | --- |
| PPO | 3292 [3157, 3426] | 2513 [2391, 2632] | 806 [785, 827] |
| PPO (baselines) | 3424 | 2316 | — |
| PPO-M | 2735 [2602, 2866] | 2142 [2008, 2279] | 674 [656, 695] |
| PPO-NoClip | 2867 [2701, 3024] | 2371 [2316, 2424] | 831 [798, 869] |
Our results suggest that it is difficult to attribute success to different aspects of policy gradient algorithms without careful analysis.
6 Conclusion
In this work, we take a first step in examining how the mechanisms powering deep policy gradient methods impact agents both in terms of achieved reward and underlying algorithmic behavior. Wanting to understand agent operation from the ground up, we take a deep dive into the operation of two of the most popular deep policy gradient methods: TRPO and PPO. In doing so, we identify a number of "code-level optimizations"—algorithm augmentations found only in algorithms' implementations or described as auxiliary details in their presentation—and find that these optimizations have a drastic effect on agent performance.
In fact, these seemingly unimportant optimizations fundamentally change algorithm operation in ways unpredicted by the conceptual policy gradient framework. Indeed, the optimizations often dictate the nature of the trust region enforced by policy gradient algorithms, even controlling for the surrogate objective being optimized. We go on to test the importance of code-level optimizations in agent performance, and find that PPO's marked improvement over TRPO (and even stochastic gradient descent) can be largely attributed to these optimizations.
Overall, our results highlight the necessity of designing deep RL methods in a modular manner. When building algorithms, we should understand precisely how each component impacts agent training—both in terms of overall performance and underlying algorithmic behavior. It is impossible to properly attribute successes and failures in the complicated systems that make up deep RL methods without such diligence. More broadly, our findings suggest that developing an RL toolkit will require moving beyond the current benchmark-driven evaluation model to a more fine-grained understanding of deep RL methods.
7 Acknowledgements
We would like to thank Chloe Hsu for identifying a bug in our initial implementation of PPO and TRPO. This work was supported in part by NSF grants CCF-1553428 and CNS-1815221, the Google PhD Fellowship, the Open Phil AI Fellowship, and the Microsoft Corporation.
References
 Abbeel & Schulman (2016) Pieter Abbeel and John Schulman. Deep reinforcement learning through policy optimization. Tutorial at Neural Information Processing Systems, 2016.
 Dhariwal et al. (2017) Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. OpenAI Baselines. https://github.com/openai/baselines, 2017.
 Henderson et al. (2017) Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560, 2017.
 Henderson et al. (2018) Peter Henderson, Joshua Romoff, and Joelle Pineau. Where did my optimum go?: An empirical analysis of gradient descent optimization in policy gradient methods, 2018.
 Kakade (2001) Sham M. Kakade. A natural policy gradient. In NIPS, 2001.
 Kingma & Ba (2014) Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
 Mania et al. (2018) Horia Mania, Aurelia Guy, and Benjamin Recht. Simple random search provides a competitive approach to reinforcement learning. CoRR, abs/1803.07055, 2018.

 Mnih et al. (2013) Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. In NeurIPS Deep Learning Workshop, 2013.
 OpenAI (2018) OpenAI. OpenAI Five. https://blog.openai.com/openai-five/, 2018.
 Peters et al. (2010) Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI, 2010.
 Rajeswaran et al. (2017) Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, and Sham M. Kakade. Towards generalization and simplicity in continuous control. In NIPS, 2017.
 Schulman et al. (2015a) John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897, 2015a.
 Schulman et al. (2015b) John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. Highdimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
 Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
 Silver et al. (2017) David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354, 2017.
 Sutton et al. (1999) Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In NIPS, 1999.
 Tucker et al. (2018) George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, and Sergey Levine. The mirage of actiondependent baselines in reinforcement learning. In ICML, 2018.
 Williams (1992) Ronald J. Williams. Simple statistical gradientfollowing algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
 Zhang et al. (2018) Amy Zhang, Yuxin Wu, and Joelle Pineau. Natural environment benchmarks for reinforcement learning, 2018.
Appendix A
A.1 Experimental Setup
All the hyperparameters used in this paper were obtained through grid searches. For PPO, the exact code-level optimizations and their associated hyperparameters (e.g., coefficients for entropy regularization, reward clipping, etc.) were taken from the OpenAI baselines repository,^7 and gridding is performed over the value function learning rate, the clipping constant, and the learning rate schedule. In TRPO, we grid over the same parameters (replacing the learning rate schedule with the KL constraint), but omit the code-level optimizations. For PPO-NoClip, we grid over the same parameters as PPO, in addition to the configuration of code-level optimizations (since we lack a good reference for the optimal configuration of these optimizations). For TRPO+, we also grid over the code-level optimizations, and additionally implement a "KL schedule" whereby the KL constraint can change over training (analogous to the learning rate annealing optimization in PPO). Finally, for PPO-M, we grid over the same parameters as PPO (just learning rate schedules), without any code-level optimizations. The final parameters for each algorithm are given below, and a more detailed account is available in our code release: https://github.com/MadryLab/implementation-matters.

^7 https://github.com/openai/baselines
All error bars we plot are 95% confidence intervals, obtained via bootstrapped sampling.
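The interval computation can be sketched as a percentile bootstrap for the mean in plain Python; the resampling routine and names are illustrative.

```python
import random

def bootstrap_ci(samples, n_boot=1000, alpha=0.05, seed=0):
    """Sketch of a percentile bootstrap CI for the mean: resample with
    replacement, compute the mean of each resample, and report the
    empirical alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With 1000 resamples and alpha = 0.05, this reports the 2.5th and 97.5th percentiles of the bootstrapped means, matching the 95% intervals quoted in the tables.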