# Moment-Preserving and Mesh-Adaptive Reweighting Method for Rare-Event Sampling in Monte-Carlo Algorithms

We present novel roulette schemes for rare-event sampling that are both structure-preserving and unbiased. The boundaries where Monte Carlo markers are split and deleted are placed automatically and adapted during runtime. Extending existing codes with the new schemes is possible without severe changes because the equation of motion for the markers is not altered. A nonlinear and nonlocal coupling between markers, as in the case of ion cyclotron resonance heating, is permitted. We show results from the ASCOT-RFOF code as an application of the schemes. The amplitude of Monte Carlo noise for 2 MeV ions can be reduced by a factor of 18, what would normally take over 350 times as many markers.

## Authors

• 1 publication
• 1 publication
• 2 publications
• 1 publication
• 1 publication
• 1 publication
• 1 publication
09/07/2021

### Aerodynamic Risk Assessment using Parametric, Three-Dimensional Unstructured, High-Fidelity CFD and Adaptive Sampling

We demonstrate an adaptive sampling approach for computing the probabili...
01/13/2019

### Multiple Importance Sampling for Efficient Symbol Error Rate Estimation

Digital constellations formed by hexagonal or other non-square two-dimen...
11/02/2019

### BIMC: The Bayesian Inverse Monte Carlo method for goal-oriented uncertainty quantification. Part I

We consider the problem of estimating rare event probabilities, focusing...
12/10/2021

### Frozen Gaussian Sampling: A Mesh-free Monte Carlo Method For Approximating Semiclassical Schrödinger Equations

In this paper, we develop a Monte Carlo algorithm named the Frozen Gauss...
10/28/2021

### Living on the Edge: An Unified Approach to Antithetic Sampling

We identify recurrent ingredients in the antithetic sampling literature ...
03/17/2022

### Covid19 Reproduction Number: Credibility Intervals by Blockwise Proximal Monte Carlo Samplers

Monitoring the Covid19 pandemic constitutes a critical societal stake th...
07/15/2020

### Hardware Acceleration of Monte-Carlo Sampling for Energy Efficient Robust Robot Manipulation

Algorithms based on Monte-Carlo sampling have been widely adapted in rob...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Monte Carlo techniques can be efficient for solving a variety of problems, for example diffusion-advection equations. One application is plasma physics and the heating of plasmas with ion cyclotron resonance heating (ICRH) stixBook. The particle distribution function in a plasma is governed by a Boltzmann equation in Fokker-Planck approximation

 (1)

is the phase space coordinate,

is the diffusion tensor, and

the advection velocity. The single arrow denotes a vector, the double arrow a tensor.

and together are refered to as Fokker-Planck Coefficients (FPC). can be computed by simulating the corresponding Langevin equation (ichimaru2018basic, p. 294ff)

 \dif→z=→a\dift+\smallleftrightarrowb⋅→ξ(t)√\dift (2)

for test particles, which we call markers. are Gaussian random numbers with zero mean and variance 1. The markers are distributed according to , which can be reconstructed for example by using a histogram:

 ¯f% histogram bin(t)=∫binf(→z,t)\dif→z∫bin\dif→z≈∑i1bin(→zi(t))Wi(t)∫bin\dif→z, (3)

where we approximated the integral over the distribution function with the sum over the markers inside a bin of the histogram. is 1 if lies in the bin of the histogram, and 0 otherwise. is the weight of the -th marker, which is constant unless reweighting is employed111Some schemes introduce brunner1999collisional, and evolve the markers according to the Langevin equation for . Each marker then has a weight which evolves in time and effectively gives . We instead directly solve for . The weights are only adapted in discrete steps when we employ reweighting, which will be introduced later. The weights do not follow an equation of motion.. The markers are randomly initialized such that equation (3) is valid for the initial condition for .

For ICRH the FPC have two contributions: collisions, for which a Maxwellian background plasma with which the particles collide can be assumed hirvijoki2014ascot, and radio frequency (RF) interactions. The RF FPC depend on the amplitude of the wave field . The wave amplitude is set by the injected power:

 Prf=Awave∫Dg(f(z),\smallleftrightarrowD(z),v(z))dz, (4)

where is the density of the absorbed power, and the domain of the Fokker-Planck equation. When modeling an experiment one has to adjust the FPC such that the absorbed power is close to the power injected in the experiment. Equations (1) and (2) are then nonlocal, complicating the numerical solution of equation (1). The Langevin equation (2) can still be solved for fusion plasmas by orbit following codes such as ASCOT hirvijoki2014ascot. The code library RFOF johnson2011library is used here to include the RF interaction.

A direct numerical solution of equation (2) quickly becomes computationally expensive when the solution of the diffusion-advection equation varies by several orders of magnitude in the domain of the solution. As

is proportional to the probability distribution of marker positions, only few markers are in areas where

is small. Those areas are therefore only poorly resolved. Plasmas heated by ICRH exhibit a population of highly energetic ions, which are few compared to thermal ions but of prime interest killeen2012computational; heidbrink2008basic.

From importance sampling we know how to reduce the error of the simulation results in such regions of interest: We have to place many markers there. Equation (3) tells us that this corresponds to lower marker weights in those regions compared to regions of less interest. As the simulation progresses, the markers will mix due to the stochastic term in equation (2), i.e. small markers will leave the region of interest and large markers will enter it. We need to adjust the weights of markers entering and leaving the region to keep the weight of markers in the region of interest smaller than the average weight, which is called reweighting. There exist different reweighting techniques in the literature which however suffer from drawbacks. The classic splitting and Russian Roulette techniques (hammersley2013MCmethods, p. 99f) lead to growing fluctuations of the marker number and total weight because the Russian Roulette does not conserve the conserved quantities of the underlying Fokker-Planck equation. For long times all markers will eventually be deleted (cf. the Bienaymé-Galton-Watson process (GeoffreyGrimmett2001, 5.4), which we introduce in Section 2.3). There are other methods, based on merging markers martin2016octree, which however bias the result. The mean result is then no longer the exact result of the underlying Fokker-Planck equation, although the extent of this effect can be mitigated by selecting well the markers that are merged (Octree binning) martin2016octree.

There exist other methods for resolving regions of phase space with low densities, often collectively reffered to as rare event sampling. Here we however choose splitting and Russian Roulette techniques because

• They can cope with the nonlinear and nonlocal property introduced by the fixed RF power (equation (4)).

• One can avoid additional bias compared to the direct solution of the Langevin equation (2) with the new splitting and roulette techniques presented in this paper. We show this analytically for a special case in in A, and verify it numerically schuster2019MscThesis.

• The new reweighting schemes are structure preserving because no errors in the moments of the distribution function are introduced that grow as the simulation progresses.

• It is possible to extend existing codes that directly simulate the Langevin equation (2) without big changes because the equation of motion for the markers is not altered.

This paper is structured as follows: In Sections 2.2 and 2.3 we introduce the traditional splitting and roulette methods as tools for variance reduction. In Sections 2.3.1 - 2.5 we present our new reweighting schemes, whose parameters can be chosen as described in Section 2.6. The schemes are applied in Section 3. In Section 3.3 we show how the new schemes can reduce by orders of magnitude the number of markers needed to obtain a certain noise amplitude for simulations with the orbit-following code ASCOT hirvijoki2014ascot.

## 2 Reweighting Techniques

Diffusion-advection equations are often solved by markers evolving according to the associated Langevin equation (2). This can be inefficient when the distribution function varies by orders of magnitude in its domain. When we calculate

from the marker positions by using a histogram, we can use bin statistics to estimate the accuracy of the result.

When we assume that the number of markers in a bin is Poisson distributed, we obtain for the relative statistical error of the reconstructed

 =1√⟨n⟩, (5)

where is the reconstructed from equation (3),

the standard deviation and

is the mean number of markers in the corresponding bin of the histogram. When is small in a region of phase space compared to elsewhere in phase space, will also be small by definition. is then large.

We can reduce by using more markers in the low density region. When we use more markers we have to reduce their weight accordingly to obtain the correct . For a problem such as ICRF heating using differently weighted markers in different regions of phase space is not trivial because the simulated time scales are much larger than the collisional time scales. The markers will mix and the advantage of differently weighted markers will vanish. It is therefore necessary to reweight the markers, i.e. adjust their weights during runtime, as they move in the domain. This can be done using splitting techniques (hammersley2013MCmethods, p. 100) for markers that enter a region of lower weight, and employing Russian Roulette techniques for markers leaving low weight regions (hammersley2013MCmethods, p. 99). For this we need to define discrete regions in phase space.

To decide how to reweight the markers we define nested (hyper)volumes, called weight regions or simply regions, in phase space, each with a different weight for the markers therein. We define the weight regions with the help of a function . The weight region with weight is defined as , where is the whole phase space. The weight region for each marker and time step is found by computing . The ratio between the weights of two adjacent regions and

 Ri,i+1=WiWi+1 (6)

is, for now, required to be an integer. After discussing the novel schemes in Sections 2.3.1 and 2.3.2 we will have the tools to relax this constraint.

By setting up weight regions we obtain the following important properties:

1. Markers at the same phase space position have the same weight. This is optimal, assuming uncorrelated markers, in the sense that the variance of equation (3) is minimized. With reweighting schemes it can be beneficial to allow some mixing, as discussed in Section 2.4.

2. The largest weight that markers can obtain is given by the weight of the first weight region. There is, by design, no population of markers with higher weights than desired.

3. We can freely choose which regions of phase space to resolve with which weights. As we discuss in the next paragraphs we can even automatically adapt the regions during runtime.

4. Determining the weight region a marker belongs to is efficient. One only has to evaluate .

As described above the weight of a marker is a function of its phase space position, and it would therefore not be necessary to store the weight. However, in Section 2.4 we allow markers of different weight to mix to some extent, making it necessary to store the weight.

We should not define the weight regions arbitrarily. Bin statistics and importance sampling suggest that we should concentrate all markers in the region of phase space we are interested in. This consideration is, however, unaware of the fact that we still have to resolve the rest of phase space with sufficient resolution to obtain the boundary conditions for our regions of interest. We will later refer to the influx of markers from a poorly resolved region to a well resolved region as noisy flux.

One approach to sufficiently resolve the whole phase space is to keep the marker density in phase space approximately constant. In this case , and we can determine by calculating the binned density whenever we reweight. is not constant in time. This rather radical approach might reduce the accuracy of regions with large considerably compared to simulations with equally weighted markers. As we presented in detail earlier schuster2019MscThesis

we can interpolate between

and . The idea is that the accuracy depends on the phase space density of markers, and by interpolating linearly between the marker densities we can interpolate freely between the weighting functions. In Section 3.2 and Figure 2 we will show that such an interpolated weighting function can be used to resolve both low and high density regions well. We can write the interpolated weight function as

 (7)

where

 N \coloneqq∫Df(z)dz VD \coloneqq∫Ddz

are constants, and and are parameters that can be freely chosen. is the (mean) number of markers used in the simulation, and determines how the weight regions are placed. yields markers of equal weight, and a spatially constant marker density in phase space. As long as and are conserved, the mean number of markers does not change during the simulation, provided the discretized weights are close to one another and can be considered continuous. If is infinite, we can for example define a minimum weight beyond which markers are no longer split. Then the mean number of markers can change during the simulation even when is conserved.

Calculating according to equation (7) in general requires communication between threads because we need to evaluate the local . Usually the distribution function evolves on longer time scales than the time between reweightings. Then does not need to be calculated every time, allowing for reweighting without communication between threads for each operation.

### 2.2 Splitting

When we split one marker into markers we have to

1. reduce the weight of the marker by a factor

2. create identical copies of the marker.

Immediately after splitting there is no improvement of the accuracy. The markers at the same position are mathematically identical to the original marker, but reqire -times as many computations to advance. The accuracy however improves as the simulation progresses: The stochastic motion due to the diffusive term in equations (1) and (2) is different for each marker. After long times the accumulated stochastic motion is sufficiently large that the marker position is mostly independent of the initial position. We call the time scale required for this to happen the decorrelation time. After the markers are decorrelated we can treat them as independent, and equation (5) holds again.

Observe that the notion of independent markers depends on what we evaluate with the markers. When considering the binned density, as used for histograms, the markers can be considered independent when the intially identical markers are not all in the same bin. This happens when the markers have independently traveled one bin-length, which is therefore the length scale on which the markers become independent. To summarize, markers decorrelated on shorter lengths when the measure we evaluate, e.g. the binned density, is more localized. The decorrelation is apparent in figure 1, where the variance of the result drops to the prediction assuming uncorrelated markers approximately at a distance of 0.2 away from the boundary where markers are split.

### 2.3 Russian Roulette

When markers travel from the low-weight region to the high-weight region , we have to reduce their number and therefore increase their weight. Contrary to approaches in current literature vranic2015particleMergingAlgorithmPIC; martin2016octree we do not merge markers as this will result in a biased estimate of the distribution function vranic2015particleMergingAlgorithmPIC. Instead, we delete some of the markers. With the traditional so-called Russian Roulette (hammersley2013MCmethods, p. 99) we delete markers that travel from region to region with a probability of , and multiply the weight of the survivors by .

When implementing reweighting as so far described, i.e. we

• define weight regions,

• split markers when they travel to a region of lower weight,

• delete markers with the Russian Roulette when the travel to a region of higher weight,

the number of markers will fluctuate and eventually () all markers will be deleted. We can understand this by considering the Bienaymé-Galton-Watson process (GeoffreyGrimmett2001, 5.4): A marker of weight travels to a lower-weight region and is split into markers. Subsequetely these markers return to the original region, meaning every marker is deleted with a probability of . The number of markers is then a stochastic variable. As the simulation progresses there will be many subsequent splittings and roulettes. If by chance we once delete all returning markers, what has a nonzero probability of occurring every time the markers travel to the high weight region, the number of markers will be 0 for all future times. The variance of the number of markers increases linearly with the number of reweightings (GeoffreyGrimmett2001, 5.4). In settings as given by plasma physics where the number of physical particles and therefore weight typically is a conserved quantity this is clearly problematic. And also for the simulation of general stochastic processes such artificial fluctuations are numerical errors.

#### 2.3.1 Correlated Roulette

From now on we will describe the new techniques to avoid the shortcomings of the traditional Russian Roulette. Our first approach is to use correlated instead of uncorrelated random numbers for the Russian Roulette. We require to be a rational number with and integers and . To let, on average, only 1 out of markers survive the crossing to the high weight region we delete out of every markers that travel to the high weight region.

One way to implement this is to take a Boolean array with entries. The array contains entries of 1 and entries of 0, where 0 represents deleting a marker and 1 letting it survive. The order is randomly assigned. For the first marker that crosses the boundary we use the first entry to decide whether to delete it or not, for the second marker the second entry and so forth. After the last entry of the array has been used we fill the array with entries in a new and random order and start with the first entry for the next marker. Compared to an uncorrelated random number generator that gives 1 with probability and 0 otherwise we built a random number generator with the same probabilities for the results, but with correlations between the generated random numbers such that we always delete markers out of . We use such a correlated random number generator for each boundary between weight regions.

After markers have traveled to the high weight region, the total weight is identical to the total weight before the markers traveled to the high weight region. The total weight can therefore only deviate slightly from the true value, and this deviation does not grow larger with longer simulation time. Also, this deviation of total weight is 222Suppose markers travel from the low-weight to the high-weight region. They initially have weight , after passing the roulette they have weight . The number of markers that survive the roulette is . The correlated roulette would ideally preserve the total weight

An upper bound for the error of the roulette, i.e. the mismatch between and , is k. Because we obtain
, with the number of markers used in the simulation, and quickly becomes negligible when .

#### 2.3.2 Deterministic Roulette

Our second approach aims at not only avoid growing fluctuations of the zeroth moment of the distribution function but of all moments. As an additional property of each marker we store the region in which the marker was created and use this information instead of random numbers for deciding which markers to delete. Markers are deleted when they travel to a region of higher weight than what they were created with. Markers that originated in the highest weight region will never be deleted and evolve as they would without reweighting. Their weights, however, change as the markers move between regions. All markers that are created by splitting eventually travel to a weight region where they are deleted. The marker population is therefore composed of two parts:

1. Markers created in the highest weight regions, whose distribution is uninfluenced by reweighting.

2. Markers created directly or indirectly by the markers of 1). They are not able to accumulate large fluctuations because they are deleted on the time scale on which markers travel between regions.

We still introduce fluctuations by reweighting because markers that were created in the low weight regions and the original marker from the high weight region will in general not return simultaneously. But these fluctuations can not accumulate or grow as the simulation progresses. When we initialize the markers in the beginning of the simulation we have to set their creation region. A simple approach would be to create markers with weights for the highest weight region and reweight them. We presented a method however that introduces fewer fluctuations and can generate a list of markers with properly set creation weights from any list of (non-uniformly weighted) markers in schuster2019MscThesis.

### 2.4 Hysteresis Region

Reweighting introduces fluctuations, even with the correlated and the deterministic roulette.

A Monte Carlo marker typically moves on a zigzag trajectory as known from Brownian motion. When a marker travels from one weight region to another we can therefore expect it to cross the boundary many times. If the time resolution becomes infinite we expect a diverging number of crossings because the underlying Wiener process is self-similar embrechts2009selfsimilar.

Every time we reweight a marker we introduce fluctuations. It is therefore advisable to not reweight too often, e.g. after every time step. While it is not obvious what the optimal time between reweightings is, letting the markers mix until they have typically moved across a weight region before reweighting has proven to yield satisfying results schuster2019MscThesis.

Selecting the corresponding time between reweightings can be problematic when the diffusion coefficient varies in phase space or when it is not explicitly known. In this case we can reweight after short times and define a hysteresis region where the markers are allowed to mix. Without hysteresis, we split a marker at position with weight when , and subject it to the roulette when . With a hysteresis of width we increase the region the marker can move in before being reweighted to . The hysteresis width does not have to be identical for all boundaries, therefore we introduced the indices.

### 2.5 Rational Instead of Integer Weight Ratios r

Our roulette methods can already cope with more general weight ratios. has to be a rational number for the correlated roulette, while for the traditional Russian roulette and the deterministic roulette real would be possible as well. Splitting is more problematic. After splitting, the newly created markers should have equal weight because markers with less weight require just as much computational time as markers with higher weight, but contribute less to the result. This leads to an increased variance as compared to markers that are created with equal weights.

We can still use non-integer weight ratios when we do not always split into the same number of markers. If we used uncorrelated random numbers to decide into how many markers to split, the total weight in the simulation would fluctuate strongly for long simulation times, just as with the Russian roulette. While the deterministic roulette would still avoid those fluctuations, we can eliminate them independently of the used roulette scheme by using correlated random numbers. With the deterministic roulette we can also avoid fluctuations of the higher moments by using an independent correlated random number generator for each high-weight marker separately. We presented the details elsewhere schuster2019MscThesis.

### 2.6 Choice of Parameters

To represent the chosen well, and to avoid alignment issues between weight regions and histogram bins, one should choose the ratios between weight steps as small as possible. This is limited by eventual implementation issues and the time between two reweightings: We determined empirically that one obtains good results when one allows to mix the markers in a region before reweighting them, although this does not appear to be critical. When the regions are smaller, one needs to reweight more often.

After deciding on we determine the remaining parameters. First, we select a time between reweightings long enough such that the computational overhead from reweighting is justifiable, but short enough to prohibit excessive mixing of markers. After all, we want to resolve the phase space with marker weights that are adapted to the respective phase space positions. Next, we set the discrete weights , which determine the weight regions, such that the markers in each region mix with one another. This can be difficult to determine because the regions change during the simulation and the diffusion coefficient might not be easy to determine. In this case one can choose larger weight steps, and therefore regions, and use a hysteresis width similar to the weight step to allow appropriate mixing of markers.

## 3 Characterization and Performance of Novel Schemes

We characterize the schemes by considering simple diffusion-advection equations. We let the simulations reach steady state and use bins to calculate the phase space density for many, i.e. , points in time. From this set of measurements for the same phase space densities one can then calculate the variance of the density measurement.

After splitting, the markers decorrelate on the length scale of the bin size. To analyze the decorrelation process it is therefore necessary to use overlapping bins.

To be able to compare simulations with different numbers of markers we multiply the determined variance of some quantity with , the mean number of markers used in the simulation. To obtain a quantity for the relative error we additionally divide by the squared mean of the quantity, :

 σ2norm=σ2⟨N⟩μ2 (8)

### 3.1 Wiener Process

We start our analysis with the Wiener process, a simple diffusive process, which is described by

 ∂tf=(∂2x+∂2y)f. (9)

The computational domain is two-dimensional with and , with a common diffusion coefficient for both dimensions and mirror boundary conditions. The analytical solution for the steady state is . In more than one dimension the correlated roulette can show detrimental effects, which is why we consider the 2D case. We choose the domain larger than the domain to amplify these detrimental effects. The bins for the density measurement have a size of and are positioned at . For showing the effect of multiple consecutive crossings we want to choose a small time step for the Euler-Maruyama scheme kloeden2013numerical used to advance the markers. As a compromise with the required computational time we choose . The initial distribution of markers is already constant in phase space. We start the simulations with 1000 markers.

In Figure 1 we see the numerically determined for this setting. With markers of equal weight (gray, dotted line) 333Consider a single marker. The probability distribution for its position is uniform. The result of the histogram is given by . The definitions of mean, variance and can now be applied, and we obtain the normalized variance for the histogram for markers without (re)weighting.. We can estimate analytically using bin statistics (gray, solid line)444For bin statistics we assume that the markers are independent, and that the probability for a marker to be inside a histogram bin is small compared to the probability of being outside of the bin. The number of markers in the bin is then Poisson distributed, therefore the mean equals the variance. With the analytic solution, which is a constant, we calculate mean, variance and normalized variance.. For reweighting we separate the domain into two weight regions: for the markers have twice the weight as for . When we use the correlated roulette without hysteresis (blue, solid line) the results are less accurate than without reweighting everywhere, and lie clearly above the prediction from bin statistics. At we see an increase in variance due to fluctuations arising from reweighting. When we use a hysteresis region of 0.1 (blue, dashed line) the markers are split and deleted less often, and the accuracy is improved. still lies above the prediction from bin statistics.

With the deterministic roulette the markers in the high weight region are not influenced by reweighting, as discussed in Section 2.3.2. This manifests in Figure 1 as a good agreement between the result with the deterministic roulette and the prediction from bin statistics for . At there are already markers from the low weight region inside the bin because the bin width is 0.2. When going towards larger , decreases, but it lags behind the prediction from bin statistics because the markers need time to decorrelate. When introducing a hysteresis of length 0.1 (red, dashed line) we reduce fluctuations and smooth the transition between the weight regions.

In Section 3.1 we looked at a single boundary between two weight regions in detail. There, was constant. Reweighting is however designed to improve the accuracy in low-density regions. To obtain an equilibrium density that varies as a function of we add advection to equation (9). For simplicity we restrict ourselves to a 1D process:

 ∂tf=−v∂xf+D∂2xf. (10)

Together with mirror boundary conditions we obtain the steady state solution

 f(x)=f0exp(−sx) (11)

with

 s=−vD≈13.3. (12)

We simulate this equation in the domain , using mirror boundary conditions. The time step is .

As described in Section 2.1, we place the boundaries between weight regions such that the marker density lies between constant marker density and the marker density for uniform markers. At the markers have equal weight, and at the weight of the markers is proportional to . We use the deterministic roulette because it is more reliable in multiple dimensions as show in figure 1, but the result is very similar for the correlated roulette in this case.

The normalized variances of the simulations are shown in Figure 2. The boundaries between the weight regions for are shown as vertical red lines. The density follows equation (10) and is exponentially declining on a length scale of . Without reweighting, the variance is roughly inversely proportional to the density, leading to inaccurate results at . By reweighting we can reduce the variance at large by more than three orders of magnitude. Simultaneously the variance at is increased by up to a factor of 8. With the parameter we can choose which regions we want to resolve with which accuracy. As an example, suppose we want to increase the accuracy in the low density region without compromising in the high density region, compared to the case without reweighting. By choosing we could reduce the variance at by over two orders of magnitude while increasing the variance at by merely .

There is a descriptive way of understanding how such large gains in accuracy in the low density tail can be achieved by sacrificing comparably little accuracy in the high density bulk. At high densities there are orders of magnitude more markers than at low densities. Moving a small fraction, e.g. a few percent, of the markers at high densities to low densities has only a small impact on the high density region. But it greatly increases the number of markers, and therefore the accuracy, in the low density region.

### 3.3 Ascot-Rfof

As last test case we return to the physical problem of ICRH. The orbit-following code ASCOT hirvijoki2014ascot calculates the distribution function of particles in tokamaks or stellarators. Together with the code library RFOF johnson2011library ASCOT can be used to model ICRH. We implemented reweighting with both the deterministic and the correlated roulette in ASCOT. The weight regions are adjusted automatically, as described in Section 2.1. We use weights proportional to the distribution function, because the distribution function changes drastically during heating with RF waves. Reweighting is not yet parallelized even though the schemes are suited for parallelization.

ASCOT-RFOF is initialized with a wave amplitude which is iteratively adjusted to give a required amount of absorbed power, which is set as input. This feedback constitutes a nonlinearity in the equation of motion of the Monte Carlo markers. A certain number of markers is required such that the absorbed power can be determined accurately. Because reweighting is not yet parallelized we can only use a limited number of markers, and we cannot readjust the RF wave amplitude. Instead the wave amplitude is kept constant throughout the simulation. To simplify the implementation we do not replenish wall losses. The simulation results can therefore not be compared to the experiment, but we can nevertheless compare simulations with and without reweighting.

The hydrogen distribution function is calculated with ASCOT-RFOF, based on the setup from Sipilä sipila2018monte for ASDEX Upgrade discharge #33147. The exact parameters for the reweighting schemes is given and motivated in previous work (schuster2019MscThesis, 4.6.1). We compare simulations without reweighting and with our new weighting schemes in Figure 3. All three distribution functions agree well. The result with the deterministic roulette (red) lies above the result with the correlated roulette (yellow) for most energy values. This does not signify that there is a systematic overestimation of because the markers are not independent when reweighting, and this discrepancy could be caused by noisy flux from the thermal part of the distribution. With reweighting we can resolve energies that are orders of magnitude lower than the ones resolvable with uniform weights (’no reweighting’ in figure 3).

To quantify the Monte Carlo noise we can not use the same approach of the simpler test cases in Sections 3.1 and 3.2 because the simulation is too expensive to be run for long enough times. Bin statistics is also not applicable because the markers are correlated, and therefore no Poisson distribution for the number of markers in the bins can be assumed. Instead, we separate each bin from the histogram shown in Figure 3 into 50 smaller bins. We assume that changes only little over and interpret the values from the 50 smaller bins as samples of the same physical . We can thus calculate the variance , and therefore , for each histogram bin. This yields only an estimate for the variance due to noise: We are unable to detect offsets that span over .

In Figure 4 we show as a function of . At high densities the distribution function varies quickly, violating our assumption of constant density in each bin. It is therefore not possible to reconstruct the variance for , where . With reweighting the results are more accurate than without reweighting already at . When going towards lower densities the normalized variance increases more slowly with reweighting than without it. Both new reweighting schemes perform identically. We fit the normalized variance of the result without reweighting (blue) with , where is determined by the fit. We then divide all data points by to compare the performance more directly, this is shown on the right. At , corresponding to ions of , the variance due to Monte Carlo noise is reduced by a factor of . Observe that due to noisy flux there are correlations between bins, meaning noise is not the only source of the variance of the result.

When interested in ions with an energy of we can reduce the computational time by a factor of 350, provided enough markers are used to renormalize the RF wave field.

## 4 Conclusion

When using splitting and roulette schemes one has to define regions in the simulation domain and boundaries between them. Every region is associated with the weight of the markers therein. We presented a method to automatically define these regions, and even adapt them during the simulation to account for changing requirements due to the evolving distribution function. We developed to new roulette schemes because the traditional Russian Roulette (hammersley2013MCmethods, p. 99) leads to fluctuations that accumulate during the simulation. The new roulette methods possess two important properties:

• Approximate conservation of mass for the correlated roulette, and of all moments of for the deterministic roulette.

• Nonexistence of a bias: The mean result of simulations is the same as without reweighting. This is proven for a special case in A and shown numerically elsewhere schuster2019MscThesis.

The schemes are accompanied by various methods to improve accuracy (Section 2.4) and that simplify use and implementation of the schemes, for example an algorithm for efficient and easy initialization of markers, and a manual for choosing parameters (schuster2019MscThesis, 4.4).

We verified the new schemes with simple models that allow for an analytical solution, they are capable of reducing the variance due to Monte Carlo noise by orders of magnitude. This is also the case with realistic applications, as in the case of fast-ion tails generate with ICRF heating in fusion plasmas. Their implementation in existing codes is relatively straight forward, since the schemes do not alter the equations of motion of the markers. Finally, the reweighting schemes can be parallelized and require minimal communication between processes.

## Acknowledgments

We would like to thank Jakob Ameres and Roman Hatzky for fruitful discussions. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 and 2019-2020 under grant agreement No 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission.

## Appendix A Unaltered Mean Result of Simulation

In this section we show for a special case that the deterministic roulette does not alter the mean result of the simulation. The case that we consider is restricted as follows:

• There is only one boundary.

• There is no hysteresis.

• The weight in the high weight region is twice the weight in the low weight region.

### a.1 Problem Description and Definitions

Markers are at discrete positions in a 1D domain, and advance on discrete time steps. The change from continuous to discrete space and time is no restriction as the numerical schemes are used on computers, which require discretization anyways. The positions with index are the high weight region, the positions with are the low weight region with half the weight. We will call the marker that originated in the high weight region ’blue’, and the markers that are created by the blue marker ’green’.

We only consider a single blue marker. This is however no restriction a one blue markers, and the green markers it creates, is completely independent and therefore uncorrelated from any other blue or green markers.

The green and blue markers travel from a position to other positions with probabilities and , respectively. As there is no restriction to and , the marker position can be interpreted as a label for grid points in a multidimensional domain. This proof therefore automatically also holds for multidimensional problems. When a green marker jumps to the high weight region it is deleted. When the blue marker jumps from the high weight region to the low weight region it creates a green marker at the target position. When a green marker has been deleted, we denote its state as .

The set of states the markers can be in is given by

 (13)

i.e. there are green markers, and every marker has a position. The number of elements in is . The probability space for these states is given by 555Since we are only interested in finite times, only a finite number of green markers can be generated. Therefore a probability distribution on can be seen as an element of the space of finite sequences, indexed by .

 C00(X)=span(|b,→g⟩,x∈X). (14)

A (properly normalized) state is called the stochastic vector, and can be interpreted as probability distribution on : If we are certain what the state of the system is. If the system could be in different states with some probabilities. The probabilities are the coefficients for the basis vectors of . The transition probability between states is given by

 m(x→y)=P(state=y at time t+1|state=x at time t), (15)

which is the

element of the corresponding stochastic matrix

. A state at time can be obtained from the state at time :

 |Ψ⟩t+1=M|Ψ⟩t (16)

Our main quantity of interest will be the number of green markers at the position :

 (17)

is the number of entries in , which is the number of green markers that have been created. Similarly, the number of blue markers at the position is given by

 Hp(|b,→g⟩)=δb,p. (18)

We expand the definition sets of and by linearity to the space . This expansion by linearity is equivalent to calculating the mean density of markers as position :

 Fp(|Ψ⟩)=Fp(∑ipi|b,→g⟩)=∑ipiFp(|bi,→gi⟩)=⟨Fp⟩{pi}, (19)

the same holds for .

#### a.1.1 Goal

The reconstructed probability density of the stochastic process is calculated by binning the marker positions, and multiplying the number of markers in each bin with the weight of the markers. The mean density is then

 ⟨ρ⟩=wg⟨Fp⟩+wb⟨Hp⟩, (20)

where and are the weights of the green and blue markers. In the high weight region is 0, and . The same is true if there is no reweighting. We then have

 ⟨ρ⟩=⟨Hp⟩. (21)

In the low weight region we have ,

 ⟨ρ⟩=12(⟨Fp⟩+⟨Hp⟩). (22)

The proof is finished when we find that is the same with and without reweighting. In other words,

 12(⟨Fp⟩+⟨Hp⟩)=⟨ρ⟩=⟨Hp⟩, (23)

hence

 ⟨Fp⟩t=Fp(|Ψ⟩)t=Hp(|Ψ⟩)t=⟨Hp⟩t (24)

in the low weight region () for all times .

#### a.1.2 Outline

In A.2 we first show that at time can be calculated with and at time , without requiring itself. After this we only have to use induction to finish the proof of preposition (24). This is done in A.3.

### a.2 Evolution of the Densities

 Fp(M|Ψ⟩) =∑ipiFp(M|bi,→gi⟩) (25) =∑ipiFp⎛⎜ ⎜⎝∑(b′,→g′)∈Xm((bi,→gi)→(b′,→g′))|b′,→g′⟩⎞⎟ ⎟⎠ (26) =∑ipi∑(b′,→g′)∈Xm((bi,→gi)→(b′,→g′))Fp(|b′,→g′⟩) (27) =∑ipi∞∑b′=−∞∞∑n=0∑(→g′)1,…,(→g′)n∈N0∪Dm((bi,→gi)→(b′,→g′))Fp(|b′,→g′⟩). (28)
666Note that the sums are actually finite since maps from finite sequences to finite sequences (only up to one marker can be created in each time step).

We will now decompose further. For this we distinguish three cases:

All markers in state exist already in state :

All markers must move to the correct position, and no new marker is created.

In state exists one additional green marker:

All markers must move to the correct position, and the blue marker has to create the additional green marker. This additional green marker has to be at the correct position.

Any other initial state :

The probability to advance to state is 0.

We can express this through the probabilities of the blue or green markers to move from position to : and .

 m((bi,→gi)→(b′,→g′))=B(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))⎧⎪⎨⎪⎩δl(→gi)+1,l(→g′)δ→g′l(→g′),b′if bi<0 and b′≥0δl(→gi),l(→g′)% else (29) =B(bi,b′)G((→gi)1,→g′1)…G((→gi)l(→gi),→g′l(→gi))\vbox%\scalebox.5$∙$⎡⎢ ⎢ ⎢ ⎢⎣δl(→gi),l(→g′)I+⎧⎪ ⎪⎨⎪ ⎪⎩δl(→gi)+1,l(→g′)δ(→g′)n,b′II−δl(→gi),l(→g′)IIIif bi<0 and b′≥00else ⎤⎥ ⎥ ⎥ ⎥⎦ (30)

Now we can insert equation (30) and the sum representation of equation (17) into equation (28). Then equation (28) will decompose into three terms , and corresponding to the terms I, II and III in equation (30)

 Fp(M|Ψ⟩)=TI+TII−TIII. (31)

We will treat the three terms , and separately. We start with :

 TI=∑ipi∞∑b′=−∞∞∑n=0∑(g′)1,…,(g′)n∈N0∪DB(bi,b′)G((gi)1,g′1)…G((gi)l(→gi),g′l(→gi))δl(→gi),l(→g′)l(→g′)∑k=1δg′k,p. (32)

We can immediately evaluate the sums over and . The probability of the blue marker moving anywhere is 1. Note that for the term I since there is no new marker generated. Because , will be 1 only for , what is used to resolve the sum over .

 TI =∑ipi∑(→g′)1,…,(→g′)l(→gi)∈N0∪DG((→gi)1,(→g′)1)…G((gi)l(→gi),(→g′)l(→gi))l(→g′)∑k=1δ(→g′)k,p (33) (34) =∑ipil(→gi)∑k=1∑(→g′)1G((→gi)1,(→g′)1)…∑(→g′)k−1G((→gi)k−1,(→g′)k−1)G((→gi)k,p)∑(→g′)k+1G((→gi)k+1,→g′k+1)…∑(→g′)l(→gi)G((→gi)l(→g′),(→g′)l(→gi)). (35)

The sums over the positions of the markers that are not the -th marker all yield 1: The probability for the marker to either survive or be moved to the ’parking position’ is 1.

 TI =∑ipil(→gi)∑k=1G((gi)k,p). (36)

We insert unity, . It is not necessary to include because a marker that was deleted and is therefore at can never leave again. We recognize the definition of :

 TI =∑ipil(→gi)∑k=1∞∑q=0δ(gi)k,qG(q,p)=∑ipi∞∑q=0G(q,p)l(→gi)∑k=1δ(gi)k,q=∑ipi∞∑q=0G(q,p)Fq(|bi,→gi⟩)=∞∑q=0G(q,p)Fq(|Ψ⟩), (37)

where we used the linearity of in the last step. We can immediately interpret this result: if there was no blue marker, the term I would be the only term. Then the mean green marker density only depends on the mean marker densities of the green markers in the previous time step because the green markers evolve independently.

We will now perform the same calculations with terms II and III. To satisfy the case distinction we restrict the sum over , and use the Iverson bracket, with being a logical expression:

 [P]={1if P is true0else (38)
 TII=∑ipi∞∑b′=0[bi<0]∞∑n=0∑(→g′)1,…,(g′)n∈N0∪DB(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))δl(→g′),l(→gi)+1δ(→g′)l(→g′),b′l(→g′)∑k=1δ(→g′)k,p. (39)

Other than for we have due to the marker we create, and we cannot resolve the sum.

 TII =∑ipi∞∑b′=0[bi<0]∑(→g′)1,…,(→g′)l(→gi)+1B(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))δ(→g)′l(→gi)+1,b′l(→g′)∑k=1δ(→g′)k,p (40) =∑ipi∞∑b′=0[bi<0]∑(→g′)1,…,(→g′)l(→gi)B(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))∑(→g′)l(→gi)+1δ(→g)′l(→gi)+1,b′⎡⎢ ⎢⎣δ(→g′)l(→gi)+1,p+l(→g′)−1∑k=1δ(→g′)k,p⎤⎥ ⎥⎦ (41) =∑ipi∞∑b′=0[bi<0]∑(→g′)1,…,(→g′)l(→gi)B(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))⎡⎣δb′,p+l(→gi)∑k=1δ(→g′)k,p⎤⎦ (42)

For we get

 TIII =∑ipi∞∑b′=0[bi<0]∞∑n=0∑(→g′)1,…,(→g′)l(→g′)∈N0∪DB(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))δl(→g′),l(→gi)l(→g′)∑k=1δ(→g′)k,p (43) =∑ipi∞∑b′=0[bi<0]∑(→g′)1,…,(→g′)l(→gi)B(bi,b′)G((→gi)1,(→g′)1)…G((→gi)l(→gi),(→g′)l(→gi))l(→gi)∑k=1δ(→g′)k,p. (44)

We now subtract terms and . The rightmost sum is identical for both terms an cancels, only the additional from term survives.

 TII−TIII (45) =∑ipi∞∑b′=0[bi<0]B(bi,b′)δb′,p (46) =∑ipi[bi<0]B(bi,p)[p≥0]. (47)

Analogously to what we did for term , we insert unity to recover :

 (???) =∑ipi∞∑q=−∞δbi,qneg(q)B(q,p)[p≥0] (48) =∑ipi−1∑q=−∞Hq(|bi,→gi⟩)B(q,p)[p≥0] (49) =[p≥0]−1∑q=−∞B(q,p)Hq(|Ψ⟩). (50)

In the last step we used the linearity of . We can also interpret this result: The blue marker that enters the low weight region acts as a source for the green markers.

Inserting (37) and (47) into equation (31) we finally obtain