1 Introduction
Monte Carlo (MC) integration is the defacto standard for physically based image synthesis, stemming mainly from its broad applicability [CJ16] and solid mathematical foundations [KW08]. On the downside, the stochastic nature of MC methods causes the resulting solution to contain a certain amount of error that fully vanishes only in the theoretical limit case of infinitely many evaluated samples.
This error is the residual variance of the used MC estimator, often referred to as “Monte Carlo noise”. This nomenclature is somewhat unfortunate, since “noise”—in the signal processing context—typically refers to an undesired distortion of a signal caused by instruments or otherwise. In contrast, MC variance is an inherent phenomenon, inseparable from, and specific to any MC estimator.
The above distinction goes beyond mere linguistic nitpicking. Numerous existing approaches [ZJL15] approach the task of MC variance reduction through the imageprocessing lens, that is, as an image denoising problem. While that can certainly produce satisfactory results due to the similarities of the two problems, the inherited assumptions are often not applicable in MC image synthesis (as partially pointed out by Boughida and Malik Boughida2017, for instance). As a practical consequence of this mismatch, it is difficult to predict such methods’ performance upfront, necessitating trialanderror application and handtuning of parameters.
Consequently, the core mission of this paper is to critically examine the problem of variance reduction in Monte Carlo image synthesis. We build on the following paradigm (already adopted in this context by several methods [ZHD18, BB17, DWR10, DMB14]): every image sample produced by an MC estimator carries relevant information about the underlying distribution from which it is drawn. These distributions—specific to every point in the image—will be denoted as characteristic distributions. Our proposition is to identify the characteristic distributions in the context of MC image synthesis, and use that knowledge to produce better estimates for every image pixel individually, rather than simply averaging the samples generated for that pixel. In that sense, our method belongs to the aposteriori category. This is in contrast to apriori methods which attempt to derive a variance reduction strategy from theoretical principles; although we do ground our reasoning in theory as well, whenever possible.
Towards the above goal, we undertake three main steps.

We first discuss the problem of variance reduction in MC image synthesis, pointing out the key sources of variance typical for this domain; then review the works tackling the problem (Section 2). Afterwards we critically examine the assumptions commonly used in MC denoising methods (Section 4) and propose how to alleviate them.

Next, we present a qualitative study of characteristic sample distributions in several scenes and their configurations (Section 5). This is our first contribution, intended to identify patterns occurring in these distributions.

Finally, we describe a prototype framework for learning these characteristic distributions (Section 6). The learned statistical description of the distributions then serves to provide better pixel estimates from MC samples collected during rendering. This framework, agnostic to the internal workings of the employed integrator, is our second contribution.
The benefit of the proposed method is that it is complementary to the existing imagespace methods: our intent is to provide improved estimates for each image pixel, which can subsequently serve as an input to further processing. In contrast to many existing denoisers, we do not (explicitly or implicitly) assume any specific form or properties of the underlying sample distribution. The only assumption embedded in our method is that the characteristic distributions do contain identifiable patterns, which can be learned by a sufficiently expressive statistical model (based on neural nets, in our case).
2 Background
This section first defines the problem of Monte Carlo variance reduction and provides an overview of works that address it.
Realistic image synthesis.
The central problem of realistic image synthesis is governed by the rendering equation [Kaj86], which, for a given position and direction , evaluates the (monochromatic) incident radiance :
(1) 
Here, is the point on the nearest surface visible from in the direction ; and are the emitted radiance and the cosineweighted bidirectional reflectance distribution function (BRDF) at that point, respectively. The key to solving Equation 1 is the hemispherical integral on the right, which again requires solving for incident radiance at , leading to the well known selfreferential nature of the rendering equation which makes its solution so challenging.
Rendering an image for a given scene entails gathering the radiant energy passing through every pixel . Since each pixel corresponds to a small solid angle , this yields the irradiance of that pixel (for a fixed camera position ):
(2) 
The question this paper is concerned with is: assuming a particular Monte Carlo solver, what is the ‘best’ way to obtain the value of for every pixel in the rendered image? For this, we will employ the standard MC path tracing with next event estimation [PJH16]; in general, we will limit our analysis to unbiased MC solvers to only focus on variance analysis and reduction.
Monte Carlo estimation.
Being a stochastic method, path tracing generates samples for each pixel (for simplicity we will assume the same across the entire image, although our analysis does not depend on this assumption). These samples are paths originating at the camera position and uniformly sampled directions , probabilistically traced through the scene. Together they form an estimator for Equation 2
(3) 
where
is the probability of that path; it is the joint probability of all stochastic decisions made by the path tracer. Therefore, the estimate for
is composed of evaluations of the estimator for Equation 1. Since we assumeto be an unbiased estimator, also the estimator for the pixel value is unbiased, that is,
.The variance (the error) of the estimator for a particular depends on two factors: the variance of the individual estimates , and the variation of itself across the region of the pixel
. If, for a moment, we assume the latter to be constant, then
(4) 
where the final inequality results from the fact that sample covariance is zero for samples independently distributed across the pixel (as is the case in this analysis), or negative if correlated (lowdiscrepancy) sampling is used. Dropping the temporary assumption of constant within the pixel , the actual variance of is bound to be slightly higher than Equation 4 indicates (and without caching the estimates to adaptively refine the pixelspace sampling, we cannot do much better than sample it uniformly, since no prior estimate on the distribution of interpixel is available to us).
Reducing estimator variance.
Since the variance of is the main efficiency bottleneck of MC rendering, diverse strategies exist to reduce it. For instance, according to the zerovariance sampling theory [Hoo08, KW08], generating the paths exactly according to the energy distribution in the scene (i. e., , cf. Equation 3) would result in an optimal estimator . On the other hand, any deviations from the optimal sampling probabilities for Equation 1 result in accumulation of variance—including:

using loose approximations for sampling the BRDF ,

omitting the radiance term when sampling the integrand,

incorrect path termination strategies, and

visibility fluctuation during next event estimation.
Numerous works in realistic image synthesis therefore strive to approach the theoretical zerovariance limit by developing advanced importancesampling strategies, for instance using the path guiding approach [VKv14, MGN17, GBBE18]. Given our aposteriori samplebased approach, we refer the reader to established literature [PJH16] for further information regarding this class of methods.
Image filtering.
The tradition of imagespace (lateral) filtering to denoise images has a long history in image processing. Perhaps the most influential signalpreserving method—bilateral filtering [TM98]—has been extended through nonlocal neighborhoods [BCM05], cross / joint filtering [XP05], to comprehensive multifaceted methods [RMZ13], to provide a few examples.
The sophistication of the recent methods nevertheless means that the simplicity of the original bilateral filter is lost, and stricter assumptions are needed (cf. Section 3); assumptions that have the potential to fail. The inherent noisiness of the feature descriptors, and their unknown correlation with radiance distribution, are just some examples of the necessary guesswork. It is interesting that many of these methods can be cast as variants of statistical regression [BRM16]. Reconstruction methods like that can be powerful, as well as modified for efficiency [SKW17]—ultimately though, they still need to assume a certain smoothness of the reconstructed radiance distribution, which is violated by visibility and other forms of discontinuities extremely common in rendering.
Samplebased methods.
A very insightful progress was made by Delbracio et al. Delbracio2014: for lateral filtering to be sensitive to the properties of the reconstructed signal, we need to take into account statistical similarity between sample distributions of the involved image regions. Prior to that, a densitybased classification was proposed for outlier removal [DWR10], but only on the binary accept / reject basis. As these methods have a significant memory bottleneck, an alternative statistical treatment has been developed by Boughida and Boubekeur Boughida2017, however relying heavily on the suboptimal Gaussian representation (see more detailed discussion in Section 4); the same shortcoming has the recent robuststatistical estimation of Jung [JMD15].
Their conclusions however hold: the problematic longtailedness of the sample distributions calls for a robust reweighing approach. This conclusion has been recently confirmed by Zirr et al. Zirr2018 for outlier compensation and likewise our paper—in fact, it is the ideas of Jung, Zirr and colleagues that we take forward and develop from a different perspective. In the following section, we argue for a simple method that combines the advantages of the bottomup statistical approaches and their datadriven counterparts.
Learningbased filtering.
Complex filtering operators can be learned from data—even if accurate references are not feasible [LMH18]. This approach has been used for both indirect [KBS15] and direct [BVM17, VRM18] description of lateral filters. In addition to the offline operators above, a lowsample temporally stable method has been demonstrated by Chaitanya et al. Chaitanya2017. Our aim is to combine such datadriven modeling approaches with a principled analysis of sample distributions and—as we explain in Section 3—reduce the variance of pixel estimators individually.
Ultimately, the amount of variancereduction literature is particularly extensive, and we cannot hope to provide a complete overview. Interested readers are encouraged the seek the detailed review by Zwicker et al. Zwicker2015 and the works of Bitterli et al. Bitterli2016 and Vogels et al. Vogels2018 for reviews of more recent papers.
3 Overview of Methodology
Our exposition has two parts. We first review review the assumptions commonly made to tackle the MC variance reduction problem (Section 4). Finding them too optimistic, we conduct a study to empirically determine the properties of the characteristic sample distributions in different rendering configurations (Section 5). These constitute empirical priors for the next step.
The information contained in each sample is viewed through the perspective of the scenespecific priors to produce better pixel estimates than simply averaging the available samples. In the second part (Section 6) we present a learningbased approach to facilitate this, and show that it is indeed possible to significantly improve the pixel estimates on such grounds.
4 Properties of Sample Distributions
We aim to gain a better understanding of the characteristic perpixel sample distributions in MC image synthesis, and set reasonable expectations about the patterns we might encounter in them. To this end, we start with a highlevel discussion of the assumptions most commonly adopted by existing variancereduction methods, and continue with the detailed study in Section 5.
Spatial uniformity.
One of the two most common assumptions is that the estimator error can be modeled by a spatially invariant additive ‘noise’ , i. e., . While suitable in image processing to model, e. g., thermal sensor noise, it is an unjustified assumption in MC image synthesis.
First, as a straightforward consequence of Equation 4, the estimator variance is a function of the incoming radiance and its estimator. As such, it will differ not only for each image pixel (for ), but even vary within the pixel itself (for ). This has also been pointed out by Boughida and Boubekeur Boughida2017 and Kalantari and Sen Kalantari2013, among others.
Second, as a consequence of the previous property and the fact that light transport is a relative process (implying that the estimator variance scales with the value of the estimated quantity itself), modeling variance with an additive term is problematic in case of , and by transition (although to a lesser extent), for as well. This is also supported by empirical data (cf. Section 5), mainly due to the characteristic skew and longtailedness of characteristic sample distributions (see Figure 1).
Gaussian distribution.
While the perpixel characteristic distributions are known to be complex [DWR10, DMB14, BB17], their means—the estimator values
—are very frequently assumed to be normally distributed. This seems like a safe bet following from the central limit theorem (CLT), but is seldom verified.
CLT holds for random variables which are
independent and identically distributed with finite variance. However, the aggregated random variables are certainly not identically distributed (see previous point) and might not be independent (when correlated sampling is used). Furthermore, in spite of the theoretically backed convergence rates (Equation 4), no bounds on the individual sample variances are available; in fact, under certain circumstances involving source singularities, variance might be provably infinite [GKH13]. The existence of strong outliers demonstrated by Zirr et al. Zirr2018, as well as here (Figure 1), is an additional empirical evidence that not only individual samples but even their averages can follow longtailed distributions which do not have the well mannered behavior assumed by Gaussian statistics.Correlation with ‘scene features’.
A common variancereduction approach is the use of feature buffers—auxiliary information like surface positions, normals and albedos—to guide the filtering process (Section 2
). Again adopted from image processing, this approach is purely heuristical in the context of MC image synthesis: it carries the hidden assumption that variance be correlated with these features, which is trivially not true for numerous physical phenomena (shadows, caustics, transparent objects, volumetric media). Some of these pathological cases have been demonstrated by Boughida and Boubekeur Boughida2017.
Again, this does not mean that a variancereduction approach based on feature buffers cannot work; we simply argue that by relying on feature buffers, an unknown amount of variance is either left untreated, or actual image features get filtered out. In general, the complexity of the path space is virtually unbounded (due to its recursive nature), and thus it seems unlikely that any reasonable number of handpicked features can be found to sufficiently describe that complexity.
Regularity of the radiance distribution.
Finally, even if no assumptions are made about the variance distribution, methods based either on lateral filtering or statistical regression assume some degree of smoothness in the radiance distribution. This assumption can hold true for large portions of the image, but is violated by visibility and illumination discontinuities—an issue that becomes more abundant as the complexity of the scene in question increases.
Concluding this discussion, we have argued that many of the assumptions frequently adopted for solving the MC variance reduction problem are fragile. We therefore propose to take a step back and—to complement the existing theoretical analysis [JMD15, BB17, ZHD18]—examine the characteristic distributions empirically.
5 Case Study
Next, we present a detailed study of characteristic perpixel radiance distributions. We are primarily looking for recurring patterns in the distributions, as well as any other empirical properties that might guide the variance reduction model (Equation 5).
Instrumentation.
To facilitate the study, we developed a specialized webbased tool (which we plan to release with as a part of the publication) which allows for interaction with a specified rendered image, providing ondemand visualization and diagnostics of the characteristic distributions of the selected pixel, both in the linear and log10 domains. These data were extracted during rendering using 2k samples per pixel (SPP) of the nexteventestimation path tracer in PBRT3 [PJH16], and have the form of perpixel RGB histograms with 100 bins for both the linear and log variants.
To overcome the challenge of setting the pixelspecific histogram support (cf. Delbracio et al. Delbracio2014), we take the interval spanned by the first 50 collected samples and pad it by 25 % on both ends; for the log histogram we opted for the fixed
absolute support to complement the relative information in their linear counterparts. We count the null samples from occluded paths separately and do not include them in the visualized histograms, as they would cause discontinuous spikes without contributing much insight into the distributions’ shapes.Procedure.
We study the characteristic distributions (CHD) corresponding to different objects, configurations and phenomena characteristic to every scene. We examined each scene for interesting patterns, which we then present and discuss in the study but only if they are sufficiently abundant (i. e., outlying distributions are not considered). In the following subsections we study four main scenes selected from the standard distribution of PBRT3, depicted in Figure 2.
5.1 Bathroom Scene (Figure 2a)
We start with this simple interior scene illuminated by a single closed window; most of the illumination is therefore indirect. There is a good variation of diffuse and glossy materials. The transport in this scene is moderately difficult for a path tracer—after 2k SPP we get a reasonably converged image with only a few fireflies.
A selection of interesting characteristic distributions is depicted in Figure 3. We will now discuss these cases individually.

Diffuse objects. CHD tend to be quite symmetric in the log domain, close to Gaussian in fact. In the linear (radiance) domain, CHD exhibit long tails, strong skew, with mode typically lower than mean—resembling the Gammatype distribution quite well.

Glossy objects. Shapes surprisingly similar to diffuse objects, although in this case the different RGB channels’ CHD are shifted w. r. t. each other (as a consequence of the underlying brownwood texture). Interestingly, their spread differs across color channels, suggesting that their width is a function of the incoming radiance itself (as proposed in Section 4).

Noisy reflections. All of the pixels in the reflection, in spite of their high estimate variance, share virtually the same CHD shape. This is in full agreement with the assumptions of Delbracio et al. Delbracio2014 regarding viable pixel similarity measures.

Direct illumination. Very well resolved, lownoise CHD—in spite of their very significant spread and long tail.

Indirect illumination.
Noisy CHD with very long tails (high kurtosis). High number of null samples (around 60 %) and significant proportion of dark samples, showing inverted (negative) skew in the log domain (similar to the noisy reclection case).

Illumination gradients. A visibly bimodal CHD marking the transition from lit to shadowed regions. The relative proportions of the modes correspond to the strength of the penumbra.
5.2 WhiteRoom Scene (Figure 2b)
A moderately complex scene illuminated by three windows, sourcing a lot direct illumination but also having a high proportion of indirect illumination due to the highly reflective walls. The illumination has a red color cast, both due to the outside source as well as the darkred carpet. It is predominantly populated with diffuse materials. This scene is easily handled by a path tracer, due to the large open windows and low amount of geometric occlusion; this leads to low numbers of null samples as well.
Figure 4 shows the characteristic distributions.

Direct illumination. White surfaces illuminated both directly and indirectly. This yields complex multimodal CHD, with an added color cast for the aforementioned reasons. Very long tails are present, which is a common trait across the whole scene.

Diffuse objects. Similar as the previous case, but with higher proportion of indirect illumination. This causes less spiky CHD in log domain and significantly longer tails in the primal domain. Significant noise is present as well, indicating more complex path structure in this region.

Indirect illumination. In spite of the fact that these objects are spread across large space, their CHD look strikingly alike, as they uniformly reflect the CHD across the room. We again observe very long tails in the CHD, leading to somewhat higher variance in these areas compared to their directly illuminated counterparts.

Colorful objects. These CHD follow the same patterns as already observed, but exhibit strong color cast due to the underlying diffuse texture. Note also, that the color cast shifts the distributions in the log domain, it basically does not change their shape—this makes the case that the material albedo has no correlation with variance and therefore makes for a poor feature descriptor for denoising purposes.

The Fireplace. This fireplace appears boring, but after closer examination in the log domain displays a wild CHD, which, suspiciously, looks like fire. We are currently trying to interpret this behavior.
5.3 NightRoom Scene (Figure 2c)
The same scene as in Section 5.2, but illuminated by two interior lights instead. Although mostly indirectly illuminated, the lamps’ translucent shades act as secondary ‘virtual’ sources of direct light which then shows in the CHD. This is a moderately difficult scene for a path tracer, due to the small and enclosed sources, leading to increased variance especcialy in the darker regions—consequently the proportion of null samples is high, often exceeding 50 %.
Figure 5 shows the characteristic distributions.

Direct illumination. These areas have bimodal CHD composed of strong direct peaks (with a color shoft corresponding to the nearby illuminant) and faint indirect sample groups again with long tails. Further from the sources, the proportion of indirect samples grows, as expected.

Direct shadows. In the linear domain, the CHD are very similar to the previous case, but missing the direct components. The log domain shows stronger skew towards the dark end, however, due to the higher complexity of the path structure. Consequently, we observe higher estimator variance in these regions.

Multibounce transport. The CHD contain complex multimodal patterns, but their shape is remarkably stable across channels, albeit with a color cast caused by the combination of reddish illuminants and green diffuse color of the objects themselves.

Virtual sources. These are translucent lamp shades, which retain much of the direct component (compare to the first, directly illuminated case), but in addition contain a very long faint tail caused by the multiscattered diffuse transmission. In spite of this, the variance in these areas is low, as the path tracer eventually does find the source inside for a sufficient proportion of the paths.
5.4 MeasureOne Scene (Figure 2d)
A geometrically complex scene with neutrally colored mixture materials. Illuminated mostly indirectly with area sources ranging from large background white panels down to small yellow sources. Interestingly, even though this scene is geometrically very symmetric, the CHD are not in all cases, since the illumination differs for each half somewhat. Though being a moderately difficult scene for a path tracer (with low numbers of null samples, typically under 20 %), the small yellow sources are difficult to find, creating patches of high variance around the image.
Due to the geometric complexity of this scene, we primarily use it for a special purpose: to study CHD with respect to statistical outliers (‘fireflies’) and with respect to different order of light transport (i. e., how many bounces of indirect illumination are simulated by the path tracer).
Figure 6 shows the characteristic distributions.

Indirect illumination. The complexity of this scene is reflected in its CHD as well. In this example, the linear histogram has such a wide support that most of its mass vanishes in a single spike; the log version however reveals a complicated multimodal structure (although only a few modes are dominant, leaving hope that such shapes can still be learned).

Fireflies. Here we compare two pixels: a bright firefly (RGB irradiance ) and its immediate neighbor on top (RGB irradiance ). In spite of this stark difference in value, their distributions are almost identical (which is better visible in the log histograms, as the linear variants differ in support). This is a behavior that we observe universally in all scenes, supporting the thesis of Delbracio et al. Delbracio2014 and justifying the reweighing strategies of Jung et al. Jung2015 and Zirr et al. Zirr2018. One of the aims of our model is to learn such reweighing as well, but from data alone.

Different transport path lengths. Here we compare the scene under identical configurations, with only one change: using two and five bounces of indirect transport respectively. As expected, that influences the amount of transported energy, making the order2 image visibly darker than the order5 one.
The additional insight of our study is that the different path lengths also influence the shape of CHD: in accordance with the previous finding that higher orders of indirect illumination result in more complex path structures, indeed the resulting CHD for the order5 images are more spread, with wider support and longer, noisy tails. The natural result is that the order5 image ends up having higher variance, despite taking roughly twice longer to render. This finding agrees with theory (such as Equation 4), and points to the importance of discussing the complexity of the simulated transport in any MC variance reduction work.
5.5 Findings of the Study
We finish the study by drawing several general conclusions about the observed distributions.
Most importantly, recurring patterns do exist in characteristic sample distributions; despite being complex, CHD are certainly not random. Moreover, CHD are not even discontinuous between neighboring pixels—if a radiance discontinuity lies in a given pixel, its CHD will end up a blend between the two regions of the function separated by that discontinuity (as indirectly implied in the work of Delbracio et al. Delbracio2014).
The most frequent elementary distribution we have encountered across the board has a Gammalike shape: a rapid onset with a long, slowly diminishing tail. In the log domain, that shape appears more balanced and in fact not unlike a Gaussian. As already hinted, this suggests a multiplicative nature of relations between samples. In agreement with previous work [DWR10, JMD15, ZHD18], the long tails are the main culprit behind statistical fireflies and the generally difficult variance distributions so common in rendering.
The complexity of CHD is tightly linked with the scene complexity. In more complex scenes, we observe multimodal CHD, where each mode intuitively corresponds to a certain salient region of the path space. The fact that light transport is a global problem with recursive structure means that the same patterns tend to occur all around the scene, repeatedly warped according to the respective geometric relations. For that very reason, the path length influences the CHD complexity as well, with longer paths leading to distributions with larger support, longer tails and higher skew—and in result, higher residual variance.
The second most important finding, and the one with direct impact on the proposed variancereduction model, is that the shapes of the characteristic distributions are scenespecific
: they depend on the scene configuration holistically, rather than on a few key components or objects. This suggests a clear design path: a variance reduction model aiming to take advantage of the CHD properties needs to be trained for each scene specifically (or even better, be able to classify the scene in a topdown manner and be conditioned by that classification). In the following section we present a prototype of such a model, based on fully connected neural nets.
6 Variance Reduction Model
The approach we propose—complementary to the existing lateral denoising approaches (Section 2)—is to improve the estimates for (Equation 2) by training an aposteriori model defined by a set of parameters , according to the following criterion:
(5) 
where
is a suitable loss function, in our case a squared error between the model’s response and the groundtruth irradiance for each pixel. The assumption embedded in this strategy is that the perpixel sample sets
generated by laws of light transport and the MC process that simulates them (path tracing, in our case) contain systematic patterns that the model can learn and exploit—and indeed, our study in Section 5 has identified such patterns. A reasonable expectation from such a model is that, in the absence of any identifiable patterns (or when they are too complex), it should gracefully reduce to the baseline solution, i. e., the sample average as per Equation 3. In its core, this is a Bayesian approach: start with a prior knowledge about the involved distributions and incorporate new data to create a posterior estimate.Model implementation.
Our prototype model is a multilayer perceptron network with a width32 input layer (for the radiance samples
), three 32neuron densely connected layers, and a single output unit representing the estimate. In accordance with the findings of our study (Section
5) we trained a separate network for each tested scene.The training process works with the sample data obtained from rendering the ground truth images using 2k samples per pixel. For each training run, we uniformly selected 6400 pixels and divided their radiance samples into batches of 32 samples each; then separated out 10 % of the batches as validation data. Using 30k training iterations with the learning rate of 0.005, the training took 20 minutes in Tensorflow on NVIDIA RTX 2080 Ti GPU. The runtime of this model on an 800
800 RGB image with 332 SPP is under 1 minute in the unoptimized development mode.Results.
The ability of our model to improve upon the baseline estimates is visually demonstrated in Figure 7 for the Bathroom and MeasureOne scenes. These results are promising, as our model yields smoother images even without exchaning any information between neighboring pixels. In particular, a majority of the fireflies in the baseline is suppressed, a behavior similar to the results obtained by Zirr et al. Zirr2018. This has the benefit of regularizing the image to yield a more balanced variance distribution, which means that lateral denoising approaches would have an easier time using our model’s output as the starting point.
To evaluate our model numerically, we first examined the quality of the results during training; this is shown in Figure 8 for the two result scenes. We see a reasonably stable behavior for both scenes, although surprisingly the model does not improve markedly after the initial iteration batch. This indicates a space for future optimization of the training time, so that potentially the model could learn on the fly—that is, during the rendering process itself, to adapt to the current scene and rendering configuration.
The numerical counterpart to the evaluation in Figure 7 is shown in Figure 9. Here we compare the baseline results (sample averages) and our model to the groundtruth solution. Although the majority of the estimates is improved by our model, some exceptions remain. One possible way to address these would be to train the network to output a certainty of each individual estimate—this is feasible as we know the deviations of the network’s estimates from ground truth during training. The certainty could then serve as a mixing weight for the model’s output in combination with the baseline as fallback.
7 Discussion
Study extensions.
Our study identifies a number of properties regarding the characteristic sample distributions, as well as putting into question some of the commonly used assumptions about the MC variance reduction problem. The aim of this study was to gain informed understanding about the problem, as well as inciting an open discussion about these issues.
A future prospect in improving the study involves both its breadth and depth. More phenomena should be investigated—such as manylight configurations, caustics and participating media, to name a few—as well as more advanced integrators. In particular, we wish to take a closer look at path guiding methods, as these are becoming state of the art in production and have to potential to subsume the majority of pathbased methods. More advanced visualization strategies might also be explored to extract further insights about the rendering process.
Model improvements.
Based on the obtained results, the prototype implementation of our proposed model already shows promise. For the future we envision several directions in which we plan to improve upon it.
First, several quantitative improvements can be investigated. The training and estimation timings are currently obtained from an unoptimized development implementation; we believe speedups of at least an order of magnitude are feasible. Optimizing the network parameters would orthogonally improve the accuracy of the current estimates.
We also plan to investigate different network architectures, in particular recurrent NNs which have the potential of incorporating additional samples into the estimate as they are evaluated by the integrator; that would allow to run the model in a progressive rendering setting, with interactive refinements of the rendered image.
While naively extending the network to include neighboring samples would not be difficult, our aim was to show the possibility of reducing variance without this. A compromise solution could operate in the spirit of Delbracio et al. Delbracio2014 and Boughida et al. Boughida2017: the individual pixel estimators could share only statistical information about the characteristic distributions. Furthermore, a learningbased approach has the promise of incorporating highlevel information about structures probabilistically detected in the image, such as edges, illumination gradients and even textures.
Apriori versus aposteriori methods.
The presented approach tackles the problem of MC variance reduction in the aposteriori manner, trying to make sense of the collected estimator samples without making overly strong assumption about their distribution. Due to the fact that we have the full information about the rendering process, bottomup apriori approaches might seem favorable at first. However, since we cannot theoretically reason about some of the parameters of the rendering process (particularly the scene), these approaches share the general ‘chickenandegg’ problem of MC methods: the optimal solution to the problem involves knowing the solution upfront (another example is importance sampling).
In MC variance reduction, the problem of noisy pixel estimates is frequently deferred to the problem of noisy metaestimates of higherorder statistical quantities, or to idealized assumptions about their behavior. We therefore argue that a certain amount of ‘datadrivedness’ is necessary to break this loop and inform the denoiser about the properties of the rendered scene. In future, we hope for the bottomup methods (e. g., [JMD15, ZHD18]) to be unified with the learningbased approaches such as ours; a framework with the potential for this is arguably still yet to be found, but we believe that it should have Bayesian nature.
8 Conclusion
Our approach is simple enough to be explained in one sentence: we examine and learn the shapes of statistical sample distributions in Monte Carlo rendering, and apply that prior knowledge to produce improved estimates from finite sample sets.
The case study we conducted confirms the expectation that perpixel sample distributions typically have complex shapes that could hardly be represented by standard statistical distributions. At the same time, we have identified recurring patterns that can be exploited by a learningbased model, along with the key finding that the characteristic perpixel distributions holistically depend on the scene and the particular integrator used for the rendering.
The variance reduction model we present shows promising results, in spite of the fact that it has no information about the surrounding pixels. Our plan is to continue our investigation and develop a truly robust model that is scenesensitive, adapts to the particular rendering configuration, and runs progressively in lockstep with the rendering process.
References
 [BB17] Boughida M., Boubekeur T.: Bayesian collaborative denoising for Monte Carlo rendering. Computer Graphics Forum (Proc. EGSR 2017) 36, 4 (2017), 137–153.
 [BCM05] Buades A., Coll B., Morel J. M.: A review of image denoising algorithms, with a new one. SIAM Multiscale Model. Simul. 4, 2 (2005), 490–530.
 [BRM16] Bitterli B., Rousselle F., Moon B., IglesiasGuitián J. A., Adler D., Mitchell K., Jarosz W., Novák J.: Nonlinearly weighted firstorder regression for denoising Monte Carlo renderings. Computer Graphics Forum (Proc. EGSR) 35, 4 (2016), 107–117.
 [BVM17] Bako S., Vogels T., McWilliams B., Meyer M., Novák J., Harvill A., Sen P., DeRose T., Rousselle F.: Kernelpredicting convolutional networks for denoising Monte Carlo renderings. ACM Transactions on Graphics (Proc. of SIGGRAPH) 36, 4 (2017).
 [CJ16] Christensen P. H., Jarosz W.: The path to pathtraced movies. Found. Trends. Comput. Graph. Vis. 10, 2 (2016), 103–175.

[CKS17]
Chaitanya C. R. A., Kaplanyan A. S., Schied C., Salvi M., Lefohn A.,
Nowrouzezahrai D., Aila T.:
Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder.
ACM Trans. Graph. 36, 4 (2017), 98:1–98:12.  [DMB14] Delbracio M., Musé P., Buades A., Chauvier J., Phelps N., Morel J.M.: Boosting Monte Carlo rendering by ray histogram fusion. ACM Trans. Graph. 33 (2014), 8.
 [DWR10] DeCoro C., Weyrich T., Rusinkiewicz S.: Densitybased outlier rejection in Monte Carlo rendering. Computer Graphics Forum (Proc. Pacific Graphics) 29 (2010), 7.
 [GBBE18] Guo J., Bauszat P., Bikker J., Eisemann E.: Primary sample space path guiding. In Proc. of Eurographics Symposium on Rendering – EI&I (2018), pp. 73–82.
 [GKH13] Georgiev I., Křivánek J., Hachisuka T., Nowrouzezahrai D., Jarosz W.: Joint importance sampling of loworder volumetric scattering. ACM Transactions on Graphics 32 (2013), 6.
 [Hoo08] Hoogenboom J. E.: Zerovariance Monte Carlo schemes revisited. Nuclear Science and Engineering 160, 1 (2008), 1–22.
 [JMD15] Jung J. W., Meyer G., DeLong R.: Robust statistical pixel estimation. Computer Graphics Forum 34, 2 (2015), 585–596.
 [Kaj86] Kajiya J. T.: The rendering equation. SIGGRAPH Comput. Graph. 20 (1986), 143–50.

[KBS15]
Kalantari N. K., Bako S., Sen P.:
A machine learning approach for filtering Monte Carlo noise.
ACM Trans. Graph. 34, 4 (2015).  [KS13] Kalantari N. K., Sen P.: Removing the noise in Monte Carlo rendering with general image denoising algorithms. Computer Graphics Forum (Proc. Eurographics) 32 (2013).
 [KW08] Kalos M. H., Whitlock P. A.: Monte Carlo methods. Wiley, 2008.
 [LMH18] Lehtinen J., Munkberg J., Hasselgren J., Laine S., Karras T., Aittala M., Aila T.: Noise2Noise: learning image restoration without clean data. In Proc. of ICML (2018), pp. 2965–2974.
 [MGN17] Müller T., Gross M., Novák J.: Practical path guiding for efficient lighttransport simulation. Computer Graphics Forum 36, 4 (2017), 91–100.
 [PJH16] Pharr M., Jakob W., Humphreys G.: Physically based rendering: from theory to implementation, 3rd ed. Morgan Kaufmann, 2016.
 [RMZ13] Rousselle F., Manzi M., Zwicker M.: Robust denoising using feature and color information. Computer Graphics Forum (2013).
 [SKW17] Schied C., Kaplanyan A., Wyman C., Patney A., Chaitanya C. R. A., Burgess J., Liu S., Dachsbacher C., Lefohn A., Salvi M.: Spatiotemporal varianceguided filtering: realtime reconstruction for pathtraced global illumination. In Proc. of High Performance Graphics (2017), pp. 2:1–2:12.
 [TM98] Tomasi C., Manduchi R.: Bilateral filtering for gray and color images. In Proc. of ICCV (1998).
 [VKv14] Vorba J., Karlík O., Šik M., Ritschel T., Křivánek J.: Online learning of parametric mixture models for light transport simulation. ACM Trans. Graph. 33, 4 (2014), 101:1–101:11.
 [VRM18] Vogels T., Rousselle F., McWilliams B., Röthlin G., Harvill A., Adler D., Meyer M., Novák J.: Denoising with kernel prediction and asymmetric loss functions. ACM Transactions on Graphics (Proc. SIGGRAPH 2018) 37, 4 (2018), 124:1–124:15.
 [XP05] Xu R., Pattanaik S. N.: A novel Monte Carlo noise reduction operator. IEEE Computer Graphics and Applications 25, 2 (2005), 31–35.
 [ZHD18] Zirr T., Hanika J., Dachsbacher C.: Reweighting firefly samples for improved finitesample Monte Carlo estimates. Computer Graphics Forum (Proc. EGSR) 37, 6 (2018), 410–421.
 [ZJL15] Zwicker M., Jarosz W., Lehtinen J., Moon B., Ramamoorthi R., Rousselle F., Sen P., Soler C., Yoon S.E.: Recent advances in adaptive sampling and reconstruction for Monte Carlo rendering. Computer Graphics Forum (Proc. of EG State of the Art Reports) 34, 2 (2015), 667–681.
Comments
There are no comments yet.