Approximate Program Smoothing Using Mean-Variance Statistics, with Application to Procedural Shader Bandlimiting

06/05/2017 ∙ by Yuting Yang, et al. ∙ University of Virginia 0

This paper introduces a general method to approximate the convolution of an arbitrary program with a Gaussian kernel. This process has the effect of smoothing out a program. Our compiler framework models intermediate values in the program as random variables, by using mean and variance statistics. Our approach breaks the input program into parts and relates the statistics of the different parts, under the smoothing process. We give several approximations that can be used for the different parts of the program. These include the approximation of Dorn et al., a novel adaptive Gaussian approximation, Monte Carlo sampling, and compactly supported kernels. Our adaptive Gaussian approximation is accurate up to the second order in the standard deviation of the smoothing kernel, and mathematically smooth. We show how to construct a compiler that applies chosen approximations to given parts of the input program. Because each expression can have multiple approximation choices, we use a genetic search to automatically select the best approximations. We apply this framework to the problem of automatically bandlimiting procedural shader programs. We evaluate our method on a variety of complex shaders, including shaders with parallax mapping, animation, and spatially varying statistics. The resulting smoothed shader programs outperform previous approaches both numerically, and aesthetically, due to the smoothing properties of our approximations.



There are no comments yet.


page 1

page 9

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1. Introduction

In many contexts, functions that have aliasing or noise could be viewed as undesirable. In this paper, we develop a general compiler-driven machinery to approximately smooth out arbitrary programs, and thereby suppress aliasing or noise. We then apply this machinery to bandlimit procedural shader programs. In order to motivate our approach concretely by an application, we first discuss how procedural shaders may be bandlimited, and then return to our smoothing compiler.

Procedural shaders are important in rendering systems, because they can be used to flexibly specify material appearance in virtual scenes [Akenine-Möller et al., 2008]. One visual error that can appear in procedural shaders is aliasing. Aliasing artifacts occur when the sampling rate is below the Nyquist limit [Crow, 1977]. There are two more conventional approaches used to reduce such aliasing: supersampling and prefiltering. We discuss these before discussing our smoothing compiler.

Supersampling increases the spatial sampling rate, so that the output value for each pixel is based on multiple samples. The sampling rate can be uniform across the image. The sampling rate can also be chosen adaptively based on measurements such as local contrast [Dippé and Wold, 1985; Mitchell, 1987; Hachisuka et al., 2008; Mitchell, 1991]. This approach in the limit recovers the ground truth image, but can be time-consuming due to requiring multiple samples per pixel.

Prefiltering typically stores precomputed integrals in mipmaps [Williams, 1983] or summed area tables [Crow, 1984]. This approach offers the benefit of exact solutions with a constant number of operations, provided that the shading function can be tiled or otherwise represented on a compact domain. However, in practice many interesting shaders do not tile, so this limits the applicability of this method. Further, prefiltering increases storage requirements and may replace inexpensive computations with more expensive memory accesses. This approach is not practical for functions of more than two or three variables because memory costs scale exponentially.

An alternative strategy is to construct a bandlimited variant of the shading function by symbolic integration. This can be expressed by convolving the shading function with a low-pass filter [Norton et al., 1982]. Exact analytic band-limited formulas are known for some specialized functions such as noise functions [Lagae et al., 2009]. In most cases, however, the shader developer must manually calculate the convolution integral. But frequently the integrals cannot be solved in closed form, which limits this strategy.

Our framework takes a different approach from most previous work. Our goal is to smooth out an arbitrary input function represented as a program, by approximately convolving it with a Gaussian filter. We take the program as input, break it into different parts, and relate the statistics of the different parts, under the desired smoothing process. Specifically, we treat each intermediate value in the computation as a random variable with a certain probability distribution. We use mean and variance statistics to model these random variables. In this manner, we can smooth out arbitrary programs that operate over floating-point numbers. Our approach can be applied to bandlimit shader programs, because we take as input an original shader that may have aliasing, and produce as output bandlimited approximations that have been convolved with the Gaussian kernel.

In our framework, we explore a number of approximation rules. We first improve the approximations of Dorn et al. [2015] (Section 4.3) and relate them to the mean and variance statistics in our framework. We then develop a novel adaptive Gaussian approximation (Section 4.4). For a class of programs that are real analytic, this approximation if used at all nodes in the computation results in a smoothed program that is accurate to the second power of the standard deviation. We next relate Monte Carlo sampling (Section 4.5) to our framework. This can give good approximations for non-analytic functions, because it converges for large numbers of samples to the ground truth. Finally, we discuss how compactly supported kernels (Section 4.6) can be used for parts of the computation that would otherwise be undefined or interact with infinite values.

Because each computation node can choose from a variety of approximation methods, the search space for the optimal approximations is combinatoric. We use genetic search to find the Pareto frontier of approximation choices that optimally trade off the running time and error of the program. A programmer can then simply select a program variant from the Pareto frontier according to the desired running time or error level.

To evaluate our framework, we developed a variety of complex shaders, including shaders with parallax mapping, animation, and spatially varying statistics, and compare the performance with Dorn et al. [2015] and commonly used multisample antialiasing (MSAA). Our framework gives a wider selection of band-limited programs with less error than Dorn et al. [2015]. Our shaders are frequently an order of magnitude faster than MSAA for comparable errors.

2. Related work

Mathematics and smoothing. Smoothing a function is beneficial in domains such as optimizing non-convex, or non-differentiable problems [Nesterov, 2005; Chen and Xiu, 1999; Chen and Chen, 1999]. For our purposes, smoothing can be understood as convolving a function with smooth kernels. When used in numerical optimization, this approach is sometimes known as the continuation method or mollification [Ermoliev et al., 1995; Ermoliev and Norkin, 1997; Wu, 1996]. The amount of smoothing can be controlled simply by changing the width of the kernel. In our framework, we model the input program by relating the statistics of each variable in the program, and apply a variety of approximations to smooth the program. Our idea of associating a range with each intermediate value of a program is conceptually similar to interval analysis [Moore, 1979]. Chaudhuri and Solar-Lezama [2011] developed a smoothing interpreter that uses intervals to reason about smoothed semantics of programs. The homogeneous heat equation with initial conditions given by a nonsmoothed function results in a smoothing process, via convolution with the Gaussian that is the Green’s function of the heat equation. Thus, connections can be made between convolution with a Gaussian and results for the heat equation, such as Łysik [2012].

Procedural shader antialiasing. The use of antialiasing to remove sampling artifacts is important and well studied in computer graphics. The most general and common approach is to numerically approach the band-limited signal using supersampling [Apodaca et al., 2000]. Stochastic sampling [Dippé and Wold, 1985; Crow, 1977] is one effective way to achieve this. The sampling rate can be effectively lowered if it is adaptively chosen according to the contrast of the pixel  [Dippé and Wold, 1985; Mitchell, 1987; Hachisuka et al., 2008; Mitchell, 1991]. In video rendering, samples from previous frames can also be reused for computation efficiency [Yang et al., 2009]. An alternative to sample-based antialiasing is to create a band-limited version of a procedural shader. This can be a difficult task because analytically integrating the function is often infeasible. There are several practical approaches [Ebert, 2003] that approximate the band-limited shader functions by sampling. This includes clamping the high-frequency components in the frequency domain [Norton et al., 1982], and producing lookup tables for static textures using mipmapping [Williams, 1983] and summed area tables [Crow, 1984]. Like our work, and unlike most other work in this area, Dorn et al. [2015] uses a compiler-driven technique to apply closed-form integral approximations to compute nodes of an arbitrary program. Unlike Dorn et al. [2015], our framework flexibly incorporates both mean and variance statistics, and we use several approximations that have higher accuracy. Our approach is general and can apply to arbitrary programs: we simply explore shaders as an example application.

Genetic algorithms

. Genetic algorithms and genetic programming (GP) are general machine learning strategies that use an evolutionary methodology to search for a set of programs that optimize some fitness criterion

[Koza, 1992]. In computer graphics, recent work by Sitthi-Amorn et al. [2011] describes a GP approach to the problem of automatic procedural shader simplification. Brady and colleagues [Brady et al., 2014] showed how to use GP to discover new analytic reflectance functions. We use a similar approach as [Sitthi-Amorn et al., 2011] to automatically generate the Pareto frontier of approximated smoothed functions.

3. Overview

This section gives an overview of our system. We first discuss the goal of our smoothing process. Next, we give an overview of the key assumptions and components in our framework.

The goal for our framework is to take an arbitrary program and produce a smoothed output program which closely approximates the convolution of the input program with a Gaussian kernel. This convolution could be multidimensional: for shader programs, the dimension is typically 2D for spatial coordinates. When producing such approximations, we would also like to keep high computation efficiency.

In our compiler-based framework, we assume the input program has a compute graph, where each node represents a floating-point computation, and the graph is a directed acyclic graph (DAG). We label each node in the DAG as a random variable and compute its mean and variance statistics. Note we use these random variables as a helpful conceptual device to determine statistics, but in most cases, we never actually sample from these random variables.111Except for Monte Carlo sampling (Section 4.5), which of course is sampled. We assume the input variables have specified mean and variance, and for simplicity assume they are independent. For example, in a shader, the input variables might be the pixel coordinate of the shader, and the output might be the color. We carry mean and variance computations forward through the compute graph, and collect the output by taking the mean value of the output variable.

We now give an brief conceptual example of how random variables can be used to collect statistics associated with values in a program. Suppose that a part of an input program contains a function application , where both and are intermediate values, and is a built-in mathematical function (e.g. cosine). In our framework, we model this computation as . Here, and are random variables with mean and variance statistics. In our framework, we simplify the problem of smoothing out the overall function by breaking the program into sub-parts, which can each be separately smoothed. In this paper, we use lower-case letters such as and to represent real values (scalars) in the original non-smoothed input function. These can be either input, output, or intermediate values. We use corresponding capital letters such as and to represent random variables for the corresponding smoothed computation node.

As we just mentioned, the random variables in our framework have two key values associated with each individual: mean and variance. We denote these as and respectively.

Figure 2. An overview of our system showing its components. The key components are: approximation rules, quality improvements, and genetic search. In approximation rules (Section 4 - 4.6), a variety of approximation methods are implemented to smooth the input function. We can optionally improve the quality of the output program by using the simplex method to adjust variances and denoising the output (Section 4.7). All smoothed program variants are selected through a genetic search (Section 5), which finds a Pareto frontier that optimally trades off program running time and error.

An overview of the components of this framework is shown in Figure 2. Our framework has three main parts: approximation rules, quality improvements, and genetic search. Under the random variable assumption, the compiler first approximates the smoothed function by determining the mean and variances of each compute node (discussed in  Section 4 - 4.6

). Next, additional quality improvements can be optionally made. These include heuristically adjusting the variance of each node using simplex search, and denoising (discussed in  Section 

4.7). Finally, we use genetic search to select the best-performing program variants based on the criterion of fast run-time and low error (discussed in  Section 5).

(a) Input function (b)  [Dorn et al., 2015], (c) Adaptive Gaussian, (d) Monte Carlo sampling (e) Compactly supported kernels
(Section 4.3) (Section 4.4) (Section 4.5) (Section 4.6)
Figure 3. An overview of different approximation techniques. (a) The input function . This function is evaluated in our framework as the composition of two “atomic” functions that we do know how to smooth: and . The “ground truth” correctly smoothed function is shown in blue dashed curves in subfigures (b-e). This is determined by sampling at a very high sample rate. The orange lines in subfigures (b-e) are the mean of the output random variable calculated by different approximations. The dark red subplots in (b-e) give an abstract illustration of the kernels that were actually used to evaluate these. (b) The approximation by Dorn et al. [2015] (Section 4.3); (c) Our adaptive Gaussian approximation (Section 4.4); (d) Monte Carlo sampling approximation with 8 samples (Section 4.5); (e) Compactly supported kernels approximation: here we use a box kernel (Section 4.6). We use a standard deviation of for all input distributions.

As shown in Figure 3, we implemented several different approximation methods to calculate the mean and variance of each random variable. These approximations are:

  • Dorn et al. [2015] (Section 4.3): we improve the approximations described in [Dorn et al., 2015] and relate them to the mean and variance of our framework. The variance calculation here is naive yet fast.

  • Adaptive Gaussian approximations (Section 4.4

    ): these use convolution to estimate both the mean and variance for a given compute stage assuming Gaussian distributed inputs and outputs. When applied to a whole program that is in a certain real analytic class, this approximation rule gives accuracy that is accurate to the second power of

    , the standard deviation of the input variables.

  • Monte Carlo sampling: we integrate this widely used method into our framework. The mean and variance are given by sampled estimators. For large number of samples, this converges to the ground truth.

  • Compactly supported kernels: for a sub-part of the computation that contains undefined or infinite values, (e.g. for ), the corresponding integrals with the Gaussian for mean and variance may not exist. However, the full program may have a well-defined result, so smoothing should still be possible through such a sub-part. To handle this case, we use compactly supported kernels such as box or tent. The kernel size is limited based on the distance to the nearest undefined point.

In Figure 3(b-e), we show an example of using these different approximations to smooth the function . The approximations are shown as orange curves and the ground truth as blue dashed curves. Specifically, we have considered this function as the composition of two primitive functions and , which have each been atomically smoothed. The adaptive Gaussian rule in Figure 3(c) gives a close approximation to the ground truth for small . For demonstration purposes, in Figure 3(e) we also use to show the approximation for compactly supported kernels. But in our implementation, only functions with undefined regions, or those that do not have a closed form Gaussian convolution will be approximated using this method.

4. Approximation Rules

Suppose that the programmer writes down a given input program that we wish to smooth. In our framework, we carry out smoothing by conceptually replacing each intermediate float value in the computation with a random variable having a known probability density function. We thus represent each compute node by having instead of a specific value, having specified mean and standard deviation statistics. For each node X in the computation, we use

to denote its mean, and for its standard deviation. The output of the program is then the mean value for the output variable.

As a concrete example, for shader bandlimiting, each input pixel coordinate could be regarded as two independent random variables, and . The means of these random variables could represent pixel positions, , and . Because we wish to evaluate an antialiased variant of the shader, we can model the standard deviations of the inputs as , i.e. half a pixel, to suppress aliasing beyond the Nyquist rate. Then the mean of the output variables in the computation is simply used as the rendered color.

In this section, we will first use an example to illustrate how we use this approach to smooth functions. Then, we will describe how our composition rules can be used to combine different smoothed sub-parts of a program. After that, we will talk about different approximation rules to estimate the mean and standard deviation statistics.

4.1. Motivating example

Assume we are smoothing a function . The smoothed function can be denoted as , and the output value can be computed as . Here, controls the smoothness. Higher means is more smooth. We will use the operator throughout this paper to denote that a function is being smoothed. We use convolution to define as follows.


In equation (4.1), is the smoothing kernel that is used to smooth out the original function . To more explicitly identify the kernel as being , we can also use the notation . The convolution kernel can be any non-negative kernel that has an integral over from to of 1. This allows us to interpret the kernel also as a probability density function. For example, we might use the Gaussian kernel:


If happens to be a shader program, then as is discussed in [Dorn et al., 2015], is simply a band-limited version of the same procedural shader function.

We now show how the convolution from equation (4.1) fits into our framework. We assume that in the input program, an intermediate value is computed by applying a function to a previous value , i.e., . But in our framework, both and are random variables. If the probability density function of is , then can be computed from and as:


As an example, if we assume X is normally distributed,

, then equation (4.3) can be rewritten as:


In equation (4.4), indicates convolution. If has a different probability density function, then will be a different kernel. Thus, is the smoothed value for . This gives some intuition for how our model is used for smoothing functions. Our framework provides different methods to estimate and . We will describe them in the following subsections.

4.2. Composition rules

In principle equation (4.1) can be used to determine the correct smoothed function from any input function . However, in practice, the associated integrals often do not have a closed-form solution. Therefore, a key observation used in our paper is that we can break up the computation into different sub-parts, and compute approximate mean and standard deviation statistics for each sub-part. We do this by simply substituting the smoothed mean and standard deviation that are output from one compute node as the inputs for any subsequent compute nodes.

As an example, suppose we have the computation of Figure 3. From the input , we compute , and then the output , with associated random variables , respectively. We are given , the mean and standard deviation of the input. Using the approximation rule chosen for the node , we compute from . Then from , using the approximation rule for the node , we compute : is the smoothed output of the program ( can be discarded).

4.3. Approximation of Dorn et al. 2015

We integrate the approximation methods described in Dorn et al. [2015] as one of our approximation options. Dorn’s method involves computing the mean for a smoothed function by convolving with a Gaussian kernel. Suppose an intermediate variable is computed from another variable , and the associated random variables are and , respectively, where . Then is:


This is the same as the result we derived in equation (4.4). Here is computed from equation (4.1). In Table 3, we show commonly used functions and their corresponding smoothed functions . In Dorn’s method, is determined based on the following simplifying assumption: output is a linear combination of the axis-aligned input s in each dimension. Simple rules are used, such as for addition and subtraction are the sum of input s, and for multiplication or division are the product or quotient, respectively, of the input s. In all other cases, including function calls, the output is the average of the non-zero s of all the inputs.

We make two improvements to Dorn et al. [2015], and use the improved variant of this method for all comparisons in our paper. The first improvement gives better standard deviation estimates, and the second collects a Pareto frontier. For the standard deviations (known as “sample spacing” in Dorn et al. [2015]), we detect the case of multiplication or division by a constant and adjust the standard deviation accordingly (i.e. ). This improvement helps give more accurate estimates of the standard deviations and thus reduces the problem seen in Dorn et al.’s Figure 5(c-d), where the initial program found by the genetic search is quite different from the final program after the variances have been adjusted by a simplex search. The adjustment process is described later in more detail in Section 4.7.

Our second improvement is to collect not just a single program variant with least error, but instead a Pareto frontier of program variants that optimally trade off running time and error. This process is described later in Section 5.

4.4. Adaptive Gaussian Approximations

In this novel approximation, we model the input variables to a compute node as being Gaussian distributed, and then fit a Gaussian to the produced output by collecting its mean and standard deviation. Thus, this rule more accurately and adaptively calculates standard deviations.

We first consider the case that we have a function of one variable. In the same manner as Section 4.3, if a random variable is computed from another random variable as , then can be determined from equation (4.4), equation (4.1) and Table 3. However, the standard deviation is determined differently based on the definition of variance of :


The approximation of can also be extended to multiple dimensions. Suppose is a function applied to an

-dimensional vector. Then

can be computed by the convolution of and an -dimensional multivariate Gaussian with zero mean and covariance matrix . Suppose for simplicity that the input variables have zero correlation () and equal standard deviation, so , where

is the identity matrix. By using Green’s function on this convolution

[Baker and Sutlief, 2003], we can find a Taylor expansion for the function in terms of :


The derivation of equation (4.7) assumes that is real analytic on , and can be extended to a holomorphic function on , so that all the derivatives exist, and the Taylor series has an infinite radius of convergence [Wikipedia, 2017c]. This class of functions includes polynomials, sines, cosines, and compositions of these. It is also necessary to assume that the function is bounded by exponentials: the precise conditions are discussed by Łysik [2012]. These properties could hold for some shader programs, but even if they do not hold for an entire program, they could often hold for sub-parts of a program. We show in Appendix D that a single function composition gives a result accurate to second order in for this rule. Similarly, this property can be proved via induction for multiple function compositions.

(a) Input function (b) Adaptive Gaussian (c) Truncated Taylor
approximation (4.4) expansion (equation (4.7))
Figure 4. A comparison of different approximation techniques. (a) The input function . The ground truth correctly band-limited functions are shown in blue in subfigures (b-c). These were determined by sampling at a high sample rate. (b) Our adaptive Gaussian approximation (Section 4.4) is shown in orange and compared against the ground truth in blue. The approximation is good. (c) A truncated Taylor expansion with 10 terms does not result in smoothing.

There are also other second order accurate approximations, such as simply truncating the Taylor expansion in equation (4.7) to use only the first and second term. Why would we bother to propose an adaptive Gaussian approximation?

To illustrate why adaptive Gaussian gives a more accurate approximation, we show an example in Figure 4. Here, we use the function . We show the approximation using adaptive Gaussian and simply truncating the Taylor expansion in equation (4.7). In Figure 4, the blue lines represent the ground truth, and the orange lines represent different approximations. From Figure 4(c), we can see that simply truncating the Taylor expansion results in large extrapolation errors and actually amplifies high frequencies, instead of smoothing them out. In Figure 4(b), we show that the function is smoothed more accurately using the adaptive Gaussian approximation.

For binary functions, we can still show that adaptive Gaussian is accurate to second order if a correlation term is carefully chosen. But for brevity, we are only going to show the rules to compute the mean and standard deviation for binary functions we used in our compiler.

In general, the inputs of a binary function can be considered as two random variables and (corresponding to and ). We make the assumption that and are distributed according to a bivariate Gaussian:


Here, and are standard deviations of and . These can be determined directly from the approximation of previous computation. Here is the correlation term between and . We will talk about how we choose later in this section. The mean and standard deviation for binary function plus , minus and multiplication can be derived from these assumptions based on properties of the Gaussian distribution [Petersen et al., 2008]:


For the binary function divide , we reduce this to multiplication by using as . The mean and standard deviation for division can be calculated via the composition rules. Here, is an univariate function with singularity at . Technically, the mean and variance therefore do not exist if the Gaussian kernel is used. We work around this singularity by approximating using a compact kernel with finite support. This will be described in detail in Section 4.6. The modulo function, , can be rewritten as . Here, fract() is the fractional part of . We make the simplifying assumption that the second argument of mod is an ordinary (non-random) variable (so ), to obtain:


Comparison functions are approximated by converting them to univariate functions including the Heaviside step function . As an example, the function greater than can be rewritten as . This can be approximated using rules described previously.

One other important multi-variate function we approximated is the select function. We approximated this in the same manner as Dorn et al. [2015]

as a linear interpolation:

. But in our case we apply the above univariate and binary function approximations to this formula.

As we discussed before, for binary functions, we approximate the input random variables and as bivariate Gaussian with correlation coefficient (equation (4.8)). In general, it is difficult to determine , because determining exactly involves an integral over the entire subtrees of the computation. In our framework, we provide three options to approximate : (1) Assume is zero; (2) Assume is a constant for each node. The constant value is estimated at training stage by sampling; (3) Estimate to accuracy that is second order in , based on a simplified assumption that the given nodes are affine functions of the inputs. For case (3) we simply take the gradients with respect to the input of the terms and , normalize these gradients, and take the dot product, which recovers . This can be done using reverse mode automatic differentiation.

We explored these different rules in our genetic search. In practice, we find that for shader programs, using only rule (1), typically gives good results. If the other rules (2, 3) are also included, minor quality improvements are gained, but these rules are used relatively rarely by our genetic search process of Section 5. We include in Appendix B the details for the these other choices for correlation coefficients, because they may be more beneficial for other applications, and the second order accuracy property is interesting.

4.5. Monte Carlo Sampling

We relate the Monte Carlo stochastic sampling [Cook, 1986; Dippé and Wold, 1985] to our framework. Here is the number of samples. The computation for a node computing is modeled as , where the output mean and standard deviation are given by sampled estimators as follows:


Here, each are random numbers independently drawn from normal distribution . We experimented with applying the Bessel’s correction [Wikipedia, 2017a] to correct the bias variance that occurs for small sample counts . In practice, we found it does not have a significant improvement on the result for our system. This is mainly because the variance can also be adjusted in the “quality improvement” phase (discussed in Section 4.7).

We choose sample numbers from (). The approximation converges to the ground truth for large sample numbers, and the output program simplifies to MSAA [Cook, 1986] when the entire input program is approximated under Monte Carlo sampling.

The error of the Monte Carlo sampling is estimated as follows [Feiguin, 2011].


Here, is the standard deviation computed from equation (4.11) and is sample number. This error estimate becomes more accurate in the limit of large sample numbers.

4.6. Compactly Supported Kernels Approximation

Because the Gaussian kernel has infinite support, it cannot be used on functions with undefined regions. For example, is only defined on non-negative , and its convolution with Gaussian using equation (4.1) does not exist. Monte Carlo sampling may also encounter such problem. However, even if an input program contains such functions as sub-parts, the full program may have a well-defined result, so smoothing should still be possible for such programs. To handle this case, we use compactly supported kernels.

Results for certain compactly supported kernels can be obtained by using repeated convolution [Heckbert, 1986]

of boxcar functions. This is because such kernels approximate the Gaussian by the central limit theorem

[Wells, 1986]. In our framework, we use box and tent kernels to approximately smooth functions with undefined values. Because the convolution with a box kernel is easier to compute, this approximation can also be used when the Gaussian convolution does not have a closed-form solution. In Table 3, we list the smoothed result using the box kernel for commonly used functions.

When integrating against a function that has an undefined region, it is important to make sure that the integral is not applied at any undefined regions. Our solution to this is to make the kernel size adapt to the location at which the integral is evaluated at. Thus, the integral is no longer technically a convolution, because it is not shift-invariant. We first measure the distance from value that we are determining the integral at to the function’s nearest undefined point. If the kernel half-width was before re-scaling, then we rescale the half-width to be . Here is a constant less than one, and in practice we use .

We can also use this truncation mechanism to better model functions such as (the fractional part of ), which have discontinuities. We observe that is discontinuous at integer . If we input a distribution that spans a discontinuity, such as , into , then we find that the output may be bimodal, with some values close to zero (due to being a positive value), and other values close to one (due to being negative with small absolute value). If we fit a Gaussian to this resulting bimodal distribution, as our adaptive Gaussian rule proposes, then the mean would be , even though most of the values of are either near or . This may result in a poor approximation, which can show up in tiled pattern shaders (which use ) as an improper bias towards the center of the tile’s texture. One fix would be to randomly select either mode, based on the probability contained in each mode. However, this introduces sampling noise in the result, which we wish to avoid. Therefore, our solution in practice is to first identify whether the output distribution is bimodal: for we can do this by simply checking if input distribution when represented as a uniform (box) distribution contains exactly one discontinuity, i.e. one integer. If so, we simply truncate the filter at the location of the discontinuity.

4.7. Quality Improvements

At this point, we assume we have now applied the approximation rules described in Sections 4.3 through 4.6 to an input program. We can optionally improve the approximation quality in two ways: a) adjust the standard deviation made in the approximations, and b) apply denoising to program variants that use Monte Carlo sampling.

Because our approximations are not exact, the standard deviation of some nodes may be too high or too low. Following [Dorn et al., 2015], we learn coefficients to refine the standard deviation estimates. By comparing with the ground truth image for the shader rendering, we use the Nelder-Mead simplex search [Nelder and Mead, 1965] to learn multiplicative factors for standard deviations.

When Monte Carlo sampling is used as part of the approximation, noise is introduced because of the relatively small sample count. A variety of techniques have been developed to filter such noise [Kalantari et al., 2015; Bako et al., 2017; Rousselle et al., 2012]. We implement the non-local means denoising method [Buades et al., 2005, 2011] with Laplacian pyramid [Liu et al., 2008]. We find that aesthetically appealing denoising results can be using a three level Laplacian pyramid, with a patch size of 5, search radius of 10, and denoising parameter is 10 for the lower resolutions, and searched over or set by the user for the finest resolution. In the genetic search process (Section 5), we experimented with allowing the algorithm to search from a variety of denoising parameters for the best result. However, because our denoising algorithm incurs some time overhead, it ends up being only rarely chosen. Thus, in our current setup, denoising is typically specified by the user manually choosing that he or she wants to denoise a result.

5. Genetic Search

In this section, we describe the genetic search algorithm. This automatically assigns approximation rules to each computation node. The algorithm finds the Pareto frontier of approximation choices that optimally trade off the running time and error of the program.

We developed this genetic search because it gives users the opportunity to explore the trade-off between efficiency and accuracy of the smoothed program. Although developers can manually assign approximation rules to each node, we found this to be a time-consuming process that can easily overlook beneficial approximation combinations. This is because each individual computation node may choose from several approximation rules, and the search space for this is combinatoric.

Our genetic search closely follows the method of Sitthi-Amron et al. [2011]. We adopt their fitness function and tournament selection rules, and we use the same method to compute the Pareto frontier of program variants that optimally trade-off running time and error with ground truth.

We start with “decent initial guesses”. For each approximation method, we create a program where the rule is applied to all the expression nodes. For such initial guesses, we also apply cross-over with a probability of 0.5. Then we employ standard mutation and cross-over operations to explore the search space. The mutation step chooses a new approximation rule, and with equal probability, assigns this new rule to 1, 2, or 4 adjacent expression nodes in depth-first order. As an alternative, with equal probability, the new approximation rule can also be assigned to the whole subtree of an arbitrary node. In the genetic search algorithm, we choose our probability of crossover as 0.4, the probability of retaining elite individuals to be 0.25, and the mutation probability to be 0.35. Also, we use a tournament size of 4, and population size of 40, with 20 generations. Finally, we run 3 random restarts for the algorithm.

For the Monte Carlo sampling approximation, during initialization and mutate, we select sample counts with equal probability from the set . For the determination of correlation coefficients described in Section 4.4, we pick with equal probability one of the three options.

6. Evaluation

The previous work of Dorn et al. [2015] was evaluated primarily on relatively simple shaders. To provide a more challenging and realistic benchmark, we authored 21 shaders. Unlike the simple shaders of Dorn et al., these include shaders that have a Phong lighting model, animation, spatially varying statistics, and which include parallax mapping [Szirmay-Kalos and Umenhoffer, 2008]. Our 21 shaders were produced by combining 7 base shaders with 3 choices for parallax mapping: none, bumps, and ripples. In Table 1, we describe our base shaders, the choices for parallax mapping, and the associated code complexity.

Shader Lines Exprs Description
Base shaders
   Bricks 38 192 Bricks with noise pattern
   Checkerboard 20 103 Greyscale checkerboard
   Circles 16 53 Tiled greyscale circles
   Color circles 26 164 Aperiodic colored circles
   Fire 49 589 Animating faux fire
   Quadratic sine 26 166 Animating sinewave
of quadratic (non-stationary)
   Zigzag 24 224 Colorful zigzag pattern
Parallax mappings
   None 0 0 No parallax mapping
   Bumps 21 203 Spherical bumps
   Ripples 23 178 Animating ripples
Table 1. A table of our 21 shaders. At the top we list our 7 base shaders, which are each combined with 3 different choices for parallax mapping, listed at the bottom. We also report the number of non-comment lines used to construct the shader program, and the number of scalar expressions in the program’s compute graph.

Results for 7 of our shaders are presented in Figure 1 and Figure 5, including one result for each base shader. The result for our method was selected by a human choosing for each shader a program variant that has sufficiently low error. Dorn et al. [2015] typically cannot reach sufficiently low errors to remove the aliasing, so we simply selected the program variant from Dorn et al. that reaches the lowest error. The MSAA result was selected based on evaluating MSAA program variants that use 2, 4, 8, 16, 32 samples, and selecting the one that has most similar time as ours. Please see our supplemental video for results with a rotating camera for all 21 shaders.

We also show in Figure 6 time versus error plots for the Pareto frontiers associated with these 7 shaders. Note that Dorn et al. typically has significantly higher error, which manifests in noticeable aliasing. Also note that the MSAA method frequently takes an order of magnitude more time for equal error. Plots for all 21 of our shaders are included in the supplemental document.

Statistics for the approximations used are presented in Table 2. Note that a rich variety of approximation strategies are used: all five choices for approximation are selected for different programs. For the correlation term discussed in Section 4.4, when aggregated across all 21 shaders, nearly all approximations for programs on the Pareto frontier prefer the simple choice of . We weight each shader’s contribution equally, and find 87% of program variants prefer , whereas only 4% use a constant, and 6% use estimated to second order accuracy. We conclude that for shader programs, the simple choice of in most cases suffices.

Note that our brick shader gives poor results for the method of Dorn et al. [2015], while in that paper, a brick shader with similar appearance shows good results. This is because the brick shader in Dorn et al. [2015] was implemented using floor() functions which can each be bandlimited independently, and then a good result is obtained by linearity of the integral. In our paper, we implemented a number of shaders using the fract() function to perform tilings that are exactly or appropriately periodic, including the brick shader. The fract() function ends up being more challenging to bandlimit for the framework of Dorn et al. [2015], but our method can handle such shaders.

Shader Dorn et al. Adaptive Monte Carlo None
 [2015] Gaussian Sampling
Bricks w/ None 28% 0% 30% 29%
Checkerboard w/ Ripples 66% 34% 0% 1%
Circles w/ None 4% 21% 71% 4%
Color Circles w/ Bumps 8% 47% 44% 0%
Fire w/ Bumps 1% 7% 33% 60%
Quadratic sine w/ Ripples 13% 80% 0% 8%
Zigzag w/ Ripples 0% 91% 1% 8%
All shaders (Pareto frontier) 29% 15% 25% 30%
All shaders (fastest time) 13% 10% 0% 77%
All shaders (median time) 20% 19% 49% 13%
All shaders (slowest time) 10% 27% 49% 14%
Table 2. Statistics of which approximations were chosen for different shaders. We show statistics for the 7 program variants for the shaders presented in Figure 1 and Figure 5. We also show aggregate statistics over all 21 shaders, with each shader’s contribution weighted equally. For the aggregate statistics we report statistics from the entire Pareto frontier, as well as for each shader choosing only the slowest, fastest, or median speed program variant. Our results show that a rich variety of our different approximation rules are needed for the best performance.

We performed our evaluation on an Intel Core i7 6950X 3 GHz (Broadwell), with 10 physical cores (20 hyperthreaded), and 64 GB DDR4-2400 RAM. All shaders were evaluated on the CPU using parallelization. The tuning of each shader took between 1 and 3 hours of wall clock time. However, we note that good program variants are available after just a few minutes, and most of the remaining tuning time is spent making slight improvements to the best individuals. Also, our tuner is intentionally a research prototype that is not particularly optimized: it could be significantly faster if it was parallelized more effectively, cached more redundant computations, or targeted the GPU.

Ground Truth No Antialiasing Our Result Dorn et al. 2015 MSAA
Checkerboard with Ripples
30 ms, error: 0.194 54 ms (2x), error: 0.071 50 ms (2x), error: 0.102 47 ms (2x), error: 0.233
Circles with None
20 ms, error: 0.148 71 ms (4x), error: 0.035 39 ms (2x), error: 0.063 67 ms (3x), error: 0.087
Color Circles with Bumps
37 ms, error: 0.098 149 ms (4x), error: 0.039 56 ms (2x), error: 0.079 112 ms (3x), error: 0.061
Fire with Bumps
39 ms, error: 0.170 698 ms (18x), error: 0.037 67 ms (2x), error: 0.136 705 ms (18x), error: 0.037
Quadratic Sine with Ripples
38 ms, error: 0.184 81 ms (2x), error: 0.045 59 ms (2x), error: 0.094 99 ms (3x), error: 0.158
Zigzag with Ripples
57 ms, error: 0.139 77 ms (1x), error: 0.045 59 ms (1x), error: 0.072 83 ms (1x), error: 0.122
Figure 5. Selected result images for 6 shaders. Please see the supplemental video for a comprehensive comparison of all shaders. Reported below each shader are the time to render a frame, time relative to no antialiasing, and error. Please zoom in to see aliasing and noise patterns in the different methods. Program variants with comparable time were selected: see Section 6 for more details. Note that the amount of aliasing and error for our result is significantly less than Dorn et al. [2015]. Note that we typically have significantly less error and noise than the comparable MSAA results.
Bricks Checkerboard Circles Color Circles Fire Quadratic Sine Zigzag
with None with Ripples with None with Bumps with Bumps with Ripples with Ripples
Figure 6. Time versus error plots for the 7 shaders in Figure 1 and Figure 5. Here we show the Pareto frontier of program variants that optimally trade off running time and error. We show results for our method, Dorn et al [2015], MSAA with varying numbers of samples, and the input shader without antialiasing. Note that our approach typically has significantly less error than Dorn et al [2015] and is frequently an order of magnitude faster than MSAA for comparable error.

7. Discussion and Conclusion

In this paper, we presented a novel compiler framework that smoothes an arbitrary program over the floats by convolving it with a Gaussian kernel. We explained several different approximations and discussed the accuracy of each. We then demonstrated that our framework allows shader programs to be automatically bandlimited. This shader bandlimiting application achieves state-of-the-art results: it often has substantially better error than Dorn et al. [2015] even after our improvements, and is frequently an order of magnitude faster than multi-sample antialiasing (MSAA). Our framework is quite general, and we believe it could be useful for other problems in graphics and across the sciences. In order to facilitate reproducible research, we intend to release our source code under an open source license.


We thank Zack Verham for authoring some shaders, Ning Yu for helping produce the supplementary video, and Francesco Di Plinio for providing references about the heat equation and its Taylor expansion. This project was partially funded by NSF grants HCC 1011444 and SHF 1619123.

Appendix A Table of Smoothed Formulas

In Table 3, we show a table of functions and their corresponding convolutions with box and Gaussian kernels. These are needed for the approximations we developed in Section 4. This table can be viewed as an extension of the table presented in Dorn et al. [2015]. Note that in particular, for each function , we report not only the result of smoothing but also smoothing (e.g. if we report then we also report ). This is needed to determine the standard deviations output by a given compute stage for the adaptive Gaussian approximation rule of Section 4.4.

In Table 3, we give that bandlimiting by a Gaussian is a generalized Hermite polynomial . This can be derived trivially from a property of generalized Hermite polynomials: the

th noncentral moment of a Gaussian distribution

with expected value and variance is a generalized Hermite polynomial [Wikipedia, 2017d]. The ordinary Hermite polynomial is defined as:


Then is the generalized Hermite polynomial with parameter , defined by:

Function Bandlimited with box kernel: Bandlimited with Gaussian kernel:
Table 3. A table of univariate functions, and their corresponding bandlimited result, using a box kernel and a Gaussian . The box kernel is the PDF of the uniform random variable . The Gaussian kernel is the PDF of the random variable . Each random variable has standard deviation . We define , and the Heaviside step function is for and for positive. Note that functions with undefined regions, such as for negative or fractional have limited as described in Section 4.6.

Appendix B Correlation Coefficients for Multivariate Functions

In this appendix, we describe rules to compute the correlation coefficient , which is briefly discussed in Section 4.4. Specifically, we are given a binary function , which receives two inputs and , with associated random variables and , respectively. We discuss the following two rules: a) assume is constant and estimate by sampling and b) compute under the assumption that computations are affine.

Estimate by sampling. In a training stage, we use samples to approximate of two random variables and . The samples drawn from these two distributions are represented as and with corresponding sample mean and . Thus, can be estimated by equation (B.1) [Wikipedia, 2017b].


Estimate by an affine assumption. When we calculate under this rule, we assume the variables and input to are affine transformations of the variables , …, which are input to the program. Here, s are inputs to the function that is being smoothed, which for a shader program could be the spatial coordinate. Under this assumption, and can be expressed as follows.


In equation (B.2), and are coefficients of the affine transformation, and and end up not mattering for the computation, so we ignore these constants. In our implementation, we find and by taking the gradient, via the automatic differentiation of the expression nodes and with respect to the inputs . Here is computed as:


Appendix C Smoothing Result for Periodic Functions

In this section, we derive a convenient formula that gives the bandlimited result for any periodic function if its integral within a single period is known. We extend the analysis of fract() made by Dorn et al. [2015] to any periodic function. We use Heckbert’s technique of repeated integration [Heckbert, 1986] to derive the convolution of a periodic function with a box kernel.

Specifically, we assume the periodic function has period T and its first and second integrals within one period are also known. These are denoted as and , respectively.


Using equation (C.1), we derive the first and second integral of as follows.


Using Heckbert’s result, the convolution of the periodic function with a box kernel that has support [] (corresponding to a uniform kernel with standard deviation ) can be expressed as follows.


And the convolution of the periodic function with a tent kernel that has support [] (corresponding to a uniform kernel with standard deviation ) can be expressed as follows.


Appendix D Proof of Second Order Approximation for a Single Composition

Here we show for a univariate function, applying function composition using our adaptive Gaussian approximation from Section 4.4 is accurate up to the second order in standard deviation . Suppose we wish to approximate the composition of two functions: , where . Assume the input random variable is : the Gaussian kernel centered at . The output from is an intermediate value in the computation: we can represent this with another random variable . Similarly, the output random variable . We conclude that .

We apply equation (4.7) and equation (4.6) to , and obtain the following mean and standard deviation.


Using our composition rule, is approximated as a normal distribution using the mean and standard deviation calculated from equation (D.1). That is, we approximate as being distributed as . Similarly, , which is the output we care about, can be computed based on equation (4.7), equation (D.1), and repeated Taylor expansion in around .


Comparing equation (D.2) with equation (