1 Introduction
Procedural modeling, or the use of randomized procedures to generate computer graphics, is a powerful technique for creating visual content. It facilitates efficient content creation at massive scale, such as procedural cities CGAShape . It can generate fine detail that would require painstaking effort to create by hand, such as decorative floral patterns FloralOrnament . It can even generate surprising or unexpected results, helping users to explore large or unintuitive design spaces GraphicsHMC .
Many applications demand control over procedural models: making their outputs resemble examples InverseProceduralTrees ; InteractivePDFDesign , fit a target shape SyntheticTopiary ; MPM ; SOSMC , or respect functional constraints such as physical stability GraphicsHMC . Bayesian inference provides a general-purpose control framework: the procedural model specifies a generative prior, and the constraints are encoded as a likelihood function. Posterior samples can then be drawn via Markov Chain Monte Carlo (MCMC) or Sequential Monte Carlo (SMC). Unfortunately, these algorithms often require many samples to converge to high-quality results, limiting their usability for interactive applications. Sampling is challenging because the constraint likelihood implicitly defines complex (often non-local) dependencies not present in the prior. Can we instead make these dependencies
explicit by encoding them in a model’s generative logic? Such an explicit model could simply be run forward to generate high-scoring results. In this paper, we propose an amortized inference method for learning an approximation to this perfect explicit model. Taking inspiration from recent work in amortized variational inference, we augment the procedural model with neural networks that control how the model makes random choices based on the partial output it has generated. We call such a model a neurally-guided procedural model. We train these models by maximizing the likelihood of example outputs generated via SMC using a large number of samples, as an offline pre-process. Once trained, they can be used as efficient SMC importance samplers. By investing time up front generating and training on many examples, our system effectively ‘pre-compiles’ an efficient sampler that can generate further results much faster. For a given likelihood threshold, neurally-guided models can generate results which reliably achieve that threshold using 10-20x fewer particles and up to 10x less compute time than an unguided model.
In this paper, we focus on accumulative procedural models that repeatedly add new geometry to a structure. For our purposes, a procedural model is accumulative if, while executing, it provides a ‘current position’ from which geometry generation will continue. Many popular growth models, such as L-systems, are accumulative LSystemBook . We focus on 2D models which generate images, though the techniques we present extend naturally to 3D.
2 Related Work
Guided Procedural Modeling
Procedural models can be guided using non-probabilistic methods. Open L-systems can query their spatial position and orientation, allowing them to prune their growth to an implicit surface SyntheticTopiary ; PlantEnvironment . Recent follow-up work supports larger models by decomposing them into separate regions with limited interaction GuidedProceduralModeling . These methods were specifically designed to fit procedural models to shapes. In contrast, our method learns how to guide procedural models and is generally applicable to constraints expressible as likelihood functions.
Generatively Capturing Dependencies in Procedural Models
A recent system by Dang et al. modifies a procedural grammar so that its output distribution reflects user preference scores given to example outputs InteractivePDFDesign . Like us, they use generative logic to capture dependencies induced by a likelihood function (in their case, a Gaussian process regression over user-provided examples). Their method splits non-terminal symbols in the original grammar, giving it more combinatorial degrees of freedom. This works well for discrete dependencies, whereas our method is better suited for continuous constraint functions, such as shape fitting.
Neural Amortized Inference
Our method is also inspired by recent work in amortized variational inference using neural variational families NVIL ; SGVB ; AEVB , but it uses a different learning objective. Prior work has also aimed to train efficient neural SMC importance samplers NASMC ; NeuralStochasticInverses . These efforts focused on time series models and Bayesian networks, respectively; we focus on a class of structured procedural models, the characteristics of which permit different design decisions:

The likelihood of a partially-generated output can be evaluated at any time and is a good heuristic for the likelihood of the completed output. This is different from e.g. time series models, where the likelihood at each step considers a previously-unseen data point.

They make many local random choices but have no global/top-level parameters.

They generate images, which naturally support coarse-to-fine feature extraction.
These properties informed the design of our neurallyguided model architecture.
3 Approach
function chain(pos, ang) {
  // Perturb the heading, lay down one segment, then recurse with probability 0.5.
  var newang = ang + gaussian(0, PI/8);
  var newpos = pos + polarToRect(LENGTH, newang);
  genSegment(pos, newpos);
  if (flip(0.5)) chain(newpos, newang);
}

function chain_neural(pos, ang) {
  // Same structure, but neural nets (nn1, nn2) predict the parameters of each
  // random choice from the target image and the partial output generated so far.
  var newang = ang + gaussMixture(nn1(...));
  var newpos = pos + polarToRect(LENGTH, newang);
  genSegment(pos, newpos);
  if (flip(nn2(...))) chain_neural(newpos, newang);
}

Consider a simple procedural modeling program chain that recursively generates a random sequence of linear segments, constrained to match a target image. Figure 1(a) shows the text of this program, along with samples generated from it (drawn in black) against several target images (drawn in gray). Chains generated by running the program forward do not match the targets, since forward sampling is oblivious to the constraint. Instead, we can generate constrained samples using Sequential Monte Carlo (SMC) SOSMC . This results in final chains that more closely match the target images. However, the algorithm requires many particles, and therefore significant computation, to produce acceptable results; Figure 1(a) shows that the number of particles used there is still not sufficient.
In an ideal world, we would not need costly inference algorithms to generate constraint-satisfying results. Instead, we would have access to an ‘oracle’ program, chain_perfect, that perfectly fills in the target image when run forward. While such an oracle can be difficult or impossible to write by hand, it is possible to learn a program chain_neural that comes close. Figure 1(b) shows our approach. For each random choice in the program text (e.g. gaussian, flip), we replace the parameters of that choice with the output of a neural network. This neural network’s inputs (abstracted as “...”) include the target image as well as the partial output image the program has generated thus far. The network thus shapes the distribution over possible choices, guiding the program’s future output based on the target image and its past output. These neural nets affect both continuous choices (e.g. angles) and control flow decisions (e.g. recursion): they dictate where the chain goes next, as well as whether it keeps going at all. For continuous choices such as gaussian, we also modify the program to sample from a mixture distribution. This helps the program handle situations where the constraints permit multiple distinct choices (e.g. in which direction to start the chain for the circle-shaped target image in Figure 1).
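To make the mixture mechanism concrete, here is a minimal plain-Python sketch of sampling from such a guided Gaussian mixture. The component parameters would come from the guide network at runtime; the bimodal values below are hypothetical, chosen to mimic the ‘which way around the circle’ ambiguity:

```python
import math
import random

def sample_gauss_mixture(weights, means, sigmas):
    """Draw one sample from a 1D Gaussian mixture: pick a component in
    proportion to its weight, then sample from that component."""
    r = random.random() * sum(weights)
    acc = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        acc += w
        if r <= acc:
            return random.gauss(mu, sigma)
    return random.gauss(means[-1], sigmas[-1])  # guard against rounding error

# e.g. a bimodal proposal over the chain's next turning angle:
angle = sample_gauss_mixture([0.5, 0.5], [-math.pi / 2, math.pi / 2], [0.1, 0.1])
```

A single Gaussian would be forced to put its mean between the two modes; the mixture can commit to either direction.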
Once trained, chain_neural generates constraint-satisfying results more efficiently than its unguided counterpart. Figure 1(b) shows example outputs: forward samples adhere closely to the target images, and SMC with 10 particles is sufficient to produce chains that fully fill the target shape. The next section describes the process of building and training such neurally-guided procedural models.
4 Method
For our purposes, a procedural model is a generative probabilistic model of the following form:

$$P(\mathbf{x}) = \prod_{i=1}^{|\mathbf{x}|} p_i\big(x_i \,;\, \Phi_i(x_1, \ldots, x_{i-1})\big)$$

Here, $\mathbf{x}$ is the vector of random choices the procedural modeling program makes as it executes. The $p_i$’s are local probability distributions from which each successive random choice is drawn. Each $p_i$ is parameterized by a set of parameters (e.g. mean and variance, for a Gaussian distribution), which are determined by some function $\Phi_i$ of the previous random choices $x_1, \ldots, x_{i-1}$. A constrained procedural model also includes an unnormalized likelihood function $\ell(\mathbf{x}, c)$ that measures how well an output of the model satisfies some constraint $c$:

$$\tilde{P}(\mathbf{x} \mid c) \propto \ell(\mathbf{x}, c) \cdot P(\mathbf{x})$$
In the chain example, $c$ is the target image, with $\ell$ measuring similarity to that image.
A neurally-guided procedural model modifies a procedural model by replacing each parameter function $\Phi_i$ with a neural network:

$$P_\theta(\mathbf{x} \mid c) = \prod_{i=1}^{|\mathbf{x}|} \hat{p}_i\big(x_i \,;\, \mathrm{NN}_i(I(x_1, \ldots, x_{i-1}), c \,;\, \theta)\big)$$

where $I(x_1, \ldots, x_{i-1})$ renders the model output after the first $i-1$ random choices, and $\theta$ are the network parameters. $\hat{p}_i$ is a mixture distribution if random choice $i$ is continuous; otherwise, $\hat{p}_i = p_i$.
To train a neurally-guided procedural model, we seek parameters $\theta$ such that $P_\theta(\mathbf{x} \mid c)$ is as close as possible to the true constrained distribution $\tilde{P}(\mathbf{x} \mid c)$. This goal can be formalized as minimizing the conditional KL divergence $D_{\mathrm{KL}}(\tilde{P} \,\|\, P_\theta)$ (see the supplemental materials for derivation):

$$\theta^* = \arg\max_\theta \; \frac{1}{N} \sum_{s=1}^{N} \log P_\theta(\mathbf{x}_s \mid c_s) \qquad (1)$$
where the $\mathbf{x}_s$ are example outputs generated using SMC, given a $c_s$ drawn from some distribution over constraints $P(c)$, e.g. uniform over a set of training images. This is simply maximizing the likelihood of the $\mathbf{x}_s$ under the neurally-guided model. Training then proceeds via stochastic gradient ascent using the gradient

$$\nabla_\theta \log P_\theta(\mathbf{x} \mid c) = \sum_{i=1}^{|\mathbf{x}|} \nabla_\theta \log \hat{p}_i\big(x_i \,;\, \mathrm{NN}_i(I(x_1, \ldots, x_{i-1}), c \,;\, \theta)\big) \qquad (2)$$
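For reference, the key step of the derivation (given in full in the supplemental materials): the conditional KL divergence splits into a $\theta$-independent entropy term and a cross-entropy term, and the SMC samples approximate the latter:

```latex
\begin{aligned}
\mathbb{E}_{P(c)}\!\left[ D_{\mathrm{KL}}\!\left( \tilde{P}(\cdot \mid c) \,\|\, P_\theta(\cdot \mid c) \right) \right]
  &= \mathbb{E}_{P(c)}\, \mathbb{E}_{\tilde{P}(\mathbf{x} \mid c)}\!\left[ \log \tilde{P}(\mathbf{x} \mid c) - \log P_\theta(\mathbf{x} \mid c) \right] \\
  &= \mathrm{const} \;-\; \mathbb{E}_{P(c)}\, \mathbb{E}_{\tilde{P}(\mathbf{x} \mid c)}\!\left[ \log P_\theta(\mathbf{x} \mid c) \right]
  \;\approx\; \mathrm{const} \;-\; \frac{1}{N} \sum_{s=1}^{N} \log P_\theta(\mathbf{x}_s \mid c_s)
\end{aligned}
```

so minimizing the expected KL divergence over $\theta$ is equivalent to the maximum-likelihood objective in Equation 1.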
The trained $P_\theta$ can then be used as an importance distribution for SMC.
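To illustrate how a learned guide plugs into SMC as an importance distribution, here is a self-contained toy in plain Python: each step has a standard-normal prior, the partial likelihood rewards a running sum that tracks a straight line toward a target, and a hand-designed ‘guide’ proposes steps aimed at the remaining gap. Everything here is a toy stand-in for the actual system, but the weighting follows the standard prior-times-likelihood-over-proposal pattern:

```python
import math
import random

def smc_guided(target, n_steps=10, n_particles=50):
    """Toy sequential importance resampling with a guided proposal.
    Incremental weight = prior(x) * likelihood ratio / proposal(x)."""
    def normal_logpdf(x, mu, sigma):
        return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

    def partial_loglik(s, step):
        # how well the running sum tracks its on-schedule value
        return -0.5 * (s - target * step / n_steps) ** 2

    particles = [0.0] * n_particles          # each particle is a running sum
    for step in range(n_steps):
        new_states, log_weights = [], []
        for s in particles:
            mu_q = (target - s) / (n_steps - step)   # guide: aim at the gap
            x = random.gauss(mu_q, 1.0)
            lw = (normal_logpdf(x, 0.0, 1.0) - normal_logpdf(x, mu_q, 1.0)
                  + partial_loglik(s + x, step + 1) - partial_loglik(s, step))
            new_states.append(s + x)
            log_weights.append(lw)
        # normalize in log space, then multinomial resampling
        m = max(log_weights)
        weights = [math.exp(lw - m) for lw in log_weights]
        particles = random.choices(new_states, weights=weights, k=n_particles)
    return particles
```

With a forward (unguided) proposal the same weighting scheme applies, but far more particles are wasted in low-likelihood regions.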
It is worth noting that using the other direction of KL divergence, $D_{\mathrm{KL}}(P_\theta \,\|\, \tilde{P})$, leads to the marginal likelihood lower bound objective used in many black-box variational inference algorithms AVIPP ; BBVI ; NVIL . This objective requires training samples from $P_\theta$, which are much less expensive to generate than samples from $\tilde{P}$. When used for procedural modeling, however, it leads to models whose outputs lack diversity, making them unsuitable for generating visually-varied content. This behavior is due to a well-known property of the objective: minimizing it produces approximating distributions that are overly compact, i.e. concentrating their probability mass in a smaller volume of the state space than the true distribution being approximated MacKayBook . Our objective is better suited for training proposal distributions for importance sampling methods (such as SMC), where the target density must be absolutely continuous with respect to the proposal density AbsolutelyContinuous .
4.1 Neural Network Architecture
Each network should predict a distribution over its random choice that is as close as possible to that choice’s true posterior distribution. More complex networks capture more dependencies and increase accuracy but require more computation time to execute. We can also increase accuracy at the cost of computation time by running SMC with more particles. If our networks are too complex (i.e. the accuracy provided per unit computation time is too low), then the neurally-guided model will be outperformed by simply using more particles with the original model. For neural guidance to provide speedups, we require networks that pack as much predictive power into as simple an architecture as possible.
Figure 2 shows our network architecture: a multilayer perceptron with one nonlinear hidden layer, whose number of outputs matches the number of parameters the random choice expects. We found that a simpler linear model did not perform as well per unit time. Since some parameters are bounded (e.g. a Gaussian variance must be positive), each output is remapped via an appropriate bounding transform. The inputs come from several sources, each providing the network with decision-critical information:
Local State Features
The model’s current position, the current orientation of any local reference frame, etc. We access this data via the arguments of the function call in which the random choice occurs, extracting all scalar arguments and normalizing them to a fixed range.
Partial Output Features
Next, the network needs information about the output the model has already generated. The raw pixels of the current partial output image provide too much data; we need to summarize the relevant image contents. We extract 3x3 windows of pixels around the model’s current position at four different resolution levels, with each level computed by downsampling the previous level via a 2x2 box filter. This results in 3·3·4 = 36 features per image channel. This architecture is similar to the foveated ‘glimpses’ used in visual attention models RecurrentVisualAttention . Convolutional networks might also be used here, but this approach provided better performance per unit of computation time.
Target Image Features
Finally, if the constraint being enforced involves a target image, as in the chain example of Section 3, we also extract multiresolution windows from this image. These additional features allow the network to make appropriate decisions for matching the image.
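The window extraction above can be sketched in plain Python for a single-channel image (zero-padding at image borders is an illustrative choice):

```python
def window_features(img, x, y, levels=4, win=3):
    """Extract a `win` x `win` pixel window around (x, y) at each of `levels`
    resolutions; each level is a 2x2 box-filter downsampling of the previous
    one. With win=3 and levels=4 this yields 3*3*4 = 36 features per channel.
    `img` is a list of rows of floats; out-of-bounds pixels read as 0.0."""
    feats = []
    for _ in range(levels):
        h, w = len(img), len(img[0])
        for dy in range(-(win // 2), win // 2 + 1):
            for dx in range(-(win // 2), win // 2 + 1):
                yy, xx = y + dy, x + dx
                feats.append(img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0.0)
        # 2x2 box-filter downsample, moving the center position with it
        img = [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
                 img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(w // 2)] for r in range(h // 2)]
        x, y = x // 2, y // 2
    return feats
```

The coarser levels give the network a cheap summary of global context while the finest level preserves detail near the current position.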
4.2 Training
5 Experiments
In this section, we evaluate how well neurally-guided procedural models capture image-based constraints. We implemented our prototype system in the WebPPL probabilistic programming language WebPPL using the adnn neural network library (https://github.com/dritchie/adnn). All timing data was collected on an Intel Core i7-3840QM machine with 16GB RAM running OS X 10.10.5.
5.1 Image Datasets
[Figure 3: representative images from the Scribbles, Glyphs, and PhyloPic collections]
In experiments which require target images, we use the following image collections:

Scribbles: 49 binary mask images drawn by hand with the brush tool in Photoshop. Includes shapes with thick and thin regions, high and low curvature, and self-intersections.

Glyphs: 197 glyphs from the FF Tartine Script Bold typeface (all glyphs with only one foreground component and at least 500 foreground pixels when rendered at 129x97).

PhyloPic: 35 images from the PhyloPic silhouette database (http://phylopic.org).
We augment the dataset with a horizontally-mirrored copy of each image, and we annotate each image with a starting point and direction from which to initialize the execution of a procedural model. Figure 3 shows some representative images from each collection.
5.2 Shape Matching
We first train neurally-guided procedural models to match 2D shapes specified as binary mask images. If $D$ is the spatial domain of the image, then the likelihood function for this constraint is

$$\ell(\mathbf{x}, c) = \mathcal{N}\big(\mathrm{sim}(I(\mathbf{x}), c),\, 1,\, \sigma\big) \qquad \mathrm{sim}(I, c) = 1 - \frac{\sum_{p \in D} w(p) \cdot |I(p) - c(p)|}{\sum_{p \in D} w(p) \cdot |\mathbf{0}(p) - c(p)|} \qquad (3)$$

where $e(c)$ is a binary edge mask computed using the Sobel operator. This function encourages the output image $I(\mathbf{x})$ to be similar to the target mask $c$, where similarity is normalized against $c$’s similarity to an empty image $\mathbf{0}$. Each pixel $p$’s contribution is weighted by $w(p)$, determined by whether the target mask is empty, filled, or has an edge at that pixel; we set these weights so that empty and edge pixels are worth 1.5 times more than filled pixels. This encourages matching of perceptually-important contours and negative space. The Gaussian bandwidth $\sigma$ is held fixed in all experiments.
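A plain-Python sketch of the similarity computation, consistent with the description above (binary images as nested lists of 0.0/1.0 floats; the 1.5x weighting of empty and edge pixels is expressed via `w_extra = 0.5`):

```python
def similarity(img, target, edge, w_extra=0.5):
    """Normalized, weighted similarity of `img` to the target mask:
    pixelwise distance to `target`, divided by the target's distance
    from an all-empty image, subtracted from 1. `edge` would be the
    Sobel edge mask of the target."""
    num = den = 0.0
    for img_row, tgt_row, edge_row in zip(img, target, edge):
        for i, t, e in zip(img_row, tgt_row, edge_row):
            # empty and edge pixels count (1 + w_extra) = 1.5x as much
            w = 1.0 + w_extra if (t == 0.0 or e > 0.0) else 1.0
            num += w * abs(i - t)
            den += w * abs(0.0 - t)
    return 1.0 - num / den
```

A perfect match scores 1; an all-empty output scores 0; spurious geometry outside the mask pushes the score negative.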
We wrote a WebPPL program which recursively generates vines with leaves and flowers and then trained a neurally-guided version of this program to capture the above likelihood. The model was trained on 10,000 example outputs, each generated using SMC with 600 particles. Target images were drawn uniformly at random from the Scribbles dataset. Each example took on average 17 seconds to generate; parallelized across four CPU cores, the entire set of examples took approximately 12 hours to generate. Training took 55 minutes in our single-threaded implementation.
Figure 4 shows some outputs from this program. 10-particle SMC produces recognizable results with the neurally-guided model (Guided) but not with the unguided model (Unguided (Equal N)). A more equitable comparison is to give the unguided model the same amount of wall-clock time as the guided model. While the resulting outputs fare better, the target shape is still obscured (Unguided (Equal Time)). We find that the unguided model needs 200 particles to reliably match the performance of the guided model. Additional results are shown in the supplemental materials.
[Figure 4: shape matching results. Columns: Target, Reference, Guided, Unguided (Equal N), Unguided (Equal Time)]
Figure 5 shows a quantitative comparison between five different models on the shape matching task:

Unguided: The original, unguided procedural model.

Constant Params: The neural network for each random choice is a vector of constant parameters (i.e. a partial mean field approximation AVIPP ).

+ Local State Features: Adding the local state features described in Section 4.1.

+ Target Image Features: Adding the target image features described in Section 4.1.

All Features: Adding the partial output features described in Section 4.1.
We test each model on the Glyphs dataset and report the median normalized similarity-to-target achieved (i.e. the first argument to the Gaussian in Equation 3), plotted in Figure 5(a). The performance of the guided model improves with the addition of more features; at 10 particles, the full model is already approaching an asymptote. Figure 5(b) shows the wall-clock time required to achieve increasing similarity thresholds. The vertical gap between the curves shows the speedup given by neural guidance, which can be as high as 10x. For example, the + Local State Features model reaches similarity 0.35 about 5.5 times faster than the Unguided model, the + Target Image Features model is about 1.5 times faster still, and the All Features model is about 1.25 times faster than that. Note that we trained on the Scribbles dataset but tested on the Glyphs dataset; these results suggest that our models can generalize to qualitatively-different, previously-unseen images.
Figure 6(a) shows the benefit of using mixture guides for continuous random choices; the experimental setup is the same as in Figure 5. We compare a model which uses four-component mixtures with a no-mixture model. Using mixtures boosts performance, for the reason alluded to in Section 3: at shape intersections, such as the crossing of the letter ‘t,’ the model benefits from multimodal uncertainty. Using more than four mixture components did not improve performance on this test dataset.
We also investigate how the number of training examples affects performance. Figure 6(b) plots the median similarity at 10 particles as training set size increases. Performance increases rapidly for the first few hundred examples before leveling off, suggesting that 1000 sample traces are sufficient (for our particular choice of training set, at least). This may seem surprising, as many published neural-network-based learning systems require many thousands to millions of training examples. In our case, each training example contains hundreds to thousands of random choices, each of which provides a learning signal; in this way, the training data is “bigger” than it appears. Our implementation generates 1000 samples in just over an hour using four CPU cores.
5.3 Stylized “Circuit” Design
We next train neurally-guided procedural models to capture a likelihood that does not use a target image: constraining the vines program to resemble a stylized circuit design. To achieve the dense packing of long wire traces that is one of the most striking visual characteristics of circuit boards, we encourage a fixed percentage of the image to be filled and to have a dense, high-magnitude gradient field, as this tends to create many long rectilinear or diagonal edges:
$$\ell(\mathbf{x}) = \mathcal{N}\big(\mathrm{err}(\mathrm{fill}(I(\mathbf{x})), \tau),\, 0,\, \sigma\big) \cdot \mathcal{N}\big(\mathrm{grad}(I(\mathbf{x})),\, 1,\, \sigma\big) \qquad (4)$$

where $\mathrm{err}(a, b) = |a - b| / b$ is the relative error of $a$ from $b$, $\mathrm{fill}(I)$ is the fraction of filled pixels in $I$, $\tau$ is the target fill percentage, and $\mathrm{grad}(I)$ is the average gradient magnitude of $I$. We also penalize geometry outside the bounds of the image, encouraging the program to fill in a rectangular “die”-like region. We train on 2000 examples generated using SMC with 600 particles. Example generation took 10 hours and training took under two hours. Figure 7 shows outputs from this program. As with shape matching, the neurally-guided model generates high-scoring results significantly faster than the unguided model.
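A sketch of this score in plain Python. The forward-difference gradient, the 0.5 fill target, the unit gradient target, and the value of sigma are all illustrative assumptions used to make the sketch self-contained:

```python
import math

def circuit_score(img, fill_target=0.5, sigma=0.1):
    """Unnormalized log-score rewarding (a) a fill fraction close to the
    target and (b) a dense, high-magnitude gradient field. `img` is a
    binary image as a list of rows of 0.0/1.0 floats."""
    h, w = len(img), len(img[0])
    fill = sum(sum(row) for row in img) / (h * w)
    grad = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]   # forward differences
            gy = img[y + 1][x] - img[y][x]
            grad += math.hypot(gx, gy)
    grad /= (h - 1) * (w - 1)
    rel_err = abs(fill - fill_target) / fill_target
    # Gaussian penalty on fill error, Gaussian reward for dense gradients
    return -0.5 * (rel_err / sigma) ** 2 - 0.5 * ((grad - 1.0) / sigma) ** 2
```

Note how a half-filled image with many edges (e.g. dense parallel traces) scores far better than a solid block of the same area.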
[Figure 7: stylized circuit design results. Columns: Reference, Guided, Unguided (Equal N), Unguided (Equal Time)]
6 Conclusion and Future Work
This paper introduced neurally-guided procedural models: constrained procedural models that use neural networks to capture constraint-induced dependencies. We showed how to train guides for accumulative models with image-based constraints using a simple-yet-powerful network architecture. Experiments demonstrated that neurally-guided models can generate high-quality results significantly faster than unguided models.
Accumulative procedural models provide a current position, which is not true of other generative paradigms (e.g. texture generation, which generates content across its entire spatial domain). In such settings, the guide might instead learn what parts of the current partial output are relevant to each random choice using an attention process RecurrentVisualAttention .
Using neural networks to predict random choice parameters is just one possible program transformation for generatively capturing constraints. Other transformations, such as control flow changes, may be necessary to capture more types of constraints. A first step in this direction would be to combine our approach with the grammar-splitting technique of Dang et al. InteractivePDFDesign .
Methods like ours could also accelerate inference for other applications of procedural models, e.g. as priors in analysis-by-synthesis vision systems Picture . A robot perceiving a room through an onboard camera, detecting chairs, then fitting a procedural model to the detected chairs could learn importance distributions for each step of the chair-generating process (e.g. the number of parts, their size, arrangement, etc.). Future work is needed to determine appropriate neural guides for such domains.
References
 [1] Bedřich Beneš, Ondřej Št’ava, Radomír Měch, and Gavin Miller. Guided Procedural Modeling. In Eurographics 2011.

[2] Minh Dang, Stefan Lienhard, Duygu Ceylan, Boris Neubert, Peter Wonka, and Mark Pauly. Interactive Design of Probability Density Functions for Shape Grammars. In SIGGRAPH Asia 2015.
 [3] Charles Geyer. Importance Sampling, Simulated Tempering, and Umbrella Sampling. In S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo. CRC Press, 2011.
 [4] Noah D. Goodman and Andreas Stuhlmüller. The Design and Implementation of Probabilistic Programming Languages. http://dippl.org, 2014. Accessed: 2015-12-23.
 [5] Shixiang Gu, Zoubin Ghahramani, and Richard E. Turner. Neural Adaptive Sequential Monte Carlo. In NIPS 2015.
 [6] R. Ranganath, S. Gerrish, and D. M. Blei. Black Box Variational Inference. In AISTATS 2014.
 [7] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In ICLR 2015.
 [8] Diederik P. Kingma and Max Welling. Auto-Encoding Variational Bayes. In ICLR 2014.
 [9] T. Kulkarni, P. Kohli, J. B. Tenenbaum, and V. Mansinghka. Picture: An Imperative Probabilistic Programming Language for Scene Perception. In CVPR 2015.
 [10] David J. C. MacKay. Information Theory, Inference & Learning Algorithms. Cambridge University Press, 2002.
 [11] Andriy Mnih and Karol Gregor. Neural Variational Inference and Learning in Belief Networks. In ICML 2014.
 [12] Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent Models of Visual Attention. In NIPS 2014.
 [13] Pascal Müller, Peter Wonka, Simon Haegler, Andreas Ulmer, and Luc Van Gool. Procedural Modeling of Buildings. In SIGGRAPH 2006.
 [14] Radomír Měch and Przemyslaw Prusinkiewicz. Visual Models of Plants Interacting with Their Environment. In SIGGRAPH 1996.
 [15] B. Paige and F. Wood. Inference Networks for Sequential Monte Carlo in Graphical Models. In ICML 2016.
 [16] P. Prusinkiewicz and Aristid Lindenmayer. The Algorithmic Beauty of Plants. Springer-Verlag New York, Inc., 1990.
 [17] Przemyslaw Prusinkiewicz, Mark James, and Radomír Měch. Synthetic Topiary. In SIGGRAPH 1994.

[18] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. In ICML 2014.
 [19] Daniel Ritchie, Sharon Lin, Noah D. Goodman, and Pat Hanrahan. Generating Design Suggestions under Tight Constraints with Gradient-based Probabilistic Programming. In Eurographics 2015.
 [20] Daniel Ritchie, Ben Mildenhall, Noah D. Goodman, and Pat Hanrahan. Controlling Procedural Modeling Programs with Stochastically-Ordered Sequential Monte Carlo. In SIGGRAPH 2015.
 [21] Jerry O. Talton, Yu Lou, Steve Lesser, Jared Duke, Radomír Měch, and Vladlen Koltun. Metropolis Procedural Modeling. ACM Trans. Graph., 30(2), 2011.
 [22] O. Št’ava, S. Pirk, J. Kratt, B. Chen, R. Měch, O. Deussen, and B. Beneš. Inverse Procedural Modelling of Trees. Computer Graphics Forum, 33(6), 2014.
 [23] David Wingate and Theophane Weber. Automated Variational Inference in Probabilistic Programming. In NIPS 2012 Workshop on Probabilistic Programming.
 [24] Michael T. Wong, Douglas E. Zongker, and David H. Salesin. Computer-generated Floral Ornament. In SIGGRAPH 1998.