Neural Painters: A learned differentiable constraint for generating brushstroke paintings

04/17/2019 ∙ by Reiichiro Nakano, et al.

We explore neural painters, a generative model for brushstrokes learned from a real non-differentiable and non-deterministic painting program. We show that when training an agent to "paint" images using brushstrokes, using a differentiable neural painter leads to much faster convergence. We propose a method for encouraging this agent to follow human-like strokes when reconstructing digits. We also explore the use of a neural painter as a differentiable image parameterization. By directly optimizing brushstrokes to activate neurons in a pre-trained convolutional network, we can directly visualize ImageNet categories and generate "ideal" paintings of each class. Finally, we present a new concept called intrinsic style transfer. By minimizing only the content loss from neural style transfer, we allow the artistic medium, in this case, brushstrokes, to naturally dictate the resulting style.


1 Introduction

Figure 1: Using a neural painter as a differentiable image parameterization Mordvintsev et al. (2018), we are able to directly optimize brushstrokes to minimize style transfer’s content loss (left), or visualize ImageNet classes (right; pineapple, starfish, and violin categories).

There has been much work on using neural networks to generate painting-like images, most notably style transfer Gatys et al. (2015) and GANs Goodfellow et al. (2014), along with their many variations Elgammal et al. (2017); Jing et al. (2017); Gatys et al. (2016). Most of these techniques generate images by having a network directly calculate the RGB value of each pixel.

However, artists don’t paint by generating individual pixels; they paint by generating brushstrokes.

As early as 1990 Haeberli (1990), long before the sudden popularity of deep learning LeCun et al. (2015), researchers had begun extensive work on automatically finding a set of brushstrokes to “paint” images Haeberli (1990); Shiraishi and Yamaguchi (2000); Hertzmann (1998); Winkenbach and Salesin (1994). Stroke-based rendering (SBR) is the field of digitally generating paintings by arranging brushstrokes on a canvas according to some optimization goal Hertzmann (2003). The simple idea of using actual brushstrokes gives the outputs of SBR algorithms a genuinely painting-like texture.

Recently, there has been significant progress on getting neural networks to produce paintings by generating brushstrokes instead of pixels Xie et al. (2013); Ganin et al. (2018); Frans and Cheng (2018). One example is SPIRAL Ganin et al. (2018), a reinforcement learning agent trained adversarially to use a real painting program to generate images. SPIRAL learns to reconstruct images from MNIST, Omniglot, and CelebA by defining brushstrokes on a canvas.

Though impressive, the adversarial reinforcement learning framework used by SPIRAL is complex and requires significant computational resources Ganin et al. (2018). Recent work by Ha and Schmidhuber (2018) and Hafner et al. (2018) has shown that building a differentiable world model of an environment can drastically reduce the computational requirements of reinforcement learning algorithms.

In this paper, we perform various experiments with neural painters, which are differentiable simulations of a non-differentiable painting program. Our contributions are as follows:

First, we present two ways of training a neural painter using VAEs and GANs, respectively. We then recreate SPIRAL’s CelebA reconstruction results Ganin et al. (2018) using a non-reinforcement learning adversarial approach with a neural painter. Concurrent to our work, similar work on training neural painters was done by Zheng et al. (2019) on StrokeNet. Our approach was developed independently at around the same time and uses a different painting program and training methodology, so we believe it can be viewed as a complementary work. We also propose a simple method based on preconditioning to encourage this agent to follow human strokes when reconstructing digits.

In addition, we propose the use of a neural painter as a differentiable image parameterization Mordvintsev et al. (2018). By directly optimizing brushstrokes using backpropagation, we open up a powerful new way to visualize pre-trained image classifiers: allowing them to “paint” the classes they were trained to identify.

Finally, we combine neural painters with neural style transfer Gatys et al. (2015). By optimizing brushstrokes to minimize only the content loss, we can “paint” the higher-level features of a target image. The artistic medium, in this case, brushstrokes, dictates the style of the resulting image. We call this method intrinsic style transfer.

2 Training Neural Painters

The main role of a neural painter is to serve as a fully differentiable simulation of a particular painting program. In this paper, we use MyPaint MyPaint contributors (2019), an open-source, non-deterministic painting program and the same program used by SPIRAL. There are two main considerations for training a neural painter: the painting program’s action space and the neural painter’s architecture.

2.1 The Action Space

The action space defines the set of parameters that are used as control inputs for the painting environment. It serves as the interface that an agent can use to generate a painting.

For the experiments in this paper, we use a slight variation of the action space used by SPIRAL Ganin et al. (2018). This action space maps a single action to a single brushstroke in the MyPaint program. An agent “paints” by successively generating actions and applying full brushstrokes on a canvas.

The action space consists of the following variables:

  • Start and end pressure - Two variables that define the pressure applied to the brush at the beginning and end of the stroke.

  • Brush size - Determines the radius of the generated brushstroke.

  • Color - 3D integer vector determining the RGB color of the brushstroke.

  • Brush coordinates - Three points on the 2D canvas, given in Cartesian coordinates, that define the brushstroke’s shape: a start point, an end point, and an intermediate control point, which together constitute a quadratic Bézier curve.
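To make the action space concrete, the sketch below packs one action into a flat 12-value vector (2 pressures, 1 size, 3 color channels, 3 points × 2 coordinates) and samples the quadratic Bézier centerline of the stroke. The class name, field order, and the linear interpolation of pressure along the stroke are illustrative assumptions, not MyPaint’s actual interface or rendering behavior.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]

@dataclass
class StrokeAction:
    """Hypothetical container for one brushstroke action (field names are illustrative)."""
    start_pressure: float
    end_pressure: float
    brush_size: float
    color: Tuple[float, float, float]  # RGB
    start: Point
    control: Point
    end: Point

    def to_vector(self):
        """Flatten into the 12-value vector a neural painter could take as input."""
        return [self.start_pressure, self.end_pressure, self.brush_size,
                *self.color, *self.start, *self.control, *self.end]

def quadratic_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2, for t in [0, 1]."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])

def sample_stroke_path(action: StrokeAction, n: int = 5):
    """Sample n (x, y, pressure) points along the stroke.

    Assumption: pressure is interpolated linearly from start to end.
    """
    pts = []
    for i in range(n):
        t = i / (n - 1)
        x, y = quadratic_bezier(action.start, action.control, action.end, t)
        pressure = (1 - t) * action.start_pressure + t * action.end_pressure
        pts.append((x, y, pressure))
    return pts
```

For example, a stroke from (0, 0) to (1, 0) with control point (0.5, 1) passes through (0.5, 0.5) at its midpoint, since the control point only pulls the curve halfway toward itself.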

The second consideration in training a neural painter is its architecture. We need an appropriate architecture and training paradigm to learn an accurate mapping from a point in the action space to the corresponding brushstroke. In this paper, we consider two approaches for training a neural painter.

2.2 Training a VAE Neural Painter

Our first approach is inspired by the two-stage method used by Ha and Schmidhuber (2018) to learn a world model for a particular environment. (Footnote 2: In the original paper, a VAE Kingma and Welling (2013) is used to learn a latent space for all possible frames in an environment. An RNN is then used to predict the next frame of an environment given an action.)

Figure 2: VAE neural painter training process

A variational autoencoder (VAE) Kingma and Welling (2013) is trained to learn a latent space of brushstrokes. We then train a separate network to map an action to the point in latent space corresponding to the expected brushstroke. Unlike the approach in Ha and Schmidhuber (2018), we do not need a recurrent neural network (RNN) to map from actions to brushstrokes, since a brushstroke depends only on its own action and not on any previous actions performed in the painting program. Figure 2 shows the training process for a VAE neural painter.
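The two stages can be sketched in NumPy as follows: the standard VAE objective (reconstruction plus the closed-form KL divergence to a unit Gaussian) for stage one, and a differentiable composition action → latent → decoder for stage two. The layer sizes, the random weights standing in for trained networks, and the choice to regress the mapping network onto the encoder’s latent means are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_loss(x, x_hat, mu, logvar):
    """Stage 1: squared-error reconstruction + KL(N(mu, sigma^2) || N(0, I)), batch mean."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return float(np.mean(recon + kl))

# Stage 2 (hypothetical sizes): map a 12-D action to a 16-D latent code, then decode it
# with the frozen stage-1 decoder into a 64x64 brushstroke.
ACTION_DIM, LATENT_DIM, IMG = 12, 16, 64 * 64
W_map = rng.normal(0, 0.1, (ACTION_DIM, LATENT_DIM))  # trainable in stage 2
W_dec = rng.normal(0, 0.1, (LATENT_DIM, IMG))         # stand-in for the trained decoder

def action_to_latent(actions):
    return np.tanh(actions @ W_map)

def decode(z):
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid -> pixel intensities in (0, 1)

def vae_neural_painter(actions):
    """Differentiable action -> latent -> brushstroke image, shape (batch, 64, 64)."""
    return decode(action_to_latent(actions)).reshape(-1, 64, 64)

def mapping_loss(actions, target_latents):
    """Assumed stage-2 objective: regress onto latents of the real MyPaint strokes."""
    return float(np.mean((action_to_latent(actions) - target_latents) ** 2))
```

Because no recurrence is involved, each action is rendered independently, which is what lets the painter be a plain feed-forward map.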

Figure 3: Pairs of real brushstrokes (left) and the corresponding VAE neural painter outputs (right). Notice how the VAE outputs are “smudged” versions of the ground truth.

Figure 3 compares the results of a trained VAE neural painter with the real output of MyPaint.

The biggest weakness of the VAE neural painter is its “smudging” effect on the brushstrokes. Instead of accurately recreating the dotted texture of the larger brushstrokes, the VAE smooths it out. Depending on the task, this inaccuracy could lead to less-than-ideal results when we transfer an agent from the neural painter to the real painting program.

2.3 Training a GAN Neural Painter

To solve this problem, we turn to another widely popular family of generative models, generative adversarial networks Goodfellow et al. (2014). GANs have been shown to produce sharper images than VAEs, and this property could help the neural painter produce accurate brushstrokes.

Instead of relying on the reconstruction and KL divergence losses used by VAEs, we use an adversarial loss function to directly learn a mapping from actions to brushstrokes. Unlike a regular GAN, we do not inject noise into the input of the generator. Instead, we feed the generator the input action and have it map directly to a brushstroke. The discriminator is given real and generated action-brushstroke pairs and tries to distinguish which pairs are real. In this way, the setup is similar to a conditional GAN Mirza and Osindero (2014). The training process, which uses the Wasserstein loss Arjovsky et al. (2017); Gulrajani et al. (2017), is illustrated in Figure 4.
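A minimal NumPy sketch of conditional Wasserstein losses with a gradient penalty (the formulation of Gulrajani et al.) on action-brushstroke pairs is shown below. To avoid an autograd dependency, the critic is a hypothetical one-hidden-layer network whose input gradient is written out by hand; the sizes, weights, and penalty weight of 10 are assumptions. In practice these losses would be computed with an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM, STROKE_DIM, HIDDEN = 12, 16, 32   # hypothetical sizes (16 = flattened 4x4 stroke)
D_IN = ACTION_DIM + STROKE_DIM
W = rng.normal(0, 0.3, (HIDDEN, D_IN))
v = rng.normal(0, 0.3, HIDDEN)

def critic(pairs):
    """Toy critic D([a; x]) = v . tanh(W [a; x]); pairs has shape (batch, D_IN)."""
    return np.tanh(pairs @ W.T) @ v

def critic_input_grad(pairs):
    """Analytic dD/d(input) of the toy critic, shape (batch, D_IN)."""
    h = np.tanh(pairs @ W.T)                    # (batch, HIDDEN)
    return ((1.0 - h ** 2) * v) @ W             # chain rule through tanh

def critic_loss(actions, real_strokes, fake_strokes, gp_weight=10.0):
    """Conditional WGAN-GP critic loss on (action, brushstroke) pairs."""
    real = np.concatenate([actions, real_strokes], axis=1)
    fake = np.concatenate([actions, fake_strokes], axis=1)
    # Interpolate only the brushstroke half; both pairs share the same action.
    eps = rng.uniform(size=(len(actions), 1))
    mixed = np.concatenate([actions, eps * real_strokes + (1 - eps) * fake_strokes], axis=1)
    grad_norm = np.linalg.norm(critic_input_grad(mixed)[:, ACTION_DIM:], axis=1)
    gp = np.mean((grad_norm - 1.0) ** 2)
    return float(np.mean(critic(fake)) - np.mean(critic(real)) + gp_weight * gp)

def generator_loss(actions, fake_strokes):
    """The generator (neural painter) wants the critic to score its pairs highly."""
    fake = np.concatenate([actions, fake_strokes], axis=1)
    return float(-np.mean(critic(fake)))
```

Feeding the action to the critic alongside the stroke is what makes the setup conditional: a realistic-looking stroke that does not match its action should still be scored as fake.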

The results of training this network are shown in Figure 5. Although the recreation isn’t perfect, the neural painter’s outputs are “rougher” and more realistic than the smoothed-out VAE brushstrokes.

Figure 4: GAN neural painter training process
Figure 5: Pairs of real brushstrokes (left) and the corresponding GAN neural painter outputs (right). Notice how the GAN is able to capture the irregularities of real brushstrokes.

How well an agent’s actions transfer from a neural painter to the real painting program depends directly on how accurate the neural painter’s outputs are. In this section we explored only two possible approaches, and there remains a lot of room for improvement in this area. We have provided accompanying notebooks for training our VAE and GAN neural painters as a starting point for anyone interested in training their own.

3 Recreating SPIRAL Results

Ganin et al. (2018) trained a neural agent called SPIRAL to learn to “paint” images by using the constrained action space of a real painting program. In the paper, an agent was trained to paint images from three different datasets: MNIST, Omniglot, and CelebA. As the painting program is non-differentiable, the agent was trained using adversarial reinforcement learning.

Since a neural painter is fully differentiable, we don’t need reinforcement learning techniques to perform the same experiments. We can simply train the agent using regular adversarial methods.

Our LSTM-based Hochreiter and Schmidhuber (1997) neural agent takes a target image as input and outputs a set of actions. These actions are fed directly into the neural painter, which maps them to brushstrokes on a canvas. The agent’s goal is to recreate the input image on the canvas within the constraints that the neural painter imposes. The idea is that the agent’s outputs can be transferred directly to the real painting program, even though the agent sees only the neural painter during training.
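The sketch below illustrates only the data flow from a sequence of actions, through a painter, onto a canvas; the LSTM agent itself is omitted, and the actions are supplied directly. How strokes are combined is not specified above, so alpha compositing (“over”) onto a white canvas is an assumption, and `toy_painter` is a hypothetical, differentiable stand-in that renders a soft Gaussian dot rather than a learned brushstroke.

```python
import numpy as np

def composite(canvas, stroke_rgb, stroke_alpha):
    """Alpha-blend one rendered stroke over the canvas (assumed "over" operator)."""
    a = stroke_alpha[..., None]                 # (H, W, 1)
    return canvas * (1.0 - a) + stroke_rgb * a

def paint(actions, neural_painter, size=64):
    """Apply a sequence of actions, one brushstroke each, onto a blank white canvas."""
    canvas = np.ones((size, size, 3))
    for action in actions:
        rgb, alpha = neural_painter(action)
        canvas = composite(canvas, rgb, alpha)
    return canvas

def toy_painter(action, size=64, sigma=0.05):
    """Hypothetical painter: action = (x, y, r, g, b) -> a soft round dot of that color."""
    x0, y0, r, g, b = action
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)   # pixel coordinates in [0, 1]
    alpha = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    rgb = np.broadcast_to(np.array([r, g, b], dtype=float), (size, size, 3))
    return rgb, alpha
```

Every operation here is smooth in the action values, so gradients from a loss on the final canvas can flow back to every stroke, which is the property the agent relies on during training.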

When training this agent, instead of directly optimizing a pixelwise loss like L2 distance, we use an adversarial loss. As discussed in the SPIRAL paper, this leads to better-behaved gradients that the agent can learn from. Figure 6 shows the full training setup for our agent.

Figure 6: Adversarial training of agent for the purpose of reconstruction

We test this approach on three datasets: MNIST, KMNIST Clanuwat et al. (2018), and CelebA. Figure 7 shows the results for our agent.

Figure 7: Each group shows three images: the target image (left), the neural painter output (center), and the generated brushstrokes transferred back to MyPaint (right).

A significant advantage of this approach over SPIRAL is the amount of computing resources needed to achieve these results. Training SPIRAL involves several multi-CPU/multi-GPU machines running over the course of a few days, whereas our experiments can be reproduced entirely within the single-GPU environment of Google Colaboratory.

We believe this quick convergence can be partially attributed to the ease of credit assignment. When training our agent, the full gradients from each stroke are available and can be used directly in backpropagation, as opposed to the reinforcement learning paradigm used by SPIRAL, where only the reward at the end of painting is taken into account. This can be observed qualitatively by comparing the strokes produced by these agents on the CelebA dataset. SPIRAL’s first few strokes are completely occluded by future strokes, while this does not happen with our approach.

3.1 The effect of discrete actions

In the previous section on training neural painters, we mentioned that we removed the discrete variables from the action space that SPIRAL used. Why? Training a neural painter on a discrete action space is not impossible: our methods work just as well for an action space with discrete variables as they do for a continuous one. (Footnote 3: Brush size and stroke pressure are discrete variables with 10 levels. An extra binary flag determines whether or not the brush is lifted, i.e. a lifted brush produces no stroke.)

However, there is one very important distinction: neural networks take continuous inputs. Even if a neural painter perfectly recreates the painting program’s outputs when given a valid discrete action, its output between two discrete values is completely undefined. The neural painter must bridge this gap, and it is free to do it however it wants.

Figure 8: This image shows the effect of lifting the brush partially (i.e. a value between 0 and 1). Values between 0 and 1 are undefined and do not exist in the real environment; however, a neural painter must still “dream” up an output.
Figure 9: The figure shows three images: the target image (left), the neural painter output (center), and the generated brushstrokes transferred back to MyPaint (right). This result shows the effect of an agent thinking it can use a thicker brush than it actually can.

Figure 8 shows the output of the neural painter as the lift variable is moved continuously from 0 to 1. It shows an interesting effect. As the brush is lifted, the produced stroke seems to “flicker” until eventually disappearing completely. Somehow, the network has decided that these random flickers were the easiest way to represent a lift value between 0 and 1.

When the goal is to simply recreate a painting program’s output, this behavior is not a problem. After all, a user can simply constrain the inputs to use only valid values. Unfortunately, this matters very much when we try to use the gradients from this neural painter to train an agent.

At best, this behavior makes the agent think it can produce impossible strokes, causing a discrepancy between the outputs of the neural painter and the painting program. One specific example is when the neural painter interpolates a stroke thickness between two discrete values; when transferred to the painting program, that thickness is rounded to the nearest available brush size. This is shown in Figure 9.
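The size of this discrepancy can be illustrated with a short sketch. It assumes, hypothetically, that the 10 discrete levels are spaced evenly over a normalized range [0, 1] and that transfer rounds to the nearest level; the real program’s level values may differ.

```python
def to_discrete_level(value: float, levels: int = 10) -> int:
    """Map a continuous value in [0, 1] to the nearest of `levels` evenly spaced levels."""
    value = min(max(value, 0.0), 1.0)
    return round(value * (levels - 1))

def level_to_value(level: int, levels: int = 10) -> float:
    """Normalized value of a discrete level (assumed even spacing)."""
    return level / (levels - 1)

def transfer_error(value: float, levels: int = 10) -> float:
    """Gap between what the neural painter rendered and what the real program can draw."""
    return abs(value - level_to_value(to_discrete_level(value, levels), levels))
```

For instance, a brush size of 0.47 that the neural painter happily renders becomes level 4 (about 0.444) in the real program. Any value the agent learns to rely on that is not exactly on a level is silently changed at transfer time.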

At worst, it can kill training due to bad gradients. When the lift variable is used, the agent quickly gets into a state where it produces only invisible strokes. At this point, moving the agent’s variables slightly in any direction would still produce invisible strokes. Essentially, there is no gradient for the agent to learn anything and training is stuck.

Figuring out how to handle these discrete actions will be an interesting research direction moving forward. Unfortunately, we cannot always sidestep this issue by ignoring discrete variables entirely, as we have done here. Many interesting environments (including the MuJoCo Scenes environment solved by SPIRAL) have unavoidable discrete actions, and applying neural painters to those tasks will require handling a discrete action space.

4 Towards Learning Human Strokes

In the previous section, you may have noticed something interesting about the stroke order used by the painting agent: for any given agent, the stroke order is similar across all target images.

For example, in the MNIST case, an agent may choose to draw all digits using a bottom-to-top, counter-clockwise approach. The agent does not bother using different strokes for different digits. Notice how 8’s are usually treated as 3’s with closed loops.

For CelebA, an agent might follow a certain set of steps for every target image e.g. shade the background, fill in the face shape, add hair, then add a stroke for the eyes.

In these experiments, the painting agent was trained with only one goal: recreate a given target image using brushstrokes. Since neural networks tend to be “lazy learners” that settle into the nearest local minimum, the agent learns the simplest possible solution: use similar strokes everywhere. There’s no reason to favor dissimilar, let alone human, stroke orders, as long as the final painted image looks as much like the target image as possible.

Of course, there is value in an agent that tries to draw like a human. First, we might be able to learn a model that accurately converts pixel character images to stroke vector data. Second, it might actually improve the original goal of reconstruction. Since digits in the MNIST dataset were actually drawn using a human order, an agent will likely find it easier to reconstruct some of the finer details of an image if it understands how a digit is usually drawn.

In this section, we try a very simple but effective method to bias the agent to learn human strokes: preconditioning. Instead of starting adversarial training with the agent’s variables initialized randomly, we precondition the agent by forcing it to generate a particular set of strokes for each class. Our process is as follows:

  • We begin by manually generating a single example for each class, with strokes we think are representative of the entire class, e.g., we can decide to draw 0’s counter-clockwise and 1’s top-to-bottom.

  • Train the agent to reconstruct our manual strokes for each class via mean squared error, disregarding adversarial loss. Of course, we do not train the discriminator at this point.

  • After the agent has started producing our manual strokes for every class, we can consider it preconditioned.

  • Reintroduce the adversarial loss. At this point, we can either drop the stroke reconstruction loss completely or reduce its weight on a schedule.

  • Train normally.
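The steps above amount to a schedule over two loss weights. The sketch below is one hypothetical way to express it: a fixed number of preconditioning steps stands in for “after the agent has started producing our manual strokes,” and the reconstruction weight is then decayed linearly to zero. The step counts and the linear decay are assumptions, and actions are represented as flat lists of floats for simplicity.

```python
def loss_weights(step: int, precondition_steps: int, decay_steps: int):
    """Return (adversarial_weight, stroke_mse_weight) for a given training step."""
    if step < precondition_steps:
        # Preconditioning: stroke reconstruction only. The caller should also
        # skip discriminator updates while the adversarial weight is zero.
        return 0.0, 1.0
    t = (step - precondition_steps) / max(decay_steps, 1)
    return 1.0, max(0.0, 1.0 - t)             # adversarial loss on, MSE linearly decayed

def stroke_mse(pred_actions, manual_actions):
    """Mean squared error between predicted and manually drawn action values."""
    n = len(pred_actions)
    return sum((p - m) ** 2 for p, m in zip(pred_actions, manual_actions)) / n

def total_agent_loss(step, adv_loss, pred_actions, manual_actions,
                     precondition_steps=1000, decay_steps=5000):
    w_adv, w_mse = loss_weights(step, precondition_steps, decay_steps)
    return w_adv * adv_loss + w_mse * stroke_mse(pred_actions, manual_actions)
```

Setting `decay_steps` very small recovers the “drop the reconstruction loss completely” variant.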

We test our process on MNIST. Note that the approach requires us to provide only a single human example for each class. After training, the agent has more or less learned to stick with our original stroke order, with slight variations to make sure the target image is reconstructed. All 0’s are drawn counter-clockwise, and all 1’s are drawn top-to-bottom.

One way to explain why preconditioning works is that it changes the basins of attraction for the optimization problem. For an untrained, randomly initialized agent, the nearest local minimum for the adversarial loss is likely one that keeps stroke order similar, regardless of the target character. However, once it has been preconditioned to produce a certain set of strokes for each class, the closest local minimum becomes one that is as similar as possible to those strokes.

Although the approach shows promise, there is much room for improvement. Preconditioning relies on us knowing the correct class label for each image. Without this information, we will not be able to tell the agent to use a certain stroke order for a particular digit, simply because we do not know what digit it is.

Another weakness of the approach is its inability to properly handle multimodal classes that can be written in different ways. An example is the MNIST 7: 7’s can be drawn either with or without the horizontal bar at the center, which are two different stroke orders. To apply the same technique, we need to condition the agent so that it distinguishes the different modes of a class and applies the correct stroke order. Solving these problems is a good direction for future research.

5 As a Differentiable Image Parameterization

Differentiable image parameterizations are a technique developed by Mordvintsev et al. (2018) to generate visualizations and art from pre-trained neural networks. Given a network trained on images (usually a convolutional network), we attempt to find a 2D image that maximally activates a particular neuron in the network. Instead of directly optimizing the individual RGB values of each pixel, we try different image generation processes that map some set of parameters to a 2D image. As long as this process is differentiable, we can directly optimize the parameters via backpropagation. Depending on the image generation process, the results can be strikingly different and beautiful. Various parameterizations, such as CPPNs Stanley (2007), Fourier transforms, and 3D-to-2D mappings, were demonstrated in Mordvintsev et al. (2018).

Since neural painters are differentiable mappings from the brushstroke action space to a 2D image, they can be used directly as a differentiable image parameterization.
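The idea can be shown end to end with a toy example: gradient ascent on the *parameters* of a differentiable image generator rather than on pixels. Here a hypothetical frozen `painter` (a random tanh layer) stands in for a trained neural painter and a hypothetical linear `neuron` stands in for a unit of a pre-trained network; the gradient is derived by hand with the chain rule. Clipping actions to [-1, 1] is an assumed validity constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, N_PIXELS = 12, 64

P = rng.normal(0, 0.5, (N_ACTIONS, N_PIXELS))   # hypothetical frozen "painter" weights
w = rng.normal(0, 1.0, N_PIXELS)                # hypothetical target "neuron" weights

def painter(a):
    """Differentiable action -> (flattened) image map."""
    return np.tanh(a @ P)

def neuron(image):
    return float(image @ w)

def neuron_grad_wrt_action(a):
    """Chain rule: d neuron / d a = P (w * (1 - tanh^2))."""
    img = painter(a)
    return P @ (w * (1.0 - img ** 2))

def optimize_strokes(a0, lr=0.01, steps=200):
    """Projected gradient ascent on the brushstroke parameters, never on pixels."""
    a = a0.copy()
    for _ in range(steps):
        a += lr * neuron_grad_wrt_action(a)
        a = np.clip(a, -1.0, 1.0)               # keep actions in an assumed valid range
    return a
```

With a real neural painter and classifier, the hand-written gradient would be replaced by backpropagation through both networks; the structure of the loop stays the same.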

5.1 Visualizing ImageNet classes

We focus on visualizing the final layer of the pre-trained networks, corresponding to specific ImageNet classes. We find that a good way to improve the outputs is to optimize against more than one pre-trained network at the same time. (Footnote 4: As done by Tom White in his work on Synthetic Abstractions White (2018).) This helps the outputs generalize better by reducing the effect of each individual network’s imperfections. Figure 10 shows how to use a neural painter as a differentiable image parameterization to visualize ImageNet classes.
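One way to write a multi-network objective is to average the target class’s log-probability across several frozen classifiers, as sketched below. Averaging log-probabilities (rather than, say, raw logits) is an assumption, and the random linear `nets` are hypothetical stand-ins for pre-trained ImageNet models.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax of a 1-D logit vector."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def ensemble_class_objective(image, networks, class_idx):
    """Mean target-class log-probability over several classifiers (to be maximized)."""
    return float(np.mean([log_softmax(net(image))[class_idx] for net in networks]))

# Hypothetical stand-ins: three random linear "classifiers" over a 16-pixel image.
rng = np.random.default_rng(0)
nets = [lambda img, M=rng.normal(size=(5, 16)): M @ img for _ in range(3)]
```

A painting can only score well under this objective if every network agrees on the class, which discourages strokes that exploit a quirk of a single model.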

Figure 10: A neural painter as a differentiable image parameterization.
Figure 11: A neural network’s paintings of the “optimal” pandas

A fun way to interpret the results of a neural painter used as a differentiable image parameterization is as the answer to the question:

If you gave a pre-trained network a brush and asked it to paint a picture of the optimal panda, what would it paint?

Figure 11 shows examples of outputs generated by optimizing different ImageNet classes. The results show just how diverse the generated outputs can be for any given class, by simply tweaking the number of strokes, changing the neural painter, or using different pre-trained networks.

5.2 Intrinsic Style Transfer

As a differentiable image parameterization, neural painters are not limited to visualizing layers of a pre-trained neural network; we can produce various interesting effects depending on the loss we optimize. One such effect is stroke-based painterly rendering Hertzmann (2003) of a target image. With this method, we optimize brushstrokes to minimize only the content loss from neural style transfer Gatys et al. (2015). (Footnote 5: To calculate the content loss between a content image and an output image, we take the mean squared error of their respective activations for a particular layer in a pre-trained neural network.) This setup is shown in Figure 12.
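The content loss described in the footnote is a mean squared error between one layer’s activations for the two images. The sketch below implements it and the resulting intrinsic-style-transfer objective, which deliberately has no style term. `toy_features` (a fixed random projection followed by a ReLU) is a hypothetical stand-in for a layer of a pre-trained network.

```python
import numpy as np

def content_loss(content_acts, output_acts):
    """MSE between one layer's activations for the content image and the painted canvas."""
    content_acts = np.asarray(content_acts, dtype=float)
    output_acts = np.asarray(output_acts, dtype=float)
    return float(np.mean((content_acts - output_acts) ** 2))

def intrinsic_style_loss(canvas, content_image, features):
    """Content loss only: there is deliberately no Gram-matrix style term."""
    return content_loss(features(content_image), features(canvas))

# Hypothetical stand-in for "a particular layer of a pre-trained network".
_rng = np.random.default_rng(0)
_W = _rng.normal(size=(32, 64 * 64 * 3))

def toy_features(image):
    return np.maximum(0.0, _W @ np.ravel(image))
```

In the full setup, `canvas` would be the output of the neural painter, so minimizing this loss moves the brushstroke parameters, and the look of the result comes from what brushstrokes can express.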

Figure 12: Intrinsic style transfer using a neural painter

Intuitively, this technique lets us find brushstrokes that preserve only the higher-level content in the target image. The effect produced is that of painting only the meaningful parts of a target image, without caring about pixel-level reconstruction. The style is an intrinsic property dictated purely by the artistic medium, in this case, brushstrokes. Figure 13 shows some results of intrinsic style transfer.

By manually changing the primitives of the brushstroke, we can achieve vastly different styles. Note the difference in outputs produced simply by constraining the brush to grayscale values. Finding new styles by applying different constraints or using different artistic mediums will be an exciting research path forward. (Footnote 6: Constraints could be simple modifications such as changing the color palette or making brushstrokes thinner. A good place to start looking for interesting brushstroke constraints is the extensive literature on stroke-based rendering Hertzmann (2003). We can also completely change the medium and try differentiable parameterizations like CPPNs, which have been compared to light paintings Mordvintsev et al. (2018).)
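Constraints like these can be written as differentiable reparameterizations placed between the optimizer and the painter. In the sketch below, the optimizer sees a single gray value per stroke that is copied into all three color channels, and a second hypothetical constraint caps the brush size. The 10-value and 12-value column layouts, and the `max_size` of 0.3, are illustrative assumptions.

```python
import numpy as np

def expand_grayscale(params):
    """(n, 10) constrained params -> (n, 12) actions with R = G = B.

    Hypothetical layout: [start_p, end_p, size, gray, x0, y0, x1, y1, x2, y2].
    """
    params = np.asarray(params, dtype=float)
    gray = params[:, 3:4]
    return np.concatenate([params[:, :3], np.repeat(gray, 3, axis=1), params[:, 4:]], axis=1)

def thin_brush(actions, max_size=0.3, size_col=2):
    """Another simple constraint: rescale the brush-size column into [0, max_size]."""
    actions = np.array(actions, dtype=float)   # copy so the caller's array is untouched
    actions[:, size_col] = max_size * np.clip(actions[:, size_col], 0.0, 1.0)
    return actions
```

Because the constraint is built into the parameterization, every candidate the optimizer explores is already a valid grayscale (or thin) stroke, rather than being corrected afterwards.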

6 Conclusions

Constraints are a key element of creativity. The natural constraints that an artistic medium imposes upon the artist give a piece of art a look distinct from others. An artist attempting to paint a scene using oil paints will get a remarkably different result from someone trying to sketch the same scene with a pencil. In the same sense, using a neural painter leads to creative ways of achieving an objective, whether it be recreating digits, painting faces, or maximizing a pre-trained network’s activations.

This paper explored the power of neural painters: a differentiable constraint learned from a non-differentiable, real-life constraint. We believe this concept could be extended to different artistic mediums, such as splatter painting or even 3D sculpture.

Acknowledgments

We would like to thank David Ha for his detailed feedback and encouragement throughout this work. We are also grateful to Ludwig Schubert and the Distill community for providing support for the diagrams in this paper.

Many of our diagrams were repurposed from the article Differentiable Image Parameterizations by Mordvintsev et al. (2018).

Figure 13: Results for intrinsic style transfer using a GAN neural painter for both colored and grayscale brushstrokes. The strokes are generated on multiple overlapping grids. Although the neural painter is only designed to output 64×64 pixels on a canvas, we can stitch multiple canvases together to achieve an arbitrary resolution, limited only by GPU memory.
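Stitching 64×64 canvases into a larger image can be sketched as placing tiles on a grid whose stride is smaller than the tile size, so neighboring tiles overlap. Averaging the overlapping pixels, the stride of 32, and laying the tiles out in row-major order are assumptions for illustration; the caption does not specify how overlaps are combined.

```python
import numpy as np

def stitch_tiles(tiles, grid_h, grid_w, tile=64, stride=32):
    """Place a grid_h x grid_w grid of (tile, tile, 3) canvases (row-major order)
    at the given stride, averaging wherever tiles overlap."""
    H = stride * (grid_h - 1) + tile
    W = stride * (grid_w - 1) + tile
    acc = np.zeros((H, W, 3))
    count = np.zeros((H, W, 1))
    for idx, t in enumerate(tiles):
        r, c = divmod(idx, grid_w)
        y, x = r * stride, c * stride
        acc[y:y + tile, x:x + tile] += t
        count[y:y + tile, x:x + tile] += 1
    return acc / np.maximum(count, 1)
```

For example, a 2×3 grid of 64×64 tiles with stride 32 yields a 96×128 image; the output size grows linearly with the grid, so resolution is bounded by memory rather than by the painter’s native canvas size.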
Figure 14: A neural network’s paintings of the optimal bees
Figure 15: A neural network’s paintings of the optimal violins
Figure 16: Other ImageNet class visualizations

References