Texture synthesis is an actively studied problem in computer graphics and image processing. Most of the work in this area focuses on creating new images of a texture specified by a provided image pattern [Efros1999TextureSB, Lefebvre2006AppearancespaceTS]. These images should give a human observer the impression that they were generated by the same stochastic process that generated the provided sample. An alternative formulation of the texture synthesis problem is the search for a stochastic process that allows efficient sampling from the texture image distribution defined by the input image. With the advent of deep neural networks, feed-forward convolutional generators have been proposed that transform latent vectors of random i.i.d. values into texture image samples [Ulyanov2016TextureNF].
Many texture patterns observed in nature result from local interactions between tiny particles, cells or molecules, which lead to the formation of larger structures. This distributed process of pattern formation is often referred to as self-organisation. Typical computational models of such systems are systems of PDEs [Turing1990TheCB, Chan2020LeniaAE], cellular automata, and multi-agent or particle systems.
In this work, we use the recently proposed Neural Cellular Automata (NCA) [mordvintsev2020growing, randazzo2020self-classifying, sandler2020image]
as a biologically plausible model of distributed texture pattern formation. The image generation process is modelled as an asynchronous, recurrent computation performed by a population of locally communicating cells arranged in a regular 2D grid. All cells share the same differentiable update rule. We use backpropagation through time and a grid-wide differentiable objective function to train an update rule that synthesises a pattern similar to a provided example.
The proposed approach achieves very organic-looking dynamics of progressive texture synthesis through local communication, and allows great flexibility in post-training adaptation. The decentralized, homogeneous nature of the computation performed by the learned synthesis algorithm potentially allows embedding its implementation in future media, such as smart-fabric displays or electronic decorative tiles.
2 Neural CA image generator
We base our image generator on the Neural Cellular Automata model [mordvintsev2020growing]. Here we summarize the key elements of the model and place them in the context of PDEs, cellular automata, and neural networks to highlight different features of the model.
2.1 Pattern-generating PDE systems
Systems of partial differential equations (PDEs) have long been used to model natural pattern formation processes. Well-known examples include the seminal work by Turing [Turing1990TheCB] and the Gray-Scott reaction-diffusion patterns [Munafo2014StableLM]. It therefore seems quite natural to use PDEs for texture synthesis. Specifically, given a texture image sample, we are looking for a function $f$ that defines the evolution of a vector-valued function $\mathbf{s}(\mathbf{x}, t)$, defined on a two-dimensional manifold $\mathcal{M}$:

$$\frac{\partial \mathbf{s}}{\partial t} = f(\mathbf{s}, \nabla_{\mathbf{x}} \mathbf{s}, \nabla^2_{\mathbf{x}} \mathbf{s})$$

where $\mathbf{s}$ represents an $n$-dimensional vector whose first three components correspond to the visible RGB color channels: $\mathbf{s} = (s_R, s_G, s_B, s_4, \dots, s_n)$. The RGB channels should form a texture similar to the provided example. $\nabla_{\mathbf{x}} \mathbf{s}$ denotes a matrix of per-component gradients over $\mathcal{M}$, and $\nabla^2_{\mathbf{x}} \mathbf{s}$ is a vector of per-component Laplacians (we added the Laplacian term to make the system general enough to reproduce the Gray-Scott reaction-diffusion system). The evolution of the system starts with some initial state $\mathbf{s}^{t=0}$ and is guided by the space-time uniform rule $f$. We do not imply the existence of a static final state of the pattern evolution; we just want the system to produce an input-like texture as early as possible and to maintain this similarity perpetually.
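To make the connection concrete, the Gray-Scott system mentioned above can be simulated with explicit Euler integration in a few lines of NumPy. This is a minimal textbook sketch, not code from our implementation; the parameter values and the 5-point Laplacian are common illustrative choices:

```python
import numpy as np

def laplacian(a):
    # 5-point discrete Laplacian with wrap-around (torus) boundaries
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
            np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def gray_scott(n=64, steps=200, Du=0.16, Dv=0.08, F=0.035, k=0.065, seed=0):
    """Simulate the Gray-Scott reaction-diffusion system on an n x n torus."""
    rng = np.random.default_rng(seed)
    u = np.ones((n, n))
    v = np.zeros((n, n))
    c = n // 2
    v[c - 5:c + 5, c - 5:c + 5] = 0.25          # seed a spot of the v species
    u += 0.02 * rng.standard_normal((n, n))     # small noise breaks symmetry
    for _ in range(steps):                      # explicit Euler steps (dt = 1)
        uvv = u * v * v
        u += Du * laplacian(u) - uvv + F * (1.0 - u)
        v += Dv * laplacian(v) + uvv - (F + k) * v
    return u, v
```

Iterating this rule long enough produces the familiar spots-and-stripes patterns; the NCA described below replaces the hand-designed reaction term with a learned function.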
2.2 From PDEs to Cellular Automata
In order to evaluate the behaviour of a PDE system on a digital computer, one must discretise the spatio-temporal domain, provide discrete versions of the gradient and Laplacian operators, and specify the integration algorithm. During training we use a uniform Cartesian raster 2D grid with torus topology (i.e. wrap-around boundary conditions). Note that the system now closely mirrors a cellular automaton: there is a uniform raster grid, with each point undergoing a time evolution dependent only on the neighbouring cells. The evolution of the CA state $\mathbf{s}^t_{x,y}$, where $x$ and $y$ are integer cell coordinates, is now given by

$$\mathbf{s}^{t+1}_{x,y} = \mathbf{s}^t_{x,y} + \delta^t_{x,y} \, f\big(\mathbf{s}^t_{x,y}, (K_x \ast \mathbf{s}^t)_{x,y}, (K_y \ast \mathbf{s}^t)_{x,y}, (K_{lap} \ast \mathbf{s}^t)_{x,y}\big)$$
Discrete approximations of the gradient and Laplacian operators are provided by linear convolutions with a set of 3x3 kernels $K_x$, $K_y$ and $K_{lap}$. We use Sobel filters [Sobel1990AnI3] and a 9-point variant of the discrete Laplacian:

$$K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \qquad K_y = K_x^\top \qquad K_{lap} = \begin{bmatrix} 1 & 2 & 1 \\ 2 & -12 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
We call

$$\mathbf{z}^t_{x,y} = \mathrm{concat}\big(\mathbf{s}^t_{x,y}, (K_x \ast \mathbf{s}^t)_{x,y}, (K_y \ast \mathbf{s}^t)_{x,y}, (K_{lap} \ast \mathbf{s}^t)_{x,y}\big)$$

a perception vector, as it gathers information about the neighbourhood of each cell through the convolution kernels. The function

$$f: \mathbf{z}^t_{x,y} \mapsto \Delta \mathbf{s}^t_{x,y}$$

is the per-cell learned update rule that we obtain using the optimisation process described later. The separation between perception and update rules allows us to transfer learned rules to different grid structures and topologies, as long as the gradient and Laplacian operators are provided (see section 4.4).
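The perception stage described above can be sketched in a few lines of NumPy. This is an illustrative re-implementation (a depthwise convolution with fixed kernels on a torus); the kernel weights are the common textbook variants and the normalisation may differ from the trained models:

```python
import numpy as np

# Fixed (non-learned) 3x3 kernels: Sobel filters and a 9-point Laplacian.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[1, 2, 1], [2, -12, 2], [1, 2, 1]], dtype=np.float32)

def conv3x3(state, kernel):
    """Depthwise 3x3 convolution on an (H, W, C) grid with torus topology."""
    out = np.zeros_like(state)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += kernel[dy + 1, dx + 1] * np.roll(state, (-dy, -dx), axis=(0, 1))
    return out

def perceive(state):
    """Per-cell perception vector z = concat(s, Kx*s, Ky*s, Klap*s)."""
    return np.concatenate(
        [state, conv3x3(state, SOBEL_X), conv3x3(state, SOBEL_Y),
         conv3x3(state, LAPLACIAN)], axis=-1)
```

On a constant grid all three filter responses vanish, which is exactly the property that lets a settled region of the pattern stay quiescent.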
The cell update rate is denoted by $\delta$. In the case of a uniform update rate ($\delta = 1$), the above rule can be interpreted as a step of the explicit Euler integration method. If all cells are updated synchronously, initial conditions have to vary from cell to cell in order to break the symmetry; this can be achieved by initialising the grid with random noise. A physical implementation of the synchronous model would also require the existence of a global clock shared by all cells. In the spirit of self-organisation, we tried to decouple the cell updates. Following [mordvintsev2020growing], we emulate asynchronous cell updates by independently sampling $\delta^t_{x,y} \in \{0, 1\}$ for each cell at each step, with $P(\delta = 1) = 0.5$. (This is a rather rough model of real-world asynchronous computation, yet it seems to generalise well to unforeseen scenarios, like two adjacent grid regions exhibiting very different update rates, fig. 14.) Asynchronous updates allow the CA to break the symmetry even for a uniform initial state.
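The stochastic update can be sketched as follows, assuming a fire rate of 0.5 and an arbitrary per-cell rule `update_fn` (a hypothetical stand-in for the learned update):

```python
import numpy as np

def nca_step(state, update_fn, fire_rate=0.5, rng=None):
    """One asynchronous NCA step: each cell applies its residual update
    independently with probability `fire_rate`."""
    rng = rng or np.random.default_rng()
    ds = update_fn(state)                            # proposed per-cell update
    mask = rng.random(state.shape[:2]) < fire_rate   # which cells "fire"
    return state + ds * mask[..., None]              # masked residual update
```

Because roughly half the cells skip each step, any learned rule must tolerate neighbours whose states lag behind by a few updates.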
2.3 From CA to Neural Networks
The last component that we have to define is the update function. We use

$$f(\mathbf{z}) = W_2 \, \mathrm{ReLU}(W_1 \mathbf{z} + \mathbf{b}_1) + \mathbf{b}_2$$

where $\mathbf{z}$ is a perception vector, and $W_1$, $W_2$, $\mathbf{b}_1$, $\mathbf{b}_2$ are the learned parameters. If we look at the resulting system from the differentiable programming perspective, we can see that the whole CA image generator can be represented by a recurrent convolutional neural network (Fig. 1) that can be built from standard components available in modern deep learning frameworks. Using the established neural net terminology, we can call the perception stage a depth-wise 3x3 convolution with a set of fixed (non-learned) kernels. The per-cell update ($f$) is a sequence of 1x1 convolutions with a ReLU. The additive update is often referred to as a "residual network", and even the stochastic discarding of updates for some cells can be thought of as a variant of dropout, applied per cell rather than per value.
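The per-cell update amounts to two 1x1 convolutions, i.e. per-cell matrix multiplies. A sketch with untrained weights; the zero initialisation of the second layer is a convention borrowed from the Growing-NCA work, assumed rather than confirmed for this model:

```python
import numpy as np

def make_update_fn(n_channels=12, hidden=96, seed=0):
    """Two 1x1 convolutions with a ReLU in between, acting per cell.
    Weights here are random/zero placeholders; in the paper they are trained."""
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((4 * n_channels, hidden)).astype(np.float32) * 0.1
    b1 = np.zeros(hidden, dtype=np.float32)
    # Zero-initialised output layer: the initial update is a no-op.
    w2 = np.zeros((hidden, n_channels), dtype=np.float32)
    b2 = np.zeros(n_channels, dtype=np.float32)

    def update(z):  # z: (H, W, 4 * n_channels) perception grid
        h = np.maximum(z @ w1 + b1, 0.0)  # 1x1 conv + ReLU
        return h @ w2 + b2                # 1x1 conv -> residual update ds
    return update
```

A 1x1 convolution applied to a grid is exactly a matrix multiply over the channel axis, which is why the whole rule can run identically on any grid shape.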
Once the image generator is expressed in terms of standard differentiable building blocks, we can use back-propagation from a provided objective function to learn the model parameters.
The NCA models described here are relatively compact by modern standards: they contain fewer than 10k trainable parameters. We also use quantization-aware training [Jacob2018QuantizationAT] to make sure that our models can be executed efficiently on hardware that stores both parameters and activations as 8-bit integers. This allowed us to develop a WebGL demo for interacting with learned NCA models in real time. We refer readers to the supplemental materials and the code release (https://selforglive.github.io/cvpr_textures/).
The cell-state vector size (including the visible RGB channels) is $n = 12$. The perception vector size is $4n = 48$. The hidden layer size is 96. Thus, the matrices $W_1$ and $W_2$ have dimensions 48x96 and 96x12. The total number of CA parameters is 5868.
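The parameter count can be checked directly: two dense layers of sizes 48x96 and 96x12, each with a bias vector, give the total quoted above.

```python
# Parameter count of the update rule: two dense layers (48 -> 96 -> 12),
# each with a bias vector.
n = 12                 # cell-state size, including the RGB channels
perception = 4 * n     # s itself plus three convolution responses
hidden = 96

n_params = (perception * hidden + hidden) + (hidden * n + n)
print(n_params)  # 5868
```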
3 Training the Neural CA
In order to train an NCA we need to define differentiable objective (loss) functions that measure the current performance of the system and provide a useful gradient to improve it. We experiment with two objectives: a VGG-based texture synthesis loss [Gatys2015TextureSU] and an Inception-based feature visualisation loss [olah2017feature]. Hereinafter we refer to these as the "texture loss" and the "inception loss", respectively. These losses are applied to snapshots of the CA grid state $\mathbf{s}^t$, and are only affected by the first three components of the state vectors, which are treated as RGB color channels.
Style transfer is an extensively studied application of deep neural networks. L. Gatys et al. [Gatys2015TextureSU] introduced the approach common to almost all work since: recording and matching neuronal activations in certain layers of an "observer" network, a network trained on an entirely different task whose internal representations are believed to capture a concept or style. We apply the same approach to training our NCA. We initialise the NCA states as vectors of uniform random noise, iterate for a stochastic number of steps, and feed the resulting RGB channels of the state into the observer network (VGG-16 [Simonyan2015VeryDC]). The loss enforces a match between the Gram matrices of the observer's activations when it is fed the target texture and when it is fed the output of the NCA. We backpropagate this loss to the parameters of the NCA using a standard backpropagation-through-time implementation [bpp].
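The essence of the texture loss is the comparison of Gram matrices of observer activations. A minimal sketch, with plain arrays standing in for VGG-16 feature maps:

```python
import numpy as np

def gram_matrix(feats):
    """Gram matrix of an (H, W, C) feature map: channel-by-channel
    inner products, normalised by the number of spatial positions."""
    h, w, c = feats.shape
    flat = feats.reshape(h * w, c)
    return flat.T @ flat / (h * w)

def texture_loss(feats_nca, feats_target):
    """Mean squared difference between Gram matrices of observer-network
    activations (a sketch of the Gram-based style loss)."""
    g1, g2 = gram_matrix(feats_nca), gram_matrix(feats_target)
    return float(np.mean((g1 - g2) ** 2))
```

Because the Gram matrix discards spatial layout and keeps only channel co-occurrence statistics, the loss rewards texture similarity rather than pixel-wise reproduction.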
We use the textures collected in the Describable Textures Dataset by Cimpoi et al. [cimpoi14describing]. DTD is a human-annotated set of images relating to 47 distinct words describing textures, which in turn were chosen to approximate the high-level categories humans use to classify textures. Each image has a primary descriptor label, as well as secondary and tertiary descriptors. We do not explicitly make use of the texture labels in this work, but we notice significant differences in the quality of the reproduction across the different texture categories. See figure 3 for some examples of categories where our method fails to produce a coherent output. We note that these tend to be images representing textures that are not the result of locally interacting processes, such as human faces or singular images of large potholes.
Deepdream [Mordvintsev2015InceptionismGD] and subsequent works have provided insight into the features learned by networks, in addition to opening the door to an extensive set of artistic and creative works [mordvintsev2018differentiable]. We investigate the behaviours learned by NCA when tasked with maximising certain neurons in an observer network. We use Inception [Szegedy2015GoingDW] as the observer network and investigate the resulting behaviours for a variety of its layers. In the results section we show some of the more remarkable patterns generated using this loss.
3.2 Training procedure
At each training iteration we sample a batch of initial grid states $\mathbf{s}^0$ and iterate the current NCA for a stochastic number of steps. The batch loss is computed at the final state, and backpropagation through time is used to adjust the CA parameters. Batches of four 128x128 images are used during training. The state checkpoint pool size is 1024. The NCA model for each pattern is trained for 8000 steps using the Adam optimizer. The learning rate is 2e-3, decaying to 2e-4 after 2000 steps. A single texture CA trains in a few minutes on a V100 GPU.
We'd like to ensure that successive applications of the learned CA rule do not destroy the constructed pattern over numbers of steps that largely exceed those of training runs. We adopt the checkpointing strategy from [mordvintsev2020growing]. It consists of maintaining a pool of grid states, initialised with empty states. At each training step we sample a few states from the pool and replace one of them with an empty state, so that the model doesn't forget how to build the pattern from scratch. The final states are placed back into the pool, replacing the sampled ones.
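The checkpointing strategy can be sketched as follows; the function names and the noise-based reset are illustrative assumptions, and the loss computation and optimiser update are elided:

```python
import numpy as np

def train_step(pool, nca_step_fn, rng, batch=4, n_iter=32):
    """One iteration of the sample-pool checkpointing scheme (a sketch).
    `pool` is a (P, H, W, C) array of stored grid states."""
    idx = rng.choice(len(pool), size=batch, replace=False)
    states = pool[idx].copy()
    states[0] = rng.random(states[0].shape)  # reset one sample to a fresh state
    for _ in range(n_iter):                  # unroll the CA for a few steps
        states = nca_step_fn(states)
    # ...compute the loss on `states` and backpropagate through time here...
    pool[idx] = states                       # write the checkpoints back
    return states
```

Writing the final states back means later iterations resume from already-developed patterns, which is what exposes the rule to trajectories much longer than any single training rollout.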
An interesting feature of the proposed texture generation method is that there is neither an explicitly defined final state of the process nor a requirement to generate a static pattern after a series of CA steps. We only require individual snapshots to look like coherent textures. In practice, we observe that this leads to the emergence of "living", constantly evolving textures. We hypothesize that the NCA finds a solution where the state at each step is aligned [mordvintsev2018differentiable] with the previous step, and thus the motion, or appearance of motion, that we see is the state traversing a manifold of locally aligned solutions.
4 Results and discussion
4.1 Qualitative texture samples
Below we present some of the learned textures, which demonstrate creative and unexpected behaviours of the NCA. While these snapshots provide some insight into the time-evolution of the NCA, we strongly urge readers to view the videos and interactive demonstrations in the supplementary materials.
Figure 4 shows the time-evolution of a texture-generating algorithm trained on a static image of several bubbles on a plain blue background. The resulting NCA generates a set of bubbles, moving in arbitrary directions, with the spatial density of bubbles roughly corresponding to that of the static template image. It is important to bear in mind that the NCA knows nothing of the concept of bubbles existing as individual objects with their own velocity and direction. However, the NCA treats them as such, not allowing them to collide or intersect. We refer to such structures as solitons in the solution space of the NCA, after the concept introduced to describe the structures and organisms found in the solution space of Lenia [chan2019lenia]. (Classically, solitons refer to self-reinforcing wave packets found in the solutions of certain PDEs. Chan borrowed this terminology for organisms in Lenia, and we borrow it for structures in our NCA.)
Figure 5 shows the behavior of the NCA when some bubbles are destroyed by setting the states of the corresponding cells to random noise. The rest of the pattern remains unchanged and stable, and over the course of a few time steps a somewhat consistent pattern is filled in inside the gap. Many new bubbles appear, and some are destroyed where the bubbles become crowded, eventually returning to a "bubble density" that roughly corresponds to that of the template image. Some of the new bubbles immediately after the damage are misshapen: oblong, or almost divided into two. Misshapen bubbles are able to recover their shape or, in severe cases, divide into smaller bubbles.
Figure 6 shows behaviour trained on a chequered diamond pattern. The NCA generates a number of potential black and white diamonds. At first, these are randomly positioned and misaligned, resulting in a completely inconsistent grid. After a few iterations, certain diamonds are removed or joined with each other, resulting in more consistency. After a long time period, we see the grid reaching perfect consistency, suggesting the NCA has learned a distributed algorithm to continually strive for consistency in the grid, regardless of the current state.
In figure 7, the NCA learns the core concept of a wall, or barrier, as a soliton. A similar behaviour can be observed in the 6th row of the front-page figure. The walls proceed to move in arbitrary directions, and when a free edge reaches another wall, it joins it to form a vertex. The resulting pattern appears random, but incorporates many of the details of the template image, and the NCA has successfully learned the "rules" of walls in the input texture (they remain of a fixed width, they tend to be aligned in certain directions, and so forth).
In figure 8, a distributed algorithm emerges which tries to merge different stripes to achieve consistency. Ill-fitting stripes with free ends travel up or down along the diagonal, until they either spontaneously merge with the neighbouring stripe, or find another loose end to merge with. Eventually, total consistency is achieved.
Figure 9 depicts the generation of a triangular mesh. As with other templates, at first the NCA generates a multitude of candidate strokes in the pattern. These strokes then either traverse along each other, disconnect, or rejoin other vertices in order to increase consistency in the pattern. Their motion appears smooth and organic in nature. After a longer time, the generated texture approaches perfect consistency.
In figure 10, the NCA attempts to learn to generate a pattern that consists of a weave of different threads crossing each other. The NCA captures the rule of threads being oriented in one of three directions (the two diagonals or the vertical) and generates a texture similar in style to the original. However, it does not exactly capture the repeating pattern of the template texture.
4.2 Inception samples
Figure 11 shows a small selection of patterns, obtained by maximising individual feature channels [olah2017feature] of the pre-trained Inception network. Some of these patterns exhibit similar behaviour to those trained with the texture-loss - merging or re-arranging different solitons to achieve some form of consistency in the image.
4.3 Advantages of NCA for texture generation
Distributed algorithms for texture generation
Neural GPUs and Neural Turing Machines introduced to a wider audience the idea of deep neural networks being able to learn algorithms, as opposed to just serving as excellent function approximators. We believe this is an underappreciated shift in method that will allow deep learning to solve more complex problems than is currently possible, due to the more general nature of an algorithm as opposed to an approximated function. Similar observations have been made about RNNs [rnns], but empirical results in terms of learning computations, as opposed to statistical correlations, have been weak.
We believe an NCA inherently learns a distributed algorithm to solve a task. Since any single cell can only communicate with neighbouring cells, a key first skill the cells must learn is an algorithm for coordination and information sharing over a longer period of time. The stochastic updates encourage any such algorithm to be resilient and inherently distributed in nature: it must perform its task regardless of when or where a cell may be invoked.
The results section presents several qualitative observations of behaviours that we believe show evidence of a learned distributed algorithm.
Long term stability
Recall that the model and training regime expose the NCA to a loss after a limited number of steps, and longer-term stability is encouraged by the sample-pool mechanism. However, we observe that the solutions learned by the NCA are stable for far longer time periods than those used during training, even accounting for the sample-pool mechanism. This suggests the learned behaviour enters a stable state in which minor deviations return to the same solution state (a basin of attraction [mordvintsev2020growing]).
Figure 12 shows an NCA evaluated for several increasingly long numbers of steps. We believe most NCA we have trained on textures remain fully stable on these time-scales.
NCA are spatially invariant: it is possible to extend them in any direction with complexity linear in the number of pixels and the number of time-steps. The computation can continue even with irregular or dynamic spatial boundaries. Figure 13 shows a fixed-size NCA being expanded to double the width and height. The newly added cells are initialised in the usual way, with uniform random noise, and run the same NCA rule as the existing cells. The NCA immediately fills out the empty space, interacting with the existing cells to form a continuation of the pattern consistent with the initial texture (i.e. the newly formed checkerboard squares align themselves with the existing grid).
The NCA is further employable as a "texture printer": generate the first rows of an output pattern, "freeze" these cells by stopping them from undergoing any further updates, then evaluate the next rows of cells. The next rows of cells rely only on the most recently generated rows for the information necessary to continue the pattern in a consistent fashion. Thus, arbitrarily large, consistent textures can be generated with only a linear computational cost and a linear memory cost to store the "finished" cells, without the need for the entire grid to be computed at once to achieve consistency.
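A hypothetical sketch of the "texture printer" idea, freezing finished rows while a rule keeps updating the rest of the grid (for clarity the whole grid is kept in memory here, whereas the scheme described above only needs the active strip):

```python
import numpy as np

def print_texture(nca_step_fn, height, width, channels,
                  band=16, iters_per_band=32, rng=None):
    """Generate a tall texture band by band; once a band of rows is
    finished it is frozen and never updated again."""
    rng = rng or np.random.default_rng()
    grid = rng.random((height, width, channels))
    frozen = np.zeros((height, 1, 1), dtype=bool)  # per-row freeze flag
    for top in range(0, height, band):
        for _ in range(iters_per_band):
            new = nca_step_fn(grid)
            grid = np.where(frozen, grid, new)     # frozen rows never change
        frozen[:top + band] = True                 # freeze the finished band
    return grid
```

The freeze mask is what bounds the working set: only rows below the frozen frontier ever need to be recomputed.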
NCA are evaluated in a highly parallel fashion. For instance, one could have two NCA running in parallel on separate hardware coordinate spatially by simply sharing the boundary layer of cells between them. Synchronisation in time is not required as the NCA are robust to asynchronous updates, as can be seen in figure 14.
Robust to unreliable computations
The resulting algorithm is also extremely robust. For instance, it is possible to delete individual cells or groups of cells, or to add individual cells at the boundaries. We demonstrate this behaviour in the results section in figure 5, as well as on several NCA in the first-page figure. NCA are thus ideal for unreliable underlying computational hardware: many cells can fail or be reset, and they will "heal" in a fashion that is consistent with the existing pattern. Section 4.4 further explores this property by altering the grid underlying the computation.
4.4 Post-training behaviour control
As mentioned in section 2.2, trained CA cells obtain information about their neighborhood through local gradient and Laplacian operators. This opens a number of opportunities for post-training model adaptation and transfer to new environments. For example, we demonstrate the possibility of replacing the square grid with a hexagonal one simply by using a different set of convolution kernels (fig. 15).
Another adaptation possibility, inspired by [Lefebvre2006AppearancespaceTS], is enabled by rotating the estimated local gradient vectors before applying the update rule:

$$\begin{pmatrix} \nabla'_x \mathbf{s} \\ \nabla'_y \mathbf{s} \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \nabla_x \mathbf{s} \\ \nabla_y \mathbf{s} \end{pmatrix}$$

where $\nabla_x \mathbf{s} = K_x \ast \mathbf{s}$, $\nabla_y \mathbf{s} = K_y \ast \mathbf{s}$, and $\alpha_{x,y}$ is a local rotation angle for the cell at position $(x, y)$. This trick makes it possible to project the texture onto an arbitrary vector field (fig. 16).
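The rotation can be sketched directly from the matrix above; `gx` and `gy` stand for the per-cell Sobel responses and `alpha` is the angle field (the names are illustrative):

```python
import numpy as np

def rotate_gradients(gx, gy, alpha):
    """Rotate the per-cell gradient estimates (gx, gy) by the per-cell
    angle field `alpha` before they are fed to the update rule."""
    alpha = np.asarray(alpha, dtype=float)
    while alpha.ndim < np.ndim(gx):   # broadcast the angle over channels
        alpha = alpha[..., None]
    c, s = np.cos(alpha), np.sin(alpha)
    return c * gx - s * gy, s * gx + c * gy
```

Since the learned rule only ever sees the rotated gradients, each cell effectively perceives its neighbourhood in a locally rotated frame, which is what bends the synthesised texture along the vector field.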
5 Related work
Probably the most popular family of methods for solving this task is based on sampling new pixels or patches from the provided example, conditioned on the already synthesised part of the output image [Efros1999TextureSB, Gumin2016, Barnes_Shechtman_Finkelstein_Goldman]. Another family of methods is based on an iterative image optimisation process that tries to match the output image's feature statistics, produced by some extractor, to those of the input sample. The extractor can be hand-designed [Portilla2004APT] or based on a pretrained convolutional neural network [Gatys2015TextureSU]. In either case, the output of texture synthesis is a raster image.
A very unconventional approach to texture synthesis is presented by C. Reynolds in [Reynolds2011InteractiveEO]: human-guided selection steers an evolution process to create image patches that are hard to distinguish from a textured background. This work, along with a few others [Sims1991ArtificialEF, Stanley2007CompositionalPP, ha2016abstract], uses function composition instead of a raster grid as the image representation.
The idea of the importance of the right image representation for image processing tasks in the differentiable optimisation context is explored further in [mordvintsev2018differentiable]. Another line of research focuses on employing image-generating neural networks to represent a whole family of texture images [Ulyanov2016TextureNF].
There is existing work attempting to learn rules for CAs that generate target images [Elmenreich_Fehervari_2011]. Our method uses backpropagation instead of evolution, and focuses on textures rather than pixel-perfect reconstructions. The authors of [Elmenreich_Fehervari_2011] acknowledge, in the last paragraph of their fifth section, that their method is unable to synthesise textures.
Other related work includes [Henzler_Mitra_Ritschel_2019] and [Yu_Barnes_Shechtman_Amirghodsi_Lukac_2019]; both make use of an encoder-decoder architecture to construct a texture generator. Also related is a family of work using GAN-style approaches for texture generation; [Portenier_Bigdeli_Goksel_2020], [Bergmann_Jetchev_Vollgraf_2017], [Zhou_Zhu_Bai_Lischinski_Cohen-Or_Huang_2018].
A very interesting image restoration method is proposed in [Chen2017TrainableNR]. The authors use a learned PDE system that is spatially (but not temporally) uniform.
6 Conclusion
Inspired by pattern formation in nature, this work applies an augmented version of the Neural Cellular Automata model to two new tasks: a texture synthesis task and a feature visualisation task. Additionally, we show how training an NCA approximates the identification of a PDE within a family of PDEs often used to model reaction-diffusion systems, evaluated in discretised time and space. We demonstrate remarkable qualities of the model: robustness, quick convergence, learning a qualitatively algorithmic solution, and relative invariance to the underlying computational implementation and manifold. These qualities suggest that models designed within the paradigm of self-organisation may be a promising approach to achieving more generalisable and robust artificial systems.