Generative Adversarial Neural Cellular Automata

Motivated by the interaction between cells, the recently introduced concept of Neural Cellular Automata shows promising results in a variety of tasks. So far, this concept has mostly been used to generate images for a single scenario. As each scenario requires a new model, this type of generation seems to contradict the adaptability of cells in nature. To address this contradiction, we introduce a concept that uses different initial environments as input, so that a single Neural Cellular Automaton produces several outputs. Additionally, we introduce GANCA, a novel algorithm that combines Neural Cellular Automata with Generative Adversarial Networks, allowing for more generalization through adversarial training. The experiments show that a single model is capable of learning several images when presented with different inputs, and that the adversarially trained model improves drastically on out-of-distribution data compared to a model trained in a supervised fashion.


1 Introduction

Regeneration of body parts is an example of the fundamental self-organization of cells. Even though each cell can only interact with its immediate surroundings, it acts according to its position inside the body. Cells are able to react and reorganize depending on external stimulation or environmental changes [8]. These biological phenomena are one of the motivations for Neural Cellular Automata (NCAs).

Shortly after Mordvintsev et al. [11] introduced NCAs, several papers were published showcasing the performance of this architecture on different tasks. Growing and robustly repairing images [11], texture generation [12], classification [14] and pixel-wise segmentation [16] show the high diversity of applications for Neural Cellular Automata.

In all previous experiments, NCAs are trained to perform well in a single situation, producing good-looking results but defying the biological motivation for the model. This is why this paper starts off by showing that an NCA is capable of generating different images depending on the initial image. These generalization capabilities are then further tested on unseen initial images through validation data and out-of-distribution data. Furthermore, we introduce a new type of NCA, the Generative Adversarial Neural Cellular Automaton (GANCA), which combines the adversarial training of a GAN structure with the generative capabilities of an NCA to increase the performance on out-of-distribution data.

2 Foundation

A Neural Cellular Automaton spreads and computes local information while using the same program for each local part. An example of an NCA generating images frame by frame can be seen in Figure 1.

Figure 1: Frame-by-frame output of three different NCAs growing emojis. Each sequence shows the progression over frames, starting from a blank image (as in Mordvintsev et al. [11]) or an edge image (ours) from the training set. The last two emojis were generated by the same NCA.
Figure 2: Sketch of the implementation of a single iteration of a simple Neural Cellular Automaton. The world state contains 4 channels of RGBA information, while additional channels contain latent state information. We use a world state depth of 16, resulting in around 20k parameters for the NCA. Increasing the depth drastically increases the number of parameters.

Intuitively speaking, every NCA follows the same two mandatory restrictions:

  (i) Every cell only sees its local neighborhood.

  (ii) Every cell runs the same program.

To propagate local information globally, the NCA spreads it over successive time steps. In most cases, the world consists of a grid of cells. An exemplary grid world is shown in Figure 1, where each pixel represents one cell.

A cell has a state vector containing information about this cell and can only observe its local neighborhood by seeing the state vectors of the closest cells. Using the same program, every cell uses this information to update its state vector. This way, information can be propagated through the whole world.

All NCA architectures inherently follow these statements. Some architectures add additional constraints, while others only differ in the program that each cell will execute.

2.1 Formalization

Using these restrictions, we propose a new formalization of the NCA architecture. Assuming a grid world (a grid world is not mandatory, but assuming one simplifies the formalization and makes it easier to follow), $S_t \in \mathbb{R}^{w \times h \times d}$ defines the world state at time step $t$, where $w$ is the width, $h$ the height and $d$ the depth of the world. The depth $d$ also describes the size of the state vector of each cell. The state vector of the cell at position $(i, j)$ during time step $t$ will be written as $s^t_{i,j} \in \mathbb{R}^d$.

This leads to the following definition of the world state $S_t$:

$$S_t = \left( s^t_{i,j} \right)_{1 \le i \le w,\; 1 \le j \le h}$$

Using these notations, the first statement (i) can be written as:

$$s^{t+1}_{i,j} = f\left( N(S_t, i, j) \right)$$

where $N$ is a neighborhood function returning only the local information around the cell at $(i, j)$, i.e. the state vectors of its immediate neighbors. The program, or local transition function, is denoted here as $f$. This can be used to define the transition function $F$ of the NCA as:

$$S_{t+1} = F(S_t)$$

The second statement (ii) can now be defined through the update rule of the whole world state:

$$F(S_t) = \left( f\left( N(S_t, i, j) \right) \right)_{1 \le i \le w,\; 1 \le j \le h}$$

Using zero-padding, the values outside the boundaries, e.g. $s^t_{0,0}$, are set to 0. Throughout the matrix, the same function $f$ is applied to every element, running the same program across all cells.

Using these definitions, one of the final images in Figure 1 can be written as:

$$S_{60} = F^{60}(S_0) = \underbrace{F \circ F \circ \cdots \circ F}_{60 \text{ times}}(S_0)$$

The initial world state $S_0$ is, for example, the blank image with only the pixels in the middle modified. The same transition $F$ is applied 60 times to obtain $S_{60}$.
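The formalization can be sketched directly in Python. The following is a minimal, framework-free sketch, assuming a 3 × 3 neighborhood (the neighborhood size is an assumption here) and 0-based indices, so that zero-padding covers any index outside `0..w-1` and `0..h-1`. The names `neighborhood`, `transition` and `run` are hypothetical and stand for the neighborhood function, the transition function of the whole world and its repeated application.

```python
from typing import Callable, List

Vector = List[float]
Grid = List[List[Vector]]  # grid[i][j] holds the state vector s_{i,j}

def neighborhood(S: Grid, i: int, j: int) -> List[Vector]:
    """N(S_t, i, j): the 3x3 block of state vectors around (i, j), zero-padded."""
    w, h, d = len(S), len(S[0]), len(S[0][0])
    zero = [0.0] * d
    block = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ii, jj = i + di, j + dj
            inside = 0 <= ii < w and 0 <= jj < h
            block.append(S[ii][jj] if inside else zero)
    return block

def transition(S: Grid, f: Callable[[List[Vector]], Vector]) -> Grid:
    """F(S_t): the same local rule f is applied to every cell, reading only S_t."""
    return [[f(neighborhood(S, i, j)) for j in range(len(S[0]))]
            for i in range(len(S))]

def run(S0: Grid, f: Callable[[List[Vector]], Vector], steps: int) -> Grid:
    """F^steps(S_0): apply the transition repeatedly."""
    S = S0
    for _ in range(steps):
        S = transition(S, f)
    return S

# Usage: a toy rule that copies the largest neighboring value shows how purely
# local updates spread a single seed cell by one cell per step.
S0 = [[[0.0] for _ in range(9)] for _ in range(9)]
S0[4][4] = [1.0]
spread = lambda block: [max(v[0] for v in block)]
S2 = run(S0, spread, 2)
print(S2[4][6], S2[4][7])  # [1.0] [0.0]
```

With a 3 × 3 neighborhood, information travels at most one cell per iteration, which is why many iterations are needed before every cell can be influenced by the whole input.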

2.2 Implementation

The NCA can be implemented with existing deep learning building blocks, as it is simply a ResNet block [5] with two convolutional layers. The first layer is the perception layer, propagating the information of a cell to the neighboring cells through convolution, while the following layer(s) are convolutions for additional computation. A single iteration is a single pass through this block, as visualized in Figure 2.
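A single pass through such a block can be sketched without a deep learning framework. The sketch below assumes, hypothetically, a learned 3 × 3 perception convolution (written as a dense layer on the flattened, zero-padded 3 × 3 patch), a ReLU 1 × 1 convolution, an output 1 × 1 convolution and a residual connection. The widths `perc=64` and `hidden=128` are assumptions, not values from this paper. With these widths and a depth of 16 the block has 19,664 parameters; doubling both the depth and the perception width to 32 and 128 gives 57,632, roughly three times as many, which illustrates how quickly the parameter count grows with the depth.

```python
import random
from typing import List, Tuple

Grid = List[List[List[float]]]  # grid[i][j] -> state vector of length d

def dense(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """y = W x + b for one cell (what a 1x1 convolution computes per pixel)."""
    return [sum(wk * xk for wk, xk in zip(row, x)) + bk for row, bk in zip(W, b)]

def init(n_out: int, n_in: int, rng: random.Random,
         scale: float = 0.1) -> Tuple[List[List[float]], List[float]]:
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

class NCABlock:
    """Hypothetical residual NCA block: 3x3 perception conv -> ReLU 1x1 conv -> 1x1 conv."""

    def __init__(self, d: int = 16, perc: int = 64, hidden: int = 128, seed: int = 0):
        rng = random.Random(seed)
        self.d = d
        self.Wp, self.bp = init(perc, 9 * d, rng)    # perception: sees the 3x3 patch
        self.Wh, self.bh = init(hidden, perc, rng)   # additional computation
        # Zero-initialized output layer: the block starts as the identity update.
        self.Wo, self.bo = [[0.0] * hidden for _ in range(d)], [0.0] * d

    def num_params(self) -> int:
        layers = [(self.Wp, self.bp), (self.Wh, self.bh), (self.Wo, self.bo)]
        return sum(len(W) * len(W[0]) + len(b) for W, b in layers)

    def step(self, S: Grid) -> Grid:
        """One NCA iteration: every cell runs the same layers on its own 3x3 patch."""
        w, h = len(S), len(S[0])
        zero = [0.0] * self.d
        out = []
        for i in range(w):
            row = []
            for j in range(h):
                patch: List[float] = []
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        patch.extend(S[ii][jj] if 0 <= ii < w and 0 <= jj < h else zero)
                p = dense(patch, self.Wp, self.bp)
                z = [max(0.0, v) for v in dense(p, self.Wh, self.bh)]
                delta = dense(z, self.Wo, self.bo)
                row.append([s + ds for s, ds in zip(S[i][j], delta)])  # residual add
            out.append(row)
        return out

block = NCABlock(d=16)
print(block.num_params())  # 19664
```

Initializing the output layer with zeros makes the first updates a no-op, a common choice for residual NCAs (used, for example, by Mordvintsev et al. [11]); training then only has to learn the change to each state vector.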

It is possible to train the NCA in a supervised fashion through a pixel-wise L2 loss, e.g. by comparing the first 4 channels produced by the NCA, representing the RGBA values, to the target image. Additionally, to keep the image stable over a longer period of time, Mordvintsev et al. [11] introduced persistence. The goal of persistence is equivalent to finding an update rule $F$ such that for every $n \in \mathbb{N}$ the following inequality holds:

$$\mathcal{L}(S_{t+n}) \le \mathcal{L}(S_t)$$

where $\mathcal{L}$ is an exemplary loss function calculating the loss to the target image at time step $t$, with $S_t = F^t(S_0)$.

Persistence can be accomplished, for example, by having a chance of using the output of the final NCA iteration of the previous training step as input for the first iteration of the next training step. This way, the NCA additionally has to keep the image stable over a possibly infinite period of time.
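This persistence mechanism can be sketched together with the pixel-wise L2 loss on the RGBA channels. It is a minimal sketch under stated assumptions: `p_reuse` (the chance of continuing from the previous output) and all function names are hypothetical, and the gradient update is only marked by a comment, since no optimizer is specified here.

```python
import random
from typing import Callable, List, Optional, Tuple

Grid = List[List[List[float]]]  # grid[i][j] -> state vector; channels 0-3 are RGBA

def rgba_l2(S: Grid, target: Grid) -> float:
    """Mean squared error over the first 4 channels only; hidden channels are ignored."""
    total, count = 0.0, 0
    for row_s, row_t in zip(S, target):
        for s, t in zip(row_s, row_t):
            total += sum((a - b) ** 2 for a, b in zip(s[:4], t[:4]))
            count += 4
    return total / count

def choose_start(seed: Grid, previous: Optional[Grid], p_reuse: float,
                 rng: random.Random) -> Tuple[Grid, bool]:
    """With probability p_reuse, start from the last output instead of the seed."""
    if previous is not None and rng.random() < p_reuse:
        return previous, True
    return seed, False

def persistence_rollouts(seed: Grid, target: Grid, step: Callable[[Grid], Grid],
                         n_train_steps: int, n_iters: int, p_reuse: float,
                         rng: random.Random) -> Tuple[List[bool], List[float]]:
    """Rollout part of a training loop; returns which steps reused an output, and the losses."""
    previous: Optional[Grid] = None
    reused, losses = [], []
    for _ in range(n_train_steps):
        S, was_reused = choose_start(seed, previous, p_reuse, rng)
        for _ in range(n_iters):
            S = step(S)
        losses.append(rgba_l2(S, target))
        # A real implementation would backpropagate this loss through all n_iters steps here.
        reused.append(was_reused)
        previous = S
    return reused, losses
```

Starting some rollouts from an already grown image means the loss is also evaluated after far more iterations than a single rollout covers, which pushes the model toward an update rule that keeps the image stable.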

An exemplary implementation of an NCA and all experiments presented in this paper can be found on GitHub (link to be announced).

3 Related Work

Beginning with a collection of articles about differentiable self-organizing systems (https://distill.pub/2020/selforg/), Mordvintsev et al. [10] motivate a new approach to self-organization in systems.

In the first article of this thread, Mordvintsev et al. [11] introduce the concept of Neural Cellular Automata, growing emojis from a blank seed with a single NCA. The objective was to start from a blank image and grow the emoji step by step. This was further extended by regrowing the image after it had been damaged through user interaction.

Their follow-up article uses the NCA for the MNIST classification problem [14]. This is done by using the information in the state vector of each cell, such that each cell classifies itself as one of the MNIST classes. Even though the task is vastly different from the first NCA application, the training is very similar.

The third paper in the timeline, by Sandler et al. [16], is not part of the self-organizing thread. In this paper, the task is to use the NCA for pixel-wise segmentation and classification. The authors introduced several new ways of training the NCA, some of which are used throughout our experiments.

The most recent article, by Niklasson et al. [12], focuses on creating textures with NCA architectures. Here, the features of a VGG-16 network [17] are used to apply a loss on the generated texture, allowing the NCA to generate good-looking textures.

4 Methods

This section begins by extending the idea of generating images with an NCA to producing multiple target images with a single NCA. The generalization of this training concept is then improved by training the NCA adversarially, creating the Generative Adversarial Neural Cellular Automata (GANCA).

4.1 Multiple target images

Figure 3: Example emojis for training, validation and out-of-distribution data. “Edge” is the input image with only edge information, while “GT” shows the ground truth image. Every ground truth image is an emoji from the Windows 10 Segoe UI Emoji font. The out-of-distribution data are hand-drawn images resembling faces, but vastly different from the training and validation images.
Figure 4: Visualization of the Generative Adversarial Neural Cellular Automata (GANCA) architecture with a very similar setup compared to a standard GAN architecture. The generator now uses several iterations of the same NCA to generate an image. Additionally, the input for the generator is an image, which is in our task an edge image generated from the real images.

To generate different images based on the input, the input has to contain information about the target image. Our approach is to change the training dataset so that it contains the ground truth image as the target and an image consisting of the edges of the ground truth image as the input.

The edge image is obtained by applying an edge detection algorithm, in this case a Canny filter [2], to the ground truth image. Examples are shown in Figure 3.
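To illustrate how an (edge image, ground truth) pair and the initial world state could be built, here is a sketch that swaps in a simpler detector: thresholded Sobel gradient magnitude instead of the full Canny filter [2] (no Gaussian smoothing, non-maximum suppression or hysteresis, so its edges are thicker). How the edge image is written into the initial state is also an assumption: here the edge value is copied into the 4 RGBA channels and the hidden channels start at zero.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image indexed img[y][x], values in [0, 1]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(img: Image, threshold: float) -> Image:
    """Simplified stand-in for Canny: thresholded Sobel gradient magnitude."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0.0  # zero-padding

    edges = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = sum(SOBEL_X[a][b] * px(y + a - 1, x + b - 1) for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * px(y + a - 1, x + b - 1) for a in range(3) for b in range(3))
            row.append(1.0 if math.hypot(gx, gy) > threshold else 0.0)
        edges.append(row)
    return edges

def initial_state(edges: Image, depth: int = 16) -> List[List[List[float]]]:
    """Hypothetical S_0: edge value copied into the RGBA channels, hidden channels at 0."""
    return [[[e] * 4 + [0.0] * (depth - 4) for e in row] for row in edges]
```

For a filled square on a blank background, the interior and the far background map to 0 and the square's outline maps to 1, so the NCA only receives the shape and has to infer the colors itself.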

This way, the NCA can use the shape of the input image to generate different fully colored emojis. Furthermore, the performance is tested on a validation set from the same distribution and on an out-of-distribution dataset.

The out-of-distribution images are hand-drawn edges that are not perfectly round and do not have a uniform line width. They are used to test how the NCA reacts to images vastly different from the training examples.

4.2 Generative Adversarial Training

The generalization performance on out-of-distribution inputs, after training on multiple target images, can be further improved through partly unsupervised training. (As the real images are used to generate the initial image for the NCA, the task is not fully unsupervised. This step is not mandatory, and edge images not directly related to the real images could be used as well.) To train unsupervised, we use adversarial training from GAN architectures [4], where several NCA steps replace the generator. Because the NCA operates on images, the input also changes to an image instead of a random noise vector. In our case, the GAN is trained on multiple emoji faces, as in Section 4.1. As the goal of the training is to transform an edge image into a colored version, the setup closely resembles other GAN architectures. Because the edge image is based on a specific emoji, the GAN structure can be considered conditional [9]. The GANCA is also closely related to image-to-image translation GANs [6] and, more generally, to CycleGANs [18].

The Generative Adversarial Neural Cellular Automata (GANCA) architecture is illustrated in Figure 4. It is very similar to a standard GAN structure, the main difference being the generator and its input. The generator in the GANCA uses an NCA to update the input edge image step by step, using only local information, to produce an image. The input edge images are obtained by reducing the real images provided to the discriminator to their edges. The NCA uses a random number of iterations within a defined range, e.g. between 50 and 60, to produce the image. A single iteration is a single pass through an NCA block, as visualized in Figure 2.
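The generator side of this setup can be sketched as follows: the same NCA step is applied a random number of times, drawn from an inclusive range such as 50 to 60, and the resulting image is passed on to the discriminator. This is a hypothetical sketch: `step` stands for any single NCA iteration, and handing only the first 4 (RGBA) channels to the discriminator is an assumption.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[List[float]]]  # grid[i][j] -> state vector; channels 0-3 are RGBA

def ganca_generate(S0: Grid, step: Callable[[Grid], Grid], rng: random.Random,
                   min_steps: int = 50, max_steps: int = 60) -> Tuple[Grid, int]:
    """GANCA-style generator: apply the same NCA step a random number of times."""
    n = rng.randint(min_steps, max_steps)  # randint is inclusive on both ends
    S = S0
    for _ in range(n):
        S = step(S)
    return S, n

def rgba(S: Grid) -> Grid:
    """The image handed to the discriminator: the first 4 channels of every cell."""
    return [[cell[:4] for cell in row] for row in S]
```

Randomizing the number of iterations prevents the generator from relying on one exact step count, so the image has to look plausible anywhere in the sampled range.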

As GANs are notoriously difficult to train, several papers have been published that improve and stabilize their performance ([1], [3], [7], [13], [15]). An effective improvement is to add noise to the input images of the discriminator and to the output of the generator [7], which makes it much harder for the discriminator to overfit. Adding even a small amount of noise leads to drastically more stable training. The second change we included is label smoothing: the training labels 0 and 1 are replaced by values such as 0.1 and 0.9, as this tends to improve the performance of GAN models [15]. A comparison using these training improvements can be found in Figure 5.
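Both stabilization tricks are small changes to the discriminator's side of the loss. The sketch below assumes a discriminator that outputs probabilities and is trained with binary cross-entropy; the noise level `sigma` and the helper names are hypothetical, while the 0.1/0.9 targets are the values mentioned above.

```python
import math
import random
from typing import List

def add_noise(pixels: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Gaussian noise added to every image the discriminator sees (real and generated)."""
    return [p + rng.gauss(0.0, sigma) for p in pixels]

def smooth(label: int, real_value: float = 0.9, fake_value: float = 0.1) -> float:
    """Label smoothing: target 1 becomes 0.9 and target 0 becomes 0.1."""
    return real_value if label == 1 else fake_value

def bce(prob: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, clipped away from log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Mean BCE with smoothed targets; d_* are discriminator outputs on noisy inputs."""
    losses = [bce(p, smooth(1)) for p in d_real] + [bce(p, smooth(0)) for p in d_fake]
    return sum(losses) / len(losses)
```

With smoothed targets, the loss is minimized at outputs of 0.9 and 0.1 rather than 1 and 0, so an overconfident discriminator is penalized instead of rewarded.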

(a) Simple GANCA without any modifications
(b) Using improved training techniques
Figure 5: Training the GANCA structure with different settings. The left plot shows the loss during training of a simple GANCA structure, while the right plot shows the loss when training with noise and label smoothing. A good loss for the generator and discriminator should converge to a value between 0 and 1, similar to the right plot.

Additionally, using the WGAN loss helped to improve stability. The Wasserstein loss [1] uses a critic network, which does not output probabilities but instead unbounded real-valued scores in $(-\infty, \infty)$.
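For reference, the WGAN objectives from [1] can be written in a few lines, with the critic scores as plain lists of floats. Whether the critic's Lipschitz constraint was enforced by weight clipping (as in [1]) or otherwise is not stated here, so `clip_weights` with the clipping value 0.01 used in [1] is only an illustrative assumption.

```python
from typing import List

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

def critic_loss(c_real: List[float], c_fake: List[float]) -> float:
    """WGAN critic loss to minimize: E[C(fake)] - E[C(real)]; scores are unbounded reals."""
    return mean(c_fake) - mean(c_real)

def generator_loss(c_fake: List[float]) -> float:
    """WGAN generator loss to minimize: -E[C(G(x))]."""
    return -mean(c_fake)

def clip_weights(weights: List[float], c: float = 0.01) -> List[float]:
    """Weight clipping from the original WGAN paper, keeping the critic roughly Lipschitz."""
    return [min(max(w, -c), c) for w in weights]
```

Because the critic is not squashed through a sigmoid, its loss keeps a useful gradient even when it separates real and generated images well, which is one reason WGAN training tends to be more stable.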

5 Results

Without any major modifications, the standard NCA is able to fully reproduce the training images and keep them stable (persistence) after only 10k steps with a batch size of 16.

Figure 6: Output for training, standard validation and out-of-distribution data of the NCA after training for 10k steps on 50 different emoji faces. “Edge” is the edge image used as the input for the NCA, “NCA” is a single frame of the output of the NCA, while “GT” shows the target ground truth image.

These results are visualized in Figure 6, where the NCA is trained on 50 different emoji faces with different edge images as input. The NCA is able to overfit on the training data, producing very good replicas of the ground truth images. Simple validation images also look good, while more complex images are missing color information. As the color information is never provided in the input, this result is to be expected. However, on out-of-distribution data, the NCA does not perform well on any image: many artifacts are introduced, and some details are removed.

Using the introduced method of the GANCA, the NCA can be trained adversarially to increase the out-of-distribution performance.

Figure 7: Output of the GANCA and the standard NCA architecture after training on 50 different emojis. Edge images used as input here are out-of-distribution validation images. The GANCA shows significant improvement over the standard NCA training.

As visualized in Figure 7, the performance on out-of-distribution data increases drastically in comparison to the standard NCA architecture. Most images contain all the details and do not introduce any artifacts. Additional images can be found in Appendix A.

6 Discussion

The NCA showed very good results in generating different emojis with just a single NCA structure, demonstrating that the architecture has enough capacity to reproduce all training images, to the extent of overfitting. Even so, it still produced decent-looking results on validation images: because the NCA can only use local information, the model did not fully overfit on the training data. This shows that the NCA is inherently able to generalize through the architecture alone.

These generalization capabilities, specifically on out-of-distribution data, are further improved through unsupervised adversarial training.

In addition, it should be noted that an NCA used as the generator has far fewer parameters (around 20k) than other generators. This is because a lot of information is placed in the environment and passed through the image step by step. Because of this step-by-step information sharing, the NCA needs more time to produce the final output than standard feed-forward architectures. However, Sandler et al. [16] used asynchronous spatial updates for NCAs, allowing the use of a grid of processors without global synchronization, for a drastic decrease in computational time.

These experiments are a proof of concept, as training on high-quality images with big datasets was not the main concern. This is still possible and highly encouraged for future projects. It is very promising to see such a drastic difference in out-of-distribution results between adversarial training and standard supervised training.

7 Conclusion

We proposed two novel approaches for training NCAs, namely training on multiple target images with a single model and adversarial training with the GANCA architecture. We demonstrated that a single NCA is capable of learning different emojis and still performs well on validation images without additional color information. Using existing training improvements from GAN architectures, NCAs can be trained in an adversarial fashion, which drastically improves performance on out-of-distribution data.

7.1 Future Work

As NCAs have only recently been introduced and show great adaptability to a wide range of tasks, many possible directions are promising. We strongly encourage further work that takes advantage of the fact that NCAs use an image as input and operate on the image step by step.

As a use case, NCAs could be used for videos, e.g. for segmentation or generation, as the connection of each frame to the previous frames is inherently built into the architecture.

Finally, working with user interactions is a promising field. Modifying the current state of the image offers an easy way to interact with the model, as already showcased by Mordvintsev et al. [11]. This type of interactive behavior is very intuitive with an NCA architecture, compared to a deconvolutional architecture in which the image is generated in a single forward pass. Moreover, extended by adversarial training, reacting to unpredictable user behavior could be a strong use case.

References

  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In International Conference on Machine Learning, pp. 214–223.
  • [2] J. Canny (1986) A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8 (6), pp. 679–698.
  • [3] I. J. Goodfellow (2015) On distinguishability criteria for estimating generative models. In International Conference on Learning Representations.
  • [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K. Q. Weinberger (Eds.), Vol. 27.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
  • [6] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • [7] S. Jenni and P. Favaro (2019) On stabilizing generative adversarial training with noise. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 12137–12145.
  • [8] E. Karsenti (2008) Self-organization in cell biology: a brief history. Nature Reviews Molecular Cell Biology 9 (3), pp. 255–262.
  • [9] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. arXiv:1411.1784.
  • [10] A. Mordvintsev, E. Randazzo, E. Niklasson, M. Levin, and S. Greydanus (2020) Thread: Differentiable self-organizing systems. Distill 5 (8), e27.
  • [11] A. Mordvintsev, E. Randazzo, E. Niklasson, and M. Levin (2020) Growing neural cellular automata. Distill 5 (2), e23.
  • [12] E. Niklasson, A. Mordvintsev, E. Randazzo, and M. Levin (2021) Self-organising textures. Distill 6 (2), e00027.003.
  • [13] A. Radford, L. Metz, and S. Chintala (2016) Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations.
  • [14] E. Randazzo, A. Mordvintsev, E. Niklasson, M. Levin, and S. Greydanus (2020) Self-classifying MNIST digits. Distill 5 (8), e00027.002.
  • [15] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training GANs. In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.), Vol. 29.
  • [16] M. Sandler, A. Zhmoginov, L. Luo, A. Mordvintsev, E. Randazzo, and B. A. y Arcas (2020) Image segmentation via cellular automata. arXiv:2008.04965.
  • [17] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, Y. Bengio and Y. LeCun (Eds.).
  • [18] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In 2017 IEEE International Conference on Computer Vision (ICCV).

Appendix A Additional Images

As videos cannot be included in a PDF, Figure 8 shows the output of every frame for 36 frames. Note that after around frame 24 the NCA has finished creating the emoji and keeps the image stable for the following frames. Additional frames could be added, but they show no or only minor changes.

Figure 9 shows the output of the NCA on a different dataset. The training images look very close to the ground truth images, as the NCA was able to overfit on the training data. On the validation set, the performance decreases drastically. For example, the first image shows a tree, which the NCA mistakenly colored yellow instead of green. Because the NCA does not know the color of a tree, as no trees are present in the training set, this behavior is to be expected. Interestingly, the overall shape of each image stays consistent and only wrong colors are chosen. This behavior is consistent with the training task of the NCA, as it needs to color some object in some specific way and keep it stable.

Figure 8: Output of an NCA frame by frame for a single emoji, showing 36 frames.
Figure 9: Output of the NCA after training on 50 different emojis from the “nature” category. The input (Edge) produces the output of the NCA (NCA) and is compared to the ground truth (GT).

Additional outputs for training, validation and out-of-distribution images are provided in Figures 10, 11 and 12. In each figure, the first column of images shows the input edge image, the GANCA output, the output of the NCA and the ground truth, if it exists. Similar to the example on “nature” images, the validation results for the NCA trained in a supervised fashion are missing important color information, for the same reason. Furthermore, the performance of the GANCA on the training and validation images cannot be directly compared to the standard supervised NCA architecture, because with unsupervised training the GANCA was never trained to replicate the ground truth images.

Figure 10: Additional outputs for the training dataset.

Appendix B Additional Training Details

The additional graph in Figure 13 shows the L2 loss when training an NCA in a supervised fashion on 50 different emojis. After around 1k steps with a batch size of 16, the validation loss stops improving. At this point, most images consist of a yellow mush, resulting in a decent average loss. The training loss keeps improving over the following steps, while the validation loss remains mostly stable.

In Figure 5, the training losses of two different GANCA architectures are compared. The improved GANCA uses the techniques introduced in Section 4.2.

Figure 11: Additional outputs for the validation dataset
Figure 12: Additional outputs for the out-of-distribution dataset.
Figure 13: Training graph of the standard NCA learning to grow 50 different emoji faces. The loss used is the L2 loss.