Active Divergence with Generative Deep Learning – A Survey and Taxonomy

07/12/2021 ∙ by Terence Broad, et al.

Generative deep learning systems offer powerful tools for artefact generation, given their ability to model distributions of data and generate high-fidelity results. In the context of computational creativity, however, a major shortcoming is that they are unable to explicitly diverge from the training data in creative ways and are limited to fitting the target data distribution. To address these limitations, there have been a growing number of approaches for optimising, hacking and rewriting these models in order to actively diverge from the training data. We present a taxonomy and comprehensive survey of the state of the art of active divergence techniques, highlighting the potential for computational creativity researchers to advance these methods and use deep generative models in truly creative systems.


Introduction

Generative deep learning methods, and in particular deep generative models, have become very powerful at producing high-quality artefacts and have garnered a huge amount of interest in the machine learning, computer graphics and audio signal processing communities. In addition, because they are capable of producing artefacts of high cultural value, they are also of interest to artists and to those developing creativity support tools.

One of the main goals of researchers in computational creativity, and of artists and others using generative deep learning systems, is to find ways to get generative models to produce novel outcomes that diverge from the training data. In some respects, attempting to create a generative model that does not model the training data is an oxymoron, as by definition a generative model must model some existing data distribution. However, generative neural networks are powerful tools with the unique capability of learning to render entire distributions of complex high-dimensional data with ever-increasing fidelity. It is no wonder, then, that a large number of approaches have been developed to tweak, manipulate and optimise these models in order to actively diverge from the training data, or from any existing data distribution.

The term active divergence (berns2020bridging) describes methods for utilising generative deep learning in ways that do not simply reproduce the training data. Such methods have been developed within the field of computational creativity, but active divergence is also a goal commonly shared by neighbouring communities, such as those building creativity support tools, and artists, researchers and other practitioners publishing and sharing results under the ‘CreativeAI’ banner (cook2018neighbouring). This paper offers a comprehensive survey and taxonomy of the state of the art with respect to methods developed across these fields.

Additionally, this paper outlines some of the possible applications and identifies key opportunities for computational creativity research to advance active divergence methods beyond tricks and hacks, towards more automated and autonomous creative systems. Many of the research directions presented are still nascent, and much work remains to be done with regard to evaluating and benchmarking these methods. Better ways of measuring and evaluating these techniques will go a long way towards advancing understanding and allowing more creative responsibility to be handed over to the systems. The comparative account of the methods, use-cases and future research directions for active divergence is offered as a resource to inform future research in generative deep learning tools and systems that take creative leaps beyond reproducing the training data.

Technical Overview

While not all generative models rely on generative deep learning, we refer here to those that build on artificial neural networks (for further reading, a comprehensive overview of generative models is given in harshvardhan2020comprehensive). Given a data distribution $p_{data}$, a generative model will model an approximate distribution $p_{model}$. The parameters for the approximate distribution can be learned by an artificial neural network. This learning task is tackled differently by different architectures and training schemes. For example, autoencoders (rumelhart1985learning) and variational autoencoders (VAE) (kingma2013auto; rezende2014stochastic) learn to approximate the data through reconstruction via an encoding and a decoding network, while generative adversarial networks (GAN) (goodfellow2014generative) consist of a generator that is guided by a discriminating network. In most cases, the network learns a mapping from a lower-dimensional latent distribution to the complex high-dimensional feature space of a domain. The model thus generates a sample $x$ given an input vector $z$, which should resemble samples drawn from the target distribution $p_{data}$. In the simplest case of a one-layer network, the sample is generated using the function $x = \sigma(Wz + b)$, where $z$ is the input vector from the latent distribution $p_z$, $\sigma$ is a non-linear activation function, and $W$ and $b$ are the learned association matrix and bias vector for generating samples in the approximate distribution $p_{model}$. The model parameters $W$ and $b$ are typically learned through a gradient-based optimisation process. In this process, a loss function will require the model to maximise the likelihood of the data either: (i) explicitly, as in the case of autoencoders, autoregressive (frey1996does) and flow-based generative models (dinh2014nice); (ii) approximately, as is the case in VAEs; or (iii) implicitly, as in the case of GANs. Generative models can also be conditioned on labelled data. In the conditional case, the generative model takes two inputs $z$ and $y$, where $y$ represents the class label vector. Another form of conditional generative model are translation models, such as pix2pix (isola2017image), which take a (high-dimensional) data distribution $p_A$ as input and learn a mapping to $p_{model}$, which is an approximation of the true target distribution $p_B$.
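The one-layer generator described in this section, $x = \sigma(Wz + b)$, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the logistic sigmoid stands in for $\sigma$, the parameters are random rather than learned, and conditioning by concatenating $z$ with a one-hot label $y$ is an assumption (one common choice, not the only one).

```python
import math
import random

def sigmoid(v):
    """Element-wise logistic activation, standing in for the non-linearity sigma."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def generate(W, b, z):
    """One-layer generator: x = sigma(W z + b)."""
    return sigmoid([wz + bi for wz, bi in zip(matvec(W, z), b)])

def generate_conditional(W, b, z, y):
    """Conditional variant (assumption): feed the concatenation [z; y] as the input."""
    return generate(W, b, z + y)

rng = random.Random(0)
latent_dim, out_dim, n_classes = 4, 6, 3
# Randomly initialised parameters; in practice W and b would be learned.
W = [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(out_dim)]
b = [0.0] * out_dim
z = [rng.gauss(0, 1) for _ in range(latent_dim)]  # z ~ p_z = N(0, I)
x = generate(W, b, z)

# Conditional generation: the weight matrix takes latent_dim + n_classes inputs.
Wc = [[rng.gauss(0, 1) for _ in range(latent_dim + n_classes)] for _ in range(out_dim)]
y = [0.0, 1.0, 0.0]  # one-hot class label vector
xc = generate_conditional(Wc, b, z, y)
```

Training would adjust `W` and `b` so that samples like `x` resemble $p_{data}$; active divergence methods instead look for parameters, or interventions, that move the outputs away from it.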

All deep generative models, and in particular ones that generate high-dimensional data domains like images, audio and natural language, will have some level of divergence between the target distribution $p_{data}$ and the approximate distribution $p_{model}$, because of the complexity and stochasticity inherent in high-dimensional data. The goal of all generative models is to minimise that level of divergence by maximising the likelihood of generating the given data domain. Active divergence methods, however, intentionally seek to create a new distribution $p_{new}$ that does not directly approximate a given distribution $p_{data}$, or resemble any other known data distribution. This is done either by seeking model parameters $W'$ and $b'$ (in the single-layer case) that generate novel samples $x' = \sigma(W'z + b')$, or by making other kinds of interventions to the chain of computations.

Survey of Active Divergence Methods

We present a comprehensive overview and taxonomy of the state of the art in methods for achieving active divergence. In this survey, we use the term divergence in the statistical sense, as the distance (or difference) between two distributions. There are other definitions of divergence relevant to research in creativity, such as Guilford’s dimensions of divergent thought (hocevar1980intelligence). While some parallels can be drawn between some of the active divergence methods and theories of divergent thinking, for clarity of technical exposition we stick strictly to the statistical definition of divergence in this overview of active divergence methods.
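To make the statistical sense of "divergence" concrete, the sketch below computes the Kullback–Leibler divergence between small discrete distributions. The survey does not commit to one particular divergence, so KL is used here purely as a familiar example; the three distributions are made-up toy values.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length probability lists.

    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0 the divergence is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

p_data = [0.5, 0.5, 0.0]     # toy "true" distribution
p_model = [0.45, 0.45, 0.10]  # a close approximation
p_new = [0.1, 0.1, 0.8]       # a distribution that actively diverges

print(kl_divergence(p_data, p_model))  # ≈ 0.105
print(kl_divergence(p_data, p_new))    # ≈ 1.609
```

KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; other divergences (Jensen–Shannon, Wasserstein) would serve the same illustrative purpose.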

(a) divergent fine-tuning
(b) chaining models
(c) network bending
(d) network blending
Figure 1: Some visual examples of results produced using various active divergence methods. (a) An image from Strange Fruit by Mal Som (som2020strange), created by fine-tuning a pre-trained model towards a continuously shifting domain. (b) A frame from the video artwork You Are Here by Derrick Schultz (schutlz2020you), created by chaining multiple models and techniques, including a custom GAN, network bending, image translation and super-resolution. (c) An image from the series of artworks Teratome (broad2020teratome), created using network bending techniques (broad2021network). (d) An example of network blending (pinkney2020interpolation), where the image has been generated from a model that combines the photorealistic textures of the FFHQ StyleGAN2 model with the spatial structure of a model trained on a Ukiyo-e dataset (pinkney2020aligned). All images are reproduced with permission from their respective creators.

Novelty search over learned representations

Methods in this category take existing generative models trained using standard maximum likelihood regimes and then specifically search for the subset of learned representations that do not resemble the training data, by systematically sampling from the model (an overview of methods for sampling generative models is given in white2016sampling). Since any approximate distribution $p_{model}$ will be somewhat divergent from the true distribution $p_{data}$, these methods seek to find the subset of the approximate distribution $p_{model}$ that is not contained in the true distribution $p_{data}$. kazakcci2016digits present an algorithm for searching for novelty in the latent space of a sparse autoencoder trained on the MNIST dataset (lecun1998gradient). They start from a sample of random noise and use a Markov chain Monte Carlo (MCMC) method that iteratively re-encodes the sample through the autoencoder, refining it until it produces a stable representation. They use this approach to map out all the representations the model can generate, then perform k-means clustering on the latent-space encodings of these representations. By disregarding clusters that correspond to real digits, they are left with clusters of representations of digits that do not exist in the original data distribution. It has been argued that these ‘spurious samples’ are the inevitable outcome of generative models that learn to generalise from given data distributions (kegl2018spurious), and that there is a trade-off between the ability to generalise to every mode in the dataset and the ratio of spurious samples in the resulting distribution.
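The clustering-and-filtering step of this kind of novelty search can be sketched as follows. This is a toy under stated assumptions: the 2-D points are synthetic stand-ins for latent codes of stable samples (the MCMC re-encoding step is not reproduced), and "corresponds to a real digit" is approximated by a hypothetical distance threshold to known-class centroids.

```python
import math
import random

def farthest_point_init(points, k):
    """Deterministic initialisation: start at points[0], then add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: mean of the points assigned to each centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(coords) / len(members) for coords in zip(*members))
    return centroids, labels

def novel_clusters(centroids, known_centroids, threshold):
    """Keep clusters whose centroid is farther than `threshold` from every known-class centroid."""
    return [i for i, c in enumerate(centroids)
            if all(math.dist(c, kc) > threshold for kc in known_centroids)]

# Synthetic latent codes: two blobs for "known" classes and one blob the data never contained.
rng = random.Random(1)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in centres for _ in range(30)]
known = [(0.0, 0.0), (5.0, 0.0)]

centroids, labels = kmeans(points, k=3)
novel = novel_clusters(centroids, known, threshold=1.0)
```

Here the single surviving cluster is the one centred near (0, 5), i.e. the region of the model's output space that matches no known class.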

Novelty generation from an inspiring set

The methods in this section train a model from scratch using a training dataset, but do not attempt to model the data directly, instead using it as reference material to draw inspiration from. We therefore refer to this training set (the given distribution $p_{data}$) as the inspiring set (ritchie2007some).

One approach to novel glyph generation utilises a class-conditional generative model trained on the MNIST dataset (lecun1998gradient), but trains the model with ‘hold-out classes’ (cherti2017out): additional classes that do not exist in the training data distribution. These hold-out classes can then be sampled during inference, and they encapsulate the subset of the approximate distribution $p_{model}$ that is not included in the target distribution $p_{data}$. These divergent samples can then be generated directly by conditioning the generator on the hold-out class label, without the need to search the latent space.

An approach that directly generates a new distribution from an inspiring set is the creative adversarial network (CAN) algorithm (elgammal2017can). The algorithm uses the WikiArt dataset (saleh2016large), a labelled dataset of paintings classified by ‘style’ (historical art movement). It draws inspiration from the GAN training procedure (goodfellow2014generative), but adapts it such that the discriminator has to classify real and generated samples by style, and the generator is optimised to maximise the likelihood of its results being classified as ‘artworks’ (samples that fit the training distribution of existing artworks) while maximising their deviation from existing styles, in order to produce the novel distribution $p_{new}$.
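The two competing pressures on the CAN generator can be sketched as a combined loss over the discriminator's outputs for one generated image. This is a simplified stand-in, not CAN's exact formulation: the "deviate from existing styles" term is written as a cross-entropy between a uniform target over the $K$ styles and the style classifier's posterior, and `can_generator_loss` and its `weight` parameter are hypothetical names.

```python
import math

EPS = 1e-12  # guards against log(0)

def style_ambiguity_loss(style_probs):
    """Cross-entropy between a uniform target over K styles and the style posterior.

    Minimal (value log K) when the classifier is maximally unsure which existing
    style the generated image belongs to.
    """
    k = len(style_probs)
    return -sum((1.0 / k) * math.log(p + EPS) for p in style_probs)

def can_generator_loss(p_art, style_probs, weight=1.0):
    """Hypothetical combined objective: be judged 'art' (high p_art) yet style-ambiguous."""
    art_term = -math.log(p_art + EPS)
    return art_term + weight * style_ambiguity_loss(style_probs)

uniform = [0.25] * 4
confident = [0.97, 0.01, 0.01, 0.01]
print(style_ambiguity_loss(uniform))    # ≈ log 4, the minimum
print(style_ambiguity_loss(confident))  # larger: the image clearly fits one style
```

By Gibbs' inequality the ambiguity term is smallest when the posterior is uniform, which is what pushes generated images away from any single existing style.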

Training without data

Training a model from a random initialisation without any training data almost certainly guarantees novelty in the resulting generated distribution. Existing approaches all rely on the dynamics between multiple models to produce emergent behaviours through which novel data distributions can be generated.

Multi-generator dynamics

broad2019searching present an approach to training generative deep learning models without any training data, by using two generator networks and relying on the dynamics between them for an open-ended optimisation process. This approach takes inspiration from the GAN framework, but instead of a generator mimicking real data, two generators attempt to mimic each other while the discriminator attempts to tell them apart. In order to have some level of diversity in the final results, the two generators are simultaneously trying to produce more colours in the generated output than the other generator network, leading to the generation of two novel, yet closely related, distributions $p_{new}^{(1)}$ and $p_{new}^{(2)}$.
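The "more colours than the other generator" pressure can be illustrated with a simple colour-counting score. This is a hypothetical, non-differentiable proxy for illustration only. A trainable system needs a differentiable signal, and the exact mechanism used by broad2019searching is not described here. Images are assumed to be lists of RGB tuples with channels in [0, 1].

```python
def count_colours(image, levels=8):
    """Count distinct colours after quantising each RGB channel into `levels` bins.

    `image` is a list of (r, g, b) tuples with channel values in [0, 1].
    """
    def q(v):
        return min(int(v * levels), levels - 1)
    return len({(q(r), q(g), q(b)) for r, g, b in image})

def colour_reward(image_a, image_b):
    """Positive when generator A's sample uses more distinct colours than generator B's."""
    return count_colours(image_a) - count_colours(image_b)

sample_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
sample_b = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0)]  # both pixels fall into the same bin
print(colour_reward(sample_a, sample_b))  # 2
```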

Generation via communication

An alternative approach to generating without data uses a single generator network, and uses the generated distribution as a channel for communication between two networks, which together learn to generate and classify images that represent numerical and textual information from a range of existing datasets (simon2019dimensions). In subsequent work, by constraining the generator with a strong inductive bias for generating line drawings, this approach can be utilised for novel glyph generation (park2020generating).

Divergent fine-tuning

Divergent fine-tuning methods take pre-trained models that generate an approximate distribution $p_{model}$ and fine-tune the model away from the original training data. This can be done either by optimising on new training data, or by using auxiliary models and custom loss functions. The goal is to find a new set of model parameters that generates a novel distribution $p_{new}$ that is significantly divergent from both the approximate distribution $p_{model}$ and the original distribution $p_{data}$.

Cross domain training

In cross-domain training, transfer learning is applied to a pre-trained model that generates the approximate distribution $p_{model}^{A}$, which is then trained to approximate a new data distribution $p_{data}^{B}$. This transfer learning procedure will eventually lead to the model learning a set of parameters that generate the approximate distribution $p_{model}^{B}$. However, by picking an iteration of the model mid-way through this process, a set of parameters can be found that produces a blend between the two approximate distributions $p_{model}^{A}$ and $p_{model}^{B}$, resulting in the novel distribution $p_{new}$ (schultz2020mixed). This method was discovered independently by many artists and practitioners, who were performing transfer learning with GAN models for training efficiency but noted that the iterations of the model part-way through training produced the most interesting, surprising and sometimes horrifying results (adler2020transfer; black2020noface; mariansky2020transfer; shane2020cat).
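The idea of keeping every checkpoint and hand-picking one from mid-way through transfer learning can be shown with a deliberately tiny "model". Here the model is a unit-variance Gaussian whose mean starts at the source domain and is fine-tuned towards new data by gradient descent on the negative log-likelihood. This is a toy assumption: in a real GAN the intermediate checkpoint is not a simple average of the two domains, and its interest comes from the learned features it mixes.

```python
import random
import statistics

def finetune_checkpoints(mu_source, target_data, lr=0.1, steps=50):
    """Fine-tune the mean of a unit-variance Gaussian 'model' towards new data.

    The gradient of the average negative log-likelihood w.r.t. mu is (mu - mean(data)).
    Every intermediate parameter value is kept as a checkpoint.
    """
    target_mean = statistics.fmean(target_data)
    mu = mu_source
    checkpoints = [mu]
    for _ in range(steps):
        mu -= lr * (mu - target_mean)
        checkpoints.append(mu)
    return checkpoints

rng = random.Random(0)
target = [rng.gauss(10.0, 1.0) for _ in range(1000)]  # the new domain B
ckpts = finetune_checkpoints(mu_source=0.0, target_data=target)
mid = ckpts[7]  # hand-picked iteration, roughly half-way between the two domains
```

The final checkpoint simply reproduces domain B; the artistically interesting parameters are the ones in between, which is why keeping the full history matters.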

Continual domain shift

Going beyond simply mixing two domains, one approach that gives more opportunity to steer the resulting distribution during fine-tuning is to optimise on a domain that is continually shifting. In creating the artworks Strange Fruit (som2020strange), the artist Mal Som “iterate[s] on the dataset with augmenting, duplicating and looping in generated images from previous ticks” to steer the training of the generator model (som2021personal). In this process, the target distribution $p_{data}^{(t)}$ at step $t$ may contain samples generated by the model at any previous time step $t'$, where $t' < t$. Additionally, the target distribution $p_{data}^{(t)}$ may no longer include some samples, or may contain duplicates of samples, from previous iterations of the target distribution. Using this process, the target distribution can be continually shaped and guided.
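One step of such a continually shifting dataset can be sketched as a function that produces the next target dataset from the current one. The policy below (loop in a few generated samples, duplicate some existing ones, drop others) mirrors the kinds of edits quoted above, but `shift_dataset`, its counts and its random choices are hypothetical. Som's actual process is curated by hand. Strings stand in for images.

```python
import random

def shift_dataset(dataset, generated, rng, n_add=2, n_dup=1, n_drop=1):
    """Return the next target dataset D_{t+1} from D_t (hypothetical policy).

    Loops generated samples back in, duplicates some existing samples and
    drops others, so the target distribution keeps shifting.
    """
    nxt = list(dataset)
    # Loop in a few samples generated by the current model iteration.
    nxt.extend(rng.sample(generated, min(n_add, len(generated))))
    # Duplicate some existing samples to increase their weight.
    nxt.extend(rng.choice(dataset) for _ in range(n_dup))
    # Drop some samples so they no longer appear in the target distribution.
    for _ in range(min(n_drop, len(nxt) - 1)):
        nxt.pop(rng.randrange(len(nxt)))
    return nxt

rng = random.Random(0)
d0 = [f"real_{i}" for i in range(5)]
gen = [f"gen_t0_{i}" for i in range(4)]  # samples from the model at step t = 0
d1 = shift_dataset(d0, gen, rng)         # target dataset for step t = 1
```

In a full loop, the model would be fine-tuned on `d1` for a while, new samples drawn, and the process repeated, with every intermediate model saved so that an instance just before collapse can be recovered.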

This process of modelling a continually shifting domain often leads to the (generally unwanted) phenomenon of mode collapse (thanh2020catastrophic). In Som’s practice, however, this is induced deliberately. After a model has collapsed, Som explores its previous iterations to find the last usable instance right before collapse. Som likens this practice to the artistic technique of defamiliarisation, where common things are presented in unfamiliar ways so audiences can gain new perspectives and see the world differently (som2021personal).

Loss hacking

An alternative strategy is to fine-tune a model without any training data. Instead, a loss function is used that directly transforms the approximate distribution $p_{model}$ into a novel distribution $p_{new}$ without requiring any other target distribution. broad2020amplifying use the frozen weights of the discriminator to directly optimise away from the likelihood of the data, by using the inverse of the adversarial loss function. This reverses the generator’s normal objective of generating ‘real’ data, so that it instead generates samples that the discriminator deems to be ‘fake’. By applying this process to a GAN that can produce photo-realistic images of faces, this fine-tuning procedure crosses the uncanny valley in reverse, taking images indistinguishable from real images and amplifying their uncanniness before eventually leading to mode collapse. In a similar fashion to Som’s practice (see previous sub-section), one instance of the model before mode collapse was hand-selected, and a selection of its outputs was turned into the series of artworks Being Foiled (broad2020being).
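The difference between the usual generator objective and an inverted one can be written in terms of the frozen discriminator's output $D(G(z))$, the probability that a generated sample is real. This sketch assumes the common non-saturating generator loss as the "normal" objective and shows one straightforward way to invert it. The exact form used by broad2020amplifying is not given here.

```python
import math

EPS = 1e-12  # guards against log(0)

def standard_generator_loss(d_fake):
    """Usual (non-saturating) objective: push D(G(z)) towards 1, i.e. 'real'."""
    return -math.log(d_fake + EPS)

def inverted_generator_loss(d_fake):
    """Inverted objective: push D(G(z)) towards 0, i.e. samples the frozen D deems 'fake'."""
    return -math.log(1.0 - d_fake + EPS)

def batch_loss(d_outputs, inverted=True):
    """Average loss over a batch of discriminator outputs for generated samples."""
    fn = inverted_generator_loss if inverted else standard_generator_loss
    return sum(fn(d) for d in d_outputs) / len(d_outputs)

# A sample the discriminator finds convincing is now penalised, and vice versa.
print(inverted_generator_loss(0.9) > inverted_generator_loss(0.1))  # True
```

Only the generator is updated under this loss. Because nothing anchors it to any data, fine-tuning eventually collapses, so intermediate models have to be saved and inspected.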

Infusing external knowledge

By harnessing the learned knowledge of externally trained models, it is possible to fine-tune a model so that it infuses that knowledge, transforming the original domain data with characteristics defined by the auxiliary model. broad2019transforming utilise a classifier model trained to differentiate between datasets, in conjunction with the frozen weights of the discriminator, to fine-tune a pre-trained GAN generator model away from the original distribution $p_{data}$ and towards a new local minimum defined by the loss function $\mathcal{L}$. $\mathcal{L}$ is defined as the weighted sum of the losses from the two auxiliary models, the classifier $C$ and the discriminator $D$, given the random latent vector $z$: $\mathcal{L}(z) = \lambda_C\,\mathcal{L}_C(G(z)) + \lambda_D\,\mathcal{L}_D(G(z))$, with $\lambda_C$ and $\lambda_D$ being the hyper-parameters defining the weightings of the two components of the loss function.
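A weighted two-term loss of the form $\mathcal{L}(z) = \lambda_C\,\mathcal{L}_C(G(z)) + \lambda_D\,\mathcal{L}_D(G(z))$ for a single generated sample might look like the sketch below. The concrete per-model terms are assumptions: the classifier term rewards being classified as the target dataset, and the discriminator term is assumed to reward realism. The original method may define, sign or weight these terms differently. `infusion_loss` and its default weights are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)

def infusion_loss(p_target_dataset, d_real, lambda_c=1.0, lambda_d=0.5):
    """Hypothetical weighted loss L = lambda_c * L_C + lambda_d * L_D for one sample G(z).

    p_target_dataset: classifier C's probability that G(z) belongs to the target dataset.
    d_real: frozen discriminator D's probability that G(z) is real (assumed realism term).
    """
    loss_c = -math.log(p_target_dataset + EPS)  # pull towards the auxiliary dataset's characteristics
    loss_d = -math.log(d_real + EPS)            # assumption: keep samples plausible to the original D
    return lambda_c * loss_c + lambda_d * loss_d
```

Raising `lambda_c` relative to `lambda_d` trades plausibility under the original discriminator for a stronger infusion of the target characteristics.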

The StyleGAN-NADA framework (gal2021stylegan) takes advantage of the external knowledge of a contrastive language–image pre-training model (CLIP) (radford2021learning). CLIP has been trained on around 400 million text and image pairs from the internet and provides a joint embedding space for images and text, allowing the similarity of images and text prompts to be estimated. In StyleGAN-NADA, pre-trained StyleGAN2 models (karras2019analyzing) can be fine-tuned using user-specified text prompts. The CLIP model is used to encode the text prompts and the generated samples, providing a loss function in which the cosine similarity between the CLIP encoding $E_T(t)$ of the text string $t$ and the embedding $E_I(G(z))$ of the image generated from a random latent $z$ is maximised by minimising the loss $\mathcal{L} = 1 - \cos(E_T(t), E_I(G(z)))$. This training procedure guides the generator towards infusing characteristics from an unseen domain defined by the user as text prompts.
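The cosine-based CLIP loss can be sketched without any model by operating on embedding vectors directly. The vectors below are placeholders for the outputs of CLIP's text and image encoders; no real CLIP API is called. The StyleGAN-NADA paper also describes a directional variant of this loss, which is not shown. This sketch only implements the plain form $1 - \cos(E_T(t), E_I(G(z)))$.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def clip_loss(text_embedding, image_embedding):
    """1 - cos(E_T(t), E_I(G(z))): 0 when aligned, 1 when orthogonal, 2 when opposite."""
    return 1.0 - cosine_similarity(text_embedding, image_embedding)

# Placeholder embeddings standing in for CLIP encoder outputs.
text_emb = [0.2, 0.9, 0.1]
image_emb = [0.3, 0.8, 0.0]
print(clip_loss(text_emb, image_emb))  # small: the image already matches the prompt fairly well
```

Because the loss depends only on direction, rescaling either embedding leaves it unchanged. This is why CLIP embeddings are usually compared by cosine similarity.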

Chaining models

An approach that is widely used by artists who incorporate generative models into their practice, but is not well documented in academic literature, is the practice of chaining multiple custom models trained on datasets curated by the artists. These ensembles often utilise standard unconditional generative models, such as GANs, in combination with conditional generative models, for example image-to-image translation networks such as pix2pix (isola2017image) and CycleGAN (zhu2017unpaired), along with other approaches for altering the aesthetic outcomes, such as style transfer (gatys2016neural). Artists will often train many models on small custom datasets and test out many combinations of different models, with the aim of finding a configuration that produces unique and expressive results. The artist Helena Sarin often chains multiple CycleGAN models into one ensemble and reuses training data during inference, as the goal of this practice “is not generalization, my goal is to create appealing art” (sarin2018playing). The artist Derrick Schultz draws parallels between the practice of chaining models and Robin Sloan’s concept of ‘flip-flopping’ (schultz2021personal), where creative outcomes can be achieved by “pushing a work of art or craft from the physical world to the digital world and back, often more than once” (sloan2012flipflop).

Network bending

Network bending (broad2021network) is a framework that allows for active divergence using individual pre-trained models without making any changes to the weights or topology of the model. Instead, additional layers that implement standard image filters are inserted into the computational graph of a model and applied during inference to the activation maps of the convolutional features (inserting filters into GANs was also developed independently in the Matlab StyleGAN playground, pinkney2020matlab). As the computational graph of the model has been altered, the model, which previously generated samples from the approximate distribution $p_{model}$, now produces novel samples from the new distribution $p_{new}$, without any changes being made to the parameters of the model. In the simplest case of a two-layer model, there is an association weight matrix $W_l$ and bias vector $b_l$ for each layer $l \in \{1, 2\}$, and the model generates a sample $x = \sigma(W_2\,\sigma(W_1 z + b_1) + b_2)$ from the input vector $z$ using a non-linear activation function $\sigma$. In the network bending framework, a deterministic function $f_r$ (controlled by the parameter $r$) is inserted into the computational graph of the model and applied to the internal activations of the model, giving $x' = \sigma(W_2\, f_r(\sigma(W_1 z + b_1)) + b_2)$, which allows the model to produce new samples $x'$ from the new distribution $p_{new}$. Beyond the simplest case of a transformation being applied to all features in a layer, the transformation layer can also be applied to a random subset of features, or to a pre-selected set of features. broad2021network present a clustering algorithm that, in an unsupervised fashion, groups together sets of features within a layer based on the spatial similarity of their activation maps. This clustering algorithm is capable of finding sets of features responsible for the generation of various semantically meaningful components of the generated output across the network (and semantic) hierarchy. These features can then be manipulated in tandem, allowing for semantic manipulation of the internal representations of the generative model.
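The two-layer case $x' = \sigma(W_2\, f_r(\sigma(W_1 z + b_1)) + b_2)$ can be sketched directly. This is a minimal illustration, assuming ReLU for $\sigma$ and using "multiply selected activations by $r$" as the bending function $f_r$. Real network bending applies image filters (rotation, dilation, erosion and so on) to convolutional activation maps. The helper names are hypothetical. The weights are never modified: the bend is applied only to the activations flowing between layers.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def layer(W, b, v):
    """Affine layer followed by ReLU (standing in for sigma)."""
    return relu([sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)])

def scale_features(r, indices=None):
    """Deterministic bending function f_r: multiply the selected activations by r.

    indices=None bends every feature in the layer; otherwise only the listed ones.
    """
    def f(h):
        chosen = range(len(h)) if indices is None else set(indices)
        return [a * r if i in chosen else a for i, a in enumerate(h)]
    return f

def generate(params, z, bend=None):
    """x = sigma(W2 f_r(sigma(W1 z + b1)) + b2); with bend=None this is the unmodified model."""
    (W1, b1), (W2, b2) = params
    h = layer(W1, b1, z)
    if bend is not None:
        h = bend(h)  # inserted into the graph; W1, b1, W2, b2 are untouched
    return layer(W2, b2, h)

params = (([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),  # layer 1: W1, b1
          ([[1.0, 1.0]], [0.0]))                    # layer 2: W2, b2
z = [1.0, 2.0]
x = generate(params, z)                                              # [3.0]
x_ablated = generate(params, z, bend=scale_features(0.0, indices=[0]))  # [2.0]
x_amplified = generate(params, z, bend=scale_features(2.0))             # [6.0]
```

Passing a list of feature indices corresponds to bending a random or pre-selected (for example, clustered) subset of features rather than the whole layer.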

In addition to applying filters to the activation maps, it is also possible to enlarge samples by increasing the size of the activation maps and interpolating and tiling them (pouliot2020gan). The network bending framework has been extended into the domain of audio synthesis (mccallum2020network), where it has been applied to neural vocoder models using the differentiable digital signal processing (DDSP) approach (engel2020ddsp). In order to adapt the framework for the audio domain, mccallum2020network implement a number of filters that operate in the time domain, such as oscillators. Network bending has also been applied in the domain of audio-reactive visual synthesis using generative models (brouwer2020audio), with the deterministic transformations being controlled automatically using features extracted from audio analysis.
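Audio-reactive control of a bending transformation can be as simple as mapping a per-frame audio feature to the bending parameter $r$. The sketch below assumes RMS loudness as the feature and a linear mapping into a hypothetical range `[r_min, r_max]`. Actual audio-reactive systems typically use richer features (onsets, spectral bands, chroma) and smoothing over time.

```python
import math

def rms(frame):
    """Root-mean-square loudness of one audio frame (list of samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def bending_parameter(frame, r_min=1.0, r_max=3.0):
    """Hypothetical mapping from loudness to the bending parameter r, clipped to [r_min, r_max]."""
    level = min(rms(frame), 1.0)
    return r_min + (r_max - r_min) * level

# One frame of a quiet 440 Hz sine at 48 kHz: louder frames produce a stronger bend.
frame = [0.25 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
r = bending_parameter(frame)
```

Each video frame would then be rendered with the bending function parameterised by the `r` computed from the matching audio frame.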

Network blending

Blending multiple models trained on different datasets allows for more control over the combination of learned features from different domains. This can be done either by blending the predictions of the models, or by blending the parameters of the models themselves.

Blending model predictions

akten2016real present an interactive tool for text generation that allows for the real-time blending of the predicted outputs of an ensemble of long short-term memory (LSTM) models (hochreiter1997long) trained to perform next-character prediction on different text sources. A graphical user interface allows the user to dynamically shift the mixture weights of the weighted sum of the predictions of all the models in the ensemble, prior to the one-hot vector encoding that is used to determine the final predicted character.
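Blending model predictions amounts to a weighted sum of each model's next-character distribution before a character is chosen. In this sketch the per-model distributions are made-up values rather than LSTM outputs, and picking the most probable character (greedy decoding) is an assumption. The tool may instead sample from the blended distribution.

```python
def blend_predictions(distributions, weights):
    """Weighted sum of per-model next-character distributions, renormalised by the total weight."""
    total_w = sum(weights)
    blended = [0.0] * len(distributions[0])
    for dist, w in zip(distributions, weights):
        for i, p in enumerate(dist):
            blended[i] += w * p / total_w
    return blended

def pick_character(blended, vocab):
    """Greedy choice (assumption): one-hot on the most probable character."""
    best = max(range(len(blended)), key=blended.__getitem__)
    one_hot = [1 if i == best else 0 for i in range(len(blended))]
    return vocab[best], one_hot

vocab = "abc"
model_a = [0.7, 0.2, 0.1]  # e.g. a model trained on one text source
model_b = [0.1, 0.2, 0.7]  # e.g. a model trained on another
print(pick_character(blend_predictions([model_a, model_b], [1.0, 0.0]), vocab))  # ('a', [1, 0, 0])
print(pick_character(blend_predictions([model_a, model_b], [0.2, 0.8]), vocab))  # ('c', [0, 0, 1])
```

Moving a slider in the interface corresponds to changing `weights` between steps, so the generated text can drift from one source's style to another's mid-sentence.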

Blending model parameters

A number of approaches, all demonstrated with StyleGAN2 (karras2019analyzing), take advantage of the large number of pre-trained models that have been shared on the internet (pinkney2020awesome). Of these, almost all have been transfer-learned from the official model weights trained on the Flickr-Faces-HQ (FFHQ) dataset. It has been shown that the parameters of models transfer-learned from the same original source share commonalities in the way their weights are structured. This makes it possible to meaningfully interpolate between the parameters of the models directly (aydao2020interp). Given the parameters $\theta_A$ and $\theta_B$ of two such models, an interpolation weighting $\alpha \in [0, 1]$ controls the creation of a new set of parameters $\theta_{new} = (1 - \alpha)\,\theta_A + \alpha\,\theta_B$.
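Parameter interpolation $\theta_{new} = (1 - \alpha)\,\theta_A + \alpha\,\theta_B$ is a per-parameter linear blend between two models with identical architectures. In this sketch a model is assumed to be a dictionary mapping parameter names to flat lists of floats. A real StyleGAN2 checkpoint would hold tensors, but the arithmetic is the same.

```python
def interpolate_params(theta_a, theta_b, alpha):
    """theta_new = (1 - alpha) * theta_A + alpha * theta_B, per named parameter.

    Both models must share the same architecture (same names and shapes), which
    holds when they were transfer-learned from a common source model.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("models must have identical parameter names")
    return {name: [(1 - alpha) * a + alpha * b for a, b in zip(theta_a[name], theta_b[name])]
            for name in theta_a}

theta_faces = {"conv1.weight": [0.0, 2.0], "conv1.bias": [1.0]}
theta_art = {"conv1.weight": [2.0, 4.0], "conv1.bias": [3.0]}
theta_mix = interpolate_params(theta_faces, theta_art, alpha=0.25)
```

The sketch checks only that parameter names match. That two models share an origin, which is what makes the blend meaningful rather than noise, cannot be verified from shapes alone.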

Layers can also be swapped from one model to another (pinkney2020interpolation), allowing the combination of higher-level features of one model with lower-level features of another. This layer-swapping technique was used to create the popular ‘toonification’ method, which finds the counterpart of a real photograph of a person in a Disney-Pixar-esque ‘toonified’ model, simply by sampling from the same latent vector that has been found as the closest match to the person in the FFHQ latent space (abdal2019image2stylegan). A generalised approach that combines both weight interpolation and layer-swapping methods for multiple models uses a cascade of different interpolation weightings for the various layers of the model (arfafax2020barycentricnotebook).
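Layer swapping and per-layer interpolation can share one implementation: give every layer its own weighting $\alpha_l$, and treat swapping as the special case where each $\alpha_l$ is 0 or 1. The sketch assumes each model is a list of layers ordered from coarse (low resolution) to fine (high resolution), with each layer a flat list of floats. `blend_layers` and `swap_schedule` are hypothetical helpers.

```python
def blend_layers(layers_a, layers_b, alphas):
    """Per-layer interpolation: alpha_l = 0 keeps model A's layer, 1 takes model B's.

    Layer swapping is the special case where every alpha is exactly 0 or 1.
    """
    if not len(layers_a) == len(layers_b) == len(alphas):
        raise ValueError("models and schedule must have the same number of layers")
    return [[(1 - al) * a + al * b for a, b in zip(la, lb)]
            for la, lb, al in zip(layers_a, layers_b, alphas)]

def swap_schedule(n_layers, swap_at):
    """Coarse layers (index < swap_at) from model A, finer layers from model B."""
    return [0.0 if l < swap_at else 1.0 for l in range(n_layers)]

structure_model = [[1.0], [1.0], [1.0]]  # e.g. supplies coarse spatial structure
texture_model = [[5.0], [5.0], [5.0]]    # e.g. supplies fine textures
swapped = blend_layers(structure_model, texture_model, swap_schedule(3, swap_at=1))
cascade = blend_layers(structure_model, texture_model, [0.0, 0.5, 1.0])
```

A smooth schedule such as `[0.0, 0.5, 1.0]` avoids the hard seam a strict swap can leave at the transition layer.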

colton2021evolving present an evolutionary approach for exploring and finding effective and customisable neural style transfer blends. Upwards of 1000 neural style transfer models, each trained on 1–10 style images, can be blended through model interpolation, using an interface that is controlled by the user. MAP-Elites (mouret2015illuminating), in combination with a fitness function calculated using the output of a ResNet model (he2016deep), was used in evolutionary searches for optimal neural style transfer blends.
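A minimal MAP-Elites loop over blend weights illustrates the kind of search described above. This is a sketch under stated assumptions. The genome is a normalised vector of blend weights (one per style model), and both the fitness function and the one-dimensional behaviour descriptor are hypothetical placeholders. In the original work, fitness was derived from ResNet outputs. The archive keeps the best genome found in each descriptor bin.

```python
import random

def map_elites(n_models, fitness, descriptor, bins=4, iterations=500, seed=0):
    """Minimal MAP-Elites over blend-weight vectors (one weight per style model).

    fitness(w) -> float and descriptor(w) -> float in [0, 1] are user-supplied.
    Returns an archive mapping descriptor bin -> (fitness, genome).
    """
    rng = random.Random(seed)
    archive = {}

    def normalise(w):
        s = sum(w)
        return [x / s for x in w]

    def try_insert(w):
        cell = min(int(descriptor(w) * bins), bins - 1)
        f = fitness(w)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, w)

    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite with small Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            child = [max(1e-6, x + rng.gauss(0, 0.1)) for x in parent]
        else:
            # Occasionally sample a fresh random blend.
            child = [rng.random() + 1e-6 for _ in range(n_models)]
        try_insert(normalise(child))
    return archive

def fitness(w):
    """Hypothetical fitness: prefer blends that weight models 1 and 2 equally."""
    return -abs(w[1] - w[2])

def descriptor(w):
    """Hypothetical behaviour descriptor: the share of the blend taken from model 0."""
    return w[0]

archive = map_elites(3, fitness, descriptor)
```

Unlike a single-objective optimiser, the archive returns a spread of good blends across the descriptor range, which suits a user-facing tool that offers several distinct options.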

Model rewriting

Model rewriting encompasses approaches where either the weights or the network topology are altered in a targeted way, through manual intervention or by using some form of heuristic-based optimisation algorithm.

Stochastic rewriting

To create the series of artworks Neural Glitch, the artist Mario Klingemann randomly altered, deleted or exchanged the trained weights of pre-trained GANs (klingemann2018neural). In a similar fashion, the convolutional layer reconnection technique (ruzika2020gan) randomly swaps convolutional features within layers of pre-trained GANs. This technique is applied in the Remixing AIs audiovisual synthesis framework (collins2020remixing).
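Randomly swapping convolutional features within a layer can be sketched as permuting the per-feature weight entries of that layer. Each entry is assumed to hold one output feature's (flattened) filter weights. Because the following layer still expects the original feature order, a swap effectively rewires which features feed which downstream filters. The function name and data layout are hypothetical.

```python
import random

def reconnect_features(layer_weights, n_swaps, seed=0):
    """Randomly swap pairs of output features (rows) within one layer's weights.

    layer_weights: list of per-feature weight lists, one entry per convolutional
    feature. Returns a new list and leaves the original weights unchanged.
    """
    rng = random.Random(seed)
    rewired = [list(w) for w in layer_weights]
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(rewired)), 2)  # two distinct feature indices
        rewired[i], rewired[j] = rewired[j], rewired[i]
    return rewired

original = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]]
rewired = reconnect_features(original, n_swaps=1)
```

The set of learned filters is preserved exactly. Only their wiring changes, which tends to produce glitch-like yet still structured outputs rather than pure noise.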

Targeted rewriting

bau2020rewriting present a targeted approach to model rewriting. Here, a sample is taken from the model and manipulated using standard image editing techniques (referred to as a ‘copy-paste’ interface). Once the sample has been altered corresponding to the desired goal (such as removing watermarks from the image, or getting horses to wear hats), a process of constrained optimisation is performed. All of the layers but one are frozen, and the weights of that layer are updated using gradient descent optimisation until the generated sample matches the new target. After this optimisation process is complete, the weights of the model are modified such that the targeted change becomes present in all the samples that the model generates.
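The "freeze everything but one layer and optimise it until the sample matches the edited target" idea can be shown on a toy two-layer linear model. This is a simplified stand-in: bau2020rewriting use a constrained (rank-one) update with a key–value view of the layer, whereas this sketch runs unconstrained gradient descent on a single example. All names and values are hypothetical.

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rewrite_layer(W1, W2, z, target, lr=0.05, steps=500):
    """Update only W2 so that W2 (W1 z) matches `target`; W1 stays frozen.

    Minimises 0.5 * ||W2 h - target||^2 with h = W1 z by plain gradient descent.
    """
    h = matvec(W1, z)               # features from the frozen upstream layer
    W2 = [list(row) for row in W2]  # work on a copy of the layer being rewritten
    for _ in range(steps):
        out = matvec(W2, h)
        err = [o - t for o, t in zip(out, target)]
        # dL/dW2[i][j] = err[i] * h[j]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                W2[i][j] -= lr * e * hj
    return W2

W1 = [[1.0, 0.0], [0.0, 1.0]]  # frozen layer
W2 = [[0.0, 0.0], [0.0, 0.0]]  # layer to rewrite
z = [1.0, 2.0]
target = [3.0, -1.0]           # the user-edited version of the sample for this z
W2_new = rewrite_layer(W1, W2, z, target)
```

With a single example the update is underdetermined. Constraining the change, as the original method does, is what makes the edit generalise to other samples instead of only memorising this one.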

The CombiNets framework (guzdial2018combinets), informed by prior research in combinational creativity (boden2004creative), can be used to create a new model by combining parameters from a number of pre-trained models in a targeted fashion. The parameters of existing models are recombined to account for a new mode of generation that was not present in the training data (the example given is a unicorn for a model trained on photographs of non-mythical beings). In this framework, a small number of new samples is provided (not enough to train a model directly), and heuristic search is then used to recombine parameters from existing models to account for this new mode of generation.

Further Demarcations

In this section, we highlight demarcations that can be used to classify methods for active divergence. The following categories serve as criteria for further discussion and method comparison.

Training from scratch vs. using pre-trained models

Finding stable, effective ways of training generative models, in particular GANs, is difficult and, depending on the training scheme, there are only a handful of methods that have been found to work successfully. Few methods for active divergence train a model completely from scratch. Instead, most take pre-trained models as their starting point for interventions. This way, training from scratch can be avoided, but fine-tuning may still be required.

Utilising data vs. dataless approaches

Most of the approaches described utilise data in some way, whether as an inspiring set for novelty generation or for combining features from different datasets (divergent fine-tuning, network blending and chaining models). Even methods for model rewriting use very small amounts of example data to guide optimisation algorithms that alter the model weights. However, methods like network bending show how models can be analysed in ways that do not rely on any data and then used for intelligent manipulation of the models, an approach that could be applied to other methods such as model rewriting. Methods that train and fine-tune models without data also show how auxiliary networks and the dynamics between models can be utilised for achieving active divergence.

Human direction vs. creative autonomy

Very few of the approaches described have been developed with the expressed intention of handing over creative agency to the systems themselves. Most of the methods have been developed by artists or researchers in order to allow people to manipulate, experiment with and explore the unintended uses of these models for creative expression. However, the methods described that are currently designed for, or rely on a high degree of human curation and intervention, could easily be adapted and used in co-creative or autonomous creative systems in the future (berns2021automating).

Applications of Active Divergence

In this section we outline some of the applications for active divergence methods.

Novelty generation

Generative deep learning techniques are capable of generalisation, such that they can produce new artefacts of high typicality and value, but are rarely capable of producing novel outputs that do not resemble the training data. Active divergence techniques play an important role in getting generative deep learning systems to generate truly novel artefacts, especially when there may be limited or even no data to draw from.

Creativity support and co-creation

Some of the frameworks presented are already explicitly designed as creativity support tools, such as the network bending framework, which was designed to allow for expressive manipulation of deep generative models. The Style Done Quick application (colton2021evolving), in which many style transfer models have been evolved, was built as a casual creator application (compton2015casual). Though many of the other methods described are still preliminary artistic and research experiments, there is a lot of potential for these methods to become better understood and eventually to be adapted and applied in more accessible creativity support tools and co-creation frameworks.

Knowledge recombination

Reusing and recombining knowledge efficiently is an important use case of active divergence methods. While impressive generalisation can be achieved by extremely large models trained on corpora extracted from large portions of the internet (ramesh2021zero), training such models is beyond the capabilities of all but a handful of large tech companies. Instead of relying on ever-expanding computational resources, active divergence methods allow styles, aesthetic characteristics and higher-level concepts to be recombined much more efficiently. Methods like chaining models, network blending and model rewriting offer alternative routes to flexible knowledge recombination and generalisation to unseen domains without the need for extremely large models or data sources.

Unseen domain adaptation

Active divergence methods make it possible to adapt to and explore unseen domains for which little or no data is available. The network blending approach presented by pinkney2020interpolation can be used to translate faces into a completely synthesised data domain while maintaining recognisable identity, something that would not be possible with standard techniques for image translation (zhu2017unpaired).

The model rewriting and network bending approaches make it possible to reuse and manipulate existing knowledge in a controlled fashion, creating new data from a small number of given examples, or theoretically from no prior examples at all if external knowledge sources are integrated, as discussed further below. The same approach could also be used by agents that reorganise the learned knowledge in world models (ha2018worldmodels) to explore hypothetical situations or relations.

A benchmark for creativity

Generative models represent large knowledge bases that can produce high-quality artefacts. There is a lot of unexplored potential in how the information and relationships they contain can be reused and rewritten with frameworks for manipulating them, such as network bending and model rewriting. Active divergence frameworks could be good candidates for exploring and evaluating modes of creativity such as combinational creativity (boden2004creative) and conceptual blending (fauconnier2008way). These theories could inform how the features in a model are reorganised, and the result could then be evaluated by examining the artefacts generated from the altered models.

Future Research Directions

In this section we discuss possible future research directions and applications for developing, evaluating and utilising methods for active divergence.

Metrics for quantitative evaluation

For research on active divergence to advance, methods for quantitative evaluation will be critical for tracking progress, comparing techniques and benchmarking. Metrics for active divergence will have to go beyond measuring the similarity or dissimilarity between distributions, as is usually done in the evaluation of generative models (gretton2019interpretable). Active divergence metrics should contribute to a better understanding of how the distributions diverge. Therefore, various kinds of change to the modelled distribution should be taken into consideration when measuring divergence between distributions in creative contexts. These include increases or decreases in diversity, the consistency and concurrency of change across the whole distribution, and whether changes primarily affect low-level or high-level features.
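To make this distinction concrete, the following Python sketch pairs a kernel two-sample statistic with a simple diversity measure. The statistic is the squared maximum mean discrepancy (MMD) with a Gaussian kernel, which only reports how far apart two samples are, and the diversity measure is a crude ratio that indicates the direction of one kind of change. It assumes NumPy and assumes that each artefact has already been embedded as a feature vector by some feature extractor, which is not specified here. The kernel bandwidth `sigma` and the mean-pairwise-distance proxy are illustrative choices, not metrics proposed in this survey.

```python
import numpy as np


def rbf_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD: says *how much* two samples differ, not how."""
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


def mean_pairwise_distance(x: np.ndarray) -> float:
    """Average Euclidean distance between distinct samples (a crude diversity proxy)."""
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    n = x.shape[0]
    # The diagonal is zero, so dividing by n(n-1) averages over distinct pairs only.
    return float(dists.sum() / (n * (n - 1)))


def divergence_report(train_feats: np.ndarray, gen_feats: np.ndarray,
                      sigma: float = 1.0) -> dict:
    """Combine a distance-between-distributions score with a direction-of-change score."""
    return {
        "mmd2": mmd2(train_feats, gen_feats, sigma),
        # > 1: generated samples are more spread out than the training data; < 1: less.
        "diversity_ratio": mean_pairwise_distance(gen_feats)
        / mean_pairwise_distance(train_feats),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 2))            # stand-in for training-set features
    collapsed = 0.1 * rng.normal(size=(200, 2))  # a model that has lost diversity
    spread = 3.0 * rng.normal(size=(200, 2))     # a model that has gained diversity
    print(divergence_report(train, collapsed))
    print(divergence_report(train, spread))
```

Both synthetic "models" sit far from the training sample under MMD, but only the diversity ratio separates the one that has collapsed (ratio well below 1) from the one that has spread out (ratio above 1). This is the kind of additional information a metric for active divergence would need to report.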

Automating qualitative evaluation

In addition to quantitative evaluation, other metrics are needed for evaluating active divergence methods. Such metrics would let decisions about creating new models rely less on human qualitative evaluation, so that these aspects of the process can be handed over to computational systems. For instance, a recently developed metric for measuring visual indeterminacy (wang2020towards), which has been argued to be one of the key drivers of what people find interesting in GAN-generated art (hertzmann2020visual), could replace the qualitative evaluation and curation step currently done by humans. Other candidate metrics include novelty metrics (grace2019expectation), Bayesian surprise (itti2009bayesian), aesthetic evaluation (galanter2012computational), and measures of optimal blends between data domains or of the novelty of changes made to semantic relationships.
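Bayesian surprise (itti2009bayesian) is defined as the Kullback-Leibler divergence between an observer's posterior and prior beliefs after an observation. The Python sketch below computes it over a finite set of hypotheses. The hypotheses (for example, style categories that an evaluating system expects an artefact to belong to) and all probability values are hypothetical and purely illustrative. How such likelihoods would be obtained from a generated artefact is left open.

```python
import math
from typing import Sequence


def posterior(prior: Sequence[float], likelihood: Sequence[float]) -> list[float]:
    """Bayes' rule over a finite set of hypotheses: P(h | d) is proportional to P(d | h) P(h)."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [u / z for u in unnorm]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bayesian_surprise(prior: Sequence[float], likelihood: Sequence[float]) -> float:
    """Surprise = KL(posterior || prior): how far one observation moves the beliefs."""
    return kl_divergence(posterior(prior, likelihood), prior)


if __name__ == "__main__":
    # Hypothetical beliefs over three hypothetical style categories.
    prior = [0.7, 0.2, 0.1]
    # Hypothetical likelihoods of two generated artefacts under each category.
    expected_artefact = [0.8, 0.15, 0.05]    # consistent with what the observer expects
    unexpected_artefact = [0.05, 0.15, 0.8]  # points towards the least-expected category
    print(bayesian_surprise(prior, expected_artefact))
    print(bayesian_surprise(prior, unexpected_artefact))
```

An artefact that is equally likely under every hypothesis leaves the beliefs unchanged and scores zero. An artefact that shifts belief towards an initially improbable hypothesis scores higher. Surprise in this sense could therefore serve as one automated signal for curating the outputs of actively divergent models.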

Inventing new objective functions

None of the methods based on generative deep learning presented to date has been capable of inventing its own objective functions. Instead, methods such as creative adversarial networks (elgammal2017can) rely on hand-crafted variations of well-established objective functions. This will be one of the most challenging future research directions, as generative deep learning systems rely on a small handful of objectives that result in stable convergence. However, in conjunction with the development of new evaluation metrics, it may be possible to explore whole new categories of objective functions that diverge from existing data representations and produce artefacts of high value.

Utilising external knowledge

Harnessing expert knowledge external to the dataset, which may come from separate domains or from symbolic knowledge representations, will allow much more flexibility in how generative models are manipulated within combinational creativity (boden2004creative) and conceptual blending frameworks (fauconnier2008way). Combining research into the semantic purpose of features and the relationships between them with mappings of those features to external data sources or knowledge graphs would allow more flexible control of techniques that currently rely on human intervention (network bending, model rewriting). These techniques could then be controlled and manipulated computationally, allowing some creative decision making to be handed over to the computer.

Formulating and realising intentions

For many of the methods described, a system that could formulate and realise its intentions would have to be capable of sourcing and creating its own dataset. For instance, a system that wants to create a model that generates hybrids between cats and dogs would have to be capable of collecting data of cats and dogs separately, and then deciding to use a network blending method to get the desired results. Alternatively, using external knowledge sources in combination with semantic analysis of features would give computational systems more flexibility in generating new models by altering the semantic relationships between features in model rewriting or network bending approaches.

Multi-agent systems

It has been argued that the GAN framework is the simplest example of a multi-agent system (arcas2019social), and frameworks such as neural cellular automata (mordvintsev2020growing) offer new possibilities for multi-agent approaches in generative deep learning. The active divergence methods for training without data described in this paper all rely on the dynamics of multiple agents to produce interesting results, but this could be taken much further. It has been argued that art is fundamentally social (hertzmann2021social), and exploring more complex social dynamics between agents (saunders2019multi) could be a fruitful avenue in the development of these approaches. There is also a large body of work on emergent languages in co-operative multi-agent systems (lazaridou2016multi) that could inform further work on generative multi-agent systems.

Open-ended reinforcement learning

Open-ended reinforcement learning, where there is no set goal (wang2020enhanced), offers possibilities for new, more autonomous approaches to achieving active divergence. Reinforcement learning has not been discussed in this survey, but it has been used in generative settings (luo2020reinforcement) in nascent research. Reinforcement learning approaches offer many opportunities to explore frameworks of creativity that are not available to standard generative deep learning methods, because agents take actions in response to their environment rather than just fitting functions. Paradigms like intrinsic motivation (shaker2016intrinsically), cooperating or competing with other agents, and formulating and acting on intentions are all concepts that conventional generative deep learning systems alone cannot explore, but they could be explored in open-ended systems that use reinforcement learning.

Conclusion

We have presented a taxonomy and survey of the state of the art in methods for achieving active divergence, drawn from a range of sources including artistic experiments, creativity support tools and computational creativity research. Many of these methods represent nascent areas of research, and there is a lot of scope for future work using them in co-creative and automated creative systems, as they overcome a key shortcoming of mainstream generative deep learning approaches: the inability to diverge from the training data in creative ways. In addition, we have outlined a number of key future research directions needed to advance the state of the art for creativity support tools and computationally creative generative deep learning systems.

Acknowledgements

We thank our reviewers for their helpful comments. This work has been supported by the UK’s EPSRC Centre for Doctoral Training in Intelligent Games and Game Intelligence (IGGI; grants EP/L015846/1 and EP/S022325/1).

References