crea.blender: A Neural Network-Based Image Generation Game to Assess Creativity

08/13/2020 ∙ by Janet Rafner, et al. ∙ Aarhus Universitet, IT University of Copenhagen

We present a pilot study on crea.blender, a novel co-creative game designed for large-scale, systematic assessment of distinct constructs of human creativity. Co-creative systems are systems in which humans and computers (often using Machine Learning) collaborate on a creative task. This human-computer collaboration raises questions about the relevance and level of human creativity and involvement in the process. This pilot study explores aspects of these questions. We observe participants play through three different play modes in crea.blender, each aligned with established creativity assessment methods. In these modes, players "blend" existing images into new images under varying constraints. Our study indicates that crea.blender provides a playful experience, affords players a sense of control over the interface, and elicits different types of player behavior, supporting further study of the tool for use in scalable, playful creativity assessment.




1. Introduction and Related Work

Creativity is commonly understood as the combination of novelty and value (Runco and Jaeger, 2012), and is one of the most prized skills of the 21st century (PwC, 2017). Creative processes are explored extensively in the burgeoning fields of creativity support tools and co-creative systems (Lin et al., 2020; Oh et al., 2018; Sethapakdi and McCann, 2019; Frich et al., 2019). These fields face a fundamental trade-off between imposed constraints, granularity of the problem representation, and user control (Csikszentmihalyi and Sawyer, 2014; Hewett, 2005). On the one hand, low degrees of automated support leave the user in more control, but typically at the expense of requiring extensive training and/or labor to perform the fine-grained operations needed to create creative products. On the other hand, high levels of automated support may enable rapid production of creative products, but the loss of detailed user control leaves the relevance and level of human creativity and involvement in the process unclear.

In this paper, we present a new co-creative system, crea.blender, and use it to investigate whether an ML-based image generation game can provide appropriate, coarse-grained support to allow for playful and scalable assessment of human creativity.

1.1. Creativity Assessment

Established methods for measuring creativity often focus on two processes: divergent and convergent thinking. Divergent thinking (DT) commonly refers to the process of thinking flexibly and using existing knowledge to come up with new ideas and solutions (Kaufman and Sternberg, 2010; Guilford, 1968). Convergent thinking (CT) is the process of selecting which of those ideas are worth further elaboration (Kaufman and Sternberg, 2010; Guilford, 1968).

The use of games for creativity assessment is gaining traction (Shillo et al., 2019; Hart et al., 2017; Huang et al., 2010), as game-based psychometric tests have been shown to combat test anxiety and the researcher effect, thus providing cleaner data on the tested phenomenon (DiCerbo, 2014). Additionally, unlike common DT and CT tests that record only the discovered solution, games can record the process of exploration and convergence to a solution (Hart et al., 2017).

crea.blender is intended to be the centerpiece of CREA (ScienceAtHome Center for Hybrid Intelligence, 2020), an online, large-scale, game-based creativity portfolio designed in response to the call for portfolio-based assessment of creativity (Cortes et al., 2019; Reiter-Palmon et al., 2019; Acar and Runco, 2019).

1.2. ML-Supported Image Generation

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are the most widely used type of ML model for image generation (Bailey, 2020; Schneider and Rea, 2018). GANs traditionally consist of two competing models: a generator, which is trained to generate images, and a discriminator, which is trained to distinguish between real images and images created by the generator. Artists have recently started using just the trained generator to produce images. When it is used as a tool for artistic expression, one feeds an input vector into the generator, and it produces an image based on both the vector and its internal state, i.e., its weights and biases. Importantly, while the internal state of the generator directly corresponds with the features of the generated image, these features do not necessarily align with the features that a human would perceive.

For example, given a picture containing an orange ball with trees in the background, a user intending to enhance a distinct shape in the image (e.g., ‘the orange ball’) might have the frustrating experience that the system instead enhances some blurred trees in the background that the user did not even notice. The question of user control over GANs for image generation is thus fundamental in determining the system's feasibility for creative processes (Mazzone and Elgammal, 2019; Hertzmann, 2019).

2. Exploratory Study

While both manually blended images (Shklovsky, 2016) and computer-generated images (Crawford and Paglen, 2019) have been explored extensively, here we transform a co-creative image generation system into a playful game for the general public. Our game also carefully aligns with established task-based and game-based creativity assessments (Kwon et al., 1998; Beketayev and Runco, 2016; Lau and Cheung, 2010; Shillo et al., 2019; Hart et al., 2017; Huang et al., 2010). However, before we can assess DT and CT in crea.blender, we need to address a fundamental question: does crea.blender’s interface support playful, controllable, and versatile image manipulation? Concretely, we address the following questions:

  1. Player control: To what extent does the interface allow players to intentionally express creativity?

  2. Varying types of behavior: Do we see participants playing differently in different play modes?

  3. Playfulness: Does this interaction with crea.blender make users feel playful?

2.1. Presenting crea.blender

crea.blender affords creativity by letting players “blend” existing images into new images. Using BigGAN (Brock et al., 2018), which has been trained on ImageNet (Deng et al., 2009), and given sets of three to six source images, players can easily create a large number of new images by simply adjusting how much of each source image is blended in. crea.blender takes inspiration from the project Artbreeder (originally Ganbreeder), which aims to be a new type of creative tool that empowers users' creativity and collaboration (Simon, 2019). crea.blender works with one of the core aspects of creativity, constraint-based combinational creativity, which is here conceived visually as a means to achieve a creative outcome (Costello and Keane, 2000).

Figure 1. Illustration of crea.blender’s mechanics. When mixing images in crea.blender, players use sliders to indicate how much they want each image to contribute to the generated image. A vector is calculated from these weights and the underlying vector of each respective source image. This new vector is then passed into the GAN. Above, we illustrate how two source images can produce two quite different-looking images, depending on which one is weighted higher.
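The caption states only that a vector is calculated from the slider weights and each source image's underlying vector. A minimal sketch of that step follows, assuming a weighted average in which the slider values are normalized to sum to 1; the function name `blend_latents` and the toy two-dimensional vectors are illustrative assumptions, not crea.blender's code. In a BigGAN setting the result would then be passed to the trained generator, which this sketch leaves out.

```python
from typing import List, Sequence


def blend_latents(source_vectors: Sequence[Sequence[float]],
                  slider_weights: Sequence[float]) -> List[float]:
    """Blend the latent vectors of the source images into one generator input.

    Assumption: a weighted average whose weights are the slider values
    normalized to sum to 1 (the paper does not give the exact formula).
    """
    if len(source_vectors) != len(slider_weights):
        raise ValueError("need exactly one slider weight per source image")
    total = sum(slider_weights)
    if total <= 0:
        raise ValueError("at least one slider must be above zero")
    dim = len(source_vectors[0])
    blended = [0.0] * dim
    for vector, weight in zip(source_vectors, slider_weights):
        share = weight / total  # this source image's relative contribution
        for i in range(dim):
            blended[i] += share * vector[i]
    return blended


# Two toy source "latent vectors"; weighting the first one higher pulls the
# blended vector towards it, as in Figure 1.
print(blend_latents([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0]))  # [0.75, 0.25]
```

Normalizing means only the relative slider positions matter, so raising every slider by the same factor would leave the blend unchanged; whether crea.blender normalizes in this way is not stated in the paper.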

crea.blender has three modes, each designed to afford (and test) specific aspects of creativity. Given the focus of this paper, image selection, wording of instructions, timing, etc., are largely out of scope and will only be briefly described.

  1. Creatures: (Figure 2) Players are presented with six images and are asked to create and save as many different “animal-like” creatures as possible in five minutes.

  2. Challenge: (Figure 3) Players are presented with a target image and three sets of three source images. Only one set can produce the exact target image, and the players’ objective is first to determine which set was used to create the target image (up to 30 seconds), and then to recreate their closest approximation of the target image (up to three minutes). There are three levels in the Challenge mode.

  3. Open Play: (Figure 2) Players are presented with the same six source images as they used in Creatures mode. Unlike in the Creatures mode, they are asked to create any image (not just animal-like) they find interesting during five minutes of playtime.

Figure 2. Creatures and Open Play mode interface. The top row of six images are the source images that players can blend together. Below this, we see the last generated image. At the bottom, we see images that the player has saved.
Figure 3. Challenge mode interface. The source image sets are on the left and the target image is on the right. Once a set is chosen, the player blends its images as in the other modes (Figure 2).

2.2. Procedure and Data collection

For the pilot study, we recruited a convenience sample of eight participants from our institution. Participants were asked to Think Aloud (Ericcson and Simon, 1978) while playing with crea.blender. Each user session (including the follow-up user experience survey) took about 40 minutes. While the final version of crea.blender will be built in Unity for cross-platform access, we built this prototype in Python 3 using the Flask framework, and participants played it on a desktop computer with a mouse in our lab.

Participants were audio recorded, and two researchers were present and wrote observational field notes. Each time a player generated an image, crea.blender saved the image, the slider values, and a timestamp.
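A per-generation log record of the kind described above might be sketched as follows. The field names, the JSON Lines file format, and the helpers `log_generation` and `read_log` are hypothetical; the paper does not describe how the prototype stores its logs.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GenerationEvent:
    """One logged 'generate' action (hypothetical schema)."""
    player_id: str
    mode: str                    # "Creatures", "Challenge", or "Open Play"
    slider_values: List[float]   # one weight per source image
    image_path: str              # where the generated image was saved
    timestamp: float = field(default_factory=time.time)


def log_generation(event: GenerationEvent, log_path: str) -> None:
    """Append one event as a single JSON line so a session can be replayed later."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(event)) + "\n")


def read_log(log_path: str) -> List[GenerationEvent]:
    """Load all events from a JSON Lines log, in the order they were written."""
    with open(log_path, encoding="utf-8") as handle:
        return [GenerationEvent(**json.loads(line)) for line in handle if line.strip()]
```

Storing slider values alongside each image is what makes the later analyses of distance to target and step size possible without re-running the GAN.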

2.3. Results and Discussion

In the following section, we address the three themes of the paper by presenting parts of the collected data.

2.3.1. User Control

We address players’ feelings of control over crea.blender, and their ability to exercise that control, in three ways. First, we present data from the Challenge mode, in which players have to generate their closest approximation of a predetermined image. The goal-oriented nature of the task allows us to measure whether players’ interactions with crea.blender were seemingly random or directed towards the pre-specified goal. Specifically, we can see whether players get closer to the target image in a controlled, incremental way or whether they happen to stumble upon it. As a proxy for distance to the target image, Figure 4 plots, for each generated image, how far each slider is from the correct setting.

Figure 4. Players converge on the target image in Challenge Mode. We see how players consistently get closer to the target image.

Reading Figure 4 from left to right shows player progression towards the target. If a player reaches 0 on the y-axis, they have perfectly re-created the target. The orange line (Player 2) shows near-monotonic convergence towards the target image, while the blue line (Player 1) shows a more explorative progression. For all players, we see at most two ‘worsenings’ away from the target image before the player corrects their action and moves closer to it. This suggests that players recognize when they have gone off track and immediately know how to correct it, and thus are indeed, in some sense, in control during the Challenge mode.
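The distance measure and the count of ‘worsenings’ discussed above can be sketched as follows. The paper plots how far each slider is from its correct setting; collapsing the per-slider deviations into a single number by summing absolute differences (an L1 distance) is our assumption, as are the function names and the session data, which are hypothetical and not taken from the study.

```python
from typing import List, Sequence


def distance_to_target(sliders: Sequence[float], target: Sequence[float]) -> float:
    """Summed absolute deviation of each slider from its correct setting (assumed metric)."""
    return sum(abs(s - t) for s, t in zip(sliders, target))


def trajectory(settings: Sequence[Sequence[float]], target: Sequence[float]) -> List[float]:
    """Distance to the target for every image a player generated, in order."""
    return [distance_to_target(s, target) for s in settings]


def count_worsenings(distances: Sequence[float]) -> int:
    """Number of generations that moved further away from the target than the previous one."""
    return sum(1 for prev, cur in zip(distances, distances[1:]) if cur > prev)


# Hypothetical three-slider Challenge session (not study data).
target = [0.5, 0.25, 0.0]
session = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.75, 0.0], [0.5, 0.25, 0.0]]
dists = trajectory(session, target)
print(dists)                    # [0.75, 0.25, 0.5, 0.0]
print(count_worsenings(dists))  # 1
```

A distance of 0 corresponds to a perfect re-creation of the target, matching how Figure 4 is read.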

The second way in which we address user control is through data from the post-play survey. We asked players to rate on a 1-6 Likert scale how much control they felt they had at the beginning and at the end of playing, respectively. We conducted a paired t-test and calculated the effect size. The results indicate a large increase (from 2 to 3.5 on average) in how much control players felt towards the end of the task compared to the beginning (Cohen’s d = 1.3, p = 0.003).
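A paired t statistic and a paired effect size of the kind reported above can be computed as sketched below. The paper does not state which variant of Cohen's d was used; this sketch uses d_z (mean of the paired differences divided by their sample standard deviation), one common choice for within-subject designs. The ratings in the example are made up. Obtaining a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_and_dz(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Cohen's d_z) for paired before/after ratings."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)      # sample SD, n - 1 denominator
    d_z = mean_diff / sd_diff
    t = d_z * math.sqrt(len(diffs))        # same as mean_diff / (sd_diff / sqrt(n))
    return t, d_z


# Hypothetical 1-6 Likert ratings for three players (not the study's data).
t, d_z = paired_t_and_dz([1, 2, 3], [2, 4, 4])
```

Because each player rates both time points, the test works on per-player differences rather than comparing two independent groups.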

Finally, we illustrate player control and intentions through a brief vignette drawn from one player's Think Aloud transcript, logs, and our observations. In this vignette, we analyze Player 4 (P4) during Challenge mode, just after they had chosen the correct set of source images. P4 immediately turns up all three sliders, one by 0.24 and the other two by about 0.4, and generates an image that looks like a bird. P4 then says,

Okay. I can only see the color of the beak matching this thing. So let’s try. Let’s try with that. That may be totally off anyways but, I kind of have this… this is not quite a ball, but it’s maybe close and maybe if you mix a bird with a golf ball you get something close.

In this quote, P4 first attends to the target image and looks for colors in the source images that match it. P4 then attends to the shape of one of the source images, a golf ball, and hypothesizes that mixing more of that shape into the current image will bring it closer to the target image. Around one minute later, P4 produces their tenth image (Figure 5, image 10), which is quite close to the target image. P4 now focuses on features in their generated image to fine-tune their creation.

Figure 5. Subset of images produced by P4 during Challenge mode. These 12 images, paired with quotes from P4, are used to examine whether players exhibit control over crea.blender.

(#15). Yeah the left wing now is kind of… “perfect” is a bit too strong word, but it’s pretty good. And you also have this slight, slight antenna here (#16), which is kind of bit to the side (#17). No (#18), I don’t quite recall how it was made, maybe this one I didn’t touch so much. Oh, that’s getting very close (#19). I think because this is growing out… let’s go back (#20). So it looks like when this goes up a bit, this part grows out… and this part gets a bit slimmer (#21) on the right.

Importantly, P4 does not, at least explicitly, reason about transferring features from the source images, but has nonetheless acquired a sense of how different proportions of each source image affect the generated image, and uses this to successfully navigate towards a close approximation of the target image.

These two quotes illustrate a recurring theme across all players: sometimes, players would attend to features (colors, shapes, textures, etc.) in the source images, and hypothesize how mixing them together could produce the target image. Other times, players focused purely on how different slider settings affect features in the generated image. These two gestalts offered complementary perspectives, and together, they enabled participants to generate images relevant to each game mode.

2.3.2. Varying types of behavior

It is important to explore whether different game-mode prompts in crea.blender elicit different types of behavior, as each mode is tied to specific creative processes. The primary interaction in crea.blender is changing the weight of each image with its corresponding slider before generating a new image. Therefore, one approach is to look for systematic differences in the size of the slider changes players make in the different modes (Figure 6). DT is most commonly associated with an open, explorative process, whereas CT is commonly associated with iteratively narrowing in on a particular solution or idea (Guilford, 1968). Thus, we expect much larger average step sizes when creating images in the Creatures mode (a DT task) than in the Challenge mode (a CT task).

Figure 6. Cumulative histogram of slider changes for all players in the different modes.

On average, players generated 33.5 (SD), 86.25 (SD), and 43.38 (SD) images in the Creatures, Challenge, and Open Play modes, respectively. Players’ changes to sliders ranged from small (iterative) to large (explorative). Figure 6 shows the cumulative fraction of total changes in sliders for players per image generated. For instance, 78% of images in the Challenge mode were generated with a change smaller than 0.2. In contrast, 79% and 66% of images in the Creatures and Open Play modes, respectively, were made with changes larger than 0.2. A Kruskal-Wallis test (p < 0.001) and post-hoc pairwise Mann-Whitney U tests revealed small but significant differences in behavior between the Open Play and Creatures modes (effect size = 0.17, p < 0.001) and much more dramatic differences between these two modes and the Challenge mode (effect sizes = 0.66 and 0.55, respectively, p < 0.001). The latter confirms expectations from previous creativity research, whereas the former provides intriguing input for further work. This demonstration that crea.blender can indeed drive different types of behavior thus fulfills an important criterion for its suitability as a means of assessing creativity.
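The step-size analysis behind Figure 6 can be sketched as follows. The paper does not define a slider ‘change’ precisely; here we assume it is the summed absolute difference between the slider settings of consecutive generated images, and the default threshold of 0.2 mirrors the one quoted above. Function names and session data are hypothetical. The significance tests could then be run on the per-mode lists of step sizes with `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu`.

```python
from typing import List, Sequence


def step_sizes(settings: Sequence[Sequence[float]]) -> List[float]:
    """Summed absolute slider change between consecutive generated images (assumed definition)."""
    return [
        sum(abs(cur - prev) for prev, cur in zip(before, after))
        for before, after in zip(settings, settings[1:])
    ]


def fraction_below(steps: Sequence[float], threshold: float = 0.2) -> float:
    """Share of generations made with a change smaller than `threshold`."""
    if not steps:
        return 0.0
    return sum(1 for s in steps if s < threshold) / len(steps)


def cumulative_fraction(steps: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Empirical cumulative distribution of step sizes, evaluated at each threshold."""
    return [fraction_below(steps, t) for t in thresholds]


# Hypothetical three-slider sessions (not study data).
fine_tuning = [[0.5, 0.25, 0.0], [0.5, 0.375, 0.0], [0.5, 0.375, 0.125], [0.625, 0.375, 0.125]]
exploring = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
print(fraction_below(step_sizes(fine_tuning)))  # 1.0
print(fraction_below(step_sizes(exploring)))    # 0.0
```

A CT-like session concentrates its mass at small step sizes, so its cumulative curve rises early, while a DT-like session's curve rises only at larger thresholds.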

2.3.3. Playfulness

As a final investigation of crea.blender's suitability as a basis for future creativity research, we turn to the question of perceived playfulness, which is essential for large-scale adoption of the game portfolio. Players were asked to rate, on a 1-6 Likert scale, how playful they felt overall throughout the game. The mean rating was 4.375, with a mode of 4. No one rated below 3, and two of the eight players gave a rating of 6. These data suggest that crea.blender provides a playful experience, and our observations of gameplay have provided indications of how this can be improved.

3. Conclusion and Outlook

From this pilot study, we conclude that ML-assisted image generation provides a promising playground for research in creativity assessment. In our data analysis, we found that players were able to intentionally create images with crea.blender, that the game encouraged different uses depending on the creative constraints of each mode, and that players by and large found it playful. This pilot was a first step towards determining whether crea.blender is suitable for systematically studying creativity in a playful way. Our pilot data support this use; however, larger studies are needed to substantiate the work.

We plan to further investigate CT and DT in crea.blender and incorporate crea.blender into the full CREA suite to provide a holistic and scalable approach to testing creativity in a playful way.

4. Acknowledgments

The authors thank the Novo Nordisk, the Synakos, and Carlsberg Foundations for their generous support, and Mads Kock Pedersen for useful discussions.


  • Acar and Runco (2019) Selcuk Acar and Mark A Runco. 2019. Divergent thinking: New methods, recent research, and extended theory. Psychology of Aesthetics, Creativity, and the Arts 13, 2 (2019), 153.
  • Bailey (2020) J Bailey. 2020. The tools of generative art, from Flash to neural networks. Art in America (2020).
  • Beketayev and Runco (2016) Kenes Beketayev and Mark A Runco. 2016. Scoring divergent thinking tests by computer with a semantics-based algorithm. Europe’s journal of psychology 12, 2 (2016), 210.
  • Brock et al. (2018) Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018).
  • Cortes et al. (2019) Robert A Cortes, Adam B Weinberger, Richard J Daker, and Adam E Green. 2019. Re-examining prominent measures of divergent and convergent creativity. Current Opinion in Behavioral Sciences 27 (2019), 90–93.
  • Costello and Keane (2000) Fintan J Costello and Mark T Keane. 2000. Efficient creativity: Constraint-guided conceptual combination. Cognitive Science 24, 2 (2000), 299–349.
  • Crawford and Paglen (2019) Kate Crawford and Trevor Paglen. 2019. Excavating AI: The politics of images in machine learning training sets. Excavating AI (2019).
  • Csikszentmihalyi and Sawyer (2014) Mihaly Csikszentmihalyi and Keith Sawyer. 2014. Creative insight: The social dimension of a solitary moment. In The systems model of creativity. Springer, 73–98.
  • Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
  • DiCerbo (2014) Kristen E DiCerbo. 2014. Game-based assessment of persistence. Journal of Educational Technology & Society 17, 1 (2014), 17–28.
  • Ericcson and Simon (1978) K Anders Ericcson and Herbert A Simon. 1978. Think-Aloud Protocols as Data. CIP Working (1978).
  • ScienceAtHome Center for Hybrid Intelligence (2020) ScienceAtHome Center for Hybrid Intelligence. 2020. CREA large-scale game-based creativity portfolio. (2020).
  • Frich et al. (2019) Jonas Frich, Lindsay MacDonald Vermeulen, Christian Remy, Michael Mose Biskjaer, and Peter Dalsgaard. 2019. Mapping the landscape of creativity support tools in HCI. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–18.
  • Goodfellow et al. (2014) Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672–2680.
  • Guilford (1968) Joy Paul Guilford. 1968. Intelligence, creativity, and their educational implications. Edits Pub.
  • Hart et al. (2017) Yuval Hart, Avraham E Mayo, Ruth Mayo, Liron Rozenkrantz, Avichai Tendler, Uri Alon, and Lior Noy. 2017. Creative foraging: An experimental paradigm for studying exploration and discovery. PloS one 12, 8 (2017), e0182133.
  • Hertzmann (2019) Aaron Hertzmann. 2019. Visual Indeterminacy in Generative Neural Art. arXiv preprint arXiv:1910.04639 (2019).
  • Hewett (2005) Thomas T Hewett. 2005. Informing the design of computer-based environments to support creativity. International Journal of Human-Computer Studies 63, 4-5 (2005), 383–409.
  • Huang et al. (2010) Chun-Chieh Huang, Ting-Kuang Yeh, Tsai-Yen Li, and Chun-Yen Chang. 2010. The idea storming cube: Evaluating the effects of using game and computer agent to support divergent thinking. Journal of Educational Technology & Society 13, 4 (2010), 180–191.
  • Kaufman and Sternberg (2010) James C Kaufman and Robert J Sternberg. 2010. The Cambridge handbook of creativity. Cambridge University Press.
  • Kwon et al. (1998) Myoungsook Kwon, Ernest T Goetz, and Ronald D Zellner. 1998. Developing a Computer-Based TTCT: Promises and Problems. The Journal of Creative Behavior 32, 2 (1998), 96–106.
  • Lau and Cheung (2010) Sing Lau and Ping Chung Cheung. 2010. Creativity assessment: Comparability of the electronic and paper-and-pencil versions of the Wallach–Kogan Creativity Tests. Thinking Skills and Creativity 5, 3 (2010), 101–107.
  • Lin et al. (2020) Yuyu Lin, Jiahao Guo, Yang Chen, Cheng Yao, and Fangtian Ying. 2020. It Is Your Turn: Collaborative Ideation With a Co-Creative Robot through Sketch. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–14.
  • Mazzone and Elgammal (2019) Marian Mazzone and Ahmed Elgammal. 2019. Art, creativity, and the potential of artificial intelligence. In Arts, Vol. 8. Multidisciplinary Digital Publishing Institute, 26.
  • Oh et al. (2018) Changhoon Oh, Jungwoo Song, Jinhan Choi, Seonghyeon Kim, Sungwoo Lee, and Bongwon Suh. 2018. I lead, you help but only with enough details: Understanding user experience of co-creation with artificial intelligence. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1–13.
  • PwC (2017) PwC. 2017. The talent challenge: Harnessing the power of human skills in the machine age.
  • Reiter-Palmon et al. (2019) Roni Reiter-Palmon, Boris Forthmann, and Baptiste Barbot. 2019. Scoring divergent thinking tests: A review and systematic framework. Psychology of Aesthetics, Creativity, and the Arts 13, 2 (2019), 144.
  • Runco and Jaeger (2012) Mark A Runco and Garrett J Jaeger. 2012. The standard definition of creativity. Creativity research journal 24, 1 (2012), 92–96.
  • Schneider and Rea (2018) Tim Schneider and Naomi Rea. 2018. Has artificial intelligence given us the next great art movement? Experts say slow down, the ‘field is in its infancy.’. Artnet News (2018).
  • Sethapakdi and McCann (2019) Ticha Sethapakdi and James McCann. 2019. Painting with CATS: Camera-aided texture synthesis. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–9.
  • Shillo et al. (2019) Roi Shillo, Nicholas Hoernle, and Kobi Gal. 2019. Detecting Creativity in an Open Ended Geometry Environment. International Educational Data Mining Society (2019).
  • Shklovsky (2016) Viktor Shklovsky. 2016. Viktor Shklovsky: A Reader. Bloomsbury Publishing USA.
  • Simon (2019) Joel Simon. 2020. Ganbreeder. Accessed March, 29.