Digital painting systems are increasingly used by artists and content developers for various applications. One of the main goals has been to simulate popular or widely used painting styles. With the development of non-photorealistic rendering techniques, including stroke-based rendering and painterly rendering [9, 33], specially designed or hand-engineered methods can increasingly simulate the painting process by applying heuristics. In practice, these algorithms can generate compelling results, but it is difficult to extend them to new or unseen styles.
Over the last decade, there has been considerable interest in using machine learning methods for digital painting. These methods include image synthesis algorithms based on convolutional neural networks for modeling the brush, generating brush-stroke paintings, reconstructing paintings in specific styles, and constructing stroke-based drawings. Recent developments in generative adversarial networks and variational autoencoders have led to image generation algorithms that can be applied to painting styles [40, 39, 12, 16, 27].
One goal is to develop an automatic or intelligent painting agent that can develop its painting skills by imitating reference paintings. In this paper, we focus on building an intelligent painting agent that can reproduce a reference image in an identical or transformed style with a sequence of painting actions. Unlike methods that directly synthesize images while bypassing the painting process, we address the more general and challenging problem of training a painting agent from scratch using reinforcement learning. Prior methods [36, 35, 34, 39] also use reinforcement learning for this problem; all of them encode goal states, usually defined as reference images, into the observations. This setup differs from classic reinforcement learning tasks: the goal state introduces an implicit objective for the policy network, and the distribution of reward in the action space can be very sparse, which makes training a reinforcement learning algorithm from scratch very difficult. To address this, [36, 35, 34, 39] pre-train the policy network with a paired dataset consisting of images and corresponding actions. However, it is very expensive to collect such paired datasets from human artists, and we therefore need to explore other unsupervised learning methods.
Main Results: We present a reinforcement learning-based algorithm (LPaintB) that incorporates self-supervised learning to train a painting agent on a limited number of reference images without paired datasets. Our approach is data-driven and can be generalized by expanding the image datasets. Specifically, we adopt proximal policy optimization (PPO) by encoding the current and goal states as observations and using a continuous action space defined by paint-brush configurations such as stroke length, orientation, and brush size. The training component of our method requires only reference paintings in the desired artistic style and does not require paired datasets collected by human artists. We use a self-supervised learning method to increase sampling efficiency: by replacing the goal state of an unsuccessful episode with its final state, we automatically generate a paired dataset with positive rewards. After retraining the model on this dataset with reinforcement learning, our approach can efficiently learn the optimal policy. The novel contributions of our work include:
An approach for collecting supervised data for painting tasks by self-supervised learning.
An adapted deep reinforcement learning network that can be trained using human expert data and self-supervised data, though we mostly rely on self-supervised data.
An efficient rendering system that can automatically generate stroke-based paintings of desired resolutions by our trained painting agent.
We evaluate our approach by comparing our painting agent with prior painting agents trained from scratch by reinforcement learning. We collect 1000 images with different colors and patterns as the benchmark and compute the L2 loss between generated images and reference images. Our results show that self-supervised learning can efficiently collect paired data and accelerate the training process. The training phase takes about 1 hour, and the runtime algorithm takes about 30 seconds on a GTX 1080 GPU for high-resolution images.
2 Related Work
In this section, we give a brief overview of prior work on non-photorealistic rendering and the use of machine learning techniques for image synthesis.
2.1 Non-Photorealistic Rendering
Non-photorealistic rendering methods render a reference image as a combination of strokes by determining many properties like position, density, size, and color. To mimic the oil-painting process, Hertzmann renders the reference image into primitive strokes using gradient-based features. To simulate mosaic decorative tile effects, Hausner segments the reference image using centroidal Voronoi diagrams. Many algorithms have been proposed for specific artistic styles, such as stipple drawings, pen-and-ink sketches, and oil paintings. The drawback of non-photorealistic rendering methods is their lack of generalizability to new or unseen styles; they may require hand tuning and must be manually extended to other styles.
2.2 Visual Generative Algorithms
Hertzmann et al. introduce image analogies, a generative method based on a non-parametric texture model. Many recent approaches are based on CNNs and use large datasets of input-output training image pairs to learn the mapping function. Inspired by the idea of variational autoencoders, Johnson et al. introduce the concept of perceptual loss to model style transfer between paired datasets. Zhu et al. use generative adversarial networks to learn the mappings without paired training examples. These techniques have been used to generate natural images [16, 27], artistic images, and videos [32, 21]. Compared to previous visual generative methods, our approach can generate high-resolution results and is easy to extend to different painting media and artistic styles.
2.3 Image Synthesis Using Machine Learning
Many techniques have been proposed for image synthesis using machine learning. Xie et al. [34, 36, 35] present a series of algorithms that simulate strokes using reinforcement learning and inverse reinforcement learning. These approaches learn a policy from either reward functions or expert demonstrations. For interactive artistic creation, stroke-based approaches can generate trajectories and intermediate painting states. Another advantage of stroke-based methods is that the final results are trajectories of paint brushes, which can then be deployed in different synthetic natural-media painting environments and in real painting environments using robot arms. In contrast to our algorithm, Xie et al. [34, 36, 35] focus on designing reward functions to generate orientational painting strokes, and their approach requires expert demonstrations for supervision. Ha et al. collect a large-scale dataset of simple sketches of common objects with corresponding recordings of painting actions. Based on this dataset, a recurrent neural network model is trained in a supervised manner to encode and re-synthesize the action sequences, and the trained model is shown to be capable of generating new sketches. Following this work, Zhou et al. use reinforcement learning and imitation learning to reduce the amount of supervision needed to train such a sketch generation model. In contrast to these prior methods, we operate in a higher-dimensional continuous action space and apply the PPO reinforcement learning algorithm to train the agent from scratch, which allows us to handle dense, high-resolution images.
Compared with prior visual generative methods, our painting agent can automatically generate results using a limited training dataset without paired data.
3 Self-Supervised Painting Agent
In this section, we introduce notations, formulate the problem and present our self-supervised learning algorithm for natural media painting.
|t|number of time steps used to compute accumulated rewards|
|s_t|current painting state (canvas) at step t|
|s*|target painting state (reference image)|
|o_t|observation at step t|
|a_t|action at step t|
|r_t|reward at step t|
|q_t|accumulated reward at step t|
|γ|discount factor for computing the accumulated reward|
|π|painting policy; a_t is predicted by π(o_t)|
|V|value function of the painting policy π|
|f|feature extraction of a state|
|R|render function; renders action a_t onto state s_t|
|O|observation function; encodes the current state and the target state|
|L|loss function; measures the distance between a state and the objective state|
3.1 Self-Supervised Learning

Self-supervised learning methods are designed to enable learning without explicit supervision: the supervisory signal for a pretext task is created automatically, so it is a form of unsupervised learning in which the data itself provides the supervision. In its original formulation, part of the information in the data is withheld and a classification or regression function is trained to predict it. The task usually has a proxy loss defined so that it can be solved by self-supervised learning. Self-supervised learning has been applied in many areas, such as audio-visual analysis, visual representation learning, image analysis, and robotics. In this paper, we use the term self-supervised learning to refer to the process of generating self-supervision data and feeding that data to the policy network of the reinforcement learning framework.
3.2 Problem Formulation
Reproducing images with brush strokes can be formalized as finding a series of actions that minimizes the distance between the reference image and the current canvas in the desired feature space. Based on the notation in Table 1, this can be expressed as minimizing the loss function:
After applying reinforcement learning to this problem by defining the reward function, we get:

The observation function and the loss function each perform feature extraction on states, and these feature extractions can be the same or different; in other words, the two functions can operate in the same feature space or in different ones. If they extract different features, the policy will learn to map the observation into the feature space used by the reward function.
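To make the formulation concrete, the action-sequence minimization can be sketched as a greedy search with a toy renderer and an L2 loss. All names here (`render`, `greedy_step`, the square-stroke action format) are hypothetical illustrations, not the paper's implementation:

```python
import numpy as np

def l2_loss(canvas, reference):
    # L(s, s*): mean squared pixel distance between two states.
    return float(np.mean((canvas - reference) ** 2))

def render(canvas, action):
    # Hypothetical stand-in for the render function R: paint a filled
    # square of a given color at (x, y).
    x, y, size, color = action
    out = canvas.copy()
    out[y:y + size, x:x + size] = color
    return out

def greedy_step(canvas, reference, candidate_actions):
    # Pick the candidate action that most reduces the loss to the reference.
    best = min(candidate_actions,
               key=lambda a: l2_loss(render(canvas, a), reference))
    return render(canvas, best), best

canvas = np.zeros((8, 8))
reference = np.zeros((8, 8))
reference[2:6, 2:6] = 1.0                      # target: a 4x4 white square
candidates = [(x, y, 4, 1.0) for x in range(5) for y in range(5)]
canvas, chosen = greedy_step(canvas, reference, candidates)
```

A greedy search like this only illustrates the objective; the paper instead learns a policy that amortizes this minimization over many steps.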
3.3 Behavior Cloning
Behavior cloning uses a paired dataset of observations and corresponding actions to train the policy to imitate an expert trajectory or behavior. In our setup, the expert trajectory is encoded in the paired dataset, which corresponds to step 4 in Figure 2. We use behavior cloning to initialize the policy network of reinforcement learning with a supervised policy trained on paired data. The paired dataset can be generated by a human expert or by an optimal algorithm with global knowledge, which our painting agent does not have. Once the paired dataset is obtained, one solution is to apply supervised learning based on regression or classification to train the policy. The training process can be represented as the optimization:
It is difficult to generate such an expert dataset for our painting application because of the large variation in reference images and painting actions. However, we can generate a paired dataset by rolling out a policy defined as Eq.(4), which amounts to iteratively applying predicted actions to the painting environment. For the painting problem, we can use the trained policy itself as the expert by introducing self-supervised learning.
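Behavior cloning as described above reduces to regression on the paired dataset. A minimal sketch, assuming a linear "expert" policy and using least squares in place of a neural network (the names and the linear model are assumptions for illustration only):

```python
import numpy as np

# Hypothetical paired dataset D = {(o_i, a_i)}: observations (flattened
# features) and the expert actions taken at those observations.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(6, 3))      # unknown expert mapping
obs = rng.normal(size=(200, 6))       # o_i
actions = obs @ W_true                # a_i = expert(o_i)

# Behavior cloning as regression: fit pi(o) ~ a by least squares,
# minimizing sum_i ||pi(o_i) - a_i||^2 over the paired data.
W_fit, *_ = np.linalg.lstsq(obs, actions, rcond=None)

def policy(o):
    return o @ W_fit

# The cloned policy reproduces the expert on held-out observations.
test_obs = rng.normal(size=(10, 6))
err = float(np.max(np.abs(policy(test_obs) - test_obs @ W_true)))
```

In the paper the regressor is the policy network, but the objective is the same supervised imitation loss.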
3.4 Self-Supervised Learning
When we apply reinforcement learning to the painting problem, several properties emerge that distinguish it from classic control problems [29, 28, 23, 30]. We use the reference image as the objective and encode it in the observation of the environment defined in Eq.(12). As a result, the objective of the task Eq.(3) is not explicitly defined, and the rollout actions on different reference images can vary.
Throughout the reinforcement learning training process, positive rewards in the high-dimensional action space can be very sparse; in other words, only a small portion of the actions sampled by the policy network receive positive rewards. To change the reward distribution in the action space by increasing the probability of a positive reward, we propose using self-supervised learning. Our formulation uses the rollout of the policy as paired data to train the policy network and then retrains the model using reinforcement learning. Specifically, we replace the reference image with the final rendering produced by the rollout of the policy function, and we use the updated observations and actions as the paired supervised training dataset. For the rollout process of the trained policy, we have:
Next, we collect the observation-action pairs as paired data, denoting the rendering of the final state as the new goal state. The reward function is defined as the percentage improvement of the loss over the previous state:

We then modify the observation and reward to their self-supervised representations:
We use this paired dataset to train a self-supervised policy and its value function. Algorithm 1 highlights the self-supervised learning process.
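The relabeling step can be sketched as follows: the rollout's own final canvas replaces the goal, so every recorded observation-action pair becomes successful supervision, and the reward is the percentage improvement of the loss. The function names and the one-pixel "stroke" are hypothetical simplifications:

```python
import numpy as np

def loss(state, goal):
    return float(np.mean((state - goal) ** 2))

def reward(prev_state, state, goal):
    # Percentage improvement of the loss over the previous state.
    prev = loss(prev_state, goal)
    return (prev - loss(state, goal)) / prev if prev > 0 else 0.0

def relabel(states, actions):
    # Self-supervision: use the final rendering of the rollout as the
    # new goal, so the recorded actions become an "expert" trajectory.
    goal = states[-1]
    pairs = [((states[t], goal), actions[t]) for t in range(len(actions))]
    return goal, pairs

# A toy rollout: each action paints one pixel of a 2x2 canvas.
states = [np.zeros((2, 2))]
actions = [(0, 0), (1, 1)]
for (x, y) in actions:
    s = states[-1].copy()
    s[y, x] = 1.0
    states.append(s)

goal, paired = relabel(states, actions)
final_reward = reward(states[-2], states[-1], goal)
```

Because the final state is the goal by construction, the last step's reward is maximal, which is exactly how relabeling turns failed episodes into positively rewarded paired data.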
3.5 Retraining with Reinforcement Learning
After we build the expert dataset from the rollout of the trained agent, we use this dataset to train the agent by behavior cloning. However, the policy produced by the supervised learning described in Alg.1 is not robust enough on its own. There are two main problems. First, the paired data consists only of actions with positive rewards, which makes it difficult to recover from actions that return negative rewards. Second, the expert data generated by the policy is not always optimal: for painting and other control problems, each state can be reached by multiple different series of actions.
One solution to the generalization problem of behavior cloning is data aggregation, which increases the robustness of the trained model by adding noise to the trajectories and computing the corresponding recovery actions and observations. The critical requirement of data aggregation is that the expert has global knowledge with which to provide recovery actions for bad states; for our problem, we would still need human experts to provide this guiding information.
Another solution is retraining the model with reinforcement learning. After we obtain the expert dataset, we use it to train the value network and a subset of it to train the policy network. In this manner, we retrain using reinforcement learning and set the objective state to the relabeled goal, the same as in self-supervised learning.
Reinforcement learning can solve the two problems mentioned above by:
Exploring more regions of the action space, with negative or positive rewards, by adding noise to the actions, which generalizes the model.
Optimizing the actions of the expert guidance with the reward function.
As described in Figure 2, self-supervised learning initially takes a random policy as input, which samples randomly from the action space. Reinforcement learning then benefits from the paired dataset with positive rewards after this initialization. After policy optimization, reinforcement learning provides an improved policy for the next round of self-supervised learning. The role of reinforcement learning is to generalize the model and optimize the trajectories, while self-supervised learning provides paired datasets and expands the variation of the objective states. Therefore, the gap between the performance of reinforcement learning and self-supervised learning narrows during training until they converge.
3.6 Painting Agent
In this section, we present technical details of our reinforcement learning-based painting agent.
3.6.1 Observation

As shown in Figure 3, our observation function is defined as follows. First, we encode the objective state (reference image) together with the painting canvas. Second, we extract both a global and an egocentric view of the state. As mentioned in [39, 14], the egocentric view encodes the current position of the agent and provides details about the state, while the global view provides overall information about the state. The observation is defined as in Eq.(12), given the patch size and the position of the brush.
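A minimal sketch of such an observation encoding, assuming a square egocentric crop and block-average downscaling for the global view (the patch size, resampling method, and channel order are assumptions, not the paper's exact design):

```python
import numpy as np

def observation(canvas, reference, pos, patch=4):
    # Egocentric view: a patch x patch crop at the brush position (clamped
    # to the canvas). Global view: the full image downscaled to the same
    # patch size by block averaging. Both views are taken from the canvas
    # and the reference, then stacked as channels.
    h, w = canvas.shape
    x = int(np.clip(pos[0], 0, w - patch))
    y = int(np.clip(pos[1], 0, h - patch))

    def ego(img):
        return img[y:y + patch, x:x + patch]

    def glob(img):
        bh, bw = h // patch, w // patch
        return img.reshape(patch, bh, patch, bw).mean(axis=(1, 3))

    return np.stack([ego(canvas), glob(canvas), ego(reference), glob(reference)])

canvas = np.zeros((8, 8))
reference = np.ones((8, 8))
obs = observation(canvas, reference, pos=(3, 3))
```

The stacked channels give the policy both the local detail around the brush and the overall layout of the canvas and the reference in one fixed-size input.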
3.6.2 Action

The action is defined as a vector in continuous space containing positional information and paint-brush configurations, with each value normalized. Because the action space is continuous, the agent can be trained using policy-gradient-based reinforcement learning algorithms. The updated position of the paint brush after applying an action is computed by adding the positional offset of the action to the coordinates of the paint brush.
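A sketch of applying the positional update, assuming a hypothetical action layout in which the first two normalized components encode the positional offset:

```python
import numpy as np

def apply_action(brush_pos, action, canvas_size=1.0):
    # Hypothetical action layout: (dx, dy, size, r, g, b), each in [0, 1].
    # The positional components are re-centered to [-0.5, 0.5] before
    # being added to the brush coordinates, which stay clipped to the
    # canvas bounds.
    dx, dy = action[0] - 0.5, action[1] - 0.5
    x = float(np.clip(brush_pos[0] + dx, 0.0, canvas_size))
    y = float(np.clip(brush_pos[1] + dy, 0.0, canvas_size))
    return (x, y)

# A full-right, full-up offset from (0.9, 0.1) clips to the canvas corner.
pos = apply_action((0.9, 0.1), [1.0, 0.0, 0.3, 0.2, 0.2, 0.2])
```

Clipping keeps the brush on the canvas no matter what the policy outputs, which is one simple way to make every sampled action valid.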
3.6.3 Loss Function
The loss function defines the distance between the current state and the objective state and guides how the agent reproduces the reference image. In practice, we test our algorithm with the loss defined as Eq.(13), computed over the image.
For the self-supervised learning process, the loss function only affects the reward computation. The reinforcement learning training process, however, uses the relabeled renderings as reference images to train the model, so the loss function can affect the policy network.
3.6.4 Policy Network
To define the structure of the policy network, we consider the input as a concatenated patch of the reference image and canvas in the egocentric and global views, given the sample size. The first hidden layer convolves 64 filters with stride 4, the second convolves 64 filters with stride 2, and the third layer convolves 64 filters [19].
3.6.5 Runtime Algorithm
After training a model with self-supervised learning and reinforcement learning, we can apply it to reproduce reference images at different resolutions. First, we randomly sample a position on the canvas, draw a patch of the given size, and feed it to the policy network. Second, we iteratively predict actions and render them in the environment until the value network returns a negative reward. We then reset the environment by sampling another position on the canvas and repeat this loop until the loss falls below a threshold.
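The runtime loop can be sketched as below, with toy stand-ins for the policy, value, and render functions (all hypothetical; the real agent uses the trained networks):

```python
import numpy as np

def runtime_paint(reference, policy, value, render, max_iters=2000, tol=1e-3):
    # Sample a brush position, roll the policy out until the value
    # estimate turns negative, then resample; stop once the canvas is
    # close enough to the reference.
    rng = np.random.default_rng(0)
    canvas = np.zeros_like(reference)
    h, w = reference.shape
    for _ in range(max_iters):
        pos = (int(rng.integers(0, w)), int(rng.integers(0, h)))
        while value(canvas, reference, pos) >= 0:
            action = policy(canvas, reference, pos)
            canvas, pos = render(canvas, action, pos)
        if float(np.mean((canvas - reference) ** 2)) < tol:
            break
    return canvas

# Toy stand-ins: the "policy" paints the reference pixel at the brush
# position; the "value" is positive only while that pixel still differs.
def policy(canvas, reference, pos):
    return (pos, reference[pos[1], pos[0]])

def value(canvas, reference, pos):
    return 1.0 if canvas[pos[1], pos[0]] != reference[pos[1], pos[0]] else -1.0

def render(canvas, action, pos):
    (x, y), color = action
    out = canvas.copy()
    out[y, x] = color
    return out, (x, y)

reference = np.ones((4, 4))
result = runtime_paint(reference, policy, value, render)
```

The outer loop corresponds to resampling the brush position, and the inner loop to rolling the policy forward while the value network still predicts improvement.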
Our painting environment is a simplified simulated painting environment. The system executes painting actions with parameters describing stroke size, color, and positional information and updates the canvas accordingly (as shown in Equation 5). We also implement the reward function of Equation 8, which evaluates the distance between the current state and the goal state. We use a vectorized environment for parallel training, as shown in Figure 5. A vectorized environment consists of multiple environments; their number is usually chosen to match the number of CPU cores for best performance. The environments share the same policy network and value network and update the weights of the neural network at the same time. As a result, we can change the number of environments for the rollout or retraining process. We then adapt proximal policy optimization to train the model on the vectorized environment; the training process can finish within 2 hours.
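A minimal sketch of a vectorized environment that steps several canvases in lockstep, using the percentage-improvement reward of Equation 8 (the one-pixel action format and class name are hypothetical simplifications):

```python
import numpy as np

class VectorizedEnv:
    # Minimal sketch: n independent canvases stepped in lockstep so one
    # policy network can be trained on a batch of observations at once.
    def __init__(self, n, size=4):
        self.canvases = np.zeros((n, size, size))
        self.references = np.ones((n, size, size))

    def step(self, actions):
        # actions: one (x, y, color) tuple per environment.
        prev = np.mean((self.canvases - self.references) ** 2, axis=(1, 2))
        for i, (x, y, color) in enumerate(actions):
            self.canvases[i, y, x] = color
        cur = np.mean((self.canvases - self.references) ** 2, axis=(1, 2))
        # Percentage improvement of the loss, computed per environment.
        rewards = (prev - cur) / np.where(prev > 0, prev, 1.0)
        return self.canvases, rewards

env = VectorizedEnv(n=3)
_, rewards = env.step([(0, 0, 1.0), (1, 1, 1.0), (2, 2, 1.0)])
```

Batching the environments this way lets the shared policy and value networks consume one stacked observation tensor per step instead of n separate ones.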
4.1 Data Collection
To train the model, we draw random patches from reference images in a specific style at varying scales to assemble the training dataset and then resample the patches to a fixed size. By applying self-supervised learning, we can augment the dataset with rollouts of the intermediate policy throughout the training process. We also initialize the canvas with a randomly sampled reference image. The goal of the learning process is to minimize the loss between the canvas and the reference:
After self-supervised learning, we have an updated reference image. If we have a set of training samples and each self-supervised task consists of multiple steps, we obtain a correspondingly larger paired supervision dataset, which helps the algorithm generalize better.
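The patch-sampling step can be sketched as drawing random square crops at random scales and resampling them to a fixed size (nearest-neighbor indexing here is a stand-in for whatever resampling the implementation actually uses):

```python
import numpy as np

def sample_patch(image, out_size=8, rng=None):
    # Draw a random square patch at a random scale, then resample it to
    # a fixed size by nearest-neighbor indexing.
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape
    scale = int(rng.integers(out_size, min(h, w) + 1))  # patch side length
    y = int(rng.integers(0, h - scale + 1))
    x = int(rng.integers(0, w - scale + 1))
    patch = image[y:y + scale, x:x + scale]
    idx = np.arange(out_size) * scale // out_size       # nearest-neighbor rows/cols
    return patch[np.ix_(idx, idx)]

rng = np.random.default_rng(0)
image = rng.random((32, 32))
dataset = [sample_patch(image, out_size=8, rng=rng) for _ in range(16)]
```

Sampling at varying scales is what exposes the agent to both coarse structure and fine detail from the same reference images.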
In practice, we use a 16-core CPU and a GTX 1080 GPU to train the model with a vectorized environment of dimension 16. We use SSPE as the environment of Equation 5 to accelerate the training process. The learned policy can also be transferred to other simulated painting media like MyPaint or WetBrush to obtain different visual effects and styles.
5 Results

In this section, we highlight our results and compare the performance with prior learning-based painting algorithms.
For the first benchmark, we apply a critic condition to the reward of each step. Once the agent fails the condition, the environment stops the rollout. We compare the cumulative reward obtained by feeding the same set of unseen images to the environment. We use two benchmarks to test the generalization of the models: Benchmark 1 reproduces an image from a random initial image, following the training scheme described in subsection 4.1, while Benchmark 2 reproduces an image from a blank canvas. Each benchmark has 1000 patches. Some results of our approach are shown in Figure 6. As shown in Table 2, our combined training scheme outperforms using only self-supervised learning or only reinforcement learning.
Table 2 compares three training schemes: Reinforcement Learning Only, Self-Supervised Learning Only, and Our Combined Scheme.
For the second benchmark, we evaluate performance on high-resolution reference images. We compute the loss of Eq.(13) and the cumulative rewards of Eq.(8) and compare our approach with prior work. We draw patches from 10 reference images to construct the benchmark, and we iteratively apply both algorithms to reproduce the reference images. We use the same training dataset to train the models. As shown in Table 3, our approach has a lower loss, although both methods perform well in terms of cumulative rewards.
Overall, the comparison shows that our approach (LPaintB), which combines self-supervised learning and reinforcement learning, achieves better performance in terms of convergence, cumulative rewards, and generalization.
|Mona Lisa by Leonardo da Vinci Figure 1|
|Sunflowers by Vincent van Gogh Figure 1|
|Girl with a Pearl Earring by Johannes Vermeer Figure 1|
|The Starry Night by Vincent van Gogh Figure 7|
|Lake Photo Figure 7|
|Bedroom in Arles by Vincent van Gogh Figure 8|
|Giudecca by William Turner Figure 8|
|Poppies Near Argenteuil by Claude Monet Figure 8|
|Painting by Pierre Bonnard Figure 8|
|Tulip Photo Figure 8|
|Road Photo Figure 8|
6 Conclusion, Limitations and Future Work
We present a novel approach for stroke-based image reproduction using self-supervised learning and reinforcement learning. Our approach is based on a feedback loop between the two: we modify and reuse the rollout data of the previously trained policy network and feed it back into the reinforcement learning framework. We compare our method with a model trained only with self-supervised learning and a model trained from scratch by reinforcement learning. The results show that our combination of self-supervised and reinforcement learning greatly improves both sampling efficiency and the performance of the policy.
One major limitation of our approach is that the generalization of the trained policy is highly dependent on the training data. Although reinforcement learning enables the policy to generalize to states that supervised learning alone cannot address, those states still depend on the training data; specifically, the distribution of the generated supervision data may be far from that of unseen data. Another limitation is that our method uses a simplified painting environment for training because of the extremely large exploration space of reinforcement learning. We need to investigate better techniques to handle such large exploration spaces.
For future work, we aim to increase the number of runtime steps and enlarge the action space of the painting environment so that the data generated by self-supervised learning is closer to the distribution of unseen data. Our current setup includes the most common stroke parameters, such as brush size, color, and position, but painting parameters describing pen tilt, pen rotation, and pressure are not used. We also aim to build a model-based reinforcement learning framework that can incorporate a more natural painting-media simulator.
References

-  Z. Chen, B. Kim, D. Ito, and H. Wang. Wetbrush: Gpu-based 3d painting simulation at the bristle level. ACM Transactions on Graphics (TOG), 34(6):200, 2015.
-  O. Deussen, S. Hiller, C. Van Overveld, and T. Strothotte. Floating points: A method for computing stipple drawings. In Computer Graphics Forum, volume 19, pages 41–50. Wiley Online Library, 2000.
-  C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430, 2015.
-  L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576, 2015.
-  S. Gidaris, P. Singh, and N. Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
-  D. Ha and D. Eck. A neural representation of sketch drawings. CoRR, abs/1704.03477, 2017.
-  A. Hausner. Simulating decorative mosaics. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 573–580. ACM, 2001.
-  A. Hertzmann. Painterly rendering with curved brush strokes of multiple sizes. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 453–460. ACM, 1998.
-  A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 327–340. ACM, 2001.
-  A. Hill, A. Raffin, M. Ernestus, A. Gleave, R. Traore, P. Dhariwal, C. Hesse, O. Klimov, A. Nichol, M. Plappert, A. Radford, J. Schulman, S. Sidor, and Y. Wu. Stable baselines. https://github.com/hill-a/stable-baselines, 2018.
-  H. Huang, P. S. Yu, and C. Wang. An introduction to image synthesis with generative adversarial nets. CoRR, abs/1803.04469, 2018.
-  E. Jang, C. Devin, V. Vanhoucke, and S. Levine. Grasp2vec: Learning object representations from self-supervised grasping. arXiv preprint arXiv:1811.06964, 2018.
-  B. Jia, C. Fang, J. Brandt, B. Kim, and D. Manocha. Paintbot: A reinforcement learning approach for natural media painting. arXiv preprint arXiv:1904.02201, 2019.
-  J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
-  T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
-  D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
-  A. Kolesnikov, X. Zhai, and L. Beyer. Revisiting self-supervised visual representation learning. arXiv preprint arXiv:1901.09005, 2019.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Universal style transfer via feature transforms. In Advances in Neural Information Processing Systems, pages 386–396, 2017.
-  Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang. Flow-grounded spatial-temporal video prediction from still images. In Proceedings of the European Conference on Computer Vision (ECCV), pages 600–615, 2018.
-  T. Lindemeier, J. Metzner, L. Pollak, and O. Deussen. Hardware-based non-photorealistic rendering using a painting robot. In Computer graphics forum, volume 34, pages 311–323. Wiley Online Library, 2015.
-  V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
-  A. Owens and A. A. Efros. Audio-visual scene analysis with self-supervised multisensory features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.
-  S. Ross, G. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 627–635, 2011.
-  M. P. Salisbury, S. E. Anderson, R. Barzel, and D. H. Salesin. Interactive pen-and-ink illustration. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 101–108. ACM, 1994.
-  P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays. Scribbler: Controlling deep image synthesis with sketch and color. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, 2017.
-  J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. In International Conference on Machine Learning, pages 1889–1897, 2015.
-  J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
-  R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057–1063, 2000.
-  F. Tang, W. Dong, Y. Meng, X. Mei, F. Huang, X. Zhang, and O. Deussen. Animated construction of chinese brush paintings. IEEE transactions on visualization and computer graphics, 24(12):3019–3031, 2018.
-  C. Vondrick, H. Pirsiavash, and A. Torralba. Generating videos with scene dynamics. In Advances In Neural Information Processing Systems, pages 613–621, 2016.
-  G. Winkenbach and D. H. Salesin. Rendering parametric surfaces in pen and ink. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 469–476. ACM, 1996.
-  N. Xie, H. Hachiya, and M. Sugiyama. Artist agent: A reinforcement learning approach to automatic stroke generation in oriental ink painting. CoRR, abs/1206.4634, 2012.
-  N. Xie, T. Zhao, and M. Sugiyama. Personal style learning in sumi-e stroke-based rendering by inverse reinforcement learning. Information Processing Society of Japan, 2013.
-  N. Xie, T. Zhao, F. Tian, X. H. Zhang, and M. Sugiyam. Stroke-based stylization learning and rendering with inverse reinforcement learning. IJCAI, 2015.
-  K. Zeng, M. Zhao, C. Xiong, and S. C. Zhu. From image parsing to painterly rendering. ACM Trans. Graph., 29(1):2–1, 2009.
-  N. Zheng, Y. Jiang, and D. Huang. Strokenet: A neural painting environment. In International Conference on Learning Representations, 2019.
-  T. Zhou, C. Fang, Z. Wang, J. Yang, B. Kim, Z. Chen, J. Brandt, and D. Terzopoulos. Learning to doodle with deep q networks and demonstrated strokes. British Machine Vision Conference, 2018.
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.