Making Robots Draw A Vivid Portrait In Two Minutes

05/12/2020 ∙ by Fei Gao, et al. ∙ Peking University NetEase, Inc 3

Significant progress has been made with artistic robots. However, existing robots fail to produce high-quality portraits in a short time. In this work, we present a drawing robot that automatically transfers a facial picture to a vivid portrait and then draws it on paper within two minutes on average. At the heart of our system is a novel portrait synthesis algorithm based on deep learning. We employ a self-consistency loss, which makes the algorithm capable of generating continuous and smooth brush-strokes. Besides, we propose a componential-sparsity constraint to reduce the number of brush-strokes over insignificant areas. We also implement a local sketch synthesis algorithm, and several pre- and post-processing techniques to deal with the background and details. The portrait produced by our algorithm captures individual characteristics by using a sparse set of continuous brush-strokes. Finally, the portrait is converted to a sequence of trajectories and reproduced by a 3-degree-of-freedom robotic arm. The whole portrait drawing robotic system is named AiSketcher. Extensive experiments show that AiSketcher can produce high-quality sketches for a wide range of pictures, including faces in-the-wild and universal images of arbitrary content. To the best of our knowledge, AiSketcher is the first portrait drawing robot that uses deep learning techniques. AiSketcher has been shown at quite a number of exhibitions and has demonstrated remarkable performance under diverse circumstances.




I Introduction

The ability of robots to create high-quality artworks is popularly considered a benchmark of progress in Artificial Intelligence (AI). Researchers have devoted great effort to developing artistic robots, which can draw sketch portraits [1], colourful pictures [2], watercolors [3], etc. Some of them have been shown at various exhibitions with impressive performance [4, 5]. Drawing a colourful image typically takes a robot several hours [3]. In contrast, a portrait typically contains a sparse set of graphical elements (e.g., lines) and can be finished in a short time. Portrait drawing robots thus allow active interaction between robots and ordinary consumers.

In recent years, portrait drawing robots have attracted a lot of attention. Existing robots typically produce portraits by means of low-level image processing [1]. They are unable to draw high-quality portraits, especially for faces in-the-wild, i.e. faces presenting extreme poses, expressions, or occlusions. Portraits have a highly abstract style. Even for artists, portrait drawing relies on professional training and experience [6]. Therefore, drawing high-quality portraits remains a complicated and elaborate task for robots.

The other critical challenge is to balance the "vividness" of a sketch against the time budget for drawing it. Vivid sketches please human users, while a limited time budget keeps them from growing impatient. However, vividness and drawing time conflict with each other in practice. Vividness is correlated with the details that characterize a given face. Generating more details in a portrait generally improves the vividness but prolongs the drawing procedure. Conversely, a portrait with a sparse set of elements can be drawn quickly but may fail to capture individuality.

To address the above challenges, in this work we develop a novel portrait drawing robot by means of deep neural networks [7]. The proposed method is inspired by the great success of neural style transfer (NST) in creating various types of artworks [8]. To the best of our knowledge, no drawing robots have been developed based on these algorithms. Besides, preliminary experiments show that these algorithms cannot produce continuous and smooth brush-strokes.

In this work, we propose a novel portrait synthesis algorithm for drawing robots. Specifically, we propose two novel objectives for generating sparse and continuous brush-strokes. First, we require the algorithm to be capable of reconstructing real sketches (of arbitrary content) by using a self-consistency loss. This constraint considerably improves the continuity and realism of synthesised brush-strokes. Second, we apply a sparsity constraint to regions that are insignificant for characterizing individuality. The corresponding componential-sparsity loss significantly reduces the number of brush-strokes without degrading the quality of a portrait. Here, we use face parsing masks to represent facial components. We also implement a specific local synthesis algorithm as well as several pre- and post-processing techniques to deal with the background and details.

Finally, we develop a drawing robot, namely AiSketcher, by using a 3-degree-of-freedom (DOF) robotic arm. The synthesised sketch is converted into a sequence of trajectories that the robot reproduces on paper or other flat materials. In the robotic system, we implement several human-robot interaction techniques to facilitate the use of AiSketcher. Extensive experimental results show that AiSketcher significantly outperforms the previous state of the art. Besides, AiSketcher works well over a wide range of images, including faces in-the-wild and universal images of arbitrary content. AiSketcher has been shown at quite a number of exhibitions and has demonstrated remarkable and robust performance under diverse circumstances.

In summary, we make the following contributions:

  • AiSketcher is the first portrait drawing robot that uses deep learning techniques.

  • We propose a self-consistency loss to make the algorithm produce continuous and realistic brush-strokes.

  • We propose a componential-sparsity loss to balance the vividness and time-budget.

  • Our portrait synthesis algorithm does not need large-scale photo-sketch pairs for training.

  • Our robotic system works well for faces in-the-wild and a wide range of universal images.

The paper is organized as follows: Section II provides a brief description of related work. Section III presents an overview of the whole system. In Section IV and Section V, the portrait synthesis algorithms and the robotic system are detailed, respectively. Extensive experiments are presented in Section VI. Finally, we conclude this work in Section VII.

II Related Work

In this section, we briefly introduce some related work, including portrait drawing robots, deep neural style transfer, generative adversarial networks, and face sketch synthesis.

II-A Portrait Drawing Robots

A number of efforts have been devoted to developing portrait drawing robots. For example, Calinon et al. [9] develop a humanoid robot that draws binary facial pictures. Lin et al. [10] use a center-off algorithm and median filtering to generate a portrait, and then refine it based on low-level textural features. The resulting portraits are unduly simple and only roughly present the geometric outlines of human faces. Mohammed et al. [11] use edge detection and contour tracing to synthesise sketches and plan the path. Additionally, Lau et al. [12] explore a portrait robot by means of point-to-point drawing instead of line drawing. Tresset et al. [13] use Gabor kernels to extract salient lines from a detected face, and finish the drawing by means of visual feedback.

Recently, Xue and Liu [14] use facial key-points and a mathematical morphology algorithm to generate a primal sketch. Then, they generate some streaks to represent hair based on textural features. Similarly, Gao et al. [1] use facial key-points to synthesise an outline portrait, and then use random points and lines to represent eyebrows and hair. The portraits produced by these methods typically present stiff streaks. Besides, these methods rely heavily on the performance of key-point detection algorithms, which unfortunately do not work well for faces in-the-wild. Finally, the generated strokes are not in an artistic style.

II-B Deep Neural Style Transfer

Neural Style Transfer (NST) means using Convolutional Neural Networks (CNNs) to render a content image in a target style. The problem of style transfer is closely related to texture synthesis and transfer [15]. Early approaches typically rely on low-level statistics. Gatys et al. [16] were the first to use CNNs to solve this problem, which has led to a surge of developments in both academic literature and industrial applications [8]. Inspired by the success of NST, we develop a novel portrait synthesis algorithm in this work.

II-C Generative Adversarial Networks

Recently, Generative Adversarial Networks (GANs) [17, 18] have shown inspiring results in generating facial pictures, natural images, facial sketches, paintings, etc. In a GAN, a neural network works as the generator $G$, which learns to transfer a source image $x$ to a target image $y$. The other network works as the discriminator $D$, which aims at distinguishing $y$ from the generated image $G(x)$ [18]. $G$ and $D$ are alternately optimized in an adversarial manner. Finally, $G$ becomes capable of producing images in the style of $y$. There have been a huge number of algorithms and applications of GANs. Please refer to [19] for a comprehensive study.

Recently, Yi et al. proposed the APDrawingGAN model for portrait drawing [6]. However, the resulting sketches present too many dilated details and black areas, which would significantly increase the drawing time of a robot. Besides, APDrawingGAN cannot produce high-quality portraits for faces in-the-wild. Corresponding experimental results are shown in Section VI-C.

II-D Facial Sketch Synthesis

Another topic close to our work is face sketch synthesis (FSS). FSS generally means synthesising a pencil-drawing sketch from an input face photo [20]. Recently, researchers have developed NST-based [21] and GAN-based [22] methods that produce impressive pencil-drawing sketches. However, pencil-drawing sketches typically take a long time to draw. They are thus unsuitable for real-time drawing robots.

Fig. 1: Overview of our portrait drawing robotic system, AiSketcher.

III Architecture of the System

In this section, we give an overview of the architecture of the robotic portrait drawing system. It is composed of both software (e.g. the algorithms for portrait synthesis) and hardware components (e.g. the robotic arm and remote server). Fig. 1 shows an overview of the system.

First, a facial picture is uploaded by a user through a social media account and fed into the portrait synthesis algorithms, as explained in Section IV. Afterwards, the resulting portrait is converted to a sequence of trajectories by the path planning module. Finally, the list of commands is deployed on an NVIDIA Jetson Nano board to control the robotic arm, which reproduces the portrait on paper or other flat materials. Additionally, we provide a mobile device for users to choose a preferred portrait from a number of alternatives, and to start the drawing procedure.

IV Algorithms for Portrait Synthesis

The pipeline of our portrait synthesis algorithms is shown in Fig. 1. Given a facial picture, we first pre-process it by means of face detection, face alignment, face parsing, and background removal. The resulting picture is denoted by $x$. Afterwards, we feed $x$ into a global sketch synthesis algorithm, which produces a primary portrait. Additionally, we use a local sketch synthesis algorithm to specifically synthesise eyebrows. We then fuse the globally and locally synthesised sketches, and deal with details, in the post-processing stage.

Fig. 2: Illustration of image pre-processing. (a) Input picture, (b) aligned picture, (c) background removal, (d) face parsing mask, and (e) compositional sparsity mask $M$.

IV-A Pre-processing

We use the following approaches to align an input picture and to deal with the background.

Face Detection. First, we use the dlib toolbox in OpenCV [23] to detect the face in a given image, as well as 68 key-points in the face. By default, if more than one face is detected, we choose the largest one. If no face is detected, we feed the original image into the global sketch synthesis algorithm to produce a final output.
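The face-selection rule can be sketched as follows. This is a minimal illustration, not the code used in AiSketcher: it assumes the detector returns boxes as (left, top, right, bottom) tuples, and the names `box_area` and `select_face` are hypothetical. The detector itself is not shown.

```python
from typing import List, Optional, Tuple

# Assumed box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def box_area(box: Box) -> int:
    """Area of a box; degenerate boxes count as zero."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def select_face(boxes: List[Box]) -> Optional[Box]:
    """Return the largest detected face, or None when no face was found.

    A None result signals the caller to skip pre-processing and feed the
    original image directly into the global sketch synthesis algorithm.
    """
    if not boxes:
        return None
    return max(boxes, key=box_area)
```

Returning `None` rather than raising keeps the no-face fallback an ordinary branch in the calling pipeline.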

Face Alignment. Afterwards, we geometrically align the facial picture by an affine transformation based on the centers of the eyes. The aligned image is automatically cropped to an aspect ratio of 1:1 or 3:4, according to the size of the original picture. Fig. 2b shows an aligned example.
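One simple way to realise an eye-based alignment is to rotate the image about the midpoint between the eyes until the inter-ocular line is horizontal. The sketch below builds such a 2x3 affine matrix. It is an assumption-laden illustration: the exact transformation used in the system (for example, whether it also rescales) is not specified in the text, and `eye_alignment_matrix` and `apply_affine` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def eye_alignment_matrix(left_eye: Point, right_eye: Point) -> List[List[float]]:
    """2x3 affine matrix that rotates about the eye midpoint so the eyes are level."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)            # tilt of the line joining the eyes
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    c, s = math.cos(-angle), math.sin(-angle)  # rotate by -angle to undo the tilt
    # p' = R (p - centre) + centre, written as a single affine map
    return [
        [c, -s, cx - c * cx + s * cy],
        [s, c, cy - s * cx - c * cy],
    ]


def apply_affine(m: List[List[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to a point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

The same matrix layout is accepted by common image-warping routines, so it can be passed on to whichever warping function the pipeline uses.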

Face Parsing. Then, we use face parsing to represent facial components. Specifically, we use MaskGAN [24] to segment the aligned picture into 19 components, including background, hair, two eyebrows, two eyes, nose, two lips, neck, clothes, etc. Fig. 2d illustrates a parsing mask, where facial components are distinguished by colors.

Background Removal. Finally, based on the parsing mask, we replace the background with white pixels (Fig. 2c), so that no elements are generated in the background.
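Background removal from a parsing mask amounts to a masked assignment. The sketch below assumes the parsing mask is an integer label map of the same height and width as the image, and that the background has label id 0; both the label id and the function name `remove_background` are assumptions made for illustration.

```python
import numpy as np

BACKGROUND_LABEL = 0  # assumed label id of the background in the parsing mask


def remove_background(image: np.ndarray, parsing: np.ndarray) -> np.ndarray:
    """Return a copy of `image` (H, W, 3, uint8) with background pixels set to white."""
    out = image.copy()
    out[parsing == BACKGROUND_LABEL] = 255  # boolean mask selects all channels
    return out
```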

IV-B Global Sketch Synthesis

We first develop a global sketch synthesis algorithm to transfer the facial picture to a sketch. Here, we adopt the algorithm proposed in [25] as our base model, due to its inspiring performance in the arbitrary style transfer task. However, this algorithm cannot generate sparse and continuous brush-strokes for sketch synthesis. We thus propose two novel loss functions to boost the quality of sketch synthesis. The framework of our global sketch synthesis network is shown in Fig. 3. Details are presented below.

Fig. 3: Pipeline of the global sketch synthesis algorithm.

Network Architecture. The global synthesis algorithm takes the pre-processed face photo $x$ and a style image $s$ as inputs, and outputs a synthesised sketch. Here, we use a real sketch (of arbitrary content) as the style image.

We use the same network architecture as that used in [25]. Specifically, we use the first few layers of a pre-trained VGG-19 [26] as an encoder $E$. Both the content and style images are fed into $E$ and converted to feature maps: $f_x = E(x)$ and $f_s = E(s)$. Afterwards, an AdaIN layer produces the target feature maps $t$ by:

$$t = \sigma(f_s)\,\frac{f_x - \mu(f_x)}{\sigma(f_x)} + \mu(f_s), \tag{1}$$

where $\mu(\cdot)$ and $\sigma(\cdot)$ denote the mean and standard deviation of features. Finally, $t$ is fed into a decoder $G$, generating a sketch portrait $y$:

$$y = G(t). \tag{2}$$

$G$ includes nine convolutional layers and three nearest up-sampling layers. Reflection padding is applied to each layer.

In the implementation, $E$ is fixed and $G$ is randomly initialized. We use a pre-trained VGG-19 to compute the following four loss functions for training $G$.
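The AdaIN step described above can be checked numerically with a few lines of NumPy. This is a minimal sketch following the definition in [25], with feature maps stored as (channels, height, width) arrays. The function name `adain` and the small constant `eps` (added to avoid division by zero) are assumptions; the encoder and decoder are not modelled.

```python
import numpy as np


def adain(f_x: np.ndarray, f_s: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalisation on (C, H, W) feature maps.

    Each channel of the content features f_x is normalised to zero mean and
    unit standard deviation, then rescaled and shifted to match the
    per-channel statistics of the style features f_s.
    """
    mu_x = f_x.mean(axis=(1, 2), keepdims=True)
    mu_s = f_s.mean(axis=(1, 2), keepdims=True)
    sigma_x = f_x.std(axis=(1, 2), keepdims=True) + eps
    sigma_s = f_s.std(axis=(1, 2), keepdims=True)
    return sigma_s * (f_x - mu_x) / sigma_x + mu_s
```

After the operation, the output carries the channel-wise mean and standard deviation of the style features while keeping the spatial layout of the content features. Feeding the same features as both content and style returns them essentially unchanged.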

Content Loss. The content loss encourages a synthesised sketch $y$ to have the same content as the input facial picture $x$. It is formulated as the Euclidean distance between the target features $t$ and the features of the portrait $y$:

$$\mathcal{L}_{content} = \| E(y) - t \|_2. \tag{3}$$

Style Loss. In AdaIN, the style is represented by the mean and standard deviation of deep features. The style loss is correspondingly formulated as:

$$\mathcal{L}_{style} = \sum_{k} \| \mu(\phi_k(s)) - \mu(\phi_k(y)) \|_2 + \sum_{k} \| \sigma(\phi_k(s)) - \sigma(\phi_k(y)) \|_2, \tag{4}$$

where $\phi_k(\cdot)$ denotes the outputs over the $k$-th layer in VGG-19. In the experiments, we use four VGG-19 layers to compute the style loss.
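The content and style losses can be illustrated on pre-computed feature maps. The sketch below follows the formulation of [25]: the content term is a Euclidean distance between feature maps, and the style term compares per-channel means and standard deviations layer by layer. Feature extraction with VGG-19 is not included, and the function names are hypothetical.

```python
import numpy as np


def content_loss(f_y: np.ndarray, t: np.ndarray) -> float:
    """Euclidean distance between the features of the sketch and the target features."""
    return float(np.linalg.norm(f_y - t))


def channel_stats(feat: np.ndarray):
    """Per-channel mean and standard deviation of a (C, H, W) feature map."""
    return feat.mean(axis=(1, 2)), feat.std(axis=(1, 2))


def style_loss(feats_s, feats_y) -> float:
    """Sum over layers of the distances between style statistics.

    feats_s and feats_y are lists of (C, H, W) arrays, one per layer, for the
    style image and the synthesised sketch respectively.
    """
    total = 0.0
    for f_s, f_y in zip(feats_s, feats_y):
        mu_s, sd_s = channel_stats(f_s)
        mu_y, sd_y = channel_stats(f_y)
        total += np.linalg.norm(mu_s - mu_y) + np.linalg.norm(sd_s - sd_y)
    return float(total)
```

Because the style term depends only on channel statistics, it ignores where strokes are placed. This is one way to see why an additional constraint is needed to obtain continuous strokes.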

Self-consistency Loss. We argue that the learned mapping function should be self-consistent: for a real sketch image $s$, the algorithm should be able to reconstruct it, i.e. $G(E(s)) \approx s$. The corresponding self-consistency loss $\mathcal{L}_{consist}$ penalizes the difference between this reconstruction and $s$.

Experiments show that the self-consistency loss significantly improves the continuity of synthesised brush-strokes.
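One plausible way to write this constraint is given below. The choice of a Euclidean norm is an assumption made here for illustration, since the text above does not fix the norm. When a sketch $s$ serves as both content and style input, AdaIN leaves its features unchanged, so the reconstruction reduces to decoding $E(s)$:

```latex
\mathrm{AdaIN}\big(E(s), E(s)\big) = E(s)
\quad\Longrightarrow\quad
\mathcal{L}_{consist} = \big\| G\big(E(s)\big) - s \big\|_2 .
```

Under this reading, the loss is zero exactly when the decoder inverts the encoder on real sketches, which ties the decoder's output distribution to real brush-strokes.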

Compositional Sparsity. Finally, we employ a compositional sparsity constraint so that the network produces a sketch with a sparse set of brush-strokes. High-quality portraits mainly present brush-strokes for primary components or geometric boundaries. We therefore only add a sparsity constraint to relatively unimportant regions, such as the background, hair, skin, and body. The corresponding compositional-sparsity loss $\mathcal{L}_{sparse}$ applies a mask $M$ to the sketch $y$ by element-wise product and penalizes masked pixels that are not white. $M$ is a binary sparsity mask derived from the face parsing mask, and is of the same size as the parsing mask and $x$. In $M$, pixels corresponding to boundaries, eyes, eyebrows, and lips are assigned 0. All the remaining positions are assigned 1 in $M$, and are encouraged to be white in $y$. Since the predicted parsing mask might be imprecise, we slightly enlarge the black areas in $M$ by image erosion. The final compositional-sparsity mask is illustrated in Fig. 2e.

Using a global sparsity, i.e. $M$ full of 1s, also leads to a sparse set of elements in $y$. However, it also reduces the elements that convey individual characteristics. Corresponding experiments are presented in Section VI-F.
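The construction of the sparsity mask and a corresponding loss can be sketched as follows. Several details are assumptions made for illustration only: the label ids in `KEEP_LABELS` are hypothetical, component boundaries are detected as label changes between neighbouring pixels, erosion uses a 3x3 square, the sketch $y$ is scaled so that white equals 1, and the loss is taken as the mean of $M \odot (1 - y)$.

```python
import numpy as np

# Hypothetical label ids for eyebrows, eyes and lips in the parsing mask.
KEEP_LABELS = (2, 3, 4, 5, 11, 12)


def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion with a 3x3 square; it shrinks the 1-region, enlarging the 0-region."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=True)
        acc = np.ones_like(out)
        for dy in range(3):
            for dx in range(3):
                acc &= padded[dy:dy + h, dx:dx + w]
        out = acc
    return out.astype(np.float32)


def sparsity_mask(parsing: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Mask with 0 on key components and component boundaries, 1 elsewhere."""
    keep = np.isin(parsing, KEEP_LABELS)
    edge = np.zeros_like(keep)
    edge[:, :-1] |= parsing[:, :-1] != parsing[:, 1:]   # horizontal label change
    edge[:-1, :] |= parsing[:-1, :] != parsing[1:, :]   # vertical label change
    return erode(~(keep | edge), iterations)


def sparsity_loss(y: np.ndarray, mask: np.ndarray) -> float:
    """Mean darkness of the sketch y (white = 1) inside the masked regions."""
    return float(np.mean(mask * (1.0 - y)))
```

Setting the mask to all ones turns the same function into the global sparsity variant, which also penalises strokes on the eyes, eyebrows, and lips.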

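Full Objective. Our full objective is:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{content} + \lambda_2 \mathcal{L}_{style} + \lambda_3 \mathcal{L}_{consist} + \lambda_4 \mathcal{L}_{sparse}, \tag{7}$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are weighting factors.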

IV-C Local Sketch Synthesis

By carefully examining the globally synthesised portraits, we find that eyebrows are occasionally not well generated. For light-colored eyebrows, the algorithm might produce no elements in this region. For eyebrows with non-uniform colors, the algorithm may produce a line with bifurcations.

To address this problem, we additionally use a GAN to specifically synthesise eyebrows. Here we reproduce the local GAN in APDrawingGAN [6]. The local sketch synthesis network follows a U-Net architecture [18] and is trained on a small set of eyebrow photo-sketch pairs [6].

IV-D Post-processing

Finally, we implement a number of post-processing techniques to deal with the background and details.

Image Binarization. The outputs of both the global and local synthesis algorithms are gray-scale sketches. We convert them to binary images based on a fixed threshold.
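Binarisation by a fixed threshold is a one-line operation. In the sketch below the threshold is left as a required argument because its value is not given in the text above; the function name `binarize` and the 0/255 output convention are assumptions.

```python
import numpy as np


def binarize(sketch: np.ndarray, threshold: float) -> np.ndarray:
    """Map a gray-scale sketch to 0 (stroke) where it is darker than `threshold`, else 255 (paper)."""
    return np.where(sketch < threshold, 0, 255).astype(np.uint8)
```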

Fusing Global and Local Sketches. Before fusion, we thin the locally synthesised eyebrows and remove delicate eyelashes [6] by using image dilation. Afterwards, we replace the eyebrows in the globally synthesised portrait with these dilated eyebrows. Here, we use the face parsing mask to determine the eyebrow areas.

Eyeball Renewal. Occasionally, our algorithm only produces a circle to represent an eyeball. Although this hardly harms the final sketch drawn by the robot, we propose a method to renew the eyeball. Specifically, we add a black spot to an eye center if it is blank. We determine the position of the eye center based on facial key-point detection.
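The eyeball renewal rule can be sketched as a conditional disc fill. The spot radius is an illustrative assumption (the text does not state the spot size), the sketch is assumed to be a binary image with 0 for ink and 255 for paper, and `renew_eyeball` is a hypothetical name.

```python
import numpy as np


def renew_eyeball(sketch: np.ndarray, eye_center, radius: int = 2) -> np.ndarray:
    """Add a black disc at eye_center = (x, y) if that pixel is blank.

    `sketch` is a 2-D binary image (0 = ink, 255 = paper). The input is not
    modified; a new array is returned.
    """
    cx, cy = eye_center
    out = sketch.copy()
    if out[cy, cx] != 0:  # the eye centre is blank, so fill it
        h, w = out.shape
        yy, xx = np.ogrid[:h, :w]
        out[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 0
    return out
```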

Style Fusion. Given a facial picture, the global synthesis algorithm produces diverse sketches when using different style images. Such sketches may differ in the number, width, and continuity of brush-strokes. This provides an opportunity to fuse these sketches based on the facial parsing mask. In our implementation, we use one style image to synthesise the primal sketch, and another style image to synthesise the hair. Afterwards, we replace the hair region in the former sketch with that in the latter one.
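Both the eyebrow fusion and the style fusion described in this subsection replace a parsed region of one sketch with the same region of another. A minimal sketch of that operation is shown below; the label ids passed in `labels` depend on the parsing model and are hypothetical here, as is the function name `replace_region`.

```python
import numpy as np


def replace_region(base: np.ndarray, other: np.ndarray,
                   parsing: np.ndarray, labels) -> np.ndarray:
    """Copy the pixels of `other` into `base` wherever `parsing` has one of `labels`.

    All three arrays must share the same height and width.
    """
    region = np.isin(parsing, list(labels))
    return np.where(region, other, base)
```

For example, calling it with the hair label and a sketch produced from a second style image yields the style-fused portrait, while calling it with the eyebrow labels and the locally synthesised eyebrows yields the global-local fusion.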

V The Robotic System

In this section, we introduce the hardware components and the path planning module, which turn the synthesised portrait into a real sketch.

V-A Hardware

We use a commercial robotic arm (3 DOFs), i.e. uArm [28], to perform portrait drawing. We choose this robotic arm based on a trade-off between performance and cost. Using better robots might boost the performance, but would dramatically increase the cost of developing a portrait drawing robot. The workspace size on paper is about . In this work, we have adopted writing brushes for portrait drawing. We attach the brush to the robotic arm through plastic supports, which are manufactured by 3D printing.

To improve the mobility of the robot, we have implemented all the portrait synthesis and path planning algorithms on a remote server. In this way, the terminal of our drawing robot is composed of a robotic arm, an NVIDIA Jetson Nano board, and a mobile device. The light weight and small volume of our robotic terminal make it much easier to deploy. Besides, by using an NVIDIA GeForce GTX 1060 GPU on the remote server, we can transfer an input picture to a sketch in one second on average. Such high-speed synthesis meets the requirements of real-time applications.

V-B Path Planning

Artists typically draw graphical lines along geometrical boundaries, and render black marks for eyebrows and eyeballs. To imitate this behaviour, we use the following methods to convert a synthesised portrait to a trajectory.

We first extract the skeletons of a portrait image. The portrait typically contains brush-strokes with non-uniform width, and it is time-consuming for a robot to render such details elaborately. By extracting the skeletons of brush-strokes, we reduce the number of elements the robot has to reproduce. We implement this by means of mathematical morphology. Afterwards, we search for a sequence of points on the skeletons, along the gradient orientations estimated by a Canny edge detector. Here, breadth-first search is used. Finally, to fill the black marks representing eyeballs and eyebrows, we draw the closed loops of their edges iteratively, from outside to inside.
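Two of the steps above lend themselves to a compact illustration: ordering skeleton pixels into strokes with breadth-first search, and filling a solid mark ring by ring from outside to inside. The sketch below is a simplified stand-in and not the path planner used in AiSketcher. It assumes the skeleton has already been extracted, ignores the Canny gradient orientations, uses 8-connectivity for strokes, and peels rings using 4-connectivity. All function names are hypothetical.

```python
from collections import deque

EIGHT = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def trace_strokes(skeleton):
    """Group skeleton pixels (2-D grid of 0/1) into strokes.

    Each stroke is one connected component, listed as (row, col) points in
    breadth-first visiting order starting from its first pixel in raster order.
    """
    h, w = len(skeleton), len(skeleton[0])
    seen = [[False] * w for _ in range(h)]
    strokes = []
    for r in range(h):
        for c in range(w):
            if not skeleton[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            stroke = []
            while queue:
                y, x = queue.popleft()
                stroke.append((y, x))
                for dy, dx in EIGHT:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and skeleton[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            strokes.append(stroke)
    return strokes


def fill_rings(mark):
    """Split a solid mark (2-D grid of 0/1) into boundary rings, outermost first."""
    remaining = {(r, c) for r, row in enumerate(mark) for c, v in enumerate(row) if v}
    rings = []
    while remaining:
        # A pixel is on the current boundary if any 4-neighbour is outside the mark.
        ring = sorted(p for p in remaining
                      if any((p[0] + dy, p[1] + dx) not in remaining for dy, dx in FOUR))
        rings.append(ring)
        remaining -= set(ring)
    return rings
```

Each stroke would then be sent to the arm as a pen-down trajectory, and each ring as one closed loop, so a solid mark is covered by progressively smaller loops.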

VI Experiments

In this section, we first introduce implementation details, and then present a number of experiments to verify the performance of our portrait drawing robot.

VI-A Implementation Details

Datasets. First, we need a large set of content images and a number of style images to train the global sketch synthesis algorithm. To this end, we randomly select 1,000 low-resolution facial pictures from the CelebA dataset [27] and 1,000 high-resolution facial pictures from the CelebA-HQ dataset [24] as the set of content images. All these pictures are resized to a common resolution. We also download 20 real sketches of arbitrary content from the Web as style images. We randomly choose 95% of the facial pictures for training, and use the rest for validation.
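A random 95%/5% split of the 2,000 content images can be reproduced with a seeded shuffle. The sketch below is illustrative only: the file names are placeholders, and the seed and the function name `split_dataset` are assumptions.

```python
import random


def split_dataset(items, train_fraction=0.95, seed=0):
    """Shuffle a copy of `items` and split it into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder names standing in for the 2,000 content images.
content_images = [f"img_{i:04d}.jpg" for i in range(2000)]
train_set, val_set = split_dataset(content_images)
```

With 2,000 images this gives 1,900 training and 100 validation pictures.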

Second, to train our local sketch synthesis algorithm, we extract 280 eyebrow photo-sketch pairs from the dataset released by [6]. All samples have the same size. We randomly choose 210 samples for training, and the remaining 70 samples for validation.

Finally, we download about 200 images, including celebrity faces and universal images of arbitrary content, from the Web for testing. These facial pictures present faces in-the-wild, which are diverse in poses, expressions, etc. We apply our AiSketcher to these images so as to evaluate its performance in practical applications. The corresponding results have been released at:

Training. In the training stage, we use the Adam optimizer, with the same learning rate and batch size, for both the global and local sketch synthesis algorithms. To optimize the global algorithm, we fix the weighting factors in Eq. 7 and run for 160,000 iterations. To optimize the local algorithm, we alternate between one gradient descent step on the discriminator and one step on the generator of the local GAN, again using the Adam optimizer. We train our algorithms on a single NVIDIA GeForce GTX 1080Ti GPU. It takes about 22 hours and 1 hour to train the global and the local algorithms, respectively.

VI-B Comparison with State-of-the-art

In this part, we compare our portrait synthesis algorithm with existing work on portrait drawing robots, i.e. [10], [14], and [1], and with a state-of-the-art (SOTA) portrait synthesis method, i.e. APDrawingGAN [6]. For this purpose, we collect the pictures and synthesised portraits presented in [10], [14], and [1]. Afterwards, we apply the officially released APDrawingGAN and our learned sketch synthesis algorithm to these facial pictures, respectively.

Fig. 4: Comparison with existing portrait drawing methods. Input images are collected from [10], [14], and [1]. The second row shows the corresponding results in the original papers. The third and bottom rows show sketches synthesised by APDrawingGAN [6] and our method, respectively.

As shown in Fig. 4, the portraits produced by [10] are overly simple and present sinuous lines. Portraits produced by both [14] and [1] represent facial structures by stiff and unrealistic lines. Besides, neither of them can properly synthesise the hair. The portraits generated by APDrawingGAN capture the distinctive appearance of a person, but present low-quality elements. Besides, APDrawingGAN cannot produce portraits with a sparse set of continuous lines and marks.

In contrast, the portraits synthesised by our algorithm successfully capture individual characteristics and feelings by using a sparse set of realistic brush-strokes. Moreover, our algorithm produces a primal sketch for the area of the clothes, which further improves the vividness of facial portraits. Note that these facial pictures are low-resolution and blurry. The results shown in Fig. 4 demonstrate the remarkable capacity of our algorithm to generate high-quality portraits.

VI-C Qualitative Evaluation

We further apply our robotic system to a wide range of faces in-the-wild. Such faces are diverse in illumination condition, pose, expression, etc. and may contain occlusions. Fig. 5 shows some examples. Here, we pre-process the photos in Fig. 5a-5d before feeding them into APDrawingGAN or AiSketcher, while the photos in Fig. 5e-5h are directly input into APDrawingGAN or our global synthesis algorithm to produce portraits.

Fig. 5: Sketchy portraits produced by our AiSketcher for national celebrities. From top to bottom are: input images, sketches synthesised by APDrawingGAN [6], sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

As before, APDrawingGAN does not work properly for these photos. In contrast, AiSketcher consistently produces high-quality sketches for all these photos. Our algorithm successfully deals with complex backgrounds (Fig. 5a and 5b), glasses (Fig. 5b), extreme poses (Fig. 5e, 5f, and 5h), and occlusions (Fig. 5f). Notably, given a half-length photo (Fig. 5h), our algorithm is capable of producing a high-quality and abstract portrait.

By comparing our synthesised sketches with the final versions drawn by AiSketcher, we conclude that the robotic arm precisely reproduces the synthesised sketches. Although some details are slightly missing, due to the limited capability of our 3-DOF robotic arm, the final sketches successfully present the individual appearances.

VI-D Generalization Ability

We further apply AiSketcher to universal images, such as pencil-drawing images, cartoon images, etc. In this case, we do not use pre-processing or the local synthesis algorithm, because there might be no human face in a given image. As illustrated in Fig. 6, AiSketcher still produces high-quality sketches. A wide range of tests shows that AiSketcher tends to produce a high-quality sketch for a universal image, unless the image contains a large area of delicate textures or is of low quality (e.g. low-resolution, blurred, or noisy).

Fig. 6: Synthesised sketches for universal images. From top to bottom are: input images, sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

VI-E Time Consumption

In the testing stage, given an input picture, it takes about one second for AiSketcher to synthesise a sketch by using a server with an NVIDIA GeForce GTX 1060 GPU. Afterwards, it takes about 1-3 minutes for the robotic arm to draw it. The total time is about 2 minutes on average. Thus AiSketcher satisfies the time budget in practical scenarios. Recalling the performance of AiSketcher in generating high-quality portraits, we conclude that AiSketcher successfully balances vividness and time consumption.

VI-F Ablation Study

Finally, we present a series of ablation studies to verify the effects of our proposed techniques. For this purpose, we test several model variants derived from our portrait synthesis algorithm, including: (i) the base model, AdaIN [25]; (ii) AdaIN with the self-consistency loss; (iii) AdaIN with both the self-consistency loss and global sparsity; (iv) AdaIN with both the self-consistency loss and the compositional sparsity loss; and (v) our full model, including pre-processing, local synthesis, and post-processing.

Fig. 7: Ablation study on our novel portrait synthesis algorithm. From left to right are: (a) input photo, and sketches synthesised by (b) AdaIN [25], (c) AdaIN w/ self-consistency loss, (d) AdaIN w/ self-consistency loss and global sparsity, (e) AdaIN w/ self-consistency loss and compositional sparsity, and (f) our full model; (g) parsing mask predicted by using MaskGAN [24].

Effects of Self-consistency Loss. Fig. 7b shows that AdaIN typically produces black marks over the hair region, and broken brush-strokes. In contrast, AdaIN with the self-consistency loss produces smooth and continuous brush-strokes with few black marks. The apparent improvement supports our motivation for using the self-consistency loss.

Effects of Compositional Sparsity. Sketches in Fig. 7c present too many brush-strokes, which require a long drawing time. Both Fig. 7d and 7e show that using a sparsity constraint dramatically reduces the number of brush-strokes. However, using a global sparsity reduces the elements representing individual characteristics (e.g. the eyes and eyebrows in Fig. 7d). In contrast, our compositional sparsity successfully avoids this problem. In other words, the compositional sparsity dramatically decreases the drawing time without apparently decreasing the quality of synthesised portraits.

Effects of Local Synthesis. Fig. 7e shows that the globally synthesised eyebrows may present lines with bifurcations. Replacing the eyebrows with the locally synthesised ones results in a better portrait, as shown in Fig. 7f.

Based on the above observations, we conclude that our proposed techniques significantly improve the quality of synthesised portraits.

VII Conclusions

In this work, we present a portrait drawing robot, named AiSketcher. AiSketcher has shown strong performance over a wide range of images. Besides, extensive experimental results demonstrate that AiSketcher achieves a balance between the quality and the drawing time of sketch portraits. By carefully examining the sketches produced by AiSketcher, we find there is still substantial room for improvement. First, our sketch synthesis algorithm may not produce a primal portrait for faces with shadows. It typically produces undesired brush-strokes along the boundaries of a shadow. Illumination normalization is a promising approach for solving this problem. Second, the robotic arm cannot precisely reproduce delicate elements in a synthesised sketch. It would be interesting to optimize both the path planning and the portrait synthesis algorithms by taking into account the limitations of a robot arm. Besides, visual feedback [2, 29] is another promising solution. We will explore these issues in the near future.


  • [1] Gao, Qiaochu, et al., A Robot Portraits Pencil Sketching Algorithm Based on Face Component and Texture Segmentation, IEEE ICIT, 2019: 48-53.
  • [2] Luo, R.C., Hong, M.-J., Chung, P.-C., Robot artist for colorful picture painting with visual control system, IEEE/RSJ IROS, 2016, pp. 2998-3003.
  • [3] Lorenzo S., Stefano Seriani, Alessandro G., Paolo G., Watercolour Robotic Painting: a Novel Automatic System for Artistic Rendering, Journal of Intelligent & Robotic Systems, (2019) 95: 871-88.
  • [4] 2018 International Robotic Art Competition (RobotArt). (2018)
  • [5] Gommel, M., Haitz, M., Zappe, J., Robotlab autoportrait project: Human portraits drawn by a robot.
  • [6] Ran Yi, Yong-Jin Liu, Yu-Kun Lai, Paul L. Rosin, APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs, IEEE CVPR, 2019, pp. 10743-10752.
  • [7] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012: 1097-1105.
  • [8] Jing, Yongcheng, et al., Neural Style Transfer: A Review, IEEE TVCG, 2019: 1-1.
  • [9] Sylvain Calinon, Julien Epiney and Aude Billard, A humanoid robot drawing human portraits, IEEE/RSJ IROS (2005): 161-166.
  • [10] Lin, Chyiyeu, Liwen Chuang, and Thi Thoa Mac, Human portrait generation system for robot arm drawing, IEEE AIM (2009): 1757-1762.
  • [11] Mohammed, Abdullah, Lihui Wang, and Robert X. Gao, Integrated Image Processing and Path Planning for Robotic Sketching, Procedia CIRP (2013): 199-204.
  • [12] Lau, Meng Cheng, et al., A portrait drawing robot using a geometric graph approach: Furthest Neighbour Theta-graphs, IEEE AIM (2012): 75-79.
  • [13] Tresset P, Leymarie F F, Portrait drawing by Paul the robot, Computers & Graphics, 2013, 37(5):348-363.
  • [14] Xue T, Liu Y, Robot portrait rendering based on multi-features fusion method inspired by human painting, IEEE ROBIO, 2017.
  • [15] A. A. Efros and T. K. Leung, Texture synthesis by nonparametric sampling, In ICCV, 1999.
  • [16] L. A. Gatys, A. S. Ecker, and M. Bethge, Image style transfer using convolutional neural networks, In CVPR, 2016.
  • [17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, In NIPS, 2014.
  • [18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, Image-to-image translation with conditional adversarial networks, In CVPR, 2017.
  • [19] Gui, Jie, et al., A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications, arXiv:2001.06937, pp. 1-28 (2020).
  • [20] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, A comprehensive survey to face hallucination, IJCV, vol. 106, no. 1, pp. 9-30, 2014.
  • [21] Chen, Chaofeng, Xiao Tan, and Kwanyee K. Wong, Face Sketch Synthesis with Style Transfer Using Pyramid Column Feature, WACV, 2018: 485-493.
  • [22] J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs, IEEE Trans. CYBERN, DOI: 10.1109/TCYB.2020.2972944, 2020.
  • [23]
  • [24] Lee, Cheng-Han, Liu, Ziwei, Wu, Lingyun, Luo, Ping, MaskGAN: Towards Diverse and Interactive Facial Image Manipulation, arXiv preprint arXiv:1907.11922, 2019.
  • [25] Huang, Xun, and Serge Belongie, Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization, In ICCV, 2017: 1510-1519.
  • [26] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, In ICLR, 2015.
  • [27] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, ICCV, 2015, pp. 3730-3738.
  • [28]
  • [29] Berio, D., Calinon, S., Leymarie, F.F., Learning dynamic graffiti strokes with a compliant robot, IEEE/RSJ IROS, 2016, pp. 3981-3986.