Line Artist: A Multiple Style Sketch to Painting Synthesis Scheme

03/18/2018 ∙ by Jinning Li, et al. ∙ Shanghai Jiao Tong University

Drawing a beautiful painting has been a dream of many people since childhood. In this paper, we propose a novel scheme, Line Artist, to synthesize artistic style paintings from freehand sketch images, leveraging the power of deep learning and advanced algorithms. Our scheme includes three models. The Sketch Image Extraction (SIE) model is applied to generate the training data. It consists of smoothing reality images and extracting pencil sketches. The Detailed Image Synthesis (DIS) model trains a conditional generative adversarial network to generate detailed real-world information. The Adaptively Weighted Artistic Style Transfer (AWAST) model is capable of combining multiple style images with a content image using the VGG19 network and the PageRank algorithm. The appealing artistic images are then generated by iterative optimization. Experiments are conducted on the Kaggle Cats dataset and the Oxford Buildings Dataset. Our synthesis results are shown to be artistic, beautiful and robust.




1 Introduction

Deep neural networks and generative adversarial networks (GANs) have been applied to many scientific and engineering fields. However, research on artistic production is still limited. Many people want to paint beautiful paintings but feel discouraged because they are not good at drawing and coloring. We therefore consider creating a machine learning method to assist people in drawing artistic paintings.

In this paper, we propose a new scheme, Line Artist, to paint like a well-known painter. The only thing the user needs to do is draw some sketch lines. Line Artist then draws a meaningful and elegant painting for the user, like the images shown in Fig.1. Everybody can become a skilful painter!

Much work has been done on transforming a real-world image into an artistic one. In our work, however, the inputs are sketch images, which contain much less information than real photos, because we aim to build a more convenient artistic style generating system for users. This task is much harder than the others, since we need to produce more information from a limited input.

Figure 2: Overview of Line Artist. Left: SIE model used to generate the dataset for training. Middle: DIS model based on cGAN used to add informative details into sketches. Right: AWAST model for stylizing images with multiple style samples.

In order to generate more information, our scheme employs GANs to extract the information from the sketch drafts and generate detailed images with more information, as shown in the middle of Fig.2. Then, we propose a novel multi-style transfer algorithm based on the Artistic Style Transfer [1] algorithm and PageRank [2] to transform our synthesized detailed images into artistic ones.

The first challenge is the dataset: there are few usable datasets of sketch images and their corresponding real-world images. However, datasets are necessary in order to train a supervised system that is capable of generating a more informative detailed image from a sketch image. Inviting volunteers to label a new dataset would cost a large amount of time and human resources. To solve this problem, we introduce the Sketch Image Extraction (SIE) model to synthesize sketch-like images, which are very similar to real freehand sketches, and build the dataset efficiently. The SIE model is shown on the left side of Fig.2.

To extract the sketch more accurately in the SIE model, we apply the L0-Smooth [3] algorithm to smooth the initial image. This process makes the edges more distinct and wipes out unnecessary fine textures. Then, we adopt the pencil drawing [4] method to extract the sketch images from the smoothed ones. Through the SIE model, we obtain a dataset containing pairs of generated sketch images and their corresponding real-world images.

To generate informative detailed images from given sketch images, we introduce the Detailed Image Synthesis (DIS) model, whose procedure is shown in the middle of Fig.2. We use the dataset generated by the SIE model to train a system that receives the extracted sketch images and outputs detailed images by generating more information. In this paper, we adapt the Pix2Pix [5] model with a new generator to realize this synthesis operation. Experiments show that the DIS model can synthesize good results even with the computer-generated dataset from the SIE model.

Finally, the Adaptively Weighted Artistic Style Transfer (AWAST) model, shown on the right of Fig.2, addresses the challenge of transferring a selected style from multiple painting samples. The Artistic Style Transfer algorithm proposed in [1] is able to combine two images using both low-level and high-level features extracted by a pretrained VGG19 network [6]. In this paper, we introduce the Adaptive Style Weight (ASW) based on a feature similarity network and the PageRank algorithm. Using ASW, multiple painting images can be combined more naturally and beautifully with the content. After these processes, a colorful painting is obtained from just a line sketch drawn by the user.

The contributions of this paper are summarized below:

  • A delicate sketch image extracting scheme and two elaborate datasets containing pairs of real-world images and their corresponding sketch images.

  • An efficient detailed image synthesis model that produces more real-world details and patterns from input sketch images.

  • A novel algorithm to adaptively combine multiple painting samples with the content and synthesize appealing artistic images.

2 Related Works

Sketch Extraction

Many studies [7, 8, 9] focus on edge extraction based on algebraic algorithms such as the Sobel operator and fuzzy mathematics. These traditional edge extraction methods are good substitutes for sketches and usually run fast. However, the trends and continuity of the extracted edges are not as natural as those of man-made ones. In [10], a random forest based method is proposed to detect edges. In [11], a CNN-based edge detection algorithm is proposed, which performs better than the traditional ones.

In [12], an RSRCNN is proposed to extract roads from aerial images, which can also be applied to sketch extraction. However, CNN-based methods rely heavily on the training datasets and require a lot of resources to train a network. In [13], a face sketch synthesis scheme based on greedy search is proposed; this technique can also synthesize sketches of other objects. However, its effect is more like a grayscale transfer than sketch extraction. In [14, 3], efficient image smoothing methods are proposed based on wave pattern, which is helpful preprocessing for extraction. In [4], Lu proposes a fast scheme to synthesize pencil drawing sketch images. The result is very appealing.

Detail Synthesis

In [15, 5, 16], methods extended from GANs are used to synthesize detailed images with more information from given materials. These models are usually easier to train, while their results are somewhat blurrier. Chen proposed end-to-end cascaded refinement networks in [17] to synthesize large realistic images from semantic layouts, with high resolution and accuracy. However, this method is highly dependent on the training datasets. In [18], an image synthesis model based on the Laplacian pyramid is proposed, which has a lower computational complexity. In [19], Context Encoders, a GAN-based model, is proposed to generate more information from the surroundings, to which artistic style could be applied.

Style Transfer

Most research on style transfer focuses on the combination of a content image and a single style. In [20], the initial content is transformed into an oil painting style by pixel-level analysis. Gatys et al. [1] proposed a style transfer scheme based on CNN whose results are quite appealing. The model in [21] combines Markov random fields with [1]. In [22], a faster feed-forward style image synthesis network is proposed. Texture synthesis is used to transfer image style in [23]. In this paper, multiple styles are combined to synthesize appealing results using PageRank [2] on an undirected graph [24].

3 Methodology of Line Artist

Our scheme includes three models: the Sketch Image Extraction (SIE) model, the Detailed Image Synthesis (DIS) model, and the Adaptively Weighted Artistic Style Transfer (AWAST) model. The overall goal of our scheme is to generate a synthesized painting in an artistic style from a freehand sketch image. The overview of our scheme is shown in Fig.2.

3.1 Sketch Image Extraction Model

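The SIE model receives a reality image $I_r$. After the L0-Smooth process, which is denoted by $f_{L_0}$, $I_r$ is transformed into the smooth image $I_m$. Then, the sketch extraction algorithm $f_{sk}$ receives the smooth image $I_m$ and produces the sketch image $I_s$:

$$I_s = f_{sk}(I_m) = f_{sk}\big(f_{L_0}(I_r)\big), \quad I_r \in \mathcal{R},\ I_s \in \mathcal{S},$$

where $\mathcal{R}$ and $\mathcal{S}$ are the sets containing $I_r$ and $I_s$. The set of pairs of $I_r$ and $I_s$ is then denoted by

$$\mathcal{P} = \{ (I_r, I_s) \}.$$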

Figure 3: The procedure and results of the SIE model. The upper left side is the original image. (a) and (b) show the differences of the Canny algorithm before and after smoothing. (c) is the result of Pencil Sketch Extraction.

3.1.1 Image Smoothing

A challenge of sketch extraction is that real-world images contain complicated edges and patterns, for example, the ginkgo tree in Fig. 3. The leaves have so many patterns that the edges extracted in (a) do not resemble human sketches. When drawing a picture, most people cannot draw such subtle details. Instead, they only draw the overall edges and simple patterns. So, the image smoothing process is necessary to make the dataset generated by the SIE model more natural and similar to a man-made one.

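In this paper, we adopt L0-Smooth [3], an image smoothing algorithm via $L_0$ gradient minimization. We use $S$ to denote the pixel map of the objective smooth image and $I$ to denote the input image. For every pixel $p$ in $S$, the gradient is $\nabla S_p = (\partial_x S_p, \partial_y S_p)^T$. The gradient measure is defined as:

$$C(S) = \#\{\, p \mid |\partial_x S_p| + |\partial_y S_p| \neq 0 \,\}, \quad (1)$$

which counts the number of pixels $p$ in $S$ where $|\partial_x S_p| + |\partial_y S_p| \neq 0$.

The objective function of the optimization process is:

$$\min_{S} \Big\{ \sum_p (S_p - I_p)^2 + \lambda\, C(S) \Big\}. \quad (2)$$

In the L0-Smooth algorithm, auxiliary variables $h_p$ and $v_p$ are introduced to calculate the minimization of Eqn.2:

$$\min_{S,h,v} \Big\{ \sum_p \Big[ (S_p - I_p)^2 + \beta \big( (\partial_x S_p - h_p)^2 + (\partial_y S_p - v_p)^2 \big) \Big] + \lambda\, C(h, v) \Big\}, \quad (3)$$

with $C(h, v) = \#\{\, p \mid |h_p| + |v_p| \neq 0 \,\}$. Then, with $S$ initialized by $I$, the iteration converges to the smooth image.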
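To make the gradient measure concrete, the following is a minimal sketch of how the number of non-zero gradients and the resulting smoothing objective could be evaluated for a candidate smooth image. It is not the solver of [3]; the function names, the forward-difference boundary handling and the `lam` parameter name are assumptions made for illustration.

```python
import numpy as np

def forward_gradients(image):
    """Forward differences along x (columns) and y (rows); the last column/row is zero."""
    image = np.asarray(image, dtype=float)
    dx = np.zeros_like(image)
    dy = np.zeros_like(image)
    dx[:, :-1] = image[:, 1:] - image[:, :-1]
    dy[:-1, :] = image[1:, :] - image[:-1, :]
    return dx, dy

def gradient_count(image):
    """Number of pixels whose gradient |dx| + |dy| is non-zero."""
    dx, dy = forward_gradients(image)
    return int(np.count_nonzero(np.abs(dx) + np.abs(dy)))

def l0_objective(smooth, original, lam):
    """Squared data term plus lam times the number of non-zero gradients."""
    smooth = np.asarray(smooth, dtype=float)
    original = np.asarray(original, dtype=float)
    data_term = np.sum((smooth - original) ** 2)
    return float(data_term + lam * gradient_count(smooth))

# Usage: a 3x4 image with one vertical step has one non-zero gradient per row.
step = np.array([[0.0, 0.0, 1.0, 1.0]] * 3)
print(gradient_count(step), l0_objective(step, step, lam=0.5))
```

A larger `lam` makes every remaining edge more expensive, which is why the smoothed image keeps only the dominant contours.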

3.1.2 Canny Edge Extraction

With the smooth image extracted, we can extract the edge image with the Canny operator. Here we adopt the Canny operator with the Sobel method and double thresholds. The result is shown in Fig.3(b). We can see that the overall shape of the tree is extracted without unexpected noise, which is much more like a human sketch. However, there are still some problems: the edges are not smooth and natural enough. When people draw, different shades are applied to different lines and areas, but the Canny algorithm does not produce shade information.
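As an illustration of the two ingredients named above, the sketch below computes a Sobel gradient magnitude and applies a double threshold. It is a simplified stand-in, not a full Canny detector: non-maximum suppression and hysteresis edge tracking are omitted, and the function names and threshold values are assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Correlate a 2-D image with a 3x3 kernel using edge-replicated padding."""
    image = np.asarray(image, dtype=float)
    height, width = image.shape
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros((height, width))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + height, j:j + width]
    return out

def sobel_magnitude(image):
    """Gradient magnitude from horizontal and vertical Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

def double_threshold(magnitude, low, high):
    """Label pixels: 2 = strong edge, 1 = weak edge, 0 = no edge."""
    labels = np.zeros(magnitude.shape, dtype=np.uint8)
    labels[magnitude >= low] = 1
    labels[magnitude >= high] = 2
    return labels

# Usage: a vertical step edge yields strong responses next to the step.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
labels = double_threshold(sobel_magnitude(img), low=1.0, high=3.0)
print(labels)
```

In a complete Canny pipeline, weak pixels would be kept only when connected to strong ones; that linking step is what the double threshold prepares.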

3.1.3 Pencil Sketch Extraction

Although a sketch-like image is already generated using the Canny operator in Section 3.1.2, the lines in the generated image are not natural enough compared with real freehand sketches drawn by humans.

To solve this problem, we use the algorithm proposed in [4] to produce pencil sketches from real-world images. We mainly use the line drawing with strokes method, since we do not need the pencil to draw shading.

There are some issues we cannot ignore when building the line drawing with strokes method. One is that artists always draw lines with breaks rather than long unbroken lines. Another is that there are always crosses at the junction of two lines. The method of drawing lines from strokes is built on these two observations. When drawing strokes at a point, the direction, length, width and shade are determined through a pixel classification and a linking process based on a unified convolution framework.

Classification

We first transform the input image into its grayscale version. Then we compute its gradients, yielding the magnitude:

$$G = \big( (\partial_x I)^2 + (\partial_y I)^2 \big)^{1/2},$$

where $I$ is the grayscale input, and $\partial_x$ and $\partial_y$ are gradient operators in the two directions, implemented by forward difference. Then we use local information to do the classification. We choose a number of evenly spaced directions (depending on the case, this number may be changed) and denote the line segments as $\{L_i\}$, with $i$ indexing the directions; the length of $L_i$ is a fixed fraction of the image height or width. The response map for a certain direction is

$$G_i = L_i * G,$$

where $*$ is the convolution operator. Finally, the classification is performed by choosing the maximum value among the response maps in all directions. This step is written as

$$C_i(p) = \begin{cases} G(p) & \text{if } \arg\max_{j} \{ G_j(p) \} = i, \\ 0 & \text{otherwise,} \end{cases}$$

where $p$ refers to the pixels and $C_i$ is the magnitude map for direction $i$.

Line Shaping

Given the map set $\{C_i\}$, we also generate lines by convolution:

$$S' = \sum_i (L_i \otimes C_i).$$

Convolution aggregates nearby pixels along direction $L_i$, which links edge pixels that are not even connected in the original gradient map.

Using these two methods, we get freehand-style sketches, which are more like what ordinary people draw than those produced by the Canny method.
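The classification and line shaping steps can be sketched as follows. This is an illustrative reading of the procedure, not the implementation of [4]: the number of directions (`n_dirs`), the angular spacing of 180/`n_dirs` degrees, the kernel length and all function names are assumptions, and the filter is written as a correlation, which coincides with convolution for line kernels that are symmetric about their centre.

```python
import numpy as np

def line_kernel(length, angle_deg):
    """Binary kernel holding a centred line segment at the given angle."""
    kernel = np.zeros((length, length))
    c = (length - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, 2 * length):
        row = int(round(c - t * np.sin(theta)))
        col = int(round(c + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel

def filter_same(image, kernel):
    """Correlate with an odd-sized square kernel, zero padding, same output size."""
    size = kernel.shape[0]
    height, width = image.shape
    padded = np.pad(image, size // 2, mode="constant")
    out = np.zeros((height, width))
    for i in range(size):
        for j in range(size):
            if kernel[i, j] != 0.0:
                out += kernel[i, j] * padded[i:i + height, j:j + width]
    return out

def stroke_map(gray, n_dirs=8, length=5):
    """Return the per-pixel winning direction and the aggregated stroke map."""
    gray = np.asarray(gray, dtype=float)
    # Gradient magnitude by forward difference.
    dx = np.zeros_like(gray)
    dy = np.zeros_like(gray)
    dx[:, :-1] = gray[:, 1:] - gray[:, :-1]
    dy[:-1, :] = gray[1:, :] - gray[:-1, :]
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    # Response of the magnitude map to each oriented line segment.
    kernels = [line_kernel(length, 180.0 * i / n_dirs) for i in range(n_dirs)]
    responses = np.stack([filter_same(magnitude, k) for k in kernels])
    best = np.argmax(responses, axis=0)
    # Classification: each pixel keeps its magnitude only in its winning direction;
    # line shaping: spread that magnitude along the same direction and sum up.
    strokes = np.zeros_like(magnitude)
    for i, k in enumerate(kernels):
        class_map = np.where(best == i, magnitude, 0.0)
        strokes += filter_same(class_map, k)
    return best, strokes

# Usage: a vertical edge is classified into the vertical direction (index 4 of 8).
gray = np.zeros((9, 9))
gray[:, 5:] = 1.0
best, strokes = stroke_map(gray)
print(best[4, 4], strokes[4, 4])
```

Because each pixel contributes only along its winning direction, the second filtering pass extends strokes along edges instead of blurring them isotropically.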

3.2 Detailed Image Synthesis Model

In the DIS model, we adapt the conditional Generative Adversarial Network (cGAN) in [5]. This architecture fits our work because cGAN runs fast and its precision is high enough for artistic synthesis. In the training process, the cGAN receives the pairs of reality and sketch images generated in Section 3.1 to train the model. In the test process, the DIS model receives a real freehand sketch image and generates a detailed informative image.

The loss function of the cGAN is:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))],$$

where $G$ represents the generator, $D$ represents the discriminator, $x$ is the input sketch image, $y$ is the corresponding reality image, and $z$ is a random noise vector.

In [5], the generator is a U-Net [25], which is similar to the encoder-decoder architecture. However, we find that the U-Net model often leads to a crash, and the results become meaningless noise images with strange patterns.

We propose a new generator architecture named Fantasy-Net, which is shown in Fig. 4. Fantasy-Net combines the advantages of the U-Net and the Residual-Net [26]. We attribute the crash problem to the simple skip connection in the U-Net. Intuitively, the simple skip connection leads to some disorder in the convolutional layers. The residual blocks offer both skip connections and further encoding, so we use residual blocks to improve the skip connection in the U-Net.

Figure 4: The architecture of Fantasy-Net. The blue arrows are convolutional operations, the red ones are pooling operations, and the green ones are transposed convolutional operations. In the middle are many residual blocks consisting of skip connections and weight layers.

Experiments show that these residual blocks do not increase the running time much, and the crash problem is solved. The results of Fantasy-Net are also less blurry and the colors are more natural.

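In order to make the results closer to reality, we augment the cGAN objective with a distance term inspired by [19]. By doing this, the generator is able to generate images more similar to the ground truth.

The compound objective is:

$$\mathcal{L}(G, D) = \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{dist}(G),$$

where $\mathcal{L}_{dist}(G)$ is the distance between the generated image and the ground truth, and $\lambda$ weights the two terms. Then, the objective generator obtained by the optimization is:

$$G^* = \arg\min_G \max_D \mathcal{L}(G, D).$$

With the objective generator $G^*$ obtained by training, we can generate the detailed image by applying $G^*$ to an input sketch.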
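The compound objective can be illustrated numerically. The sketch below evaluates the adversarial term from discriminator probabilities and adds a weighted distance term between generated and ground-truth images. The choice of norm (`order=1` here, which can be set to 2), the weight `lam` and its default value, and all function names are assumptions for illustration, not values taken from this paper.

```python
import numpy as np

def cgan_value(d_real, d_fake, eps=1e-12):
    """Mean log D on real pairs plus mean log(1 - D) on generated pairs."""
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def distance_term(generated, target, order=1):
    """Mean absolute (order=1) or squared (order=2) pixel difference."""
    diff = np.abs(np.asarray(generated, dtype=float) - np.asarray(target, dtype=float))
    return float(np.mean(diff ** order))

def compound_objective(d_real, d_fake, generated, target, lam=100.0, order=1):
    """Adversarial value plus lam times the distance to the ground truth."""
    return cgan_value(d_real, d_fake) + lam * distance_term(generated, target, order)

# Usage: an undecided discriminator (0.5 everywhere) and a perfect reconstruction.
target = np.ones((4, 4))
print(compound_objective([0.5], [0.5], target, target))
```

The generator is trained to lower this value while the discriminator is trained to raise the adversarial part; the distance term only affects the generator.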

3.3 Adaptively Weighted Artistic Style Transfer

The Adaptively Weighted Artistic Style Transfer (AWAST) model receives the detailed informative image $I_d$ synthesized in Section 3.2 and a given set of artistic style samples $A = \{a_1, \dots, a_n\}$. The features of the detailed image and the artistic samples are extracted by a pretrained VGG19 [6] network. A novel weight calculating algorithm based on PageRank is used to combine the features. Then, a random noise image $x$ is optimized into our objective image, which combines both the low-level and high-level features of $I_d$ and $A$.

Figure 5: The AWAST model. A pretrained VGG19 network is used to downsample the style and content images. Undirected graphs are built according to the similarity between the feature maps of different style images. ASWs are calculated with PageRank algorithm and applied to weighting the features.

The first step of AWAST is to feed the detailed image $I_d$ and every style sample $a_i$ into the pretrained VGG19 network, inspired by the artistic style algorithm [1]. After this process, we obtain an activated feature map on each convolutional layer $l$. The activated feature maps represent different levels of features. In the early convolutional layers, more low-level information about colors, patterns and details is extracted, while high-level information such as distribution and shape is stored in the later convolutional layers.

We denote the set of activated feature maps for every artistic sample $a_i$ as $\{F^l(a_i)\}$. Similarly, we denote the extracted features of the informative image $I_d$ as $\{F^l(I_d)\}$ and those of the noise image $x$ as $\{F^l(x)\}$.

Our approach is to optimize $x$ by iteration so that it is similar to both $I_d$ and the style samples in terms of the extracted features. So, we define the loss function of the optimization as:

$$\mathcal{L}(x) = \alpha\, \mathcal{L}_{c}(x, I_d) + \beta \sum_{i} \sum_{l} w_i^l\, \mathcal{L}_{s}^{l}(x, a_i),$$

where $\mathcal{L}_c$ is the content loss between $x$ and $I_d$, $\mathcal{L}_s^l$ is the style loss between $x$ and $a_i$ at layer $l$, $w_i^l$ is the adaptive style weight (ASW) of sample $a_i$ and layer $l$, $\alpha$ is the overall content weight and $\beta$ is the overall style weight.

The content loss is defined as the squared error loss between $F^l(x)$ and $F^l(I_d)$:

$$\mathcal{L}_c(x, I_d) = \sum_{l} \sum_{j,k} \big( F^l_{jk}(x) - F^l_{jk}(I_d) \big)^2.$$

The style loss is defined with the uncentered covariance (Gram) matrix at each layer $l$, $G^l_{jk}(\cdot) = \sum_m F^l_{jm}(\cdot)\, F^l_{km}(\cdot)$. Our optimization problem is converted to minimizing the difference between the Gram matrices of $x$ and $a_i$. We define the style loss as

$$\mathcal{L}_s^l(x, a_i) = \gamma_l \sum_{j,k} \big( G^l_{jk}(x) - G^l_{jk}(a_i) \big)^2,$$

where $\gamma_l$ is the normalization coefficient that makes $\mathcal{L}_s^l$ compatible with $\mathcal{L}_c$.
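As a small worked example of the Gram-based style loss, the sketch below computes Gram matrices from feature maps of shape (channels, height, width) and compares them by squared error. The normalization $1 / (4 N^2 M^2)$, with $N$ channels and $M$ spatial positions, follows the convention of [1] and is an assumption here, as are the function names; the feature maps are plain arrays rather than real VGG19 activations.

```python
import numpy as np

def gram_matrix(features):
    """Uncentered covariance of channels: (C, H, W) -> (C, C)."""
    features = np.asarray(features, dtype=float)
    channels = features.shape[0]
    flat = features.reshape(channels, -1)
    return flat @ flat.T

def content_loss(feat_x, feat_content):
    """Squared error between two feature maps of the same shape."""
    diff = np.asarray(feat_x, dtype=float) - np.asarray(feat_content, dtype=float)
    return float(np.sum(diff ** 2))

def style_loss(feat_x, feat_style):
    """Normalized squared error between the Gram matrices of two feature maps."""
    feat_x = np.asarray(feat_x, dtype=float)
    channels, height, width = feat_x.shape
    norm = 1.0 / (4.0 * channels ** 2 * (height * width) ** 2)
    diff = gram_matrix(feat_x) - gram_matrix(feat_style)
    return float(norm * np.sum(diff ** 2))

# Usage with toy feature maps of 2 channels and 2x2 spatial size.
ones = np.ones((2, 2, 2))
zeros = np.zeros((2, 2, 2))
print(gram_matrix(ones), style_loss(ones, zeros), content_loss(ones, zeros))
```

Because the Gram matrix sums over spatial positions, the style loss ignores where a pattern occurs and only measures which channels fire together.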

Recall that different style samples can represent different features, in other words, the painting skills of different artists. For example, in Pablo Picasso's paintings, the high-level features contain more sharp, irregular geometry and lines, while the low-level features include deeper colors and smooth patterns. For Claude Monet's paintings, the features are almost the opposite.

To make the artistic style painting synthesized from multiple painting samples more delicate and appealing, we introduce the adaptive style weight (ASW) to balance the style features from different samples and layers. Since multiple painting samples are input to our system, various styles and skills from different artists pile up. In order to emphasize the most commonly used skills, fashions and color preferences, and to weaken minor and unimportant skills of these artists, we use ASW to balance their importance.

To calculate the ASW, we build a style similarity network and apply the undirected PageRank algorithm [27]. We define the difference matrix between the feature maps of different style samples by the squared error:


We define the similarity matrix as:


The similarity matrix is symmetric and full-rank, and its diagonal elements are all zero. We build a fully connected undirected graph according to the similarity matrix, in which each node is a style sample and the weight of each edge is the corresponding similarity. The weight of a node in this graph represents the importance of this node in the PageRank algorithm.

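The iteration formulation of undirected PageRank is:

$$PR(u) = \frac{1 - d}{N} + d \sum_{v \in B(u)} \frac{w_{uv}}{\sum_{k \in B(v)} w_{vk}}\, PR(v),$$

where $u$ and $v$ are nodes in the graph, $d$ is the damping factor of the PageRank algorithm, $N$ is the number of nodes, $B(u)$ is the set of nodes connected with $u$, and $w_{uv}$ is the weight of the edge between $u$ and $v$.

After the PageRank algorithm has converged, the matrix of node weights is obtained. We then define the ASW with a sigmoid mapping: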


By normalization, the ASW is obtained.
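The weighting procedure can be sketched as below: a weighted PageRank on a symmetric similarity matrix, followed by a sigmoid mapping and normalization. The PageRank update is the standard weighted form for undirected graphs; the exact sigmoid mapping used for the ASW is not recoverable from the text, so the standardized sigmoid in `adaptive_style_weights`, the `sharpness` parameter, the default damping value and all function names are assumptions for illustration only.

```python
import numpy as np

def weighted_pagerank(similarity, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank on an undirected graph whose edge weights are similarities."""
    weights = np.asarray(similarity, dtype=float)
    n = weights.shape[0]
    strength = weights.sum(axis=0)           # total edge weight of each node
    transition = weights / strength[np.newaxis, :]  # column v split among neighbours
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * transition @ rank
        if np.max(np.abs(new_rank - rank)) < tol:
            return new_rank
        rank = new_rank
    return rank

def adaptive_style_weights(rank, sharpness=1.0):
    """Hypothetical mapping: sigmoid of standardized ranks, normalized to sum to 1."""
    rank = np.asarray(rank, dtype=float)
    z = sharpness * (rank - rank.mean()) / (rank.std() + 1e-12)
    squashed = 1.0 / (1.0 + np.exp(-z))
    return squashed / squashed.sum()

# Usage: samples 0 and 1 are similar to each other, sample 2 is an outlier.
similarity = np.array([[0.0, 1.0, 0.1],
                       [1.0, 0.0, 0.1],
                       [0.1, 0.1, 0.0]])
rank = weighted_pagerank(similarity)
print(rank, adaptive_style_weights(rank))
```

In this toy graph the outlier receives the lowest rank and therefore the smallest weight, which matches the intent of weakening unusual style samples.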

Then, by minimizing the overall loss defined above, the random noise image converges to the objective image.

4 Experiments

We run experiments on our scheme and several baselines to evaluate the quality of the results. An experiment on AWAST is also conducted for comparison with the result of Gatys et al.

Datasets: We use the cats dataset [28] on Kaggle and the Oxford buildings dataset [29].

For the cats dataset, we first discard the images that mainly contain human beings, buildings or other things rather than cats, and then choose images from the remainder to train our model.

For the Oxford Buildings Dataset, we randomly choose building images from the dataset to carry out the training process.

For the style samples in the AWAST model, we mainly choose style images from Google for the ink painting style, the Picasso style, the Van Gogh style, the watercolor style and the Ukiyo-e style. For the Onmyoji style, we choose style images from a popular mobile game named Onmyoji.

Environment: All the experiments are conducted on a PC device (Intel(R) Core(TM) i7-6900K 3.2GHz, 16GB memory, NVIDIA GeForce(R) GTX 1080 Ti) and are implemented in Python 3.6.2.

4.1 Training

Because our model is the combination of the three models mentioned above, we now describe the training process in three main steps corresponding to the three models.

In the SIE model, for smoothing, we set the parameters lambda to be , kappa to be . For pencil sketching on the cat dataset, we set the length of the convolution line to be , the width of the stroke to be , and the number of directions to be . For the building dataset, we set the length of the convolution line to be , the width of the stroke to be , and the number of directions to be . Because some pictures in the two datasets are too complex for freehand drawing, we deleted these complex pictures.

The DIS model receives the training datasets from SIE, which are pairs of sketch images and reality images. These images are resized to to accelerate training. Then, we set the learning rate , and the generator uses the unet 256 network. The network is initialized with the Xavier initializer. Both the input and output channels are set to be .

The AWAST model receives the -size images synthesized by the DIS model and the style image samples described in Section 4. The overall content and style weights in the AWAST loss are set as and . The damping factor of PageRank is set to be and the convergence accuracy of PageRank is . The learning rate is and the parameters and of the Adam optimizer are and . The number of optimization iterations is set to be .

4.2 Baselines and Analysis

4.2.1 Baselines

We set several baselines to show that our way of assembling the models works best.

  • Baseline 1: Real-world image → Canny edges → Train the model → Artistic image. Result is shown in Fig.6.

  • Baseline 2: Real-world image → Smoothing image → Sketch image → Train the model → Artistic image. Result is shown in Fig.7.

  • Baseline 3: Real-world image → Smoothing image → Canny image → Train the model → Detailed image → Artistic image. Result is shown in Fig.8.

  • Our scheme: Real-world image → Smoothing image → Pencil sketch → Train the model → Detailed image → Artistic image. Result is shown in Fig.9.

Figure 6: Baseline 1: Extracting edges with the Canny algorithm and feeding AWAST directly. Panels: (a) Canny edge, (b) Onmyoji, (c) Ukiyo-e, (d) Van Gogh, (e) Chinese.

Figure 7: Baseline 2: Smoothing, extracting the pencil sketch, and feeding AWAST directly. Panels: (a) Sketch, (b) Onmyoji, (c) Ukiyo-e, (d) Van Gogh, (e) Chinese.

Figure 8: Baseline 3: Smoothing, Canny, DIS, and feeding AWAST with the detailed image. Panels: (a) Detailed image, (b) Onmyoji, (c) Ukiyo-e, (d) Van Gogh, (e) Chinese.

Figure 9: Our scheme: Smoothing, extracting the pencil sketch, DIS, and feeding AWAST with the detailed image. Panels: (a) Detailed image, (b) Onmyoji, (c) Ukiyo-e, (d) Van Gogh, (e) Chinese.

4.2.2 Analysis of baselines

As shown in Fig.6, the artistic image synthesized by Baseline 1 is quite blurry. There are many noisy points that fail to converge to the optimization objective, and the features of the content are not extracted successfully, so some areas just copy the pattern of the style samples. For example, in Fig.6(c), near the cat's feet, the pattern looks just like the tide that comes from the style samples of the Ukiyo-e style. This is because the information in the edge picture is not enough for AWAST to synthesize the objective result, and the same holds for people's freehand sketches. So, the information generating process, DIS, is needed.

In Fig.7, we can see that with smoothing and pencil sketch extraction, the results are much better. The pattern copying problem of Baseline 1 is alleviated and the shape of the cat looks clearer. At the same time, the styled image is not as blurry as in Baseline 1. These improvements suggest that smoothing and pencil sketch extraction are quite helpful for extracting the key information in the reality image while weakening the noise. Comparing Fig.6(a) and Fig.7(a), the sketch in Baseline 2 is more like a human freehand sketch in continuity and direction.

Fig.8 presents the final results of Baseline 3, which includes smoothing, Canny, DIS, and AWAST. Compared to Baseline 2, the results are much more beautiful. The shape of the cat is clearer and the problem of pattern copying is solved. The shading of the styles is more natural, and the details are clearer. In Fig.8(e), for example, the eyes of the cat become yellow, which is different from the color of its body. This means the detail of the cat's eyes is noticed and emphasized by the AWAST model.

Our final scheme is shown in Fig.9, including smoothing, pencil sketch extraction, DIS, and AWAST. The performance improves a lot compared to Baseline 3. For example, in Fig.9(e), the eyes of the cat are much more vivid and the patterns of fur and whiskers on its head and body are detailed.

In Fig.9(b), the style transfer works well and does not break the shape of the cat, while in Fig.8(b), the style mixes the cat's body with the environment.

4.2.3 Analysis of AWAST

We use the Ukiyo-e style images in Fig.11 to analyze the performance of AWAST. One of these images is largely covered in black. The result of the experiment after a fixed number of iterations is shown in Fig.10. The result of Gatys et al. in Fig.10(b) contains many black areas, which come from the black patterns of that image. This is because their blending method simply takes the average of the multiple features. Our method in Fig.10(c) is steadier and more natural, because the ASW based on the PageRank algorithm weakens the weight of unusual styles such as the black image and emphasizes the proper styles in the other images.

Figure 10: Performance analysis of AWAST compared to Gatys et al. Panels: (a) Reality (Golden Gate), (b) Result of Gatys et al., (c) Result of Ours.

Figure 11: Images of Ukiyo-e style to analyze AWAST. There are lots of black areas in one of the images.

4.3 Test of man-made sketch

Most of the experiments above are based on the test set generated by the SIE model. However, our general objective is to transform a real man-made sketch into an artistic painting. So we take a piece of paper and a pencil and draw a cat sketch by hand, which is shown in Fig.12(a). This is quite easy in practice. We simply take a photo with a mobile phone and feed it into the DIS and AWAST models. The result is shown in Fig.12.

Figure 12: Test result of a real freehand sketch of a cat. (a) Sketch: the man-made sketch. (b) Detailed: the detailed image synthesized by DIS. (c)-(f) are the artistic paintings in the Onmyoji, Ukiyo-e, Van Gogh and Chinese styles.

5 Conclusions

To be an artist has always been the dream of many people. In this paper, we propose Line Artist to synthesize appealing paintings from freehand sketches. To achieve this goal, we propose the SIE model to smooth the images and extract sketches to build new datasets. The DIS model, based on GAN, allows sketches drawn by users to gain more details and look natural. The AWAST model uses a novel algorithm based on PageRank to adaptively combine styles from multiple painting samples. Our results are shown to be vivid, artistic, and stable across different styles. There are also some issues that need to be improved. In the DIS and AWAST models, the synthesis is not real-time. The quality of the detailed images synthesized by DIS is also not fully satisfactory, because DIS has to generate more information under limited conditions. However, Line Artist still has the potential to become a powerful entertainment app and an assistant for artists.