Fast Preprocessing for Robust Face Sketch Synthesis

08/01/2017, by Yibing Song et al.

Exemplar-based face sketch synthesis methods usually face the challenge that input photos are captured under lighting conditions different from those of the training photos. The critical step causing the failure is the search for similar patch candidates for an input photo patch. Conventional illumination-invariant patch distances can be adopted instead of raw pixel intensity differences, but they fail when the local contrast within a patch changes. In this paper, we propose a fast preprocessing method named Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of training and input photos. Our method can be directly integrated into state-of-the-art exemplar-based methods to improve their robustness at negligible computational cost.


1 Introduction

Exemplar-based face sketch synthesis has received much attention in recent years, with applications ranging from digital entertainment to law enforcement [Liu et al.2007, Wang et al.2012, Wang et al.2014, Zhang et al.2015b, Peng et al.2016b, Peng et al.2016a, Wang et al.2017]. These methods typically consist of two steps. In the first step, all photos (including a given input photo and all training photos) are divided into local patches, and a K-NN patch search is performed among all training photos for each input photo patch. The second step merges the corresponding sketch patches (according to the photo patch search results) into an output sketch image via global optimization [Wang and Tang2009, Zhang et al.2010, Zhou et al.2012, Wang et al.2013, Zhang et al.2015a] or local fusion [Song et al.2014]. However, these methods usually fail when input photos are captured under conditions different from the training photos, which only contain faces under normal lighting. The critical step causing the failure is the search for similar patch candidates for a given input photo patch.
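To make the two-step pipeline concrete, the following minimal sketch (ours, not from the paper; all names are hypothetical) shows the K-NN photo patch search that the first step performs. Real methods additionally restrict the search to spatially nearby training patches and merge the retrieved sketch patches via MRF/MWF/SSD-style optimization.

```python
import numpy as np

def knn_patch_candidates(input_photo, training_photos, patch=10, stride=5, k=5):
    """For each input-photo patch, find the k nearest training-photo patches
    by squared L2 distance on pixel luminance (the metric most methods use)."""
    def to_patches(img):
        ps = []
        for i in range(0, img.shape[0] - patch + 1, stride):
            for j in range(0, img.shape[1] - patch + 1, stride):
                ps.append(img[i:i + patch, j:j + patch].ravel())
        return np.array(ps, dtype=np.float32)

    train = np.vstack([to_patches(t) for t in training_photos])
    candidates = []
    for q in to_patches(input_photo):
        d = np.sum((train - q) ** 2, axis=1)    # squared L2 distance
        candidates.append(np.argsort(d)[:k])    # indices of k nearest patches
    return candidates
```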

Figure 1: An example of varying lighting conditions. (a) an input photo captured under the same conditions as the training photos; (b)-(d) sketches generated by the state-of-the-art MRF, MWF and SSD methods; (e) a synthesized input photo under different lighting and background conditions; (f)-(h) the corresponding results of these methods; (i) the sketch drawn by the artist; (j)-(l) results of these methods with our integration.

Most state-of-the-art methods (e.g., MRF [Wang and Tang2009], MWF [Zhou et al.2012], and SSD [Song et al.2014]) adopt either the $\ell_1$ or the $\ell_2$ norm on pixel luminance differences during photo patch search. They perform well in the ideal case where both input and training photos are captured under the same lighting condition. However, for input photos captured under lighting conditions different from the training photos, these distance metrics often cause incorrect matching of photo patches and thus lead to erroneous sketch synthesis. Fig. 1 shows an example. A direct amendment to these methods is to replace pixel-luminance-difference metrics with illumination-invariant ones based on gradients (like DoG [Zhang et al.2010]) or correlation (like NCC [Szeliski2010]). However, illumination-invariant patch distances fail when the local contrast within a patch changes. For example, if the background is brighter than the facial skin in input photos while it is darker than the facial skin in training photos, photo patches near face boundaries can hardly be matched to training correspondences (e.g., the left ear region in Fig. 4(e)). Meanwhile, illumination-invariant methods [Han et al.2010, Xie et al.2008] for face recognition are not suitable for face sketch synthesis, as they only focus on the face region and do not handle hair and background.
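As a small self-contained illustration (our own demo, not the paper's code), NCC is invariant to positive affine luminance changes but flips sign under local contrast reversal, which is exactly the failure case described above:

```python
import numpy as np

def ncc(p, q, eps=1e-8):
    """Normalized cross correlation between two patches (flattened).
    Invariant to affine luminance changes p -> a*p + b with a > 0."""
    p = p.ravel() - p.mean()
    q = q.ravel() - q.mean()
    return float(p @ q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps)

# Contrast reversal flips the sign of NCC: a boundary patch whose background
# is brighter than the skin no longer matches the same structure with a
# darker background, even though both depict the same face boundary.
edge = np.outer(np.ones(8), np.r_[np.zeros(4), np.ones(4)])  # skin | background
assert ncc(edge, edge) > 0.99
assert ncc(edge, 1.0 - edge) < -0.99  # reversed local contrast -> mismatch
```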

To enable similar statistics of the face and non-face regions between input and training photos, we propose a novel method, namely Bidirectional Luminance Remapping (BLR), which interactively adjusts the lighting of both input and training photos. BLR first adapts the lighting of the input photo according to the training photos, and then utilizes the offline pre-computed alpha matte information of the training photos to recompose them according to the adapted input photo. The advantage of BLR is that it reformulates online foreground/background segmentation as offline alpha matting, which enables efficient and accurate patch search. It can be integrated into existing face sketch synthesis methods at negligible computational cost.

2 Proposed Algorithm

In this section, we present the details of how BLR handles lighting variations, and describe how BLR is integrated into existing methods.

2.1 Bidirectional Luminance Remapping (BLR)

When an input photo with a different lighting from the training photos is given, a straightforward solution is to perform a global linear luminance remapping (LR) on the input photo so that it has the same luminance statistics (e.g., mean and variance) as the training photos [Hertzmann et al.2001, Zhang et al.2010]. However, this global mapping scheme is not applicable in many cases (e.g., when the backgrounds have different intensities), and thus produces erroneous results, as shown in Fig. 2(b).

We now present BLR, which makes the luminance statistics of the face and non-face regions individually consistent between input and training photos. Each photo consists of a face and a non-face region, and the remapping is performed in two steps. First, we perform a global linear luminance remapping on the input photo according to the training photos. This global remapping is based only on the luminance of the face region; it is computed regardless of the non-face region of the training photos, so the non-face region of the input photo may be remapped to arbitrary luminance. In the second step, we adjust the luminance of the non-face region of each training photo (using the offline pre-computed alpha matte) to make the overall statistics of the training photos consistent with those of the adapted input photo obtained in the first step. In this way, the luminance statistics of both the face and non-face regions become similar between input and training photos.

Figure 2: Improvement with BLR integration (the MRF sketch synthesis method is used in this example). (a) a challenging input photo captured under dark lighting with a textured background; (b) the result with luminance remapping; (c) the result after the first step of BLR (Sec. 2.1.1); (d) the result with full BLR.

2.1.1 Luminance Remapping on Input Photo

We perform luminance remapping on the input photo so that the luminance statistics of its face region are similar to those of the training photos. The face region is approximately obtained using facial landmarks. We denote $x$ as the input photo, $X = \alpha x + \beta$ as the adapted photo (where $\alpha$ and $\beta$ are two scalars), and $y$ as all the training photos. We denote $\mu_x$, $\mu_X$ and $\mu_y$ as the means of $x$, $X$ and $y$, respectively, and $\sigma_x$, $\sigma_X$ and $\sigma_y$ as the corresponding standard deviations. Our remapping transforms the luminance statistics of the input photo as:

$$\mu_X = \alpha \mu_x + \beta \quad (1)$$
$$\sigma_X = \alpha \sigma_x \quad (2)$$

The remapping parameters $\alpha$ and $\beta$ are computed from the face regions of the input and training photos. We denote $X_f$ and $y_f$ as the face regions in the adapted photo $X$ and the training photos $y$, respectively. We set $\mu_{X_f} = \mu_{y_f}$ and $\sigma_{X_f} = \sigma_{y_f}$ to enforce similar luminance statistics on the face region between input and training photos. As a result, $\alpha$ and $\beta$ are computed as:

$$\alpha = \sigma_{y_f} / \sigma_{x_f} \quad (3)$$
$$\beta = \mu_{y_f} - \alpha \mu_{x_f} \quad (4)$$

We use $\alpha$ and $\beta$ to adjust the input photo while leaving the training photos unchanged at this stage.
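A minimal sketch of this first step, assuming the face mask comes from the detected landmarks and `train_face_pixels` stacks the face-region luminances of all training photos (function and variable names are ours, not from the paper):

```python
import numpy as np

def remap_input_luminance(x, x_face_mask, train_face_pixels):
    """Step 1 of BLR, a sketch of Eqs. (1)-(4): globally remap the input
    photo luminance so its face-region mean/std match the training photos."""
    xf = x[x_face_mask]                                    # face-region luminance
    alpha = train_face_pixels.std() / (xf.std() + 1e-8)    # Eq. (3)
    beta = train_face_pixels.mean() - alpha * xf.mean()    # Eq. (4)
    return alpha * x + beta                                # X = alpha * x + beta
```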

2.1.2 Luminance Remapping on Training Photos

After the luminance remapping in Sec. 2.1.1, the luminance statistics of the face region in the adapted input photo $X$ are similar to those of the training photos. The remaining problem resides at the boundary between the face and non-face regions, which may lead to incorrect patch search and thus erroneous boundaries in the results, as shown in Fig. 2(c). We decompose each training photo into a portrait image, a non-portrait image and an alpha map using the matting algorithm [Levin et al.2008] with a manually labeled trimap. The portrait image contains the whole human portrait region while the non-portrait image contains the background region. The non-portrait image is used to approximate the non-face region, and the matting is done offline. We keep the portrait image fixed and perform a luminance remapping on the non-portrait image so that the overall statistics of the training photos become similar to those of the adapted input photo obtained in Sec. 2.1.1.

We denote $P$, $N$ and $A$ as the portrait images, non-portrait images and alpha maps of the training images, so the training images $y$ can be written as:

$$y = A \circ P + (1 - A) \circ N \quad (5)$$

where $\circ$ denotes element-wise multiplication. We denote $Y$ as the adapted training images with a luminance-remapped non-portrait region:

$$Y = A \circ P + (1 - A) \circ (u N + v) \quad (6)$$

where $u$ and $v$ are the parameters that adjust the non-portrait region. We denote $p = A \circ P$, $n = (1 - A) \circ N$, and $m = 1 - A$. The adapted training images $Y$ can then be written as:

$$Y = p + u n + v m \quad (7)$$

and the mean of $Y$ can be computed as:

$$\mu_Y = \mu_p + u \mu_n + v \mu_m \quad (8)$$

The variance of $Y$ follows as:

$$\sigma_Y^2 = \sigma_p^2 + u^2 \sigma_n^2 + v^2 \sigma_m^2 + 2u\sigma_{pn} + 2v\sigma_{pm} + 2uv\sigma_{nm} \quad (9)$$

where $\sigma_{pn}$ denotes the covariance between $p$ and $n$ (and analogously for the other subscript pairs).

We set $\mu_Y = \mu_X$ and $\sigma_Y = \sigma_X$ so that the luminance statistics of the adapted training photos are similar to those of the adapted input photo. The parameters $u$ and $v$ can be computed by solving the above two equations, the second of which is quadratic.

In practice, we notice that parameter $v$ is normally small, and thus we can approximate Eq. (6) in the variance computation by:

$$Y \approx A \circ P + (1 - A) \circ u N \quad (10)$$

Then we have:

$$\mu_X = \mu_p + u \mu_n + v \mu_m \quad (11)$$
$$\sigma_X^2 = \sigma_p^2 + u^2 \sigma_n^2 + 2u\sigma_{pn} \quad (12)$$

Parameter $u$ can then be computed by solving the quadratic equation in Eq. (12) and then used to solve for parameter $v$ in Eq. (11). There are two possible solutions for the linear transform; to attenuate noise, we choose the positive value of $u$, which minimizes $|v|$:

$$u = \frac{-\sigma_{pn} + \sqrt{\sigma_{pn}^2 + \sigma_n^2(\sigma_X^2 - \sigma_p^2)}}{\sigma_n^2} \quad (13)$$
$$v = \frac{\mu_X - \mu_p - u\mu_n}{\mu_m} \quad (14)$$

After obtaining $u$ and $v$, we remap the non-portrait images and recompose the training photos from the adapted non-portrait image, the portrait image, and the alpha matte. As a result, the luminance statistics of the face and non-face regions become similar between input and training photos, and the photo patch search is accurate enough for existing face sketch synthesis methods.
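Putting the approximate solution into code, a sketch of Eqs. (10)-(14) under our reconstruction (for simplicity it treats a single training photo with float luminance images, whereas the statistics in the text are aggregated over all training photos; names are ours):

```python
import numpy as np

def remap_training_nonportrait(P, N, A, mu_X, sigma_X):
    """Step 2 of BLR: scale the non-portrait layer of a training photo so
    its overall statistics match the adapted input photo.
    P, N, A: portrait, non-portrait and alpha images."""
    p, n, m = A * P, (1 - A) * N, (1 - A)
    s_pn = np.mean((p - p.mean()) * (n - n.mean()))   # covariance of p and n
    # Quadratic in u: u^2*var(n) + 2u*cov(p,n) + (var(p) - sigma_X^2) = 0
    disc = s_pn ** 2 + n.var() * (sigma_X ** 2 - p.var())
    u = (-s_pn + np.sqrt(max(disc, 0.0))) / (n.var() + 1e-8)   # Eq. (13)
    v = (mu_X - p.mean() - u * n.mean()) / (m.mean() + 1e-8)   # Eq. (14)
    return A * P + (1 - A) * (u * N + v)              # recomposed photo, Eq. (6)
```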

2.2 Practical Issues

2.2.1 Side Lighting

In practice, side lighting may occur in input photos. We use Contrast Limited Adaptive Histogram Equalization (CLAHE) [Pizer et al.1987] to reduce the effect, but find that shadows may still exist around facial components. We therefore remap the shadow region under the guidance of its symmetric normal-lighting region on the face. Specifically, we use landmarks to divide the whole face region into two symmetric parts, i.e., the shadow region and the normal-lighting region. For each patch in the shadow region, we perform a K-NN search in the normal-lighting region around the corresponding symmetric position using normalized cross correlation (NCC). Then we remap the luminance of pixels in the shadow region using gamma correction. We denote $P_s$ as a patch centered at a pixel $s$ in the shadow region and $P_n$ as the most similar patch centered at $n$ in the normal-lighting region. The gamma correction can be written as:

$$\tilde{I}(s) = I(s)^{\log \mu_n / \log \mu_s} \quad (15)$$

where $I(s)$ is the luminance of $s$ (normalized to $(0, 1]$), and $\mu_s$ and $\mu_n$ are the mean luminances of patches $P_s$ and $P_n$, respectively.
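A one-line sketch of this remapping (our reading of the garbled Eq. (15): the gamma exponent is chosen so the shadow-patch mean maps onto its matched normal-lighting patch mean; names are ours):

```python
import numpy as np

def gamma_remap_shadow(I, s_mean, n_mean):
    """Gamma-correct shadow-region luminance I (assumed in (0, 1]) so that
    s_mean**gamma == n_mean, i.e. patch means are matched after remapping."""
    gamma = np.log(n_mean) / np.log(s_mean)
    return I ** gamma
```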

2.2.2 Pose Variance

In addition to the lighting problem, patch appearance distortion due to pose variation also degrades the quality of selected patches. We precompute the average position of each facial landmark over all training photos to generate a triangulated face template. Given an input photo, we detect its landmarks and compute a local affine transform for each triangle. Through these transforms, the input photo is warped to a pose-corrected photo, and K-NN patch search is then performed between the pose-corrected photo and the training photos. After sketch synthesis, we warp the synthesized sketch back to the original pose.
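A sketch of the warping step with OpenCV (our illustration, assuming landmarks, template points and a Delaunay triangulation are already available; per-triangle affine warping is one standard way to realize such local affine transforms):

```python
import cv2
import numpy as np

def warp_to_template(photo, landmarks, template_pts, triangles):
    """Piecewise (local) affine warp of an input photo onto the average face
    template. `triangles` indexes landmark triplets from a Delaunay
    triangulation of the template; all inputs are assumed precomputed."""
    out = np.zeros_like(photo)
    for tri in triangles:
        src = np.float32(landmarks[tri])      # triangle corners in the photo
        dst = np.float32(template_pts[tri])   # corresponding template corners
        M = cv2.getAffineTransform(src, dst)  # local affine transform
        warped = cv2.warpAffine(photo, M, (photo.shape[1], photo.shape[0]))
        mask = np.zeros(photo.shape[:2], np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst), 1)  # restrict to this triangle
        out[mask.astype(bool)] = warped[mask.astype(bool)]
    return out
```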

2.2.3 Implementation Details

In our implementation, we precompute the facial landmarks, portrait and non-portrait images, and alpha mattes for all training photos in advance. Given an input photo, we first detect facial landmarks using the algorithm in [Kazemi and Sullivan2014] and perform the local affine transforms described in Sec. 2.2.2 to warp the input photo into a pose-corrected one for further processing. The landmark detection and local affine transforms run in real time. Second, side lighting is handled as described in Sec. 2.2.1. Then BLR is applied to adapt both input and training photos. After BLR, the preprocessed input and training photos can be used by existing face sketch synthesis algorithms to synthesize sketch images. Finally, the sketch image is mapped back through the local affine transforms to yield the final result.

Figure 3: Quantitative evaluation on the synthetic CUHK dataset (one row of plots each for MRF, MWF, SSD and RMRF). We use $s_f$ and $s_b$ to adjust the foreground and background lighting of input photos, respectively. Our integration improves the robustness of MRF, MWF and SSD under different lightings, and performs favorably against the luminance remapping integration and the original RMRF method.

3 Experiments

We conduct experiments using state-of-the-art face sketch synthesis methods including MRF [Wang and Tang2009], RMRF [Zhang et al.2010], MWF [Zhou et al.2012] and SSD [Song et al.2014]. The focus is to demonstrate the improvement after integrating BLR into existing methods. The experiments are conducted on the CUHK [Wang and Tang2009], AR [Aleix and Robert1998], and FERET [Zhang et al.2011] benchmarks, which contain 188, 123 and 1165 photo-sketch pairs, respectively. The photos in these three datasets are captured in frontal view with neutral expression. The lighting condition is similar within each of the CUHK and AR datasets, but differs between the two; in FERET the lighting varies across photos within the dataset. In addition, we conduct experiments on the side lighting and pose variation subsets of the CUHK dataset.

3.1 Varying Lighting Conditions

3.1.1 Synthetic Experiments

We first carry out quantitative and qualitative evaluations for synthetic lighting conditions. The synthetic evaluations are conducted on a modified CUHK dataset. We split the CUHK dataset into 88 training photo-sketch pairs and 100 input pairs, and generate synthetic input photos under varying lighting as follows. We use the matting algorithm [Levin et al.2008] to divide each input photo into foreground, background and alpha matte images. Then we separately adjust the luminance of foreground and background using two scalars ($s_f$ and $s_b$): the luminance values of all foreground pixels are multiplied by $s_f$ and those of the background by $s_b$. We then composite the adjusted foreground and background images with the alpha matte to generate synthetic input photos.
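A sketch of this generation step, assuming luminance in [0, 1] and the matte decomposition F, B, A already computed (the clipping is our addition to keep values valid; names are ours):

```python
import numpy as np

def synthesize_lighting(F, B, A, s_f, s_b):
    """Generate a synthetic input photo (Sec. 3.1.1): scale foreground and
    background luminance by s_f and s_b, then recomposite with the matte."""
    return A * np.clip(s_f * F, 0, 1) + (1 - A) * np.clip(s_b * B, 0, 1)

# Sweep used in Fig. 3: s_f in {0.5, 1.0, 1.5}; s_b from 0.5 to 1.5, step 0.1.
```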

We compare our method with the baseline luminance remapping (LR) [Hertzmann et al.2001] preprocessing. Since RMRF is specially designed to improve the robustness of MRF, we compare against RMRF when evaluating the improvement on MRF. In addition, RMRF can be treated as an independent method that can be integrated with our algorithm. We therefore first evaluate the original performance of MRF, MWF, SSD and RMRF, and then compare the improvements brought by LR and by our integration.

Quantitative evaluation of face sketch synthesis methods can be conducted through face sketch recognition, as suggested in [Wang and Tang2009]. For each input photo, the synthesized sketch should be matched to the corresponding sketch drawn by the artist; a higher sketch recognition rate indicates a more robust synthesis method. Fig. 3 shows the quantitative results. The foreground of input photos is adjusted by three values of $s_f$ (0.5, 1.0 and 1.5), simulating dark, normal and bright foregrounds, respectively. For each $s_f$ we vary $s_b$ from 0.5 to 1.5 in increments of 0.1, simulating varying background lighting. The results show that MRF, MWF and SSD are not robust when synthesizing sketches from photos captured under different lighting. Due to its global normalization scheme, LR preprocessing cannot robustly handle all lighting conditions. Our algorithm consistently improves the performance of existing methods. Compared with RMRF, our algorithm is more robust in extreme cases (the first row of Fig. 3). Moreover, our algorithm can be integrated with RMRF to further improve its robustness (the last row of Fig. 3).
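For illustration only, the protocol reduces to the following sketch if a simple nearest-neighbor matcher stands in for the face recognizer used in [Wang and Tang2009] (the actual recognizer differs; names and the L2 matcher are ours):

```python
import numpy as np

def recognition_rate(synth, artist):
    """Fraction of synthesized sketches whose nearest artist sketch (L2 on
    pixels, a simple stand-in for a recognizer) is the correct subject."""
    S = np.array([s.ravel() for s in synth], dtype=np.float32)
    G = np.array([g.ravel() for g in artist], dtype=np.float32)
    d = ((S[:, None, :] - G[None, :, :]) ** 2).sum(-1)  # pairwise distances
    return float(np.mean(d.argmin(1) == np.arange(len(S))))
```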

Figure 4: An example of the synthetic lighting experiments. (a) the input photo, with a dark foreground and a bright background; (b)-(e) results of the existing methods (MRF, MWF, SSD, RMRF); (f)-(h) results of existing methods with luminance remapping integration; (i)-(l) results of existing methods with our integration.

Fig. 4 shows a visual comparison for the synthetic evaluation. The input photo consists of a dark foreground and a bright background. As the foreground differs from the training photos, patch candidates cannot be correctly matched, which results in blur and artifacts, as shown in (b)-(d). LR, based on global luminance statistics, fails to correct the lighting and thus produces erroneous results, as shown in (f)-(h). In comparison, BLR adapts both input and training photos to enable more accurate patch search in the face and non-face regions. As a result, the accuracy of the K-NN patch search is improved and the obtained sketches are of high quality, as shown in (i)-(l). Meanwhile, the local contrast within photo patches is reduced through our integration, and thus the result in (i) improves around the face boundary.

3.1.2 Cross-Dataset Experiments

We notice that the CUHK and AR datasets are captured under different lighting. Thus we evaluate the robustness of BLR using CUHK for training and AR as input, and vice versa. Fig. 5 shows a visual comparison in which BLR consistently improves existing methods. Although ethnic facial differences exist between the two datasets, BLR still makes the sketch synthesis of existing methods more robust.

3.1.3 Real Lighting Experiments

We evaluate BLR on the FERET dataset. Different from the previous two datasets, FERET contains photos captured under real-world varying lighting conditions. We randomly select 100 photo-sketch pairs for training and use the remaining 1065 pairs as input. Fig. 6 shows one example of the visual evaluation. The lighting differs in both the foreground and background regions, which leads to artifacts in the sketches synthesized by existing methods. Through our integration, the statistics of the face and non-face regions are made similar between input and training photos, which enables existing methods to synthesize sketches robustly.

Figure 5: An example of the cross-dataset experiments (CUHK as training, AR as input). (a) an input photo; (b)-(l) have the same meaning as in Fig. 4.
Figure 6: An example of the experiments on the FERET dataset. (a) an input photo; (b)-(l) have the same meaning as in Fig. 4.

3.1.4 Side Lighting Experiments

We conduct experiments on the CUHK side lighting dataset [Zhang et al.2010], which contains two types of side lighting (dark left / dark right) photos for each subject. As the input photo contains shadows in the facial region, as shown in Fig. 7, existing methods cannot find correctly matched photo patches around the shadow regions, which leads to blur and artifacts in (b)-(d). In comparison, our method locally adjusts the input photo and improves the results.

3.2 Varying Poses

We perform experiments on the CUHK pose variation dataset [Zhang et al.2010], in which subjects appear in varying poses. Note that some methods [Song et al.2014, Zhou et al.2012] increase the search range to handle varying poses; thus we also compare BLR with existing methods using an extended search range. Fig. 8 shows an example of the visual evaluation. Our algorithm favorably improves the robustness of existing methods.

Figure 7: An example of the side lighting experiments. (a) an input photo; (b)-(l) have the same meaning as in Fig. 4.
Figure 8: An example of the varying pose experiments. (a) an input photo; (b)-(d) the synthesized sketches of MRF, MWF and SSD; (e)-(h) results with an extended search range (RMRF-ext, MRF-ext, MWF-ext, SSD-ext); (i)-(l) sketches synthesized with our integration.
                   MRF     MWF     RMRF     SSD
Original           38.4    35.6    88.2     4.5
Original (ext.)*   94.5    93.8    252.3    13.4
Original + Ours    38.6    35.9    93.5     4.7

*ext.: with extended search range (see Sec. 3.2).
Table 1: Runtime (seconds) for a CUHK input image.

3.3 Computational Cost

Table 1 shows the runtime of existing methods on a CUHK input image (measured on a 3.4GHz Intel i7 CPU). The additional computational cost brought by BLR is negligible compared with the original cost of existing methods. RMRF incurs a larger additional cost because its features must be re-extracted online from the recomposed training photos.

4 Concluding Remarks

We propose BLR, which interactively adjusts the lighting of training and input photos. It replaces online face image segmentation with offline, human-supervised alpha matting. The experiments demonstrate that BLR improves the robustness of existing methods at negligible computational cost.

References

  • [Aleix and Robert1998] Aleix Martínez and Robert Benavente. The AR face database. CVC Technical Report 24, Purdue University, 1998.
  • [Han et al.2010] Hu Han, Shiguang Shan, Laiyun Qing, Xilin Chen, and Wen Gao. Lighting aware preprocessing for face recognition across varying illumination. In European Conference on Computer Vision, 2010.
  • [Hertzmann et al.2001] Aaron Hertzmann, Charles E Jacobs, Nuria Oliver, Brian Curless, and David H Salesin. Image analogies. ACM Transactions on Graphics (SIGGRAPH), 2001.
  • [Kazemi and Sullivan2014] Vahid Kazemi and Josephine Sullivan. One millisecond face alignment with an ensemble of regression trees. In IEEE Conference on Computer Vision and Pattern Recognition, 2014.
  • [Levin et al.2008] Anat Levin, Dani Lischinski, and Yair Weiss. A closed-form solution to natural image matting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
  • [Liu et al.2007] Wei Liu, Xiaoou Tang, and Jianzhuang Liu. Bayesian tensor inference for sketch-based facial photo hallucination. In International Joint Conference on Artificial Intelligence, 2007.
  • [Peng et al.2016a] Chunlei Peng, Xinbo Gao, Nannan Wang, and Jie Li. Graphical representation for heterogeneous face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
  • [Peng et al.2016b] Chunlei Peng, Nannan Wang, Xinbo Gao, and Jie Li. Face recognition from multiple stylistic sketches: Scenarios, datasets, and evaluation. In European Conference on Computer Vision Workshop, 2016.
  • [Pizer et al.1987] Stephen M Pizer, E Philip Amburn, John D Austin, Robert Cromartie, Ari Geselowitz, Trey Greer, Bart ter Haar Romeny, John B Zimmerman, and Karel Zuiderveld. Adaptive histogram equalization and its variations. Computer Vision, Graphics, and Image Processing, 1987.
  • [Song et al.2014] Yibing Song, Linchao Bao, Qingxiong Yang, and Ming-Hsuan Yang. Real-time exemplar-based face sketch synthesis. In European Conference on Computer Vision, 2014.
  • [Szeliski2010] Richard Szeliski. Computer vision: algorithms and applications. Springer, 2010.
  • [Wang and Tang2009] Xiaogang Wang and Xiaoou Tang. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009.
  • [Wang et al.2012] Shenlong Wang, Lei Zhang, Yan Liang, and Quan Pan. Semi-coupled dictionary learning with applications in image super-resolution and photo-sketch synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
  • [Wang et al.2013] Nannan Wang, Dacheng Tao, Xinbo Gao, Xuelong Li, and Jie Li. Transductive face sketch-photo synthesis. IEEE Transactions on Neural Networks and Learning Systems, 2013.
  • [Wang et al.2014] Nannan Wang, Dacheng Tao, Xinbo Gao, Xuelong Li, and Jie Li. A comprehensive survey to face hallucination. International Journal of Computer Vision, 2014.
  • [Wang et al.2017] Nannan Wang, Xinbo Gao, Leiyu Sun, and Jie Li. Bayesian face sketch synthesis. IEEE Transactions on Image Processing, 2017.
  • [Xie et al.2008] Xiaohua Xie, Wei-Shi Zheng, Jianhuang Lai, and Pong C. Yuen. Face illumination normalization on large and small scale features. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
  • [Zhang et al.2010] Wei Zhang, Xiaogang Wang, and Xiaoou Tang. Lighting and pose robust face sketch synthesis. In European Conference on Computer Vision, 2010.
  • [Zhang et al.2011] Wei Zhang, Xiaogang Wang, and Xiaoou Tang. Coupled information-theoretic encoding for face photo-sketch recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2011.
  • [Zhang et al.2015a] Shengchuan Zhang, Xinbo Gao, Nannan Wang, and Jie Li. Face sketch synthesis from a single photo-sketch pair. IEEE Transactions on Circuits and Systems for Video Technology, 2015.
  • [Zhang et al.2015b] Shengchuan Zhang, Xinbo Gao, Nannan Wang, Jie Li, and Mingjin Zhang. Face sketch synthesis via sparse representation-based greedy search. IEEE Transactions on Image Processing, 2015.
  • [Zhou et al.2012] Hao Zhou, Zhanghui Kuang, and Kenneth Wong. Markov weight fields for face sketch synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.