I Introduction
Face sketch synthesis mainly refers to generating a sketch from an input photo given a set of face sketch-photo pairs as the training dataset. It has wide applications in both digital entertainment and law enforcement [1]. For example, when only limited information about a suspect is available due to low-quality surveillance videos, or when no video/image clues exist at all, a sketch drawn by an artist is usually taken as a substitute for suspect identification. Face sketch synthesis then bridges the great texture discrepancy between face photos and sketches.
Exemplar-based face sketch synthesis generally proceeds in two steps: neighbor selection and reconstruction weight representation. Given an input test photo, it is divided into evenly sized patches, with some overlap between adjacent patches to guarantee compatibility. Then, for each test patch, a number (e.g. $K$) of nearest photo patches are selected from the training photos. The sketch patches corresponding to these nearest photo patches are taken as candidates for sketch patch synthesis. The prevalent way to represent the target sketch patch is as a linear combination of the selected candidate sketch patches. The linear combination coefficients are usually calculated under the assumption that a photo patch and its corresponding sketch patch share a similar geometric manifold structure, i.e. if two photo patches are similar, then their sketch patch counterparts are also similar.
Exemplar-based face sketch synthesis started from the Eigentransformation work of Tang and Wang [2, 3]. In their work, there is no special neighbor selection process; instead, all training images are utilized. The linear combination coefficients are learned by projecting the input photo onto the training photos through principal component analysis.
Considering that a single holistic reconstruction model can hardly represent the nonlinear mapping between face photos and sketches, Liu et al. [4] proposed to estimate the holistic nonlinear mapping with many piecewise linear mappings, a strategy generally followed by subsequent methods. This method works at the image patch level.
For each test patch, the $K$ nearest photo patches are searched from the training set in terms of Euclidean distance. Then the reconstruction weight is calculated in the spirit of locally linear embedding [5]:

$$\min_{\mathbf{w}} \|\mathbf{x} - \mathbf{X}\mathbf{w}\|_2^2, \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{w} = 1, \qquad (1)$$

where $\mathbf{w}$ is the representation weight vector, $\mathbf{x}$ is the test photo patch in the form of a column vector, and $\mathbf{X}$ is the column-concatenation of the selected training photo patches. The target sketch patch corresponding to the test photo patch is reconstructed as the linear combination of the training sketch patches weighted by $\mathbf{w}$. Song et al. [6] cast the face sketch synthesis problem as a spatial sketch denoising (SSD) problem and calculated the reconstruction weight through a conjugate gradient solver. Gao et al. [7] proposed to adaptively determine the number of nearest neighbors by sparse representation [8] rather than using a fixed number of nearest neighbors. Instead of using sparse representation for neighbor selection, dictionaries learned through sparse coding and sparse representation substitute for the nearest neighbors in [9]. Wang et al. [10] employed a Markov random field (MRF) to model two kinds of dependency neglected by the above methods: the dependency between test photo patches and nearest photo patches, and the dependency between adjacent synthesized sketch patches. In their method, nearest photo patches and their corresponding sketch patches are selected from the training dataset, and only a single nearest sketch patch is finally selected through the MRF network, which is taken as the target synthesized sketch patch. In other words, the reconstruction weight representation of this method can be seen as finding the single most appropriate sketch patch and setting its weight to 1.
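The constrained least-squares problem in equation (1) admits a closed-form solution through the local Gram matrix of the centered neighbor patches. A minimal NumPy sketch (illustrative only; function and variable names are our own, and the small regularizer added for numerical stability is our assumption, not part of the original method):

```python
import numpy as np

def lle_weights(x, X, reg=1e-8):
    """Solve min_w ||x - X w||^2 s.t. sum(w) = 1, as in Eq. (1).
    x: test patch vector (d,); X: neighbor patches as columns (d, K)."""
    K = X.shape[1]
    D = X - x[:, None]                               # center neighbors on the test patch
    G = D.T @ D                                      # local Gram matrix
    G = G + reg * max(np.trace(G), 1.0) * np.eye(K)  # stabilize the inversion
    w = np.linalg.solve(G, np.ones(K))               # stationary point of the Lagrangian
    return w / w.sum()                               # enforce the sum-to-one constraint
```

When the test patch lies in the affine hull of its neighbors, `X @ w` reproduces it exactly (up to the regularization).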
Zhou et al. [11] proposed to introduce the linear combination into the MRF model (namely the Markov weight field, MWF) to overcome the face deformation problem caused by the single-patch search strategy of [10]. The difference between the MWF method and the LLE method [4] is the consideration of the dependency between adjacent synthesized sketch patches:

$$\min_{\mathbf{w}} \|\mathbf{x} - \mathbf{X}\mathbf{w}\|_2^2 + \sum_{j=1}^{4} \|\mathbf{O}\mathbf{w} - \mathbf{O}_j\mathbf{w}_j\|_2^2, \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{w} = 1, \qquad (2)$$

where the second term represents the dependency constraint between the synthesized sketch patch corresponding to the test photo patch and its four adjacent synthesized sketch patches. Here the constraint is modeled by the distance between pixel intensity vectors extracted from the overlapping areas of adjacent sketch patches. In equation (2), the column vector $\mathbf{w}_j$ is the reconstruction weight corresponding to the $j$th adjacent sketch patch, $\mathbf{O}_j$ stacks the pixel intensity vectors extracted from the overlapping areas of the candidates of the $j$th adjacent sketch patch, and $\mathbf{O}$ stacks the pixel intensity vectors extracted from the overlapping areas of the candidates of the current target sketch patch. Wang et al. [12] further developed the MWF model from the perspective of transductive learning. Peng et al. [13] extended the MWF model to a multi-view version, which improves robustness against cluttered backgrounds and lighting variations. Unlike the evenly sized patches employed in the aforementioned methods, superpixel segmentation is employed in [14], together with a reconstruction representation model as in equation (2).
All aforementioned methods perform nearest neighbor (NN) selection online, which substantially increases the test time. Moreover, the computational complexity increases linearly with the scale of the database. In addition, the reconstruction weight representation models in both (1) and (2) assume that all selected nearest neighbors contribute equally to the reconstruction weight computation, while the distinct distances between these neighbors and the test patch are neglected.
In this paper, instead of searching neighbors online, we randomly sample patches offline and then use these patches to reconstruct the target sketch patch. This random sampling strategy greatly speeds up the synthesis process, making it much faster than NN-based methods (e.g. the LLE method [4]) under the same experimental settings. In addition, state-of-the-art methods assume that all selected neighbors contribute equally to the reconstruction weight computation, neglecting the distinct similarities between the test patch and these neighbors. Since the randomly sampled patches have distinct similarities to the test photo patch, we impose a locality constraint [15] to regularize their reconstruction weights. The locality constraint restrains the contribution of patches distributed far from the test patch and excites the contribution of patches distributed around it. Similar techniques have appeared in image restoration tasks such as image super-resolution [16, 17] and image denoising [18]. To further accelerate the synthesis process, we employ principal component analysis (PCA) [19] to reduce the dimension of each patch vector. A graphical outline of the proposed random sampling with locality constraint method for face sketch synthesis (RSLCR) is shown in Fig. 1. In addition, we propose a fast version of the method, namely FastRSLCR, which drops out some of the randomly sampled patches.

The contributions of this paper are twofold. Firstly, an offline random sampling strategy is employed to remove the time cost of online neighbor selection. The proposed strategy also has stronger scalability than state-of-the-art methods, because its time consumption does not depend on the scale of the training dataset, which is not the case for the other methods. We further impose a locality constraint on the reconstruction weight representation, which takes the distinct similarities between the test patch and the randomly sampled patches into consideration and improves the quality of the synthesized sketches. Secondly, both the proposed RSLCR method and its fast version FastRSLCR achieve superior performance to state-of-the-art methods in terms of both synthesis quality and synthesis efficiency. Specifically, the proposed FastRSLCR can synthesize a sketch in no more than 1.5 seconds on the Chinese University of Hong Kong (CUHK) face sketch FERET database (CUFSF) under the MATLAB environment, making it the fastest exemplar-based face sketch synthesis method.
In this paper, except when noted, a bold lowercase letter represents a column vector, a bold uppercase letter denotes a matrix, and regular lowercase and uppercase letters denote scalars. The rest of this paper is organized as follows. Section II introduces the proposed RSLCR and FastRSLCR methods. Experimental results and analysis are given in Section III, and Section IV concludes this paper.
II Random Sampling for Face Sketch Synthesis
In this section, we introduce how to sample "neighbors" in an offline manner, i.e. randomly sampling training image patches, and then how to represent the test photo patch using these randomly sampled training photo patches, i.e. the locality constraint (LCR) based weight representation model.
II-A Random Sampling Image Patches
Suppose there are $N$ pairs of training sketches and training photos, geometrically aligned according to three points: the two eye centers and the mouth center. Each image is cropped to the size of $250 \times 200$. We first divide these photos and sketches into evenly sized patches, with some overlap (denoted as $o$) between adjacent patches. As shown in Fig. 1, each image is divided into $R \times C$ patches, where there are $R$ patches in each column and $C$ patches in each row (the patch size is set to 20, with 14 pixels overlapped between adjacent patches). We reshape each image patch as a column vector. $(i, j)$ denotes the location of the patch at the $i$th row and the $j$th column, $1 \le i \le R$, $1 \le j \le C$.
Our target is to generate $R \times C$ clusters of photo-sketch patch pairs corresponding to the locations $(i, j)$. The most intuitive way is to put patches located at the same position together. However, since images are aligned relying on only three points, there exist misalignments between test photos and training photos, which may result in mismatches during the reconstruction process. To alleviate the influence of misalignment, we enlarge the sampling area to allow more candidate patches to be sampled, as shown in Fig. 2. Let $l$ denote the search length; then there are $(2l+1)^2$ patches in the search region. Therefore, for each location, we have $N(2l+1)^2$ pairs of patches for sampling. Let $K$ denote the number of randomly sampled patches. In our implementation, we employ the MATLAB function randperm() to sample $K$ training sketch-photo patch pairs. $\mathbf{X}_{ij}$ and $\mathbf{Y}_{ij}$ denote the sampled training photo patches and sketch patches in the $(i, j)$th cluster respectively, $1 \le i \le R$, $1 \le j \le C$.
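The sampling of one cluster can be sketched as follows. This is an illustrative NumPy analogue of the randperm()-based procedure above; all names are our own, and clamping out-of-bounds offsets to the image border is our assumption about boundary handling:

```python
import numpy as np

def sample_patch_cluster(photos, sketches, top, left, p, l, K, rng):
    """Randomly sample K photo/sketch patch pairs around location (top, left).
    photos/sketches: (N, H, W) arrays of aligned images; p: patch size;
    l: search length (offsets in [-l, l])."""
    N, H, W = photos.shape
    cand_photo, cand_sketch = [], []
    for dy in range(-l, l + 1):
        for dx in range(-l, l + 1):
            y = min(max(top + dy, 0), H - p)   # clamp to the image border
            x = min(max(left + dx, 0), W - p)
            for n in range(N):                 # N(2l+1)^2 candidate pairs
                cand_photo.append(photos[n, y:y+p, x:x+p].ravel())
                cand_sketch.append(sketches[n, y:y+p, x:x+p].ravel())
    idx = rng.permutation(len(cand_photo))[:K]  # randperm-style sampling
    P = np.stack([cand_photo[i] for i in idx], axis=1)   # (p*p, K)
    S = np.stack([cand_sketch[i] for i in idx], axis=1)  # paired sketch patches
    return P, S
```

The same permutation indexes both the photo and sketch candidates, so the sampled photo-sketch patch pairs stay aligned.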
Algorithm 1 Random Sampling Image Patches

Input: training photo-sketch pairs, patch size $p$, overlap size $o$, search length $l$, sampling number $K$
Step 1: According to the patch size $p$ and overlap size $o$, compute $R$, $C$, and the positions of all patches in an image;
Step 2: Within the search region of each patch position $(i, j)$, $1 \le i \le R$, $1 \le j \le C$, randomly sample $K$ pairs of photo patches $\mathbf{X}_{ij}$ and sketch patches $\mathbf{Y}_{ij}$ from all training image pairs;
Step 3: Compute the PCA projection matrix $\mathbf{P}_{ij}$ for each cluster of training photo patches and project the training photo patches to the subspace spanned by $\mathbf{P}_{ij}$ as in equation (3).
Output: $\mathbf{X}_{ij}$, $\mathbf{Y}_{ij}$, $\mathbf{P}_{ij}$, $1 \le i \le R$, $1 \le j \le C$.
In order to improve computational efficiency, we employ PCA to reduce the dimension of the training photo patches, preserving 99% of the energy in the projection process. Let $\mathbf{P}_{ij}$ represent the projection matrix. The training photo patches are projected onto the subspace spanned by the column vectors of $\mathbf{P}_{ij}$:

$$\tilde{\mathbf{X}}_{ij} = \mathbf{P}_{ij}^T \mathbf{X}_{ij}, \qquad (3)$$

where $\tilde{\mathbf{X}}_{ij}$ denotes the projected training photo patches. For ease of notation, we still use $\mathbf{X}_{ij}$ to denote the projected training photo patches in the following text. Algorithm 1 summarizes the proposed random sampling method.
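The 99%-energy projection of equation (3) can be sketched as follows. This is a minimal NumPy version; truncating the singular value spectrum is the standard way to keep a fixed fraction of the variance, and centering on the patch mean is our assumption about the exact preprocessing:

```python
import numpy as np

def pca_project(X, energy=0.99):
    """Compute a PCA basis P keeping `energy` of the variance and project
    the patch matrix X (one patch vector per column) onto it."""
    mean = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mean, full_matrices=False)
    ratios = np.cumsum(s**2) / np.sum(s**2)        # cumulative energy fractions
    r = int(np.searchsorted(ratios, energy)) + 1   # smallest r reaching 99%
    P = U[:, :r]                                   # (d, r) projection matrix
    return P, P.T @ (X - mean), mean
```

The same matrix `P` is later applied to the test photo patches, so training and test patches are compared in the same reduced subspace.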
II-B Reconstruction Weight Representation
Given a test photo, it is divided into patches $\mathbf{x}_{ij}$ in the same way as the training images, $1 \le i \le R$, $1 \le j \le C$. These patches are projected onto the respective subspaces obtained in the training phase:

$$\tilde{\mathbf{x}}_{ij} = \mathbf{P}_{ij}^T \mathbf{x}_{ij}, \qquad (4)$$

where $\tilde{\mathbf{x}}_{ij}$ is the projected test photo patch; for ease of notation, we still use $\mathbf{x}$ (omitting subscripts) to represent the test photo patch. In order to take the distinct similarities between the test patch and the randomly sampled patches into consideration, we impose a distance-based weight on the reconstruction coefficients of the randomly sampled photo patches. The reconstruction weight representation model is then written as follows:
$$\min_{\mathbf{w}} \|\mathbf{x} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \|\mathbf{d} \odot \mathbf{w}\|_2^2, \quad \mathrm{s.t.}\ \mathbf{1}^T\mathbf{w} = 1, \qquad (5)$$

where $\odot$ denotes element-wise multiplication, $\mathbf{w}$ is the weight representation for the test photo patch $\mathbf{x}$, $\lambda$ balances the reconstruction error and the locality constraint, and $\mathbf{d}$ is the Euclidean distance vector between the test photo patch $\mathbf{x}$ and the sampled training photo patches $\mathbf{X}$. It can be derived that problem (5) has the analytical solution

$$\mathbf{w} = \frac{(\mathbf{C} + \lambda\,\mathrm{diag}(\mathbf{d})^2)^{-1}\mathbf{1}}{\mathbf{1}^T(\mathbf{C} + \lambda\,\mathrm{diag}(\mathbf{d})^2)^{-1}\mathbf{1}}, \qquad (6)$$

where $\mathbf{1}$ is a column vector of all 1s whose dimension can be determined from the context, $\mathbf{C} = (\mathbf{X} - \mathbf{x}\mathbf{1}^T)^T(\mathbf{X} - \mathbf{x}\mathbf{1}^T)$ denotes the data covariance matrix, and $\mathrm{diag}(\cdot)$ extends a vector into a diagonal matrix. The target sketch patch $\mathbf{s}$ is generated as the linear combination of the randomly sampled training sketch patches $\mathbf{Y}$, weighted by the obtained representation vector $\mathbf{w}$:

$$\mathbf{s} = \mathbf{Y}\mathbf{w}. \qquad (7)$$

After all target sketch patches have been obtained, they are arranged into a whole sketch with the overlapping areas averaged.
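Equations (5)-(7), together with the final overlap averaging, can be sketched as follows. This is an illustrative NumPy version, not the released implementation; all names are our own. Note that on the sum-to-one constraint set the objective of (5) equals $\mathbf{w}^T(\mathbf{C} + \lambda\,\mathrm{diag}(\mathbf{d})^2)\mathbf{w}$, which is why a single linear solve suffices:

```python
import numpy as np

def lcr_weights(x, X, lam=0.5):
    """Locality-constrained weights: min ||x - Xw||^2 + lam||d ∘ w||^2
    s.t. sum(w) = 1, solved in closed form as in Eq. (6)."""
    K = X.shape[1]
    D = X - x[:, None]
    d = np.linalg.norm(D, axis=0)          # distances to the sampled patches
    C = D.T @ D + lam * np.diag(d**2)      # data covariance + locality term
    w = np.linalg.solve(C, np.ones(K))
    return w / w.sum()

def synthesize_patch(x, X, Y, lam=0.5):
    """Target sketch patch as the weighted combination of sampled
    training sketch patches (Eq. (7))."""
    return Y @ lcr_weights(x, X, lam)

def assemble(patches, positions, p, H, W):
    """Average overlapping synthesized patches into a whole sketch."""
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    for patch, (y, xpos) in zip(patches, positions):
        acc[y:y+p, xpos:xpos+p] += patch
        cnt[y:y+p, xpos:xpos+p] += 1
    return acc / np.maximum(cnt, 1)        # avoid division by zero off-grid
```

The locality term $\lambda\,\mathrm{diag}(\mathbf{d})^2$ makes the system well conditioned in addition to down-weighting distant patches, so no extra regularizer is needed here.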
Since the computational complexity of equation (6) mainly depends on the number of randomly sampled patches, we can further accelerate the proposed RSLCR method by dropping out some of the randomly sampled patches. Since we have already computed the distances between the test photo patch and the randomly sampled training photo patches, we can drop out the sampled patches that are farther from the test photo patch; in other words, we retain the sampled patches whose distances to the test photo patch are among the $k$ smallest. In comparison to equation (6), we only need to replace the data matrix with its subset, i.e. $\mathbf{X}(:, \mathrm{idx}(1\!:\!k))$ and $\mathbf{d}(\mathrm{idx}(1\!:\!k))$, where idx stores the indices that sort the distances between the test photo patch and the sampled training photo patches in ascending order. We call this fast version FastRSLCR. The proposed RSLCR and FastRSLCR algorithms are summarized in Algorithm 2.
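The FastRSLCR subset selection can be sketched as follows (illustrative NumPy only; names are our own, and `lcr` here is simply the same closed-form solve as in equation (6) restricted to the retained columns):

```python
import numpy as np

def fast_lcr_weights(x, X, lam=0.5, k=200):
    """Keep only the k sampled patches closest to x, then solve the
    locality-constrained problem of Eq. (6) on the retained subset."""
    d = np.linalg.norm(X - x[:, None], axis=0)
    idx = np.argsort(d)[:min(k, X.shape[1])]   # ascending-distance indices
    D = X[:, idx] - x[:, None]
    C = D.T @ D + lam * np.diag(d[idx]**2)     # subset covariance + locality
    w = np.linalg.solve(C, np.ones(len(idx)))
    return idx, w / w.sum()
```

The returned `idx` is reused to pick the matching columns of the sampled sketch patch matrix, so the reconstruction in equation (7) stays paired.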
Algorithm 2 RSLCR & FastRSLCR

Input: test photo, $\mathbf{X}_{ij}$, $\mathbf{Y}_{ij}$, $\mathbf{P}_{ij}$, $1 \le i \le R$, $1 \le j \le C$
Step 1: According to the patch size $p$ and overlap size $o$, divide the test photo into patches $\mathbf{x}_{ij}$, $1 \le i \le R$, $1 \le j \le C$;
Step 2: For $i = 1, \dots, R$
        For $j = 1, \dots, C$
Step 3: RSLCR: compute the distance vector $\mathbf{d}$ between $\mathbf{x}_{ij}$ and $\mathbf{X}_{ij}$;
        FastRSLCR: compute $\mathbf{d}$ and retain the $k$ nearest sampled patches;
Step 4: Compute $\mathbf{w}$ as in equation (6);
Step 5: RSLCR: $\mathbf{s}_{ij} = \mathbf{Y}_{ij}\mathbf{w}$;
        FastRSLCR: $\mathbf{s}_{ij} = \mathbf{Y}_{ij}(:, \mathrm{idx}(1\!:\!k))\,\mathbf{w}$;
Step 6: Arrange all target sketch patches into a whole sketch with the overlapping areas averaged.
Output: the target sketch.
III Experimental Results and Analysis
Experiments are conducted to illustrate the efficiency and effectiveness of the proposed RSLCR and FastRSLCR methods. Two publicly available databases are used: the CUHK face sketch database (CUFS) [10] and the CUFSF database [20]. The CUFS database consists of face photos from three databases: the CUHK student database [21] (188 persons), the AR database [22] (123 persons), and the XM2VTS database [23] (295 persons). Persons in the XM2VTS database vary in age, race, and hair style. The CUFSF database includes 1194 persons from the FERET database [24]. In both the CUFS and CUFSF databases, there is one face photo and one face sketch drawn by the artist for each person. Face photos in the CUFSF database exhibit lighting variations, and the sketches exhibit shape exaggeration. All face photos and sketches are geometrically aligned relying on three points (the two eye centers and the mouth center) and cropped to the size of $250 \times 200$. Fig. 3 gives some examples from these two databases.
In the following, we first discuss the experimental settings (parameter settings) for the proposed methods on the CUHK student database. Afterwards, under these settings, we perform face sketch synthesis on the CUFS and CUFSF databases to subjectively illustrate the superiority of the proposed RSLCR and FastRSLCR methods compared with state-of-the-art methods. Time consumption is then discussed. Subsequently, objective statistical experiments (objective image quality assessment and face recognition) are conducted to indirectly validate the superiority of the proposed methods.
III-A Discussion on Experimental Settings
We employ the CUHK student database [21] for parameter tuning in this subsection. 88 face photo-sketch pairs are taken as the training set and the remaining 100 pairs are used for validation. To objectively assess the quality of the synthesized sketches under different experimental settings, the structural similarity index metric (SSIM) [25] is adopted as the evaluation criterion. The 100 sketches drawn by the artist in the validation set are taken as the reference images, and the 100 photos in the validation set are taken as the test images for face sketch synthesis. Under each experimental setting, the average SSIM score of the 100 synthesized sketches is taken as the final evaluation value.
There are five parameters (i.e. patch size $p$, overlap size $o$, search length $l$, number of randomly sampled patches $K$, and tradeoff parameter $\lambda$) for the proposed RSLCR method and one additional parameter (i.e. the number of selected neighbors $k$) for FastRSLCR. All experiments are conducted using MATLAB R2015a on a Windows 7 system with an Intel i7-4790 3.6 GHz CPU. Fig. 4 presents the SSIM scores under different parameter settings.
III-A1 Patch Size
We set the patch size to 5, 10, 15, 20, 30, and 40 respectively, keeping the overlap size at 70% of the patch size. It can be seen from Fig. 4 that both RSLCR and FastRSLCR achieve the highest SSIM score when the patch size is 20. A patch size of 10 gives performance very close to that of 20. However, a smaller patch size means more patches to be synthesized, and hence consumes much more time (78.84 seconds vs. 18.79 seconds for RSLCR and 21.07 seconds vs. 1.82 seconds for FastRSLCR).
III-A2 Overlap Size
Given a patch size of 20, we set the overlap size to different values: 0, 2, 4, 6, 8, 10, 12, 14, 16, and 18. From Fig. 4 it can be seen that the SSIM scores of both RSLCR and FastRSLCR increase with the overlap size. However, the time consumption also grows rapidly. In our following experiments, we set the overlap size to the tradeoff value of 14.
III-A3 Search Length
The search length is used in the training stage and hence does not affect the test phase. From Fig. 4 it can be seen that although the SSIM score of FastRSLCR always grows with the search length, the SSIM score of RSLCR grows at first and then declines. This is because, as the search length increases, some outliers may be sampled, and these outliers bring noise into the RSLCR method. FastRSLCR, however, is not subject to these outliers, because it selects the $k$ nearest neighbors among the randomly sampled patches, which filters the outliers out. The search length is set to 5 in our experiments.

III-A4 Random Sampling
Fig. 4 presents the SSIM scores for different numbers of randomly sampled training face photo-sketch patches. It can be seen that the SSIM score of the RSLCR method generally grows with the number of randomly sampled patches. For the FastRSLCR method, however, it begins to decrease when the number exceeds 800. We set the number to 800 in our experiments.
III-A5 Regularization Parameter
It should be noted that when $\lambda = 0$ (refer to equation (5)), the locality constraint makes no contribution to the reconstruction weight computation, and equation (5) reduces to the LLE model in equation (1). The difference is that the entries of the data matrix for the LLE method [4] are selected through NN search, while they are randomly sampled for RSLCR and FastRSLCR. From Fig. 4, on the one hand, it can be seen that when $\lambda = 0$, the SSIM scores of RSLCR and FastRSLCR are 0.6250 and 0.6150 respectively, compared with 0.5990 for the LLE method [4]. In addition, RSLCR and FastRSLCR run much faster than the LLE method (18.79 seconds for RSLCR, 1.82 seconds for FastRSLCR, and 536.34 seconds for LLE). This illustrates the effectiveness of the proposed random sampling strategy. On the other hand, when $\lambda = 0.5$, RSLCR and FastRSLCR achieve the best performance among all tested values, with SSIM scores much larger than those for $\lambda = 0$. This validates that the locality constraint does help to improve performance. $\lambda$ is set to 0.5 in this paper.
III-A6 Number of Nearest Neighbors for FastRSLCR
NN selection is conducted in FastRSLCR to improve computational efficiency. Fig. 4 presents the SSIM score for different numbers of nearest neighbors; generally, it grows with the number of neighbors. To balance time consumption against SSIM score, the number is set to 200 in our experiments. From Fig. 4 it can be seen that directly sampling 200 random patches achieves an SSIM score of 0.6301 (at a time cost of 1.82 s), while the proposed FastRSLCR (also using 200 sampled patches) achieves an SSIM score of 0.6339 (at a time cost of 1.89 s). This demonstrates the effectiveness of the proposed FastRSLCR method.
III-A7 Random Sampling Searching vs. Accelerated Nearest Neighbor Searching
In order to illustrate the effectiveness of the proposed offline random sampling strategy in comparison with accelerated nearest neighbor (ANN) searching strategies, we utilize two ANN approaches for online neighbor searching: iterative quantization based locality-sensitive hashing (ITQ-LSH) [26] (the MATLAB/C++ mixed source codes are downloaded from https://github.com/RSIA-LIESMARS-WHU/LSHBOX) and the KD-Tree method [27] (we use the open source MATLAB/C++ implementation in VLFeat: http://www.vlfeat.org/). We substitute the random sampling strategy in our proposed method with these two ANN searching methods, denoted ITQ-LSH-LC and KD-Tree-LC respectively. Table I gives the effect of the number of nearest neighbors on these two methods. It can be seen that both methods achieve their best performance when the number of neighbors is 20. Table II shows the comparison between the proposed methods and these two ANN based methods. Fig. 5 presents the sketches synthesized by the four methods. Table II and Fig. 5 illustrate that the random sampling strategy outperforms ITQ-LSH and KD-Tree in terms of both time consumption and image quality.

TABLE I. SSIM score (%) with time consumption (s) in parentheses against the number of neighbors.

Number of Neighbors  5  20  50  100  200  400  800
KD-Tree-LC  60.72 (434.89)  61.27 (436.18)  61.13 (437.03)  60.72 (439.77)  60.36 (462.63)  60.34 (474.85)  60.50 (501.57)
ITQ-LSH-LC  58.80 (412.16)  59.30 (421.09)  58.96 (432.16)  58.58 (437.24)  58.53 (439.78)  58.65 (443.14)  58.76 (461.01)
TABLE II. Comparison with ANN based methods.

Method  KD-Tree-LC  ITQ-LSH-LC  RSLCR  FastRSLCR
SSIM (%)  61.27  59.30  63.57  63.39
Time (s)  436.18  421.09  18.79  1.82
III-A8 Locality Constraint (LC)
The locality constraint was first proposed for image classification, where it shows performance comparable to the sparsity constraint [15]. The non-local similarity constraint used as a regularization term in [9] is also very similar to the locality constraint. Table III compares the performance of different face sketch synthesis methods with and without the locality constraint. It can be found that the locality constraint improves the performance of the proposed FastRSLCR method considerably, and it also improves the LLE method and the RSLCR method. However, this is not the case for the MWF method, because the locality constraint is implicitly embedded in the neighboring constraint of the MWF model (see the second term of equation (2)). In this paper, the locality constraint is utilized to distinguish the randomly sampled patches, since these patches may be distributed in a scattered way; i.e. it is especially appropriate for random sampling based methods, where the sampled patches may lie far away from each other.
TABLE III. SSIM scores (%) with and without the locality constraint (LC).

Method  LLE  MWF  FastRSLCR  RSLCR
With LC  61.00  62.31  63.39  63.57
Without LC  59.97  62.31  61.42  62.47
III-B Face Sketch Synthesis
Following the parameter discussion above, we set the patch size to 20, the overlap size to 14, the search length to 5, the number of randomly sampled patches to 800, the regularization parameter $\lambda$ to 0.5, and the number of nearest neighbors for FastRSLCR to 200. For the CUHK student database, 88 face photo-sketch pairs are taken for training and the rest for testing (the data has been partitioned in this database). For the AR database, we randomly choose 80 pairs for training and the remaining 43 pairs for testing. For the XM2VTS database, we randomly choose 100 pairs for training and the remaining 195 pairs for testing. Six state-of-the-art methods are compared: the FCN method [28], the GAN method [29] (source codes available online: https://github.com/phillipi/pix2pix), the LLE method [4], the SSD method [6], the MRF method [10], and the MWF method [11]. All synthesized sketches of the SSD method and the MWF method are generated from the source codes provided by the authors. For the MRF method, we use the implementation provided by the authors of SSD [6] (the source codes for both the MRF method and the SSD method are available online: http://www.cs.cityu.edu.hk/~yibisong/eccv14/index.html). Results of the LLE method and the FCN method are based on our implementations (available online: http://www.ihitworld.com/RSLCR.html; on this project website, we also release the source codes of both proposed methods and the evaluation codes, i.e. the objective image quality assessment codes and the face recognition codes). The full list of synthesized sketches (of both our methods and all compared methods) on these databases is available on the project website.
Fig. 6 presents synthesized face sketches from different methods on the CUFS database. It can be seen that the proposed RSLCR and FastRSLCR methods generate fine textures (e.g. the hair region) and structures (e.g. glasses). This is because more candidate patches are effectively incorporated in the proposed methods through random sampling and the locality constraint. The sketches synthesized by SSD, MRF, LLE, and MWF for photos from the XM2VTS database are less satisfying than those for the CUHK student database and the AR database, due to the greater variations in age, race, and hair style on faces in the XM2VTS database. However, RSLCR and FastRSLCR perform much better than these four compared methods and deliver comparable performance on face photos from all three databases. This illustrates the robustness of the proposed methods.
We have also investigated the robustness of the proposed methods against shape exaggeration and illumination variations on the CUFSF database. We randomly choose 250 face photo-sketch pairs for training and the remaining 944 pairs for testing. Fig. 7 shows the synthesized results of various methods on this database. There are some deformations in the sketches synthesized by SSD and MRF, especially in the mouth area. In addition, the proposed FastRSLCR and RSLCR methods handle glasses with reflected light well, while the other methods cannot (see the third row of Fig. 7).
III-C Time Consumption
Given the experimental settings in Section III-B, we measure the time consumption of the proposed methods. Table IV compares the time consumption of different methods on different databases. There are 88, 80, 100, and 250 training photo-sketch pairs for the CUHK student, AR, XM2VTS, and FERET databases respectively. It can be seen from the table that the time consumption of SSD, MRF, LLE, and MWF is proportional to the scale of the training set, because these methods search neighbors by traversing the whole training dataset. In contrast, the proposed RSLCR and FastRSLCR methods cost comparable time on all four databases. This validates the stronger scalability of the proposed RSLCR framework. Moreover, RSLCR has comparable or even lower time consumption than state-of-the-art methods, and the proposed FastRSLCR is the most efficient of all the methods. It requires less than 1.5 seconds to synthesize a sketch on the CUHK FERET database, which is dozens of times faster than state-of-the-art methods.
TABLE IV. Time consumption (s) for synthesizing one sketch.

Methods  SSD  MRF  LLE  MWF  RSLCR  FastRSLCR
Programming language  C++  C++  MATLAB  C++  MATLAB  MATLAB
CUHK Student  4.50  8.60  536.34  16.10  18.79  1.82
AR  4.10  8.40  496.47  15.33  19.10  1.73
XM2VTS  5.10  10.40  642.50  18.80  18.14  2.36
CUHK FERET  11.60  24.25  1591.95  45.20  17.66  1.44
III-D Objective Image Quality Assessment
We utilize SSIM to evaluate the quality of the sketches synthesized by the different methods on CUFS and CUFSF. There are 338 (100 + 43 + 195) and 944 synthesized sketches per method for the CUFS database and the CUFSF database respectively. Fig. 8 gives the statistics of the SSIM scores on these two databases. The horizontal axis marks the SSIM score from 0 to 1, and the vertical axis gives the percentage of synthesized sketches whose SSIM scores are not smaller than the score marked on the horizontal axis. Table V presents the average SSIM scores on the CUFS and CUFSF databases.
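The statistic plotted in Fig. 8 (the percentage of sketches whose SSIM score is not smaller than each threshold) can be computed as follows; this is a trivial sketch and the names are our own:

```python
import numpy as np

def cumulative_percentage(scores, thresholds):
    """For each threshold t, the percentage of synthesized sketches
    whose SSIM score is >= t (the curve plotted against t)."""
    scores = np.asarray(scores, dtype=float)
    return [100.0 * float(np.mean(scores >= t)) for t in thresholds]
```

Evaluating this over a grid of thresholds from 0 to 1 yields a monotonically non-increasing curve; a method whose curve lies above another's synthesizes more high-SSIM sketches.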
It can be seen from Fig. 8 and Table V that both FastRSLCR and RSLCR outperform the other state-of-the-art methods. SSD and MWF achieve comparable performance on the CUFS database, but SSD outperforms MWF on the CUFSF database, which illustrates that SSD handles faces with illumination variations better than the MWF method.
TABLE VI. Best face recognition accuracy (%) with the corresponding dimension in parentheses.

Methods  FCN  GAN  LLE  SSD  MRF  MWF  FastRSLCR  RSLCR
CUFS (%)  96.49 (131)  93.48 (139)  91.12 (148)  90.24 (149)  87.29 (149)  92.13 (149)  98.35 (121)  98.38 (133)
CUFSF (%)  69.80 (237)  71.44 (164)  61.76 (274)  70.92 (266)  46.03 (223)  74.15 (299)  73.41 (287)  75.94 (296)
III-E Face Sketch Recognition
Sketch based face recognition is often used to assist law enforcement. The sketch drawn by the artist is generally taken as the probe image, and the synthesized sketches play the role of images in the gallery. Null-space linear discriminant analysis (NLDA) [30] is employed to conduct the face recognition experiments. For the CUFS database, we randomly choose 150 synthesized sketches and the corresponding ground-truth sketches drawn by the artist to train the classifier; the remaining 188 sketches constitute the gallery. For the CUFSF database, we randomly choose 300 synthesized sketches and the corresponding ground-truth sketches for training, and the remaining 644 synthesized sketches constitute the gallery. We repeat each face recognition experiment 20 times with random partitions of the data.
Fig. 9 gives the face recognition accuracy against the number of dimensions retained by NLDA on the CUFS and CUFSF databases. Table VI presents the best face recognition accuracy together with the dimension at which it is attained (the number in parentheses). It can be seen that on the CUFS database the two proposed methods outperform state-of-the-art methods by a large margin, and on the more challenging CUFSF database the proposed RSLCR method also obtains the best performance, with an accuracy of 75.94%. The FastRSLCR method performs comparably to MWF. As shown in Table V, although SSD achieves a higher SSIM score than MWF, it has lower face recognition accuracy. This is because, although SSD generates clean face sketches (with much less noise than MWF), it introduces face deformations (e.g. in the mouth area, as shown in Fig. 6 and Fig. 7).
IV Conclusion
In this paper, we presented a simple yet effective framework for face sketch synthesis based on random sampling and a locality constraint. Random sampling in the offline stage speeds up the synthesis process, since there is no need to search neighbors online as in existing methods. The locality constraint guarantees that similar sampled photo patches have similar reconstruction weights, a property neglected in existing works. Experiments including subjective evaluations (perception of the quality of the synthesized sketches) and objective evaluations (image quality assessment and face recognition) illustrate the effectiveness of the proposed methods. In addition, the discussion on time consumption demonstrates that the proposed FastRSLCR method is the most efficient method in comparison with state-of-the-art methods. In the future, we will further improve the robustness of the proposed methods by incorporating more robust features. The application of the RSLCR framework to related fields is another mission on the schedule.
References
[1] N. Wang, D. Tao, X. Gao, and X. Li, “A comprehensive survey to face hallucination,” International Journal of Computer Vision, vol. 31, no. 1, pp. 9–30, 2014.
[2] X. Tang and X. Wang, “Face sketch recognition,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp. 1–7, 2004.
[3] ——, “Face sketch synthesis and recognition,” in Proceedings of IEEE International Conference on Computer Vision, 2003, pp. 687–694.
[4] Q. Liu, X. Tang, H. Jin, H. Lu, and S. Ma, “A nonlinear approach for face sketch synthesis and recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 1005–1010.
[5] S. Roweis and L. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[6] Y. Song, L. Bao, Q. Yang, and M. Yang, “Real-time exemplar-based face sketch synthesis,” in Proceedings of European Conference on Computer Vision, 2014, pp. 800–813.
[7] X. Gao, N. Wang, D. Tao, and X. Li, “Face sketch-photo synthesis and retrieval using sparse representation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8, pp. 1213–1226, 2012.
[8] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
[9] S. Wang, L. Zhang, Y. Liang, and Q. Pan, “Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2216–2223.
[10] X. Wang and X. Tang, “Face photo-sketch synthesis and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 1955–1967, 2009.
[11] H. Zhou, Z. Kuang, and K. Wong, “Markov weight fields for face sketch synthesis,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 1091–1097.
[12] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, “Transductive face sketch-photo synthesis,” IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 9, pp. 1–13, 2013.
[13] C. Peng, X. Gao, N. Wang, D. Tao, X. Li, and J. Li, “Multiple representations-based face sketch-photo synthesis,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 11, pp. 2201–2215, 2016.
[14] C. Peng, X. Gao, N. Wang, and J. Li, “Superpixel-based face sketch-photo synthesis,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–12, DOI: 10.1109/TCSVT.2015.2502861, 2016.
[15] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 3360–3367.
[16] H. Chang, D. Yeung, and Y. Xiong, “Super-resolution through neighbor embedding,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2004, pp. 1–8.
[17] J. Yang, Z. Lin, and S. Cohen, “Fast image super-resolution based on in-place example regression,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1059–1066.
[18] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.
[19] I. Jolliffe, Principal Component Analysis. New York: Springer, 2002.
[20] W. Zhang, X. Wang, and X. Tang, “Coupled information-theoretic encoding for face photo-sketch recognition,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 513–520.
[21] X. Tang and X. Wang, “Face photo recognition using sketch,” in Proceedings of IEEE International Conference on Image Processing, 2002, pp. 257–260.
[22] A. Martinez and R. Benavente, “The AR face database,” CVC, Barcelona, Spain, Tech. Rep. 24, Jun. 1998.
[23] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, “XM2VTSDB: the extended M2VTS database,” in Proceedings of the International Conference on Audio- and Video-Based Biometric Person Authentication, Apr. 1999, pp. 72–77.
[24] P. Phillips, H. Moon, P. Rauss, and S. Rizvi, “The FERET evaluation methodology for face recognition algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090–1104, 2000.
[25] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[26] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin, “Iterative quantization: a Procrustean approach to learning binary codes for large-scale image retrieval,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2916–2929, 2013.
[27] M. Muja and D. Lowe, “Scalable nearest neighbor algorithms for high dimensional data,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 11, pp. 2227–2240, 2014.
[28] L. Zhang, L. Lin, X. Wu, S. Ding, and L. Zhang, “End-to-end photo-sketch generation via fully convolutional representation learning,” in Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015, pp. 627–634.
[29] P. Isola, J. Zhu, T. Zhou, and A. Efros, “Image-to-image translation with conditional adversarial networks,” arXiv preprint arXiv:1611.07004, 2016.
[30] L. Chen, H. Liao, and M. Ko, “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognition, vol. 33, no. 10, pp. 1713–1726, 2000.