Facial makeup transfer is an emerging application of virtual reality technology to images. Many young women want to preview a virtual makeup effect on a photograph of themselves. Facial makeup is a technique that changes a person's appearance with cosmetics such as compact, setting powder, and moisturizer. In many circumstances, particularly for women, makeup is regarded as a necessary practice to beautify the appearance. Emulsions are often used to alter the facial skin detail. Compacts are primarily used to hide defects and cover the original facial skin detail. Setting powder often adds detail to the skin. In addition, other color makeup, such as eyeliner and eye shadow, is applied on top of the setting powder.
Makeup technology keeps developing and now adapts to different facial types, scenes, ages, skin types, and even costumes [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. Choosing makeup by trying it on creates a personal experience, but it is time-consuming and can damage the skin.
Our method, an application of facial makeup transfer, takes the above circumstances into account. As shown in FIGURE 1, given the input image (FIGURE 1(a)) and the reference image that provides the makeup example (FIGURE 1(b)), our method transfers the makeup of the reference image to the input image to generate the output image (FIGURE 1(c)).
II Related Work
In 2007, Tong et al. [1] of the Hong Kong University proposed a face-to-face makeup transfer method based on a quotient image. Their method uses the quotient image computed from a pair of images of the same person before and after applying makeup as the reference to transfer the makeup to the input facial image. The method can be divided into four steps. First, the eyebrows and eyelashes of the input image are removed to prepare for the eye makeup transfer, the resulting holes are filled using texture synthesis to extract the inherent skin features of the input image, and point correspondences between the facial image and a facial model containing 84 landmark points are specified manually to prepare for facial deformation. Second, the reference facial image is deformed to match the input image. Third, the deformed quotient image, which captures the change between the same face before and after makeup, is multiplied by the input image to achieve the makeup transfer. Finally, eye makeup receives additional processing, because it is generally more complicated and its color varies more.
In 2009, Guo et al. [2] of the National University of Singapore proposed a simpler method that requires only a reference image after makeup, not a before-makeup image of the reference face. The method first aligns the input facial image with the reference facial image. Since the information is transferred from pixel to pixel, the two images must be fully aligned before transferring. Each image is then decomposed with an edge-preserving smoothing filter into three layers: the facial structure layer, the facial color layer, and the facial detail layer. The information in each layer of the reference image is transferred to the corresponding layer of the input image in a different way: the facial detail layer is transferred directly, and the facial color layer is transferred by alpha blending. The three resulting layers are combined to obtain the final image.
In 2015, Li et al. [3] of Zhejiang University proposed a facial image makeup editing method based on intrinsic images. The method uses intrinsic image decomposition to decompose the input facial image directly into an illumination layer and a reflectance layer, and then edits the makeup information of the facial image in the reflectance layer without requiring a reference image. Finally, the previously decomposed illumination layer and the edited reflectance layer are recombined to obtain the makeup editing effect.
In 2016, Liu et al. [4] of NVIDIA Research designed a new deep convolutional neural network for makeup transfer, which can not only transfer facial makeup, eye shadow, and lip makeup, but also recommend the most suitable makeup for the input image. The network consists of two consecutive steps. The first step uses a fully convolutional network (FCN) to parse the face into its parts, which are distinguished by different colors. The input image with its face parsing map and the reference image with its face parsing map are then used as the input of the makeup transfer network. According to their characteristics, the facial makeup, eye shadow, and lip makeup are processed by different loss functions, and the three are integrated. The retained part of the input facial image is then added to obtain the final result image.
In 2018, Chang et al. [5] of Princeton University in the United States proposed the PairedCycleGAN network for transferring the facial makeup of the reference image to the input image. The main idea is to train a generator network and a discriminator network to transfer a specific makeup style. Chang et al. [5] trained three generators separately, focusing the network capacity and resolution on the unique features of each region. For each pair of images before and after makeup, a face parsing algorithm is first applied to segment each facial component, such as the eyes, eyebrows, lips, and nose. Finally, each component is processed separately and the components are recombined.
III Facial Makeup Transfer
Our method takes as input the input image, to which the makeup is applied, and the reference image, which provides the example makeup style. The result is the output image, which retains the facial structure of the input image while applying the makeup style from the reference image. The notation we use is listed in TABLE I.
|Reference image (after warping)|
|Facial structure layer|
|Facial detail layer|
|CIELAB facial color layer a|
|CIELAB facial color layer b|
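|Weight controlling the degree of blending of the input and reference color layers in the output|
|Weight controlling the illumination transfer between the input and reference structure layers in the output|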
|Image pixel point|
|Skin region of the facial image|
The complete pipeline is shown in FIGURE 2. Before the pipeline begins, we apply whitening and smoothing to the input image as a small preprocessing optimization. The pipeline has four main steps. First, the input facial image and the reference facial image are aligned. Since information is transferred from pixel to pixel, the two images must be well aligned before the makeup transfer. We use a modified Active Shape Model (ASM) search to find 90 corresponding feature points and an affine transformation to warp the reference image to the input image. Second, both the input image and the warped reference image are decomposed into three layers: the facial structure layer, the facial color layer, and the facial detail layer. Third, the information in each layer of the warped reference image is transferred to the corresponding layer of the input image in its own way: facial detail is transferred directly; facial color is transferred through alpha blending; and the illumination in the facial structure layer is transferred with a dedicated algorithm. The three resulting layers are then combined. Fourth, we use facial parsing to estimate the label probability of each pixel and then fuse the retained components of the input image with the retained components of the initial makeup result, weighted by these probabilities, to produce the final makeup.
III-A Whitening and Smoothing
On the one hand, we use the OpenCV color balance algorithm to achieve facial whitening. Color balance globally adjusts the dominant colors of an image: red, green, and blue. The process is briefly as follows. First, each pixel is assigned to a brightness range (highlights, mid-tones, or shadows). Next, the variable parameters of each brightness range are adjusted with the color balance coefficients. Then the red, green, and blue channel values used to adjust the image color are computed. Finally, the color of the whole image is balanced based on these channel values. On the other hand, we use the OpenCV bilateral filtering algorithm [12] to achieve facial smoothing. Bilateral filtering performed in the CIELAB color space is the most natural type of filtering for color images: only perceptually similar colors are averaged together, and only perceptually important edges are preserved while noise is eliminated. The basic idea of bilateral filtering is to consider, for each pixel in the convolution kernel, not only its distance from the central pixel but also how similar its value is to that of the central pixel. Two weights are generated, one from the spatial distance and one from the value similarity, and both are applied when computing the filtered central pixel, which yields an edge-preserving low-pass filter.
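The two-weight idea behind bilateral filtering can be illustrated with a minimal NumPy sketch. This is not the OpenCV implementation used in our pipeline; it is a brute-force version for a single-channel image, and the function name `bilateral_filter` and the parameters `radius`, `sigma_space`, and `sigma_range` are hypothetical choices made only for illustration.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_range=0.1):
    """Brute-force bilateral filter for a 2-D float image (illustrative sketch)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Replicate border pixels so every window is fully defined.
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img)   # weighted sum of neighbor values
    norm = np.zeros_like(img)  # sum of weights, for normalization
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Weight 1: spatial closeness to the central pixel.
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Weight 2: similarity of the neighbor value to the central value.
            w_range = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = w_space * w_range
            acc += weight * shifted
            norm += weight
    return acc / norm
```

Because the range weight collapses toward zero across a strong edge, pixels on opposite sides of the edge are barely mixed, while small fluctuations within a flat region are averaged out.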
III-B Facial Alignment
For facial alignment, we first use the modified Active Shape Model (ASM) of Milborrow et al. [14] to obtain the facial feature points and then use an affine transformation to warp the reference image to the input image. Because appearance varies widely under different makeup, our landmarking software obtains more precise facial feature points by combining automatic detection with manual adjustment. An example of the 90 landmark points on a face is shown in FIGURE 3.
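As an illustration of the alignment step, the sketch below estimates a single global affine transformation from corresponding landmark points by least squares. It assumes one affine map for the whole face, which may be simpler than the warping actually used; the function names `estimate_affine` and `apply_affine` are hypothetical.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src landmarks onto dst landmarks."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    # Homogeneous source coordinates: one row [x, y, 1] per landmark.
    A = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve A @ M.T ~= dst in the least-squares sense.
    M_t, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M_t.T  # shape (2, 3)

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=np.float64)
    return pts @ M[:, :2].T + M[:, 2]
```

With 90 landmark pairs the system is heavily over-determined, so small landmark errors are averaged out rather than fitted exactly.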
III-C Layer Decomposition
The face is segmented according to the component label of each pixel. As shown in FIGURE 4, we use the facial parsing method of Liu et al. [6] to define the facial components and obtain a component label for each pixel, including hair, eyebrows, eyes, nose, lips, mouth, facial skin, and background.
As shown in FIGURE 5, we use the 90 landmark feature points of the input image and of the reference image to warp the reference image to the input image for facial alignment.
We parse the input facial image and select 11 kinds of labels, which cover nearly all the facial components. We then color the 11 facial component labels to obtain the facial hard mask. Next, we segment the face into regions with the hard mask, which guides the different makeup transfer operations applied to each facial region.
We choose the CIELAB color space to decompose the input image and the reference image (after warping) into the facial structure layer, the facial color layer (i.e., the CIELAB color channels a and b), and the facial detail layer. The CIELAB color space separates lightness from color better than other color spaces [7] and is approximately perceptually uniform [8].
Then, following the approach of Eisemann et al. [9] and Zhang et al. [10], and using the Weighted Least Squares (WLS) operator presented by Farbman et al. [11], we apply an edge-preserving smoothing filter to the lightness layer to extract the facial structure layer. The structure layer is then subtracted from the lightness layer to obtain the facial detail layer.
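The structure and detail split can be sketched as follows. The sketch assumes the lightness channel is already available as a 2-D array, and it uses a plain box blur as a stand-in smoother; a box blur is not edge-preserving and is not the WLS operator, so it only illustrates the subtraction that defines the detail layer. The names `box_smooth` and `decompose_lightness` are hypothetical.

```python
import numpy as np

def box_smooth(channel, radius=2):
    """Box blur used here only as a placeholder for an edge-preserving filter."""
    channel = np.asarray(channel, dtype=np.float64)
    h, w = channel.shape
    padded = np.pad(channel, radius, mode="edge")
    out = np.zeros_like(channel)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)

def decompose_lightness(lightness, smooth=box_smooth):
    """Split the lightness channel into a structure layer and a detail layer."""
    lightness = np.asarray(lightness, dtype=np.float64)
    structure = smooth(lightness)    # large-scale shading
    detail = lightness - structure   # residual fine-scale skin detail
    return structure, detail
```

By construction the two layers sum back to the original lightness, so any smoother can be swapped in through the `smooth` argument without changing the recombination step.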
III-D Layer Transfer
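We define the facial detail layer of the output image as the facial detail layer of the warped reference image, i.e., facial detail is transferred directly.
We define the facial color layer of the output image as the alpha blending of the CIELAB color channels a and b of the input image and of the warped reference image. A mixing weight controls the blend of the two color channels, and the blend is applied only at image pixel points that belong to the skin region of the facial image.
The facial structure layer of the output image is obtained through the illumination transfer described in the next subsection.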
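A minimal sketch of the detail and color transfer is given below. It assumes that the mixing weight multiplies the reference channel, that pixels outside the skin region keep the input color, and that the default weight of 0.8 is an arbitrary illustrative value; the names `transfer_detail`, `blend_color`, and `gamma` are hypothetical.

```python
import numpy as np

def transfer_detail(ref_detail):
    """Direct transfer: the output detail layer is a copy of the reference detail."""
    return np.array(ref_detail, dtype=np.float64, copy=True)

def blend_color(input_chan, ref_chan, skin_mask, gamma=0.8):
    """Alpha-blend one CIELAB color channel (a or b) inside the skin region."""
    input_chan = np.asarray(input_chan, dtype=np.float64)
    ref_chan = np.asarray(ref_chan, dtype=np.float64)
    skin_mask = np.asarray(skin_mask, dtype=bool)
    blended = (1.0 - gamma) * input_chan + gamma * ref_chan
    # Outside the skin region the input color is kept unchanged (assumption).
    return np.where(skin_mask, blended, input_chan)
```

The same `blend_color` call would be applied once to channel a and once to channel b.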
III-E Illumination Transfer
We achieve illumination transfer by combining the input facial structure layer and the reference facial structure layer. An illumination transfer parameter controls the balance between the two, and the transfer is applied only at image pixel points that belong to the skin region of the facial image.
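One plausible reading of the illumination transfer is a per-pixel linear blend of the two structure layers inside the skin region, analogous to the color blending. The sketch below is written under that assumption and should not be taken as the exact formula; the names `transfer_illumination`, `recombine_lightness`, and `beta` are hypothetical.

```python
import numpy as np

def transfer_illumination(input_structure, ref_structure, skin_mask, beta=0.5):
    """Blend structure layers in the skin region (assumed linear form)."""
    input_structure = np.asarray(input_structure, dtype=np.float64)
    ref_structure = np.asarray(ref_structure, dtype=np.float64)
    skin_mask = np.asarray(skin_mask, dtype=bool)
    blended = (1.0 - beta) * input_structure + beta * ref_structure
    # Non-skin pixels keep the input structure (assumption).
    return np.where(skin_mask, blended, input_structure)

def recombine_lightness(structure, detail):
    """Output lightness is the sum of the structure and detail layers."""
    return np.asarray(structure, dtype=np.float64) + np.asarray(detail, dtype=np.float64)
```

Under this reading, setting `beta` to 0 keeps the input illumination, while larger values pull the lightness toward the reference, which is what lets dark makeup stay dark instead of turning gray.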
IV Experiments and Results
IV-A Data Collection
For our makeup transfer experiments, we collect two separate high-resolution datasets to achieve better results: one containing before-makeup faces with nude or very light makeup, and one containing faces with a large variety of facial makeup styles. We collect both datasets ourselves from major websites. We manually verify that each facial image is indeed a before-makeup or with-makeup face with eyes open and without occlusions. In this way, we obtain a before-makeup dataset of 526 images and a with-makeup dataset of 878 images.
IV-B Efficient Makeup Transfer
Comparison results between our method and that of Guo et al. [2] are shown in FIGURE 7. On the one hand, the method of Guo et al. [2] assumes that the illumination in the reference image is uniform, but it is not necessarily the same as in the input image. If any shadow or specularity exists in the reference image, it is also transferred to the input image. To solve this problem, we introduce illumination transfer to detect and remove shadows and specularities; our results are shown in FIGURE 7.
On the other hand, the method of Guo et al. [2] does not work well for black and dark makeup. In their results, the dark regions appear gray and unnatural. In physical makeup the black color acts as the foundation, but their method transfers only the detail introduced by the foundation. Black is interpreted as the absence of color in the CIELAB color space, so its illumination is especially important to human perception, yet the illumination is not transferred in their method. Thus, the dark color in their results appears gray. We solve the problem by adding illumination transfer with a user-controlled coefficient that sets the degree of illumination transfer; our results are shown in FIGURE 7.
IV-C Effective Makeup Transfer
Comparison results between our method and that of Chang et al. [5] are shown in FIGURE 8. As the results show, the method of Chang et al. [5] transfers only the makeup of the eyes and lips and does not transfer the makeup of the skin, whereas our method transfers the eye and lip makeup as well as the skin makeup, i.e., it combines both. In addition, their method does not handle fine hair effectively, whereas ours does.
We also compare our method with that of Liu et al. [4]. As shown in FIGURE 7, our makeup results are better than those of Liu et al. [4]. Our method transfers the facial skin detail of the reference image and thus forms new detail in the result, which the method of Liu et al. [4] cannot do. Furthermore, because our method combines makeup transfer with relighting, it can handle reference images with black eye makeup and dark makeup, which the method of Liu et al. [4] cannot.
Last but not least, the time and space complexity of our method is lower than that of Liu et al. [4]. As shown in TABLE II, our method runs within 2 seconds on an iPhone 6 for a pair of color images, whereas the method of Liu et al. (IJCAI 2016) [4] takes 6 seconds on a TITAN X GPU for a pair of color images.
|Method||Setting||Running time|
|Liu et al. IJCAI 2016 [4]||image pair using TITAN X GPU||6s|
|Our method||image pair using iPhone 6||2s|
IV-D Air-Bangs Makeup Transfer
So far, neither traditional computer vision nor deep learning offers a good way to handle makeup transfer when the reference example has air-bangs. These methods rely on extremely accurate facial landmarks to generate a natural facial mask. Because reference examples in real life are very diverse, these methods cannot separate hair from skin naturally, so the hair of the reference example is transferred along with the makeup. To handle this difficult case, we further improve the method described above and resolve the hair and skin boundary problem in makeup transfer, as shown in FIGURE 9.
The main process is as follows. First, we apply facial whitening and smoothing and use the facial parsing of Liu et al. [6] to acquire the hard mask of the input image with air-bangs. We then use the method described above to generate the initial makeup; at this stage the hair of the reference image is also transferred to the input image, which is unwanted. Second, we convert the hard mask into a soft mask, which assigns facial components to each pixel in terms of probability. Using the soft mask of the input image, we preserve four components from the input image: eyes, mouth, air-bangs, and background. At the same time, we preserve four components from the initial makeup: skin, eyebrows, nose, and lips. Third, we fuse the pixels of the retained components of the input image with those of the retained components of the initial makeup result according to their probabilities. Finally, we combine the fusion results to generate the final makeup.
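The probability-weighted fusion can be sketched as a per-pixel convex combination. The sketch assumes that the soft mask has already been reduced to a single map `keep_prob` giving, for each pixel, the probability that it belongs to a component retained from the input image (eyes, mouth, air-bangs, or background); how the hard mask is converted into this map is not covered here. The names `fuse_with_soft_mask` and `keep_prob` are hypothetical.

```python
import numpy as np

def fuse_with_soft_mask(input_img, makeup_img, keep_prob):
    """Fuse the input image and the initial makeup result with a soft mask."""
    input_img = np.asarray(input_img, dtype=np.float64)
    makeup_img = np.asarray(makeup_img, dtype=np.float64)
    keep = np.clip(np.asarray(keep_prob, dtype=np.float64), 0.0, 1.0)
    if input_img.ndim == 3:
        # Broadcast the per-pixel probability over the color channels.
        keep = keep[..., None]
    # High keep probability favors the input pixel, low favors the makeup pixel.
    return keep * input_img + (1.0 - keep) * makeup_img
```

Near the hair and skin boundary the probabilities fall between 0 and 1, so the two images are mixed smoothly instead of being cut along a hard edge.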
IV-E Quantitative Comparison
The quantitative comparison focuses on the quality of the makeup transfer and the degree of harmony. On the one hand, we conduct 100 makeup transfer experiments and compare our results with those of Guo et al. [2], Neural Style [13], and Liu et al. [4]. Each time, a 7-tuple, i.e., an input facial image, a reference facial image, and the result images produced by our method and by the above methods, is sent to 20 participants for comparison. Note that the four result facial images are shown in random order. The participants rate the results on five levels: “much better”, “better”, “same”, “worse”, and “much worse”. The percentages for each level are shown in TABLE III. Our method is rated much better than that of Guo et al. in 23.6% of cases, much better than NeuralStyle-CC and NeuralStyle-CS in 90.1% and 92.3% of cases, respectively, and much better than that of Liu et al. in 32.9% of cases.
|Methods||much better||better||same||worse||much worse|
|Guo et al. ||23.6%||66.7%||25.2%||10.2%||0.93%|
|Liu et al. ||32.9%||28.4%||5.13%||0.33%||0.14%|
On the other hand, we conduct a user study on Amazon Mechanical Turk with pairwise comparisons between the results of Chang et al. [5] and those of our method. We randomly select 102 pairs of input and reference facial images, giving 102 groups of makeup transfer results to compare. We then ask 10 or more subjects to select which result better matches the makeup style in the reference. On average, 87.3% of the subjects prefer our results over those of Chang et al.
In this paper, we propose a novel makeup transfer method that adapts to most sample images. The main innovations are as follows. First, during makeup transfer, we perform illumination transfer on the facial structure layer with a dedicated algorithm. Second, we extend makeup transfer to cases with air-bangs. The major advantages of our method are that it is efficient, it is effective, and it can handle reference images with air-bangs.
Since only the skin detail and color information of the reference image are needed to beautify the appearance, the facial structure of the reference image is not needed, which helps to protect the privacy of the makeup actor. We apply the latest and most fashionable makeup examples in our system so that users can apply virtual makeup to their faces in real time according to individual needs, just like a tailor-made personal beauty salon.
As described above, our approach has the following three advantages:
- Black, dark, and white makeup can be transferred effectively by introducing illumination transfer;
- Makeup is transferred within seconds, more efficiently than with makeup methods based on deep learning frameworks;
- Makeup can be transferred well from examples with air-bangs.
We thank all the editors, the reviewers, and Prof. Yebin Liu for his advice. This work is partially supported by the National Natural Science Foundation of China (Grant Nos. 61772047, 61772513), the Open Project Program of State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (No. VRLAB2019C03), the Open Funds of CETC Big Data Research Institute Co., Ltd. (Grant No. W-2018022), the Science and Technology Project of the State Archives Administrator (Grant No. 2015-B-10), and the Fundamental Research Funds for the Central Universities (Grant Nos. 328201803, 328201801). Parts of this paper have appeared in our previous work. This is the extended journal version of the conference paper: X. Li, R. Han, N. Ning, X. Zhang and X. Jin. Efficient and Effective Face Makeup Transfer. The 4th International Symposium on Artificial Intelligence and Robotics (ISAIR), Daegu, Korea, 20-24 August, 2019.
- [1] Wai-Shun Tong, Chi-Keung Tang, Michael S. Brown, and Ying-Qing Xu. Example-based cosmetic transfer. In Proceedings of the Pacific Conference on Computer Graphics and Applications, Pacific Graphics 2007, Maui, Hawaii, USA, October 29 - November 2, 2007, pages 211–218, 2007.
- [2] Dong Guo and Terence Sim. Digital face makeup by example. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 73–79, 2009.
- [3] Chen Li, Kun Zhou, and Stephen Lin. Simulating makeup through physics-based manipulation of intrinsic image layers. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 4621–4629, 2015.
- [4] Si Liu, Xinyu Ou, Ruihe Qian, Wei Wang, and Xiaochun Cao. Makeup like a superstar: Deep localized makeup transfer network. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, pages 2568–2575. AAAI Press, 2016.
- [5] Huiwen Chang, Jingwan Lu, Fisher Yu, and Adam Finkelstein. PairedCycleGAN: Asymmetric style transfer for applying and removing makeup. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 40–48, 2018.
- [6] Sifei Liu, Jimei Yang, Chang Huang, and Ming-Hsuan Yang. Multi-objective convolutional learning for face labeling. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 3451–3459, 2015.
- [7] Rastislav Lukac, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos. Color image processing. Computer Vision and Image Understanding, 107(1-2):1–2, 2007.
- [8] Alan Woodland and Frédéric Labrosse. On the Separation of Luminance from Colour in Images. In Mike Chantler, editor, Vision, Video, and Graphics (2005). The Eurographics Association, 2005.
- [9] Elmar Eisemann and Frédo Durand. Flash photography enhancement via intrinsic relighting. ACM Trans. Graph., 23(3):673–678, 2004.
- [10] Xiaopeng Zhang, Terence Sim, and Xiaoping Miao. Enhancing photographs with near infra-red images. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 24-26 June 2008, Anchorage, Alaska, USA, 2008.
- [11] Zeev Farbman, Raanan Fattal, Dani Lischinski, and Richard Szeliski. Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Trans. Graph., 27(3):67:1–67:10, 2008.
- [12] Carlo Tomasi and Roberto Manduchi. Bilateral filtering for gray and color images. In ICCV, pages 839–846, 1998.
- [13] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015.
- [14] Stephen Milborrow and Fred Nicolls. Locating facial features with an extended active shape model. In Computer Vision - ECCV 2008, 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part IV, pages 504–513, 2008.
- [15] Xiaowu Chen, Mengmeng Chen, Xin Jin, and Qinping Zhao. Face illumination transfer through edge-preserving filters. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pages 281–287, 2011.
- [16] Xiaowu Chen, Hongyu Wu, Xin Jin, and Qinping Zhao. Face illumination manipulation using a single reference image by adaptive layer decomposition. IEEE Trans. Image Processing, 22(11):4249–4259, 2013.
- [17] Xin Jin, Mingtian Zhao, Xiaowu Chen, Qinping Zhao, and Song Chun Zhu. Learning artistic lighting template from portrait photographs. In Computer Vision - ECCV 2010, 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part IV, pages 101–114, 2010.
- [18] Xiaowu Chen, Xin Jin, Qinping Zhao, and Hongyu Wu. Artistic illumination transfer for portraits. Comput. Graph. Forum, 31(4):1425–1434, 2012.
- [19] Xiaowu Chen, Xin Jin, Hongyu Wu, and Qinping Zhao. Learning templates for artistic portrait lighting analysis. IEEE Trans. Image Processing, 24(2):608–618, 2015.
- [20] Xiaowu Chen, Ke Wang, and Xin Jin. Single image based illumination estimation for lighting virtual object in real scene. In 12th International Conference on Computer-Aided Design and Computer Graphics, CAD/Graphics 2011, Jinan, China, September 15-17, 2011, pages 450–455, 2011.
- [21] X. Jin, Y. Tian, N. Liu, C. Ye, J. Chi, X. Li, and G. Zhao. Object image relighting through patch match warping and color transfer. In 2016 International Conference on Virtual Reality and Visualization (ICVRV), pages 235–241, Sep. 2016.
- [22] Xiaowu Chen, Xin Jin, and Ke Wang. Lighting virtual objects in a single image via coarse scene understanding. SCIENCE CHINA Information Sciences, 57(9):1–14, 2014.
- [23] Xin Jin, Yannan Li, Ningning Liu, Xiaodong Li, Quan Zhou, Yulu Tian, and Shiming Ge. Scene Relighting Using a Single Reference Image Through Material Constrained Layer Decomposition, pages 37–44. Artificial Intelligence and Robotics, 01, 2018.
- [24] Xin Jin, Yannan Li, Ningning Liu, Xiaodong Li, Xianggang Jiang, Chaoen Xiao, and Shiming Ge. Single reference image based scene relighting via material guided filtering. CoRR, abs/1708.07066, 2017.
- [25] Quan Zhou, Wenbin Yang, Guangwei Gao, Weihua Ou, Huimin Lu, Jie Chen, and Longin Jan Latecki. Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web, 22(2):555–570, 2019.
- [26] Quan Zhou, Jie Cheng, Huimin Lu, Yawen Fan, Suofei Zhang, Xiaofu Wu, Baoyu Zheng, Weihua Ou, and Longin Jan Latecki. Learning adaptive contrast combinations for visual saliency detection. Multimedia Tools and Applications, Nov 2018.
- [27] Quan Zhou, Cheng Zhang, Wenbin Yu, Yawen Fan, Hu Zhu, Xiaofu Wu, Weihua Ou, Wei-Ping Zhu, and Longin Jan Latecki. Face recognition via fast dense correspondence. Multimedia Tools Appl., 77(9):10501–10519, 2018.
- [28] Quan Zhou, Baoyu Zheng, Wei-Ping Zhu, and Longin Jan Latecki. Multi-scale context for scene labeling via flexible segmentation graph. Pattern Recognition, 59:312–324, 2016.
- [29] Seiichi Serikawa and Huimin Lu. Underwater image dehazing using joint trilateral filter. Computers & Electrical Engineering, 40(1):41–50, 2014.
- [30] Huimin Lu, Yujie Li, Shenglin Mu, Dong Wang, Hyoungseop Kim, and Seiichi Serikawa. IEEE Internet of Things Journal, 5(4):2315–2322, 2018.
- [31] Huimin Lu, Yujie Li, Min Chen, Hyoungseop Kim, and Seiichi Serikawa. Brain intelligence: Go beyond artificial intelligence. MONET, 23(2):368–375, 2018.
- [32] Huimin Lu, Dong Wang, Yujie Li, Jianru Li, Xin Li, Hyoungseop Kim, Seiichi Serikawa, and Iztok Humar. Conet: A cognitive ocean network. CoRR, abs/1901.06253, 2019.
- [33] Huimin Lu, Yujie Li, Tomoki Uemura, Hyoungseop Kim, and Seiichi Serikawa. Low illumination underwater light field images reconstruction using deep convolutional neural networks. Future Generation Comp. Syst., 82:142–148, 2018.