Facial Image Deformation Based on Landmark Detection

10/30/2019 ∙ by Chaoyue Song, et al.

In this work, we use facial landmarks to make the deformation of facial images more authentic. The deformation includes the expansion of eyes and the shrinking of noses, mouths, and cheeks. An advanced 106-point facial landmark detector is used to provide control points for the deformation. Bilinear interpolation is used in the expansion part, and Moving Least Squares (MLS) methods, including Affine Deformation, Similarity Deformation and Rigid Deformation, are used in the shrinking part. We then compare the running time and the quality of the deformed images across the MLS methods. The experimental results show that Rigid Deformation, which keeps the other parts of the image unchanged, performs best even though it takes the longest time.




1 Introduction

Image deformation, one of the most popular topics in computer vision and image processing, has been discussed for many years. Recently, the emergence of artificial intelligence has enabled new techniques in image deformation and has produced impressive results, especially in some specific scenarios. Deformation of faces is one of the most popular areas in academia as well as in industry. With more and more people pursuing beauty, realistic deformed facial images and an automatic process to generate them are required, and accurate deformation methods are continuously in great demand. Before the emergence of artificial intelligence, common manipulations (e.g. expansion, shrinking and blurring) of facial features were no different from those applied to other objects, because those manipulations ignored the special characteristics of facial features. Machine learning methods can extract facial landmarks from facial images, which makes it possible to warp based on these landmarks, thereby giving more accurate results and making the warped image more authentic. There are many techniques for deforming facial images, and the fundamental operation is image warping, especially warping with control points.

Figure 1: An example of our result. From top left to bottom right: original image, image with eye expansion, image with nose, mouth, and cheek shrinking, image with both expansion and shrinking.

Expansion and shrinking are two preferred operations on facial features. Expansion is often applied to eyes, and shrinking is often applied to noses and mouths. For an automatic facial deformation process, it is intuitive that the more landmarks are detected accurately, the more authentic the deformed image will be. Dlib (http://dlib.net/) provides 68-point facial landmark detection. Nevertheless, that is not enough: for example, too few points are located near the nose, which makes it difficult to perform deformations there. Thus, in order to perform deformations with more accuracy, we need a landmark detector that provides more control points.

The key to the deformation stage is to find an accurate mapping function that, based on several control points, maps a reference point (or several adjacent points, if required) to the target point. In addition, different parts of the face may call for different mapping functions.

In this work, we trained an advanced 106-point facial landmark detector based on the method proposed by Face++ [16], which provides enough control points for the deformation. We then implemented the expansion part based on bilinear interpolation, with an adjustable degree of expansion. Shrinking is achieved by the Moving Least Squares (MLS) algorithm [7], which includes Affine Deformation, Similarity Deformation and Rigid Deformation. The experimental results show that our method, which combines facial landmark detection with image deformation, can produce authentic deformed facial images.

2 Related Work

Facial Landmark detection

The research on facial landmark detection can be traced back to 1995, when the Active Shape Model (ASM) [3] was proposed. ASM is based on the Point Distribution Model and was later extended into Active Appearance Models [2], which consist of a shape model and a texture model. With the development of deep learning, a convolutional neural network was used in facial landmark detection for the first time in [10], where a deep convolutional neural network (DCNN) is proposed. Later, Face++ [16] improved the accuracy of the DCNN and can detect and localize more landmarks with higher accuracy. After that, TCDCN (Tasks-Constrained Deep Convolutional Network) [15], MTCNN (Multi-task Cascaded Convolutional Networks) [14], TCNN (Tweaked Convolutional Neural Networks) [13] and DAN (Deep Alignment Networks) [5] were proposed, each performing better than its predecessors. Recently, [6], [9] and [12] have proposed further methods with higher accuracy and robustness.

Image Warping

Image warping is a transformation which maps all positions in one image plane to positions in a second plane [4]. Three forms of warping are described in [11]: translation warping, scaling warping and rotation warping. What image warping methods have in common is that a set of handles (also called control points) is required. However, these methods do not consider the features of the image. In [1], a feature-based image deformation method is proposed to address this problem. An image deformation method based on linear Moving Least Squares was proposed in [7] to meet the smoothness, interpolation and identity requirements of image deformation. Such a deformation has the property that the amount of local scaling and shearing is minimized. Later, an image warping method based on artificial intelligence was proposed in [8], and artificial intelligence techniques have since been used more widely in image warping.

3 Method

3.1 Facial Landmark Extraction

Accurate facial landmark extraction is a prerequisite for successful facial image deformation. The model we use in this work was proposed in [16]. We extend the original 68-point facial landmark detector provided by Dlib to a 106-point one. As shown in Figure 2, 33 landmarks are for cheeks, 18 landmarks are for eyes, 18 landmarks are for eyebrows, 15 landmarks are for the nose and 20 landmarks are for the mouth.

Figure 2: Facial landmark detection. Left: 68-point facial landmark detection. Right: 106-point facial landmark detection.

3.2 Expansion

Without loss of generality, we take the manipulation of the left eye as an example. The manipulation of the right eye is exactly symmetric. There are two types of eye landmarks: the center landmark, and the landmarks that draw the outline of the eye. Before the expansion, we first have to determine the control points and thereby the area that needs to be adjusted, which we call the deformation area in this paper. The center point and the landmark at the corner of the eye are used to determine the boundary of the deformation area. There are two schemes to determine the center point. One simply takes the center landmark as the center point; the other takes the midpoint between the landmark for the outer canthus and the landmark for the inner canthus. These two schemes are illustrated in Figure 3.

Figure 3: Two schemes for eye expansion. Left: use the center landmark as the center point. Right: use the midpoint between the inner canthus and outer canthus landmarks as the center point. The four orange points with yellow outlines are the four pixels used in bilinear interpolation. The intervals between pixels are exaggerated.
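The two center point schemes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `eye_center`, the `scheme` argument and all coordinates are hypothetical, and points are assumed to be `(x, y)` pixel positions with `y` growing downward.

```python
def eye_center(center_landmark, inner_canthus, outer_canthus, scheme="landmark"):
    """Return the center point of the deformation area for one eye.

    scheme="landmark": use the detected center landmark directly.
    scheme="midpoint": use the midpoint between the two canthus landmarks.
    """
    if scheme == "landmark":
        return (float(center_landmark[0]), float(center_landmark[1]))
    if scheme == "midpoint":
        return (
            (inner_canthus[0] + outer_canthus[0]) / 2.0,
            (inner_canthus[1] + outer_canthus[1]) / 2.0,
        )
    raise ValueError(f"unknown scheme: {scheme!r}")


# Hypothetical landmark positions for a left eye (not taken from the paper).
center_lm = (120.0, 80.0)
inner = (140.0, 84.0)
outer = (100.0, 86.0)

center_a = eye_center(center_lm, inner, outer, scheme="landmark")
center_b = eye_center(center_lm, inner, outer, scheme="midpoint")
```

With these example coordinates the midpoint has a larger `y` than the center landmark, i.e. it sits lower in the image, so the two schemes place the deformation area slightly differently.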

The deformation area is a circle centered at the center point, with its radius fixed by the landmark at the corner of the eye. A pixel therefore belongs to the deformation area when its distance from the center point does not exceed this radius. For each pixel in the deformation area, there is a corresponding reference pixel. To find the corresponding reference pixel, we define a parameter that determines the expansion scale for each pixel; the position of the reference pixel then follows from the expansion scale.


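We use bilinear interpolation to compute the value at the reference pixel from the four pixels that surround it. Writing the reference position as $(x', y')$, with $x_0 = \lfloor x' \rfloor$ and $y_0 = \lfloor y' \rfloor$, bilinear interpolation takes the standard form

$$f(x', y') = \sum_{i}\sum_{j} f(x_0 + i,\, y_0 + j)\,\bigl(1 - |x' - x_0 - i|\bigr)\bigl(1 - |y' - y_0 - j|\bigr)$$

where the possible values of $i$ and $j$ are $0$ and $1$.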
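The expansion step can be sketched as an inverse mapping followed by bilinear sampling. The code below is a minimal pure-Python illustration on a grayscale image stored as a list of rows. The function names, the `strength` parameter and, in particular, the formula used for the expansion scale are assumptions made for this sketch; the paper's own scale formula is not reproduced here.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a grayscale image (list of rows) at a real-valued position (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp so that the four neighbors always lie inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = (1 - dx) * img[y0][x0] + dx * img[y0][x1]
    bottom = (1 - dx) * img[y1][x0] + dx * img[y1][x1]
    return (1 - dy) * top + dy * bottom


def expand_region(img, center, radius, strength):
    """Expand (strength > 0) or shrink (strength < 0) a circular region.

    Each pixel inside the circle takes its value from a reference pixel on the
    same ray from the center, located by an expansion scale.
    """
    cx, cy = center
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            dist = math.hypot(x - cx, y - cy)
            if dist >= radius:
                continue  # pixels outside the deformation area are unchanged
            # ASSUMED scale function (hypothetical): equals 1 on the boundary
            # of the circle and 1 - strength at the center.
            scale = 1.0 - strength * (1.0 - (dist / radius) ** 2)
            ref_x = cx + scale * (x - cx)
            ref_y = cy + scale * (y - cy)
            out[y][x] = bilinear_sample(img, ref_x, ref_y)
    return out


# A horizontal ramp image: the value of each pixel equals its x coordinate.
ramp = [[float(x) for x in range(9)] for _ in range(9)]
expanded = expand_region(ramp, center=(4, 4), radius=3, strength=0.5)
```

Because the scale is below 1 inside the circle when `strength` is positive, each pixel reads from a position closer to the center, which magnifies the region; the scale reaches 1 at the boundary, so the result joins the untouched part of the image without a seam.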

3.3 Shrinking

The method we use in the shrinking part is Moving Least Squares (MLS) deformation [7].

3.3.1 Background: MLS

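MLS views the deformation as a function $f$ that maps all pixels in the undeformed image to pixels in the deformed image, so $f$ has to be applied to each point $v$ in the undeformed image. In [7], the authors build image deformations from collections of points with which the user controls the deformation, which is natural for facial images. Let $p$ be a set of control points and $q$ be the deformed positions of the control points.

For a point $v$ in the image, the best affine transformation $l_v(x)$ can be solved by minimizing

$$\sum_i w_i \left| l_v(p_i) - q_i \right|^2$$

where $p_i$ and $q_i$ are row vectors and the weights $w_i$ have the form

$$w_i = \frac{1}{\left| p_i - v \right|^{2\alpha}}$$

Therefore, a different transformation $l_v(x)$ can be obtained for each $v$.

Next, the deformation function can be defined to be $f(v) = l_v(v)$. Following [7], let $p_* = \sum_i w_i p_i / \sum_i w_i$ and $q_* = \sum_i w_i q_i / \sum_i w_i$ be the weighted centroids, and let $\hat{p}_i = p_i - p_*$ and $\hat{q}_i = q_i - q_*$. There are three kinds of functions that induce different deformation effects: Affine Deformation, Similarity Deformation and Rigid Deformation. The mapping function for Affine Deformation is

$$f_a(v) = (v - p_*) \left( \sum_i \hat{p}_i^T w_i \hat{p}_i \right)^{-1} \sum_j w_j \hat{p}_j^T \hat{q}_j + q_*$$

The mapping function for the Similarity Deformation is

$$f_s(v) = \sum_i \hat{q}_i \left( \frac{1}{\mu_s} A_i \right) + q_*$$

where $\mu_s = \sum_i w_i \hat{p}_i \hat{p}_i^T$ and $A_i$ depends only on the $p_i$, $v$ and $w_i$, which can be precomputed. $A_i$ is

$$A_i = w_i \begin{pmatrix} \hat{p}_i \\ -\hat{p}_i^{\perp} \end{pmatrix} \begin{pmatrix} v - p_* \\ -(v - p_*)^{\perp} \end{pmatrix}^T$$

where $\perp$ is the operator on 2D vectors such that $(x, y)^{\perp} = (-y, x)$.

The mapping function for the Rigid Deformation is given by

$$f_r(v) = \left| v - p_* \right| \frac{\vec{f}_r(v)}{\left| \vec{f}_r(v) \right|} + q_*$$

where $\vec{f}_r(v) = \sum_i \hat{q}_i A_i$. Because the rigid deformation was proposed to make the deformation as rigid as possible, it performs best in our task.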
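The three MLS variants of [7] can be sketched for a single point in pure Python. This is an illustrative sketch rather than the implementation used in the experiments: the function name `mls_deform`, the `eps` guard and the choice to store 2D points as complex numbers `x + 1j*y` are assumptions of this example. In that representation the product $\hat{q}_i A_i$ reduces to `w_i * q_hat * conj(p_hat) * (v - p_star)`, which keeps the similarity and rigid cases short.

```python
def mls_deform(v, p, q, mode="rigid", alpha=1.0, eps=1e-12):
    """Deform one point v with Moving Least Squares, following [7].

    v: point to deform; p: control points; q: deformed control points.
    All points are complex numbers x + 1j*y.
    """
    # w_i = 1 / |p_i - v|^(2*alpha); eps (an assumption) avoids division by zero.
    w = [1.0 / (abs(pi - v) ** (2 * alpha) + eps) for pi in p]
    total = sum(w)
    p_star = sum(wi * pi for wi, pi in zip(w, p)) / total
    q_star = sum(wi * qi for wi, qi in zip(w, q)) / total
    d = v - p_star
    p_hat = [pi - p_star for pi in p]
    q_hat = [qi - q_star for qi in q]

    if mode == "affine":
        # M = sum_i w_i * p_hat_i^T p_hat_i, a symmetric 2x2 matrix.
        m11 = sum(wi * a.real * a.real for wi, a in zip(w, p_hat))
        m12 = sum(wi * a.real * a.imag for wi, a in zip(w, p_hat))
        m22 = sum(wi * a.imag * a.imag for wi, a in zip(w, p_hat))
        det = m11 * m22 - m12 * m12  # zero if the control points are collinear
        # r = (v - p_star) M^{-1}, a row vector.
        r0 = (d.real * m22 - d.imag * m12) / det
        r1 = (d.imag * m11 - d.real * m12) / det
        # f_a(v) = sum_j w_j (r . p_hat_j) q_hat_j + q_star
        moved = sum(
            wi * (r0 * a.real + r1 * a.imag) * b
            for wi, a, b in zip(w, p_hat, q_hat)
        )
        return moved + q_star

    # s = sum_i w_i * q_hat_i * conj(p_hat_i), shared by similarity and rigid.
    s = sum(wi * b * a.conjugate() for wi, a, b in zip(w, p_hat, q_hat))
    if mode == "similarity":
        mu = sum(wi * abs(a) ** 2 for wi, a in zip(w, p_hat))
        return d * s / mu + q_star
    if mode == "rigid":
        if abs(s) == 0.0:
            return d + q_star  # degenerate case: no rotation can be estimated
        return d * (s / abs(s)) + q_star  # pure rotation, no scaling
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical control points: the corners of a unit square, all shifted by (2, -1).
controls = [0j, 1 + 0j, 1j, 1 + 1j]
targets = [c + (2 - 1j) for c in controls]
moved_point = mls_deform(0.3 + 0.6j, controls, targets, mode="rigid")
```

The only difference between the similarity and rigid branches is the normalization: similarity divides by $\mu_s$ and can therefore scale the neighborhood of `v`, while rigid divides by $|s|$ and only rotates it.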

3.3.2 Implementation Details of Shrinking

Specifically, the control points are the landmarks for the nose (15 points), the mouth (13 points) and the cheeks (21 points). In this paper, we achieve the global adjustment by controlling the moving vector of each control point.

Before applying the mapping function, we have to check whether the face in the image is upright. We decide this by checking whether the vector from the center landmark of the left eye to the center landmark of the right eye is horizontal. For comparison, we define a horizontal reference vector. The angle between the eye-to-eye vector and the horizontal vector decides the direction of the moving vector, which can be computed once this angle and the moving distance are known. As shown in Figure 4, the moving vectors of the control points on the left side of the face share one direction set by this angle, and for the control points on the right side of the face the direction is the opposite. The control points on the axis are kept still, which means that their moving vector is zero.

Figure 4: The shift of control points. Left: The face is horizontal. Right: The face is tilted and the moving vectors of the control points on the left side of the face have the same direction with
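The computation of the moving vectors can be sketched as follows. This is a hedged illustration: the function name `moving_vector` and the `side` labels are hypothetical, and the sketch assumes that control points on the left side of the face move along the direction from the left-eye center to the right-eye center (toward the facial axis), which is one reading of Figure 4 rather than a statement taken from the paper.

```python
import math


def moving_vector(left_eye, right_eye, side, distance):
    """Return the (dx, dy) shift for a control point.

    left_eye, right_eye: (x, y) center landmarks of the two eyes.
    side: "left", "right" or "axis" (position of the control point on the face).
    distance: moving distance in pixels.
    """
    # Angle between the eye-to-eye vector and the horizontal vector (1, 0).
    theta = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    ux, uy = math.cos(theta), math.sin(theta)
    if side == "left":
        return (distance * ux, distance * uy)    # assumed: toward the axis
    if side == "right":
        return (-distance * ux, -distance * uy)  # opposite direction
    if side == "axis":
        return (0.0, 0.0)                        # kept still
    raise ValueError(f"unknown side: {side!r}")


# Hypothetical eye centers for an upright face and for a tilted face.
upright = moving_vector((100.0, 100.0), (200.0, 100.0), "left", 5.0)
tilted = moving_vector((0.0, 0.0), (3.0, 4.0), "left", 10.0)
```

For the upright face the shift is purely horizontal; for the tilted face it follows the tilt of the eye line, so the shrinking stays symmetric about the facial axis.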

4 Experimental Results

The experiments mainly focus on showing the performance of different methods in the expansion part and the shrinking part. Two center point determination methods in the expansion part and three MLS methods in the shrinking part will be discussed.

4.1 Expansion Results

In general, the expansion operation makes the eyes bigger, so we set the expansion parameter to a positive value in this experiment to test the expansion effect. The parameter can also be set to a negative value to make the eyes smaller; we regard this as part of the expansion process and use a negative value for a second set of results. We use the two proposed methods for center point determination. Three characteristic facial images are tested and the results are shown in Figure 5. From left to right, the three images depict a person looking straight ahead (normal), looking sideways (the pupil is not in the middle), and showing only the side of the face. In the first row, the center point of the eye is the center eye landmark. In the second row, the center point is the midpoint between the inner canthus and outer canthus landmarks. For each set of images in each row, the first is the original image, and the second and third are the deformed images for the two parameter settings.

As shown in Figure 5, the first and second sets of images show nearly identical effects under the two center point determination methods. However, the two methods obtain different results when the input contains only the side face. The method that uses the midpoint of the inner canthus and outer canthus as the center point was proposed to counter the distortion that may be caused by the difference between the pupil and the real center point. But in some cases the first method also performs well: as shown by the middle images of Figure 5, when the pupil is away from the real center point, the expansion effect of the first method is better than that of the second method, which is beyond our expectation.

The reason for this result is that the line between the inner canthus and the outer canthus is often lower than the pupil. The midpoint is then not an accurate center point, and the deformation becomes inaccurate as well. However, in practice, the second method can obtain better effects for some specific images. Thus, in the expansion part the choice of method depends on the specific image.

Figure 5: The results of the eye expansion process. Images in the first row are deformed images using the center eye landmark as the center point. Images in the second row are deformed images using the midpoint between the inner canthus and outer canthus landmarks as the center point.

4.2 Shrinking Results

In this subsection, we compare the results and the run-times of three methods: Affine Deformation, Similarity Deformation and Rigid Deformation. We then evaluate how the weight influences the deformation results in Rigid MLS deformation.

Deformation results comparison.

As shown in Figure 6, for the images in each row, the first is the original image, the second is the deformed image using Affine MLS, the third is the deformed image using Similarity MLS and the fourth is the deformed image using Rigid MLS. Affine MLS and Similarity MLS can both obtain quite satisfying results on the face, but they inevitably introduce other artifacts. For example, some distortion appears in the area close to the boundary of the deformed image. Rigid MLS achieves a satisfying result that keeps the parts we do not want to deform unchanged. However, because these methods are all based on facial landmarks, a new question arises: will the results be affected if the face is not symmetric, especially when only the side of the face is visible? One may expect distortion in the images. However, when the image is clear and the face is not turned too far, the nose, the mouth and the cheeks can be shrunk successfully with little perceptible distortion, as shown by the second row of Figure 6.

Figure 6: Comparison of three deformation methods. From left to right: original image, Affine MLS, Similarity MLS, Rigid MLS.

Run-time Comparison.

The three methods have different run-times. As shown in Table 1, the run-times of Similarity MLS and Rigid MLS are roughly 1.7 to 1.8 times that of Affine MLS, but they are still acceptable because these methods give better deformation results.

| Method | Figure 6, image 1 | Figure 6, image 2 | Figure 6, image 3 |
|---|---|---|---|
| Affine MLS | 0.49s | 0.53s | 0.64s |
| Similarity MLS | 0.88s | 0.89s | 1.06s |
| Rigid MLS | 0.90s | 0.89s | 1.06s |

Table 1: Deformation times for the various methods.
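The relative cost of the methods can be read off Table 1 with a few lines of arithmetic. The sketch below only re-uses the timings listed in the table; the helper `time_call` is a hypothetical way to collect such timings and is not a description of how the reported numbers were measured.

```python
import time

# Run-times in seconds, copied from Table 1 (one value per test image).
timings = {
    "Affine MLS": [0.49, 0.53, 0.64],
    "Similarity MLS": [0.88, 0.89, 1.06],
    "Rigid MLS": [0.90, 0.89, 1.06],
}

# Slowdown of each method relative to Affine MLS, per image.
baseline = timings["Affine MLS"]
slowdown = {
    name: [t / b for t, b in zip(values, baseline)]
    for name, values in timings.items()
}


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds of fn(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the best of several runs reduces the influence of background load on a single measurement.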

Results with Different Weight.

In this paper, the weight is the reciprocal of the distance from the point being deformed to the control point, raised to a power. The factor that affects the weight is the exponent $\alpha$, as can be seen in Equation 6. To compare the effects of different $\alpha$, we try three values on the same image. The results are shown in Figure 7.

As we can see, the deformed image is almost the same as the original image when $\alpha$ is small, whereas the deformation is excessive when $\alpha$ is large. We therefore choose the intermediate value, which gives the most appropriate deformation, as shown by the third image in Figure 7.

Figure 7: Comparison of Rigid MLS with different weights. From left to right: original image, followed by the results for the three values of $\alpha$.
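How the exponent changes the balance between near and far control points can be illustrated numerically. The sketch below assumes the MLS weight of [7], $w_i = 1 / |p_i - v|^{2\alpha}$, and normalizes the weights so that they sum to one. The distances and the exponents are hypothetical values chosen for illustration; they are not the settings used for Figure 7.

```python
def normalized_weights(distances, alpha):
    """Return MLS weights 1 / d^(2*alpha), normalized to sum to 1."""
    raw = [1.0 / d ** (2 * alpha) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical distances (in pixels) from a point v to three control points.
distances = [5.0, 20.0, 80.0]

# Share of the total weight held by the nearest control point for each exponent.
nearest_share = {
    alpha: normalized_weights(distances, alpha)[0]
    for alpha in (0.0, 0.5, 1.0, 2.0)  # hypothetical exponents
}
```

With an exponent of zero all control points count equally, so the fitted transformation is the same everywhere in the image. As the exponent grows, the nearest control point takes a larger share of the weight and the transformation at `v` follows the local control points more closely.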

5 Conclusion

In this paper, we describe a complete pipeline for facial image deformation based on facial landmark detection. The image deformation consists of two parts: the expansion part, based on bilinear interpolation, and the shrinking part, based on Moving Least Squares methods. During the experiments, we noticed that the choice of the center point of the eye strongly influences the expansion effect. In addition, we compared the quality of the deformed images and the running time of the different MLS methods, and then explored how the weight in Rigid Deformation influences the results of the shrinking process. Based on the experimental results, we find that the methods in both parts perform well. However, we believe there is still substantial room for improvement. In future work, we plan to explore more advanced methods and to reduce the impact of inaccurate facial landmark detection.


  • [1] T. Beier and S. Neely (1992) Feature-based image metamorphosis. Computer graphics 26 (2), pp. 35–42.
  • [2] T. F. Cootes, G. J. Edwards, and C. J. Taylor (1998) Active appearance models. In European conference on computer vision, pp. 484–498.
  • [3] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham (1995) Active shape models-their training and application. Computer vision and image understanding 61 (1), pp. 38–59.
  • [4] C. A. Glasbey and K. V. Mardia (1998) A review of image-warping methods. Journal of applied statistics 25 (2), pp. 155–171.
  • [5] M. Kowalski, J. Naruniec, and T. Trzcinski (2017) Deep alignment network: a convolutional neural network for robust face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 88–97.
  • [6] D. Merget, M. Rock, and G. Rigoll (2018) Robust facial landmark detection via a fully-convolutional local-global context network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 781–790.
  • [7] S. Schaefer, T. McPhail, and J. Warren (2006) Image deformation using moving least squares. In ACM transactions on graphics (TOG), Vol. 25, pp. 533–540.
  • [8] J. Shiraishi, Q. Li, D. Appelbaum, and K. Doi (2011) Computer-aided diagnosis and artificial intelligence in clinical imaging. In Seminars in nuclear medicine, Vol. 41, pp. 449–462.
  • [9] G. Song, Y. Liu, M. Jiang, Y. Wang, J. Yan, and B. Leng (2018) Beyond trade-off: accelerate fcn-based face detector with higher accuracy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7756–7764.
  • [10] Y. Sun, X. Wang, and X. Tang (2013) Deep convolutional network cascade for facial point detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3476–3483.
  • [11] G. Wolberg (1990) Digital image warping. Vol. 10662, IEEE computer society press Los Alamitos, CA.
  • [12] W. Wu, C. Qian, S. Yang, Q. Wang, Y. Cai, and Q. Zhou (2018) Look at boundary: a boundary-aware face alignment algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–2138.
  • [13] Y. Wu, T. Hassner, K. Kim, G. Medioni, and P. Natarajan (2018) Facial landmark detection with tweaked convolutional neural networks. IEEE transactions on pattern analysis and machine intelligence 40 (12), pp. 3067–3074.
  • [14] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23 (10), pp. 1499–1503.
  • [15] Z. Zhang, P. Luo, C. C. Loy, and X. Tang (2014) Facial landmark detection by deep multi-task learning. In European conference on computer vision, pp. 94–108.
  • [16] E. Zhou, H. Fan, Z. Cao, Y. Jiang, and Q. Yin (2013) Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 386–391.