1 Introduction
Understanding the appearance of the human face is of considerable interest in computer vision
[1] and graphics [2] and has been studied broadly in many works. One approach to understanding appearance is to estimate a physically meaningful decomposition. Existing methods for decomposing face appearance often rely on complex equipment and lab conditions. For example, Ma et al. [3] use polarised spherical gradient illumination to capture geometry and diffuse and specular reflectance maps. Donner et al. [4] use polarised, multispectral light and an assumption of a planar sample to estimate biophysical skin parameter maps. Gitlina et al. [5] essentially combine these two setups, using multispectral illumination in a lightstage. In a medical setting, Claridge et al. [6] use RGB plus NIR measurements with polarised illumination, again assuming a planar sample, to estimate skin biophysical parameters for the purpose of identifying abnormalities.

Highly ambitious recent work uses deep learning to decompose face appearance from single, uncontrolled images. Kim et al.
[7] and Tewari et al. [8] use self-supervision to learn to fit 3D morphable models [9] to single images (providing an estimate of geometry, illumination and reflectance). Intrinsic image based methods [1, 10] do not rely on a face-specific statistical model but learn the whole task from scratch.

We consider the challenging problem of decomposing a single image into diffuse and specular shading and the distribution maps of biophysical parameters that explain skin colouration. To make the problem tractable we work with multispectral images, but we do not require the complex and controlled illumination conditions of previous methods. To do so, we propose a novel combination of a biophysical and dichromatic reflectance model that efficiently characterises spectral skin reflectance. Previously, the dichromatic model has primarily been used in a setting where the light source colour (or, more generally, spectral power distribution) is unknown but the object under study is assumed to have uniform or piecewise uniform [11] colour (more precisely, diffuse albedo or spectral reflectance). We show that, with known light source spectra, we can estimate four spatially varying reflectance parameters by fitting our model to multispectral data using nonlinear least squares. We show how the decomposed parameter maps can be edited in an intuitive way and validate our approach quantitatively on a database of skin reflectance spectra and qualitatively on a set of multispectral face images.
2 Biophysical dichromatic skin model
In this section we introduce our multispectral skin reflectance model. We assume that skin reflects light according to the dichromatic model [12] and that diffuse albedo (i.e. subsurface absorption) can be explained using a biophysical model.
2.1 Multispectral dichromatic model
The dichromatic model assumes that wavelength-dependent scene radiance $L(\lambda)$, with $\lambda$ the wavelength, is a sum of body (diffuse) and surface (specular) reflected components. Further, it divides each source of radiance into a part that depends on geometry (informally "shading") and a wavelength-dependent part (informally "colour"). The body reflection arises from subsurface scattering and modifies the spectral power distribution (SPD) of the light through absorption, whereas the surface reflection happens at the interface and does not, meaning the model can be written as $L(\lambda) = w_d R(\lambda) e(\lambda) + w_s e(\lambda)$, where $w_d$ and $w_s$ are the diffuse and specular shading respectively, $e(\lambda)$ is the illuminant SPD and $R(\lambda)$ is the spectral reflectance of the diffusely reflected light. We discretise at $B$ evenly spaced wavelengths such that we can write the vector of spectral scene radiance, $\mathbf{i} \in \mathbb{R}^B$, as:

$$\mathbf{i} = w_d\,\mathrm{diag}(\mathbf{e})\,\mathbf{r} + w_s\,\mathbf{e}, \qquad (1)$$

where $\mathbf{e} \in \mathbb{R}^B$ and $\mathbf{r} \in \mathbb{R}^B$ are the wavelength-discrete illuminant SPD and spectral reflectance respectively.
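As a concrete illustration, the wavelength-discrete model of Eq. (1) can be evaluated in a few lines of numpy; the variable names here are ours, chosen to match the notation above:

```python
import numpy as np

def dichromatic_radiance(w_d, w_s, e, r):
    """Wavelength-discrete dichromatic model: i = w_d * diag(e) @ r + w_s * e.

    w_d, w_s : scalar diffuse and specular shading
    e        : (B,) illuminant SPD sampled at B wavelengths
    r        : (B,) diffuse spectral reflectance
    """
    e = np.asarray(e, dtype=float)
    r = np.asarray(r, dtype=float)
    # diag(e) @ r is just elementwise multiplication e * r
    return w_d * e * r + w_s * e

# Example: flat illuminant, reflectance rising with wavelength
e = np.ones(33)                    # e.g. 400-720 nm in 10 nm steps
r = np.linspace(0.2, 0.6, 33)
i = dichromatic_radiance(0.8, 0.1, e, r)
```

Note that the surface (specular) term simply scales the illuminant SPD, while the body (diffuse) term filters it through the spectral reflectance.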
2.2 Biophysical skin reflectance model
We now replace generic spectral reflectance with a biophysical model for skin. This skin model has only two free parameters, meaning the dichromatic model has in total only four unknowns per pixel. Our biophysical spectral reflectance model for skin follows a number of previous models [6, 2, 13, 14], though we focus on simplicity and limiting the number of free parameters. Specifically, our model allows only the melanin and haemoglobin concentration to vary spatially whereas all other parameters are based on measured data, validated approximation functions or average values [15, 16, 17, 18, 2, 19, 20, 13]. The free parameters have physical meaning and can therefore be constrained to the range of values observed in healthy skin.
Human skin tissue has a complicated layered structure; we consider only two layers (Fig. 1(a)). The outer layer, the epidermis, contains the melanin pigment and is responsible for absorbing the short wavelengths of the visible spectrum; the remainder of the light is mostly forward scattered. The dermis contains blood vessels carrying the haemoglobin pigment, which absorbs light in the green and blue wavelengths; the remainder of the light is primarily reflected back through the epidermis, where the melanin pigment absorbs it again. Therefore, our skin spectral reflectance model is written as:

$$R(\lambda; v_m, v_b) = T_{\text{epi}}(\lambda, v_m)^2\, R_{\text{derm}}(\lambda, v_b),$$

where $v_m$ is the epidermal melanosome volume fraction and lies in the range $[v_m^{\min}, v_m^{\max}]$, $v_b$ is the dermal blood volume fraction and lies in the range $[v_b^{\min}, v_b^{\max}]$, $T_{\text{epi}}(\lambda, v_m)$ is the proportion of light transmitted through the epidermis (which it traverses twice) and is modelled using the Lambert-Beer law, and $R_{\text{derm}}(\lambda, v_b)$ is the proportion of light reflected from the dermis and is modelled by Kubelka-Munk theory. In Fig. 1(b) we visualise the range of colours that can be obtained by our skin reflectance model when transformed to RGB space as described in Sec. 4.3. In wavelength-discrete terms once again, we write $\mathbf{r}(v_m, v_b) \in \mathbb{R}^B$ as the vector of diffuse spectral reflectance, which can be substituted into (1).
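A minimal sketch of this two-layer model follows. The melanin absorption law is Jacques' widely used power-law approximation; the epidermal thickness, the dermal scattering law and the haemoglobin absorption input are placeholder assumptions, not values from our model:

```python
import numpy as np

def skin_reflectance(v_m, v_b, lam, mu_hb, d_epi=0.01, s_derm=None):
    """Two-layer skin reflectance sketch: R(lam) = T_epi(lam)^2 * R_derm(lam).

    v_m   : epidermal melanosome volume fraction
    v_b   : dermal blood volume fraction
    lam   : (B,) wavelengths in nm
    mu_hb : (B,) haemoglobin absorption coefficient [cm^-1] (tabulated, e.g. Prahl)
    d_epi : epidermal thickness [cm] (assumed value)
    s_derm: (B,) dermal scattering coefficient [cm^-1] (assumed law if None)
    """
    # Lambert-Beer: melanosome interior absorption follows Jacques'
    # approximation mu_mel ~ 6.6e11 * lam^-3.33 cm^-1
    mu_mel = 6.6e11 * lam ** -3.33
    T_epi = np.exp(-v_m * mu_mel * d_epi)

    # Kubelka-Munk reflectance of a semi-infinite dermis with absorption K
    # dominated by blood and an assumed scattering coefficient S
    if s_derm is None:
        s_derm = 50.0 * (lam / 500.0) ** -1.0   # placeholder scattering law
    ks = (v_b * mu_hb) / s_derm
    R_derm = 1.0 + ks - np.sqrt(ks ** 2 + 2.0 * ks)

    # Light crosses the epidermis twice, hence the squared transmittance
    return T_epi ** 2 * R_derm
```

Because both factors lie in $[0, 1]$, the resulting reflectance is automatically physically plausible.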
Fig. 1: (a) The two-layer skin structure assumed by our model. (b) The range of skin colours obtainable from our reflectance model, rendered to RGB.

3 Model fitting
We now assume that we are provided with a multispectral measurement of scene radiance, $\mathbf{i} \in \mathbb{R}^B$, at a point on the skin surface. We assume that scene illumination is spatially uniform and that its SPD $\mathbf{e}$ is known. In this case, our model has four unknown parameters and we pose model fitting as a nonlinear least squares problem whose objective is:

$$E(w_d, w_s, v_m, v_b) = \left\| \mathbf{i} - w_d\,\mathrm{diag}(\mathbf{e})\,\mathbf{r}(v_m, v_b) - w_s\,\mathbf{e} \right\|^2. \qquad (2)$$

All four parameters are subject to constraints. The biophysical parameters must lie in their plausible ranges while the diffuse and specular shading must be positive. This leads to the constrained optimisation problem:

$$\min_{w_d, w_s, v_m, v_b} E(w_d, w_s, v_m, v_b) \quad \text{s.t.} \quad v_m \in [v_m^{\min}, v_m^{\max}],\quad v_b \in [v_b^{\min}, v_b^{\max}],\quad w_d \geq 0,\quad w_s \geq 0.$$

In practice, we reparameterise each of the four unknowns to obtain an unconstrained optimisation problem:

$$\min_{u_1, u_2, u_3, u_4} E\!\left(\exp(u_1),\, \exp(u_2),\, v_m(u_3),\, v_b(u_4)\right), \qquad (3)$$

where $v_m(u_3) = v_m^{\min} + (v_m^{\max} - v_m^{\min})\,\sigma(u_3)$, $v_b(u_4) = v_b^{\min} + (v_b^{\max} - v_b^{\min})\,\sigma(u_4)$ and $\sigma$ is the logistic function: $\sigma(x) = 1/(1+e^{-x})$. We minimise (3) using the trust-region-reflective algorithm. Note that, in the case of multispectral images, we solve the optimisation problem independently at each pixel, i.e. we do not require any spatial smoothness priors.
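The per-pixel fit can be sketched with `scipy.optimize.least_squares`, whose `'trf'` method is a trust-region-reflective solver. The parameter ranges and the toy reflectance function in the example are our illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_pixel(i_obs, e, reflectance_fn, vm_range=(0.0, 0.5), vb_range=(0.0, 0.1)):
    """Fit (w_d, w_s, v_m, v_b) to one multispectral measurement i_obs.

    reflectance_fn(v_m, v_b) -> (B,) diffuse spectral reflectance.
    The exp/logistic reparameterisation makes the problem unconstrained;
    the vm_range/vb_range bounds here are placeholder assumptions.
    """
    def unpack(u):
        w_d, w_s = np.exp(u[0]), np.exp(u[1])
        v_m = vm_range[0] + (vm_range[1] - vm_range[0]) * sigmoid(u[2])
        v_b = vb_range[0] + (vb_range[1] - vb_range[0]) * sigmoid(u[3])
        return w_d, w_s, v_m, v_b

    def residuals(u):
        w_d, w_s, v_m, v_b = unpack(u)
        return i_obs - w_d * e * reflectance_fn(v_m, v_b) - w_s * e

    sol = least_squares(residuals, x0=np.zeros(4), method='trf')
    return unpack(sol.x)
```

In a full image, this fit is simply repeated independently at every skin pixel.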
4 Modelbased editing
Once a face image has been decomposed into the four semantically meaningful parameter maps, intuitive editing becomes possible. After editing one or more of the maps, we recompute scene radiance with (1), blend with the unedited background radiance using a skin mask and then transform to sRGB.
4.1 Learning skin segmentation
To produce convincing edited images, we require a skin probability mask to blend the edited skin radiance with the background. We train a multilayer perceptron that computes the probability that a single pixel belongs to the skin region. The network has five fully connected layers with ReLU activations (number of channels = 64/64/128/128/2). The input for a pixel is a vector comprising the original measured radiance and the four fitted parameters: $\mathbf{x} = [\mathbf{i}^{\mathsf{T}}, w_d, w_s, v_m, v_b]^{\mathsf{T}} \in \mathbb{R}^{B+4}$. The output of the final fully connected layer is passed through a per-pixel classification log loss during training and the network predicts the skin probability $P(\text{skin})$ for all pixels. We train the network on manually selected skin (1.1M pixels in total) and non-skin (750k pixels in total) regions from the data in [21].
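The forward pass of such a per-pixel classifier is straightforward; the sketch below uses random placeholder weights (a real deployment would load trained parameters) and a softmax to turn the two output logits into a probability:

```python
import numpy as np

def mlp_skin_prob(x, weights):
    """Forward pass of a per-pixel skin classifier: five fully connected
    layers (64/64/128/128/2 channels) with ReLU between them and a
    softmax over the 2-channel output. `weights` is a list of (W, b)."""
    h = x
    for i, (W, b) in enumerate(weights):
        h = W @ h + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)           # ReLU on hidden layers only
    z = h - h.max()                          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return p[1]                              # probability of the 'skin' class

def init_weights(d_in, sizes=(64, 64, 128, 128, 2), rng=None):
    """Random placeholder weights; in practice these come from training."""
    rng = np.random.default_rng(rng)
    dims = (d_in,) + tuple(sizes)
    return [(rng.normal(0.0, 0.1, (dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
            for i in range(len(sizes))]
```

For a 33-band image the input dimension would be $B + 4 = 37$.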
4.2 Blending skin and background edits
Our biophysical model describes only skin spectral reflectance, so non-skin features (eyes, teeth, facial hair) and the image background are not well explained. For this reason, we use the estimated skin probabilities to blend the original spectral radiance in non-skin regions with our edited biophysical spectral radiance in skin regions:

$$\mathbf{i}_{\text{blend}} = P(\text{skin})\,\mathbf{i}_{\text{edit}} + \left(1 - P(\text{skin})\right)\mathbf{i}.$$
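This per-pixel convex blend is a one-liner over the whole image; the array shapes below are our assumed layout:

```python
import numpy as np

def blend_radiance(p_skin, i_edit, i_orig):
    """Convex blend of edited skin radiance and original radiance.

    p_skin : (H, W) skin probability map
    i_edit : (H, W, B) edited spectral radiance
    i_orig : (H, W, B) original measured spectral radiance
    """
    w = p_skin[..., None]          # broadcast probability over the B bands
    return w * i_edit + (1.0 - w) * i_orig
```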
4.3 Spectral radiance to nonlinear sRGB
For visualisation, a spectral radiance vector resulting from an edit must be rendered to a tristimulus RGB image. This arises from an integration over wavelength of the product of scene radiance (itself the product of illumination and reflectance spectra) and camera spectral sensitivity:

$$I_c = \int e(\lambda)\, R(\lambda)\, S_c(\lambda)\, d\lambda, \qquad (4)$$

where $e(\lambda)$ is the SPD of the illuminant, $R(\lambda)$ the spectral reflectance of the surface and $S_c(\lambda)$ the spectral sensitivity of the camera in colour channel $c \in \{R, G, B\}$. In our wavelength-discrete formulation this becomes:

$$\mathbf{p} = \left( \mathbf{M}\, \mathbf{C}^{\mathsf{T}}\, \mathbf{i} \right)^{1/\gamma}, \qquad (5)$$

where $\mathbf{C} \in \mathbb{R}^{B \times 3}$ contains the wavelength-discrete spectral sensitivities of the camera, $\mathbf{M} \in \mathbb{R}^{3 \times 3}$ is a matrix that performs white balancing and colour transformation to sRGB space and $\gamma$ controls the nonlinear gamma.
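The discrete rendering step of Eq. (5) is a pair of matrix products followed by an elementwise gamma; the sketch below uses $\gamma = 2.2$ as a common approximation to the sRGB transfer curve:

```python
import numpy as np

def radiance_to_srgb(i, C, M, gamma=2.2):
    """Render wavelength-discrete radiance to nonlinear sRGB as in Eq. (5).

    i     : (B,) spectral radiance
    C     : (B, 3) camera spectral sensitivities, one column per channel
    M     : (3, 3) white-balance and colour-space transform to linear sRGB
    gamma : display nonlinearity (2.2 approximates the sRGB curve)
    """
    rgb_lin = M @ (C.T @ i)                 # integrate over wavelength, transform
    return np.clip(rgb_lin, 0.0, 1.0) ** (1.0 / gamma)
```

Clipping before the gamma keeps out-of-gamut values from producing invalid pixel intensities.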
5 Experiments
We begin by quantitatively evaluating how well our model and model fitting method are able to explain measured data. We use the 25 faces from the ISET database [21], compute the best fitting parameters for skin-labelled pixels and then compute the mean relative absolute error in the reflectance spectra over all pixels in all images, yielding a 7.4% error.
In Fig. 2 we show representative qualitative results from this experiment. Input, albedo and reconstruction RGB renderings use the mean camera spectral sensitivity from [23] and assume daylight (D65) illumination. The specular shading maps clearly pick up the specular reflections. Note also that the specular shading contains fine surface detail whereas the diffuse shading is blurred due to subsurface scattering (consistent with [3]). The diffuse albedo is computed directly from the biophysical spectral reflectance and shows that shading effects are not transferred to the biophysical parameters. Note that lips and flushed cheeks are clearly visible in the haemoglobin maps and that the overall melanin value accurately reflects skin colour. All results are masked to the skin region by binarising the skin probability maps.
There are no existing methods solving the same task, but for comparison we adapt the method of Tsumura et al. [22]
to multispectral data. This is a statistical approach based on Independent Component Analysis (ICA), so the parameter maps have no exact physical meaning. However, as seen in Fig. 3, the maps do seem to have an approximate correspondence to our physically meaningful ones, though it is clear that our decomposition and reconstruction are more plausible.

Finally, in Fig. 4 we present editing results using the method in Sec. 4. In each case we show the original RGB image and one of the estimated maps, then the edited map and finally the edited RGB result. In the first row we scale the melanin map by 0.75, giving the appearance of lighter skin. In the second row, we show freckle removal by applying a 2D median filter to the melanin map. In the third row we increase the melanin concentration by 0.2, showing the face with darker skin as if it were suntanned. In the fourth row we increase the haemoglobin map by 0.3, giving the face a flushed appearance such as when it is overheated. Finally, in the fifth row we demonstrate specularity removal by setting the specular map to a constant.
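These edits can be sketched as simple operations on the parameter maps. The dict keys and the [0, 1] clipping below are our illustrative assumptions, not values taken from the experiments:

```python
import numpy as np
from scipy.ndimage import median_filter

def edit_maps(maps):
    """Sketch of the five map edits described above. `maps` is a dict of
    2D parameter maps keyed by (hypothetical) names."""
    lighter   = np.clip(0.75 * maps['melanin'], 0.0, 1.0)     # row 1: lighter skin
    defreckle = median_filter(maps['melanin'], size=9)        # row 2: freckle removal
    tanned    = np.clip(maps['melanin'] + 0.2, 0.0, 1.0)      # row 3: suntan
    flushed   = np.clip(maps['haemoglobin'] + 0.3, 0.0, 1.0)  # row 4: flushed look
    despec    = np.full_like(maps['specular'],
                             maps['specular'].mean())         # row 5: constant specular
    return lighter, defreckle, tanned, flushed, despec
```

After any edit, the modified maps are pushed back through Eq. (1), blended with the background and rendered via Eq. (5).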
6 Conclusion
We have presented a novel hybrid of a spectral biophysical and dichromatic reflectance model and shown how to fit the model to multispectral face images. Our results show that the model is able to accurately explain multispectral skin reflectance and that the estimated maps provide a highly plausible decomposition. The surprising conclusion of our work is that there is sufficient information in a multispectral image to render the decomposition task well-posed. This is possible only by introducing the constraint of a biophysical model. In principle, our model could be fitted to four-channel data, so it would be interesting to see whether we can obtain similar results using RGB + NIR images. Our inverse rendered results could be used to learn statistical models of the variation in intrinsic biophysical face parameters [16].
References
 [1] Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras, “Neural face editing with intrinsic image disentangling,” in Proc. CVPR, 2017.
 [2] J. Jimenez, T. Scully, N. Barbosa, C. Donner, X. Alvarez, T. Vieira, P. Matts, V. Orvalho, D. Gutierrez, and T. Weyrich, “A practical appearance model for dynamic facial color,” ACM Trans. Graph., vol. 29, no. 6, pp. 141:1–141:10, 2010.
 [3] W.-C. Ma, T. Hawkins, P. Peers, C.-F. Chabert, M. Weiss, and P. Debevec, “Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination,” in Proc. Eurographics Conference on Rendering Techniques, 2007, pp. 183–194.
 [4] C. Donner, T. Weyrich, E. d’Eon, R. Ramamoorthi, and S. Rusinkiewicz, “A layered, heterogeneous reflectance model for acquiring and rendering human skin,” in ACM Trans. Graphics, 2008, vol. 27, p. 140.
 [5] Y. Gitlina, D. S. J. Dhillon, X. Huang, G. C. Guarnera, and A. Ghosh, “Practical multispectral imaging of human skin using RGBW+ illumination,” in CVMP Short Papers, 2018.
 [6] E. Claridge, S. Cotton, P. Hall, and M. Moncrieff, “From colour to tissue histology: Physics-based interpretation of images of pigmented skin lesions,” Medical Image Analysis, vol. 7, no. 4, pp. 489–502, 2003.
 [7] H. Kim, M. Zollhöfer, A. Tewari, J. Thies, C. Richardt, and C. Theobalt, “InverseFaceNet: Deep monocular inverse face rendering,” in Proc. CVPR, 2018.

 [8] A. Tewari, M. Zollhöfer, H. Kim, P. Garrido, F. Bernard, P. Pérez, and C. Theobalt, “MoFA: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction,” in Proc. ICCV, 2017.
 [9] V. Blanz and T. Vetter, “A morphable model for the synthesis of 3D faces,” in Proc. SIGGRAPH, 1999, pp. 187–194.
 [10] S. Sengupta, A. Kanazawa, C. D. Castillo, and D. W. Jacobs, “SfSNet: Learning shape, reflectance and illuminance of faces in the wild,” in Proc. CVPR, 2018.
 [11] C. P. Huynh and A. Robles-Kelly, “A solution of the dichromatic model for multispectral photometric invariance,” International Journal of Computer Vision, vol. 90, no. 1, pp. 1–27, 2010.
 [12] S. A. Shafer, “Using color to separate reflection components,” Color Research & Application, vol. 10, no. 4, pp. 210–218, 1985.
 [13] A. Krishnaswamy and G. V. G. Baranoski, “A biophysically-based spectral model of light interaction with human skin,” Computer Graphics Forum, vol. 23, no. 3, pp. 331–340, 2004.
 [14] S. J. Preece and E. Claridge, “Spectral filter optimization for the recovery of parameters which describe human skin,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 7, pp. 913–922, July 2004.
 [15] S. L. Jacques, “Skin optics summary,” Oregon Medical Laser Center News, January 1998.
 [16] S. Alotaibi and W. A. P. Smith, “A biophysical 3D morphable model of face appearance,” in Proc. ICCV, 2017.
 [17] R. R. Anderson and J. A. Parrish, “The optics of human skin,” Journal of Investigative Dermatology, vol. 77, no. 1, pp. 13–19, 1981.
 [18] A. J. Thody, E. M. Higgins, K. Wakamatsu, S. Ito, S. A. Burchill, and J. M. Marks, “Pheomelanin as well as eumelanin is present in human epidermis,” Journal of Investigative Dermatology, vol. 97, no. 2, pp. 340–344, 1991.
 [19] S. Prahl, “Optical absorption of hemoglobin,” Oregon Medical Laser Center News, December 1999.
 [20] R. Flewelling, “Noninvasive Optical Monitoring,” in The Biomedical Engineering Handbook, Second Edition. 2 Volume Set, Electrical Engineering Handbook. CRC Press, Dec. 2000.
 [21] ImageVal Consulting, “Hyperspectral Scene Data (415 – 915 nm): High Resolution Faces,” http://www.imageval.com/scenedatabase4faces1meter/, 2012.
 [22] N. Tsumura, N. Ojima, K. Sato, M. Shiraishi, H. Shimizu, H. Nabeshima, S. Akazaki, K. Hori, and Y. Miyake, “Imagebased skin color and texture analysis/synthesis by extracting hemoglobin and melanin information in the skin,” ACM Trans. Graph., vol. 22, no. 3, pp. 770–779, 2003.
 [23] J. Jiang, D. Liu, J. Gu, and S. Süsstrunk, “What is the space of spectral sensitivity functions for digital color cameras?,” in 2013 IEEE Workshop on Applications of Computer Vision (WACV), Jan 2013, pp. 168–179.