The Selden Map of China was an early seventeenth century map which was brought into the Bodleian Library at the University of Oxford in 1659 [3, 4, 19]. The map was named after John Selden, a renowned London lawyer who donated it to the Bodleian Library in 1654 . This map provides significant understanding of the globalization in Asia in the early seventeenth century. In 2008, the map was ‘rediscoverd’ by Robert Batchelor  which give researchers an opportunity to understand Chinese cartography, the timeframe of the creation of the map, material diversity of the map, merchant shipping history in Asia and disputes about the merchant routes of the Ming Empire . Though there is a dispute if the map was legally obtained or not, it is the first Chinese map of merchant data that reached England and the first of its kind to survive . The map was painted on paper with black carbon ink and six different colors: red, green, yellow, blue, white, and black portraying flowers, trees, rivers, and mountains . To understand the pigment material diversity of the map, hyperspectral image (HSI) data of the map were collected at the Bodleian Library at Oxford University in 2015. To get high spatial resolution image, the overall collection of the HSI data was split into 12 ‘chips’. The spatial size of each chip may vary from 1600 (pixels) x 2300 (pixels) to 1600 (pixels) x 3250 (pixels). Each HSI chip contains the reflectance measurement in the range of visible to near infra-red, i.e., 224 channels from 400nm to 1000nm. In section 5 we will discuss the details of the data collection parameters.
Among many, two common studies on cultural heritage artifacts are faded text enhancement and pigment analysis which play significant role in the codicological studies (study of codices or manuscripts written on parchment or paper). For faded text analysis there are several methods which are based on multispectral imaging techniques such as reflectance, fluoresence, and transmission based techniques [11, 14, 8, 12, 18, 13, 10, 15]11, 14, 8, 12, 18, 13, 10, 15].
Hyperspectral imaging techniques have become more widely used for historical artifact analysis in recent years. Bai et al [3, 2, 1] has done extensive works on HSI data classification of historical artifacts. In these works most of the techniques involve unmixing which is adopted from the remote sensing community. For example, Bai et al  used HSI data to estimate the within pigment material diversity of Gough Map data where they used some initial processing of the HSI data by sphering and binning the data matrix. Then they used the Gram matrix and MaxD techniques to estimate the dimensionality of the data and Spectral Angle Mapper (SAM) to estimate the final classification map.
. In remote sensing imagery, the spatial area sampled by each pixel is much larger and the measured radiance in each pixel is the effect of different materials present within this area. Here the materials are spatially separated in most of the cases. As a result, linear unmixing theory in the reflectance space is sufficient to unmix the spectra of the constituent materials. In contrast, the paint mixtures usually have small particles uniformly distributed in the binding medium which we call intimate mixtures. So the total reflectance is not just a linear combination of the constituent pigments’ spectral reflectance in the reflectance space. Rather, the spectral unmixing for pigment mixtures are best modeled by a physics based model of light transport in turbid medium.
The Kubelka-Munk (K-M) theory [20, 21] proposed a radiative transfer model involving two diffused fluxes that propagates in the forward and backward direction. Because of the simplicity and accuracy of the model in paint unmixing (finding the mixing ratios of paints to match a given color), this method is still widely used in paint industries. Significant work has been done in the field of pigment identification using K-M theory [30, 25, 22] with multispectral image (MSI) data. These works mainly focus on pigment identification using an existing database of unique pigments. For instance, Zhao et al  extensively studied Vincent van Gogh’s The Starry Night using single-constant K-M theory where the author used preexisting database of pure pigments to find the best match. However, there is still a lack of study for a pigment diversity estimation using the HSI data without any prior database of unique pigments.
In this paper we implemented the single-constant K-M theory for the pigment material diversity estimation without using any prior database of unique pigments. Our results show that single-constant K-M theory is useful in pigment mapping in the field of historical document image analysis even if we have a little or no knowledge about the pigments present. The rest of the paper is organized as follows. In section 2 we discuss the theoretical background of K-M theory. In section 3 we describe the dimensionality estimation and unmixing of the HSI data. In section 4 we present the methodology of our experiment. In section 5 we demonstrate the application of the method on the Selden Map. In section 6 we summarize the results and draw conclusions.
2 Kubelka-Munk Theory
The K-M theory [20, 21] describes the radiative transfer model of two diffuse fluxes in a turbid medium. The diffuse reflectance of a film relates to the effective absorption coefficient and effective scattering coefficient using the following formulae
where is the reflectance of the background, is the film thickness, , and . Considering an opaque layer, the reflectance factor can be written as
This can be written as
Using the assumption of additivity and scalibility , the absorption and scattering coefficients of the paint mixtures are the linear combination of the absorption and scattering coefficients of the component paints weighted by their concentrations, respectively. This is referred to as a two-constant K-M theory. However, when the scattering is dominated by white pigment, the ratio, K/S of the overall pigment mixture can be modeled as the linear combination of the K/S of all the constituent pigments . This is referred to as the single constant K-M theory. We can write the relationship for the single constant K-M theory as 
Here, MixOnPaper and PigmentOnPaper represent the spectral properties of the paintings and masstone of the pigments on paper, respectively. Replacing this value in equation 4 we get
We handpicked the pixel spectrum of the paper. Then we use equation 6 to formulate the unmixing process. However, few assumptions are made for the approximation of the K-M theory: (a) the pigment particles are smaller compared to the thickness of the layer, (b) diffuse illumination source and diffuse reflectance, (c) materials have high optical thickness, (d) the specular reflection is neglected. To ensure that pigment materials have high reflectance, Zhao et al  added appropriate amount of white paint to form a spectral database which they used for pigment identification using K-M theory. Liang et al  showed that this unimixing method works even without adding any additional white pigments, where the pigment materials have low absorption coefficient and high scattering coefficient. However, note that we do not have a prior database of unique pigments or any information about the pigments present in our work.
3 Pigment Diversity Estimation and Unmixing
In this section we will describe theoretical background on material diversity estimation in HSI using the Gram Matrix and MaxD techniques and spectral unmixing using the Non-Negative Least Square (NNLS) technique.
In HSI, material diversity estimation implies the measurement of the dimensionality of the data. Several algorithms have been used to estimate the dimensionality of HSI datasets [31, 28, 16, 26]. We used the method introduced by Messinger et al  and used by Canham et al  to estimate the materials diversity. Using the MaxD technique, number of endmembers are extracted . The Gram Matrix can be computed from a test set, , where the element in the matrix is given by
which is the inner product between the and vectors in the test set. is a matrix where the number of vectors is in the test set. The determinant of the Gram Matrix is the square of the volume of the parallelotope generated by the test vectors. When the volume goes to , the vectors within the set are no longer linearly independent. An independent set of enemembers implies a set of unique materials. Then using an iterative procedure the volume function can be plotted which indicates the dimensionality of the data which is shown in Figure 5a.
We used the NNLS method for pigment unmixing. The objective of unmixing is to find the abundance matrix given the HSI data and the endmember matrix. If the endmember matrix is and the data vector is , NNLS tries to minimize the loss function where . The matrix consisting for all the data points is called the abundance matrix.
In this research we applied the unmixing in the K-M space .We transformed the endmember matrix and the HSI reflectance data to the K-M space and estimated the abundance matrix. Then we computed the difference matrix and augmented the abundance matrix with the difference matrix. If a vector in the abundance matrix is given as
the difference matrix is computed by taking the difference between the abundances of the constituent endmembers. For equation 8,
are the components of the difference matrix which is appended with the abundance matrix to make the feature descriptor more discriminative. Finally, we perform K-means clustering of the joint matrix and get our final class map.
In this research, we estimated the pigment diversity of a predefined portion of the Selden Map using the K-M theory. The workflow of our method is shown in Figure 1. The overall procedure is outlined below.
1. Cut specific portions of the ‘chips’ and stitch them together to create a single patch.
2. Find the green pixels using Mahalanobis distance classifier (MDC).
3. Use the MaxD and Gram Matrix approach to estimate the dimensionality of the data and extract the endmembers.
4. Transform the data into the K-M space using the opaque form of single-constant K-M theory.
5. Use the NNLS technique to find the abundance matrix and augment the abundance matrix with the difference matrix.
6. Use K-means clustering to find the final classification map using the abundance plus difference matrix.
In this section we will explain the dataset, details of the methodology, evaluation metrics, and results and comparison.
To capture high resolution HSI data of the overall map, the map was imaged in overlapping ‘chips’. All the overlapping ‘chips’ are shown in Figure 2b after RGB rendering from the HSI data. The map data was captured using a hyperspectral imaging system. It is noted that the image data of ‘chip’ 2 was lost during transmission. The HSI data has spectral bands ranging from nm to nm where nm. The physical size of the Selden Map is 100cm x 160cm. In pixel space the overall HSI data dimension is (pixels) x (pixels) x (bands). So the spatial resolution is approximately cm/pixel.
The RGB image of the Selden Map with geographic location is shown in Figure 2a. Notice that the similar green pigments are used in different places of the map such as the Ming Empire, Korea, and Banda.
5.2 Experimental Details
In this section we will discuss the overall experimental procedure to perform pigment diversity estimation.
5.2.1 Region of Interest selection
Our first step is to select the region of interest from the map. From the study of Bai et al we found that Korea and the land between Java and Banda had large spectral angle error in the final classification map which motivated us to investigate further using a physics based approach for pigment unmixing. The cropped region of the RGB images from ‘chip’ 4 and ‘chip’ 6 are shown in Figure 3a and Figure 3b respectively. For computational simplicity, we stitch the two cropped regions side by side which is shown in Figure 3c. we call this patch as our region of interest (ROI).
Our next step is to obtain the segmentation map to extract the green pigments. It is important to mention that the patch taken from ‘chip’ 4 has some blackish pixels near the west border of Korea. We included those with the mountains in our segmentation map to extract the green pigments. We used both the RGB image and the HSI data to create an accurate segmentation map. We took a small patch from our ROI RGB image which is shown in Figure 4a. To get an estimate of the segmentation map we used the K-means clustering of the RGB image patch in RGB, Lab, and HSV colorspaces with and without augmenting the x and y coordinates of each pixel with the data. Clustering in the Lab colorspace with the x and y coordinates augmentation gave us the best possible estimate of the segmentation map which is shown in Figure 4b. Here we used 4 clusters and to show the clustering result, the color of each cluster is created by taking the mean of the red, green, and blue channels separately and replacing all the pixel values with the mean value within the same cluster. This segmentation mask is used as the training label of the corresponding HSI data to train the MDC. The final segmentation map is estimated by the prediction of the MDC from the overall HSI data corresponding to the ROI. The binary image of the segmentation map is shown in Figure 4c. The black pixels in the image represent the green and the dark pigments from the Selden map within the ROI.
5.2.3 Endmember Extraction
Using the segmentation map we extracted the green pixels from the HSI for further analysis. Then we used the Gram Matrix technique to estimate the number of endmembers presented in the data identified by where the function approaches , which is shown in Figure 5a. It is inferred from the figure that the number of endmembers present in the HSI is . Then we used the MaxD technique to find those endmembers. The extracted endmembers are shown in Figure 5b which include both green pixels (bright around nm) and dark pixels (dark across the visible spectrum). The endmember locations in the original ROI image is shown in Figure 6 with concentrated blue circles.
5.2.4 K-M transformation
5.2.5 Unmixing and Clustering
We unmix the HSI data in the K/S space using the NNLS technique. This technique enforces the abundances to be non-negative but the sum of the abundances for each pixel can be less or greater than where ideally it should be equal to . This is to allow for within material variability and to avoid overfitting. This unmixing gave us the abundance matrix. We used the same technique used by Bai et al which is described in section 3 to compute the difference matrix from the abundance matrix. Then we augmented the abundance matrix with the difference matrix and used the K-means clustering algorithm to find the final classification map.
5.3 Metrics of Success
For the success metrics we use the root mean squared error (RMSE) between the cluster mean and each of the pixels within that cluster. Our approach is to minimize the total RMSE for a certain class from the corresponding class mean. The workflow is shown in Figure 7.
The final classification map comes from the clustering of the abundance map in the K/S space. We use the segmentation map to visualize the final classification result which is shown in Figure 8. From the classification result we can see that the boundary of Korea is drawn with a dark pigment which is not used in the part of the land between Banda and Java. We can also see that a darker green pigment is used significantly in Korea, but used marginally in the other region. We suspect that these changes were made during a modification of the map. However, few other green pigments are used in these two regions simultaneously. Our findings have some agreement with Bai et al , where the author mentioned that similar pigments are used in these two places. However, our study provides deeper analysis of these places using K-M theory. The spatial distribution of the error map is shown in Figure 9. Note that all the errors are small indicating a good model per pixel in the final classification map.
Kubelka-Munk (K-M) theory has been used to estimate pigment mixtures of modern paintings in hyperspectral imagery. Here, K-M theory is utilized to classify pigments in the Selden Map of China, a navigational map from the early seventeenth century. Hyperspectral data of the map was collected at the University of Oxford and can be used to estimate the pigment diversity, and spatial distribution, within the map. This work seeks to assess the utility of analyzing the data using K-M theory, as opposed to the traditional reflectance domain. From our study we found some similarities and dissimilarities of green pigments’ spatial distribution between Korea and the land between Banda and Java. This method can be generalized to a tool to estimate pigment diversity in historical artifacts where little is known of the specific pigments and mixtures used. Geographers and cartographic historians may use this technique to understand the timeline of construction and modification of various historical artifacts.
-  (2017) A pigment analysis tool for hyperspectral images of cultural heritage artifacts. In Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXIII, Vol. 10198, pp. 101981A. Cited by: §1.
-  (2017) Hyperspectral analysis of cultural heritage artifacts: pigment material diversity in the gough map of britain. Optical Engineering 56 (8), pp. 081805. Cited by: §1, §5.2.5.
-  (2018) Pigment diversity estimation for hyperspectral images of the selden map of china. Vol. 10644, pp. 10644 – 10644 – 17. External Links: Cited by: §1, §1, §5.2.1, §5.4.
-  (2013) The selden map rediscovered: a chinese map of east asian shipping routes, c.1619. Imago Mundi 65 (1), pp. 37–63. External Links: Cited by: §1.
-  (2002) Multiple pigment selection for inpainting using visible reflectance spectrophotometry. Studies in conservation 47 (1), pp. 46–61. Cited by: §2.
-  (2007) An optical real-time adaptive spectral identification system (orasis). Hyperspectral Data Exploitation: Theory and Applications, pp. 77–106. Cited by: §1.
-  (2011) Spatially adaptive hyperspectral unmixing. IEEE Transactions on Geoscience and Remote Sensing 49 (11), pp. 4248–4262. Cited by: §3.
-  (2011) Some properties of textual heritage materials of importance in spectral imaging projects. na. Cited by: §1.
-  (1940) The colour of pigment mixtures. Proceedings of the Physical Society 52 (3), pp. 390. Cited by: §2.
-  (2003) Multispectral imaging of the archimedes palimpsest. In null, pp. 111. Cited by: §1.
-  (2011) Ten years of lessons from imaging of the archimedes palimpsest. na. Cited by: §1.
-  (2011) Some properties of textual heritage materials of importance in spectral 27 imaging projects. In Proc. of the European Signal and Image Processing Conf.(EURASIP’11), pp. 1440–1444. Cited by: §1.
-  (2014) Statistical processing of spectral imagery to recover writings from erased or damaged manuscripts. manuscript cultures 7, pp. 35–46. Cited by: §1.
-  (2010) Infinite possibilities: ten years of study of the archimedes palimpsest. Proceedings of the American Philosophical Society 154 (1), pp. 50–76. Cited by: §1.
-  (2015) Rediscovering text in the yale martellus map. In Information Forensics and Security (WIFS), 2015 IEEE International Workshop on, pp. 1–6. Cited by: §1.
-  (2018) Endmember extraction from hyperspectral imagery based on qr factorisation using givens rotations. IET Image Processing. Cited by: §3.
-  (1998) Application of kubelka-munk theory in device-independent color space error diffusion. In PICS, pp. 344–348. Cited by: §2.
-  (2011) Recovery of handwritten text from the diaries and papers of david livingstone. In Computer Vision and Image Analysis of Art II, Vol. 7869, pp. 786909. Cited by: §1.
-  (2016-09-02) The origins of the selden map of china: scientific analysis of the painting materials and techniques using a holistic approach. Heritage Science 4 (1), pp. 28. External Links: Cited by: §1.
-  (1931) A contribution to the optics of pigments. Z. Tech. Phys 12, pp. 593–599. Cited by: §1, §2.
-  (1948) New contributions to the optics of intensely light-scattering materials. part i. Josa 38 (5), pp. 448–457. Cited by: §1, §2.
-  (2008) Pigment identification with optical coherence tomography and multispectral imaging. Cited by: §1, §2.
-  (2012) Advances in multispectral and hyperspectral imaging for archaeology and art conservation. Applied Physics A 106 (2), pp. 309–323. Cited by: §1.
-  (2012) Metrics of spectral image complexity with application to large area search. Optical Engineering 51 (3), pp. 036201. Cited by: §3.
-  (2017) Linear modeling of modern artist paints using a modification of the opaque form of kubelka-munk turbid media theory. Color Research & Application 42 (3), pp. 308–315. Cited by: §1.
-  (2010) Survey of geometric and statistical unmixing algorithms for hyperspectral images. In 2010 2nd Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, pp. 1–4. Cited by: §3.
-  (2003) A subpixel target detection technique based on the invariance approach. In AVIRIS Airborne Geoscience workshop Proceedings, Cited by: §3.
-  (2017) Endmember extraction from hyperspectral image based on discrete firefly algorithm (ee-dfa). ISPRS Journal of Photogrammetry and Remote Sensing 126, pp. 108–119. Cited by: §3.
-  (2005) Spectral unmixing of normalized reflectance data for the deconvolution of lichen and rock mixtures. Remote Sensing of Environment 95 (1), pp. 57–66. Cited by: §1.
-  (2008) An investigation of multispectral imaging for the mapping of pigments in paintings. In Computer image analysis in the study of art, Vol. 6810, pp. 681007. Cited by: §1, §2.
-  (2010) Iterative convex hull volume estimation in hyperspectral imagery for change detection. In Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVI, Vol. 7695, pp. 76951I. Cited by: §3.