With the rapid proliferation of digital imaging and video, accurately recording the true color of a scene in device-captured images is extremely important for many practical applications, ranging from color-based object recognition and tracking to quality control in textile and food processing [1, 2, 3]. However, device-captured image colors are always affected by the color of the light source illuminating the scene. Fig. 1(a) shows that images of the same scene rendered under two different illuminants exhibit clearly different color appearances. Thus, to maintain the true color appearance of objects, the color artifacts caused by the illuminant must be carefully eliminated.
Color constancy (CC) refers to the perceptual constancy of the human visual system that keeps the perceived color of objects in a scene largely constant as the light source color changes [3, 4, 5]. Many CC algorithms have been designed to imitate this visual attribute by computationally estimating the illuminant and then removing the resulting color cast (see [3, 4, 6] for excellent reviews). Recently, CC performance on several benchmark datasets has improved significantly, especially for state-of-the-art CC models based on extensive feature extraction and machine learning techniques [7, 8, 9, 10, 11, 12, 13].
However, as indicated in Fig. 1, image color is not influenced by the scene light source color alone. During image acquisition, three factors determine the color values finally measured at each pixel: the reflectance of the objects in the scene, the illuminant incident on the scene, and the camera spectral sensitivity (CSS). While CC aims to obtain a color representation of the scene that is stable across changes of the illuminant (Fig. 1(a)), the CSS also affects the color appearance of the scene (Fig. 1(b)).
So far, however, almost all existing benchmark datasets have been collected using a single camera with a fixed CSS [17, 14, 18, 19, 20, 21]. Many state-of-the-art learning-based CC algorithms implicitly assume that the images in a dataset are captured with the same CSS, and they evaluate their performance only on intra-dataset-based CC (intra-CC) [3, 7, 8, 10, 11, 12, 13, 22, 23, 24], i.e., learning the model on one part of a dataset and testing the learned model on another part of the same dataset. Thus, the effect of CSS on CC cannot be examined in depth under the intra-CC strategy. However, as indicated in Fig. 1(b) and Fig. 2, both the color distributions of the recorded images and the extracted ground-truth illuminants in the datasets depend on the CSS.
While the existing intra-CC strategy discounts the color bias induced by the illuminant, the recovered image color is in fact the combined appearance of the object reflectance and the camera sensor effect (Fig. 1(b)). As a consequence, existing CC algorithms may perform poorly in inter-dataset-based CC (inter-CC), i.e., training a model on one dataset captured by a specific camera and then testing the learned model on another dataset captured by a different camera with a different CSS.
In this paper, we draw insights from the rich literature on intrinsic images, which aims to decompose an image into individual physical characteristics (e.g., reflectance and shading) that are independent of both the illuminant and the camera sensor [25, 26, 27]. To this end, we primarily focus on studying the effect of CSS on CC in the inter-CC setup.
Specifically, we first point out that the chromatic distributions of both the measured illuminants and the recorded images differ considerably across CSSs, even for the same scene under one illuminant; this is a major cause of the failure of current learning-based CC algorithms under inter-CC evaluation. Then, to overcome this drawback, we propose a simple yet effective framework that incorporates CSS information into the CC process. This is the main contribution of this work. In particular, we first learn a transform matrix between the CSS functions of two distinct cameras (CSS-1 and CSS-2). The learned matrix is then used to convert the color-biased images and the provided illuminants recorded with CSS-1 into those of CSS-2, before training the model and testing it on the image(s) recorded with CSS-2.
Moreover, by taking CSS information into account, we also demonstrate how to obtain a stable color representation of the scene that is almost independent of both the illuminant and the camera sensor. This stable representation may benefit further computer vision applications such as intrinsic image decomposition, 3-D view synthesis, and physics-based reflectance descriptors.
Although it is well known that the CSS affects image formation, discounting this pervasive adverse effect in an elegant way, and thereby improving the performance of other color applications (e.g., CC), is a very difficult problem. In fact, some CC research has also addressed the effect of imaging sensors on chromatic adaptation, in which both the source and destination illuminants are known. For example, spectral sharpening of sensors attempts to simplify the characterization of illuminant change and thereby improve the performance of any CC algorithm based on a diagonal-matrix transformation [28, 29].
In contrast to those attempts, we aim to probe the possible effects of different CSSs on the inter-CC performance of traditional learning-based CC models and to propose a feasible solution to this challenging problem. To the best of our knowledge, this issue has not been explicitly studied before in the CC literature. Our motivation is that, although learning-based models work well for intra-CC, they require collecting an extensive training set for each CSS. It would be of much greater practical value if, for any CSS, we could obtain inter-CC performance competitive with that of learning-based intra-CC while collecting a training set for only one CSS.
The rest of this paper is organized as follows. Section II formulates the problem. The proposed solution is described in Section III. Section IV presents experiments that validate our theoretical analysis and the proposed method. We conclude in Section V by discussing the contributions and limitations of this work.
II Color image fundamentals and color constancy
As in many previous works, we assume a single light source color across the scene [3, 7, 30, 31, 32]. Based on the common linear imaging model, the interaction of surface, light source, and sensor can be written as [3, 4]

$$f_c(\mathbf{x}) = \int_{\omega} e(\lambda)\, s(\mathbf{x},\lambda)\, \rho_c(\lambda)\, d\lambda, \quad c \in \{R, G, B\}, \tag{1}$$

where the integral is taken over the visible spectrum $\omega$ and $c$ indexes the sensor channels. Equation (1) states that the captured image values $\mathbf{f}(\mathbf{x}) = (f_R, f_G, f_B)^T$ depend directly on the color of the light source $e(\lambda)$, the surface reflectance $s(\mathbf{x},\lambda)$, and the camera spectral sensitivity $\rho_c(\lambda)$, where $\mathbf{x}$ is the spatial coordinate and $\lambda$ is the wavelength of the light.
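Earlier research has shown that the color change of an image induced by the illuminant can be adequately modeled by a diagonal transform [3, 6, 33, 34, 28]. Thus, under the diagonal-transform assumption in CC, Eq (1) can be simplified as

$$f_c(\mathbf{x}) = e_c\, s_c(\mathbf{x}), \quad c \in \{R, G, B\}, \tag{2}$$

where $e_c$ and $s_c(\mathbf{x})$ denote the sensor responses in channel $c$ to the illuminant and to the surface, respectively.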
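In general, CC aims to remove the color bias from a captured image by first estimating the illuminant $\hat{\mathbf{e}} = (\hat{e}_R, \hat{e}_G, \hat{e}_B)^T$ and then recovering the surface response $s_c(\mathbf{x})$ by dividing each image channel $f_c(\mathbf{x})$ by $\hat{e}_c$. However, both the illuminant and the surface responses in Eq (2) are usually unknown; hence, given only the image values $\mathbf{f}(\mathbf{x})$, estimating the illuminant is a typical ill-posed problem that cannot be solved without further constraints [30, 35, 36, 37, 38].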
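To make the two-step procedure concrete, here is a minimal NumPy sketch under the diagonal model of Eq (2). It uses the grey-world assumption (the average surface is achromatic) as one simple illuminant estimator; this estimator, the function names, and the synthetic data are illustrative assumptions, not the method studied in this paper.

```python
import numpy as np

def grey_world_estimate(img: np.ndarray) -> np.ndarray:
    """Grey-world illuminant estimate: per-channel mean of an (H, W, 3) linear image,
    normalized to unit length."""
    e = img.reshape(-1, 3).mean(axis=0)
    return e / np.linalg.norm(e)

def diagonal_correct(img: np.ndarray, e_hat: np.ndarray) -> np.ndarray:
    """Diagonal (von Kries-type) correction: divide each channel c by e_hat[c].
    The result equals s_c(x) up to one global scale factor."""
    return img / e_hat  # e_hat broadcasts over the channel axis

# Synthetic check of Eq (2): f_c(x) = e_c * s_c(x)
rng = np.random.default_rng(0)
s = rng.uniform(0.05, 1.0, size=(64, 64, 3))
s *= 0.5 / s.reshape(-1, 3).mean(axis=0)   # equal channel means, so grey world holds exactly
e_true = np.array([0.9, 0.6, 0.3])          # an arbitrary warm (reddish) illuminant
f = s * e_true
e_hat = grey_world_estimate(f)
s_hat = diagonal_correct(f, e_hat)          # equals s * ||e_true||
```

Real scenes rarely satisfy the grey-world assumption exactly, which is why the estimation step is the hard, ill-posed part; the division step itself is trivial once an estimate is available.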
While many existing algorithms rely on the aforementioned steps to achieve CC, it should be explicitly pointed out that these algorithms are camera dependent (i.e., they assume a fixed camera and hence a fixed CSS). Unfortunately, almost all current CC models focus only on eliminating the color bias induced by the illuminant, while ignoring the fact that the CSS also contributes to the color bias when images are recorded. In particular, for almost all color constancy datasets, the illuminant ground truth of each image is extracted from a specific local region (e.g., the grey ball or color checker) of the image recorded with the CSS. That is, the illuminant ground truth is also CSS dependent.
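For a clear illustration, we express the sensor responses to the illuminant and to the surface, respectively, as

$$e_c = \int_{\omega} e(\lambda)\, \rho_c(\lambda)\, d\lambda, \tag{3}$$

$$s_c(\mathbf{x}) = \int_{\omega} s(\mathbf{x},\lambda)\, \rho_c(\lambda)\, d\lambda. \tag{4}$$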
Obviously, both responses depend on the camera spectral sensitivity. Fig. 1(b) shows that two images of the same scene rendered under a white light source exhibit an obvious color difference because they are captured by two cameras with different CSSs. Similarly, the responses to the same illuminants under different CSSs fall into visibly different color domains (Fig. 2).
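The dependence of Eqs (1), (3), and (4) on the CSS can be reproduced numerically. The sketch below discretizes the integrals with a Riemann sum and uses two made-up Gaussian sensitivity sets as stand-ins for real cameras; the peaks, widths, spectra, and names are all assumptions. The same illuminant and surface then yield different chromaticities under the two CSSs.

```python
import numpy as np

wl = np.arange(400, 701, 10, dtype=float)  # wavelength samples in nm (10 nm step assumed)
D_LAMBDA = 10.0

def gaussian_css(peaks, width):
    """Hypothetical CSS: one Gaussian curve per channel (R, G, B), shape (3, n_wl)."""
    return np.stack([np.exp(-0.5 * ((wl - p) / width) ** 2) for p in peaks])

def sensor_response(spectrum, css):
    """Riemann-sum version of Eqs (3)/(4): sum over wavelength of spectrum * rho_c."""
    return css @ spectrum * D_LAMBDA

def chromaticity(rgb):
    return rgb / rgb.sum()

css_1 = gaussian_css(peaks=(600, 540, 450), width=30)  # made-up "camera 1"
css_2 = gaussian_css(peaks=(630, 510, 480), width=45)  # made-up "camera 2"

illum = np.ones_like(wl)                 # equal-energy "white" light
refl = np.linspace(0.2, 0.8, wl.size)    # reddish surface: reflectance rises with wavelength

e1, e2 = sensor_response(illum, css_1), sensor_response(illum, css_2)   # Eq (3)
f1 = sensor_response(illum * refl, css_1)                               # Eq (1), one pixel
f2 = sensor_response(illum * refl, css_2)
print(chromaticity(e1), chromaticity(e2))
print(chromaticity(f1), chromaticity(f2))
```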
II-A About the learning-based CC
We abstract the standard framework of traditional learning-based CC as

$$\mathbf{E} = \mathcal{M}(\mathbf{F}), \tag{5}$$

where $\mathbf{E}$ denotes the set of ground-truth illuminant vectors supplied by the dataset, $\mathbf{F}$ denotes a set of feature vectors (certain statistics) extracted from the input images $\mathbf{f}$, and $\mathcal{M}$ denotes a model that learns a transformation mapping the extracted image features to the corresponding illuminant ground truth $\mathbf{E}$. To balance effectiveness and efficiency, different algorithms train their models with different machine learning techniques and feature descriptors.
While statistical methods treat CC as a parameter inference problem by assuming that the reflectance and illuminant follow specific probability distributions [19, 42, 43, 44], they still need to train a model that captures the mapping between the preprocessed images and the illuminant ground truth.
A large class of learning-based methods treats CC as a regression problem [8, 23, 45]. These methods can be unified under a standard regression equation $\mathbf{E} = \mathbf{W}\mathbf{F}$, where $\mathbf{W}$ denotes a regression matrix to be learned. Regression-based models differ in the technique used to learn the regression matrix and in the choice of features [22]. Similar approaches apply support vector regression, linear regression [45, 46], or thin-plate spline interpolation to learn the regression matrix from largely similar input data. More recently, leading performance has been obtained by employing more effective features (e.g., color-edge moments), more efficient regression techniques (e.g., regression trees), or even deep learning.
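For the simplest (ordinary least-squares) case of the linear regression model E ≈ W F, with one feature vector per training image stacked as the columns of F and the corresponding ground-truth illuminants as the columns of E, the regression matrix can be fitted as below. This is a minimal sketch; the function names are hypothetical, and support vector regression, thin-plate splines, or regression trees would replace the solver here.

```python
import numpy as np

def fit_regression_matrix(F: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Least-squares estimate of W in E ≈ W F.

    F: (d, N) features, one column per training image.
    E: (3, N) ground-truth illuminants, one column per image.
    Returns W with shape (3, d).
    """
    # E = W F  <=>  F^T W^T = E^T, which lstsq solves for W^T column by column.
    Wt, *_ = np.linalg.lstsq(F.T, E.T, rcond=None)
    return Wt.T

def predict_illuminants(W: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Apply the learned W and normalize each estimated illuminant to unit length."""
    E_hat = W @ F
    return E_hat / np.linalg.norm(E_hat, axis=0, keepdims=True)
```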
Alternatively, if we replace the features with the illuminant estimates of multiple low-level CC algorithms [5, 30, 35, 36, 37, 38], the regression problem above reduces to so-called committee-based CC [47, 48, 49, 50]. In this protocol, the regression matrix is no longer regarded as a feature-mapping matrix but as a weighting matrix that tries to 'optimally' fuse the outputs of multiple CC methods into a single illuminant estimate under a certain rule (e.g., weights optimized in the least-mean-square (LMS) sense). In [9, 10, 51, 52, 53], by contrast, this matrix acts as a voting matrix that selects the most appropriate CC method or previously stored illuminant for each input image according to intrinsic properties of natural images, e.g., high-level information [10, 47, 51, 52, 54, 55].
II-B Intra-dataset-based CC (Intra-CC)
The common way to benchmark a learning-based CC algorithm is intra-dataset evaluation in the form of $K$-fold cross-validation [3, 8, 13, 31]. The dataset is first divided into $K$ parts (folds). Next, the parameters and structure of the model are trained on $K-1$ folds using the corresponding illuminant ground truth. Then, the trained model is tested on the remaining fold. For a complete $K$-fold cross-validation, these steps are repeated $K$ times so that each image appears in the test set exactly once, and in each round every image in the dataset belongs to either the training set or the test set. Finally, the measures from the $K$ rounds are averaged to give the final performance metric. Fig. 3(a) summarizes the steps of intra-CC. The logic behind this typical intra-dataset evaluation is well suited to testing a learning-based CC model in the presence of multiple scenes with diverse illuminants and reflectances in a dataset.
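The cross-validation loop can be sketched generically. In this minimal sketch the model is passed in as `fit`/`predict` callables and the per-image measure as `error` (e.g., the angular error); these names, the random fold assignment, and the fixed seed are assumptions for illustration.

```python
import numpy as np

def k_fold_indices(n_images, k, seed=0):
    """Randomly partition image indices 0..n_images-1 into k disjoint folds."""
    perm = np.random.default_rng(seed).permutation(n_images)
    return np.array_split(perm, k)

def cross_validate(F, E, k, fit, predict, error):
    """K-fold cross-validation for a learning-based CC model.

    F: (d, N) features; E: (3, N) ground-truth illuminants.
    fit(F_train, E_train) -> model; predict(model, F_test) -> E_hat;
    error(E_test, E_hat) -> per-image errors.
    """
    folds = k_fold_indices(F.shape[1], k)
    fold_means = []
    for i, test_idx in enumerate(folds):
        # Train on the other K-1 folds, test on the held-out fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(F[:, train_idx], E[:, train_idx])
        E_hat = predict(model, F[:, test_idx])
        fold_means.append(np.mean(error(E[:, test_idx], E_hat)))
    # Average the per-fold measures into the final score.
    return float(np.mean(fold_means))
```

Because every fold is drawn from the same dataset, and hence the same CSS, this protocol cannot reveal a CSS mismatch between training and test data.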
Despite the significant advances in intra-CC performance on several benchmark datasets [3, 7, 8, 11, 24, 39, 40, 12, 49], these methods always implicitly assume that the test data follow a distribution similar to that of the training data (e.g., both the training and test images are captured by the same camera). In practical CC applications, however, this assumption is readily violated, since different camera brands have quite different CSSs (e.g., NIKON and CANON in Fig. 2), and CSSs differ noticeably even among cameras from the same manufacturer (e.g., CANON in Fig. 2). Therefore, although existing state-of-the-art algorithms that are correctly trained can achieve high intra-CC performance, they may suffer serious problems when applied to inter-CC [7, 8, 12, 23, 49].
II-C Problem formulation for inter-dataset-based CC
Fig. 3(b) summarizes the general steps of traditional inter-CC. Assume that we have trained a CC model $\mathcal{M}^{1}$ on a dataset $\mathbf{f}^{1}$ rendered under a certain CSS (which we call CSS-1), i.e.,

$$\mathbf{E}^{1} = \mathcal{M}^{1}(\mathbf{F}^{1}). \tag{6}$$
As indicated by Eqs (3) and (4), both the illuminant ground truth and the images of this dataset carry the information of CSS-1. Similarly, the color domain of the feature statistics extracted from these images also depends on CSS-1. Thus, based on these training data, the model learns a fixed mapping that is only suitable for CSS-1. In other words, the model always learns a mapping that predicts the illuminant in the color domain of the corresponding CSS (here, the color domain of CSS-1).
Then, to run inter-CC, we test the CC model trained in the color domain of CSS-1 on another dataset collected by another camera with a distinct CSS (which we call CSS-2):

$$\hat{\mathbf{E}}^{2} = \mathcal{M}^{1}(\mathbf{F}^{2}), \tag{7}$$

where $\hat{\mathbf{E}}^{2}$, $\mathbf{F}^{2}$, and $\mathbf{f}^{2}$ are rendered under CSS-2. If we directly apply the CC model $\mathcal{M}^{1}$ to the dataset rendered under CSS-2, it is bound to fail, for the reason explained below.
While these learning-based CC approaches, when correctly trained, can deliver very competitive intra-CC performance, their training phase necessarily relies on the illuminant ground truth supplied with the training dataset. As analyzed above, the color domains of both the images and the illuminants are clearly affected by changes in CSS (e.g., Fig. 1(b) and Fig. 2). Thus, both the illuminant ground truth and the image features are color-domain dependent, and existing learning-based CC methods can only learn a fixed model that applies to a single camera.
In other words, existing learning-based CC algorithms always train a model that predicts the illuminant in a specific color domain (i.e., the color domain of a specific CSS). Consequently, once the trained CC model is tested on images captured by another camera with a very different CSS, it is likely to fail, since the effect of CSS is not considered during training.
III Improving the inter-dataset-based CC
Above, we have analyzed the effect of CSS on the chromatic distribution of light source spectra (Fig. 2) and on the color appearance of images (Fig. 1(b)), which indicates that existing CC models will perform poorly when tested under inter-CC.
In this section, we propose a method to improve the performance of a CC algorithm on inter-CC evaluation by taking into account the CSS of each camera. Our strategy is straightforward. Specifically, in order to overcome the problem due to the difference between the two CSSs during the process of inter-CC, we first learn a sensor transformation that can express the mapping between the two given CSSs, and then we use this mapping to relate the CC model learned on one dataset with CSS-1 to another dataset with CSS-2. The improved framework for CC on inter-CC application is illustrated in Fig. 4.
The sensor transformation can be described by a $3 \times 3$ matrix $\mathbf{S}$. If $\mathbf{r}^{1}$ denotes the response of CSS-1 to a certain reflectance and $\mathbf{r}^{2}$ denotes the response of CSS-2 to the same reflectance, we define

$$\mathbf{r}^{2} = \mathbf{S}\,\mathbf{r}^{1}, \tag{8}$$

where $\mathbf{S}$ is learned in this work by a simple least-mean-square (LMS) fit. To learn $\mathbf{S}$ based on Eq (8), we use the 1995 reflectance spectra compiled from the SFU hyperspectral dataset as input to produce the responses $\mathbf{R}^{1}$ and $\mathbf{R}^{2}$ based on Eq (1), which yields a $3 \times 1995$ matrix for each of $\mathbf{R}^{1}$ and $\mathbf{R}^{2}$. The procedure for learning $\mathbf{S}$ is shown in Fig. 5.
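The LMS fit of S from paired responses is a single linear least-squares solve. The sketch below is illustrative only: the two Gaussian CSSs and the 1995 random smooth reflectances are hypothetical stand-ins, not the real camera curves or the SFU reflectance set, and the function name is our own.

```python
import numpy as np

def fit_sensor_transform(R1: np.ndarray, R2: np.ndarray) -> np.ndarray:
    """LMS estimate of the 3 x 3 matrix S in R2 ≈ S R1 (Eq (8) applied column-wise).

    R1, R2: (3, N) responses of CSS-1 and CSS-2 to the same N reflectances.
    Equivalent to S = R2 R1^T (R1 R1^T)^{-1}, computed with a stable solver.
    """
    St, *_ = np.linalg.lstsq(R1.T, R2.T, rcond=None)
    return St.T

# Illustrative data only: Gaussian stand-in CSSs and random smooth reflectances.
wl = np.arange(400, 701, 10, dtype=float)

def gaussian_css(peaks, width):
    return np.stack([np.exp(-0.5 * ((wl - p) / width) ** 2) for p in peaks])

rng = np.random.default_rng(0)
basis = np.stack([np.cos(k * np.pi * (wl - 400) / 300) for k in range(4)])  # (4, n_wl)
refl = np.clip(0.5 + 0.15 * rng.normal(size=(1995, 4)) @ basis, 0.0, 1.0)   # (1995, n_wl)

css_1 = gaussian_css((600, 540, 450), 30)
css_2 = gaussian_css((630, 510, 480), 45)
R1 = css_1 @ refl.T   # (3, 1995); equal-energy light, constant factors dropped
R2 = css_2 @ refl.T
S = fit_sensor_transform(R1, R2)
rel_err = np.linalg.norm(S @ R1 - R2) / np.linalg.norm(R2)
print(S.round(3), rel_err)
```

Because real CSSs are not exact linear combinations of one another, `rel_err` is generally nonzero: a 3 × 3 matrix can only approximate the mapping between two sensor sets.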
For inter-CC, this matrix is used to transform both the images and the illuminants rendered under CSS-1 into CSS-2. The CC model is then trained on the transformed data.
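Finally, for inter-CC, the trained model (denoted $\mathcal{M}^{1\rightarrow 2}$) is tested on the images rendered under CSS-2, i.e., the illuminant is obtained according to

$$\hat{\mathbf{E}}^{2} = \mathcal{M}^{1\rightarrow 2}(\mathbf{F}^{2}). \tag{9}$$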
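The transform-then-train pipeline can be sketched end to end. In this minimal sketch the learning-based CC model is a deliberately simple stand-in (a least-squares map from grey-world features to illuminants, not CBCC), and all names are hypothetical. The point is only where S enters: every pixel and every ground-truth illuminant of the CSS-1 training set is mapped by S before training, while the CSS-2 test images are used as they are.

```python
import numpy as np

def adapt_to_css2(images_1, illums_1, S):
    """Map CSS-1 training data into CSS-2's color space with the learned 3 x 3 matrix S.

    images_1: list of (H, W, 3) linear images; illums_1: (N, 3) ground-truth illuminants.
    Each RGB row vector r becomes (S r)^T = r^T S^T, hence right-multiplication by S.T.
    """
    images_12 = [img @ S.T for img in images_1]
    illums_12 = illums_1 @ S.T
    return images_12, illums_12

def grey_world_feature(img):
    return img.reshape(-1, 3).mean(axis=0)

def fit_linear_cc(images, illums):
    """Toy learning-based CC model: least-squares map from grey-world features to illuminants."""
    F = np.stack([grey_world_feature(im) for im in images])  # (N, 3)
    Wt, *_ = np.linalg.lstsq(F, illums, rcond=None)
    return Wt.T

def estimate_illuminant(W, img):
    e = W @ grey_world_feature(img)
    return e / np.linalg.norm(e)

# Pipeline: train on CSS-1 data transformed into CSS-2, then test directly on CSS-2 images.
# W = fit_linear_cc(*adapt_to_css2(images_1, illums_1, S))
# e_hat = estimate_illuminant(W, image_2)
```

In an idealized setting where CSS-2 responses are exactly S times CSS-1 responses, the adapted model reproduces the CSS-2 ground truth; with real sensors, Eq (8) holds only approximately.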
It is worth stressing that learning the matrix S is very easy and requires only a small number of publicly available reflectance spectra, normally far fewer than the number of images required to train an intra-CC model for each camera. Moreover, we found experimentally that the same learned matrix is sufficient to capture the transformation between two different CSSs for Mondrian-like, hyperspectral, and real RGB datasets alike. This property is very important, since it indicates that the learned CSS adaptation matrix works independently of the scene. It makes our strategy widely applicable: existing public surface reflectance samples (e.g., the 1995 reflectance spectra used in this study) can be used to learn a transformation matrix for each camera in advance, which saves much of the time and effort of preparing training sets for new cameras (e.g., manually labeling numerous images under various environments and illuminants for each camera).
IV Implementations and experiments
In this section, we first measure the extent to which CSS affects inter-CC performance, using synthetic Mondrian and hyperspectral natural images. Then, the proposed method is comprehensively compared and validated on hyperspectral images and real camera images. Finally, we present several examples showing how the proposed technique can produce a stable color representation of an image that is invariant to both the illuminant and the CSS.
We select the early committee-based CC model (CBCC) as a representative learning-based CC model to demonstrate the CSS effect. There are two main reasons for choosing CBCC. On the one hand, CBCC can be regarded as the earliest prototype of regression-based CC models, so many existing state-of-the-art learning-based CC models fit into its framework [7, 8, 10, 23, 45, 51, 52]. On the other hand, recent work [7, 8, 57] has indicated that, with a much simpler implementation, CBCC can achieve performance quite competitive with more sophisticated learning-based CC methods when combined with more effective features (e.g., the color edge moments used in CM). In our implementation, the outputs of the grey-world and grey-edge models were integrated into the CBCC framework to train the regression model in the LMS sense. The “cross terms” employed in CM, which have been shown to be very important for delivering the best CC results, were also used in our implementation.
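A stripped-down sketch of this committee-based setup follows. Grey-world and first-order grey-edge estimates (Minkowski norm p = 1, no Gaussian pre-smoothing, a simplification) are stacked into a 6-D feature vector, and a 3 × 6 matrix is fitted in the LMS sense. The CM cross terms are omitted, and all function names are hypothetical; this illustrates committee-based regression, not the implementation used in the experiments.

```python
import numpy as np

def _unit(v):
    return v / np.linalg.norm(v)

def grey_world(img):
    """Mean RGB of an (H, W, 3) linear image, unit-normalized."""
    return _unit(img.reshape(-1, 3).mean(axis=0))

def grey_edge(img):
    """First-order grey-edge (p = 1, no pre-smoothing): mean gradient magnitude per channel."""
    gy, gx = np.gradient(img, axis=(0, 1))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return _unit(mag.reshape(-1, 3).mean(axis=0))

def committee_features(img):
    # 6-D feature: the two low-level illuminant estimates side by side
    return np.concatenate([grey_world(img), grey_edge(img)])

def train_cbcc(images, illums):
    """Fit W (3 x 6) so that W @ features ≈ ground-truth illuminant, least squares over all images.
    illums: (N, 3) array of ground-truth illuminants."""
    F = np.stack([committee_features(im) for im in images])  # (N, 6)
    Wt, *_ = np.linalg.lstsq(F, illums, rcond=None)          # (6, 3)
    return Wt.T

def predict_cbcc(W, img):
    return _unit(W @ committee_features(img))
```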
We also implement CM and the spectral sharpening (SS) techniques [29, 58] and test their inter-CC performance for comparison. CM is one of the state-of-the-art CC algorithms; it improves CC performance by learning a fixed matrix that corrects the biased illuminant estimates of low-level CC algorithms. In our implementation, we use the 9-edge-moment version of CM. The purpose of comparing with CM is to examine whether it is preferable to correct for the CSS before applying a CC algorithm (proposed) or to correct the illuminant estimates after applying CC algorithms (CM).
Spectral sharpening was originally proposed to sharpen a CSS so that the sensitivity of each sharpened sensor is concentrated as much as possible within a narrow band of wavelengths. We hypothesized that the sharpened CSSs would be more similar to each other and could thus improve the inter-CC performance of CBCC. Specifically, for a dataset rendered under CSS-1, we first convert the original sensors to their sharpened version via a matrix multiplication and then train CBCC on the sharpened CSS-1 data. Then, given a second dataset rendered under CSS-2, we also apply spectral sharpening to the sensors of CSS-2 and finally test the CBCC learned on CSS-1 on the sharpened images of CSS-2.
IV-A Validation on Mondrian-like images
In this experiment, we used a dataset of 1995 reflectance spectra and 102 light source spectra compiled from several sources to generate Mondrian-like images. These reflectance and light source spectra were carefully collected in both man-made and natural environments. To study the CSS effect, we used a recently published database [16, 15] that includes 12 sensors with various CSSs, ranging from consumer-level cameras to industrial cameras (several CSSs are shown in Fig. 2). Moreover, the CIE color matching functions and the standard sRGB functions are also included as additional CSSs when testing the CSS effect on CC.
We generated the Mondrian-like datasets according to the image formation model of Eq (1). Specifically, a reflectance spectrum and an illuminant spectrum are first randomly selected and then integrated with the CSS over the visible spectrum to obtain R, G, B values. In practice, all spectral curves are sampled and represented as vectors. Using these generated pixel colors, Mondrian-like images are then created; each image contains a random number (up to several tens) of different surfaces and hence many color transitions. Brightness variations modulated by Gabor gratings are also added to each Mondrian-like image to simulate the luminance variations of real-world images more closely [10, 31, 40, 61]. Several examples are shown in Fig. 6 (a).
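The generation procedure can be sketched as follows. This is a hypothetical generator written for illustration: the rectangle-placement rule, the patch count, and the windowed cosine grating (standing in for the Gabor-grating brightness variation) are our own assumptions, not the exact recipe used to build the datasets.

```python
import numpy as np

def render_mondrian(refl_lib, illum, css, size=300, n_patches=30, seed=0):
    """Hypothetical Mondrian-like image generator.

    refl_lib: (M, n_wl) reflectance spectra; illum: (n_wl,) illuminant spectrum;
    css: (3, n_wl) camera spectral sensitivities, all sampled on the same wavelengths.
    Returns an (size, size, 3) linear RGB image and the illuminant's RGB (ground truth).
    """
    rng = np.random.default_rng(seed)
    # Discretized Eq (1) for each randomly chosen surface (constant wavelength step omitted).
    idx = rng.choice(len(refl_lib), size=n_patches, replace=False)
    patch_rgb = (refl_lib[idx] * illum) @ css.T                 # (n_patches, 3)

    img = np.empty((size, size, 3))
    img[:] = patch_rgb[0]                                       # background surface
    for rgb in patch_rgb[1:]:                                   # overlapping rectangles
        h, w = rng.integers(size // 10, size // 3, size=2)
        y, x = rng.integers(0, size - h), rng.integers(0, size - w)
        img[y:y + h, x:x + w] = rgb

    # Smooth brightness modulation: a Gaussian-windowed cosine grating, a stand-in for the
    # Gabor-grating variation described in the text (frequency and contrast are arbitrary).
    yy, xx = np.mgrid[0:size, 0:size] / size
    gabor = np.exp(-((xx - 0.5) ** 2 + (yy - 0.5) ** 2) / 0.5) * np.cos(2 * np.pi * 3 * xx)
    img *= (1.0 + 0.3 * gabor)[..., None]

    e_rgb = css @ illum                                         # Eq (3), same discretization
    return img, e_rgb
```

Rendering the same scene (same `seed` and reflectance library) with different `css` arrays yields datasets that differ only in the CSS, which is the property exploited in the experiments below.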
In total, we synthesized 14 Mondrian datasets, one for each of the 14 CSSs (the 12 camera sensors plus the CIE and sRGB functions). Each dataset contains 510 Mondrian-like images (300 × 300 pixels) composed of randomly chosen illuminants and reflectances according to the rules above, and hence simulates multiple scenes with diverse illuminants. Thus, each dataset alone can be used to test intra-CC performance.
Moreover, across the 14 Mondrian datasets, the scenes are exactly the same for all cameras but are rendered with distinct CSSs. In other words, the only difference among the datasets is the CSS. Therefore, any pair of Mondrian datasets can be used to measure precisely the CSS effect on inter-CC performance. For example, we trained CBCC on one Mondrian dataset rendered under a certain CSS (e.g., CANON 5D Mark) and then tested the trained CBCC on another Mondrian dataset rendered under a different CSS (e.g., NIKON D70).
Fig. 7 summarizes the averaged intra-dataset cross-validation performance of CBCC on the various Mondrian datasets (each dataset is named after the corresponding camera). Performance is measured by the angular error, the usual accuracy measure for CC, which quantifies the chromatic difference between the illuminant ground truth and the illuminant estimated by the CC algorithm.
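For reference, the standard angular error between a ground-truth illuminant $\mathbf{e}$ and an estimate $\hat{\mathbf{e}}$ is $\arccos\big(\mathbf{e}\cdot\hat{\mathbf{e}} / (\|\mathbf{e}\|\,\|\hat{\mathbf{e}}\|)\big)$, usually reported in degrees. A small NumPy version (the function name is our own) that ignores the overall scale of both vectors:

```python
import numpy as np

def angular_error_deg(e_true, e_est):
    """Angle in degrees between ground-truth and estimated illuminant RGB vectors.
    Accepts single vectors of shape (3,) or batches of shape (N, 3); ignores scale."""
    e_true = np.asarray(e_true, dtype=float)
    e_est = np.asarray(e_est, dtype=float)
    cos = np.sum(e_true * e_est, axis=-1) / (
        np.linalg.norm(e_true, axis=-1) * np.linalg.norm(e_est, axis=-1))
    # Clip guards against |cos| slightly above 1 from floating-point round-off.
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print(angular_error_deg([1, 0, 0], [0, 1, 0]))   # orthogonal vectors: 90 degrees
```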
Fig. 7 indicates that CBCC obtains very good intra-CC performance on all the Mondrian datasets tested here. However, as discussed earlier, such an evaluation only benchmarks a CC method fairly in the presence of multiple scenes with diverse illuminants. In contrast, under inter-dataset evaluation, CBCC suffers a serious performance drop due to the CSS effect.
From the evaluation on these 14 Mondrian datasets, we did not observe any CSS that is inherently more favorable for CC than the others. For inter-dataset evaluation, the angular errors in Fig. 7 show that the more similar the two CSSs are (e.g., CANON 5D Mark and CANON 5D in Fig. 7(e)), the smaller the impact of CSS on the accuracy of CBCC. In contrast, the more distinct the two CSSs are (e.g., CANON 5D Mark and SONY DXC 930 in Fig. 7(a)), the worse the original inter-dataset CBCC performs. In short, the performance of traditional inter-CC relies heavily on the similarity of CSSs among cameras. For example, when a model trained on images captured with a very distinct CSS is applied to a dataset (Fig. 7(c)), very poor inter-CC performance results. Note that for real camera images, the degradation caused by CSS is more complicated than in the Mondrian case; we discuss this point further in the following experiments with real camera images.
IV-B Validation on Hyperspectral images
While synthesizing Mondrian images according to Eq (1) gives a reasonably accurate first-order model of image formation [8, 62], it cannot model other reflection effects (e.g., specular components) and thus may not reflect the real color image formation process. Therefore, we also used hyperspectral natural images to measure the CSS effect on inter-CC performance. For this experiment, we used a dataset of 77 high-quality hyperspectral images (1392 × 1040 pixels) acquired in real indoor and outdoor scenes.
Fig. 6(b) shows several examples of the hyperspectral images employed in this experiment. A subset of the aforementioned light source spectra, containing 11 illuminants including both Planckian and non-Planckian ones, and the 14 CSSs mentioned above were used to produce 14 hyperspectral datasets. Each hyperspectral dataset contains 847 images (77 scenes × 11 illuminants), rendering the natural scenes under multiple illuminants for investigating the color appearance of real-world scenes. As with the Mondrian scenes, the 14 hyperspectral datasets have identical distributions of reflectance and illuminant and differ only in the CSS used for rendering.
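Rendering each hyperspectral dataset amounts to evaluating a discretized Eq (1) at every pixel of each cube. A minimal sketch, assuming the reflectance cubes, illuminant spectra, and CSS curves have already been resampled to a common wavelength grid (the function names are hypothetical):

```python
import numpy as np

def render_hyperspectral(cube, illum, css):
    """Render a spectral reflectance cube to linear RGB with one illuminant and one CSS.

    cube: (H, W, n_wl) reflectance; illum: (n_wl,); css: (3, n_wl).
    Discretized Eq (1) per pixel, with the constant wavelength step omitted.
    """
    radiance = cube * illum                                  # illuminant broadcast over pixels
    return np.tensordot(radiance, css, axes=([2], [1]))      # sum over wavelength -> (H, W, 3)

def render_dataset(cubes, illums, css):
    """All scene x illuminant combinations for one CSS (77 x 11 = 847 images in the text)."""
    return [render_hyperspectral(c, e, css) for c in cubes for e in illums]
```

Calling `render_dataset` once per CSS with the same cubes and illuminants yields datasets that differ only in the CSS.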
Fig. 8 shows the results of CBCC under both intra- and inter-dataset evaluation. These results are consistent with those obtained on the synthetic Mondrian datasets (Fig. 7). For comparison with Fig. 8, Fig. 9 shows inter-dataset results for CM and SS. Like CBCC, the state-of-the-art CM and the earlier SS also suffer drastic performance drops under inter-dataset evaluation. CM fails to perform well under inter-dataset evaluation because, like CBCC, it learns a fixed matrix for a specific camera; in other words, CM cannot adapt to other cameras. For SS, although we hypothesized that the sharpened CSSs would be more similar to each other and would thus improve inter-dataset CBCC, its actual performance is surprisingly poor: we observed that the sharpened CSSs contain many negative values, which may result in large angular errors for inter-dataset CC. This problem may be alleviated by spectral sharpening with positivity.
IV-C Improving the inter-dataset-based CC
In the above sections, we have systematically analyzed the CSS effect on the color appearance of images (Fig. 1(b)) and on the chromatic distribution of light source spectra (Fig. 2), and shown how a CSS can significantly affect the performance of a learning-based CC algorithm under inter-dataset evaluation (Fig. 7, Fig. 8, and Fig. 9). In this section, we show the inter-CC performance achieved by the proposed method. The improved performance of CBCC on the Mondrian and hyperspectral datasets is shown in Fig. 7, Fig. 8, and Fig. 9. The inter-dataset performance of CBCC is greatly boosted by the proposed strategy, which accounts for the CSS effect during model training. Remarkably, with the proposed strategy, the inter-dataset performance of CBCC almost reaches the level of its intra-dataset performance.
IV-D More general situations
We have shown how the proposed method improves inter-CC on datasets that have exactly the same reflectance distributions but distinct CSSs (Fig. 7, Fig. 8, and Fig. 9). In practice, however, the training set generally differs from the test set not only in CSS but also in reflectance distribution (e.g., the training dataset is captured in an indoor environment, while the test dataset is collected in an outdoor natural environment). In this section, we measure the CSS effect in this more general situation.
Specifically, we trained the CBCC model on the Mondrian-like dataset and then tested it on the hyperspectral natural dataset. We also trained CBCC on the hyperspectral dataset and tested it on a real RGB dataset [14, 65] (SFU lab dataset) captured by a SONY DXC 930. In these experiments, we evaluated CBCC in four situations: intra-dataset CC (intra-CBCC), inter-dataset CC with the same CSSs (labeled inter-CBCC-s), inter-dataset CC with distinct CSSs (labeled inter-CBCC-d), and finally the improved model (proposed). Fig. 10 shows the results of intra-dataset CBCC on the Mondrian dataset and of inter-dataset CBCC trained on the Mondrian set and tested on the hyperspectral dataset. Similarly, Fig. 11 shows the results of intra-dataset CBCC on the hyperspectral dataset and of inter-dataset CBCC trained on the hyperspectral set and tested on the real SFU lab dataset.
In general, the results for each situation are as follows. As expected, CBCC obtains very good intra-dataset performance on all datasets, whatever the CSS, illuminants, and scenes. For inter-CC, intuitively, the greater the difference between the training and test datasets, the worse CBCC performs. For example, the performance of inter-dataset CBCC with the same CSSs (inter-CBCC-s) is worse than its intra-CC performance (intra-CBCC), since the training and test datasets have different reflectance distributions (e.g., the scene structure of the hyperspectral dataset is very different from that of the SFU lab dataset). The worst performance is obtained for inter-CC with distinct CSSs (inter-CBCC-d), because the two datasets differ greatly not only in reflectance distribution but also in CSS. This experiment again demonstrates the adverse effect of CSS on the performance of learning-based CC algorithms in inter-CC applications.
It should be noted that, unlike in Fig. 7, Fig. 8, and Fig. 9, the improved performance of CBCC on inter-CC with different reflectance distributions does not reach the level of intra-CC. This is mainly due to the large difference in reflectance distribution between the two datasets evaluated here (e.g., between the natural scenes of the hyperspectral dataset and the laboratory scenes of the SFU lab dataset). Nevertheless, the performance of CBCC on inter-CC with different CSSs (inter-CBCC-d) is greatly improved by the proposed strategy.
IV-E Validation on real camera captured images
All the above experiments used synthetic or multispectral images, which ignore all the non-linearities that occur in digital camera pipelines before the RAW image is saved. To validate the applicability of the proposed method in a real-world scenario, we finally tested it on the NUS dataset, which includes 1853 high-quality linear images taken by 9 different cameras in real environments. Moreover, this dataset contains images of the same scenes taken with the different cameras, so that the scene and illumination are the same for all 9 cameras. This makes the dataset suitable for investigating the CSS effect on inter-CC with real cameras. One example of the same scene taken by 5 different cameras is shown in Fig. 6(c). For unbiased evaluation, we masked out the color checker, which was originally used to compute the ground-truth illuminant of each image. Since SS performed very poorly in the previous experiments (Fig. 9), we compared the improved results only for CBCC and CM.
Similar to what we reported on the synthesized Mondrian-like and hyperspectral images, Fig. 12 shows the results on several NUS subsets captured with various cameras. As expected, CBCC and CM perform very well in intra-dataset-based evaluation. However, once applied in the inter-CC setup, both methods degrade clearly because of the CSS effect, and in general, the larger the difference between the CSSs, the worse the inter-CC performance of CBCC and CM.
It should be noted that all the cameras used to capture the NUS dataset have very similar CSSs, since they were specifically chosen to satisfy the so-called Luther condition for CSS approximation. Hence, the results also indicate that even such small differences among CSSs (cameras) lead to a drastic performance decrease for existing learning-based CC models in inter-dataset-based evaluation. These experiments on real-camera data therefore further support our theoretical analysis in Subsection II-C. In short, to develop a CC algorithm with good generalization ability, we need to properly account for the CSS effect.
We noticed that even with the quite similar CSSs of two NUS cameras, the inter-CC performance of CBCC and CM is still greatly degraded compared with the intra-CC performance (e.g., Fig. 12(e)). Although we have not yet found a definitive explanation, we speculate, based on Eq. (1), that besides the CSS other mixed factors determine the color appearance of a captured image; consequently, inter-CC performance should also be influenced at least by the distributions of illuminants and surface reflectances. Take Fig. 12(a) as an example: if we swap the training and test sets, the performance of inter-CBCC varies markedly, as indicated by the green bars on the left and right parts of Fig. 12(a). Similar observations hold for the inter-CC evaluations on the Mondrian and hyperspectral datasets (Figs. 7-11). This means that, besides the difference between the CSSs, inter-CBCC performance is strongly affected by the difference between the training and test sets, which may partially explain the poor inter-CC performance across two datasets even when their CSSs are visually similar. In fact, this phenomenon further emphasizes the difficulty of color constancy in real applications, where the difference between the inherent features of training and test images cannot be controlled.
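To make the interplay of these factors concrete, the sketch below assumes that Eq. (1) is the usual Lambertian image formation model, in which each channel response is the integral over wavelength of illuminant × reflectance × sensitivity, and discretizes it on a 10 nm grid. The Gaussian sensitivity curves, the illuminant, and the reflectance are made up for illustration; they are not the CSSs or spectra used in our experiments, and the function names are hypothetical.

```python
import numpy as np

# Wavelength grid: 400-700 nm in 10 nm steps (31 samples).
WAVELENGTHS = np.arange(400.0, 701.0, 10.0)
STEP = 10.0


def gaussian_css(peaks, width):
    """Hypothetical camera: one Gaussian sensitivity curve per channel (R, G, B).

    Returns an N x 3 array, N being the number of wavelength samples.
    """
    return np.stack(
        [np.exp(-0.5 * ((WAVELENGTHS - p) / width) ** 2) for p in peaks], axis=1
    )


def camera_response(illuminant, reflectance, css):
    """Discretized model: rho_k = sum over lambda of E * S * C_k * d(lambda)."""
    return (illuminant * reflectance) @ css * STEP


def chromaticity(rgb):
    """Normalize an RGB triplet so its components sum to 1 (removes intensity)."""
    return rgb / rgb.sum()


css_a = gaussian_css(peaks=(600.0, 540.0, 460.0), width=30.0)  # hypothetical camera A
css_b = gaussian_css(peaks=(610.0, 530.0, 450.0), width=40.0)  # hypothetical camera B

illuminant = np.linspace(1.2, 0.8, WAVELENGTHS.size)   # made-up bluish light
reflectance = np.linspace(0.2, 0.7, WAVELENGTHS.size)  # made-up reddish surface
white = np.ones(WAVELENGTHS.size)                       # perfect white reflector

# Same surface, same light, two cameras: the recorded colors differ ...
surface_a = chromaticity(camera_response(illuminant, reflectance, css_a))
surface_b = chromaticity(camera_response(illuminant, reflectance, css_b))
# ... and so does the "ground-truth" illuminant, i.e. the response to white.
light_a = chromaticity(camera_response(illuminant, white, css_a))
light_b = chromaticity(camera_response(illuminant, white, css_b))
print(surface_a, surface_b)
print(light_a, light_b)
```

Changing `illuminant` or `reflectance` while keeping the CSS fixed moves the responses as well, which is the sense in which the illuminant and reflectance distributions of the training and test sets also matter for inter-CC.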
To further determine whether the proposed method significantly improves the inter-CC performance of CBCC and CM, we used the Wilcoxon Signed-Rank Test (WST) to measure the performance difference between plain inter-CC (i.e., inter-CBCC and inter-CM) and our method. WST has been recommended as a valuable tool for performance evaluation [3, 31, 67]. In this study, for any two algorithms, the WST was run on their angular error distributions over the whole dataset, and its result was used to conclude, at a specified confidence level (e.g., 95%), whether the angular error of one algorithm tends to be lower or higher than that of the other, or whether there is no significant difference between them.
Fig. 13 reports the results of the statistical significance test using WST on all cameras. A sign () at location (i, j) indicates that the average angular error of method i is significantly lower than that of method j at the 95% confidence level, and a sign () at (i, j) indicates the opposite situation. A sign () means that the average angular errors of the two methods have no significant difference. Fig. 13 indicates that the proposed method significantly improves over both CBCC and CM in most inter-CC evaluation situations.
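The decision rule behind such a significance matrix can be sketched in a few lines. The code below is a self-contained two-sided Wilcoxon signed-rank test on paired per-image angular errors, using the large-sample normal approximation without a tie-variance correction; it is only indicative for small test sets and is not necessarily the exact implementation behind Fig. 13. The '+', '-', and '=' labels and the function names are our own hypothetical choices.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test on paired angular errors.

    Zero differences are dropped; ranks of tied |differences| are averaged.
    Returns (w_plus, w_minus, p_value) using the normal approximation.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block while the next |difference| ties with the current one.
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mu) / sigma  # z <= 0 by construction
    p_value = min(1.0, math.erfc(-z / math.sqrt(2)))  # equals 2 * Phi(z)
    return w_plus, w_minus, p_value


def significance_sign(errors_i, errors_j, alpha=0.05):
    """'+' if method i has significantly lower errors than j, '-' if higher, '=' otherwise."""
    w_plus, w_minus, p_value = wilcoxon_signed_rank(errors_i, errors_j)
    if p_value >= alpha:
        return "="
    # Differences are errors_i - errors_j, so a small w_plus means i is usually lower.
    return "+" if w_plus < w_minus else "-"
```

Filling entry (i, j) of a matrix with `significance_sign(errors[i], errors[j])` for every pair of methods gives a table of the same kind as Fig. 13; an established statistics library can replace the hand-written test when exact small-sample p-values are needed.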
IV-F Stable color representation of images
In previous sections, we have shown that incorporating CSS information improves inter-CC. Here, we show examples of the resulting improvement in image appearance when the CSS is further considered after CC. Even though the two images in Fig. 1(b) have been corrected by CC, they still exhibit obviously different color appearance due to the difference between their CSSs. To eliminate the chromatic difference induced by the CSSs, the original color-biased images taken with the two CSSs were first each corrected by a CC algorithm (here the grey-edge method), and then the learned matrix was applied to adapt the corrected image rendered under one CSS (e.g., KODAK DCS 460) to the other CSS (e.g., CANON 5D Mark 2).
Fig. 14 shows examples selected from the two hyperspectral datasets [63, 68]. The color casts on the image pairs caused by the different CSSs and illuminants are clearly removed after applying a CC algorithm and then eliminating the CSS effect. For example, Fig. 14(a) shows an obvious chromatic difference between the two images, since they are rendered under distinct illuminants and CSSs. Applying CC to these two images removes the color bias induced by the illuminants (Fig. 14(b)), but the images still present a significant chromatic difference because they are rendered under two different CSSs. This remaining difference is further eliminated after removing the CSS effect with the proposed method (Fig. 14(c)).
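The two-step correction used for Fig. 14 can be sketched as follows. This is a simplified illustration, not our exact implementation: the grey-edge estimate is the first-order variant with a Minkowski norm p but without the Gaussian pre-smoothing of the original method, the correction is a diagonal (von Kries) scaling, and the 3×3 CSS adaptation matrix is assumed to be given (for instance, fitted beforehand from the two cameras' sensitivities). The function names and the matrix values are hypothetical.

```python
import numpy as np


def grey_edge(image, p=6):
    """First-order grey-edge illuminant estimate without Gaussian pre-smoothing.

    image: H x W x 3 array of linear RGB values. Returns a unit-norm RGB vector.
    """
    estimate = np.empty(3)
    for c in range(3):
        gy, gx = np.gradient(image[..., c])
        magnitude = np.hypot(gx, gy)
        # Minkowski p-norm of the gradient magnitudes in this channel.
        estimate[c] = np.mean(magnitude ** p) ** (1.0 / p)
    return estimate / np.linalg.norm(estimate)


def diagonal_correct(image, illuminant):
    """Von Kries correction: scale each channel so the estimated illuminant maps to grey."""
    gains = illuminant.mean() / illuminant
    return image * gains


def adapt_css(image, matrix):
    """Map every pixel from the source camera's RGB space: rgb_dst = matrix @ rgb_src."""
    return image @ matrix.T


# Synthetic test scene: an achromatic texture under a strongly colored light.
rng = np.random.default_rng(0)
texture = rng.uniform(0.1, 0.9, size=(32, 32, 1))
light = np.array([0.9, 0.6, 0.3])
biased = texture * light

corrected = diagonal_correct(biased, grey_edge(biased))
# Hypothetical source-to-target CSS matrix; in practice it would be learned.
css_matrix = np.array([[1.05, -0.03, 0.00],
                       [0.02, 0.97, 0.04],
                       [0.00, 0.05, 0.92]])
adapted = adapt_css(corrected, css_matrix)
```

Applying the matrix after the diagonal correction follows the order described above (CC first, then CSS adaptation); for an achromatic scene the grey-edge estimate recovers the light exactly, so `corrected` is neutral before the CSS mapping is applied.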
It should be noted that several CC methods impose stability on the resulting color-corrected images through specific constraints. For example, Sapiro uses the probabilistic Hough transform to introduce physical constraints that the corrected solution should satisfy; Finlayson et al. select, from a predefined subset, the illuminant that best correlates with a set of given reflectances; and Vazquez-Corral et al. select the illuminant that maximizes the number of psychophysically based focal colors present in the corrected scene. However, the main aim of these constraints is to improve the accuracy of illuminant estimation rather than to explicitly remove the color bias triggered by CSS. In other words, the results of these algorithms are still CSS dependent.
V Discussion and Conclusion
In this paper we demonstrated that CSS is an important factor that causes serious degradation of learning-based CC algorithms in inter-dataset-based applications, because both the illuminant and the image are actually color-domain (camera) dependent. This means that existing CC algorithms only build limited color correction models that are suitable for a specific camera. Thus, once a trained CC model is applied to images collected by a camera with a distinct CSS, the model will suffer serious performance degradation. We have grounded our theoretical analysis in experiments on various datasets.
We also proposed a simple yet effective way to incorporate CSS information into the training process through an adapted matrix that builds a mapping between two CSSs. As a direct consequence of this strategy, the CSS effect can be modeled when building a CC model. Comprehensive experiments on synthetic, hyperspectral, and real-camera datasets have shown that the embedded CSS information can greatly improve the performance of learning-based CC algorithms in inter-dataset-based evaluation. Many existing state-of-the-art learning-based CC methods could benefit from the proposed technique to improve their generalization ability across multiple cameras.
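One way to realize such an adapted matrix is sketched below, under the assumption (made here for illustration) that it is the least-squares 3×3 mapping between the two sampled CSS matrices. Because camera responses are linear in the sensitivities, a matrix fitted on the CSSs also maps the recorded RGBs, exactly when the target CSS lies in the span of the source CSS and approximately otherwise. Ground-truth illuminants (and, analogously, linear image features) from a source camera can then be re-expressed in the target camera's space before training. The sensitivities, labels, and function names are hypothetical, and the training step itself is omitted.

```python
import numpy as np


def fit_css_matrix(css_src, css_dst):
    """Least-squares 3x3 matrix M such that css_dst ≈ css_src @ M.T.

    css_src, css_dst: N x 3 sensitivities sampled at the same N wavelengths.
    Since rgb = css.T @ stimulus, this gives rgb_dst ≈ M @ rgb_src.
    """
    solution, *_ = np.linalg.lstsq(css_src, css_dst, rcond=None)
    return solution.T


def adapt_illuminants(illuminants_src, matrix):
    """Map K x 3 source-camera illuminant RGBs to the target camera (unit norm)."""
    mapped = illuminants_src @ matrix.T
    return mapped / np.linalg.norm(mapped, axis=1, keepdims=True)


# Hypothetical sensitivities sampled at 31 wavelengths (e.g., 400-700 nm in 10 nm steps).
rng = np.random.default_rng(1)
css_src = rng.uniform(0.0, 1.0, size=(31, 3))
css_dst = css_src @ np.array([[1.00, 0.10, 0.00],
                              [0.05, 0.90, 0.10],
                              [0.00, 0.20, 1.10]]) + rng.normal(0.0, 0.01, size=(31, 3))

matrix = fit_css_matrix(css_src, css_dst)
train_illuminants_src = np.array([[0.6, 0.7, 0.4], [0.4, 0.6, 0.7]])  # made-up labels
train_illuminants_dst = adapt_illuminants(train_illuminants_src, matrix)
```

In the camera-network scenario discussed below, labels collected with one camera would be mapped in this way once per target camera, so only the CSSs, not new labeled images, are needed for each additional camera.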
As a practical application, we have also shown how a stable color representation of a scene across changes of illuminants and CSSs can be obtained with the proposed method. This is an important step towards normalizing color appearance across acquisitions with different devices.
The results presented in this work could prove very useful for many color-based applications. Imagine a surveillance problem involving a large-scale camera network for which we want to build a CC model. In a traditional implementation, we would first have to capture a large group of images under diverse illuminants for each camera, and then train a separate CC model for each camera using the images and illuminants manually labeled for that camera. This is impractical for a large-scale camera network, since manually labeling many images under various illuminants for each camera is expensive and time-consuming. With the solution proposed in this work, however, it is enough to collect labeled images under various illuminants with only one camera and to know the CSSs of the other cameras: since a CC algorithm can be adapted between any two cameras by taking their CSSs into account, a CC model can then be trained for any camera in the network.
Moreover, the images recovered by the proposed strategy (Fig. 14) are almost truly color constant, since they are independent of both the CSS and the illuminant. This would be very useful for further computer vision tasks that need an accurate color characterization of object materials that is invariant to specific device characteristics (e.g., intrinsic image decomposition [26, 25]). Furthermore, we believe that for some widely used sensors, such stable corrections could even be incorporated into standard libraries for use in any application that relies on accurately matching surface appearance across different images.
In this work, we assume that the CSSs used to calculate the adapted matrix between two cameras are already known. This is practical, since such information is publicly available for many camera sensors (e.g., http://www.dxomark.com/), or it can be estimated with existing CSS estimation algorithms [16, 15, 69, 70, 66].
Another point that deserves a brief comment is that all the analysis in this work holds when we are dealing with RAW images. If the images are rendered to sRGB and post-processed with nonlinear transformations built into the camera, the inter-dataset-based CC problem becomes more difficult, since we would need to learn a transformation for each particular set of camera parameters (e.g., style look).
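A small illustration of why the rendering nonlinearity matters: the sketch below implements the standard sRGB encoding curve of IEC 61966-2-1 and shows that a diagonal (von Kries) gain applied to linear RAW values does not correspond to the same gain applied to sRGB-encoded values, so the linear relations exploited in this work no longer hold exactly. Real camera pipelines add further camera-specific nonlinearities (tone curves, style looks) that are not modeled here; the pixel value and gain are illustrative.

```python
def srgb_encode(linear):
    """IEC 61966-2-1 sRGB transfer function: linear value in [0, 1] -> encoded value."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def srgb_decode(encoded):
    """Inverse sRGB transfer function: encoded value in [0, 1] -> linear value."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4


gain = 0.5      # a von Kries gain for one channel (illustrative)
linear = 0.2    # linear RAW value of one pixel in that channel (illustrative)

correct_in_raw = srgb_encode(gain * linear)   # correct in linear space, then encode
correct_in_srgb = gain * srgb_encode(linear)  # naive: apply the same gain after encoding
print(correct_in_raw, correct_in_srgb)
```

The two results differ noticeably, so a correction learned on linear data cannot simply be reused on rendered images; decoding with `srgb_decode` undoes only this standard curve, not any additional in-camera processing.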
This work also raises the open question of how to build a general CC model that could be applied to any camera and situation. Our proposed method is effective at handling inter-CC failures caused by different CSSs. However, during the experiments we also observed that other factors (e.g., scene content) affect inter-CC performance. It is therefore very important to exploit other useful information when building a CC model with higher efficiency and plasticity, which is our future work.
We would like to thank Dr. Dilip Kumar Prasad and Michael S. Brown for providing the CSSs of the NUS dataset, and Dr. Javier Vazquez-Corral for sharing the code of spectral sharpening. We also thank the anonymous reviewers for their critical comments on the original version of our manuscript.
-  M. Vrhel, E. Saber, and H. J. Trussell, “Color image generation and display technologies,” IEEE Signal Processing Magazine, vol. 22, no. 1, 2005.
-  B. Funt, K. Barnard, and L. Martin, “Is machine colour constancy good enough?” in Computer Vision ECCV’98. Springer, 1998, pp. 445–459.
-  A. Gijsenij, T. Gevers, and J. Van De Weijer, “Computational color constancy: Survey and experiments,” Image Processing, IEEE Transactions on, vol. 20, no. 9, pp. 2475–2489, 2011.
-  D. H. Foster, “Color constancy,” Vision research, vol. 51, no. 7, pp. 674–700, 2011.
-  S. Gao, K. Yang, C. Li, and Y. Li, “A color constancy model with double-opponency mechanisms,” in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 929–936.
-  S. D. Hordley, “Scene illuminant estimation: past, present, and future,” Color Research and Application, vol. 31, no. 4, pp. 303–314, 2006.
-  D. Cheng, B. Price, S. Cohen, and M. S. Brown, “Effective learning-based illuminant estimation using simple features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1000–1008.
-  G. D. Finlayson, “Corrected-moment illuminant estimation,” in Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013, pp. 1904–1911.
-  H. R. V. Joze and M. S. Drew, “Exemplar-based color constancy and multiple illumination,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, no. 5, pp. 860–873, 2014.
-  A. Gijsenij and T. Gevers, “Color constancy using natural image statistics and scene semantics,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 4, pp. 687–698, 2011.
-  S. Bianco and R. Schettini, “Adaptive color constancy using faces,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 36, no. 8, pp. 1505–1518, 2014.
-  S. Bianco, C. Cusano, and R. Schettini, “Color constancy using CNNs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 81–89.
-  B. Li, W. Xiong, W. Hu, B. Funt, and J. Xing, “Multi-cue illumination estimation via a tree-structured group joint sparse representation,” International Journal of Computer Vision, pp. 1–27, 2015.
-  K. Barnard, L. Martin, B. Funt, and A. Coath, “A data set for color research,” Color Research & Application, vol. 27, no. 3, pp. 147–151, 2002.
-  H. Zhao, R. Kawakami, R. T. Tan, and K. Ikeuchi, “Estimating basis functions for spectral sensitivity of digital cameras,” in Meeting on Image Recognition and Understanding, vol. 2009, no. 1, 2009.
-  R. Kawakami, H. Zhao, R. T. Tan, and K. Ikeuchi, “Camera spectral sensitivity and white balance estimation from sky images,” International Journal of Computer Vision, vol. 105, no. 3, pp. 187–204, 2013.
-  D. Cheng, D. K. Prasad, and M. S. Brown, “Illuminant estimation for color constancy: why spatial-domain methods work and the role of the color distribution,” JOSA A, vol. 31, no. 5, pp. 1049–1058, 2014.
-  F. Ciurea and B. Funt, “A large image database for color constancy research,” in Color and Imaging Conference, vol. 2003, no. 1. Society for Imaging Science and Technology, 2003, pp. 160–164.
-  P. V. Gehler, C. Rother, A. Blake, T. Minka, and T. Sharp, “Bayesian color constancy revisited,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
-  B. Funt and L. Shi, “The rehabilitation of MaxRGB,” in Color and Imaging Conference, vol. 2010, no. 1. Society for Imaging Science and Technology, 2010, pp. 256–259.
-  J. Vazquez-Corral, C. Párraga, R. Baldrich, and M. Vanrell, “Color constancy algorithms: Psychophysical evaluation on a new dataset,” Journal of Imaging Science and Technology, vol. 53, no. 3, pp. 31 105–1, 2009.
-  V. C. Cardei, B. Funt, and K. Barnard, “Estimating the scene illumination chromaticity by using a neural network,” JOSA A, vol. 19, no. 12, pp. 2374–2386, 2002.
-  B. Funt and W. Xiong, “Estimating illumination chromaticity via support vector regression,” in Color and Imaging Conference, vol. 2004, no. 1. Society for Imaging Science and Technology, 2004, pp. 47–52.
-  L. Shi, W. Xiong, and B. Funt, “Illumination estimation via thin-plate spline interpolation,” JOSA A, vol. 28, no. 5, pp. 940–948, 2011.
-  M. Serra, O. Penacchio, R. Benavente, M. Vanrell, and D. Samaras, “The photometry of intrinsic images,” in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014, pp. 1494–1501.
-  H. Barrow and J. Tenenbaum, “Recovering intrinsic scene characteristics from images,” 1978.
-  J. T. Barron and J. Malik, “Shape, illumination, and reflectance from shading,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 8, pp. 1670–1687, 2015.
-  G. D. Finlayson, M. S. Drew, and B. V. Funt, “Color constancy: generalized diagonal transforms suffice,” JOSA A, vol. 11, no. 11, pp. 3011–3019, 1994.
-  ——, “Spectral sharpening: sensor transformations for improved color constancy,” JOSA A, vol. 11, no. 5, pp. 1553–1563, 1994.
-  S. Gao, W. Han, K. Yang, C. Li, and Y. Li, “Efficient color constancy with local surface reflectance statistics,” in Computer Vision–ECCV 2014. Springer, 2014, pp. 158–173.
-  S.-B. Gao, K.-F. Yang, C.-Y. Li, and Y.-J. Li, “Color constancy using double-opponency,” IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 10, pp. 1973–1985, 2015.
-  K.-F. Yang, S.-B. Gao, Y.-J. Li, and Y. Li, “Efficient illuminant estimation for color constancy using grey pixels,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2254–2263.
-  M. Ebner, “Color constancy based on local space average color,” Machine Vision and Applications, vol. 20, no. 5, pp. 283–301, 2009.
-  G. D. Finlayson and S. D. Hordley, “Color constancy at a pixel,” JOSA A, vol. 18, no. 2, pp. 253–264, 2001.
-  G. Buchsbaum, “A spatial processor model for object colour perception,” journal of the Franklin institute, vol. 310, no. 1, pp. 1–26, 1980.
-  E. H. Land and J. McCann, “Lightness and retinex theory,” JOSA, vol. 61, no. 1, pp. 1–11, 1971.
-  J. Van De Weijer, T. Gevers, and A. Gijsenij, “Edge-based color constancy,” Image Processing, IEEE Transactions on, vol. 16, no. 9, pp. 2207–2214, 2007.
-  G. D. Finlayson and E. Trezzi, “Shades of gray and colour constancy,” in Color and Imaging Conference, vol. 2004, no. 1. Society for Imaging Science and Technology, 2004, pp. 37–41.
-  D. A. Forsyth, “A novel algorithm for color constancy,” International Journal of Computer Vision, vol. 5, no. 1, pp. 5–35, 1990.
-  A. Gijsenij, T. Gevers, and J. Van De Weijer, “Generalized gamut mapping using image derivative structures for color constancy,” International Journal of Computer Vision, vol. 86, no. 2-3, pp. 127–139, 2010.
-  G. D. Finlayson, S. D. Hordley, and P. M. Hubel, “Color by correlation: A simple, unifying framework for color constancy,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, no. 11, pp. 1209–1221, 2001.
-  D. H. Brainard and W. T. Freeman, “Bayesian color constancy,” JOSA A, vol. 14, no. 7, pp. 1393–1411, 1997.
-  A. Chakrabarti, K. Hirakawa, and T. Zickler, “Color constancy with spatio-spectral statistics,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 34, no. 8, pp. 1509–1519, 2012.
-  C. Rosenberg, A. Ladsariya, and T. Minka, “Bayesian color constancy with non-gaussian models,” in Advances in neural information processing systems, 2003, pp. 1–8.
-  V. Agarwal, A. V. Gribok, A. Koschan, M. Abidi et al., “Estimating illumination chromaticity via kernel regression,” in Image Processing, 2006 IEEE International Conference on. IEEE, 2006, pp. 981–984.
-  V. Agarwal, A. Gribok, A. Koschan, B. Abidi, and M. Abidi, “Illumination chromaticity estimation using linear learning methods,” Journal of Pattern Recognition Research, vol. 4, no. 1, pp. 92–109, 2009.
-  S. Bianco, G. Ciocca, C. Cusano, and R. Schettini, “Automatic color constancy algorithm selection and combination,” Pattern recognition, vol. 43, no. 3, pp. 695–705, 2010.
-  G. Schaefer, S. Hordley, and G. Finlayson, “A combined physical and statistical approach to colour constancy,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 148–153.
-  V. C. Cardei and B. Funt, “Committee-based color constancy,” in Color and Imaging Conference, vol. 1999, no. 1. Society for Imaging Science and Technology, 1999, pp. 311–313.
-  S. Bianco, F. Gasparini, and R. Schettini, “Consensus-based framework for illuminant chromaticity estimation,” Journal of Electronic Imaging, vol. 17, no. 2, pp. 023 013–023 013, 2008.
-  S. Bianco, G. Ciocca, C. Cusano, and R. Schettini, “Improving color constancy using indoor–outdoor image classification,” Image Processing, IEEE Transactions on, vol. 17, no. 12, pp. 2381–2392, 2008.
-  J. Van De Weijer, C. Schmid, and J. Verbeek, “Using high-level visual information for color constancy,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, 2007, pp. 1–8.
-  G. Sapiro, “Color and illuminant voting,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 11, pp. 1210–1215, 1999.
-  R. Lu, A. Gijsenij, T. Gevers, V. Nedović, D. Xu, and J.-M. Geusebroek, “Color constancy using 3D scene geometry,” in Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009, pp. 1749–1756.
-  J. Vazquez-Corral, M. Vanrell, R. Baldrich, and F. Tous, “Color constancy by category correlation,” Image Processing, IEEE Transactions on, vol. 21, no. 4, pp. 1997–2007, 2012.
-  P. M. Hubel, J. Holm, G. D. Finlayson, M. S. Drew et al., “Matrix calculations for digital photography.” in Color Imaging Conference, 1997, pp. 105–111.
-  N. Banic and S. Loncaric, “Color dog: Guiding the global illumination estimation to better accuracy,” in International Conference on Computer Vision Theory and Applications, vol. 6, 2015.
-  J. Vazquez-Corral and M. Bertalmío, “Spectral sharpening of color sensors: Diagonal color constancy and beyond,” Sensors, vol. 14, no. 3, pp. 3965–3985, 2014.
-  G. Wyszecki and W. S. Stiles, Color science. Wiley New York, 1982, vol. 8.
-  International Electrotechnical Commission, “Multimedia systems and equipment–colour measurement and management–part 2-1: Colour management–default RGB colour space–sRGB,” IEC 61966-2-1, Tech. Rep., 1999.
-  K. Barnard, V. Cardei, and B. Funt, “A comparison of computational color constancy algorithms. i: Methodology and experiments with synthesized data,” Image Processing, IEEE Transactions on, vol. 11, no. 9, pp. 972–984, 2002.
-  B. Wandell et al., “The synthesis and analysis of color images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, no. 1, pp. 2–13, 1987.
-  A. Chakrabarti and T. Zickler, “Statistics of real-world hyperspectral images,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 193–200.
-  M. S. Drew and G. D. Finlayson, “Spectral sharpening with positivity,” JOSA A, vol. 17, no. 8, pp. 1361–1370, 2000.
-  K. Barnard, L. Martin, A. Coath, and B. Funt, “A comparison of computational color constancy algorithms. ii. experiments with image data,” Image Processing, IEEE Transactions on, vol. 11, no. 9, pp. 985–996, 2002.
-  D. K. Prasad, R. Nguyen, and M. S. Brown, “Quick approximation of camera’s spectral response from casual lighting,” in Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on. IEEE, 2013, pp. 844–851.
-  S. D. Hordley and G. D. Finlayson, “Reevaluation of color constancy algorithm performance,” JOSA A, vol. 23, no. 5, pp. 1008–1020, 2006.
-  S. Nascimento, F. P. Ferreira, and D. H. Foster, “Statistics of spatial cone-excitation ratios in natural scenes,” JOSA A, vol. 19, no. 8, pp. 1484–1490, 2002.
-  J. Jiang, D. Liu, J. Gu, and S. Susstrunk, “What is the space of spectral sensitivity functions for digital color cameras?” in Applications of Computer Vision (WACV), 2013 IEEE Workshop on. IEEE, 2013, pp. 168–179.
-  K. Barnard and B. Funt, “Camera characterization for color research,” COLOR research and application, vol. 27, no. 3, pp. 153–164, 2002.