I Introduction
Ground based sky imaging (GBSI) systems have become very popular these days for the task of making cloud observations. For instance GBSI systems are employed extensively for predicting intermittency due to clouds in the field of intrahour solar power forecasting [1, 2, 3]. The Whole Sky Imager (WSI) developed by the Scripps Institution of Oceanography at the University of California, San Diego [4] and the Total Sky Imager (TSI) developed by Yankee Environmental Systems, Inc. [5] are two popular GBSI systems. We [3] like many other research groups around the world [5, 6, 7, 8] have developed our own GBSI system. But unlike most other GBSI systems which are stationary and use either an upward looking camera fitted with a fish eye lens or a camera looking down on to a curved mirror to obtain a complete view of the sky, we use a low cost camera fitted with a fish eye lens which tracks the sun. Higher resolution around the sun (which is preferred since we care about predicting solar power intermittency due to clouds) is one of the advantages of this set up. Since it tracks the sun the position of the sun on the image remains constant and this makes occluding the sun (to prevent saturation of the image) easier. Also the relative area occupied by this occlusion is small in comparison to other imaging systems which is another advantage of this set up.
Our main goal is to predict intermittency due to clouds for which cloud detection is an important step. One of the earliest works used an empirically derived fixed threshold on the red to blue channel (RB) ratio for cloud detection [9, 10, 4]. Another fixed thresholding scheme was suggested by SouzaEcher et al. where they apply a fixed threshold on the saturation component of the IHS colorspace [11]. Neto et al.[12] used multidimensional Euclidean geometric distance (EGD) and Bayesian methods to classify sky and cloud patterns based on the observation that sky and cloud patterns occupy different loci on the RGB colorspace. Recently Ghonima et al. [13] proposed the use of the difference in RB ratio between the pixel to be classified and the corresponding pixel in the clear sky library for cloud opacity measurements. They described a method to compensate for the variations in aerosol optical depth by using a haze correction factor derived with the help of a clear sky library. Yamashita et. al [14] and Li et al. [15] proposed the use of normalized blue to red channel (NBR) ratio (Yamashita et al. called it sky index) for cloud classification and detection. NBR ratio is defined as
where and
represents the blue and red channel intensities of the pixel respectively. Li et al. also proposed a hybrid algorithm combining fixed and adaptive thresholding schemes. They apply a threshold on the standard deviation of NBR ratio to decide whether to use a fixed thresholding scheme or an adaptive thresholding scheme. If the standard deviation of NBR ratio is below a threshold a fixed threshold determined statistically from training data is used for cloud detection. Otherwise an adaptive thresholding algorithm based on cross entropy minimization is applied to obtain the threshold for the image.
]]. In [16] we showed that NSV ratio could serve as a contrast enhancing feature suitable for adaptive thresholding. The NSV ratio is defined aswhere
is the value component and is the saturation component in the HSV colorspace. As discussed in [16, 15] and as depicted in the Fig. 1 there is considerable overlap between cloud and sky pixels in the various feature spaces. Please note that unlike in [16] the histograms shown here corresponds to a set of images used as a training set and not of a single image. These histograms depict the problem associated with a fixed thresholding scheme. Whatever be the threshold that is picked it will inevitably misclassify some pixels.
On the other hand in the case of adaptive thresholding it is only necessary to ensure that the classes do not overlap in the feature space on an image by image basis. In [16]
though we showed that the NSV ratio provides a strong separation between the two classes (cloud and sky) in the feature space for many images, we also mentioned that this is not always true. The dependency of the NSV ratio on the value component of the pixels (in the HSV space) causes the NSV ratio for some dark clouds to overlap with that of sky pixels. We also showed that an adaptive thresholding scheme like cross entropy minimization might pick a wrong threshold if the intraclass variance is high, which is a likely scenario in the case of sky images. Minimum cross entropy thresholding and Otsu’s thresholding can be viewed as a surface fitting problem, and the best fit may not always correspond to the best segmentation
[16].To overcome these issues, we propose a cloud detection scheme based on the CRF framework. We explain how this model addresses each of the issues mentioned above. Quantitative and qualitative results are presented, and possible extensions to this work are discussed towards the end.
Ii Proposed Method
Probabilistic graphical models like Markov random fields (MRF) and conditional random fields (CRF) have been extensively applied for the task of contextual image segmentation [17, 18, 19, 20, 21, 22, 23]. Li et al. [20]
proposed an MRF model for detecting thin clouds. Though MRF models have proven to be successful for computer vision tasks like segmentation and image denoising, they have some drawbacks. Being a generative model, MRF based approaches model the joint density
, where represents the input image data, where is the set of all sites. A site can correspond to a single pixel or a group of pixels (region). And , represents the class labels corresponding to the sites (y is often referred to as a label configuration) andis the set of all possible labels/classes. In the MAP (maximum a posteriori) MRF framework, Bayes’ rule is employed to derive the posterior probability of the labels given the data:
In many cases such as in the case of cloud detection x is always observed (i.e. during training as well as while performing inference or classifying) and hence it is not necessary to model which might not be a simple function. Another drawback of MRF modeling is that often for tractability the likelihood is assumed to have a factorized form, i.e. = . This assumption is too restrictive as complex relationships usually exist between data at neighboring sites. And finally the MRF model imposes label consistency (modeled as the prior ) uniformly over the entire image without taking into consideration the observed data.
CRF models, on the other hand, are discriminative models which directly model , the probability of a label configuration given the data. The CRF model therefore does not attempt to model the distribution of input data . The conditional independence assumption of the likelihood model given the labels is relaxed in the case of CRFs. CRFs also allow us to model datadependent label interactions as the clique potentials, or the interaction potentials are functions of both the labels and the data. This property, as we will see, plays a crucial role in cloud detection. Due to the abovementioned factors, CFRs outperform MRF models for various computer vision tasks including image segmentation [24, 22].
Iia CRF Model
As mentioned before, we combine a discriminative classifier and a higher order clique potential in a CRF framework. Again represents the input image data, where is the set of all sites. We divide the image into homogeneous regions using mean shift clustering [25], hence each region defines a site. The labels corresponding to the sites are , with where 0 is the label for sky and 1 is the label for cloud. Following the discriminative random field approach [22], assuming y to obey the Markov property conditioned on the data x (i.e., , where represents the neighborhood of site ), and using the Hammersley Clifford theorem [26] the posterior distribution of the label configuration is defined as
(1) 
where
(2) 
(3) 
and
(4) 
and represent the NBR ratio and NSV ratio at site , respectively; are the parameters that need to be estimated. is the partition function and serves as a normalization factor. It is computed by summing the model over all possible configurations of y.
Both the NBR and NSV ratios are utilized in our CRF model. As can be seen in Fig. 2, the class conditional density for cloud is almost one for all NBR values below 0.11, and it is almost zero for all values above 0.2. It is mainly in the range [0.11, 0.20] that reliance on NBR alone would result in misclassification, and outside this range we can make cloud detection decisions almost unambiguously. Thus, there are a considerable number of regions which we can classify as cloud or sky with very high confidence. And for the remaining regions where there are ambiguities, we can look at the spatial context to make decisions. As we showed in [16] the NSV ratio exhibits enhanced contrast between cloud and sky, so it could serve as an effective, local feature for testing spatial consistency. But due to the dependency of this ratio on the value component of the HSV space of the image, it makes more sense to use the NBR ratio as a global feature.
This idea can be modeled very well using a CRF framework. The function
is nothing but a logistic regression classifier. This classifier gives a well calibrated probability value indicating whether the site under consideration is cloud or sky just based on the NBR ratio at that site. As shown in Fig. 2, the logistic function derived from the training data is a smoother version of the class conditional density for the cloud. In the DRF framework terminology
[22] the logistic function is the association potential. Now in order to take the spatial context into account, we define the interaction potential function . Before defining the interaction potential we need to define the neighborhood for the site . In our case, we define as all regions which are within a 200pixel radius of the centroid of site (the site is not part of the neighborhood). It is not necessary that a region be entirely within this radius to be considered a neighbor; even if it is partially within this range, it will be considered a neighbor. The interaction potential can now be understood with the help of Fig. 3. and are the average NSV values of sky neighbors and cloud neighbors, respectively. Clouds usually have higher NSV values in comparison to sky regions, hence . The interaction potential has been defined in such a manner that and serve as reference points. I.e., the farther the NSV ratio of a site is to the left of the higher the probability that it is a sky region, and the farther it is to the right of the higher the probability that it is cloud. If the NSV ratio of a site is in between and , then the interaction potential is negative, and the magnitude will depend on its proximity to the two reference points.As stated earlier, the mean shift clustering algorithm [25] is used to form homogeneous regions in an image. The parameters including range bandwidth, color bandwidth, minimum number of pixels in a region, etc., are set manually to produce the best qualitative results. Defining the CRF model over regions instead of pixels has two main advantages. First, it would help in speeding up the inference algorithm (the algorithm used to find the most probable labeling configuration given a new image) and secondly it will help combat noise as we will demonstrate in the results section.
IiB Parameter Estimation and Inference
The parameters in CRF are usually estimated using maximum likelihood estimation (MLE) [27, 28, 24]
. MLE finds the parameters that maximize the conditional likelihood of the true labels given the training data. Stochastic gradient descent (SGD) is usually employed to find the MLE estimates
[24]. In SGD the partial derivatives with respect to each parameter are evaluated. These partial derivatives involve the computation of the expected value of the feature function (for e.g. interaction potential in our case) over all possible label configurations [27, 28]. Since this computation is intractable, the expectation is approximated by coming up with an approximation for . Here, represents the parameter set {}. Note thatis the probability density over all possible label configurations given an image and the parameters, and it needs to be calculated for each image. Loopy belief propagation, Markov chain Monte Carlo (MCMC) and Contrastive divergence are some methods used for approximate training
[27, 28]. Being approximations, these methods may not estimate the parameters well [21]. Furthermore, for methods like belief propagation, the time complexity is exponential in the size of the largest clique. And in our case, since the CRF model involves higher order cliques, the largest clique might have a size of 80, hence loopy belief propagation cannot be employed.An alternative, practical approach is piecewise training [29, 21, 23]. In piecewise training each piece of the CRF model is learned independently. In our case, parameters for association potential and interaction potential are learned independently. The training methodology is similar to that of [23]. Our training data set consists of eight manually labeled images. We first estimate the association potential parameters and using four images from the training set. We use the R project for statistical computing [30] to estimate the parameters. The glm function in the stats core package is used for this purpose. This function uses the iteratively reweighted least squares (IWLS) method to obtain the fit. Once the parameters for association potential are estimated, we estimate the parameter for the interaction potential by keeping the association potential parameters constant. The second half of the training dataset (four images) which was not used to train the association potential is used for this purpose. The interaction potential parameter, , which minimized the pixel wise classification error (i.e. the total number of misclassified sky pixels and cloud pixels) was picked.
ATS  ATS  CRF  
(NSV ratio)  (NBR ratio)  
Accuracy  0.7979 0.0963  0.7173 0.1654  0.9346 0.0269 
Precision  0.7782 0.1552  0.7324 0.1954  0.9561 0.0560 
Recall  0.9047 0.0838  0.9022 0.1196  0.9022 0.0471 
Accuracy, Precision and Recall Values with 99.9% Confidence Interval for Adaptive Thresholding Schemes (ATS) and CRF Based Method
Fixed Thresholding  CRF  

Accuracy  0.8892 0.0359  0.9436 0.0181 
Precision  0.9420 0.0378  0.9597 0.0341 
Recall  0.8236 0.0540  0.9095 0.0309 
Once we have estimated all the parameters, we need an inference algorithm which would find the most probable labeling configuration given an image (data) and the parameters. I.e., we want to find such that
Due to the intractability of exact inference , we use the local search alg, iterative conditional modes (ICM), proposed by Besag [31]. ICM is an iterative procedure that maximizes the local conditional probability. For each site in an image we find
This procedure is iterated until convergence. ICM requires an initial labeling configuration to begin with. The output of the logistic regression classifier is used as the initial guess.
During training and inference, there may be situations where a particular site has only one class of sites around it, such that all its neighbors are cloud or all are sky. In such scenarios we use the average sky and cloud NSV values of the entire image instead of the corresponding values of the neighbors. In order to calculate the average sky and cloud NSV values of the entire image, the labeling configuration from the previous ICM iteration is used.
Iii Results
In order to evaluate the performance of the proposed method, two separate test datasets were used. One of these datasets (Set C) contains 22 images and was used to compare the performance of the CRFbased method with the minimum cross entropy method proposed by Li et al.[15]. Li’s method uses a hybrid scheme combining fixed thresholding and adaptive thresholding. The decision to use an adaptive threshold or fixed threshold is based on the standard deviation of the NBR ratio. But we observed that this decision rule does not always work well because some clear sky and completely cloudy images have a standard deviation greater than that of some images containing both sky and cloud. Hence, for a fair comparison we have only included images containing both sky and cloud in the first dataset. The second dataset (Set D) contains 26 images in total and was used to compare the performance of the proposed CRF method with that of fixed thresholding proposed by Long et al. [5]. This data set contains all the 22 images in the first dataset with an addition of two completely cloudy and two clear sky images.
Replace[[
Accuracy, precision and recall computed based on the confusion matrix are employed as metrics for performance evaluation
[32]. The metrics are defined below:Here, TP (true positive) is the number of cloud pixels classified correctly, TN (true negative) is the number of sky pixels classified correctly, FP (false positive) is the number of cloud pixels that got misclassified as sky pixels and FN (false negative) is the number of sky pixels which got misclassified as cloud pixels.]]
Table I shows the results for two adaptive thresholding schemes and CRF method using Set C. Both adaptive thresholding schemes are based on cross entropy minimization, the difference being that one uses NSV as the feature while the other uses NBR. The accuracy of the CRFbased method is much higher in comparison with the other two methods, and the accuracy values do not overlap even at 99.9% confidence. The NSV thresholding performed better than NBR thresholding. The reason for the poor performance of adaptive thresholding is that the best fit does not always correspond to the best segmentation/classification , especially when there is high intraclass variance. For instance, as shown in Fig. 4, the sky region at the top portion of the image is dark in comparison to the sky region at the bottom of the image. And when the adaptive thresholding algorithm tries to fit a binary surface that minimizes the cross entropy, it ends up misclassifying the entire bottom portion of the image.
The results of comparing the performance of our method with the fixed thresholding scheme is provided in Table II. Though the fixed thresholding scheme does better than adaptive thresholding, the CRF method provides considerable improvement in accuracy. The accuracy values do not overlap at 98% confidence level, so we can be 98% confident that the CRF method is better. Here, taking the spatial context into consideration leads to better accuracy. Fig. 5 shows that the inclusion of interaction potential helps in correcting some mistakes made in segmentation using the association potential alone. Looking at the local context is really helpful in regions where the histograms of sky and cloud regions overlap in the feature space.
Forming regions using mean shift segmentation and then performing regionbased classification helps in dealing with noise. Fig. 6 shows that using logistic regression (association potential) to classify the image on a pixelbypixel basis yields noisier results than classifying regions formed by mean shift segmentation. Hence, defining the CRF model over regions not only speeds up computation but also helps in combating noise. However, in cases where ta cloud has an unusual shape, segmentation may fail to find the proper regions. But for most cases, a region based approach works well. Fig. 7 shows the segmentation results for some images obtained using our CRF method.
Iv Conclusion
We have presented a CRF model for cloud detection on groundbased sky images which outperforms Li’s adaptive thresholding algorithm and Long’s fixed thresholding algorithm on our images. Though our model uses only two features (NSV ratio and NBR ratio) it is very flexible, and more features (if found useful) can be added very easily. For instance, adding more features into the association potential is trivial. Similarly, the interaction potential could be replaced by one or more features that may help take the local context into account in a better way. Since each imaging system is different, it is possible to find features different from what we have used which are better suited for different imaging setups. But our aim here is to demonstrate that the idea of combining a discriminative classifier with a higherorder clique potential in a CRF framework is a really powerful scheme for cloud detection.
Acknowledgment
We would like to acknowledge Tucson Electric Power (TEP), The Arizona Research Institute for Solar Energy Research (AZRISE) and Arizona TRIF program for supporting this work.
References
 [1] C. W. Chow, B. Urquhart, M. Lave, A. Dominguez, J. Kleissl, J. Shields, and B. Washom, “Intrahour forecasting with a total sky imager at the UC San Diego solar energy testbed,” Solar Energy, vol. 85, no. 11, pp. 2881–2893, 2011.
 [2] R. Marquez and C. F. Coimbra, “Intrahour DNI forecasting based on cloud tracking image analysis,” Solar Energy, vol. 91, no. 0, pp. 327 – 336, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0038092X1200343X
 [3] V. T. Jayadevan, J. J. Rodriguez, V. P. Lonij, and A. D. Cronin, “Forecasting solar power intermittency using groundbased cloud imaging,” in Proc. 41st ASES Annual Conference, vol. 3, 2012, pp. 2100–2106.
 [4] J. Shields, M. Karr, T. Tooman, D. Sowle, and S. Moore, “The whole sky imager–a year of progress,” in Eighth Atmospheric Radiation Measurement (ARM) Science Team Meeting, Tucson, Arizona, 1998, pp. 23–27.
 [5] C. N. Long, J. Sabburg, J. Calbó, and D. Pages, “Retrieving cloud characteristics from groundbased daytime color allsky images,” Journal of Atmospheric and Oceanic Technology, vol. 23, no. 5, pp. 633–652, 2006.
 [6] G. Pfister, R. McKenzie, J. Liley, A. Thomas, B. Forgan, and C. N. Long, “Cloud coverage based on allsky imaging and its impact on surface solar irradiance,” Journal of Applied Meteorology, vol. 42, no. 10, pp. 1421–1434, 2003.
 [7] A. Cazorla, F. Olmo, and L. AladosArboledas, “Development of a sky imager for cloud cover assessment,” JOSA A, vol. 25, no. 1, pp. 29–39, 2008.
 [8] G. Seiz, J. Shields, U. Feister, E. Baltsavias, and A. Gruen, “Cloud mapping with groundbased photogrammetric cameras,” International Journal of Remote Sensing, vol. 28, no. 9, pp. 2001–2032, 2007.
 [9] R. W. Johnson, W. S. Hering, and J. E. Shields, “Automated visibility & cloud cover measurements with a solid state imaging system,” DTIC Document, Tech. Rep., 1989.
 [10] J. Shields, R. Johnson, and T. Koehler, “Automated whole sky imaging systems for cloud field assessment,” in Fourth Symposium on Global Change Studies of the American Meteorological Society, Boston, Massachusetts, 1993.
 [11] M. SouzaEcher, E. Pereira, L. Bins, and M. Andrade, “A simple method for the assessment of the cloud cover state in highlatitude regions by a groundbased digital camera,” Journal of Atmospheric and Oceanic Technology, vol. 23, no. 3, pp. 437–447, 2006.
 [12] S. L. Mantelli Neto, A. von Wangenheim, E. B. Pereira, and E. Comunello, “The use of euclidean geometric distance on rgb color space for the classification of sky and cloud patterns,” Journal of Atmospheric and Oceanic Technology, vol. 27, no. 9, pp. 1504–1517, 2010.
 [13] M. Ghonima, B. Urquhart, C. Chow, J. Shields, A. Cazorla, and J. Kleissl, “A method for cloud detection and opacity classification based on ground based sky imagery,” Atmospheric Measurement Techniques Discussions, vol. 5, no. 4, pp. 4535–4569, 2012.
 [14] M. Yamashita, M. Yoshimura, and T. Nakashizuka, “Cloud cover estimation using multitemporal hemisphere imageries,” in Proc. XXth Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS04). Citeseer, 2004, pp. 818–821.
 [15] Q. Li, W. Lu, and J. Yang, “A hybrid thresholding algorithm for cloud detection on groundbased color images,” Journal of atmospheric and oceanic technology, vol. 28, no. 10, pp. 1286–1296, 2011.
 [16] V. T. Jayadevan, J. J. Rodriguez, and A. D. Cronin, “A new contrast enhancing feature for cloud detection in ground based sky images,” Journal of atmospheric and oceanic technology, Submitted. [Online]. Available: http://uapv.physics.arizona.edu/Publications/Submitted/NewRatio.pdf
 [17] D. Panjwani and G. Healey, “Markov random field models for unsupervised segmentation of textured color images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 10, pp. 939–954, 1995.
 [18] K. Held, E. Kops, B. Krause, W. Wells, R. Kikinis, and H. MullerGartner, “Markov random field segmentation of brain mr images,” IEEE Transactions on Medical Imaging, vol. 16, no. 6, pp. 878–886, 1997.
 [19] F. Melgani and S. Serpico, “A markov random field approach to spatiotemporal contextual image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 11, pp. 2478–2487, 2003.
 [20] Q. Li, W. Lu, J. Yang, and J. Z. Wang, “Thin cloud detection of allsky images using markov random fields,” IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 3, pp. 417–421, 2012.
 [21] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “Textonboost: Joint appearance, shape and context modeling for multiclass object recognition and segmentation,” in Computer Vision ECCV 2006, ser. Lecture Notes in Computer Science, A. Leonardis, H. Bischof, and A. Pinz, Eds. Springer Berlin Heidelberg, 2006, vol. 3951, pp. 1–15.
 [22] S. Kumar and M. Hebert, “Discriminative random fields: a discriminative framework for contextual interaction in classification,” in Proceedings. Ninth IEEE International Conference on Computer Vision, 2003, pp. 1150–1157 vol.2.
 [23] P. Kohli, L. Ladick , and P. Torr, “Robust higher order potentials for enforcing label consistency,” International Journal of Computer Vision, vol. 82, no. 3, pp. 302–324, 2009.

[24]
X. He, R. Zemel, and M. CarreiraPerpindn, “Multiscale conditional random
fields for image labeling,” in
Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
, vol. 2, 2004, pp. II–695–II–702 Vol.2.  [25] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
 [26] S. Z. Li, Markov random field modeling in image analysis. Springer, 2009.
 [27] C. Elkan, “Loglinear models and conditional random fields,” Tutorial notes at CIKM, vol. 8, 2008.
 [28] C. Sutton and A. McCallum, “An introduction to conditional random fields for relational learning,” Introduction to statistical relational learning, vol. 93, pp. 142–146, 2007.

[29]
——, “Piecewise training for undirected models,” in
Proceedings of the TwentyFirst Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI05)
. AUAI Press, 2005, pp. 568–575.  [30] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2012, ISBN 3900051070. [Online]. Available: http://www.Rproject.org/
 [31] J. Besag, “On the statistical analysis of dirty pictures,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 259–302, 1986.
 [32] R. Kohavi and F. Provost, “Glossary of terms,” Machine Learning, vol. 30, no. 23, pp. 271–274, 1998.