Abstract
Design of a fuzzy rule-based classifier is proposed. The performance of the classifier for multispectral satellite image classification is improved using the Dempster-Shafer (DS) theory of evidence, which exploits information from the neighboring pixels. The classifiers are tested rigorously with two known images, and their performance is found to be better than the results available in the literature. We also demonstrate that, for all the test cases, using DS theory along with the fuzzy rule-based classifiers improves performance over the basic fuzzy rule-based classifiers.
1. Introduction
Analysis of satellite images has many important applications such as prediction of storm and rainfall, estimation of natural resources, estimation of crop yields, assessment of damage caused by natural disasters, and land cover classification. In this paper we focus on land cover classification from multispectral satellite images.
The most widely used techniques for this problem employ discriminant analysis, maximum likelihood classification, and neural networks [6], [3]. Such classifiers cannot handle the fact that, for land cover, a pixel may correspond to more than one type of object. For example, the area covered by a pixel may correspond to 30% land and 70% water. Note that the uncertainty involved in classifying such a pixel is not probabilistic but fuzzy in nature, and it therefore demands “soft” classifiers. In developing soft classifiers for land cover analysis two approaches have gained popularity. These are based on (1) fuzzy set theory and (2) Dempster and Shafer’s (DS) evidence theory [7].
Numerous fuzzy classification techniques have been developed to solve problems in diverse fields. A comprehensive account of such works can be found in [2]. Fuzzy rules are attractive because they are interpretable and provide an analyst with a deeper insight into the problem. Use of fuzzy rule-based systems for land cover analysis is relatively new. In a recent paper Bárdossy and Samaniego [1] proposed a scheme for developing a fuzzy rule-based classifier for analysis of multispectral images.
The other approach for designing soft classifiers is to use the evidence theory developed by Dempster and Shafer [7]. Since the theory of evidence allows one to combine evidence obtained from diverse sources of information in support of a hypothesis, it seems a natural candidate for analyzing multispectral images for land cover classification.
Here we propose a scheme for designing fuzzy rule-based classifiers for land cover types that uses evidence theory for decision making. This is a two-stage process. First we find a good set of fuzzy rules using information from all channels. In the next stage, the responses of the fuzzy rules over a neighborhood are used to define eight Basic Probability Assignments (BPAs), which are then combined by Dempster’s rule to exploit contextual information and make a better decision. The problem of high variation in the variances of different features, which often substantially degrades the performance of a distance-based classifier, is handled in a natural manner by fuzzy rules due to the atomic nature of the antecedent clauses.
2. Designing the Fuzzy Rule Base
The proposed scheme has several stages. First a set of labeled prototypes is generated. Then the prototypes are converted into fuzzy rules. The fuzzy rules are further tuned to improve their performance. Labeled prototypes can be generated using any clustering algorithm followed by labeling of the cluster centers. However, for most such algorithms the number of clusters is a predefined parameter. Here we use the prototype generation scheme described in [5]. It is a two-stage algorithm involving unsupervised and supervised learning that dynamically decides the number of prototypes and extracts them using the training data. For details the reader is referred to [5].
2.1. Designing the fuzzy rulebase
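A prototype $v_i$ (representing a cluster of points) for class $k$ can be translated into a fuzzy rule of the form:
$R_i$: If $x_1$ is CLOSE TO $v_{i1}$ AND $\cdots$ AND $x_p$ is CLOSE TO $v_{ip}$ then class is $k$.
Here $p$ is the number of features. The fuzzy set CLOSE TO $v_{ij}$ is modeled by a Gaussian membership function:
$$\mu_{ij}(x_j; v_{ij}, \sigma_{ij}) = \exp\left(-\frac{(x_j - v_{ij})^2}{\sigma_{ij}^2}\right).$$
Given a data point $x$ with unknown class, we first find the firing strength of each rule. Let $\alpha_i(x)$ denote the firing strength of the rule $R_i$ on a data point $x$. We assign the point $x$ to class $k$ if $\alpha_r(x) = \max_i \alpha_i(x)$ and the rule $R_r$ represents class $k$.
Each fuzzy set is characterized by two parameters, $v_{ij}$ and $\sigma_{ij}$. The $v_{ij}$s of the rules can be initialized with the components of the final set of prototypes, $V^0 = \{v_1^0, \ldots, v_{\hat{c}}^0\}$, generated by our SOFM based algorithm, where $\hat{c} \geq c$ and $c$ is the number of classes. The notation $V^0$ is used to indicate that it corresponds to the initial centers of the membership functions. The initial estimates of the $\sigma_{ij}$s are computed as follows.
For each prototype $v_i^0$ in the set $V^0$, let $X_i$ be the set of training data closest to $v_i^0$. For each $v_i^0$ the set $S_i = \{\sigma_{ij} : j = 1, \ldots, p\}$, where $\sigma_{ij}$ measures the spread of the $j$th feature of the points in $X_i$ about $v_{ij}^0$, is computed and is associated with the prototype. We use $k_w \sigma_{ij}$ as the spread of the membership function whose center is at $v_{ij}$; $k_w > 0$ is a constant parameter and its value can have a significant impact on the classification performance for complex data sets.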
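As an illustration of how rules of this form can be evaluated, the following Python sketch computes Gaussian memberships, combines the clause memberships of each rule, and assigns the class of the rule with the highest firing strength. The Gaussian form exp(-(x - v)^2 / sigma^2), the use of the minimum as the conjunction operator, the function names, and the two toy rules are assumptions made for this example only; they are not taken from the paper.

```python
import math


def membership(x, center, spread):
    """Gaussian membership of x in the fuzzy set CLOSE TO `center`.

    Assumed form: exp(-(x - center)^2 / spread^2).
    """
    return math.exp(-((x - center) ** 2) / (spread ** 2))


def firing_strength(x, centers, spreads):
    """Conjunction of the clause memberships (minimum T-norm, chosen for illustration)."""
    return min(membership(xj, c, s) for xj, c, s in zip(x, centers, spreads))


def classify(x, rules):
    """Return the class label of the rule with the highest firing strength.

    `rules` is a list of (centers, spreads, class_label) tuples.
    """
    best = max(rules, key=lambda rule: firing_strength(x, rule[0], rule[1]))
    return best[2]


# Two hypothetical rules over a two-feature space.
rules = [
    ([10.0, 20.0], [5.0, 5.0], "water"),
    ([40.0, 60.0], [8.0, 8.0], "land"),
]
label = classify([12.0, 22.0], rules)
print(label)
```

Because each clause is evaluated on its own feature with its own spread, features with very different variances do not dominate one another in this computation.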
2.2. Tuning the rulebase
The initial rulebase thus obtained is further refined to achieve better performance. The exact tuning algorithm depends on the conjunction operator used for computation of the firing strengths. The firing strength can be calculated using any T-norm [2]. Use of different T-norms results in different classifiers. The minimum and the product are among the most popular T-norms used as conjunction operators. It is much easier to formulate a calculus-based tuning algorithm if the product is used. However, if there are many clauses in the antecedent, the firing strength of a rule tends to have a low numerical value even when the membership value of each individual clause is quite high. Though computationally this does not pose any problem (we are interested in the relative firing strengths of the rules), it is conceptually somewhat unattractive, especially from the interpretability viewpoint.
Thus, to avoid the use of the product and at the same time to be able to derive update rules easily, we use a softmin operator.
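The softmatch of positive numbers $x_1, x_2, \ldots, x_n$ is defined by
$$SM(x_1, x_2, \ldots, x_n; q) = \left( \frac{x_1^q + x_2^q + \cdots + x_n^q}{n} \right)^{1/q},$$
where $q$ is any nonzero real number. $SM$ is known as an aggregation operator with an upper bound of 1 when $x_i \in [0, 1]$ for all $i$. It is easy to see that $\lim_{q \to \infty} SM = \max_i x_i$ and $\lim_{q \to -\infty} SM = \min_i x_i$. Thus we define the softmin operator as the softmatch operator with a sufficiently negative value of the parameter $q$. With $\mu_{rj}(x_j)$ denoting the membership value of the $j$th of the $p$ antecedent clauses, the firing strength of the $r$th rule computed using softmin is
$$\alpha_r(x) = \left( \frac{1}{p} \sum_{j=1}^{p} \mu_{rj}(x_j)^q \right)^{1/q}.$$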
In the present study we use .
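The behavior of the softmin operator can be checked numerically. The sketch below assumes the softmatch is the power mean of its arguments with exponent q; the function name `soft_match`, the membership values, and the exponents used are illustrative choices, not values taken from the paper.

```python
def soft_match(values, q):
    """Power-mean aggregation of strictly positive values with exponent q (q != 0).

    Large negative q approaches min(values); large positive q approaches max(values).
    Very small inputs combined with a large negative q can overflow in floating point.
    """
    n = len(values)
    return (sum(v ** q for v in values) / n) ** (1.0 / q)


# Three hypothetical clause memberships, all fairly high.
memberships = [0.9, 0.8, 0.95]

soft_min = soft_match(memberships, q=-10.0)   # close to, but above, the minimum
soft_max = soft_match(memberships, q=10.0)    # close to, but below, the maximum
product = 0.9 * 0.8 * 0.95                    # product T-norm, for comparison

print(soft_min, soft_max, product)
```

Unlike the product, the softmin value stays within the range of the individual memberships, which is the interpretability argument made above, while remaining differentiable for gradient-based tuning.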
Let $x$ be a point of the training set $X$ from class $c$, and let $R_c$ be the rule from class $c$ giving the maximum firing strength $\alpha_c$ for $x$. Also let $R_{\neg c}$ be the rule from the incorrect classes having the highest firing strength $\alpha_{\neg c}$ for $x$.
We use the error function
$$E = \sum_{x \in X} (1 - \alpha_c + \alpha_{\neg c})^2.$$
We minimize $E$ with respect to $v_{cj}$, $\sigma_{cj}$ and $v_{\neg cj}$, $\sigma_{\neg cj}$ of the two rules $R_c$ and $R_{\neg c}$ using gradient descent. Here the index $j$ corresponds to the clause number in the corresponding rule. Minimizing $E$ refines the rules with respect to their contexts in the feature space. Note that the context referred to here is different from the context of a pixel defined in terms of its spatial neighborhood. The tuning process is repeated until the rate of decrement in $E$ becomes negligible, resulting in the final rule base.
3. Using the theory of evidence for rule aggregation
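For the sake of completeness, we briefly introduce the Dempster-Shafer theory of evidence.
Let $\Theta$ be the universal set and $\mathcal{P}(\Theta)$ be its power set. A belief measure is a function $Bel: \mathcal{P}(\Theta) \to [0, 1]$ that satisfies the following axioms [7]:
$Bel(\emptyset) = 0$ and $Bel(\Theta) = 1$.
For every $A, B \in \mathcal{P}(\Theta)$, if $A \subseteq B$ then $Bel(A) \leq Bel(B)$.
$Bel(A_1 \cup \cdots \cup A_n) \geq \sum_i Bel(A_i) - \sum_{i<j} Bel(A_i \cap A_j) + \cdots + (-1)^{n+1} Bel(A_1 \cap \cdots \cap A_n)$, for every $n \in \mathbb{N}$ and for every collection $A_1, \ldots, A_n$ of subsets of $\Theta$.
There is a plausibility measure $Pl$ associated with each belief measure $Bel$, defined by $Pl(A) = 1 - Bel(\bar{A})$ for all $A \in \mathcal{P}(\Theta)$, where $\bar{A}$ is the complement of $A$.
Every belief measure and its dual plausibility measure can be expressed in terms of a Basic Probability Assignment (BPA) function $m: \mathcal{P}(\Theta) \to [0, 1]$. $m$ is called a BPA iff $m(\emptyset) = 0$ and $\sum_{A \in \mathcal{P}(\Theta)} m(A) = 1$. A belief measure and a plausibility measure are uniquely determined by $m$ through the formulas:
$$Bel(A) = \sum_{B \subseteq A} m(B) \qquad (1)$$
$$Pl(A) = \sum_{B \cap A \neq \emptyset} m(B) \qquad (2)$$
Every set $A \in \mathcal{P}(\Theta)$ for which $m(A) > 0$ is called a focal element of $m$. Evidence obtained in the same context from two distinct sources and expressed by two BPAs $m_1$ and $m_2$ on some power set $\mathcal{P}(\Theta)$ can be combined by Dempster’s rule of combination to obtain a joint BPA $m_{1,2}$ as:
$$m_{1,2}(A) = \begin{cases} \dfrac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - K} & \text{if } A \neq \emptyset \\ 0 & \text{if } A = \emptyset \end{cases} \qquad (3)$$
Here
$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$
Eq. (3) is often expressed with the notation $m_{1,2} = m_1 \oplus m_2$. The rule is commutative and associative. Evidence from any number (say $n$) of distinct sources can be combined by repetitive application of the rule as
$$m = m_1 \oplus m_2 \oplus \cdots \oplus m_n.$$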
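A minimal Python sketch of Dempster’s rule of combination is given below. It represents a BPA as a dictionary mapping focal elements (frozensets) to masses; this representation, the function name `combine`, and the two example BPAs are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two BPAs with Dempster's rule.

    Each BPA is a dict {frozenset_of_hypotheses: mass}. Mass falling on empty
    intersections is the conflict, which is removed by renormalization.
    """
    joint = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            joint[a] = joint.get(a, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if not joint:
        raise ValueError("totally conflicting evidence: combination undefined")
    return {a: mass / (1.0 - conflict) for a, mass in joint.items()}


W, L = "water", "land"
m1 = {frozenset([W]): 0.6, frozenset([W, L]): 0.4}
m2 = {frozenset([L]): 0.3, frozenset([W, L]): 0.7}

m12 = combine(m1, m2)
for focal, mass in m12.items():
    print(sorted(focal), round(mass, 4))
```

In this example the conflict is 0.6 × 0.3 = 0.18, so every surviving product of masses is divided by 0.82. Because the rule is commutative and associative, more than two sources can be combined by folding `combine` over a list of BPAs in any order.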
3.1. Pignistic probability
Given a belief measure we are often required to make decisions based on the available evidence. In such a case $\Theta$ becomes the set of decision alternatives and the function $Bel$ denotes our belief about the choice of the optimal decision $\theta_0 \in \Theta$. However, in general it is not possible to select the optimal decision directly from the evidence embodied in the function $Bel$. In such cases, we use the pignistic transformation, $\Gamma$, to construct a probability function $BetP$ for selecting the optimal decision [8]. Thus $BetP = \Gamma(Bel)$ is called a pignistic probability, which can be used for making the decision. The pignistic probability for $\theta \in \Theta$ can be expressed in terms of BPAs as follows:
$$BetP(\theta) = \sum_{A \subseteq \Theta,\ \theta \in A} \frac{m(A)}{|A|} \qquad (4)$$
The optimal decision can now be chosen in favor of $\theta_0$ if $\theta_0$ has the highest pignistic probability.
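The pignistic transformation amounts to sharing the mass of each focal element equally among its members. The following Python sketch illustrates this for a normalized BPA (one with no mass on the empty set); the dictionary representation, the function name `pignistic`, and the example masses are assumptions made for illustration.

```python
def pignistic(m):
    """Pignistic probability of each hypothesis for a normalized BPA.

    `m` is a dict {frozenset_of_hypotheses: mass}; the mass of every focal
    element is divided equally among the hypotheses it contains.
    """
    betp = {}
    for focal, mass in m.items():
        share = mass / len(focal)
        for theta in focal:
            betp[theta] = betp.get(theta, 0.0) + share
    return betp


m = {
    frozenset(["water"]): 0.5,
    frozenset(["land"]): 0.1,
    frozenset(["water", "land"]): 0.4,
}
betp = pignistic(m)
decision = max(betp, key=betp.get)
print(betp, decision)
```

Here the mass 0.4 on the pair is split into 0.2 for each class, giving 0.7 for water and 0.3 for land, so the decision is made in favor of water.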
3.2. Scheme for decision making
In our problem the frame of discernment is the set of classes, $\Theta = \{C_1, C_2, \ldots, C_c\}$, where $c$ is the number of classes. The propositions take the form “the true class label of the pixel of interest is in $A \subseteq \Theta$”.
Let us denote the pixel of interest as $p^0$ and its eight spatial neighbors as $p^1, p^2, \ldots, p^8$. We use the firing strengths produced by the rulebase in support of different classes for $p^0$ and one of its neighbors, say $p^i$, as the $i$th source of evidence. Let $\hat{c}$ be the number of rules in the fuzzy rulebase. Since $\hat{c} \geq c$, there could be multiple rules corresponding to a class. Let $\alpha_k^0$ be the highest firing strength produced by the rules corresponding to the class $C_k$ for $p^0$. We treat this value as the confidence measure of the rulebase pertaining to the membership of $p^0$ to the class $C_k$. Thus, the set of values $\{\alpha_k^0 : k = 1, 2, \ldots, c\}$ contains the confidence measures for all the classes for $p^0$ (if a confidence measure is less than a threshold, say 0.01, it is set to 0). A similar set of confidence measures $\{\alpha_k^i : k = 1, 2, \ldots, c\}$ can be constructed for every $p^i$.
Now we use $\{\alpha_k^0\}$ and $\{\alpha_k^i\}$ to define the $i$th BPA $m^i$ on the subsets of $\Theta$. There are $2^c$ possible subsets of $\Theta$, i.e., members of the power set of $\Theta$. Each subset corresponds to the proposition that the “true” class of $p^0$ is contained in that subset. We shall consider the subsets containing one and two elements only. The subsets containing one element correspond to propositions of the form “the class contained in the subset is the true class for $p^0$”, and the subsets containing two elements correspond to propositions of the form “the true class label of $p^0$ is one of the two classes contained in the subset”. Assigning a BPA to a subset essentially involves committing some portion of belief in favor of the proposition represented by the subset. So the scheme followed for assigning BPAs must reflect some realistic assessment of the information available in favor of the proposition. We define $m^i$ as follows:
(5) 
For
(6) 
where
The numerators on the right-hand side of the above formulae are measures of confidence in favor of the respective propositions. A closer look at (5) shows that the numerator is a product of two terms. The first term is the average of the confidence measures of the pixel of interest and its $i$th neighbor for a class, while the second term is an exponential one that reflects the degree of closeness of the two confidence measures. Thus, as a whole, a high value of the numerator reflects two facts: (1) both pixels have a high confidence value for the class and (2) the confidence values are close to each other. Eq. (6) is a straightforward extension of the same concept to the confidence in favor of a pair of classes.
Thus for the eight neighboring pixels we obtain eight combinable sources of evidence, expressed by the BPAs $m^1, \ldots, m^8$. The global BPA $m^G$ can be computed by applying Dempster’s rule repeatedly. The combined global BPA is computed as follows:
$$m^G = m^1 \oplus m^2 \oplus \cdots \oplus m^8 \qquad (7)$$
It is easily seen that:
and
(8)  
; where is given by
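Once the global BPA $m^G$ is obtained, the pignistic probability for each class is computed. Since only subsets containing one or two classes receive nonzero mass, Eq. (4) gives the following formula for the pignistic probability of class $C_k$:
$$BetP(C_k) = m^G(\{C_k\}) + \frac{1}{2} \sum_{l \neq k} m^G(\{C_k, C_l\}) \qquad (9)$$
The pixel is assigned to the class $C_k$ such that $BetP(C_k) \geq BetP(C_l)$ for all $l = 1, 2, \ldots, c$.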
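The whole decision scheme can be sketched end to end in Python as follows. The text describes the numerator for a single class as the average of the two confidence values multiplied by an exponential closeness term; the specific closeness term exp(-|difference|), the pair-of-classes numerator (mean of the four confidences times exp(-(max - min))), the normalization by the sum of all numerators, and all function names are assumptions of this sketch rather than the paper’s exact formulae.

```python
import math
from functools import reduce
from itertools import combinations


def bpa_from_pair(a0, ai):
    """BPA over single classes and class pairs from the confidences of the
    pixel of interest (a0) and one neighbor (ai). Forms are assumed."""
    c = len(a0)
    raw = {}
    for k in range(c):
        raw[frozenset([k])] = 0.5 * (a0[k] + ai[k]) * math.exp(-abs(a0[k] - ai[k]))
    for k, l in combinations(range(c), 2):
        vals = [a0[k], ai[k], a0[l], ai[l]]
        raw[frozenset([k, l])] = (sum(vals) / 4.0) * math.exp(-(max(vals) - min(vals)))
    total = sum(raw.values())
    if total == 0.0:
        # No evidence at all: put all mass on the whole frame.
        return {frozenset(range(c)): 1.0}
    return {s: v / total for s, v in raw.items() if v > 0.0}


def combine(m1, m2):
    """Dempster's rule of combination for dict-based BPAs."""
    joint, conflict = {}, 0.0
    for b, mass_b in m1.items():
        for s, mass_s in m2.items():
            a = b & s
            if a:
                joint[a] = joint.get(a, 0.0) + mass_b * mass_s
            else:
                conflict += mass_b * mass_s
    if not joint:
        raise ValueError("totally conflicting evidence")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}


def classify_pixel(a0, neighbours, threshold=0.01):
    """Return (class index, pignistic probabilities) for the pixel of interest."""
    clip = lambda conf: [v if v >= threshold else 0.0 for v in conf]
    a0 = clip(a0)
    bpas = [bpa_from_pair(a0, clip(ai)) for ai in neighbours]
    m_global = reduce(combine, bpas)
    betp = [0.0] * len(a0)
    for focal, mass in m_global.items():
        for k in focal:
            betp[k] += mass / len(focal)
    return max(range(len(a0)), key=betp.__getitem__), betp


# Hypothetical confidences: the centre pixel is ambiguous between classes 0 and 1,
# while all eight neighbours clearly support class 0.
a0 = [0.55, 0.60, 0.05]
neighbours = [[0.9, 0.1, 0.0]] * 8
label, betp = classify_pixel(a0, neighbours)
print(label, [round(p, 4) for p in betp])
```

In this toy case the firing strengths of the centre pixel alone would favor class 1, whereas the combined evidence from the neighborhood assigns the pixel to class 0, which is the kind of contextual correction the scheme is designed to provide.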
4. Experimental results and discussions
We report the performance of the proposed classifiers on two multispectral satellite images, which we call Satimage1 and Satimage2.
Satimage1 is a 256-level Landsat-TM image of size pixels captured by seven sensors operating in different spectral bands. Each sensor generates an image with pixel values varying from 0 to 255. The ground truth data provide the actual distribution of classes of objects captured in the image. From these data we produce the labeled data set with each pixel represented by a 7-dimensional feature vector and a class label. Satimage2 is also a seven-channel 256-level Landsat-TM image of size . However, due to some characteristic of the hardware used in capturing the images, the first row and the last column of the images contain gray value 0. So we did not include those pixels in our study and effectively worked with images. The ground truth containing four classes is used for labeling the data.
In our study we generated four training sets for each of the images. For Satimage1, each training set contains 200 data points randomly chosen from each of the eight classes. This choice is made to conform to the protocol followed in [4]. For Satimage2 we include in each training set 800 randomly chosen data points from each of the four classes. Bischof et al. [3] used more training points per class than we do.
First we report the performance of the fuzzy rule-based classifiers using firing strengths directly for decision making and compare the results with the published results. Then we report the performance of the fuzzy classifiers using the evidence theoretic approach for decision making. The performance of the fuzzy rule-based classifiers using firing strengths directly for decision making is summarized in Table 1.
Table 1. Performance of the fuzzy rule-based classifiers using firing strengths directly for decision making.
Trng.   No. of   Spread     Error Rate in    Error Rate in
Set     rules    constant   Training Data    Whole Image
Satimage1
1.      30       5.0        12.0%            13.6%
2.      25       6.0        14.3%            14.47%
3.      25       5.0        12.0%            13.03%
4.      27       4.0        12.6%            12.5%
Satimage2
1.      14       2.0        16.3%            14.14%
2.      14       2.0        16.3%            14.04%
3.      12       2.0        17.09%           14.01%
4.      11       2.0        17.34%           14.23%
For Satimage1 the best result reported in [4] uses a fuzzy integral based method and gives a classification rate of 78.15%. In our case, even the worst whole-image result (14.47% error, i.e. 85.53% accuracy) is more than 5 percentage points better than that.
For Satimage2 the result reported in [3] shows 84.7% accuracy with the maximum likelihood classifier (MLC) and 85.9% accuracy with a neural network based classifier. In our case, for all training-test partitions the fuzzy rule-based classifiers outperform the MLC and are on par with the results reported for neural networks.
Table 2 summarizes the performance of the fuzzy rule-based classifiers using the evidence theoretic approach. We used the same sets of fuzzy rules as before, but the rule outputs are aggregated using evidence theory.
Table 2. Performance of the fuzzy rule-based classifiers using the evidence theoretic approach.
Training   No. of   Error Rate in
Set        rules    Whole Image
Satimage1
1.         30       12.3%
2.         25       13.37%
3.         25       11.6%
4.         27       11.03%
Satimage2
1.         14       12.7%
2.         14       12.65%
3.         12       12.4%
4.         11       12.51%
Comparison of Table 2 with Table 1 clearly shows that in every case there is a consistent improvement in the classification performance. For Satimage1 the improvements vary between 1.1 and 1.5 percentage points, and the best performing classifier (for training set 4) achieves an error rate as low as 11.03%. For Satimage2 the improvements vary between 1.4 and 1.7 percentage points. So the overall improvement for Satimage1 over the existing methods is more than 7%. For Satimage2 we also achieved consistent improvements using training sets of smaller size. For applications like crop yield estimation even a small improvement will have a significant impact on the overall estimate.
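The improvements quoted above can be re-derived from the whole-image error rates in Tables 1 and 2. The short Python check below hard-codes those error rates and the 78.15% classification rate reported in [4]; the variable names are illustrative.

```python
# Whole-image error rates (%) for training sets 1-4, copied from Tables 1 and 2.
table1 = {
    "Satimage1": [13.6, 14.47, 13.03, 12.5],
    "Satimage2": [14.14, 14.04, 14.01, 14.23],
}
table2 = {
    "Satimage1": [12.3, 13.37, 11.6, 11.03],
    "Satimage2": [12.7, 12.65, 12.4, 12.51],
}

# Reduction in error rate (percentage points) obtained with evidence theory.
gains = {
    image: [round(before - after, 2) for before, after in zip(table1[image], table2[image])]
    for image in table1
}

# Margin of the worst evidence-theoretic result on Satimage1 over the
# 78.15% classification rate reported in [4].
worst_accuracy = 100.0 - max(table2["Satimage1"])
margin_over_reported = round(worst_accuracy - 78.15, 2)

for image, values in gains.items():
    print(image, values, "min", min(values), "max", max(values))
print("margin over [4]:", margin_over_reported)
```

The per-set gains fall in the ranges stated in the text, and even the weakest evidence-theoretic classifier for Satimage1 exceeds the reported rate by more than 7 percentage points.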
5. Conclusion
We proposed two classifiers: one is fuzzy rule-based and the other integrates the outputs of fuzzy rules using the theory of evidence. Fuzzy rules are extracted with the help of a SOFM. The system automatically decides on the number of rules.
The fuzzy rule based classifier is of general nature and can be applied in any classification problem, while the evidence theoretic classifier exploits the spatial information available for an image to make the classification decision.
In the evidence theoretic framework we use the pixel under consideration and one of its neighbors to provide a body of evidence in support of different propositions regarding the class membership (to a particular class as well as to a pair of classes) of the pixel. The BPAs for the propositions are calculated from the mutual confidences of the pixels in support of the respective propositions. Eight bodies of evidence are obtained for the eight neighbors of the pixel. These bodies of evidence are then combined to obtain a global body of evidence. Then the pignistic probability for each class is computed and the pixel is assigned to the class with the highest pignistic probability. The proposed system demonstrates a consistent improvement in performance.
Acknowledgement: The authors thank Prof. N. R. Pal for his continuous support and advice. They also thank Dr. A. S. Kumar and Dr. A. J. Pinz for allowing them to use the satellite images Satimage1 and Satimage2, respectively, to test the proposed methods.
References
[1] A. Bárdossy and L. Samaniego, “Fuzzy rule-based classification of remotely sensed imagery”, IEEE Trans. Geosci. Remote Sensing, vol. 40, no. 2, pp. 362-374, 2002.
[2] J. C. Bezdek, J. Keller, R. Krishnapuram and N. R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer, Massachusetts, 1999.
[3] H. Bischof, W. Schneider and A. J. Pinz, “Multispectral Classification of Landsat-Images Using Neural Networks”, IEEE Trans. on Geosci. Remote Sensing, vol. 30, no. 3, pp. 482-490, 1992.
[4] A. S. Kumar, S. Chowdhury and K. L. Majumder, “Combination of neural and statistical approaches for classifying spaceborne multispectral data”, Proc. of ICAPRDT99, pp. 87-91, 1999.
[5] A. Laha and N. R. Pal, “Some novel classifiers designed using prototypes extracted by a new scheme based on Self-Organizing Feature Map”, IEEE Trans. on Syst. Man and Cybern: B, vol. 31, no. 6, pp. 881-890, 2001.
[6] J. D. Paola and R. A. Schowengerdt, “A detailed comparison of backpropagation neural network and maximum likelihood classifiers for urban land use classification”, IEEE Trans. on Geosci. Remote Sensing, vol. 33, pp. 981-996, July 1995.
[7] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, 1976.
[8] P. Smets and R. Kennes, “The transferable belief model”, Artificial Intelligence, vol. 66, pp. 191-234, 1994.