ssn
Code for SpatioSpectral Networks (SSN), a method to compute colortexture descriptors from images
Texture is one of the most-studied visual attributes for image characterization, investigated since the 1960s. However, most hand-crafted descriptors are monochromatic: they focus on the grayscale image and discard the color information. In this context, this work focuses on a new method for color-texture analysis that considers all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we named Spatio-Spectral Networks (SSN). Their topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges tackle spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and the results show that SSN outperforms all the compared literature methods, including known deep convolutional networks, and also has the most stable performance across datasets, achieving 98.5% (±1.1) average accuracy against 97.1% (±1.3) for MCND and 96.8% (±3.2) for AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where the results show that SSN also achieves higher performance and robustness.
Texture is an abundant property in nature that allows us to visually distinguish many things, and it is present not only at our everyday scale but also at macro and micro scales, such as in satellite and microscopy imaging. There are various formal definitions of texture; for instance, according to Julesz [29], two textures are considered similar if their first- and second-order statistics are similar. We can also define texture simply as a combination of local intensity constancy and/or variations that produce spatial patterns, roughly independently at different scales. The challenge of texture analysis is therefore to tackle these patterns in a multiscale manner while keeping a trade-off between performance and computational complexity (cost). This has taken decades of study and produced a heterogeneous literature, ranging from mathematical to bio-inspired methods.
Color images are produced by the vast majority of current imaging devices; however, most texture analysis methods are monochromatic, i.e., they consider only one image channel, or the image luminance (grayscale). The true color information is usually lost during grayscale conversion or processed separately with non-spatial approaches such as color statistics, histograms, etc. Convolutional networks have been employing the whole color information for various tasks such as object recognition [32, 46, 48, 24] and texture analysis [12, 6], with promising results. However, it is important to notice that even these methods do not consider the direct relation between pixels in different color channels. In fact, few works explore spatial patterns between color channels and their benefits for color-texture analysis.
The main contribution of this work is a new method for color-texture analysis that performs a deep characterization of spatial patterns within and between color channels. This is achieved through a directed Spatio-Spectral Network (SSN) that models texture images by creating connections pointing towards the gradient in a radially symmetric neighborhood, linking pixels from the same or different channels. A radius parameter defines a local window size, and each symmetric neighborhood contained in that region provides relevant color-texture information. The characterization is done through well-known network topological measures of low computational cost, such as the vertex degree and strength distributions. The combination of different measures of the network structure provides a robust color-texture descriptor. The script to compute SSN descriptors is available at www.github.com/scabini/ssn. We perform classification experiments to analyze the performance of our proposed descriptor and compare our results with several methods from the literature on 4 color-texture datasets. Moreover, we analyze the robustness of each method to different color spaces (RGB, LAB, HSV, and a space based on independent axes) on each dataset.
The challenge of texture analysis lies in effectively characterizing local and global texture patterns while keeping a balance between performance and complexity (cost). Decades of research have resulted in a wide range of methods [34, 27], where most techniques focus on grayscale images. Statistical methods explore measures based on grayscale co-occurrences in a local fashion; the most diffused are the Gray Level Co-occurrence Matrices (GLCM) [23] and the Local Binary Patterns (LBP) [37, 38]. These methods influenced later techniques that follow the same principles. For instance, Local Phase Quantization (LPQ) [39] computes local descriptors in a similar way to LBP but focuses on the problem of centrally symmetric blur. The Completed Local Binary Patterns (CLBP) [21] include information not covered by the original LBP through the local difference sign-magnitude transform. Another approach to texture analysis consists of transforming the image into the frequency domain; there are various methods of this kind, and most of them are based on Gabor filters [25] or the Fourier spectrum [2]. A further category of texture analysis explores image complexity and fits the model-based paradigm, including methods based on fractal dimension [51] and Complex Networks (CN) [7, 44]. The latter approach consists of modeling images as networks and using their topological properties for texture characterization.
Color is key information for the human visual system: it helps us to recognize things faster and to remember them better [53]. The theory of opponent color processing [17] states that the human visual system combines the responses of different photoreceptor cells in an antagonistic manner. This happens because the wavelengths of light to which the three types of cones (L, M, and S) respond overlap. Therefore, it is hypothesized that the interpretation process works in this way because it is more efficient for the visual system to store the differences between the responses of the cones rather than the response of each individual type of cone. The theory suggests that there are two opposing color channels: red versus green and blue versus yellow [17]. In other words, according to this theory, evolution has made the human visual system focus on variations between pairs of colors rather than on individual colors as a way to improve color vision.
Computationally, the representation of color images is given by different approaches, called color spaces. The most well-known color spaces can be classified into four main groups [50]:

Primary color spaces, based on the trichromatic retinal system. These spaces assume that any color can be represented by an appropriate mixture of quantities of the three primary colors. The most common format is called RGB (Red, Green, and Blue). This format is widely used in many situations because most imaging devices work in this way.

Luminance-chrominance color spaces, where one channel represents luminance and two channels describe the chrominance. A widely used format is known as LAB, or L*A*B*, or even CIELAB, a color space defined by the International Commission on Illumination (CIE) in 1976. This format is inspired by the theory of opponent color processing, since channel L represents luminosity, channel A represents the red-versus-green component, and channel B represents the blue-versus-yellow component.

Perceptual color spaces, of which one of the best known is HSV (hue, saturation, and value), proposed in 1970. In this model the colors of each hue are arranged in a radial slice; that is, unlike the three-dimensional spaces of the models mentioned above, the HSV format has a cylindrical geometry. In this space the hue represents the angular dimension, with red, green, and blue spread around the circle. The vertical axis represents the value, which sets the brightness level from 0 (black) at the bottom to 1 (white) at the top. The distance from the center to the edge of the cylinder represents the saturation of the color, where the pure colors lie at maximum saturation, at the edges of the cylinder.

Color spaces based on the concept of independent axes, which are obtained by statistical methods. An example of this type is the color space whose components are computed to have the lowest possible correlation. Normally this color space is obtained from an RGB image through a linear transformation. In this case, the first channel represents the brightness, the second the opponency between red and blue, and the third the opponency between green and the combination of red and blue.

Given an image represented by a certain color space, a color-texture analysis method should explore ways of characterizing the spectral information present. There are different approaches to color-texture analysis in the literature, and most of them are integrative, i.e., they separate color from texture. Integrative methods commonly compute traditional grayscale descriptors for each color channel separately; in this case, any grayscale texture analysis method can be applied by combining descriptors from each channel. Another type of approach is called pure color, which considers only the first-order distribution of the color, not taking into account the spatial aspects commonly analyzed in grayscale images. Among such methods, the most widespread is based on color histograms [22]. This technique computes a compact summarization of the color distribution of the image. However, these methods do not consider the spatial interaction of pixels, as is done in texture analysis. Therefore, a common approach is to combine pure color descriptors with grayscale descriptors in parallel methods. In [35, 9] the reader may consult an analysis of different integrative, pure color, and parallel methods.
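As a quick illustration of the color spaces discussed above, Python's standard library can convert an RGB triple to the HSV cylinder (a minimal sketch for intuition only, not part of the proposed method):

```python
import colorsys

# Pure red in RGB, scaled to [0, 1] as colorsys expects.
r, g, b = 1.0, 0.0, 0.0
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(h, s, v)  # red sits at hue 0 with full saturation and value
```

Analogous conversions to LAB or to statistically decorrelated spaces require a linear transformation plus a nonlinearity and are typically done with image-processing libraries rather than the standard library.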
In some cases, integrative methods produce results similar to traditional grayscale methods. Authors have therefore questioned [35] whether the inclusion of color information is worthwhile for texture analysis, arguing that color and texture should be handled separately. However, it should be taken into account that few techniques consider all the color information in a more intrinsic approach, and that there are different concepts of color processing, even biological ones, that must be considered. For instance, the aforementioned opponent color processing theory of the human visual system is an interesting basis for color characterization. Since texture consists of local patterns of intensity changes, color-texture information can be extracted by analyzing the spatial relationship between different colors. One of the first works approaching this idea [42] explores the interaction between pairs of color channels, and its results indicate an increase in the textural information obtained, which is not available in the analysis of individual color channels. In a paper from 1998 [28], a technique based on filter banks is introduced for the characterization of colored texture: monochromatic and opponent channels are obtained, calculated from the output of Gabor filters. More recently, some methods based on image complexity have focused on the analysis of within- and between-channel aspects of color texture using fractal geometry [8] and CNs [30, 44].
Research on CNs arises from the combination of graph theory, physics, and statistics, with the aim of analyzing large networks that derive from complex natural processes. Early works showed that there are structural patterns in most of these networks, something that is not expected in a random network. This led to the definition of CN models that allow us to understand the structural properties of real networks; the most popular models are the scale-free [5] and small-world [52] networks. A new line of research has therefore opened for pattern recognition, where CNs are adopted as a tool for modeling and characterizing natural phenomena.
The concepts of CNs are applied in several areas such as physics, biology, nanotechnology, neuroscience, and sociology, among others [13]. Applying CNs to a problem consists of two main steps: i) modeling the problem as a network; ii) structural analysis of the resulting CN. The topological quantification of the CN allows us to draw important conclusions about the system that it represents. For example, local vertex measurements can highlight important network regions, estimate their vulnerability, find groups of similar vertices, etc.
A network can be defined mathematically by $N = (V, E)$, where $V$ is a set of vertices and $E$ a set of edges (or connections). The edges can be weighted, carrying a value $w(i, j)$ that describes the weight of the connection between two vertices, or unweighted, indicating only whether the connection exists. The edges can be either undirected, satisfying $w(i, j) = w(j, i)$, or directed, i.e. $w(i, j)$ can be something other than $w(j, i)$.
The topology of a CN is defined by the patterns of its connections. To quantify it, measurements can be extracted either for individual vertices, for vertex groups, or globally for the entire network. One of the most commonly used measures is the vertex degree, which is the sum of its connections. Considering the sets $V$ and $E$, the degree $k(i)$ of each vertex $i$ can be calculated as follows:
$$k(i) = \sum_{j \in V} a_{ij}, \qquad a_{ij} = \begin{cases} 1, & \text{if } (i, j) \in E \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
Note that in this case the degree is calculated in a binary way, since the sum adds 1 if the edge exists and 0 if it does not. If the network is weighted, the degree can also be weighted, a metric commonly known as the vertex strength $s(i)$. In this case, the weights of all edges incident on the vertex are summed:
$$s(i) = \sum_{j \in V \,\mid\, (i, j) \in E} w(i, j) \qquad (2)$$
In directed networks it is possible to calculate the input and output degrees of vertices according to the edge directions. The output degree $k^{out}(i)$ represents the number of edges leaving $i$, and yields the same equation as the degree in undirected networks (Equation 1). To compute the input degree $k^{in}(i)$, it is necessary to invert the edge check:
$$k^{in}(i) = \sum_{j \in V} a_{ji} \qquad (3)$$
which then sums the number of edges pointing to $i$. Analogously, we can compute the input and output strengths of a vertex ($s^{in}(i)$ and $s^{out}(i)$) by summing the weights of its edges ($w(j, i)$ or $w(i, j)$ instead of 1) according to their direction.
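These directed measures can be accumulated in a single pass over the edges. The following toy sketch (our illustration, not the authors' code) computes the input/output degree and strength of a small weighted directed network stored as an edge list:

```python
from collections import defaultdict

# Directed weighted edges (source, target, weight) of a toy network.
edges = [(0, 1, 0.5), (0, 2, 0.2), (2, 1, 0.9), (1, 0, 0.5)]

k_out, k_in = defaultdict(int), defaultdict(int)
s_out, s_in = defaultdict(float), defaultdict(float)
for i, j, w in edges:
    k_out[i] += 1   # one more edge leaving i
    k_in[j] += 1    # one more edge pointing to j
    s_out[i] += w   # weighted versions: the vertex strengths
    s_in[j] += w

print(k_in[1], s_in[1])  # vertex 1 receives 2 edges with total weight 1.4
```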
Since CNs are flexible structures that allow the analysis of several real-world phenomena, it is possible to use them for image modeling. In this case, a computer vision problem can be transformed into a CN problem, which can then be treated in different ways. The first step of this approach is modeling the network from the image, that is, defining what the vertices and edges are. This setting is variable and usually depends on the problem at hand. For texture modeling, the technique usually employed is to consider each pixel of the image as a vertex and to build an undirected, weighted network. Consider a grayscale image with $w \times h$ pixels, with intensity levels between $0$ and $L$ ($L$ is the highest possible intensity value in the image). A network is obtained by constructing one vertex per pixel. The first work [10] considers the absolute difference of intensity between pixels to define the weight of their connection. It is important to note that the intensity difference is not affected by changes in the average illumination of the image [18]. Given a radius $r$ that defines a spatial boundary window, a network is obtained where each vertex is connected to its neighbors with weight given by the normalized absolute intensity difference, provided that their Euclidean distance on the image does not exceed $r$. This same modeling approach is also used by the methods proposed in [20, 19]. The connection weight inversely represents pixel similarity: lower values mean higher similarity. However, this equation does not include spatial information in the connection weight, which led later works to propose new rules. In [4] the pixel distance was included in the calculation of the edge weight, giving equal importance to the pixel intensity difference and to its spatial position inside the connection neighborhood. A different approach is introduced in [45], where intensity and distance are directly proportional. The inclusion of the spatial information overcomes the limitation of previous methods, where the connection weight towards pixels with the same intensity would be the same regardless of their distance to the central pixel.

The steps described so far result in a network whose scale is proportional to $r$, which limits the connection neighborhood. However, this is a regular network, as all vertices have the same number of connections (except for border vertices). Therefore, a transformation is needed in order to obtain a network with relevant topological information. In most works this is achieved through connection thresholding with an additional parameter $t$: a new network is obtained by keeping only the edges whose weight does not exceed $t$. The resulting network then keeps similar pixels connected, where $t$ controls the similarity level. It is intuitive that the resulting topology is directly influenced by the parameters $r$ and $t$. This allows a complete analysis of the network's dynamic evolution from smaller to larger neighborhoods and different levels of pixel similarity. The final texture characterization is then made through CN topological measures such as the vertex degree, strength, and others [14].
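A minimal sketch of this classical grayscale modeling, with assumed parameter values and our own loop structure (not code from the cited works): pixels within radius $r$ are connected with normalized intensity-difference weights, and only edges with weight at most $t$ are kept.

```python
import numpy as np

def model_network(img, r=2.0, t=0.2, L=255):
    """Return undirected edges (pixel, pixel, weight) of the thresholded network."""
    h, w = img.shape
    edges = []
    rr = int(r)
    for y in range(h):
        for x in range(w):
            for dy in range(-rr, rr + 1):
                for dx in range(-rr, rr + 1):
                    if (dy, dx) <= (0, 0):            # count each pixel pair once
                        continue
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    if dy * dy + dx * dx > r * r:     # outside the radius window
                        continue
                    weight = abs(int(img[y, x]) - int(img[ny, nx])) / L
                    if weight <= t:                   # thresholding step
                        edges.append(((y, x), (ny, nx), weight))
    return edges

img = np.array([[0, 10], [250, 10]], dtype=np.uint8)
print(len(model_network(img, r=1.5, t=0.1)))  # 3 of the 6 candidate pairs survive
```

Raising $t$ keeps more dissimilar pairs connected; the per-pixel cost grows with the window area, which motivates the complexity discussion later in the text.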
The concepts of CNs applied to texture analysis have been explored and improved in more recent works. In [44] a new multilayer model is introduced for color-texture analysis, where each network layer represents an image color channel and its topology contains within- and between-channel connections in a spatial fashion. This work also proposes a new method for estimating optimal thresholds, and introduces the use of the vertex clustering coefficient for network characterization, achieving promising results. In [16] an interesting technique is proposed to build a vocabulary learned from CN properties and to characterize the detected key points through CNs, exploring the relevance of various topological measures.
We propose a new network model for color-texture characterization with several improvements over previous CN-based methods. Our method models the spatial relation of intra- and inter-channel pixels through a directed CN that we named Spatio-Spectral Network (SSN). Firstly, consider a color image with $w \times h$ pixels and $c$ color channels, whose intensity values range from $0$ to $L$, and a network defined by a tuple $N = (V, E)$, where $V$ represents the network vertices and $E$ its edges. The vertex set is created as in [44], where each image pixel is mapped to a vertex for each color channel, thus $|V| = whc$. This creates a multilayer network, where each layer represents one image color channel and each vertex carries a pair of coordinates $(x, y)$, indicating the position of the pixel that it represents, and a value indicating the pixel intensity in its respective color channel.
In order to create the connections, previous works usually adopt a set of radii $R$ to limit the size of the vertex neighborhood, i.e. a set of sliding windows of radius $r \in R$ defines a distance limit for vertices to be connected, resulting in a set of networks. In other words, a vertex is connected to a pixel if the pixel lies inside its neighborhood, defined by the 2D Euclidean distance between the vertices. As in [44], this neighborhood covers vertices in all color channels, because only the spatial position of the pixels is considered to compute the distance. This process gives access to the dynamic evolution as $r$ increases by analyzing each network, and has been demonstrated to be effective for color-texture characterization [44]. On the other hand, one can notice that for a set of increasing radii, each neighborhood contains all neighborhoods of the previous radii, which leads to redundancy between the networks.
In this context, we propose a radially symmetric modeling criterion by redefining the neighborhood so that it considers only vertices at a distance between the previous radius and the considered radius $r$. It is important to notice that we include vertices with distance 0 in the neighborhood, which is the case when a pixel connects to itself in different color channels. The neighborhood definition can also be formulated as a combination of the distance $r$ and the number of neighbors $P$, which is usually adopted in LBP-based methods [21]. In our case, as the neighborhood covers all the color channels, for radius 1 a vertex connects to 4 neighbors in each channel plus itself in the other channels; analogously, radius 2 connects it to the next ring of neighbors, and so on. To illustrate this concept, Figure 1 shows a CN modeled for 1 image channel and highlights the difference between the standard (a-b) and the radially symmetric (c-d) neighborhoods. In our proposal, the radially symmetric neighborhood is then extended to all image channels (e).
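The radially symmetric ring can be generated by keeping only the pixel offsets whose distance falls between the previous and the current radius. This sketch (our illustration, assuming consecutive integer radii) reproduces the within-channel neighbor counts mentioned above; the cross-channel self-connection at distance 0 is handled separately:

```python
import math

def symmetric_offsets(r):
    """Offsets (dy, dx) with distance d satisfying r-1 < d <= r."""
    prev = r - 1
    offsets = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d = math.hypot(dy, dx)
            if prev < d <= r:
                offsets.append((dy, dx))
    return offsets

print(len(symmetric_offsets(1)))  # 4: the 4-connected ring around the pixel
print(len(symmetric_offsets(2)))  # 8: the ring of pixels with 1 < d <= 2
```

Because consecutive rings are disjoint, each network in the multiscale set covers new pixels only, removing the redundancy of nested circular windows.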


The definitions given so far concern only the pixel neighborhood in which vertices will connect; the next step is to define the connection creation. The weight of the connection between a pair of vertices is defined by their absolute intensity difference made directly proportional to their Euclidean distance on the image:
$$w(i, j) = \frac{\left(|I_i - I_j| + 1\right)\left(d(i, j) + 1\right)}{(L + 1)(r + 1)} \qquad (4)$$
This is a modification of the equations proposed in [45, 44] so that neither side of the multiplication cancels the other when the intensity difference or the distance is 0 (the same pixel in different channels), and so that it generates uniform values in the unit interval. It is important to notice that, given the proposed neighborhood, creating connections from all channels to all channels with this connection weight implies that the network performs a kind of opponent color processing in a spatial fashion. Therefore, the SSN is a combination of CNs and a bio-inspired approach based on the Opponent-Process Theory [17], but it considers the opposing color pairs red versus green, red versus blue, and green versus blue instead of red versus green and blue versus yellow. If a color space other than RGB is used, different opposing pairs will be considered according to that space.
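One way to realize the stated properties in code (a hypothetical sketch: the "+1" shifts and the exact normalization constants are our assumption, chosen so that neither factor cancels at 0 and the product stays in the unit interval):

```python
def weight(intensity_i, intensity_j, dist, L=255, r=2.0):
    # Shift both factors by 1 so a zero difference or zero distance
    # (same pixel in different channels) still yields a nonzero weight;
    # divide by the maxima so the result lies in (0, 1].
    return ((abs(intensity_i - intensity_j) + 1) * (dist + 1)) / ((L + 1) * (r + 1))

print(weight(0, 0, 0.0))    # same pixel across channels: small but nonzero
print(weight(255, 0, 2.0))  # maximal difference at maximal distance: 1.0
```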
As previously mentioned, most CN-based approaches employ weighted undirected networks, which need thresholding techniques in order to obtain relevant topological information for analysis. This happens because a network that connects all pixels inside a fixed radius has constant topological measures such as degree, strength, clustering, etc. However, thresholding leads to additional parameters that need to be tuned, which has been a drawback of previous works, where authors explored costly techniques for optimal threshold selection [45, 44]. In this sense, we propose a directed technique that eliminates the need for thresholding, reducing the method parameters to only the set of radii $R$. This is achieved by associating the direction of a connection with the direction of the gradient, i.e. pointing towards the pixel of higher intensity. This idea was first employed in previous work for grayscale texture characterization [40]; here we extend its definitions to multilayer networks of color images, along with our new connection weight equation (Equation 4). Consider a network $N_r$; its set of edges $E_r$ is defined by
$$E_r = \left\{ (i, j) \mid v_j \in N_r(v_i),\; I_i \leq I_j \right\} \qquad (5)$$
and when $I_i = I_j$ the edge is bidirectional (both directions exist). This process generates a network that contains relevant topological information reflected in its directional connection patterns, which can be quantified through directed measures such as the input and output degree and strength of the vertices, eliminating the need for connection cutting.
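The gradient-based direction rule can be sketched as a small helper (our illustration): the edge points toward the pixel of higher intensity, and equal intensities produce a bidirectional edge.

```python
def directed_edges(i, j, intensity_i, intensity_j, w):
    """Return the directed edge(s) between vertices i and j with weight w."""
    if intensity_i < intensity_j:
        return [(i, j, w)]          # i -> j, towards the higher intensity
    if intensity_j < intensity_i:
        return [(j, i, w)]          # j -> i
    return [(i, j, w), (j, i, w)]   # equal intensities: bidirectional edge

print(directed_edges('a', 'b', 10, 200, 0.5))  # single edge a -> b
```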
The spatio-spectral nature of our network emerges from the patterns of within- and between-channel connections, which include information about the image gradient through the edge directions. To highlight this information, we divide the original network as in [44], obtaining two additional networks: the first, whose edges are the subset of $E$ containing within-channel connections (edges between vertices of the same channel/layer), and the second, containing the between-channel connections (edges between vertices of different layers). The vertex sets of the three networks are the same, as we only divide the edges. Figure 1 (d, e, and f) illustrates the structure of the 3 networks for a given radius. By quantifying their topology, it is possible to obtain rich color-texture information for image characterization, as we discuss in the following.
To characterize the directed SSN, we propose the use of traditional centrality measures computed for each vertex considering its edge directions, namely the input and output degree and strength (see Equation 3). These measures can be computed efficiently during the network modeling, as only the neighbors of a vertex must be visited for the calculation; therefore there is no significant additional cost for the network characterization. The cost of our approach is thus smaller than that of the method of [44], which uses the vertex clustering coefficient and therefore needs to store the network in order to visit the neighbors of the neighbors of each vertex.
Notice that the input and output degrees of a vertex sum to the maximum number of possible connections in its neighborhood. This means that the input and output degree are a linear transformation of each other as a function of the network's maximum degree, so we use only the input degree. The same assumption cannot be made for the vertex strength, as it also depends on the distribution of image pixel values in the connection neighborhood; thus we use both the input and output strength. In this context, the network characterization is performed by combining three measures (input degree, input strength, and output strength). Each topological measure highlights different texture patterns of the modeled image, as we show in Figure 2 for each network. We consider as a set of feature maps all the topological information obtained through the characterization of the three networks with the aforementioned 3 centrality measures in a multiscale fashion, given a set of increasing radii. In the qualitative analysis shown in Figure 2, it is possible to notice how the feature maps highlight a wide range of color-texture patterns.
The feature maps obtained from the characterization of the SSN must be summarized in order to obtain a single, compact image descriptor. From a network science perspective, it is possible to obtain the probability distribution of the three centrality measures, which is a common approach for network characterization. We propose to compute the distributions for each layer of the three networks separately; therefore, one distribution per measure is obtained from each layer of each network. It is intuitive that a separate analysis should provide more specific information about patterns occurring in each network layer, and this technique indeed improved our results compared to using the whole network as proposed in [44]. For the exact computation of the probability distribution function of each network layer, we fix the number of bins for counting the input degree occurrences as the maximum possible degree; for the strength measures, the bin number is the maximum possible degree multiplied by 10. Figure 3 shows the distributions of an SSN according to the proposed layer-wise analysis, where it is possible to notice a clear distinction between the two different input textures. The topological arrangement of the network varies greatly according to the input texture and the layer being analyzed; however, it is possible to notice a power-law-like behavior in some cases. As a matter of comparison, the multilayer networks of [44] seem to present a similar topology for different image inputs, with variations of the small-world effect and the occurrence of power-law-like degree distributions. Here, on the other hand, the directed SSN presents heterogeneous arrangements, where the network structure varies greatly between different textures, which then provides better topological measures for characterization. This difference from the previous work [44] is mostly due to the use of connection directions, the radially symmetric neighborhood, and the layer-wise network characterization.
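The layer-wise distributions amount to normalized histograms of a measure over the vertices of one layer. A sketch with the bin counts described above (our illustration; the `max_degree` value is an arbitrary example):

```python
import numpy as np

def measure_distribution(values, max_degree, strength=False):
    """Probability distribution of a vertex measure for one network layer."""
    bins = max_degree * 10 if strength else max_degree  # bin counts from the text
    hist, _ = np.histogram(values, bins=bins, range=(0, max_degree))
    return hist / hist.sum()  # normalize the counts to probabilities

degrees = np.array([3, 3, 4, 5, 5, 5])
p = measure_distribution(degrees, max_degree=13)
print(p.sum())  # a valid probability distribution sums to 1
```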




Although the distributions summarize the network topology, it is still impracticable to apply them directly as an image descriptor, because the higher the radius $r$, the larger their combined size; e.g., for a single network of an RGB image, the size would be (19 + 190 + 190) × 3 = 1197 (the size of each distribution obtained for each of the 3 image channels). Therefore, we propose the use of statistical measures to further summarize the SSN structure. We employ the four measures proposed in [44] (among them the mean, energy, and entropy) plus the third and fourth statistical moments, i.e. the skewness
$$\gamma = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^3}{\sigma^3} \qquad (6)$$
and the Kurtosis
$$\kappa = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^4}{\sigma^4} \qquad (7)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the $n$ values $x_i$.
The combination of the six statistics for each of the three topological measures of each layer comprises a network descriptor. Consider a network and its set of vertices as the union of the vertices of each of its layers; then a vector $\psi_{l}^{m}$ collects the six statistics for a given layer $l$ and measure $m$. The concatenation of these vectors over the measures of each layer composes the network descriptor

$$\Psi(N) = \left[\psi_{1}^{k^{in}}, \psi_{1}^{s^{in}}, \psi_{1}^{s^{out}}, \ldots, \psi_{c}^{k^{in}}, \psi_{c}^{s^{in}}, \psi_{c}^{s^{out}}\right] \qquad (8)$$
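The six statistics can be computed directly from one measure distribution. The sketch below is our illustration; we assume the unnamed fourth statistic of [44] is the standard deviation, which is an assumption on our part, and we treat the bin indices as the measure values:

```python
import numpy as np

def six_stats(p, eps=1e-12):
    """Mean, std, energy, entropy, skewness, kurtosis of a distribution p.
    The standard deviation as the fourth [44] statistic is our assumption."""
    x = np.arange(len(p))                       # bin indices as measure values
    mu = np.sum(x * p)
    sigma = np.sqrt(np.sum((x - mu) ** 2 * p))
    energy = np.sum(p ** 2)
    entropy = -np.sum(p * np.log2(p + eps))
    skew = np.sum(((x - mu) / (sigma + eps)) ** 3 * p)   # Equation (6)
    kurt = np.sum(((x - mu) / (sigma + eps)) ** 4 * p)   # Equation (7)
    return np.array([mu, sigma, energy, entropy, skew, kurt])

p = np.array([0.25, 0.25, 0.25, 0.25])
print(six_stats(p)[0])  # mean of a uniform distribution over 0..3 is 1.5
```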
Considering the multiscale approach, we propose the use of a set of radii increasing one by one up to a maximum radius, to maintain the proportion of each symmetric neighborhood. Therefore, the final descriptor, which addresses the dynamic evolution of each network, is obtained by concatenating the network descriptors over all radii. The combination of the descriptors from the whole network and from the within- and between-channel networks results in a complete representation that comprises the whole spatio-spectral information
$$\Phi_R(N) = \left[\Psi_{r_1}(N), \Psi_{r_2}(N), \ldots, \Psi_{r_{|R|}}(N)\right] \qquad (9)$$

$$\Phi = \left[\Phi_R(N), \Phi_R(N^{w}), \Phi_R(N^{b})\right] \qquad (10)$$

where $N^{w}$ and $N^{b}$ denote the within- and between-channel networks, respectively.
The cost of the proposed method is directly proportional to the image size and to the largest radius used, which defines the window size for connection creation. The last radius indicates that all pixels up to that distance must be visited to compose the neighborhoods, according to the radially symmetric neighborhood previously defined. Therefore, consider an image with $w \times h$ pixels and $c$ channels ($|V| = whc$), and let $n(R)$ be the number of pixels contained in all neighborhoods for a set of radii $R$. The cost of modeling the image as an SSN is then
$$O\left(whc \cdot n(R)\right) \qquad (11)$$
considering that the modeling of the three networks is performed together, as it is only necessary to verify the pixel channel in order to determine to which network a connection belongs. As previously mentioned, the overall implementation of the proposed method is similar to that of the previous work [44], but without the cost of computing the clustering coefficient of each vertex. The network measures are computed during the modeling step with no additional cost, as there is no need to store the network. The characterization cost is then related to the computation of each measure's distribution and its statistics, which is much smaller than the modeling cost. Therefore, the asymptotic limit of the proposed method is the one defined in Equation 11. As the time-consumption experiment presented in [44] shows, this approach is faster than recent deep convolutional networks.
This section presents the experiments performed to evaluate the proposed SSN under different scenarios and to compare our results with other methods from the literature. We perform a supervised classification scheme using the Linear Discriminant Analysis (LDA) classifier [41]
, which consists of finding a linear combination of characteristics where the variance between classes is greater than the intra-class variance. The performance is measured by the accuracy of leave-one-out cross-validation, which repeats the train-test procedure once per sample of the dataset, using one sample for testing and the remainder for training at each iteration (each sample is used as test exactly once). The accuracy is then the percentage of correctly classified samples. The following color texture datasets from the literature are used:
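The evaluation protocol can be sketched with scikit-learn (an assumption on our part; the paper does not specify the tooling used for classification), combining the LDA classifier with leave-one-out cross-validation:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def leave_one_out_accuracy(features, labels):
    """Leave-one-out cross-validation with the LDA classifier:
    each sample is the test set exactly once; returns percent accuracy."""
    features, labels = np.asarray(features), np.asarray(labels)
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(features):
        clf = LinearDiscriminantAnalysis()
        clf.fit(features[train_idx], labels[train_idx])
        correct += int(clf.predict(features[test_idx])[0] == labels[test_idx][0])
    return 100.0 * correct / len(labels)
```

Note that refitting the classifier once per sample is expensive for large datasets, which is one reason leave-one-out is mostly used on datasets of a few thousand samples, as here.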
USPtex: This dataset [3] was built by the University of São Paulo and contains 191 classes of natural colored textures found on a daily basis. The original 512x384 images are divided into 12 non-overlapping samples of size 128x128, totaling 2292 images.
Outex13: The Outex framework [36] was proposed for the empirical evaluation of texture analysis methods and consists of several different sets of images. The Outex13 dataset (test suite Outex_TC_00013, available at www.outex.oulu.fi) focuses on texture analysis considering color as a discriminative property. The dataset contains 1360 images of size 200x200 divided into 68 classes, that is, 20 samples per class.
MBT: The MultiBand Texture dataset [1] is composed of 154 colored texture classes formed by the combined effect of spatial variations within and between channels. This type of pattern appears in images with high spatial resolution, which are common in areas such as astronomy and remote sensing. Each of the 154 original images, 640x640 in size, is divided into 16 non-overlapping samples of size 160x160, composing a set of 2464 images.
CUReT: The Columbia-Utrecht Reflectance and Texture dataset [15] is composed of colored images of materials. It contains 61 classes with 92 samples each, with a wide variety of geometric and photometric properties as intra-class variations in rotation, illumination, and viewing angle. Classes represent surface textures of various materials such as aluminum foil, artificial grass, gypsum, concrete, leather, and fabrics, among others.
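The patch extraction used to build the USPtex and MBT samples (non-overlapping fixed-size windows cut from each original image) can be sketched as:

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split an image array (H, W, C) into non-overlapping square
    patches, as done to build USPtex (128x128) and MBT (160x160) samples."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches
```

A 512x384 USPtex image yields 4 x 3 = 12 patches of size 128x128, matching the dataset description above.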
We analyze the proposed method in terms of its radius parameter and the impact of using different feature vectors built from the combination of the modeled networks. Figure 5 shows how the accuracy rate obtained with each feature vector changes as the radius increases. The performance grows up to a certain point and then stabilizes, except on Outex13, where it starts to drop beyond a certain radius. This indicates that small radii are not sufficient for a complete multiscale texture characterization, and also that too-large radii on Outex13 can introduce feature redundancy as the number of descriptors grows. On the other hand, performance peaks are achieved at larger radii on USPtex and CUReT, but with only a small difference in comparison to smaller values. Therefore, a more stable region lies at intermediate radii if we consider all datasets. Regarding the feature vectors, the performance of each network alone varies between datasets, with one network performing better on USPtex and MBT and the other on Outex13 and CUReT. On the other hand, the two combinatorial vectors present the highest results in most cases, which is to be expected as they join measures from both the within- and between-channel analyses. We then suggest the use of the combinatorial feature vectors in order to obtain better results in different scenarios.
Table 1 shows the average accuracy (over the four datasets) of the two combinatorial feature vectors as the radius increases, along with the corresponding number of descriptors. As previously suggested, the intermediate radius interval has a smaller standard deviation, which indicates a more stable region. Moreover, the highest average performance (98.5) is obtained with 648 descriptors for the first vector, and with 648 or 810 descriptors for the second. In this context, the results suggest that these two alternatives achieve a similar average performance while using a smaller number of descriptors than larger configurations. For satisfactory performance in most scenarios, we suggest the configuration with the largest of these neighborhoods, as it provides a better multiscale analysis.
radius  descriptors  mean acc.  descriptors  mean acc.  
1  108  93.5()  162  94.5() 
2  216  96.9()  324  97.4() 
3  324  97.7()  486  98.2() 
4  432  98.1()  648  98.5() 
5  540  98.4()  810  98.5() 
6  648  98.5()  972  98.3() 
7  756  98.4()  1134  98.1() 
8  864  98.3()  1296  97.0() 
9  972  98.4()  1458  97.4() 
10  1080  98.1()  1620  97.4() 
The following literature methods are considered for comparison: OpponentGabor [28] (264 descriptors), Local Phase Quantization (LPQ) [39] (integrative, 768 descriptors), Completed Local Binary Pattern (CLBP) [21] (integrative, 354 descriptors), Complex Network Traditional Descriptors (CNTD) [4] (108 descriptors for grayscale and 324 for integrative), and Multilayer Complex Network Descriptors (MCND) [44]
(we use the best results available in the paper). We also compare our results with well-known deep convolutional neural networks (DCNNs): AlexNet [32], VGG16 and VGG19 [46], InceptionV3 [47, 48], and ResNet50 and ResNet101 [24]. The DCNNs are employed as feature extractors using architectures pre-trained on the ImageNet object recognition dataset from the Large Scale Visual Recognition Challenge
[43]. Each considered architecture and its pre-trained weights are available in the Neural Network Toolbox of Matlab 2018a (https://www.mathworks.com/help/deeplearning/ug/pretrainedconvolutionalneuralnetworks.html). Using pre-trained DCNNs achieves higher performance than training these models from scratch on the considered color texture datasets, as the results in [44] show. As each DCNN has a fixed input size, images from each dataset are resized before being processed, and the classification part (fully-connected layers) is removed. The 2D feature activation maps generated by the last convolutional layers are then used to compute image features, an approach used previously in [33, 11, 12]. To obtain a feature vector, Global Average Pooling (GAP) is applied as in [33], producing a vector of size corresponding to the number of 2D feature activation maps. However, the last convolutional layer of most DCNNs produces a high number of feature activation maps, such as 2048 for the Inception and ResNet models. In this sense, we propose to use earlier convolutional layers until the size is smaller than 800. This reduction is performed for various reasons, the most obvious being to avoid the curse of dimensionality, caused by the exponential increase in volume associated with the addition of extra dimensions to the Euclidean space
[31]. Another reason is a known effect of the LDA classifier, the singularity or Small Sample Size (SSS) problem, which happens with high-dimensional feature vectors or a small number of training images [26, 49]. It is also important to highlight that we use the raw output of the corresponding convolutional layers, i.e., we do not apply the subsequent summarizing functions such as ReLU and local poolings. This approach improves the average performance of DCNNs as texture feature extractors when compared to the results in
[44], therefore we believe it is reasonable for our purpose. In addition to the aforementioned reasons, reducing the DCNN feature vector also lowers the computational cost of the training and classification steps, while keeping a reasonable number of descriptors in comparison to other handcrafted descriptors. The exact numbers of features obtained with InceptionV3 and both ResNet models are, respectively, 768 and 512. For AlexNet and both VGG models, the size corresponds to the last convolutional layer, which is 256 and 512, respectively.

We compare the accuracy rate of the proposed approach and the aforementioned methods on the four considered datasets; results are shown in Table 2, with the highest result of each column highlighted in bold. For the USPtex dataset, the lowest results are achieved by the integrative methods LPQ, CLBP, and CNTD, followed by the MCND and the OpponentGabor descriptors. The DCNNs perform near the highest results, with small differences between the models. The highest result, 99.8, is obtained by the proposed method, and the suggested feature vector achieves the second-highest result of 99.7, equivalent to the performance obtained by ResNet50. In short, the DCNNs perform better than the previous handcrafted descriptors, but are overcome by the proposed SSN.
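The DCNN feature-extraction procedure described above (raw convolutional activation maps reduced by Global Average Pooling, walking back through the layers until fewer than 800 maps) can be sketched framework-agnostically; in real use, the arrays would be actual pre-trained network activations, and the helper names are ours:

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Global Average Pooling over a stack of 2D activation maps:
    (H, W, K) -> feature vector of length K, one value per map."""
    feature_maps = np.asarray(feature_maps, dtype=float)
    return feature_maps.mean(axis=(0, 1))

def pick_layer(layer_sizes, max_features=800):
    """Walk back from the last convolutional layer until the number of
    activation maps is below the chosen bound (800 in the experiments).
    Returns (layer index, number of maps)."""
    for depth, size in reversed(list(enumerate(layer_sizes))):
        if size < max_features:
            return depth, size
    return 0, layer_sizes[0]
```

For instance, for a ResNet-like progression of map counts [64, 256, 512, 2048], the last layer with fewer than 800 maps has 512 maps, matching the 512-dimensional vectors reported for the ResNet models.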
Method  USPtex  Outex13  MBT  CUReT  Average 
OpponentGabor (1998)  99.1  93.5  97.6  95.8  96.5() 
LPQ integrative (2008)  90.4  80.1  95.7  91.7  89.5() 
CLBP integrative (2010)  97.4  89.6  98.2  91.8  94.3() 
CNTD (grayscale) (2013)  92.3  86.8  83.7  84.2  86.8() 
CNTD integrative (2013)  97.9  92.3  98.5  91.9  95.2() 
MCND (2019) [44]  99.0  95.4  97.0  97.1  97.1() 
AlexNet (2012)  99.6  91.4  97.8  98.2  96.8() 
VGG16 (2014)  99.5  91.1  97.2  98.5  96.6() 
VGG19 (2014)  99.5  90.7  96.3  98.6  96.3() 
InceptionV3 (2016)  99.5  89.5  94.4  97.3  95.2() 
ResNet50 (2016)  99.7  91.5  94.9  98.7  96.2() 
ResNet101 (2016)  99.5  91.3  94.6  98.8  96.1() 
SSN ()  99.7  96.6  98.6  98.9  98.5() 
SSN ()  99.8  95.2  98.3  99.1  98.1() 
SSN ()  99.5  96.8  99.0  98.6  98.5() 
SSN ()  99.6  96.6  99.0  98.9  98.5() 
Outex13 is the hardest dataset in terms of color texture characterization, as the results show. Moreover, on this dataset the performance of the DCNNs drops considerably, to between 89.5 and 91.5. The integrative LPQ and CLBP and the grayscale approach of CNTD present the lowest accuracies, with 80.1, 89.6, and 86.8, respectively. On the other hand, the integrative CNTD method performs above the DCNNs (92.3), and the OpponentGabor method overcomes both the integrative and the DCNN methods, achieving 93.5. The highest performances are obtained by the proposed SSN and the MCND, with 96.8 and 95.4, respectively. The suggested feature vector achieves 96.6, which also overcomes the other methods, an improvement of 1.2 over MCND and 5.1 over the best DCNN, ResNet50. The results of the best methods on this dataset (OpponentGabor, MCND, and SSN) corroborate the importance of CNs and of the within-between channel analysis for color texture characterization.
On the MBT dataset, we see yet another performance pattern among the literature approaches. First, the large performance difference between the grayscale and the integrative CNTD methods (83.7 and 98.5) indicates that color plays an important role in MBT texture characterization. The DCNNs AlexNet and VGG16 achieve around 97-98 accuracy, similar to the OpponentGabor approach. On the other hand, the deeper DCNNs (Inception and ResNet) achieve the lowest results, around 94.5. The highest accuracy, 99.0, is obtained by the proposed method at its best configuration and with the suggested feature vector.
The last dataset, CUReT, is the largest in terms of number of samples per class, and we again notice a different performance pattern among the literature methods. On this dataset, the depth of the DCNNs seems to benefit performance, as observed for VGG and ResNet, which may be related to the dataset size. The integrative methods present the lowest performances, around 91.8, below OpponentGabor and MCND (95.8 and 97.1, respectively). The DCNN descriptors perform above the previous methods, with the deepest network analyzed, ResNet101, achieving 98.8. The proposed method again overcomes the other approaches: the highest accuracy rate, 99.1, is achieved by one of its configurations, while the suggested configuration achieves 98.9.
The last column of Table 2 shows the average accuracy of each method over all datasets (standard deviation in brackets). The proposed SSN achieves the highest average performance for every configuration, as well as the lowest standard deviations, corroborating its robustness across different color texture scenarios. The DCNNs overcome the previous handcrafted descriptors; however, their high standard deviations reflect their performance drops on the Outex13 and MBT datasets. Similarly, the integrative descriptors also present varying performance, as their standard deviations show. Regarding the average performance of the OpponentGabor method, we can conclude that opponent color processing allows a better characterization in some cases: this method outperforms the integrative methods and performs close to the DCNNs. This corroborates the benefits of the opponent color analysis from which the proposed method also profits; however, our approach keeps a higher performance and a lower oscillation between datasets than OpponentGabor. Considering the CN-based methods, the CNTD approach also has an oscillating performance and is overcome by the MCND, which has the second-highest average performance and the second-lowest standard deviation, behind only the proposed method. Overall, the results presented here corroborate the effectiveness of SSN for color texture analysis in various scenarios, where it keeps the highest performance in all cases while most of the other literature methods oscillate.
In color texture analysis, the image representation in different color spaces may impact characterization performance. In this context, this section presents a set of experiments varying the color space of the four considered datasets and comparing the performance of the proposed method and other literature approaches. The MCND approach is not included in this analysis because the available results concern only the best configuration of the method for each dataset separately, in the RGB space. Four color spaces are considered for analysis, one for each different approach: RGB, LAB, HSV, and a fourth space; these techniques are described in depth in Section 2.2. As some color spaces use different value ranges for different channels, we normalize their values to a common range. First, we analyze the performance on the USPtex and Outex13 datasets; the accuracies of leave-one-out cross-validation with the LDA classifier are shown in Table 3. It is possible to observe how different color spaces influence each method in different ways. For example, the integrative methods had increased performance in the HSV space, while the opposite happens with the DCNNs. In fact, this effect on the convolutional networks also happens in the other non-RGB color spaces, albeit with lower intensity. This is somewhat expected, since these networks are pre-trained on a set of RGB images. The two methods with the greatest robustness to the different color spaces were, respectively, the proposed SSN and the OpponentGabor method. This indicates that the opponent color processing approach allows a greater tolerance to the different ways of representing the colors of the images. The proposed method incorporates these characteristics and, at the same time, increases the invariance to color spaces, reaching the highest average performance and the lowest standard deviation among the compared methods. In addition, the SSN method also achieves the highest results in each color space individually.
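The per-channel normalization applied before the color-space experiments can be sketched as follows; the exact target range is an assumption here, as is the min-max strategy:

```python
import numpy as np

def normalize_channels(image):
    """Rescale each channel independently to a common range (assumed
    [0, 1] here) so that color spaces with heterogeneous channel ranges,
    such as HSV or LAB, are comparable before descriptor computation."""
    image = np.asarray(image, dtype=float)
    out = np.empty_like(image)
    for c in range(image.shape[2]):
        ch = image[..., c]
        span = ch.max() - ch.min()
        # Constant channels map to zero to avoid division by zero.
        out[..., c] = (ch - ch.min()) / span if span > 0 else 0.0
    return out
```

Normalizing per channel rather than per image prevents a wide-ranged channel (e.g. LAB's L) from dominating the intensity differences that define the network weights.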
Method  RGB  LAB  HSV  Average  
USPtex  LPQ i.  90.4  95.0  96.6  94.6  94.1() 
CLBP i.  97.4  98.6  98.6  98.5  98.3()  
OpponentGabor  99.1  99.0  97.9  99.3  98.8()  
CNTD i.  97.9  98.3  99.1  98.2  98.4()  
AlexNet  99.6  99.0  94.7  99.3  98.2()  
VGG16  99.5  98.6  94.4  98.7  97.8()  
VGG19  99.5  98.2  92.1  98.6  97.1()  
InceptionV3  99.5  97.7  94.7  98.1  97.5()  
ResNet50  99.7  98.2  92.5  98.5  97.2()  
ResNet101  99.5  98.0  92.8  98.5  97.2()  
SSN ()  99.7  99.7  99.4  99.4  99.6()  
Outex13  LPQ i.  80.1  74.8  78.2  76.0  77.3() 
CLBP i.  89.6  86.8  88.2  86.6  87.8()  
OpponentGabor  93.5  91.3  91.7  91.3  91.9()  
CNTD i.  92.3  90.6  94.0  90.6  91.9()  
AlexNet  91.4  91.6  91.0  91.1  91.3()  
VGG16  91.1  89.2  87.6  89.8  89.4()  
VGG19  90.7  89.1  85.9  90.8  89.1()  
InceptionV3  89.5  85.0  84.2  85.3  86.0()  
ResNet50  91.5  88.7  86.8  89.0  89.0()  
ResNet101  91.3  90.1  86.4  88.8  89.2()  
SSN ()  96.6  94.3  95.7  94.6  95.3()  
In the Outex13 dataset, the methods with the lowest performance are, respectively, LPQ and InceptionV3. The other DCNNs, except for AlexNet, average around 89. The CLBP method has its performance reduced in the LAB and the fourth color spaces, resulting in an average below the convolutional networks, of 87.8. The integrative CNTD overcomes the other integrative methods, obtaining an average of 91.9, a result also approximately matched by the OpponentGabor method and the AlexNet DCNN. This shows that traditional integrative methods have limitations on this dataset when different color spaces are considered: the CLBP method, which benefits from changes of color space on USPtex, does not on Outex13. The methods with the greatest robustness to the different color spaces on this dataset are again the proposed SSN and the OpponentGabor method, with average performances of 95.3 and 91.9, respectively. In this context, and considering the results obtained by the OpponentGabor and the integrative CNTD methods, it is possible to note that CN and opponent techniques present the best performances on this dataset, and the SSN method, which combines these properties, achieves the highest results in all color spaces.
Table 4 shows the obtained results for the MBT and CUReT datasets. Regarding MBT, the highest average results are 97.3 from the SSN method, 96.6 from the integrative CNTD approach, and 96.3 obtained by both the OpponentGabor and CLBP methods. On this dataset, the average performance of the deepest networks (InceptionV3 and both ResNets) is the lowest observed, close to 92, while the smaller networks AlexNet and VGG16 obtain, respectively, 95.6 and 94.7. The high standard deviation of the DCNNs is due to a sharp performance drop in the HSV color space, with losses of almost 10. Differently from what is observed on the other datasets, this also happens with all other methods, but with smaller intensity compared to the DCNNs. In this aspect the proposed method proved more robust, achieving the highest performance under HSV and also the lowest variation between color spaces. The results obtained on the CUReT dataset corroborate the robustness of the proposed method to different color spaces. The highest result is obtained by SSN in all color spaces, with an average performance of 98.7. On this dataset, the DCNNs ResNet50 and ResNet101 outperform the integrative methods in the individual LAB and fourth color spaces and, despite again presenting a loss in the HSV space, their average performance overcomes the other literature methods. In the HSV space, the best results are obtained by the proposed method and by the OpponentGabor method. In general, on this dataset both the integrative methods and the convolutional networks have the highest standard deviations, indicating that their performance oscillates more between color spaces. The methods with the lowest performance oscillation are the proposed SSN and the OpponentGabor method, respectively, reinforcing the robustness of the opponent color processing and the CN-based approach.
Method  RGB  LAB  HSV  Average  

MBT  LPQ i.  95.7  94.2  90.6  93.9  93.6() 
CLBP i.  98.2  95.9  94.6  96.6  96.3()  
OpponentGabor  97.6  96.9  93.9  96.7  96.3()  
CNTD i.  98.5  96.7  93.3  97.8  96.6()  
AlexNet  97.8  98.1  88.4  98.2  95.6()  
VGG16  97.2  97.0  88.5  96.3  94.7()  
VGG19  96.3  95.8  87.9  95.8  94.0()  
InceptionV3  94.4  95.0  85.8  93.1  92.1()  
ResNet50  94.9  94.2  85.1  93.6  91.9()  
ResNet101  94.6  94.1  85.3  92.7  91.6()  
SSN ()  98.6  97.1  95.6  98.1  97.3()  
CUReT  LPQ i.  91.7  94.8  96.3  93.0  93.9() 
CLBP i.  91.8  94.2  96.2  93.9  94.0()  
OpponentGabor  95.8  95.6  97.4  95.3  96.0()  
CNTD i.  91.9  92.7  95.4  91.2  92.8()  
AlexNet  98.2  96.6  92.9  96.8  96.1()  
VGG16  98.5  97.6  93.7  97.4  96.8()  
VGG19  98.6  97.9  92.3  97.5  96.6()  
InceptionV3  97.3  95.5  89.4  95.0  94.3()  
ResNet50  98.7  97.3  93.9  97.8  96.9()  
ResNet101  98.8  98.0  93.4  97.9  97.0()  
SSN ()  98.9  98.4  98.9  98.6  98.7() 
To analyze the overall performance of the methods on each color space separately, we compute the average accuracy over all datasets; results are shown in Figure 6 for the seven best methods. SSN has the highest average performance in all color spaces. The DCNN AlexNet achieves the second-best performance, overcoming the other compared methods in the RGB, LAB, and fourth color spaces, but performs poorly in the HSV color space, along with the other DCNNs. On the other hand, the proposed method overcomes the other approaches by the largest margin in the HSV color space.
The color space experiment shows that the proposed SSN is highly robust both across datasets and across different color spaces within a single dataset. In all scenarios, its performance variation is significantly smaller than that of the other literature methods, while it also achieves the highest results. Therefore, considering all the obtained results, SSN stands out as a very effective approach for color texture characterization in a wide range of scenarios, covering different color texture properties (given the heterogeneity of the datasets) and different color space representations.
This work introduces a new single-parameter technique for color texture analysis, consisting of the modeling and characterization of a directed Spatio-Spectral Network (SSN) built from the image color channels. Each pixel in each channel is considered a network vertex, resulting in a multilayer structure. Connections are created according to the proposed radially symmetric neighborhood, given a distance-limiting radius parameter. Directed connections within and between color channels point towards the pixels of higher intensity, and the connection weight is a normalized product of the intensity difference and the pixels' Euclidean distance. This process results in a rich network with deep spatio-spectral properties, capturing a wide range of texture patterns related to the combined spatial and spectral intensity variations. The network is then quantified with topological measures following a complete characterization procedure, considering different measures and connection types and analyzing each network layer separately. The whole process results in a compact and effective image descriptor for color texture characterization.
We performed classification experiments on four datasets (USPtex, Outex13, MBT, and CUReT) to evaluate the proposed SSN and to compare its performance with other methods from the literature. Twelve methods are considered for comparison, including Gabor filters on opponent color channels, integrative versions of grayscale texture descriptors (LPQ, CLBP, and CNTD), CN-based methods (CNTD and MCND), and DCNNs (AlexNet, VGG, Inception, and ResNet). Results show that the proposed approach has the highest performance on all datasets and the smallest variation between datasets, corroborating its robustness to different scenarios. We also evaluated the impact of different color spaces (RGB, LAB, HSV, and a fourth space) on color texture characterization, which shows that SSN also has a higher average performance and a higher tolerance to each color space than the other compared literature methods. The obtained results suggest that the spatio-spectral approach, combined with the flexibility of complex networks to model and characterize real phenomena, is a powerful technique for color texture analysis, and its properties should be further explored.
L. F. S. Scabini acknowledges support from CNPq (grants #134558/2016-2 and #142438/2018-9). L. C. Ribas gratefully acknowledges the financial support of grants #2016/23763-8 and #2016/18809-9, São Paulo Research Foundation (FAPESP). O. M. Bruno acknowledges support from CNPq (grant #307897/2018-4) and FAPESP (grants #2014/08026-1 and #2016/18809-9). The authors are also grateful to the NVIDIA GPU Grant Program for the donation of the Quadro P6000 and Titan Xp GPUs used in this research.