Code for Spatio-Spectral Networks (SSN), a method to compute color-texture descriptors from images
Texture has been one of the most studied visual attributes for image characterization since the 1960s. However, most hand-crafted descriptors are monochromatic, focusing on grayscale images and discarding the color information. In this context, this work focuses on a new method for color-texture analysis that considers all color channels in a more intrinsic approach. Our proposal consists of modeling color images as directed complex networks that we named Spatio-Spectral Network (SSN). Its topology includes within-channel edges that cover spatial patterns throughout individual image color channels, while between-channel edges tackle spectral properties of channel pairs in an opponent fashion. Image descriptors are obtained through a concise topological characterization of the modeled network in a multiscale approach with radially symmetric neighborhoods. Experiments with four datasets cover several aspects of color-texture analysis, and the results demonstrate that SSN outperforms all the compared literature methods, including known deep convolutional networks, and also has the most stable performance across datasets, achieving an average accuracy of 98.5 (±1.1) against 97.1 (±1.3) for MCND and 96.8 (±3.2) for AlexNet. Additionally, an experiment verifies the performance of the methods under different color spaces, where the results show that SSN also has higher performance and robustness.
Texture is an abundant property in nature that allows one to visually distinguish many things, and it is present not only at our everyday scale but also at macro and micro scales, as in satellite and microscopy imaging. There are various formal definitions of texture; for instance, according to Julesz , two textures are considered similar if their first- and second-order statistics are similar. Texture can also be defined simply as a combination of local intensity constancy and/or variations that produce spatial patterns, roughly independently at different scales. Therefore, the challenge of texture analysis is to tackle these patterns in a multiscale manner while keeping a trade-off between performance and computational complexity (cost). This has motivated decades of study and a heterogeneous literature, ranging from mathematical to bio-inspired methods.
Color images are produced by the vast majority of current imaging devices; however, most texture analysis methods are monochromatic, i.e. they consider only one image channel or the image luminance (grayscale). The true color information is usually lost during grayscale conversion or processed separately with non-spatial approaches such as color statistics, histograms, etc. Convolutional networks have employed the whole color information for various tasks such as object recognition [32, 46, 48, 24] and texture analysis [12, 6], with promising results. However, it is important to note that even these methods do not consider the direct relation of pixels in different color channels. In fact, few works explore spatial patterns between color channels and their benefits for color-texture analysis.
The main contribution of this work is the proposal of a new method for color-texture analysis that performs a deep characterization of spatial patterns within and between color channels. This is achieved through a directed Spatio-Spectral Network (SSN) that models texture images by creating connections pointing towards the gradient in a radially symmetric neighborhood, linking pixels from the same or different channels. A radius parameter defines a local window size, and each symmetric neighborhood contained in that region provides relevant color-texture information. The characterization is done through well-known network topological measures of low computational cost, such as the vertex degree and strength distributions. The combination of different measures from the network structure provides a robust color-texture descriptor. The source code of the SSN method is available on GitHub (www.github.com/scabini/ssn). We perform classification experiments to analyze the performance of our proposed descriptor and also compare our results with several methods from the literature on 4 color-texture datasets. Moreover, we analyze the robustness of each method for different color spaces (RGB, LAB, HSV and I1I2I3) on each dataset.
The challenge of texture analysis lies in effectively characterizing local and global texture patterns while keeping a balance between performance and complexity (cost). Decades of research have resulted in a wide range of methods [34, 27], most of which focus on grayscale images. Statistical methods explore measures based on grayscale co-occurrences in a local fashion, and the most widespread ones are the Gray Level Co-occurrence Matrices (GLCM)  and the Local Binary Patterns (LBP) [37, 38]. These methods influenced later techniques that follow the same principles. For instance, Local Phase Quantization (LPQ)  computes local descriptors in a similar way to LBP but focuses on the problem of centrally symmetric blur. The Completed Local Binary Patterns (CLBP)  include information not covered by the original LBP through the local difference sign-magnitude transform. Another approach to texture analysis consists of transforming the image into the frequency domain. There are various methods in this category, and most of them are based on Gabor filters  or the Fourier spectrum . Another category of texture analysis explores image complexity and fits the model-based paradigm; it includes methods based on fractal dimension  and Complex Networks (CN) [7, 44]. The latter approach consists of modeling images as networks and using their topological properties for texture characterization.
Color is key information for the human visual system, helping us to recognize things faster and to remember them better . The theory of opponent color processing  states that the human visual system combines the responses of different photoreceptor cells in an antagonistic manner. This happens because the wavelength ranges to which the three types of cones (L, M, and S) respond overlap. It is therefore hypothesized that the interpretation process works this way because it is more efficient for the visual system to store the differences between the responses of the cones than the response of each individual type of cone. The theory suggests that there are two opposing color channels: red versus green and blue versus yellow . In other words, according to this theory, evolution has made the human visual system focus on variations between pairs of colors rather than on individual colors as a way to improve color vision.
Primary color spaces, based on the trichromatic retinal system. These spaces assume that any color can be represented by mixing appropriate quantities of the three primary colors. The most common format is called RGB (Red, Green, and Blue). This format is widely used because most imaging devices work this way.
The luminance-chrominance color spaces, where one channel represents luminance and two channels describe the chrominance. A widely used format is known as LAB, L*a*b*, or CIELAB, a color space defined by the International Commission on Illumination (CIE) in 1976. This format is inspired by the theory of opponent color processing, since channel L represents lightness, channel A represents the red versus green component, and channel B represents the blue versus yellow component.
Perceptual color spaces, where one of the best known is HSV (hue, saturation, and value), proposed in 1978. In this model the colors of each hue are arranged in a radial slice; that is, unlike the Cartesian geometry of the spaces mentioned above, the HSV format has a cylindrical geometry. In this space, the hue (H) represents the angular dimension, with 0° for red, 120° for green and 240° for blue. The vertical axis represents the value (V), which sets the brightness level from 0 (black) at the bottom to 1 (white) at the top. The distance from the central axis to the edge of the cylinder represents the saturation (S) of the color, where the pure colors have the maximum value of S, at the edges of the cylinder.
The color spaces based on the concept of independent axes, which are obtained by statistical methods. An example of this type is the I1I2I3 color space, which consists of computing components with the lowest possible correlation. This color space is normally obtained from an RGB image through a linear transformation. In this case, channel I1 represents the brightness, channel I2 the opponent between red and blue, and channel I3 the opponent between green and red plus blue.
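The linear transformation from RGB to I1I2I3 can be sketched as follows. This is a minimal illustration, assuming the commonly cited Ohta normalization (I1 = (R+G+B)/3, I2 = (R−B)/2, I3 = (2G−R−B)/4); other scalings of the same transform exist, and the function name `rgb_to_i1i2i3` is hypothetical.

```python
import numpy as np

def rgb_to_i1i2i3(rgb):
    """Convert an RGB image (H x W x 3) to the I1I2I3 color space.

    Assumed normalization: I1 = (R+G+B)/3, I2 = (R-B)/2, I3 = (2G-R-B)/4.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i1 = (r + g + b) / 3.0          # brightness
    i2 = (r - b) / 2.0              # red versus blue opponent
    i3 = (2.0 * g - r - b) / 4.0    # green versus red plus blue opponent
    return np.stack([i1, i2, i3], axis=-1)
```

A gray pixel maps to (gray, 0, 0): only I1 is non-zero, so the two opponent channels carry chromatic information alone.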
Given an image represented in a certain color space, a method of color-texture analysis should explore ways of characterizing the spectral information present. There are different approaches to color-texture analysis in the literature, and most of them are integrative, i.e. they separate color from texture. Integrative methods commonly compute traditional grayscale descriptors of each color channel separately. In this case, any method of grayscale texture analysis can be applied by combining descriptors from each channel. Another type of approach is called pure color, which only considers the first-order distribution of the color, not taking into account the spatial aspects commonly analyzed in grayscale images. Among these methods, the most widespread are based on color histograms . This technique computes a compact summary of the color distribution of the image. However, these methods do not consider the spatial interaction of pixels, as is done in texture analysis. Therefore, a common approach is to combine pure color descriptors with grayscale descriptors in parallel methods. In [35, 9] the reader may find a comparison of different integrative, pure color and parallel methods.
In some cases, integrative methods generate results similar to traditional grayscale methods. Some authors have therefore questioned  whether the inclusion of color information is beneficial for texture analysis, arguing that color and texture should be handled separately. However, it should be noted that few techniques consider all color information in a more intrinsic approach, and that there are different concepts of color processing, even biological ones, that must be taken into account. For instance, the aforementioned opponent color processing theory of the human visual system is an interesting way of characterizing color. Since texture consists of local patterns of intensity changes, color-texture information can be extracted by analyzing the spatial relationship between different colors. One of the first works approaching this idea  explores the interaction between pairs of color channels, and its results indicate an increase in the textural information obtained, which is not available in the analysis of individual color channels. In a paper from 1998 , a technique based on filter banks is introduced for the characterization of color texture, where monochromatic and opponent channels are calculated from the output of Gabor filters. More recently, some methods based on image complexity have focused on within- and between-channel aspects of color texture, using fractal geometry  and CNs [30, 44].
Research on CNs arises from the combination of graph theory, physics, and statistics, with the aim of analyzing large networks that derive from complex natural processes. Initially, works showed that there are structural patterns in most of these networks, something that is not expected in a random network. This led to the definition of CN models that allow us to understand the structural properties of real networks. The most popular models are the scale-free  and small-world  networks. Consequently, a new line of research has been opened for pattern recognition, where CNs are adopted as a tool for modeling and characterizing natural phenomena.
The concepts of CN are applied in several areas such as physics, biology, nanotechnology, neuroscience, and sociology, among others . Applying CNs to a problem consists of two main steps: i) modeling the problem as a network; ii) structural analysis of the resulting CN. The topological quantification of the CN allows us to reach important conclusions about the system that it represents. For example, local vertex measurements can highlight important network regions, estimate their vulnerability, find groups of similar vertices, etc.
A network can be defined mathematically by $N = (V, E)$, where $V$ is a set of vertices and $E$ a set of edges (or connections). The edges can be weighted, with a value $w(v_i, v_j)$ that describes the weight of the connection between two vertices, or unweighted, indicating only whether the connection exists. The edges can be either undirected, satisfying $e(v_i, v_j) = e(v_j, v_i)$, or directed, where $e(v_i, v_j)$ can be different from $e(v_j, v_i)$.
The topology of a CN is defined by the patterns of its connections. To quantify it, measurements can be extracted for individual vertices, groups of vertices, or globally for the entire network. One of the most commonly used measures is the vertex degree, which is the number of its connections. Considering the sets $V$ and $E$, the degree of each vertex $v_i$ can be calculated as follows:

$$k(v_i) = \sum_{v_j \in V} \begin{cases} 1, & \text{if } e(v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$

Note that in this case the degree is calculated in a binary way, since the sum counts 1 if the edge exists and 0 otherwise. If the network is weighted, the degree can also be weighted, a metric commonly known as vertex strength, in which the weights of all edges incident on the vertex are summed:

$$s(v_i) = \sum_{v_j \in V} \begin{cases} w(v_i, v_j), & \text{if } e(v_i, v_j) \in E \\ 0, & \text{otherwise} \end{cases}$$
In directed networks it is possible to calculate the input and output degree of vertices according to the edge directions. The output degree $k^{out}(v_i)$ represents the number of edges leaving $v_i$ and yields the same equation as the degree in undirected networks (Equation 3). To compute the input degree, it is necessary to invert the edge check:

$$k^{in}(v_i) = \sum_{v_j \in V} \begin{cases} 1, & \text{if } e(v_j, v_i) \in E \\ 0, & \text{otherwise} \end{cases}$$

which then sums the number of edges pointing to $v_i$. Analogously, we can compute the input and output strength of a vertex ($s^{in}(v_i)$ and $s^{out}(v_i)$) by summing the weights of its edges ($w(v_j, v_i)$ or $w(v_i, v_j)$ instead of 1) according to their direction.
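The four directed measures can be sketched on a small edge list. This is an illustration only, assuming edges are stored as a dictionary mapping `(source, target)` to a weight (an undirected edge would be stored in both directions); `degree_strength` is a hypothetical helper.

```python
from collections import defaultdict

def degree_strength(edges):
    """Input/output degree and strength of every vertex of a directed,
    weighted network given as {(source, target): weight}."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for (i, j), w in edges.items():
        k_out[i] += 1   # edge leaving i
        k_in[j] += 1    # edge pointing to j
        s_out[i] += w   # weight leaving i
        s_in[j] += w    # weight arriving at j
    return k_in, k_out, s_in, s_out

# vertices 1 and 2 share a bidirectional edge
edges = {(0, 1): 0.5, (0, 2): 0.25, (2, 1): 1.0, (1, 2): 1.0}
k_in, k_out, s_in, s_out = degree_strength(edges)
```

Here vertex 0 has only outgoing edges, so its input degree is 0 while its output degree is 2.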
Since CNs are flexible structures that allow the analysis of several real-world phenomena, they can also be used for image modeling. In this case, a computer vision problem is transformed into a CN problem, which can be treated in different ways. The first step of this approach is the modeling of the network from the image, that is, the definition of what the vertices and edges are. This setting is variable and usually depends on the problem in question. For texture modeling, the technique usually employed is to consider each pixel of the image as a vertex to build an undirected and weighted network. Consider an $X \times Y$ gray image $I$ with $XY$ pixels and intensity levels $I(x, y) \in [0, L]$ ($L$ is the highest possible intensity value in the images). A network $N = (V, E)$ is obtained by constructing the set $V$ with one vertex per pixel. The first work  considers the absolute difference of intensity between pixels to define the weight of their connection. It is important to note that the intensity difference is not affected by changes in the average illumination of the image . Given a radius $r$ that defines a spatial boundary window, a new network $N_r$ is obtained where each vertex $v_i$ is connected to its neighbors $v_j$ with weight $w(v_i, v_j) = |I(v_i) - I(v_j)| / L$ (normalized absolute intensity difference) if $d(v_i, v_j) \le r$, where $d$ represents the pixel Euclidean distance. The same modeling approach is also used by the methods proposed in [20, 19]. The connection weight inversely represents pixel similarity, where lower values mean higher similarity. However, this equation does not include spatial information in the connection weight, which led later works to propose new rules. In , a term based on the pixel distance was included in the calculation of the edge weight, giving equal importance to the pixel intensity difference and to its spatial position inside the connection neighborhood. A different approach is introduced in , where the weight is directly proportional to both the intensity difference and the distance.
The inclusion of the spatial information overcomes the limitation of previous methods where the connection weight towards pixels with the same intensity would be the same regardless of their distance to the central pixel.
The steps described so far result in a network $N_r$ whose scale is proportional to $r$, which limits the connection neighborhood. However, this is a regular network, as all vertices have the same number of connections (except for border vertices). Therefore, a transformation is needed in order to obtain a network with relevant topological information. In most works this is achieved through connection thresholding with an additional parameter $t$: a new network $N_{r,t}$ is obtained by transforming its set of edges into $E_{r,t} = \{e(v_i, v_j) \in E_r \mid w(v_i, v_j) \le t\}$. The resulting network keeps only the connections between similar pixels, where $t$ controls the similarity level. The resulting topology is thus directly influenced by the parameters $r$ and $t$. This allows a complete analysis of the dynamic evolution of the network, from smaller to larger neighborhoods and across different levels of pixel similarity. The final texture characterization is then made through CN topological measures such as the vertex degree, strength, and others .
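The classic grayscale model (radius $r$, weight $|I(v_i) - I(v_j)|/L$, threshold $t$) can be sketched with array shifts instead of an explicit graph. This is a minimal sketch, assuming border pixels simply have fewer neighbors; `thresholded_degree_map` is a hypothetical name and the spatially weighted variants are not covered.

```python
import numpy as np

def thresholded_degree_map(img, r, t, L=255):
    """Degree of every pixel in the undirected CN texture model:
    pixels within distance r are linked with weight |I_i - I_j| / L,
    and only edges with weight <= t are kept."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    degree = np.zeros((h, w), dtype=int)
    R = int(np.floor(r))
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            if (dx == 0 and dy == 0) or dx * dx + dy * dy > r * r:
                continue
            # region where the neighbor (y + dy, x + dx) lies inside the image
            ys, ye = max(0, -dy), min(h, h - dy)
            xs, xe = max(0, -dx), min(w, w - dx)
            center = img[ys:ye, xs:xe]
            neigh = img[ys + dy:ye + dy, xs + dx:xe + dx]
            weight = np.abs(center - neigh) / L
            degree[ys:ye, xs:xe] += (weight <= t)   # keep similar pixels only
    return degree
```

Without thresholding every interior vertex would have the same degree; with $t$, similar regions keep their connections while edges across strong intensity changes are cut.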
The concepts of CN applied to texture analysis have been explored and improved in more recent works. In , a new multilayer model is introduced for color-texture analysis, where each network layer represents an image color channel and its topology contains within- and between-channel connections in a spatial fashion. This work also proposes a new method for estimating the optimal threshold and introduces the vertex clustering coefficient for network characterization, achieving promising results. In , an interesting technique is proposed to build a vocabulary learned from CN properties and to characterize the detected key points through CNs, exploring the relevance of various topological measures.
We propose a new network model for color-texture characterization with various improvements over previous CN-based methods. Our method models the spatial relation of intra- and inter-channel pixels through a directed CN that we named Spatio-Spectral Network (SSN). First, consider a color image $I$ of size $X \times Y$, i.e. $XY$ pixels with $z$ color channels whose values range over $[0, L]$, and a network defined by a tuple $N = (V, E)$, where $V$ represents the network vertices and $E$ its edges. The vertex set is created as in , where each image pixel is mapped to one vertex per color channel, thus $|V| = XYz$. This creates a multilayer network, where each layer represents one image color channel and each vertex $v_i$ carries a pair of coordinates $(x_i, y_i)$, indicating the position of the pixel that it represents, and a value $p(v_i) \in [0, L]$ indicating the pixel intensity in its respective color channel.
In order to create the connections, previous works usually adopt a set of radii to limit the size of the vertex neighborhood, i.e. a set of sliding windows of radius $r \in R = \{r_1, \dots, r_n\}$ defines a distance limit for vertices to be connected, resulting in a set of networks $N_r$. In other words, a vertex $v_i$ is connected to a vertex $v_j$ if $v_j$ is inside its neighborhood $G_r(v_i) = \{v_j \in V \mid v_j \neq v_i \wedge d(v_i, v_j) \le r\}$, where $d(v_i, v_j)$ is the 2-D Euclidean distance between the vertices. Therefore, as in , this neighborhood covers vertices in all color channels, because only the spatial position of the pixels is considered when computing the distance. This process allows one to assess the dynamic evolution of the network as $r$ increases by analyzing each network $N_r$, and it has been demonstrated to be effective for color-texture characterization . On the other hand, for a set of increasing radii $r_1 < r_2 < \dots < r_n$, it holds that $G_{r_{k-1}}(v) \subset G_{r_k}(v)$ for every vertex $v$. This means that each neighborhood contains all neighborhoods of the previous radii, which leads to redundancy between networks .
In this context, we propose a radially symmetric modeling criterion by redefining the neighborhood as $G_{r_k}(v_i) = \{v_j \in V \mid r_{k-1} < d(v_i, v_j) \le r_k\}$, i.e. it considers only vertices within the distance range between the current and the previous radius, thus $G_{r_{k-1}}(v) \cap G_{r_k}(v) = \emptyset$. It is important to note that we include vertices at distance 0 in the neighborhood of the first radius, which is the case when a pixel connects to itself in different color channels. The neighborhood definition can also be formulated as a combination of the distance and the number of neighbors $P$, which is usually adopted in LBP-based methods . In our case, as the neighborhood covers all the color channels, we can represent the vertex neighbors for radius 1 by ($P = 5z - 1$, $r = 1$), as the vertex connects to 4 neighbors in each channel plus itself in the other $z - 1$ channels (14 neighbors for an RGB image). Analogously, radius 2 creates a neighborhood ($P = 8z$, $r = 2$), and so on. To illustrate this concept, Figure 1 shows a CN modeled for 1 image channel and highlights the difference between the standard (a-b) and the radially symmetric (c-d) neighborhoods. In our proposal, the radially symmetric neighborhood is then extended to all image channels (e).
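The ring-shaped neighborhood and its neighbor count can be checked with a few lines. This is a sketch under the definitions above (ring $r_{k-1} < d \le r_k$, distance 0 added for the first ring, every spatial offset visited in all $z$ channels except the vertex itself); `ring_offsets` and `neighbors_per_vertex` are hypothetical names.

```python
import math

def ring_offsets(r_prev, r):
    """Spatial offsets (dy, dx) with r_prev < d <= r; offset (0, 0) is
    included for the first ring (r_prev == 0)."""
    R = int(math.floor(r))
    return [(dy, dx)
            for dy in range(-R, R + 1)
            for dx in range(-R, R + 1)
            if r_prev ** 2 < dy * dy + dx * dx <= r * r
            or (r_prev == 0 and dy == 0 and dx == 0)]

def neighbors_per_vertex(r_prev, r, z):
    """Each spatial offset is visited in all z channels; the vertex itself
    (offset (0, 0) in its own channel) is excluded."""
    offs = ring_offsets(r_prev, r)
    return len(offs) * z - (1 if (0, 0) in offs else 0)
```

For an RGB image this gives 14 neighbors for the first ring and 24 for the second, and consecutive rings share no offsets, which is exactly the redundancy that the nested windows $G_{r_{k-1}} \subset G_{r_k}$ would introduce.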
The definitions given so far concern only the pixel neighborhood in which vertices will connect; the next step is to define how connections are created. The weight of the connection between a pair of vertices is defined by their absolute intensity difference, directly proportional to their Euclidean distance in the image:

$$w(v_i, v_j) = \frac{(|p(v_i) - p(v_j)| + 1)\,(d(v_i, v_j) + 1) - 1}{(L + 1)\,(r + 1) - 1}$$
This is a modification of the equations proposed in [45, 44] so that neither side of the multiplication cancels the other when the intensity difference or the distance is 0 (e.g. the same pixel in different channels), and so that the weights are normalized to the range $[0, 1]$. It is important to note that, according to the proposed neighborhood, the creation of connections from all channels to all channels with the given connection weight implies that the network implicitly performs opponent color processing in a spatial fashion. Therefore, the SSN is a combination of CNs and a bio-inspired approach based on the Opponent-Process Theory , but it considers the opposing color pairs red versus green, red versus blue and green versus blue instead of red versus green and blue versus yellow. If a color space other than RGB is used, different opposing color pairs are considered according to that space.
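A minimal sketch of the weight function, assuming it takes the form $((|p_i - p_j| + 1)(d + 1) - 1) / ((L + 1)(r + 1) - 1)$ with 8-bit intensities ($L = 255$) by default; `ssn_weight` is a hypothetical name.

```python
def ssn_weight(p_i, p_j, d, r, L=255):
    """Connection weight in [0, 1]: grows with both the absolute intensity
    difference and the spatial distance d (d <= r)."""
    return ((abs(p_i - p_j) + 1) * (d + 1) - 1) / ((L + 1) * (r + 1) - 1)

same_pixel_other_channel = ssn_weight(10, 10, 0, 1)   # 0.0
equal_values_one_step = ssn_weight(10, 10, 1, 1)      # small but non-zero
extreme = ssn_weight(0, 255, 1, 1)                    # 1.0
```

The "+1" terms are what keep a zero intensity difference from cancelling the distance contribution (and vice versa), while the denominator maps the largest possible product to exactly 1.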
As previously mentioned, most CN-based approaches employ weighted undirected networks, which need thresholding techniques in order to obtain relevant topological information for analysis. This happens because a network that connects all pixels inside a fixed radius has constant topological measures such as degree and clustering. However, thresholding introduces additional parameters that need to be tuned, which has been a drawback of previous works where authors explored costly techniques for optimal threshold selection [45, 44]. In this sense, we propose a directed technique that eliminates the need for thresholding, reducing the method parameters to only the set of radii $R$. This is achieved by associating the direction of the connection with the direction of the gradient, i.e. towards the pixel of higher intensity. This idea was first employed in previous work for grayscale texture characterization , and here we extend its definitions to multilayer networks of color images, along with our new connection weight equation (Equation 4). Consider a network $N_r = (V, E_r)$; its set of edges is defined by

$$E_r = \{ e(v_i, v_j) \mid v_j \in G_r(v_i) \wedge p(v_i) \le p(v_j) \},$$

so that each edge points towards the vertex of higher intensity, and when $p(v_i) = p(v_j)$ the edge is bidirectional ($e(v_i, v_j) \in E_r$ and $e(v_j, v_i) \in E_r$). This process generates a network that contains relevant topological information reflected in its directional connection patterns, which can be quantified through directed measures such as the input and output degree and strength of vertices, eliminating the need for connection cutting.
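The direction rule can be illustrated on a handful of vertices. This sketch assumes the tie rule stated above (equal intensities produce edges in both directions); `directed_edges` is a hypothetical helper that receives the intensities and the neighboring pairs.

```python
def directed_edges(values, pairs):
    """Orient each neighboring pair (i, j) towards the higher value;
    ties produce a bidirectional edge. Returns a set of (source, target)."""
    edges = set()
    for i, j in pairs:
        if values[i] <= values[j]:
            edges.add((i, j))   # i points to the brighter (or equal) j
        if values[j] <= values[i]:
            edges.add((j, i))   # j points to the brighter (or equal) i
    return edges

edges = directed_edges([10, 20, 20], [(0, 1), (1, 2), (0, 2)])
```

Under this rule the input degree of a vertex counts its neighbors with lower or equal intensity, so local intensity maxima accumulate incoming edges and local minima emit only outgoing ones; no threshold is needed to make the degree vary across the image.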
The spatio-spectral nature of our network emerges from the patterns of within- and between-channel connections, which include information about the image gradient through the edge directions. To highlight this information, we divide the original network as in , obtaining two additional networks. The first, $N^W_r = (V, E^W_r)$, has edges that form the subset of $E_r$ containing only within-channel connections, thus $E^W_r = \{e(v_i, v_j) \in E_r \mid c(v_i) = c(v_j)\}$, where $c(v)$ returns the channel/layer of $v$. The second network, $N^B_r = (V, E^B_r)$, represents between-channel connections, then $E^B_r = \{e(v_i, v_j) \in E_r \mid c(v_i) \neq c(v_j)\}$. The vertex sets of $N_r$, $N^W_r$ and $N^B_r$ are the same ($V$), as we only divide the edges. Figure 1 (d, e and f) illustrates the structure of the 3 networks for a given radius $r$. By quantifying their topology, it is possible to obtain rich color-texture information for image characterization, as we discuss in the following.
To characterize the directed SSN, we propose the use of traditional centrality measures computed for each vertex considering its edge directions, namely the input and output degree and strength (see Equation 3). These measures can be computed efficiently during the network modeling, as only the vertex neighbors must be visited for the calculation; therefore, there is no significant additional cost for the network characterization. The cost of our approach is thus lower than that of the method of , which uses the vertex clustering coefficient and therefore needs to store the network in order to visit the neighbors of the neighbors of each vertex.
Notice that $k^{in}(v_i) + k^{out}(v_i) = k_{max}$, $\forall v_i \in V$, where $k_{max}$ indicates the maximum number of possible connections of $v_i$ (neighbors with equal intensity are the only exception, since their bidirectional edges count towards both). This means that the input and output degrees are, up to these ties, a linear transformation of each other as a function of the network maximum degree; therefore, we use only the input degree. The same assumption cannot be made for the vertex strength, as it also depends on the distribution of pixel values in the connection neighborhood; thus, we use both the input and output strength. In this context, the network characterization is performed by combining three measures (input degree $k^{in}$, input strength $s^{in}$, and output strength $s^{out}$). Each topological measure highlights different texture patterns of the modeled image, as we show in Figure 2 for each network. We call feature maps the set of all topological information obtained through the characterization of the networks $N_r$, $N^W_r$ and $N^B_r$ with the aforementioned 3 centrality measures ($k^{in}$, $s^{in}$ and $s^{out}$) in a multiscale fashion, with a set of increasing radii $R = \{r_1, \dots, r_n\}$. In the qualitative analysis shown in Figure 2, it is possible to notice how the feature maps highlight a wide range of color-texture patterns.
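The feature maps of one ring can be computed without storing the network, visiting only each vertex's neighbors. The NumPy sketch below is not the authors' implementation (their code is in the repository cited earlier); it assumes the weight $((|\Delta p| + 1)(d + 1) - 1)/((L + 1)(r + 1) - 1)$, bidirectional edges for ties, and border vertices with fewer neighbors. The function name `ssn_feature_maps` is hypothetical.

```python
import numpy as np

def ssn_feature_maps(img, r_prev, r, L=255):
    """k_in, s_in and s_out of every vertex for the full (N), within-channel
    (W) and between-channel (B) networks of the ring r_prev < d <= r.

    img: array (H, W, z) with intensities in [0, L].
    Returns {name: array (3, H, W, z)} holding k_in, s_in, s_out.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w, z = img.shape
    maps = {name: np.zeros((3, h, w, z)) for name in ("N", "W", "B")}
    R = int(np.floor(r))
    denom = (L + 1) * (r + 1) - 1
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            d2 = dy * dy + dx * dx
            if not (r_prev ** 2 < d2 <= r * r or (r_prev == 0 and d2 == 0)):
                continue
            d = np.sqrt(d2)
            ys, ye = max(0, -dy), min(h, h - dy)
            xs, xe = max(0, -dx), min(w, w - dx)
            for c in range(z):            # channel of the vertex v
                for c2 in range(z):       # channel of the neighbor u
                    if d2 == 0 and c == c2:
                        continue          # no self-loops
                    pv = img[ys:ye, xs:xe, c]
                    pu = img[ys + dy:ye + dy, xs + dx:xe + dx, c2]
                    wgt = ((np.abs(pv - pu) + 1) * (d + 1) - 1) / denom
                    incoming = pu <= pv   # edge u -> v (towards higher value)
                    outgoing = pv <= pu   # edge v -> u; ties give both
                    for name in ("N", "W" if c == c2 else "B"):
                        m = maps[name]
                        m[0, ys:ye, xs:xe, c] += incoming
                        m[1, ys:ye, xs:xe, c] += wgt * incoming
                        m[2, ys:ye, xs:xe, c] += wgt * outgoing
    return maps
```

Because the neighborhood is symmetric, updating only the current vertex's measures visits every edge from both endpoints, and the full network's maps are the sum of the within- and between-channel maps.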
The feature maps obtained from the characterization of SSN must be summarized in order to obtain a single and compact image descriptor. From a network science perspective, it is possible to obtain the probability distribution of the three centrality measures, which is a common approach for network characterization. We propose to compute the distributions for each layer of $N_r$, $N^W_r$ and $N^B_r$ separately; therefore, $3z$ distributions (3 measures for each of the $z$ layers) are obtained from each network. A separate analysis is expected to provide more specific information regarding patterns occurring in each network layer, and this technique improved our results compared with using the whole network as proposed in . For the exact computation of the probability distribution function of each network layer, we fixed the number of bins for counting the input degree occurrences as the maximum possible degree. For counting the occurrences of the strength measures, the number of bins is the maximum possible degree multiplied by 10. We denote the probability distribution functions of the measures by $P^{k^{in}}$, $P^{s^{in}}$ and $P^{s^{out}}$. Figure 3 shows each distribution of an SSN according to the proposed layer-wise analysis, where a clear distinction between the two different input textures can be noticed. The topological arrangement of the network varies greatly according to the input texture and the layer being analyzed; however, a power-law-like behavior can be noticed in some cases. For comparison, the multilayer networks of  seem to present a similar topology for different image inputs, with variations of the small-world effect and the occurrence of power-law-like degree distributions. On the other hand, the directed SSN presents heterogeneous arrangements, where the network structure varies greatly between different textures, which provides more discriminative topological measures for characterization.
This difference from the previous work  is mostly due to the use of connection direction, the radially symmetric neighborhood and the layer-wise network characterization.
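The layer-wise distributions can be sketched with a fixed-bin histogram. This assumes the histogram range is $[0, k_{max}]$ for all three measures (strengths cannot exceed $k_{max}$ because every weight is at most 1), with $k_{max}$ bins for the input degree and $10 \cdot k_{max}$ bins for the strengths; `layer_distribution` and the sample values are hypothetical.

```python
import numpy as np

def layer_distribution(values, k_max, bins_per_unit=1):
    """Probability distribution of one measure over the vertices of one
    layer: k_max bins for the degree, 10 * k_max for strengths
    (bins_per_unit=10), over the assumed range [0, k_max]."""
    counts, _ = np.histogram(np.ravel(values), bins=k_max * bins_per_unit,
                             range=(0, k_max))
    return counts / counts.sum()

# hypothetical values of four vertices of one layer, with k_max = 4
k_in_layer = np.array([0, 1, 4, 4])
s_in_layer = np.array([0.0, 0.05, 0.3, 0.3])
p_kin = layer_distribution(k_in_layer, 4)
p_sin = layer_distribution(s_in_layer, 4, bins_per_unit=10)
```

In a full pipeline this would be called once per network, radius, layer $c$ and measure, e.g. on `maps["W"][0, :, :, c]`, with $k_{max}$ set to the maximum degree of that network (which differs between $N_r$, $N^W_r$ and $N^B_r$).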
Although the distributions $P^{k^{in}}$, $P^{s^{in}}$ and $P^{s^{out}}$ summarize the network topology, it is still impractical to use them directly as an image descriptor, because the larger the radius $r$, the larger their combined size; e.g. for a single network of an RGB image, the size would be $(19 + 190 + 190) \times 3 = 1197$ (the size of each distribution obtained for each of the 3 image channels). Therefore, we propose to use statistical measures to further summarize the SSN structure. We employ the four measures proposed in  (mean, standard deviation, energy and entropy), together with the skewness and the kurtosis.
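The six statistics of a discrete distribution can be sketched as below. The definitions are the standard ones over bin indices (entropy in bits, skewness and kurtosis as standardized third and fourth moments) and are assumptions of this sketch; the exact normalizations used for SSN may differ. `six_statistics` is a hypothetical name.

```python
import numpy as np

def six_statistics(p):
    """Mean, standard deviation, energy, entropy, skewness and kurtosis of
    a discrete distribution p over bin indices 0..len(p)-1."""
    p = np.asarray(p, dtype=np.float64)
    i = np.arange(len(p))
    mu = np.sum(i * p)
    sigma = np.sqrt(np.sum((i - mu) ** 2 * p))
    energy = np.sum(p ** 2)
    nz = p[p > 0]                     # skip empty bins (0 * log 0 := 0)
    entropy = -np.sum(nz * np.log2(nz))
    if sigma > 0:
        skew = np.sum((i - mu) ** 3 * p) / sigma ** 3
        kurt = np.sum((i - mu) ** 4 * p) / sigma ** 4
    else:                             # degenerate single-bin distribution
        skew = kurt = 0.0
    return np.array([mu, sigma, energy, entropy, skew, kurt])
```

Each distribution, whatever its number of bins, is thus reduced to a fixed-length vector of 6 values.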
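The combination of the six statistics for each of the three topological measures of each layer comprises a network descriptor. Consider a network $N_r$ and its set of vertices redefined as a combination of the vertices from each of the $z$ layers, $V = V_1 \cup \dots \cup V_z$; then a vector $\varphi(V_c, m)$ represents the six statistics for a given layer $c$ and measure $m \in \{k^{in}, s^{in}, s^{out}\}$. The concatenation of the measures from each layer composes the network descriptor

$$\psi(N_r) = [\varphi(V_1, k^{in}), \varphi(V_1, s^{in}), \varphi(V_1, s^{out}), \dots, \varphi(V_z, s^{out})]$$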
Considering the multiscale approach, we propose the use of a set of radii increasing one by one, $R = \{1, 2, \dots, r_n\}$, to maintain the proportion of each symmetric neighborhood. Therefore, the final descriptor that addresses the dynamic evolution of each network is obtained by $\Psi(N) = [\psi(N_1), \dots, \psi(N_{r_n})]$, and analogously $\Psi(N^W)$ and $\Psi(N^B)$. The combination of the descriptors from networks $N$, $N^W$ and $N^B$ results in a complete representation that comprises the whole spatio-spectral information:

$$\Psi_{SSN} = [\Psi(N), \Psi(N^W), \Psi(N^B)]$$
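The concatenation can be sketched as follows, assuming the per-layer statistics are stored in a dictionary keyed by (network, radius, layer, measure); the key layout, the ordering and the name `concat_descriptor` are hypothetical choices of this sketch.

```python
import numpy as np

def concat_descriptor(stats, radii, z, networks=("N", "W", "B"),
                      measures=("k_in", "s_in", "s_out")):
    """Concatenate per-layer statistics into one SSN-style descriptor.

    stats[(network, r, layer, measure)] -> 1-D array of six statistics.
    Order: network, then radius, then layer, then measure.
    """
    parts = [stats[(net, r, c, m)]
             for net in networks for r in radii
             for c in range(z) for m in measures]
    return np.concatenate(parts)

stats = {(net, r, c, m): np.zeros(6)
         for net in ("N", "W", "B") for r in (1, 2)
         for c in range(3) for m in ("k_in", "s_in", "s_out")}
desc = concat_descriptor(stats, radii=(1, 2), z=3)
```

The resulting size is $3 \cdot r_n \cdot z \cdot 3 \cdot 6$, i.e. $162\,r_n$ features for an RGB image (324 for $r_n = 2$), independent of the image resolution.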
The cost of the proposed method is directly proportional to the image size and to the largest radius used, which defines the window size for connection creation. The largest radius implies that all pixels within that distance must be visited to compose each neighborhood, according to the radially symmetric neighborhood previously defined. Therefore, consider an image with a given size and number of channels, and the number of pixels contained in all neighborhoods for the set of radii. The cost of modeling the image as an SSN is then given by Equation 11,
considering that the three networks are modeled together, since it is only necessary to check the channel of each pixel to determine to which network a connection belongs. As previously mentioned, the overall implementation of the proposed method is similar to that of the previous work, but without the cost of computing the clustering coefficient of each vertex. Since the remaining measures only require accumulating, at each vertex, the number and the weights of its connections, they are computed during the modeling step at no additional cost, and there is no need to store the network. The characterization cost is then related to the computation of each measure distribution and its statistics, which is much smaller than the modeling cost. Therefore, the asymptotic limit of the proposed method is the same as defined in Equation 11. As a previously reported running-time experiment shows, this approach is faster than recent deep convolutional networks.
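The on-the-fly computation mentioned above can be sketched for the within-channel edges of a single channel. This is a hypothetical simplification, not the authors' implementation: it assumes a disk-shaped neighborhood (0 < distance ≤ radius), creates no edge between pixels of equal intensity, and uses an illustrative weight (normalized intensity difference times normalized distance) in place of the paper's exact formula. Between-channel edges would follow the same pattern over pairs of channels. The `visits` counter makes the cost visible: it grows with the number of pixels times the neighborhood size.

```python
import math
from typing import List, Tuple


def disk_offsets(radius: float) -> List[Tuple[int, int]]:
    """(dy, dx) offsets with 0 < Euclidean distance <= radius (assumed shape)."""
    r = int(math.floor(radius))
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if 0 < dy * dy + dx * dx <= radius * radius]


def edge_weight(a: float, b: float, dist: float, radius: float, levels: float = 255) -> float:
    """Hypothetical weight in [0, 1]: normalized intensity difference times
    normalized distance (not the paper's exact formula)."""
    return (abs(a - b) / levels) * (dist / radius)


def channel_measures(img: List[List[float]], radius: float, levels: float = 255):
    """One pass over a 2D channel, accumulating in-degree, in-strength and
    out-strength per pixel; no edge list is ever stored."""
    h, w = len(img), len(img[0])
    in_deg = [[0] * w for _ in range(h)]
    in_str = [[0.0] * w for _ in range(h)]
    out_str = [[0.0] * w for _ in range(h)]
    offsets = disk_offsets(radius)
    visits = 0  # neighbor checks: at most (number of pixels) * len(offsets)
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w):
                    continue
                visits += 1
                a, b = img[y][x], img[ny][nx]
                if a < b:  # edge points from (y, x) to its brighter neighbor
                    wgt = edge_weight(a, b, math.hypot(dy, dx), radius, levels)
                    out_str[y][x] += wgt
                    in_deg[ny][nx] += 1
                    in_str[ny][nx] += wgt
    return in_deg, in_str, out_str, visits


deg, s_in, s_out, n = channel_measures([[0, 255]], 1)
print(deg, s_out, n)  # prints [[0, 1]] [[1.0, 0.0]] 2
```

Each pair of neighbors is inspected from both sides, but the edge is recorded only from the darker pixel, so every directed edge contributes exactly once to the accumulated measures.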
This section presents the experiments performed to evaluate the proposed SSN under different scenarios and to compare our results with other methods from the literature. We adopt a supervised classification scheme using the Linear Discriminant Analysis (LDA) classifier, which finds a linear combination of features that maximizes the between-class variance relative to the within-class variance. Performance is measured by the accuracy of leave-one-out cross-validation, which repeats the train-test procedure once per sample of the dataset: at each iteration, one sample is used for testing and the remaining ones for training, so that each sample is used as test exactly once. The accuracy is then the percentage of correctly classified samples.
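The leave-one-out protocol can be sketched independently of the classifier. In this minimal sketch, `fit_predict` is a hypothetical callable that trains on the given samples and predicts one label; in practice it could wrap an LDA implementation such as scikit-learn's `LinearDiscriminantAnalysis` (fit on the training fold, predict the held-out sample). The nearest-centroid classifier below is only a dependency-free stand-in, not LDA.

```python
from typing import Callable, Dict, Hashable, List

FitPredict = Callable[[List[List[float]], List[Hashable], List[float]], Hashable]


def loo_accuracy(X: List[List[float]], y: List[Hashable], fit_predict: FitPredict) -> float:
    """Leave-one-out accuracy (%): one train-test round per sample."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return 100.0 * correct / len(X)


def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT LDA), used only to keep the sketch runnable."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for xi, yi in zip(train_X, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for k, v in enumerate(xi):
            acc[k] += v
        counts[yi] = counts.get(yi, 0) + 1

    def sq_dist(label):
        centroid = [s / counts[label] for s in sums[label]]
        return sum((c - v) ** 2 for c, v in zip(centroid, x))

    return min(sums, key=sq_dist)


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y = ["a", "a", "b", "b"]
print(loo_accuracy(X, y, nearest_centroid))  # prints 100.0
```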
The following color texture datasets from the literature are used:
USPtex: This dataset was built by the University of São Paulo and contains 191 classes of natural color textures commonly found in daily life. The original images are 512x384 in size and are divided into 12 non-overlapping samples of size 128x128, totaling 2292 images.
Outex13: The Outex framework was proposed for the empirical evaluation of texture analysis methods. This framework consists of several different sets of images, and the Outex13 dataset (test suite OutexTC00013 on the website www.outex.oulu.fi) focuses on the analysis of texture considering color as a discriminatory property. The dataset contains 1360 images of size 200x200 divided into 68 classes, that is, 20 samples per class.
MBT: The Multi-Band Texture (MBT) dataset is composed of 154 color images whose texture classes are formed by the combined effect of spatial variations within and between channels. This type of pattern appears in images with high spatial resolution, which are common in areas such as astronomy and remote sensing. Each of the 154 original images, 640x640 in size, is divided into 16 non-overlapping samples of size 160x160, composing a set of 2464 images.
CUReT: The Columbia-Utrecht Reflectance and Texture dataset is composed of color images of materials. It contains 61 classes with 92 samples each, with a wide variety of geometric and photometric properties, such as intra-class variations in rotation, illumination, and viewing angle. Classes represent surface textures of various materials such as aluminum foil, artificial grass, gypsum, concrete, leather, and fabrics, among others.
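The non-overlapping splitting used for USPtex and MBT can be sketched as follows. The function is hypothetical and simply enumerates the top-left corners of each crop; it reproduces the per-image and total sample counts stated above.

```python
from typing import List, Tuple


def tile_grid(width: int, height: int, tile: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of non-overlapping tile x tile crops.

    Rows are enumerated top to bottom; any leftover strip narrower than
    `tile` is discarded (an assumption; the sizes used here divide evenly).
    """
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]


# USPtex: 512x384 originals cut into 128x128 samples; MBT: 640x640 into 160x160.
for name, (w, h, t, n_originals) in {
    "USPtex": (512, 384, 128, 191),
    "MBT": (640, 640, 160, 154),
}.items():
    per_image = len(tile_grid(w, h, t))
    print(name, per_image, per_image * n_originals)
# prints:
# USPtex 12 2292
# MBT 16 2464
```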
We analyze the proposed method in terms of its radius parameter and the impact of using different feature vectors obtained from combinations of the three networks. Figure 5 shows how the accuracy rate obtained with each feature vector changes as the radius increases. The performance grows up to a certain point and then stabilizes, except on Outex13, where it starts to drop beyond a certain radius. This indicates that small radii are not sufficient to achieve a complete multiscale texture characterization, and also that too large values on Outex13 can introduce feature redundancy as the number of descriptors grows. On the other hand, the performance peaks on USPtex and CUReT show only a small difference in comparison to smaller radii. Therefore, a more stable range of radii can be identified when all datasets are considered. Regarding the feature vectors, the performance of each network alone varies between datasets: one network performs better on USPtex and MBT, while another is better on Outex13 and CUReT. In contrast, both combinatorial vectors present the highest results in most cases, which is expected since they combine measures from both within- and between-channel analysis. We therefore suggest the use of the combinatorial feature vectors in order to obtain better results in different scenarios.
Table 1 shows the average accuracy (over the 4 datasets) of the two combinatorial feature vectors as the radius increases, along with the corresponding number of descriptors. The previously suggested stable interval has a smaller standard deviation. Moreover, the highest average performance is obtained with 648 descriptors for one combinatorial vector, and with 648 or 810 descriptors, depending on the radius, for the other. In this context, the results suggest that two of these alternatives have an average performance similar to the best configuration while using fewer descriptors. For a more satisfactory performance in most scenarios, we suggest the configuration with the larger radius, as its larger neighborhood provides a better multiscale analysis.
Table 1: Number of descriptors and mean accuracy over the four datasets for each of the two combinatorial feature vectors, for increasing radius.
The following literature methods are considered for comparison: Opponent-Gabor (264 descriptors), Local Phase Quantization (LPQ) (integrative, 768 descriptors), Complete Local Binary Patterns (CLBP) (integrative, 354 descriptors), Complex Networks Traditional descriptors (CNTD) (108 descriptors for grayscale and 324 for integrative), and Multilayer Complex Network Descriptors (MCND) (we use the best results available in the paper).
We also compare results with some well-known deep convolutional neural networks (DCNN), namely AlexNet, VGG16 and VGG19, InceptionV3 [47, 48], and ResNet50 and ResNet101. The DCNNs are employed as feature extractors using architectures pre-trained on the ImageNet object recognition dataset from the Large Scale Visual Recognition Challenge. Each considered architecture and its pre-trained weights are available in the Neural Network Toolbox of the Matlab 2018a software (https://www.mathworks.com/help/deeplearning/ug/pretrained-convolutional-neural-networks.html). Using pre-trained DCNNs achieves higher performance than training these models from scratch on the considered color texture datasets, as previously reported results show. As each DCNN has a fixed input size, the images of each dataset are resized before being processed, and the classification part (fully-connected layers) is removed. The 2D feature activation maps generated by the last convolutional layers are then used to compute image features, an approach used in previous works [33, 11, 12]. To obtain a feature vector, Global Average Pooling (GAP) is applied as in previous work, producing a vector whose size corresponds to the number of 2D feature activation maps. However, the last convolutional layer of most DCNNs produces a large number of feature activation maps, such as 2048 for the Inception and ResNet models. Hence, we propose to go back to earlier convolutional layers until the number of maps is smaller than 800. This reduction is performed for several reasons, the most obvious being to avoid the curse of dimensionality, caused by the exponential increase in volume associated with adding extra dimensions to the Euclidean space. Another reason is a known issue of the LDA classifier, the singularity or Small Sample Size (SSS) problem, which occurs with high-dimensional feature vectors or a small number of training images [26, 49]. It is also important to highlight that we use the raw output of the corresponding convolutional layers, i.e., we do not apply the subsequent functions such as ReLU and local pooling. This approach improves the average performance of DCNNs as texture feature extractors when compared to previously reported results, so we believe it is reasonable for our purpose. In addition, reducing the DCNN feature vector also lowers the computational cost of the training and classification steps, while keeping the number of descriptors comparable to that of hand-crafted descriptors. The feature vectors obtained with InceptionV3 and both ResNet models have 768 and 512 features, respectively. For AlexNet and both VGG models, the size corresponds to the last convolutional layer, which has 256 and 512 maps, respectively.
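The two feature-extraction steps described above, choosing a layer with fewer than 800 maps and applying GAP, can be sketched as follows. This is a framework-agnostic, hypothetical sketch: activations are plain nested lists of shape C × H × W, the layer list is an illustrative sequence of (name, number of maps) pairs rather than a real network, and no ReLU is applied, matching the use of raw convolutional outputs. In a deep learning framework, the same pooling is a mean over the two spatial axes of each map.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # H x W activations of one channel


def pick_layer(layers: Sequence[Tuple[str, int]], limit: int = 800) -> Tuple[str, int]:
    """Walk back from the last convolutional layer until one has < limit maps.

    `layers` lists (name, number_of_maps) from input to output.
    """
    for name, n_maps in reversed(layers):
        if n_maps < limit:
            return name, n_maps
    raise ValueError(f"no layer with fewer than {limit} maps")


def global_average_pooling(fmaps: Sequence[FeatureMap]) -> List[float]:
    """One value per feature map: the mean of its raw (no ReLU) activations."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in fmaps]


# Illustrative layer list, not a real architecture.
layers = [("conv_a", 512), ("conv_b", 1024), ("conv_c", 2048)]
print(pick_layer(layers))  # prints ('conv_a', 512)
print(global_average_pooling([[[1.0, -2.0], [3.0, 4.0]]]))  # prints [1.5]
```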
We compare the accuracy rates of the proposed approach and the aforementioned methods on the four considered datasets; results are shown in Table 2, where the highest result of each column is highlighted in bold type. For the USPtex dataset, the lowest results are achieved by the integrative methods LPQ, CLBP, and CNTD (90.4%, 97.4%, and 97.9%, respectively), followed by the MCND and the Opponent-Gabor descriptors. DCNNs perform near the highest results, with small differences between models. The highest result is obtained by the proposed method in its best configuration, and the suggested feature vector achieves the second highest result, equivalent to the performance obtained by the ResNet50 DCNN. Overall, DCNNs perform better than the previous hand-crafted descriptors, but are outperformed by the proposed SSN.
| Method | USPtex | Outex13 | MBT | CUReT | Average |
|---|---|---|---|---|---|
| LPQ integrative (2008) | 90.4 | 80.1 | 95.7 | 91.7 | 89.5 |
| CLBP integrative (2010) | 97.4 | 89.6 | 98.2 | 91.8 | 94.3 |
| CNTD (grayscale) (2013) | 92.3 | 86.8 | 83.7 | 84.2 | 86.8 |
| CNTD integrative (2013) | 97.9 | 92.3 | 98.5 | 91.9 | 95.2 |
| MCND (2019) | 99.0 | 95.4 | 97.0 | 97.1 | 97.1 |
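As a consistency check, the averages in the last column can be recomputed from the four per-dataset accuracies, as sketched below. The standard deviations in brackets did not survive extraction; this sketch uses the population standard deviation, which for MCND gives about 1.28, consistent with the ±1.3 reported for MCND in the abstract. Whether this convention was used for every row is an assumption.

```python
from statistics import mean, pstdev

# Per-dataset accuracies (%) from Table 2: USPtex, Outex13, MBT, CUReT.
TABLE2 = {
    "LPQ integrative (2008)": [90.4, 80.1, 95.7, 91.7],
    "CLBP integrative (2010)": [97.4, 89.6, 98.2, 91.8],
    "CNTD (grayscale) (2013)": [92.3, 86.8, 83.7, 84.2],
    "CNTD integrative (2013)": [97.9, 92.3, 98.5, 91.9],
    "MCND (2019)": [99.0, 95.4, 97.0, 97.1],
}

summary = {name: (mean(acc), pstdev(acc)) for name, acc in TABLE2.items()}
for name, (avg, std) in summary.items():
    # Two decimals avoid ambiguous half-way rounding (e.g. 94.25 for CLBP).
    print(f"{name:26s} {avg:.2f} (±{std:.2f})")
```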
Outex13 is the hardest dataset in terms of color texture characterization, as the results show. Moreover, on this dataset the performance of DCNNs drops considerably. The integrative LPQ and CLBP and the grayscale CNTD present the lowest accuracies, with 80.1%, 89.6%, and 86.8%, respectively. On the other hand, the integrative CNTD method (92.3%) performs above the DCNNs, and the Opponent-Gabor method outperforms both the integrative and the DCNN methods. The highest performance is obtained by the proposed SSN, followed by MCND (95.4%). The suggested feature vector also outperforms the other methods, improving over both MCND and the best DCNN, ResNet50. The results of the best methods on this dataset, Opponent-Gabor, MCND, and SSN, corroborate the importance of CN and of the within-between channel analysis for color texture characterization.
On the MBT dataset, we again see a different performance pattern among the literature approaches. First, the large performance difference between the grayscale and the integrative CNTD methods (83.7% and 98.5%) indicates that color plays an important role in MBT color texture characterization. The AlexNet and VGG16 DCNNs achieve accuracies similar to the Opponent-Gabor approach, while the deeper DCNNs (Inception and ResNet) achieve the lowest results. The highest accuracy is obtained by the proposed method, both in its best configuration and with the suggested feature vector.
The last dataset, CUReT, is the largest in terms of the number of samples per class, and we again notice a different performance pattern among the literature methods. On this dataset, the depth of DCNNs seems to benefit performance, as observed for VGG and ResNet, which may be related to the dataset size. The integrative methods present the lowest performance, around 92%, below Opponent-Gabor and MCND (97.1%). The DCNN descriptors perform above the previous methods, notably the deepest network analyzed, ResNet101. The proposed method again outperforms the other approaches, both in its best configuration and in the suggested one.
The last column of Table 2 shows the average accuracy of each method over all datasets (standard deviation in brackets). The proposed SSN achieves the highest average performance for any configuration, and also the lowest standard deviations, corroborating its robustness across different color texture scenarios. The DCNNs outperform the previous hand-crafted descriptors; however, their high standard deviation reflects their performance drop on the Outex13 and MBT datasets. Similarly, the integrative descriptors present varying performance, as their standard deviations show. Regarding the average performance of the Opponent-Gabor method, we can conclude that opponent color processing allows a better characterization in some cases, so this method has a higher performance than the integrative methods and performs close to the DCNNs. This corroborates the benefits of opponent color analysis, which the proposed method also exploits; however, our approach keeps a higher performance and a lower oscillation between datasets than the Opponent-Gabor method. Considering the CN-based methods, the CNTD approach also has an oscillating performance and is outperformed by MCND, which has the second highest average performance and also the second lowest standard deviation, behind only the proposed method. Overall, the results presented here corroborate the effectiveness of SSN for color texture analysis in various scenarios, as it keeps the highest performance in all cases while most of the other literature methods oscillate.
In color texture analysis, the image representation in different color spaces may impact the characterization performance. In this context, this section presents a set of experiments that vary the image color space of the four considered datasets and compare the performance of the proposed method (in its suggested configuration) with other literature approaches. The MCND approach is not included in this analysis because the available results concern only the best configuration of the method for each dataset separately in the RGB space. Four color spaces are considered, one for each different approach: RGB, LAB, HSV, and . These are described in depth in Section 2.2. As some color spaces use different value ranges for different channels, we normalize their values to a common range. First, we analyze the performance on the USPtex and Outex13 datasets; the accuracies of leave-one-out cross-validation with the LDA classifier are shown in Table 3. Different color spaces influence each method in different ways. For example, the integrative methods have increased performance in the HSV space, while the opposite happens with the DCNNs. In fact, this effect on the convolutional networks also occurs in the other non-RGB color spaces, albeit with lower intensity. This is somewhat expected, since these networks are pre-trained on RGB images. The two methods with the greatest robustness to the different color spaces are, respectively, the proposed SSN and the Opponent-Gabor method. This indicates that the opponent color processing approach allows a greater tolerance to the different ways of representing image colors. The proposed method incorporates these characteristics and, at the same time, increases the invariance to color spaces, reaching the highest average performance and a lower standard deviation than Opponent-Gabor. 
The SSN method also achieves the highest results in each color space individually.
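A minimal sketch of this per-channel normalization, assuming min-max scaling by each channel's nominal range to [0, 1] (the target range used in the paper was lost in extraction) and assuming the commonly used LAB ranges L in [0, 100] and a, b in [-128, 127]. Python's standard `colorsys` module is used for the RGB to HSV conversion, since it already returns H, S and V in [0, 1]. The function names are hypothetical, and the fourth color space is not covered.

```python
import colorsys
from typing import List, Sequence, Tuple

# Nominal LAB ranges (assumption): L in [0, 100], a and b in [-128, 127].
LAB_RANGES = [(0.0, 100.0), (-128.0, 127.0), (-128.0, 127.0)]


def normalize_channel(values: Sequence[float], lo: float, hi: float,
                      new_lo: float = 0.0, new_hi: float = 1.0) -> List[float]:
    """Linearly map values from the nominal range [lo, hi] to [new_lo, new_hi]."""
    scale = (new_hi - new_lo) / (hi - lo)
    return [new_lo + (v - lo) * scale for v in values]


def normalize_lab_pixels(pixels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each LAB channel separately using its own nominal range."""
    channels = list(zip(*pixels))  # pixel-major -> channel-major
    normed = [normalize_channel(ch, lo, hi) for ch, (lo, hi) in zip(channels, LAB_RANGES)]
    return [list(p) for p in zip(*normed)]  # back to pixel-major


def rgb8_to_hsv(pixels: Sequence[Tuple[int, int, int]]) -> List[Tuple[float, float, float]]:
    """colorsys expects floats in [0, 1] and already returns H, S, V in [0, 1]."""
    return [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]


print(normalize_lab_pixels([[50.0, 0.0, 0.0]]))
print(rgb8_to_hsv([(255, 0, 0)]))  # prints [(0.0, 1.0, 1.0)]
```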
On the Outex13 dataset, the methods with the lowest performance are, respectively, LPQ and InceptionV3. The other DCNNs, except for AlexNet, obtain similar averages. The CLBP method has its performance reduced in the LAB and color spaces, which results in an average below that of the convolutional networks. The integrative CNTD method outperforms the other integrative methods, and its average is approximately matched by the Opponent-Gabor method and by the AlexNet DCNN. This shows that traditional integrative methods have limitations on this dataset when different color spaces are considered. For example, on the USPtex dataset the CLBP method benefits from the changes of color space, which does not happen on Outex13. The methods with the greatest robustness to the different color spaces on this dataset are again the proposed SSN and the Opponent-Gabor method. In this context, and considering the results obtained by the Opponent-Gabor and the integrative CNTD methods, CN-based and opponent-color techniques present the best performances on this dataset, and the SSN method, which combines these properties, achieves the highest results for all color spaces.
Table 4 shows the results obtained on the MBT and CUReT datasets. On MBT, the highest results are obtained by the SSN method, followed by the integrative CNTD approach and then by both the Opponent-Gabor and CLBP methods. On this dataset, the average performance of the deepest neural networks (InceptionV3 and both ResNets) is the lowest observed, while the smaller networks AlexNet and VGG16 perform better. The high standard deviation of the DCNNs is due to their sharp performance drop in the HSV color space. Unlike what is observed on the other datasets, this drop also affects all other methods, but with a smaller intensity than for the DCNNs. In this respect, the proposed method proved to be more robust, achieving the highest performance under HSV and also the lowest variation between color spaces. The results obtained on the CUReT dataset corroborate the robustness of the proposed method to different color spaces, as SSN obtains the highest result in all of them. On this dataset, the ResNet DCNNs (50 and 101) outperform the integrative methods in the individual LAB and color spaces, and despite again presenting a loss in the HSV space, their average performance is higher than that of the other literature methods. In the HSV space, the best results are obtained by the proposed method and by the Opponent-Gabor method. In general, on this dataset both the integrative methods and the convolutional networks have the highest standard deviations, indicating that their performance oscillates more between color spaces. The methods with the lowest oscillation of performance are the proposed SSN and the Opponent-Gabor method, respectively, reinforcing the robustness of opponent color processing and of the CN-based approach.
To analyze the overall performance of the methods on each color space separately, we compute the average accuracy over all datasets; results are shown in Figure 6 for the 7 best methods. SSN has the highest average performance for all color spaces. The AlexNet DCNN achieves the second best performance, outperforming the other compared methods in the RGB, LAB and color spaces, but performs poorly in the HSV color space, along with the other DCNNs. The proposed method outperforms the other approaches by the largest margin in the HSV color space.
The color space experiment shows that the proposed SSN is highly robust both across datasets and across different color spaces for a single dataset. In all scenarios, its performance variation is considerably smaller than that of the other literature methods, while it also achieves the highest results. Therefore, considering all the obtained results, SSN stands out as a very effective approach for color texture characterization in a wide range of scenarios, such as different color texture properties (considering the heterogeneity of the datasets) and different color space representations.
This work introduces a new single-parameter technique for color texture analysis, which consists of the modeling and characterization of a directed Spatio-Spectral Network (SSN) from the image color channels. Each pixel, in each channel, is considered a network vertex, resulting in a multilayer structure. The connections are created according to the proposed radially symmetric neighborhood, given a distance-limiting radius parameter. Directed connections within and between color channels point towards the pixel of higher intensity, and the connection weight is a normalized intensity difference multiplied by the Euclidean distance between the pixels. This process results in a rich network with spatio-spectral properties, capturing a wide range of texture patterns related to combined spatial and spectral intensity variations. The network is then quantified with topological measures following a complete characterization procedure that considers different measures and connection types and analyzes each network layer separately. The whole process results in a compact and effective image descriptor for color texture characterization.
We performed classification experiments on four datasets (USPtex, Outex13, MBT, and CUReT) to evaluate the proposed SSN and compare its performance to other methods from the literature. Twelve methods are considered for comparison, including Gabor filters obtained from opponent color channels, integrative versions of grayscale texture descriptors (LPQ, CLBP, and CNTD), CN-based methods (CNTD and MCND), and DCNNs (AlexNet, VGG, Inception, and ResNet). Results show that the proposed approach has the highest performance on all datasets and also the smallest variation between datasets, corroborating its robustness to different scenarios. We also evaluate the impact of different color spaces (RGB, LAB, HSV and ) on color texture characterization, which shows that SSN also has the highest average performance and the highest tolerance to each color space among the compared methods. The obtained results suggest that the spatio-spectral approach, combined with the flexibility of complex networks to model and characterize real phenomena, is a powerful technique for color texture analysis, and its properties should be further explored.
L. F. S. Scabini acknowledges support from CNPq (Grants #134558/2016-2 and #142438/2018-9). L. C. Ribas gratefully acknowledges the financial support of grants #2016/23763-8 and #2016/18809-9, São Paulo Research Foundation (FAPESP). O. M. Bruno acknowledges support from CNPq (Grant #307897/2018-4) and FAPESP (grants #2014/08026-1 and #2016/18809-9). The authors are also grateful to the NVIDIA GPU Grant Program for the donation of the Quadro P6000 and Titan Xp GPUs used in this research.
Encyclopedia of machine learning and data mining, pages 314–315, Berlin, 2017. Springer.