Texture is a key visual feature used to describe images in many problems of computer vision and image processing, such as plant recognition, medical image analysis, industrial inspection, and microscope image analysis. The visual texture is composed of sub-patterns related to the pixel distribution and with a certain similarity over the image [gonccalves2016texture]. Although it can be easily understood by humans, there is no formal definition for the texture attribute. However, this fact has not prevented the progress and development of new approaches for texture analysis.
Many techniques for texture analysis have been proposed in the last decades. They can be divided into four classical categories according to how they exploit the texture characteristics of the image.
The statistical-based approaches have been the most studied over the last decades.
Examples of these approaches are the variants of Gray-Level Co-occurrence Matrices (GLCM) [haralick1973] and Local Binary Patterns (LBP) [ojala2002multiresolution].
The structural approaches treat the image as an arrangement of textons (i.e., small texture elements), which form the texture through a spatially organized pattern.
Some methods of this kind are morphological decomposition [lam1997rotated] and keypoint detectors and descriptors used to represent the texture elements [lazebnik2005sparse].
In the spectral-based approaches, in turn, the image is analyzed in the power spectrum domain; the most popular methods are Gabor filters [manjunath1996texture] and wavelet transforms [DEVES20142925].
Finally, in the model-based approaches, the textures are represented using mathematical models and the estimation of their parameters, such as fractal models [backes2012, Ribas2015] and stochastic models [panjwani1995markov].
More recently, innovative methods have been proposed to characterize textures, achieving promising results.
In particular, methods that use learning techniques to represent the texture have gained prominence, such as those based on a vocabulary of Scale-Invariant Feature Transform (SIFT) descriptors [2004densesift] (the Bag-of-Visual-Words (BOVW) approach), randomized neural networks [JarbasRNN2015, SaJrRNNColor2019] and deep convolutional neural networks [simonyan2014very, szegedy2016rethinking, szegedy2017inception, basu2016theoretical]. On the other hand, methods based on image complexity analysis have also gained attention due to their capacity to deal with complex texture patterns. In this sense, among the most popular are the methods based on Complex Network (CN) theory. These methods are very promising because of their ability to represent the relationships among the structural elements of the texture. However, once the textures are modeled as CNs, how to characterize them in order to obtain representative descriptors is a challenge to overcome.
In this paper, we propose to use a randomized neural network to learn and characterize the topology of a directed CN that models a texture image. Firstly, the image is modeled as a directed CN by mapping the pixels into vertices and connecting the vertices based on a connection rule. For characterization, we use each vertex of the CN as a label and its neighboring vertices (i.e., the eight neighboring pixels in a 3 × 3 window) as an input vector for training a randomized neural network. A randomized neural network has a single hidden layer and a very fast learning algorithm, and can learn and characterize the topological characteristics of the CN. Thus, the texture feature vector is composed of the output weights of the trained randomized neural network. In relation to the previous approach [ribas2018fusion], the main contribution of this work is a new way to build the labels and input vectors used to train the randomized neural network. The proposed approach is faster and improves the classification performance.
This paper is organized as follows. Section 2 describes the fundamentals of complex network theory and randomized neural networks. The proposed method for texture analysis is described in Section 3. The experimental setup used in this work is presented in Section 4. The experimental results, discussion and comparison on four databases are presented in Section 5. Finally, Section 6 concludes the paper and suggests future works.
2.1 Complex Networks
The Complex Network (CN) research field, also called Network Science, arises from the combination of graph theory, physics and statistics, targeting large and complicated real systems. Basically, most of these systems can be modeled as a structure composed of elements that interact with each other, which in a network are described by vertices and edges. However, the mathematical properties that govern the internal behavior of these networks are not trivial; as an example, one can imagine the structural organization and functioning of the internet, composed of countless routers and computers (vertices) and their physical connections through wires (edges). These systems are collectively called complex systems, capturing the fact that it is difficult to derive their collective behavior from a knowledge of the system's components [barabasi2016network]. Initially, it was believed that most real networks had a random topology [randomCN, randomCNevolution]. Further research then revealed various structural patterns, which led to the definition of network models such as the scale-free model [scalefreeCN] (power-law-like degree distribution) and the small-world model [smallworldCN] (short average path distance and high vertex inter-connectivity). These phenomena were observed in several real systems, which then allowed important advances in the study of their functioning. Therefore, the complex network approach became popular in the analysis of various real systems in areas such as physics, biology and sociology.
A complex network is usually defined by a combination of two sets, its vertices and edges. Let us define $V$ as a set of vertices and $E$ as a set of edges connecting vertex pairs; then a network is defined by $N = (V, E)$. Here we will focus on directed weighted networks, which implies that an edge $e(i, j) \in E$ is an ordered pair and has an associated weight $w(i, j)$. To analyze a system from a network perspective, the first step is the modeling, which means defining the vertices and edges. Once the network is built, several topological measures can be obtained to describe its structure [costa2007CNsurvey]. One of the most traditional measures is the vertex degree, which counts the number of connections of a vertex:

$k^{out}_i = \sum_{j \in V} a_{ij}$, where $a_{ij} = 1$ if $e(i, j) \in E$ and $a_{ij} = 0$ otherwise. (1)
In a directed network scenario, Equation 1 describes the output degree $k^{out}_i$ of vertex $i$, i.e., the number of connections leaving $i$. It is also possible to compute the input degree $k^{in}_i$ by inverting the verification from "if $e(i, j) \in E$" to "if $e(j, i) \in E$". The traditional degree measure does not take the edge weights into consideration; therefore, it is also possible to calculate the weighted degree, also known as the strength of a vertex:

$s^{out}_i = \sum_{j \in V} a_{ij} \, w(i, j)$, (2)
which can also be computed for input connections, a quantity we will denote $s^{in}_i$. From the degree and strength of the vertices, many network properties can be quantified, such as its wiring patterns, the existence of the scale-free phenomenon, and the identification of hubs and influential vertices.
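As a concrete illustration, the degree and strength measures above can be computed from a weighted adjacency matrix in a few lines. This is a minimal sketch assuming a dense matrix representation in which `W[i, j] > 0` holds the weight of the directed edge from vertex i to vertex j:

```python
import numpy as np

def degrees_and_strengths(W):
    """Directed degree/strength measures from a weighted adjacency
    matrix W, where W[i, j] > 0 is the weight of edge i -> j and
    0 means there is no edge."""
    A = (W > 0).astype(int)   # binary adjacency
    k_out = A.sum(axis=1)     # out-degree: edges leaving each vertex
    k_in = A.sum(axis=0)      # in-degree: edges arriving at each vertex
    s_out = W.sum(axis=1)     # weighted out-degree (out-strength)
    s_in = W.sum(axis=0)      # weighted in-degree (in-strength)
    return k_out, k_in, s_out, s_in

# toy 3-vertex network: 0 -> 1 (w=0.5), 0 -> 2 (w=0.2), 1 -> 2 (w=0.9)
W = np.array([[0.0, 0.5, 0.2],
              [0.0, 0.0, 0.9],
              [0.0, 0.0, 0.0]])
k_out, k_in, s_out, s_in = degrees_and_strengths(W)
```

For large sparse networks a dense matrix would be wasteful, but it keeps the correspondence with Equations 1 and 2 explicit.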
2.2 Randomized Neural Network
Randomized neural networks [schmidt1992feedforward, pao1992functional, pao1994learning, huang2006extreme] are artificial neural nets, which, in their simplest version, have a single hidden layer, whose weights are determined randomly, and an output layer whose weights can be determined using a closed-form solution. When these neural networks allow direct links between the input feature vectors and the output layer [pao1992functional, pao1994learning], they are known as random vector functional link (RVFL) nets.
To mathematically describe a randomized neural network without direct links, let $X = [\vec{x}_1, \ldots, \vec{x}_N]$ be a set of input feature vectors, each one having $p$ attributes (each vector receives an additional fixed value for bias). Next, these input vectors can be processed by the hidden neurons using $\vec{g}_i = \phi(W \vec{x}_i)$, where $\phi$ is a transfer function and $W$ is the matrix of weights of the hidden neurons, whose dimensions are $Q \times (p + 1)$, where $Q$ is the number of neurons of the hidden layer.
The matrix $G = [\vec{g}_1, \ldots, \vec{g}_N]$ represents the outputs of the hidden layer for all input feature vectors. This matrix, after the inclusion of an additional fixed value for bias in each vector, can be used to compute the matrix $M$ of weights of the output layer, according to

$M = D G^{+}$,
where $D$ is the matrix of labels (each label corresponding to its respective input vector) and $G^{+}$ is the Moore-Penrose pseudo-inverse [Moore1920, penrose_1955].
Sometimes, the square matrix $G G^T$ is near-singular, resulting in unstable values for the matrix $M$. To solve this, it is possible to use Tikhonov regularization [tikhonov1963, calvetti2000], as follows:

$M = D G^T (G G^T + \lambda I)^{-1}$,
where $I$ is an identity matrix and $\lambda$ is a regularization parameter.
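The closed-form training above can be sketched as follows. The sketch uses the common row-wise convention (one sample per row), so the regularized solution appears as $(G^T G + \lambda I)^{-1} G^T D$, which is algebraically equivalent to the column-wise form in the text; the function name, the tanh transfer function and the uniform weight initialization are illustrative choices, not prescribed by the paper:

```python
import numpy as np

def rnn_output_weights(X, D, Q=50, lam=1e-3, seed=7):
    """Closed-form training of a single-hidden-layer randomized
    neural network (sketch, one sample per row).

    X : (n_samples, p) input feature matrix
    D : (n_samples, n_outputs) label matrix
    Q : number of hidden neurons
    lam : Tikhonov regularization parameter
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])      # bias column for the input
    W = rng.uniform(-1, 1, size=(Q, p + 1))   # random hidden weights (fixed)
    G = np.tanh(Xb @ W.T)                     # hidden-layer outputs
    Gb = np.hstack([np.ones((n, 1)), G])      # bias column for the output layer
    # regularized least squares: M = (G^T G + lam*I)^(-1) G^T D
    M = np.linalg.solve(Gb.T @ Gb + lam * np.eye(Q + 1), Gb.T @ D)
    return W, M

# demo with random data (shapes only)
rng = np.random.default_rng(0)
Xd = rng.normal(size=(40, 4))
Dd = rng.normal(size=(40, 2))
W, M = rnn_output_weights(Xd, Dd, Q=20)
```

Only the output weights $M$ are learned; the hidden weights $W$ stay at their random initialization, which is what makes the training a single linear solve.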
3 Proposed Method
In this section, we describe the proposed method in detail, from the image modeling as a directed complex network to the characterization through learning local topological properties with randomized neural networks.
3.1 Modeling Texture as Directed Networks
Most previous works approaching texture as complex networks consider each image pixel as a network vertex, with connections based on intensity similarity and spatial proximity. This is achieved through two modeling parameters: a radius, which defines a distance limit for connections, and a threshold for connection cutting. On the other hand, we employ the directed modeling introduced in [ribas2018fusion], where only the radius is needed, removing the need of finding ideal threshold values. Consider $I$ as the input image with pixels $i$, whose intensities $I(i)$ range from $0$ to $L$, where $L$ defines the maximum intensity value, usually $L = 255$ for 8-bit images. A network $N = (V, E)$ is built where $V$ represents the vertices, each one mapping an image pixel, and $E$ represents the edges, connecting pairs of vertices. To create directed connections, each vertex/pixel is centered in a sliding window of radius $r$, and edges point in the direction of the gradient, i.e., towards pixels of higher intensity:

$e(i, j) \in E \iff d(i, j) \leq r$ and $I(j) \geq I(i)$,
where $d(i, j)$ represents the Euclidean distance between pixels $i$ and $j$. If the pixel intensities are equal, the edge is bidirectional.
The connection weight $w(i, j)$ plays an important role in the network structure, and it is crucial for computing the vertex strength. It is defined as a combination of the intensity difference and the spatial distance of the pixels
where, in the special case in which $d(i, j)$ equals the smallest possible distance, only the intensity information is considered. This definition yields an edge weight distribution normalized in the range $[0, 1]$, giving equal weight to the intensity and the spatial position of the pixels. By varying the radius parameter $r$, it is possible to control the network density, i.e., as $r$ increases, the number of connections increases. With a set of radii, different networks can be modeled, allowing one to analyze the dynamic evolution of the system (image), which is related to the interplay between neighbor pixels and, therefore, contains multiscale texture information. Figure 1(a) illustrates the structure of the directed network modeled for different values of $r$.
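A direct (unoptimized) sketch of this modeling step is shown below. The connection rule follows the text, with edges pointing towards pixels of higher intensity and bidirectional edges on ties; since the exact weight expression is not reproduced here, the formula below, mixing the normalized intensity difference with the normalized distance, is an illustrative assumption rather than the paper's exact definition:

```python
import numpy as np

def model_directed_network(img, r):
    """Model a grayscale image as a directed weighted network.
    Pixels are vertices (flat indices); each pixel connects to
    neighbours within Euclidean distance r, the edge pointing toward
    the pixel of higher intensity (both directions on equal
    intensity). Returns a dict {(i, j): weight}."""
    h, w = img.shape
    L = 255.0
    edges = {}
    offs = [(dy, dx) for dy in range(-int(r), int(r) + 1)
                     for dx in range(-int(r), int(r) + 1)
            if (dy, dx) != (0, 0) and np.hypot(dy, dx) <= r]
    for y in range(h):
        for x in range(w):
            for dy, dx in offs:
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w):
                    continue
                a, b = img[y, x], img[ny, nx]
                d = np.hypot(dy, dx)
                # illustrative weight: mean of normalized intensity
                # difference and normalized distance, in (0, 1]
                wgt = 0.5 * (abs(float(a) - float(b)) / L + d / r)
                if b >= a:  # edge toward higher (or equal) intensity
                    edges[(y * w + x, ny * w + nx)] = wgt
    return edges

img = np.array([[0, 10],
                [10, 10]])
edges = model_directed_network(img, 1.0)
```

Edges from a brighter pixel to a darker neighbour are never created; ties produce both directions because each pixel of the pair emits its own edge when visited.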
To quantify the network structure, three centrality measures are employed, just as in [ribas2018fusion]: the out-degree, and the weighted out- and in-degrees (or strengths). The out-degree $k^{out}_i$ of a vertex $i$ is computed by Eq. 1 and counts the number of vertices to which $i$ points. On the other hand, the weighted out-degree $s^{out}_i$ considers the weight of each connection leaving $i$. The weighted in-degree $s^{in}_i$ is then the opposite of the weighted out-degree, i.e., it sums the weights of the edges that point to $i$.
Figure 1(b) shows a visual representation of the network measures computed for a given texture image for different radii, where pixel intensities are obtained by normalizing each measure by the maximum possible vertex degree. It is possible to notice that each topological measure captures a different local pattern related to the image intensity variation. The combination of these measures in a multiscale fashion provides rich texture information, which we exploit to train neural networks that produce image descriptors. More details are given in the following section.
3.2 Learning Local Complex Features
The proposal of this paper is to use the randomized neural network to learn the main topological characteristics of a CN and then use the weights of the output layer of the trained network as a signature to represent the CN (i.e., the texture). To achieve this, for each vertex of the CN, three measures were considered: out-degree, weighted out-degree and weighted in-degree. As the out-degree is directly related to the in-degree in the modeled networks (i.e., the sum of the two degrees produces the same value in all vertices) and therefore carries the same information, only the out-degree was considered.
In this method, we propose to apply windows of size 3 × 3 over the modeled network in order to build the matrix of input feature vectors for the randomized neural network. For this, firstly, consider that the Cartesian coordinates of a vertex are the same as those of the pixel it represents. Thus, the window is the spatial neighborhood of a vertex based on the Cartesian coordinates. In other words, we traverse the image with sliding windows; however, instead of using information directly from the pixels, we use the information of the vertices that represent them (i.e., from the CN). Figure 2 illustrates a window with a central pixel represented by its vertex $i$ and the neighboring pixels represented by their vertices. Regarding this window, for each vertex $i$, the out-degree $k^{out}_i$ is considered the label and the out-degrees of the neighboring vertices compose the input feature vector. This strategy to define the input feature vectors and the labels to train the neural network is the main difference between this approach and the previous work [ribas2018fusion]. In the proposed approach, it is possible to compute the feature vector using a unique value of radius for modeling, while in the previous work it is necessary to use a set of radius values to obtain a feature vector, increasing the computational time. In addition, in the proposed approach, instead of using the gray level, we use the out-degree as the label. Thus, the input feature vector and the label are composed only of information from the CN.
Figure 3(a) shows how the input feature vector and its respective label are obtained from a given vertex using a 3 × 3 window. A matrix $X$ of input feature vectors and a matrix $D$ of labels for the out-degree are then constructed considering all the vertices of the network. Thus, it is possible to analyze the characteristics of the vertices that represent the pixels. In addition to the matrix of input feature vectors containing the out-degree, we also construct matrices of input vectors using the weighted out-degree $s^{out}$ and the weighted in-degree $s^{in}$.
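The construction of the training data from one measure map can be sketched as below, assuming the measure (e.g., the out-degree of each pixel's vertex) is stored as a 2-D array with the same shape as the image; border vertices are simply skipped in this sketch:

```python
import numpy as np

def build_training_set(measure_map):
    """For every interior vertex of the measure map, the eight
    neighbours inside a 3x3 window form the input vector and the
    centre value is the label."""
    h, w = measure_map.shape
    X, d = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = measure_map[y - 1:y + 2, x - 1:x + 2].flatten()
            d.append(win[4])             # centre vertex -> label
            X.append(np.delete(win, 4))  # 8 neighbours -> feature vector
    return np.array(X), np.array(d)

m = np.arange(9).reshape(3, 3)  # tiny 3x3 measure map
X, d = build_training_set(m)
```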
To train a randomized neural network, it is necessary to define the matrix $W$ of weights of the hidden neurons. Generally, this matrix is randomly defined at each new training, which also changes the values of the trained output weights. However, in feature extraction techniques it is important that the signature values be always the same for the same sample. Therefore, it is important to use the same values in the matrix of weights of the hidden neurons, so as to obtain the same signature for the same network. In this way, we used the same procedure adopted in [JarbasRNN2015] to obtain the pseudorandom uniform numbers for the matrix $W$, that is, the linear congruential generator (LCG) [park1988random], according to
where $V$ is the random sequence and $a$, $b$ and $c$ are parameters whose values are those adopted in [JarbasRNN2015], with the length of the sequence, started by a seed $V(0)$, determined by the dimensions of $W$. Thus, the matrix $W$ is defined using the sequence $V$ divided into segments of $p + 1$ values. The matrices $W$ and $X$ (each row) are normalized to zero mean and unit variance.
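A sketch of this deterministic weight initialization is shown below. The specific LCG parameters $a$, $b$ and $c$ adopted from [JarbasRNN2015] are not reproduced in this text, so the values in the sketch (a classic Lehmer-style generator) are placeholders only:

```python
import numpy as np

def lcg_matrix(Q, cols, a=75, b=74, c=2**16 + 1, v0=1):
    """Deterministic hidden-weight matrix from a linear congruential
    generator, V(n+1) = (a*V(n) + b) mod c. The a, b, c values here
    are placeholders, not the ones adopted in [JarbasRNN2015].
    The sequence of length Q*cols is split into Q rows, and each row
    is normalized to zero mean and unit variance, as in the text."""
    seq = np.empty(Q * cols)
    v = v0
    for i in range(Q * cols):
        v = (a * v + b) % c
        seq[i] = v
    W = seq.reshape(Q, cols)
    W = (W - W.mean(axis=1, keepdims=True)) / W.std(axis=1, keepdims=True)
    return W

W = lcg_matrix(4, 10)  # e.g. Q = 4 hidden neurons, p + 1 = 10 inputs
```

Because the generator is fully deterministic, two runs over the same sample always produce the same hidden weights and, therefore, the same signature.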
The descriptors of the modeled CN and, consequently, the signature of the texture are obtained based on the matrix $M$, which is composed of the learned weights of the output layer. This matrix is flattened into a vector $\vec{m}$, as illustrated in Figure 3(b). Notice that $\vec{m}$ has length $Q + 1$ due to the bias value. Thus, the first signature for a texture is obtained by concatenating the vectors of three RNNs trained with the three matrices of input feature vectors (out-degree, weighted out-degree and weighted in-degree),
where $Q$ is the number of hidden neurons and $r$ is the radius used to construct the CN. This signature is constructed using a unique value of $Q$ and of $r$. These two parameters influence the trained weights of the neural network and, therefore, provide different descriptors for different values. Taking advantage of this, we propose a signature which concatenates the vectors obtained for different values of radius $r$,
Finally, a signature which combines these vectors using different values of $Q$ is proposed, according to
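Putting the pieces together, the concatenation of output-weight vectors across measures, radii and numbers of hidden neurons can be sketched as below; `maps_for` and `train_rnn` are hypothetical helpers standing in for the modeling and training steps described above:

```python
import numpy as np

def signature(train_rnn, maps_for, radii, Qs):
    """Final texture descriptor (sketch). `maps_for(r)` is assumed to
    yield the three measure maps (out-degree, weighted out-degree,
    weighted in-degree) of the network modeled with radius r, and
    `train_rnn(measure_map, Q)` the flattened output weights of one
    trained randomized neural network. All weight vectors are
    concatenated into a single feature vector."""
    parts = [train_rnn(m, Q)
             for Q in Qs for r in radii for m in maps_for(r)]
    return np.concatenate(parts)

# demo with dummy helpers: 3 output weights per trained RNN
def train_rnn(measure_map, Q):
    return np.full(3, float(Q))

def maps_for(r):
    return [None, None, None]  # stand-ins for the three measure maps

sig = signature(train_rnn, maps_for, radii=[1, 2], Qs=[4, 19])
```

The descriptor length grows multiplicatively with the number of measures, radii and $Q$ values, which is why the parameter analysis in Section 5 limits how many values are combined.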
4 Experimental Setup
To assess the performance of the proposed method, the following databases were evaluated:
Brodatz [brodatz-1966]: just as in [cndt], we used 111 classes, with 16 images of 128 × 128 pixels per class, totaling a data set of 1776 images.
Vistex [Vistex1995]: this data set is provided by the MIT Media Laboratory. Just as in [cndt], we used 54 classes, each one represented by an image of 512 × 512 pixels divided into non-overlapping images of 128 × 128 pixels. Thus, the database has 16 images per class, resulting in a total of 864 images.
Outex [OjalaMPVKH02]: this framework provides several texture benchmark data sets and, among them, we chose TC_Outex_00013, which is composed of 68 classes. Just as in [cndt], each class is represented by an image of 746 × 538 pixels, from which 20 images of 128 × 128 pixels were cropped without overlapping. Thus, the database used in this work has 1360 images.
USPTex [backes2012]: this database is composed of 191 classes, 12 images per class, resulting in 2292 images, which represent a wide variety of scenes of everyday life, such as bark, sand, bricks, vegetation, sidewalk etc.
All color texture data sets were converted into grayscale. To classify the proposed signatures, we used Linear Discriminant Analysis (LDA) [Webb2002]. This statistical method projects the data so as to maximize the inter-class distance and minimize the variance within the same class. To validate the classification procedure, we used the leave-one-out cross-validation approach, which separates one sample for test, uses the remainder for training, and repeats this process $n$ times ($n$ is the number of samples), each time with a different sample for test. The validation performance is the mean accuracy over the $n$ runs.
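The validation protocol can be sketched as below. To keep the sketch free of any particular library, a simple nearest-mean classifier stands in for LDA; only the leave-one-out loop itself mirrors the procedure described in the text:

```python
import numpy as np

def leave_one_out_accuracy(X, y, classify):
    """Leave-one-out cross-validation: hold out each sample once,
    train on the remaining n-1 samples, and average the n outcomes."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        hits += classify(X[mask], y[mask], X[i]) == y[i]
    return hits / n

def nearest_mean(Xtr, ytr, x):
    """Toy stand-in classifier (the paper uses LDA): assign x to the
    class whose training mean is closest."""
    classes = np.unique(ytr)
    means = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((means - x) ** 2).sum(axis=1))]

# demo: two well-separated synthetic classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 4)), rng.normal(6, 1, (15, 4))])
y = np.array([0] * 15 + [1] * 15)
acc = leave_one_out_accuracy(X, y, nearest_mean)
```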
5 Results and Discussion
5.1 Parameter Analysis
In this section, we evaluate the parameters of our method in terms of accuracy on the four databases. The first signature analyzed is the single-radius signature with a fixed number of hidden neurons $Q$. For this signature, different values of $r$ and their combinations were analyzed. The results are summarized in Figure 4, which presents the mean of the accuracies obtained on the four databases. In this experiment, a maximum of two values of $r$ was combined, due to the large number of features generated by larger combinations.
The results indicate that a single value of $r$ (main diagonal of the matrix) yields results lower than those of the combination of two values of $r$. In this case, note that the best results are obtained when combining a low value with a high value (best result of 95.46%). The value of $r$ determines the radius of the network modeling, that is, for low values the pixels are connected with their nearest neighbors (local analysis), whereas for high values a global analysis of the texture is performed. Thus, the combination of local and global characteristics is expected to provide superior results.
A second experiment was performed to evaluate the number of hidden neurons. In this experiment, different values of $Q$ were used with the radius combination selected above. Figure 5 shows the accuracies of the signature on the four databases with different values of $Q$ and their combinations. Note that the lowest results are on the main diagonals, that is, when a single value of $Q$ is considered. On the other hand, when two values of $Q$ are combined, the accuracy increases. Table 1 shows the results for the combination of three values of $Q$. Note that higher accuracies were obtained with these combinations. However, as larger values of $Q$ are combined, the size of the feature vector also increases, and the accuracy tends to decrease or stabilize. Thus, we adopt the feature vector that presents a good balance between accuracy and number of features on the four databases.
5.2 Comparison with other methods
In order to evaluate the results obtained by the proposed method, comparisons were performed with other methods from the literature. The experimental configuration was the same as described above, with the exception of the CLBP method, which used the 1-Nearest Neighbor (1-NN) classifier with the chi-square distance, following its original paper. For the proposed method, the signature that obtained the best result in the previous analysis was considered.
We also compare the performance of our descriptor with deep convolutional neural networks, which are applied in a transfer-learning approach where a pre-trained architecture is used as a feature extractor by computing the Global Average Pooling (GAP) [lin2013network] over the output of its last convolutional layer. The weights of these models are learned from the ImageNet dataset [deng2009imagenet], composed of millions of images, and can be ported to various applications, such as texture analysis [cimpoi2016deep]. We considered four well-known models from the literature: VGG19 (2014) [simonyan2014very], InceptionV3 (2016) [szegedy2016rethinking], ResNet50 (2016) [he2016deep] and InceptionResNetV2 (2017) [szegedy2017inception]. All models and their pre-trained weights were imported from the Keras 2.2.4 library (www.keras.io/applications). For these methods, the images are normalized by the maximum possible gray level before being processed.
Table 2 presents the results obtained by our method and the others on the four texture databases. The results show that the proposed method obtained the best results on three databases (Outex, USPTex and Vistex). In relation to the methods based on convolutional neural networks, the ResNet50 model outperformed the proposed method only on the Brodatz database, where the other convolutional approaches also achieved higher results. However, it is possible to note that these methods did not perform well on the Outex database, which is characterized by samples with different illumination conditions.
It is also important to highlight that the proposed method improved the accuracy when compared to the ELM Signature and CNDT methods (which are based on randomized neural networks and complex networks, respectively). The ELM Signature method uses only the pixel intensities to train the RNNs and characterize the textures, which shows that the modeling as CNs and their topology are important in texture analysis. On the other hand, the CNDT method models the images as complex networks and computes traditional CN measures to compose the feature vector. In this context, the results suggest that our approach of learning complex network characteristics with randomized neural networks to characterize textures is more discriminative.
6 Conclusion

This paper proposed a novel way of extracting information from complex networks to train randomized neural networks, using the weights of the output neurons to compose a texture signature. Unlike the method proposed in [ribas2018fusion], our approach adopts a different strategy to construct the input feature vector and the label, which is based on CN information only. It uses a unique value of radius to model the complex network and train the neural network, decreasing the computational time. The success rates of our approach were very high, surpassing the accuracies of a large set of compared methods, including some based on deep learning. Thus, in light of the obtained results, we believe that our approach offers a relevant contribution to the novel and promising field of research that studies how to connect neural and complex networks to build image signatures.
Lucas Correia Ribas gratefully acknowledges the financial support grant #s 2016/23763-8 and 16/18809-9, São Paulo Research Foundation (FAPESP). Jarbas Joaci de Mesquita Sá Junior thanks CNPq (National Council for Scientific and Technological Development, Brazil) (Grant: 302183/2017-5) for the financial support of this work. Leonardo Scabini acknowledges support from CNPq (Grant number #142438/2018-9). Odemir M. Bruno thanks the financial support of CNPq (Grant # 307897/2018-4) and FAPESP (Grant #s 14/08026-1 and 16/18809-9). The authors are also grateful to the NVIDIA GPU Grant Program for the donation of the Quadro P6000 and the Titan Xp GPUs used on this research.