1 Introduction
Dynamic texture is a spatiotemporal extension of image texture patterns: textures present in video or sequences of images that exhibit certain stationary properties in time [13]. Although simple and repetitive patterns can be considered dynamic textures, in general they are characterized by complex nonrigid motions. These motions are guided by nonlinear and stochastic dynamics, for instance, turbulent seawater or a fire blast [59]. Examples of dynamic textures are present in our daily lives (e.g., trees swaying, smoke, waterfalls, etc.) and in various areas of science, such as biology (e.g., evolution of bacterial colonies [60, 2], tumor growth [5, 47, 56], tissue growth [33], etc.) and materials science (e.g., corrosion processes of metals and nanostructures [65, 17]).
Dynamic texture research appeared more than 30 years after that of static texture (textures in images), in the beginning of the 1990s [37]. Moreover, only in the last decade has attention to the area increased. Dynamic texture is not just a trivial extension of static texture; indeed, many issues arise when textures that change in time are analyzed. Besides the issues related to static textures, in dynamic textures the patterns are combined in space and time. Characteristics presented by dynamic textures include a large amount of raw data, spatial and temporal regularity, and very little prior structure [29, 24].
In terms of application, dynamic texture methods are valuable tools in different areas such as industry, medicine, security, and traffic engineering, among others. In these areas, there are various problems that can be modeled as dynamic textures, which makes dynamic texture analysis an important research field. Research in facial expression recognition using dynamic textures has proved to be more promising than approaches based on static texture [61]. In this way, fully automatic and real-time facial expression recognition could be used to support, for instance, biometric systems, psychological research, and human-computer interaction [51]. In medicine, dynamic texture methods have been applied to the automatic segmentation of the liver in ultrasound images, allowing the classification of hepatic structures including vasculature and liver parenchyma [36]. Recently, works have employed dynamic texture methods to identify traffic conditions (i.e., light, medium, slow, and stopped traffic) and to support traffic monitoring systems [6, 21]. Besides that, there are many other systems using dynamic texture methods; some examples are forest fire detection systems [63, 16], video retrieval systems [41, 45], and recognition of human activity [32], among others.
The characterization of dynamic textures has been much less studied than that of static textures and has many challenges to overcome, such as: (i) extracting features that integrate the combination of appearance and motion (most methods consider appearance and motion individually); (ii) invariance to transformations (e.g., rotation, scale, and translation); (iii) the presence of complex texture patterns; (iv) multiscale analysis (as with static textures, dynamic textures can present different characteristics at different levels of scale, e.g., local and global features [34]); (v) computational cost (since the analysis is performed over video, computational time is an important issue); among others. These challenges impose limitations on most of the approaches in the literature.
Several method categories have been used over the years for dynamic texture analysis. Optical flow based methods are the most popular due to their low computational cost and their efficiency in the motion analysis of videos. These methods reduce the dynamic texture video to the analysis of a sequence of motion patterns [23]. The first works used the vector field of normal flow to characterize the global magnitude and direction of motion in dynamic textures [37, 42]. More recent methods in this category include the vector histogram of acceleration and speed [35], scale- and rotation-invariant features of normal flow [15], and the combination of normal flow characteristics and periodicity [40], among others. Although they are good for motion analysis, optical flow based methods have as their main limitation the inability to properly analyze the appearance characteristics of videos. On the other hand, spatiotemporal filtering based methods characterize the dynamic texture through decomposition at various scales using space-time filtering, in order to explore local and global information of motion and appearance. In the initial study [54], oriented energy filters were used to analyze local space-time patterns. Later, other studies used the wavelet transform [55, 48, 14] and spatiotemporal Gabor filters [25] for dynamic texture characterization. Finally, a less popular category comprises methods based on geometric properties. These methods use properties of moving contour surfaces (i.e., features based on the tangent plane distribution) to characterize the spatial and temporal domains [39, 64].

In addition to the categories mentioned above, model-based methods have been proposed. These aim to build generative process models from the video and to extract features from these models to characterize the dynamic texture [7, 6, 44]. One of the most popular methods in this category is based on the linear dynamical systems (LDS) model, which inspired several researchers to use LDS for dynamic texture analysis. It explores the spatial and temporal regularities of dynamic textures for characterization [13]. Besides LDS models, fractal models have also been used to analyze dynamic textures, such as the dynamic fractal spectrum (DFS) [59], the 3D oriented transform feature (3D-OTF) [57], and the wavelet domain multifractal spectrum (WMFS) [30]. Recently, agent-based methods have been proposed in this category.
They use walkers guided by deterministic rules to extract motion and appearance features from dynamic textures [23, 21, 20]. Another recent method builds network models of the dynamic texture and extracts statistical measures from these networks [26]. This method achieved promising results in dynamic texture recognition; however, the dynamic texture was modeled as an undirected network and only three statistical measures of the degree distribution were used. We believe that other types of modeling and more robust measures can increase the performance of dynamic texture recognition.
In this paper, we propose a new method for dynamic texture characterization based on diffusion in directed complex networks, following recent works showing that diffusion in directed networks may reveal clustering in the structure versus dynamics space [11, 24]. The main contributions of this paper are the modeling of dynamic texture as a directed network and the feature extraction using diffusion on networks. Complex networks is a term used for the study of graphs in which complex phenomena are necessarily present, and for their analysis by methods from statistical mechanics. Recently, complex networks have become an object of interest due to their flexibility and simplicity in representing complex systems. Thus, several works have successfully used networks to represent and characterize static texture images [4, 3, 19]. In dynamic textures, the first approach using complex networks is reported in [26].

In the proposed method, the dynamic texture is modeled as a regular directed network, which is characterized by the activity of its vertices computed by diffusion. To model the dynamic texture, each pixel is mapped to a vertex, which is connected to other vertices within a given radius. Then, the network dynamics are explored through thresholds that aim to highlight specific topological features of each dynamic texture class. Given a transformed network, the activity of each vertex is estimated by random walks; the activity is the relative frequency at which each vertex is visited, at equilibrium, by the random walks. In order to measure the topological characteristics, we propose the use of two histograms that associate the activity with the temporal and spatial degrees of the vertices. Experiments performed on two dynamic texture databases illustrate that our method achieves excellent performance in dynamic texture recognition. In particular, on the UCLA50, UCLA9, and UCLA8 databases, the proposed method achieved the highest correct classification rate compared to state-of-the-art methods. Besides that, experimental results on traffic condition classification demonstrate the effectiveness of the proposed approach. Finally, experiments on synthetic textures showed that the proposed method is efficient in motion analysis and is rotation invariant.
This paper is organized as follows. In Section 2, complex network theory is briefly discussed to provide motivation and background for the proposed extension. The new method for dynamic texture characterization based on the activity of directed complex networks is described in Section 3. Section 4 presents the experimental setup, including a description of the databases. Experimental results on parameter evaluation, comparison with other methods, and computational complexity analysis are presented in Section 5. Finally, Section 6 concludes the paper.
2 Diffusion in networks
Graphs or networks? The two terms refer to the same object of study, which arose in Mathematics as Graph Theory. Graph Theory started with Euler's solution of the famous Königsberg bridges problem in 1736 and, since then, has been studied in Mathematics and, after the 1950s, in Computer Science. In the last decades, Graph Theory has attracted the attention of physicists, who contributed to the field by incorporating analyses and methods from statistical mechanics. The combination of graphs and analysis based on statistical mechanics is known in the literature as Complex Networks, or just Networks (where the presence of complex phenomena in the graphs is not mandatory) [38, 10]. In this work, we use the term "network" because diffusion is associated with the complex networks field.
A network can be defined as a pair $N = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges $e = (v_i, v_j)$. To each edge a weight $w(v_i, v_j)$ is assigned, which, depending on the application, may represent lengths, costs, measures, etc. Networks can be undirected (i.e., the edges have no direction) or directed (an edge $e = (v_i, v_j)$ is directed when there is a direction from $v_i$ to $v_j$; thus, $v_i$ is called the tail and $v_j$ is called the head). In undirected networks, the degree of a vertex is the number of edges incident to it. In directed networks, each vertex has an in-degree $k_{in}(v_i)$, corresponding to the number of ingoing edges (i.e., the number of heads adjacent to the vertex), and an out-degree $k_{out}(v_i)$, equal to the number of outgoing edges (i.e., the number of tails adjacent to the vertex).
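As a concrete illustration of these degree definitions, the following minimal Python sketch computes in- and out-degrees from a weighted adjacency matrix. The 4-vertex network is a hypothetical example of our own, not one taken from the paper:

```python
import numpy as np

# Hypothetical 4-vertex directed network as a weighted adjacency matrix:
# A[i, j] > 0 means an edge directed from vertex i (tail) to vertex j (head).
A = np.array([
    [0, 2, 0, 1],
    [0, 0, 3, 0],
    [0, 0, 0, 4],
    [0, 0, 0, 0],
], dtype=float)

# Out-degree: number of outgoing edges (tails at the vertex).
k_out = (A > 0).sum(axis=1)
# In-degree: number of ingoing edges (heads at the vertex).
k_in = (A > 0).sum(axis=0)

print(k_out.tolist())  # [2, 1, 1, 0]
print(k_in.tolist())   # [0, 1, 1, 2]
```

Note that the in- and out-degree sequences always share the same total, since every edge contributes one head and one tail.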
The network diffusion used in this work was proposed in the complex network field [11]. In this method, the diffusion is estimated by means of traditional random walks: walks are initiated at each vertex and move according to probabilities associated with the edge weights. To characterize the topological structure of the network, a histogram of activities is estimated. The activity of a vertex is the frequency with which it was visited by the walkers, and the histogram is composed of the sum of the activities of all vertices with the same degree. Recently, diffusion in networks has been used for static texture characterization [24] through the histogram of activities. Differently, in this work we propose an approach to characterize dynamic textures.

3 Dynamic texture using diffusion in networks
In this section, we describe the proposed method for dynamic texture characterization based on the activity of directed networks. Basically, the proposed method can be described in four steps, which Figure 1 summarizes:

Modeling the texture as a regular network: first, the video of the dynamic texture is modeled as a regular directed network in which each pixel is a vertex. Each vertex is connected to the other vertices whose Euclidean distance to it is less than or equal to a predefined value. Each edge receives a weight given by the gray-level intensity difference between its vertices.

Exploring the network dynamics: the directed network is transformed by a cut function applied to the edges. This function removes the edges whose weight is greater than a given threshold, highlighting specific characteristics of each texture pattern.

Diffusion in the network: a random walk starts at each vertex of the network obtained in step 2. The walker moves according to a random draw in which the probability of visiting a vertex is associated with the edge weight. The activity of each vertex is its number of visits at equilibrium.

Signature extraction: appearance and motion characteristics of the dynamic texture are combined to compose the feature vector. An activity histogram associated with the temporal in-degree (i.e., edges that connect vertices in different frames) and an activity histogram associated with the spatial in-degree (i.e., edges that connect vertices in the same frame) are obtained.
In the next subsections, each step of the proposed method is described in detail.
3.1 Modeling of dynamic textures in directed networks
In this approach, the video is considered as a three-dimensional matrix that contains all its frames. Thus, a pixel can be addressed as $p = (x, y, t)$, where $t$ is the time or, in discrete terms, the frame to which the pixel belongs, and $x$ and $y$ are its spatial coordinates. In this model, the intensity of a pixel is given by $I(p)$. To model a video as a network $N = (V, E)$, each pixel is mapped to a vertex $v \in V$. Two vertices $v_i$ and $v_j$, associated with the pixels $p_i$ and $p_j$, are connected by an edge $(v_i, v_j)$ if the Euclidean distance between $p_i$ and $p_j$ is less than or equal to a predefined radius $r$ [26], according to
$E = \left\{ (v_i, v_j) \;\middle|\; \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (t_i - t_j)^2} \le r \right\}$    (1)
Figure 2 shows a regular network obtained from a video with three frames for a given radius $r$. Each vertex is connected to vertices of the same frame and of different frames. Therefore, every vertex has the same degree, except for those on the border.
To each edge, a weight $w(v_i, v_j)$ is assigned according to the intensity difference between the pixels corresponding to the vertices:
$w(v_i, v_j) = I(p_j) - I(p_i)$    (2)
It is worth mentioning that the edge weights are invariant to lighting changes if the change is the same for the whole video, i.e., if the lighting is constant [26] (e.g., two videos of the same class captured under different lighting). Indeed, if the lighting conditions vary from frame to frame, the method is not invariant to lighting changes.
Notice that, according to Equation 2, negative weights are obtained when $I(p_j) < I(p_i)$. In this study, only edges with positive weight are considered. Thus, an edge is directed from $v_i$ to $v_j$ if $w(v_i, v_j) > 0$. Modeling with directed networks is more suitable for capturing topological features through a dynamic process, such as random walks, which differs from the undirected networks used in previous dynamic texture works [26].
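The modeling described in this subsection can be sketched in Python as follows. This is a naive, unoptimized illustration with a hypothetical tiny video; the function name and data are our own assumptions, not the paper's implementation:

```python
import itertools
import numpy as np

def video_to_network(video, radius):
    """Model a grayscale video of shape (T, H, W) as a directed network.

    Each pixel (t, x, y) is a vertex; two pixels are linked when their
    Euclidean distance in joint space-time coordinates is at most `radius`.
    Each edge is directed toward the brighter pixel and carries the
    (positive) intensity difference as its weight.
    """
    T, H, W = video.shape
    pixels = list(itertools.product(range(T), range(H), range(W)))
    edges = {}  # (tail, head) -> weight
    for p in pixels:
        for q in pixels:
            if p == q:
                continue
            dist = np.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            if dist <= radius:
                w = int(video[q]) - int(video[p])
                if w > 0:  # keep only edges with positive weight
                    edges[(p, q)] = w
    return edges

# Hypothetical 2-frame, 2x2 grayscale video.
video = np.array([[[10, 50], [20, 30]],
                  [[15, 40], [25, 35]]], dtype=np.uint8)
edges = video_to_network(video, radius=1.0)
# Every edge points from a darker pixel to a brighter one.
assert all(w > 0 for w in edges.values())
```

With `radius=1.0`, only axis-aligned neighbors are linked; larger radii also connect diagonal and more distant pixels, as in the paper's Figure 2.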
3.2 Threshold function
The network obtained in the previous step has regular behavior (if we do not consider the edge directions), i.e., all vertices have the same degree. Therefore, information based on vertex degrees cannot characterize the network that models the dynamic texture. Thus, in this step, a transformation based on a series of edge cuts is applied to the network, revealing relevant properties of the dynamic texture. This approach is commonly used in complex networks to extract information from structure and dynamics [12]. Here, the transformation is performed by applying a threshold function to the set of edges of the network. This function consists in selecting a set $E_\tau \subseteq E$ in which each edge has weight less than or equal to a given threshold $\tau$ (Equation 3) [19, 4].
$E_\tau = \left\{ e \in E \mid w(e) \le \tau \right\}$    (3)
The new set of edges $E_\tau$ and the original set of vertices $V$ compose the new network $N_\tau = (V, E_\tau)$, which represents an intermediate stage in the evolution of the network [4]. Figure 3 presents an example of the function applied to a regular network for three values of $\tau$ (Fig. 3 (b), Fig. 3 (c), and Fig. 3 (d)). The evolution of the network obtained by applying the function with different values of $\tau$ can be considered a multiscale analysis of the network [28]. For low values of $\tau$ (Figure 3 (b)), micro texture details can be analyzed and described; as $\tau$ increases, global details are highlighted. Thus, the proposed method applies a set of thresholds $T$ to the original network in order to extract features from its dynamics in a multiscale approach.
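The threshold transformation of Equation 3 amounts to filtering the edge set. A minimal sketch, with a hypothetical edge set and illustrative threshold values:

```python
def threshold_network(edges, tau):
    """Keep only the edges whose weight is <= tau (the cut of Equation 3)."""
    return {e: w for e, w in edges.items() if w <= tau}

# Hypothetical weighted edge set: (tail, head) -> weight in [0, 255].
edges = {("a", "b"): 10, ("b", "c"): 120, ("a", "c"): 200}

# Multiscale analysis: low tau keeps only micro details, high tau keeps
# nearly the whole network (macro details).
scales = {tau: threshold_network(edges, tau) for tau in (50, 150, 255)}
```

Each value of `tau` yields one intermediate network $N_\tau$, so the cost of the multiscale analysis grows linearly with the number of thresholds.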
3.3 Activity estimation in dynamic textures
To characterize a network transformed by the threshold function, the activity of each vertex is estimated by random walks. Consider a walker located at vertex $v_i$; its next step is to visit a vertex $v_j$ with probability:
$P(v_i \to v_j) = \dfrac{w(v_i, v_j)}{\sum_{v_k \in O(v_i)} w(v_i, v_k)}$    (4)

where $O(v_i)$ denotes the set of vertices reachable from $v_i$ through an outgoing edge.
The walker proceeds according to Equation 4 until it visits a vertex without outgoing edges or until the length of the walk exceeds a given limit. The activity of a vertex is defined as the number of visits it receives during the walks. To estimate the activity more accurately, several walks are started at each vertex of the network [24]; the number of walks and the length limit were defined according to experimental evaluations in this paper.
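A minimal sketch of the activity estimation by weighted random walks follows. The toy network, the number of walks per vertex, and the length limit are illustrative assumptions, not the paper's tuned values:

```python
import random
from collections import defaultdict

def activity(edges, vertices, walks_per_vertex=10, max_len=50, seed=0):
    """Estimate vertex activity by weighted random walks.

    From each vertex, `walks_per_vertex` walks are started; at each step
    the next vertex is drawn with probability proportional to the weights
    of the outgoing edges (Equation 4).  A walk stops at a vertex with no
    outgoing edges or after `max_len` steps.  The activity of a vertex is
    the total number of visits it receives.
    """
    rng = random.Random(seed)
    out = defaultdict(list)  # tail -> list of (head, weight)
    for (u, v), w in edges.items():
        out[u].append((v, w))
    visits = defaultdict(int)
    for start in vertices:
        for _ in range(walks_per_vertex):
            v = start
            for _ in range(max_len):
                if not out[v]:
                    break  # no outgoing edge: the walk stops
                nbrs, weights = zip(*out[v])
                v = rng.choices(nbrs, weights=weights)[0]
                visits[v] += 1
    return visits

# Toy directed network: "c" is a sink, so it accumulates the most visits.
edges = {("a", "b"): 1, ("a", "c"): 3, ("b", "c"): 1}
act = activity(edges, ["a", "b", "c"])
assert act["c"] >= act["b"]
```

In this toy example every walk started at "a" or "b" eventually reaches the sink "c", so its activity is exactly the number of those walks.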
Figure 4 shows images of the activity of the pixels/vertices of videos modeled with different radii $r$ and thresholds $\tau$. Note that the activity of the pixels reflects the main properties of the dynamic texture, preserving its appearance and motion characteristics. Pixels in heterogeneous regions tend to have high activity because the weights of their edges are higher than those in homogeneous regions. For example, if the edge weight between vertices $v_i$ and $v_j$ is 255 (the highest possible weight), the probability that the walker visits the corresponding pixel is high.
In Figure 5, the activity of the pixels/vertices of different videos, modeled with fixed $r$ and $\tau$, is shown. It can be seen that the activity can characterize different types of dynamic textures, capturing characteristics of motion and appearance.
3.4 Composing the feature vector by the activity histogram
The activity of the vertices describes important properties of the dynamic texture. In general, to characterize a static texture, a histogram of activity that correlates the activity of each vertex with its in-degree is used [24]. However, as mentioned before, one of the challenges in dynamic textures is to describe both appearance and motion properties. This paper proposes to correlate the activity of each vertex with its spatial in-degree $k^{s}_{in}$ and its temporal in-degree $k^{t}_{in}$. The spatial in-degree is the number of incoming edges from vertices in the same frame:
$k^{s}_{in}(v_i) = \left|\left\{ (v_j, v_i) \in E_\tau \mid t_j = t_i \right\}\right|$    (5)
On the other hand, the temporal in-degree $k^{t}_{in}$ of a vertex is the number of incoming edges from vertices in different frames:
$k^{t}_{in}(v_i) = \left|\left\{ (v_j, v_i) \in E_\tau \mid t_j \ne t_i \right\}\right|$    (6)

The activity is then associated with the two in-degrees through a joint distribution

$h_{r,\tau}(a, k^{s}, k^{t}) = \left|\left\{ v_i \in V \mid a(v_i) = a,\; k^{s}_{in}(v_i) = k^{s},\; k^{t}_{in}(v_i) = k^{t} \right\}\right|$    (7)

where $r$ is the radius and $\tau$ is the threshold used to build the network.
This joint distribution has properties that allow describing the dynamic textures present in the video, as can be noticed in the examples of Figure 6. The dynamic texture classes shown in the figure are: boiling water (Figure 6(a)), candle (Figure 6(b)), and foliage (Figure 6(c)). The dynamic textures were modeled as directed networks with fixed $r$ and $\tau$, and, as can be observed, each dynamic texture class presents a different joint distribution. The boiling water class is the most homogeneous of the three, so the edge weights are not high. This causes the activity to be larger in vertices with high in-degree, as can be seen in the joint distribution of Figure 6(a). In contrast, in Figures 6(b) and (c), vertices with high in-degree do not have high activity, which reflects the characteristics of those dynamic textures.
However, due to the large amount of information in the joint distribution, two histograms are calculated in order to reduce the size of the feature vector. One histogram correlates the activity and the spatial in-degree (Equation 8), and the other correlates the activity and the temporal in-degree (Equation 9).
$h^{s}_{r,\tau}(k) = \sum_{v_i \,\mid\, k^{s}_{in}(v_i) = k} a(v_i)$    (8)
$h^{t}_{r,\tau}(k) = \sum_{v_i \,\mid\, k^{t}_{in}(v_i) = k} a(v_i)$    (9)
To discuss the histograms of Equations 8 and 9, consider the following interpretation. Vertices with high in-degrees are expected to have a greater probability of being visited during the walks. However, this does not always occur, because of the edge weights considered during the walking process. These weights directly reflect the properties of the dynamic textures on the network, since they are the intensity differences between pixels. Besides that, a vertex with low temporal in-degree can still have high activity if its spatial in-degree is high; this situation reflects the appearance of the texture more than its motion. The opposite situation can also occur. Therefore, the histograms contain information correlated with both the motion and the appearance of the dynamic texture.
Figure 7 shows examples of the activity histograms $h^{s}$ and $h^{t}$, for fixed $r$ and $\tau$, for four different classes of dynamic texture: burning candles, distant ocean waves, close ocean waves, and heavy traffic on a highway. Considering the plots of the histograms (Figures 7(b) and 7(c)), it can be noticed that each class of dynamic texture presents a different histogram, which demonstrates the potential to recognize dynamic textures, even for classes as similar as the two wave classes (far and close).
For a multiscale analysis of the dynamic texture, we consider the histograms for the different threshold values $\tau \in T$. The set $T$ is composed of the threshold values in the range $[\tau_0, \tau_f]$, incremented by the constant $\tau_{inc}$. Therefore, the feature vector based on the histogram $h^{s}$, considering networks transformed with the same radius $r$, is given by:
$\varphi^{s}_{r} = \left[ h^{s}_{r,\tau_0},\; h^{s}_{r,\tau_0+\tau_{inc}},\; \ldots,\; h^{s}_{r,\tau_f} \right]$    (10)
Similarly, the feature vector based on the histogram $h^{t}$ is given by:
$\varphi^{t}_{r} = \left[ h^{t}_{r,\tau_0},\; h^{t}_{r,\tau_0+\tau_{inc}},\; \ldots,\; h^{t}_{r,\tau_f} \right]$    (11)
Finally, to describe motion and appearance characteristics, the final feature vector consists of the concatenation of the vectors $\varphi^{s}_{r}$ and $\varphi^{t}_{r}$ for different values of the radius $r$:
$\Phi = \left[ \varphi^{s}_{r_1}, \varphi^{t}_{r_1}, \ldots, \varphi^{s}_{r_n}, \varphi^{t}_{r_n} \right]$    (12)
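The signature step described in this subsection can be sketched as follows. The edge set and activity values are toy assumptions of our own, and only a single radius/threshold pair is shown rather than the full multiscale concatenation:

```python
from collections import defaultdict

def activity_histograms(edges, visits):
    """Sum vertex activity per spatial and per temporal in-degree.

    Vertices are (t, x, y) tuples; an incoming edge is spatial when its
    tail lies in the same frame as its head, and temporal otherwise.
    Returns the two histograms (Equations 8 and 9) as dictionaries
    mapping in-degree -> summed activity.
    """
    k_s = defaultdict(int)  # spatial in-degree per vertex
    k_t = defaultdict(int)  # temporal in-degree per vertex
    for (u, v) in edges:
        if u[0] == v[0]:    # same frame index t
            k_s[v] += 1
        else:
            k_t[v] += 1
    h_s = defaultdict(float)
    h_t = defaultdict(float)
    for v, a in visits.items():
        h_s[k_s[v]] += a
        h_t[k_t[v]] += a
    return dict(h_s), dict(h_t)

# Toy edge set between (t, x, y) vertices and a toy activity map.
edges = {((0, 0, 0), (0, 0, 1)): 5,   # spatial edge (both in frame 0)
         ((1, 0, 1), (0, 0, 1)): 3}   # temporal edge (frame 1 -> frame 0)
visits = {(0, 0, 1): 7.0, (0, 0, 0): 1.0}
h_s, h_t = activity_histograms(edges, visits)
```

The final signature would concatenate such histogram pairs over every threshold in $T$ and every radius, as in Equations 10 to 12.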
4 Experimental setup
In this section, we describe the experimental setup used in the experiments performed with the proposed approach for dynamic texture recognition based on diffusion in directed networks. In the experiments, a feature vector is extracted from each video of the database and classified using the k-nearest neighbor classifier. This classifier was chosen due to its simplicity, highlighting the importance of the features extracted by the methods without any parameter optimization [1]. For the evaluation of the proposed method, we adopted an experimental setup similar to [43, 18]. To separate the training and test sets in the Dyntex++ and UCLA50 databases, a k-fold cross-validation scheme is used in each database. The correct classification rate (CCR) is reported as the average performance over 10 experimental trials. In the UCLA8 and UCLA9 databases, half of the sequences of each class are randomly selected for the training set, and the remaining half is used for the test set; in these two databases, the experiments are repeated 20 times and the average performance (CCR) and standard deviation over all trials are reported. For the Traffic database, we used a cross-validation scheme with 10 experimental trials. Next, we describe the three databases and the versions used in the experiments.
UCLA [8, 46]: the UCLA database is a very popular benchmark for dynamic textures. The database contains 200 samples divided into 50 dynamic texture classes (UCLA50), with 4 samples per class. In Figure 9, we show the first frame of some dynamic textures present in the database. We also used two variations proposed in [43]. The first version reorganizes the UCLA50 database into 9 classes (UCLA9) in order to combine videos taken from different viewpoints. The 9 classes are: boiling water (8), fire (8), flower (12), fountains (20), plants (108), sea (12), smoke (4), water (12), and waterfall (16), where the number of samples per class is reported in parentheses. The second version (UCLA8) discards the plants class to balance the number of samples per class.
5 Results and discussion
In this section, we evaluate the performance of the proposed method through four analyses. First, we present the influence of the method's parameters on the three databases described in Section 4. Then, we evaluate the method's behavior in characterizing motion features. We also present the performance of the proposed method in dynamic texture classification, compared with other methods from the literature. Finally, an analysis of the computational complexity and processing time is performed.
5.1 Parameter evaluation
In order to evaluate the proposed method, a parameter analysis is conducted to understand the influence of each parameter on the dynamic texture classification task. The parameters are the initial threshold $\tau_0$, the threshold increment $\tau_{inc}$, the final threshold $\tau_f$, and the radius $r$ used to model the dynamic texture as a directed network. Next, the classification results for different parameter settings are presented.
The parameters $\tau_0$, $\tau_{inc}$, and $\tau_f$ define the set of thresholds used in the cut function. These parameters are responsible for the multiscale analysis performed by the method. Figure 11 (a) shows the correct classification rate (CCR) according to the initial threshold $\tau_0$ for the three databases. For the Dyntex++, Traffic, and UCLA50 databases, the highest CCR was obtained with low initial threshold values, which shows that low initial thresholds are important because they provide small-scale details for the recognition stage [26].
The threshold increment $\tau_{inc}$ is the step by which the thresholds are increased from $\tau_0$ to $\tau_f$. Thus, the proposed method can analyze a range of scales. The CCRs for different values of $\tau_{inc}$ in the three databases are plotted in Figure 11 (b). Note that for all three databases the best success rate was obtained with high values of $\tau_{inc}$: for high values of $\tau_{inc}$, the method explores various scales, analyzing the variation of patterns from micro to macro details.
To evaluate the final threshold $\tau_f$, we consider the number of thresholds in the set $T$, i.e., the number of thresholds used in the multiscale analysis. Figure 11 (c) shows the CCR according to this number. For the Dyntex++, Traffic, and UCLA50 databases, the best CCR was achieved with few thresholds; in fact, these values are associated with high values of the threshold increment $\tau_{inc}$. Therefore, few scales already provide a good characterization of the dynamic texture, a relevant result with respect to computational time.
Finally, the radius $r$ is evaluated. Figure 11 (d) shows the results for different combinations of radii. For the Dyntex++, Traffic, and UCLA50 databases, the maximum CCR was obtained by concatenating the features computed with several radii.
From the analysis above, it is possible to determine the set of thresholds and radii for each database. In general, the number of scales necessary to obtain the maximum CCR is similar across the databases. Note that good results are obtained with combinations of thresholds that start at low values, are incremented by high values (85 on average), and end close to the maximum weight (255). Regarding the radii, the same combination provides a good CCR for all databases.
The feature vectors built from each sample are embedded in a two-dimensional space for visualization by applying t-Distributed Stochastic Neighbor Embedding (t-SNE) [50]. It is worth noting that the class information is used for display only. The results of the proposed method using features extracted from the Dyntex++ and UCLA50 databases are illustrated in Figures 12 (a) and (b), respectively. Figure 12 (c) shows the projected feature vectors for the Traffic database. These results indicate that the medium and heavy traffic classes have very similar characteristics, in opposition to the light traffic class. Figure 13 shows examples of the first frame of some videos from the medium and heavy traffic classes. As can be seen, the appearance of the videos of the medium traffic class is similar to that of the heavy traffic class, which justifies the proximity of the points in the projection.
5.2 Rotation invariance
Here, we also present an analysis of the rotation invariance of the proposed method, which is an interesting and desirable characteristic for dynamic texture methods. Consider a video and a rotated version of it. For the proposed method to be rotation invariant, the networks obtained from the two videos must be exactly the same. To corroborate the rotation invariance, we show the histograms $h^{s}$ and $h^{t}$ obtained from a video at four rotation angles. As can be seen in Figure 14, the histograms for the four angles are nearly the same. This corroborates the rotation analysis of networks and diffusion performed in works on static textures [4, 24].
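The rotation-invariance argument can be checked numerically on a toy video: rotating every frame by 90° preserves both the pairwise space-time distances and the intensity differences between pixels, so the multiset of edge weights of the resulting network is unchanged. The following sketch is our own illustration under those assumptions, not the paper's experiment:

```python
import itertools
import numpy as np

def edge_weights(video, radius):
    """Collect the positive edge weights of the directed network built from
    `video` (pixels within space-time Euclidean distance `radius` are
    linked; weight = positive intensity difference), sorted for comparison."""
    T, H, W = video.shape
    pts = list(itertools.product(range(T), range(H), range(W)))
    ws = []
    for p in pts:
        for q in pts:
            if p != q and np.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) <= radius:
                w = int(video[q]) - int(video[p])
                if w > 0:
                    ws.append(w)
    return sorted(ws)

# Random toy video with square frames, and the same video with every
# frame rotated by 90 degrees.
video = np.random.default_rng(0).integers(0, 256, (2, 4, 4), dtype=np.uint8)
rotated = np.stack([np.rot90(f) for f in video])

# The rotation permutes the vertices but preserves distances and
# intensity differences, so the weight multisets coincide exactly.
assert edge_weights(video, 1.5) == edge_weights(rotated, 1.5)
```

Since the two networks are isomorphic, the degree of each vertex is also preserved, which is why the activity histograms in Figure 14 coincide up to the randomness of the walks.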
5.3 Motion analysis
An important property of dynamic texture methods is their capacity to deal with different motion patterns. In this section, we evaluate the influence of different motion patterns on the features extracted by the proposed method. In the experiments, we used 120 synthetic dynamic textures with different motion patterns, divided into 4 motion classes: circular, linear, random, and without motion. Each video has approximately 60 frames with a resolution of 180 × 220 pixels. Each video is composed of two dynamic texture classes: one is used as background (ocean waves) and the other (seaweed) is used to simulate the motion patterns. Figure 15 shows examples of synthetic dynamic textures with the motion patterns labeled in red.
The experiments were conducted as follows. First, the proposed method was applied to each synthetic dynamic texture sample using the best parameters obtained for the UCLA50 database. We chose this database due to the different motion patterns present in it and the good results achieved by the proposed method. Since the goal is to analyze motion patterns, we considered only the temporal feature vector for dynamic texture representation. For visualization purposes, the feature vectors are embedded in a two-dimensional space through t-Distributed Stochastic Neighbor Embedding (t-SNE) [50]. t-SNE is a technique for visualizing high-dimensional data by giving each data point a location on a two-dimensional map. It is important to mention that t-SNE does not use class information to perform the projection; the samples are divided into classes only for visualization.
Figure 16 shows the projection of the features extracted by the proposed method. Note that, as expected, samples with the same motion pattern are clustered in the projected space. Thus, it is clear that the proposed method can separate the four classes, whose main distinguishing characteristic lies in the motion patterns. Another desirable property of dynamic texture characterization methods is the ability to deal with video transformations such as rotation. As can be seen in Figure 15, the linear motion is rotated; however, the projection indicates that the features remain clustered. This occurs because the walker is not influenced by rotated pixels, since it searches for the next vertex in all directions, based on a random draw and the associated probabilities.
5.4 Comparison with other methods
In this section, we use the parameters obtained in the previous sections and compare the results of the proposed method with those of methods from the literature. The comparison considers the following measures: number of features and correct classification rate (CCR). The results of the VLBP, LBP-TOP, CDT-TOP, and CNDT methods were obtained from our own implementations; the results of the remaining methods are taken from the original publications. All results were obtained using the same experimental setup.
Table 1 shows the performance of the methods for the Traffic database. On this database, the proposed method and the CNDT method perform relatively better than the other methods. Although the feature vector of the CNDT method is double the size of that of the proposed method, our method is approximately three times faster at feature extraction, as can be seen in Section 5.5. Regarding parameters, both methods have four parameters to build the network. However, in contrast to the CNDT method, the proposed method models the video as a directed network, consuming half the memory space. Figure 17 shows the confusion matrix for the Traffic database. Note that the medium traffic class is the most difficult to describe and classify, which can also be seen in Figure 12 (c). Notice that the two classes, medium and heavy traffic, present high similarity (see Figure 13) and can therefore be incorrectly classified.

A comparison of the proposed method against other methods on the UCLA50 database is shown in Table 2. It is evident that the proposed method obtains a significant improvement compared to the other methods. Also, the feature vector sizes of the proposed and CDTTOP methods are considerably lower than those of the other methods. The confusion matrix for the UCLA database is shown in Figure 18. As can be noticed, only three samples were incorrectly classified.
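Confusion matrices such as those in Figures 17 and 18 can be computed directly from the true and predicted labels; a small self-contained sketch:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example mimicking the Traffic case: two "medium" samples (class 1)
# confused with "heavy" traffic (class 2).
y_true = [0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
```

Off-diagonal mass in row 1, column 2 reveals exactly the medium/heavy confusion discussed above.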
Table 3 presents the performance of the proposed and compared methods on the UCLA9 database. As can be seen, the proposed method outperforms all compared methods. Besides that, our method and the CDTTOP method have a smaller number of features compared to the other approaches. It is important to emphasize that the dimension of the feature vector is very important in real-time applications where time is crucial. A comparison of the proposed method and others for the UCLA8 database is shown in Table 4. Once again, the proposed method outperforms the other methods, with margins of approximately 0.42% (3DOTF) and 0.57% (CVLBP). It can also be observed that the proposed method performs better than the other LBP-based approaches.
The comparison results on the Dyntex++ database are shown in Table 5. As observed from the table, the proposed method outperforms the CDTTOP and CNDT methods. The RIVLBP and LBPTOP methods achieved the highest correct classification rates, 96.14% and 97.72%, respectively. However, the proposed method extracts 141 features, while the feature vectors of the RIVLBP and LBPTOP methods have 16,384 and 768 features, respectively. High-dimensional feature vectors are computationally expensive, increasing both the time required to classify a dynamic texture and the memory consumption. Since dynamic textures have been used in several real-time expert systems, high-dimensional feature vectors can be impractical. For instance, in a fire detection system the computational time needed to trigger the alarm in time is critical. In addition, the proposed method is approximately twice as fast as the VLBP method. In terms of parameters, like the proposed method, the VLBP and LBPTOP methods also have parameters to set. The VLBP method has three parameters that can considerably affect the size of the feature vector and the complexity, while the LBPTOP method has two parameters that can also vary in each orthogonal plane. These may influence both the computational performance and the accuracy of the methods. Figure 19 shows the confusion matrix for the Dyntex++ database, where the incorrectly classified samples are distributed over several classes. Indeed, Dyntex++ is the most difficult database to classify.
Methods  N. of Features  CCR (%)
RIVLBP [62]  16,384  93.31 (± 4.34)
LBPTOP [61]  768  93.70 (± 4.70)
CDTTOP [22]  75  93.70 (± 4.83)
CNDT [26]  144  96.46 (± 4.10)
Proposed method  297  96.60 (± 4.38)
Method  Number of descriptors  CCR (%)
KDTMD [9]  –  89.50
DFS [59]  –  89.50
3DOTF [58]  290  87.10
CVLBP [49]  –  93.00
RIVLBP [62]  16,384  77.50 (± 8.98)
LBPTOP [61]  768  95.00 (± 4.44)
CDTTOP [22]  75  95.00 (± 4.78)
CNDT [26]  420  95.00 (± 5.19)
Proposed method  169  98.50 (± 3.37)
Method  Number of descriptors  CCR (%)
3DOTF [58]  290  96.32
CVLBP [49]  –  96.90
High level feature [52]  –  92.60
WMFS [31]  702  96.95
Chaotic vector [53]  300  85.10
RIVLBP [62]  16,384  96.30
LBPTOP [61]  768  96.00
CDTTOP [22]  75  96.33 (± 2.46)
CNDT [26]  336  95.61 (± 2.72)
Proposed method  169  97.80 (± 1.53)
Method  Number of descriptors  CCR (%)
3DOTF [58]  290  95.80
CVLBP [49]  –  95.65
High level feature [52]  –  85.65
Chaotic vector [53]  300  85.00
RIVLBP [62]  16,384  91.96
LBPTOP [61]  768  93.67
CDTTOP [22]  75  93.41 (± 6.01)
CNDT [26]  336  94.32 (± 4.18)
Proposed method  169  96.22 (± 4.80)
5.5 Computational complexity and processing time
Basically, the proposed method models a dynamic texture with N pixels as a directed network. Next, random walks of maximum length L are initiated at each vertex to estimate its activity. Each pixel is mapped to a vertex, which is connected with its neighbors within a radius r. Therefore, building the network requires O(N r^3) operations. However, in practice, the radius r is much smaller than the number of pixels in the video, such that r^3 << N.
The computational complexity of the proposed method is then given by O(N r^3 + N L). Since r^3 is a multiplicative constant (r is fixed and small in this work) and r^3 << N, it can be disregarded. The other variable that affects the complexity is the length L of the walk. The best case occurs when the vertices are disconnected; in this case the length of every walk is zero, leading to a complexity equal to O(N). The worst case occurs when all walks reach the maximum length L (a fixed value in this work), giving a complexity of O(N L).
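The modeling and walk steps can be sketched as follows. The neighborhood rule, the edge-direction criterion, and the activity estimate below are simplified assumptions for illustration only, not the paper's exact formulation; a single frame stands in for the full video.

```python
import numpy as np

def build_directed_network(frame, radius, threshold):
    """Toy directed network over the pixels of one frame: pixel i points to
    a neighbour j within `radius` when 0 < I(i) - I(j) <= threshold
    (this direction rule is an illustrative assumption)."""
    h, w = frame.shape
    adj = {i: [] for i in range(h * w)}
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) != (0, 0) and 0 <= ny < h and 0 <= nx < w:
                        diff = int(frame[y, x]) - int(frame[ny, nx])
                        if 0 < diff <= threshold:
                            adj[i].append(ny * w + nx)
    return adj

def activity(adj, walk_len, rng):
    """Estimate vertex activity as the visit counts of bounded random
    walks started at every vertex. Walks stop early at sink vertices,
    so the average length is typically far below the maximum, matching
    the average-case analysis."""
    visits = np.zeros(len(adj))
    for start in adj:
        v = start
        for _ in range(walk_len):
            if not adj[v]:
                break                                   # sink: walk ends
            v = adj[v][rng.integers(len(adj[v]))]       # uniform random step
            visits[v] += 1
    return visits

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(12, 12))             # 144-pixel toy "video"
adj = build_directed_network(frame, radius=2, threshold=40)
act = activity(adj, walk_len=10, rng=rng)
print(act.shape)  # (144,)
```

One walk is started per vertex and each walk takes at most `walk_len` steps, so the total work is O(N L) in the worst case, as in the analysis above.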
The average case was evaluated on the Dyntex database, which has 3600 video samples. Figure 20 shows the average walk length for different radii r and thresholds t. In general, the average walk length is below 5, which leads to a computational complexity close to the best case, that is, O(N).
For comparison purposes, the CNDT and CDTTOP methods also have a complexity of order O(N) in the average case. The LBPTOP method and its extensions have complexity of order O(N P), where P defines the number of neighboring points. On the other hand, the complexity of the VLBP and CVLBP methods grows exponentially with P, so for values of P greater than 1 the complexity and the size of the feature vector increase considerably. These complexities show that the proposed method is competitive in terms of computational complexity.
In terms of processing time, to build the complex network with the largest radius (worst case) for a video, the proposed method took on average 0.368 s using an Intel(R) Core(TM) i7-3610QM CPU @ 2.30 GHz with 16 GB RAM and a 64-bit operating system. For feature extraction with the Dyntex++ database parameters, the proposed method took on average 4.830 s. For comparison purposes, the VLBP, LBPTOP, CDT, and CNDT methods took on average 8.04, 0.51, 15.03, and 22.33 seconds, respectively. The high processing time of the CNDT method is due to the large number of radii needed to build the network (7 radii), which increases the complexity of the method. Although the proposed method did not obtain the lowest running time, the results indicate a running time competitive for real-time applications compared to the other methods.
6 Conclusion
In this paper, we proposed a new method for the characterization of dynamic textures using complex network theory. The main contribution of this paper is the use of directed networks for dynamic texture modeling, which tends to provide better performance than undirected networks [24, 11]. Another contribution is the use of diffusion in networks for dynamic texture characterization, which provides a better characterization than the traditional network measures used in [26], improving recognition performance. This follows from the fact that directed diffusion has been used to highlight the diversity and separation of the dynamics in the respective network [24, 11]. The dynamic textures are represented as directed complex networks, and the activity associated with the spatial and temporal in-degree is used to compose the feature vector. We demonstrated how diffusion can be effectively used for characterizing and analyzing complex networks derived from videos of dynamic textures. The information at different scales obtained by the proposed method shows that it can provide valuable information about the structure being analyzed.
The proposed method achieved the same result as the compared methods on the Traffic database. On the Dyntex++ database, although it was outperformed by two of the compared methods, the difference was small (close to 4%). Moreover, the feature vectors of those literature methods (e.g., RIVLBP and LBPTOP) are much larger than that of the proposed method, which offsets their improvement over our method. This makes the proposed method competitive for expert systems that use videos. On the other hand, the proposed method outperformed all the other methods on the UCLA50, UCLA9, and UCLA8 databases. The proposed method also outperformed the CNDT method on the UCLA and Dyntex++ databases and obtained a similar correct classification rate on the Traffic database. Since the CNDT method uses an undirected network and traditional measurements for analysis, this demonstrates that the use of a directed network and diffusion improves recognition performance. The proposed method also presented desirable properties that are missing in many methods in the literature, such as properly analyzing appearance and motion characteristics and rotation invariance.
Considering the rotation experiments, the motion analysis, and the good performance obtained on three dynamic texture databases, we can conclude that the method is robust and can be considered a very good option for dynamic texture recognition. Complex network based methods have demonstrated very good performance in the characterization of dynamic textures, which makes this a promising research field.
Since the proposed modeling requires four parameters, new ways of modeling dynamic textures as networks can be investigated in the future to improve the representation of texture patterns and reduce the number of parameters. A related future work would be to investigate a function for automatic threshold selection, optimizing the parameters and improving performance. Another research issue is to evaluate other strategies to extract information from the joint activity distribution. In addition, as part of future work, we plan to use new pattern recognition methods to describe the network. Currently, there is a gap in the literature regarding pattern recognition methods for complex networks. We believe that more sophisticated methods can provide better classification performance.
Acknowledgments.
Lucas C. Ribas gratefully acknowledges CAPES (Grant Nos. 9254772/m) and São Paulo Research Foundation (FAPESP) (Grant No. 2016/237638) for financial support. Odemir M. Bruno thanks the financial support of CNPq (Grant # 307797/20147) and FAPESP (Grant # 14/080261). Wesley N. Gonçalves acknowledges support from Fundect (Grant No. 071/2015) and CNPq (Grant No. 304173/20169).
References
 [1] D. R. Amancio, C. H. Comin, D. Casanova, G. Travieso, O. M. Bruno, F. A. Rodrigues, and L. da Fontoura Costa. A systematic comparison of supervised classifiers. PloS one, 9(4):e94137, 2014.
 [2] H. Ates and O. N. Gerek. An image-processing based automated bacteria colony counter. In Computer and Information Sciences, 2009. ISCIS 2009. 24th International Symposium on, pages 18–23, Sept 2009.
 [3] A. R. Backes, D. Casanova, and O. M. Bruno. A complex networkbased approach for boundary shape analysis. Pattern Recognition, 42(1):54 – 67, 2009.
 [4] A. R. Backes, D. Casanova, and O. M. Bruno. Texture analysis and classification: A complex networkbased approach. Information Sciences, 219(0):168 – 180, 2013.
 [5] J. P. Celli, I. Rizvi, A. R. Blanden, A. O. Abu-Yousif, B. Q. Spring, and T. Hasan. Biologically relevant 3D tumor arrays: imaging-based methods for quantification of reproducible growth and analysis of treatment response. In Optical Methods for Tumor Treatment and Detection: Mechanisms and Techniques in Photodynamic Therapy XX, volume 7886, Feb. 2011.
 [6] A. Chan and N. Vasconcelos. Modeling, clustering, and segmenting video with mixtures of dynamic textures. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(5):909–926, May 2008.
 [7] A. B. Chan and N. Vasconcelos. Classification and retrieval of traffic video using autoregressive stochastic processes. In Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, pages 771–776. IEEE, 2005.
 [8] A. B. Chan and N. Vasconcelos. Probabilistics for the classification of autoregressive visual processes. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 846–851. IEEE, 2005.
 [9] A. B. Chan and N. Vasconcelos. Classifying video with kernel dynamic textures. In 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–6, 2007.
 [10] R. Cohen and S. Havlin. Complex Networks: Structure, Robustness and Function. Cambridge University Press, 2010.
 [11] C. H. Comin, M. P. Viana, L. Antiqueira, and L. da F Costa. Random walks in directed modular networks. Journal of Statistical Mechanics: Theory and Experiment, 2014(12):P12003, 2014.
 [12] L. Costa, F. Rodrigues, G. Travieso, and P. Boas. Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242, 2007.
 [13] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
 [14] S. Dubois, R. Péteri, and M. Ménard. Pattern Recognition and Image Analysis: 4th Iberian Conference, IbPRIA 2009 Póvoa de Varzim, Portugal, June 10-12, 2009 Proceedings, chapter A Comparison of Wavelet Based Spatiotemporal Decomposition Methods for Dynamic Texture Recognition, pages 314–321. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
 [15] S. Fazekas and D. Chetverikov. Dynamic texture recognition using optical flow features and temporal periodicity. In ContentBased Multimedia Indexing, 2007. CBMI ’07. International Workshop on, pages 25–32, June 2007.
 [16] J. Fernandez-Berni, R. Carmona-Galán, and L. Carranza-González. A VLSI-oriented and power-efficient approach for dynamic texture recognition applied to smoke detection. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2009), pages 307–314. INSTICC, SciTePress, 2009.
 [17] J. B. Florindo, M. S. Sikora, E. C. Pereira, and O. M. Bruno. Characterization of nanostructured material images using fractal descriptors. Physica A: Statistical Mechanics and its Applications, 392(7):1694 – 1701, 2013.
 [18] B. Ghanem and N. Ahuja. Maximum margin distance learning for dynamic texture recognition. In Proceedings of the 11th European Conference on Computer Vision: Part II, ECCV’10, pages 223–236, Berlin, Heidelberg, 2010. SpringerVerlag.
 [19] W. N. Gonçalves, A. R. Backes, A. S. Martinez, and O. M. Bruno. Texture descriptor based on partially self-avoiding deterministic walker on networks. Expert Systems with Applications, 39(15):11818 – 11829, 2012.
 [20] W. N. Gonçalves and O. M. Bruno. Dynamic texture analysis and classification using deterministic partially self-avoiding walks. In J. Blanc-Talon, R. Kleihorst, W. Philips, D. Popescu, and P. Scheunders, editors, Advanced Concepts for Intelligent Vision Systems: 13th International Conference, ACIVS 2011, Ghent, Belgium, August 22-25, 2011. Proceedings, pages 349–359, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
 [21] W. N. Gonçalves and O. M. Bruno. Dynamic texture analysis and segmentation using deterministic partially self-avoiding walks. Expert Systems with Applications, 40(11):4283 – 4300, 2013.
 [22] W. N. Gonçalves and O. M. Bruno. Dynamic texture analysis and segmentation using deterministic partially self-avoiding walks. Expert Systems with Applications, 40(11):4283 – 4300, 2013.
 [23] W. N. Gonçalves and O. M. Bruno. Dynamic texture segmentation based on deterministic partially self-avoiding walks. Computer Vision and Image Understanding, 117(9):1163 – 1174, 2013.
 [24] W. N. Gonçalves, N. R. da Silva, L. da Fontoura Costa, and O. M. Bruno. Texture recognition based on diffusion in networks. Information Sciences, 364–365:51 – 71, 2016.
 [25] W. N. Gonçalves, B. B. Machado, and O. M. Bruno. Spatiotemporal gabor filters: a new method for dynamic texture recognition. CoRR, abs/1201.3612, 2012.
 [26] W. N. Gonçalves, B. B. Machado, and O. M. Bruno. A complex network approach for dynamic texture recognition. Neurocomputing, 153(0):211 – 220, 2015.
 [27] W. N. Gonçalves, B. B. Machado, and O. M. Bruno. A complex network approach for dynamic texture recognition. Neurocomputing, 153:211 – 220, 2015.

 [28] W. N. Gonçalves, J. A. Silva, and O. M. Bruno. A rotation invariant face recognition method based on complex network. In Iberoamerican Congress on Pattern Recognition, pages 426–433, 2010.
 [29] W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank. A survey on visual content-based video indexing and retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 41(6):797–819, 2011.
 [30] H. Ji, X. Yang, H. Ling, and Y. Xu. Wavelet domain multifractal analysis for static and dynamic texture classification. IEEE Transactions on Image Processing, 22(1):286–299, Jan 2013.
 [31] H. Ji, X. Yang, H. Ling, and Y. Xu. Wavelet domain multifractal analysis for static and dynamic texture classification. IEEE Transactions on Image Processing, 22(1):286–299, 2013.
 [32] V. Kellokumpu, G. Zhao, and M. Pietikäinen. Human activity recognition using a dynamic texture based method. In BMVC, volume 1, page 2, 2008.
 [33] A. Khalil, C. Aponte, R. Zhang, T. Davisson, I. Dickey, D. Engelman, M. Hawkins, and M. Mason. Image analysis of softtissue ingrowth and attachment into highly porous alumina ceramic foam metals. Medical Engineering and Physics, 31(7):775 – 783, 2009.
 [34] S. Krig. Global and Regional Features, pages 85–129. Apress, Berkeley, CA, 2014.
 [35] Z. Lu, W. Xie, J. Pei, and J. Huang. Dynamic texture recognition by spatiotemporal multiresolution histograms. In Application of Computer Vision, 2005. WACV/MOTIONS’05 Volume 1. Seventh IEEE Workshops on, volume 2, pages 241–246. IEEE, 2005.
 [36] S. Milko, E. Samset, and T. Kadir. Segmentation of the liver in ultrasound: a dynamic texture approach. International Journal of Computer Assisted Radiology and Surgery, 3(1):143, May 2008.
 [37] R. C. Nelson and R. Polana. Qualitative recognition of motion using temporal texture. CVGIP: Image Underst., 56(1):78–89, July 1992.
 [38] M. Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.
 [39] K. Otsuka, T. Horikoshi, S. Suzuki, and M. Fujii. Feature extraction of temporal texture based on spatiotemporal motion trajectory. In Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, volume 2, pages 1047–1051 vol.2, Aug 1998.
 [40] C.H. Peh and L.F. Cheong. Synergizing spatial and temporal texture. Image Processing, IEEE Transactions on, 11(10):1179–1191, Oct 2002.
 [41] R. Péteri and D. Chetverikov. Computer Vision and Graphics: International Conference, ICCVG 2004, Warsaw, Poland, September 2004, Proceedings, chapter Qualitative Characterization of Dynamic Textures for Video Retrieval, pages 33–38. Springer Netherlands, Dordrecht, 2006.
 [42] R. Polana and R. Nelson. MotionBased Recognition, chapter Temporal Texture and Activity Recognition, pages 87–124. Springer Netherlands, Dordrecht, 1997.
 [43] A. Ravichandran, R. Chaudhry, and R. Vidal. Viewinvariant dynamic texture recognition using a bag of dynamical systems. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 1651–1657, June 2009.
 [44] A. Ravichandran, R. Chaudhry, and R. Vidal. Categorizing dynamic textures using a bag of dynamical systems. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(2):342–353, 2013.
 [45] A. Ravichandran and R. Vidal. Video registration using dynamic textures. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(1):158–171, Jan 2011.
 [46] P. Saisan, G. Doretto, Y. N. Wu, and S. Soatto. Dynamic texture recognition. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 2, pages II–58–II–63 vol.2, 2001.
 [47] A. C. d. Silva, F. R. Cabral, J. B. Mamani, J. M. Malheiros, R. S. Polli, A. Tannus, E. Vidoto, M. J. Martins, T. T. Sibov, L. F. Pavon, et al. Tumor growth analysis by magnetic resonance imaging of the c6 glioblastoma model with prospects for the assessment of magnetohyperthermia therapy. Einstein (São Paulo), 10(1):11–15, 2012.
 [48] J. Smith, C.Y. Lin, and M. Naphade. Video texture indexing using spatiotemporal wavelets. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 2, pages II–437–II–440 vol.2, 2002.
 [49] D. Tiwari and V. Tyagi. Dynamic texture recognition based on completed volume local binary pattern. Multidimensional Systems and Signal Processing, 27(2):563–575, 2016.

 [50] L. van der Maaten and G. Hinton. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, Nov 2008.
 [51] S. J. Wang, W. J. Yan, X. Li, G. Zhao, and X. Fu. Micro-expression recognition using dynamic textures on tensor independent color space. In 2014 22nd International Conference on Pattern Recognition, pages 4678–4683, Aug 2014.
 [52] Y. Wang and S. Hu. Exploiting high level feature for dynamic textures recognition. Neurocomputing, 154:217–224, 2015.
 [53] Y. Wang and S. Hu. Chaotic features for dynamic textures recognition. Soft Computing, 20(5):1977–1989, 2016.
 [54] R. P. Wildes and J. R. Bergen. Computer Vision — ECCV 2000: 6th European Conference on Computer Vision Dublin, Ireland, June 26–July 1, 2000 Proceedings, Part II, chapter Qualitative Spatiotemporal Analysis Using an Oriented Energy Representation, pages 768–784. Springer Berlin Heidelberg, Berlin, Heidelberg, 2000.
 [55] P. Wu, Y. M. Ro, C. S. Won, and Y. Choi. Computer Analysis of Images and Patterns: 9th International Conference, CAIP 2001 Warsaw, Poland, September 5–7, 2001 Proceedings, chapter Texture Descriptors in MPEG7, pages 21–28. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
 [56] P.H. Wu, J. Phillip, S. Khatau, W.C. Chen, J. Stirman, S. Rosseel, K. Tschudi, J. Van Patten, M. Wong, S. Gupta, A. Baras, J. Leek, A. Maitra, and D. Wirtz. Evolution of cellular morphophenotypes in cancer metastasis. Scientific Reports, 5, 2015. cited By 0.
 [57] Y. Xu, S. Huang, H. Ji, and C. Fermüller. Scale-space texture description on SIFT-like textons. Comput. Vis. Image Underst., 116(9):999–1013, Sept. 2012.
 [58] Y. Xu, S. Huang, H. Ji, and C. Fermüller. Scale-space texture description on SIFT-like textons. Computer Vision and Image Understanding, 116(9):999–1013, 2012.
 [59] Y. Xu, Y. Quan, H. Ling, and H. Ji. Dynamic texture classification using dynamic fractal analysis. In 2011 International Conference on Computer Vision, pages 1219–1226, Nov 2011.
 [60] H. P. Zhang, A. Be’er, E.L. Florin, and H. L. Swinney. Collective motion and density fluctuations in bacterial colonies. Proceedings of the National Academy of Sciences, 107(31):13626–13630, 2010.
 [61] G. Zhao and M. Pietikainen. Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):915–928, 2007.
 [62] G. Zhao and M. Pietikäinen. Dynamical Vision: ICCV 2005 and ECCV 2006 Workshops, WDV 2005 and WDV 2006, Beijing, China, October 21, 2005, Graz, Austria, May 13, 2006. Revised Papers, chapter Dynamic Texture Recognition Using Volume Local Binary Patterns, pages 165–177. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
 [63] Y. Zhao, J. Zhao, E. Dong, B. Chen, J. Chen, Z. Yuan, and D. Zhang. Artificial Intelligence and Computational Intelligence: Third International Conference, AICI 2011, Taiyuan, China, September 24-25, 2011, Proceedings, Part III, chapter Dynamic Texture Modeling Applied on Computer Vision Based Fire Recognition, pages 545–553. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.
 [64] J. Zhong and S. Sclaroff. Temporal texture recognition model using 3D features. Technical report, MIT Media Lab Perceptual Computing, 2002.
 [65] A. M. Zimer, E. C. Rios, P. de Carvalho Dias Mendes, W. N. Gonçalves, O. M. Bruno, E. C. Pereira, and L. H. Mascaro. Investigation of AISI 1040 steel corrosion in H2S solution containing chloride ions by digital image processing coupled with electrochemical techniques. Corrosion Science, 53(10):3193 – 3201, 2011.