I Introduction
The financial market can be considered as a complex time-varying system consisting of multiple interacting financial components [42], e.g., the stock trade price and return rate. As these financial variables evolve with time, multiple co-evolving financial time series can be generated from the original data. To analyze the time-varying financial market, a variety of time series analysis methods have been developed for anomaly detection applications. These include change point detection, sequence detection, and pattern detection in the time series evolution [34, 12, 13]. Among these applications, change point detection plays an important role in financial risk analysis and aims to identify abrupt changes in the time series properties [27]. Unfortunately, detecting such crucial points remains challenging, since changes are hard to observe in a system consisting of complex interactions between its constituent co-evolving time series [38]. One way to overcome this problem is to represent multiple co-evolving financial time series as a family of time-varying financial networks, with each vertex representing an individual time series of a stock (e.g., stock trading price) and each edge between a pair of co-evolving financial time series representing their degree of correlation (i.e., the absolute value of their Pearson correlation). As a result, network-based methods can be directly employed for analysis.
The aim of this paper is to define a new kernel-based approach for analyzing multiple co-evolving financial time series that are represented as network structures. Our work is based on representing each financial network as a discrete entropy time series and on the classical dynamic time warping measure between such series. The proposed approach bridges the gap between graph kernels and the classical dynamic time warping framework for time series analysis.
I-A Literature Review
Network representations are powerful tools for analyzing time-varying complex systems consisting of multiple co-evolving time series [43, 30, 37, 38, 41], e.g., the stock market with trade price, climate data, and functional magnetic resonance images. This approach is based on the idea that the structure of so-called time-varying complex networks [10] inferred from the corresponding time series of the system can represent physical interactions between system entities that are richer than the original individual time series. Under this approach, one of the main objectives is to identify the extreme events that may considerably change the network structure. For example, in time-varying financial networks, extreme events corresponding to the financial instability of stocks are of particular interest [38] and can be inferred by detecting anomalies in the corresponding networks [41]. The network structure before and after an extreme event should be significantly different.
Broadly speaking, most existing approaches characterize networks in one of two principal ways: a) deriving network characteristics from connectivity structures, or statistics capturing connectivity structures, and b) characterizing the networks using statistical physics. Proponents of the former approach focus on capturing network substructures using communities, hubs and clusters [17, 1, 2]. Proponents of the latter approach describe the network structures through a partition function, from which the corresponding temperature, energy, and entropy measures can be calculated [21, 22, 16, 18, 41]. Unfortunately, both approaches tend to approximate structural relationships of networks in a low dimensional pattern space, hence leading to substantial loss of information. This shortcoming affects the effectiveness of existing network methods for time series analysis. One principled way to address this drawback is to adopt graph kernels. In pattern recognition, graph kernels are powerful tools for analyzing graph-based structural data. The main advantage of adopting graph kernels is that they provide an effective way of mapping graph structures into a high dimensional Hilbert space and thus better encapsulate the structural information.
Most existing state-of-the-art graph kernels are instances of R-convolution kernels, which were originally proposed by Haussler in 1999 [20]. The main idea underpinning R-convolution kernels is to decompose graphs into substructures and measure the similarity between each pair of input graphs in terms of their isomorphic substructures, e.g., graph kernels based on comparing pairs of isomorphic a) walks, b) subgraphs, and c) subtrees. Representative R-convolution graph kernels based on substructures include the Weisfeiler-Lehman subtree kernel [36], the tree-based continuous attributed kernel [28], the aligned subtree kernel [8], the Jensen-Tsallis q-difference graph kernel [5], the optimal assignment Weisfeiler-Lehman kernel [26], the core variants-based shortest path kernel [31], the random walk graph kernel [24], etc. Unfortunately, directly employing these graph kernels to analyze time-varying network structures inferred from the original time series is difficult, because such network structures in most real-world applications are by nature complete weighted graphs, i.e., each vertex is adjacent to all the remaining vertices, whereas the edge weights between the vertices may differ considerably. It is difficult to decompose such a graph into substructures, and this in turn limits the effectiveness of most existing graph kernels.
One way to address the problem is to discard the weaker interactions between pairs of vertices and adopt sparser versions of the original time-varying networks, i.e., the sparser networks only preserve the original edges between pairs of strongly interacting vertices. Under this scenario, Ye et al. [41], Silva et al. [38] and Wang et al. [39] have taken the widely adopted threshold-based methods and preserved only the edges whose weights are among the largest correlation-based weights. Although this strategy provides a way of directly employing existing graph kernels on time-varying networks for multiple co-evolving time series analysis, the resulting sparse structures depend on the choice of the threshold, and it is not clear how to select a suitable threshold in advance. Moreover, these sparser structures also lead to significant information loss, because many weighted edges are discarded. In summary, analyzing time-varying networks with state-of-the-art graph kernels remains challenging.
I-B Contributions
The objective of this paper is to address the aforementioned problems and develop a new kernel-based approach for analyzing multiple co-evolving financial time series. Specifically, we propose an Entropic Dynamic Time Warping Kernel (EDTWK) for time-varying financial networks, with each vertex representing the individual time series of a different stock (e.g., stock trading price) and each edge between a pair of co-evolving financial time series representing the absolute value of their Pearson correlation. One key innovation of the proposed EDTWK kernel is the automatic identification of the dominant correlated vertex subset for each of the financial networks, i.e., the proposed kernel incorporates the process of identifying the most mutually correlated stocks specified by the vertex subset. In contrast, the aforementioned threshold-based methods cannot guarantee that the preserved vertices correspond to a more mutually correlated vertex subset. This is because these methods select each edge with a higher correlation weight individually, so many edges between the preserved vertices may not exist. Based on financial risk theory [19], financial crises are usually caused by a set of the most mutually correlated stocks that have less uncertainty. As a result, the proposed EDTWK kernel can not only overcome the shortcoming of heuristically selecting the threshold value that arises in the threshold-based approach for time-varying network analysis [41, 38], but also capture more reliable information concerning the evolution of the financial system to hand. The computational framework of the proposed EDTWK kernel is shown in Fig.1. Specifically, the main contributions of this work are threefold.
First, for a family of time-varying financial networks, our starting point is to compute the commute time matrices associated with their original weighted adjacency matrices, i.e., the absolute Pearson correlation based matrices. The reason for using the commute time matrix as the representation of each network structure is that each element of this matrix represents the average path length between a pair of vertices over all possible paths residing on the original weighted edges [33]. Thus, the commute time can be seen as an enhanced absolute Pearson correlation value between the time series of pairwise stocks, i.e., it integrates the effectiveness of all possible correlation-based paths between a pair of vertices of the original network. Moreover, the commute time is robust under perturbations of the network structure (e.g., changes of edges or paths [33]). As a result, the commute time matrix can provide a more stable representation for a financial network structure that may accumulate a lot of noise over time. In summary, the commute time matrix offers an elegant way to probe the original structure of the time-varying financial networks (see details in Section II-A). More specifically, the proposed approach associated with the commute time matrix will be more effective than that associated with the original absolute Pearson correlation matrix (see details in Section IV-B and Section IV-C).
Second, with the commute time matrix of each time-varying financial network to hand, we employ this matrix to automatically identify a set of dominant correlated vertices in the network structure (i.e., a set of the most mutually correlated time series represented by the set of vertices), by solving a quadratic programming problem associated with the commute time matrix. Specifically, we compute a dominant probability distribution of these time series belonging to the most mutually correlated set. We show that this strategy not only overcomes the shortcoming of existing threshold-based approaches [41, 38] that roughly select pairs of relatively more correlated time series, but also encapsulates reliable information about the evolution of the financial system to hand. Furthermore, we transform each original time-varying financial network into a discrete dominant entropy time series associated with the dominant probability distribution, i.e., we characterize the uncertainty of each network structure within the financial system to hand in terms of the classical Shannon entropy associated with the probability distribution. With each pair of entropy time series to hand, we compute the EDTWK kernel through the classical dynamic time warping framework. We show that the proposed kernel not only accommodates complete weighted graphs through the commute time matrix, but also bridges the gap between graph kernels and the classical dynamic time warping framework for time series analysis (see details in Section III-B).
Third, we apply the proposed kernel to time-varying financial networks extracted from New York Stock Exchange (NYSE) data. Experimental results demonstrate that the proposed method can preserve the ordinal arrangement of the time-varying financial networks and thus reflect the structural evolution of the networks with time, i.e., the proposed kernel can effectively detect abrupt changes in networks as time series structures and can be used to characterize different stages in the evolution of time-varying financial networks.
I-C Paper Outline
II Preliminary Concepts
In this section, we briefly review preliminary concepts which will be utilized in this paper. We first review the concept of the commute time. We then review the kernel inspired by the dynamic time warping framework.
II-A Commute Time on Graphs
As we have stated, one main objective of this work is to automatically identify a set of most mutually correlated stocks in terms of their time series. To this end, we require a correlation matrix as the structural representation of the corresponding time-varying financial network (i.e., the weighted adjacency matrix of the network), with each vertex representing the individual time series of a different stock and each edge representing the correlation between a pair of co-evolving financial time series. Broadly speaking, most state-of-the-art approaches adopt the absolute Pearson correlation based matrix as the network representation [41, 39, 38]. In order to capture a reliable and robust mutually correlated stock set, in this work we propose to utilize the commute time matrix associated with the original correlation matrix as the network representation.
The main reasons for employing the commute time matrix are threefold. First, the commute time averages the time taken for a random walk to travel between a pair of vertices over all connecting paths residing on the original correlation based weighted adjacency matrix. Thus, the commute time matrix can be considered as an enhanced correlation matrix. Second, since the commute time amplifies the correlation based affinity between a pair of vertices, it is robust under perturbations of the graph structure, e.g., changes of edges or paths. Thus, the commute time based enhanced correlation matrix provides a stable correlation representation for a time-varying financial network that may accumulate a lot of noise over time. Third, the commute time is calculated through the Laplacian matrix of the original correlation based weighted adjacency matrix. In Section III, we will show how the commute time matrix can be employed to identify a set of most mutually correlated stocks specified by a set of dominant vertices, associated with a quadratic programming problem.
In this subsection, we briefly introduce the concept of the commute time. Assume $G(V,E)$ is a complete weighted graph, where $E$ is the edge set, $V$ is the vertex set, and each vertex of $G$ is connected to all the remaining vertices. Let $A$ be the associated weighted adjacency matrix of $G$. If $A(u,v) > 0$, we say that the vertices $u$ and $v$ are adjacent. Let $D$ denote the degree matrix of $G$. $D$ is a diagonal matrix whose diagonal elements are the sums of the corresponding rows (or columns) of $A$, i.e., $D(u,u) = d_u = \sum_{v \in V} A(u,v)$. Then, the graph Laplacian matrix is computed as $L = D - A$. The spectral decomposition of $L$ is $L = \Phi \Lambda \Phi^{\top}$, where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|})$ is a $|V| \times |V|$ diagonal matrix with ascending eigenvalues as elements, i.e., $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{|V|}$, and $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_{|V|})$ is a $|V| \times |V|$ matrix with the corresponding ordered eigenvectors as columns. For $G$, the hitting time $H(u,v)$ between each pair of vertices $u$ and $v$ is the expected number of steps taken by a classical random walk commencing from $u$ and ending at $v$. Likewise, the commute time $CT(u,v)$ is the expected number of steps of the random walk commencing from $u$, ending at $v$, and then returning to $u$, i.e., $CT(u,v) = H(u,v) + H(v,u)$. Thus, the commute time can be calculated through the unnormalized Laplacian eigenvalues and eigenvectors [33] as
$$ CT(u,v) = \mathrm{vol} \sum_{j=2}^{|V|} \frac{1}{\lambda_j} \big( \phi_j(u) - \phi_j(v) \big)^2, \tag{1} $$
where $\mathrm{vol} = \sum_{v \in V} d_v$ is the volume of the graph.
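As a minimal sketch of how the commute time matrix could be computed in practice, the NumPy snippet below uses the eigendecomposition of the unnormalized Laplacian $L = D - A$, skips the zero eigenvalue, and scales by the graph volume (the sum of all vertex degrees). The function name `commute_time_matrix` is an illustrative assumption rather than part of the original work, and the input is assumed to be the symmetric weighted adjacency matrix of a connected graph.

```python
import numpy as np

def commute_time_matrix(A):
    """Commute times between all vertex pairs of a connected weighted graph.

    A is assumed to be a symmetric, non-negative adjacency matrix.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # vertex degrees
    L = np.diag(d) - A                     # unnormalized Laplacian
    lam, phi = np.linalg.eigh(L)           # eigenvalues in ascending order
    # Pseudo-inverse of L built from the non-zero eigenpairs (index 0 is the
    # zero eigenvalue of a connected graph and is skipped).
    L_plus = (phi[:, 1:] / lam[1:]) @ phi[:, 1:].T
    g = np.diag(L_plus)
    vol = d.sum()                          # graph volume
    # Equivalent to vol * sum_j (phi_j(u) - phi_j(v))^2 / lambda_j
    return vol * (g[:, None] + g[None, :] - 2.0 * L_plus)

# Triangle with unit weights: a random walk needs 2 expected steps each way.
triangle = np.ones((3, 3)) - np.eye(3)
CT = commute_time_matrix(triangle)
```

For the unit-weight triangle every off-diagonal entry of `CT` equals 4 and the diagonal is zero, which matches the direct random walk calculation. Using the pseudo-inverse avoids building an explicit tensor of eigenvector differences, which matters for networks with several hundred vertices.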
Remarks: The commute time has proven to be a powerful tool for extracting rich characteristics from complete weighted graphs. In previous work, Bai et al. [6] employed the commute time matrix to develop a quantum-inspired kernel for dynamic financial network analysis. Specifically, for the original complete weighted adjacency matrix of each financial network, they commence by extracting the minimum or maximum spanning tree associated with the commute time matrix. For a pair of complete weighted graphs to be compared, the resulting quantum kernel is defined by measuring the similarity between their associated commute time spanning tree structures in terms of a newly developed evolving model of discrete-time quantum walks. This approach significantly reduces the information loss that arises in the previously mentioned threshold-based methods for financial network analysis [41, 39, 38]. This is because the weights of the preserved edges on the spanning tree structures correspond to the commute time values between the corresponding pairs of vertices, and the commute time values integrate the effectiveness over all possible paths residing on the original weighted edges. However, similar to these threshold-based approaches [41, 39, 38], the quantum kernel [6] cannot guarantee that the preserved vertices correspond to a more mutually correlated vertex subset, since the spanning tree is a very sparse structure (only $n-1$ edges are preserved for a network with $n$ vertices) and many edges between the vertices do not exist. In other words, this kernel cannot reflect the most mutually correlated time series specified by the vertices, which limits its effectiveness. To overcome this problem, in Section III, we will develop a new kernel-based approach for financial network analysis that integrates the process of adaptively identifying the most mutually correlated financial time series of stocks associated with the commute time matrix.
II-B The Dynamic Time Warping Framework
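We review the global alignment kernel that is defined through the classical dynamic time warping framework [14]. Assume $\mathcal{T}$ is a set of discrete time series that take values in a space $\mathcal{X}$. For each pair of discrete time series $P = (p_1, \ldots, p_m) \in \mathcal{T}$ and $Q = (q_1, \ldots, q_n) \in \mathcal{T}$ with lengths $m$ and $n$ respectively, an alignment $\pi$ between $P$ and $Q$ is a pair of increasing integral vectors $(\pi_p, \pi_q)$ of length $l \le m + n - 1$, where $1 = \pi_p(1) \le \cdots \le \pi_p(l) = m$ and $1 = \pi_q(1) \le \cdots \le \pi_q(l) = n$, such that $(\pi_p, \pi_q)$ is assumed to possess unitary increments and no simultaneous repetitions. For $P$ and $Q$, each of their elements can be an observation vector with fixed dimensions at a corresponding time step. For any index $i$ with $1 \le i \le l - 1$, the following condition holds for the increment vector of $\pi$:
$$ \big( \pi_p(i+1) - \pi_p(i),\; \pi_q(i+1) - \pi_q(i) \big) \in \{ (0,1), (1,0), (1,1) \}. \tag{2} $$
Within the framework of classical dynamic time warping [14], the coordinates $\pi_p$ and $\pi_q$ of the alignment $\pi$ define the warping function. Let $\mathcal{A}(m,n)$ denote the set of all possible alignments between $P$ and $Q$. Cuturi [14] proposed a dynamic time warping inspired kernel, namely the Global Alignment Kernel, by considering all the possible alignments in $\mathcal{A}(m,n)$. The kernel is defined as
$$ k_{GA}(P,Q) = \sum_{\pi \in \mathcal{A}(m,n)} e^{-D_{P,Q}(\pi)}, \tag{3} $$
where $D_{P,Q}(\pi)$ is the alignment cost given by
$$ D_{P,Q}(\pi) = \sum_{i=1}^{|\pi|} \varphi\big( p_{\pi_p(i)}, q_{\pi_q(i)} \big), \tag{4} $$
and is defined through a local divergence $\varphi$ that quantifies the discrepancy between each pair of elements $p_i$ and $q_j$. In general, $\varphi$ is defined as the squared Euclidean distance. Note that the kernel $k_{GA}$ measures the quality of both the optimal alignment and all other alignments $\pi \in \mathcal{A}(m,n)$, and it is positive definite under mild conditions on the local kernel $e^{-\varphi}$ [14]. Moreover, $k_{GA}$ provides richer statistical measures of similarity by encapsulating the overall spectrum of the alignment costs $\{ D_{P,Q}(\pi) \}_{\pi \in \mathcal{A}(m,n)}$.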
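The sum over all alignments in Eq.(3) need not be enumerated explicitly, since it satisfies a dynamic programming recursion over prefixes of the two series [14]. The sketch below illustrates that recursion under stated assumptions: each series is stored as an array of shape (length, dimension), the local divergence is the plain squared Euclidean distance, and the function name `global_alignment_kernel` is hypothetical. It is an illustration, not the reference implementation.

```python
import numpy as np

def global_alignment_kernel(P, Q):
    """Sum of exp(-cost) over all alignments of P and Q (Eq. 3), by recursion.

    P has shape (m, d) and Q has shape (n, d); rows are time steps.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    m, n = len(P), len(Q)
    # Local similarity exp(-phi), with phi the squared Euclidean distance.
    diff = P[:, None, :] - Q[None, :, :]
    local = np.exp(-(diff ** 2).sum(axis=2))
    # M[i, j] accumulates all alignments of the first i and first j elements.
    M = np.zeros((m + 1, n + 1))
    M[0, 0] = 1.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # The three predecessors match the increments (1,0), (0,1), (1,1).
            M[i, j] = local[i - 1, j - 1] * (
                M[i - 1, j] + M[i, j - 1] + M[i - 1, j - 1]
            )
    return M[m, n]

# Two identical constant series of length 2: every alignment has zero cost,
# so the kernel simply counts the alignments.
k_same = global_alignment_kernel([[0.0], [0.0]], [[0.0], [0.0]])
```

Here `k_same` equals 3, the number of admissible alignments between two series of length 2. For long series the entries of `M` can underflow or overflow, so practical implementations usually carry out the same recursion in the log domain.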
Remarks: The dynamic time warping based global alignment kernel has proven to be a powerful tool for analyzing vectorial time series [14]. To extend it into the graph kernel domain, Bai et al. [4] developed a family of nested graph kernels through the global alignment kernel. Specifically, they commenced by decomposing each graph structure into a family of layer expansion subgraphs rooted at the centroid vertex. The nested depth-based complexity trace of each graph is computed by measuring the entropy on the family of layer expansion subgraphs. As the layer parameter ranges over its admissible values, this complexity trace naturally forms a one-dimensional sequence-based characterization vector that is similar to a one-dimensional time series vector. As a result, for a pair of graphs the resulting dynamic time warping based kernel can be directly computed by measuring the global alignment kernel between their complexity traces. Although they demonstrated that the nested graph kernels outperform state-of-the-art graph kernels [24, 35, 23] on graph classification tasks, financial networks are, as we have stated, by nature complete weighted graphs, and it is difficult to decompose such network structures into the required expansion subgraphs rooted at the centroid vertex. As a result, directly applying the dynamic time warping inspired kernel to time-varying financial networks remains challenging.
III The Kernel for Time-varying Networks
In this section, we propose a kernel-based similarity measure for time-varying networks representing multiple co-evolving financial time series. Specifically, we commence by identifying a set of most mutually correlated time series by solving a quadratic program on the commute time matrix. We show how this allows us to compute a probability distribution for the time series belonging to the dominant set. Finally, we characterize each time-varying network as a discrete dominant entropy time series through the Shannon entropy associated with the probability distribution, and in turn develop a new kernel-based approach in terms of the classical dynamic time warping framework [14].
III-A Identifying Dominant Correlated Time Series
We identify a set of the most mutually correlated time series for each time-varying financial network. Let $\mathbf{G} = \{ G_1, \ldots, G_t, \ldots, G_T \}$ be a family of time-varying financial networks extracted from a complex system and $G_t(V_t, E_t, A_t)$ be the sample network extracted from the system at time $t$. For $G_t$, each vertex $v \in V_t$ represents the time series of a different stock (e.g., the stock price), each edge $e \in E_t$ represents the absolute Pearson correlation between a pair of time series, and $A_t$ is the absolute Pearson correlation based weighted adjacency matrix. This manner of constructing each network is a popular way to represent multiple co-evolving financial time series [43, 30, 37, 38, 41]. Note that in this paper we assume that the time-varying network structures have a fixed number of vertices, i.e., these networks have the same vertex set $V$, whereas the edge sets $E_t$ differ over time $t$. In real-world applications, this is a very common situation and usually appears where the time-varying networks are extracted from complex systems with a specified set of co-evolving time series, i.e., the system has a fixed number of components co-evolving with time.
For each network $G_t$, we first compute its commute time matrix $C_t$ from its original absolute Pearson correlation based adjacency matrix $A_t$. As we have stated previously, the commute time not only reflects the integrated effectiveness of all possible weighted paths between a pair of vertices of the original network structure, but is also robust to perturbations of the network structure (i.e., changes of edge weights in the original weighted adjacency matrix). As a result, the commute time matrix $C_t$ can be seen as a reliable enhanced absolute Pearson correlation matrix for $G_t$. In other words, the commute time matrix provides a stable representation to further characterize the dynamic network associated with time-varying correlations between vertices.
With the commute time matrix of each network to hand, we automatically identify a set of dominant correlated time series through the dominant set problem proposed by Pavan et al. [32]. The definition of the dominant set simultaneously emphasizes internal homogeneity and external inhomogeneity, and can be employed as a general definition of a cluster. An instance is exhibited in Fig.2, which shows a small time-varying financial network in which each edge weight represents the correlation between a pair of vertices. In this instance, a subset of the vertices forms the dominant set, i.e., the internal set. This is because the sum of the edge weights within the internal set is larger than the sum of those between the internal and external sets. As a result, the time series specified by the dominant set can be seen as the set of the most mutually correlated time series. To automatically identify the most mutually correlated time series of a network $G_t$, we can solve the corresponding dominant set problem by maximizing a quadratic program [32]. More specifically, associated with the commute time matrix $C_t$ of $G_t$, we compute the solution of the following quadratic program [32]
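$$ \max f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^{\top} C_t \mathbf{x} \tag{5} $$
subject to $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{x} \ge 0$, and $\sum_{i=1}^{n} x_i = 1$, where $C_t$ is the commute time matrix of the network $G_t$ and $n$ is the number of vertices. The solution vector $\mathbf{x}$ of Eq.(5) is an $n$-dimensional vector. When $x_i > 0$, the $i$-th time series represented by the vertex $v_i$ belongs to the most correlated time series subset of $G_t$. Thus, the number of selected time series is obtained by counting the positive components of $\mathbf{x}$. Following Pavan and Pelillo [32], a local maximum of $f(\mathbf{x})$ can be found by iterating
$$ x_i(k+1) = x_i(k) \frac{\big( C_t \mathbf{x}(k) \big)_i}{\mathbf{x}(k)^{\top} C_t \mathbf{x}(k)}, \tag{6} $$
where $x_i(k)$ corresponds to the $i$-th time series represented by $v_i$ at iteration $k$. Based on the element values of $\mathbf{x}$, all time series represented by the vertices fall into two disjoint subsets, i.e.,
$$ S_t = \{ v_i \mid x_i > 0 \} $$
and
$$ \bar{S}_t = \{ v_i \mid x_i = 0 \}. $$
Clearly, the set $S_t$ with nonzero values indicates the set of dominant correlated time series, i.e., the set of the most mutually correlated time series. Finally, note that the solution vector $\mathbf{x}$ also corresponds to a probability distribution of the time series belonging to the dominant set $S_t$, i.e., each element $x_i$ corresponds to the probability of the $i$-th time series belonging to $S_t$.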
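The iterative update of Pavan and Pelillo [32] can be sketched in a few lines. The snippet below is a hedged illustration: the function name `replicator_dynamics`, the uniform starting point, the stopping tolerance, and the small support threshold are all assumptions made for the example, and the toy matrix is an ordinary affinity matrix rather than a commute time matrix computed from real data.

```python
import numpy as np

def replicator_dynamics(W, max_iter=1000, tol=1e-12):
    """Iterate x_i <- x_i * (W x)_i / (x^T W x) from the uniform distribution.

    W is assumed symmetric and non-negative with at least one positive entry.
    Returns a vector on the probability simplex.
    """
    W = np.asarray(W, dtype=float)
    x = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(max_iter):
        Wx = W @ x
        x_new = x * Wx / (x @ Wx)
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x

# Toy example: vertices 0-2 are strongly tied to each other,
# vertex 3 is only weakly tied to the rest.
W = np.array([
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
x = replicator_dynamics(W)
dominant = x > 1e-6   # numerical stand-in for the condition x_i > 0
```

In this toy case the weight of vertex 3 decays towards zero and the remaining mass is shared equally by vertices 0 to 2. Because the iteration stops after finitely many steps, entries outside the dominant set are tiny rather than exactly zero, which is why a small threshold is used to read off the support.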
III-B The Entropic Dynamic Time Warping Kernel
In this subsection, we develop a new kernel method for analyzing time-varying financial networks based on the classical dynamic time warping framework. To this end, we commence by representing the complex networks as discrete dominant entropy time series using the Shannon entropy through the most mutually correlated time series set introduced in Section III-A. The reason for characterizing the network using the entropy measure is that the Shannon entropy is an effective way of measuring the uncertainty in the corresponding financial system, associated with the probability distribution of the stocks belonging to the correlated set. Specifically, for each sample network $G_t(V_t, E_t, A_t)$ from $\mathbf{G}$ at time $t$, we first compute the associated commute time matrix $C_t$. Then, by solving the quadratic program [32] on the commute time matrix $C_t$, we identify the set of dominant correlated time series $S_t$ and compute the associated probability distribution $\mathbf{x} = (x_1, \ldots, x_n)^{\top}$ of the time series belonging to $S_t$. Based on Section III-A, the remaining non-dominant correlated time series are included in the set $\bar{S}_t$. With the probability distribution $\mathbf{x}$ to hand, the dominant Shannon entropy of $G_t$ is computed as
$$ H_S(G_t) = - \sum_{i=1}^{n} x_i \log x_i, \tag{7} $$
where $x_i$ is the probability of the $i$-th time series represented by vertex $v_i \in V_t$. Eq.(7) indicates that the dominant Shannon entropy is the sum of the elements $-x_i \log x_i$, thus each element can be seen as a dominant subentropy of the $i$-th time series represented by vertex $v_i$, i.e.,
$$ H_S(v_i) = - x_i \log x_i. \tag{8} $$
Note that, if $x_i = 0$, we say that the $i$-th time series does not belong to $S_t$ and we set $H_S(v_i) = 0$. With the subentropies of all vertices to hand, we compute the dominant entropy characteristics for each network $G_t$ at time $t$ as
$$ \mathbf{E}_t = \big( H_S(v_1), \ldots, H_S(v_i), \ldots, H_S(v_n) \big)^{\top}, \tag{9} $$
where
$$ H_S(v_i) = \begin{cases} - x_i \log x_i & \text{if } v_i \in S_t, \\ 0 & \text{if } v_i \in \bar{S}_t. \end{cases} \tag{10} $$
Eq.(10) indicates that we only compute the subentropies for the dominant correlated time series in $S_t$, and do not consider the non-dominant correlated time series in $\bar{S}_t$.
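The per-vertex subentropies of Eqs.(8) to (10) can be illustrated with a short sketch. The function name `dominant_entropy_vector`, the use of the natural logarithm, and the threshold `eps` that decides which probabilities count as zero are assumptions made for this example only.

```python
import numpy as np

def dominant_entropy_vector(x, eps=1e-12):
    """Subentropies -x_i * log(x_i) for dominant vertices, 0 for the others.

    x is assumed to be a probability vector over the vertices; entries not
    exceeding eps are treated as belonging to the non-dominant set.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros_like(x)
    dominant = x > eps
    h[dominant] = -x[dominant] * np.log(x[dominant])
    return h

# Three equally weighted dominant vertices and one non-dominant vertex.
h = dominant_entropy_vector([1 / 3, 1 / 3, 1 / 3, 0.0])
total = h.sum()   # dominant Shannon entropy of the network, Eq. (7)
```

For this distribution `total` equals $\log 3$, the entropy of a uniform distribution over three outcomes, and the last entry of `h` is exactly zero. Masking before taking the logarithm avoids evaluating $\log 0$.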
With the dominant entropy characteristics to hand, we further characterize each network as an entropy time series. Let a time window be a period of $w$ time steps. We shift this window along the $T$ time steps of the complex system to construct the time-varying dominant entropy time series for each network $G_t$ at time $t$. Specifically, for the time window ending at $t$, we compute the dominant entropy time series of $G_t$ as
$$ \mathcal{E}_t^{w} = \big( \mathbf{E}_{t-w+1} \,|\, \cdots \,|\, \mathbf{E}_{t-1} \,|\, \mathbf{E}_t \big), \tag{11} $$
where $t \ge w$, and each column $\mathbf{E}_{\tau}$ of $\mathcal{E}_t^{w}$ is the entropy characteristics vector of the network $G_{\tau}$ at time $\tau$ and is defined by Eq.(9). Clearly, the dominant entropy time series $\mathcal{E}_t^{w}$ of the network $G_t$ encapsulates the time-varying entropy characteristics vectors of the networks from time $t-w+1$ to time $t$.
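The windowing step amounts to slicing consecutive columns out of a matrix that stores one entropy characteristics vector per time step. The sketch below assumes such a matrix `E` of shape (number of vertices, number of time steps) with 0-based column indices; the name `window_entropy_series` is hypothetical.

```python
import numpy as np

def window_entropy_series(E, t, w):
    """Return the w columns of E that end at (0-based) time index t."""
    E = np.asarray(E)
    start = t - w + 1
    if start < 0 or t >= E.shape[1]:
        raise ValueError("window does not fit inside the available time steps")
    return E[:, start:t + 1]

# Toy entropy matrix: 2 vertices observed over 6 time steps.
E = np.arange(12).reshape(2, 6)
series = window_entropy_series(E, t=4, w=3)   # columns 2, 3, 4
```

If the resulting series is passed to an alignment routine that expects one row per time step, it should be transposed first so that each row is one entropy characteristics vector.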
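Assume $G_p$ and $G_q$ are a pair of time-varying networks at times $p$ and $q$ respectively, and their associated entropy time series are
$$ \mathcal{E}_p^{w} = \big( \mathbf{E}_{p-w+1} \,|\, \cdots \,|\, \mathbf{E}_p \big) $$
and
$$ \mathcal{E}_q^{w} = \big( \mathbf{E}_{q-w+1} \,|\, \cdots \,|\, \mathbf{E}_q \big). $$
We define the Entropic Dynamic Time Warping Kernel (EDTWK) between $G_p$ and $G_q$ as
$$ k_{EDTW}(G_p, G_q) = k_{GA}(\mathcal{E}_p^{w}, \mathcal{E}_q^{w}) = \sum_{\pi \in \mathcal{A}(w,w)} e^{-D_{p,q}(\pi)}, \tag{12} $$
where $k_{GA}$ is the dynamic time warping inspired Global Alignment Kernel (GAK) defined in Eq.(3), $\pi$ is the warping alignment between the entropy time series of $G_p$ and $G_q$, $\mathcal{A}(w,w)$ is the set of all possible alignments, and $D_{p,q}(\pi)$ refers to the alignment cost obtained via Eq.(4).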
Remarks: Although the proposed EDTWK kernel is related to the general principles of the GAK kernel, it has two distinct theoretical differences. First, the original GAK kernel is only designed for vectorial time series and cannot capture intrinsic relationships between time series. In contrast, our proposed kernel is explicitly designed for time-varying financial networks that reflect correlations between pairs of time series. Second, only the proposed EDTWK kernel can identify the dominant correlated time series through the analysis of the commute time matrix. Based on financial risk theory [19], financial crises are usually caused by a set of most correlated stock time series that have less uncertainty. Therefore, the proposed kernel is able to capture more reliable financial information. In summary, the proposed kernel provides an effective way of incorporating the structural correlations between time series into the process of multiple co-evolving time series analysis.
III-C Time Complexity
For a pair of networks, the proposed kernel requires time complexity $O(w n^3 + w^2)$. The reasons are as follows. Assume a family of time-varying networks in which each network has $n$ vertices. Computing the proposed kernel between a pair of networks associated with a time window of $w$ steps first requires the entropy time series, which costs $O(w n^3)$. This is because each of the $w$ entropy characteristics vectors is based on the computation of the commute time, which relies on the spectral decomposition of the Laplacian matrix and thus requires time complexity $O(n^3)$. Moreover, computing all possible warping alignments over $w$ time steps requires time complexity $O(w^2)$. Thus, the overall time complexity of the proposed kernel is $O(w n^3 + w^2)$.
III-D Related Work to the Proposed Kernel
Compared with some state-of-the-art approaches, the proposed EDTWK kernel has a number of advantages.
First, unlike the dynamic time warping inspired GAK kernel [14], the proposed kernel is developed for time-varying complex networks. Since the network encapsulates rich co-relationships between pairwise co-evolving time series, the proposed kernel can reflect richer correlated information than the classical dynamic time warping framework for original vectorial time series.
Second, the proposed kernel is based on the new dominant entropy time series that is computed by solving a quadratic program on the commute time matrix to identify the most correlated time series subset. As a result, unlike the existing threshold-based approaches [38, 41, 39] that roughly select pairs of relatively more correlated time series, the proposed kernel can reflect reliable dominant correlated information between time series through the dominant entropy time series. Furthermore, the commute time encapsulates the integrated effectiveness of all possible paths between a pair of vertices. As a result, the dominant entropy time series computed through the commute time matrix can potentially encapsulate the weighted information over all edges, and overcome the information loss arising in the threshold-based approaches.
Third, as we have stated, the time-varying networks are usually complete weighted networks. Most existing graph kernels cannot directly accommodate such network structures and need to transform them into sparse structures. Unfortunately, these sparse structures discard many weighted edges and inevitably lead to information loss. By contrast, the commute time is computed through the Laplacian matrix, which can directly accommodate complete weighted graphs. Thus, the proposed kernel encapsulates the whole structural information residing on all weighted edges.
In summary, the proposed kernel bridges the gap between state-of-the-art graph kernels and the classical dynamic time warping framework for time-varying networks, providing a new alternative for analyzing time series more effectively.
IV Experiments of Time Series Analysis
We empirically validate the effectiveness of the proposed kernel on a family of time-varying financial networks extracted from the New York Stock Exchange (NYSE) dataset [38, 41]. The NYSE dataset consists of 347 stocks with their daily closing prices over 6004 trading days from January 1986 to February 2011. The prices are all collected from the public financial dataset on Yahoo (http://finance.yahoo.com). To extract the time-varying financial network structures, we employ a time window of fixed size (i.e., 28 days). We slide this window along time to derive a sequence from the 29th trading day to the 6004th trading day, where each window encapsulates the 347 co-evolving daily price time series of the 347 stocks over 28 days. We characterize the relationships between stocks as a network structure with each stock as a vertex. Specifically, for each time window we take the absolute value of the Pearson correlation between the time series of each pair of stocks as the corresponding edge weight. This generates a family of time-varying financial networks with a fixed number of 347 vertices and varying edge weights for the 5976 trading days. The aim of this study is to investigate whether the proposed kernel can be used to detect fluctuations in the trading network structure due to global political or economic events.
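The network construction described above can be sketched in Python with NumPy as follows. This is a minimal sketch under assumptions that the text does not fix: the function name `correlation_networks` is hypothetical, the window for a given day is taken to be the 28 days preceding it, and self-loops are set to zero.

```python
import numpy as np


def correlation_networks(prices, window=28):
    """Build one absolute-Pearson-correlation network per trading day.

    prices: array of shape (T, N) with T trading days and N stocks.
    Returns an array of shape (T - window, N, N) of edge-weight matrices.
    """
    prices = np.asarray(prices, dtype=float)
    n_days = prices.shape[0]
    networks = []
    for t in range(window, n_days):
        segment = prices[t - window:t]              # assumed: the `window` days before day t
        corr = np.corrcoef(segment, rowvar=False)   # N x N Pearson correlations
        weights = np.abs(corr)
        np.fill_diagonal(weights, 0.0)              # assumed: no self-loops
        networks.append(weights)
    return np.array(networks)
```

With 6004 days and a 28-day window this yields 6004 - 28 = 5976 networks, matching the count stated above. A stock whose price is constant within a window has an undefined correlation and would need separate handling.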
IV-A Evaluation of the Entropy Time Series
We commence by exploring whether the dominant entropy time series can meaningfully characterize the time-varying financial networks, since these newly developed time series play an important role in the proposed kernel. Specifically, we investigate the evolutionary behavior of the NYSE stock market by calculating the dominant Shannon entropy of the time-varying financial networks at each time step, i.e., we investigate how the sum of the dominant sub-entropies of the network varies with increasing time. We exhibit the results in Fig.3, where the x-axis corresponds to the date (time) and the y-axis corresponds to the dominant Shannon entropy values. Fig.3 shows that the dominant Shannon entropy is sensitive to different financial crises (i.e., Black Monday [9], the Dotcom Bubble Burst [3], the Bankruptcy of New Century Financial, the Lehman Crisis in the Subprime Crisis period [29], the Enron Crisis, and the 1997 Asian Financial Crisis), and the entropy values usually decrease rapidly, often many days before the significant financial event. In other words, each significant fluctuation of the dominant Shannon entropy corresponds to a financial crisis and provides an early warning before the crisis occurs. The reason for this effectiveness is that the financial networks are constructed by computing the correlation between pairs of stock time series, and the dominant Shannon entropy is computed from the dominant correlated subset of time series identified through the commute time matrix. According to the financial risk theory stated in [19], a financial crisis is usually caused by a set of highly correlated stocks having less uncertainty. Thus, the dominant Shannon entropy, which characterizes the dominant correlated stocks, tends to drop significantly before a financial crisis. The experiments demonstrate that the proposed dominant entropy time series computed through the commute time can capture significant financial information, consistent with financial theory.
Note that, although Fig.3 indicates that the dominant Shannon entropy is effective for identifying extreme financial events in the evolution of the time-varying financial networks, the entropy measure can only represent the network characteristics in a one-dimensional pattern space and thus ignores information regarding specific changes in the network structure. In contrast, the proposed kernel associated with the dominant entropy is able to map the network structures into a high-dimensional Hilbert space by kernelizing the entropies. In other words, the kernel should better preserve the structural information contained in the networks.
IV-B Kernel Embeddings of Financial Networks from kPCA
In this subsection, we evaluate the performance of the proposed EDTWK kernel on the time-varying networks of the NYSE dataset, and explore whether the proposed kernel can distinguish the evolution of the networks with time. Specifically, we perform kernel Principal Component Analysis (kPCA) [40] on the kernel matrix of the financial networks and embed the networks in a vector space. We visualize the embedding results through the first three principal components in Fig.4(a). In addition, we compare the proposed kernel to three classical graph characterization methods (GC), that is, the Shannon entropy associated with the classical steady state random walk [7], the von Neumann entropy associated with the normalized Laplacian matrix [15], and the average length of the shortest path over all pairs of vertices [38]. All three network characterization methods can accommodate complete weighted graphs. The visualization spanned by the three graph characterizations is shown in Fig.4(b). Finally, we also compare the proposed kernel with four state-of-the-art kernel methods, that is, the dynamic time warping inspired global alignment kernel (GAK) for original time series [14] and three graph kernels for time-varying financial networks. Specifically, the graph kernels include the Weisfeiler-Lehman subtree kernel (WLSK) [35], the quantum Jensen-Shannon kernel (QJSK) [7], and the feature space Laplacian graph kernel (FLGK) [25]. For the GAK kernel, we also adopt a time window of 28 days for each trading day. For the WLSK kernel, since it cannot accommodate complete weighted graphs or edge weights, we transform each original network into a minimum spanning tree and ignore the edge weights. For the QJSK and FLGK kernels, since they can accommodate edge weights, we directly apply these kernels to the original financial networks.
For the four kernel methods, we also perform kPCA on the resulting kernel matrices and embed the original time series or the time-varying networks into a vector space. The embedding results are exhibited in Fig.4(c), 4(d), 4(e) and 4(f), respectively.
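The kPCA embedding step used for all kernels above can be sketched with NumPy as follows. This is a minimal sketch, assuming a symmetric precomputed kernel matrix `K` with one row per network; the function name `kernel_pca` is chosen here for illustration.

```python
import numpy as np


def kernel_pca(K, n_components=3):
    """Embed the n items of an n x n kernel matrix into n_components dimensions."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    Kc = H @ K @ H                             # kernel centred in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)    # shape (n, n_components)
```

Plotting the three returned columns against each other, coloured by date, gives the kind of trajectory view described for Fig.4.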
Methods  EDTWK  GC  GAK  WLSK 
Distance Stress  
Methods  QJSK  FLK  EDTWKO  QK 
Distance Stress 
As we previously stated, the commute time matrix computed from the original absolute Pearson correlation based adjacency matrix of each financial network can be seen as an enhanced Pearson correlation matrix for the proposed EDTWK kernel. Thus, the commute time matrix plays a significant role in determining the effectiveness of the proposed kernel. To further validate its effectiveness, we also compare the proposed EDTWK kernel associated with commute time matrices with the variant associated with the original weighted adjacency matrices (EDTWKO), and the result is displayed in Fig.4(g). In addition, we compare the proposed kernel with the quantum-inspired kernel (QK) [6], since the QK kernel can also accommodate the original Pearson correlation based adjacency matrices of financial networks through the commute time matrices. The result of the QK kernel is exhibited in Fig.4(h).
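For reference, the commute time matrix of a weighted graph can be obtained from the pseudo-inverse of its Laplacian. The sketch below uses the standard identity CT(u, v) = vol (L+_uu + L+_vv - 2 L+_uv), where vol is the sum of all vertex degrees. It is a minimal sketch assuming a connected, symmetric weighted adjacency matrix `W`, and the function name is chosen here for illustration.

```python
import numpy as np


def commute_time_matrix(W):
    """Commute times between all vertex pairs of a connected weighted graph."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    L = np.diag(degrees) - W          # graph Laplacian
    L_pinv = np.linalg.pinv(L)        # Moore-Penrose pseudo-inverse
    vol = degrees.sum()               # graph volume (sum of degrees)
    d = np.diag(L_pinv)
    return vol * (d[:, None] + d[None, :] - 2.0 * L_pinv)
```

On an unweighted path graph with three vertices this gives a commute time of 4 between neighbours and 8 between the two end vertices. Because the Laplacian is defined for any non-negative edge weights, the same function applies directly to the complete weighted correlation networks.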
Fig.4 exhibits the traces of the time-varying financial networks (or the original time series) in the different kernel spaces, together with the classical graph characterization pattern space, over all trading days. The colour bar of each subfigure indicates the date. We observe that the embedding from the proposed EDTWK kernel exhibits a better manifold structure. Moreover, the embeddings resulting from the EDTWK, WLSK and QK kernels are better distributed than those of the remaining approaches. Another interesting phenomenon exhibited in Fig.4 is that only the proposed EDTWK kernel produces a clear time-varying trajectory for the financial networks of consecutive time steps, i.e., the embedding of the financial network at the current time step is close to that of the previous time step in the embedding space. By contrast, the alternative methods hardly produce such a trajectory, and the associated embeddings tend to form clusters. To further demonstrate this property of the proposed EDTWK kernel, we compute the distance stress of the financial network embeddings for the different methods. Specifically, the distance stress is defined as
(13) 
where , is the network embedding vector at time , and is the nearest network embedding vector of . For each embedding vector at time , if the nearest embedding vector is always the embedding vector at the previous time step (i.e., ), the value of the distance stress DS will be . In other words, a distance stress value nearer to indicates that the embeddings form a clearer time-varying trajectory. The distance stress values of the financial network embeddings based on different methods are shown in Table I. It is clear that only the distance stress value of the proposed EDTWK kernel is nearer to . This further indicates that the proposed method can better preserve the ordinal arrangement of the time-varying financial networks.
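The idea behind this check, namely whether each embedding lies closest to its temporal predecessor, can be illustrated with a simpler, related diagnostic. The sketch below is not the distance stress of Eq. (13): it is a hypothetical stand-in that reports the fraction of time steps whose nearest earlier embedding is the immediately preceding one, and the function name is chosen here for illustration.

```python
import numpy as np


def previous_step_nearest_fraction(X):
    """Fraction of time steps t >= 2 whose nearest earlier point is the one at t - 1.

    X: array of shape (T, d), one embedding vector per time step, in time order.
    """
    X = np.asarray(X, dtype=float)
    hits, total = 0, 0
    for t in range(2, len(X)):
        dists = np.linalg.norm(X[:t] - X[t], axis=1)   # distances to all earlier points
        hits += int(np.argmin(dists) == t - 1)
        total += 1
    return hits / total if total else float("nan")
```

A value of 1 corresponds to a clean trajectory in which every point follows its predecessor, while embeddings that collapse into clusters visited repeatedly over time give values closer to 0.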
Finally, although the EDTWKO kernel (i.e., the proposed kernel associated with the original weighted adjacency matrix) can also form a clear time-varying trajectory, its embedding points of the financial networks are not well distributed. This reveals that the proposed approach associated with the commute time matrix performs better than that associated with the original absolute Pearson correlation matrix, demonstrating the effectiveness of the commute time integrated in the proposed approach.
To further reveal the effectiveness of adopting the commute time matrix, we randomly select a financial network. For this network we perform multidimensional scaling on both its commute time matrix and its Pearson correlation matrix, to embed each vertex (i.e., each of the 347 stocks from NYSE) into a 2-dimensional pattern space. Suppose the affinity matrix in question is (i.e., the commute time or the Pearson correlation matrix), then the centred similarity matrix is given by
(14)
where is the number of vertices, is the identity matrix, and is the all-ones matrix. If is the eigendecomposition of the kernel matrix in terms of the diagonal matrix of ordered eigenvalues and the correspondingly ordered matrix of column eigenvectors , then the matrix with the embedding coordinates as column vectors is . Here we discard rows corresponding to negative eigenvalues and take the leading two rows. The results are shown in Fig.5. Interestingly, the stock embedding points obtained through the commute time matrix are well distributed and form an approximately linear manifold structure. By contrast, the stock embedding points obtained through the Pearson correlation matrix disperse over a larger volume of the space. Note that similar results are observed for most of our financial networks. This reveals that, compared to the Pearson correlation matrix, the commute time based vertex affinity matrix offers the advantage of capturing more reliable relationships between the stocks, which reside in an approximately linear subspace.
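For orientation, a common form of this embedding is written out below. This is a sketch under assumptions, not a restatement of Eq. (14): the symbols A, n, I, J, K, Φ, Λ and Y are labels chosen here, and the double-centring form shown treats A as a similarity matrix.

```latex
% Assumed notation: A is the n x n affinity matrix, I the identity, J the all-ones matrix.
K = \Bigl(I - \tfrac{1}{n} J\Bigr)\, A \,\Bigl(I - \tfrac{1}{n} J\Bigr),
\qquad
K = \Phi \Lambda \Phi^{\top},
\qquad
Y = \Lambda^{1/2}\, \Phi^{\top}.
```

Each column of Y then gives the coordinates of one vertex, and keeping the two leading rows with positive eigenvalues yields the 2-dimensional embedding. For a dissimilarity matrix, classical multidimensional scaling usually applies the same centring to minus one half of the squared distances, so the sign and scaling convention shown here is an assumption.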
Overall, the above observations indicate that the proposed EDTWK kernel associated with the commute time matrix captures the structural evolution of the financial networks with time better than the remaining methods.
IV-C Kernel Embeddings for Financial Crisis Analysis
To take our study one step further, in this subsection we exhibit more details of the kernel embedding results for three different financial crisis periods that have been preliminarily explored in Section IV-A. Specifically, for the different methods, Fig.6 corresponds to the Black Monday period (from 15th June 1987 to 17th February 1988), Fig.7 corresponds to the Dotcom Bubble period (from 3rd January 1995 to 31st December 2001), and Fig.8 corresponds to the Enron Incident period (the red points, from 16th October 2001 to 11th March 2002). These figures show that Black Monday (19th October 1987), the Dotcom Bubble Burst (13th March 2000) and the Enron Incident period (from 2nd December 2001 to 11th March 2002) are all extreme financial events. The embedding points of the proposed EDTWK kernel before and after these events can be better separated into independent clusters, and the points representing the extreme financial events lie in the middle of the corresponding clusters. Another interesting phenomenon in Fig.8 is that the embedding points of the networks between 1986 and 2011 are distinctly divided by the Prosecution against Arthur Andersen (3rd November, 2002). Since the prosecution symbolizes the end of the Enron Incident, the Enron Incident can be considered a watershed at the beginning of the 21st century, which significantly distinguishes the financial networks of the 21st century from those of the 20th century. These observations again indicate that the proposed kernel can capture and detect abrupt financial incidents that significantly change the network structures. Although some methods are competitive, only the proposed kernel produces a clear trajectory in the embedding distributions, i.e., the proposed kernel better reflects the transition of the networks with time.
Note that similar results are observed when the proposed kernel and the remaining methods are applied to the other financial crises mentioned in Section IV-A.
The above experimental results indicate that the WLSK and QK kernels are the most competitive alternatives to the proposed EDTWK kernel, since the embeddings of these two kernels can also be separated before and after a financial event. To further demonstrate the effectiveness of the proposed kernel, we compare the three methods in more detail on two financial events that occurred in the Subprime Crisis period (from 2nd January 2006 to 1st July 2009). The events are the New Century Financial Bankruptcy (4th April 2007) and the Lehman Brothers Bankruptcy (15th September 2008), both of which had significant impacts on world financial history. Specifically, for each method, we visualize the set of points that trace the path of the kPCA embeddings over about 90 trading days (i.e., 90 points) around each of the two events. The results are shown in Fig.9, and the colour bar beside each subfigure represents the date in the time series. Note that we only show the result for the WLSK kernel, since a similar phenomenon is observed for the QK kernel. For the proposed kernel, we observe that the point distribution forms a clear trajectory with time, and the trajectory around each financial event usually undergoes significant changes, i.e., starting from the financial event, the trajectory abruptly changes direction within a short time period. By contrast, the point distributions from the WLSK kernel are chaotic and no significant pattern can be observed. This demonstrates that the proposed kernel is better able to both characterize and distinguish financial crises.
IV-D Evaluations of the Kernel Matrix
To further reveal the effectiveness of the proposed EDTWK kernel, in this subsection we also visualize the kernel matrices of both the proposed EDTWK kernel and the competitive WLSK kernel. Note that the other competitive kernel, QK, exhibits a similar phenomenon to the WLSK kernel, so we only show the experiment with the WLSK kernel. The kernel matrices are computed between the networks belonging to the New Century crisis period and the Lehman crisis period, as well as those belonging to all 6004 trading days of the NYSE dataset. The kernel matrix visualization results are shown in Fig.10, where both the x-axis and the y-axis represent the time steps. Note that, to compare the two kernels in the same scaled Hilbert space, we consider the normalized version of both the kernels as
where is the normalized kernel, and is either the EDTWK kernel or the WLSK kernel. As a result, the kernel values are all bounded between to , and the colour bar beside each subfigure of Fig.10 represents the kernel value of the kernel matrix. In Fig.10, we observe that the kernel values tend to decrease as the elements of the kernel matrix lie further from the main diagonal of the matrix. This is because such elements are computed between time-varying financial networks separated by long time spans, and more structural changes accumulate as the network evolves over a long period. Thus, both the EDTWK and the WLSK kernels can reflect the structural evolution of the financial networks with time. However, we observe that the kernel values associated with the WLSK kernel tend to drop suddenly once the element is slightly away from the main diagonal. By contrast, the kernel values associated with the proposed EDTWK kernel tend to decrease gradually as the element moves away from the main diagonal. This observation reveals why only the kPCA embeddings obtained through the proposed EDTWK kernel form a clear trajectory with time in Fig.9. Therefore, only the proposed EDTWK kernel can well distinguish and capture the structural changes of networks evolving over a long time period. Note that similar results are observed when the proposed kernel and the remaining methods are applied to the other financial crises mentioned in Section IV-A.
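A common way to place two kernels on the same scale is cosine normalization, in which each entry is divided by the geometric mean of the two corresponding diagonal entries. The sketch below assumes this form, which may differ from the exact normalization used above; the function name is chosen here for illustration.

```python
import numpy as np


def normalize_kernel(K):
    """Cosine-normalize a kernel matrix so that every diagonal entry equals 1."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    # Entry (p, q) becomes K[p, q] / sqrt(K[p, p] * K[q, q]).
    return K / np.sqrt(np.outer(diag, diag))
```

For a positive semidefinite kernel with non-negative entries, the normalized values lie in the interval [0, 1], which makes the colour scales of two different kernels directly comparable.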
All the above experiments demonstrate the effectiveness of the proposed kernel. The reasons for this effectiveness are fourfold. First, unlike the WLSK kernel, the proposed kernel can directly accommodate time-varying networks that are complete weighted graphs. Second, unlike the GC method, which computes vectorial network characteristics and tends to approximate the time-varying networks in a low-dimensional pattern space, the proposed kernel can represent the network structures in a high-dimensional Hilbert space and thus better preserve the network characteristics. Third, unlike the GAK kernel for original time series, the proposed kernel for time-varying networks is based on the dominant entropy time series and reflects richer correlation information than the original time series. Fourth, unlike the QJSK and FLK kernels, the proposed kernel can capture reliable financial information through the dominant entropy time series.
IV-E Classifications from C-SVMs
Datasets  Black Monday  Newcentury  Lehman  Asia 1997  Dotcom 

EDTWK  
QJSK  
WLSK  
QK 
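The classification protocol of this subsection (10 sequential stages of 10 trading days each, a C-SVM on a precomputed kernel matrix, and 10-fold cross-validation) can be sketched as follows. This is a minimal sketch that assumes scikit-learn, whose `SVC` is built on LIBSVM, in place of calling LIBSVM directly; the helper names, the fixed value of `C`, and the shuffled stratified folds are assumptions made here, and no parameter optimization is shown.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def stage_labels(n_days=100, n_stages=10):
    """Label consecutive trading days with their stage index (0 .. n_stages - 1)."""
    return np.repeat(np.arange(n_stages), n_days // n_stages)


def cv_accuracy(K, y, C=1.0, n_splits=10, seed=0):
    """Mean cross-validated accuracy of a C-SVM on a precomputed kernel matrix K."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(np.zeros(len(y)), y):
        clf = SVC(kernel="precomputed", C=C)
        clf.fit(K[np.ix_(train, train)], y[train])                       # train x train block
        scores.append(clf.score(K[np.ix_(test, train)], y[test]))        # test x train block
    return float(np.mean(scores))
```

Repeating `cv_accuracy` with different seeds and averaging gives the kind of mean accuracy and standard error reported per crisis period.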
In this subsection, we validate the effectiveness of the proposed EDTWK kernel on classification tasks. Specifically, we explore whether the proposed kernel can be used to correctly classify the time-varying financial networks into the corresponding stages of each financial crisis period. The crisis periods used for evaluation are the Black Monday period, the Dotcom Bubble period, the New Century Financial crisis period, the Lehman Crisis period and the 1997 Asia Financial crisis period. Since the Enron crisis was not a sudden crisis and continued for many trading days, we do not perform the classification evaluation on it. For each of the selected financial crisis periods, we utilize 100 trading days around the day on which the crisis happened, i.e., we select 50 days before and 50 days after the crisis event. For each crisis, we sequentially divide the 100 consecutive trading days into 10 stages of 10 trading days each, i.e., the time-varying financial networks of the 100 trading days are sequentially separated into 10 classes. For each financial crisis period, we calculate the kernel matrix between the financial networks of the trading days. Moreover, for the proposed kernel, we perform 10-fold cross-validation using the C-Support Vector Machine (C-SVM) classifier to compute the classification accuracies, using LIBSVM [11]. We use nine folds for training and one for testing. The parameters of all C-SVMs are optimized on each dataset. We repeat the whole experiment 10 times and report the average classification accuracies and standard errors in Table II. We also compare the proposed EDTWK kernel with the competitive WLSK and QK kernels [35, 6], as well as the QJSK kernel [7], and the evaluations of these kernels follow the same experimental setup as the proposed kernel. The experimental results are also reported in Table II. It is clear that the proposed EDTWK kernel outperforms the alternative state-of-the-art graph kernels on every crisis period, and the proposed kernel can classify each financial network into the correct time-varying stage. This evaluation demonstrates that the proposed kernel is better able to capture how the structures of the financial networks evolve with time.
V Conclusion and Future Work
In this paper, we have proposed a new kernel inspired by the dynamic time warping framework, namely the Entropic Dynamic Time Warping Kernel (EDTWK) between time-varying financial networks, for the analysis of multiple co-evolving financial time series. Specifically, for a family of time-varying financial networks with each vertex representing the individual time series of a stock and each edge between a pair of series representing their correlation, we have computed the commute time matrix on each of the network structures and shown how this matrix allows us to identify a dominant correlated stock set as well as the associated dominant probability distribution of the stocks belonging to this set. Based on the probability distribution, we have represented each original network as a dominant Shannon entropy time series. With the dominant entropy time series for each pair of financial networks to hand, the proposed kernel has been defined through the dynamic time warping based global alignment kernel between the entropy time series. We have shown that the proposed kernel bridges the gap between graph kernels and the classical dynamic time warping framework for time series analysis. Experiments on time-varying networks extracted from the New York Stock Exchange (NYSE) database demonstrate its effectiveness.
In this work, we have identified the most strongly correlated stock trading patterns based on the commute time between pairs of stocks in a network of trading relationships. Commute time implicitly averages over all paths connecting a pair of stocks in the network, and not just the first-order nearest neighbour relations. This renders it robust to missed or erroneously inferred correlation relations in the time series of the stock closing prices. In the future, we will explore the use of hypergraph representations that capture the relationships between multiple stocks. Here we will use the commute time to define the groups of stocks and develop a hypergraph kernel to measure the relationships between groups.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant no. 61976235, 61602535 and 61503422), the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), and the program for innovation research in Central University of Finance and Economics. Primary Contact Authors: Lu Bai (bailucs@cufe.edu.cn) and Lixin Cui (cuilixin@cufe.edu.cn). Lu Bai and Lixin Cui have equal contributions.
References
 [1] (2011) Shannon and von neumann entropy of random networks with heterogeneous expected degree. Physical Review E 83 (3), pp. 036109. Cited by: §IA.
 [2] (2014) Entropy distribution and condensation in random networks with a given degree distribution. Physical Review E 89 (6), pp. 062807. Cited by: §IA.
 [3] (2010) Speculative bubbles in the s&p 500: was the tech bubble confined to the tech sector?. Journal of empirical finance 17 (3), pp. 345–361. Cited by: §IVA.
 [4] (To Appear) Localglobal nested graph kernels using nested complexity traces. In Pattern Recognition Letters, Cited by: §IIB.
 [5] (2014) Attributed graph kernels using the jensentsallis qdifferences. In Proceedings of ECMLPKDD, pp. 99–114. Cited by: §IA.
 [6] (2019) A quantuminspired similarity measure for the analysis of complete weighted graphs. IEEE Transactions on Cybernetics, pp. To Appear. Cited by: §IIA, §IVB, §IVE.
 [7] (2015) A quantum jensenshannon graph kernel for unattributed graphs. Pattern Recognition 48 (2), pp. 344–355. Cited by: §IVB, §IVE.

 [8] (2015) An aligned subtree kernel for weighted graphs. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, pp. 30–39. Cited by: §IA.
 [9] (2007) Exorcising ghosts of octobers past. The Wall Street Journal, pp. C1–C2. Cited by: §IVA.
 [10] (2009) Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10 (3), pp. 186–198. Cited by: §IA.
 [11] (2011) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. Cited by: §IVE.
 [12] (2007) On the time series knearest neighbor classification of abnormal brain activity. IEEE Trans. Systems, Man, and Cybernetics, Part A 37 (6), pp. 1005–1016. Cited by: §I.
 [13] (2009) Erratum to: ”multiscale anomaly detection algorithm based on infrequent pattern of time series” [J. comput. appl. math. 214(1) (2008) 227237]. J. Computational Applied Mathematics 231 (2), pp. 1004. Cited by: §I.
 [14] (2011) Fast global alignment kernels. In Proceedings of ICML, pp. 929–936. Cited by: §IIB, §IIB, §IIID, §III, §IVB.
 [15] (2011) A history of graph entropy measures. Information Science 181 (1), pp. 57–78. Cited by: §IVB.
 [16] (2011) Centrality measures and thermodynamic formalism for complex networks. Physical Review E 83 (4), pp. 046117. Cited by: §IA.
 [17] (1998) Measures of statistical complexity: why?. Physics Letters A 238 (4), pp. 244–252. Cited by: §IA.
 [18] (2007) Thermodynamic forces, flows, and onsager coefficients in complex networks. Physical Review E 76 (6), pp. 061106. Cited by: §IA.
 [19] (2013) Quantifying systemic risk. The University of Chicago Press. Cited by: §IB, §IIIB, §IVA.
 [20] (1999) Convolution kernels on discrete structures. In Technical Report UCSCRL9910, University of California at Santa Cruz, Santa Cruz, CA, USA. Cited by: §IA.
 [21] (1987) Statistical mechanic. Wiley, New York. Cited by: §IA.
 [22] (2013) Quantumclassical transitions in complex networks. Journal of Statistical Mechanics: Theory and Experiment 2013 (04), pp. 04019. Cited by: §IA.
 [23] (2014) Global graph kernels using geometric embeddings. In Proceedings of ICML, pp. 694–702. Cited by: §IIB.
 [24] (2003) Marginalized kernels between labeled graphs. In Proceedings of ICML, pp. 321–328. Cited by: §IA, §IIB.
 [25] (2016) The multiscale laplacian graph kernel. In Proceedings of NIPS, pp. 2982–2990. Cited by: §IVB.
 [26] (2016) On valid optimal assignment kernels and applications to graph classification. In Proceedings of NIPS, pp. 1615–1623. Cited by: §IA.

 [27] (2013) Change-point detection in time-series data by relative density-ratio estimation. Neural Networks 43, pp. 72–83. Cited by: §I.
 [28] (2018) Tree-based kernel for graphs with continuous attributes. IEEE Trans. Neural Netw. Learning Syst. 29 (7), pp. 3270–3276. Cited by: §IA.
 [29] (2008) Lehman files for bankruptcy, merrill sold, aig seeks cash. The Wall Street Journal. Cited by: §IVA.
 [30] (2005) Dynamical aspects of interaction networks. International Journal of Bifurcation and Chaos 15, pp. 3467. Cited by: §IA, §IIIA.
 [31] (2018) A degeneracy framework for graph similarity. In Proceedings of IJCAI, pp. 2595–2601. Cited by: §IA.
 [32] (2007) Dominant sets and pairwise clustering. IEEE Trans. Pattern Anal. Mach. Intell. 29 (1), pp. 167–172. Cited by: §IIIA, §IIIB.
 [33] (2007) Clustering and embedding using commute times. IEEE Trans. Pattern Anal. Mach. Intell. 29 (11), pp. 1873–1890. Cited by: §IB, §IIA.
 [34] (2017) A piecewise aggregate pattern representation approach for anomaly detection in time series. Knowl.Based Syst. 135, pp. 29–39. Cited by: §I.
 [35] (2009) Efficient graphlet kernels for large graph comparison. Journal of Machine Learning Research 5, pp. 488–495. Cited by: §IIB, §IVB, §IVE.
 [36] (2010) Weisfeilerlehman graph kernels. Journal of Machine Learning Research 1, pp. 1–48. Cited by: §IA.
 [37] (2008) Analysis of chaotic dynamics using measures of the complex network theory. In Proceedings of ICANN, pp. 61–70. Cited by: §IA, §IIIA.
 [38] (2015) Modular dynamics of financial market networks. arXiv preprint arXiv:1501.05040. Cited by: §IA, §IA, §IB, §IB, §I, §IIA, §IIA, §IIIA, §IIID, §IVB, §IV.
 [39] (To Appear) Directed and undirected network evolution from euler clagrange dynamics. Cited by: §IA, §IIA, §IIA, §IIID.
 [40] (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann. Cited by: §IVB.
 [41] (2015) Thermodynamic characterization of networks using graph polynomials. Physical Review E 92 (3), pp. 032810. Cited by: §IA, §IA, §IA, §IB, §IB, §IIA, §IIA, §IIIA, §IIID, §IV.
 [42] (2015) Compositional segmentation of time series in the financial markets. Applied Mathematics and Computation 268, pp. 399–412. Cited by: §I.
 [43] (2006) Complex network from pseudoperiodic time series: topology versus dynamics. Physical Review Letters 96, pp. 238701. Cited by: §IA, §IIIA.