Mask Combination of Multi-layer Graphs for Global Structure Inference

10/22/2019 · by Eda Bayram, et al.

Structure inference is an important task for network data processing and analysis in data science. In recent years, quite a few approaches have been developed to learn the graph structure underlying a set of observations captured in a data space. Although real-world data is often acquired in settings where relationships are influenced by a priori known rules, such domain knowledge is still not well exploited in structure inference problems. In this paper, we identify the structure of signals defined in a data space whose inner relationships are encoded by multi-layer graphs. We aim at properly exploiting the information originating from each layer to infer the global structure underlying the signals. We thus present a novel method for combining the multiple graphs into a global graph using mask matrices, which are estimated through an optimization problem that accommodates the multi-layer graph information and a signal representation model. The proposed mask combination method also estimates the contribution of each graph layer to the structure of the signals. The experiments conducted on both synthetic and real-world data suggest that integrating the multi-layer graph representation of the data in the structure inference framework enhances the learning procedure considerably by adapting to the quality and the quantity of the input data.


I Introduction

Many real-world data can be represented with multiple forms of relations between data samples. Examples include social networks that relate individuals based on different types of connections or behavioral similarities [7, 29], biological networks where different modes of interaction exist between neurons or brain regions [4, 8], and transportation networks that enable the movement of people via different transportation means [5, 1]. Multi-layer graphs are convenient for encoding complex relationships of multiple types between data samples [17]. While they can be directly tailored from a multi-relational network such as social network data, multi-layer graphs can also be constructed from multi-view data [9, 16], where each layer is based on one type of feature.

In this paper, we consider data described by a multi-layer graph representation, where each data sample corresponds to a vertex of the graph and carries signal values acquired on that vertex. Each graph layer then accommodates a specific type of relationship between the data samples. From a multi-view data analysis perspective, we assume that the observed signals reside on a global view, which is latent, while the information about every single view is known. Ultimately, we aim at inferring the hidden global graph that best represents the structure of the observed signals. The task here is to employ the partial information given by the multi-layer graphs to estimate the global structure of the data.

In the proposed framework, each graph layer guides the global structure inference process by providing a different type of information. The global graph is learned on the basis of a task whose semantics are determined by the signal set. For such a task, the connections within one layer may not have the same level of importance or multiple layers might have redundancy due to a correlation between them. Hence, it may cause information loss to consider a single layer as it is, or to merge all the layers at once [21]. In such cases, exploiting properly the information originating from each layer and combining them based on the targeted task may improve the performance of the data analysis framework.

Considering the aforementioned challenges, we propose a novel technique to combine the graph layers, which has the flexibility of selecting the connections relevant to the task and dismissing the irrelevant ones from each layer. For this purpose, we employ a set of mask matrices, each corresponding to a graph layer. Through the mask combination of the layers, we then learn the global structure underlying the set of signals. The mask matrices are indicative of the contribution of each layer to the global structure. The problem of learning the unknown global graph thus boils down to learning the mask matrices, which is solved via an optimization problem that takes into account both the multi-layer graph representation and a signal representation model. The signal representation model relies on the assumption that the signals are smooth on the unknown global graph structure. The main benefit of the proposed method over state-of-the-art methods that learn a graph directly from the observations is that it can compensate for the frequently encountered case where we have a limited number of signals, or signals that deviate from the assumed representation model. Incorporating the side information obtained from the multi-layer graph representation leads to a more reliable solution in such cases.

Fig. 1: An illustration for the input and output of the mask learning algorithm

Figure 1 illustrates the general framework: the inputs are the signals captured on a set of data samples and the multi-layer graph representation that stores the relations between them; the ultimate output is the global graph that best fits the signals. The set of mask matrices, which forms the mask combination of the graph layers, is an output as well, together with a corrective term bridging the gap between the multi-layer graph representation and the signal representation model. The mask combination and the corrective term are summed to yield the global graph.

We run experiments on a multi-relational social network dataset and on a meteorological dataset, where the introduced set of observations determines how to combine the multi-layer graphs into a global graph. In the experiments on the meteorological data, for instance, we employ different types of measurements. When the type of measurement is “temperature”, the task is to infer the global structure that well explains the temperature signals. Yet, on the same set of weather stations, if we consider “snowfall” measurements, the task is to infer the global structure underlying the snowfall signals, which turns out to be different from that of temperature. The layer combination properly adapts to the target task, and the inferred mask matrices thus uncover the relative importance of the layers in structuring the signals of interest. In addition, we test our algorithm on synthetic data simulating different conditions in terms of the support of the multi-layer graph representation and the agreement of the signal set with the signal representation model. The performance is compared against state-of-the-art graph learning methods. The results suggest that, in a structure inference problem, exploiting the additional information given by the data space through a multi-layer graph representation enhances the learning procedure by increasing its adaptability to variable input data quality.
Contributions. This paper proposes a novel structure inference framework that learns a graph structure from observations captured on a data space. The main contributions are summarized as follows: (i) The graph learning procedure is integrated with a multi-layer graph representation that encodes certain information offered by the data space. This permits profiting from the domain knowledge in the learning procedure. (ii) The task-relevant information is deduced effectively from each graph layer and combined into a global graph via a novel masking technique. (iii) The mask matrices are optimized on the basis of the task determined by the set of observations. Hence, they indicate the semantic contribution of the layers.

The rest of the paper is organized as follows. We give an overview of the related work in Section II. In Section III, we present the notation used in the paper, explain the proposed algorithm and discuss it in detail. We give experimental results on both synthetic and real-world data in Section IV. Finally, we conclude in Section V.

II Related Work

In the last decade, many studies have adopted multi-layer networks to treat the data emerging in complex systems, ranging from biological and technological networks to social networks, which facilitates fundamental network analysis operations. In social networks, for instance, each type of relationship between individuals may be represented by a single layer, and a specific combination of the layers may reveal hidden motifs in the network. For this purpose, Magnani et al. [21] propose the concept of the power-sociomatrix, which considers all possible combinations of the layers in the analysis of a social network. Considering multiple graph representations of a data space has also gained importance in machine learning frameworks. For example, Argyriou et al. [3] propose to adopt a convex combination of the Laplacians of multiple graphs representing a data space for the semi-supervised learning task. The convex combination of graph layers is useful for weighted graph representations. From the topological perspective, however, a convex combination of the layers yields the same set of solutions as a power-sociomatrix, which corresponds to the corners of the convex hull created by the convex combination of the layer weight matrices. Hence, neither approach permits flexibility in the topology of the layer combination, since both treat a graph layer as a whole, either keeping all of its edges in the layer combination or none. In our framework, on the other hand, the masking technique has the flexibility of selecting a particular set of edges from a layer to incorporate in the layer combination.

Moreover, many studies have employed multiple graphs in order to represent the data emerging in multi-view domains and have adapted the graph regularization framework to the multi-view domain in search of a consensus of the views [18, 28, 9, 15, 6]. Since most of those studies target semi-supervised learning or clustering tasks, a low-rank representation of the data, which is common across the views, is sufficient. Lately, the authors in [14] developed a Graph Neural Network scheme to conduct semi-supervised learning on data represented by multi-layer graphs, where they integrate the graph regularization approach to impose the smoothness of the label information at each graph layer. Similar to the aforementioned studies, this paper proposes a learning scheme for data emerging in multi-view domains. Yet, the main difference is that it specifically addresses a structure inference task, which is achieved by the estimation of a graph underlying a set of observations/signals living on such a data space.

More recently, several graph regularization approaches have been proposed to learn a global or consensus graph from multi-view data for clustering [30, 31] and semi-supervised learning [20]. They employ multi-view features to obtain a unified graph structure. Particularly in [30, 31], the authors propose optimization problems where single-view graph representations are extracted first and then fused into a unified graph. In our optimization scheme, we also adopt a graph regularization approach to fit the signal representation model. However, the set of signals subject to the learning scheme does not belong to a specific view of the data; rather, the signals are assumed to reside on an unknown global view that we aim at inferring. Furthermore, we obtain the global graph through a novel technique that combines the given graph layers by flexibly adapting to the structure implied by the signals.

The problem of learning a graph representation of the data has been addressed by various network topology inference methods. An important representative is the sparse inverse covariance estimation method via graphical lasso [12]. Later, Lake & Tenenbaum [19] also adopted the inverse covariance estimation approach to infer a graph Laplacian matrix. Lately, many graph learning approaches have exploited the notion of smoothness [11, 23]. An important property of natural signals represented on graphs is the fact that they change smoothly on their graph structure. A smooth signal generative model on graphs is introduced by Dong et al. [10], which we also adopt in our global structure inference problem in multi-layer settings. More recently, other generative models emerging from diffusion processes have been studied in [25, 27], where a network topology is recovered from the eigenbasis of a graph shift operator such as a graph Laplacian. Although much real-world data is acquired in domains possessing multi-view features or complex relations, such domain knowledge is not well exploited in the existing structure inference approaches. Unlike those, we feed the graph learning process with the guidance of the multi-layer graphs that encode the additional information given by the data domain. This brings certain advantages, especially when the signal representation quality is weak due to noisy data or too few observations, in which case the graph learning problem is relatively ill-posed. In addition to learning a graph representation of the signals, our framework presents a semantic reasoning over multiple graph representations of the data space by learning how to combine them into the global structure of the signals.

III Mask Learning Algorithm

We propose a structure inference framework for a set of observations captured on a vertex space, which can be represented by multi-layer graphs. We treat the observations captured on such a vertex space as signals whose underlying structure is described by the hidden global graph. Our task is to discover the global graph by exploiting the information provided by the multi-layer graph representation and the signals.

III-A Multi-layer Graph Settings

Suppose that we have $M$ graph layers, each of which stores a single type of relation between the data samples. We introduce a weighted and undirected graph $\mathcal{G}_m = \{\mathcal{V}, \mathcal{E}_m, W_m\}$ to represent the relations on layer-$m$, for $m = 1, \dots, M$, where $\mathcal{V}$ stands for the vertex set consisting of $N$ vertices shared by all the layers, and $\mathcal{E}_m$ and $W_m$ indicate the edge set and the symmetric weight matrix of layer-$m$. A graph signal can be considered as a function $x: \mathcal{V} \rightarrow \mathbb{R}$ that assigns a value to each vertex. We denote the set of signals defined on the vertex space by a matrix $X \in \mathbb{R}^{N \times K}$, which consists of $K$ signal vectors on its columns. The signals in $X$ are assumed to be smooth on the unknown global graph $\mathcal{G}$. The Laplacian matrix of the global graph is further given by $L = D - W$, where $W$ is the global weight matrix and $D$ is the corresponding degree matrix, which can be computed as $D = \mathrm{diag}(W\mathbf{1})$, where $\mathbf{1}$ is the column vector of ones and $\mathrm{diag}(\cdot)$ forms a diagonal matrix from the input vector elements. $L$ is a priori unknown, but it belongs to the set of valid Laplacians, composed of symmetric matrices with non-positive off-diagonal elements and zero row sum:

$$\mathcal{L}_{\text{valid}} = \left\{ L \in \mathbb{R}^{N \times N} \;\middle|\; L = L^\top,\; L_{ij} \leq 0 \;\; \forall i \neq j,\; L\mathbf{1} = \mathbf{0} \right\}, \tag{1}$$

where $\mathbf{0}$ is the column vector of zeros.

III-B Mask Combination of Layers

Adopting the multi-layer graph and signal representation model mentioned above, we cast the problem of learning the global graph as the problem of learning the proper combination of the graph layers. While each graph layer encodes a different type of relationship existing on the vertex space, the multiple graph layers might have some connections that are redundant or even irrelevant to the global graph structure. This requires occasional addition or removal of some edges from the layers while combining them into the global graph. For this purpose, we propose an original masking technique, which has the flexibility to properly integrate the relevant information from the layer topologies and to simultaneously adapt the global graph to the structure of the signals. We introduce the combination of layers as a masked sum of the weight matrices of the graph layers:

$$W = \sum_{m=1}^{M} M_m \circ W_m, \tag{2}$$

where $\circ$ represents the Hadamard (element-wise) product between two matrices: the weight matrix of layer-$m$, denoted as $W_m$, and the symmetric and non-negative mask matrix corresponding to layer-$m$, denoted as $M_m$. The mask matrices are stacked into a variable $\mathbf{M} = \{M_m\}_{m=1}^{M}$, which is eventually optimized to infer the global graph structure. In general, the relations given in different layers may not have the same importance in the global graph. Hence, at an arbitrary edge between vertex-$i$ and vertex-$j$, the proposed algorithm learns distinct mask elements for each layer, for instance $M_1(i,j)$ at layer-1 and $M_2(i,j)$ at layer-2.

We finally define a function that computes the Laplacian matrix of the mask combination given by a set of mask matrices:

$$\mathcal{L}(\mathbf{M}) = \mathrm{diag}\Big( \sum_{m=1}^{M} (M_m \circ W_m)\,\mathbf{1} \Big) - \sum_{m=1}^{M} M_m \circ W_m. \tag{3}$$
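To make the construction concrete, a minimal NumPy sketch of the mask combination (2) and the Laplacian operator (3) could read as follows; the function names are ours, introduced only for illustration:

```python
import numpy as np

def mask_combination(masks, layer_weights):
    # Eq. (2): W = sum_m M_m o W_m, a Hadamard product per layer.
    return sum(M * W for M, W in zip(masks, layer_weights))

def laplacian_of_combination(masks, layer_weights):
    # Eq. (3): L(M) = diag(W 1) - W, for the combined weight matrix W.
    W = mask_combination(masks, layer_weights)
    return np.diag(W.sum(axis=1)) - W
```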

III-C Problem Formulation

Our task now is to infer the global graph $\mathcal{G}$, on which the signal set $X$ has smooth variations. Hence, in the objective function, we employ the well-known graph regularizer $\mathrm{tr}(X^\top L X)$, which measures the smoothness of the signal set on the global graph Laplacian $L$. The optimization problem boils down to learning a set of mask matrices, $\mathbf{M}$. Within certain masking constraints, it captures the connections that are consistent with the structure of the signals from the multi-layer graph representation and yields a mask combination of the layers. In addition, we introduce a corrective term, $L_c$, which makes a transition from the mask combination obtained from the given layers to the global graph that fits the observed signals within the smooth signal representation model. By summing it with the Laplacian of the mask combination, we express the global graph Laplacian as

$$L = \mathcal{L}(\mathbf{M}) + L_c,$$

which is the ultimate output of the algorithm. The Frobenius norm of $L_c$ permits adjusting the impact of the corrective Laplacian on the global graph. The overall optimization problem is finally expressed as follows:

$$\begin{aligned} \min_{\mathbf{M},\, L_c} \quad & \mathrm{tr}(X^\top L X) + \alpha \|L_c\|_F^2 \\ \text{s.t.} \quad & L = \mathcal{L}(\mathbf{M}) + L_c, \quad L \in \mathcal{L}_{\text{valid}}, \\ & M_m = M_m^\top, \;\; M_m \geq 0, \quad m = 1, \dots, M, \\ & \textstyle\sum_{m=1}^{M} M_m(i,j) = 1 \quad \forall (i,j), \\ & \mathrm{tr}(L) = V, \end{aligned} \tag{4}$$

where $\alpha$ is a weight parameter. The last constraint, on the trace of the global graph Laplacian $L$, fixes the volume of the global graph. It is set to a non-zero value, i.e., $V > 0$, in order to avoid the trivial solution, i.e., the null global graph. It can be considered as the normalization factor fixing the sum of all the edge weights in the global graph so that the relative importance of the edges can be interpreted properly. The mask matrices are constrained to be symmetric and non-negative, which leads to a symmetric mask combination $\mathcal{L}(\mathbf{M})$. The global graph Laplacian $L$ is constrained to be a valid Laplacian. Consequently, $L_c$ is forced to be a symmetric matrix, but it does not have to be a valid graph Laplacian matrix. In this regard, $L_c$ provides the possibility to make a subtraction from the mask combination as well as to add more weight on top of the mask combination.

We also put a constraint on the mask elements and set the search space of the mask matrices to yield a unity sum, i.e., $\sum_{m=1}^{M} M_m(i,j) = 1$, as stated in (4). This establishes a dependency between the mask elements corresponding to the same edge at each layer, so that the contribution of the layers at a particular connection between vertex-$i$ and vertex-$j$ is normalized. As a result of the unity sum constraint on the masks, the weight elements of the mask combination are confined to the weight range delivered by the layers:

$$\min_m W_m(i,j) \;\leq\; W(i,j) \;\leq\; \max_m W_m(i,j). \tag{5}$$

Such a restriction is actually important to keep the weight values of the global graph in a reasonable range, which is desired for the weight prediction task. Note that dismissing an arbitrary edge-$(i,j)$ from the mask combination is possible if $\min_m W_m(i,j) = 0$, i.e., a connection is not defined between vertex-$i$ and vertex-$j$ in at least one of the layers.

The objective function in (4) is linear with respect to the mask matrices due to the first term, and it is quadratic with respect to the corrective Laplacian due to the second term. All the constraints are linear with respect to the optimization variables. Therefore, the problem is convex and it can be efficiently solved by quadratic programming.
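Since all constraints are linear and the objective is quadratic, the problem can be prototyped directly in a convex modeling tool. Below is a minimal sketch in Python with CVXPY (our substitution; the authors report using the MATLAB CVX toolbox), which for brevity omits restricting the mask variables to the edge supports of the layers:

```python
import cvxpy as cp
import numpy as np

def mask_learning(layer_weights, X, alpha, V):
    # Sketch of problem (4). layer_weights: list of M symmetric weight matrices W_m;
    # X: N x K signal matrix; alpha: corrective-term penalty; V: graph volume tr(L).
    N = X.shape[0]
    masks = [cp.Variable((N, N), symmetric=True) for _ in layer_weights]
    L_c = cp.Variable((N, N), symmetric=True)

    W = sum(cp.multiply(M, Wm) for M, Wm in zip(masks, layer_weights))  # Eq. (2)
    L = cp.diag(cp.sum(W, axis=1)) - W + L_c                            # L = L(M) + L_c

    constraints = [M >= 0 for M in masks]           # non-negative masks
    constraints += [sum(masks) == 1]                # unity sum across layers
    constraints += [L - cp.diag(cp.diag(L)) <= 0,   # non-positive off-diagonals
                    L @ np.ones(N) == 0,            # zero row sum (valid Laplacian)
                    cp.trace(L) == V]               # fixed volume

    objective = cp.Minimize(cp.trace(X.T @ L @ X) + alpha * cp.sum_squares(L_c))
    cp.Problem(objective, constraints).solve()
    return [M.value for M in masks], L_c.value
```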

III-D Discussion

For an $N$-vertex data space, the algorithm solves for $N(N-1)/2$ variables for the corrective term $L_c$ (its symmetric off-diagonal entries, the diagonal being fixed by the zero row-sum condition), and for as many variables as the number of edges given by the layers for the masks. Yet, the constraints on the mask elements and the global graph Laplacian narrow down the search space considerably. Accordingly, the optimization problem solves on the order of $(M+1)\,N(N-1)/2$ optimization variables in the worst case.

In problem (4), we need to set two parameters: $\alpha$ and $V$. First, the parameter $\alpha$ adjusts the impact of the corrective Laplacian $L_c$ on the global graph Laplacian $L$. As $\alpha$ approaches infinity, there is a full penalty on $L_c$; hence, problem (4) behaves as a constrained optimization problem where $L_c$ is null. In the other extreme case, where $\alpha = 0$, $L_c$ fully defines the global graph structure, which may cancel out all the edges of the mask combination $\mathcal{L}(\mathbf{M})$ and leave only the edges constituting the graph paths along which the signals are the smoothest. In this regard, the parameter $\alpha$ can be set according to the reliability of the multi-layer graph representation and the accuracy of the smooth signal representation model. A reliable multi-layer graph representation means that the edge set given by the layers is sufficient to infer the global graph topology, and that the constraints on the mask elements fit the weight range of the global graph. In other words, the multi-layer graph representation is highly coherent with the global graph structure connoted by the signals, in which case $\alpha$ can be set to a high value. A reliable signal representation, on the other hand, implies the existence of a signal set composed of many clean signals that are sufficient to support the smooth signal representation model. In the case where the observed signals are more reliable than the given multi-layer graphs, $\alpha$ must be set to a small value.

Second, the value of the parameter $V$ determines the volume of the global graph. In addition, it has a direct effect on the sparsity of the global graph; in practice, it can be chosen to ensure the desired sparsity level. When $\alpha$ is very large, $L_c$ is solved as a null matrix, which indicates that the global graph is directly equal to the mask combination, i.e., $L = \mathcal{L}(\mathbf{M})$. In that case, due to the relation in (5), $V$ has to be set in the range given by the layers, i.e.,

$$\sum_{i,j} \min_m W_m(i,j) \;\leq\; V \;\leq\; \sum_{i,j} \max_m W_m(i,j),$$

so that the problem in (4) has a solution. The lower limit corresponds to the topology composed of the common edges across the layers, and the upper limit corresponds to the topology given by the union of the layers. In other words, by choosing a very large value for $\alpha$, one acknowledges the full reliability of the multi-layer graph representation. This pushes the global graph to have the topology and the weight range determined by the layers. Decreasing the parameter $\alpha$ relaxes this restriction, which enlarges the solution space for the global graph.

IV Experiments

We compare the global graph recovery performance of our method (ML) against state-of-the-art graph learning algorithms. First, we compare against the graph learning algorithm that we consider as the baseline [10], which is referred to as GL-SigRep. To make a fair assessment, we also compare our method to another version of GL-SigRep, where the graph learning algorithm is informed of the input layers by restricting its solution space to the set of edges given by the layers:

$$\begin{aligned} \min_{L} \quad & \mathrm{tr}(X^\top L X) + \alpha \|L\|_F^2 \\ \text{s.t.} \quad & L \in \mathcal{L}_{\text{valid}}, \quad \mathrm{tr}(L) = V, \\ & L(i,j) = 0 \quad \forall (i,j) \notin \mathcal{E}_1 \cup \cdots \cup \mathcal{E}_M. \end{aligned} \tag{6}$$

We refer to this method as GL-informed.

We also compare against the optimal convex combination of the layers. Inspired by the method for learning a convex combination of multiple graph Laplacians introduced in [3], we obtain the following optimization problem:

$$\begin{aligned} \min_{c_1, \dots, c_M} \quad & \mathrm{tr}(X^\top L X) \\ \text{s.t.} \quad & L = \sum_{m=1}^{M} c_m L_m, \quad c_m \geq 0, \quad \sum_{m=1}^{M} c_m = 1, \end{aligned} \tag{7}$$

where we learn the coefficients $c_m$ of the convex combination of the layer Laplacians $L_m$ to reach the global graph Laplacian $L$. Throughout this section, the algorithm solving problem (7) is referred to as GL-conv.
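A sketch of GL-conv in the same CVXPY style, again an illustration rather than the authors' implementation, could be:

```python
import cvxpy as cp
import numpy as np

def gl_conv(layer_laplacians, X):
    # Sketch of problem (7): smoothest convex combination of the layer Laplacians.
    c = cp.Variable(len(layer_laplacians), nonneg=True)
    L = sum(c[m] * Lm for m, Lm in enumerate(layer_laplacians))
    cp.Problem(cp.Minimize(cp.trace(X.T @ L @ X)), [cp.sum(c) == 1]).solve()
    return c.value
```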

For the quantitative assessment of the link prediction performance, we employ the following evaluation metrics: precision, recall and F-score [22]. We also compute the mean squared error (MSE) of the inferred weight matrix for the assessment of the weight prediction performance. We solve the problems ML (4), GL-informed (6), GL-SigRep [10] and GL-conv (7) via quadratic programming, for which we utilize the CVX toolbox [13] with the SDPT3 and MOSEK [2] solvers; the code is available online (https://github.com/bayrameda/MaskLearning).
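For completeness, these metrics can be computed from the ground truth and inferred weight matrices along the following lines (a helper of our own, thresholding tiny weights to decide edge presence):

```python
import numpy as np

def graph_recovery_metrics(W_true, W_est, tol=1e-8):
    # Link prediction metrics over the upper-triangular edge sets, plus MSE on weights.
    t = np.triu(np.abs(W_true) > tol, k=1)
    e = np.triu(np.abs(W_est) > tol, k=1)
    tp = np.logical_and(t, e).sum()
    precision = tp / max(e.sum(), 1)
    recall = tp / max(t.sum(), 1)
    f_score = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f_score, np.mean((W_true - W_est) ** 2)
```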

IV-A Experiments on Synthetic Data

In this section, we run experiments on two different scenarios. First, we generate the global graph in a fully complementary scenario where the mask combination of the layers is directly equal to the global graph. Second, we test the algorithms on a non-fully complementary scenario where the global graph is created from a perturbation on the topology of the mask combination. For both cases, we generate the mask combination and the signal set as follows:
Generation of layers and the mask combination. First, the vertex space is established with $N$ vertices whose coordinates are generated randomly on the 2D unit square with a uniform distribution. An edge set is constructed between the vertices whose Euclidean distance is under a certain threshold. The edge weights are computed by applying a Gaussian kernel, i.e., $w(i,j) = \exp\!\big(-d(i,j)^2 / 2\sigma^2\big)$, where $d(i,j)$ is the distance between vertex-$i$ and vertex-$j$ and $\sigma$ is the kernel width. For the generation of the layers, the vertex set is randomly separated into two neighborhood groups. All the edges connecting the vertices in one group to all vertices in the vertex space are reserved to construct one graph layer, which yields two layers in total. The weights of these edges are used for constructing the layer weight matrices. Only the in-group edges whose weights are above 0.8 are reserved for masking. Accordingly, they determine the corresponding non-zero entries in the mask matrices for each layer. All the common edges between the layers are also kept non-zero on the mask matrices. Then, the weight matrix of the mask combination is computed via the formulation given in (2). Later, the global graph is produced according to one of the experimental scenarios explained in the following sections.
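A compact sketch of this generation procedure, with illustrative values for the vertex count, kernel width and distance threshold (the text does not fix them here), could be:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, d_max = 20, 0.5, 0.5   # illustrative values, not taken from the paper

coords = rng.uniform(size=(N, 2))                        # vertices on the 2D unit square
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

W_full = np.exp(-d**2 / (2 * sigma**2)) * (d < d_max)    # Gaussian-kernel weights
np.fill_diagonal(W_full, 0.0)

group = rng.permutation(N) < N // 2                      # random split into two groups
layers = [W_full * (group[:, None] | group[None, :]),    # edges touching group 1
          W_full * (~group[:, None] | ~group[None, :])]  # edges touching group 2
```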
Signal Generation. Following the generation of the mask combination and the global graph, the global graph Laplacian matrix $L$ is computed. Using that, a number of smooth signals are generated according to the generative model introduced in [10]. Basically, the graph Fourier coefficients $\hat{x}$ of a sample signal can be drawn from the following distribution:

$$\hat{x} \sim \mathcal{N}\big(\mathbf{0},\, \Lambda^{\dagger}\big), \tag{8}$$

where $\Lambda^{\dagger}$ is the Moore-Penrose pseudo-inverse of $\Lambda$, which is set as the diagonal eigenvalue matrix of $L$. The eigenvalues, which are associated with the main frequencies of the graph, are sorted on the main diagonal of $\Lambda$ in ascending order. Thus, the signal Fourier coefficients corresponding to the low-frequency components are selected from a normal distribution with a large variance, while the variance of the coefficients decreases towards the high-frequency components. In other words, the signal is produced to have most of its energy in the low frequencies, which enforces smooth variations in the expected signal over the graph structure. A signal vector $x$ is then calculated from $\hat{x}$ through the inverse graph Fourier transform, $x = \chi \hat{x}$, where $\chi$ is the eigenvector matrix of $L$ [24].
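In code, this generative model can be sketched as follows, assuming a NumPy Laplacian L; the eigendecomposition provides both the graph frequencies and the inverse GFT basis:

```python
import numpy as np

def generate_smooth_signals(L, num_signals, rng=None):
    # Eq. (8): draw graph Fourier coefficients from N(0, pinv(Lambda)),
    # then map back to the vertex domain with the inverse GFT.
    rng = np.random.default_rng() if rng is None else rng
    eigvals, chi = np.linalg.eigh(L)              # ascending eigenvalues, GFT basis chi
    var = np.where(eigvals > 1e-10, 1.0 / np.maximum(eigvals, 1e-10), 0.0)
    x_hat = rng.normal(scale=np.sqrt(var)[:, None], size=(L.shape[0], num_signals))
    return chi @ x_hat                            # inverse graph Fourier transform
```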

IV-A1 Fully Complementary Scenario

We first conduct experiments where the global graph is directly equal to the mask combination. We refer to this data generation setting as the fully complementary scenario, since the edge set of the global graph is fully covered by the union of the layers. We generate 50 smooth signals on the global graph. Its volume is normalized to the number of vertices, $N$. GL-informed (6) already learns a graph with a volume of $N$; therefore, we set the parameter $V = N$ in ML as well. The volume of the graph learned by GL-conv (7) is also normalized to $N$ for a fair comparison of the MSE score. This experimental scenario (generation of fully complementary layers, global graph and signal set) is repeated 20 times, and the performance metrics are averaged over these 20 instances. The findings are summarized in Table I.

Following the discussion on the selection of the parameter $\alpha$ in Section III-D, we choose a very large value for it while running ML, which forces the corrective term to be a null matrix. Consequently, the global graph is inferred to be directly equal to the mask combination. Note that GL-conv yields a large gap between the recall and the precision rates, since it either picks the edge set of a layer as a whole or not at all. Therefore, it is not able to perform an edge-specific selection, which leads to a poor F-score compared to the other methods. The global graph recovery performance of GL-informed is presented as a surrogate of GL-SigRep, since in the fully complementary setting the solution for the global graph already lies in the edge set given by the layers. The MSE scores of ML and GL-conv are better than that of GL-informed. This is due to the fact that ML and GL-conv have better guidance on the weight prediction task, as they confine the weight values of the global graph to the interval introduced by the layers, which is expressed in (5) for ML.

                        Precision  Recall   F-score  MSE
Global graph recovery
  ML                    86.98%     90.79%   88.84%   1.6E-03
  GL-informed           81.26%     88.91%   84.48%   2.6E-03
  GL-conv               63.82%     100%     77.41%   2.1E-03
Mask recovery
  ML                    92.57%     94.88%   93.68%   -
TABLE I: Global graph recovery and mask recovery performance

Finally, ML achieves good rates on the mask recovery performance, which measures how correctly the algorithm selects the edges from each layer to form the mask combination.

Fig. 2: Ground truth global graph and the solution given by ML

IV-A2 Non-fully Complementary Scenario

In this section, we test the algorithms in experiments where the data is generated with different levels of multi-layer representation quality and signal representation quality, so that we can analyze their effects on the global graph recovery performance. First, to create the global graph, we deviate from the exact mask combination by perturbing its topology to some degree. Basically, we randomly rewire a set of edges existing on the mask combination to positions outside the union of the graph layers. The degree of such a perturbation on the mask combination can be measured by a term called coverability, which is introduced in [21]. Coverability is the proportion of the global graph edges that are given by the union of the layers to all the edges on the global graph. Indeed, it measures how much the multi-layer graph representation covers the global graph, and it is 1 when the global graph is fully covered by the layers, i.e., the fully complementary case studied above. The larger the number of edges perturbed on the topology of the mask combination, the more the global graph is diverted from the multi-layer graph representation, which decreases the coverability. Consequently, the multi-layer representation quality drops. A demonstration is provided in the top row of Fig. 2, where the global graph is generated with coverability 0.7. Here, the set of edges outside the mask combination is shown in green. As seen in the bottom row of Fig. 2, ML manages to predict some edges not given by the multi-layer graph representation, owing to the contribution of the corrective term in (4).

Fig. 3: Performance of ML with different $\alpha$ values vs coverability
Fig. 4: Performance of the algorithms vs coverability

Effect of multi-layer representation quality. Here, we test the performance of ML in non-fully complementary settings with different coverability values and different values of $\alpha$. We conduct each experiment with signal sets composed of 50 signals generated on the global graph as explained before. We average the performance metrics over 20 experiments in Fig. 3. The following observations can be made: (i) When coverability has the lowest value (0.4), ML with the smallest tested $\alpha$ has the best performance. (ii) When it has the highest value (1), which corresponds to the fully complementary setting, ML with the largest tested $\alpha$ has the best performance. (iii) Whatever value is chosen for the parameter $\alpha$, the performance of ML gets better with increasing coverability. Considering these facts, choosing a smaller value for the parameter $\alpha$ seems to be a good remedy in low coverability settings. Yet, this degrades the performance slightly in the high coverability settings, which confirms the theoretical analysis given in Section III-D. Hence, if there is no prior knowledge on the reliability of the multi-layer graph representation or the signal representation, one may prefer to use small values of $\alpha$, compromising a small decay in performance when the multi-layer graph representation is highly reliable. Moreover, the performance of ML improves as the global graph approaches the mask combination of the layers. This is simply because the algorithm bases the global graph on top of the mask combination, and any modification made on it by the corrective term is subject to an extra cost and thus limited. Therefore, ML with any $\alpha$ value performs best when the mask combination is directly equal to the global graph, which is possible only in the fully complementary setting. Still, the corrective term improves the performance in the non-fully complementary settings. Given the plots in Fig. 3, an appropriate value of $\alpha$ can further be identified for each coverability interval, with larger values suiting higher coverability.

We now adopt these values of $\alpha$ to present the performance of ML against the competitor algorithms in Fig. 4, again averaging over 20 different instances. Beginning with the performance of GL-informed in Fig. 4, we see that its performance improves steadily with increasing coverability ratio, and it outperforms GL-SigRep at high coverability. The coverability ratio is irrelevant to the performance of GL-SigRep since it receives no multi-layer guidance; hence, the fluctuations as coverability changes can be disregarded. Nonetheless, its performance slightly drops in low coverability settings. This is because the edges of the global graph are rewired randomly outside the union of the layers, which pushes the graph towards a random network. It is acknowledged in [10] that graph learning from smooth signals has slightly lower performance on random network structures than on regular networks. Still, in Fig. 4, the performance of GL-SigRep (black line) should be considered as a reference, since it is the least affected by the coverability. Furthermore, the trend of ML (blue line) appears more resistant than that of GL-informed in low coverability settings, thanks to the corrective term. The performance of ML approaches GL-SigRep as coverability decreases, since the multi-layer guidance diminishes. Yet, it manages to keep its F-score above GL-SigRep even where the coverability is low. The MSE of GL-conv follows a similar path to ML. Yet, ML achieves a lower MSE due to the flexibility in the edge selection process and the corrective term. The F-score of GL-conv, on the other hand, is inferior compared to the other methods, since it simply merges the topology of the layers without an edge selection process.

Fig. 5: Performance of the algorithms vs number of signals
Fig. 6: Performance of the algorithms vs signal quality

Effect of signal representation quality. Here, we use a fixed coverability of 0.7 to generate the global graph, and the parameter $\alpha$ for ML is set to the value found appropriate for this coverability level above. We first evaluate the global graph recovery of the algorithms by generating different numbers of signals on the global graph. The findings are averaged over 20 different instances of this scenario and plotted in Fig. 5. Then, we measure the performance of the algorithms on signal sets with different signal-to-noise ratio (SNR) values, which is given in Fig. 6. To do that, we generate noise from a normal distribution with different variance values and add it to the signal set. As expected, all the methods but GL-conv achieve better performance as the number of signals increases or as the noise power drops. GL-conv, on the other hand, is the least affected by the changes in the number of signals. The strictness of the convex combination constraint permits obtaining a similar combination even when there are few signals or noisy signals. Yet, this also prevents its performance from improving under high signal representation quality. For instance, in Fig. 5, ML achieves a lower MSE than GL-conv when there is a high number of signals. Based on the plots in Fig. 4, it is already known that around 0.7 coverability, ML achieves a good performance, followed by GL-SigRep and GL-informed. This is also confirmed by the plots in Figs. 5 and 6. GL-SigRep is the method most affected by the signal quality, since it is not able to compensate the learning procedure for the lack of knowledge in the signal set. On the other hand, ML is resistant to changes in the signal quality, since it exploits the multi-layer guidance. In addition, ML permits flexibility in the learning scheme by adjusting the parameter $\alpha$ according to the signal quality. For example, in Fig. 6, at low SNR we use a small $\alpha$, so that the learning process relies more on the multi-layer graph representation. Therefore, ML manages to perform better than the competitor algorithms in low SNR conditions.

IV-B Learning from Meteorological Data

We now present experiments on real datasets, focusing first on the meteorological data provided by the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss) (https://www.meteoswiss.admin.ch/home/climate/swiss-climate-in-detail/climate-normals/normal-values-per-measured-parameter.html). The dataset is a compilation of 17 types of measurements, including temperature, snowfall, precipitation, humidity and sunshine duration, recorded at weather stations distributed over Switzerland. Monthly normals and yearly averages of the measurements, calculated over the period 1981-2010, are available at 91 stations. For the stations, we are also given geographical locations in GPS format and altitude values, i.e., meters above sea level. We use each type of measurement as a different set of observations to feed the graph learning framework. Our goal is to explain the similarity pattern of each type of measurement with the help of the geographical location and altitude of the stations.
Multi-Layer Graph Representation. We construct a 2-layer graph representation where the vertices are the stations, connected based on GPS proximity in one layer and altitude proximity in the other. We construct the layers as unweighted graphs by inserting an edge between two stations whose Euclidean distance is below a threshold, which is chosen to reach a desired edge sparsity level. Consequently, each graph layer has approximately the same number of edges, so that the edge selection process during mask learning is not biased towards any layer. We normalize the adjacency matrices of the layers to fix the volume of each graph layer to the number of vertices, $N$, which is also used as the value of the parameter $V$ in ML.
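As an illustration of this construction, the following sketch builds one unweighted layer from a feature table (GPS coordinates or altitudes) by thresholding pairwise distances to hit a target edge count and normalizing the volume to N; the function and variable names are ours, not the authors':

```python
import numpy as np

def threshold_layer(features, num_edges):
    # Unweighted layer: connect the vertex pairs with the smallest pairwise
    # feature distances, then normalize the volume (sum of degrees) to N.
    feats = np.asarray(features, dtype=float).reshape(len(features), -1)
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    iu = np.triu_indices(len(feats), k=1)
    thresh = np.sort(d[iu])[num_edges - 1]   # distance reaching the target edge count
    A = (d <= thresh).astype(float)
    np.fill_diagonal(A, 0.0)
    return A * len(feats) / A.sum()

# e.g., gps_layer = threshold_layer(gps_coords, E); alt_layer = threshold_layer(altitudes, E)
```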

IV-B1 Learning Masks from Different Sets of Measurements

Fig. 7: Year average of temperature and precipitation
Fig. 8: Sparsity pattern of the layers and the masks with respect to year average of temperature

We test the mask learning algorithm on different types of observations separately. We use the monthly normals of the measurements as the signal set, which makes the number of signals $K = 12$. The yearly averages are not used for graph learning; instead, they are used for a qualitative and visual assessment of the learned graph. We assume that the similarity between the measurement patterns of two stations must be explained either by geographical proximity or by elevation similarity. For this reason, we adjust ML to learn a global graph structure under the fully complementary assumption, and thus we set $\alpha$ to a very large value. It is then possible to interpret the significance of geographical proximity and altitude proximity in the formation of each type of observation by examining the mask matrices inferred by ML.

Measurement          GPS   Altitude
Temperature          36%   64%
Snowfall (cm)        37%   63%
Humidity             51%   49%
Precipitation (mm)   52%   48%
Cloudy days          65%   35%
Sunshine (h)         54%   46%
TABLE II: Contribution of the layers to the structure of different measurements

In Table II, the percentage of connections that ML draws from the GPS layer and the altitude layer is given for the different types of measurements used as signals. To begin with temperature, its structure appears highly coherent with altitude similarity, considering the percentage contribution of each layer. We further check the yearly temperature averages, which are shown in Fig. 7. According to these, Bern and Aadorf are the stations providing the most similar averages. Indeed, an edge is inferred between them on the global structure of the temperature measurements, and it is extracted from the altitude layer, where the two stations are connected within a 14 m elevation distance. The correlation between temperature measurements and altitude is also noted by the authors in [10]. Similar to temperature, snowfall is also anticipated to be highly correlated with the altitude of the stations. This is also what is derived by ML, which draws more connections from the altitude layer than from the GPS layer, as given in Table II. The ‘cloudy days’ measurement, however, is found to be highly coherent with GPS proximity, drawing 65% of its connections from the GPS layer. Next, humidity, precipitation and sunshine are evenly correlated with both the GPS and altitude layers, according to Table II. Given the yearly averages of precipitation shown in Fig. 7, Geneva and Nyon have the closest records. As seen, they are also quite close on the map, and thus their connection on the global graph of precipitation is drawn from the GPS layer. In addition, Fey and Sion are the stations providing the lowest records on average, and their connection is also drawn from the GPS layer. On the other hand, Col du Grand-Saint-Bernard and Säntis display the highest records, and they are connected in the altitude layer, with a 30 m elevation distance between them.

Furthermore, in Fig. 8, we visualize the layer adjacency matrices and the inferred mask matrices by sorting the vertices (stations) with respect to their yearly average temperature. Recall from Table II that the altitude layer is found to be dominant in explaining similarities in temperature. This is also evident from the connectivity patterns of the layers, shown on the left of Fig. 8. The GPS layer connectivity is distributed broadly, whereas the altitude layer connections gather around the main diagonal, which contains the edges between vertices that are similar in yearly average. On the right of Fig. 8, we see that the inferred mask matrices for both layers are organized along the diagonal. This indicates that the algorithm manages to dismiss the connections that are irrelevant to the similarity pattern of temperature, especially on the GPS layer.

IV-B2 Signal Inpainting on the Global Graph

We now prepare a signal inpainting experiment to point out the benefits of learning a proper global graph representation. We consider the monthly normals of the temperature measurements as the signal set, defined on the vertex set composed of the stations providing temperature measurements. Then, a graph structure is inferred from those observations using GL-SigRep. In addition, by taking the multi-layer graph representation into account, a global graph structure is inferred using GL-informed, GL-conv and ML. During the graph learning process, we train the algorithms on the measurements of 11 months and then try to infer the measurements of the remaining month via inpainting. In the inpainting task, we remove the values of the signal on half of the vertices, selected randomly. Our aim is to recover the signal values on the whole vertex space by leveraging the known signal values and the learned graph. We solve the following graph signal inpainting problem [26]:

$$\min_{y} \; \|Sy - x\|_2^2 + \gamma\, y^\top L y, \tag{9}$$

which has the closed-form solution

$$y = (S^\top S + \gamma L)^{-1} S^\top x, \tag{10}$$

where $x$ is the vector containing the signal values known to the algorithms, $y$ is the vector that contains the recovered signal values on all the vertices, and $\gamma$ weights the smoothness prior. $S$ is a mapping matrix reducing $y$ to a vector whose entries correspond to the vertex set with the known signal values. Therefore, $S^\top S$ is a diagonal matrix whose non-zero entries correspond to this vertex set.
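A direct NumPy sketch of the closed-form solution (10), with an illustrative default for the smoothness weight, could be:

```python
import numpy as np

def inpaint(L, x_known, known_idx, gamma=1.0):
    # Eq. (10): y = (S^T S + gamma L)^{-1} S^T x, with S selecting the known vertices.
    # The system is invertible for a connected graph with at least one known value.
    S = np.zeros((len(known_idx), L.shape[0]))
    S[np.arange(len(known_idx)), known_idx] = 1.0
    return np.linalg.solve(S.T @ S + gamma * L, S.T @ x_known)
```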

We repeat the graph learning and inpainting sequence over 12 instances, where the number of signals used in the graph learning part is 11 and the inpainting is conducted on the values of a different month each time. We calculate the MSE between the original signal vector and the recovered signal vector. In addition, we compute the mean absolute percentage error (MAPE), which measures the relative absolute error with respect to the original signal magnitudes. We average the performance metrics over the instances for each algorithm used in the graph learning part, as given in Table III.

                MSE    MAPE
GL-SigRep [10]  0.472  12.6%
GL-informed     0.375  13.2%
GL-conv         1.240  14.8%
ML              0.347  10.7%
TABLE III: Signal inpainting performance of the algorithms

During this experiment, we set $V = N$ for ML, and we normalize the volume of the graph obtained by GL-conv to $N$ to provide a fair comparison. Based on the results, GL-conv performs poorly compared to the other methods, which can be explained by its lack of adaptability to the given signal set. Recall that it finds a convex combination of the given graph layers in order to fit the smooth signals, which is not very flexible due to the tight search space. GL-SigRep, on the other hand, manages to outperform it by learning the structure directly from the signals. GL-informed performs better than GL-SigRep in terms of MSE, which indicates that knowing the multi-layer graph representation brings certain advantages. By taking this advantage and coupling it with flexibility in adapting to the signal set, ML leads to a better inpainting performance than the competitors, both in terms of MSE and MAPE.

IV-C Learning from Social Network Data

Finally, we test our algorithm on the social network dataset provided by [21] (http://deim.urv.cat/~alephsys/data.html). It consists of five kinds of relationship data among 62 employees of the Computer Science Department at Aarhus University (CS-AARHUS), including Facebook, leisure, work, co-authorship and lunch connections. For the experiment, we separate the people into two groups: the first group is composed of the 32 people having a Facebook account, hence it forms the Facebook network; the second group contains every other person eating lunch with someone in the first group, and its cardinality is 26. We consider the binary matrix that stores the lunch records between the two groups as the signal matrix. Our target task is a graph learning problem where we want to discover the lunch connections inside the Facebook group by looking at the lunch records between the two groups. For the graph learning problem, we revive the “friend of my friend is my friend” logic through the smoothness of the signal set. In other words, we assume that two people in the Facebook group having lunch with the same person in the second group will probably have lunch together. Then, via the mask learning scheme, we exploit the Facebook and work connections among the people in the Facebook group. Hence, the inputs of the mask learning algorithm are (i) the multi-layer graph representation formed by the Facebook and work layers over the Facebook group, which makes the number of vertices in the graph representation 32, and (ii) the signal set consisting of the lunch records taken on the second group, which makes the number of signals 26. The output is then the lunch network of the Facebook group. The number of edges is 124 in the Facebook layer and 68 in the work layer. The coverability of the union of the Facebook and work layers on the ground truth lunch network is 0.84, since the lunch network has 10 connections that do not exist in either of the layers. The ground truth lunch network and the one inferred by ML are presented in Fig. 9, together with a color code for the layers.

Fig. 9: Performance of ML on CS-AARHUS data

We compare the performance in terms of the recovery of the lunch network for the following graph learning algorithms: ML, GL-informed, GL-SigRep, and the power-sociomatrix introduced by [21]. The performance metrics given in Table IV are calculated with respect to the ground truth lunch network, and they measure only the link prediction performance, since the networks are unweighted. In addition to precision, recall and F-score, we use the Jaccard index in order to measure the similarity between the inferred graph and the ground truth graph. In [21], the Jaccard index of two networks is computed as the proportion of their intersection to their union, and it equals 1 when the two have identical topology.

                         Jaccard  Recall  Precision  F-score
power-sociomatrix [21]
  {FB}                   35%      77%     39%        51%
  {Work}                 31%      50%     46%        48%
  {FB, Work}             34%      84%     37%        51%
GL-SigRep [10]           48%      64%     66%        65%
GL-informed              45%      63%     61%        62%
ML                       58%      69%     79%        74%
TABLE IV: Performance of the methods in recovering the lunch network

Regarding the Jaccard index and the F-score, ML performs best at recovering the lunch network by exploiting the multi-layer representation and the signal set at the same time. With the power-sociomatrix, we obtain all possible combinations of the layers: (i) only the Facebook layer, referred to as {FB}, (ii) only the work layer, referred to as {Work}, and (iii) the union of the two layers, referred to as {FB, Work}. Note that the recall value stated for {FB, Work} also gives the coverability of the multi-layer graph representation, which is computed by dividing the number of lunch connections given by the Facebook or the work layer by the total number of lunch connections. The power-sociomatrix achieves only a limited F-score and Jaccard index, since it depends on a simple merging of the two layers without an edge selection process. Then, despite the reasonable coverability rate, GL-informed cannot reach the performance of GL-SigRep, which implies that the signal representation quality is better than the multi-layer representation quality for reaching the global graph structure. Yet, when we repeat the experiment with fewer signals, we observe that GL-informed outperforms GL-SigRep once the multi-layer graph representation becomes more informative than the signals. The related results are plotted in Fig. 10, where we train the algorithms with a different number of signals, $K$, in each experiment. Here, the signal set is randomly formed from the lunch records with the corresponding $K$, and the F-score is averaged over 10 such instances. For ML, we set $\alpha$ to a smaller value when $K$ is small, so that the algorithm depends more on the multi-layer graph representation to compensate for the lack of knowledge on the signal side. This permits ML to adapt to different conditions and to outperform the competitor methods, as seen in Fig. 10.

Fig. 10: Performance of the graph learning algorithms vs number of signals in lunch data

V Conclusion

In this paper, we have introduced a novel method to learn a global graph that explains the structure of a set of smooth signals using the partial information provided by multi-layer graphs. In comparison to state-of-the-art graph learning methods, the proposed algorithm accepts an additional input, which is the multi-layer graph representation of the data space. This permits profiting from additional domain knowledge in the learning procedure. Our new graph inference algorithm flexibly adjusts the learning procedure between the signal representation model and the multi-layer graph representation, which permits adapting to the quality of the input data. The algorithm further outputs the mask combination of the layers, which indicates the relative relevance of the multi-layer graphs in inferring the global structure of the signals.

The future work will focus on different signal representation models that can be defined by a multi-layer graph representation. Moreover, different masking techniques can also be developed considering some examples encountered in real-world data, such as node-independent or locally consistent masking of the graph layers.

References

  • [1] A. Aleta and Y. Moreno (2019) Multilayer networks in a nutshell. Annual Review of Condensed Matter Physics 10, pp. 45–62. Cited by: §I.
  • [2] MOSEK ApS (2017) The MOSEK optimization toolbox for MATLAB manual, version 8.1. Cited by: §IV.
  • [3] A. Argyriou, M. Herbster, and M. Pontil (2006) Combining graph Laplacians for semi–supervised learning. In Advances in Neural Information Processing Systems, pp. 67–74. Cited by: §II, §IV.
  • [4] B. Bentley, R. Branicky, C. L. Barnes, Y. L. Chew, E. Yemini, E. T. Bullmore, P. E. Vértes, and W. R. Schafer (2016) The multilayer connectome of Caenorhabditis elegans. PLoS Computational Biology 12 (12), pp. e1005283. Cited by: §I.
  • [5] S. Boccaletti, G. Bianconi, R. Criado, C. I. Del Genio, J. Gómez-Gardenes, M. Romance, I. Sendina-Nadal, Z. Wang, and M. Zanin (2014) The structure and dynamics of multilayer networks. Physics Reports 544 (1), pp. 1–122. Cited by: §I.
  • [6] J. Chen, G. Wang, and G. B. Giannakis (2019) Multiview canonical correlation analysis over graphs. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2947–2951. Cited by: §II.
  • [7] E. Cozzo, G. F. de Arruda, F. A. Rodrigues, and Y. Moreno (2016) Multilayer networks: metrics and spectral properties. In Interconnected Networks, pp. 17–35. Cited by: §I.
  • [8] M. De Domenico (2017) Multilayer modeling and analysis of human brain networks. GigaScience 6 (5), pp. 1–8. Cited by: §I.
  • [9] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov (2013) Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds. IEEE Transactions on Signal Processing 62 (4), pp. 905–918. Cited by: §I, §II.
  • [10] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst (2016) Learning Laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal Processing 64 (23), pp. 6160–6173. Cited by: §II, §IV-A2, §IV-A, §IV-B1, TABLE III, TABLE IV, §IV, §IV.
  • [11] X. Dong, D. Thanou, M. Rabbat, and P. Frossard (2019) Learning graphs from data: a signal representation perspective. IEEE Signal Processing Magazine 36 (3), pp. 44–63. Cited by: §II.
  • [12] J. Friedman, T. Hastie, and R. Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), pp. 432–441. Cited by: §II.
  • [13] M. Grant and S. Boyd (2014) CVX: matlab software for disciplined convex programming, version 2.1. Cited by: §IV.
  • [14] V. N. Ioannidis, A. G. Marques, and G. B. Giannakis (2019) A recurrent graph neural network for multi-relational data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8157–8161. Cited by: §II.
  • [15] V. N. Ioannidis, P. A. Traganitis, Y. Shen, and G. B. Giannakis (2018) Kernel-based semi-supervised learning over multilayer graphs. In 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 1–5. Cited by: §II.
  • [16] R. Khasanova, X. Dong, and P. Frossard (2016) Multi-modal image retrieval with random walk on multi-layer graphs. In 2016 IEEE International Symposium on Multimedia (ISM), pp. 1–6. Cited by: §I.
  • [17] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter (2014) Multilayer networks. Journal of Complex Networks 2 (3), pp. 203–271. Cited by: §I.
  • [18] A. Kumar, P. Rai, and H. Daume (2011) Co-regularized multi-view spectral clustering. In Advances in Neural Information Processing Systems, pp. 1413–1421. Cited by: §II.
  • [19] B. Lake and J. Tenenbaum (2010) Discovering structure by learning sparse graphs. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, Cited by: §II.
  • [20] S. Li, H. Liu, Z. Tao, and Y. Fu (2017) Multi-view graph learning with adaptive label propagation. In 2017 IEEE International Conference on Big Data (Big Data), pp. 110–115. Cited by: §II.
  • [21] M. Magnani, B. Micenkova, and L. Rossi (2013) Combinatorial analysis of multiple networks. arXiv preprint arXiv:1303.4986. Cited by: §I, §II, §IV-A2, §IV-C, §IV-C, TABLE IV.
  • [22] C. Manning, P. Raghavan, and H. Schütze (2010) Introduction to information retrieval. Natural Language Engineering 16 (1), pp. 100–103. Cited by: §IV.
  • [23] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro (2019) Connecting the dots: identifying network structure via graph signal processing. IEEE Signal Processing Magazine 36 (3), pp. 16–43. Cited by: §II.
  • [24] A. Ortega, P. Frossard, J. Kovačević, J. M. Moura, and P. Vandergheynst (2018) Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE 106 (5), pp. 808–828. Cited by: §IV-A.
  • [25] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat (2017) Characterization and inference of graph diffusion processes from observations of stationary signals. IEEE Transactions on Signal and Information Processing over Networks 4 (3), pp. 481–496. Cited by: §II.
  • [26] N. Perraudin and P. Vandergheynst (2017) Stationary signal processing on graphs. IEEE Transactions on Signal Processing 65 (13), pp. 3462–3477. Cited by: §IV-B2.
  • [27] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro (2017) Network topology inference from spectral templates. IEEE Transactions on Signal and Information Processing over Networks 3 (3), pp. 467–483. Cited by: §II.
  • [28] V. Sindhwani, P. Niyogi, and M. Belkin (2005) A co-regularization approach to semi-supervised learning with multiple views. In Proceedings of ICML workshop on learning with multiple views, Vol. 2005, pp. 74–79. Cited by: §II.
  • [29] S. Wasserman and K. Faust (1994) Social network analysis: methods and applications. Vol. 8, Cambridge university press. Cited by: §I.
  • [30] K. Zhan, C. Zhang, J. Guan, and J. Wang (2017) Graph learning for multiview clustering. IEEE Transactions on Cybernetics (99), pp. 1–9. Cited by: §II.
  • [31] W. Zhou, H. Wang, and Y. Yang (2019) Consensus graph learning for incomplete multi-view clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 529–540. Cited by: §II.