Exchange-Based Diffusion in Hb-Graphs: Highlighting Complex Relationships

09/01/2018 ∙ by Xavier Ouvrard, et al. ∙ University of Geneva ∙ CERN

Most networks tend to show complex and multiple relationships between entities. Networks are usually modeled by graphs or hypergraphs; nonetheless, a given entity can occur many times in a relationship, which brings the need to deal with multisets instead of sets or simple edges. Diffusion processes are useful to highlight interesting parts of a network: they usually start with a stroke at one vertex and diffuse throughout the network to reach a uniform distribution. Several iterations of the process are required before reaching a stable solution. We propose an alternative way of highlighting the main components of a network using a diffusion process based on exchanges: an iterative exchange process in which each step has two phases. This process makes it possible to evaluate importance not only at the vertex level but also at the regrouping level. To model the diffusion process, we extend the concept of hypergraphs, which are families of sets, to families of multisets that we call hb-graphs.


I Introduction

Many relationships are more than pairwise relations: entities are often grouped into sets, corresponding to $n$-adic relationships. Each of these sets can be viewed as a collaboration between entities. Hypergraphs naturally represent $n$-adic relations. It has been shown that the facets of an information space can be modeled by hypergraphs [1]: each facet corresponds to a type of metadata. The different facets are then linked by reference data attached to the hyperedges within that facet. The next step is to highlight important information: this is commonly achieved in hypergraphs using random walks [2, 3]. Reference [3] shows that weighting vertices at the level of the hyperedges in a hypergraph allows better information retrieval. These two approaches [2, 3] mainly focus on vertices; but as hyperedges are linked to references that can be used as pivots between the different facets [4, 1], it is also interesting to highlight important hyperedges. For instance, in a document database, different metadata can be used to label authors, author keywords, processed keywords, categories and added tags: the pivots between the different facets of this information space correspond to the documents themselves. In the specific case of tags, it can be important to have weights attached to them if users are allowed to attach tags to documents.

Hyperedge-based weighting of vertices is easier to achieve through multisets: multisets store information on the multiplicity of elements. We use a family of multisets over a set of vertices, called a hyper-bag graph - hb-graph for short - as an extension of hypergraphs. The multisets of a hb-graph play the role of the hyperedges in a hypergraph: they are called hb-edges. We want to answer the following research question: “Can we find a network model and a diffusion process that rank not only vertices but also hb-edges in hb-graphs?”. We develop an iterative exchange approach in hb-graphs with two-phase steps that extracts information not only at the vertex level but also at the hb-edge level.

We validate our approach using randomly generated hb-graphs. The hb-graph visualisation highlights not only vertices but also hb-edges using the exchange process. We show that the exchange-based diffusion process properly colors vertices with high connectivity and highlights hb-edges with a normalisation approach, giving small hb-edges a chance to be highlighted.

This paper presents an exchange-based diffusion process that ranks not only vertices but also hb-edges. It formalizes exchanges by using hb-graphs, which naturally cope with the multiplicity of elements. It also contributes a novel visualisation of this kind of network, included in each facet of the information space.

Section II reviews related work and Section III gives the mathematical background. The formalisation of the exchange process is constructed in Section IV. Results and evaluation are given in Section V; future work and conclusion are addressed in Section VI.

II Related work

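A hypergraph $\mathcal{H} = (V, E)$ on a finite set of nodes (or vertices) $V = \{v_1, \dots, v_n\}$ is defined in [5] as a family of hyperedges $E = (e_1, \dots, e_p)$ where each hyperedge is a non-empty subset of $V$ and such that $\bigcup_{j=1}^{p} e_j = V$. A hypergraph is said to be edge-weighted if there exists an application $w : E \to \mathbb{R}^{+}$.

In a weighted hypergraph the degree $d(v)$ of a vertex $v$ is defined as:

$$d(v) = \sum_{e \in E \,:\, v \in e} w(e).$$

The volume of $S \subseteq V$ is defined as:

$$\operatorname{vol}(S) = \sum_{v \in S} d(v).$$

The incident matrix of a hypergraph is the matrix $H = [h(v, e)]$ of size $n \times p$, where $h(v, e) = 1$ if $v \in e$ and $h(v, e) = 0$ otherwise.

Random walks are largely used to evaluate the importance of vertices. In [2], a random walk on a hypergraph is defined as choosing a hyperedge $e$ with a probability proportional to $w(e)$; in a given hyperedge a vertex is chosen uniformly at random. The transition probability from a vertex $u$ to a vertex $v$ is:

$$p(u, v) = \sum_{e \in E} w(e) \, \frac{h(u, e)}{d(u)} \, \frac{h(v, e)}{\delta(e)},$$

where $\delta(e)$ is the degree of a hyperedge, defined in [2] as its cardinality. This random walk has a stationary state which is shown to be $\pi(v) = d(v) / \operatorname{vol}(V)$ for $v \in V$ [6]. The process differs from the one we propose: our diffusion process is done by successive steps from a random initial vertex on vertices and hyperedges.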
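The random walk of [2] recalled above can be checked numerically on a small example. The following is a minimal sketch, assuming NumPy and a toy hypergraph invented for illustration (the matrix `H` and the weights `w` are hypothetical data, not taken from the paper); it builds the transition matrix and the candidate stationary distribution $d(v) / \operatorname{vol}(V)$.

```python
import numpy as np

# Toy hypergraph (hypothetical data): 4 vertices, 3 hyperedges.
# H[i, j] = 1 if vertex i belongs to hyperedge j, 0 otherwise.
H = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)
w = np.array([1.0, 2.0, 0.5])   # hyperedge weights w(e)

d_v = H @ w                     # weighted vertex degrees d(v)
delta_e = H.sum(axis=0)         # hyperedge degrees (cardinalities)
vol_V = d_v.sum()               # volume of the whole vertex set

# P[u, v] = sum_e w(e) * h(u, e) / d(u) * h(v, e) / delta(e)
P = np.diag(1.0 / d_v) @ H @ np.diag(w) @ np.diag(1.0 / delta_e) @ H.T

# Candidate stationary distribution: degree over total volume.
pi = d_v / vol_V
```

On this example each row of `P` sums to one and `pi @ P` gives back `pi`, which is what the stationary state requires.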

Reference [3] defines a random walk that allows the use of weighted hypergraphs with weight functions both on hyperedges and on vertices: a vector of weights is built for each vertex, making the weights of vertices hyperedge-based; a random walk similar to the one above is built that takes into account the weights of the vertices. The evaluation is done using a hypergraph built from a public dataset of computer science conference proceedings; each document is seen as a hyperedge that contains keywords; hyperedges are weighted by citation score and the vertices of a hyperedge are weighted with a tf-idf score. Reference [3] shows that a random walk on the (double-) weighted hypergraph enables vertex ranking with higher precision than a random walk with unweighted vertices. The process differs again from our proposal: our process not only enables simultaneous alternative updates of the values of vertices and hb-edges but also allows the ranking of hb-edges. We also introduce a new theoretical framework to achieve this.

Random walks are related to diffusion processes. Reference [7] uses random walks in hypergraphs to perform image matching. Reference [8] builds higher-order random walks in hypergraphs and constructs a generalised Laplacian attached to the graphs generated by these random walks.

Hypergraphs are used in multi-feature indexing to help the retrieval of images [9]. For each image, a hyperedge gathers a fixed number of its most similar images based on different features. Hyperedges are weighted by average similarity. A spectral clustering algorithm is applied to divide the dataset into a given number of sub-hypergraphs. A random walk on these sub-hypergraphs retrieves significant images: they are used to build a new inverted index, useful to query images.

III Mathematical background

III-A Multisets

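Our definitions on multisets are mainly based on [10]. A multiset - or mset or bag - is a pair $A_m = (A, m)$ where $A$ is a set of distinct objects and $m$ is an application from $A$ to $W \subseteq \mathbb{R}^{+}$ or $W \subseteq \mathbb{N}$. $A$ is called the universe of the multiset $A_m$, and $m$ is called the multiplicity function of the multiset $A_m$. $A_m^{\star} = \{x \in A : m(x) \neq 0\}$ is called the support of $A_m$. The elements of the support of an mset are called its generators. A multiset where $W \subseteq \mathbb{N}$ is called a natural multiset. The m-cardinality of $A_m$, written $\#_m A_m$, is defined as:

$$\#_m A_m = \sum_{x \in A} m(x).$$

Considering $A_{m_1}$ and $A_{m_2}$ two msets on the same universe $A$, we define the empty mset, written $\emptyset$, as the mset of empty support. $A_{m_1}$ is said to be included in $A_{m_2}$ - written $A_{m_1} \subseteq A_{m_2}$ - if for all $x \in A$: $m_1(x) \le m_2(x)$. In this case, $A_{m_1}$ is called a submset of $A_{m_2}$. The power multiset of $A_m$, written $\widetilde{\mathcal{P}}(A_m)$, is the multiset of all submsets of $A_m$. Different operations can be defined on multisets of the same universe, such as union, intersection and sum [10].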
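The multiset notions above map closely onto Python's `collections.Counter` in the case of natural multisets. The sketch below is an illustration only: the helper names `m_cardinality`, `support` and `is_submset` are hypothetical, and real-valued multiplicities would need a plain dictionary instead.

```python
from collections import Counter

def m_cardinality(mset):
    """Sum of the multiplicities of all elements (m-cardinality)."""
    return sum(mset.values())

def support(mset):
    """Set of elements with non-zero multiplicity."""
    return {x for x, m in mset.items() if m != 0}

def is_submset(a, b):
    """True if every multiplicity in a is at most the one in b."""
    return all(m <= b.get(x, 0) for x, m in a.items())

# Two natural multisets on the universe {"a", "b", "c"} (toy data).
A1 = Counter({"a": 2, "b": 1})
A2 = Counter({"a": 3, "b": 1, "c": 2})

union = A1 | A2          # multiplicity-wise maximum
intersection = A1 & A2   # multiplicity-wise minimum
msum = A1 + A2           # multiplicity-wise sum
```

Here `A1` is a submset of `A2`, so their intersection is `A1` and their union is `A2`.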

III-B Hb-graphs

Hb-graphs are introduced in [ouvrard2018adjacencypart2]. A hb-graph is a family of multisets with the same universe $V$ and with support a subset of $V$. The msets are called the hb-edges and the elements of $V$ the vertices. We consider for the remainder of the article a hb-graph $\mathcal{H} = (V, E)$, with $V = \{v_1, \dots, v_n\}$ and $E = (e_1, \dots, e_p)$ the family of its hb-edges.

Each hb-edge $e_j \in E$ has $V$ as universe and a multiplicity function associated to it: $m_j : V \to W$ where $W \subseteq \mathbb{R}^{+}$. For a general hb-graph, each hb-edge has to be seen as a weighted system of vertices, where the weights of each vertex are hb-edge dependent.

A hb-graph where the multiplicity range of each hb-edge is a subset of $\mathbb{N}$ is called a natural hb-graph. A hypergraph is a natural hb-graph where the hb-edges have multiplicity one for every vertex of their support.

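The order of a hb-graph $\mathcal{H}$ - written $O(\mathcal{H})$ - is:

$$O(\mathcal{H}) = \sum_{i=1}^{n} \max_{e_j \in E} m_j(v_i),$$

where $m_j(v_i)$ is the multiplicity of the vertex $v_i$ in the hb-edge $e_j$.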

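The support hypergraph of a hb-graph $\mathcal{H} = (V, E)$ is the hypergraph whose vertices are the ones of the hb-graph and whose hyperedges are the supports of the hb-edges in a one-to-one way. We write it $\underline{\mathcal{H}} = (V, \underline{E})$, where $\underline{E} = (e_j^{\star})_{1 \le j \le p}$ and $e_j^{\star}$ is the support of the hb-edge $e_j$.

The hb-star of a vertex $x \in V$ is the multiset - written $H(x)$ - defined as:

$$H(x) = \left\{ e_j^{m_j(x)} : 1 \le j \le p, \; x \in e_j^{\star} \right\},$$

where $m_j(x)$ is the multiplicity of $x$ in the hb-edge $e_j$.

The m-degree of a vertex $x \in V$ of a hb-graph $\mathcal{H}$ - written $\deg_m(x)$ - is defined as:

$$\deg_m(x) = \#_m H(x) = \sum_{j=1}^{p} m_j(x).$$

The degree of a vertex $x \in V$ of a hb-graph $\mathcal{H}$ - written $\deg(x)$ - corresponds to the degree of this vertex in the support hypergraph, that is the number of hb-edges that have $x$ in their support.

The matrix $H = [m_j(v_i)]_{1 \le i \le n, \, 1 \le j \le p}$ is called the incident matrix of the hb-graph $\mathcal{H}$.

A weighted hb-graph is a hb-graph where the hb-edges are weighted by $w_e : E \to \mathbb{R}^{+}$. An unweighted hb-graph is then a weighted hb-graph with $w_e(e) = 1$ for all $e \in E$.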
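To make these definitions concrete, the sketch below builds the incident matrix of a small natural hb-graph and derives the m-degrees, degrees, m-cardinalities and support hypergraph from it. It assumes NumPy; the vertex names and hb-edges are toy data chosen for illustration, with each hb-edge stored as a dictionary from vertex to multiplicity.

```python
import numpy as np

# Toy natural hb-graph (hypothetical data): each hb-edge maps a vertex
# to its multiplicity in that hb-edge.
vertices = ["v1", "v2", "v3", "v4"]
hb_edges = [
    {"v1": 2, "v2": 1},
    {"v2": 3, "v3": 1, "v4": 1},
    {"v3": 2},
]

# Incident matrix: H[i, j] is the multiplicity of vertex i in hb-edge j.
index = {v: i for i, v in enumerate(vertices)}
H = np.zeros((len(vertices), len(hb_edges)))
for j, e in enumerate(hb_edges):
    for v, m in e.items():
        H[index[v], j] = m

m_degree = H.sum(axis=1)            # sum of multiplicities per vertex
degree = (H > 0).sum(axis=1)        # number of hb-edges containing the vertex
m_cardinality = H.sum(axis=0)       # sum of multiplicities per hb-edge

# Support hypergraph: keep only which vertices occur, not how often.
support_hypergraph = [{v for v, m in e.items() if m > 0} for e in hb_edges]
```

The vertex `v2` illustrates the difference between the two degrees: it lies in two hb-edges (degree 2) but with total multiplicity 4 (m-degree 4).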

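A strict m-path $x_0 e_1 x_1 \dots e_s x_s$ in a hb-graph from a vertex $x$ to a vertex $y$ is a vertex / hb-edge alternation with hb-edges $e_1$ to $e_s$ and vertices $x_0$ to $x_s$ such that $x_0 = x$, $x_s = y$, $x \in e_1^{\star}$ and $y \in e_s^{\star}$, and such that for all $1 \le i \le s - 1$, $x_i \in e_i^{\star} \cap e_{i+1}^{\star}$.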

A strict m-path in a hb-graph corresponds to a unique path in the support hypergraph of the hb-graph, called the support path. In this article, by abuse of language, we call it a path of the hb-graph. The length of a path corresponds to the number of hb-edges it goes through.

Representation of hb-graphs can be achieved either by using sub-mset representation or by using edge representation. In this article we use the extra-vertex representation of the support hypergraph of the hb-graph: an extra-vertex is added for each hb-edge. Each hb-edge is represented by enabling a link in between each vertex of the hb-edge support and the hb-edge extra-vertex.
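The extra-vertex representation described above amounts to a bipartite graph between vertices and hb-edge extra-vertices. A minimal sketch follows, assuming hb-edges are stored as dictionaries from vertex to multiplicity and that hb-edge names do not clash with vertex names; the function name `extra_vertex_graph` and the data are hypothetical.

```python
def extra_vertex_graph(hb_edges):
    """Adjacency sets of the extra-vertex representation.

    hb_edges maps an hb-edge name to a {vertex: multiplicity} dictionary.
    One extra-vertex is created per hb-edge and linked to every vertex
    of the hb-edge support.
    """
    adj = {}
    for name, e in hb_edges.items():
        adj.setdefault(name, set())
        for v, m in e.items():
            if m > 0:                       # only the support is linked
                adj.setdefault(v, set()).add(name)
                adj[name].add(v)
    return adj

# Toy hb-graph (hypothetical data).
hb_edges = {
    "e1": {"v1": 2, "v2": 1},
    "e2": {"v2": 3, "v3": 1},
    "e3": {"v3": 2, "v4": 1},
}
adj = extra_vertex_graph(hb_edges)
```

Multiplicities are deliberately dropped here: the representation is built on the support hypergraph, so a link only records that a vertex belongs to the support of an hb-edge.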

IV Exchange-based diffusion in hb-graphs

Figure 1: Diffusion by exchange: principle. Each step has two phases: vertices to hb-edges, then hb-edges to vertices.

In diffusion processes and random walks an initial vertex is chosen. The diffusion process leads to homogenising the information over the structure. Random walks in hypergraphs rank vertices by the number of times they are reached and this ranking is related to the structure of the network itself. Several random walks with random choices of the starting vertex can be needed to achieve ranking by averaging.

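We consider a weighted hb-graph $\mathcal{H} = (V, E, w_e)$ with $|V| = n$ and $|E| = p$; we write $H$ the incident matrix of the hb-graph, whose entries are the multiplicities $m_j(v_i)$ of the vertices $v_i$ in the hb-edges $e_j$.

At time $t$ we set a distribution of values over the vertex set:

$$\alpha_t : V \to \mathbb{R}, \quad v_i \mapsto \alpha_t(v_i),$$

and a distribution of values over the hb-edge set:

$$\epsilon_t : E \to \mathbb{R}, \quad e_j \mapsto \epsilon_t(e_j).$$

$P_{V,t} = (\alpha_t(v_i))_{1 \le i \le n}$ is the row state vector of the vertices at time $t$ and $P_{E,t} = (\epsilon_t(e_j))_{1 \le j \le p}$ is the row state vector of the hb-edges.

We consider an iterative process with two-phase steps as illustrated in Figure 1. At every time step, the first phase starts at time $t$ and ends at $t + \frac{1}{2}$, followed by the second phase between time $t + \frac{1}{2}$ and $t + 1$.

The initialisation sets a value $\alpha_0(v_i)$ for every vertex $v_i \in V$ and a value $\epsilon_0(e_j)$ for every hb-edge $e_j \in E$.

During the first phase between time $t$ and $t + \frac{1}{2}$, each vertex $v_i$ of the hb-graph shares the value $\alpha_t(v_i)$ it holds at time $t$ with the hb-edges it is connected to.

In an unweighted hb-graph, the fraction of $\alpha_t(v_i)$ given by $v_i$ of m-degree $\deg_m(v_i)$ to each hb-edge $e_j$ is $\frac{m_j(v_i)}{\deg_m(v_i)}$, which corresponds to the ratio of the multiplicity of the vertex $v_i$ due to the hb-edge $e_j$ over the total m-degree of the hb-edges that contain $v_i$ in their support.

In a weighted hb-graph, each hb-edge $e_j$ has a weight $w_e(e_j)$. The value of a vertex has to be shared by taking into account not only the multiplicity of the vertices in the hb-edge but also the weight of the hb-edge.

The weights of the hb-edges are stored in a column vector $w_E = (w_e(e_j))_{1 \le j \le p}^{\top}$. We also consider the weight diagonal matrix $W = \operatorname{diag}(w_e(e_j))_{1 \le j \le p}$.

We introduce the weighted m-degree matrix:

$$D_{w,V} = \operatorname{diag}(d_{w,m}(v_i))_{1 \le i \le n} = \operatorname{diag}(H w_E),$$

where $d_{w,m}(v_i)$ is called the weighted m-degree of the vertex $v_i$. It is:

$$d_{w,m}(v_i) = \sum_{j=1}^{p} m_j(v_i) \, w_e(e_j).$$

The contribution to the value $\epsilon_{t+\frac{1}{2}}(e_j)$ attached to the hb-edge $e_j$ of weight $w_e(e_j)$ from the vertex $v_i$ is:

$$\delta\epsilon_{t+\frac{1}{2}}(e_j \mid v_i) = \frac{m_j(v_i) \, w_e(e_j)}{d_{w,m}(v_i)} \, \alpha_t(v_i).$$

It corresponds to the ratio of the weighted multiplicity of the vertex $v_i$ in $e_j$ over the total weighted m-degree of the hb-edges where $v_i$ is in the support.

The value $\epsilon_{t+\frac{1}{2}}(e_j)$ is then calculated by summing over the vertex set:

$$\epsilon_{t+\frac{1}{2}}(e_j) = \sum_{i=1}^{n} \delta\epsilon_{t+\frac{1}{2}}(e_j \mid v_i).$$

Hence, we obtain:

$$P_{E,t+\frac{1}{2}} = P_{V,t} \, D_{w,V}^{-1} H W. \tag{1}$$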
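The first phase can be reproduced numerically in a few lines. This is a minimal sketch assuming NumPy, a toy incident matrix and toy weights, and a uniform vertex distribution chosen here only for illustration; it evaluates the matrix form of equation (1).

```python
import numpy as np

# Toy natural hb-graph (hypothetical data): 4 vertices, 3 hb-edges.
# H[i, j] is the multiplicity of vertex v_i in hb-edge e_j.
H = np.array([
    [2, 0, 0],
    [1, 3, 0],
    [0, 1, 2],
    [0, 1, 0],
], dtype=float)
w_E = np.array([1.0, 2.0, 0.5])          # hb-edge weights (hypothetical)

W = np.diag(w_E)                         # weight diagonal matrix
d_wm = H @ w_E                           # weighted m-degree of each vertex
D_wV_inv = np.diag(1.0 / d_wm)           # inverse weighted m-degree matrix

# Vertex values at time t: a uniform distribution is assumed here.
P_V = np.full(H.shape[0], 1.0 / H.shape[0])

# Equation (1): hb-edge values at the half step.
P_E_half = P_V @ D_wV_inv @ H @ W
```

Because each vertex splits its whole value among its hb-edges, the total held by the hb-edges after the first phase equals the total held by the vertices before it.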

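During the second phase that starts at time $t + \frac{1}{2}$, the hb-edges share their values between the vertices they hold, taking into account the multiplicity of the vertices in the hb-edge. Every value is modulated by the weight of the hb-edge it comes from.

The contribution to $\alpha_{t+1}(v_i)$ given by a hb-edge $e_j$ of weight $w_e(e_j)$ to the vertex $v_i$ of multiplicity $m_j(v_i)$ is:

$$\delta\alpha_{t+1}(v_i \mid e_j) = \frac{m_j(v_i)}{\#_m e_j} \, \epsilon_{t+\frac{1}{2}}(e_j).$$

The value $\alpha_{t+1}(v_i)$ is then obtained by:

$$\alpha_{t+1}(v_i) = \sum_{j=1}^{p} \delta\alpha_{t+1}(v_i \mid e_j).$$

By writing $D_E = \operatorname{diag}(\#_m e_j)_{1 \le j \le p}$ the diagonal matrix of size $p \times p$ of the hb-edge m-cardinalities, it comes:

$$P_{V,t+1} = P_{E,t+\frac{1}{2}} \, D_E^{-1} H^{\top}. \tag{2}$$

Gathering (1) and (2):

$$P_{V,t+1} = P_{V,t} \, D_{w,V}^{-1} H W D_E^{-1} H^{\top}. \tag{3}$$

It is valuable to keep a trace of the intermediate state $P_{E,t+\frac{1}{2}}$ as it records the importance of the hb-edges.

Writing $T = D_{w,V}^{-1} H W D_E^{-1} H^{\top}$, it follows from (3):

$$P_{V,t+1} = P_{V,t} \, T. \tag{4}$$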
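The whole two-phase iteration can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes NumPy, assumes that in the second phase each hb-edge shares its value in proportion to multiplicity over m-cardinality, and assumes that every vertex belongs to at least one hb-edge and that no hb-edge is empty. The function name `exchange_diffusion` and the data are hypothetical.

```python
import numpy as np

def exchange_diffusion(H, w_E, P_V0, n_steps):
    """Run n_steps two-phase exchange steps.

    H is the n x p incident matrix of multiplicities, w_E the hb-edge
    weights and P_V0 the initial row vector of vertex values.
    Returns the final vertex values and the last half-step hb-edge values.
    """
    to_edges = np.diag(1.0 / (H @ w_E)) @ H @ np.diag(w_E)   # phase 1, n x p
    to_vertices = np.diag(1.0 / H.sum(axis=0)) @ H.T         # phase 2, p x n
    P_V = np.asarray(P_V0, dtype=float)
    P_E_half = np.zeros(H.shape[1])
    for _ in range(n_steps):
        P_E_half = P_V @ to_edges        # vertices give to hb-edges
        P_V = P_E_half @ to_vertices     # hb-edges give back to vertices
    return P_V, P_E_half

# Toy data (hypothetical): 4 vertices, 3 hb-edges, uniform start.
H = np.array([[2, 0, 0], [1, 3, 0], [0, 1, 2], [0, 1, 0]], dtype=float)
w_E = np.array([1.0, 2.0, 0.5])
P_V0 = np.full(4, 0.25)
P_V, P_E_half = exchange_diffusion(H, w_E, P_V0, 5)
```

Both phase matrices have rows summing to one, so under these assumptions the total value is conserved at every half step; the hb-edge vector returned is the intermediate state of the last step, which is the quantity used to rank hb-edges.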

V Results and evaluation

Figure 2: Exchange-based diffusion in hb-graphs: highlighting important hb-edges. Simulation with 548 vertices (chosen randomly out of 10 000) gathered in 5 groups of vertices (with 6, 16, 12, 18 and 2 important vertices and 2 important vertices per hb-edge), 300 hb-edges (with cardinality of support less than or equal to 15), 10 vertices in between the 5 groups. Extra-vertices are colored in green and have a square shape.
Figure 3: Path maximum length and percentage of vertices with value above the threshold over vertices in $V$ vs threshold ratio.
Figure 4: Path maximum length and percentage of hb-edge extra-vertices with value above the threshold vs threshold ratio.

This diffusion by exchange process has been validated with two experiments: the first experiment generates a random hb-graph to validate our approach and the second compares the results to a classical random walk on the hb-graph.

We built a random unweighted hb-graph generator. The generator can construct single-component hb-graphs or hb-graphs with multiple connected components. A single connected component is obtained by choosing a number of intermediate vertices that link the different components. We first generate the vertices, then build each component and finally interconnect them. A first set of interconnection vertices is built by choosing a given number of vertices out of the generated ones. The remaining vertices are then separated into as many groups as there are components. In each of these groups we generate two sets of vertices: a first set of important vertices and a second set gathering the other vertices of the group. The number of hb-edges to be built is adjustable and shared between the different groups. The m-cardinality of a hb-edge is chosen randomly below a tunable maximum threshold. The important vertices must be present in a certain number of hb-edges per group; the number of important vertices in a hb-edge is randomly fixed below a maximum number. The hb-edge is completed by randomly choosing vertices in the second set. The random choice made in these two sets is tuned to follow a power law distribution: it implies that some vertices occur more often than others. Interconnection between the components is achieved by choosing vertices in the interconnection set and inserting them randomly into the hb-edges built.

We apply our diffusion process to these generated hb-graphs: after a few iterations we visualize the hb-graphs to show the evolution of the vertex values with a gradient coloring scale. We also take advantage of the half-step to highlight important hb-edges in the background with another gradient coloring scale.

To get a proper evaluation and show that vertices with the highest $\alpha$-values correspond to vertices that are important in the network - in the sense that they are central for the connectivity - we compute the eccentricity of vertices from a subset of the vertices to the remaining vertices. The eccentricity of a vertex in a graph is the length of a maximal shortest path between this vertex and the other vertices of the graph: extending this definition to hb-graphs is straightforward. If the graph is disconnected then each vertex has infinite eccentricity.

For the purpose of evaluation, in this article, we define the relative eccentricity of a vertex of a subset $S \subseteq V$ as the length of a maximal shortest path starting from this vertex and ending with any vertex of $V \setminus S$; the relative eccentricity is calculated for each vertex of $S$ provided that it is connected to vertices of $V \setminus S$; otherwise it is set to a fixed default value.

For the vertex set $V$, the subset is built by using a threshold value $s_V$: vertices with an $\alpha$ value above this threshold are gathered into a subset $A_V(s_V)$ of $V$. We consider $\overline{A_V}(s_V) = V \setminus A_V(s_V)$, the set of vertices with $\alpha$ values below the threshold. We evaluate the relative eccentricity of each vertex of $A_V(s_V)$ to the vertices of $\overline{A_V}(s_V)$ in the support hypergraph of the corresponding hb-graph.
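The relative eccentricity defined above can be computed with a breadth-first search on the extra-vertex representation, halving the distances since every hb-edge crossed adds two links. The sketch below is an assumption-laden illustration: the function name `relative_eccentricity`, the toy adjacency and the choice of returning `None` for a vertex connected to no target are all hypothetical.

```python
from collections import deque

def relative_eccentricity(adj, source, targets):
    """Longest shortest path, counted in hb-edges, from source to targets.

    adj is the adjacency of the extra-vertex graph (vertices and hb-edge
    extra-vertices alternate). Returns None when no target is reachable;
    that default is an assumption of this sketch.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for nb in adj[u]:
            if nb not in dist:
                dist[nb] = dist[u] + 1
                queue.append(nb)
    reached = [dist[t] for t in targets if t in dist]
    # Two links (vertex, extra-vertex, vertex) per hb-edge crossed.
    return max(reached) // 2 if reached else None

# Toy extra-vertex graph (hypothetical): chain v1-e1-v2-e2-v3-e3-v4,
# plus an isolated vertex v5.
adj = {
    "v1": {"e1"}, "v2": {"e1", "e2"}, "v3": {"e2", "e3"}, "v4": {"e3"},
    "e1": {"v1", "v2"}, "e2": {"v2", "v3"}, "e3": {"v3", "v4"},
    "v5": set(),
}
ecc_v2 = relative_eccentricity(adj, "v2", {"v1", "v3", "v4"})
ecc_v5 = relative_eccentricity(adj, "v5", {"v1"})
```

In this example `v2` plays the role of a vertex above the threshold and the targets are the vertices below it; the farthest one, `v4`, is two hb-edges away.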

Assuming that we stop iterating at time $T$, we let the threshold $s_V$ vary from 0 to the maximal vertex value $\max_{v \in V} \alpha_T(v)$ - obtained by iterating the algorithm on the hb-graph - by incremental steps and as long as the eccentricity is kept above 0, whichever of the two conditions is reached first. In order to have a ratio we calculate:

$$r_V = \frac{s_V}{\alpha_{\mathrm{ref}}},$$

where $\alpha_{\mathrm{ref}}$ is the reference normalised value defined for the hb-graph. This ratio has values increasing by steps from 0 to $\max_{v \in V} \alpha_T(v) / \alpha_{\mathrm{ref}}$.

We show the results obtained in Figure 4: we plot two curves. The first curve corresponds to the maximal length of the path between vertices with values above the threshold and connected vertices with values below it, as a function of the value of $r_V$: the length of the path corresponds to half of the length of the path observed in the extra-vertex graph representation of the hb-graph support hypergraph, as in between two vertices of $V$ there is an extra-vertex that represents the hb-edge (or the support hyperedge). The second curve plots the percentage of vertices with values above the threshold over the vertex set $V$ as a function of $r_V$. When $r_V$ increases, the number of vertices above the threshold naturally decreases while they are closer to the vertices below it, marking the fact that they are central.

Figure 5 shows that high values of $\alpha_T(v)$ correspond to vertices that are highly connected either by degree or by m-degree. Hence vertices that are in the positive side of the color scale in Figure 4 correspond to highly connected vertices: the closer to red on the right scale they are, the higher the value of $\alpha_T(v)$ is.

Figure 5: Alpha value of vertices at step 5 and (m-)degree of vertices.

A similar approach is taken for the hb-edges: assuming that the diffusion process stops at time $T$, we use the hb-edge values $\epsilon$ recorded at the last intermediate step to partition the set of hb-edges into two subsets for a given threshold $s_E$: the subset $A_E(s_E)$ of the hb-edges that have values above the threshold and the subset $\overline{A_E}(s_E)$ gathering the hb-edges that have values below $s_E$.

$s_E$ varies from 0 to the maximal hb-edge value by incremental steps and as long as the eccentricity is kept above 0, whichever of the two conditions is reached first. In the hb-graph representation, each hb-edge corresponds to an extra-vertex. Each time we evaluate the length of the maximal shortest path linking one extra-vertex of $A_E(s_E)$ to one extra-vertex of $\overline{A_E}(s_E)$ that are connected in the extra-vertex graph representation of the hb-graph support hypergraph: the length of the path corresponds to half of the one obtained from the graph, for the same reason as before. In Figure 4 we observe for the hb-edges the same trend as the one observed for vertices: the length of the maximal path between two hb-edges decreases as the ratio increases while the percentage of hb-edges in $A_E(s_E)$ over $E$ decreases.

Figure 6 shows on the left the high correlation between the value $\epsilon(e)$ of a hb-edge $e$ and the cardinality of $e$; the right figure shows that the correlation between the value $\epsilon(e)$ and the m-cardinality of $e$ is even stronger.

Figure 6: Epsilon value of hb-edge at stage 4+ and (m-)cardinality of hb-edge.

The results obtained after five iterations on hb-graphs with different configurations show that we always retrieve the important vertices as the most highlighted. The diffusion by exchange process also highlights additional vertices that were not in the first group but that are at the confluence of different hb-edges. The results on the hb-edges show that the value obtained is highly correlated to the m-cardinality of the hb-edges. To color the hb-edges as it is done in Figure 4 we calculate the ratio $\epsilon(e) / \epsilon_{\mathrm{norm}}(e)$, where $\epsilon(e)$ is the value held by the hb-edge $e$ at the intermediate step and $\epsilon_{\mathrm{norm}}(e)$ corresponds to the value obtained from the vertices of the hb-edge support by giving to each of them the reference value. Hb-edges are colored using this ratio: the higher its value, the closer to red the color is; we use the left gradient color bar for it.

Figure 7: Comparison of the rank obtained by 100 random walks after total discovery of the vertices in the hb-graph and rank obtained in the exchange-based diffusion process.
Figure 8: Comparison of the rank obtained by 100 random walks after total discovery of the vertices in the hb-graph and (m-)degree of vertices

We have generated random walks on the hb-graphs: when the walker is on a vertex, a hb-edge is chosen randomly according to a probability distribution over the hb-edges; when the walker is on a hb-edge, a vertex is chosen randomly according to a probability distribution over the vertices of that hb-edge. We allow teleportation from a vertex to another vertex with a tunable value $\gamma$, which represents the probability of being teleported; $\gamma$ is kept fixed in the experiments. We count the number of passages of the walker through each vertex and each hb-edge. We stop the random walk when the hb-graph is fully explored. We repeat the random walk a varying number of times.
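For comparison purposes, a random walk of this kind can be sketched as below. The two probability distributions are assumptions of this sketch, not taken from the paper: a hb-edge is drawn in proportion to the multiplicity of the current vertex in it, and a vertex is drawn in proportion to its multiplicity in the chosen hb-edge. The teleportation probability `gamma = 0.1`, the function name `random_walk` and the data are hypothetical; every vertex is assumed to belong to at least one hb-edge and no hb-edge is empty.

```python
import random

def random_walk(H, gamma, rng):
    """Walk until every vertex and every hb-edge has been visited.

    H is a list of rows of multiplicities (vertices x hb-edges).
    Returns the visit counts of vertices and of hb-edges.
    """
    n, p = len(H), len(H[0])
    vertex_hits = [0] * n
    edge_hits = [0] * p
    v = rng.randrange(n)                 # random starting vertex
    vertex_hits[v] += 1
    while 0 in vertex_hits or 0 in edge_hits:
        if rng.random() < gamma:
            v = rng.randrange(n)         # teleportation to a random vertex
        else:
            # hb-edge drawn proportionally to the multiplicity of v in it
            j = rng.choices(range(p), weights=H[v])[0]
            edge_hits[j] += 1
            # vertex drawn proportionally to its multiplicity in hb-edge j
            v = rng.choices(range(n), weights=[H[i][j] for i in range(n)])[0]
        vertex_hits[v] += 1
    return vertex_hits, edge_hits

# Toy data (hypothetical): 4 vertices, 3 hb-edges.
H = [[2, 0, 0], [1, 3, 0], [0, 1, 2], [0, 1, 0]]
vertex_hits, edge_hits = random_walk(H, 0.1, random.Random(0))
```

Ranking vertices by `vertex_hits` averaged over many such walks is the kind of baseline the exchange-based values are compared against.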

Figure 7 shows that after 100 iterations there is weak correlation between the rank obtained by the random walk and our diffusion process. There is no correlation at all with the m-degree of the vertices and the degree of vertices as shown in Figure 8. 100 iterations for the random walk take 6.31 s while it takes 0.009 ms to achieve the 5 iterations needed in the exchange-based approach.

VI Future work and Conclusion

The results obtained by using hb-graphs highlight the possibility of using hb-edges for analyzing networks; they confirm that vertices are highlighted due to their connectivity. The highlighting of the hb-edges has been achieved by using the intermediate step of our diffusion process: to obtain it without simply ranking hb-edges by m-cardinality, we normalized it. Different applications can be considered, in particular in the search of tagged multimedia documents: sharing of keywords, geographic location, or any valuable information contained in the annotations. Ranking tagged documents by this means could help in creating summaries for visualisation. We see our approach as a strong basis to refine the approach of [9].

Acknowledgments

This work is part of the PhD of Xavier OUVRARD, done at UniGe, supervised by Stéphane MARCHAND-MAILLET and funded by a doctoral position at CERN, in the Collaboration Spotting team, supervised by Jean-Marie LE GOFF. Special thanks to André RATTINGER for the daily exchanges we have on our respective PhDs.

References

  • [1] X. Ouvrard, J.-M. Le Goff, and S. Marchand-Maillet, “A hypergraph based framework for modelisation and visualisation of high dimension multi-facetted data,” Soon on Arxiv, 2018.
  • [2] D. Zhou, J. Huang, and B. Schölkopf, “Learning with hypergraphs: Clustering, classification, and embedding,” in Advances in neural information processing systems, pp. 1601–1608, 2007.
  • [3] A. Bellaachia and M. Al-Dhelaan, “Random walks in hypergraph,” in Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods, Venice Italy, pp. 187–194, 2013.
  • [4] M. Dörk, N. H. Riche, G. Ramos, and S. Dumais, “Pivotpaths: Strolling through faceted information spaces,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2709–2718, 2012.
  • [5] C. Berge and E. Minieka, Graphs and hypergraphs, vol. 7. North-Holland publishing company Amsterdam, 1973.
  • [6] A. Ducournau and A. Bretto, “Random walks in directed hypergraphs and application to semi-supervised image segmentation,” Computer Vision and Image Understanding, vol. 120, pp. 91–102, 2014.
  • [7] J. Lee, M. Cho, and K. M. Lee, “Hyper-graph matching via reweighted random walks,” in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 1633–1640, IEEE, 2011.
  • [8] L. Lu and X. Peng, “High-ordered random walks and generalized laplacians on hypergraphs.,” in WAW, pp. 14–25, Springer, 2011.
  • [9] Z. Xu, J. Du, L. Ye, and D. Fan, “Multi-feature indexing for image retrieval based on hypergraph,” in Cloud Computing and Intelligence Systems (CCIS), 2016 4th International Conference on, pp. 494–500, IEEE, 2016.
  • [10] D. Singh, A. Ibrahim, T. Yohanna, and J. Singh, “An overview of the applications of multisets,” Novi Sad Journal of Mathematics, vol. 37, no. 3, pp. 73–92, 2007.