Tuning Ranking in Co-occurrence Networks with General Biased Exchange-based Diffusion on Hyper-bag-graphs

03/16/2020 ∙ by Xavier Ouvrard, et al. ∙ CERN

Co-occurrence networks can be adequately modeled by hyper-bag-graphs (hb-graphs for short). A hb-graph is a family of multisets having the same universe, called the vertex set. An efficient exchange-based diffusion scheme has been previously proposed that allows the ranking of both vertices and hb-edges. In this article, we extend this scheme to allow biases of different kinds and explore their effect on the different rankings obtained. The biases strengthen the emphasis on particular aspects of the network.

1 Introduction

Co-occurrence networks can be modeled efficiently by hyper-bag-graphs (hb-graphs for short), introduced in [1]. Depending on the information the co-occurrence network carries, the ranking of the information held by the associated hb-graph has to be performed on different features, with the emphasis placed on lower, higher or medium values. Hence the need to extend the exchange-based diffusion, already coupled to a biased random walk in [2], to a more general approach using biases. We give the background in Section 2, propose a framework to achieve this kind of diffusion in Section 3, and evaluate it in Section 4, before concluding in Section 5.

2 Mathematical Background and Related Work

A hb-graph $\mathcal{H} = (V, E)$ is a family of multisets $E = (e_j)_{1 \leq j \leq p}$ of same universe $V = \{v_i : 1 \leq i \leq n\}$. The elements of $E$ are called the hb-edges; each hb-edge $e_j$ is a multiset of universe $V$ and of multiplicity function $m_j : V \to \mathbb{R}^+$. The m-cardinality of a hb-edge $e_j$ is $\#_m e_j = \sum_{i=1}^{n} m_j(v_i)$. The interested reader can refer to [3] for a full introduction to hb-graphs. A weighted hb-graph has hb-edges having a weight given by $w_e : E \to \mathbb{R}^+$.

In [4], the authors introduce an abstract information function $f : V \to \mathbb{R}^+$, which is associated to a probability for each vertex $v_i \in V$: $p^f(v_i) = \frac{f(v_i)}{\sum_{k=1}^{n} f(v_k)}$.

In [5], a bias is introduced in the transition probability of a random walk in order to explore communities in a network. The bias is related either to a vertex property, such as the degree, or to an edge property, such as the edge multiplicity or the shortest path betweenness. For a vertex property $x_i$, the new transition probability between vertex $v_i$ and $v_j$ is given by $T_{ij}(x, \beta) = \frac{a_{ij} e^{\beta x_i}}{\sum_{l} a_{lj} e^{\beta x_l}}$, where $A = (a_{ij})$ is the adjacency matrix of the graph and $\beta$ is a parameter.

The same kind of bias can be defined on the edges and combined with the former to obtain the overall transition probability from one vertex to another.
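As a minimal sketch of such a vertex-biased random walk, the snippet below builds a transition matrix in which the probability of stepping from vertex j to a neighbour i is proportional to `adj[i][j] * exp(beta * x[i])`. The column-normalised convention, the toy graph, and the choice of the degree as the vertex property are assumptions made for illustration; the function name is hypothetical.

```python
import math

def biased_transition(adj, x, beta):
    """Column-stochastic matrix T, where T[i][j] is the probability of
    stepping from vertex j to vertex i, proportional to
    adj[i][j] * exp(beta * x[i]). Assumes no isolated vertex."""
    n = len(adj)
    T = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = sum(adj[l][j] * math.exp(beta * x[l]) for l in range(n))
        for i in range(n):
            T[i][j] = adj[i][j] * math.exp(beta * x[i]) / norm
    return T

# Hypothetical toy graph: path 0 - 1 - 2 - 3 plus the edge 1 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
degree = [sum(row) for row in adj]  # vertex property used for the bias

T_unbiased = biased_transition(adj, degree, 0.0)  # beta = 0: plain random walk
T_biased = biased_transition(adj, degree, 1.0)    # beta > 0: favours high degree
```

With a positive `beta`, a walker leaving vertex 2 moves to the higher-degree neighbour 1 more often than to neighbour 3; a negative `beta` would favour low-degree neighbours instead.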

3 Biased Diffusion in Hb-graphs

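We consider a weighted hb-graph $\mathcal{H} = (V, E, w_e)$ with $|V| = n$ and $|E| = p$, and we write $H = [m_j(v_i)]_{1 \leq i \leq n,\, 1 \leq j \leq p}$ the incidence matrix of the hb-graph.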
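The objects introduced so far can be sketched in a few lines. In this illustration each hb-edge is stored as a `Counter` mapping a vertex to its multiplicity; the toy vertices, the multiplicities and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical toy hb-graph: every hb-edge is a multiset over the vertex set.
vertices = ["a", "b", "c", "d"]
hb_edges = [
    Counter({"a": 2, "b": 1}),
    Counter({"b": 1, "c": 3}),
    Counter({"c": 1, "d": 1}),
]

def m_cardinality(hb_edge):
    """m-cardinality: the sum of the multiplicities of the hb-edge."""
    return sum(hb_edge.values())

def incidence_matrix(vertices, hb_edges):
    """Rows are vertices, columns are hb-edges, entries are multiplicities
    (a Counter returns 0 for a vertex that is absent from the hb-edge)."""
    return [[e[v] for e in hb_edges] for v in vertices]

m_cards = [m_cardinality(e) for e in hb_edges]
H = incidence_matrix(vertices, hb_edges)
```

A weighted hb-graph would additionally carry one weight per hb-edge, for instance as a list aligned with `hb_edges`.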

3.1 Abstract Information Functions and Bias

We consider a hb-edge based vertex abstract information function $f_V : V \times E \to \mathbb{R}^+$. The exchange-based diffusion presented in [6, 2] is a particular example of biased diffusion, where the biases are given in Table 1. An unbiased diffusion would have a vertex abstract function and a hb-edge vertex function that are set to 1 for every vertex and every hb-edge, i.e. equi-probability over all vertices and all hb-edges.

Hb-edge based vertex abstract information function
Vertex abstract information function
Vertex bias function
Vertex overall bias
Vertex-based hb-edge abstract information function
Hb-edge abstract information function
Hb-edge bias function
Hb-edge overall bias
Table 1: Features used in the exchange-based diffusion in [6].

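The vertex abstract information function is defined as the function $F_V : V \to \mathbb{R}^+$ such that $F_V(v_i) = \sum_{j=1}^{p} f_V(v_i, e_j)$. The probability corresponding to this hb-edge based vertex abstract information is defined as $p^{f_V}(e_j \mid v_i) = \frac{f_V(v_i, e_j)}{F_V(v_i)}$. If we now consider a vertex bias function $g_V : \mathbb{R}^+ \to \mathbb{R}^+$ applied to $f_V(v_i, e_j)$, we can define a biased probability on the transition from vertices to hb-edges as:

$\tilde{p}_V(e_j \mid v_i) = \frac{g_V(f_V(v_i, e_j))}{G_V(v_i)}$

where $G_V(v_i)$, the vertex overall bias, is defined as:

$G_V(v_i) = \sum_{j=1}^{p} g_V(f_V(v_i, e_j))$

Typical choices for $g_V$ are $g_V(x) = x^{\alpha}$ or $g_V(x) = e^{\alpha x}$. When $\alpha > 0$, higher values of $f_V$ are encouraged; on the contrary, when $\alpha < 0$, smaller values of $f_V$ are encouraged.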
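To make the effect of the bias concrete, here is a minimal sketch for a single vertex: the bias is applied to each value of the hb-edge based vertex abstract information function, then normalised by the sum of the biased values (the vertex overall bias). The power and exponential forms, the parameter name `alpha`, and the toy values are assumptions chosen for illustration.

```python
import math

def biased_probabilities(values, bias):
    """Biased transition probabilities from one vertex to its hb-edges.
    `values` maps a hb-edge index to the (positive) information value the
    vertex holds for that hb-edge; only incident hb-edges are listed."""
    biased = {e: bias(x) for e, x in values.items()}
    overall = sum(biased.values())  # vertex overall bias
    return {e: b / overall for e, b in biased.items()}

def power_bias(alpha):
    return lambda x: x ** alpha

def exp_bias(alpha):
    return lambda x: math.exp(alpha * x)

# Hypothetical values of one vertex for two incident hb-edges.
f_v = {0: 1.0, 1: 3.0}

identity = biased_probabilities(f_v, power_bias(1))    # plain proportional split
favour_high = biased_probabilities(f_v, power_bias(2)) # positive exponent
favour_low = biased_probabilities(f_v, power_bias(-2)) # negative exponent
exp_high = biased_probabilities(f_v, exp_bias(1.0))
```

A positive exponent shifts probability mass towards the hb-edge with the larger value, and a negative exponent towards the one with the smaller value.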

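Similarly, the vertex-based hb-edge abstract information function is defined as the function $f_E : E \times V \to \mathbb{R}^+$. The hb-edge abstract information function is defined as the function $F_E : E \to \mathbb{R}^+$ such that $F_E(e_j) = \sum_{i=1}^{n} f_E(e_j, v_i)$. The probability corresponding to the vertex-based hb-edge abstract information is defined as $p^{f_E}(v_i \mid e_j) = \frac{f_E(e_j, v_i)}{F_E(e_j)}$. Considering a hb-edge bias function $g_E : \mathbb{R}^+ \to \mathbb{R}^+$ applied to $f_E(e_j, v_i)$, a biased probability on the transition from hb-edges to vertices is defined as:

$\tilde{p}_E(v_i \mid e_j) = \frac{g_E(f_E(e_j, v_i))}{G_E(e_j)}$

where the hb-edge overall bias $G_E(e_j)$ is defined as:

$G_E(e_j) = \sum_{i=1}^{n} g_E(f_E(e_j, v_i))$

Typical choices for $g_E$ are $g_E(x) = x^{\alpha}$ or $g_E(x) = e^{\alpha x}$. When $\alpha > 0$, higher values of $f_E$ are encouraged; on the contrary, when $\alpha < 0$, smaller values of $f_E$ are encouraged.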

3.2 Biased Diffusion by Exchange

A two-phase step diffusion by exchange is now considered, with an approach similar to [6, 2], taking into account the biased probabilities on vertices and hb-edges.

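The vertices hold an information value at time $t$ given by $\alpha_t : V \to [0, 1]$.

The hb-edges hold an information value at time $t$ given by $\epsilon_t : E \to [0, 1]$.

We write $P_{V,t} = (\alpha_t(v_i))_{1 \leq i \leq n}$ the row state vector of the vertices at time $t$ and $P_{E,t} = (\epsilon_t(e_j))_{1 \leq j \leq p}$ the row state vector of the hb-edges. We call information value of the vertices the value $I_t(V) = \sum_{i=1}^{n} \alpha_t(v_i)$, and $I_t(E) = \sum_{j=1}^{p} \epsilon_t(e_j)$ the one of the hb-edges. We write $I_t(\mathcal{H}) = I_t(V) + I_t(E)$.

The initialisation is done such that $I_0(\mathcal{H}) = 1$. At the start of the diffusion process, the vertices concentrate uniformly and exclusively all the information value. Writing $\alpha_{\mathrm{ref}} = \frac{1}{n}$, we set $\alpha_0(v_i) = \alpha_{\mathrm{ref}}$ for all $v_i \in V$ and $\epsilon_0(e_j) = 0$ for all $e_j \in E$.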

At every time step, the first phase starts at time $t$ and ends at $t + \frac{1}{2}$, where the values held by the vertices are shared completely with the hb-edges, followed by the second phase between time $t + \frac{1}{2}$ and $t + 1$, where the exchanges take place the other way round. The exchanges between vertices and hb-edges aim at being conservative on the global value of $\alpha_t$ and $\epsilon_t$ distributed over the hb-graph.

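During the first phase between time $t$ and time $t + \frac{1}{2}$, the contribution to the value $\epsilon_{t+\frac{1}{2}}(e_j)$ from the vertex $v_i$ is given by:

$\delta\epsilon_{t+\frac{1}{2}}(e_j \mid v_i) = \tilde{p}_V(e_j \mid v_i)\, \alpha_t(v_i)$

and:

$\epsilon_{t+\frac{1}{2}}(e_j) = \sum_{i=1}^{n} \delta\epsilon_{t+\frac{1}{2}}(e_j \mid v_i)$

We have:

$\alpha_{t+\frac{1}{2}}(v_i) = \alpha_t(v_i) - \sum_{j=1}^{p} \delta\epsilon_{t+\frac{1}{2}}(e_j \mid v_i)$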

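Claim 1 (No information on vertices at $t + \frac{1}{2}$).

It holds: $\forall v_i \in V : \alpha_{t+\frac{1}{2}}(v_i) = 0$.

Proof.

For all $v_i \in V$: $\alpha_{t+\frac{1}{2}}(v_i) = \alpha_t(v_i) - \sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i)\, \alpha_t(v_i) = \alpha_t(v_i) \left(1 - \frac{\sum_{j=1}^{p} g_V(f_V(v_i, e_j))}{G_V(v_i)}\right) = 0$.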

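Claim 2 (Conservation of the information of the hb-graph at $t + \frac{1}{2}$).

It holds: $I_{t+\frac{1}{2}}(V) + I_{t+\frac{1}{2}}(E) = 1$.

Proof.

We have: $I_{t+\frac{1}{2}}(V) + I_{t+\frac{1}{2}}(E) = I_{t+\frac{1}{2}}(E) = \sum_{j=1}^{p} \sum_{i=1}^{n} \tilde{p}_V(e_j \mid v_i)\, \alpha_t(v_i) = \sum_{i=1}^{n} \alpha_t(v_i) \sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i) = I_t(V) = 1$, since the hb-edges hold no value at integer times.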

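We introduce the vertex overall bias matrix $G_V = \mathrm{diag}\left((G_V(v_i))_{1 \leq i \leq n}\right)$ and the biased vertex-feature matrix $B_V = [g_V(f_V(v_i, e_j))]_{1 \leq i \leq n,\, 1 \leq j \leq p}$. It holds:

$P_{E,t+\frac{1}{2}} = P_{V,t}\, G_V^{-1} B_V$ (1)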

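During the second phase that starts at time $t + \frac{1}{2}$, the values held by the hb-edges are transferred to the vertices. The contribution to $\alpha_{t+1}(v_i)$ given by a hb-edge $e_j$ is proportional to $\epsilon_{t+\frac{1}{2}}(e_j)$ in a factor corresponding to the biased probability $\tilde{p}_E(v_i \mid e_j)$: $\delta\alpha_{t+1}(v_i \mid e_j) = \tilde{p}_E(v_i \mid e_j)\, \epsilon_{t+\frac{1}{2}}(e_j)$.

Hence, we have: $\alpha_{t+1}(v_i) = \sum_{j=1}^{p} \delta\alpha_{t+1}(v_i \mid e_j)$ and: $\epsilon_{t+1}(e_j) = \epsilon_{t+\frac{1}{2}}(e_j) - \sum_{i=1}^{n} \delta\alpha_{t+1}(v_i \mid e_j)$.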

Claim 3 (The hb-edges have no value at $t + 1$).

It holds: $\forall e_j \in E : \epsilon_{t+1}(e_j) = 0$.

Proof.

Similar to the one of the first phase for $\alpha_{t+\frac{1}{2}}(v_i)$.

Claim 4 (Conservation of the information of the hb-graph at $t + 1$).

It holds: $I_{t+1}(V) + I_{t+1}(E) = 1$.

Proof.

Similar to the one for the first phase.
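The two phases and the conservation claims can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions: the biased values are passed in directly as two small matrices (zero where a vertex and a hb-edge are not incident), every vertex belongs to at least one hb-edge, and the identity bias with unit weights is used for the example. All names are hypothetical.

```python
def row_normalise(M):
    """Divide every row by its sum, turning biased values into probabilities."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row])
    return out

def exchange_step(alpha, B_V, B_E):
    """One two-phase step of the exchange.

    alpha : values held by the n vertices at time t
    B_V   : n x p biased vertex-side values (0 where not incident)
    B_E   : p x n biased hb-edge-side values (0 where not incident)
    Returns the hb-edge values at t + 1/2 and the vertex values at t + 1.
    """
    to_edges = row_normalise(B_V)      # biased probabilities vertex -> hb-edge
    to_vertices = row_normalise(B_E)   # biased probabilities hb-edge -> vertex
    n, p = len(B_V), len(B_E)
    # Phase 1: every vertex hands over all of its value to its hb-edges.
    eps_half = [sum(alpha[i] * to_edges[i][j] for i in range(n)) for j in range(p)]
    # Phase 2: every hb-edge hands over all of its value to its vertices.
    alpha_next = [sum(eps_half[j] * to_vertices[j][i] for j in range(p)) for i in range(n)]
    return eps_half, alpha_next

# Toy incidence matrix (4 vertices, 3 hb-edges), identity bias, unit weights.
H = [[2, 0, 0], [1, 1, 0], [0, 3, 1], [0, 0, 1]]
B_V = H
B_E = [list(col) for col in zip(*H)]
alpha_0 = [1 / len(H)] * len(H)        # uniform initialisation on the vertices

eps_half, alpha_1 = exchange_step(alpha_0, B_V, B_E)
```

Because each phase transfers the whole value of the sending side, the total held by the hb-edges after the first phase and by the vertices after the second phase both equal the initial total.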

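We now introduce $G_E = \mathrm{diag}\left((G_E(e_j))_{1 \leq j \leq p}\right)$, the diagonal matrix of size $p \times p$, and the biased hb-edge-feature matrix $B_E = [g_E(f_E(e_j, v_i))]_{1 \leq j \leq p,\, 1 \leq i \leq n}$; it comes:

$P_{V,t+1} = P_{E,t+\frac{1}{2}}\, G_E^{-1} B_E$ (2)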

Regrouping (1) and (2):

$P_{V,t+1} = P_{V,t}\, G_V^{-1} B_V\, G_E^{-1} B_E$ (3)

It is valuable to keep a trace of the intermediate state $P_{E,t+\frac{1}{2}} = P_{V,t}\, G_V^{-1} B_V$, as it records the information on hb-edges.

Writing $T = G_V^{-1} B_V\, G_E^{-1} B_E$, it follows from (3): $P_{V,t+1} = P_{V,t}\, T$.

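Claim 5 (Stochastic transition matrix).

$T$ is a square row stochastic matrix of dimension $n$.

Proof.

Let: $Q_V = G_V^{-1} B_V$ and: $Q_E = G_E^{-1} B_E$. $Q_V$ and $Q_E$ are non-negative rectangular matrices, of respective sizes $n \times p$ and $p \times n$. Moreover:

  • $Q_V = [\tilde{p}_V(e_j \mid v_i)]_{1 \leq i \leq n,\, 1 \leq j \leq p}$ and: $\sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i) = 1$ for all $i$

  • $Q_E = [\tilde{p}_E(v_i \mid e_j)]_{1 \leq j \leq p,\, 1 \leq i \leq n}$ and: $\sum_{i=1}^{n} \tilde{p}_E(v_i \mid e_j) = 1$ for all $j$

We have: $T = Q_V Q_E = [t_{ik}]_{1 \leq i, k \leq n}$ where: $t_{ik} = \sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i)\, \tilde{p}_E(v_k \mid e_j)$

It yields: $\sum_{k=1}^{n} t_{ik} = \sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i) \sum_{k=1}^{n} \tilde{p}_E(v_k \mid e_j) = \sum_{j=1}^{p} \tilde{p}_V(e_j \mid v_i) = 1$

Hence $T$ is a non-negative square matrix with its row sums all equal to 1: it is a row stochastic matrix.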
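The row-stochastic property can be verified numerically. The following sketch builds the transition matrix as the product of the two row-normalised biased matrices and checks its row sums. Several choices here are assumptions made for the illustration: the information functions are taken to be the multiplicities with unit weights, the bias is applied only to incident vertex and hb-edge pairs (otherwise an exponential bias would give a non-zero value to non-incident pairs), and the particular biases are arbitrary.

```python
import math

def biased_matrix(F, bias):
    """Apply the bias entrywise, keeping 0 where vertex and hb-edge are not incident."""
    return [[bias(x) if x > 0 else 0.0 for x in row] for row in F]

def row_normalise(M):
    """Divide every row by its sum (the overall bias of that row)."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row])
    return out

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transition_matrix(H, g_V, g_E):
    """Vertex-to-vertex transition matrix of one full two-phase step."""
    F_V = H                                  # vertex-side values (assumed: multiplicities)
    F_E = [list(col) for col in zip(*H)]     # hb-edge-side values (assumed: multiplicities)
    Q_V = row_normalise(biased_matrix(F_V, g_V))   # n x p
    Q_E = row_normalise(biased_matrix(F_E, g_E))   # p x n
    return matmul(Q_V, Q_E)                  # n x n

H = [[2, 0, 0], [1, 1, 0], [0, 3, 1], [0, 0, 1]]
T = transition_matrix(H, lambda x: x ** 2, lambda x: math.exp(-x))
row_sums = [sum(row) for row in T]
```

On this toy case every diagonal entry of `T` is also strictly positive, since each vertex receives back part of what it gave to its hb-edges.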

Claim 6 (Properties of T).

Assuming that the hb-graph is connected, the biased feature exchange-based diffusion matrix $T$ is aperiodic and irreducible.

Proof.

This stochastic matrix is aperiodic, due to the fact that any vertex of the hb-graph retrieves a part of the value it has given to the hb-edge, hence $t_{ii} > 0$ for all $i$. Moreover, as the hb-graph is connected, the matrix is irreducible, as any state can be reached from any other state.

The fact that $T$ is an aperiodic and irreducible stochastic matrix for a connected hb-graph ensures that $(P_{V,t})_{t}$ converges to a stationary state, which is the probability vector $\pi_V$ associated to the eigenvalue 1 of $T$. Nonetheless, due to the presence of the different functions for vertices and hb-edges, the simplifications no longer occur as in [6, 2], and thus we do not have an explicit expression for the stationary state vector of the vertices.

The same occurs for the expression of the hb-edge stationary state vector $\pi_E$, which is still calculated from $\pi_V$ using the following formula: $\pi_E = \pi_V\, G_V^{-1} B_V$.
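Since no closed form is available in general, the stationary states can be approximated by simply iterating the two-phase step. The sketch below does this on a toy case; the biased matrices are passed in directly, the number of iterations is a free parameter, and all names are hypothetical. For the identity bias with unit weights used in the example, the fixed point can be checked by hand: the vertex stationary vector is proportional to the row sums of the incidence matrix.

```python
def row_normalise(M):
    """Divide every row by its sum, turning biased values into probabilities."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row])
    return out

def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def stationary_states(B_V, B_E, iterations=200):
    """Approximate the vertex and hb-edge stationary vectors by iteration.

    B_V : n x p biased vertex-side values (0 where not incident)
    B_E : p x n biased hb-edge-side values (0 where not incident)
    """
    to_edges = row_normalise(B_V)
    to_vertices = row_normalise(B_E)
    n = len(B_V)
    pi_V = [1.0 / n] * n                       # uniform start on the vertices
    for _ in range(iterations):
        pi_V = vec_mat(vec_mat(pi_V, to_edges), to_vertices)
    pi_E = vec_mat(pi_V, to_edges)             # intermediate (hb-edge) state
    return pi_V, pi_E

H = [[2, 0, 0], [1, 1, 0], [0, 3, 1], [0, 0, 1]]
B_V = H                                        # identity bias, unit weights
B_E = [list(col) for col in zip(*H)]
pi_V, pi_E = stationary_states(B_V, B_E)

# Rank vertices and hb-edges by decreasing stationary value.
vertex_ranking = sorted(range(len(pi_V)), key=lambda i: pi_V[i], reverse=True)
hb_edge_ranking = sorted(range(len(pi_E)), key=lambda j: pi_E[j], reverse=True)
```

Swapping in other biased matrices changes the stationary vectors and therefore the two rankings, which is the tuning mechanism studied in the experiments.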

4 Results and Evaluation

We consider different biases on randomly generated hb-graphs, still using the same features as in the exchange-based diffusion realized in [6, 2]. We generate hb-graphs with 200 collaborations, built out of 10,000 potential vertices, with a maximum m-cardinality of 20, such that the hb-graph has five groups, each generated with two vertices, chosen out of a group of 10, that have to occur in each of its collaborations; there are 20 vertices that stand as central vertices, i.e. that ensure the connectivity between the different groups of the hb-graph.

The approach is similar to the one taken in [6, 2], using the same hb-edge based vertex abstract information function and the same vertex-based hb-edge abstract information function, but with different biases, as presented in Table 2.

Experiment 1 2 3 4 5
Vertex bias function
Hb-edge bias function
Experiment 6 7 8 9
Vertex bias function
Hb-edge bias function
Experiment 10 11 12 13 14 15
Vertex bias function
Hb-edge bias function
Table 2: Biases used during the 15 experiments.

We compare the rankings obtained on vertices and hb-edges after 200 iterations of the exchange-based diffusion, using the strict and large Kendall tau correlation coefficients, for the different biases proposed in Table 2. We present the results as a visualisation of correlation matrices in Figure 1 and in Figure 2, the rows and columns of these matrices being ordered by the experiment index given in Table 2.
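As an illustration of how two score vectors can be compared, the sketch below computes the standard tie-aware Kendall tau-b coefficient in pure Python. This is a stand-in chosen for the example: it is not claimed to coincide with the strict or large variants used in the experiments, and the score lists are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall tau-b between two equally long score lists.
    Pairs tied in both lists are ignored; pairs tied in only one list
    enter the normalisation. Undefined if one list is constant."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_in_x = concordant + discordant + tied_y_only
    untied_in_y = concordant + discordant + tied_x_only
    return (concordant - discordant) / math.sqrt(untied_in_x * untied_in_y)

# Hypothetical stationary values of four vertices under two different biases.
scores_reference = [0.4, 0.3, 0.2, 0.1]
scores_biased = [0.4, 0.2, 0.3, 0.1]

tau_same = kendall_tau_b(scores_reference, scores_reference)
tau_swapped = kendall_tau_b(scores_reference, scores_biased)
tau_reversed = kendall_tau_b(scores_reference, scores_reference[::-1])
```

Computing such a coefficient for every pair of experiments yields a correlation matrix of the kind visualised in the figures.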

We write $\sigma_{i,t}$ the ranking obtained with the biases of Experiment $i$, for $t \in \{V, E\}$ indicating whether the ranking is performed on vertices or hb-edges; the absence of $t$ means that the statement holds for both rankings. The ranking obtained by Experiment 1 is called the reference ranking.

In Experiments 2 to 5, the same bias is applied to both vertices and hb-edges. In Experiments 2 and 3, the biases are increasing functions, while in Experiments 4 and 5 they are decreasing functions.

Experiments 2 and 3 lead to rankings that are well correlated with the reference ranking, given the large Kendall tau correlation coefficient value. The higher value of the large coefficient compared to the strict one marks the fact that the rankings with pairs of similar biases agree on the ties in this case. The exponential bias yields a ranking that is more granular in the tail for vertices, and reshuffles the way the hb-edges are ranked; similar observations can be made for both the vertex and hb-edge rankings in Experiments 2 and 3.

In Experiments 4 and 5, the rankings remain well correlated with the reference ranking, but the large Kendall tau correlation coefficient values show that there is much less agreement on the ties; this disagreement is very localized in the rankings, with again more discrimination with an exponential bias. These slight changes imply a reshuffling of the hb-edge rankings in both cases, significantly emphasized by the exponential form.

None of these simultaneous pairs of biases reshuffles the head of the vertex rankings very differently, but most of them affect the head of the hb-edge rankings: typical examples are given in Figure 3. This would need further investigation using the Jaccard index.
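One way such an investigation could be set up is to compare the heads of two rankings with the Jaccard index of their top-k sets. The sketch below is only an illustration: the value of `k`, the score dictionaries and the function names are hypothetical.

```python
def top_k(scores, k):
    """Set of the k items with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def jaccard(a, b):
    """Jaccard index of two non-empty sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)

# Hypothetical stationary values of four hb-edges under two pairs of biases.
scores_first = {"e1": 0.40, "e2": 0.30, "e3": 0.20, "e4": 0.10}
scores_second = {"e1": 0.35, "e2": 0.10, "e3": 0.30, "e4": 0.25}

head_overlap = jaccard(top_k(scores_first, 2), top_k(scores_second, 2))
```

A value of 1 means the two heads contain the same items, and a value close to 0 means the bias has almost entirely replaced the head of the ranking.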

Dissimilarities in rankings occur when the bias is applied only to vertices or only to hb-edges. The strict Kendall tau correlation coefficients between the reference ranking and the rankings obtained when applying the biases of Experiments 6 to 9 (bias on vertices) and 10 to 13 (bias on hb-edges) show weak consistency for vertices, with values around 0.4 (Figure 1 (a)), while the large Kendall tau correlation coefficient values show a small disagreement, with values around -0.1 (Figure 1 (b)). For hb-edges, the gap is much smaller between the strict values, around 0.7 as shown in Figure 2 (a), and the large Kendall tau correlation coefficient values, around 0.6 as shown in Figure 2 (b).

Biases with the same monotonic variation have similar effects, independently of whether they are applied only to vertices or only to hb-edges. It is also worth remarking that increasing biases lead to rankings that have no specific agreement or disagreement with the rankings of decreasing biases.

We also remark that increasing biases applied only to vertices correlate with the corresponding decreasing biases applied only to hb-edges, and vice-versa. This is the case for Experiments 6 and 12, Experiments 7 and 13, Experiments 8 and 10, and Experiments 9 and 11, for both vertices (Figures 1 (a) and (b)) and hb-edges (Figures 2 (a) and (b)).

Finally, we conduct two more experiments, Experiments 14 and 15, combining two of the biases in two different manners. Unsurprisingly, they reinforce the disagreement with the reference ranking on both vertices and hb-edges, with a stronger disagreement when the decreasing bias is put on vertices. We can remark that Experiment 14 has the strongest correlations with the rankings of dissimilar biases that are similar either to its vertex bias (Experiments 6 and 7) or to its hb-edge bias (Experiments 12 and 13).

Figure 1: Strict (a) and large (b) Kendall tau correlation coefficient for vertex ranking with biases. Realized on 100 random hb-graphs with 200 hb-edges of maximal size 20, with 5 groups.
Figure 2: Strict (a) and large (b) Kendall tau correlation coefficient for hb-edge ranking with biases. Realized on 100 random hb-graphs with 200 hb-edges of maximal size 20, with 5 groups.
Figure 3: Effect of vertex biases on ranking. Each of the panels (a) to (d) compares a first and a second ranking obtained with different pairs of biases.

A last remark concerns the variability of the results: although the values of the correlation coefficients change from one hb-graph to another, the phenomenon observed remains the same, whatever the first hb-graph observed; moreover, the number of experiments performed already keeps the fluctuation in these results small.

5 Further Comments

The biased exchange-based diffusion proposed in this article provides a tunable diffusion that can be integrated into the hb-graph framework to tune the ranking of the facets adequately. The results obtained on randomly generated hb-graphs still have to be confirmed on real hb-graphs, with the known difficulty of connectedness: this will be addressed in future work. There remains a lot to explore on the subject in order to refine the query results obtained with real searches. The difficulty remains that, in ground truth classification by experts, only a few criteria can be retained, which ends up in most cases in pairwise comparisons of elements and, hence, does not account for higher order relationships.

References