1 Introduction
Graphs are ubiquitous in machine learning, where they are used to represent pairwise relationships between objects. For example, social networks, protein-protein interaction (PPI) networks, and the internet are modeled with graphs. One limitation of graph models, however, is that they do not encode higher-order relationships between objects. A social network can represent a community of users (e.g. a friend group) as a collection of edges between each pair of users, but this pairwise representation loses information about the overall group structure [38]. In biology, protein interactions occur not only between pairs of proteins, but also between groups of proteins in protein complexes [32, 33].
Such higher-order interactions can be modeled using a hypergraph: a generalization of a graph containing hyperedges that can be incident to more than two nodes. A hypergraph representation of a social network can model a community of friends with a single hyperedge. In contrast, the corresponding representation of a community in a graph requires many edges that connect pairs of individuals within the community; moreover, it may not be clear which collection of edges in a graph represents a community (e.g. a clique, an edge-dense subnetwork, etc.). Hypergraphs have been used in a variety of machine learning tasks, including clustering [1, 43, 27, 28], ranking keywords in a collection of documents [5], predicting customer behavior in e-commerce [26], object classification [42, 41], and image segmentation [24].
A common approach to incorporating graph information into a machine learning algorithm is to utilize properties of random walks or diffusion processes on the graph. For example, random walks on graphs underlie algorithms for recommendation systems [21], clustering [18, 31], information retrieval [6], and other applications. In many machine learning applications, the graph is represented through the graph Laplacian. Spectral graph theory includes many key results regarding the eigenvalues and eigenvectors of the graph Laplacian, and these results form the foundation of spectral learning algorithms.
Spectral theory on hypergraphs is much less developed than on graphs. In seminal work, Zhou et al. [43] developed learning algorithms on hypergraphs based on a hypergraph random walk. However, at nearly the same time, Agarwal et al. [2] showed that the hypergraph Laplacian matrix used by Zhou et al. is equal to the Laplacian matrix of a closely related graph, the star graph. A consequence of this equivalence is that the methods introduced by Zhou et al. utilize only pairwise relationships between objects, rather than the higher-order relationships encoded in the hypergraph. More recently, Chan et al. [7] and Li and Milenkovic [27, 28] developed nonlinear Laplacian operators for hypergraphs that partially address this issue. However, all existing constructions of linear Laplacian operators utilize only pairwise relationships between vertices, as shown by Agarwal et al. [2].
In this paper, we develop a spectral theory for hypergraphs with edge-dependent vertex weights. In such a hypergraph, each hyperedge $e$ has an edge weight $\omega(e)$, and each vertex $v$ has a collection of vertex weights $\gamma_e(v)$, with one weight for each hyperedge $e$ incident to $v$. The edge-dependent vertex weight $\gamma_e(v)$ models the contribution of vertex $v$ to hyperedge $e$. Edge-dependent vertex weights have previously been used in several applications, including: image segmentation, where the weights represent the probability of an image pixel (vertex) belonging to a segment (hyperedge) [11]; e-commerce, where the weights model the quantity of a product (hyperedge) in a user's shopping basket (vertex) [26]; and text ranking, where the weights represent the importance of a keyword (vertex) to a document (hyperedge) [5]. Hypergraphs with edge-dependent vertex weights have also been used in image search [40, 20] and 3D object classification [42], where the weights represent contributions of vertices in a k-nearest-neighbors hypergraph.
Unfortunately, because of the lack of a spectral theory for hypergraphs with edge-dependent vertex weights, many of the papers that use these hypergraphs rely on incorrect or theoretically unsound assumptions. For example, Zhang et al. [42] and Ding and Yilmaz [11] use a hypergraph Laplacian with no spectral guarantees, while Li et al. [26] derive an incorrect stationary distribution for a random walk on such a hypergraph (see Supplement for additional details). Such issues arise because existing spectral methods are developed for hypergraphs with edge-independent vertex weights, i.e. hypergraphs where the weights $\gamma_e(v)$ are identical for all hyperedges $e$ incident to $v$.
In this paper, we derive several results for hypergraphs with edge-dependent vertex weights. First, we show that random walks on hypergraphs with edge-independent vertex weights are always equivalent to random walks on the clique graph (Figure 1). This generalizes the results of Agarwal et al. [2] and gives the underlying reason why existing constructions of hypergraph Laplacian matrices [34, 43] do not utilize the higher-order relations of the hypergraph.
Motivated by this result, we derive a random-walk-based Laplacian matrix for hypergraphs with edge-dependent vertex weights that utilizes the higher-order relations expressed in the hypergraph structure. This Laplacian matrix satisfies the typical properties one would expect of a Laplacian matrix, including being positive semidefinite and satisfying a Cheeger inequality. We also derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights, and give a bound on the mixing time of the random walk.
Our paper is organized as follows. In Section 2, we define our notation and introduce hypergraphs with edge-dependent vertex weights. In Section 3, we formally define random walks on hypergraphs with edge-dependent vertex weights, and show that when the vertex weights are edge-independent, a random walk on a hypergraph has the same transition matrix as a random walk on its clique graph. In Section 4, we derive a formula for the stationary distribution of a random walk, and use it to bound the mixing time. In Section 5, we derive a random-walk-based Laplacian matrix for hypergraphs with edge-dependent vertex weights and show some basic properties of the matrix. Finally, in Section 6, we demonstrate two applications of hypergraphs with edge-dependent vertex weights: ranking authors in a citation network and ranking players in a video game. All proofs are in the Supplementary Material.
2 Graphs, Hypergraphs, and Random Walks
Let $G = (V, E)$ be a graph with vertex set $V$, edge set $E$, and edge weights $w_{vw}$ for each edge $(v, w) \in E$. For a vertex $v$, let $N(v)$ denote the vertices adjacent to $v$. The adjacency matrix of a graph is the $|V| \times |V|$ matrix $A$ with $A_{vw} = w_{vw}$ if $(v, w) \in E$ and $A_{vw} = 0$ otherwise.
Let $H = (V, E)$ be a hypergraph with vertex set $V$, hyperedge set $E$, and hyperedge weights $\omega(e)$. A graph is a special case of a hypergraph, where each hyperedge has size 2. For hypergraphs, the terms "hyperedge" and "edge" are used interchangeably. A random walk on a hypergraph is typically defined as follows [43, 12, 9, 4]. At time $t$, a "random walker" at vertex $v$ will:

1. Select an edge $e$ containing $v$, with probability proportional to $\omega(e)$.

2. Select a vertex $w$ from $e$, uniformly at random.

3. Move to vertex $w$ at time $t + 1$.
A natural extension is to modify Step 2: instead of choosing $w$ uniformly at random from $e$, we pick $w$ according to a probability distribution on the vertices in $e$. This motivates the following definition of a hypergraph with edge-dependent vertex weights.
A hypergraph $H = (V, E)$ with edge-dependent vertex weights consists of a set of vertices $V$; a set $E$ of hyperedges; a weight $\omega(e)$ for every hyperedge $e \in E$; and a weight $\gamma_e(v)$ for every hyperedge $e$ and every vertex $v$ incident to $e$. We emphasize that a vertex $v$ in a hypergraph with edge-dependent vertex weights has multiple weights: one weight $\gamma_e(v)$ for each hyperedge $e$ that contains $v$. Intuitively, $\gamma_e(v)$ measures the contribution of vertex $v$ to hyperedge $e$. In a random walk on a hypergraph with edge-dependent vertex weights, the random walker will pick a vertex $w$ from hyperedge $e$ with probability proportional to $\gamma_e(w)$. Note that we set $\gamma_e(v) = 0$ if $v \notin e$. We show an example of a hypergraph with edge-dependent vertex weights in Figure 1.
If each vertex has the same contribution to all incident hyperedges, i.e. $\gamma_e(v) = \gamma_{e'}(v)$ for all hyperedges $e$ and $e'$ incident to $v$, then we say that the hypergraph has edge-independent vertex weights, and we use $\gamma(v)$ to refer to the vertex weight of $v$. If $\gamma_e(v) = 1$ for all vertices $v$ and incident hyperedges $e$, we say the vertex weights are trivial.
We define $E(v)$ to be the set of hyperedges incident to a vertex $v$, and $E(v) \cap E(w)$ to be the hyperedges incident to both vertices $v$ and $w$. Let $d(v) = \sum_{e \in E(v)} \omega(e)$ denote the degree of vertex $v$, and let $\delta(e) = \sum_{v \in e} \gamma_e(v)$ denote the degree of hyperedge $e$. The vertex-weight matrix of a hypergraph with edge-dependent vertex weights is the $|E| \times |V|$ matrix $R$ with entries $R(e, v) = \gamma_e(v)$, and the hyperedge weight matrix is the $|V| \times |E|$ matrix $W$ with $W(v, e) = \omega(e)$ if $v \in e$, and $W(v, e) = 0$ otherwise. The vertex-degree matrix $D_V$ is the $|V| \times |V|$ diagonal matrix with entries $D_V(v, v) = d(v)$, and the hyperedge-degree matrix $D_E$ is the $|E| \times |E|$ diagonal matrix with entries $D_E(e, e) = \delta(e)$.
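To make the matrix definitions concrete, the following is a minimal NumPy sketch; the toy hypergraph, its weights, and all variable names are illustrative choices of ours, not from the paper.

```python
import numpy as np

# Toy hypergraph: vertices {0,1,2,3}; hyperedges e0 = {0,1,2}, e1 = {2,3}.
# omega[e] is the hyperedge weight; gamma[(e, v)] is the edge-dependent
# vertex weight of v in e (implicitly zero when v is not in e).
edges = [(0, 1, 2), (2, 3)]
omega = np.array([1.0, 2.0])
gamma = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 1.0,
         (1, 2): 3.0, (1, 3): 1.0}

n, m = 4, len(edges)
W = np.zeros((n, m))   # |V| x |E|: W(v,e) = omega(e) if v is in e
R = np.zeros((m, n))   # |E| x |V|: R(e,v) = gamma_e(v)
for e, members in enumerate(edges):
    for v in members:
        W[v, e] = omega[e]
        R[e, v] = gamma[(e, v)]

D_V = np.diag(W.sum(axis=1))   # vertex degrees d(v) = sum of omega(e) over e containing v
D_E = np.diag(R.sum(axis=1))   # hyperedge degrees delta(e) = sum of gamma_e(v) over v in e

print(np.diag(D_V))   # [1. 1. 3. 2.]
print(np.diag(D_E))   # [4. 4.]
```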
Given a hypergraph $H$, the clique graph of $H$, denoted $G(H)$, is an unweighted graph on the vertex set $V$ whose edges are the pairs $\{v, w\}$ with $v \neq w$ that are contained in a common hyperedge. In other words, $G(H)$ turns all hyperedges of $H$ into cliques.
We say a hypergraph $H$ is connected if its clique graph $G(H)$ is connected. In this paper, we assume all hypergraphs are connected.
For a Markov chain with state space $V$, we use $p_{v,w}$ to denote the probability of moving from state $v$ to state $w$.
3 Random Walks on Hypergraphs with Edge-Dependent Vertex Weights
Let $H$ be a hypergraph with edge-dependent vertex weights. We first define a random walk on $H$. At time $t$, a random walker at vertex $v$ will do the following:

1. Pick an edge $e$ containing $v$, with probability $\omega(e)/d(v)$.

2. Pick a vertex $w$ from $e$, with probability $\gamma_e(w)/\delta(e)$.

3. Move to vertex $w$ at time $t + 1$.
Formally, we define a random walk on $H$ by writing out the transition probabilities implied by the above steps.
A random walk on a hypergraph $H$ with edge-dependent vertex weights is a Markov chain on $V$ with transition probabilities

$p_{v,w} = \sum_{e \in E(v)} \frac{\omega(e)}{d(v)} \cdot \frac{\gamma_e(w)}{\delta(e)}.$ (1)
The probability transition matrix of a random walk on $H$ is the $|V| \times |V|$ matrix $P$ with entries $P(v, w) = p_{v,w}$; it can be written in matrix form as $P = D_V^{-1} W D_E^{-1} R$. (We use the convention that probability transition matrices have row sum 1.) Using the probability transition matrix $P$, we can also define a random walk with restart on $H$ [36]. The random walk with restart is useful when it is unknown whether the random walk is irreducible.
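A brief sketch of this factorization on a toy hypergraph (the data and names are illustrative), verifying that the product is row-stochastic:

```python
import numpy as np

# Toy hypergraph: e0 = {0,1,2} with omega = 1, e1 = {2,3} with omega = 2.
edges = [(0, 1, 2), (2, 3)]
omega = np.array([1.0, 2.0])
gamma = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 1.0, (1, 2): 3.0, (1, 3): 1.0}

n, m = 4, len(edges)
W, R = np.zeros((n, m)), np.zeros((m, n))
for e, members in enumerate(edges):
    for v in members:
        W[v, e], R[e, v] = omega[e], gamma[(e, v)]
D_V, D_E = np.diag(W.sum(axis=1)), np.diag(R.sum(axis=1))

# P = D_V^{-1} W D_E^{-1} R: pick an edge with prob. omega(e)/d(v),
# then a vertex inside it with prob. gamma_e(w)/delta(e).
P = np.linalg.inv(D_V) @ W @ np.linalg.inv(D_E) @ R

assert np.allclose(P.sum(axis=1), 1.0)   # row-stochastic
assert P[0, 3] == 0.0                    # no hyperedge contains both 0 and 3
assert np.isclose(P[0, 1], 0.5)          # (1/1) * (2/4) from hyperedge e0
```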
Note that our definition allows self-loops, i.e. $p_{v,v} > 0$, and thus the random walk is lazy. While one can define a non-lazy random walk (i.e. with $p_{v,v} = 0$ for all $v$), the analysis of such walks is significantly more difficult, as the probability transition matrix cannot be factored as easily. In the Supplement, we show that a weaker version of Theorem 1 below holds for non-lazy random walks. Cooper et al. [9] also study the cover time of a non-lazy random walk on a hypergraph with edge-independent vertex weights.
Next, we define what it means for two random walks to be equivalent. Because random walks are Markov chains, we define equivalence in terms of Markov chains. Let $M_1$ and $M_2$ be Markov chains with the same (countable) state space, and let $P_1$ and $P_2$ be their respective probability transition matrices. We say that $M_1$ and $M_2$ are equivalent if $p^{(1)}_{v,w} = p^{(2)}_{v,w}$ for all states $v$ and $w$.
Using this definition, we state our first main theorem: a random walk on a hypergraph with edge-independent vertex weights is equivalent to a random walk on its clique graph, for some choice of weights on the clique graph.
Theorem 1. Let $H$ be a hypergraph with edge-independent vertex weights. There exist edge weights on the clique graph $G(H)$ such that a random walk on $H$ is equivalent to a random walk on $G(H)$.
Theorem 1 generalizes the result of Agarwal et al. [2], who showed that the two hypergraph Laplacian matrices constructed in Zhou et al. [43] and Rodriguez-Velazquez [34] are equal to the Laplacian matrix of either the clique graph or the star graph, another graph constructed from a hypergraph. Agarwal et al. [2] also showed that the Laplacians of the clique graph and the star graph are equal when $H$ is uniform (i.e. when all hyperedges have the same size), and are very close otherwise. Since the Laplacian matrices in Zhou et al. [43] and Rodriguez-Velazquez [34] are derived from random walks with edge-independent vertex weights, Theorem 1 implies that both Laplacians are equal to the Laplacian of the clique graph, even when the hypergraph is not uniform, thus strengthening the result of Agarwal et al. [2].
The proof of Theorem 1 relies on the fact that a random walk on $H$ satisfies a property known as time-reversibility: $\pi_v p_{v,w} = \pi_w p_{w,v}$ for all vertices $v, w$, where $\pi$ is the stationary distribution of the random walk [3]. It is well-known that a Markov chain can be represented as a random walk on a graph if and only if it is time-reversible. Moreover, time-reversibility allows us to derive a formula for the weights on $G(H)$.
Theorem 2. Let $\gamma(v)$ be the edge-independent weight of vertex $v$. Then,

$w_{vw} = \sum_{e \in E(v) \cap E(w)} \frac{\gamma(v)\,\gamma(w)\,\omega(e)}{\delta(e)}.$ (2)
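The equivalence in Theorem 1 can be checked numerically on a small example. The sketch below builds the hypergraph walk of Equation (1) with edge-independent weights, and the clique-graph walk whose weights follow the formula above (including the lazy self-loops); the toy hypergraph and all names are our own illustrative choices.

```python
import numpy as np

# Toy hypergraph with EDGE-INDEPENDENT vertex weights gamma(v).
edges = [(0, 1, 2), (2, 3), (0, 3)]
omega = np.array([1.0, 2.0, 1.5])
gamma_v = np.array([1.0, 2.0, 3.0, 1.0])   # one weight per vertex

n = 4
d = np.zeros(n)                                            # d(v)
delta = np.array([gamma_v[list(e)].sum() for e in edges])  # delta(e)
for e, members in enumerate(edges):
    for v in members:
        d[v] += omega[e]

# Hypergraph walk: p(v,w) = sum_e omega(e)/d(v) * gamma(w)/delta(e).
P_H = np.zeros((n, n))
for e, members in enumerate(edges):
    for v in members:
        for w in members:
            P_H[v, w] += (omega[e] / d[v]) * (gamma_v[w] / delta[e])

# Clique-graph weights w_vw = sum_e gamma(v) gamma(w) omega(e) / delta(e),
# with self-loop weights included because the walk is lazy.
A = np.zeros((n, n))
for e, members in enumerate(edges):
    for v in members:
        for w in members:
            A[v, w] += gamma_v[v] * gamma_v[w] * omega[e] / delta[e]
P_G = A / A.sum(axis=1, keepdims=True)

assert np.allclose(P_H, P_G)   # the two walks are equivalent
```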
Conversely, the caption of Figure 1 describes a simple example of a hypergraph with edge-dependent vertex weights whose random walk is not time-reversible. This proves the following result.
Theorem 3. There exists a hypergraph $H$ with edge-dependent vertex weights such that a random walk on $H$ is not equivalent to a random walk on its clique graph $G(H)$, for any choice of edge weights on $G(H)$.
Anecdotally, we find from simulations that most random walks on hypergraphs with edge-dependent vertex weights are not time-reversible, and therefore satisfy Theorem 3. However, it is not clear how to formalize this observation.
Theorem 3 says that random walks on graphs with vertex set $V$ are a strict subset of Markov chains on $V$. A natural follow-up question is whether all Markov chains on $V$ can be described as random walks on some hypergraph with vertex set $V$ and edge-dependent vertex weights. In the Supplement, we show that the answer to this question is no, and provide a counterexample.
In addition, we show in the Supplement that hypergraphs with edge-dependent vertex weights create a rich hierarchy of Markov chains, beyond the division between time-reversible and time-irreversible Markov chains. In particular, we show that random walks on hypergraphs with edge-dependent vertex weights and at least one hyperedge of cardinality $k$ cannot, in general, be reduced to random walks on hypergraphs whose hyperedges have cardinality less than $k$.
Finally, note that our definition of equivalent random walks (Definition 1) requires the probability transition matrices to be equal. Thus, another natural question is: given $H$, do there exist weights on the clique graph $G(H)$ such that random walks on $H$ and $G(H)$ are "close"? We provide a partial answer to this question in Section 5, where we show that, for a specific choice of weights on $G(H)$, the second-smallest eigenvalues of the Laplacian matrices of $H$ and $G(H)$ are close.
4 Stationary Distribution and Mixing Time
4.1 Stationary Distribution
Recall the formula for the stationary distribution of a random walk on a graph. If $G$ is a graph with edge weights $w_{vw}$, then the stationary distribution $\pi$ of a random walk on $G$ is

$\pi_v = \frac{1}{Z} \sum_{w \in N(v)} w_{vw},$ (3)

where $Z$ is a normalizing constant. We derive a formula for the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights; the formula is analogous to equation (3) above, with two important changes: first, the proportionality constant depends on the hyperedge, and second, each term in the sum is multiplied by the vertex weight $\gamma_e(v)$.
Theorem 4. Let $H$ be a hypergraph with edge-dependent vertex weights. There exist positive constants $\rho_e$, one for each hyperedge $e$, such that the stationary distribution $\pi$ of a random walk on $H$ is

$\pi_v = \frac{1}{Z} \sum_{e \in E(v)} \rho_e\, \omega(e)\, \gamma_e(v),$ (4)

where $Z$ is a normalizing constant. Moreover, $\pi$ can be computed in time polynomial in the size of $H$.
Note that while the vertex weights within a hyperedge can be scaled arbitrarily without affecting the properties of the random walk, Theorem 4 suggests that $\rho_e$ gives the "correct" scaling factor for the weights in hyperedge $e$.
When the hypergraph has edge-independent vertex weights (i.e. $\gamma_e(v) = \gamma(v)$ for all incident hyperedges $e$), the constants $\rho_e$ coincide for all hyperedges, leading to the following formula for the stationary distribution:

$\pi_v = \frac{1}{Z}\, \gamma(v) \sum_{e \in E(v)} \omega(e) = \frac{1}{Z}\, \gamma(v)\, d(v).$ (5)

Furthermore, if the vertex weights are trivial (i.e. $\gamma(v) = 1$ for all $v$), then $\pi_v \propto d(v)$, recovering the formula derived in Zhou et al. [43] for the stationary distribution of random walks on hypergraphs with trivial vertex weights.
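A sketch verifying Equation (5) numerically on a toy hypergraph with edge-independent weights (the data and names are illustrative); the stationary distribution is computed as the left Perron eigenvector of the transition matrix.

```python
import numpy as np

# Edge-independent toy hypergraph, so that Equation (5) applies.
edges = [(0, 1, 2), (2, 3), (0, 3)]
omega = np.array([1.0, 2.0, 1.5])
gamma_v = np.array([1.0, 2.0, 3.0, 1.0])

n = 4
d = np.zeros(n)
delta = np.array([gamma_v[list(e)].sum() for e in edges])
for e, members in enumerate(edges):
    for v in members:
        d[v] += omega[e]

# Transition matrix of the hypergraph walk (Equation (1)).
P = np.zeros((n, n))
for e, members in enumerate(edges):
    for v in members:
        for w in members:
            P[v, w] += (omega[e] / d[v]) * (gamma_v[w] / delta[e])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

assert np.allclose(pi @ P, pi)                        # pi is stationary
expected = gamma_v * d / (gamma_v * d).sum()          # Equation (5)
assert np.allclose(pi, expected)
```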
4.2 Mixing Time
In this section, we derive a bound on the mixing time of a random walk on $H$. First, we recall the definition of the mixing time of a Markov chain.
Let $M$ be a Markov chain with state space $V$, probability transition matrix $P$, and stationary distribution $\pi$. The mixing time $t_{\mathrm{mix}}(\epsilon)$ of $M$ is

$t_{\mathrm{mix}}(\epsilon) = \min \{ t : \max_{v \in V} \| P^t(v, \cdot) - \pi \|_{TV} \leq \epsilon \},$

where $\| \cdot \|_{TV}$ is the total variation distance.
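The definition above can be turned into a direct numerical estimate of the mixing time by powering the transition matrix. The chain below is an illustrative lazy walk on a weighted triangle, not one from the paper.

```python
import numpy as np

def tv_distance(p, q):
    # Total variation distance between two distributions.
    return 0.5 * np.abs(p - q).sum()

def mixing_time(P, pi, eps):
    # Smallest t with max_v TV(P^t(v, .), pi) <= eps.
    Pt = np.eye(len(pi))
    for t in range(1, 10000):
        Pt = Pt @ P
        if max(tv_distance(Pt[v], pi) for v in range(len(pi))) <= eps:
            return t
    raise RuntimeError("did not mix within the step limit")

# Lazy walk on a weighted triangle graph (symmetric weights, self-loops),
# so the stationary distribution is proportional to the row sums.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 3.0, 1.0]])
P = A / A.sum(axis=1, keepdims=True)
pi = A.sum(axis=1) / A.sum()

t = mixing_time(P, pi, eps=0.25)
assert t >= 1
```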
We derive the following bound on the mixing time of a random walk on a hypergraph with edge-dependent vertex weights.
Let $H$ be a hypergraph with edge-dependent vertex weights. Without loss of generality, assume $\delta(e) = 1$ for all hyperedges $e$ (i.e. by multiplying the vertex weights in each hyperedge $e$ by $1/\delta(e)$). Then,
(6) 
where

$d_{\min}$ is the minimum degree of a vertex in $H$, i.e. $d_{\min} = \min_{v \in V} d(v)$,

,

.
This bound on the mixing time of the hypergraph random walk has a similar form to the bound on the mixing time of a random walk on a graph [22]. For a graph with suitably bounded edge weights, we have

(7)

Note that both bounds have the same dependence on their common parameters. Intuitively, the additional dependence of the hypergraph bound on the minimum vertex and hyperedge weights arises because small values of these weights correspond to the hypergraph having vertices that are hard to reach, and the presence of such vertices increases the mixing time.
5 Hypergraph Laplacian
Let $H$ be a hypergraph with edge-dependent vertex weights. Since a random walk on $H$ is a Markov chain, we can model the transition probabilities of the random walk using a weighted directed graph on the same vertex set $V$. Specifically, let $\mathcal{G}$ be the directed graph with a directed edge $(v, w)$ whenever $p_{v,w} > 0$, with edge weight $p_{v,w}$. Extending the definition of the Laplacian matrix for directed graphs [8], we define a Laplacian matrix for the hypergraph $H$ as follows.
[Random walk-based hypergraph Laplacian] Let $H$ be a hypergraph with edge-dependent vertex weights. Let $P$ be the probability transition matrix of a random walk on $H$, with stationary distribution $\pi$. Let $\Phi$ be the diagonal matrix with $\Phi(v, v) = \pi_v$. Then, the random walk-based hypergraph Laplacian matrix is

$L = I - \frac{1}{2} \left( \Phi^{1/2} P\, \Phi^{-1/2} + \Phi^{-1/2} P^{T} \Phi^{1/2} \right).$ (8)
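Equation (8) can be implemented directly. The sketch below applies the construction to an arbitrary irreducible row-stochastic matrix (an illustrative stand-in for the hypergraph walk) and checks symmetry, positive semidefiniteness, and that the vector with entries $\sqrt{\pi_v}$ is a 0-eigenvector.

```python
import numpy as np

# An illustrative irreducible row-stochastic matrix, standing in for the
# hypergraph random walk of Equation (1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4]])

# Stationary distribution: left Perron eigenvector of P.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

Phi_half = np.diag(np.sqrt(pi))
Phi_half_inv = np.diag(1.0 / np.sqrt(pi))
S = Phi_half @ P @ Phi_half_inv
L = np.eye(3) - (S + S.T) / 2          # Equation (8)

eigs = np.linalg.eigvalsh(L)
assert np.allclose(L, L.T)             # symmetric by construction
assert eigs.min() > -1e-9              # positive semidefinite
assert abs(eigs.min()) < 1e-9          # sqrt(pi) spans the 0-eigenspace
```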
At first glance, one might hypothesize that the hypergraph Laplacian defined above does not model higher-order relations between vertices, since $L$ is defined using a directed graph containing edges only between pairs of vertices. Indeed, if $H$ has edge-independent vertex weights, then it is true that $L$ does not model higher-order relations between vertices. This is because the transition probabilities are completely determined by the edge weights of the undirected clique graph $G(H)$ (Theorem 1). Thus, for each pair of vertices $v, w$ in $H$, only a single quantity $w_{vw}$, which encodes a pairwise relation between $v$ and $w$, is required to define the random walk. As such, the Laplacian matrix defined in Equation (8) is equal to the Laplacian matrix of an undirected graph, showing that $L$ only encodes pairwise relationships between vertices.
In contrast, when $H$ has edge-dependent vertex weights, the transition probabilities generally cannot be computed from a single quantity defined for each pair of vertices (Theorem 3). The absence of such a reduction implies that the transition probabilities $p_{v,w}$, which are the edge weights of the directed graph $\mathcal{G}$, encode higher-order relations between vertices. Thus, the Laplacian matrix $L$ also encodes these higher-order relations.
From Chung [8], the hypergraph Laplacian matrix $L$ given in Equation (8) is positive semidefinite and has a Rayleigh quotient for computing its eigenvalues. $L$ can be used to develop spectral learning algorithms for hypergraphs with edge-dependent vertex weights, or to study the properties of random walks on such hypergraphs. For example, the following Cheeger inequality for hypergraphs follows directly from the Cheeger inequality for directed graphs [8].
[Cheeger inequality for hypergraphs] Let $H$ be a hypergraph with edge-dependent vertex weights. Let $L$ be the Laplacian matrix given in Equation (8), and let $h$ be the Cheeger constant of a random walk on $H$. Let $\lambda_1 \leq \cdots \leq \lambda_{n-1}$ be the nonzero eigenvalues of $L$, and let $\lambda = \lambda_1$. We have

$\frac{h^2}{2} \leq \lambda \leq 2h.$ (9)
5.1 Approximating the Hypergraph Laplacian with a Graph Laplacian
In Section 3, we posed the following question: given a hypergraph $H$ with edge-dependent vertex weights, can we find weights on the clique graph $G(H)$ such that the random walks on $H$ and $G(H)$ are close? We prove the following result. Let $H$ be a hypergraph, with the edge-dependent vertex weights suitably normalized within each hyperedge. Let $G(H)$ be the clique graph of $H$, with edge weights
(10) 
Let $L_H$ and $L_G$ be the Laplacians of $H$ and $G(H)$, respectively, and let $\lambda_H$ and $\lambda_G$ be the second-smallest eigenvalues of $L_H$ and $L_G$, respectively. Then

(11)

This theorem says that there exist edge weights on $G(H)$ such that the second-smallest eigenvalues of the Laplacians of $H$ and $G(H)$ are within a constant factor of each other, where the constant is determined by the vertex weights. We do not know if the edge weights in Equation (10) give the tightest bound, or if another choice of edge weights on $G(H)$ will yield a Laplacian that is "closer" to the hypergraph Laplacian $L_H$.
Interestingly, Zhang et al. [42] use a variant of $L_G$ as the Laplacian matrix of a hypergraph with edge-dependent vertex weights, and obtain state-of-the-art results on an object classification task. Theorem 5.1 provides some theoretical evidence for why Zhang et al. [42] are able to obtain good results, even with the "wrong" Laplacian.
6 Experiments
We demonstrate the utility of hypergraphs with edgedependent vertex weights in two different ranking applications: ranking authors in an academic citation network, and ranking players in a video game.
6.1 Citation Network
We construct a citation network of all machine learning papers from NIPS, ICML, KDD, IJCAI, UAI, ICLR, and COLT published on or before 10/27/2017, and extracted from the ArnetMiner database [35]. We represent the network as a hypergraph whose vertices are authors and whose hyperedges are papers, such that each hyperedge connects the authors of a paper. The hypergraph has vertices and hyperedges.
We consider two vertex-weighted hypergraphs: $H_1$, with trivial vertex weights $\gamma_e(v) = 1$ for all vertices $v$ and incident hyperedges $e$; and $H_2$, with edge-dependent vertex weights.
The edge-dependent vertex weights model unequal contributions by different authors. For papers whose authors are listed in alphabetical order (as is common in theory papers), we set equal vertex weights for all authors. We set uniform hyperedge weights in both hypergraphs.
We calculate the stationary distribution of a random walk with restart on both $H_1$ and $H_2$ (restart parameter $\beta$), and rank authors in each hypergraph by their value in the stationary distribution. This yields two different rankings of the authors: one with edge-independent vertex weights, and one with edge-dependent vertex weights.
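The ranking procedure above can be sketched as a power iteration for the random walk with restart; the transition matrix, the restart parameter 0.85, and all names below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def restart_stationary(P, beta, tol=1e-12):
    # Stationary distribution of a random walk with restart: at each step,
    # follow P with probability beta, otherwise jump to a uniformly random
    # vertex (PageRank-style iteration).
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    while True:
        new = beta * (pi @ P) + (1 - beta) / n
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new

# Toy co-authorship walk on 4 "authors" (transition matrix is illustrative).
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.3, 0.0, 0.3, 0.4],
              [0.5, 0.2, 0.0, 0.3],
              [0.0, 0.6, 0.4, 0.0]])

pi = restart_stationary(P, beta=0.85)
ranking = np.argsort(-pi)              # authors sorted by stationary mass
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(0.85 * (pi @ P) + 0.15 / 4, pi, atol=1e-9)
```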
The two rankings have a Kendall correlation coefficient [23] indicating modest similarity. Examining individual authors, we typically see that authors who are first or last authors on their most cited papers have higher rankings in $H_2$ than in $H_1$, e.g. Ian Goodfellow [17]. In contrast, authors who are middle authors on their most cited papers have lower rankings in $H_2$ relative to their rankings in $H_1$. Table 1 shows the authors with high rank in at least one of the two hypergraphs and with the largest gain in rank in $H_2$ relative to $H_1$.
Name  Rank in $H_1$  Rank in $H_2$

Richard Socher  687  382
Zhongzhi Shi  543  304
Daniel Rueckert  619  391
Lars Schmidt-Thieme  673  454
Tat-Seng Chua  650  435
Ian J. Goodfellow  612  413
We emphasize that this example is intended to illustrate how a straightforward application of vertex weights leads to alternative author rankings. We do not anticipate that our simple scheme for choosing edge-dependent vertex weights will always yield the best results in practice. For example, Christopher Manning drops in rank when edge-dependent vertex weights are added, but this is because he is the second-to-last, and co-corresponding, author on his most cited papers in the database. A more robust vertex-weighting scheme would include knowledge of such equal-contribution authors, and would also incorporate the different relative contributions of first, middle, and corresponding authors.
6.2 Rank Aggregation
We illustrate the usage of hypergraphs with edge-dependent vertex weights on the rank aggregation problem, which aims to combine many partial rankings into one complete ranking. Formally, given a universe $U$ of items and a collection of partial rankings $\sigma_1, \ldots, \sigma_k$ (e.g. $\sigma_i$ is a partial ranking expressing item $a \succ$ item $b \succ$ item $c$), a rank aggregation algorithm should find a permutation of $U$ that is "close" to the partial rankings $\sigma_1, \ldots, \sigma_k$.
We consider a particular application of rank aggregation: ranking players in a multiplayer game. Here, the outcome of a game or match gives a partial ranking of the players participating in the match. In addition to the ranking, one may also have additional information, such as the scores of each player in the match. The latter setting has been extensively studied; classic ranking methods include the Elo [14] and Glicko [16] systems used to rank chess players. More recently, online multiplayer games such as Halo have led to the development of alternative ranking systems such as Microsoft's TrueSkill [19] and TrueSkill 2 [29].
We develop a rank aggregation algorithm that uses random walks on hypergraphs with edge-dependent vertex weights, and evaluate the performance of this algorithm on a real-world dataset of Halo 2 games. In the Supplement, we also include results of experiments on synthetic data.
Data. We analyze the Halo 2 dataset from the TrueSkill paper [19]. This dataset contains two kinds of matches: free-for-all matches with up to eight players, and 1v1 matches. Using the free-for-all matches as partial rankings, we construct rankings of all players in the dataset, and evaluate those rankings on the 1v1 matches.
Methods. A well-known class of rank aggregation algorithms are Markov chain-based algorithms, first developed by Dwork et al. [13]. Markov chain-based algorithms create a Markov chain $M$ whose states are the players and whose transition probabilities depend in some way on the partial rankings. The final ranking of players is determined by sorting the values in the stationary distribution of $M$. In our experiments, we use a random walk with restart instead of a standard random walk, so that the stationary distribution always exists [36].
Using the free-for-all matches, we construct rankings of the players using four algorithms. The first three algorithms use Markov chains: a random walk on a hypergraph with edge-dependent vertex weights; a random walk on a clique graph; and MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. The fourth algorithm is TrueSkill [19].
First, we derive a rank aggregation algorithm using a random walk on a hypergraph with edge-dependent vertex weights. The vertices are the players, and the hyperedges correspond to the free-for-all matches. We set the hyperedge weight of each match using the variance of the player scores in the match; this choice is inspired by Ding and Yilmaz [11], who also use variance to define the hyperedge weights of their hypergraph. For the vertex weights, we exponentiate the player scores. We choose these vertex weights instead of the raw scores for two reasons: first, scores in Halo 2 can be negative, but vertex weights must be positive; and second, exponentiating the score gives more importance to the winner of a match. We chose relatively simple formulas for the hyperedge and vertex weights in order to evaluate the potential benefits of utilizing edge-dependent vertex weights; further optimization of the vertex and edge weights may yield better performance.
Second, we derive a rank aggregation algorithm using a random walk on the clique graph of the hypergraph described above, with the edge weights of the clique graph given by Equation (10). Specifically, if $H$ is the hypergraph defined above, then $G(H)$ is a graph with vertex set $V$ and edge weights defined by
(12) 
In contrast to Equation (10), here we do not normalize the vertex weights on $G(H)$ as required by Theorem 5.1, since computing this normalization is computationally infeasible on our large dataset. Instead, we normalize the vertex weights so that $\delta(e) = 1$ for all hyperedges $e$.
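Under the scheme described above, the match hypergraph can be sketched as follows; the toy scores, the variance-based hyperedge weight, and the score scaling inside the exponential are illustrative assumptions of ours.

```python
import numpy as np

# Sketch of the match hypergraph: each free-for-all match is a hyperedge
# over the participating players.
matches = [                     # match -> {player: score}
    {0: 10.0, 1: -3.0, 2: 5.0},
    {1: 7.0, 2: 2.0, 3: 4.0},
    {0: 1.0, 3: 6.0},
]

omega = {}    # hyperedge weight: variance of the scores in the match
gamma = {}    # vertex weight: exponentiated score (positive, rewards winners)
for e, scores in enumerate(matches):
    vals = np.array(list(scores.values()))
    omega[e] = vals.var()
    for player, s in scores.items():
        gamma[(e, player)] = np.exp(s / 10.0)   # scale is an assumption, to avoid overflow

assert all(w >= 0 for w in omega.values())
assert all(g > 0 for g in gamma.values())       # vertex weights must be positive
```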
Third, we use MC3, a Markov chain-based rank aggregation algorithm designed by Dwork et al. [13]. MC3 uses only the partial rankings from each match; it does not use the score information. MC3 is very similar to a random walk on a hypergraph with edge-independent vertex weights. We convert the scores of the players in each match into a partial ranking of those players, and use these partial rankings as input to MC3.
Fourth, we use TrueSkill [19], which models each player's skill with a normal distribution. We rank players according to the mean of this distribution. We also implemented the probabilistic decision procedure for ranking players from the TrueSkill paper, and found no difference in performance between ranking by the mean of the distribution and using the probabilistic decision procedure.
Evaluation and Results: We evaluate the rankings produced by each algorithm by using them to predict the outcomes of the 1v1 matches. Specifically, given a ranking $\sigma$ of the players, we predict that the winner of a match between two players is the player with the higher ranking in $\sigma$. Table 2 shows the fraction of 1v1 matches correctly predicted by each of the four algorithms. Random walks on the hypergraph with edge-dependent vertex weights have significantly better performance than both MC3 and random walks on the clique graph, and comparable performance to TrueSkill. Moreover, on a sizable fraction of the 1v1 matches, the hypergraph method correctly predicts the outcome of the match while TrueSkill predicts it incorrectly, suggesting that the hypergraph model captures some information about the players that TrueSkill misses. Unfortunately, we are unable to identify any specific pattern in the matches where the hypergraph predicted the outcome correctly and TrueSkill predicted incorrectly.
Method  Correctly Predicted

TrueSkill  73.4%
Hypergraph  71.1%
Clique Graph  61.1%
MC3  52.3%
7 Conclusion
In this paper, we use random walks to develop a spectral theory for hypergraphs with edge-dependent vertex weights. We demonstrate both theoretically and experimentally how edge-dependent vertex weights model higher-order information in hypergraphs and improve the performance of hypergraph-based algorithms. At the same time, we show that random walks on hypergraphs with edge-independent vertex weights are equivalent to random walks on graphs, generalizing earlier results that showed this equivalence in special cases [2].
There are numerous directions for future work. It would be desirable to evaluate additional applications where hypergraphs with edge-dependent vertex weights have previously been used (e.g. [42, 26]), replacing the Laplacian used in some of these works with the hypergraph Laplacian introduced in Section 5. Sharper bounds on the approximation of the hypergraph Laplacian by a graph Laplacian are also desirable. Another direction is to examine the relationship between the linear hypergraph Laplacian matrix introduced here and the nonlinear Laplacian operators that were recently introduced in the case of trivial vertex weights [7] or submodular vertex weights [27, 28].
Another interesting direction is extending graph convolutional neural networks (GCNs) to hypergraphs. Recent approaches to GCNs implement the graph convolution operator as a nonlinear function of the graph Laplacian [25, 10]. GCNs have also been generalized to hypergraph convolutional neural networks (HGCNs), where the convolution layer operates on a hypergraph with edge-independent vertex weights instead of a graph [37, 15]. The hypergraph Laplacian matrix introduced in this paper would allow one to extend HGCNs to hypergraphs with edge-dependent vertex weights.
References

Agarwal et al. [2005] S. Agarwal, Jongwoo Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 838–845, June 2005.
Agarwal et al. [2006] Sameer Agarwal, Kristin Branson, and Serge Belongie. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 17–24, New York, NY, USA, 2006. ACM. doi: 10.1145/1143844.1143847.
 Aldous and Fill [2002] David Aldous and James Allen Fill. Reversible Markov Chains and Random Walks on Graphs. 2002.
Avin et al. [2014] Chen Avin, Yuval Lando, and Zvi Lotker. Radio cover time in hypergraphs. Ad Hoc Networks, 12:278–290, 2014. ISSN 1570-8705. doi: 10.1016/j.adhoc.2012.08.010.
Bellaachia and Al-Dhelaan [2013] Abdelghani Bellaachia and Mohammed Al-Dhelaan. Random walks in hypergraph. In Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods, Venice, Italy, pages 187–194, 2013.
Brin and Page [1998] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Seventh International World Wide Web Conference (WWW 1998), 1998.
Chan et al. [2018] T.-H. Hubert Chan, Anand Louis, Zhihao Gavin Tang, and Chenzi Zhang. Spectral properties of hypergraph Laplacian and approximation algorithms. J. ACM, 65(3):15:1–15:48, March 2018. ISSN 0004-5411. doi: 10.1145/3178123.
Chung [2005] Fan Chung. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics, 9(1):1–19, April 2005. ISSN 0219-3094. doi: 10.1007/s00026-005-0237-z.
Cooper et al. [2013] Colin Cooper, Alan Frieze, and Tomasz Radzik. The cover times of random walks on random uniform hypergraphs. Theoretical Computer Science, 509:51–69, 2013. ISSN 0304-3975. doi: 10.1016/j.tcs.2013.01.020.
Defferrard et al. [2016] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016.
Ding and Yilmaz [2010] Lei Ding and Alper Yilmaz. Interactive image segmentation using probabilistic hypergraphs. Pattern Recognition, 43(5):1863–1873, 2010. ISSN 0031-3203.
Ducournau and Bretto [2014] Aurélien Ducournau and Alain Bretto. Random walks in directed hypergraphs and application to semi-supervised image segmentation. Comput. Vis. Image Underst., 120:91–102, March 2014. ISSN 1077-3142. doi: 10.1016/j.cviu.2013.10.012.
Dwork et al. [2001] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 613–622, New York, NY, USA, 2001. ACM. ISBN 1581133480. doi: 10.1145/371920.372165.
Elo [1978] Arpad E. Elo. The rating of chessplayers, past and present. Arco Pub., New York, 1978. ISBN 0668047216.
 Feng et al. [2018] Yifan Feng, Haoxuan You, Zizhao Zhang, Rongrong Ji, and Yue Gao. Hypergraph neural networks. CoRR, abs/1809.09401, 2018.
Glickman [1995] Mark E. Glickman. The Glicko system. Boston University, 1995.
Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
Harel and Koren [2001] David Harel and Yehuda Koren. On clustering using random walks. In Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer Science, FST TCS '01, pages 18–41, Berlin, Heidelberg, 2001. Springer-Verlag. ISBN 3540430024.
Herbrich et al. [2006] Ralf Herbrich, Tom Minka, and Thore Graepel. TrueSkill™: A Bayesian skill rating system. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS'06, pages 569–576, Cambridge, MA, USA, 2006. MIT Press.
 Huang et al. [2010] Y. Huang, Q. Liu, S. Zhang, and D. N. Metaxas. Image retrieval via probabilistic hypergraph ranking. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3376–3383, June 2010. doi: 10.1109/CVPR.2010.5540012.
Jamali and Ester [2009] Mohsen Jamali and Martin Ester. TrustWalker: A random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 397–406, New York, NY, USA, 2009. ACM. ISBN 9781605584959. doi: 10.1145/1557019.1557067.
Jerison [2013] Daniel Jerison. General mixing time bounds for finite Markov chains via the absolute spectral gap. October 2013.
 Kendall [1938] M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. ISSN 00063444.
Kim et al. [2011] Sungwoong Kim, Sebastian Nowozin, Pushmeet Kohli, and Chang D. Yoo. Higher-order correlation clustering for image segmentation. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1530–1538. Curran Associates, Inc., 2011.
Kipf and Welling [2016] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. CoRR, abs/1609.02907, 2016.
Li et al. [2018] Jianbo Li, Jingrui He, and Yada Zhu. E-tail product return prediction via hypergraph-based local graph cut. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '18, pages 519–527, New York, NY, USA, 2018. ACM. ISBN 9781450355520.
 Li and Milenkovic [2017] Pan Li and Olgica Milenkovic. Inhomogeneous hypergraph clustering with applications. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 2308–2318. Curran Associates, Inc., 2017.

Li and Milenkovic [2018] Pan Li and Olgica Milenkovic. Submodular hypergraphs: p-Laplacians, Cheeger inequalities and spectral clustering. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3014–3023, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018. PMLR.
Minka et al. [2018] Tom Minka, Ryan Cleven, and Yordan Zaykov. TrueSkill 2: An improved Bayesian skill rating system. March 2018.
Montenegro and Tetali [2006] R. Montenegro and P. Tetali. Mathematical aspects of mixing times in Markov chains. Found. Trends Theor. Comput. Sci., 1(3):237–354, May 2006. ISSN 1551-305X. doi: 10.1561/0400000003.
Ng et al. [2001] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, NIPS'01, pages 849–856, Cambridge, MA, USA, 2001. MIT Press.
Ramadan et al. [2004] E. Ramadan, A. Tarafdar, and A. Pothen. A hypergraph model for the yeast protein complex network. In 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings., pages 189–, April 2004. doi: 10.1109/IPDPS.2004.1303205.
Ritz et al. [2014] Anna Ritz, Allison N. Tegge, Hyunju Kim, Christopher L. Poirel, and T. M. Murali. Signaling hypergraphs. Trends in Biotechnology, 32(7):356–362, 2014. ISSN 0167-7799. doi: 10.1016/j.tibtech.2014.04.007.
Rodriguez-Velazquez [2002] Juan Alberto Rodriguez-Velazquez. On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50:1–14, March 2002.
Tang et al. [2008] Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 990–998, New York, NY, USA, 2008. ACM. ISBN 9781605581934. doi: 10.1145/1401890.1402008.
Tong et al. [2006] H. Tong, C. Faloutsos, and J. Pan. Fast random walk with restart and its applications. In Sixth International Conference on Data Mining (ICDM'06), pages 613–622, December 2006. doi: 10.1109/ICDM.2006.70.
Yadati et al. [2018] Naganand Yadati, Madhav Nimishakavi, Prateek Yadav, Anand Louis, and Partha Talukdar. HyperGCN: Hypergraph convolutional networks for semi-supervised classification. CoRR, abs/1809.02589, 2018.
Yang et al. [2017] Wenyin Yang, Guojun Wang, Md Zakirul Alam Bhuiyan, and Kim-Kwang Raymond Choo. Hypergraph partitioning for social networks based on information entropy modularity. Journal of Network and Computer Applications, 86:59–71, 2017. ISSN 1084-8045. Special Issue on Pervasive Social Networking.
Yilmaz et al. [2008] Emine Yilmaz, Javed A. Aslam, and Stephen Robertson. A new rank correlation coefficient for information retrieval. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '08, pages 587–594, New York, NY, USA, 2008. ACM. ISBN 9781605581644. doi: 10.1145/1390334.1390435.
Zeng et al. [2016] Kaiman Zeng, Nansong Wu, Arman Sargolzaei, and Kang Yen. Learn to rank images: A unified probabilistic hypergraph model for visual search. Mathematical Problems in Engineering, 2016:1–7, 2016. doi: 10.1155/2016/7916450.
Zhang et al. [2018a] Z. Zhang, H. Lin, X. Zhao, R. Ji, and Y. Gao. Inductive multi-hypergraph learning and its application on view-based 3D object classification. IEEE Transactions on Image Processing, 27(12):5957–5968, December 2018a.

Zhang et al. [2018b] Zizhao Zhang, Haojie Lin, and Yue Gao. Dynamic hypergraph structure learning. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 3162–3169. International Joint Conferences on Artificial Intelligence Organization, July 2018b.
 Zhou et al. [2006] Dengyong Zhou, Jiayuan Huang, and Bernhard Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, NIPS’06, pages 1601–1608, Cambridge, MA, USA, 2006. MIT Press.
Appendix A Incorrect Stationary Distribution in Earlier Work
Li et al. [26] claim in Equation 4 that the stationary distribution of a random walk on a hypergraph with edge-dependent vertex weights is
(13)  π_v = d(v) / Σ_{u ∈ V} d(u),
where d(v) = Σ_{e: v ∈ e} w(e) is the sum of the edge weights of the hyperedges incident to v. Curiously, the stationary distribution given by this formula does not depend on the vertex weights. A counterexample to this formula is the hypergraph H shown in Figure 1 of the main text, with edge-dependent vertex weights as described in the caption. Computing the stationary distribution of a random walk on H directly shows that it depends on the vertex weights, and in particular differs from the distribution given by Equation (13).
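Since the hypergraph of Figure 1 is not reproduced here, the following sketch uses a small hypothetical hypergraph with edge-dependent vertex weights to illustrate the same point numerically: the true stationary distribution of the hypergraph random walk depends on the vertex weights, while the degree-only formula in Equation (13) does not. The hypergraph, weights, and variable names below are illustrative assumptions, not the example from the paper.

```python
import numpy as np

# Hypothetical 3-vertex hypergraph (NOT the Figure 1 example): two
# hyperedges with edge-dependent vertex weights gamma[e][v].
vertices = [0, 1, 2]
edges = {"e1": [0, 1], "e2": [1, 2]}
w = {"e1": 1.0, "e2": 1.0}                    # hyperedge weights w(e)
gamma = {"e1": {0: 1.0, 1: 2.0},              # edge-dependent vertex weights
         "e2": {1: 1.0, 2: 1.0}}

# d(v): sum of weights of incident hyperedges; delta(e): sum of gamma over e.
d = {v: sum(w[e] for e in edges if v in edges[e]) for v in vertices}
delta = {e: sum(gamma[e].values()) for e in edges}

# Transition matrix of the hypergraph random walk:
# p(v, u) = sum over e containing v and u of (w(e)/d(v)) * (gamma_e(u)/delta(e)).
P = np.zeros((3, 3))
for e, members in edges.items():
    for v in members:
        for u in members:
            P[v, u] += (w[e] / d[v]) * (gamma[e][u] / delta[e])

# Stationary distribution: leading left eigenvector of P, normalized.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# The claimed formula (13) depends only on the degrees d(v).
pi_claimed = np.array([d[v] for v in vertices], dtype=float)
pi_claimed /= pi_claimed.sum()

print(np.round(pi, 4), np.round(pi_claimed, 4))  # the two distributions differ
```

For this choice of weights the true stationary distribution is (1/7, 4/7, 2/7), while the degree-only formula gives (1/4, 1/2, 1/4).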
Appendix B Proof of Theorem 1
First we need the following definition and lemma.
Definition. Let M be a Markov chain with state space V and transition probabilities p_{v,w}, for v, w ∈ V. We say M is reversible if there exists a probability distribution π over V such that
(14)  π_v p_{v,w} = π_w p_{w,v}  for all v, w ∈ V.
Lemma. Let M be an irreducible Markov chain with finite state space V and transition probabilities p_{v,w} for v, w ∈ V. M is reversible if and only if there exists a weighted, undirected graph G with vertex set V such that a random walk on G and M are equivalent.
Proof of Lemma.
First, suppose M is reversible. Since M is irreducible, let π be the stationary distribution of M. Note that, because M is irreducible, π_v > 0 for all states v.
Let G be a graph with vertices V and edge weights w_{vw} = π_v p_{v,w}. By reversibility, w_{vw} = w_{wv}, so G is well-defined. In a random walk on G, the probability of going from v to w in one time-step is
w_{vw} / Σ_{u ∈ V} w_{vu} = (π_v p_{v,w}) / (π_v Σ_{u ∈ V} p_{v,u}) = p_{v,w},
since Σ_{u ∈ V} p_{v,u} = 1.
Thus, if M is reversible, the stated claim holds. The other direction follows from the fact that a random walk on a weighted, undirected graph is always reversible [3]. ∎
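The construction in this proof is easy to check numerically. The sketch below takes a hypothetical three-state birth-death chain (reversible by construction), builds the edge weights w_{vw} = π_v p_{v,w}, and confirms that the resulting graph random walk has the same transition matrix:

```python
import numpy as np

# A reversible (birth-death) chain on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Its stationary distribution pi (leading left eigenvector of P).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Edge weights from the proof: w_{vw} = pi_v * p_{vw}.
W = pi[:, None] * P
print(np.allclose(W, W.T))  # True: symmetric, since the chain is reversible

# The random walk on the graph G with weights W reproduces the chain.
P_graph = W / W.sum(axis=1, keepdims=True)
print(np.allclose(P_graph, P))  # True
```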
Theorem 1.
Let H be a hypergraph with edge-independent vertex weights. Then, there exist weights w_{vw} on the clique graph G of H such that a random walk on H is equivalent to a random walk on G.
Proof of Theorem 1.
Let γ(v) = γ_e(v) for vertices v and incident hyperedges e; since the vertex weights are edge-independent, γ is well-defined. We first show that a random walk on H is reversible. By Kolmogorov's criterion, reversibility is equivalent to
(15)  p_{v_1,v_2} p_{v_2,v_3} ⋯ p_{v_{n-1},v_n} p_{v_n,v_1} = p_{v_1,v_n} p_{v_n,v_{n-1}} ⋯ p_{v_2,v_1}
for any finite sequence of vertices v_1, …, v_n ∈ V.
Since the transition probabilities for any two vertices v, w are
(16)  p_{v,w} = Σ_{e: v,w ∈ e} (w(e)/d(v)) (γ(w)/δ(e)),
we have
(17)  p_{v_1,v_2} p_{v_2,v_3} ⋯ p_{v_n,v_1} = (Π_{i=1}^{n} γ(v_i)/d(v_i)) · Π_{i=1}^{n} Σ_{e: v_i,v_{i+1} ∈ e} w(e)/δ(e) = p_{v_1,v_n} p_{v_n,v_{n-1}} ⋯ p_{v_2,v_1},
where indices are taken modulo n; the second equality holds because each inner sum is symmetric in v_i and v_{i+1}. So by Kolmogorov's criterion, a random walk on H is reversible.
Furthermore, because H is connected, random walks on H are irreducible. Thus, by the lemma above, there exists a graph G with vertex set V and edge weights w_{vw} such that random walks on G and H are equivalent. The equivalence of the random walks implies that w_{vw} > 0 if and only if p_{v,w} > 0, so it follows that G is the clique graph of H. ∎
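As a numerical companion to this proof, the sketch below builds the walk for a small hypothetical hypergraph with edge-independent vertex weights and verifies reversibility directly via detailed balance with π_v ∝ d(v)γ(v), which for an irreducible chain is equivalent to the Kolmogorov cycle condition. The example hypergraph and weights are arbitrary choices.

```python
import numpy as np

# Hypothetical hypergraph: (member vertices, hyperedge weight w(e)).
edges = [([0, 1, 2], 1.0), ([1, 2, 3], 2.0), ([0, 3], 0.5)]
gamma = np.array([1.0, 2.0, 1.0, 3.0])  # edge-independent weights gamma(v)
n = 4

d = np.zeros(n)                          # d(v) = sum of incident w(e)
for members, we in edges:
    for v in members:
        d[v] += we

P = np.zeros((n, n))                     # lazy hypergraph random walk
for members, we in edges:
    delta = gamma[members].sum()         # delta(e) = sum of gamma over e
    for v in members:
        for u in members:
            P[v, u] += (we / d[v]) * (gamma[u] / delta)

# Detailed balance with pi_v proportional to d(v) * gamma(v):
# pi_v p_{vu} = gamma(v) gamma(u) * sum_{e: v,u in e} w(e)/delta(e) / Z,
# which is symmetric in v and u.
pi = d * gamma
pi = pi / pi.sum()
F = pi[:, None] * P                      # probability flow pi_v * p_{vu}
print(np.allclose(F, F.T))               # True: the walk is reversible
```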
Appendix C NonLazy Random Walks on Hypergraphs
First we generalize the random walk framework of Cooper et al. [9] to random walks on hypergraphs with edge-dependent vertex weights. Informally, in a non-lazy random walk, a random walker at vertex v will do the following:

1. pick an edge e containing v, with probability w(e)/d(v),

2. pick a vertex w from e ∖ {v}, with probability γ_e(w) / (δ(e) − γ_e(v)), and

3. move to vertex w.

Formally, we have the following. A non-lazy random walk on a hypergraph H with edge-dependent vertex weights is a Markov chain on V with transition probabilities
(18)  p_{v,w} = Σ_{e: v,w ∈ e} (w(e)/d(v)) (γ_e(w) / (δ(e) − γ_e(v)))
for all states v ≠ w, and p_{v,v} = 0.
It is also useful to define a modified version of the clique graph without self-loops.
Definition. Let H be a hypergraph with edge-dependent vertex weights. The clique graph of H without self-loops is a weighted, undirected graph with vertex set V, and edges defined by
(19)  w_{vw} = Σ_{e: v,w ∈ e} w(e)  for v ≠ w.
In contrast to the lazy random walk, a non-lazy random walk on a hypergraph with edge-independent vertex weights is not guaranteed to satisfy reversibility. However, if H has trivial vertex weights, then reversibility holds, and we get the following result.
Let H be a hypergraph with trivial vertex weights, i.e. γ_e(v) = 1 for all vertices v and incident hyperedges e. Then, there exist weights on the clique graph of H without self-loops such that a non-lazy random walk on H is equivalent to a random walk on this graph.
Proof.
Again, we first show that a non-lazy random walk on H is reversible. Define the probability mass function π_v = d(v)/Z, for normalizing constant Z = Σ_{u ∈ V} d(u). Let p_{v,w} be the probability of going from v to w in a non-lazy random walk on H, where v ≠ w. Then, since trivial vertex weights give γ_e(w) = 1 and δ(e) − γ_e(v) = |e| − 1,
π_v p_{v,w} = (d(v)/Z) Σ_{e: v,w ∈ e} (w(e)/d(v)) (1/(|e| − 1)) = (1/Z) Σ_{e: v,w ∈ e} w(e)/(|e| − 1).
By symmetry, π_v p_{v,w} = π_w p_{w,v}, so a non-lazy random walk on H is reversible. Thus, by the lemma of Appendix B, there exists a graph G with vertex set V and edge weights w_{vw} such that a random walk on G and a non-lazy random walk on H are equivalent. The equivalence of the random walks implies that w_{vw} > 0 if and only if p_{v,w} > 0, so it follows that G is the clique graph of H without self-loops. ∎
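The detailed-balance computation above can likewise be spot-checked numerically. The sketch below builds the non-lazy walk for a hypothetical hypergraph with trivial vertex weights and verifies that π_v ∝ d(v) satisfies detailed balance and that the walk never stays in place:

```python
import numpy as np

# Hypothetical hypergraph with trivial vertex weights (gamma = 1 everywhere).
edges = [([0, 1, 2], 1.0), ([1, 2, 3], 2.0), ([0, 3], 1.5)]
n = 4

d = np.zeros(n)                   # d(v) = sum of incident hyperedge weights
for members, we in edges:
    for v in members:
        d[v] += we

# Non-lazy walk: pick e with prob w(e)/d(v), then a vertex of e other than v
# uniformly (trivial weights make gamma_e(w)/(delta(e)-gamma_e(v)) = 1/(|e|-1)).
P = np.zeros((n, n))
for members, we in edges:
    k = len(members)
    for v in members:
        for u in members:
            if u != v:
                P[v, u] += (we / d[v]) * (1.0 / (k - 1))

pi = d / d.sum()                  # candidate stationary distribution
F = pi[:, None] * P               # probability flow pi_v * p_{vu}
print(np.allclose(F, F.T), np.allclose(np.diag(P), 0))  # True True
```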
Appendix D Relationships between Random Walks on Hypergraphs and Markov Chains on Vertex Set
In the main text, we show that there are hypergraphs with edge-dependent vertex weights whose random walks are not equivalent to a random walk on any graph. A natural follow-up question is to ask whether all Markov chains on a vertex set V can be represented as a random walk on some hypergraph with the same vertex set and edge-dependent vertex weights. Below, we show that the answer is no. Since random walks on hypergraphs with edge-dependent vertex weights are lazy, in the sense that p_{v,v} > 0 for all vertices v, we restrict our attention to lazy Markov chains, i.e. those with q_{v,v} > 0 for all states v.
There exists a lazy Markov chain M with state space V such that M is not equivalent to a random walk on any hypergraph with vertex set V and edge-dependent vertex weights.
Proof.
Suppose for the sake of contradiction that any lazy Markov chain with state space V is equivalent to a random walk on some hypergraph with vertex set V. Let M be a lazy Markov chain with states V = {v, w} and transition probabilities
(20)  q_{v,v} = q_{w,w} = 1/4,  q_{v,w} = q_{w,v} = 3/4.
By assumption, let H be a hypergraph with vertex set V and edge-dependent vertex weights, such that a random walk on H is equivalent to M. Let p be the transition probabilities of a random walk on H. Since every hyperedge containing both v and w in particular contains w, we have
(21)  d(v) p_{v,w} = Σ_{e: v,w ∈ e} w(e) γ_e(w)/δ(e) ≤ Σ_{e: w ∈ e} w(e) γ_e(w)/δ(e) = d(w) p_{w,w}.
Plugging in Equations (20) to the above yields (3/4) d(v) ≤ (1/4) d(w), or d(w) ≥ 3 d(v).
By similar reasoning, we also have d(w) p_{w,v} ≤ d(v) p_{v,v}, and plugging in Equations (20) gives us (3/4) d(w) ≤ (1/4) d(v), or d(v) ≥ 3 d(w).
Combining both of these inequalities, we obtain
(22)  d(v) ≥ 3 d(w) ≥ 9 d(v).
Since the vertex degree d(v) > 0, we obtain a contradiction. ∎
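The key inequality in this argument, d(v) p_{v,w} ≤ d(w) p_{w,w}, holds for every hypergraph with edge-dependent vertex weights, simply because every hyperedge containing both v and w also contains w. A quick randomized check (pure Python; the random hypergraphs are illustrative):

```python
import random

random.seed(0)
for _ in range(200):
    n = 5
    # Random hypergraph: each hyperedge gets a weight and per-vertex gammas.
    hyperedges = []
    for _ in range(4):
        members = random.sample(range(n), random.randint(2, 4))
        we = random.uniform(0.1, 2.0)
        gam = {v: random.uniform(0.1, 2.0) for v in members}
        hyperedges.append((members, we, gam))
    d = [sum(we for members, we, _ in hyperedges if v in members)
         for v in range(n)]
    P = [[0.0] * n for _ in range(n)]
    for members, we, gam in hyperedges:
        delta = sum(gam.values())
        for v in members:
            for u in members:
                P[v][u] += (we / d[v]) * (gam[u] / delta)
    # d(v) p_{vw} <= d(w) p_{ww} for all pairs of distinct vertices.
    for v in range(n):
        for u in range(n):
            if v != u and d[v] > 0 and d[u] > 0:
                assert d[v] * P[v][u] <= d[u] * P[u][u] + 1e-12
print("inequality verified on 200 random hypergraphs")
```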
Next, for any k ≥ 2, define a k-hypergraph to be a hypergraph with edge-dependent vertex weights whose hyperedges have cardinality at most k. We show that, for any k, there exists a (k+1)-hypergraph with vertex set V whose random walk is not equivalent to the random walk of any k-hypergraph with vertex set V. We first prove the result for k = 2.
There exists a 3-hypergraph H with vertex set V whose random walk is not equivalent to a random walk on any 2-hypergraph with vertex set V.
Proof.
Let H be a hypergraph with four vertices v_1, v_2, v_3, v_4, and two hyperedges e_1 = {v_1, v_2, v_3} and e_2 = {v_2, v_3, v_4}. Let the hyperedge weights be w(e_1) = w(e_2) = 1 and the vertex weights be trivial, i.e. γ_e(v) = 1 for all v, e such that v ∈ e. Then d(v_1) = d(v_4) = 1, d(v_2) = d(v_3) = 2, and δ(e_1) = δ(e_2) = 3, and a direct computation shows that the transition probabilities of a random walk on H satisfy p_{v_i,v_2} = 1/3 and p_{v_i,v_3} = 1/3 for every i.
For the sake of contradiction, suppose a random walk on H is equivalent to a random walk on H′, where H′ is a 2-hypergraph with vertex set V. Let p̂_{i,j} be the transition probabilities of H′ for v_i, v_j ∈ V; by assumption, p̂_{i,j} = p_{v_i,v_j}.
H′ must have the following edges: {v_1,v_2}, {v_1,v_3}, {v_2,v_3}, {v_2,v_4}, and {v_3,v_4}, since these are exactly the pairs v_i ≠ v_j with p_{v_i,v_j} > 0. H′ may also contain the self-loop edges {v_i} for v_i ∈ V.
For shorthand, we write w_{ij} for the weight of the edge {v_i,v_j} in H′ (with w_{ij} = 0 if the edge is absent), d_i for the degree of v_i in H′, and, for i ≠ j, a_{ij} = γ_{ij}(v_j)/(γ_{ij}(v_i) + γ_{ij}(v_j)), where γ_{ij} denotes the vertex weights of the edge {v_i,v_j}. Note that a_{ij} ∈ (0, 1) and a_{ij} + a_{ji} = 1.
By definition, we have
(23)  p̂_{i,j} = (w_{ij}/d_i) a_{ij} for i ≠ j, and p̂_{i,i} = w_{ii}/d_i + Σ_{j ≠ i} (w_{ij}/d_i) a_{ji}.
Thus, w_{ij} a_{ij} = p̂_{i,j} d_i and w_{ij} a_{ji} = p̂_{j,i} d_j; since a_{ij} + a_{ji} = 1, adding these together yields
(24)  w_{ij} = p̂_{i,j} d_i + p̂_{j,i} d_j.
Substituting Equation (24) into the expression for p̂_{i,i} in Equation (23) gives p̂_{i,i} d_i = w_{ii} + Σ_{j ≠ i} p̂_{j,i} d_j, and since the self-loop weight w_{ii} is nonnegative,
(25)  p̂_{i,i} d_i ≥ Σ_{j ≠ i} p̂_{j,i} d_j.
Now we apply Equation (25) with the transition probabilities of H. At vertex v_2, every incoming probability equals 1/3, including p̂_{2,2} = 1/3, so Equation (25) reads (1/3) d_2 ≥ (1/3)(d_1 + d_3 + d_4), or
(26)  d_2 ≥ d_1 + d_3 + d_4.
By the same reasoning at vertex v_3,
(27)  d_3 ≥ d_1 + d_2 + d_4.
Adding Equations (26) and (27) yields d_2 + d_3 ≥ 2 d_1 + d_2 + d_3 + 2 d_4, or
(28)  0 ≥ d_1 + d_4.
Since the degrees d_1, d_4 > 0, we obtain a contradiction. ∎
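As a companion to this proposition, the sketch below takes the 4-vertex hypergraph with hyperedges {v1, v2, v3} and {v2, v3, v4} (0-indexed in code), under one concrete choice of weights (unit hyperedge weights, trivial vertex weights), and checks with exact rational arithmetic that every transition probability into v2, and into v3, equals 1/3. Since a 2-hypergraph walk with degrees d_i must satisfy p_{ii} d_i ≥ Σ_{j≠i} p_{ji} d_j at every vertex, these coefficients force d2 ≥ d1 + d3 + d4 and d3 ≥ d1 + d2 + d4, which no positive degrees can satisfy simultaneously.

```python
from fractions import Fraction

n = 4
hyperedges = [([0, 1, 2], Fraction(1)), ([1, 2, 3], Fraction(1))]

# d(v): sum of incident hyperedge weights.
d = [sum(we for members, we in hyperedges if v in members) for v in range(n)]

# Transition matrix of the lazy hypergraph walk, exactly.
P = [[Fraction(0)] * n for _ in range(n)]
for members, we in hyperedges:
    delta = Fraction(len(members))        # trivial weights: delta(e) = |e|
    for v in members:
        for u in members:
            P[v][u] += (we / d[v]) * (Fraction(1) / delta)

# Every transition probability into v2 (index 1) and into v3 (index 2) is 1/3.
third = Fraction(1, 3)
print(all(P[i][1] == third for i in range(n)),
      all(P[i][2] == third for i in range(n)))  # True True
```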