Concentration Bounds for Co-occurrence Matrices of Markov Chains
Co-occurrence statistics for sequential data are common and important signals in machine learning; they provide rich correlation and clustering information about the underlying object space. We give the first bound on the convergence rate of estimating the co-occurrence matrix of a regular (aperiodic and irreducible) finite Markov chain from a single random trajectory. Our work is motivated by the analysis of the well-known graph learning algorithm DeepWalk by [Qiu et al. WSDM '18], who studied the convergence (in probability) of the co-occurrence matrix of a random walk on undirected graphs in the limit but left the convergence rate as an open problem. We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via an ergodic Markov chain, generalizing the regular undirected graph case studied by [Garg et al. STOC '18]. Using this Chernoff-type bound, we show that, given a regular Markov chain with n states and mixing time τ, a trajectory of length O(τ (log(n)+log(τ))/ϵ^2) suffices to obtain an estimator of the co-occurrence matrix with error at most ϵ. Our experiments are consistent with the exponentially fast convergence rate predicted by the theoretical analysis. Our result gives the first sample complexity analysis in graph representation learning.
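To make the estimation problem concrete, the following is a minimal sketch in pure Python of estimating a co-occurrence matrix from a single trajectory and comparing it with its limit. The abstract does not state the exact definition, so the sketch assumes a DeepWalk-style windowed, symmetrized co-occurrence count with uniform weights 1/T over a window of size T, and the corresponding limit (1/T) Σ_r ½ (Π P^r + (Π P^r)^T), where Π is the diagonal matrix of the stationary distribution. The function names, the window size, and the example transition matrix are all hypothetical choices made for illustration.

```python
import random


def simulate_chain(P, length, rng, start=0):
    """Sample one trajectory of `length` states from transition matrix P."""
    n = len(P)
    states = [start]
    for _ in range(length - 1):
        # Draw the next state from the row of P indexed by the current state.
        states.append(rng.choices(range(n), weights=P[states[-1]])[0])
    return states


def cooccurrence_estimate(states, n, window):
    """Empirical windowed co-occurrence matrix (assumed uniform weights 1/window)."""
    num_starts = len(states) - window
    C = [[0.0] * n for _ in range(n)]
    w = 0.5 / window
    for i in range(num_starts):
        for r in range(1, window + 1):
            a, b = states[i], states[i + r]
            # Symmetrize: count the pair in both orders with half weight each.
            C[a][b] += w
            C[b][a] += w
    return [[c / num_starts for c in row] for row in C]


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def cooccurrence_limit(P, window):
    """Assumed limit: (1/window) * sum_r 0.5 * (Pi P^r + (Pi P^r)^T)."""
    n = len(P)
    pi = stationary_distribution(P)
    C = [[0.0] * n for _ in range(n)]
    Pr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    w = 0.5 / window
    for _ in range(window):
        # Advance to the next power of P.
        Pr = [[sum(Pr[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        for i in range(n):
            for j in range(n):
                C[i][j] += w * (pi[i] * Pr[i][j] + pi[j] * Pr[j][i])
    return C


def max_abs_error(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    # Hypothetical 3-state regular chain (all entries positive).
    P = [[0.5, 0.3, 0.2],
         [0.2, 0.6, 0.2],
         [0.3, 0.3, 0.4]]
    rng = random.Random(0)
    target = cooccurrence_limit(P, window=2)
    for length in (100, 1000, 10000):
        states = simulate_chain(P, length, rng)
        estimate = cooccurrence_estimate(states, n=3, window=2)
        print(length, max_abs_error(estimate, target))
```

The sketch reports the entrywise maximum error for simplicity; the error measure used in the paper's bound may be a different matrix norm, so the printed numbers only illustrate the qualitative trend of the error shrinking as the trajectory grows.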