Learning Mixtures of Markov Chains with Quality Guarantees

02/09/2023
by Fabian Spaeh, et al.

A large number of modern applications, ranging from listening to songs online and browsing the Web to using a navigation app on a smartphone, generate a plethora of user trails. Clustering such trails into groups with a common sequence pattern can reveal significant structure in human behavior, which can improve user experience through better recommendations and even help prevent suicides [LMCR14]. One approach to modeling this problem mathematically is as a mixture of Markov chains. Recently, Gupta, Kumar and Vassilvitskii [GKV16] introduced an algorithm (GKV-SVD) based on the singular value decomposition (SVD) that, under certain conditions, can perfectly recover a mixture of L chains on n states given only the distribution of trails of length 3 (3-trails). In this work we contribute to the problem of unmixing Markov chains by highlighting and addressing two important limitations of the GKV-SVD algorithm [GKV16]: first, some chains in the mixture may not even be weakly connected; second, in practice one does not know the true number of chains beforehand. We resolve both issues. Specifically, we propose an algebraic criterion that enables us to efficiently choose a value of L that avoids overfitting. Furthermore, we design a reconstruction algorithm that outputs the true mixture in the presence of disconnected chains and is robust to noise. We complement our theoretical results with experiments on both synthetic and real data, where we observe that our method outperforms the GKV-SVD algorithm. Finally, we empirically observe that combining an EM algorithm with our method performs best in practice, both in terms of reconstruction error with respect to the distribution of 3-trails and with respect to the mixture of Markov chains.
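
To make the setting concrete, the following is a minimal sketch of how the 3-trail distribution of a mixture arises and why an SVD carries information about L. It is not the GKV-SVD algorithm and not the algebraic criterion proposed in the paper. The names `random_chain`, `three_trail_distribution`, `starts`, and `chains` are hypothetical, and the randomly generated chains are an assumption made only for illustration. The sketch relies on one fact: for a fixed middle state j, the n-by-n matrix of probabilities of trails i → j → k is a sum of L rank-one terms, so its rank is at most L.

```python
import numpy as np

def random_chain(n, rng):
    """Return a random row-stochastic n x n transition matrix (illustrative only)."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

def three_trail_distribution(starts, chains):
    """Exact 3-trail distribution of a mixture.

    starts[l, i]    : joint probability of picking chain l and starting in state i
    chains[l, i, j] : transition probability i -> j in chain l
    Returns p with p[i, j, k] = sum_l starts[l, i] * chains[l, i, j] * chains[l, j, k].
    """
    return np.einsum("li,lij,ljk->ijk", starts, chains, chains)

rng = np.random.default_rng(0)
n, L = 6, 2  # assumed toy sizes: 6 states, 2 chains

chains = np.stack([random_chain(n, rng) for _ in range(L)])
starts = rng.random((L, n))
starts /= starts.sum()  # normalize so that the 3-trail probabilities sum to 1

p = three_trail_distribution(starts, chains)

# Fix the middle state j and look at the matrix of (first, last) state pairs.
j = 0
slice_j = p[:, j, :]
singular_values = np.linalg.svd(slice_j, compute_uv=False)

# Count singular values that are not numerically zero.
numerical_rank = int(np.sum(singular_values > 1e-10 * singular_values[0]))
print("singular values:", singular_values)
print("numerical rank of the slice:", numerical_rank, "(at most L =", L, ")")
```

With exact probabilities, the number of non-negligible singular values of such a slice never exceeds L. With an empirical distribution estimated from finitely many trails, the trailing singular values are small but nonzero, which is one way to see why choosing L in practice requires a criterion rather than a plain rank count.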

