Modelling Correlated Bernoulli Data Part I: Theory and Run Lengths

11/30/2022
by Louise Kimpton, et al.

Binary data are common in many applications and are typically simulated independently from a Bernoulli distribution with a single probability of success. However, this does not always reflect the physical truth: the probability of a success can depend on the outcomes of past events. Presented here is a novel approach for simulating binary data where, for a chain of events, successes (1) and failures (0) cluster together according to a distance correlation. The structure is derived from de Bruijn graphs. A de Bruijn graph is a directed graph where, given a set of symbols, V, and a 'word' length, m, the nodes consist of all possible sequences of length m of symbols from V. De Bruijn graphs generalise Markov chains, with the 'word' length controlling the number of states that each individual state depends on, which extends the correlation over a wider range. To quantify how clustered a sequence generated from a de Bruijn process is, the run lengths of letters are examined, along with their properties.
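
To make the construction concrete, the following is a minimal Python sketch of the objects named in the abstract: the nodes and edges of a de Bruijn graph over the symbols {0, 1}, a walk on that graph that emits a binary sequence, and the run lengths of the result. It is an illustration under stated assumptions, not the authors' method. The function names and the "sticky" transition rule (the chance of a 1 rises with the number of 1s in the current word) are hypothetical and chosen only to produce visible clustering.

```python
import random
from itertools import groupby, product


def de_bruijn_nodes(symbols, m):
    """All words of length m over the given symbols (the graph's nodes)."""
    return ["".join(w) for w in product(symbols, repeat=m)]


def de_bruijn_edges(symbols, m):
    """Directed edges: drop the first letter of a word, append a new one."""
    return [(w, w[1:] + s) for w in de_bruijn_nodes(symbols, m) for s in symbols]


def sticky_probabilities(m, low=0.1, high=0.9):
    """Hypothetical rule (an assumption, not from the paper):
    P(next letter = 1 | word) grows linearly with the share of 1s in the word."""
    return {
        w: low + (high - low) * w.count("1") / m
        for w in de_bruijn_nodes("01", m)
    }


def simulate(p_one, m, n, rng):
    """Walk the de Bruijn graph, emitting one letter per step, n letters in total."""
    word = "".join(rng.choice("01") for _ in range(m))  # random starting word
    out = list(word)
    while len(out) < n:
        letter = "1" if rng.random() < p_one[word] else "0"
        out.append(letter)
        word = word[1:] + letter  # move along the edge to the next word
    return "".join(out[:n])


def run_lengths(seq):
    """Consecutive runs of identical letters as (letter, length) pairs."""
    return [(letter, len(list(group))) for letter, group in groupby(seq)]


if __name__ == "__main__":
    m = 3
    rng = random.Random(0)
    seq = simulate(sticky_probabilities(m), m, 60, rng)
    print(seq)
    print(run_lengths(seq))
```

With m = 1 the walk reduces to an ordinary first-order Markov chain on {0, 1}; larger m lets each new letter depend on a longer stretch of the past, which is the sense in which the word length widens the dependence.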
