Evolution of k-mer Frequencies and Entropy in Duplication and Substitution Mutation Systems

12/05/2018
by Hao Lou, et al.

Genomic evolution can be viewed as a string-editing process driven by mutations. Understanding the statistical properties that result from these mutation processes is valuable in a variety of tasks involving biological sequence data, e.g., estimation of model parameters and compression. At the same time, because these processes are complex, both designing tractable stochastic models and analyzing them are challenging. In this paper, we study two kinds of systems, each representing a set of mutations. In the first system, tandem duplications and substitution mutations are allowed; in the second, interspersed duplications are allowed. We provide stochastic models and, via stochastic approximation, study the evolution of substring frequencies for these two systems separately. Specifically, we show that k-mer frequencies converge almost surely and determine the limit set. Furthermore, we present a method for finding upper bounds on entropy for such systems.
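To make the two mutation systems concrete, here is a minimal simulation sketch under stated assumptions; it is not the paper's stochastic model. We assume each step picks a duplication of fixed length `dup_len` at a uniformly random position (tandem: the copy is placed immediately after the original; interspersed: the copy is inserted at a uniformly random position), and that in the first system each step is instead a single-symbol substitution with probability `p_sub`. All function names and parameters are hypothetical. The `kmer_entropy` helper computes the empirical Shannon entropy of one sample path's k-mer frequencies, which is a different quantity from the entropy upper bounds derived in the paper.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA alphabet


def tandem_duplication(s, k, rng):
    """Copy a random length-k block and insert the copy right after it."""
    i = rng.randrange(len(s) - k + 1)
    return s[:i + k] + s[i:i + k] + s[i + k:]


def interspersed_duplication(s, k, rng):
    """Copy a random length-k block and insert it at a random position."""
    i = rng.randrange(len(s) - k + 1)
    j = rng.randrange(len(s) + 1)
    return s[:j] + s[i:i + k] + s[j:]


def substitution(s, rng):
    """Replace one uniformly chosen symbol with a different symbol."""
    i = rng.randrange(len(s))
    new = rng.choice([a for a in ALPHABET if a != s[i]])
    return s[:i] + new + s[i + 1:]


def kmer_frequencies(s, k):
    """Relative frequency of each length-k substring (overlapping windows)."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kmer_entropy(freqs):
    """Empirical Shannon entropy, in bits, of a k-mer frequency vector."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def evolve(seed, steps, dup_len, p_sub=0.0, interspersed=False, rng_seed=0):
    """Apply `steps` random mutations to `seed`.

    Each step is a substitution with probability p_sub; otherwise it is a
    tandem (or, if interspersed=True, interspersed) duplication of length
    dup_len. Requires len(seed) >= dup_len.
    """
    rng = random.Random(rng_seed)
    s = seed
    for _ in range(steps):
        if rng.random() < p_sub:
            s = substitution(s, rng)
        elif interspersed:
            s = interspersed_duplication(s, dup_len, rng)
        else:
            s = tandem_duplication(s, dup_len, rng)
    return s


if __name__ == "__main__":
    # System 1: tandem duplications plus substitutions.
    s1 = evolve("ACGT", steps=2000, dup_len=2, p_sub=0.1)
    # System 2: interspersed duplications only.
    s2 = evolve("ACGT", steps=2000, dup_len=2, interspersed=True)
    for name, s in (("tandem+sub", s1), ("interspersed", s2)):
        f = kmer_frequencies(s, 2)
        print(name, len(s), round(kmer_entropy(f), 3))
```

Running `evolve` with increasing `steps` and tracking `kmer_frequencies` gives an informal, empirical view of the frequency convergence that the paper establishes rigorously via stochastic approximation; the sketch proves nothing about the limit set.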

research
08/18/2018

The Capacity of Some Pólya String Models

We study random string-duplication systems, which we call Pólya string m...
research
07/11/2017

On the letter frequencies and entropy of written Marathi

We carry out a comprehensive analysis of letter frequencies in contempor...
research
07/01/2022

A Stochastic Contraction Mapping Theorem

In this paper we define contractive and nonexpansive properties for adap...
research
06/30/2023

Analyzing Generalized Pólya Urn Models using Martingales, with an Application to Viral Evolution

The randomized play-the-winner (RPW) model is a generalized Pólya Urn pr...
research
09/03/2021

The typical set and entropy in stochastic systems with arbitrary phase space growth

The existence of the typical set is key for the consistence of the ensem...
research
04/07/2021

Numerics for Stochastic Distributed Parameter Control Systems: a Finite Transposition Method

In this chapter, we present some recent progresses on the numerics for s...
research
06/18/2019

New Uniform Bounds for Almost Lossless Analog Compression

Wu and Verdú developed a theory of almost lossless analog compression, w...