Comparing reverse complementary genomic words based on their distance distributions and frequencies

10/06/2017
by Ana Helena Tavares, et al.

In this work we study reverse complementary genomic word pairs in human DNA by comparing both the distance distribution and the frequency of a word with those of its reverse complement. Several measures of dissimilarity between distance distributions are considered, and the peak dissimilarity is found to work best in this setting. We report the existence of reverse complementary word pairs with very dissimilar distance distributions, as well as word pairs with very similar distance distributions even when both distributions are irregular and contain strong peaks. The association between distribution dissimilarity and frequency discrepancy is also explored, and it is speculated that symmetric pairs combining low and high values of each measure may uncover features of interest. Taken together, our results suggest that some asymmetries in the human genome go far beyond Chargaff's rules. This study uses both the complete human genome and its repeat-masked version.
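To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the reverse complement of a word, the empirical distribution of distances between consecutive occurrences of a word in a sequence, and a plain L1 dissimilarity between two such distributions. The function names, the counting of overlapping occurrences, and the use of L1 distance are all assumptions made for illustration; in particular, the peak dissimilarity favoured in the paper is not defined in this abstract and is not reproduced here.

```python
from collections import Counter

# Watson-Crick base pairing used to complement a DNA word.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Complement every base, then reverse the word."""
    return word.translate(COMPLEMENT)[::-1]


def occurrences(seq, word):
    """Start positions of all occurrences of word in seq (overlaps allowed)."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def distance_distribution(seq, word):
    """Relative frequency of each distance between consecutive occurrences."""
    pos = occurrences(seq, word)
    gaps = Counter(b - a for a, b in zip(pos, pos[1:]))
    total = sum(gaps.values())
    return {d: c / total for d, c in gaps.items()} if total else {}


def l1_dissimilarity(p, q):
    """Generic L1 distance between two distributions (illustrative stand-in)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    seq = "AACGTTAACGTT"  # toy sequence, not genomic data
    word = "AAC"
    rc = reverse_complement(word)
    p = distance_distribution(seq, word)
    q = distance_distribution(seq, rc)
    print(word, rc, l1_dissimilarity(p, q))
```

On real data the same comparison would be run per word pair over a whole genome, where the choice of dissimilarity measure matters far more than in this toy case.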


