Segmentation and genome annotation algorithms

01/03/2021
by Maxwell W. Libbrecht, et al.

Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome into segments and assign a label to each segment, so that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of genomic elements. In this sense, they generally work in an unsupervised fashion, like clustering algorithms, but they also segment the genome at the same time. Here, we review the common methodological framework that underlies these methods and the variants of and improvements upon it, catalogue existing large-scale reference annotations, and discuss the outlook for future work.
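To make "clustering plus segmentation" concrete, here is a minimal sketch, not the method of any particular tool. It assumes a hidden Markov model (HMM), one common modeling choice for this kind of task: each genomic bin has a hidden label, labels tend to persist between neighboring bins, and each label emits signal from its own Gaussian distribution. The functions `viterbi_segment` and `to_segments`, the single toy signal track, and the hand-set parameters are all hypothetical; real SAGA tools learn their parameters from data, typically over many input tracks at once.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(signal, means, sds, stay_prob):
    """Assign one label per genomic bin by Viterbi decoding of a toy HMM.

    signal: non-empty list of floats, one hypothetical signal value per bin.
    means, sds: per-label Gaussian emission parameters (hand-set, not learned).
    stay_prob: probability of keeping the same label in the next bin; the
        remaining probability is split evenly among the other labels (k >= 2).
    """
    k = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1 - stay_prob) / (k - 1))
    # Uniform initial label distribution plus emission score of the first bin.
    score = [math.log(1 / k) + gaussian_logpdf(signal[0], means[j], sds[j])
             for j in range(k)]
    backptr = []
    for x in signal[1:]:
        new_score, ptr = [], []
        for j in range(k):
            # Best previous label to transition from into label j.
            cands = [score[i] + (log_stay if i == j else log_move) for i in range(k)]
            best = max(range(k), key=lambda i: cands[i])
            new_score.append(cands[best] + gaussian_logpdf(x, means[j], sds[j]))
            ptr.append(best)
        score = new_score
        backptr.append(ptr)
    # Trace back the most likely label path from the last bin to the first.
    labels = [max(range(k), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels


def to_segments(labels):
    """Collapse per-bin labels into (start_bin, end_bin_exclusive, label) runs."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments


signal = [0.1, 0.3, 0.2, 5.1, 4.8, 5.3, 0.4, 0.2]
labels = viterbi_segment(signal, means=[0.0, 5.0], sds=[1.0, 1.0], stay_prob=0.9)
print(to_segments(labels))  # [(0, 3, 0), (3, 6, 1), (6, 8, 0)]
```

The transition term is what separates this from plain clustering: with `stay_prob` near 1, switching labels costs extra, so isolated noisy bins tend to be absorbed into the surrounding segment rather than labeled independently.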
