Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

05/19/2017
by Taikai Takeda, et al.

Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using Factorized Information Criteria (FIC), which is widely utilised in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.
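
To make the three state types concrete, here is a minimal Python sketch of the forward algorithm for a textbook PHMM with exactly one match, one insertion and one deletion state. It is not the method of this paper: it does not perform FIC-based model selection and does not use multiple hidden states per type. The function name `pair_hmm_forward`, the transition parameters `delta`, `epsilon` and `tau`, and the emission probabilities are all illustrative assumptions, chosen only so that the probabilities sum to one.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.3, tau=0.05):
    """Joint probability of sequences x and y under a 3-state pair HMM.

    Hypothetical parameters (not taken from the paper):
      delta   - probability of opening a gap from the match state
      epsilon - probability of extending a gap
      tau     - probability of moving to the end state
    Direct insertion <-> deletion transitions are disallowed in this sketch.
    """
    n, m = len(x), len(y)

    # Assumed DNA emissions: 4 * 0.22 + 12 * 0.01 = 1.0 for aligned pairs.
    def pair_emission(a, b):
        return 0.22 if a == b else 0.01

    gap_emission = 0.25  # uniform over A, C, G, T

    # f_match[i][j]: probability of emitting x[:i], y[:j] and ending in match;
    # f_ins consumes a symbol of x only, f_del consumes a symbol of y only.
    f_match = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_ins = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_del = [[0.0] * (m + 1) for _ in range(n + 1)]
    f_match[0][0] = 1.0  # the begin state behaves like the match state

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                f_match[i][j] = pair_emission(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * f_match[i - 1][j - 1]
                    + (1 - epsilon - tau)
                    * (f_ins[i - 1][j - 1] + f_del[i - 1][j - 1])
                )
            if i > 0:
                f_ins[i][j] = gap_emission * (
                    delta * f_match[i - 1][j] + epsilon * f_ins[i - 1][j]
                )
            if j > 0:
                f_del[i][j] = gap_emission * (
                    delta * f_match[i][j - 1] + epsilon * f_del[i][j - 1]
                )

    # Every state moves to the end state with probability tau.
    return tau * (f_match[n][m] + f_ins[n][m] + f_del[n][m])
```

Calling `pair_hmm_forward("ACGT", "ACGT")` gives a higher probability than `pair_hmm_forward("ACGT", "TGCA")`, because the identical pair admits an all-match alignment. Models with several hidden states per type, such as those the abstract describes, would enlarge these three tables to one table per hidden state.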


research
06/26/2015

Factorized Asymptotic Bayesian Inference for Factorial Hidden Markov Models

Factorial hidden Markov models (FHMMs) are powerful tools of modeling se...
research
06/18/2012

Factorized Asymptotic Bayesian Hidden Markov Models

This paper addresses the issue of model selection for hidden Markov mode...
research
06/22/2012

Hidden Markov Models with mixtures as emission distributions

In unsupervised classification, Hidden Markov Models (HMM) are used to a...
research
07/04/2012

'Say EM' for Selecting Probabilistic Models for Logical Sequences

Many real world sequences such as protein secondary structures or shell ...
research
08/08/2023

Are Information criteria good enough to choose the right the number of regimes in Hidden Markov Models?

Selecting the number of regimes in Hidden Markov models is an important ...
research
02/18/2010

Asymptotic risks of Viterbi segmentation

We consider the maximum likelihood (Viterbi) alignment of a hidden Marko...
research
06/27/2019

A Bayesian Phylogenetic Hidden Markov Model for B Cell Receptor Sequence Analysis

The human body is able to generate a diverse set of high affinity antibo...
