Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

Taikai Takeda, Michiaki Hamada*

*Corresponding author for this work

Research output: Article (peer-reviewed)

Abstract

Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy.

Results: We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects models with the highest posterior probability using the Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.

Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm.

Contact: mhamada@waseda.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
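To make the three state types concrete, the following is a minimal sketch of the forward recursion for a pair HMM with exactly one match, one insertion and one deletion state. It is not the authors' fab-phmm code and does not implement the Factorized Information Criterion; the function name `forward_likelihood`, the state ordering, and all parameter values are assumptions chosen for illustration. The model has no explicit end state, and plain probabilities are used, so it is only suitable for short sequences.

```python
# Illustrative pair-HMM forward recursion (assumed layout, not the paper's code).
# State indices: 0 = match (emits a pair), 1 = emits a symbol of x only,
# 2 = emits a symbol of y only. Which of 1/2 is called "insertion" or
# "deletion" is a naming convention.

def forward_likelihood(x, y, init, trans, emit_pair, emit_single):
    """Return P(x, y) summed over all alignments (state paths)."""
    n, m = len(x), len(y)
    # f[i][j][k]: probability of having emitted x[:i] and y[:j], ending in state k
    f = [[[0.0] * 3 for _ in range(m + 1)] for _ in range(n + 1)]

    def incoming(i, j, k):
        # Probability mass arriving in state k from cell (i, j);
        # cell (0, 0) is the virtual start, so the initial distribution applies.
        if i == 0 and j == 0:
            return init[k]
        return sum(f[i][j][p] * trans[p][k] for p in range(3))

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match consumes one symbol from each sequence
                f[i][j][0] = emit_pair[(x[i - 1], y[j - 1])] * incoming(i - 1, j - 1, 0)
            if i > 0:            # state 1 consumes a symbol from x only
                f[i][j][1] = emit_single[x[i - 1]] * incoming(i - 1, j, 1)
            if j > 0:            # state 2 consumes a symbol from y only
                f[i][j][2] = emit_single[y[j - 1]] * incoming(i, j - 1, 2)
    return sum(f[n][m])


# Hypothetical parameters for a DNA alphabet (values are made up).
ALPHABET = "ACGT"
init = [0.8, 0.1, 0.1]
trans = [
    [0.8, 0.1, 0.1],  # from match
    [0.6, 0.3, 0.1],  # from state 1
    [0.6, 0.1, 0.3],  # from state 2
]
emit_pair = {(a, b): (0.19 if a == b else 0.02) for a in ALPHABET for b in ALPHABET}
emit_single = {a: 0.25 for a in ALPHABET}

likelihood = forward_likelihood("ACGT", "AGT", init, trans, emit_pair, emit_single)
```

A model with several hidden states per type, as considered in the abstract, would enlarge the state dimension of `f` and the transition matrix; comparing such models is the model-selection step that this sketch leaves out.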

Original language: English
Pages (from-to): 576-584
Number of pages: 9
Journal: Bioinformatics
Volume: 34
Issue number: 4
Publication status: Published - 15 Feb 2018

ASJC Scopus subject areas

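  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics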
