CentroidAlign: Fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score

Michiaki Hamada, Kengo Sato, Hisanori Kiryu, Toutai Mituyama, Kiyoshi Asai

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

Motivation: The importance of accurate and fast predictions of multiple alignments for RNA sequences has increased due to recent findings about functional non-coding RNAs. Recent studies suggest that maximizing the expected accuracy of predictions will be useful for many problems in bioinformatics.

Results: We designed a novel estimator for multiple alignments of structured RNAs, based on maximizing the expected accuracy of predictions. First, we define the maximum expected accuracy (MEA) estimator for pairwise alignment of RNA sequences. This maximizes the expected sum-of-pairs score (SPS) of a predicted alignment under a probability distribution of alignments given by marginalizing the Sankoff model. Then, by approximating the MEA estimator, we obtain an estimator whose time complexity is O(L^3 + c^2 d L^2), where L is the length of input sequences and both c and d are constants independent of L. The proposed estimator can handle uncertainty of secondary structures and alignments that are obstacles in bioinformatics because it considers all the secondary structures and all the pairwise alignments as input sequences. Moreover, we integrate the probabilistic consistency transformation (PCT) on alignments into the proposed estimator. Computational experiments using six benchmark datasets indicate that the proposed method achieved a favorable SPS and was the fastest of many state-of-the-art tools for multiple alignments of structured RNAs.
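
As a rough illustration of the MEA idea for pairwise alignment, the sketch below finds the alignment that maximizes the sum of posterior match probabilities, which is the expected number of correctly aligned position pairs. It is a generic textbook-style dynamic program, not the CentroidAlign implementation: the function name `mea_align` and the input matrix `p` are assumptions made for this example, and the sketch takes the match probabilities as given rather than deriving them from the Sankoff model.

```python
from typing import List, Tuple


def mea_align(p: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Maximum-expected-accuracy pairwise alignment (illustrative sketch).

    p[i][j] is assumed to be the posterior probability that position i of
    the first sequence is aligned to position j of the second sequence.
    Returns the best expected score and the aligned (i, j) pairs.
    """
    n = len(p)
    m = len(p[0]) if n else 0

    # score[i][j]: best expected gain using the first i and first j positions.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + p[i - 1][j - 1],  # align i with j
                score[i - 1][j],                        # gap in second sequence
                score[i][j - 1],                        # gap in first sequence
            )

    # Traceback: prefer gaps, so a pair is kept only when it adds to the score.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j]:
            i -= 1
        elif score[i][j] == score[i][j - 1]:
            j -= 1
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    # Toy 2x2 matrix of made-up probabilities, for illustration only.
    best, aligned = mea_align([[0.9, 0.1], [0.2, 0.8]])
    print(best, aligned)
```

The table fill takes time proportional to the product of the two sequence lengths, which is the kind of quadratic term that appears in the complexity stated above; computing the probabilities themselves is where the structural information would enter.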

Original language: English
Article number: btp580
Pages (from-to): 3236-3243
Number of pages: 8
Journal: Bioinformatics
Volume: 25
Issue number: 24
DOI: 10.1093/bioinformatics/btp580
Publication status: Published - 2009 Oct 6
Externally published: Yes

Fingerprint

Computational Biology
RNA
Alignment
Benchmarking
Untranslated RNA
Uncertainty
Estimator
Secondary Structure
Bioinformatics
Prediction
Pairwise
Computational Experiments
Probability distributions
Time Complexity
Probability Distribution
Maximise
Integrate
Benchmark
Datasets

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

CentroidAlign: Fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score. / Hamada, Michiaki; Sato, Kengo; Kiryu, Hisanori; Mituyama, Toutai; Asai, Kiyoshi.

In: Bioinformatics, Vol. 25, No. 24, btp580, 06.10.2009, p. 3236-3243.

@article{8bbbc8cf525845a8b86b80ebc8d7da81,
title = "CentroidAlign: Fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score",
author = "Michiaki Hamada and Kengo Sato and Hisanori Kiryu and Toutai Mituyama and Kiyoshi Asai",
year = "2009",
month = "10",
day = "6",
doi = "10.1093/bioinformatics/btp580",
language = "English",
volume = "25",
pages = "3236--3243",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "24",

}

TY - JOUR

T1 - CentroidAlign

T2 - Fast and accurate aligner for structured RNAs by maximizing expected sum-of-pairs score

AU - Hamada, Michiaki

AU - Sato, Kengo

AU - Kiryu, Hisanori

AU - Mituyama, Toutai

AU - Asai, Kiyoshi

PY - 2009/10/6

Y1 - 2009/10/6

UR - http://www.scopus.com/inward/record.url?scp=75849160582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=75849160582&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btp580

DO - 10.1093/bioinformatics/btp580

M3 - Article

C2 - 19808876

AN - SCOPUS:75849160582

VL - 25

SP - 3236

EP - 3243

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 24

M1 - btp580

ER -