TY - JOUR
T1 - Prediction of RNA secondary structure using generalized centroid estimators
AU - Hamada, Michiaki
AU - Kiryu, Hisanori
AU - Sato, Kengo
AU - Mituyama, Toutai
AU - Asai, Kiyoshi
N1 - Funding Information:
Funding: New Energy and Industrial Technology Development Organization (NEDO) of Japan (to ‘Functional RNA project); Grant-in-Aid for Scientific Research on Priority Areas ‘Comparative Genomics’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
PY - 2009/2
Y1 - 2009/2
N2 - Motivation: Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities has an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. Results: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics.
AB - Motivation: Recent studies have shown that the methods for predicting secondary structures of RNAs on the basis of posterior decoding of the base-pairing probabilities has an advantage with respect to prediction accuracy over the conventionally utilized minimum free energy methods. However, there is room for improvement in the objective functions presented in previous studies, which are maximized in the posterior decoding with respect to the accuracy measures for secondary structures. Results: We propose novel estimators which improve the accuracy of secondary structure prediction of RNAs. The proposed estimators maximize an objective function which is the weighted sum of the expected number of the true positives and that of the true negatives of the base pairs. The proposed estimators are also improved versions of the ones used in previous works, namely CONTRAfold for secondary structure prediction from a single RNA sequence and McCaskill-MEA for common secondary structure prediction from multiple alignments of RNA sequences. We clarify the relations between the proposed estimators and the estimators presented in previous works, and theoretically show that the previous estimators include additional unnecessary terms in the evaluation measures with respect to the accuracy. Furthermore, computational experiments confirm the theoretical analysis by indicating improvement in the empirical accuracy. The proposed estimators represent extensions of the centroid estimators proposed in Ding et al. and Carvalho and Lawrence, and are applicable to a wide variety of problems in bioinformatics.
UR - http://www.scopus.com/inward/record.url?scp=60149094002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60149094002&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btn601
DO - 10.1093/bioinformatics/btn601
M3 - Article
C2 - 19095700
AN - SCOPUS:60149094002
VL - 25
SP - 465
EP - 473
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 4
ER -