POODLE-L: A two-level SVM prediction system for reliably predicting long disordered regions

Shuichi Hirose, Kana Shimizu, Satoru Kanai, Yutaka Kuroda, Tamotsu Noguchi

Research output: Contribution to journal > Article

110 Citations (Scopus)

Abstract

Motivation: Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling and transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications.

Results: We developed Prediction of Order and Disorder by machine LEarning (POODLE-L; L stands for long), a Support Vector Machine (SVM)-based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acids. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. POODLE-L achieved a Matthews correlation coefficient of 0.658 for predicting long disordered regions, the highest among the nine methods compared (POODLE-L and eight well-established, publicly available disordered region predictors).
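The Matthews correlation coefficient quoted in the abstract is a standard summary of a binary confusion matrix, ranging from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect agreement). The following is a minimal sketch of how such a score can be computed, not the authors' evaluation code: the per-residue labelling ("D" for disordered, "O" for ordered), the function names, and the toy sequences are all assumptions made for illustration.

```python
import math


def confusion_counts(true_labels: str, predicted_labels: str) -> tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn), treating 'D' (disordered) as the positive class.

    The 'D'/'O' encoding is a hypothetical convention for this sketch.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    tp = tn = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == "D" and guess == "D":
            tp += 1
        elif truth == "O" and guess == "O":
            tn += 1
        elif truth == "O" and guess == "D":
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy example (invented data): ten residues, two of them mispredicted.
true_labels = "DDDDDOOOOO"
predicted_labels = "DDDDOOOOOD"
counts = confusion_counts(true_labels, predicted_labels)
score = matthews_corrcoef(*counts)
```

Because the score uses all four cells of the confusion matrix, it remains informative when disordered residues are much rarer than ordered ones, which is one reason it is widely used to compare disorder predictors.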

Original language: English
Pages (from-to): 2046-2053
Number of pages: 8
Journal: Bioinformatics
Volume: 23
Issue number: 16
DOI: 10.1093/bioinformatics/btm302
Publication status: Published - 2007 Aug 15
Externally published: Yes

Fingerprint

Support vector machines
Support Vector Machine
Biological Phenomena
Prediction
Proteome
Cell signaling
Proteins
Theoretical Models
Amino Acids
Chemical properties
Learning systems
Amino acids
Throughput
Predictors
Protein Sequence
Correlation coefficient
High Throughput
Annotation
Disorder
Machine Learning

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computational Theory and Mathematics
  • Computer Science Applications

Cite this

POODLE-L: A two-level SVM prediction system for reliably predicting long disordered regions. / Hirose, Shuichi; Shimizu, Kana; Kanai, Satoru; Kuroda, Yutaka; Noguchi, Tamotsu.

In: Bioinformatics, Vol. 23, No. 16, 15.08.2007, p. 2046-2053.

Hirose, Shuichi; Shimizu, Kana; Kanai, Satoru; Kuroda, Yutaka; Noguchi, Tamotsu. / POODLE-L: A two-level SVM prediction system for reliably predicting long disordered regions. In: Bioinformatics. 2007; Vol. 23, No. 16. pp. 2046-2053.
@article{9860b445bcc5446083b185dae7a4c8c3,
title = "{POODLE-L}: A two-level {SVM} prediction system for reliably predicting long disordered regions",
abstract = "Motivation: Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling, transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications. Results: We developed Prediction of Order and Disorder by machine LEarning (POODLE-L; L stands for long), the Support Vector Machines (SVMs) based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acid. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. The performance of POODLE-L for predicting long disordered regions, which exhibited a Matthew's correlation coefficient of 0.658, was the highest when compared with eight well-established publicly available disordered region predictors.",
author = "Shuichi Hirose and Kana Shimizu and Satoru Kanai and Yutaka Kuroda and Tamotsu Noguchi",
year = "2007",
month = "8",
day = "15",
doi = "10.1093/bioinformatics/btm302",
language = "English",
volume = "23",
pages = "2046--2053",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "16",
}

TY - JOUR

T1 - POODLE-L: A two-level SVM prediction system for reliably predicting long disordered regions

AU - Hirose, Shuichi

AU - Shimizu, Kana

AU - Kanai, Satoru

AU - Kuroda, Yutaka

AU - Noguchi, Tamotsu

PY - 2007/8/15

Y1 - 2007/8/15

AB - Motivation: Recent experimental and theoretical studies have revealed several proteins containing sequence segments that are unfolded under physiological conditions. These segments are called disordered regions. They are actively investigated because of their possible involvement in various biological processes, such as cell signaling, transcriptional and translational regulation. Additionally, disordered regions can represent a major obstacle to high-throughput proteome analysis and often need to be removed from experimental targets. The accurate prediction of long disordered regions is thus expected to provide annotations that are useful for a wide range of applications. Results: We developed Prediction of Order and Disorder by machine LEarning (POODLE-L; L stands for long), the Support Vector Machines (SVMs) based method for predicting long disordered regions using 10 kinds of simple physico-chemical properties of amino acid. POODLE-L assembles the output of 10 two-level SVM predictors into a final prediction of disordered regions. The performance of POODLE-L for predicting long disordered regions, which exhibited a Matthew's correlation coefficient of 0.658, was the highest when compared with eight well-established publicly available disordered region predictors.

UR - http://www.scopus.com/inward/record.url?scp=34548567232&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548567232&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm302

DO - 10.1093/bioinformatics/btm302

M3 - Article

C2 - 17545177

AN - SCOPUS:34548567232

VL - 23

SP - 2046

EP - 2053

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 16

ER -