MFSPSSMpred

Identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation

Chun Fang, Tamotsu Noguchi, Daisuke Tominaga, Hayato Yamana

    Research output: Contribution to journal › Article

    31 Citations (Scopus)

    Abstract

    Background: Molecular recognition features (MoRFs) are short binding regions located within longer intrinsically disordered protein regions. Although these short regions lack a stable structure in their natural state, they readily undergo disorder-to-order transitions upon binding to their partner molecules. MoRFs play critical roles in the molecular interaction network of a cell and are associated with many human genetic diseases. Identifying MoRFs is therefore an important step toward understanding the functions of these proteins and toward applications in drug design.

    Results: Here, we propose a novel method for identifying MoRFs, named MFSPSSMpred (Masked, Filtered and Smoothed Position-Specific Scoring Matrix-based Predictor). First, a masking method is used to calculate the average local conservation score of residues within a masking window in the position-specific scoring matrix (PSSM). Next, scores below this average are filtered out. Finally, a smoothing method incorporates the features of the flanking regions of each residue to build the feature sets used for prediction. The method uses no predictions from other classifiers as input; all of its features are extracted from the PSSM of the sequence alone. Compared with other methods tested on the same datasets, our method achieves the best performance: its AUC is 0.004 to 0.079 higher than those of the other methods on TEST419 and 0.045 to 0.212 higher on TEST2012. In addition, when tested on an independent dataset of membrane-related proteins, MFSPSSMpred significantly outperformed the existing predictor MoRFpred.

    Conclusions: This study suggests that: 1) the amino acid composition and physicochemical properties of the flanking regions of MoRFs differ markedly from those of general non-MoRF regions; 2) MoRFs contain both highly conserved and highly variable residues but, on the whole, are highly locally conserved; and 3) combining contextual information with the local conservation information of residues facilitates the prediction of MoRFs.
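    The abstract names the three feature-construction steps (masking, filtering, smoothing) without giving formulas, so the following Python sketch shows only one plausible reading of them, applied to a PSSM stored as one row per residue and one column per amino acid. The window sizes, the choice to compare each raw PSSM score with its windowed local average, and the use of a plain windowed mean for smoothing are all assumptions made for illustration; this is not the published MFSPSSMpred implementation, and the helper names (`window_mean`, `mask`, `filter_below_average`, `smooth`, `mfs_features`) are hypothetical.

```python
from typing import List

# A PSSM as a list of rows: one row per residue, one column per amino acid
# (20 columns for a real PSI-BLAST PSSM; the demo below uses 2 for brevity).
Matrix = List[List[float]]


def window_mean(matrix: Matrix, window: int) -> Matrix:
    """Column-wise mean over a window of rows centred on each row.

    The window is clipped at the sequence termini, so edge residues are
    averaged over fewer neighbours rather than padded with zeros.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    out: Matrix = []
    for i in range(n_rows):
        rows = matrix[max(0, i - half):min(n_rows, i + half + 1)]
        out.append([sum(r[j] for r in rows) / len(rows) for j in range(n_cols)])
    return out


def mask(pssm: Matrix, mask_window: int) -> Matrix:
    # Step 1 (assumed reading): average local conservation score of every
    # PSSM column within the masking window around each residue.
    return window_mean(pssm, mask_window)


def filter_below_average(pssm: Matrix, masked: Matrix) -> Matrix:
    # Step 2 (assumed reading): keep a raw PSSM score only if it is not below
    # its local average; scores below the average are set to 0.
    return [
        [score if score >= avg else 0.0 for score, avg in zip(p_row, m_row)]
        for p_row, m_row in zip(pssm, masked)
    ]


def smooth(features: Matrix, smooth_window: int) -> Matrix:
    # Step 3 (assumed reading): fold flanking-region context into each residue
    # by averaging its filtered features with those of its neighbours.
    return window_mean(features, smooth_window)


def mfs_features(pssm: Matrix, mask_window: int = 3, smooth_window: int = 3) -> Matrix:
    # Hypothetical end-to-end helper: mask, filter, then smooth.
    masked = mask(pssm, mask_window)
    filtered = filter_below_average(pssm, masked)
    return smooth(filtered, smooth_window)


if __name__ == "__main__":
    toy_pssm = [[2.0, -1.0], [4.0, 1.0], [0.0, 3.0], [6.0, -3.0]]
    features = mfs_features(toy_pssm, mask_window=3, smooth_window=3)
    print([[round(x, 3) for x in row] for row in features])
    # prints [[2.0, 0.5], [1.333, 1.333], [3.333, 1.333], [3.0, 1.5]]
```

    In this sketch the per-residue feature vectors would then be passed to a separately trained classifier, which is not shown; the window size of 3 is a placeholder rather than a value taken from the paper. Clipping the window at the termini, instead of zero-padding, is likewise a design choice of the sketch that keeps edge residues from being biased toward zero.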

    Original language: English
    Article number: 300
    Journal: BMC Bioinformatics
    Volume: 14
    Issue number: 1
    DOI: 10.1186/1471-2105-14-300
    Publication status: Published, 4 October 2013

    Fingerprint

    Position-Specific Scoring Matrices
    Feature Recognition
    Molecular recognition
    Scoring
    Disorder
    Conservation
    Predictors
    Proteins
    Protein
    Masking
    Intrinsically Disordered Proteins
    Area Under Curve
    Molecular interactions
    Drug Design
    Membrane Protein
    Smoothing Methods
    Prediction
    Inborn Genetic Diseases
    Amino acids
    Medical Genetics

    Keywords

    • Intrinsically disordered protein
    • Molecular recognition features
    • Position-specific scoring matrix

    ASJC Scopus subject areas

    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Applied Mathematics
    • Structural Biology

    Cite this

    MFSPSSMpred: Identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation. / Fang, Chun; Noguchi, Tamotsu; Tominaga, Daisuke; Yamana, Hayato.

    In: BMC Bioinformatics, Vol. 14, No. 1, 300, 04.10.2013.


    @article{938d39bb924841b2836219ea2e58f192,
    title = "MFSPSSMpred: Identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation",
    keywords = "Intrinsically disordered protein, Molecular recognition features, Position-specific scoring matrix",
    author = "Chun Fang and Tamotsu Noguchi and Daisuke Tominaga and Hayato Yamana",
    year = "2013",
    month = "10",
    day = "4",
    doi = "10.1186/1471-2105-14-300",
    language = "English",
    volume = "14",
    journal = "BMC Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central",
    number = "1",

    }

    TY - JOUR

    T1 - MFSPSSMpred

    T2 - Identifying short disorder-to-order binding regions in disordered proteins based on contextual local evolutionary conservation

    AU - Fang, Chun

    AU - Noguchi, Tamotsu

    AU - Tominaga, Daisuke

    AU - Yamana, Hayato

    PY - 2013/10/4

    Y1 - 2013/10/4

    KW - Intrinsically disordered protein

    KW - Molecular recognition features

    KW - Position-specific scoring matrix

    UR - http://www.scopus.com/inward/record.url?scp=84884959977&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84884959977&partnerID=8YFLogxK

    U2 - 10.1186/1471-2105-14-300

    DO - 10.1186/1471-2105-14-300

    M3 - Article

    VL - 14

    JO - BMC Bioinformatics

    JF - BMC Bioinformatics

    SN - 1471-2105

    IS - 1

    M1 - 300

    ER -