Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder

Kana Shimizu, Shuichi Hirose, Yoichi Muraoka, Tamotsu Noguchi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    7 Citations (Scopus)

    Abstract

    The prediction of intrinsic disorder from amino acid sequence has been gaining increasing attention because disordered regions have come to be recognized as important for protein function. The most common way of predicting disorder is binary classification with machine learning. Since amino acid composition shows different propensities in the N-term, C-term, and internal regions, prediction accuracy increases when the training data are divided into these three regions and each region is predicted separately. However, previous work has not discussed a concrete definition of the N-term and C-term regions and has only used a heuristic length from the terminus. Other previous work has shown that general physicochemical properties, rather than specific amino acids, are important factors contributing to disorder, and that a reduced amino acid alphabet can maintain excellent precision in predicting disorder. In this paper, we redefine a suitable length and position for the N-term and C-term regions for predicting disorder. Moreover, we show that each region has different physicochemical properties, which are important factors contributing to disorder. We also propose a region-specific reduced amino acid set and a modified PSSM based on it for predicting disorder. We implemented our method and (1) compared it with the conventional division method and (2) compared our feature selection with the use of all physicochemical features, on the CASP6 benchmark, a PDB dataset, and DisProt. The results support the effectiveness of the new data separation method and indicate that each region has different physicochemical properties that are important factors for predicting protein disorder.
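    The idea of dividing sequences into N-term, internal, and C-term regions and describing each with a reduced amino acid alphabet can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed `term_len` cutoff, the function names `split_regions`, `to_reduced`, and `composition`, and the three-class alphabet (hydrophobic/charged/polar) are all hypothetical. The paper derives its own region lengths and region-specific reduced sets, which are not reproduced here.

```python
from collections import Counter

# Illustrative reduced alphabet (an ASSUMPTION, not the paper's region-specific
# sets): H = hydrophobic, C = charged, P = polar/other.
REDUCED = {aa: "H" for aa in "AVLIMFWC"}
REDUCED.update({aa: "C" for aa in "DEKR"})
REDUCED.update({aa: "P" for aa in "STNQGPHY"})


def split_regions(seq: str, term_len: int = 30) -> dict[str, str]:
    """Split a sequence into N-term, internal and C-term regions.

    term_len is a hypothetical fixed distance from each terminus, i.e. the
    kind of heuristic cutoff the abstract attributes to earlier work.
    For short sequences the cut points are clamped so regions never overlap.
    """
    n_end = min(term_len, len(seq) // 2)
    c_start = max(n_end, len(seq) - term_len)
    return {"N": seq[:n_end], "internal": seq[n_end:c_start], "C": seq[c_start:]}


def to_reduced(seq: str) -> str:
    """Map each residue to its reduced-alphabet class; unknown residues -> 'X'."""
    return "".join(REDUCED.get(aa, "X") for aa in seq.upper())


def composition(region: str) -> dict[str, float]:
    """Fraction of each reduced class in a region; an empty region gives {}."""
    counts = Counter(to_reduced(region))
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())} if total else {}


if __name__ == "__main__":
    # Toy sequence: charged/polar termini around a hydrophobic core.
    seq = "MSDEKKPAAG" + "LIVAFMLIVW" * 5 + "GSSEEKRKDE"
    for name, region in split_regions(seq, term_len=10).items():
        print(name, to_reduced(region), composition(region))
```

    Per-region features such as these compositions would then feed separate classifiers for the N-term, internal, and C-term regions. The paper's contribution lies in redefining the region boundaries and choosing region-specific reduced sets, rather than using a fixed heuristic length as this sketch does.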

    Original language: English
    Title of host publication: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05
    Volume: 2005
    Publication status: Published - 2005
    Event: 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05 - La Jolla, CA, United States
    Duration: 2005 Nov 14 to 2005 Nov 15

    Keywords

    • C-term region
    • Intrinsic disorder
    • N-term region
    • Physicochemical property
    • PSSM

    ASJC Scopus subject areas

    • Engineering (all)

    Cite this

    Shimizu, K., Hirose, S., Muraoka, Y., & Noguchi, T. (2005). Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05 (Vol. 2005). [1594927]

    Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. / Shimizu, Kana; Hirose, Shuichi; Muraoka, Yoichi; Noguchi, Tamotsu.

    Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05. Vol. 2005 2005. 1594927.

    Shimizu, K, Hirose, S, Muraoka, Y & Noguchi, T 2005, Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. in Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05. vol. 2005, 1594927, 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05, La Jolla, CA, United States, 05/11/14.
    Shimizu K, Hirose S, Muraoka Y, Noguchi T. Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. In Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05. Vol. 2005. 2005. 1594927
    Shimizu, Kana ; Hirose, Shuichi ; Muraoka, Yoichi ; Noguchi, Tamotsu. / Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder. Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05. Vol. 2005 2005.
    @inproceedings{2957bc1a9460410499ad80dcc371f47c,
    title = "Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder",
    keywords = "C-term region, Intrinsic disorder, N-term region, Physicochemical property, PSSM",
    author = "Kana Shimizu and Shuichi Hirose and Yoichi Muraoka and Tamotsu Noguchi",
    year = "2005",
    language = "English",
    isbn = "0780393872",
    volume = "2005",
    booktitle = "Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05",

    }

    TY - GEN

    T1 - Feature selection based on physicochemical properties of redefined N-term region and C-term regions for predicting disorder

    AU - Shimizu, Kana

    AU - Hirose, Shuichi

    AU - Muraoka, Yoichi

    AU - Noguchi, Tamotsu

    PY - 2005

    Y1 - 2005

    KW - C-term region

    KW - Intrinsic disorder

    KW - N-term region

    KW - Physicochemical property

    KW - PSSM

    UR - http://www.scopus.com/inward/record.url?scp=33847206446&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33847206446&partnerID=8YFLogxK

    M3 - Conference contribution

    SN - 0780393872

    SN - 9780780393875

    VL - 2005

    BT - Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB '05

    ER -