SCPSSMpred

A general sequence-based method for ligand-binding site prediction

Chun Fang, Tamotsu Noguchi, Hayato Yamana

    Research output: Contribution to journal (Article)

    Abstract

    In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) to predict ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information about neighboring residues, physicochemical properties, and evolutionary information. Our method uses no predictions from other classifiers as input; that is, all features are extracted from the sequences alone. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were analyzed for comparison. All methods were tested at both the residue level and the protein sequence level. Experimental results showed that SCPSSMpred achieved the best performance while also removing 50% of the redundant features in the PSSM. In addition, it adapted remarkably well to unbalanced data compared with the other methods when tested at the protein sequence level. This study not only demonstrates the importance of reducing redundant features in the PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, namely that both the arrangement and the physicochemical properties of neighboring residues significantly affect ligand-binding behavior.

    Original language: English
    Pages (from-to): 35-42
    Number of pages: 8
    Journal: IPSJ Transactions on Bioinformatics
    Volume: 6
    DOI: 10.2197/ipsjtbio.6.35
    Publication status: Published - June 2013

    Fingerprint

    Binding sites
    Position-Specific Scoring Matrices
    Ligands
    Proteins
    Flavin-Adenine Dinucleotide
    NAD
    Amino acids
    Classifiers
    Adenosine Triphosphate

    Keywords

    • Ligand-binding
    • Prediction
    • Sequence-based
    • Simplified PSSM

    ASJC Scopus subject areas

    • Computer Science Applications
    • Biochemistry, Genetics and Molecular Biology (miscellaneous)

    Cite this

    SCPSSMpred: A general sequence-based method for ligand-binding site prediction. / Fang, Chun; Noguchi, Tamotsu; Yamana, Hayato.

    In: IPSJ Transactions on Bioinformatics, Vol. 6, 06.2013, p. 35-42.

    @article{094b078d64a545ad8755419a6fa3d213,
    title = "SCPSSMpred: A general sequence-based method for ligand-binding site prediction",
    abstract = "In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) for predicting ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information of neighboring residues, physicochemical properties, and evolutionary information. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the sequences only. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were also analyzed for comparison. All the methods were tested at both the residue level and the protein sequence level. Experimental results showed that the SCPSSMpred method achieved the best performance besides reducing 50{\%} of redundant features in PSSM. In addition, it showed a remarkable adaptability in dealing with unbalanced data compared to other methods when tested on the protein sequence level. This study not only demonstrates the importance of reducing redundant features in PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, such that both the arrangements and physicochemical properties of neighboring residues significantly impact ligand-binding behavior.",
    keywords = "Ligand-binding, Prediction, Sequence-based, Simplified PSSM",
    author = "Chun Fang and Tamotsu Noguchi and Hayato Yamana",
    year = "2013",
    month = "6",
    doi = "10.2197/ipsjtbio.6.35",
    language = "English",
    volume = "6",
    pages = "35--42",
    journal = "IPSJ Transactions on Bioinformatics",
    issn = "1882-6679",
    publisher = "Information Processing Society of Japan",

    }

    TY - JOUR

    T1 - SCPSSMpred

    T2 - A general sequence-based method for ligand-binding site prediction

    AU - Fang, Chun

    AU - Noguchi, Tamotsu

    AU - Yamana, Hayato

    PY - 2013/6

    Y1 - 2013/6

    AB - In this paper, we propose a novel method, named SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) for predicting ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines abundant features, such as amino acid arrangement, information of neighboring residues, physicochemical properties, and evolutionary information. Our method employs no predicted results from other classifiers as input, i.e., all features used in this method are extracted from the sequences only. Three ligands (FAD, NAD and ATP) were used to verify the versatility of our method, and three alternative traditional methods were also analyzed for comparison. All the methods were tested at both the residue level and the protein sequence level. Experimental results showed that the SCPSSMpred method achieved the best performance besides reducing 50% of redundant features in PSSM. In addition, it showed a remarkable adaptability in dealing with unbalanced data compared to other methods when tested on the protein sequence level. This study not only demonstrates the importance of reducing redundant features in PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, such that both the arrangements and physicochemical properties of neighboring residues significantly impact ligand-binding behavior.

    KW - Ligand-binding

    KW - Prediction

    KW - Sequence-based

    KW - Simplified PSSM

    UR - http://www.scopus.com/inward/record.url?scp=84882249635&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84882249635&partnerID=8YFLogxK

    U2 - 10.2197/ipsjtbio.6.35

    DO - 10.2197/ipsjtbio.6.35

    M3 - Article

    VL - 6

    SP - 35

    EP - 42

    JO - IPSJ Transactions on Bioinformatics

    JF - IPSJ Transactions on Bioinformatics

    SN - 1882-6679

    ER -