Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures

Hiroshi Wako, Tom L. Blundell

Research output: Contribution to journal › Article

48 Citations (Scopus)

Abstract

A three-step method is presented to predict secondary structures of proteins, by utilizing aligned sequences of homologous proteins. Mean propensities and amino acid substitution patterns at a given site in the aligned sequences are first evaluated for four conformational states (i.e. α-helix, β-strand, buried coil and exposed coil). Capping rules are applied in order to define boundaries of the secondary-structure segments more precisely. In the second step, β-strand is predicted by searching regions predicted as coil for the two patterns characteristic of alternating and fully buried β-strands. The complete sequences of the solvent-accessibility classes predicted by substitution tables and propensities are also searched using Fourier transform methods for α-helical periodicity. After applying capping rules, the α-helices and β-strands predicted in the second step replace, where appropriate, the conformational states predicted in the first step. Finally, in the third step, if one of the four conformational states is assigned to the residues at an equivalent site of aligned sequences in more than a given fraction of the proteins, such a state is reassigned to all the residues at that site. The method is applied to 13 protein families, which contain four folding types, α, β, α/β and α+β. The accuracy of the prediction ranges from 60 to 79% (mean percentage over the 13 families is 69%). For comparison the Garnier-Osguthorpe-Robson (GOR) method is also applied to them. Although the mean prediction accuracy for the GOR method, 58%, can be improved to 63% by applying the second and third steps in this method, there remain four families with less than 55% accuracy. The mean accuracy is relatively higher and poor predictions are reduced in this method.
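The third step described in the abstract (reassigning a site to one conformational state when that state is predicted in more than a given fraction of the aligned proteins) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function name `consensus_reassign`, the one-letter state codes (`H` α-helix, `E` β-strand, `B` buried coil, `X` exposed coil), the gap character `-`, the default threshold of 0.5, and the choice to compute the fraction over non-gap residues only are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol for alignment positions with no residue


def consensus_reassign(states, threshold=0.5):
    """Apply a per-site consensus to aligned per-residue state strings.

    states    -- list of equal-length strings, one per aligned protein,
                 using hypothetical codes H, E, B, X and '-' for gaps.
    threshold -- a state must be assigned in strictly more than this
                 fraction of the non-gap residues at a site (assumption).
    """
    if not states:
        return []
    length = len(states[0])
    if any(len(row) != length for row in states):
        raise ValueError("all rows must have the same aligned length")

    rows = [list(row) for row in states]
    for col in range(length):
        # Collect the predicted states at this site, ignoring gaps.
        column = [row[col] for row in rows if row[col] != GAP]
        if not column:
            continue
        state, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            # Reassign the consensus state to every residue at the site.
            for row in rows:
                if row[col] != GAP:
                    row[col] = state
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    predicted = ["HHEB", "HHEX", "HEEX", "H-BX"]
    for line in consensus_reassign(predicted):
        print(line)
```

With a threshold of 0.5 or higher at most one state can exceed the cutoff at a site, so the result does not depend on how ties are ordered; lower thresholds would need an explicit tie-breaking rule.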

Original language: English
Pages (from-to): 693-708
Number of pages: 16
Journal: Journal of Molecular Biology
Volume: 238
Issue number: 5
DOIs: https://doi.org/10.1006/jmbi.1994.1330
Publication status: Published - 1994

Fingerprint

Sequence Homology
Amino Acids
Proteins
Secondary Protein Structure
Periodicity
Fourier Analysis
Amino Acid Substitution

Keywords

  • Homologous sequences
  • Protein structure prediction
  • Secondary structure
  • Substitution tables

ASJC Scopus subject areas

  • Virology

Cite this

@article{4c51a1aea2034e60a4a8a86dcf85ec4a,
title = "Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures",
abstract = "A three-step method is presented to predict secondary structures of proteins, by utilizing aligned sequences of homologous proteins. Mean propensities and amino acid substitution patterns at a given site in the aligned sequences are first evaluated for four conformational states (i.e. α-helix, β-strand, buried coil and exposed coil). Capping rules are applied in order to define boundaries of the secondary-structure segments more precisely. In the second step, β-strand is predicted by searching regions predicted as coil for the two patterns characteristic of alternating and fully buried β-strands. The complete sequences of the solvent-accessibility classes predicted by substitution tables and propensities are also searched using Fourier transform methods for α-helical periodicity. After applying capping rules, the α-helices and β-strands predicted in the second step replace, where appropriate, the conformational states predicted in the first step. Finally, in the third step, if one of the four conformational states is assigned to the residues at an equivalent site of aligned sequences in more than a given fraction of the proteins, such a state is reassigned to all the residues at that site. The method is applied to 13 protein families, which contain four folding types, α, β, α/β and α+β. The accuracy of the prediction ranges from 60 to 79{\%} (mean percentage over the 13 families is 69{\%}). For comparison the Garnier-Osguthorpe-Robson (GOR) method is also applied to them. Although the mean prediction accuracy for the GOR method, 58{\%}, can be improved to 63{\%} by applying the second and third steps in this method, there remain four families with less than 55{\%} accuracy. The mean accuracy is relatively higher and poor predictions are reduced in this method.",
keywords = "Homologous sequences, Protein structure prediction, Secondary structure, Substitution tables",
author = "Wako, Hiroshi and Blundell, {Tom L.}",
year = "1994",
doi = "10.1006/jmbi.1994.1330",
language = "English",
volume = "238",
pages = "693--708",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "5",

}

TY - JOUR

T1 - Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. II. Secondary structures

AU - Wako, Hiroshi

AU - Blundell, Tom L.

PY - 1994

Y1 - 1994

N2 - A three-step method is presented to predict secondary structures of proteins, by utilizing aligned sequences of homologous proteins. Mean propensities and amino acid substitution patterns at a given site in the aligned sequences are first evaluated for four conformational states (i.e. α-helix, β-strand, buried coil and exposed coil). Capping rules are applied in order to define boundaries of the secondary-structure segments more precisely. In the second step, β-strand is predicted by searching regions predicted as coil for the two patterns characteristic of alternating and fully buried β-strands. The complete sequences of the solvent-accessibility classes predicted by substitution tables and propensities are also searched using Fourier transform methods for α-helical periodicity. After applying capping rules, the α-helices and β-strands predicted in the second step replace, where appropriate, the conformational states predicted in the first step. Finally, in the third step, if one of the four conformational states is assigned to the residues at an equivalent site of aligned sequences in more than a given fraction of the proteins, such a state is reassigned to all the residues at that site. The method is applied to 13 protein families, which contain four folding types, α, β, α/β and α+β. The accuracy of the prediction ranges from 60 to 79% (mean percentage over the 13 families is 69%). For comparison the Garnier-Osguthorpe-Robson (GOR) method is also applied to them. Although the mean prediction accuracy for the GOR method, 58%, can be improved to 63% by applying the second and third steps in this method, there remain four families with less than 55% accuracy. The mean accuracy is relatively higher and poor predictions are reduced in this method.

AB - A three-step method is presented to predict secondary structures of proteins, by utilizing aligned sequences of homologous proteins. Mean propensities and amino acid substitution patterns at a given site in the aligned sequences are first evaluated for four conformational states (i.e. α-helix, β-strand, buried coil and exposed coil). Capping rules are applied in order to define boundaries of the secondary-structure segments more precisely. In the second step, β-strand is predicted by searching regions predicted as coil for the two patterns characteristic of alternating and fully buried β-strands. The complete sequences of the solvent-accessibility classes predicted by substitution tables and propensities are also searched using Fourier transform methods for α-helical periodicity. After applying capping rules, the α-helices and β-strands predicted in the second step replace, where appropriate, the conformational states predicted in the first step. Finally, in the third step, if one of the four conformational states is assigned to the residues at an equivalent site of aligned sequences in more than a given fraction of the proteins, such a state is reassigned to all the residues at that site. The method is applied to 13 protein families, which contain four folding types, α, β, α/β and α+β. The accuracy of the prediction ranges from 60 to 79% (mean percentage over the 13 families is 69%). For comparison the Garnier-Osguthorpe-Robson (GOR) method is also applied to them. Although the mean prediction accuracy for the GOR method, 58%, can be improved to 63% by applying the second and third steps in this method, there remain four families with less than 55% accuracy. The mean accuracy is relatively higher and poor predictions are reduced in this method.

KW - Homologous sequences

KW - Protein structure prediction

KW - Secondary structure

KW - Substitution tables

UR - http://www.scopus.com/inward/record.url?scp=0028304961&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028304961&partnerID=8YFLogxK

U2 - 10.1006/jmbi.1994.1330

DO - 10.1006/jmbi.1994.1330

M3 - Article

VL - 238

SP - 693

EP - 708

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 5

ER -