Distance-constraint approach to protein folding. I. Statistical analysis of protein conformations in terms of distances between residues

Hiroshi Wako, Harold A. Scheraga

Research output: Article

33 Citations (Scopus)

Abstract

A statistical analysis of protein conformations in terms of the distance between residues, represented by their Cα atoms, is presented. We consider four factors that contribute to the determination of the distance d_{i,i+k} between a given pair of ith and (i+k)th residues in the native conformation of a globular protein: (1) the distance k along the chain, (2) the size of the protein, (3) the conformational states of the ith to (i+k)th residues, and (4) the amino acid types of the ith and (i+k)th residues. In order to account for the dependence on the distance k along the chain, the statistics are taken for three ranges, viz., short, medium, and long ranges (k≤8; 9≤k≤20; and k≥21; respectively).

In the statistics of short-range distances, a mean distance D_k and its standard deviation S_k are calculated for each value of k, with and without taking into account the conformational states of all residues from i to i+k (factors 1 and 3). As an Appendix, the relations for converting from the distances between residues into other conformational parameters are discussed.

In the statistics of long-range distances, a reduced distance d*_{ij} (the actual distance divided by the radius of gyration) is used to scale the data so that they become independent of protein size, and then a mean reduced distance D_l(a_μ, a_ν) and its standard deviation σ_l(a_μ, a_ν) are calculated for each amino acid pair (a_μ, a_ν) (factors 2 and 4). The effect of the neighboring residues along the chain on the value of the distance d*_{ij} is explored by a linear regression analysis between the actual reduced distance d*_{ij} and the mean value over the D_l for all possible pairs of residues in the two segments of the (i-2)th to the (i+2)th and the (j-2)th to the (j+2)th residues. The effect is assessed in terms of the tangent A_l(a_μ, a_ν) of the calculated regression line for each amino acid pair (a_μ, a_ν).

In the statistics of medium-range distances, only factors 1 and 4 are considered, to simplify the analysis. The scaled distance d†_{i,i+k} = (d_{i,i+k} - D_k)/S_k is used to eliminate the dependence on k, the distance along the chain. The properties D_m(a_μ, a_ν), σ_m(a_μ, a_ν), and A_m(a_μ, a_ν), corresponding to D_l(a_μ, a_ν), σ_l(a_μ, a_ν), and A_l(a_μ, a_ν), are also calculated for each amino acid pair (a_μ, a_ν).

The results are interpreted as follows: the smaller values of D_l(a_μ, a_ν) and D_m(a_μ, a_ν) indicate a preference of the pair (a_μ, a_ν) for a contact (e.g., pairs between hydrophobic amino acids, and pairs of Cys with aromatic amino acids), and the larger values of these quantities indicate a preference for distant mutual location (e.g., pairs between strong hydrophilic amino acids); the smaller values of σ_l(a_μ, a_ν) and σ_m(a_μ, a_ν) indicate a strong preference for either contact or noncontact (e.g., pairs between hydrophobic amino acids, and pairs between strong hydrophobic and hydrophilic amino acids, respectively), and the larger values of these quantities indicate the ambivalent/neutral nature of the preference for contact and noncontact (e.g., pairs containing Ser or Thr); the smaller values of A_l(a_μ, a_ν) and A_m(a_μ, a_ν) indicate that the distance of an (a_μ, a_ν) pair is determined independently of the amino acid character of the neighboring residues along the chain (e.g., some pairs of Cys or Met with other amino acids), and the larger values of these quantities indicate that such amino acid character contributes strongly to the determination of the distance (e.g., pairs containing Ser or Thr, and pairs between amino acids with small side chains).

The difference between the statistics for the long- and medium-range distances is also discussed; the former reflect the difference between the hydrophobic and hydrophilic character of the residues, but the latter cannot be easily interpreted solely in terms of hydrophobicity and hydrophilicity. The data analyzed here are used in the optimization of an object function to compute protein conformation in a subsequent paper.
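The distance normalizations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the unweighted radius of gyration over Cα positions, the use of the population standard deviation, and the toy coordinates are all assumptions made for illustration. It pools the distances between residues i and i+k to obtain a mean D_k and standard deviation S_k, forms the scaled distance (d - D_k)/S_k, and forms the reduced distance by dividing by the radius of gyration.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of 3-D points (assumed definition)."""
    n = len(coords)
    centroid = tuple(sum(p[axis] for p in coords) / n for axis in range(3))
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)


def distance_stats(chains, k_values):
    """Pool the distances between residues i and i+k over all chains.

    Returns {k: (D_k, S_k)} for every k that has at least one sample.
    """
    samples = defaultdict(list)
    for coords in chains:
        for k in k_values:
            for i in range(len(coords) - k):
                samples[k].append(math.dist(coords[i], coords[i + k]))
    # Population standard deviation is an assumption, not taken from the paper.
    return {k: (mean(v), pstdev(v)) for k, v in samples.items()}


def scaled_distance(d, k, stats):
    """Scaled distance (d - D_k) / S_k; undefined when S_k is zero."""
    d_k, s_k = stats[k]
    return (d - d_k) / s_k


def reduced_distance(coords, i, j):
    """Distance between residues i and j divided by the radius of gyration."""
    return math.dist(coords[i], coords[j]) / radius_of_gyration(coords)


# Hypothetical three-residue "chain" of C-alpha positions, not real protein data.
toy_chain = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
toy_stats = distance_stats([toy_chain], k_values=[1, 2])
```

In practice the statistics would be pooled over a set of native structures, with D_k and S_k taken over the range of k of interest (9≤k≤20 for the scaled distance described above).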

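Original language: English
Pages (from-to): 5-45
Number of pages: 41
Journal: Journal of Protein Chemistry
Volume: 1
Issue number: 1
DOI: 10.1007/BF01025549
Publication status: Published - May 1982
Externally published: Yes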

Fingerprint

Protein folding
Protein Conformation
Protein Folding
Conformations
Amino acids
Statistical methods
Proteins
Amino Acids
Statistics
Hydrophobic and Hydrophilic Interactions

ASJC Scopus subject areas

  • Biochemistry

Cite this

@article{f581d95bba5042519c0458c6c3028fcb,
title = "Distance-constraint approach to protein folding. I. Statistical analysis of protein conformations in terms of distances between residues",
keywords = "amino acid pairs, distances within, amino acid pairs, nature of, amino acids, nature of, long-range interactions, medium-range interactions, protein conformation, conformational parameters of, short-range interactions",
author = "Hiroshi Wako and Scheraga, {Harold A.}",
year = "1982",
month = "5",
doi = "10.1007/BF01025549",
language = "English",
volume = "1",
pages = "5--45",
journal = "Protein Journal",
issn = "1572-3887",
publisher = "Springer New York",
number = "1",

}

TY - JOUR

T1 - Distance-constraint approach to protein folding. I. Statistical analysis of protein conformations in terms of distances between residues

AU - Wako, Hiroshi

AU - Scheraga, Harold A.

PY - 1982/5

Y1 - 1982/5

KW - amino acid pairs, distances within

KW - amino acid pairs, nature of

KW - amino acids, nature of

KW - long-range interactions

KW - medium-range interactions

KW - protein conformation, conformational parameters of

KW - short-range interactions

UR - http://www.scopus.com/inward/record.url?scp=0006947818&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0006947818&partnerID=8YFLogxK

U2 - 10.1007/BF01025549

DO - 10.1007/BF01025549

M3 - Article

AN - SCOPUS:0006947818

VL - 1

SP - 5

EP - 45

JO - Protein Journal

JF - Protein Journal

SN - 1572-3887

IS - 1

ER -