### Abstract

A simple method is presented to assess the information that is provided by distance constraints for pairs of residues in proteins. The probability that the distance d_{ij} between the C^{α} atoms of residues i and j lies within a given range is computed for all N(N - 1)/2 pairs in a molecule of N residues, and a quantity H is defined in terms of these probabilities; H is a measure of the ambiguity in the computed conformation of the molecule (consistent with the given distance constraints) and is related to the root-mean-square deviation of the computed conformation from the native one. The quantity H is used to determine the number, kind, and quality of the distance constraints required to define the conformation of a protein within given limits of error, using the 58-residue molecule bovine pancreatic trypsin inhibitor as an illustration. For example, to obtain the computed conformation with a root-mean-square deviation of less than 2 Å from the native conformation, the values of d_{ij} of more than ∼80 pairs (half of them with 5 ≤ |i - j| ≤ 20 and the other half with 21 ≤ |i - j| ≤ 57) must be known exactly, or of more than ∼150 pairs (half of them with 5 ≤ |i - j| ≤ 20 and the other half with 21 ≤ |i - j| ≤ 57) must be known with an error no greater than ∼2 Å; alternatively, the same root-mean-square deviation of less than 2 Å from the native structure can be achieved by the computed conformation if more than ∼160 pairs are chosen so that 20 Å is assigned as the lower limit for half of these d_{ij}'s (for those pairs in the native protein that are separated by ≥20 Å) and 10 Å is assigned as the upper limit for the other half of these d_{ij}'s (for those pairs in the native protein that are separated by ≤10 Å). In all of the above examples, all values of d_{i,i+1} were fixed at 3.8 Å, and all values of d_{i,i+2} were confined to the range 4.5-7.2 Å (the minimum and maximum possible values for a polypeptide chain). 
We also examined the kind of constraints (in terms of their distance both along the chain and through space) that are most effective to obtain a small root-mean-square deviation. For a given number of constraints, information about pairs with large |i - j| or small d_{ij} is more effective in determining the conformation than is information about pairs with small |i - j| or large d_{ij}. It is found, however, that information that includes both small and large |i - j| or both small and large d_{ij} is the most effective.
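
To put the pair counts quoted in the abstract in perspective, the following is a minimal Python sketch, not the authors' code. It enumerates the N(N - 1)/2 residue pairs for N = 58, counts how many fall in each of the two sequence-separation classes named above, and records the chain-geometry bounds on d_{i,i+1} and d_{i,i+2}. The function names and the labels `medium` and `long_range` are illustrative assumptions. The definition of H is not given in the abstract, so it is not reproduced here.

```python
from itertools import combinations

N = 58  # residues in bovine pancreatic trypsin inhibitor (from the abstract)


def pairs_with_separation(n, lo, hi):
    """Return all residue pairs (i, j), i < j, with lo <= j - i <= hi.

    Residues are numbered 1..n. The function name is hypothetical.
    """
    return [(i, j) for i, j in combinations(range(1, n + 1), 2)
            if lo <= j - i <= hi]


def chain_bounds(n):
    """Map each (i, i+1) and (i, i+2) pair to (lower, upper) bounds in angstroms.

    Values are the ones stated in the abstract: d(i, i+1) fixed at 3.8,
    d(i, i+2) confined to 4.5-7.2.
    """
    bounds = {}
    for i in range(1, n):
        bounds[(i, i + 1)] = (3.8, 3.8)
    for i in range(1, n - 1):
        bounds[(i, i + 2)] = (4.5, 7.2)
    return bounds


total_pairs = N * (N - 1) // 2                   # all C-alpha pairs
medium = pairs_with_separation(N, 5, 20)         # 5 <= |i - j| <= 20
long_range = pairs_with_separation(N, 21, 57)    # 21 <= |i - j| <= 57
bounds = chain_bounds(N)

print(total_pairs, len(medium), len(long_range), len(bounds))
# prints: 1653 728 703 113
```

Under these counts, the ∼80 exactly known distances mentioned in the abstract correspond to roughly 5% of the 1653 pairs in the molecule.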

Original language | English |
---|---|
Pages (from-to) | 961-969 |
Number of pages | 9 |
Journal | Macromolecules |
Volume | 14 |
Issue number | 4 |
Publication status | Published - 1981 |
Externally published | Yes |

### ASJC Scopus subject areas

- Materials Chemistry

### Cite this

**On the use of distance constraints to fold a protein.** / Wako, Hiroshi; Scheraga, Harold A.
In: *Macromolecules*, vol. 14, no. 4, 1981, pp. 961-969.

Research output: Contribution to journal › Article

TY - JOUR

T1 - On the use of distance constraints to fold a protein

AU - Wako, Hiroshi

AU - Scheraga, Harold A.

PY - 1981

Y1 - 1981

UR - http://www.scopus.com/inward/record.url?scp=0013203662&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0013203662&partnerID=8YFLogxK

M3 - Article

VL - 14

SP - 961

EP - 969

JO - Macromolecules

JF - Macromolecules

SN - 0024-9297

IS - 4

ER -