A simple method is presented for assessing the information provided by distance constraints on pairs of residues in proteins. The probability that the distance dij between the Cα atoms of residues i and j lies within a given range is computed for all N(N-1)/2 pairs in a molecule of N residues, and a quantity H is defined in terms of these probabilities. H is a measure of the ambiguity in the computed conformation of the molecule (consistent with the given distance constraints) and is related to the root-mean-square deviation of the computed conformation from the native one. The quantity H is used to determine the number, kind, and quality of the distance constraints required to define the conformation of a protein within given limits of error, using the 58-residue molecule bovine pancreatic trypsin inhibitor as an illustration. For example, to obtain a computed conformation with a root-mean-square deviation of less than 2 Å from the native conformation, the values of dij for more than ~80 pairs (half of them with 5 ≤ |i-j| ≤ 20 and the other half with 21 ≤ |i-j| ≤ 57) must be known exactly, or the values for more than ~150 pairs (split between the same two ranges of |i-j|) must be known with an error no greater than ±2 Å. Alternatively, the same root-mean-square deviation of less than 2 Å from the native structure can be achieved if more than ~160 pairs are chosen so that 20 Å is assigned as the lower limit for half of these dij's (for those pairs in the native protein that are separated by ≥20 Å) and 10 Å is assigned as the upper limit for the other half (for those pairs in the native protein that are separated by ≤10 Å). In all of the above examples, all values of di,i+1 were fixed at 3.8 Å, and all values of di,i+2 were confined to the range 4.5-7.2 Å (the minimum and maximum possible values for a polypeptide chain).
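To put the quoted pair counts in perspective, the following minimal sketch enumerates all N(N-1)/2 residue pairs for N = 58 and counts how many fall into each sequence-separation range mentioned above. The function name `classify` and the labels (`fixed`, `bounded`, `short`, `long`, `other`) are illustrative assumptions, not terminology from the paper, and the quantity H is not computed here because its definition is not given in this abstract.

```python
from itertools import combinations

N = 58  # residues in bovine pancreatic trypsin inhibitor (from the text)

def classify(sep):
    """Label a residue pair by its sequence separation |i - j| (hypothetical labels)."""
    if sep == 1:
        return "fixed"    # d(i,i+1) is fixed at 3.8 Å in the examples
    if sep == 2:
        return "bounded"  # d(i,i+2) is confined to 4.5-7.2 Å in the examples
    if 5 <= sep <= 20:
        return "short"    # first range quoted in the text
    if 21 <= sep <= N - 1:
        return "long"     # second range quoted in the text
    return "other"        # separations 3 and 4 fall outside the quoted ranges

counts = {}
for i, j in combinations(range(1, N + 1), 2):
    label = classify(j - i)
    counts[label] = counts.get(label, 0) + 1

total_pairs = N * (N - 1) // 2
print("total pairs:", total_pairs)
for label, n in counts.items():
    print(f"{label:8s} {n:5d}  ({100 * n / total_pairs:.1f}% of all pairs)")
```

Under these assumptions, the ~80 exactly known distances in the first example amount to roughly 5% of all pairs in the molecule.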
We also examined the kind of constraints (in terms of their distance both along the chain and through space) that are most effective in obtaining a small root-mean-square deviation. For a given number of constraints, information about pairs with large |i-j| or small dij is more effective in determining the conformation than is information about pairs with small |i-j| or large dij. It is found, however, that information that includes both small and large |i-j|, or both small and large dij, is the most effective.
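As an illustration of a constraint set that mixes small and large |i-j|, the sketch below draws half of the requested pairs from the range 5 ≤ |i-j| ≤ 20 and half from 21 ≤ |i-j| ≤ 57. The uniform random draw, the fixed seed, and the helper name `pick_mixed_pairs` are assumptions made for this example; the abstract does not say how the pairs were actually selected.

```python
import random

N = 58  # residues in bovine pancreatic trypsin inhibitor (from the text)

def pick_mixed_pairs(n_pairs, seed=0):
    """Return n_pairs residue pairs (i, j), i < j, half from each separation range.

    Hypothetical helper: pairs are sampled uniformly without replacement.
    """
    rng = random.Random(seed)
    # All pairs with 5 <= j - i <= 20
    short = [(i, j) for i in range(1, N + 1)
             for j in range(i + 5, min(i + 20, N) + 1)]
    # All pairs with 21 <= j - i <= N - 1
    long_ = [(i, j) for i in range(1, N + 1)
             for j in range(i + 21, N + 1)]
    half = n_pairs // 2
    return rng.sample(short, half) + rng.sample(long_, n_pairs - half)

pairs = pick_mixed_pairs(80)
short_part, long_part = pairs[:40], pairs[40:]
print(len(pairs), "pairs selected")
print("example short-range pair:", short_part[0])
print("example long-range pair:", long_part[0])
```

The same helper could be restricted to a single range to compare, for a chosen structure-calculation procedure, a long-range-only set with a mixed one.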