Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility

Atsushi Hijikata, Kei Yura, Tosiyuki Noguti, Mitiko Go

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high-quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. Improvements in the gap penalty have not been a major focus, however, following the development of the affine gap penalty and of the secondary structure dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved for sequence pairs with identity in the "twilight zone" between 20 and 40%. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at.
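
The abstract does not state ALAdeGAP's actual penalty formula, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each template residue has a relative solvent accessibility scaled from 0 (buried) to 1 (exposed), fits the exponential relationship by least squares on the logarithm of the gap frequency, turns the fit into a per-residue gap-open penalty, and plugs that penalty into a standard Gotoh-style global alignment with affine gaps. Because the frequency is exponential in accessibility, its negative logarithm (a common way to turn an observed frequency into a penalty) is linear in accessibility. The function names (`fit_exponential`, `gap_open_penalties`, `align`), the parameters `base`, `k`, `floor` and `extend`, the rule that an insertion uses the penalty of the preceding template residue, and the toy match/mismatch scores used in place of a real substitution matrix are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

NEG_INF = float("-inf")


def fit_exponential(acc: Sequence[float], freq: Sequence[float]) -> Tuple[float, float]:
    """Fit freq ~ c * exp(k * acc) by linear least squares on ln(freq).

    Frequencies must be strictly positive. Returns (c, k).
    """
    ys = [math.log(f) for f in freq]
    n = len(acc)
    mean_x, mean_y = sum(acc) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in acc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(acc, ys))
    k = sxy / sxx
    return math.exp(mean_y - k * mean_x), k


def gap_open_penalties(acc: Sequence[float], base: float, k: float,
                       floor: float = 1.0) -> List[float]:
    """Hypothetical per-residue gap-open penalties for the template.

    If gap frequency is c * exp(k * a), then -ln(frequency) = -ln(c) - k * a,
    which is linear in accessibility a. Shift it so a fully buried residue
    (a = 0) costs `base`, and never let the penalty drop below `floor`.
    """
    return [max(base - k * a, floor) for a in acc]


def align(template: str, target: str, open_pen: Sequence[float], extend: float,
          score: Callable[[str, str], float]) -> Tuple[str, str, float]:
    """Sketch only, not ALAdeGAP: global affine-gap (Gotoh) alignment with a
    gap-open penalty indexed by template residue. `template` must be non-empty."""
    n, m = len(template), len(target)
    # States: 0 = residue pair, 1 = template residue vs gap, 2 = target residue vs gap.
    S = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    P = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]  # previous state
    S[0][0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = [S[s][i - 1][j - 1] for s in range(3)]
                best = max(range(3), key=prev.__getitem__)
                S[0][i][j] = prev[best] + score(template[i - 1], target[j - 1])
                P[0][i][j] = best
            if i > 0:  # template residue i-1 is deleted: use its own penalty
                go = open_pen[i - 1]
                cand = [S[0][i - 1][j] - go, S[1][i - 1][j] - extend, S[2][i - 1][j] - go]
                best = max(range(3), key=cand.__getitem__)
                S[1][i][j], P[1][i][j] = cand[best], best
            if j > 0:  # target residue inserted after template residue i-1
                go = open_pen[max(i - 1, 0)]
                cand = [S[0][i][j - 1] - go, S[1][i][j - 1] - go, S[2][i][j - 1] - extend]
                best = max(range(3), key=cand.__getitem__)
                S[2][i][j], P[2][i][j] = cand[best], best
    # Trace back from the best-scoring final state.
    state = max(range(3), key=lambda s: S[s][n][m])
    total = S[state][n][m]
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        prev_state = P[state][i][j]
        if state == 0:
            top.append(template[i - 1])
            bottom.append(target[j - 1])
            i, j = i - 1, j - 1
        elif state == 1:
            top.append(template[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(target[j - 1])
            j -= 1
        state = prev_state
    return "".join(reversed(top)), "".join(reversed(bottom)), total


if __name__ == "__main__":
    # Toy example: four identical S residues, one of which is missing in the target.
    template, target = "MKSSSSLV", "MKSSSLV"
    acc = [0.1, 0.2, 0.0, 0.1, 0.9, 0.1, 0.0, 0.1]  # made-up relative accessibilities
    pens = gap_open_penalties(acc, base=10.0, k=5.0)
    print(align(template, target, pens, extend=1.0,
                score=lambda x, y: 2.0 if x == y else -1.0))
```

Run as a script, the toy example (with made-up accessibility values, not data from the paper) prints `('MKSSSSLV', 'MKSS-SLV', 8.5)`: the single deletion among the four identical serines is placed at the most exposed position (accessibility 0.9), where the gap-open penalty is lowest. With a constant gap penalty, all four placements would tie.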

Original language: English
Pages (from-to): 1868-1877
Number of pages: 10
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 79
Issue number: 6
DOI: 10.1002/prot.23011
Publication status: Published - 2011 Jun
Externally published: Yes

Keywords

  • ALAdeGAP
  • Amino acid sequence alignment
  • Comparative modeling
  • Position dependent gap penalty
  • Solvent accessibility

ASJC Scopus subject areas

  • Biochemistry
  • Structural Biology
  • Molecular Biology

Cite this

Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility. / Hijikata, Atsushi; Yura, Kei; Noguti, Tosiyuki; Go, Mitiko.

In: Proteins: Structure, Function and Bioinformatics, Vol. 79, No. 6, 06.2011, p. 1868-1877.

@article{dabdcd7e4e744b34a5571f01743d87f3,
title = "Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility",
abstract = "In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high-quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. Improvements in the gap penalty have not been a major focus, however, following the development of the affine gap penalty and of the secondary structure dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved for sequence pairs with identity in the {"}twilight zone{"} between 20 and 40{\%}. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at.",
keywords = "ALAdeGAP, Amino acid sequence alignment, Comparative modeling, Position dependent gap penalty, Solvent accessibility",
author = "Atsushi Hijikata and Kei Yura and Tosiyuki Noguti and Mitiko Go",
year = "2011",
month = "6",
doi = "10.1002/prot.23011",
language = "English",
volume = "79",
pages = "1868--1877",
journal = "Proteins: Structure, Function and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "6",

}

TY - JOUR

T1 - Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility

AU - Hijikata, Atsushi

AU - Yura, Kei

AU - Noguti, Tosiyuki

AU - Go, Mitiko

PY - 2011/6

Y1 - 2011/6

AB - In comparative modeling, the quality of amino acid sequence alignment still constitutes a major bottleneck in the generation of high-quality models of protein three-dimensional (3D) structures. Substantial efforts have been made to improve alignment quality by revising the substitution matrix, introducing multiple sequences, replacing dynamic programming with hidden Markov models, and incorporating 3D structure information. Improvements in the gap penalty have not been a major focus, however, following the development of the affine gap penalty and of the secondary structure dependent gap penalty. We revisited the correlation between protein 3D structure and gap location in a large protein 3D structure data set, and found that the frequency of gap locations approximated an exponential function of the solvent accessibility of the inserted residues. The nonlinearity of the gap frequency as a function of accessibility corresponded well to the relationship between residue mutation pattern and residue accessibility. By introducing this relationship into the gap penalty calculation for pairwise alignment between template and target amino acid sequences, we were able to obtain a sequence alignment much closer to the structural alignment. The quality of the alignments was substantially improved for sequence pairs with identity in the "twilight zone" between 20 and 40%. The relocation of gaps by our new method made a significant improvement in comparative modeling, exemplified here by the Bacillus subtilis yitF protein. The method was implemented in a computer program, ALAdeGAP (ALignment with Accessibility dependent GAp Penalty), which is available at.

KW - ALAdeGAP

KW - Amino acid sequence alignment

KW - Comparative modeling

KW - Position dependent gap penalty

KW - Solvent accessibility

UR - http://www.scopus.com/inward/record.url?scp=79955705563&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79955705563&partnerID=8YFLogxK

U2 - 10.1002/prot.23011

DO - 10.1002/prot.23011

M3 - Article

C2 - 21465562

AN - SCOPUS:79955705563

VL - 79

SP - 1868

EP - 1877

JO - Proteins: Structure, Function and Bioinformatics

JF - Proteins: Structure, Function and Bioinformatics

SN - 0887-3585

IS - 6

ER -