Coverage of whole proteome by structural genomics observed through protein homology modeling database

Kei Yura, Akihiro Yamaguchi, Mitiko Go

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

We have been developing FAMSBASE, a protein homology-modeling database covering all ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from the genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of an ORF with a modeled 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of soluble proteins with modeled 3D structures has increased by 5% for archaebacteria and by 7% for eubacteria over the last 3 years. Assuming that this rate is maintained and that determining 3D structures for predicted disordered regions is unattainable, model structures for all soluble proteins of prokaryotes, excluding the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures available at those times will not be those of the entire proteins encoded in single ORFs, but those of separate structural domains. Measuring or predicting the spatial arrangement of structural domains within an ORF will then become the next issue for structural genomics.
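The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below recomputes the genome total and the overall coverage from the quoted counts. The helper `years_to_complete` is a hypothetical illustration of a simple linear extrapolation; it is not the authors' method, and the abstract does not state the starting soluble-protein fractions needed to reproduce the 15-year and 25-year estimates.

```python
# Counts quoted in the abstract.
genomes = {
    "archaebacterial": 17,
    "eubacterial": 130,
    "eukaryotic": 18,
    "phage": 111,
}
modeled_orfs = 368_724    # ORFs with a modeled 3D structure
predicted_orfs = 734_193  # ORFs predicted across all genomes

total_genomes = sum(genomes.values())
coverage = modeled_orfs / predicted_orfs


def years_to_complete(current_fraction, gain_per_period, period_years=3.0):
    """Hypothetical linear extrapolation (an assumption, not from the paper).

    current_fraction: fraction already modeled, between 0 and 1.
    gain_per_period:  absolute gain in that fraction per period.
    period_years:     length of one period in years.
    """
    if gain_per_period <= 0:
        raise ValueError("gain_per_period must be positive")
    remaining = 1.0 - current_fraction
    return remaining / gain_per_period * period_years


print(f"genomes: {total_genomes}")    # 276, matching the abstract
print(f"coverage: {coverage:.1%}")    # about 50%, matching the abstract
```

The extrapolation helper treats the reported 5% and 7% gains as absolute percentage-point increases per 3-year period, which is one reading of the abstract; a relative-growth reading would give different numbers.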

Original language: English
Pages (from-to): 65-76
Number of pages: 12
Journal: Journal of Structural and Functional Genomics
Volume: 7
Issue number: 2
DOI: 10.1007/s10969-006-9010-3
Publication status: Published - 2006 Jun
Externally published: Yes

Fingerprint

Proteome
Genomics
Open Reading Frames
Databases
Genes
Proteins
Genome
Archaea
Bacteriophages
Hand
Model structures
Bacteria
Eukaryota
Amino Acids
Amino Acid Sequence

Keywords

  • Domain duplication
  • Domain interactions
  • Genome
  • Homology modeling
  • P-loop
  • Structural genomics

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Genetics

Cite this

Coverage of whole proteome by structural genomics observed through protein homology modeling database. / Yura, Kei; Yamaguchi, Akihiro; Go, Mitiko.

In: Journal of Structural and Functional Genomics, Vol. 7, No. 2, 06.2006, p. 65-76.

@article{8e024ce610cb42bd860a43e2254a690a,
title = "Coverage of whole proteome by structural genomics observed through protein homology modeling database",
abstract = "We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE ( http://daisy.nagahama-i-bio.ac.jp/Famsbase/ ), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50{\%} of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60{\%} of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30{\%} in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5{\%}, and that for eubacteria by 7{\%} in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.",
keywords = "Domain duplication, Domain interactions, Genome, Homology modeling, P-loop, Structural genomics",
author = "Kei Yura and Akihiro Yamaguchi and Mitiko Go",
year = "2006",
month = "6",
doi = "10.1007/s10969-006-9010-3",
language = "English",
volume = "7",
pages = "65--76",
journal = "Journal of Structural and Functional Genomics",
issn = "1345-711X",
publisher = "Springer Netherlands",
number = "2",

}

TY - JOUR

T1 - Coverage of whole proteome by structural genomics observed through protein homology modeling database

AU - Yura, Kei

AU - Yamaguchi, Akihiro

AU - Go, Mitiko

PY - 2006/6

Y1 - 2006/6

AB - We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE ( http://daisy.nagahama-i-bio.ac.jp/Famsbase/ ), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases that a modeled 3D structure covers the whole part of an ORF product are rare. When portion of an ORF with 3D structure is compared in three kingdoms of life, in archaebacteria and eubacteria, approximately 60% of the ORFs have modeled 3D structures covering almost the entire amino acid sequences, however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.

KW - Domain duplication

KW - Domain interactions

KW - Genome

KW - Homology modeling

KW - P-loop

KW - Structural genomics

UR - http://www.scopus.com/inward/record.url?scp=33846152637&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846152637&partnerID=8YFLogxK

U2 - 10.1007/s10969-006-9010-3

DO - 10.1007/s10969-006-9010-3

M3 - Article

VL - 7

SP - 65

EP - 76

JO - Journal of Structural and Functional Genomics

JF - Journal of Structural and Functional Genomics

SN - 1345-711X

IS - 2

ER -