Integrative annotation of 21,037 human genes validated by full-length cDNA clones

Tadashi Imanishi, Takeshi Itoh, Yutaka Suzuki, Claire O'Donovan, Satoshi Fukuchi, Kanako O. Koyanagi, Roberto A. Barrero, Takuro Tamura, Yumi Yamaguchi-Kabata, Motohiko Tanino, Kei Yura, Satoru Miyazaki, Kazuho Ikeo, Keiichi Homma, Arek Kasprzyk, Tetsuo Nishikawa, Mika Hirakawa, Jean Thierry-Mieg, Danielle Thierry-Mieg, Jennifer Ashurst, Libin Jia, Mitsuteru Nakao, Michael A. Thomas, Nicola Mulder, Youla Karavidopoulou, Lihua Jin, Sangsoo Kim, Tomohiro Yasuda, Boris Lenhard, Eric Eveno, Yoshiyuki Suzuki, Chisato Yamasaki, Jun Ichi Takeda, Craig Gough, Phillip Hilton, Yasuyuki Fujii, Hiroaki Sakai, Susumu Tanaka, Clara Amid, Matthew Bellgard, Maria de Fatima Bonaldo, Hidemasa Bono, Susan K. Bromberg, Anthony J. Brookes, Elspeth Bruford, Piero Carninci, Claude Chelala, Christine Couillault, Sandro J. de Souza, Marie Anne Debily, Marie Dominique Devignes, Inna Dubchak, Toshinori Endo, Anne Estreicher, Eduardo Eyras, Kaoru Fukami-Kobayashi, Gopal R. Gopinath, Esther Graudens, Yoonsoo Hahn, Michael Han, Ze Guang Han, Kousuke Hanada, Hideki Hanaoka, Erimi Harada, Katsuyuki Hashimoto, Ursula Hinz, Momoki Hirai, Teruyoshi Hishiki, Ian Hopkinson, Sandrine Imbeaud, Hidetoshi Inoko, Alexander Kanapin, Yayoi Kaneko, Takeya Kasukawa, Janet Kelso, Paul Kersey, Reiko Kikuno, Kouichi Kimura, Bernhard Korn, Vladimir Kuryshev, Izabela Makalowska, Takashi Makino, Shuhei Mano, Regine Mariage-Samson, Jun Mashima, Hideo Matsuda, Hans Werner Mewes, Shinsei Minoshima, Keiichi Nagai, Hideki Nagasaki, Naoki Nagata, Rajni Nigam, Osamu Ogasawara, Osamu Ohara, Masafumi Ohtsubo, Norihiro Okada, Toshihisa Okido, Satoshi Oota, Motonori Ota, Toshio Ota, Tetsuji Otsuki, Dominique Piatier-Tonneau, Annemarie Poustka, Shuang Xi Ren, Naruya Saitou, Katsunaga Sakai, Shigetaka Sakamoto, Ryuichi Sakate, Ingo Schupp, Florence Servant, Stephen Sherry, Rie Shiba, Nobuyoshi Shimizu, Mary Shimoyama, Andrew J. Simpson, Bento Soares, Charles Steward, Makiko Suwa, Mami Suzuki, Aiko Takahashi, Gen Tamiya, Hiroshi Tanaka, Todd Taylor, Joseph D. Terwilliger, Per Unneberg, Vamsi Veeramachaneni, Shinya Watanabe, Laurens Wilming, Norikazu Yasuda, Sook Hyang-Yoo, Marvin Stodolsky, Wojciech Makalowski, Mitiko Go, Kenta Nakai, Toshihisa Takagi, Minoru Kanehisa, Yoshiyuki Sakaki, John Quackenbush, Yasushi Okazaki, Yoshihide Hayashizaki, Winston Hide, Ranajit Chakraborty, Ken Nishikawa, Hideaki Sugawara, Yoshio Tateno, Zhu Chen, Michio Oishi, Peter Tonellato, Rolf Apweiler, Kousaku Okubo, Lukas Wagner, Stefan Wiemann, Robert L. Strausberg, Takao Isogai, Charles Auffray, Nobuo Nomura, Takashi Gojobori, Sumio Sugano

Research output: Contribution to journal › Article

262 Citations (Scopus)

Abstract

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

Original language: English
Journal: PLoS Biology
Volume: 2
Issue number: 6
DOI: https://doi.org/10.1371/journal.pbio.0020162
Publication status: Published - 2004
Externally published: Yes

Fingerprint

Complementary DNA
Clone Cells
Genes
clones
Single Nucleotide Polymorphism
Human Genome
Polymorphism
Untranslated RNA
Nucleotides
Microsatellite Repeats
genome
Databases
Molecular Sequence Annotation
National Center for Biotechnology Information
Information Centers
Proteins
Biological Sciences

ASJC Scopus subject areas

  • Neuroscience (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K. O., ... Sugano, S. (2004). Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biology, 2(6). https://doi.org/10.1371/journal.pbio.0020162

@article{153ae3ff7b494c3fbd2609bbd656edd1,
title = "Integrative annotation of 21,037 human genes validated by full-length cDNA clones",
abstract = "The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4{\%} of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. 
We found that 6.5{\%} of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.",
author = "Tadashi Imanishi and Takeshi Itoh and Yutaka Suzuki and Claire O'Donovan and Satoshi Fukuchi and Koyanagi, {Kanako O.} and Barrero, {Roberto A.} and Takuro Tamura and Yumi Yamaguchi-Kabata and Motohiko Tanino and Kei Yura and Satoru Miyazaki and Kazuho Ikeo and Keiichi Homma and Arek Kasprzyk and Tetsuo Nishikawa and Mika Hirakawa and Jean Thierry-Mieg and Danielle Thierry-Mieg and Jennifer Ashurst and Libin Jia and Mitsuteru Nakao and Thomas, {Michael A.} and Nicola Mulder and Youla Karavidopoulou and Lihua Jin and Sangsoo Kim and Tomohiro Yasuda and Boris Lenhard and Eric Eveno and Yoshiyuki Suzuki and Chisato Yamasaki and Takeda, {Jun Ichi} and Craig Gough and Phillip Hilton and Yasuyuki Fujii and Hiroaki Sakai and Susumu Tanaka and Clara Amid and Matthew Bellgard and {de Fatima Bonaldo}, Maria and Hidemasa Bono and Bromberg, {Susan K.} and Brookes, {Anthony J.} and Elspeth Bruford and Piero Carninci and Claude Chelala and Christine Couillault and {de Souza}, {Sandro J.} and Debily, {Marie Anne} and Devignes, {Marie Dominique} and Inna Dubchak and Toshinori Endo and Anne Estreicher and Eduardo Eyras and Kaoru Fukami-Kobayashi and Gopinath, {Gopal R.} and Esther Graudens and Yoonsoo Hahn and Michael Han and Han, {Ze Guang} and Kousuke Hanada and Hideki Hanaoka and Erimi Harada and Katsuyuki Hashimoto and Ursula Hinz and Momoki Hirai and Teruyoshi Hishiki and Ian Hopkinson and Sandrine Imbeaud and Hidetoshi Inoko and Alexander Kanapin and Yayoi Kaneko and Takeya Kasukawa and Janet Kelso and Paul Kersey and Reiko Kikuno and Kouichi Kimura and Bernhard Korn and Vladimir Kuryshev and Izabela Makalowska and Takashi Makino and Shuhei Mano and Regine Mariage-Samson and Jun Mashima and Hideo Matsuda and Mewes, {Hans Werner} and Shinsei Minoshima and Keiichi Nagai and Hideki Nagasaki and Naoki Nagata and Rajni Nigam and Osamu Ogasawara and Osamu Ohara and Masafumi Ohtsubo and Norihiro Okada and Toshihisa Okido and Satoshi Oota and Motonori Ota and Toshio Ota and 
Tetsuji Otsuki and Dominique Piatier-Tonneau and Annemarie Poustka and Ren, {Shuang Xi} and Naruya Saitou and Katsunaga Sakai and Shigetaka Sakamoto and Ryuichi Sakate and Ingo Schupp and Florence Servant and Stephen Sherry and Rie Shiba and Nobuyoshi Shimizu and Mary Shimoyama and Simpson, {Andrew J.} and Bento Soares and Charles Steward and Makiko Suwa and Mami Suzuki and Aiko Takahashi and Gen Tamiya and Hiroshi Tanaka and Todd Taylor and Terwilliger, {Joseph D.} and Per Unneberg and Vamsi Veeramachaneni and Shinya Watanabe and Laurens Wilming and Norikazu Yasuda and Sook Hyang-Yoo and Marvin Stodolsky and Wojciech Makalowski and Mitiko Go and Kenta Nakai and Toshihisa Takagi and Minoru Kanehisa and Yoshiyuki Sakaki and John Quackenbush and Yasushi Okazaki and Yoshihide Hayashizaki and Winston Hide and Ranajit Chakraborty and Ken Nishikawa and Hideaki Sugawara and Yoshio Tateno and Zhu Chen and Michio Oishi and Peter Tonellato and Rolf Apweiler and Kousaku Okubo and Lukas Wagner and Stefan Wiemann and Strausberg, {Robert L.} and Takao Isogai and Charles Auffray and Nobuo Nomura and Takashi Gojobori and Sumio Sugano",
year = "2004",
doi = "10.1371/journal.pbio.0020162",
language = "English",
volume = "2",
journal = "PLoS Biology",
issn = "1544-9173",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Integrative annotation of 21,037 human genes validated by full-length cDNA clones

AU - Imanishi, Tadashi

AU - Itoh, Takeshi

AU - Suzuki, Yutaka

AU - O'Donovan, Claire

AU - Fukuchi, Satoshi

AU - Koyanagi, Kanako O.

AU - Barrero, Roberto A.

AU - Tamura, Takuro

AU - Yamaguchi-Kabata, Yumi

AU - Tanino, Motohiko

AU - Yura, Kei

AU - Miyazaki, Satoru

AU - Ikeo, Kazuho

AU - Homma, Keiichi

AU - Kasprzyk, Arek

AU - Nishikawa, Tetsuo

AU - Hirakawa, Mika

AU - Thierry-Mieg, Jean

AU - Thierry-Mieg, Danielle

AU - Ashurst, Jennifer

AU - Jia, Libin

AU - Nakao, Mitsuteru

AU - Thomas, Michael A.

AU - Mulder, Nicola

AU - Karavidopoulou, Youla

AU - Jin, Lihua

AU - Kim, Sangsoo

AU - Yasuda, Tomohiro

AU - Lenhard, Boris

AU - Eveno, Eric

AU - Suzuki, Yoshiyuki

AU - Yamasaki, Chisato

AU - Takeda, Jun Ichi

AU - Gough, Craig

AU - Hilton, Phillip

AU - Fujii, Yasuyuki

AU - Sakai, Hiroaki

AU - Tanaka, Susumu

AU - Amid, Clara

AU - Bellgard, Matthew

AU - de Fatima Bonaldo, Maria

AU - Bono, Hidemasa

AU - Bromberg, Susan K.

AU - Brookes, Anthony J.

AU - Bruford, Elspeth

AU - Carninci, Piero

AU - Chelala, Claude

AU - Couillault, Christine

AU - de Souza, Sandro J.

AU - Debily, Marie Anne

AU - Devignes, Marie Dominique

AU - Dubchak, Inna

AU - Endo, Toshinori

AU - Estreicher, Anne

AU - Eyras, Eduardo

AU - Fukami-Kobayashi, Kaoru

AU - Gopinath, Gopal R.

AU - Graudens, Esther

AU - Hahn, Yoonsoo

AU - Han, Michael

AU - Han, Ze Guang

AU - Hanada, Kousuke

AU - Hanaoka, Hideki

AU - Harada, Erimi

AU - Hashimoto, Katsuyuki

AU - Hinz, Ursula

AU - Hirai, Momoki

AU - Hishiki, Teruyoshi

AU - Hopkinson, Ian

AU - Imbeaud, Sandrine

AU - Inoko, Hidetoshi

AU - Kanapin, Alexander

AU - Kaneko, Yayoi

AU - Kasukawa, Takeya

AU - Kelso, Janet

AU - Kersey, Paul

AU - Kikuno, Reiko

AU - Kimura, Kouichi

AU - Korn, Bernhard

AU - Kuryshev, Vladimir

AU - Makalowska, Izabela

AU - Makino, Takashi

AU - Mano, Shuhei

AU - Mariage-Samson, Regine

AU - Mashima, Jun

AU - Matsuda, Hideo

AU - Mewes, Hans Werner

AU - Minoshima, Shinsei

AU - Nagai, Keiichi

AU - Nagasaki, Hideki

AU - Nagata, Naoki

AU - Nigam, Rajni

AU - Ogasawara, Osamu

AU - Ohara, Osamu

AU - Ohtsubo, Masafumi

AU - Okada, Norihiro

AU - Okido, Toshihisa

AU - Oota, Satoshi

AU - Ota, Motonori

AU - Ota, Toshio

AU - Otsuki, Tetsuji

AU - Piatier-Tonneau, Dominique

AU - Poustka, Annemarie

AU - Ren, Shuang Xi

AU - Saitou, Naruya

AU - Sakai, Katsunaga

AU - Sakamoto, Shigetaka

AU - Sakate, Ryuichi

AU - Schupp, Ingo

AU - Servant, Florence

AU - Sherry, Stephen

AU - Shiba, Rie

AU - Shimizu, Nobuyoshi

AU - Shimoyama, Mary

AU - Simpson, Andrew J.

AU - Soares, Bento

AU - Steward, Charles

AU - Suwa, Makiko

AU - Suzuki, Mami

AU - Takahashi, Aiko

AU - Tamiya, Gen

AU - Tanaka, Hiroshi

AU - Taylor, Todd

AU - Terwilliger, Joseph D.

AU - Unneberg, Per

AU - Veeramachaneni, Vamsi

AU - Watanabe, Shinya

AU - Wilming, Laurens

AU - Yasuda, Norikazu

AU - Hyang-Yoo, Sook

AU - Stodolsky, Marvin

AU - Makalowski, Wojciech

AU - Go, Mitiko

AU - Nakai, Kenta

AU - Takagi, Toshihisa

AU - Kanehisa, Minoru

AU - Sakaki, Yoshiyuki

AU - Quackenbush, John

AU - Okazaki, Yasushi

AU - Hayashizaki, Yoshihide

AU - Hide, Winston

AU - Chakraborty, Ranajit

AU - Nishikawa, Ken

AU - Sugawara, Hideaki

AU - Tateno, Yoshio

AU - Chen, Zhu

AU - Oishi, Michio

AU - Tonellato, Peter

AU - Apweiler, Rolf

AU - Okubo, Kousaku

AU - Wagner, Lukas

AU - Wiemann, Stefan

AU - Strausberg, Robert L.

AU - Isogai, Takao

AU - Auffray, Charles

AU - Nomura, Nobuo

AU - Gojobori, Takashi

AU - Sugano, Sumio

PY - 2004

Y1 - 2004

N2 - The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

UR - http://www.scopus.com/inward/record.url?scp=4344623260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4344623260&partnerID=8YFLogxK

U2 - 10.1371/journal.pbio.0020162

DO - 10.1371/journal.pbio.0020162

M3 - Article

C2 - 15103394

AN - SCOPUS:4344623260

VL - 2

JO - PLoS Biology

JF - PLoS Biology

SN - 1544-9173

IS - 6

ER -