TY - JOUR
T1 - Construction of JRG (Japanese reference genome) with single-molecule real-time sequencing
AU - Nagasaki, Masao
AU - Kuroki, Yoko
AU - Shibata, Tomoko F.
AU - Katsuoka, Fumiki
AU - Mimori, Takahiro
AU - Kawai, Yosuke
AU - Minegishi, Naoko
AU - Hozawa, Atsushi
AU - Kuriyama, Shinichi
AU - Suzuki, Yoichi
AU - Kawame, Hiroshi
AU - Nagami, Fuji
AU - Takai-Igarashi, Takako
AU - Ogishima, Soichi
AU - Kojima, Kaname
AU - Misawa, Kazuharu
AU - Tanabe, Osamu
AU - Fuse, Nobuo
AU - Tanaka, Hiroshi
AU - Yaegashi, Nobuo
AU - Kinoshita, Kengo
AU - Kure, Shigeo
AU - Yasuda, Jun
AU - Yamamoto, Masayuki
N1 - Publisher Copyright:
© 2019, The Author(s).
PY - 2019/12/1
Y1 - 2019/12/1
N2 - In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bp to ~10,000 bp in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. Based on these results, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating into GRCh38 the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.
AB - In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bp to ~10,000 bp in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. Based on these results, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating into GRCh38 the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.
UR - http://www.scopus.com/inward/record.url?scp=85069295139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069295139&partnerID=8YFLogxK
U2 - 10.1038/s41439-019-0057-7
DO - 10.1038/s41439-019-0057-7
M3 - Article
AN - SCOPUS:85069295139
SN - 2054-345X
VL - 6
JO - Human Genome Variation
JF - Human Genome Variation
IS - 1
M1 - 27
ER -