TY - JOUR
T1 - PBSIM
T2 - PacBio reads simulator - Toward accurate genome assembly
AU - Ono, Yukiteru
AU - Asai, Kiyoshi
AU - Hamada, Michiaki
N1 - Funding Information: MEXT KAKENHI [Grant-in-Aid for Young Scientists (A): 24680031 to M.H., in part]; Grant-in-Aid for Scientific Research on Innovative Areas (to M.H. and K.A., in part).
PY - 2013/1
Y1 - 2013/1
N2 - Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.
AB - Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.
UR - http://www.scopus.com/inward/record.url?scp=84871779381&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84871779381&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bts649
DO - 10.1093/bioinformatics/bts649
M3 - Article
C2 - 23129296
AN - SCOPUS:84871779381
VL - 29
SP - 119
EP - 121
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 1
ER -