PBSIM: PacBio reads simulator - Toward accurate genome assembly

Yukiteru Ono, Kiyoshi Asai, Michiaki Hamada

Research output: Contribution to journal › Article

102 Citations (Scopus)

Abstract

Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries.

Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.
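The abstract notes that PacBio read lengths follow a log-normal distribution and that assembly quality was assessed at given coverage depths. As a minimal sketch of that idea only (not PBSIM's actual implementation), the Python below draws log-normal read lengths until a target depth is reached. The function names, the mean and standard deviation of 3000 and 2300 bases, the length bounds, and the 1 Mb genome size are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def lognormal_params(mean, sd):
    """Convert a desired mean/sd of read length into log-normal mu/sigma."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def sample_read_lengths(genome_size, depth, mean, sd,
                        min_len=100, max_len=25000, seed=0):
    """Draw read lengths until the total bases reach depth * genome_size.

    Lengths outside [min_len, max_len] are rejected and redrawn.
    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, sd)
    target = depth * genome_size
    lengths, total = [], 0
    while total < target:
        length = int(rng.lognormvariate(mu, sigma))
        if min_len <= length <= max_len:
            lengths.append(length)
            total += length
    return lengths


if __name__ == "__main__":
    genome_size = 1_000_000  # assumed genome size in bases
    reads = sample_read_lengths(genome_size, depth=15, mean=3000, sd=2300)
    print("reads:", len(reads))
    print("achieved depth:", sum(reads) / genome_size)
```

Because sampling stops as soon as the running total crosses the target, the achieved depth is at least the requested one and overshoots by less than one read.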

Original language: English
Pages (from-to): 119-121
Number of pages: 3
Journal: Bioinformatics
Volume: 29
Issue number: 1
DOI: 10.1093/bioinformatics/bts649
Publication status: Published - Jan 2013
Externally published: Yes

Fingerprint

Normal Distribution
Libraries
Genome
Simulator
Genes
Simulators
Sequencing
Error Rate
Coverage
Log Normal Distribution
Error correction
Model-based
Sampling
Target
Datasets

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

PBSIM: PacBio reads simulator - Toward accurate genome assembly. / Ono, Yukiteru; Asai, Kiyoshi; Hamada, Michiaki.

In: Bioinformatics, Vol. 29, No. 1, 01.2013, p. 119-121.

Ono, Yukiteru; Asai, Kiyoshi; Hamada, Michiaki. / PBSIM: PacBio reads simulator - Toward accurate genome assembly. In: Bioinformatics. 2013; Vol. 29, No. 1. pp. 119-121.
@article{33ba1452ac74408d902f842c3fed2493,
title = "PBSIM: PacBio reads simulator - Toward accurate genome assembly",
abstract = "Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.",
author = "Yukiteru Ono and Kiyoshi Asai and Michiaki Hamada",
year = "2013",
month = "1",
doi = "10.1093/bioinformatics/bts649",
language = "English",
volume = "29",
pages = "119--121",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "1",

}

TY - JOUR

T1 - PBSIM

T2 - PacBio reads simulator - Toward accurate genome assembly

AU - Ono, Yukiteru

AU - Asai, Kiyoshi

AU - Hamada, Michiaki

PY - 2013/1

Y1 - 2013/1

AB - Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results.

UR - http://www.scopus.com/inward/record.url?scp=84871779381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84871779381&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts649

DO - 10.1093/bioinformatics/bts649

M3 - Article

C2 - 23129296

AN - SCOPUS:84871779381

VL - 29

SP - 119

EP - 121

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 1

ER -