SAG-QC: Quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions

Toru Maruyama, Tetsushi Mori, Keisuke Yamagishi, Haruko Takeyama

    Research output: Contribution to journal › Article

    2 Citations (Scopus)

    Abstract

    Background: Whole-genome amplification (WGA) techniques have enabled the analysis of unexplored genomic information through the sequencing of single-amplified genomes (SAGs). WGA of single bacterial cells is currently challenging because contamination often occurs during the experimental process. Thus, to increase confidence in analyses of sequenced SAGs, bioinformatics approaches that identify and exclude non-target sequences from SAGs are required. Because currently reported approaches rely on sequence information in public databases, they are limited when the targets of interest are new strains. Here, we developed SAG-QC, a software tool that identifies and excludes non-target sequences independently of any database. Results: Our method uses "no template control" sequences acquired during WGA. We calculated the probability that a sequence was derived from contaminants by comparing its k-mer composition with that of the no template control sequences. In tests using simulated SAG datasets, our method predicted non-target sequences more accurately than currently reported techniques. We then applied the tool to actual SAG datasets and evaluated the accuracy of its predictions. Conclusions: Our method distinguishes SAGs from non-target sequences without relying on public sequence information. It will be effective when applied to SAG sequences of unexplored strains, and we anticipate that it will contribute to the correct interpretation of SAGs.
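    The abstract describes scoring each sequence by comparing its k-mer composition with that of the no template control (NTC) sequences. The sketch below illustrates one way such a contamination probability could be computed; it is not SAG-QC's implementation. Its assumptions are all ours: k = 4, one Laplace-smoothed multinomial k-mer model per class, overlapping k-mers treated as independent (a naive-Bayes simplification), a "target" model built from SAG sequences, and equal priors. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
K = 4  # assumed k-mer length (tetranucleotides); not taken from SAG-QC


def all_kmers(k=K):
    """Every possible DNA k-mer over A, C, G, T."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(seq, k=K):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(c in ALPHABET for c in kmer):
            counts[kmer] += 1
    return counts


def kmer_log_model(seqs, k=K, pseudocount=1.0):
    """Laplace-smoothed multinomial k-mer model: log P(k-mer) pooled over all sequences."""
    total = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    kmers = all_kmers(k)
    denom = sum(total.values()) + pseudocount * len(kmers)
    return {km: math.log((total[km] + pseudocount) / denom) for km in kmers}


def contamination_probability(seq, ntc_model, target_model, k=K, prior=0.5):
    """Posterior probability (0 < prior < 1) that `seq` follows the NTC composition.

    Overlapping k-mers are treated as independent (a naive-Bayes simplification).
    """
    counts = kmer_counts(seq, k)
    log_ntc = math.log(prior) + sum(n * ntc_model[km] for km, n in counts.items())
    log_tgt = math.log(1.0 - prior) + sum(n * target_model[km] for km, n in counts.items())
    # P(NTC | seq) = 1 / (1 + exp(log_tgt - log_ntc)), evaluated without overflow.
    diff = log_tgt - log_ntc
    if diff >= 0:
        z = math.exp(-diff)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(diff))


def random_seq(rng, length, gc):
    """Synthetic test sequence with the requested GC content."""
    at, cg = (1.0 - gc) / 2, gc / 2
    return "".join(rng.choices(ALPHABET, weights=[at, cg, cg, at], k=length))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: GC-rich "NTC" reads versus an AT-rich target genome.
    ntc_model = kmer_log_model([random_seq(rng, 2000, 0.70) for _ in range(5)])
    target_model = kmer_log_model([random_seq(rng, 2000, 0.35) for _ in range(5)])
    for name, contig in [("GC-rich contig", random_seq(rng, 1000, 0.70)),
                         ("AT-rich contig", random_seq(rng, 1000, 0.35))]:
        p = contamination_probability(contig, ntc_model, target_model)
        print(f"{name}: P(contaminant) = {p:.3f}")
```

    If the target model is built from the SAG contigs themselves, any contaminant contigs leak into it; re-estimating the model after dropping high-scoring contigs is one possible mitigation, but that too is an assumption rather than a description of SAG-QC.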

    Original language: English
    Article number: 152
    Journal: BMC Bioinformatics
    Volume: 18
    Issue number: 1
    DOI: 10.1186/s12859-017-1572-5
    Publication status: Published - 2017 Mar 4

    Keywords

    • Decontamination
    • GUI software
    • Single-cell genomics

    ASJC Scopus subject areas

    • Structural Biology
    • Biochemistry
    • Molecular Biology
    • Computer Science Applications
    • Applied Mathematics

    Cite this

    SAG-QC: Quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions. / Maruyama, Toru; Mori, Tetsushi; Yamagishi, Keisuke; Takeyama, Haruko.

    In: BMC Bioinformatics, Vol. 18, No. 1, 152, 04.03.2017.

    @article{77a792252e614a22ab3701b5e2d15597,
    title = "SAG-QC: Quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions",
    abstract = "Background: Whole-genome amplification (WGA) techniques have enabled the analysis of unexplored genomic information through the sequencing of single-amplified genomes (SAGs). WGA of single bacterial cells is currently challenging because contamination often occurs during the experimental process. Thus, to increase confidence in analyses of sequenced SAGs, bioinformatics approaches that identify and exclude non-target sequences from SAGs are required. Because currently reported approaches rely on sequence information in public databases, they are limited when the targets of interest are new strains. Here, we developed SAG-QC, a software tool that identifies and excludes non-target sequences independently of any database. Results: Our method uses {"}no template control{"} sequences acquired during WGA. We calculated the probability that a sequence was derived from contaminants by comparing its k-mer composition with that of the no template control sequences. In tests using simulated SAG datasets, our method predicted non-target sequences more accurately than currently reported techniques. We then applied the tool to actual SAG datasets and evaluated the accuracy of its predictions. Conclusions: Our method distinguishes SAGs from non-target sequences without relying on public sequence information. It will be effective when applied to SAG sequences of unexplored strains, and we anticipate that it will contribute to the correct interpretation of SAGs.",
    keywords = "Decontamination, GUI software, Single-cell genomics",
    author = "Toru Maruyama and Tetsushi Mori and Keisuke Yamagishi and Haruko Takeyama",
    year = "2017",
    month = "3",
    day = "4",
    doi = "10.1186/s12859-017-1572-5",
    language = "English",
    volume = "18",
    journal = "BMC Bioinformatics",
    issn = "1471-2105",
    publisher = "BioMed Central",
    number = "1",

    }

    TY - JOUR

    T1 - SAG-QC

    T2 - Quality control of single amplified genome information by subtracting non-target sequences based on sequence compositions

    AU - Maruyama, Toru

    AU - Mori, Tetsushi

    AU - Yamagishi, Keisuke

    AU - Takeyama, Haruko

    PY - 2017/3/4

    Y1 - 2017/3/4

    AB - Background: Whole-genome amplification (WGA) techniques have enabled the analysis of unexplored genomic information through the sequencing of single-amplified genomes (SAGs). WGA of single bacterial cells is currently challenging because contamination often occurs during the experimental process. Thus, to increase confidence in analyses of sequenced SAGs, bioinformatics approaches that identify and exclude non-target sequences from SAGs are required. Because currently reported approaches rely on sequence information in public databases, they are limited when the targets of interest are new strains. Here, we developed SAG-QC, a software tool that identifies and excludes non-target sequences independently of any database. Results: Our method uses "no template control" sequences acquired during WGA. We calculated the probability that a sequence was derived from contaminants by comparing its k-mer composition with that of the no template control sequences. In tests using simulated SAG datasets, our method predicted non-target sequences more accurately than currently reported techniques. We then applied the tool to actual SAG datasets and evaluated the accuracy of its predictions. Conclusions: Our method distinguishes SAGs from non-target sequences without relying on public sequence information. It will be effective when applied to SAG sequences of unexplored strains, and we anticipate that it will contribute to the correct interpretation of SAGs.

    KW - Decontamination

    KW - GUI software

    KW - Single-cell genomics

    UR - http://www.scopus.com/inward/record.url?scp=85014379098&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85014379098&partnerID=8YFLogxK

    U2 - 10.1186/s12859-017-1572-5

    DO - 10.1186/s12859-017-1572-5

    M3 - Article

    C2 - 28259144

    AN - SCOPUS:85014379098

    VL - 18

    JO - BMC Bioinformatics

    JF - BMC Bioinformatics

    SN - 1471-2105

    IS - 1

    M1 - 152

    ER -