Probabilistic alignments with quality scores: An application to short-read mapping toward accurate SNP/indel detection

Michiaki Hamada, Edward Wijaya, Martin C. Frith, Kiyoshi Asai

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Motivation: Recent studies have revealed the importance of considering the quality scores of reads generated by next-generation sequencing (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) are more accurate than conventional maximum-score alignments. However, no previous study has addressed probabilistic alignment that considers quality scores explicitly. Such a method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns and gaps is important in those analyses.

Results: In this study, we propose methods of probabilistic alignment that consider the quality scores of one of the two sequences (the read) in addition to a conventional score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores. It can arbitrarily trade off the sensitivity and positive predictive value (PPV) of its predictions (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments estimate aligned columns and gaps more accurately than other mapping algorithms such as SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.
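The Results paragraph describes the following approach. Marginal (posterior) probabilities of aligned columns and gaps are computed from an alignment model whose match probabilities depend on the read's quality scores. These posteriors are then decoded with a tunable sensitivity/PPV trade-off.

The Python sketch below illustrates that idea on a toy global pair HMM. It is not the authors' model, parameterization or implementation. Everything in it is an assumption made for illustration:
- The function names (`phred_to_error`, `match_emission`, `match_posteriors`, `predict_columns`) are hypothetical.
- The substitution model is uniform with rate `mu`.
- The gap parameters `delta` and `eps` are arbitrary.
- Columns are selected with a simple posterior threshold of 1/(gamma+1).

```python
BASES = "ACGT"


def phred_to_error(q):
    # Phred: error = 10^(-Q/10). Capped at 0.75 (an assumption) so that a
    # very low quality means "uninformative" rather than "certainly wrong".
    return min(10.0 ** (-q / 10.0), 0.75)


def match_emission(r, b, q, mu=0.01):
    # P(reference base r aligned with observed read base b of quality q),
    # summing over the unknown true read base t:
    #   p_sub(r, t): assumed uniform substitution model with rate mu
    #   P(b | t, q): sequencer error model derived from the Phred score
    err = phred_to_error(q)
    total = 0.0
    for t in BASES:
        p_sub = 0.25 * (1 - mu) if t == r else 0.25 * mu / 3
        p_obs = 1 - err if t == b else err / 3
        total += p_sub * p_obs
    return total


def match_posteriors(ref, read, quals, delta=0.01, eps=0.3, mu=0.01):
    # Forward-backward on a toy global pair HMM with states M (aligned column),
    # X (reference base vs gap) and Y (read base vs gap). Returns P[i][j], the
    # posterior probability that ref[i] is aligned with read[j]. Uses plain
    # probabilities without scaling, so it is only meant for short inputs.
    n, m = len(ref), len(read)
    bg = 0.25                            # emission of a base opposite a gap
    t_mm, t_gm = 1 - 2 * delta, 1 - eps  # M->M and X->M / Y->M transitions
    e = [[match_emission(ref[i], read[j], quals[j], mu) for j in range(m)]
         for i in range(n)]

    def grid():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    f_m, f_x, f_y = grid(), grid(), grid()
    f_m[0][0] = 1.0  # the begin state transitions like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                f_m[i][j] = e[i - 1][j - 1] * (
                    t_mm * f_m[i - 1][j - 1]
                    + t_gm * (f_x[i - 1][j - 1] + f_y[i - 1][j - 1]))
            if i > 0:
                f_x[i][j] = bg * (delta * f_m[i - 1][j] + eps * f_x[i - 1][j])
            if j > 0:
                f_y[i][j] = bg * (delta * f_m[i][j - 1] + eps * f_y[i][j - 1])
    z = f_m[n][m] + f_x[n][m] + f_y[n][m]  # total probability of the pair

    b_m, b_x, b_y = grid(), grid(), grid()
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                b_m[i][j] = b_x[i][j] = b_y[i][j] = 1.0
                continue
            diag = e[i][j] * b_m[i + 1][j + 1] if i < n and j < m else 0.0
            down = bg * b_x[i + 1][j] if i < n else 0.0
            right = bg * b_y[i][j + 1] if j < m else 0.0
            b_m[i][j] = t_mm * diag + delta * (down + right)
            b_x[i][j] = t_gm * diag + eps * down   # no X->Y transition
            b_y[i][j] = t_gm * diag + eps * right  # no Y->X transition

    return [[f_m[i + 1][j + 1] * b_m[i + 1][j + 1] / z for j in range(m)]
            for i in range(n)]


def predict_columns(post, gamma=1.0):
    # Report every column whose posterior exceeds 1 / (gamma + 1): a larger
    # gamma lowers the threshold (more sensitive, usually lower PPV).
    thr = 1.0 / (gamma + 1.0)
    return [(i, j) for i, row in enumerate(post)
            for j, p in enumerate(row) if p > thr]


ref = "ACGTTGCAGT"
read = "ACGTTCCAGT"  # one mismatch at position 5
quals = [40] * 5 + [3] + [40] * 4
post = match_posteriors(ref, read, quals)
print(predict_columns(post, gamma=1.0))
```

In this toy model, lowering the quality of the mismatched read base raises the posterior of the diagonal column at that position. The mismatch is then better explained as a sequencing error. The gap (insertion) probability of read position j is 1 minus the sum over i of P[i][j]. For gamma > 1 the threshold falls below 0.5, so the selected columns need not form a consistent alignment. Enforcing consistency would need an extra dynamic-programming step.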

Original language: English
Article number: btr537
Pages (from-to): 3085-3092
Number of pages: 8
Journal: Bioinformatics
Volume: 27
Issue number: 22
DOI: 10.1093/bioinformatics/btr537
Publication status: Published - 2011 Nov
Externally published: Yes

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)

Cite this

Probabilistic alignments with quality scores: An application to short-read mapping toward accurate SNP/indel detection. / Hamada, Michiaki; Wijaya, Edward; Frith, Martin C.; Asai, Kiyoshi.

In: Bioinformatics, Vol. 27, No. 22, btr537, 11.2011, p. 3085-3092.

Hamada, Michiaki; Wijaya, Edward; Frith, Martin C.; Asai, Kiyoshi. / Probabilistic alignments with quality scores: An application to short-read mapping toward accurate SNP/indel detection. In: Bioinformatics. 2011; Vol. 27, No. 22. pp. 3085-3092.
@article{168f19121a2a4ee39b190609b7534a57,
title = "Probabilistic alignments with quality scores: An application to short-read mapping toward accurate {SNP}/indel detection",
author = "Michiaki Hamada and Edward Wijaya and Frith, {Martin C.} and Kiyoshi Asai",
year = "2011",
month = "11",
doi = "10.1093/bioinformatics/btr537",
language = "English",
volume = "27",
pages = "3085--3092",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "22",

}

TY  - JOUR
T1  - Probabilistic alignments with quality scores: An application to short-read mapping toward accurate SNP/indel detection
AU  - Hamada, Michiaki
AU  - Wijaya, Edward
AU  - Frith, Martin C.
AU  - Asai, Kiyoshi
PY  - 2011/11
Y1  - 2011/11
UR  - http://www.scopus.com/inward/record.url?scp=80755171163&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80755171163&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btr537
DO  - 10.1093/bioinformatics/btr537
M3  - Article
C2  - 21976422
AN  - SCOPUS:80755171163
VL  - 27
SP  - 3085
EP  - 3092
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 22
M1  - btr537
ER  -