Improvement of detection performance of fusion genes from RNA-seq data by clustering short reads

Yoshiaki Sota, Shigeto Seno, Hironori Shigeta, Naoki Osato, Masafumi Shimoda, Shinzaburo Noguchi, Hideo Matsuda

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Fusion genes are involved in cancer, but their detection from RNA-Seq data remains insufficient because the reads are relatively short. We therefore proposed a shifted short-read clustering (SSC) method, which focuses on overlapping reads from the same locus and extends them into a representative sequence. To verify its usefulness, we applied the SSC method to RNA-Seq data from four cell lines (BT-474, MCF-7, SKBR-3, and T-47D). As the slide width of the SSC method increased to one, two, five, or ten bases, the read length was extended from 201 bases to 217 (108%), 234 (116%), 282 (140%), or 317 (158%) bases, respectively. Fusion genes were then investigated using STAR-Fusion, a fusion gene detection tool, with and without the SSC method. When reads were shifted by one base, the proportion of reads mapped to multiple loci decreased from 9.7% to 4.6%, and the sensitivity of fusion gene detection improved from 47% to 54% on average (BT-474: 48% to 57%, MCF-7: 49% to 53%, SKBR-3: 50% to 57%, and T-47D: 43% to 50%) compared with the original data. When the reads were shifted by more bases, the positive predictive value also improved. The SSC method could therefore be effective for fusion gene detection.
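
The idea of extending overlapping reads that are offset by a fixed slide width can be illustrated with a toy sketch. This is not the authors' implementation: the function names `extend_by_shift` and `percent_of_original`, the exact-match overlap rule, and the example sequence are all assumptions made for illustration, and the real method clusters reads from sequencing data rather than taking them in order. The second helper only re-derives the percentages quoted in the abstract from the stated read lengths.

```python
def extend_by_shift(reads, shift):
    """Toy illustration: chain equal-length reads that are offset by `shift` bases.

    Each read is appended only if its prefix exactly matches the tail of the
    growing sequence; the non-overlapping `shift` bases are then added.
    (Hypothetical helper, not the published SSC algorithm.)
    """
    if shift < 1:
        raise ValueError("shift must be at least 1")
    contig = reads[0]
    for read in reads[1:]:
        overlap = len(read) - shift
        if overlap > 0 and contig[-overlap:] == read[:overlap]:
            contig += read[-shift:]
    return contig


def percent_of_original(extended_length, original_length=201):
    """Extended read length as a rounded percentage of the original length."""
    return round(100 * extended_length / original_length)


# Made-up 16-base template, sampled as 10-base reads offset by one base.
template = "ACGTTGCAAGGCTAGC"
reads = [template[i:i + 10] for i in range(5)]
print(extend_by_shift(reads, shift=1))  # ACGTTGCAAGGCTA

# Read lengths reported in the abstract for slide widths 1, 2, 5, and 10.
print([percent_of_original(n) for n in (217, 234, 282, 317)])  # [108, 116, 140, 158]
```

In this toy model, five 10-base reads offset by one base merge into a 14-base sequence, which shows why chaining shifted reads yields a longer representative sequence than any single read.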

Original language: English
Article number: 1940008
Journal: Journal of Bioinformatics and Computational Biology
Volume: 17
Issue number: 3
DOIs
Publication status: Published - 2019 Jun 1
Externally published: Yes

Keywords

  • Cancer
  • fusion gene
  • RNA-seq
  • SlideSort

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
