MotiMul: A significant discriminative sequence motif discovery algorithm with multiple testing correction

Koichi Mori, Haruka Ozaki, Tsukasa Fukunaga

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sequence motifs play essential roles in intermolecular interactions such as DNA-protein interactions. The discovery of novel sequence motifs is therefore crucial for revealing gene functions. Various bioinformatics tools have been developed for finding sequence motifs, but until now no software has been based on statistical hypothesis testing with statistically sound multiple testing correction. Existing software therefore could not control the type-I error rate. This is because, in the sequence motif discovery problem, conventional multiple testing correction methods yield very low statistical power due to overly strict correction. We developed MotiMul, which comprehensively finds significant sequence motifs using statistically sound multiple testing correction. Our key idea is the application of Tarone's correction, which improves the statistical power of the hypothesis test by ignoring hypotheses that can never become statistically significant. For the efficient enumeration of significant sequence motifs, we integrated a variant of the PrefixSpan algorithm with Tarone's correction. Simulation and empirical dataset analysis showed that MotiMul is a powerful method for finding biologically meaningful sequence motifs. The source code of MotiMul is freely available at https://github.com/ko-ichimo-ri/MotiMul.
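
To illustrate the idea behind Tarone's correction mentioned in the abstract, the following is a minimal sketch and not MotiMul's actual implementation. It assumes a one-sided Fisher's exact test on motif presence in positive versus negative sequences (the abstract does not state which test is used), and the function names `min_pvalue` and `tarone_threshold` are hypothetical. A motif that occurs in only a few sequences has a minimum attainable p-value that is too large to ever reach significance, so it can be left out of the correction factor.

```python
from math import comb

def min_pvalue(support, n_pos, n_total):
    """Smallest one-sided Fisher p-value attainable by a motif that occurs
    in `support` of `n_total` sequences, of which `n_pos` are positive.
    Assumption: Fisher's exact test; the source does not name the test."""
    # Most extreme table: as many occurrences as possible fall in the positives.
    a = min(support, n_pos)
    return comb(n_pos, a) * comb(n_total - n_pos, support - a) / comb(n_total, support)

def tarone_threshold(supports, n_pos, n_total, alpha=0.05):
    """Return (k, alpha / k), where k is the smallest correction factor such
    that at most k motifs are testable, i.e. have min p-value <= alpha / k."""
    min_ps = [min_pvalue(s, n_pos, n_total) for s in supports]
    k = 1
    # The testable count never exceeds len(supports), so the loop terminates.
    while sum(p <= alpha / k for p in min_ps) > k:
        k += 1
    return k, alpha / k

# Toy example: 6 candidate motifs, 10 positive and 10 negative sequences.
supports = [1, 2, 3, 10, 10, 12]
k, threshold = tarone_threshold(supports, n_pos=10, n_total=20)
print(k, threshold)
```

In this toy example a Bonferroni correction would divide the significance level by 6, the number of candidate motifs, whereas the sketch divides it by 3 because the three rarest motifs can never be significant.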

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Editors: Taesung Park, Young-Rae Cho, Xiaohua Tony Hu, Illhoi Yoo, Hyun Goo Woo, Jianxin Wang, Julio Facelli, Seungyoon Nam, Mingon Kang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 186-193
Number of pages: 8
ISBN (Electronic): 9781728162157
Publication status: Published - 2020 Dec 16
Externally published: Yes
Event: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020 - Virtual, Seoul, Korea, Republic of
Duration: 2020 Dec 16 to 2020 Dec 19

Publication series

Name: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020

Conference

Conference: 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Country: Korea, Republic of
City: Virtual, Seoul
Period: 20/12/16 to 20/12/19

Keywords

  • ChIP-seq data analysis
  • frequent pattern mining
  • multiple testing correction
  • sequence motif
  • statistical significance

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Medicine (miscellaneous)
  • Health Informatics
