Multi-talker speech recognition based on blind source separation with ad hoc microphone array using smartphones and cloud storage

Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

Research output: Contribution to journal › Conference article › peer-review

10 Citations (Scopus)

Abstract

In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone, and each recording is automatically transferred to online cloud storage. Our prototype system is realized using iPhones and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates for both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. Through an experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.
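To make the synchronization step more concrete, the following is a minimal sketch of one part of the problem only: estimating an integer time offset between two unsynchronized recordings by maximizing their cross-correlation, then trimming both to the overlapping region. It is not the blind synchronization technique used in the paper, and it does not address the sampling frequency mismatch or the separation step. The function names `estimate_offset` and `align`, the toy signals, and the gain of 0.8 are all hypothetical and chosen purely for illustration.

```python
import math
import random


def estimate_offset(ref, other, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means other[n] lines up with ref[n + lag], i.e. the
    device that produced `other` started recording later than `ref`.
    This is a plain brute-force search (an illustrative assumption,
    not the method from the paper).
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(other):
            m = n + lag
            if 0 <= m < len(ref):
                score += x * ref[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(ref, other, lag):
    """Trim both signals to their overlapping region for the given lag."""
    if lag >= 0:
        ref_a, other_a = ref[lag:], other
    else:
        ref_a, other_a = ref, other[-lag:]
    n = min(len(ref_a), len(other_a))
    return ref_a[:n], other_a[:n]


# Toy data: one noise-like source captured by two devices that started
# recording 37 samples apart, with a different gain on the second device.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_lag = 37
device_a = source[:300]
device_b = [0.8 * s for s in source[true_lag:true_lag + 300]]

lag = estimate_offset(device_a, device_b, max_lag=60)
a_sync, b_sync = align(device_a, device_b, lag)
```

In this toy setup the two recordings share the same sampling rate, so a single integer lag is enough to align them. With a real sampling frequency mismatch the offset drifts over time, which is why the system described above compensates for both effects.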

Original language: English
Pages (from-to): 3369-3373
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
Publication status: Published - 2016
Externally published: Yes
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 2016 Sep 8 → 2016 Sep 16

Keywords

  • Ad hoc microphone array
  • Blind source separation
  • Speech recognition
  • Synchronization

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
