Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We present methods for automatic speaker identification in noisy environments. To improve the noise robustness of speaker identification, we developed two methods: harmonic structure extraction and reliable frame weighting. The harmonic structure extraction method allows the speaker of an input speech signal to be identified after environmental noise has been reduced. It first extracts the harmonic components of the speech from the sound mixture and then resynthesizes a clean speech signal using a sinusoidal model driven by those harmonic components. The reliable frame weighting method then estimates how reliable each frame of the resynthesized speech is (i.e., how little it is influenced by environmental noise) using two Gaussian mixture models, one for speech and one for noise. The speaker can then be identified robustly by giving greater weight to reliable frames. Experimental results with thirty speakers showed that our method reduced the influence of environmental noise and achieved an error rate of 10.7%, whereas a conventional method had an error rate of 18.9%.
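
The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact sinusoidal model, the reliability measure, or the weighting rule, so everything below is an assumption made for illustration. In particular, the function names, the piecewise-constant resynthesis, the diagonal-covariance GMMs, the use of a sigmoid of the speech/noise log-likelihood ratio as the frame reliability, and the weighted-average scoring are all hypothetical choices.

```python
import numpy as np

def resynthesize_harmonics(f0, amps, sr=16000, hop=160):
    """Sum-of-sinusoids resynthesis (assumed form of the sinusoidal model).

    f0:   shape (T,)   fundamental frequency in Hz for each frame
    amps: shape (T, H) amplitude of harmonic h+1 for each frame
    """
    H = amps.shape[1]
    # Hold each frame's parameters constant for `hop` samples (no interpolation).
    f0_s = np.repeat(f0, hop)
    amps_s = np.repeat(amps, hop, axis=0)
    # Integrate the instantaneous frequency to obtain a continuous phase track.
    phase = 2.0 * np.pi * np.cumsum(f0_s) / sr
    k = np.arange(1, H + 1)
    return np.sum(amps_s * np.cos(phase[:, None] * k), axis=1)

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (T, D) features; weights: (M,); means, variances: (M, D)
    """
    diff = X[:, None, :] - means[None, :, :]                    # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    a = log_comp + np.log(weights)                              # (T, M)
    m = a.max(axis=1, keepdims=True)                            # log-sum-exp
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def frame_reliability(X, speech_gmm, noise_gmm):
    """Assumed reliability: sigmoid of the speech-vs-noise log-likelihood ratio."""
    llr = gmm_loglik(X, *speech_gmm) - gmm_loglik(X, *noise_gmm)
    return 1.0 / (1.0 + np.exp(-np.clip(llr, -50.0, 50.0)))

def identify(X, speaker_gmms, speech_gmm, noise_gmm):
    """Return the index of the speaker with the best reliability-weighted score."""
    w = frame_reliability(X, speech_gmm, noise_gmm)
    scores = [np.sum(w * gmm_loglik(X, *g)) / np.sum(w) for g in speaker_gmms]
    return int(np.argmax(scores))
```

Each GMM is passed as a `(weights, means, variances)` tuple. With this soft weighting, a frame that the noise model explains much better than the speech model contributes almost nothing to a speaker's score, which is one plausible reading of "giving greater weight to reliable frames"; a hard threshold that discards unreliable frames would be another.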

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 1459-1462
Number of pages: 4
ISBN (Print): 9781604234497
Publication status: Published - 2006 Jan 1
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sep 17 → 2006 Sep 21

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 3

Conference

Conference: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country: United States
City: Pittsburgh, PA
Period: 06/9/17 → 06/9/21

Keywords

  • Gaussian mixture model
  • Noise robustness
  • Speaker identification
  • Voice extraction
  • Voice reliability

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Fujihara, H., Kitahara, T., Goto, M., Komatani, K., Ogata, T., & Okuno, H. G. (2006). Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (pp. 1459-1462). (INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP; Vol. 3). International Speech Communication Association.