Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We present methods for automatic speaker identification in noisy environments. To improve the noise robustness of speaker identification, we developed two methods: harmonic structure extraction and reliable frame weighting. The harmonic structure extraction method allows the speaker of an input speech signal to be identified after environmental noise has been reduced. It first extracts the harmonic components of the speech from the sound mixture and then resynthesizes a clean speech signal using a sinusoidal model driven by those harmonic components. The reliable frame weighting method then estimates how reliable each frame of the resynthesized speech is (i.e., how little it is influenced by environmental noise) using two Gaussian mixture models, one for speech and one for noise. The speaker can then be identified robustly by giving more weight to reliable frames. In experiments with thirty speakers, our method reduced the influence of environmental noise and achieved an error rate of 10.7%, compared with 18.9% for a conventional method.
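The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the fixed 16 kHz sample rate, the sigmoid mapping from a speech/noise log-likelihood ratio to a weight, and the weighted-average scoring rule are all assumptions chosen for illustration. The per-frame log-likelihoods are assumed to come from Gaussian mixture models trained elsewhere.

```python
import math


def resynthesize_frame(f0_hz, amplitudes, n_samples, sample_rate=16000):
    """Sinusoidal model (assumed form): sum of sinusoids at multiples of f0.

    amplitudes[h] is the amplitude of harmonic h + 1.
    """
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(
            a * math.sin(2.0 * math.pi * f0_hz * (h + 1) * t)
            for h, a in enumerate(amplitudes)
        ))
    return samples


def frame_reliability(loglik_speech, loglik_noise):
    """Hypothetical reliability weight in [0, 1].

    Numerically stable sigmoid of the log-likelihood ratio between a
    speech model and a noise model.
    """
    d = loglik_speech - loglik_noise
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def identify_speaker(speaker_logliks, speech_logliks, noise_logliks):
    """Pick the speaker with the highest reliability-weighted mean log-likelihood.

    speaker_logliks maps a speaker name to its per-frame log-likelihoods.
    """
    weights = [frame_reliability(s, n)
               for s, n in zip(speech_logliks, noise_logliks)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no frame carries any reliability weight")
    scores = {
        name: sum(w * ll for w, ll in zip(weights, frame_ll)) / total
        for name, frame_ll in speaker_logliks.items()
    }
    return max(scores, key=scores.get), scores
```

For example, with `speaker_logliks = {"A": [-1.0, -5.0], "B": [-4.0, -1.0]}`, `speech_logliks = [0.0, -10.0]`, and `noise_logliks = [-10.0, 0.0]`, the second frame is judged noise-dominated and contributes almost nothing, so speaker A is selected. An unweighted mean over the same two frames would favor B.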

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Pages: 1459-1462
Number of pages: 4
Volume: 3
Publication status: Published - 2006
Externally published: Yes
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA
Duration: 2006 Sep 17 → 2006 Sep 21

Keywords

  • Gaussian mixture model
  • Noise robustness
  • Speaker identification
  • Voice extraction
  • Voice reliability

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Fujihara, H., Kitahara, T., Goto, M., Komatani, K., Ogata, T., & Okuno, H. G. (2006). Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (Vol. 3, pp. 1459-1462).

@inproceedings{addd02d18e3449cf8e336ade33f91883,
title = "Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting",
abstract = "We present methods for automatic speaker identification in noisy environments. To improve noise robustness of speaker identification, we developed two methods, the harmonic structure extraction method and the reliable frame weighting method. The harmonic structure extraction method enables the speaker of input speech signals to be identified after environmental noise has been reduced. This method first extracts harmonic components of the speech from the sound mixtures and then resynthesizes a clean speech signal by using a sinusoidal model driven by harmonic components. The reliable frame weighting method then determines how each frame of the resynthesized speech is reliable (i.e. little influenced by environmental noises) by using two Gaussian mixture models for the speech and noise. The speaker can be robustly identified by attaching importance to reliable frames. Experimental results with thirty speakers showed that our method was able to reduce the influences of environmental noise and achieved an error rate of 10.7{\%}, while the error rate for a conventional method was 18.9{\%}.",
keywords = "Gaussian mixture model, Noise robustness, Speaker identification, Voice extraction, Voice reliability",
author = "Hiromasa Fujihara and Tetsuro Kitahara and Masataka Goto and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2006",
language = "English",
isbn = "9781604234497",
volume = "3",
pages = "1459--1462",
booktitle = "INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP",

}

TY - GEN

T1 - Speaker identification under noisy environments by using harmonic structure extraction and reliable frame weighting

AU - Fujihara, Hiromasa

AU - Kitahara, Tetsuro

AU - Goto, Masataka

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2006

Y1 - 2006

AB - We present methods for automatic speaker identification in noisy environments. To improve noise robustness of speaker identification, we developed two methods, the harmonic structure extraction method and the reliable frame weighting method. The harmonic structure extraction method enables the speaker of input speech signals to be identified after environmental noise has been reduced. This method first extracts harmonic components of the speech from the sound mixtures and then resynthesizes a clean speech signal by using a sinusoidal model driven by harmonic components. The reliable frame weighting method then determines how each frame of the resynthesized speech is reliable (i.e. little influenced by environmental noises) by using two Gaussian mixture models for the speech and noise. The speaker can be robustly identified by attaching importance to reliable frames. Experimental results with thirty speakers showed that our method was able to reduce the influences of environmental noise and achieved an error rate of 10.7%, while the error rate for a conventional method was 18.9%.

KW - Gaussian mixture model

KW - Noise robustness

KW - Speaker identification

KW - Voice extraction

KW - Voice reliability

UR - http://www.scopus.com/inward/record.url?scp=34547507569&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547507569&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:34547507569

SN - 9781604234497

VL - 3

SP - 1459

EP - 1462

BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

ER -