Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions

Masaki Naito, Li Deng, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Citations (Scopus)

Abstract

We propose speaker clustering methods based on vocal-tract-size-related articulatory parameters associated with individual speakers. Two parameters characterizing gross vocal-tract dimensions are first derived from the formants of speaker-specific Japanese vowels and are then used to cluster a total of 148 male Japanese speakers. The resulting speaker clusters are found to differ significantly from the clusters obtained with conventional acoustic criteria. Japanese phoneme recognition experiments are carried out using speaker-clustered tied-state HMMs (HMNets) trained for each cluster. Compared with the baseline gender-dependent model, the clustering method based on vocal-tract parameters achieves a 5.7% reduction in recognition error.
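The abstract does not state which clustering algorithm was used or exactly how the two vocal-tract parameters are computed from the vowel formants. As a rough illustration only, the sketch below assumes each speaker has already been reduced to a two-dimensional vector of vocal-tract parameters and groups speakers with ordinary k-means (Lloyd's algorithm). The choice of k-means, the z-score normalization, the function names `standardize`, `mean_vector` and `kmeans`, and the example values are all assumptions for illustration, not the authors' method or data.

```python
import math
from typing import List, Sequence, Tuple

# A speaker's vocal-tract parameter vector, e.g. (p1, p2). Hypothetical representation.
Vector = Tuple[float, ...]


def standardize(points: Sequence[Vector]) -> List[Vector]:
    """Z-score each parameter so both dimensions contribute comparably to distances."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # A constant dimension (std == 0) is left unscaled to avoid dividing by zero.
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
        for d in range(dims)
    ]
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def mean_vector(members: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain k-means (Lloyd's algorithm); returns (cluster label per speaker, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of speakers")
    # Deterministic initialization from the first k speakers (illustration only;
    # k-means++ or several random restarts would be more robust).
    centroids: List[Vector] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each speaker goes to the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids: List[Vector] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_vector(members) if members else centroids[j])
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (p1, p2) values for six speakers; NOT data from the paper.
    speakers = [(7.0, 9.0), (7.2, 9.1), (8.5, 10.2), (8.6, 10.4), (7.1, 8.9), (8.4, 10.3)]
    labels, _ = kmeans(standardize(speakers), k=2)
    print(labels)  # → [0, 0, 1, 1, 0, 1]
```

In a setup like the one the abstract describes, the training data of each resulting cluster would then be used to train a cluster-specific acoustic model (HMNets in the paper). One plausible way to use the clusters at recognition time, again an assumption, is to assign a new speaker to the nearest centroid and decode with that cluster's model. Standardizing first matters because the two parameters may be on different scales, and unscaled Euclidean distance would otherwise be dominated by the larger one.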

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages: 981-984
Number of pages: 4
Volume: 2
DOIs: https://doi.org/10.1109/ICASSP.1998.675431
Publication status: Published, 1998
Externally published: Yes
Event: 1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998, Seattle, WA
Duration: 12 May 1998 to 15 May 1998

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Naito, M., Deng, L., & Sagisaka, Y. (1998). Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 2, pp. 981-984). [675431] https://doi.org/10.1109/ICASSP.1998.675431

@inproceedings{f08c4e78d20b481ba2613027207ddb92,
title = "Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions",
abstract = "We propose speaker clustering methods based on the vocal-tract-size related articulatory parameters associated with individual speakers. Two parameters characterizing gross vocal-tract dimensions are first derived from formants of speaker-specific Japanese vowels, and are then used to cluster a total of 148 male Japanese speakers. The resultant speaker clusters are found to be significantly different from the speaker clusters obtained by conventional acoustic criteria. Japanese phoneme recognition experiments are carried out using speaker-clustered tied-state HMMs (HMNets) trained for each cluster. Compared with the baseline gender dependent model, 5.7{\%} of recognition error reduction has been achieved based on the clustering method using vocal-tract parameters.",
author = "Masaki Naito and Li Deng and Yoshinori Sagisaka",
year = "1998",
doi = "10.1109/ICASSP.1998.675431",
language = "English",
isbn = "0780344286",
volume = "2",
pages = "981--984",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

}

TY - GEN

T1 - Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions

AU - Naito, Masaki

AU - Deng, Li

AU - Sagisaka, Yoshinori

PY - 1998

Y1 - 1998

AB - We propose speaker clustering methods based on the vocal-tract-size related articulatory parameters associated with individual speakers. Two parameters characterizing gross vocal-tract dimensions are first derived from formants of speaker-specific Japanese vowels, and are then used to cluster a total of 148 male Japanese speakers. The resultant speaker clusters are found to be significantly different from the speaker clusters obtained by conventional acoustic criteria. Japanese phoneme recognition experiments are carried out using speaker-clustered tied-state HMMs (HMNets) trained for each cluster. Compared with the baseline gender dependent model, 5.7% of recognition error reduction has been achieved based on the clustering method using vocal-tract parameters.

UR - http://www.scopus.com/inward/record.url?scp=33847268014&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847268014&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.1998.675431

DO - 10.1109/ICASSP.1998.675431

M3 - Conference contribution

SN - 0780344286

SN - 9780780344280

VL - 2

SP - 981

EP - 984

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ER -