Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions

Masaki Naito, Li Deng, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Citations (Scopus)

Abstract

We propose speaker clustering methods based on the vocal-tract-size related articulatory parameters associated with individual speakers. Two parameters characterizing gross vocal-tract dimensions are first derived from formants of speaker-specific Japanese vowels, and are then used to cluster a total of 148 male Japanese speakers. The resultant speaker clusters are found to be significantly different from the speaker clusters obtained by conventional acoustic criteria. Japanese phoneme recognition experiments are carried out using speaker-clustered tied-state HMMs (HMNets) trained for each cluster. Compared with the baseline gender dependent model, 5.7% of recognition error reduction has been achieved based on the clustering method using vocal-tract parameters.

Original languageEnglish
Title of host publicationProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Pages981-984
Number of pages4
DOIs
Publication statusPublished - 1998 Dec 1
Externally publishedYes
Event1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 - Seattle, WA, United States
Duration: 1998 May 121998 May 15

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2
ISSN (Print)1520-6149

Conference

Conference1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Country/TerritoryUnited States
CitySeattle, WA
Period98/5/1298/5/15

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Speaker clustering for speech recognition using the parameters characterizing vocal-tract dimensions'. Together they form a unique fingerprint.

Cite this