Speaker normalized acoustic modeling based on 3-D viterbi decoding

T. Fukada, Y. Sagisaka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

This paper describes a novel method for speaker normalization based on a frequency warping approach to reduce variations due to speaker-induced factors such as the vocal tract length. In our approach, a speaker normalized acoustic model is trained using time-varying (i.e., state, phoneme or word dependent) warping factors, while in the conventional approaches, the frequency warping factor is fixed for each speaker. These time-varying frequency warping factors are determined by a 3-dimensional (i.e., input frames, HMM states and warping factors) Viterbi decoding procedure. Experimental results on Japanese spontaneous speech recognition show that the proposed method yields a 9.7% improvement in speech recognition accuracy compared to the conventional speaker-independent model.

Original languageEnglish
Title of host publicationProceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
Pages437-440
Number of pages4
DOIs
Publication statusPublished - 1998 Dec 1
Externally publishedYes
Event1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 - Seattle, WA, United States
Duration: 1998 May 121998 May 15

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume1
ISSN (Print)1520-6149

Conference

Conference1998 23rd IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998
CountryUnited States
CitySeattle, WA
Period98/5/1298/5/15

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of 'Speaker normalized acoustic modeling based on 3-D viterbi decoding'. Together they form a unique fingerprint.

  • Cite this

    Fukada, T., & Sagisaka, Y. (1998). Speaker normalized acoustic modeling based on 3-D viterbi decoding. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1998 (pp. 437-440). [674461] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 1). https://doi.org/10.1109/ICASSP.1998.674461