A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition

Jin Song Zhang, Shu Wu Zhang, Yoshinori Sagisaka, Satoshi Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

This paper presents our approach to enhancing the portability of acoustic models by mitigating the phonetic mismatch that arises when a new testing task differs substantially from the training data. The approach is a hybrid one: it combines knowledge-based context categorization, which generates a context-rich set of subword units, with data-driven acoustic model clustering at the level of the context category. Compared with the conventional approach, which relies only on phonetic-decision-tree-based model clustering and unseen model generation, the new approach greatly improved the desired subword coverage for the new testing domain and achieved a 10.8% error rate reduction for Chinese character accuracy in the recognition experiments. Together with the effect of the newly adopted basic units of 9 glottal stops, we achieved a total error rate reduction of 23.5% in testing compared to the baseline system.

Original language: English
Title of host publication: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 1661-1664
Number of pages: 4
ISBN (Electronic): 8790834100, 9788790834104
Publication status: Published - 2001
Externally published: Yes
Event: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 3 to 2001 Sep 7

Other

Other: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
Country: Denmark
City: Aalborg
Period: 01/9/3 to 01/9/7

Fingerprint

Speech recognition
Acoustics
Speech analysis
Phonetics
Testing
Decision trees
Mismatch
Coverage
Experiments
Knowledge

ASJC Scopus subject areas

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Cite this

Zhang, J. S., Zhang, S. W., Sagisaka, Y., & Nakamura, S. (2001). A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. In EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 1661-1664). International Speech Communication Association.

A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. / Zhang, Jin Song; Zhang, Shu Wu; Sagisaka, Yoshinori; Nakamura, Satoshi.

EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology. International Speech Communication Association, 2001. p. 1661-1664.

Zhang, JS, Zhang, SW, Sagisaka, Y & Nakamura, S 2001, A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. in EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology. International Speech Communication Association, pp. 1661-1664, 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001, Aalborg, Denmark, 01/9/3.
Zhang JS, Zhang SW, Sagisaka Y, Nakamura S. A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. In EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology. International Speech Communication Association. 2001. p. 1661-1664
Zhang, Jin Song ; Zhang, Shu Wu ; Sagisaka, Yoshinori ; Nakamura, Satoshi. / A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition. EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology. International Speech Communication Association, 2001. pp. 1661-1664
@inproceedings{527f57bc95894a898850b687a1575364,
title = "A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition",
abstract = "This paper presents our approach to enhance the portability of acoustic models by mitigating the phonetic mismatch arising from a new testing task which is rather different from the training data. The approach is a hybrid one which combines knowledge-based context categorization to generate a context rich set of subword units, and data-driven-based acoustic model clustering on the level of context category. Compared with the conventional approach of only phonetic decision tree based model clustering and unseen model generation, the new approach improved greatly the desired subword coverage for the new testing domain, and achieved an error rate reduction by 10.8{\%} for Chinese character accuracy in the recognition experiments. Together with the effect of the newly adopted basic units of 9 glottal stops, we achieved a total 23.5{\%} error rate reduction in the testing compared to the baseline system.",
author = "Zhang, {Jin Song} and Zhang, {Shu Wu} and Sagisaka, Yoshinori and Nakamura, Satoshi",
year = "2001",
language = "English",
pages = "1661--1664",
booktitle = "EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology",
publisher = "International Speech Communication Association",
}

TY  - GEN
T1  - A hybrid approach to enhance task portability of acoustic models in Chinese speech recognition
AU  - Zhang, Jin Song
AU  - Zhang, Shu Wu
AU  - Sagisaka, Yoshinori
AU  - Nakamura, Satoshi
PY  - 2001
Y1  - 2001
AB  - This paper presents our approach to enhance the portability of acoustic models by mitigating the phonetic mismatch arising from a new testing task which is rather different from the training data. The approach is a hybrid one which combines knowledge-based context categorization to generate a context rich set of subword units, and data-driven-based acoustic model clustering on the level of context category. Compared with the conventional approach of only phonetic decision tree based model clustering and unseen model generation, the new approach improved greatly the desired subword coverage for the new testing domain, and achieved an error rate reduction by 10.8% for Chinese character accuracy in the recognition experiments. Together with the effect of the newly adopted basic units of 9 glottal stops, we achieved a total 23.5% error rate reduction in the testing compared to the baseline system.
UR  - http://www.scopus.com/inward/record.url?scp=85009079604&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85009079604&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:85009079604
SP  - 1661
EP  - 1664
BT  - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
PB  - International Speech Communication Association
ER  -