Speech recognition based on acoustically derived segment units

Toshiaki Fukada, Michiel Bacchiani, Kuldip K. Paliwal, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

8 Citations (Scopus)

Abstract

This paper describes a new method of word model generation based on acoustically derived segment units (henceforth ASUs). An ASU-based approach has the advantages of growing out of human pre-determined phonemes and of consistently generating acoustic units by using the maximum likelihood (ML) criterion. The former advantage is effective when it is difficult to map acoustics to a phone such as with highly co-articulated spontaneous speech. In order to implement an ASU-based modeling approach in a speech recognition system, we must first solve two points: (1) How do we design an inventory of acoustically-derived segmental units and (2) How do we model the pronunciations of lexical entries in terms of the ASUs. As for the second question, we propose an ASU-based word model generation method by composing the ASU statistics, that is, their means, variances and durations. The effectiveness of the proposed method is shown through spontaneous word recognition experiments.

Original language: English
Title of host publication: International Conference on Spoken Language Processing, ICSLP, Proceedings
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Publisher: IEEE
Pages: 1077-1080
Number of pages: 4
Volume: 2
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA
Duration: 1996 Oct 3 to 1996 Oct 6

Other

Other: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4)
City: Philadelphia, PA, USA
Period: 96/10/3 to 96/10/6

Fingerprint

Speech recognition
Acoustics
Maximum likelihood
Statistics
Experiments

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Fukada, T., Bacchiani, M., Paliwal, K. K., & Sagisaka, Y. (1996). Speech recognition based on acoustically derived segment units. In Anon (Ed.), International Conference on Spoken Language Processing, ICSLP, Proceedings (Vol. 2, pp. 1077-1080). Piscataway, NJ, United States: IEEE.

Speech recognition based on acoustically derived segment units. / Fukada, Toshiaki; Bacchiani, Michiel; Paliwal, Kuldip K.; Sagisaka, Yoshinori.

International Conference on Spoken Language Processing, ICSLP, Proceedings. ed. / Anon. Vol. 2 Piscataway, NJ, United States : IEEE, 1996. p. 1077-1080.

Fukada, T, Bacchiani, M, Paliwal, KK & Sagisaka, Y 1996, Speech recognition based on acoustically derived segment units. in Anon (ed.), International Conference on Spoken Language Processing, ICSLP, Proceedings. vol. 2, IEEE, Piscataway, NJ, United States, pp. 1077-1080, Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4), Philadelphia, PA, USA, 96/10/3.

Fukada T, Bacchiani M, Paliwal KK, Sagisaka Y. Speech recognition based on acoustically derived segment units. In Anon, editor, International Conference on Spoken Language Processing, ICSLP, Proceedings. Vol. 2. Piscataway, NJ, United States: IEEE. 1996. p. 1077-1080

Fukada, Toshiaki ; Bacchiani, Michiel ; Paliwal, Kuldip K. ; Sagisaka, Yoshinori. / Speech recognition based on acoustically derived segment units. International Conference on Spoken Language Processing, ICSLP, Proceedings. editor / Anon. Vol. 2 Piscataway, NJ, United States : IEEE, 1996. pp. 1077-1080

@inproceedings{0597efe4805e4424a957cc8c38f98065,
title = "Speech recognition based on acoustically derived segment units",
abstract = "This paper describes a new method of word model generation based on acoustically derived segment units (henceforth ASUs). An ASU-based approach has the advantages of growing out of human pre-determined phonemes and of consistently generating acoustic units by using the maximum likelihood (ML) criterion. The former advantage is effective when it is difficult to map acoustics to a phone such as with highly co-articulated spontaneous speech. In order to implement an ASU-based modeling approach in a speech recognition system, we must first solve two points: (1) How do we design an inventory of acoustically-derived segmental units and (2) How do we model the pronunciations of lexical entries in terms of the ASUs. As for the second question, we propose an ASU-based word model generation method by composing the ASU statistics, that is, their means, variances and durations. The effectiveness of the proposed method is shown through spontaneous word recognition experiments.",
author = "Toshiaki Fukada and Michiel Bacchiani and Paliwal, {Kuldip K.} and Yoshinori Sagisaka",
year = "1996",
language = "English",
volume = "2",
pages = "1077--1080",
editor = "Anon",
booktitle = "International Conference on Spoken Language Processing, ICSLP, Proceedings",
publisher = "IEEE",
address = "Piscataway, NJ, United States",
}

TY - GEN

T1 - Speech recognition based on acoustically derived segment units

AU - Fukada, Toshiaki

AU - Bacchiani, Michiel

AU - Paliwal, Kuldip K.

AU - Sagisaka, Yoshinori

PY - 1996

Y1 - 1996

N2 - This paper describes a new method of word model generation based on acoustically derived segment units (henceforth ASUs). An ASU-based approach has the advantages of growing out of human pre-determined phonemes and of consistently generating acoustic units by using the maximum likelihood (ML) criterion. The former advantage is effective when it is difficult to map acoustics to a phone such as with highly co-articulated spontaneous speech. In order to implement an ASU-based modeling approach in a speech recognition system, we must first solve two points: (1) How do we design an inventory of acoustically-derived segmental units and (2) How do we model the pronunciations of lexical entries in terms of the ASUs. As for the second question, we propose an ASU-based word model generation method by composing the ASU statistics, that is, their means, variances and durations. The effectiveness of the proposed method is shown through spontaneous word recognition experiments.

AB - This paper describes a new method of word model generation based on acoustically derived segment units (henceforth ASUs). An ASU-based approach has the advantages of growing out of human pre-determined phonemes and of consistently generating acoustic units by using the maximum likelihood (ML) criterion. The former advantage is effective when it is difficult to map acoustics to a phone such as with highly co-articulated spontaneous speech. In order to implement an ASU-based modeling approach in a speech recognition system, we must first solve two points: (1) How do we design an inventory of acoustically-derived segmental units and (2) How do we model the pronunciations of lexical entries in terms of the ASUs. As for the second question, we propose an ASU-based word model generation method by composing the ASU statistics, that is, their means, variances and durations. The effectiveness of the proposed method is shown through spontaneous word recognition experiments.

UR - http://www.scopus.com/inward/record.url?scp=0030369299&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030369299&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0030369299

VL - 2

SP - 1077

EP - 1080

BT - International Conference on Spoken Language Processing, ICSLP, Proceedings

A2 - Anon

PB - IEEE

CY - Piscataway, NJ, United States

ER -