Design of a speech recognition system based on acoustically derived segmental units

M. Bacchiani, M. Ostendorf, Yoshinori Sagisaka, K. Paliwal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

The design of a speech recognition system based on acoustically derived segmental units can be divided into three steps: unit design, lexicon building, and pronunciation modeling. We formulate an iterative unit design procedure that consistently uses a maximum likelihood (ML) objective in successive application of resegmentation and model re-estimation. Lexicon building allows multi-word entries in the lexicon but restricts their number to avoid an overly costly search. The selected multi-word lexical entries are those with high frequency (such as function words) and those that consistently exhibit cross-word phone assimilation. The stochastic pronunciation model represents the likelihood of a particular acoustic segment sequence given the phonetic baseform of a lexical item, where the sequence of baseform phones is treated as a Markov state sequence and each state can emit multiple segments.

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: IEEE
Pages: 443-446
Number of pages: 4
Volume: 1
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP. Part 1 (of 6) - Atlanta, GA, USA
Duration: 1996 May 7 to 1996 May 10

Other

Other: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP. Part 1 (of 6)
City: Atlanta, GA, USA
Period: 96/5/7 to 96/5/10

Fingerprint

Speech recognition
entry
Speech analysis
Stochastic models
phonetics
Maximum likelihood
assimilation
Acoustics

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Bacchiani, M., Ostendorf, M., Sagisaka, Y., & Paliwal, K. (1996). Design of a speech recognition system based on acoustically derived segmental units. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 1, pp. 443-446). IEEE.

@inproceedings{ca05c4ccfeb94a9d9218915a3e2d8082,
title = "Design of a speech recognition system based on acoustically derived segmental units",
abstract = "The design of a speech recognition system based on acoustically derived segmental units can be divided into three steps: unit design, lexicon building, and pronunciation modeling. We formulate an iterative unit design procedure that consistently uses a maximum likelihood (ML) objective in successive application of resegmentation and model re-estimation. Lexicon building allows multi-word entries in the lexicon but restricts their number to avoid an overly costly search. The selected multi-word lexical entries are those with high frequency (such as function words) and those that consistently exhibit cross-word phone assimilation. The stochastic pronunciation model represents the likelihood of a particular acoustic segment sequence given the phonetic baseform of a lexical item, where the sequence of baseform phones is treated as a Markov state sequence and each state can emit multiple segments.",
author = "M. Bacchiani and M. Ostendorf and Yoshinori Sagisaka and K. Paliwal",
year = "1996",
language = "English",
volume = "1",
pages = "443--446",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "IEEE",
}

TY  - GEN
T1  - Design of a speech recognition system based on acoustically derived segmental units
AU  - Bacchiani, M.
AU  - Ostendorf, M.
AU  - Sagisaka, Yoshinori
AU  - Paliwal, K.
PY  - 1996
Y1  - 1996
AB  - The design of a speech recognition system based on acoustically derived segmental units can be divided into three steps: unit design, lexicon building, and pronunciation modeling. We formulate an iterative unit design procedure that consistently uses a maximum likelihood (ML) objective in successive application of resegmentation and model re-estimation. Lexicon building allows multi-word entries in the lexicon but restricts their number to avoid an overly costly search. The selected multi-word lexical entries are those with high frequency (such as function words) and those that consistently exhibit cross-word phone assimilation. The stochastic pronunciation model represents the likelihood of a particular acoustic segment sequence given the phonetic baseform of a lexical item, where the sequence of baseform phones is treated as a Markov state sequence and each state can emit multiple segments.
UR  - http://www.scopus.com/inward/record.url?scp=0029725372&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0029725372&partnerID=8YFLogxK
M3  - Conference contribution
VL  - 1
SP  - 443
EP  - 446
BT  - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
PB  - IEEE
ER  -