Structured discriminative models for speech recognition: An overview

Mark John Francis Gales, Shinji Watanabe, Eric Fosler-Lussier

Research output: Contribution to journal › Review article

21 Citations (Scopus)

Abstract

Automatic speech recognition (ASR) systems classify structured sequence data, where the label sequences (sentences) must be inferred from the observation sequences (the acoustic waveform). The sequential nature of the task is one of the reasons why generative classifiers, which combine hidden Markov model (HMM) acoustic models and N-gram language models using Bayes' rule, have become the dominant technology in ASR. In contrast, the machine learning and natural language processing (NLP) research areas are increasingly dominated by discriminative approaches, where the class posteriors are modeled directly. This article describes recent work in the area of structured discriminative models for ASR. To handle continuous, variable-length observation sequences, the approaches applied to NLP tasks must be modified. This article discusses a variety of approaches for applying structured discriminative models to ASR, both from the current literature and possible future approaches. We concentrate on the structured models themselves, the descriptive features of observations commonly used within the models, and various options for optimizing the parameters of the model.
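
As a rough illustration of the contrast drawn in the abstract, the two modeling styles can be written side by side. The notation is an assumption made here for illustration and is not taken from the article: O_{1:T} denotes the observation sequence, w a candidate sentence, \phi a hypothetical joint feature function, and \alpha its parameter vector. The log-linear posterior is only one common discriminative form, shown as an example.

```latex
% Generative decoding: HMM acoustic likelihood p(O|w) combined with the
% N-gram language model prior P(w) via Bayes' rule. The evidence p(O)
% does not depend on w, so it drops out of the arg max.
\hat{w} = \arg\max_{w} P(w \mid O_{1:T})
        = \arg\max_{w} \frac{p(O_{1:T} \mid w)\, P(w)}{p(O_{1:T})}
        = \arg\max_{w} p(O_{1:T} \mid w)\, P(w)

% Discriminative modeling: the sentence posterior is parameterized directly.
% Assumed log-linear form with hypothetical features \phi and weights \alpha.
P(w \mid O_{1:T}; \alpha)
  = \frac{\exp\!\big(\alpha^{\top} \phi(O_{1:T}, w)\big)}
         {\sum_{w'} \exp\!\big(\alpha^{\top} \phi(O_{1:T}, w')\big)}
```

In the second form the features \phi must map a continuous, variable-length observation sequence to a fixed-dimensional vector, which is the difficulty the abstract points to when adapting NLP approaches to ASR.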

Original language: English
Article number: 6296527
Pages (from-to): 70-81
Number of pages: 12
Journal: IEEE Signal Processing Magazine
Volume: 29
Issue number: 6
DOIs: 10.1109/MSP.2012.2207140
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Speech Recognition
Speech recognition
Automatic Speech Recognition
Natural Language
Bayes Rule
Acoustic Model
N-gram
Model
Language Model
Continuous Variables
Acoustics
Waveform
Markov Model
Machine Learning
Classify
Classifier
Hidden Markov models
Processing
Learning systems
Labels

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

Structured discriminative models for speech recognition: An overview. / Gales, Mark John Francis; Watanabe, Shinji; Fosler-Lussier, Eric.

In: IEEE Signal Processing Magazine, Vol. 29, No. 6, 6296527, 2012, p. 70-81.

Gales, Mark John Francis; Watanabe, Shinji; Fosler-Lussier, Eric. / Structured discriminative models for speech recognition: An overview. In: IEEE Signal Processing Magazine. 2012; Vol. 29, No. 6. pp. 70-81.
@article{3c559c6115834cb6add5fcb06902faf2,
title = "Structured discriminative models for speech recognition: An overview",
author = "Gales, {Mark John Francis} and Shinji Watanabe and Eric Fosler-Lussier",
year = "2012",
doi = "10.1109/MSP.2012.2207140",
language = "English",
volume = "29",
pages = "70--81",
journal = "IEEE Signal Processing Magazine",
issn = "1053-5888",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - Structured discriminative models for speech recognition

T2 - An overview

AU - Gales, Mark John Francis

AU - Watanabe, Shinji

AU - Fosler-Lussier, Eric

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=85032751545&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032751545&partnerID=8YFLogxK

U2 - 10.1109/MSP.2012.2207140

DO - 10.1109/MSP.2012.2207140

M3 - Review article

AN - SCOPUS:85032751545

VL - 29

SP - 70

EP - 81

JO - IEEE Signal Processing Magazine

JF - IEEE Signal Processing Magazine

SN - 1053-5888

IS - 6

M1 - 6296527

ER -