Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data

Shinji Watanabe, Takaaki Hori, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models becomes very large in proportion to the number of arcs in a WFST, and our previous study only applied it to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over the minimum phone error based discriminative training of acoustic models.
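The distributed perceptron training mentioned in the abstract can be illustrated with a rough sketch (hypothetical and simplified — this is not the paper's actual decoder, feature set, or WFST arc weights): the training data is split into shards, each shard runs an ordinary structured-perceptron pass locally, and the per-shard weight vectors are averaged before the next round (iterative parameter mixing). Here each training example is reduced to a small set of candidate feature vectors standing in for paths through a decoding graph.

```python
import numpy as np

def decode(w, candidates):
    """Return the index of the highest-scoring candidate feature vector."""
    scores = [w @ f for f in candidates]
    return int(np.argmax(scores))

def local_epoch(w, shard):
    """One perceptron pass over a data shard; returns the updated weights."""
    w = w.copy()
    for candidates, gold in shard:
        pred = decode(w, candidates)
        if pred != gold:
            # Standard structured-perceptron update: move weights toward the
            # reference path's features and away from the wrongly chosen path.
            w += candidates[gold] - candidates[pred]
    return w

def distributed_perceptron(data, dim, n_shards=2, epochs=5):
    """Iterative parameter mixing: train shards independently, average weights."""
    w = np.zeros(dim)
    shards = [data[i::n_shards] for i in range(n_shards)]
    for _ in range(epochs):
        # Each shard could run on a separate worker; here we loop sequentially.
        w = np.mean([local_epoch(w, s) for s in shards], axis=0)
    return w
```

In the actual large-vocabulary setting the "candidates" correspond to competing paths through the decoding graph and the feature dimension grows with the number of WFST arcs, which is exactly why the training has to be distributed.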

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Pages: 346-349
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba
Duration: 2010 Sep 26 – 2010 Sep 30

Other

Other: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
City: Makuhari, Chiba
Period: 10/9/26 – 10/9/30

Keywords

  • Distributed perceptron
  • Large vocabulary continuous speech recognition
  • Linear classifier
  • Speech recognition
  • Weighted finite state transducer

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Watanabe, S., Hori, T., & Nakamura, A. (2010). Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 346-349)

@inproceedings{00432b1bbc944e2eb1f4cf131b7e48d6,
title = "Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data",
abstract = "This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models becomes very large in proportion to the number of arcs in a WFST, and our previous study only applied it to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over the minimum phone error based discriminative training of acoustic models.",
keywords = "Distributed perceptron, Large vocabulary continuous speech recognition, Linear classifier, Speech recognition, Weighted finite state transducer",
author = "Shinji Watanabe and Takaaki Hori and Atsushi Nakamura",
year = "2010",
language = "English",
pages = "346--349",
booktitle = "Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010",

}

TY - GEN

T1 - Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data

AU - Watanabe, Shinji

AU - Hori, Takaaki

AU - Nakamura, Atsushi

PY - 2010

Y1 - 2010

N2 - This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models becomes very large in proportion to the number of arcs in a WFST, and our previous study only applied it to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over the minimum phone error based discriminative training of acoustic models.

AB - This paper describes a discriminative approach that further advances the framework for Weighted Finite State Transducer (WFST) based decoding. The approach introduces additional linear models for adjusting the scores of a decoding graph composed of conventional information source models (e.g., hidden Markov models and N-gram models), and recasts the WFST-based decoding process as a linear classifier for structured data (e.g., sequential multiclass data). The difficulty with the approach is that the number of dimensions of the additional linear models becomes very large in proportion to the number of arcs in a WFST, and our previous study only applied it to a small task (TIMIT phoneme recognition). This paper proposes a training method for a large-scale linear classifier employed in WFST-based decoding by using a distributed perceptron algorithm. The experimental results show that the proposed approach was successfully applied to a large vocabulary continuous speech recognition task, and achieved an improvement over the minimum phone error based discriminative training of acoustic models.

KW - Distributed perceptron

KW - Large vocabulary continuous speech recognition

KW - Linear classifier

KW - Speech recognition

KW - Weighted finite state transducer

UR - http://www.scopus.com/inward/record.url?scp=79959846027&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959846027&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:79959846027

SP - 346

EP - 349

BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

ER -