Decoding network optimization using minimum transition error training

Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

5 Citations (Scopus)

Abstract

The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum mutual information (MMI) training technique is straightforwardly applicable for CRF training, several sophisticated training methods proposed as the variants of MMI can be incorporated in our decoding network optimization. This paper adapts the boosted MMI and the differenced MMI methods for decoding network optimization so that state transition errors are minimized in WFST decoding. We evaluated the proposed methods by conducting large-vocabulary continuous speech recognition experiments. We confirmed that the CRF-based framework and transition error minimization are efficient for improving the accuracy of automatic speech recognizers.
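
As a rough illustration of the objective family named in the abstract, the sketch below scores a handful of candidate paths through a decoding network with a linear (CRF-style) model and evaluates a boosted-MMI-style criterion in which competing paths are boosted by their number of transition errors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the per-arc weight dictionary, the explicit enumeration of equal-length paths (a real system would work over lattices), and the position-wise error count are all hypothetical choices made for illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(path, weights):
    """Linear score of a path: sum of the weights of its arcs (assumed model)."""
    return sum(weights[arc] for arc in path)


def transition_errors(path, reference):
    """Count positions where the path takes a different arc than the reference.

    Assumes both paths have the same length (illustrative simplification).
    """
    return sum(1 for a, r in zip(path, reference) if a != r)


def boosted_mmi(paths, reference, weights, boost=0.0):
    """Boosted-MMI-style objective for one utterance.

    With boost=0 this is the ordinary MMI criterion, i.e. the conditional
    log-likelihood of the reference path under the CRF. With boost>0,
    each competitor's score is raised by boost * (its transition errors),
    so paths with more errors are penalized more strongly during training.
    """
    numerator = path_score(reference, weights)
    denominator = log_sum_exp(
        [path_score(p, weights) + boost * transition_errors(p, reference)
         for p in paths]
    )
    return numerator - denominator


if __name__ == "__main__":
    # Hypothetical toy network: two candidate paths over arcs "a", "b", "c".
    weights = {"a": 0.0, "b": 0.0, "c": 0.0}
    reference = ("a", "b")
    paths = [("a", "b"), ("a", "c")]
    print(boosted_mmi(paths, reference, weights, boost=0.0))
    print(boosted_mmi(paths, reference, weights, boost=1.0))
```

Maximizing this quantity with respect to the arc weights (for example by gradient ascent) would push the reference path's score above those of competitors, with a larger margin demanded of competitors that contain more transition errors.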

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4197-4200
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2012.6288844
Publication status: Published - 2012
Externally published: Yes
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto
Duration: 2012 Mar 25 to 2012 Mar 30

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
City: Kyoto
Period: 12/3/25 to 12/3/30

Fingerprint

Decoding
Transducers
Continuous speech recognition
Speech recognition
Experiments

Keywords

  • Automatic speech recognition
  • conditional random fields
  • transition errors
  • weighted finite-state transducers

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Kubo, Y., Watanabe, S., & Nakamura, A. (2012). Decoding network optimization using minimum transition error training. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 4197-4200). [6288844] https://doi.org/10.1109/ICASSP.2012.6288844

@inproceedings{2e9817195105453eb4904d62e300d7f6,
title = "Decoding network optimization using minimum transition error training",
abstract = "The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum mutual information (MMI) training technique is straightforwardly applicable for CRF training, several sophisticated training methods proposed as the variants of MMI can be incorporated in our decoding network optimization. This paper adapts the boosted MMI and the differenced MMI methods for decoding network optimization so that state transition errors are minimized in WFST decoding. We evaluated the proposed methods by conducting large-vocabulary continuous speech recognition experiments. We confirmed that the CRF-based framework and transition error minimization are efficient for improving the accuracy of automatic speech recognizers.",
keywords = "Automatic speech recognition, conditional random fields, transition errors, weighted finite-state transducers",
author = "Yotaro Kubo and Shinji Watanabe and Atsushi Nakamura",
year = "2012",
doi = "10.1109/ICASSP.2012.6288844",
language = "English",
isbn = "9781467300469",
pages = "4197--4200",
booktitle = "2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings",

}

TY - GEN

T1 - Decoding network optimization using minimum transition error training

AU - Kubo, Yotaro

AU - Watanabe, Shinji

AU - Nakamura, Atsushi

PY - 2012

Y1 - 2012

AB - The discriminative optimization of decoding networks is important for minimizing speech recognition error. Recently, several methods have been reported that optimize decoding networks by extending weighted finite state transducer (WFST)-based decoding processes to a linear classification process. In this paper, we model decoding processes by using conditional random fields (CRFs). Since the maximum mutual information (MMI) training technique is straightforwardly applicable for CRF training, several sophisticated training methods proposed as the variants of MMI can be incorporated in our decoding network optimization. This paper adapts the boosted MMI and the differenced MMI methods for decoding network optimization so that state transition errors are minimized in WFST decoding. We evaluated the proposed methods by conducting large-vocabulary continuous speech recognition experiments. We confirmed that the CRF-based framework and transition error minimization are efficient for improving the accuracy of automatic speech recognizers.

KW - Automatic speech recognition

KW - conditional random fields

KW - transition errors

KW - weighted finite-state transducers

UR - http://www.scopus.com/inward/record.url?scp=84865213310&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865213310&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2012.6288844

DO - 10.1109/ICASSP.2012.6288844

M3 - Conference contribution

AN - SCOPUS:84865213310

SN - 9781467300469

SP - 4197

EP - 4200

BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings

ER -