Bag of ARCS

New representation of speech segment features based on finite state machines

Shinji Watanabe, Yotaro Kubo, Takanobu Oba, Takaaki Hori, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper proposes Bag Of Arcs (BOA), a new feature representation for speech segments. In BOA, a speech segment is represented simply as a set of counts of the unique arcs in a finite state machine. Like the Bag Of Words (BOW) model, BOA disregards the order of arcs and thus models speech segments efficiently. A strong motivation for BOA is that the representation is tightly connected to the output of a Weighted Finite State Transducer (WFST) based ASR decoder. BOA therefore directly represents elements of the decoder's search network and can include information about context-dependent HMM topologies, lexicons, and back-off smoothed n-gram networks. In addition, the BOA counts are accumulated directly from the WFST decoder output, so extracting the features requires neither additional overhead nor changes to the decoding algorithm. Consequently, the ASR decoder and post-processing can be combined without a separate step to extract word features from the decoder outputs and without re-compiling the WFST networks. We show the effectiveness of the proposed approach for ASR post-processing applications in utterance classification experiments and in speaker adaptation experiments, achieving a 1% absolute improvement in WER over the baseline results. We also show examples of latent semantic analysis of BOA using latent Dirichlet allocation.
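The counting step described in the abstract can be illustrated with a minimal sketch. It assumes the decoder output is already available as a sequence of integer arc IDs along the best path; the function names `bag_of_arcs` and `to_vector`, the arc IDs, and the vocabulary size of 16 are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def bag_of_arcs(arc_path: Iterable[Hashable]) -> Counter:
    """Count how often each unique arc occurs, ignoring arc order."""
    return Counter(arc_path)


def to_vector(bag: Counter, arc_vocab: Sequence[Hashable]) -> List[int]:
    """Lay the counts out as a fixed-length vector indexed by arc_vocab."""
    return [bag[arc] for arc in arc_vocab]


# Hypothetical arc IDs along the best path of one decoded speech segment.
path_a = [3, 7, 7, 7, 12, 3, 5]
# The same arcs visited in a different order produce the same bag.
path_b = [7, 3, 5, 7, 12, 7, 3]

bag_a = bag_of_arcs(path_a)
bag_b = bag_of_arcs(path_b)

# Assumed toy search network with 16 arcs, numbered 0..15.
vector_a = to_vector(bag_a, arc_vocab=range(16))
```

Because only counts are kept, `bag_a` and `bag_b` compare equal, which is the order-insensitivity that BOA shares with BOW. The fixed-length vector is one plausible input form for a downstream classifier or topic model.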

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 4201-4204
Number of pages: 4
DOIs: https://doi.org/10.1109/ICASSP.2012.6288845
Publication status: Published - 2012
Externally published: Yes
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto
Duration: 2012 Mar 25 to 2012 Mar 30

Other

Other: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
City: Kyoto
Period: 12/3/25 to 12/3/30

Fingerprint

Finite automata
Transducers
Processing
Decoding
Experiments
Semantics
Topology

Keywords

  • Bag Of Arcs (BOA)
  • finite state machine
  • speaker recognition
  • Speech segment feature
  • utterance classification

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Watanabe, S., Kubo, Y., Oba, T., Hori, T., & Nakamura, A. (2012). Bag of ARCS: New representation of speech segment features based on finite state machines. In 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings (pp. 4201-4204). [6288845] https://doi.org/10.1109/ICASSP.2012.6288845

@inproceedings{41f0bf09f7084612b89205928142b86a,
title = "Bag of ARCS: New representation of speech segment features based on finite state machines",
keywords = "Bag Of Arcs (BOA), finite state machine, speaker recognition, Speech segment feature, utterance classification",
author = "Shinji Watanabe and Yotaro Kubo and Takanobu Oba and Takaaki Hori and Atsushi Nakamura",
year = "2012",
doi = "10.1109/ICASSP.2012.6288845",
language = "English",
isbn = "9781467300469",
pages = "4201--4204",
booktitle = "2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings",

}

TY - GEN

T1 - Bag of ARCS

T2 - New representation of speech segment features based on finite state machines

AU - Watanabe, Shinji

AU - Kubo, Yotaro

AU - Oba, Takanobu

AU - Hori, Takaaki

AU - Nakamura, Atsushi

PY - 2012

Y1 - 2012

KW - Bag Of Arcs (BOA)

KW - finite state machine

KW - speaker recognition

KW - Speech segment feature

KW - utterance classification

UR - http://www.scopus.com/inward/record.url?scp=84867602643&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867602643&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2012.6288845

DO - 10.1109/ICASSP.2012.6288845

M3 - Conference contribution

SN - 9781467300469

SP - 4201

EP - 4204

BT - 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings

ER -