An improvement in automatic speech recognition using soft missing feature masks for robot audition

Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We describe the integration of preprocessing and automatic speech recognition (ASR) based on Missing-Feature Theory (MFT) to recognize highly interfered speech signals, such as speech when the desired and interfering speakers are separated by only a narrow angle. Because a speech signal separated from a mixture of speech signals still contains leakage from the other signals, recognition performance on the separated speech degrades. An important problem is therefore estimating this leakage in each time-frequency component. Once the leakage is estimated, our method generates missing feature masks (MFMs) automatically. We introduce a new weighted sigmoid function for this MFM generation. In an experiment, our MFM generation method, with its parameters tuned by a search-based approach over the parameter space, improved the word correct rate from 66% to 74%.
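The abstract outlines a pipeline: estimate the leakage in each time-frequency component, turn it into a soft mask with a weighted sigmoid, let an MFT-based recognizer weight each feature by its mask, and tune the mask parameters by searching the parameter space against the word correct rate. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The reliability measure `reliability_from_leakage`, the mask form `w / (1 + exp(-a (r - b)))` with a hard floor `theta`, the default parameter values, the mask-weighted Gaussian log-likelihood (a common soft-MFT formulation, swapped in here), and the exhaustive `grid_search` standing in for the paper's search-based tuning are all hypothetical choices made for illustration.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, Sequence, Tuple


def reliability_from_leakage(observed: float, leakage: float, eps: float = 1e-12) -> float:
    """HYPOTHETICAL reliability of one time-frequency component, clipped to [0, 1]:
    1.0 when no leakage is estimated, 0.0 when leakage explains all observed energy."""
    return min(1.0, max(0.0, 1.0 - leakage / (observed + eps)))


def soft_mask(r: float, w: float = 1.0, a: float = 40.0, b: float = 0.5,
              theta: float = 0.3) -> float:
    """ASSUMED weighted-sigmoid soft mask: 0 below the hard floor theta,
    otherwise w / (1 + exp(-a * (r - b))), so values lie in (0, w)."""
    if r < theta:
        return 0.0
    return w / (1.0 + math.exp(-a * (r - b)))


def masked_log_likelihood(x: Sequence[float], m: Sequence[float],
                          mean: Sequence[float], var: Sequence[float]) -> float:
    """Diagonal-Gaussian log-likelihood in which each feature dimension's
    contribution is scaled by its mask value (a common soft-MFT formulation)."""
    total = 0.0
    for xi, mi, mu, v in zip(x, m, mean, var):
        log_pdf = -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        total += mi * log_pdf  # mask 0 drops the dimension, mask 1 keeps it fully
    return total


def word_correct_rate(n: int, subs: int, dels: int) -> float:
    """Word correct rate (N - S - D) / N; insertions are not penalized."""
    return (n - subs - dels) / n


def grid_search(evaluate: Callable[[Dict[str, float]], float],
                grid: Dict[str, Iterable[float]]) -> Tuple[Dict[str, float], float]:
    """Exhaustively tries every combination in `grid` and keeps the first
    setting with the highest score returned by `evaluate`."""
    names = list(grid)
    best_params: Dict[str, float] = {}
    best_score = -math.inf
    for values in product(*(list(grid[k]) for k in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice, `evaluate` would be a hypothetical callable that runs the recognizer on a development set with masks built from the given parameters and returns its word correct rate. Exhaustive search is shown only because it is the simplest search-based tuner; for more than a few parameters a coarser grid or a coordinate-wise search would usually be cheaper.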

Original language: English
Title of host publication: IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
Pages: 964-969
Number of pages: 6
DOI: https://doi.org/10.1109/IROS.2010.5650540
Publication status: Published - 2010
Externally published: Yes
Event: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Taipei
Duration: 2010 Oct 18 to 2010 Oct 22

Other

Other: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010
City: Taipei
Period: 10/10/18 to 10/10/22

Fingerprint

Audition
Speech recognition
Masks
Robots
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Control and Systems Engineering

Cite this

Takahashi, T., Nakadai, K., Komatani, K., Ogata, T., & Okuno, H. G. (2010). An improvement in automatic speech recognition using soft missing feature masks for robot audition. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings (pp. 964-969). [5650540] https://doi.org/10.1109/IROS.2010.5650540

An improvement in automatic speech recognition using soft missing feature masks for robot audition. / Takahashi, Toru; Nakadai, Kazuhiro; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. p. 964-969 5650540.


Takahashi, T, Nakadai, K, Komatani, K, Ogata, T & Okuno, HG 2010, An improvement in automatic speech recognition using soft missing feature masks for robot audition. in IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings., 5650540, pp. 964-969, 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010, Taipei, 10/10/18. https://doi.org/10.1109/IROS.2010.5650540
Takahashi T, Nakadai K, Komatani K, Ogata T, Okuno HG. An improvement in automatic speech recognition using soft missing feature masks for robot audition. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. p. 964-969. 5650540 https://doi.org/10.1109/IROS.2010.5650540
Takahashi, Toru ; Nakadai, Kazuhiro ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G. / An improvement in automatic speech recognition using soft missing feature masks for robot audition. IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. pp. 964-969
@inproceedings{941900a8ba2d467ca831c4caca30b0bf,
title = "An improvement in automatic speech recognition using soft missing feature masks for robot audition",
abstract = "We describe the integration of preprocessing and automatic speech recognition (ASR) based on Missing-Feature Theory (MFT) to recognize highly interfered speech signals, such as speech when the desired and interfering speakers are separated by only a narrow angle. Because a speech signal separated from a mixture of speech signals still contains leakage from the other signals, recognition performance on the separated speech degrades. An important problem is therefore estimating this leakage in each time-frequency component. Once the leakage is estimated, our method generates missing feature masks (MFMs) automatically. We introduce a new weighted sigmoid function for this MFM generation. In an experiment, our MFM generation method, with its parameters tuned by a search-based approach over the parameter space, improved the word correct rate from 66{\%} to 74{\%}.",
author = "Toru Takahashi and Kazuhiro Nakadai and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2010",
doi = "10.1109/IROS.2010.5650540",
language = "English",
isbn = "9781424466757",
pages = "964--969",
booktitle = "IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings",

}

TY  - GEN
T1  - An improvement in automatic speech recognition using soft missing feature masks for robot audition
AU  - Takahashi, Toru
AU  - Nakadai, Kazuhiro
AU  - Komatani, Kazunori
AU  - Ogata, Tetsuya
AU  - Okuno, Hiroshi G.
PY  - 2010
Y1  - 2010
N2  - We describe the integration of preprocessing and automatic speech recognition (ASR) based on Missing-Feature Theory (MFT) to recognize highly interfered speech signals, such as speech when the desired and interfering speakers are separated by only a narrow angle. Because a speech signal separated from a mixture of speech signals still contains leakage from the other signals, recognition performance on the separated speech degrades. An important problem is therefore estimating this leakage in each time-frequency component. Once the leakage is estimated, our method generates missing feature masks (MFMs) automatically. We introduce a new weighted sigmoid function for this MFM generation. In an experiment, our MFM generation method, with its parameters tuned by a search-based approach over the parameter space, improved the word correct rate from 66% to 74%.
AB  - We describe the integration of preprocessing and automatic speech recognition (ASR) based on Missing-Feature Theory (MFT) to recognize highly interfered speech signals, such as speech when the desired and interfering speakers are separated by only a narrow angle. Because a speech signal separated from a mixture of speech signals still contains leakage from the other signals, recognition performance on the separated speech degrades. An important problem is therefore estimating this leakage in each time-frequency component. Once the leakage is estimated, our method generates missing feature masks (MFMs) automatically. We introduce a new weighted sigmoid function for this MFM generation. In an experiment, our MFM generation method, with its parameters tuned by a search-based approach over the parameter space, improved the word correct rate from 66% to 74%.
UR  - http://www.scopus.com/inward/record.url?scp=78651493797&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78651493797&partnerID=8YFLogxK
U2  - 10.1109/IROS.2010.5650540
DO  - 10.1109/IROS.2010.5650540
M3  - Conference contribution
SN  - 9781424466757
SP  - 964
EP  - 969
BT  - IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
ER  - 