Soft missing-feature mask generation for simultaneous speech recognition system in robots

Toru Takahashi, Shun'ichi Yamamoto, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

This paper addresses automatic soft missing-feature mask (MFM) generation based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation of the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended to a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. Averaged over all directions, the recognition rate with the soft MFM improved by about 5% over a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.
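
The contrast between a hard (zero-or-one) mask and a soft (sigmoid) mask can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the two free parameters are taken to be a slope and a center, the `weight` argument stands in for the contribution ratio, and the per-bin reliability scores are made-up values rather than leak energy estimates from the paper.

```python
import math


def hard_mask(reliability, threshold):
    """Zero-or-one mask: 1.0 where reliability exceeds the threshold, else 0.0."""
    return [1.0 if r > threshold else 0.0 for r in reliability]


def soft_mask(reliability, slope, center, weight=1.0):
    """Sigmoid mask scaled by `weight`.

    `slope` and `center` are hypothetical names for the two free parameters;
    `weight` is a hypothetical stand-in for a contribution ratio.
    """
    return [weight / (1.0 + math.exp(-slope * (r - center))) for r in reliability]


# Illustrative reliability scores for three frequency bins (not real data).
reliability = [0.1, 0.5, 0.9]

hard = hard_mask(reliability, threshold=0.5)
soft = soft_mask(reliability, slope=10.0, center=0.5)
scaled = soft_mask(reliability, slope=10.0, center=0.5, weight=0.5)
```

In this sketch the hard mask discards any bin at or below the threshold, while the soft mask assigns intermediate weights that grow smoothly with reliability, so borderline bins still contribute to the probability calculation.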

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 992-995
Number of pages: 4
Publication status: Published - 2008
Externally published: Yes
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 2008 Sep 22 to 2008 Sep 26

Other

Other: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Country: Australia
City: Brisbane, QLD
Period: 08/9/22 to 08/9/26

Keywords

  • Missing feature theory
  • Robot audition
  • Simultaneous speech recognition
  • Soft mask
  • Speech recognition

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

Takahashi, T., Yamamoto, S., Nakadai, K., Komatani, K., Ogata, T., & Okuno, H. G. (2008). Soft missing-feature mask generation for simultaneous speech recognition system in robots. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 992-995)

@inproceedings{f07d313cca4d403fb6b62f7ed83747c5,
title = "Soft missing-feature mask generation for simultaneous speech recognition system in robots",
abstract = "This paper addresses automatic soft missing-feature mask (MFM) generation based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation of the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended to a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. Averaged over all directions, the recognition rate with the soft MFM improved by about 5{\%} over a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80{\%} for peripheral talkers and from 93 to 97{\%} for front speech when speakers were 90 degrees apart.",
keywords = "Missing feature theory, Robot audition, Simultaneous speech recognition, Soft mask, Speech recognition",
author = "Toru Takahashi and Shun'ichi Yamamoto and Kazuhiro Nakadai and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2008",
language = "English",
pages = "992--995",
booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

}

TY - GEN

T1 - Soft missing-feature mask generation for simultaneous speech recognition system in robots

AU - Takahashi, Toru

AU - Yamamoto, Shun'ichi

AU - Nakadai, Kazuhiro

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2008

Y1 - 2008

N2 - This paper addresses automatic soft missing-feature mask (MFM) generation based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight in the probability calculation of the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. Here, that function is extended to a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. Averaged over all directions, the recognition rate with the soft MFM improved by about 5% over a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.

KW - Missing feature theory

KW - Robot audition

KW - Simultaneous speech recognition

KW - Soft mask

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=84867201614&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867201614&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84867201614

SP - 992

EP - 995

BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

ER -