TY - JOUR
T1 - Soft missing-feature mask generation for simultaneous speech recognition system in robots
AU - Takahashi, Toru
AU - Yamamoto, Shun'ichi
AU - Nakadai, Kazuhiro
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2008/12/1
Y1 - 2008/12/1
N2 - This paper addresses automatic soft missing-feature mask (MFM) generation based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight for probability calculation in the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. That function is extended into a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. The average recognition rate with a soft MFM improved by about 5% across all directions over a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.
AB - This paper addresses automatic soft missing-feature mask (MFM) generation based on leak energy estimation for a simultaneous speech recognition system. An MFM is used as a weight for probability calculation in the recognition process. In previous work, a threshold-based zero-or-one function was applied to decide whether the spectral parameter in each frequency bin is reliable. That function is extended into a weighted sigmoid function with two free parameters. In addition, a contribution ratio of static features is introduced for the probability calculation in a recognition process that takes both static and dynamic features as input. The ratio can be implemented as part of the soft mask. The average recognition rate with a soft MFM improved by about 5% across all directions over a conventional system based on a hard MFM. Word recognition rates improved from 70 to 80% for peripheral talkers and from 93 to 97% for front speech when speakers were 90 degrees apart.
KW - Missing feature theory
KW - Robot audition
KW - Simultaneous speech recognition
KW - Soft mask
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84867201614&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867201614&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84867201614
SN - 2308-457X
SP - 992
EP - 995
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Y2 - 22 September 2008 through 26 September 2008
ER -