TY - GEN
T1 - Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions
AU - Hirasawa, Yasuharu
AU - Takahashi, Toru
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2010
Y1 - 2010
N2 - In real-world situations, a robot may often encounter an "under-determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use a maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, the performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.
AB - In real-world situations, a robot may often encounter an "under-determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use a maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, the performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.
UR - http://www.scopus.com/inward/record.url?scp=78651494931&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651494931&partnerID=8YFLogxK
U2 - 10.1109/IROS.2010.5651078
DO - 10.1109/IROS.2010.5651078
M3 - Conference contribution
AN - SCOPUS:78651494931
SN - 9781424466757
T3 - IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
SP - 450
EP - 457
BT - IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
T2 - 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010
Y2 - 18 October 2010 through 22 October 2010
ER -