Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions

Yasuharu Hirasawa, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Conference contribution

Abstract

In real-world situations, a robot may often encounter an "under-determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) the ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, however, its performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.

Original language: English
Title of host publication: IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
Pages: 450-457
Number of pages: 8
DOI: https://doi.org/10.1109/IROS.2010.5651078
Publication status: Published - 2010
Externally published: Yes
Event: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Taipei
Duration: 2010-10-18 to 2010-10-22

Fingerprint

Speech recognition
Maximum likelihood estimation
Microphones
Acoustic waves
Source separation
Anechoic chambers
Impulse response
Acoustics
Robots
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Control and Systems Engineering

Cite this

Hirasawa, Y., Takahashi, T., Komatani, K., Ogata, T., & Okuno, H. G. (2010). Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings (pp. 450-457). [5651078] https://doi.org/10.1109/IROS.2010.5651078

Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions. / Hirasawa, Yasuharu; Takahashi, Toru; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. p. 450-457 5651078.

Hirasawa, Y, Takahashi, T, Komatani, K, Ogata, T & Okuno, HG 2010, Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings, 5651078, pp. 450-457, 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010, Taipei, 18 October 2010. https://doi.org/10.1109/IROS.2010.5651078
Hirasawa Y, Takahashi T, Komatani K, Ogata T, Okuno HG. Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. p. 450-457. 5651078 https://doi.org/10.1109/IROS.2010.5651078
Hirasawa, Yasuharu ; Takahashi, Toru ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G. / Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions. IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings. 2010. pp. 450-457
@inproceedings{30056d28a60442819108f369fa74e531,
title = "Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions",
abstract = "In real-world situations, a robot may often encounter an {"}under-determined{"} situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) the ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, however, its performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.",
author = "Yasuharu Hirasawa and Toru Takahashi and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2010",
doi = "10.1109/IROS.2010.5651078",
language = "English",
isbn = "9781424466757",
pages = "450--457",
booktitle = "IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings",

}

TY  - GEN
T1  - Exploiting harmonic structures to improve separating simultaneous speech in under-determined conditions
AU  - Hirasawa, Yasuharu
AU  - Takahashi, Toru
AU  - Komatani, Kazunori
AU  - Ogata, Tetsuya
AU  - Okuno, Hiroshi G.
PY  - 2010
Y1  - 2010
N2  - In real-world situations, a robot may often encounter an "under-determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) the ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, however, its performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.
AB  - In real-world situations, a robot may often encounter an "under-determined" situation, where there are more sound sources than microphones. This paper presents a speech separation method using a new constraint on the harmonic structure for a simultaneous speech-recognition system in under-determined conditions. The requirements for a speech separation method in a simultaneous speech-recognition system are (1) the ability to handle a large number of talkers, and (2) reduction of distortion in acoustic features. Conventional methods use maximum likelihood estimation in sound source separation, which fulfills requirement (1). Since it is a general approach, however, its performance is limited when separating speech. This paper presents a two-stage method to improve the separation. The first stage uses maximum likelihood estimation and extracts the harmonic structure, and the second stage exploits the harmonic structure as a new constraint to achieve requirement (2). We carried out an experiment that simulated three simultaneous utterances using impulse responses recorded by two microphones in an anechoic chamber. The experimental results revealed that our method could improve speech recognition correctness by about four points.
UR  - http://www.scopus.com/inward/record.url?scp=78651494931&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78651494931&partnerID=8YFLogxK
U2  - 10.1109/IROS.2010.5651078
DO  - 10.1109/IROS.2010.5651078
M3  - Conference contribution
AN  - SCOPUS:78651494931
SN  - 9781424466757
SP  - 450
EP  - 457
BT  - IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
ER  -