Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures

Yasuharu Hirasawa, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds the number of sensors, the situation is called under-determined, and robots with two ears need to deal with it. Some studies on under-determined sound source separation use L1-norm minimization methods, but automatic speech recognition performs poorly on the separated speech signals because of spectral distortion. This paper presents a two-stage separation method that improves separation quality at low computational cost. The first stage uses an L1-norm minimization method to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.
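
To illustrate what L1-norm minimization means in the under-determined case of two microphones and three sources, the following is a minimal sketch and not the authors' implementation. It assumes the mixing columns (one per source) are already known for a single time-frequency bin, and it uses a combinatorial shortcut: try every pair of sources, solve the resulting 2x2 system exactly, and keep the pair with the smallest L1 norm. For real-valued mixtures this enumeration finds a minimum-L1 solution; for complex-valued spectra it may only approximate one. The function names `solve_2x2` and `min_l1_separation` and the mixing values in the usage note are hypothetical.

```python
from itertools import combinations


def solve_2x2(a, b, x):
    """Solve s1*a + s2*b = x for two 2-element columns a and b (Cramer's rule).

    Returns None when the two columns are (nearly) parallel.
    """
    det = a[0] * b[1] - a[1] * b[0]
    if abs(det) < 1e-12:
        return None
    s1 = (x[0] * b[1] - x[1] * b[0]) / det
    s2 = (a[0] * x[1] - a[1] * x[0]) / det
    return s1, s2


def min_l1_separation(mixing, x):
    """Estimate N source values from 2 observations in one time-frequency bin.

    mixing: list of N columns, each a (left, right) pair (assumed known).
    x:      the two observed microphone values.
    Returns a list of N estimates in which at most two entries are non-zero,
    chosen so that the sum of absolute values is smallest among all pairs.
    """
    n = len(mixing)
    best, best_cost = None, float("inf")
    for i, j in combinations(range(n), 2):
        solution = solve_2x2(mixing[i], mixing[j], x)
        if solution is None:
            continue  # this pair cannot explain the observation uniquely
        cost = abs(solution[0]) + abs(solution[1])  # L1 norm of the candidate
        if cost < best_cost:
            estimate = [0.0] * n
            estimate[i], estimate[j] = solution
            best, best_cost = estimate, cost
    return best
```

With the hypothetical mixing columns `[(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]` and the observation `(5.0, 3.0)`, the call `min_l1_separation(...)` returns `[2.0, 0.0, 3.0]`: the pair made of the first and third sources explains the observation with the smallest L1 norm (5, versus 8 and 7 for the other pairs). Forcing all but two sources to zero in each bin is one plausible source of the spectral distortion the abstract mentions.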

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 348-358
Number of pages: 11
Volume: 6703 LNAI
Edition: PART 1
DOIs: https://doi.org/10.1007/978-3-642-21822-4_35
Publication status: Published - 2011
Externally published: Yes
Event: 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2011 - Syracuse, NY
Duration: 2011 Jun 28 to 2011 Jul 1

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 6703 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2011
City: Syracuse, NY
Period: 11/6/28 to 11/7/1

Fingerprint

Harmonic
Robot
Acoustic waves
Robots
Speech recognition
L1-norm
Source separation
Anechoic chambers
Microphones
Source Separation
Automatic Speech Recognition
Speech Signal
Speech Recognition
Acoustics
Computational Cost
Correctness
Exceed
Sensors
Real-time
Sensor

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Hirasawa, Y., Takahashi, T., Ogata, T., & Okuno, H. G. (2011). Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (PART 1 ed., Vol. 6703 LNAI, pp. 348-358). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6703 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-21822-4_35

Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures. / Hirasawa, Yasuharu; Takahashi, Toru; Ogata, Tetsuya; Okuno, Hiroshi G.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6703 LNAI PART 1. ed. 2011. p. 348-358 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6703 LNAI, No. PART 1).

Hirasawa, Y, Takahashi, T, Ogata, T & Okuno, HG 2011, Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 1 edn, vol. 6703 LNAI, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 6703 LNAI, pp. 348-358, 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2011, Syracuse, NY, 11/6/28. https://doi.org/10.1007/978-3-642-21822-4_35
Hirasawa Y, Takahashi T, Ogata T, Okuno HG. Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). PART 1 ed. Vol. 6703 LNAI. 2011. p. 348-358. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1). https://doi.org/10.1007/978-3-642-21822-4_35
Hirasawa, Yasuharu ; Takahashi, Toru ; Ogata, Tetsuya ; Okuno, Hiroshi G. / Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 6703 LNAI PART 1. ed. 2011. pp. 348-358 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); PART 1).
@inproceedings{50514075446b48d58a4698b4aad0c085,
title = "Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures",
abstract = "In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with separated speech signals is poor due to its spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses a L1-norm minimization method in order to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.",
author = "Yasuharu Hirasawa and Toru Takahashi and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2011",
doi = "10.1007/978-3-642-21822-4_35",
language = "English",
isbn = "9783642218217",
volume = "6703 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
number = "PART 1",
pages = "348--358",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
edition = "PART 1",

}

TY - GEN

T1 - Robot with two ears listens to more than two simultaneous utterances by exploiting harmonic structures

AU - Hirasawa, Yasuharu

AU - Takahashi, Toru

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2011

Y1 - 2011

N2 - In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with separated speech signals is poor due to its spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses a L1-norm minimization method in order to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.

AB - In real-world situations, people often hear more than two simultaneous sounds. For robots, when the number of sound sources exceeds that of sensors, the situation is called under-determined, and robots with two ears need to deal with this situation. Some studies on under-determined sound source separation use L1-norm minimization methods, but the performance of automatic speech recognition with separated speech signals is poor due to its spectral distortion. In this paper, a two-stage separation method to improve separation quality with low computational cost is presented. The first stage uses a L1-norm minimization method in order to extract the harmonic structures. The second stage exploits reliable harmonic structures to maintain acoustic features. Experiments that simulate three utterances recorded by two microphones in an anechoic chamber show that our method improves speech recognition correctness by about three points and is fast enough for real-time separation.

UR - http://www.scopus.com/inward/record.url?scp=79960496413&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79960496413&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-21822-4_35

DO - 10.1007/978-3-642-21822-4_35

M3 - Conference contribution

AN - SCOPUS:79960496413

SN - 9783642218217

VL - 6703 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 348

EP - 358

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -