Genetic algorithm-based improvement of robot hearing capabilities in separating and recognizing simultaneous speech signals

Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean Marc Valin, Ryu Takeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Since a robot usually hears a mixture of sounds, in particular simultaneous speech signals, it should be able to localize, separate, and recognize each speech signal. Because separated speech signals suffer from spectral distortion, conventional automatic speech recognition (ASR) may fail to recognize such distorted signals. Yamamoto et al. proposed using the Missing Feature Theory to mask corrupted features in ASR, and developed the automatic missing-feature-mask generation (AMG) system using information obtained from sound source separation (SSS). Our evaluations of the system's recognition performance indicate that it can be improved by optimizing many of its parameters. We used genetic algorithms to optimize these parameters: each chromosome consists of a set of parameters for SSS and AMG, and each chromosome is evaluated by the recognition rate of the separated sounds. We obtained an optimized set of parameters for each distance (from 50 cm to 250 cm in 50 cm steps) and direction (30, 60, and 90 degree intervals) for two simultaneous speech signals. The average isolated word recognition rates ranged from 84.9% to 94.7%.
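
The abstract summarizes the search setup: each chromosome of the genetic algorithm encodes one candidate set of SSS and AMG parameters, and its fitness is the word recognition rate measured on the separated speech. The Python sketch below is not from the paper; it only illustrates that loop, with hypothetical parameter names and ranges and a placeholder fitness function standing in for the real separation, mask generation, and missing-feature ASR pipeline.

import random

# Hypothetical SSS/AMG parameter ranges (illustrative only); the actual
# parameters and their ranges are defined in the paper, not reproduced here.
PARAM_RANGES = {
    "sss_adaptation_rate":    (0.0, 1.0),
    "sss_reverb_decay":       (0.0, 1.0),
    "amg_mask_threshold":     (0.0, 1.0),
    "amg_reliability_weight": (0.0, 2.0),
}

def random_chromosome():
    """One chromosome = one candidate parameter set for SSS and AMG."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def fitness(chrom):
    """Placeholder. In the described system this would run separation,
    automatic mask generation, and missing-feature ASR on recorded
    simultaneous speech, and return the isolated word recognition rate."""
    return sum(chrom.values()) / len(chrom)

def crossover(a, b):
    """Uniform crossover: each gene is taken from either parent."""
    return {k: random.choice((a[k], b[k])) for k in PARAM_RANGES}

def mutate(chrom, rate=0.1):
    """Re-draw each gene with a small probability."""
    return {k: (random.uniform(*PARAM_RANGES[k]) if random.random() < rate else v)
            for k, v in chrom.items()}

def optimize(pop_size=20, generations=50):
    """Simple generational GA with truncation selection."""
    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    print("best SSS/AMG parameter set:", optimize())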

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 207-217
Number of pages: 11
Volume: 4031 LNAI
ISBN (Print): 3540354530, 9783540354536
Publication status: Published - 2006
Externally published: Yes
Event: 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006 - Annecy
Duration: 2006 Jun 27 - 2006 Jun 30

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4031 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006
City: Annecy
Period: 06/6/27 - 06/6/30

Keywords

  • Microphone array
  • Robot audition
  • Robot-human interaction
  • Simultaneous speakers
  • Sound source separation
  • Speech recognition

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Yamamoto, S., Nakadai, K., Nakano, M., Tsujino, H., Valin, J. M., Takeda, R., ... Okuno, H. G. (2006). Genetic algorithm-based improvement of robot hearing capabilities in separating and recognizing simultaneous speech signals. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4031 LNAI, pp. 207-217). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4031 LNAI).
