Speech recognition of double talk using SAFIA-based audio segregation

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Double-talk recognition under a distant-microphone condition, a serious problem for speech applications in real environments, is realized through the use of a modified SAFIA together with acoustic model adaptation or training. The original SAFIA is a high-performance audio segregation method based on band selection using two directional microphones. We modified SAFIA by adopting array signal processing, which realizes optimal directivity for SAFIA. We also used generalized harmonic analysis (GHA) instead of the FFT for the spectral analysis in SAFIA, to remove the windowing effects that degrade sound quality in SAFIA. These modifications yield segregation that sounds good to human listeners, but the quality is still insufficient for recognition. Because SAFIA introduces a characteristic distortion, we used MLLR-based acoustic model adaptation or immunity training to make recognition robust to that distortion. These efforts achieved 76.2% word accuracy at an SN ratio of 0 dB. This represents a 45% error reduction compared with using array signal processing alone, and a 30% error reduction compared with using SAFIA-based audio segregation alone.
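To illustrate the band-selection idea behind SAFIA, the following Python sketch assigns each time-frequency bin to whichever of the two directional-microphone channels has the larger magnitude at that bin, and zeroes that bin in the other output. It is a simplified illustration under stated assumptions, not the authors' implementation: the spectra are assumed to be precomputed (for example by an STFT, whereas the paper uses GHA), ties are arbitrarily given to channel B, and the function name `safia_band_selection` is hypothetical. The array processing and resynthesis steps are omitted.

```python
from typing import List, Tuple

# Hypothetical representation: frames x frequency bins of complex spectral values.
Spectrum = List[List[complex]]


def safia_band_selection(spec_a: Spectrum, spec_b: Spectrum) -> Tuple[Spectrum, Spectrum]:
    """Sketch of SAFIA-style band selection (assumed simplification).

    Each bin goes to the output of the channel whose magnitude is larger there;
    the other output receives zero for that bin. Ties go to channel B.
    """
    if len(spec_a) != len(spec_b) or any(
        len(fa) != len(fb) for fa, fb in zip(spec_a, spec_b)
    ):
        raise ValueError("spectra must have the same shape")

    out_a: Spectrum = []
    out_b: Spectrum = []
    for frame_a, frame_b in zip(spec_a, spec_b):
        row_a: List[complex] = []
        row_b: List[complex] = []
        for xa, xb in zip(frame_a, frame_b):
            if abs(xa) > abs(xb):
                # Microphone A (aimed at talker A) dominates this bin.
                row_a.append(xa)
                row_b.append(0j)
            else:
                # Microphone B dominates (or tie): keep the bin for talker B.
                row_a.append(0j)
                row_b.append(xb)
        out_a.append(row_a)
        out_b.append(row_b)
    return out_a, out_b


# Usage: one frame with three frequency bins.
a = [[3 + 0j, 1 + 1j, 0.5 + 0j]]
b = [[1 + 0j, 2 + 0j, 0.5 + 0j]]
sep_a, sep_b = safia_band_selection(a, b)
print(sep_a)  # [[(3+0j), 0j, 0j]]
print(sep_b)  # [[0j, (2+0j), (0.5+0j)]]
```

Because every bin is given wholly to one output, the selection is a binary mask; this hard assignment is one plausible source of the characteristic distortion that the abstract says must be compensated for in the acoustic model.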

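The reported error reductions can be turned into implied baseline accuracies. The sketch below assumes that "error" means word error (100 minus word accuracy) and that the 45% and 30% figures are relative reductions; the function name is hypothetical. Because the stated percentages are rounded, the results (about 56.7% for array processing alone and about 66.0% for SAFIA alone) are back-calculated approximations, not figures reported in the abstract.

```python
def implied_baseline_accuracy(accuracy_pct: float, relative_error_reduction: float) -> float:
    """Back-calculate a baseline word accuracy (%) from a reported accuracy (%)
    and a relative word-error reduction r, with 0 <= r < 1.

    Assumption: word error = 100 - word accuracy, and the reduction is relative,
    i.e. new_error = (1 - r) * baseline_error.
    """
    if not 0.0 <= relative_error_reduction < 1.0:
        raise ValueError("relative_error_reduction must be in [0, 1)")
    error = 100.0 - accuracy_pct  # about 23.8 for 76.2 % accuracy
    baseline_error = error / (1.0 - relative_error_reduction)
    return 100.0 - baseline_error


reported = 76.2  # word accuracy (%) at 0 dB SNR, from the abstract
print(f"{implied_baseline_accuracy(reported, 0.45):.1f}")  # array processing only -> 56.7
print(f"{implied_baseline_accuracy(reported, 0.30):.1f}")  # SAFIA only -> 66.0
```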
    Original language: English
    Title of host publication: EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology
    Publisher: International Speech Communication Association
    Pages: 1285-1288
    Number of pages: 4
    Publication status: Published - 2003
    Event: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland
    Duration: 2003 Sep 1 to 2003 Sep 4

    Other

    Other: 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003
    Country: Switzerland
    City: Geneva
    Period: 03/9/1 to 03/9/4

    ASJC Scopus subject areas

    • Computer Science Applications
    • Software
    • Linguistics and Language
    • Communication

    Cite this

    Sekiya, T., Ogawa, T., & Kobayashi, T. (2003). Speech recognition of double talk using SAFIA-based audio segregation. In EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology (pp. 1285-1288). International Speech Communication Association.

    @inproceedings{616cfa8e5f3e49e1978985c18d7f1fd6,
    title = "Speech recognition of double talk using SAFIA-based audio segregation",
    author = "Toshiyuki Sekiya and Tetsuji Ogawa and Tetsunori Kobayashi",
    year = "2003",
    language = "English",
    pages = "1285--1288",
    booktitle = "EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - Speech recognition of double talk using SAFIA-based audio segregation

    AU - Sekiya, Toshiyuki

    AU - Ogawa, Tetsuji

    AU - Kobayashi, Tetsunori

    PY - 2003

    Y1 - 2003

    UR - http://www.scopus.com/inward/record.url?scp=85009231308&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85009231308&partnerID=8YFLogxK

    M3 - Conference contribution

    SP - 1285

    EP - 1288

    BT - EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology

    PB - International Speech Communication Association

    ER -