Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding

Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Yutaka Kaneda, Takeshi Yamada, Takanobu Nishiura, Tetsunori Kobayashi, Shiro Ise, Hiroshi Saruwatari

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    7 Citations (Scopus)

    Abstract

    Sound data for open evaluation is necessary for studies such as sound source localization, sound retrieval, sound recognition and hands-free speech recognition in real acoustic environments. This paper reports on our project for acoustic data collection. Real environments contain many kinds of sound scenes, each specified by its sound sources and room acoustics, and the number of combinations of sound sources, source positions and rooms is huge. We therefore assumed that sound in these environments can be simulated by convolving isolated sound sources with impulse responses. As isolated sound sources, one hundred kinds of environmental sounds and speech sounds were collected. Impulse responses were collected in various acoustic environments. Additionally, we collected sounds from a moving source. This paper describes the progress of our sound scene database collection project and its application to environmental sound recognition and hands-free speech recognition.
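The simulation assumption stated in the abstract (an observed sound is an isolated source convolved with an impulse response) can be illustrated with a minimal sketch. This is not the authors' tooling: the function names `simulate_observation` and `mix_scene`, the toy signals, and the use of NumPy are assumptions made purely for illustration, and a single microphone is assumed.

```python
import numpy as np


def simulate_observation(dry_source, impulse_response):
    """Convolve an isolated (dry) source with an impulse response.

    Returns the full linear convolution, whose length is
    len(dry_source) + len(impulse_response) - 1.
    """
    dry = np.asarray(dry_source, dtype=float)
    rir = np.asarray(impulse_response, dtype=float)
    return np.convolve(dry, rir)


def mix_scene(sources, impulse_responses):
    """Sum the convolved contributions of several sources at one microphone.

    Each source is paired with the impulse response measured from its own
    position; shorter contributions are zero-padded at the end.
    """
    parts = [simulate_observation(s, h)
             for s, h in zip(sources, impulse_responses)]
    scene = np.zeros(max(len(p) for p in parts))
    for p in parts:
        scene[:len(p)] += p
    return scene


# Toy signals (hypothetical values, not data from the database).
dry = np.array([1.0, 0.5, -0.25])
rir = np.array([1.0, 0.0, 0.5])  # direct path plus one echo two samples later
observed = simulate_observation(dry, rir)

# A second source (a single click) reaching the microphone one sample late.
scene = mix_scene([dry, np.array([1.0])], [rir, np.array([0.0, 1.0])])
```

With measured impulse responses and recorded dry sources, the same two calls would produce simulated observations for any combination of source, position and room, which is the motivation the abstract gives for collecting the two kinds of data separately.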

    Original language: English
    Title of host publication: Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 161-164
    Number of pages: 4
    Volume: 2
    ISBN (Electronic): 0780373049
    DOI: https://doi.org/10.1109/ICME.2002.1035537
    Publication status: Published - 2002
    Event: 2002 IEEE International Conference on Multimedia and Expo, ICME 2002 - Lausanne
    Duration: 2002 Aug 26 to 2002 Aug 29

    Fingerprint

    Speech recognition
    Acoustics
    Acoustic waves
    Sound
    Speech Sounds
    Impulse response
    Convolution

    ASJC Scopus subject areas

    • Archaeology
    • Electrical and Electronic Engineering

    Cite this

    Nakamura, S., Hiyane, K., Asano, F., Kaneda, Y., Yamada, T., Nishiura, T., ... Saruwatari, H. (2002). Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding. In Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002 (Vol. 2, pp. 161-164). [1035537] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICME.2002.1035537

    @inproceedings{5669709f1460456f9562ecbffcf40c45,
    title = "Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding",
    author = "Satoshi Nakamura and Kazuo Hiyane and Futoshi Asano and Yutaka Kaneda and Takeshi Yamada and Takanobu Nishiura and Tetsunori Kobayashi and Shiro Ise and Hiroshi Saruwatari",
    year = "2002",
    doi = "10.1109/ICME.2002.1035537",
    language = "English",
    volume = "2",
    pages = "161--164",
    booktitle = "Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",
    address = "United States",
    }

    TY  - GEN
    T1  - Design and collection of acoustic sound data for hands-free speech recognition and sound scene understanding
    AU  - Nakamura, Satoshi
    AU  - Hiyane, Kazuo
    AU  - Asano, Futoshi
    AU  - Kaneda, Yutaka
    AU  - Yamada, Takeshi
    AU  - Nishiura, Takanobu
    AU  - Kobayashi, Tetsunori
    AU  - Ise, Shiro
    AU  - Saruwatari, Hiroshi
    PY  - 2002
    Y1  - 2002
    UR  - http://www.scopus.com/inward/record.url?scp=84872976579&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=84872976579&partnerID=8YFLogxK
    U2  - 10.1109/ICME.2002.1035537
    DO  - 10.1109/ICME.2002.1035537
    M3  - Conference contribution
    AN  - SCOPUS:84872976579
    VL  - 2
    SP  - 161
    EP  - 164
    BT  - Proceedings - 2002 IEEE International Conference on Multimedia and Expo, ICME 2002
    PB  - Institute of Electrical and Electronics Engineers Inc.
    ER  -