Dynamic integration of multiple feature streams for robust real-time LVCSR

Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinichi Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We present a novel method of integrating the likelihoods of multiple feature streams for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that heavier weights go to streams that remain robust across noisy environments and speaking styles, since such streams are expected to retain the most discriminative power. The weight is calculated in real time from the mutual information between an input stream and the active HMM states in the search space. We also describe three features, extracted through auditory filters, that model how the human auditory system extracts amplitude and frequency modulations; these features are expected to provide complementary cues for speech recognition. Recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9% relative to the best result obtained from any single stream.
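
    The stream-weighting idea lends itself to a short illustration. The following Python sketch is an assumption-laden stand-in rather than the authors' exact formulation: it uses the entropy of each stream's normalized likelihoods over the active HMM states as a proxy for the mutual-information score, so that a peaky, low-entropy posterior marks a discriminative stream and earns a heavier frame-wise weight. All function names and smoothing constants are hypothetical.

    import numpy as np

    def stream_weights(stream_likelihoods):
        """Return one weight per stream, normalized to sum to 1.

        stream_likelihoods: list of 1-D arrays; each holds the acoustic
        likelihoods p_s(x_t | q) of the active HMM states q at frame t.
        """
        scores = []
        for lik in stream_likelihoods:
            post = lik / lik.sum()                   # posterior over active states
            entropy = -np.sum(post * np.log(post + 1e-12))
            max_entropy = np.log(len(post))          # entropy of a uniform posterior
            scores.append(max_entropy - entropy)     # peaky posterior -> high score
        scores = np.asarray(scores)
        return scores / (scores.sum() + 1e-12)

    def combined_log_likelihood(stream_likelihoods):
        """Weighted sum of per-stream log-likelihoods for each active state."""
        weights = stream_weights(stream_likelihoods)
        return sum(w * np.log(lik + 1e-12)
                   for w, lik in zip(weights, stream_likelihoods))

    # Example: three streams scoring the same four active states at one frame.
    frame = [np.array([0.70, 0.10, 0.10, 0.10]),    # discriminative stream
             np.array([0.25, 0.25, 0.25, 0.25]),    # uninformative stream
             np.array([0.40, 0.30, 0.20, 0.10])]
    print(stream_weights(frame))                    # heaviest weight on stream 1
    print(combined_log_likelihood(frame))           # combined score per state

    In a real-time decoder these weights would be recomputed every frame over whichever states survive pruning, which is what makes the scheme dynamic.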

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Pages: 345-348
    Number of pages: 4
    Volume: 1
    ISBN: 9781605603162
    Publication status: Published - 2007
    Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007, Antwerp
    Duration: 2007 Aug 27 - 2007 Aug 31

    Keywords

    • Active hypotheses
    • Entropy
    • Mutual information
    • Speech recognition
    • Stream integration

    ASJC Scopus subject areas

    • Computer Science Applications
    • Software
    • Modelling and Simulation
    • Linguistics and Language
    • Communication

    Cite this

    Sato, S., Onoe, K., Kobayashi, A., Homma, S., Imai, T., Takagi, T., & Kobayashi, T. (2007). Dynamic integration of multiple feature streams for robust real-time LVCSR. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 1, pp. 345-348).
