Multiscale recurrent neural network based language model

Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    5 Citations (Scopus)

    Abstract

    We describe a novel recurrent neural network based language model (RNNLM) that deals with multiple time-scales of contexts. The RNNLM is now a technical standard in language modeling because it remembers some lengths of contexts. However, the RNNLM can only deal with a single time-scale of a context, regardless of the subsequent words and the topic of the spoken utterance, even though the optimal time-scale of the context can vary under such conditions. In contrast, our multiscale RNNLM incorporates contexts with sufficient flexibility: it makes use of various time-scales of contexts simultaneously, with proper weights, for predicting the next word. Experimental comparisons carried out in large-vocabulary spontaneous speech recognition demonstrate that introducing multiple time-scales of contexts into the RNNLM yielded improvements over existing RNNLMs in terms of perplexity and word error rate.
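    The core idea in the abstract, keeping several recurrent context states that evolve at different rates and mixing them with weights when predicting the next word, can be sketched as a small leaky-integrator RNN. This is a generic illustration under assumed details, not the paper's exact formulation: the update rates, dimensions, and fixed mixture weights below are all made up, and in the actual model the weights would be learned.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    VOCAB, HIDDEN = 50, 16
    RATES = [1.0, 0.5, 0.1]  # hypothetical per-scale update rates; small = slow, long-range

    # One simple recurrent state per time-scale.
    W_in = [rng.normal(0, 0.1, (HIDDEN, VOCAB)) for _ in RATES]
    W_rec = [rng.normal(0, 0.1, (HIDDEN, HIDDEN)) for _ in RATES]
    W_out = rng.normal(0, 0.1, (VOCAB, HIDDEN * len(RATES)))
    mix = np.ones(len(RATES)) / len(RATES)  # mixture weights over scales (would be learned)

    def one_hot(i):
        v = np.zeros(VOCAB)
        v[i] = 1.0
        return v

    def step(states, word_id):
        """Advance every time-scale's state by one word; return next-word probabilities."""
        x = one_hot(word_id)
        new_states = []
        for s, (h, rate) in enumerate(zip(states, RATES)):
            cand = np.tanh(W_in[s] @ x + W_rec[s] @ h)
            # Leaky update: a small rate makes the state vary slowly,
            # so it retains a longer-range summary of the context.
            new_states.append((1 - rate) * h + rate * cand)
        combined = np.concatenate([m * h for m, h in zip(mix, new_states)])
        logits = W_out @ combined
        p = np.exp(logits - logits.max())
        return new_states, p / p.sum()

    states = [np.zeros(HIDDEN) for _ in RATES]
    for w in [3, 7, 1]:  # toy word-id sequence
        states, probs = step(states, w)
    ```

    The fast state reacts to the most recent words while the slow states carry utterance-level context, which is the flexibility over a single-time-scale RNNLM that the abstract describes.
    
    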

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association (ISCA)
    Pages: 2366-2370
    Number of pages: 5
    Volume: 2015-January
    Publication status: Published - 2015
    Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
    Duration: 6 Sep 2015 - 10 Sep 2015

    Other

    Other: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
    Country: Germany
    City: Dresden
    Period: 15/9/6 - 15/9/10


    Keywords

    • Multiscale dynamics
    • Recurrent neural network based language model (RNNLM)
    • Speech recognition

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation

    Cite this

    Morioka, T., Iwata, T., Hori, T., & Kobayashi, T. (2015). Multiscale recurrent neural network based language model. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2015-January, pp. 2366-2370). International Speech Communication Association.

