Bilinear map of filter-bank outputs for DNN-based speech recognition

Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    1 Citation (Scopus)

    Abstract

    Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). Filter-bank outputs with temporal context form a time-frequency pattern of speech and have been shown to be effective as features for DNN-based acoustic models. We project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map, with the aim of improving acoustic separability in feature extraction. Because the bilinear map yields higher-order correlations of the features, this extension makes it possible to extract a more precise structure of the time-frequency pattern. Phoneme recognition experiments show that the tensor feature yields results comparable to those of the filter-bank feature, and that fusing the two features improves on either feature alone.
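
    The abstract does not specify the decorrelation method or the exact form of the bilinear map, so the following is only a minimal sketch under stated assumptions: decorrelation is taken to be PCA whitening, and the bilinear map is taken to be the outer product of the decorrelated vector with itself. The function names, the toy data, and the dimensions (8 bands, 3 context frames) are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_decorrelation(X, eps=1e-8):
    """Estimate a PCA-whitening transform from X (frames x dims).

    Assumption: the abstract only says "decorrelation"; PCA whitening
    is one common choice and may differ from the paper's method.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1/sqrt of its eigenvalue.
    W = eigvecs / np.sqrt(eigvals + eps)
    return mean, W

def tensor_feature(x, mean, W):
    """Map one context-stacked filter-bank vector to a second-order tensor feature.

    Assumption: the bilinear map (u, v) -> u (x) v is applied with u = v = z,
    so entry (i, j) is the product z_i * z_j. The outer product is symmetric,
    so only the upper triangle is kept.
    """
    z = (x - mean) @ W
    T = np.outer(z, z)
    rows, cols = np.triu_indices(z.size)
    return T[rows, cols]

# Toy stand-in for filter-bank outputs with temporal context
# (hypothetical sizes: 8 bands x 3 context frames = 24 dims).
rng = np.random.default_rng(0)
n_frames, dim = 500, 8 * 3
mixing = rng.normal(size=(dim, dim))
X = rng.normal(size=(n_frames, dim)) @ mixing + rng.normal(size=(n_frames, dim))

mean, W = fit_decorrelation(X)
Z = (X - mean) @ W                   # decorrelated frames
feat = tensor_feature(X[0], mean, W)
print(feat.shape)                    # dim * (dim + 1) / 2 entries
```

    Under these assumptions, each entry of the tensor feature is a product of two decorrelated components, which is one concrete reading of the "higher-order correlations" mentioned in the abstract. Fusing the two features could then be as simple as concatenating `feat` with the original filter-bank vector, although the abstract does not say how the fusion was carried out.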

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association
    Pages: 16-20
    Number of pages: 5
    Volume: 2015-January
    Publication status: Published - 2015
    Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
    Duration: 2015 Sept 6 → 2015 Sept 10

    Other

    Other: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
    Country/Territory: Germany
    City: Dresden
    Period: 15/9/6 → 15/9/10

    Keywords

    • Bilinear map
    • Deep neural network
    • Feature extraction
    • Speech recognition
    • Tensor

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation
