Abstract
We extend filter-bank outputs into tensors to obtain more precise acoustic features for speech recognition with deep neural networks (DNNs). Filter-bank outputs with temporal context form a time-frequency pattern of speech and have been shown to be effective features for DNN-based acoustic models. To improve acoustic separability in feature extraction, we project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map. Because the bilinear map yields higher-order correlations of the features, this extension can capture a more precise structure of the time-frequency pattern. Phoneme recognition experiments show that the tensor feature performs comparably to the filter-bank feature, and that fusing the two features improves on either feature alone.
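The abstract describes the pipeline only at a high level, so the following is a minimal sketch of "decorrelation followed by a bilinear map", not the authors' implementation. It assumes PCA whitening as the decorrelation step and the outer product of the decorrelated vector with itself as the bilinear map; both choices, the function names `fit_decorrelation` and `tensor_feature`, and the synthetic input data are assumptions made for illustration.

```python
import numpy as np


def fit_decorrelation(frames, eps=1e-8):
    """Estimate a PCA-whitening transform from frames of shape (n_frames, dim).

    Assumption: PCA whitening stands in for the decorrelation step.
    """
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Divide each principal direction (column) by its standard deviation.
    whiten = eigvecs / np.sqrt(eigvals + eps)
    return mean, whiten


def tensor_feature(frame, mean, whiten):
    """Decorrelate one frame, then map it to its second-order product terms."""
    z = (frame - mean) @ whiten
    outer = np.outer(z, z)  # bilinear map (z, z) -> z ⊗ z
    rows, cols = np.triu_indices(z.size)
    return outer[rows, cols]  # the product is symmetric: keep unique terms


rng = np.random.default_rng(0)
# Hypothetical stand-in for filter-bank outputs: 500 frames, 8 channels.
# The triangular mixing makes neighbouring channels correlated.
mixing = np.tril(np.ones((8, 8)))
frames = rng.normal(size=(500, 8)) @ mixing

mean, whiten = fit_decorrelation(frames)
features = np.stack([tensor_feature(f, mean, whiten) for f in frames])
print(features.shape)  # (500, 36): 8 * 9 / 2 unique second-order terms
```

Each output dimension is a product of two decorrelated components, which is how a bilinear map exposes second-order correlations that a linear projection cannot. In a setting like the one the abstract describes, `frame` would presumably be a filter-bank vector stacked with its temporal context, and the feature dimension grows quadratically with the input size.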
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publisher | International Speech and Communication Association |
Pages | 16-20 |
Number of pages | 5 |
Volume | 2015-January |
Publication status | Published - 2015 |
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany Duration: 6 Sep 2015 → 10 Sep 2015 |
Other
Other | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 |
---|---|
Country/Territory | Germany |
City | Dresden |
Period | 6 Sep 2015 → 10 Sep 2015 |
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation