Gamma Boltzmann Machine for Audio Modeling

Toru Nakashika*, Kohei Yatabe

*この研究の対応する著者

研究成果査読

抄録

This paper presents an energy-based probabilistic model that handles nonnegative data in consideration of both linear and logarithmic scales. In audio applications, magnitude of time-frequency representation, including spectrogram, is regarded as one of the most important features. Such magnitude-based features have been extensively utilized in learning-based audio processing. Since a logarithmic scale is important in terms of auditory perception, the features are usually computed with a logarithmic function. That is, a logarithmic function is applied within the computation of features so that a learning machine does not have to explicitly model the logarithmic scale. We think in a different way and propose a restricted Boltzmann machine (RBM) that simultaneously models linear- and log-magnitude spectra. RBM is a stochastic neural network that can discover data representations without supervision. To manage both linear and logarithmic scales, we define an energy function based on both scales. This energy function results in a conditional distribution (of the observable data, given hidden units) that is written as the gamma distribution, and hence the proposed RBM is termed gamma-Bernoulli RBM. The proposed gamma-Bernoulli RBM was compared to the ordinary Gaussian-Bernoulli RBM by speech representation experiments. Both objective and subjective evaluations illustrated the advantage of the proposed model.

本文言語English
論文番号9478208
ページ(範囲)2591-2605
ページ数15
ジャーナルIEEE/ACM Transactions on Audio Speech and Language Processing
29
DOI
出版ステータスPublished - 2021

ASJC Scopus subject areas

  • コンピュータ サイエンス(その他)
  • 音響学および超音波学
  • 計算数学
  • 電子工学および電気工学

フィンガープリント

「Gamma Boltzmann Machine for Audio Modeling」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル