Coupled initialization of multi-channel non-negative matrix factorization based on spatial and spectral information

Yuuki Tachioka, Tomohiro Narita, Iori Miura, Takanobu Uramoto, Natsuki Monta, Shingo Uenohara, Ken'ichi Furuya, Shinji Watanabe, Jonathan Le Roux

Research output: Contribution to journalConference article

5 Citations (Scopus)


Multi-channel non-negative matrix factorization (MNMF) is a multi-channel extension of NMF and often outperforms NMF because it can deal with spatial and spectral information simultaneously. On the other hand, MNMF has a larger number of parameters and its performance heavily depends on the initial values. MNMF factorizes an observation matrix into four matrices: spatial correlation, basis, cluster-indicator latent variables, and activation matrices. This paper proposes effective initialization methods for these matrices. First, the spatial correlation matrix, which shows the largest initial value dependencies, is initialized using the cross-spectrum method from enhanced speech by binary masking. Second, when the target is speech, constructing bases from phonemes existing in an utterance can improve the performance: this paper proposes a speech bases selection by using automatic speech recognition (ASR). Third, we also propose an initialization method for the cluster-indicator latent variables that couple the spatial and spectral information, which can achieve the simultaneous optimization of above two matrices. Experiments on a noisy ASR task show that the proposed initialization significantly improves the performance of MNMF by reducing the initial value dependencies.

Original languageEnglish
Pages (from-to)2461-2465
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication statusPublished - 2017 Jan 1
Externally publishedYes
Event18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 2017 Aug 202017 Aug 24



  • Automatic speech recognition
  • Noisy speech
  • Non-negative matrix factorization
  • Spatial correlation
  • Speech basis

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this