Teacher-student learning for low-latency online speech enhancement using Wave-U-Net

Sotaro Nakaoka, Li Li, Shota Inoue, Shoji Makino

Research output: Conference article, peer-reviewed

7 citations (Scopus)


In this paper, we propose a low-latency online extension of Wave-U-Net for single-channel speech enhancement, which uses teacher-student learning to reduce system latency while keeping enhancement performance high. Wave-U-Net is a recently proposed end-to-end source separation method that has achieved remarkable performance in singing voice separation and speech enhancement tasks. Since the enhancement is performed in the time domain, Wave-U-Net can efficiently model phase information and avoids the limitations of the domain transformation required by methods that operate in the time-frequency domain. In this paper, we apply Wave-U-Net to face-to-face applications such as hearing aids and in-car communication systems, where a strict latency of less than 10 ms is required. To this end, we investigate online versions of Wave-U-Net and propose the use of teacher-student learning to prevent the performance degradation caused by shortening the input segment so that the system delay on a CPU is less than 10 ms. The experimental results showed that the proposed model could run in real time with low latency and high performance, achieving a signal-to-distortion ratio improvement of about 8.73 dB.
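The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 16 kHz sample rate, the 128-sample segment, the mean-squared-error form of the loss, the weight `alpha`, and the simplified energy-ratio definition of the signal-to-distortion ratio are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math


def segment_latency_ms(segment_samples, sample_rate_hz, processing_ms=0.0):
    """Buffering delay of one input segment plus a measured processing time.

    Assumption: latency is modeled as segment duration + compute time.
    """
    return 1000.0 * segment_samples / sample_rate_hz + processing_ms


def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_student_loss(student_out, teacher_out, clean, alpha=0.5):
    """Weighted sum of the student's error against the clean target and
    against the teacher's output.

    Assumption: both terms are MSE and `alpha` is a hypothetical weight;
    the abstract does not specify the loss.
    """
    return (1.0 - alpha) * mse(student_out, clean) + alpha * mse(student_out, teacher_out)


def sdr_db(reference, estimate):
    """Simplified SDR: reference energy over error energy, in dB.

    Assumption: this is the plain energy-ratio form, not a full
    evaluation-toolkit SDR.
    """
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)


def sdr_improvement_db(clean, noisy, enhanced):
    """SDR of the enhanced signal minus SDR of the unprocessed noisy input."""
    return sdr_db(clean, enhanced) - sdr_db(clean, noisy)


if __name__ == "__main__":
    # Hypothetical setting: a 128-sample segment at 16 kHz buffers 8 ms,
    # leaving 2 ms of the 10 ms budget for computation.
    print(segment_latency_ms(128, 16000))
```

Under these assumptions, shortening the segment lowers the buffering delay but gives the student model less context per step, which is the degradation the teacher's output is meant to offset during training.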

Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: June 6, 2021 to June 11, 2021

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

