In this paper, we propose a low-latency online extension of wave-U-net for single-channel speech enhancement, which utilizes teacher-student learning to reduce the system latency while keeping the enhancement performance high. Wave-U-net is a recently proposed end-to-end source separation method, which achieved remarkable performance in singing voice separation and speech enhancement tasks. Since the enhancement is performed in the time domain, wave-U-net can efficiently model phase information and address the domain transformation limitation, where the time-frequency domain is normally adopted. In this paper, we apply wave-U-net to face-to-face applications such as hearing aids and in-car communication systems, where a strictly low-latency of less than 10 ms is required. To this end, we investigate online versions of wave-U-net and propose the use of teacher-student learning to prevent the performance degradation caused by the reduction in input segment length such that the system delay in a CPU is less than 10 ms. The experimental results revealed that the proposed model could perform in real-time with low-latency and high performance, achieving a signal-to-distortion ratio improvement of about 8.73 dB.
|ジャーナル||ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings|
|出版ステータス||Published - 2021|
|イベント||2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada|
継続期間: 2021 6月 6 → 2021 6月 11
ASJC Scopus subject areas