TY - JOUR
T1 - Recent developments on ESPnet toolkit boosted by Conformer
AU - Guo, Pengcheng
AU - Boyer, Florian
AU - Chang, Xuankai
AU - Hayashi, Tomoki
AU - Higuchi, Yosuke
AU - Inaguma, Hirofumi
AU - Kamo, Naoyuki
AU - Li, Chenda
AU - Garcia-Romero, Daniel
AU - Shi, Jiatong
AU - Shi, Jing
AU - Watanabe, Shinji
AU - Wei, Kun
AU - Zhang, Wangyou
AU - Zhang, Yuekai
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer (Convolution-augmented Transformer). This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes, together with pre-trained models, that use open-source and publicly available corpora for all of the above tasks. Our aim for this work is to contribute to the research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
KW - Conformer
KW - End-to-end speech processing
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85106193794&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106193794&partnerID=8YFLogxK
U2 - 10.1109/ICASSP39728.2021.9414858
DO - 10.1109/ICASSP39728.2021.9414858
M3 - Conference article
AN - SCOPUS:85106193794
VL - 2021-June
SP - 5874
EP - 5878
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
SN - 0736-7791
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -