TY - GEN
T1 - Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling
AU - Xu, Hainan
AU - Ding, Shuoyang
AU - Watanabe, Shinji
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Most end-to-end speech recognition systems model text directly as a sequence of characters or sub-words. Current approaches to sub-word extraction only consider character sequence frequencies, which at times produce inferior sub-word segmentation that might lead to erroneous speech recognition output. We propose pronunciation-assisted sub-word modeling (PASM), a sub-word extraction method that leverages the pronunciation information of a word. Experiments show that the proposed method can greatly improve upon the character-based baseline, and also outperform commonly used byte-pair encoding methods.
KW - end-to-end models
KW - speech recognition
KW - sub-word modeling
UR - http://www.scopus.com/inward/record.url?scp=85068978138&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068978138&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682494
DO - 10.1109/ICASSP.2019.8682494
M3 - Conference contribution
AN - SCOPUS:85068978138
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7110
EP - 7114
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -