TY - JOUR
T1 - Online Continual Learning of End-to-End Speech Recognition Models
AU - Yang, Muqiao
AU - Lane, Ian
AU - Watanabe, Shinji
N1 - Funding Information:
This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [1], which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system [2], which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).
Publisher Copyright:
Copyright © 2022 ISCA.
PY - 2022
Y1 - 2022
AB - Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic speech recognition has focused on the adaptation of models across multiple different speech recognition tasks, in this paper we propose an experimental setting for online continual learning for automatic speech recognition of a single task. Specifically focusing on the case where additional training data for the same task becomes available incrementally over time, we demonstrate the effectiveness of performing incremental model updates to end-to-end speech recognition models with an online Gradient Episodic Memory (GEM) method. Moreover, we show that with online continual learning and a selective sampling strategy, we can maintain an accuracy that is similar to retraining a model from scratch while requiring significantly lower computation costs. We have also verified our method with self-supervised learning (SSL) features.
KW - automatic speech recognition
KW - continual learning
KW - lifelong learning
UR - http://www.scopus.com/inward/record.url?scp=85140089478&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140089478&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2022-11093
DO - 10.21437/Interspeech.2022-11093
M3 - Conference article
AN - SCOPUS:85140089478
SN - 2308-457X
VL - 2022-September
SP - 2668
EP - 2672
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022
Y2 - 18 September 2022 through 22 September 2022
ER -