TY - GEN
T1 - Low-Resource Contextual Topic Identification on Speech
AU - Liu, Chunxi
AU - Wiesner, Matthew
AU - Watanabe, Shinji
AU - Harman, Craig
AU - Trmal, Jan
AU - Dehak, Najim
AU - Khudanpur, Sanjeev
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2019/2/11
Y1 - 2019/2/11
N2 - In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general-purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploiting the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one, our attention-based contextual model significantly outperforms the context-independent models.
AB - In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified. We first present a general-purpose method for topic ID on spoken segments in low-resource languages, using a cascade of universal acoustic modeling, translation lexicons to English, and English-language topic classification. Next, instead of classifying each segment independently, we demonstrate that exploiting the contextual dependencies across sequential segments can provide large improvements. In particular, we propose an attention-based contextual model which is able to leverage the contexts in a selective manner. We test both our contextual and non-contextual models on four LORELEI languages, and on all but one, our attention-based contextual model significantly outperforms the context-independent models.
KW - Topic identification
KW - attention
KW - recurrent neural networks
KW - universal acoustic modeling
UR - http://www.scopus.com/inward/record.url?scp=85063079080&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063079080&partnerID=8YFLogxK
U2 - 10.1109/SLT.2018.8639544
DO - 10.1109/SLT.2018.8639544
M3 - Conference contribution
AN - SCOPUS:85063079080
T3 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
SP - 656
EP - 663
BT - 2018 IEEE Spoken Language Technology Workshop, SLT 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Spoken Language Technology Workshop, SLT 2018
Y2 - 18 December 2018 through 21 December 2018
ER -