In human-human conversations, listeners often convey their intentions to speakers through feedback consisting of reflexive short responses. Speakers recognize these intentions and dynamically change their conversational plans to transmit information more efficiently. For spoken dialogue systems that deliver a large amount of information, such as news, it is essential to accurately capture users' intentions from reflexive short responses so that the system can efficiently select or eliminate the information to be transmitted according to the user's needs. However, such responses are usually too short for the user's actual intention to be recognized from their prosodic and linguistic features alone. In this paper, we propose a short-response intention-recognition model that takes the system's previous utterances into account as conversational context, in addition to the prosodic and linguistic features of the user's utterances. To this end, we define types of short-response intentions in terms of effective information transmission and create a new dataset by annotating interaction data collected using our spoken dialogue system. Our experimental results demonstrate that classification accuracy can be improved by using the linguistic features of the system's previous utterances, encoded by Bidirectional Encoder Representations from Transformers (BERT), as the conversational context.
|Journal||Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH|
|Publication status||Published - 2019|
|Event||20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019 - Graz, Austria|
Duration: 15 Sep 2019 → 19 Sep 2019