We propose a conversational system that generates back-channel feedback with appropriate content and at appropriate timing, using an FST-based decoder capable of early detection together with prosody analysis. In human conversation, participants do not simply take turns in strict order; listeners give back-channel feedback while their partner is still speaking. This feedback lets speakers gauge the listener's state and feel comfortable continuing to speak. Spoken dialogue systems should therefore be able to generate back-channel feedback in synchronization with the user's utterances. The appropriateness of such feedback depends on both its content and its timing. The content depends strongly on the content of the dialogue partner's utterance, and the timing depends strongly on the prosody of that utterance. To determine the content of the feedback before the utterance ends, we use a speech recognizer based on finite state transducers (FSTs). To find the proper timing for the feedback, we use prosodic information, in particular the F0 and power of the utterance. We implemented these modules and applied them to the spoken dialogue system on the humanoid robot ROBISUKE. Experimental results show the effectiveness of our methods.
|Number of pages||4|
|Publication status||Published - 2005 Dec 1|
|Event||9th European Conference on Speech Communication and Technology - Lisbon, Portugal (2005 Sep 4 → 2005 Sep 8)|