In spoken dialogue between humans, communication rests on the linguistic information contained in the utterance. In addition, the speaker's psychological state and information that complements the dialogue are conveyed through prosody, facial expression, and head movement, helping the dialogue proceed smoothly. Such information, which co-occurs with the utterance and supports the smooth transmission of linguistic information, is called paralinguistic information. This paper treats the speaker's attitude, as expressed by prosody and head gestures, as paralinguistic information. Methods for recognizing each kind of information are proposed, and a dialogue robot is realized on the basis of the proposed methods. In recognizing the utterance attitude from prosody, the positive or negative attitude of the utterance is identified from the F0 pattern and phoneme durations. In recognizing head gestures, a nod is taken to represent a positive attitude, and a tilt or shake of the head a negative attitude; these three motions are recognized using optical flow as the feature parameters and HMMs as the stochastic model. Experiments show that the proposed methods achieve recognition ability comparable to that of humans, and that a dialogue robot incorporating them achieves a rhythmic, efficient dialogue that previous systems have not attained.
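The head-gesture recognizer described above scores an observation sequence against one HMM per gesture (nod, tilt, shake) and picks the most likely model. The sketch below illustrates that classification scheme only in miniature: it is not the paper's implementation. The optical-flow features are replaced here with a hypothetical quantization into discrete dominant-motion symbols, and all model parameters are hand-set toy values rather than trained ones; the forward algorithm and maximum-likelihood model selection are the standard techniques the abstract names.

```python
import math

# Hypothetical quantization of per-frame optical flow into a dominant-motion
# symbol (the paper uses optical flow directly as the feature parameters).
SYMBOLS = {"up": 0, "down": 1, "left": 2, "right": 3, "still": 4}


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm to avoid underflow."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    s = sum(alpha)
    log_lik = math.log(s)
    alpha = [a / s for a in alpha]
    for t in range(1, len(obs)):
        alpha = [emit[j][obs[t]] * sum(alpha[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        s = sum(alpha)
        log_lik += math.log(s)
        alpha = [a / s for a in alpha]
    return log_lik


# Toy two-state models with hand-set parameters (illustrative only):
# a nod oscillates vertically, a shake horizontally, a tilt leans then holds.
MODELS = {
    "nod": {   # positive attitude
        "start": [1.0, 0.0],
        "trans": [[0.4, 0.6], [0.6, 0.4]],
        "emit":  [[0.70, 0.10, 0.05, 0.05, 0.10],   # state 0: mostly "up"
                  [0.10, 0.70, 0.05, 0.05, 0.10]],  # state 1: mostly "down"
    },
    "shake": {  # negative attitude
        "start": [1.0, 0.0],
        "trans": [[0.4, 0.6], [0.6, 0.4]],
        "emit":  [[0.05, 0.05, 0.70, 0.10, 0.10],   # state 0: mostly "left"
                  [0.05, 0.05, 0.10, 0.70, 0.10]],  # state 1: mostly "right"
    },
    "tilt": {   # negative attitude
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "emit":  [[0.05, 0.05, 0.60, 0.20, 0.10],   # state 0: leaning sideways
                  [0.05, 0.05, 0.05, 0.05, 0.80]],  # state 1: holding still
    },
}


def classify(frames):
    """Label a gesture by the HMM that assigns it the highest likelihood."""
    obs = [SYMBOLS[f] for f in frames]
    return max(MODELS, key=lambda m: forward_log_likelihood(
        obs, MODELS[m]["start"], MODELS[m]["trans"], MODELS[m]["emit"]))
```

In this scheme a positive response is any sequence classified as "nod" and a negative one is "tilt" or "shake"; for example, `classify(["up", "down", "up", "down"])` yields `"nod"`. A real system would train the transition and emission parameters from labeled gesture data (e.g. with Baum-Welch) instead of fixing them by hand.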