The automated annotation of conversational video by semantic miscommunication labels is a challenging topic. Although miscommunications are often obvious to the speakers as well as the observers, it is difficult for machines to detect them from the low-level features. We investigate the utility of gestural cues in this paper among various non-verbal features. Compared with gesture recognition tasks in human-computer interaction, this process is difficult due to the lack of understanding on which cues contribute to miscommunications and the implicitness of gestures. Nine simple gestural features are taken from gesture data, and both simple and complex classifiers are constructed using machine learning. The experimental results suggest that there is no single gestural feature that can predict or explain the occurrence of semantic miscommunication in our setting.
ASJC Scopus subject areas
- コンピュータ ネットワークおよび通信