In multimedia data analysis, automated indexing of conversational video is an emerging topic. One challenging problem in this area is the recognition of higher-level concepts, such as miscommunications in conversations. While detecting miscommunications is generally easy for speakers as well as observers, it is not yet understood which cues contribute to their detection, or to what extent. To exploit knowledge of gestural cues in multimedia systems, the applicability of machine learning is investigated as a means of detecting miscommunication from gestural patterns observed in psychotherapeutic face-to-face conversations. A variety of features are extracted from gesture data, and both simple and complex classifiers are constructed from these features. Short-term and long-term effects are tested using different time window sizes, and two types of gestures, communicative and non-communicative, are considered. The experimental results suggest that no single gestural feature can explain the occurrence of semantic miscommunication. Another interesting finding is that miscommunication correlates more strongly with long-term gestural patterns than with short-term ones.