Cross-cultural assessment of automatically generated multimodal referring expressions in a virtual world

Ielka Van Der Sluis*, Saturnino Luz, Werner Breitfuß, Mitsuru Ishizuka, Helmut Prendinger

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


This paper presents an assessment of automatically generated multimodal referring expressions as produced by embodied conversational agents in a virtual world. The algorithm used for this purpose employs general principles of human motor control and cooperativity in dialogue, and can be parameterised so as to vary the precision of the pointing gestures and the amount of linguistic information included in the referring expressions. The study assessed how native speakers of English and Japanese perceived three different algorithmic outputs for multimodal referring behaviour in terms of understandability, human-likeness and a social practice (selling). Results show that users generally prefer mobile agents that are economical in their linguistic descriptions to stationary verbose agents. They also show the need for further calibration of the algorithm to accommodate the differences between the two groups. In addition to a detailed description of the setup and results of the study, the paper discusses implications for the design and use of agents, methodological issues that arose while conducting the cross-cultural study, and directions for future work.

Original language: English
Pages (from-to): 611-629
Number of pages: 19
Journal: International Journal of Human Computer Studies
Issue number: 9
Publication status: Published - Sept 2012
Externally published: Yes


Keywords

  • Cross-cultural differences
  • Dialogue
  • Perception of multimodal referring expressions
  • Translation
  • Virtual worlds

ASJC Scopus subject areas

  • Hardware and Architecture
  • Engineering (all)
  • Software
  • Human-Computer Interaction
  • Human Factors and Ergonomics
  • Education


