Towards expressive musical robots: A cross-modal framework for emotional gesture, voice and music

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

It has long been speculated that expressions of emotion from different modalities share the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to implement this theory across three modalities, inspired by the polyvalence and repeatability of robotics. We propose a unifying framework to generate emotions across voice, gesture, and music, by representing emotional states as a 4-parameter tuple of speed, intensity, regularity, and extent (SIRE). Our results show that a simple 4-tuple can capture four emotions recognizable at greater than chance across gesture and voice, and at least two emotions across all three modalities. An application for multi-modal, expressive music robots is discussed.
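The abstract describes emotional states as a modality-independent 4-parameter tuple (SIRE). As a rough illustration only, such a representation might be sketched as a small data class; the normalization to [0, 1], the class name, and the example parameter values below are assumptions for illustration, not taken from the paper:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SIRE:
    """Modality-independent emotion representation: four parameters,
    assumed here to be normalized to the range [0, 1]."""
    speed: float
    intensity: float
    regularity: float
    extent: float

    def __post_init__(self):
        # Validate the assumed normalization.
        for name in ("speed", "intensity", "regularity", "extent"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# A hypothetical high-arousal state; the actual parameter values per
# emotion are determined experimentally in the paper itself.
excited = SIRE(speed=0.9, intensity=0.8, regularity=0.4, extent=0.9)
```

The same tuple would then drive renderers for each modality (voice, gesture, music), which is what makes the representation cross-modal.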

Original language: English
Article number: 3
Journal: EURASIP Journal on Audio, Speech, and Music Processing
Volume: 2012
Issue number: 1
DOI: 10.1186/1687-4722-2012-3
Publication status: Published - 2012
Externally published: Yes


Keywords

  • Affective computing
  • Entertainment robots
  • Gesture

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

@article{eab41aa6485345dcb5a096733ff53f5e,
title = "Towards expressive musical robots: A cross-modal framework for emotional gesture, voice and music",
abstract = "It has been long speculated that expression of emotions from different modalities have the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to implement this theory across three modalities, inspired by the polyvalence and repeatability of robotics. We propose a unifying framework to generate emotions across voice, gesture, and music, by representing emotional states as a 4-parameter tuple of speed, intensity, regularity, and extent (SIRE). Our results show that a simple 4-tuple can capture four emotions recognizable at greater than chance across gesture and voice, and at least two emotions across all three modalities. An application for multi-modal, expressive music robots is discussed.",
keywords = "Affective computing, Entertainment robots, Gesture",
author = "Angelica Lim and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2012",
doi = "10.1186/1687-4722-2012-3",
language = "English",
volume = "2012",
journal = "Eurasip Journal on Audio, Speech, and Music Processing",
issn = "1687-4714",
publisher = "Springer Publishing Company",
number = "1",

}

TY - JOUR

T1 - Towards expressive musical robots

T2 - A cross-modal framework for emotional gesture, voice and music

AU - Lim, Angelica

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2012

Y1 - 2012

N2 - It has been long speculated that expression of emotions from different modalities have the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to implement this theory across three modalities, inspired by the polyvalence and repeatability of robotics. We propose a unifying framework to generate emotions across voice, gesture, and music, by representing emotional states as a 4-parameter tuple of speed, intensity, regularity, and extent (SIRE). Our results show that a simple 4-tuple can capture four emotions recognizable at greater than chance across gesture and voice, and at least two emotions across all three modalities. An application for multi-modal, expressive music robots is discussed.

AB - It has been long speculated that expression of emotions from different modalities have the same underlying 'code', whether it be a dance step, musical phrase, or tone of voice. This is the first attempt to implement this theory across three modalities, inspired by the polyvalence and repeatability of robotics. We propose a unifying framework to generate emotions across voice, gesture, and music, by representing emotional states as a 4-parameter tuple of speed, intensity, regularity, and extent (SIRE). Our results show that a simple 4-tuple can capture four emotions recognizable at greater than chance across gesture and voice, and at least two emotions across all three modalities. An application for multi-modal, expressive music robots is discussed.

KW - Affective computing

KW - Entertainment robots

KW - Gesture

UR - http://www.scopus.com/inward/record.url?scp=84873841216&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873841216&partnerID=8YFLogxK

U2 - 10.1186/1687-4722-2012-3

DO - 10.1186/1687-4722-2012-3

M3 - Article

AN - SCOPUS:84873841216

VL - 2012

JO - Eurasip Journal on Audio, Speech, and Music Processing

JF - Eurasip Journal on Audio, Speech, and Music Processing

SN - 1687-4714

IS - 1

M1 - 3

ER -