Multimodal integration learning of robot behavior using deep neural networks

Kuniaki Noda, Hiroaki Arie, Yuki Suga, Tetsuya Ogata

    Research output: Contribution to journal › Article

    62 Citations (Scopus)

    Abstract

    For humans to accurately understand the world around them, multimodal integration is essential because it enhances perceptual precision and reduces ambiguity. Computational models replicating this human ability may contribute to the practical use of robots in everyday human living environments; however, primarily because of the scalability problems of conventional machine learning algorithms, sensory-motor information processing in robotic applications has typically been achieved via modality-dependent processes. In this paper, we propose a novel computational framework that enables the integration of sensory-motor time-series data and the self-organization of fused multimodal representations based on a deep learning approach. To evaluate the proposed model, we conducted two behavior-learning experiments with a humanoid robot, consisting of object-manipulation and bell-ringing tasks. Our experimental results show that large amounts of sensory-motor information, including raw RGB images, sound spectra, and joint angles, are directly fused to generate higher-level multimodal representations. Further, we demonstrate that the proposed framework realizes the following three functions: (1) cross-modal memory retrieval utilizing the information-complementation capability of the deep autoencoder; (2) noise-robust behavior recognition utilizing the generalization capability of multimodal features; and (3) multimodal causality acquisition and sensory-motor prediction based on the acquired causality.
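
    The fused-representation idea in the abstract can be made concrete with a small sketch. The code below is an illustrative assumption, not the paper's implementation: it builds a multimodal deep autoencoder in PyTorch that concatenates image, sound-spectrum, and joint-angle vectors, compresses them into a shared code, and reconstructs all modalities from that code; cross-modal retrieval is then approximated by blanking one modality at the input and reading its reconstruction from the output. The layer sizes, the choice of library, the end-to-end MSE training (the original work may use layer-wise pretraining), and the helper names (MultimodalAutoencoder, train_step, retrieve_sound) are all placeholders.

    import torch
    import torch.nn as nn

    class MultimodalAutoencoder(nn.Module):
        # Hypothetical architecture: all dimensions are placeholders, not the paper's values.
        def __init__(self, dim_image=900, dim_sound=40, dim_joint=10, dim_code=30):
            super().__init__()
            self.dims = (dim_image, dim_sound, dim_joint)
            dim_in = dim_image + dim_sound + dim_joint
            # Encoder: concatenated sensory-motor vector -> fused multimodal code
            self.encoder = nn.Sequential(
                nn.Linear(dim_in, 300), nn.ReLU(),
                nn.Linear(300, 100), nn.ReLU(),
                nn.Linear(100, dim_code),
            )
            # Decoder: fused code -> reconstruction of every modality
            self.decoder = nn.Sequential(
                nn.Linear(dim_code, 100), nn.ReLU(),
                nn.Linear(100, 300), nn.ReLU(),
                nn.Linear(300, dim_in),
            )

        def forward(self, image, sound, joint):
            x = torch.cat([image, sound, joint], dim=-1)
            code = self.encoder(x)
            return self.decoder(code), code

    def train_step(model, optimizer, image, sound, joint):
        # Train by reconstructing the full sensory-motor vector (MSE loss).
        # e.g. optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        target = torch.cat([image, sound, joint], dim=-1)
        recon, _ = model(image, sound, joint)
        loss = nn.functional.mse_loss(recon, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def retrieve_sound(model, image, joint):
        # Cross-modal retrieval via information complementation: feed a zeroed
        # sound vector and read the reconstructed sound slice from the output.
        dim_image, dim_sound, _ = model.dims
        sound_blank = torch.zeros(image.shape[0], dim_sound)
        recon, _ = model(image, sound_blank, joint)
        return recon[:, dim_image:dim_image + dim_sound]

    In this reading, the shared code plays the role of the higher-level multimodal representation described in the abstract; the paper trains deep autoencoders on raw RGB images, sound spectra, and joint angles, whereas the sizes and training loop above are only stand-ins.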

    Original language: English
    Pages (from-to): 721-736
    Number of pages: 16
    Journal: Robotics and Autonomous Systems
    Volume: 62
    Issue number: 6
    DOI: 10.1016/j.robot.2014.03.003
    Publication status: Published - 2014

    Keywords

    • Cross-modal memory retrieval
    • Deep learning
    • Multimodal integration
    • Object manipulation

    ASJC Scopus subject areas

    • Software
    • Mathematics (all)
    • Control and Systems Engineering
    • Computer Science Applications

    Cite this

    Noda, Kuniaki; Arie, Hiroaki; Suga, Yuki; Ogata, Tetsuya. Multimodal integration learning of robot behavior using deep neural networks. In: Robotics and Autonomous Systems, Vol. 62, No. 6, 2014, pp. 721-736.
    @article{8370b94419954e448c724222ee6cde30,
    title = "Multimodal integration learning of robot behavior using deep neural networks",
    keywords = "Cross-modal memory retrieval, Deep learning, Multimodal integration, Object manipulation",
    author = "Kuniaki Noda and Hiroaki Arie and Yuki Suga and Tetsuya Ogata",
    year = "2014",
    doi = "10.1016/j.robot.2014.03.003",
    language = "English",
    volume = "62",
    pages = "721--736",
    journal = "Robotics and Autonomous Systems",
    issn = "0921-8890",
    publisher = "Elsevier",
    number = "6",

    }
