Improvement of audio-visual score following in robot ensemble with human guitarist

Tatsuhiko Itohara, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno

    Research output: Conference contribution

    6 Citations (Scopus)

    Abstract

    Our goal is to create an ensemble between human guitarists and music robots, e.g., robots that sing or play instruments. Such robots need to detect the tempo and beat times of the music. Score following and beat tracking, which respectively do and do not require a score, are commonly used for this purpose. Score following is incremental audio-to-score alignment. Although most score-following methods assume that players have a precise score, most scores for guitarists contain only melody and chord sequences, without any beat patterns. An audio-visual beat-tracking method for guitarists has been reported that improves the accuracy of beat detection. However, its results are still insufficient, because it uses only onset information, not pitch information, and because its hand tracking has low accuracy. In this paper, we report a multimodal score-following method for guitar performances, an extension of an audio-visual beat-tracking method. The main differences are the use of chord sequences to improve tracking of the audio signal and of depth information to improve tracking of the guitarist's playing. Chord sequences are used to calculate the chord correlation between the input and the score. Depth information is used to mask the guitar plane with a three-dimensional Hough transform, enabling stable detection of the player's hand. Finally, the system extracts score positions and tempos through a particle-filter-based integration of audio and visual features. The resulting score-following system improves the tempo and score-position estimates of a performance by 0.2 s compared with an existing system.
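
    The abstract names two concrete techniques: guitar-plane masking with a three-dimensional Hough transform on depth data, and particle-filter integration of audio (chord-correlation) and visual (hand-motion) cues. The sketch below is a minimal illustration of both ideas under stated assumptions, not the paper's implementation: the function names (hough_plane, chord_correlation, score_follow_step), the chroma-based score representation, the beat-phase visual cue, and all noise parameters are hypothetical.

    import numpy as np

    def hough_plane(points, n_theta=30, n_phi=60, n_rho=50, rho_max=2.0):
        """Brute-force 3-D Hough transform for plane detection (illustrative).
        A plane is parameterised as n(theta, phi) . x = rho with unit normal
        n = (sin t cos p, sin t sin p, cos t); each depth point votes for the
        (theta, phi, rho) cells it lies on, and the best-voted cell gives the
        dominant plane (e.g., the guitar body), usable as a depth-image mask."""
        thetas = np.linspace(0.0, np.pi, n_theta)
        phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
        acc = np.zeros((n_theta, n_phi, n_rho), dtype=np.int64)
        for t_i, t in enumerate(thetas):
            for p_i, p in enumerate(phis):
                n = np.array([np.sin(t) * np.cos(p),
                              np.sin(t) * np.sin(p),
                              np.cos(t)])
                rho = points @ n  # signed distance of every point along n
                r_i = np.clip(((rho / rho_max + 1.0) / 2.0 * n_rho).astype(int),
                              0, n_rho - 1)
                np.add.at(acc, (t_i, p_i, r_i), 1)  # accumulate votes
        t_i, p_i, r_i = np.unravel_index(acc.argmax(), acc.shape)
        return thetas[t_i], phis[p_i], (2.0 * r_i / n_rho - 1.0) * rho_max

    def chord_correlation(obs_chroma, score_chroma):
        """Cosine similarity between the observed chroma vector and the chroma
        profile of the chord the score expects (a stand-in correlation measure)."""
        den = np.linalg.norm(obs_chroma) * np.linalg.norm(score_chroma) + 1e-9
        return float(obs_chroma @ score_chroma) / den

    def score_follow_step(particles, weights, obs_chroma, hand_phase,
                          score_chromas, dt, sig_pos=0.02, sig_tempo=2.0):
        """One particle-filter update over the state (score position in beats,
        tempo in BPM); audio and visual cues enter through the weights."""
        n = len(particles)
        # Predict: advance score position by tempo, diffuse both state variables.
        particles[:, 0] += particles[:, 1] / 60.0 * dt + np.random.normal(0, sig_pos, n)
        particles[:, 1] += np.random.normal(0, sig_tempo, n)
        for i, (pos, _) in enumerate(particles):
            beat = int(np.clip(pos, 0, len(score_chromas) - 1))
            w_audio = max(chord_correlation(obs_chroma, score_chromas[beat]), 0.0)
            # Visual cue: agreement between the particle's within-beat phase
            # and the phase of the tracked hand motion.
            w_visual = np.exp(-((pos % 1.0) - hand_phase) ** 2 / 0.05)
            weights[i] *= w_audio * w_visual + 1e-12
        weights /= weights.sum()
        if 1.0 / np.sum(weights ** 2) < n / 2.0:  # resample on ESS collapse
            idx = np.random.choice(n, size=n, p=weights)
            particles[:] = particles[idx]
            weights[:] = 1.0 / n
        return weights @ particles  # weighted-mean (score position, tempo)

    A caller would, for example, initialise particles = np.column_stack([np.zeros(500), np.random.uniform(60.0, 180.0, 500)]) and weights = np.full(500, 1.0 / 500.0), then invoke score_follow_step once per synchronised audio/depth frame, masking the depth image with the plane from hough_plane before tracking the hand.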

    Original language: English
    Host publication title: IEEE-RAS International Conference on Humanoid Robots
    Pages: 574-579
    Number of pages: 6
    DOI: 10.1109/HUMANOIDS.2012.6651577
    Publication status: Published - 2012
    Event: 2012 12th IEEE-RAS International Conference on Humanoid Robots, Humanoids 2012 - Osaka
    Duration: 29 Nov 2012 - 1 Dec 2012

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Vision and Pattern Recognition
    • Hardware and Architecture
    • Human-Computer Interaction
    • Electrical and Electronic Engineering

    Cite this

    Itohara, T., Nakadai, K., Ogata, T., & Okuno, H. G. (2012). Improvement of audio-visual score following in robot ensemble with human guitarist. In IEEE-RAS International Conference on Humanoid Robots (pp. 574-579). [6651577] https://doi.org/10.1109/HUMANOIDS.2012.6651577

    @inproceedings{ca46f25865014893aea0774a76c793c7,
    title = "Improvement of audio-visual score following in robot ensemble with human guitarist",
    abstract = "Our goal is to create an ensemble between human guitarists and music robots, e.g., robots that sing or play instruments. Such robots need to detect the tempo and beat times of the music. Score following and beat tracking, which respectively do and do not require a score, are commonly used for this purpose. Score following is incremental audio-to-score alignment. Although most score-following methods assume that players have a precise score, most scores for guitarists contain only melody and chord sequences, without any beat patterns. An audio-visual beat-tracking method for guitarists has been reported that improves the accuracy of beat detection. However, its results are still insufficient, because it uses only onset information, not pitch information, and because its hand tracking has low accuracy. In this paper, we report a multimodal score-following method for guitar performances, an extension of an audio-visual beat-tracking method. The main differences are the use of chord sequences to improve tracking of the audio signal and of depth information to improve tracking of the guitarist's playing. Chord sequences are used to calculate the chord correlation between the input and the score. Depth information is used to mask the guitar plane with a three-dimensional Hough transform, enabling stable detection of the player's hand. Finally, the system extracts score positions and tempos through a particle-filter-based integration of audio and visual features. The resulting score-following system improves the tempo and score-position estimates of a performance by 0.2 s compared with an existing system.",
    author = "Itohara, Tatsuhiko and Nakadai, Kazuhiro and Ogata, Tetsuya and Okuno, {Hiroshi G.}",
    year = "2012",
    doi = "10.1109/HUMANOIDS.2012.6651577",
    language = "English",
    isbn = "9781467313698",
    pages = "574--579",
    booktitle = "IEEE-RAS International Conference on Humanoid Robots",

    }
