Incremental polyphonic audio to score alignment using beat tracking for singer robots

Takuma Otsuka, Kazumasa Murata, Kazuhiro Nakadai, Toru Takahashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Conference contribution

11 Citations (Scopus)

Abstract

We aim to develop a singer robot capable of listening to music with its own "ears" and interacting with a human's musical performance. Such a singer robot requires at least three functions: listening to the music, understanding which position in the music is being performed, and generating a singing voice. In this paper, we focus on the second function, that is, the capability to align an audio signal to its symbolically represented musical score. The issues underlying the score alignment problem are: (1) diversity in the sounds of various musical instruments, (2) differences between the audio signal and the musical score, and (3) fluctuation in the tempo of the musical performance. Our solutions to these issues are as follows: (1) designing features based on a chroma vector in the 12-tone model and on the onset of the sound, (2) defining a rareness for each tone, based on the idea that a scarcely used tone is salient in the audio signal, and (3) using a switching Kalman filter for robust tempo estimation. Experimental results on 100 popular music tunes show that our score alignment method improves the average cumulative absolute error in score alignment by 29% compared with beat tracking without score alignment.
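As a rough illustration of solutions (1) and (2) in the abstract, the sketch below folds spectral peaks into a 12-bin chroma vector and derives a per-tone rareness weight from a score. It is not the authors' implementation. The abstract does not give the rareness formula, so the negative-log relative frequency used here is an assumption, and all function names (`pitch_class`, `chroma_vector`, `rareness_weights`, `weighted_similarity`) are hypothetical.

```python
import math
from collections import Counter


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (0 = C) in 12-tone equal temperament."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12


def chroma_vector(peaks):
    """Fold (frequency_hz, magnitude) peaks into a 12-bin chroma vector that sums to 1."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class(freq_hz)] += magnitude
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


def rareness_weights(score_pitch_classes):
    """ASSUMED rareness: -log of each pitch class's relative frequency in the score.

    Pitch classes that never occur in the score get weight 0.
    """
    counts = Counter(score_pitch_classes)
    n = len(score_pitch_classes)
    return [-math.log(counts[pc] / n) if counts[pc] else 0.0 for pc in range(12)]


def weighted_similarity(audio_chroma, score_chroma, weights):
    """Rareness-weighted inner product between audio and score chroma vectors."""
    return sum(w * a * s for w, a, s in zip(weights, audio_chroma, score_chroma))


# Usage: two A peaks and one C peak; the score uses C three times and A once.
audio = chroma_vector([(440.0, 2.0), (880.0, 1.0), (261.63, 1.0)])
weights = rareness_weights([0, 0, 0, 9])
score_frame = [0.0] * 12
score_frame[9] = 1.0  # the score expects an A at this position
similarity = weighted_similarity(audio, score_frame, weights)
```

Under this assumed weighting, a match on the rarely used tone (A) contributes more to the similarity than a match on the frequently used tone (C) would. That is one way to read the idea that a scarcely used tone is salient.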

Original language: English
Title of host publication: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009
Pages: 2289-2296
Number of pages: 8
DOI: https://doi.org/10.1109/IROS.2009.5354637
Publication status: Published - 2009-12-11
Externally published: Yes
Event: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 - St. Louis, MO
Duration: 2009-10-11 to 2009-10-15

Fingerprint

Robots
Acoustic waves
Musical instruments
Kalman filters

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Control and Systems Engineering

Cite this

Otsuka, T., Murata, K., Nakadai, K., Takahashi, T., Komatani, K., Ogata, T., & Okuno, H. G. (2009). Incremental polyphonic audio to score alignment using beat tracking for singer robots. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009 (pp. 2289-2296). [5354637] https://doi.org/10.1109/IROS.2009.5354637

@inproceedings{0f7a75ed028543a5a1acb7604a261457,
title = "Incremental polyphonic audio to score alignment using beat tracking for singer robots",
abstract = "We aim at developing a singer robot capable of listening to music with its own {"}ears{"} and interacting with a human's musical performance. Such a singer robot requires at least three functions: listening to the music, understanding what position in the music is being performed, and generating a singing voice. In this paper, we focus on the second function, that is, the capability to align an audio signal to its musical score represented symbolically. Issues underlying the score alignment problem are: (1) diversity in the sounds of various musical instruments, (2) difference between the audio signal and the musical score, (3) fluctuation in tempo of the musical performance. Our solutions to these issues are as follows: (1) the design of features based on a chroma vector in the 12-tone model and onset of the sound, (2) defining the rareness for each tone based on the idea that scarcely used tone is salient in the audio signal, and (3) the use of a switching Kalman filter for robust tempo estimation. The experimental result shows that our score alignment method improves the average of cumulative absolute errors in score alignment by 29{\%} using 100 popular music tunes compared to the beat tracking without score alignment.",
author = "Takuma Otsuka and Kazumasa Murata and Kazuhiro Nakadai and Toru Takahashi and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2009",
month = "12",
day = "11",
doi = "10.1109/IROS.2009.5354637",
language = "English",
isbn = "9781424438044",
pages = "2289--2296",
booktitle = "2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009",

}

TY - GEN

T1 - Incremental polyphonic audio to score alignment using beat tracking for singer robots

AU - Otsuka, Takuma

AU - Murata, Kazumasa

AU - Nakadai, Kazuhiro

AU - Takahashi, Toru

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2009/12/11

Y1 - 2009/12/11

AB - We aim at developing a singer robot capable of listening to music with its own "ears" and interacting with a human's musical performance. Such a singer robot requires at least three functions: listening to the music, understanding what position in the music is being performed, and generating a singing voice. In this paper, we focus on the second function, that is, the capability to align an audio signal to its musical score represented symbolically. Issues underlying the score alignment problem are: (1) diversity in the sounds of various musical instruments, (2) difference between the audio signal and the musical score, (3) fluctuation in tempo of the musical performance. Our solutions to these issues are as follows: (1) the design of features based on a chroma vector in the 12-tone model and onset of the sound, (2) defining the rareness for each tone based on the idea that scarcely used tone is salient in the audio signal, and (3) the use of a switching Kalman filter for robust tempo estimation. The experimental result shows that our score alignment method improves the average of cumulative absolute errors in score alignment by 29% using 100 popular music tunes compared to the beat tracking without score alignment.

UR - http://www.scopus.com/inward/record.url?scp=76249090933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76249090933&partnerID=8YFLogxK

U2 - 10.1109/IROS.2009.5354637

DO - 10.1109/IROS.2009.5354637

M3 - Conference contribution

AN - SCOPUS:76249090933

SN - 9781424438044

SP - 2289

EP - 2296

BT - 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009

ER -