Transcribing vocal expression from polyphonic music

Yukara Ikemiya, Katsutoshi Itoyama, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

A method is described for transcribing vocal expressions such as vibrato, glissando, and kobushi separately from polyphonic music. These expressions appear as fluctuations in the fundamental frequency (F0) contour of the singing voice. Because they strongly reflect the individuality of the singer, they can be used for music search and retrieval and for expressive singing voice synthesis based on singing style. First, the F0 contour of the singing voice is estimated using the Viterbi algorithm, constrained by the corresponding note sequence. Next, the notes are temporally aligned with the F0 sequence. Finally, each expression is identified and parameterized according to designed rules. Experiments demonstrated that the method can transcribe expressions in the singing voice from commercial recordings.
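The first step in the abstract, Viterbi-based F0 estimation restricted by a note sequence, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not give the salience function, the state space, or the transition model, so everything below is an assumption made for illustration: frequency is quantized into bins, `salience[t][b]` is a hypothetical per-frame score for bin `b`, `note_bins[t]` is the bin of the score note active at frame `t`, `max_dev` limits how far the F0 may stray from that note, and `jump_penalty` discourages large frame-to-frame jumps.

```python
def viterbi_f0(salience, note_bins, max_dev=2, jump_penalty=0.5):
    """Return one frequency-bin index per frame (the best-scoring path).

    Assumed model (not taken from the paper): the path score is the sum of
    per-frame salience minus jump_penalty * |bin change| between frames,
    and only bins within max_dev of the current note are allowed.
    """
    n_frames = len(salience)
    n_bins = len(salience[0])
    neg_inf = float("-inf")

    def allowed(t, b):
        # Note constraint: stay close to the note expected at frame t.
        return abs(b - note_bins[t]) <= max_dev

    score = [[neg_inf] * n_bins for _ in range(n_frames)]
    back = [[0] * n_bins for _ in range(n_frames)]

    # Initialization: only allowed bins can start a path.
    for b in range(n_bins):
        if allowed(0, b):
            score[0][b] = salience[0][b]

    # Recursion: best predecessor for every allowed bin in every frame.
    for t in range(1, n_frames):
        for b in range(n_bins):
            if not allowed(t, b):
                continue
            best_prev, best_val = 0, neg_inf
            for p in range(n_bins):
                if score[t - 1][p] == neg_inf:
                    continue
                val = score[t - 1][p] - jump_penalty * abs(b - p)
                if val > best_val:
                    best_prev, best_val = p, val
            if best_val > neg_inf:
                score[t][b] = best_val + salience[t][b]
                back[t][b] = best_prev

    # Backtracking from the best final bin.
    last = max(range(n_bins), key=lambda b: score[-1][b])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    path.reverse()
    return path


# Toy input: bin 5 carries a strong accompaniment peak in every frame,
# while the sung note sits at bin 2 and the voice wanders around it.
salience = [
    [0.0, 0.1, 0.6, 0.2, 0.0, 1.0],
    [0.0, 0.1, 0.3, 0.6, 0.0, 1.0],
    [0.0, 0.1, 0.6, 0.2, 0.0, 1.0],
    [0.0, 0.5, 0.3, 0.1, 0.0, 1.0],
]
note_bins = [2, 2, 2, 2]
path = viterbi_f0(salience, note_bins, max_dev=1, jump_penalty=0.1)
```

In this toy case the note constraint keeps the path away from the louder accompaniment peak at bin 5 while still letting it follow the small deviations around the note, which is the kind of fluctuation the later steps would examine for vibrato, glissando, or kobushi.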

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3127-3131
Number of pages: 5
ISBN (Print): 9781479928927
DOI: https://doi.org/10.1109/ICASSP.2014.6854176
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence
Duration: 2014 May 4 to 2014 May 9

Other

Other: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
City: Florence
Period: 14/5/4 to 14/5/9

Fingerprint

Viterbi algorithm
Speech synthesis
Experiments

Keywords

  • F0 estimation
  • Singing voice analysis
  • Vocal expression identification / transcription

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Ikemiya, Y., Itoyama, K., & Okuno, H. G. (2014). Transcribing vocal expression from polyphonic music. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 3127-3131). [6854176] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854176

Transcribing vocal expression from polyphonic music. / Ikemiya, Yukara; Itoyama, Katsutoshi; Okuno, Hiroshi G.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2014. p. 3127-3131 6854176.

Ikemiya, Y, Itoyama, K & Okuno, HG 2014, Transcribing vocal expression from polyphonic music. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings., 6854176, Institute of Electrical and Electronics Engineers Inc., pp. 3127-3131, 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, 14/5/4. https://doi.org/10.1109/ICASSP.2014.6854176
Ikemiya Y, Itoyama K, Okuno HG. Transcribing vocal expression from polyphonic music. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2014. p. 3127-3131. 6854176 https://doi.org/10.1109/ICASSP.2014.6854176
Ikemiya, Yukara ; Itoyama, Katsutoshi ; Okuno, Hiroshi G. / Transcribing vocal expression from polyphonic music. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 3127-3131
@inproceedings{b8a0a695870f4521b7dcc46c26419ee7,
title = "Transcribing vocal expression from polyphonic music",
abstract = "A method for transcribing vocal expressions such as vibrato, glissando, and kobushi separately from polyphonic music is described. The expressions appear as fluctuation in the fundamental frequency contour of the singing voice. They can be used for search and retrieval of music and for expressive singing voice synthesis based on singing style since they strongly reflect the individuality of the singer. The fundamental frequency contour of the singing voice is estimated using the Viterbi algorithm with limitation from a corresponding note sequence. Next, the notes are aligned with the fundamental frequency sequence temporally. Finally, each expression is identified and parameterized in accordance with designed rules. Experiments demonstrated that this method can transcribe expressions in the singing voice from commercial recordings.",
keywords = "F0 estimation, Singing voice analysis, Vocal expression identification / transcription",
author = "Yukara Ikemiya and Katsutoshi Itoyama and Okuno, {Hiroshi G.}",
year = "2014",
doi = "10.1109/ICASSP.2014.6854176",
language = "English",
isbn = "9781479928927",
pages = "3127--3131",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Transcribing vocal expression from polyphonic music

AU - Ikemiya, Yukara

AU - Itoyama, Katsutoshi

AU - Okuno, Hiroshi G.

PY - 2014

Y1 - 2014

AB - A method for transcribing vocal expressions such as vibrato, glissando, and kobushi separately from polyphonic music is described. The expressions appear as fluctuation in the fundamental frequency contour of the singing voice. They can be used for search and retrieval of music and for expressive singing voice synthesis based on singing style since they strongly reflect the individuality of the singer. The fundamental frequency contour of the singing voice is estimated using the Viterbi algorithm with limitation from a corresponding note sequence. Next, the notes are aligned with the fundamental frequency sequence temporally. Finally, each expression is identified and parameterized in accordance with designed rules. Experiments demonstrated that this method can transcribe expressions in the singing voice from commercial recordings.

KW - F0 estimation

KW - Singing voice analysis

KW - Vocal expression identification / transcription

UR - http://www.scopus.com/inward/record.url?scp=84905276862&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905276862&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2014.6854176

DO - 10.1109/ICASSP.2014.6854176

M3 - Conference contribution

AN - SCOPUS:84905276862

SN - 9781479928927

SP - 3127

EP - 3131

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -