F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

26 Citations (Scopus)

Abstract

This paper describes a method for estimating the F0s of the vocal part from polyphonic audio signals. Because the melody is sung by a singer in many musical pieces, estimating the F0s of the vocal part is useful for many applications. Building on an existing multiple-F0 estimation method, we evaluate the vocal probability of the harmonic structure of each F0 candidate. To calculate these probabilities, we extract and resynthesize the harmonic structure using a sinusoidal model and then extract feature vectors. We then evaluate the vocal probability using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track F0 trajectories with a Viterbi search based on these probabilities. Experimental results show that our method improves estimation accuracy from 78.1% to 84.3%, a 28.3% reduction in misestimation.
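The final tracking step and the reported error reduction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the transition model, so the octave-distance penalty, the `jump_penalty` parameter, and the function names `viterbi_track` and `relative_error_reduction` are assumptions made for illustration. The per-candidate vocal log-probabilities, which in the paper come from the vocal and non-vocal GMMs, are simply passed in as numbers here.

```python
import math


def viterbi_track(candidates, vocal_logprob, jump_penalty=1.0):
    """Pick one F0 candidate per frame with a Viterbi search.

    candidates[t]       -- list of F0 candidates in Hz at frame t
    vocal_logprob[t][i] -- log vocal probability of candidates[t][i]
    jump_penalty        -- ASSUMED cost per octave of pitch change
                           between consecutive frames (not from the paper)
    """
    n_frames = len(candidates)
    if n_frames == 0:
        return []

    # score[t][j]: best cumulative score of any path ending in candidate j.
    score = [list(vocal_logprob[0])]
    # back[t][j]: index of the best predecessor of candidate j at frame t.
    back = [[-1] * len(candidates[0])]

    for t in range(1, n_frames):
        frame_scores, frame_back = [], []
        for j, f_cur in enumerate(candidates[t]):
            best_i, best = 0, -math.inf
            for i, f_prev in enumerate(candidates[t - 1]):
                # Hypothetical transition score: penalize large pitch jumps.
                trans = -jump_penalty * abs(math.log2(f_cur / f_prev))
                s = score[t - 1][i] + trans
                if s > best:
                    best, best_i = s, i
            frame_scores.append(best + vocal_logprob[t][j])
            frame_back.append(best_i)
        score.append(frame_scores)
        back.append(frame_back)

    # Backtrack from the best-scoring candidate in the last frame.
    j = max(range(len(score[-1])), key=score[-1].__getitem__)
    path = []
    for t in range(n_frames - 1, -1, -1):
        path.append(candidates[t][j])
        j = back[t][j]
    path.reverse()
    return path


def relative_error_reduction(acc_before, acc_after):
    """Fraction of the original error removed, with accuracies in percent."""
    return (acc_after - acc_before) / (100.0 - acc_before)


if __name__ == "__main__":
    # Toy data: the middle frame has a slightly more "vocal" octave error.
    cands = [[220.0, 440.0], [222.0, 110.0], [224.0, 440.0]]
    logp = [[math.log(0.6), math.log(0.4)],
            [math.log(0.4), math.log(0.6)],
            [math.log(0.6), math.log(0.4)]]
    print(viterbi_track(cands, logp, jump_penalty=5.0))
    print(round(100 * relative_error_reduction(78.1, 84.3), 1))
```

In the toy data, choosing the most probable candidate in each frame independently would jump down an octave in the middle frame, whereas the path search keeps the trajectory continuous. The second function reproduces the abstract's arithmetic: the error falls from 21.9% to 15.7%, and 6.2 / 21.9 is about 28.3%.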

Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 5
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse
Duration: 2006 May 14 to 2006 May 19

Fingerprint

audio signals
harmonics
estimating
trajectories
Statistical Models

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Fujihara, H., Kitahara, T., Goto, M., Komatani, K., Ogata, T., & Okuno, H. G. (2006). F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (Vol. 5). [1661260]

F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. / Fujihara, Hiromasa; Kitahara, Tetsuro; Goto, Masataka; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 5 2006. 1661260.

Fujihara, H, Kitahara, T, Goto, M, Komatani, K, Ogata, T & Okuno, HG 2006, F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. vol. 5, 1661260, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, Toulouse, 06/5/14.

Fujihara H, Kitahara T, Goto M, Komatani K, Ogata T, Okuno HG. F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 5. 2006. 1661260

Fujihara, Hiromasa ; Kitahara, Tetsuro ; Goto, Masataka ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G. / F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Vol. 5 2006.

@inproceedings{ae356ff9cf2146a3af2ccb19624836af,
  title = "F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and {V}iterbi search",
  abstract = "This paper describes a method for estimating the F0s of the vocal part from polyphonic audio signals. Because the melody is sung by a singer in many musical pieces, estimating the F0s of the vocal part is useful for many applications. Building on an existing multiple-F0 estimation method, we evaluate the vocal probability of the harmonic structure of each F0 candidate. To calculate these probabilities, we extract and resynthesize the harmonic structure using a sinusoidal model and then extract feature vectors. We then evaluate the vocal probability using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track F0 trajectories with a Viterbi search based on these probabilities. Experimental results show that our method improves estimation accuracy from 78.1{\%} to 84.3{\%}, a 28.3{\%} reduction in misestimation.",
  author = "Hiromasa Fujihara and Tetsuro Kitahara and Masataka Goto and Kazunori Komatani and Tetsuya Ogata and Hiroshi G. Okuno",
  year = "2006",
  language = "English",
  isbn = "142440469X",
  volume = "5",
  booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
}

TY - GEN
T1 - F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search
AU - Fujihara, Hiromasa
AU - Kitahara, Tetsuro
AU - Goto, Masataka
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2006
Y1 - 2006
AB - This paper describes a method for estimating the F0s of the vocal part from polyphonic audio signals. Because the melody is sung by a singer in many musical pieces, estimating the F0s of the vocal part is useful for many applications. Building on an existing multiple-F0 estimation method, we evaluate the vocal probability of the harmonic structure of each F0 candidate. To calculate these probabilities, we extract and resynthesize the harmonic structure using a sinusoidal model and then extract feature vectors. We then evaluate the vocal probability using vocal and non-vocal Gaussian mixture models (GMMs). Finally, we track F0 trajectories with a Viterbi search based on these probabilities. Experimental results show that our method improves estimation accuracy from 78.1% to 84.3%, a 28.3% reduction in misestimation.
UR - http://www.scopus.com/inward/record.url?scp=33947678880&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947678880&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33947678880
SN - 142440469X
SN - 9781424404698
VL - 5
BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ER -