Hybrid voice conversion of unit selection and generation using prosody dependent HMM

Tadashi Okubo, Ryo Mochizuki, Tetsunori Kobayashi

    Research output: Contribution to journal › Article

    6 Citations (Scopus)

    Abstract

    We propose a hybrid voice conversion method that combines HMM-based unit selection with HMM-based spectrum generation. When candidates for the target unit exist in the target speaker's corpus, HMM-based unit selection picks the most likely unit for the required phoneme context, scoring candidates against the sequence of spectral probability distributions obtained from the adapted HMMs. When no candidate for the target unit exists in the corpus, the target waveform is instead generated from the adapted HMM sequence by maximizing the spectral likelihood. The method also uses HMMs whose spectral probability distributions are adjusted to the target prosody, with each distribution weighted by its prosodic probability. To evaluate the method, sound quality and speaker individuality tests were conducted. The results showed that the proposed method could produce high-quality speech and that the speaker individuality of the synthesized speech was closer to the target speaker than with conventional methods.
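
    The abstract gives no formulas, so the following is only a minimal sketch of the select-or-generate decision and of one possible reading of the prosody weighting. Everything in it is an assumption made for illustration: diagonal Gaussians with one distribution per frame, candidates already time-aligned to the HMM state sequence, no dynamic features (so the likelihood-maximizing frames are simply the means), a single F0 value as the prosodic feature, and the hypothetical names gauss_logpdf, log_likelihood, convert_unit and prosody_weights.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood(frames, means, variances):
    """Spectral log-likelihood of a unit: one diagonal Gaussian per frame (assumed)."""
    total = 0.0
    for frame, mean, var in zip(frames, means, variances):
        for x, m, v in zip(frame, mean, var):
            total += gauss_logpdf(x, m, v)
    return total

def convert_unit(candidates, means, variances):
    """Select the most likely corpus unit if any exist, otherwise generate one."""
    if candidates:
        best = max(candidates, key=lambda c: log_likelihood(c, means, variances))
        return "selected", best
    # Assumed simplification: with no dynamic features, the frames that
    # maximize the Gaussian likelihood are the means themselves.
    return "generated", [list(m) for m in means]

def prosody_weights(target_f0, prosody_models):
    """Normalized weight per distribution, proportional to the likelihood of the
    target prosody under that distribution's (mean, variance) prosody model."""
    logs = [gauss_logpdf(target_f0, m, v) for m, v in prosody_models]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]
```

    Calling convert_unit with a non-empty list of candidate units returns the candidate with the highest spectral log-likelihood, while an empty list falls back to generation from the distribution means. The weights returned by prosody_weights sum to one and could be used to mix the spectral distributions toward those whose prosody model best matches the target.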

    Original language: English
    Pages (from-to): 2775-2782
    Number of pages: 8
    Journal: IEICE Transactions on Information and Systems
    Volume: E89-D
    Issue number: 11
    DOI: 10.1093/ietisy/e89-d.11.2775
    Publication status: Published - Nov 2006

    Fingerprint

    Probability distributions
    Acoustic waves

    Keywords

    • HMM
    • MLLR
    • Speech synthesis
    • Unit selection
    • Voice conversion

    ASJC Scopus subject areas

    • Information Systems
    • Computer Graphics and Computer-Aided Design
    • Software

    Cite this

    Hybrid voice conversion of unit selection and generation using prosody dependent HMM. / Okubo, Tadashi; Mochizuki, Ryo; Kobayashi, Tetsunori.

    In: IEICE Transactions on Information and Systems, Vol. E89-D, No. 11, 11.2006, p. 2775-2782.

    @article{e617fe8b8b3b4af3ba3ba4589c749903,
    title = "Hybrid voice conversion of unit selection and generation using prosody dependent HMM",
    abstract = "We propose a hybrid voice conversion method which employs a combination of techniques using HMM-based unit selection and spectrum generation. In the proposed method, the HMM-based unit selection selects the most likely unit for the required phoneme context from the target speaker's corpus when candidates of the target unit exist in the corpus. Unit selection is performed based on the sequence of the spectral probability distribution obtained from the adapted HMMs. On the other hand, when a target unit does not exist in a corpus, a target waveform is generated from the adapted HMM sequence by maximizing the spectral likelihood. The proposed method also employs the HMM in which the spectral probability distribution is adjusted to the target prosody using the weight defined by the prosodic probability of each distribution. To show the effectiveness of the proposed method, sound quality and speaker individuality tests were conducted. The results revealed that the proposed method could produce high-quality speech and individuality of the synthesized sound was more similar to the target speaker compared to conventional methods.",
    keywords = "HMM, MLLR, Speech synthesis, Unit selection, Voice conversion",
    author = "Tadashi Okubo and Ryo Mochizuki and Tetsunori Kobayashi",
    year = "2006",
    month = "11",
    doi = "10.1093/ietisy/e89-d.11.2775",
    language = "English",
    volume = "E89-D",
    pages = "2775--2782",
    journal = "IEICE Transactions on Information and Systems",
    issn = "0916-8532",
    publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
    number = "11",

    }

    TY - JOUR

    T1 - Hybrid voice conversion of unit selection and generation using prosody dependent HMM

    AU - Okubo, Tadashi

    AU - Mochizuki, Ryo

    AU - Kobayashi, Tetsunori

    PY - 2006/11

    Y1 - 2006/11

    AB - We propose a hybrid voice conversion method which employs a combination of techniques using HMM-based unit selection and spectrum generation. In the proposed method, the HMM-based unit selection selects the most likely unit for the required phoneme context from the target speaker's corpus when candidates of the target unit exist in the corpus. Unit selection is performed based on the sequence of the spectral probability distribution obtained from the adapted HMMs. On the other hand, when a target unit does not exist in a corpus, a target waveform is generated from the adapted HMM sequence by maximizing the spectral likelihood. The proposed method also employs the HMM in which the spectral probability distribution is adjusted to the target prosody using the weight defined by the prosodic probability of each distribution. To show the effectiveness of the proposed method, sound quality and speaker individuality tests were conducted. The results revealed that the proposed method could produce high-quality speech and individuality of the synthesized sound was more similar to the target speaker compared to conventional methods.

    KW - HMM

    KW - MLLR

    KW - Speech synthesis

    KW - Unit selection

    KW - Voice conversion

    UR - http://www.scopus.com/inward/record.url?scp=33845586220&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33845586220&partnerID=8YFLogxK

    U2 - 10.1093/ietisy/e89-d.11.2775

    DO - 10.1093/ietisy/e89-d.11.2775

    M3 - Article

    VL - E89-D

    SP - 2775

    EP - 2782

    JO - IEICE Transactions on Information and Systems

    JF - IEICE Transactions on Information and Systems

    SN - 0916-8532

    IS - 11

    ER -