Perceptual similarity measurement of speech by combination of acoustic features

Yoshihiro Adachi, Shinichi Kawamoto, Shigeo Morishima, Satoshi Nakamura

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    10 Citations (Scopus)

    Abstract

    The Future Cast System is a new entertainment system in which a participant's face is captured and rendered into a movie as an instant computer graphics (CG) movie star; it was first exhibited at the 2005 World Exposition in Aichi, Japan. We are working to add new functionality that maps not only faces but also speech individualities to the cast. Our approach is to find a speaker with the closest speech individuality and apply voice conversion. This paper investigates acoustic features for estimating the perceptual similarity of speech individuality. We propose a method that linearly combines eight acoustic features related to the perception of speech individuality. The proposed method optimizes the weights of the acoustic features with respect to perceptual similarities. We evaluated the method using Spearman's rank correlation coefficient against perceptual similarities. The experiments show that the proposed method achieves a correlation coefficient of 0.66.
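The combination and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the eight features or the weight-optimization procedure, so everything below is an assumption made for illustration: the function names, the three made-up per-feature distances per speaker pair, the hand-picked weights, and the perceptual scores are all hypothetical. Only the two general ideas (a weighted linear combination and Spearman's rank correlation) come from the abstract.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def combined_distance(feature_distances, weights):
    """Weighted linear combination of per-feature distances for one pair."""
    return sum(w * d for w, d in zip(weights, feature_distances))


if __name__ == "__main__":
    # Hypothetical data: 4 speaker pairs, 3 features each (the paper uses 8).
    pair_distances = [
        [0.2, 1.0, 0.5],
        [0.9, 0.3, 0.8],
        [0.4, 0.6, 0.1],
        [1.2, 1.1, 0.9],
    ]
    weights = [0.5, 0.3, 0.2]          # hand-picked, not optimized
    perceptual = [4.0, 2.5, 4.5, 1.0]  # made-up listener similarity scores

    # A smaller distance should mean a higher similarity, so negate it.
    predicted = [-combined_distance(d, weights) for d in pair_distances]
    print("Spearman correlation:", spearman(predicted, perceptual))
```

In this sketch the weights are fixed by hand; an optimization step would search for the weights that maximize the correlation with the perceptual scores, but how the authors do that is not stated in the abstract.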

    Original language: English
    Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
    Pages: 4861-4864
    Number of pages: 4
    DOI: https://doi.org/10.1109/ICASSP.2008.4518746
    Publication status: Published - 2008
    Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV
    Duration: 2008 Mar 31 to 2008 Apr 4

    Other

    Other: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
    City: Las Vegas, NV
    Period: 08/3/31 to 08/4/4

    Keywords

    • Acoustic correlators
    • Speaker recognition
    • Speech analysis

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Signal Processing
    • Acoustics and Ultrasonics

    Cite this

    Adachi, Y., Kawamoto, S., Morishima, S., & Nakamura, S. (2008). Perceptual similarity measurement of speech by combination of acoustic features. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 4861-4864). [4518746] https://doi.org/10.1109/ICASSP.2008.4518746

    @inproceedings{e9aeeb836344401e946b3c9302a861cc,
    title = "Perceptual similarity measurement of speech by combination of acoustic features",
    abstract = "The Future Cast System is a new entertainment system in which a participant's face is captured and rendered into a movie as an instant computer graphics (CG) movie star; it was first exhibited at the 2005 World Exposition in Aichi, Japan. We are working to add new functionality that maps not only faces but also speech individualities to the cast. Our approach is to find a speaker with the closest speech individuality and apply voice conversion. This paper investigates acoustic features for estimating the perceptual similarity of speech individuality. We propose a method that linearly combines eight acoustic features related to the perception of speech individuality. The proposed method optimizes the weights of the acoustic features with respect to perceptual similarities. We evaluated the method using Spearman's rank correlation coefficient against perceptual similarities. The experiments show that the proposed method achieves a correlation coefficient of 0.66.",
    keywords = "Acoustic correlators, Speaker recognition, Speech analysis",
    author = "Yoshihiro Adachi and Shinichi Kawamoto and Shigeo Morishima and Satoshi Nakamura",
    year = "2008",
    doi = "10.1109/ICASSP.2008.4518746",
    language = "English",
    isbn = "1424414849",
    pages = "4861--4864",
    booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

    }

    TY - GEN

    T1 - Perceptual similarity measurement of speech by combination of acoustic features

    AU - Adachi, Yoshihiro

    AU - Kawamoto, Shinichi

    AU - Morishima, Shigeo

    AU - Nakamura, Satoshi

    PY - 2008

    Y1 - 2008

    AB - The Future Cast System is a new entertainment system in which a participant's face is captured and rendered into a movie as an instant computer graphics (CG) movie star; it was first exhibited at the 2005 World Exposition in Aichi, Japan. We are working to add new functionality that maps not only faces but also speech individualities to the cast. Our approach is to find a speaker with the closest speech individuality and apply voice conversion. This paper investigates acoustic features for estimating the perceptual similarity of speech individuality. We propose a method that linearly combines eight acoustic features related to the perception of speech individuality. The proposed method optimizes the weights of the acoustic features with respect to perceptual similarities. We evaluated the method using Spearman's rank correlation coefficient against perceptual similarities. The experiments show that the proposed method achieves a correlation coefficient of 0.66.

    KW - Acoustic correlators

    KW - Speaker recognition

    KW - Speech analysis

    UR - http://www.scopus.com/inward/record.url?scp=51449118243&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=51449118243&partnerID=8YFLogxK

    U2 - 10.1109/ICASSP.2008.4518746

    DO - 10.1109/ICASSP.2008.4518746

    M3 - Conference contribution

    SN - 1424414849

    SN - 9781424414840

    SP - 4861

    EP - 4864

    BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

    ER -