Discrimination of “hot potato voice” caused by upper airway obstruction utilizing a support vector machine

Shintaro Fujimura, Tsuyoshi Kojima, Yusuke Okanoue, Kazuhiko Shoji, Masato Inoue, Ryusuke Hori

    Research output: Contribution to journal › Article

    Abstract

    Objectives/Hypothesis: “Hot potato voice” (HPV) is a thick, muffled voice caused by pharyngeal or laryngeal diseases characterized by severe upper airway obstruction, including acute epiglottitis and peritonsillitis. To develop a method for detecting upper airway emergencies based on this important vocal feature, we investigated the acoustic characteristics of HPV using a physical, articulatory speech synthesis model. The results of the simulation were then applied to design a computerized recognition framework using a support vector machine (SVM) operating in the mel-frequency cepstral coefficient domain.

    Study Design: Quasi-experimental research design.

    Methods: Changes in the voice spectral envelope caused by upper airway obstructions were analyzed using a hybrid time-frequency model of articulatory speech synthesis. We evaluated variations in the formant structure and thresholds of critical vocal tract area functions that triggered HPV. The SVMs were trained using a dataset of 2,200 synthetic voice samples generated by an articulatory synthesizer. Voice classification experiments on test datasets of real patient voices were then performed.

    Results: On phonation of the Japanese vowel /e/, the frequency of the second formant fell and coalesced with that of the first formant as the area function of the oropharynx decreased. Changes in higher-order formants varied according to constriction location. The highest accuracy afforded by the SVM classifier trained with synthetic data was 88.3%.

    Conclusions: HPV caused by upper airway obstruction has a highly characteristic spectral envelope. Based on this distinctive voice feature, our SVM classifier, which was trained using synthetic data, was able to diagnose upper airway obstructions with a high degree of accuracy.

    Level of Evidence: 2c. Laryngoscope, 2018.
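    The train-on-synthetic, test-on-separate-data workflow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-dimensional feature vectors are hypothetical stand-ins for mel-frequency cepstral coefficients, the cluster centres and spreads are invented, and a plain linear SVM fitted by subgradient descent on the hinge loss is swapped in because the abstract does not state the kernel, hyperparameters, or software the authors used.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, epochs=100, lam=0.01, lr=0.1):
    """Fit (w, b) by subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. The values of lam and lr are illustrative
    choices, not taken from the paper.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (dot(w, x) + b)
            # The regulariser shrinks w slightly on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # The hinge loss contributes only when the margin is violated.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 (obstructed) or -1 (normal) for one feature vector."""
    return 1 if dot(w, x) + b >= 0.0 else -1


def make_cluster(rng, centre, spread, n):
    """Draw n Gaussian feature vectors around a centre (hypothetical data)."""
    return [[rng.gauss(c, spread) for c in centre] for _ in range(n)]


rng = random.Random(0)

# "Synthetic" training set: hypothetical stand-ins for MFCC vectors.
train_x = (make_cluster(rng, [1.0, 1.0, 0.0], 0.3, 50)       # normal voice
           + make_cluster(rng, [-1.0, -1.0, 0.0], 0.3, 50))  # obstructed voice
train_y = [-1] * 50 + [1] * 50

# Separate test set with slightly shifted centres, mimicking the gap
# between synthesizer output and recorded voices.
test_x = (make_cluster(rng, [0.8, 1.1, 0.1], 0.3, 20)
          + make_cluster(rng, [-0.9, -0.8, -0.1], 0.3, 20))
test_y = [-1] * 20 + [1] * 20

w, b = train_linear_svm(train_x, train_y)
correct = sum(predict(w, b, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"test accuracy: {accuracy:.3f}")
```

    The toy clusters are far easier to separate than real voice data, so the accuracy printed here says nothing about the 88.3% reported in the abstract; the sketch only shows the shape of the pipeline, with the classifier never seeing the test distribution during training.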

    Original language: English
    Journal: Laryngoscope
    DOI: 10.1002/lary.27584
    Publication status: Accepted/In press - 2018 Jan 1

    Fingerprint

    Airway Obstruction
    Solanum tuberosum
    Pharyngeal Diseases
    Research Design
    Laryngeal Diseases
    Support Vector Machine
    Epiglottitis
    Laryngoscopes
    Phonation
    Oropharynx
    Acoustics
    Constriction
    Emergencies

    Keywords

    • articulatory speech synthesis
    • Hot potato voice
    • support vector machine
    • upper airway obstruction

    ASJC Scopus subject areas

    • Otorhinolaryngology

    Cite this

    Discrimination of “hot potato voice” caused by upper airway obstruction utilizing a support vector machine. / Fujimura, Shintaro; Kojima, Tsuyoshi; Okanoue, Yusuke; Shoji, Kazuhiko; Inoue, Masato; Hori, Ryusuke.

    In: Laryngoscope, 01.01.2018.

    @article{cb01a417b49a4f538f945bdbf3a9949f,
    title = "Discrimination of “hot potato voice” caused by upper airway obstruction utilizing a support vector machine",
    keywords = "articulatory speech synthesis, Hot potato voice, support vector machine, upper airway obstruction",
    author = "Shintaro Fujimura and Tsuyoshi Kojima and Yusuke Okanoue and Kazuhiko Shoji and Masato Inoue and Ryusuke Hori",
    year = "2018",
    month = "1",
    day = "1",
    doi = "10.1002/lary.27584",
    language = "English",
    journal = "Laryngoscope",
    issn = "0023-852X",
    publisher = "John Wiley and Sons Inc.",

    }

    TY - JOUR

    T1 - Discrimination of “hot potato voice” caused by upper airway obstruction utilizing a support vector machine

    AU - Fujimura, Shintaro

    AU - Kojima, Tsuyoshi

    AU - Okanoue, Yusuke

    AU - Shoji, Kazuhiko

    AU - Inoue, Masato

    AU - Hori, Ryusuke

    PY - 2018/1/1

    Y1 - 2018/1/1

    KW - articulatory speech synthesis

    KW - Hot potato voice

    KW - support vector machine

    KW - upper airway obstruction

    UR - http://www.scopus.com/inward/record.url?scp=85057800413&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85057800413&partnerID=8YFLogxK

    U2 - 10.1002/lary.27584

    DO - 10.1002/lary.27584

    M3 - Article

    JO - Laryngoscope

    JF - Laryngoscope

    SN - 0023-852X

    ER -