Comparison of several acoustic features for the vowel sequence reproduction of a talking robot

Vo Nhu Thanh, Hideyuki Sawada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study compares several acoustic features for developing an automatic vowel sequence reproduction system for a talking robot, a mechanical vocalization system that models the human articulatory system. A Matlab-based control system analyzes a recorded sound and drives the articulatory motors of the talking robot. A novel method based on short-time energy analysis extracts human speech and translates it into a sequence of sound elements for vowel sequence reproduction. Several phoneme detection methods, including direct cross-correlation analysis, linear predictive coding (LPC) association, partial correlation (PARCOR) coefficient analysis, and formant frequency comparison, are then applied to each sound element to give the correct command for the talking robot to repeat the sounds sequentially. Finally, experiments are performed to compare these techniques and verify the working behavior of the robot. The test results indicate that the robot is able to repeat a sequence of vowels spoken by a human with a success rate of more than 70% for the PARCOR analysis technique and the formant frequency comparison technique. The formant comparison method gives the greatest accuracy for repeating the sequence, while the direct cross-correlation method delivers the least.
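The two signal-processing steps named in the abstract, short-time energy segmentation and LPC/PARCOR analysis, can be illustrated with a minimal Python sketch. This is not the authors' Matlab implementation: the function names, the frame length, the hop size, and the relative threshold `rel_threshold` are all assumptions chosen for illustration. The PARCOR values are taken as the reflection coefficients of the Levinson-Durbin recursion; sign conventions for these coefficients vary between texts.

```python
def short_time_energy(signal, frame_len, hop):
    """Energy (sum of squared samples) of each full frame of the signal."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def segment_by_energy(signal, frame_len=200, hop=100, rel_threshold=0.1):
    """Return (start, end) sample ranges whose frame energy exceeds a threshold.

    rel_threshold is a hypothetical tuning parameter, expressed as a
    fraction of the peak frame energy.
    """
    energy = short_time_energy(signal, frame_len, hop)
    if not energy:
        return []
    threshold = rel_threshold * max(energy)
    segments, start = [], None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i * hop                      # first loud frame opens a segment
        elif e <= threshold and start is not None:
            segments.append((start, (i - 1) * hop + frame_len))
            start = None                         # first quiet frame closes it
    if start is not None:                        # signal ended while still loud
        segments.append((start, (len(energy) - 1) * hop + frame_len))
    return segments


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no normalization)."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (lpc, parcor): predictor coefficients a[1..order] for the model
    x[n] ~ sum_j a[j] * x[n - j], and the reflection coefficients k[1..order].
    """
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation sequence is not positive definite")
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] * k[i]                 # prediction error shrinks each order
    return a[1:], k[1:]
```

In a pipeline of this kind, each range returned by `segment_by_energy` would be passed through `autocorrelation` and `levinson_durbin`, and the resulting coefficient vector compared against stored vowel templates, for example by Euclidean distance. The abstract does not specify the matching rule, so that step is an assumption as well.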

Original language: English
Title of host publication: 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1137-1142
Number of pages: 6
ISBN (Electronic): 9781509023943
DOIs: https://doi.org/10.1109/ICMA.2016.7558722
Publication status: Published - 2016 Sep 1
Externally published: Yes
Event: 13th IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016 - Harbin, Heilongjiang, China
Duration: 2016 Aug 7 → 2016 Aug 10

Other

Other: 13th IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
Country: China
City: Harbin, Heilongjiang
Period: 16/8/7 → 16/8/10

Fingerprint

Acoustics
Robots
Acoustic waves
Correlation methods
Control systems
Experiments

Keywords

  • acoustic features
  • cross correlation
  • Talking-robot
  • vowel sequence

ASJC Scopus subject areas

  • Mechanical Engineering
  • Artificial Intelligence
  • Computer Science Applications
  • Software

Cite this

Thanh, V. N., & Sawada, H. (2016). Comparison of several acoustic features for the vowel sequence reproduction of a talking robot. In 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016 (pp. 1137-1142). [7558722] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICMA.2016.7558722

@inproceedings{b3ce6a8b016a4bc0829a22d02d72f20e,
title = "Comparison of several acoustic features for the vowel sequence reproduction of a talking robot",
keywords = "acoustic features, cross correlation, Talking-robot, vowel sequence",
author = "Thanh, {Vo Nhu} and Sawada, Hideyuki",
year = "2016",
month = "9",
day = "1",
doi = "10.1109/ICMA.2016.7558722",
language = "English",
pages = "1137--1142",
booktitle = "2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Comparison of several acoustic features for the vowel sequence reproduction of a talking robot

AU - Thanh, Vo Nhu

AU - Sawada, Hideyuki

PY - 2016/9/1

Y1 - 2016/9/1

KW - acoustic features

KW - cross correlation

KW - Talking-robot

KW - vowel sequence

UR - http://www.scopus.com/inward/record.url?scp=84991229907&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991229907&partnerID=8YFLogxK

U2 - 10.1109/ICMA.2016.7558722

DO - 10.1109/ICMA.2016.7558722

M3 - Conference contribution

AN - SCOPUS:84991229907

SP - 1137

EP - 1142

BT - 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -