TY - GEN
T1 - Comparison of several acoustic features for the vowel sequence reproduction of a talking robot
AU - Thanh, Vo Nhu
AU - Sawada, Hideyuki
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/9/1
Y1 - 2016/9/1
AB - This study compares several acoustic features for developing an automatic vowel sequence reproduction system for a talking robot, a mechanical vocalization system that models the human articulatory system. A Matlab-based control system analyzes a recorded sound and drives the articulatory motors of the talking robot. A novel method based on short-time energy analysis extracts the human speech from the recording and segments it into a sequence of sound elements for vowel sequence reproduction. Several phoneme detection methods, namely direct cross-correlation analysis, linear predictive coding (LPC) association, partial correlation (PARCOR) coefficient analysis, and formant frequency comparison, are then applied to each sound element to generate the correct command for the talking robot to repeat the sounds in order. Finally, experiments are performed to compare these techniques and verify the working behavior of the robot. The test results indicate that the robot can repeat a sequence of vowels spoken by a human with a success rate of more than 70% using either the PARCOR analysis technique or the formant frequency comparison technique. The formant comparison method gives the greatest accuracy in repeating the sequence, while the direct cross-correlation method gives the least.
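The abstract names PARCOR coefficient analysis as one of the two techniques exceeding a 70% success rate. As a rough illustration only, the sketch below computes PARCOR (reflection) coefficients of one frame with the Levinson-Durbin recursion on the biased autocorrelation, then labels the frame by the nearest per-vowel template in Euclidean distance. The paper's system is Matlab-based, and its analysis order, windowing, and matching rule are not stated in this record; the function names `parcor`, `nearest_vowel`, and `synthetic_ar1`, the template dictionary, and the AR(1) test signal are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def parcor(frame: Sequence[float], order: int) -> List[float]:
    """PARCOR (reflection) coefficients k_1..k_order via Levinson-Durbin.

    Assumed convention: prediction-error filter A(z) = 1 + sum_j a_j z^-j,
    and k_i is the value a_i takes at step i. Some texts flip the sign.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame: nothing to analyze
    a = [1.0] + [0.0] * order         # a[0] stays fixed at 1
    err = r[0]                        # current prediction-error power
    ks: List[float] = []
    for i in range(1, order + 1):
        if err <= 1e-12 * r[0]:       # frame already (numerically) perfectly predicted
            ks.extend([0.0] * (order - len(ks)))
            break
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        ks.append(k)
        prev = a[:]                   # coefficients of the order-(i-1) predictor
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return ks


def nearest_vowel(frame: Sequence[float],
                  templates: Dict[str, Sequence[float]], order: int) -> str:
    """Hypothetical matcher: label whose PARCOR template is closest (Euclidean)."""
    k = parcor(frame, order)
    return min(templates, key=lambda label: math.dist(k, templates[label]))


def synthetic_ar1(phi: float, n: int, seed: int) -> List[float]:
    """Toy AR(1) signal x[t] = phi * x[t-1] + white noise; not real speech."""
    rng = random.Random(seed)
    x: List[float] = []
    prev = 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x
```

As a sanity check, for an AR(1) signal with coefficient 0.9 the first coefficient comes out near -0.9 under this sign convention and the higher-order ones near zero. Because the autocorrelation method keeps every |k_i| below 1, PARCOR vectors stay bounded, which makes them convenient to compare against stored templates.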
KW - Talking-robot
KW - acoustic features
KW - cross correlation
KW - vowel sequence
UR - http://www.scopus.com/inward/record.url?scp=84991229907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991229907&partnerID=8YFLogxK
U2 - 10.1109/ICMA.2016.7558722
DO - 10.1109/ICMA.2016.7558722
M3 - Conference contribution
AN - SCOPUS:84991229907
T3 - 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
SP - 1137
EP - 1142
BT - 2016 IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th IEEE International Conference on Mechatronics and Automation, IEEE ICMA 2016
Y2 - 7 August 2016 through 10 August 2016
ER -