TY - JOUR
T1 - Computer-assisted assessment of phonetic fluency in a second language
T2 - a longitudinal study of Japanese learners of French
AU - Detey, Sylvain
AU - Fontan, Lionel
AU - Le Coz, Maxime
AU - Jmel, Saïd
N1 - Funding Information:
This research has been supported by the Japanese Society for the Promotion of Science, Grant-in-Aid (B) n° 23320121 and n° 15H03227 to Sylvain Detey. We would like to thank Yuji Kawaguchi and Corentin Barcat from Tokyo University of Foreign Studies, Xavier Aumont from Archean Technologies, as well as the students who took part in the study. The work presented in this article has benefited from comments from the audience at the CAP-FIPF2017 (Kyoto, Japan, cf. Le Coz et al., 2017), FLORAL-IPFC2017 (Paris, France, cf. Fontan et al., 2017), Interspeech2018 (Hyderabad, India, cf. Fontan, Le Coz and Detey, 2018) and AFLS2018 (Toulouse, France, cf. Detey et al., 2018a) conferences, as well as from the very useful remarks and suggestions from three anonymous reviewers. We would like to thank them very warmly, as well as the Chief Editor and Subject Editor of this journal.
Publisher Copyright:
© 2020 Elsevier B.V.
PY - 2020/12
Y1 - 2020/12
N2 - Automatic second language (L2) speech fluency assessment has been one of the ultimate goals of several projects aiming to design Computer-Assisted Pronunciation Training (CAPT) tools for L2 learners. Usually, three challenges must be tackled in order to solve the issues at stake: 1) Defining fluency from a threefold interdisciplinary perspective (acoustic and perceptual phonetics, computer science, L2 education); 2) Using a cost-effective algorithm; 3) Testing the procedure with actual learners’ data. Despite rapid technical developments in the field of automatic speech processing, the tools which are actually available for learners are still scarce, and most of them rely on automatic speech recognition (ASR). Moreover, most research on the topic focuses on English as the target L2. Therefore, in this article, we address the following research questions: (a) is it possible to use a non-ASR-based low-level signal segmentation algorithm to predict human expert assessment of phonetic fluency in beginner Japanese learners of French in a text-reading task during the first stages of their learning? (b) if the answer to (a) is positive, then what are the best predictors of phonetic fluency among a set of available measures (see below for more details)? (c) is it possible to use this algorithm to monitor the evolution of phonetic fluency (and of its associated predictors) in these learners in a longitudinal study? As a first step, a corpus of French sentences read aloud by 12 Japanese learners of different proficiency levels in French was used to design a prediction system. The read-aloud speech data was perceptually annotated by three human experts on four dimensions: overall speech fluency, speech rate, regularity of speech rate, and speech fluidity (i.e. smoothness of transitions between phones). Inter-rater agreement and reliability were high for all dimensions, and the average human ratings were compared with the scores provided by our prediction system.
The results show strong correlations between human and automatic scores of speech rate and regularity of speech rate, and a weak correlation for speech fluidity. Finally, the automatic scores were combined through a multiple linear regression model in order to predict overall speech fluency. The best model led to a correlation coefficient of .92 between automatic and human ratings, with a root-mean-square error of .38. In the second step of this study, a corpus of identical sentences read aloud four times over two years by 12 Japanese learners of French (after 4, 7, 12, and 19 months of French courses in Japan) was fed to the automatic system. The results show regular progress in overall speech fluency, which is consistent with the regular progress the Japanese learners under scrutiny were expected to make every semester through their academic program in French at their university in Japan. Our study suggests a positive answer to our first and third research questions, with speech rate as the best predictor to answer our second research question. From a pedagogical perspective, it seems that such a simple algorithm could be integrated into a CAPT tool to monitor learners’ progress in phonetic fluency in reading-aloud tasks.
KW - Assessment
KW - Automatic
KW - Fluency
KW - French
KW - Japanese
KW - Longitudinal
UR - http://www.scopus.com/inward/record.url?scp=85092398874&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092398874&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2020.10.001
DO - 10.1016/j.specom.2020.10.001
M3 - Article
AN - SCOPUS:85092398874
SN - 0167-6393
VL - 125
SP - 69
EP - 79
JO - Speech Communication
JF - Speech Communication
ER -