TY - GEN
T1 - VocalTurk
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
AU - Saito, Susumu
AU - Ide, Yuta
AU - Nakano, Teppei
AU - Ogawa, Tetsuji
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - This paper presents VocalTurk, a feasibility study of crowdsourced speaker identification based on our worker dataset collected on Amazon Mechanical Turk. Crowdsourced data labeling is now well established in speech data processing, but empirical analyses that answer common questions such as “how accurately can workers label speech data?” and “what does a good speech-labeling microtask interface look like?” remain underexplored, which limits the quality and scale of dataset collection. Focusing on the speaker identification task in particular, we conducted two studies on Amazon Mechanical Turk: i) we hired 3,800+ unique workers to test their accuracy and confidence in answering voice pair comparison tasks, and ii) we additionally assigned more difficult 1-vs-N voice set comparison tasks to 350+ top-scoring workers to test their accuracy-speed trade-offs across N = {1, 3, 5}. The results revealed positive findings that should motivate speech researchers toward crowdsourced data labeling: the top-scoring workers labeled our voice comparison pairs with 99% accuracy after majority voting, and they were also capable of batch-labeling, which shortened their completion time by up to 34% with no statistically significant degradation in accuracy.
AB - This paper presents VocalTurk, a feasibility study of crowdsourced speaker identification based on our worker dataset collected on Amazon Mechanical Turk. Crowdsourced data labeling is now well established in speech data processing, but empirical analyses that answer common questions such as “how accurately can workers label speech data?” and “what does a good speech-labeling microtask interface look like?” remain underexplored, which limits the quality and scale of dataset collection. Focusing on the speaker identification task in particular, we conducted two studies on Amazon Mechanical Turk: i) we hired 3,800+ unique workers to test their accuracy and confidence in answering voice pair comparison tasks, and ii) we additionally assigned more difficult 1-vs-N voice set comparison tasks to 350+ top-scoring workers to test their accuracy-speed trade-offs across N = {1, 3, 5}. The results revealed positive findings that should motivate speech researchers toward crowdsourced data labeling: the top-scoring workers labeled our voice comparison pairs with 99% accuracy after majority voting, and they were also capable of batch-labeling, which shortened their completion time by up to 34% with no statistically significant degradation in accuracy.
KW - Crowdsourcing
KW - Labeling
KW - Voice comparison
UR - http://www.scopus.com/inward/record.url?scp=85119207674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119207674&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-464
DO - 10.21437/Interspeech.2021-464
M3 - Conference contribution
AN - SCOPUS:85119207674
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 2932
EP - 2936
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
Y2 - 30 August 2021 through 3 September 2021
ER -