Investigating the impact of automated transcripts on non-native speakers' listening comprehension

Xun Cao, Naomi Yamashita, Toru Ishida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Real-time transcripts generated by automatic speech recognition (ASR) technologies hold potential to facilitate non-native speakers' (NNSs) listening comprehension. While introducing another modality (i.e., ASR transcripts) provides NNSs with supplemental information for understanding speech, it also runs the risk of overwhelming them with excessive information. The aim of this paper is to understand the advantages and disadvantages of presenting ASR transcripts to NNSs and to study how such transcripts affect their listening experiences. To explore these issues, we conducted a laboratory experiment with 20 NNSs who engaged in two listening tasks under different conditions: audio only and audio + ASR transcripts. In each condition, the participants described the comprehension problems they encountered while listening. From the analysis, we found that ASR transcripts helped NNSs solve certain problems (e.g., "do not recognize words they know"), but imperfect ASR transcripts (e.g., errors and missing punctuation) sometimes confused them and even generated new problems. Furthermore, post-task interviews and gaze analysis revealed that NNSs did not have enough time to fully exploit the transcripts; for example, they had difficulty shifting their attention between the audio and the transcripts. Based on our findings, we discuss implications for designing better multimodal interfaces for NNSs.
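For readers who want a concrete picture of the kind of real-time ASR captioning the study examines, the sketch below shows one way such transcripts might be produced. It is an assumed setup for illustration only (the open-source SpeechRecognition package with Google's free web recognizer), not the system the authors used; the callback name show_caption is likewise hypothetical.

```python
# A minimal, assumed sketch of real-time ASR captioning (not the authors'
# actual system): microphone audio is streamed through the SpeechRecognition
# package and each hypothesis is printed as it arrives. As with the
# transcripts described in the abstract, the output is typically
# unpunctuated and may contain recognition errors.
import time
import speech_recognition as sr

recognizer = sr.Recognizer()
microphone = sr.Microphone()

def show_caption(recognizer, audio):
    """Callback invoked for each recognized chunk; prints the raw hypothesis."""
    try:
        print(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        pass  # no intelligible speech in this chunk

with microphone as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise

# listen_in_background() recognizes chunks on a worker thread and returns a
# function that stops listening, approximating a live caption feed.
stop_listening = recognizer.listen_in_background(microphone, show_caption)

try:
    while True:
        time.sleep(0.1)  # keep the main thread alive while captions stream
except KeyboardInterrupt:
    stop_listening()
```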

Original language: English
Title of host publication: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
Editors: Catherine Pelachaud, Yukiko I. Nakano, Toyoaki Nishida, Carlos Busso, Louis-Philippe Morency, Elisabeth Andre
Publisher: Association for Computing Machinery, Inc
Pages: 121-128
Number of pages: 8
ISBN (Electronic): 9781450345569
DOIs: https://doi.org/10.1145/2993148.2993161
Publication status: Published - 2016 Oct 31
Externally published: Yes
Event: 18th ACM International Conference on Multimodal Interaction, ICMI 2016 - Tokyo, Japan
Duration: 2016 Nov 12 → 2016 Nov 16

Publication series

Name: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction

Conference

Conference: 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Country: Japan
City: Tokyo
Period: 16/11/12 → 16/11/16

Keywords

  • Automatic speech recognition (ASR) transcripts
  • Eye gaze
  • Listening comprehension problems
  • Non-native speakers (NNSs)

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Cao, X., Yamashita, N., & Ishida, T. (2016). Investigating the impact of automated transcripts on non-native speakers' listening comprehension. In C. Pelachaud, Y. I. Nakano, T. Nishida, C. Busso, L-P. Morency, & E. Andre (Eds.), ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction (pp. 121-128). (ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction). Association for Computing Machinery, Inc. https://doi.org/10.1145/2993148.2993161
