Analysis of Multimodal Features for Speaking Proficiency Scoring in an Interview Dialogue

Mao Saeki, Yoichi Matsuyama, Satoshi Kobashikawa, Tetsuji Ogawa, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper analyzes the effectiveness of different modalities for automated speaking proficiency scoring in an online dialogue task with non-native speakers. A language learner's conversational competence can be assessed through multimodal behaviors such as speech content, prosody, and visual cues. Although lexical and acoustic features have been widely studied, there has been no study of visual features such as facial expressions and eye gaze. To build an automated speaking proficiency scoring system using multimodal features, we first constructed an online video interview dataset of 210 Japanese English learners annotated with their speaking proficiency. We then examined two approaches for incorporating visual features and compared the effectiveness of each modality. Results show that the end-to-end approach with deep neural networks achieves a higher correlation with human scoring than the one with handcrafted features. The modalities are effective in the order of lexical, acoustic, and visual features.
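The abstract evaluates scoring systems by their correlation with human ratings and compares the contribution of each modality. A minimal sketch of that evaluation setup is below; the late-fusion weighting of per-modality scores is purely illustrative (the weights and function names are assumptions, not taken from the paper).

```python
import math

def pearson(system_scores, human_scores):
    """Pearson correlation between automated scores and human
    proficiency ratings, the standard agreement metric."""
    n = len(system_scores)
    mx = sum(system_scores) / n
    my = sum(human_scores) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(system_scores, human_scores))
    sx = math.sqrt(sum((x - mx) ** 2 for x in system_scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in human_scores))
    return cov / (sx * sy)

def late_fusion(lexical, acoustic, visual, weights=(0.5, 0.3, 0.2)):
    """Hypothetical weighted late fusion of per-modality scores.
    The weight ordering mirrors the reported effectiveness
    (lexical > acoustic > visual); the values are made up."""
    wl, wa, wv = weights
    return wl * lexical + wa * acoustic + wv * visual
```

For example, `pearson(predicted, annotated)` over the 210 learners would yield the kind of system-human correlation the paper reports; a higher value for the end-to-end model than for the handcrafted-feature model would reproduce its main finding.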

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 629-635
Number of pages: 7
ISBN (Electronic): 9781728170664
DOIs
Publication status: Published - 2021 Jan 19
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: 2021 Jan 19 - 2021 Jan 22

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Shenzhen
Period: 21/1/19 - 21/1/22

Keywords

  • BERT (Bidirectional Encoder Representations from Transformers)
  • Speaking proficiency assessment
  • multi-modal machine learning

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
