End-to-end ASR to jointly predict transcriptions and linguistic annotations

Motoi Omachi, Yuya Fujita, Shinji Watanabe, Matthew Wiesner

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

We propose a Transformer-based sequence-to-sequence model for automatic speech recognition (ASR) capable of simultaneously transcribing and annotating audio with linguistic information such as phonemic transcripts or part-of-speech (POS) tags. Since linguistic information is important in natural language processing (NLP), the proposed ASR is especially useful for speech interface applications, including spoken dialogue systems and speech translation, which combine ASR and NLP. To produce linguistic annotations, we train the ASR system using modified training targets: each grapheme or multi-grapheme unit in the target transcript is followed by an aligned phoneme sequence and/or POS tag. Since our method has access to the underlying audio data, we can estimate linguistic annotations more accurately than pipeline approaches in which NLP-based methods are applied to a hypothesized ASR transcript. Experimental results on Japanese and English datasets show that the proposed ASR system is capable of simultaneously producing high-quality transcriptions and linguistic annotations.

Original languageEnglish
Title of host publicationNAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publicationHuman Language Technologies, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages1861-1871
Number of pages11
ISBN (Electronic)9781954085466
Publication statusPublished - 2021
Externally publishedYes
Event2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021 - Virtual, Online
Duration: 2021 Jun 62021 Jun 11

Publication series

NameNAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021
CityVirtual, Online
Period21/6/621/6/11

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Software

Fingerprint

Dive into the research topics of 'End-to-end ASR to jointly predict transcriptions and linguistic annotations'. Together they form a unique fingerprint.

Cite this