SEQUENCE TRANSDUCTION WITH GRAPH-BASED SUPERVISION

Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results compared to standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer achieves an improvement of 4.8% on the test-other condition of LibriSpeech relative to an equivalent RNN-T based system.

Original languageEnglish
Title of host publication2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages146-150
Number of pages5
ISBN (Electronic)9781665405409
DOIs
Publication statusPublished - 2022
Externally publishedYes
Event47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 2022 May 232022 May 27

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2022-May
ISSN (Print)1520-6149

Conference

Conference47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/TerritorySingapore
CityVirtual, Online
Period22/5/2322/5/27

Keywords

  • ASR
  • CTC
  • GTC-T
  • RNN-T
  • transducer

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'SEQUENCE TRANSDUCTION WITH GRAPH-BASED SUPERVISION'. Together they form a unique fingerprint.

Cite this