ESPnet-ST IWSLT 2021 Offline Speech Translation System

Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper describes the ESPnet-ST group’s IWSLT 2021 submission in the offline speech translation track. This year we made various efforts on training data, architecture, and audio segmentation. On the data side, we investigated sequence-level knowledge distillation (SeqKD) for end-to-end (E2E) speech translation. Specifically, we used multi-referenced SeqKD from multiple teachers trained on different amounts of bitext. On the architecture side, we adopted the Conformer encoder and the Multi-Decoder architecture, which equips dedicated decoders for speech recognition and translation tasks in a unified encoder-decoder model and enables search in both source and target language spaces during inference. We also significantly improved audio segmentation by using the pyannote.audio toolkit and merging multiple short segments for long context modeling. Experimental evaluations showed that each of them contributed to large improvements in translation performance. Our best E2E system combined all the above techniques with model ensembling and achieved 31.4 BLEU on the 2-ref of tst2021 and 21.2 BLEU and 19.3 BLEU on the two single references of tst2021.
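The abstract mentions merging multiple short segments so that the model sees longer context. The following is a minimal sketch of one way such merging could work; it is not taken from the paper. The function name `merge_short_segments` and the thresholds `max_duration` and `max_gap` are hypothetical, and the input is assumed to be a list of (start, end) times in seconds such as a voice activity detector might produce.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def merge_short_segments(
    segments: List[Segment],
    max_duration: float = 20.0,  # hypothetical cap on merged segment length
    max_gap: float = 1.0,        # hypothetical cap on silence bridged by a merge
) -> List[Segment]:
    """Greedily merge neighbouring segments into longer ones.

    A segment is appended to the previous one when the silence between
    them is at most `max_gap` and the merged span stays within
    `max_duration`. Otherwise it starts a new segment.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            if gap <= max_gap and end - prev_start <= max_duration:
                # Extend the previous segment to cover the current one.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged
```

For example, calling `merge_short_segments([(0.0, 2.0), (2.5, 4.0), (10.0, 12.0)])` joins the first two segments, since they are separated by only 0.5 seconds, and leaves the third on its own. The actual segmentation and merging criteria used in the submission may differ.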

Original language: English
Title of host publication: IWSLT 2021 - 18th International Conference on Spoken Language Translation, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 100-109
Number of pages: 10
ISBN (Electronic): 9781954085749
Publication status: Published - 2021
Externally published: Yes
Event: 18th International Conference on Spoken Language Translation, IWSLT 2021 - Virtual, Bangkok, Thailand
Duration: 2021 Aug 5 → 2021 Aug 6

Publication series

Name: IWSLT 2021 - 18th International Conference on Spoken Language Translation, Proceedings

Conference

Conference: 18th International Conference on Spoken Language Translation, IWSLT 2021
Country/Territory: Thailand
City: Virtual, Bangkok
Period: 21/8/5 → 21/8/6

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Language and Linguistics
  • Linguistics and Language
