Large-scale AMR Corpus with Re-generated Sentences: Domain Adaptive Pre-training on ACL Anthology Corpus

Ming Zhao, Yaling Wang, Yves Lepage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Abstract Meaning Representation (AMR) is a broad-coverage formalism for capturing the semantics of a given sentence. However, domain adaptation of AMR is limited by the shortage of annotated AMR graphs. In this paper, we explore and build a new large-scale dataset of 2.3 million AMRs in the domain of academic writing. Additionally, we show that 30% of them are of similar quality to the annotated data on the downstream AMR-to-text task. Our results outperform previous graph-based approaches by over 11 BLEU points. We provide a pipeline that integrates automated generation and evaluation, which can help in exploring other AMR benchmarks.
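For readers unfamiliar with the formalism, an AMR represents a sentence as a rooted, directed graph of concepts and relations, conventionally written in PENMAN notation. The sketch below is purely illustrative (the sentence and graph are a standard textbook-style example, not drawn from the paper's ACL Anthology corpus) and uses only the standard library to pull the concept labels out of such a string:

```python
import re

# AMR for "The boy wants to go", written in PENMAN notation.
# Illustrative example only; not taken from the paper's dataset.
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""

# Each "(variable / concept" pair introduces a graph node;
# collect the concept labels with a simple pattern match.
concepts = re.findall(r"\(\s*\w+\s*/\s*([\w-]+)", amr)
print(concepts)  # ['want-01', 'boy', 'go-02']
```

Note that reentrancy (the second `:ARG0 b` reusing variable `b`) is what makes AMRs graphs rather than trees, and is one reason AMR-to-text generation is harder than tree-to-text generation.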

Original language: English
Title of host publication: Proceedings - ICACSIS 2022
Subtitle of host publication: 14th International Conference on Advanced Computer Science and Information Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 19-24
Number of pages: 6
ISBN (Electronic): 9781665489362
DOIs
Publication status: Published - 2022
Event: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022 - Virtual, Online, Indonesia
Duration: 2022 Oct 1 - 2022 Oct 3

Publication series

Name: Proceedings - ICACSIS 2022: 14th International Conference on Advanced Computer Science and Information Systems

Conference

Conference: 14th International Conference on Advanced Computer Science and Information Systems, ICACSIS 2022
Country/Territory: Indonesia
City: Virtual, Online
Period: 22/10/1 - 22/10/3

Keywords

  • Abstract Meaning Representation
  • Academic Writing
  • Domain Adaptation

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
