Abstract
Retrieving similar sentences from a given collection of sentences is essential in a range of applications. In this work, we propose a novel method to retrieve several sentences that cover an input sentence in form and meaning with minimal redundancy, so as to enhance the overall coverage quality of the output sentences. We focus on the hierarchical granularity levels of sentence pieces, matching from common or similar n-grams to finer-grained words or subwords, using techniques from similar sentence retrieval and monolingual phrase alignment. Our method shows promising source and target coverage evaluation results when applied to parallel corpora. This shows the potential of our approach if integrated into an example-based machine translation system.
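To make the idea of covering an input sentence with minimal redundancy concrete, here is a minimal sketch in Python. It is not the method proposed in the paper: it assumes whitespace tokenization, exact word n-gram matching only (no similar n-grams, subwords, or phrase alignment), and a simple greedy selection. The names `ngrams` and `greedy_cover` and the parameters `k` and `max_n` are hypothetical, chosen for illustration.

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int = 3) -> Set[NGram]:
    """Return all word n-grams of length 1..max_n as a set of tuples."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def greedy_cover(query: str, collection: List[str],
                 k: int = 3, max_n: int = 3) -> List[str]:
    """Pick up to k sentences that together cover the query's n-grams.

    Illustrative assumption: each round picks the sentence that covers
    the most not-yet-covered query n-grams, weighted by n-gram length.
    A sentence that only repeats covered material has zero gain, so
    redundant sentences are never selected.
    """
    uncovered = ngrams(query.split(), max_n)
    candidates = [(s, ngrams(s.split(), max_n)) for s in collection]
    chosen: List[str] = []
    for _ in range(k):
        best, best_gain, best_hits = None, 0, set()
        for sentence, grams in candidates:
            if sentence in chosen:
                continue
            hits = uncovered & grams
            gain = sum(len(g) for g in hits)  # longer n-grams count more
            if gain > best_gain:
                best, best_gain, best_hits = sentence, gain, hits
        if best is None:  # nothing left adds new coverage
            break
        chosen.append(best)
        uncovered -= best_hits
    return chosen
```

Under these assumptions, for the query "the cat sat on the mat" and a collection containing "the cat sat quietly", "a dog lay on the mat", and "the cat sat down", the sketch selects the first two sentences and skips the third, since it adds no coverage beyond the first.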
Original language | English |
---|---|
Pages | 436-445 |
Number of pages | 10 |
Publication status | Published - 2021 |
Event | 35th Pacific Asia Conference on Language, Information and Computation, PACLIC 2021 - Shanghai, China; Duration: 2021 Nov 5 → 2021 Nov 7 |
Conference
Conference | 35th Pacific Asia Conference on Language, Information and Computation, PACLIC 2021 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 2021 Nov 5 → 2021 Nov 7 |
ASJC Scopus subject areas
- Artificial Intelligence
- Human-Computer Interaction
- Linguistics and Language