Abstract
We present a sub-sentential alignment method that extracts high-quality multi-word alignments from sentence-aligned multilingual parallel corpora. Unlike other methods, it exploits low-frequency terms, which makes it highly scalable. As it relies on alingual concepts, it can process any number of languages at once. Experiments have shown that it is competitive with state-of-the-art methods.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 214-218 |
Number of pages | 5 |
Journal | International Conference Recent Advances in Natural Language Processing, RANLP |
Publication status | Published - 2009 |
Event | International Conference on Recent Advances in Natural Language Processing, RANLP-2009, Borovets, Bulgaria; duration: 2009 Sep 14 → 2009 Sep 16 |
Keywords
- Hapax
- Low frequency term
- Sampling
- Sub-sentential alignment
ASJC Scopus subject areas
- Software
- Computer Science Applications
- Artificial Intelligence
- Electrical and Electronic Engineering