Abstract
We address the problem of predicting unseen words by exploiting the organization of a language's vocabulary, as exhibited by paradigm tables. We present a pipeline that automatically produces paradigm tables from all the words contained in a text. We then measure how many of the unseen words in a held-out test text can be predicted from the paradigm tables obtained from a training text. Experiments in several languages compare the morphological richness of those languages, as well as the vocabulary richness of different authors.
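The evaluation described above can be illustrated with a minimal sketch. It assumes a deliberately simplified, hypothetical representation of a paradigm table (a list of stems crossed with a list of endings); the abstract does not specify how the tables are actually built or filled, so the helper names `candidate_forms` and `unseen_word_recall` and the toy data are illustrative assumptions only.

```python
from itertools import product


def candidate_forms(stems, endings, seen):
    """Return the stem+ending combinations of one toy table that are
    absent from the training vocabulary (the table's empty cells).

    Assumption: a paradigm table is modelled as stems x endings, which is
    a simplification and not necessarily the paper's representation.
    """
    return {stem + ending for stem, ending in product(stems, endings)} - set(seen)


def unseen_word_recall(train_words, test_words, predicted):
    """Proportion of test words unseen in training that were predicted."""
    unseen = set(test_words) - set(train_words)
    if not unseen:
        return 0.0
    return len(unseen & set(predicted)) / len(unseen)


# Toy data (hypothetical): vocabulary of a training text and a test text.
train = {"walk", "walks", "walked", "talk", "talked", "jump", "jumps"}
test = {"walk", "talks", "jumped", "sing"}

# Fill the empty cells of a single toy table built from the training words.
predicted = candidate_forms(["walk", "talk", "jump"], ["", "s", "ed"], train)
recall = unseen_word_recall(train, test, predicted)
```

Here the table proposes "talks" and "jumped", which are two of the three test words absent from the training text, while "sing" belongs to no table and cannot be predicted.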
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 51-60 |
Number of pages | 10 |
Journal | CEUR Workshop Proceedings |
Volume | 1815 |
Publication status | Published - 2016 |
Keywords
- Paradigm tables
- Unseen words
- Word predictability
ASJC Scopus subject areas
- Computer Science (all)