Abstract
To improve the speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, data compression helps reduce storage space and improves the performance of Chinese word segmentation.
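As an illustration only, the idea of matching on adjacent-character pairs rather than whole words might be sketched as follows. This is one possible reading of the abstract, not the paper's actual algorithm: the names `pair_code`, `build_pair_table`, and `segment`, the bit-packing scheme, and the use of the pair's offset within the word as part of the characteristic value are all assumptions made for this sketch.

```python
def pair_code(a: str, b: str) -> int:
    """Pack two adjacent characters into one integer (hypothetical coding).

    Unicode code points fit in 21 bits, so the shift cannot cause collisions.
    """
    return (ord(a) << 21) | ord(b)


def build_pair_table(words):
    """Store (pair code, offset in word) for every adjacent pair in the lexicon.

    Only pairs are stored in a hash set; whole words are never stored.
    """
    table = set()
    for word in words:
        for i in range(len(word) - 1):
            table.add((pair_code(word[i], word[i + 1]), i))
    return table


def segment(text, table):
    """Cut the text wherever the next adjacent pair is not a known pair
    at the current offset inside the token being built."""
    tokens = []
    start = 0
    for i in range(len(text) - 1):
        offset = i - start
        if (pair_code(text[i], text[i + 1]), offset) not in table:
            tokens.append(text[start:i + 1])
            start = i + 1
    if text:
        tokens.append(text[start:])
    return tokens


table = build_pair_table(["中国", "人民", "共和国"])
print(segment("中国人民", table))
```

Because only pairs are checked, a chain of known pairs can be accepted even when the full word is absent from the lexicon, which is the sense in which this sketch is fuzzy. It also keeps one integer per pair instead of one string per word, a loose analogue of the storage saving the abstract mentions.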
Original language | English |
---|---|
Pages (from-to) | 526-530 |
Number of pages | 5 |
Journal | Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology |
Volume | 38 |
Issue number | 4 |
Publication status | Published - 2014 Aug 30 |
Externally published | Yes |
Keywords
- Characteristic matching
- Characteristic value
- Chinese word segmentation
- Data compression
- Fuzzy matching
- Hash
- Pair coding
ASJC Scopus subject areas
- Engineering(all)