Chinese word segmentation algorithm based on pair coding

Bingyi Zhang, Bo Wei, Jiancheng Chen, Jie Wei, Guozheng Rao

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

To improve the speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The characteristic value is extracted from the positions of Chinese characters. The method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, the data compression method helps reduce storage space and improves the performance of Chinese word segmentation.

Original language: English
Pages (from-to): 526-530
Number of pages: 5
Journal: Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology
Volume: 38
Issue number: 4
Publication status: Published - 2014 Aug 30
Externally published: Yes

Keywords

  • Characteristic matching
  • Characteristic value
  • Chinese word segmentation
  • Data compression
  • Fuzzy matching
  • Hash
  • Pair coding

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Chinese word segmentation algorithm based on pair coding. / Zhang, Bingyi; Wei, Bo; Chen, Jiancheng; Wei, Jie; Rao, Guozheng.

In: Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology, Vol. 38, No. 4, 30.08.2014, p. 526-530.

@article{e6fced743c1e43e2b02c6c8e95f39c2e,
title = "Chinese word segmentation algorithm based on pair coding",
abstract = "To improve the speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The characteristic value is extracted from the positions of Chinese characters. The method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, the data compression method helps reduce storage space and improves the performance of Chinese word segmentation.",
keywords = "Characteristic matching, Characteristic value, Chinese word segmentation, Data compression, Fuzzy matching, Hash, Pair coding",
author = "Bingyi Zhang and Bo Wei and Jiancheng Chen and Jie Wei and Guozheng Rao",
year = "2014",
month = "8",
day = "30",
language = "English",
volume = "38",
pages = "526--530",
journal = "Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology",
issn = "1005-9830",
publisher = "Nanjing University of Science and Technology",
number = "4",

}

TY - JOUR

T1 - Chinese word segmentation algorithm based on pair coding

AU - Zhang, Bingyi

AU - Wei, Bo

AU - Chen, Jiancheng

AU - Wei, Jie

AU - Rao, Guozheng

PY - 2014/8/30

Y1 - 2014/8/30

AB - To improve the speed and storage efficiency of Chinese word segmentation, this paper proposes a characteristic matching algorithm based on pair coding. The characteristic value is extracted from the positions of Chinese characters. The method supports fuzzy matching and does not need to match multi-character Chinese words, so the characteristic value is extracted from the positions of adjacent Chinese characters. In addition, the data compression method helps reduce storage space and improves the performance of Chinese word segmentation.

KW - Characteristic matching

KW - Characteristic value

KW - Chinese word segmentation

KW - Data compression

KW - Fuzzy matching

KW - Hash

KW - Pair coding

UR - http://www.scopus.com/inward/record.url?scp=84907092884&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907092884&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84907092884

VL - 38

SP - 526

EP - 530

JO - Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology

JF - Nanjing Li Gong Daxue Xuebao/Journal of Nanjing University of Science and Technology

SN - 1005-9830

IS - 4

ER -