Graph-based word clustering using a web search engine

Yutaka Matsuo, Takeshi Sakaki, Kôki Uchiyama, Mitsuru Ishizuka

Research output: Chapter in Book/Report/Conference proceedingConference contribution

65 Citations (Scopus)

Abstract

Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure by web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By calculating the similarity of words, a word cooccurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied for efficiently identifying word clusters. Evaluations are made on two sets of word groups derived from a web directory and WordNet.

Original languageEnglish
Title of host publicationCOLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages542-550
Number of pages9
Publication statusPublished - 2006
Externally publishedYes
Event11th Conference on Empirical Methods in Natural Language Proceessing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 - Sydney, NSW
Duration: 2006 Jul 222006 Jul 23

Other

Other11th Conference on Empirical Methods in Natural Language Proceessing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006
CitySydney, NSW
Period06/7/2206/7/23

    Fingerprint

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Matsuo, Y., Sakaki, T., Uchiyama, K., & Ishizuka, M. (2006). Graph-based word clustering using a web search engine. In COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 542-550)