Graph-based word clustering using a web search engine

Yutaka Matsuo*, Takeshi Sakaki, Kôki Uchiyama, Mitsuru Ishizuka

*Corresponding author for this work

Research output: Conference contribution

78 Citations (Scopus)

Abstract

Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure computed from web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By calculating the similarity of words, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied for efficiently identifying word clusters. Evaluations are made on two sets of word groups derived from a web directory and WordNet.
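
The pipeline in the abstract (pairwise web counts, then a similarity graph, then Newman clustering) can be illustrated with a small sketch. Several things here are assumptions rather than details taken from the record: the hit counts are made up for illustration, pointwise mutual information stands in for the unspecified similarity measure, the edge threshold of 1.0 is arbitrary, and "Newman clustering" is read as greedy agglomerative modularity maximization. The names `pmi`, `build_graph`, `modularity`, and `newman_clusters` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

# Made-up counts for illustration only; a real run would query a search engine.
N = 1_000_000  # assumed number of indexed pages
HITS = {"apple": 5000, "banana": 4000, "cherry": 3000,
        "linux": 6000, "unix": 3500, "windows": 7000}
PAIR_HITS = {  # keys are alphabetically sorted word pairs
    ("apple", "banana"): 900, ("apple", "cherry"): 700,
    ("banana", "cherry"): 600, ("linux", "unix"): 1200,
    ("linux", "windows"): 1500, ("unix", "windows"): 800,
    ("apple", "windows"): 200, ("apple", "linux"): 30,
    ("banana", "unix"): 10,
}

def pmi(a, b):
    """Pointwise mutual information from hit counts (assumed similarity)."""
    joint = PAIR_HITS.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return float("-inf")
    return math.log(joint * N / (HITS[a] * HITS[b]))

def build_graph(words, threshold=1.0):
    """Keep an unweighted edge for every pair whose PMI exceeds the threshold."""
    return [(a, b) for a, b in combinations(sorted(words), 2)
            if pmi(a, b) > threshold]

def modularity(edges, label):
    """Q = sum over communities of (inside edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    inside, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def newman_clusters(nodes, edges):
    """Greedily merge the connected community pair that raises Q the most."""
    label = {v: v for v in nodes}  # every word starts in its own community
    if not edges:
        return sorted([v] for v in nodes), 0.0
    q = modularity(edges, label)
    while True:
        best_q, best_pair = q, None
        pairs = {tuple(sorted((label[u], label[v])))
                 for u, v in edges if label[u] != label[v]}
        for ci, cj in sorted(pairs):
            trial = {v: (ci if c == cj else c) for v, c in label.items()}
            trial_q = modularity(edges, trial)
            if trial_q > best_q + 1e-12:
                best_q, best_pair = trial_q, (ci, cj)
        if best_pair is None:  # no merge improves Q, so stop
            break
        ci, cj = best_pair
        label = {v: (ci if c == cj else c) for v, c in label.items()}
        q = best_q
    clusters = defaultdict(set)
    for v, c in label.items():
        clusters[c].add(v)
    return sorted(sorted(c) for c in clusters.values()), q

edges = build_graph(HITS)
clusters, q = newman_clusters(list(HITS), edges)
print(clusters, round(q, 3))
```

With these toy counts the graph is two triangles joined by a single apple-windows edge, and the greedy merging separates the fruit words from the operating-system words. The sketch recomputes modularity from scratch for every candidate merge, which is simple to read but slow; an efficient implementation would update the change in modularity incrementally.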

Original language: English
Title of host publication: COLING/ACL 2006 - EMNLP 2006: 2006 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 542-550
Number of pages: 9
Publication status: Published - 2006
Externally published: Yes
Event: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006 - Sydney, NSW
Duration: 22 Jul 2006 to 23 Jul 2006

Other

Other: 11th Conference on Empirical Methods in Natural Language Processing, EMNLP 2006, Held in Conjunction with COLING/ACL 2006
City: Sydney, NSW
Period: 22 Jul 2006 to 23 Jul 2006

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
