Extracting representative phrases from Wikipedia article sections

Shan Liu, Mizuho Iwaihara

Research output: Conference contribution

2 Citations (Scopus)

Abstract

Wikipedia has become one of the most important tools for finding information. Because its long articles take time to read, and section titles are sometimes too short to summarize their content comprehensively, we aim to extract informative phrases that readers can refer to. Existing work on topic labelling performs well on document categorization but is inadequate at the granularity of detailed section contents. Moreover, existing keyphrase construction methods perform well only on very short texts. We therefore extract phrases that represent the content of a target section well relative to the other sections of the same Wikipedia article. We also incorporate related external articles to enlarge the pool of candidate phrases. We then apply FP-growth to obtain frequently co-occurring word sets, followed by improved features that characterize desired properties from different aspects. Next, we apply gradient descent to our ranking function to obtain reasonable weights for the features. For evaluation, we combine Normalized Google Distance (NGD) and nDCG to measure the semantic relatedness between the generated phrases and the hidden original section titles.
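
The abstract does not say how NGD and nDCG are combined, so the following is only a minimal sketch under stated assumptions, not the authors' evaluation code. It uses the standard NGD formula over hit counts and the standard nDCG definition; the mapping from distance to gain (`relatedness`) and all function names are hypothetical choices made for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing term x / term y
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (must exceed fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # terms never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def relatedness(distance):
    """Assumed gain: turn an NGD value into a score in [0, 1]."""
    return max(0.0, 1.0 - distance)


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def evaluate_ranking(hit_counts, n):
    """Score a ranked phrase list against one hidden section title.

    hit_counts: list of (f_phrase, f_title, f_both) tuples, in rank order.
    """
    gains = [relatedness(ngd(fp, ft, fb, n)) for fp, ft, fb in hit_counts]
    return ndcg(gains)
```

Under these assumptions, a ranking that places the phrases closest to the hidden title first receives a score of 1.0, and the score drops as related phrases move down the list.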

Original language: English
Title of host publication: 2016 IEEE/ACIS 15th International Conference on Computer and Information Science, ICIS 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509008063
DOI
Publication status: Published - 23 Aug 2016
Event: 15th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2016 - Okayama, Japan
Duration: 26 Jun 2016 to 29 Jun 2016

Other

Other: 15th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2016
Country/Territory: Japan
City: Okayama
Period: 26 Jun 2016 to 29 Jun 2016

ASJC Scopus subject areas

  • General Computer Science
  • Energy Engineering and Power Technology
  • Control and Optimization
