Solving Feature Sparseness in Text Classification using Core-Periphery Decomposition

Xia Cui, Sadamori Kojaku, Naoki Masuda, Danushka Bollegala

Research output: Conference contribution

1 Citation (Scopus)

Abstract

Feature sparseness is a problem common to cross-domain and short-text classification tasks. To overcome it, we propose a novel method based on graph decomposition to find candidate features for expanding feature vectors. Specifically, we first create a feature-relatedness graph, which we subsequently decompose into core-periphery (CP) pairs, and we use the peripheries as the expansion candidates of the cores. We expand both training and test instances using the computed related features and use them to train a text classifier. We observe that prioritising features that are common to both training and test instances as cores during the CP decomposition further improves the accuracy of text classification. We evaluate the proposed CP-decomposition-based feature expansion method on benchmark datasets for cross-domain sentiment classification and short-text classification. Our experimental results show that the proposed method consistently outperforms all baselines on short-text classification tasks, and performs competitively with pivot-based cross-domain sentiment classification methods.
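The expansion pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the CP decomposition is replaced by a simple greedy assignment, the relatedness weights are plain document co-occurrence counts, and the function names (`build_relatedness_graph`, `greedy_core_periphery`, `expand`) and the toy data are all hypothetical. Building the graph from both training and (unlabelled) test instances is likewise an assumption made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_relatedness_graph(docs):
    """Weight each feature pair by the number of documents containing both."""
    weights = defaultdict(int)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return weights


def greedy_core_periphery(weights, cores):
    """Toy stand-in for CP decomposition (NOT the algorithm used in the paper).

    Every non-core feature becomes a periphery of the core it is most
    strongly connected to; ties are broken alphabetically by core name.
    """
    best = {}  # periphery -> (weight, core)
    for (a, b), w in weights.items():
        for core, other in ((a, b), (b, a)):
            if core in cores and other not in cores:
                current = best.get(other)
                if (current is None or w > current[0]
                        or (w == current[0] and core < current[1])):
                    best[other] = (w, core)
    pairs = defaultdict(list)
    for periphery, (_, core) in best.items():
        pairs[core].append(periphery)
    return {core: sorted(members) for core, members in pairs.items()}


def expand(doc, pairs):
    """Append the peripheries of every core that occurs in the document."""
    expanded = list(doc)
    for feature in doc:
        for periphery in pairs.get(feature, []):
            if periphery not in expanded:
                expanded.append(periphery)
    return expanded


# Hypothetical toy data: tokenised training and test instances.
train = [["great", "excellent", "battery"], ["great", "excellent"], ["poor", "terrible"]]
test = [["great", "screen"], ["poor"]]

# Prioritise features shared by training and test instances as cores.
train_vocab = {f for doc in train for f in doc}
test_vocab = {f for doc in test for f in doc}
cores = train_vocab & test_vocab

pairs = greedy_core_periphery(build_relatedness_graph(train + test), cores)
print(expand(["poor"], pairs))  # ['poor', 'terrible']
```

In this toy run the one-word test instance `["poor"]` gains the feature `terrible`, which occurs only in the training data, so the expanded instance overlaps more with what a classifier trained on the expanded training instances has seen.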

Original language: English
Title of host publication: NAACL HLT 2018 - Lexical and Computational Semantics, SEM 2018, Proceedings of the 7th Conference
Editors: Malvina Nissim, Jonathan Berant, Alessandro Lenci
Publisher: Association for Computational Linguistics (ACL)
Pages: 255-264
Number of pages: 10
ISBN (Electronic): 9781948087223
Publication status: Published - 2018
Externally published: Yes
Event: 7th Joint Conference on Lexical and Computational Semantics, SEM 2018, co-located with NAACL HLT 2018 - New Orleans, United States
Duration: 5 Jun 2018 to 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - Lexical and Computational Semantics, SEM 2018, Proceedings of the 7th Conference

Conference

Conference: 7th Joint Conference on Lexical and Computational Semantics, SEM 2018, co-located with NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 5 Jun 2018 to 6 Jun 2018

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications
