Contribution of Improved Character Embedding and Latent Posting Styles to Authorship Attribution of Short Texts

Wenjing Huang, Rui Su, Mizuho Iwaihara

Research output: Conference contribution

1 Citation (Scopus)

Abstract

Text content generated on social networking platforms tends to be short. Authorship attribution on short texts aims to determine the author of a given collection of short posts, which is more challenging than attribution on long texts. Because such texts are sparse and use informal terms, we propose a method of learning text representations from a mixture of words and character n-grams, used as input to deep neural network architectures. In this way we make full use of user mentions and topic mentions in posts. We also focus on implicit textual characteristics and incorporate ten latent posting styles into the models. Our experimental evaluations on tweets show a significant improvement over baselines. We achieve a best accuracy of 83.6%, a 7.5% improvement over the state of the art. Further experiments with an increasing number of authors also demonstrate the superiority of our models.
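The mixed word and character n-gram input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenization pattern, the choice to keep user mentions (`@name`) and topic mentions (`#tag`) as whole tokens, the n-gram length, and the names `TOKEN_RE`, `char_ngrams`, and `mixed_features` are all assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical token pattern: an @mention or #hashtag is kept whole,
# anything else is split into runs of word characters.
TOKEN_RE = re.compile(r"[@#]\w+|\w+")


def char_ngrams(token: str, n: int) -> List[str]:
    """Return all contiguous character n-grams of `token`.

    A token shorter than n yields an empty list.
    """
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def mixed_features(post: str, n: int = 3) -> List[str]:
    """Turn a post into a mixed sequence of words and character n-grams.

    Assumption: mentions and hashtags are emitted only as whole tokens,
    while ordinary words are emitted followed by their character n-grams.
    """
    features: List[str] = []
    for token in TOKEN_RE.findall(post.lower()):
        features.append(token)
        if token[0] not in "@#":
            features.extend(char_ngrams(token, n))
    return features


if __name__ == "__main__":
    print(mixed_features("Great talk @bob #nlp"))
```

A sequence like this would then be mapped to embedding indices before being fed to a neural model. How the paper actually combines the two feature types is not stated in the abstract.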

Original language: English
Title of host publication: Web and Big Data - 4th International Joint Conference, APWeb-WAIM 2020, Proceedings
Editors: Xin Wang, Rui Zhang, Young-Koo Lee, Le Sun, Yang-Sae Moon
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 261-269
Number of pages: 9
ISBN (Print): 9783030602895
Publication status: Published - 2020
Event: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020 - Tianjin, China
Duration: 18 Sep 2020 to 20 Sep 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 4th Asia-Pacific Web and Web-Age Information Management, Joint Conference on Web and Big Data, APWeb-WAIM 2020
Country/Territory: China
City: Tianjin
Period: 18 Sep 2020 to 20 Sep 2020

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
