Integrating RoBERTa Fine-Tuning and User Writing Styles for Authorship Attribution of Short Texts

Xiangyu Wang, Mizuho Iwaihara*

*Corresponding author for this work

Research output: Conference contribution

Abstract

Authorship Attribution (AA) is a fundamental branch of text classification that aims to identify the authors of given texts. Authorship attribution of short texts, however, faces several challenges, such as limited text length, feature sparsity, and non-standard casual words. Recent studies have shown that deep learning methods can greatly improve the accuracy of AA tasks; however, they still represent user posts with a set of predefined features (e.g., word n-grams and character n-grams) and adopt text classification methods to solve the task. In this paper, we propose a hybrid model for authorship attribution of short texts. The first part is a pretrained language model based on RoBERTa that produces post representations aware of tweet-related stylistic features and their contexts. The second part is a CNN model built on a number of feature embeddings to represent users' writing styles. Finally, we combine these representations for the final AA classification. Experimental results show that our model achieves state-of-the-art results on an existing tweet AA dataset.
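
The hybrid design summarized above can be illustrated with a minimal, dependency-free sketch of the fusion step only. It is not the authors' implementation, and several parts are assumptions: the RoBERTa post representation is replaced by a placeholder vector `post_vec`, the user-style input is a sequence of hypothetical feature embeddings, the style CNN is assumed to be a single 1-D convolution with ReLU and max-over-time pooling, and "combining" the two representations is assumed to mean concatenation followed by a linear layer and softmax over candidate authors. All function names, dimensions, and random weights are hypothetical and do not come from the paper.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def conv1d_max_pool(seq: Matrix, kernels: List[Matrix]) -> Vector:
    """1-D convolution over a sequence of embeddings, ReLU, then max-over-time pooling.

    seq: L x d matrix (one d-dimensional style-feature embedding per position).
    kernels: list of k x d filters; each filter yields one pooled feature.
    """
    dim = len(seq[0]) if seq else 0
    pooled = []
    for kernel in kernels:
        width = len(kernel)
        # ReLU outputs are >= 0, so 0.0 is the floor (also returned if seq is shorter than the filter).
        best = 0.0
        for start in range(len(seq) - width + 1):
            score = sum(
                kernel[i][j] * seq[start + i][j]
                for i in range(width)
                for j in range(dim)
            )
            best = max(best, max(0.0, score))
        pooled.append(best)
    return pooled


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: one output per (weight row, bias) pair."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_author(post_vec: Vector, style_seq: Matrix, kernels: List[Matrix],
                    weights: Matrix, bias: Vector) -> Vector:
    """Hypothetical late fusion: concatenate the post vector with the CNN style vector,
    then return a probability distribution over candidate authors."""
    style_vec = conv1d_max_pool(style_seq, kernels)
    fused = post_vec + style_vec  # list concatenation = feature concatenation (assumed fusion)
    return softmax(linear(fused, weights, bias))


if __name__ == "__main__":
    rng = random.Random(0)
    POST_DIM, STYLE_DIM, SEQ_LEN, N_FILTERS, WIDTH, N_AUTHORS = 8, 4, 6, 3, 2, 5

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    post_vec = [rng.uniform(-1, 1) for _ in range(POST_DIM)]  # placeholder for a RoBERTa post embedding
    style_seq = rand_matrix(SEQ_LEN, STYLE_DIM)                # hypothetical style-feature embeddings
    kernels = [rand_matrix(WIDTH, STYLE_DIM) for _ in range(N_FILTERS)]
    weights = rand_matrix(N_AUTHORS, POST_DIM + N_FILTERS)
    bias = [0.0] * N_AUTHORS

    probs = classify_author(post_vec, style_seq, kernels, weights, bias)
    print("predicted author index:", max(range(N_AUTHORS), key=probs.__getitem__))
```

In a real system the placeholder vector would come from a fine-tuned RoBERTa encoder and all weights would be learned jointly; the sketch only shows how two independently computed representations can feed one author classifier.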

Original language: English
Host publication title: Web and Big Data - 5th International Joint Conference, APWeb-WAIM 2021, Proceedings
Editors: Leong Hou U, Marc Spaniol, Yasushi Sakurai, Junying Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 413-421
Number of pages: 9
ISBN (Print): 9783030858957
DOI
Publication status: Published - 2021
Event: 5th International Joint Conference on Asia-Pacific Web and Web-Age Information Management, APWeb-WAIM 2021 - Guangzhou, China
Duration: 23 Aug 2021 to 25 Aug 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12858 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Joint Conference on Asia-Pacific Web and Web-Age Information Management, APWeb-WAIM 2021
Country/Territory: China
City: Guangzhou
Period: 21/8/23 to 21/8/25

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
