TY - GEN
T1 - Utilizing latent posting style for authorship attribution on short texts
AU - Leepaisomboon, Patamawadee
AU - Iwaihara, Mizuho
N1 - Funding Information:
The authors are grateful for constructive discussions with the members of Data Engineering Laboratory, IPS, Waseda University.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - Character n-grams and word n-grams are the most widely used features for authorship attribution on short texts. In this paper, we propose a new method that exploits latent posting styles estimated from authors' short texts. The new posting style features characterize each user's posting style through sentiment orientation and post length. Concise hidden posting styles are captured by Latent Dirichlet Allocation (LDA), where we consider two types of LDA models. The vectors of latent posting styles are then concatenated with averaged word embeddings of character n-grams and word n-grams and used to train a support vector machine. Our results show that combining latent posting styles with the traditional features can improve the accuracy of authorship attribution by up to 5.2%.
AB - Character n-grams and word n-grams are the most widely used features for authorship attribution on short texts. In this paper, we propose a new method that exploits latent posting styles estimated from authors' short texts. The new posting style features characterize each user's posting style through sentiment orientation and post length. Concise hidden posting styles are captured by Latent Dirichlet Allocation (LDA), where we consider two types of LDA models. The vectors of latent posting styles are then concatenated with averaged word embeddings of character n-grams and word n-grams and used to train a support vector machine. Our results show that combining latent posting styles with the traditional features can improve the accuracy of authorship attribution by up to 5.2%.
KW - Authorship attribution
KW - Latent Dirichlet allocation
KW - Sentiment
KW - Short text
KW - Social network
KW - Support vector machine
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85075185948&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075185948&partnerID=8YFLogxK
U2 - 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00184
DO - 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00184
M3 - Conference contribution
AN - SCOPUS:85075185948
T3 - Proceedings - IEEE 17th International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science and Technology Congress, DASC-PiCom-CBDCom-CyberSciTech 2019
SP - 1015
EP - 1022
BT - Proceedings - IEEE 17th International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science and Technology Congress, DASC-PiCom-CBDCom-CyberSciTech 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE 17th International Conference on Pervasive Intelligence and Computing, IEEE 5th International Conference on Cloud and Big Data Computing, 4th Cyber Science and Technology Congress, DASC-PiCom-CBDCom-CyberSciTech 2019
Y2 - 5 August 2019 through 8 August 2019
ER -