Hybrid Phishing URL Detection Using Segmented Word Embedding

Eint Sandi Aung*, Hayato Yamana

*Corresponding author for this work

Research output: Conference contribution

Abstract

Phishing is a type of cybercrime committed by attackers to steal sensitive information. This paper focuses on URL-based phishing detection, i.e., detecting phishing webpages by analyzing their URLs. Previously proposed methods have tackled this problem; however, insufficient word tokenization of URLs produces unknown words, which degrades detection accuracy. To solve the unknown-word problem, we propose a new tokenization algorithm, called URL-Tokenizer, which integrates the BERT and WordSegment tokenizers and additionally uses 24 NLP features. We then apply URL-Tokenizer to a DNN-CNN hybrid model to improve detection accuracy. Our experiment on the Ebbu2017 dataset confirmed that our word-DNN-CNN achieves an AUC of 99.89%, compared with 98.78% for the state-of-the-art DNN-BiLSTM.
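
The unknown-word problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's URL-Tokenizer, which integrates the BERT and WordSegment tokenizers; it is a toy dictionary-based segmenter, and the vocabulary `VOCAB`, the function names, and the example URL are all hypothetical. It only shows why splitting on delimiters alone leaves fused tokens that a word-level model would not recognize, and how a segmentation step might recover known words.

```python
import re

# Hypothetical toy vocabulary (an assumption for illustration only).
VOCAB = {"http", "https", "www", "com", "paypal", "pay", "pal",
         "secure", "login", "verify", "account", "update"}

def split_url(url: str) -> list[str]:
    """Split a URL on non-alphanumeric delimiters and lowercase the pieces."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]

def segment(token: str, vocab: set[str]) -> list[str]:
    """Segment a fused token into the fewest vocabulary words.

    Uses dynamic programming over prefixes. If the token cannot be fully
    covered by vocabulary words, it is returned unchanged as one token.
    """
    n = len(token)
    best = [None] * (n + 1)  # best[i]: fewest-word segmentation of token[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in vocab:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [token]

def tokenize_url(url: str, vocab: set[str] = VOCAB) -> list[str]:
    """Delimiter split first, then segment any token not in the vocabulary."""
    words = []
    for tok in split_url(url):
        words.extend([tok] if tok in vocab else segment(tok, vocab))
    return words

if __name__ == "__main__":
    url = "http://paypalsecurelogin.com/verify-account"
    print(split_url(url))     # delimiter split leaves a fused token
    print(tokenize_url(url))  # segmentation splits it into known words
```

With delimiter splitting alone, `paypalsecurelogin` stays a single out-of-vocabulary token; the segmentation step breaks it into `paypal`, `secure`, and `login`. A token that the vocabulary cannot cover is left unchanged rather than guessed at.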

Original language: English
Title of host publication: Information Integration and Web Intelligence - 24th International Conference, iiWAS 2022, Proceedings
Editors: Eric Pardede, Pari Delir Haghighi, Ismail Khalil, Gabriele Kotsis
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 507-518
Number of pages: 12
ISBN (Print): 9783031210464
Publication status: Published - 2022
Event: 24th International Conference on Information Integration and Web Intelligence, iiWAS 2022, held in conjunction with the 20th International Conference on Advances in Mobile Computing and Multimedia Intelligence, MoMM 2022 - Virtual, Online
Duration: 28 Nov 2022 to 30 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13635 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
