Segmentation-based Phishing URL Detection

Eint Sandi Aung, Hayato Yamana

Research output: Conference contribution

Abstract

Uniform resource locators (URLs), used for referencing web pages, play a vital role in cyber fraud because of their complicated structure; phishers (attackers) employ tricky bypassing techniques to deceive users. Thus, information extracted from URLs might reveal significant and meaningful patterns essential for phishing detection. To enhance the accuracy of URL-based phishing detection, we need an accurate word segmentation technique to split URLs correctly. However, in contrast to traditional word segmentation techniques used in natural language processing (NLP), URL segmentation requires meticulous attention, as tokenization, the process of turning meaningless data into meaningful data, is not as easy to apply as in NLP. In our work, we concentrate on URL segmentation and propose a novel tokenization method, named URL-Tokenizer, that combines the BERT tokenizer and the WordSegment tokenizer and adopts character-level and word-level segmentation simultaneously. Our experimental evaluations in detecting phishing URLs show that our proposed method achieves a high accuracy of 95.7% with a balanced dataset and 97.7% with an imbalanced dataset, whereas baseline models achieved 85.4% with a balanced dataset and 85.1% with an imbalanced dataset.
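
To make the idea of pairing word-level and character-level segmentation concrete, the following is a minimal sketch in Python. It is not the authors' URL-Tokenizer and does not use the BERT or WordSegment tokenizers: the toy vocabulary `VOCAB`, the cost values, and the function names `split_on_delimiters`, `segment_chunk`, and `tokenize_url` are all assumptions made for illustration. It splits a URL at punctuation, breaks each chunk into the fewest known words with dynamic programming, and returns that word sequence alongside a plain character sequence.

```python
import re

# Toy vocabulary: an assumption for illustration only. A real system would
# rely on a large corpus-derived vocabulary instead.
VOCAB = {"http", "https", "www", "com", "paypal", "pay", "pal",
         "secure", "login", "account", "update", "verify"}


def split_on_delimiters(url: str) -> list[str]:
    """Lowercase the URL and split it into alphanumeric chunks."""
    return [c for c in re.split(r"[^A-Za-z0-9]+", url.lower()) if c]


def segment_chunk(chunk: str, vocab: set[str]) -> list[str]:
    """Segment one chunk with dynamic programming.

    A vocabulary word costs 1; an unknown single character costs 2, so
    known words are preferred and unknown text falls back to characters.
    """
    max_len = max(len(w) for w in vocab)
    # best[i] holds (cost, tokens) for the prefix chunk[:i].
    best = [(0, [])]
    for i in range(1, len(chunk) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            piece = chunk[j:i]
            if piece in vocab:
                cost = 1
            elif len(piece) == 1:
                cost = 2
            else:
                continue
            candidates.append((best[j][0] + cost, best[j][1] + [piece]))
        best.append(min(candidates, key=lambda c: c[0]))
    return best[-1][1]


def tokenize_url(url: str, vocab: set[str] = VOCAB) -> dict[str, list[str]]:
    """Return word-level and character-level views of the same URL."""
    words = []
    for chunk in split_on_delimiters(url):
        words.extend(segment_chunk(chunk, vocab))
    return {"words": words, "chars": list(url.lower())}


if __name__ == "__main__":
    result = tokenize_url("http://paypalsecurelogin.com/verify")
    print(result["words"])
    # ['http', 'paypal', 'secure', 'login', 'com', 'verify']
```

In this sketch the two token sequences are simply returned side by side; how they would be encoded and fed to a classifier is left open, since the abstract does not specify it.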

Original language: English
Title of host publication: Proceedings - 2021 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2021
Publisher: Association for Computing Machinery
Pages: 550-556
Number of pages: 7
ISBN (Electronic): 9781450391153
DOI
Publication status: Published - 14 Dec 2021
Event: 2021 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2021 - Virtual, Online, Australia
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2021 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2021
Country/Territory: Australia
City: Virtual, Online
Period: 2021-12-14 to 2021-12-17

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software
