URL-based phishing detection using the entropy of non-alphanumeric characters

Eint Sandi Aung, Hayato Yamana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Phishing is a type of personal information theft in which phishers lure users into revealing sensitive information. Phishing detection mechanisms using various techniques have been developed. Our hypothesis is that phishers create fake websites with as little information as possible in a webpage, which hinders content- and visual similarity-based detection methods because they rely on analyzing the webpage content. To overcome this, we focus on the use of Uniform Resource Locators (URLs) to detect phishing. Since previous work extracts specific special-character features, we assume that the distribution of non-alphanumeric (NAN) characters strongly affects the performance of URL-based detection. We hence propose a new feature, the entropy of NAN characters, for URL-based phishing detection. Experimental evaluation shows 96% ROC AUC on a balanced dataset and 89% ROC AUC on an imbalanced dataset, an increase of 5 to 6% in ROC AUC over detection without our proposed feature.
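
As an illustration of the proposed feature, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact formula, so the sketch assumes the feature is the Shannon entropy (in bits) of the distribution of NAN characters within a single URL, and it treats every character that is not an ASCII letter or digit as NAN. The function name `nan_entropy` is hypothetical.

```python
import math
from collections import Counter


def nan_entropy(url: str) -> float:
    """Shannon entropy (bits) of the non-alphanumeric characters in a URL.

    Assumption: "non-alphanumeric" means any character that is not an
    ASCII letter or digit; the paper's exact definition may differ.
    """
    nan_chars = [c for c in url if not (c.isascii() and c.isalnum())]
    if not nan_chars:
        # No NAN characters at all: define the entropy as zero.
        return 0.0
    total = len(nan_chars)
    counts = Counter(nan_chars)
    # H = sum over distinct characters of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # Illustrative URLs only; they are not taken from the paper's datasets.
    for u in ["http://example.com/", "http://example.com/a-b_c?x=1&y=2#top"]:
        print(u, round(nan_entropy(u), 3))
```

Under these assumptions, a URL whose NAN characters are all the same symbol scores 0, while a URL that mixes many different symbols evenly scores higher. In a detector, this value would be one feature alongside others passed to a classifier.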

Original language: English
Title of host publication: 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings
Editors: Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Anderst-Kotsis
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450371797
DOIs: https://doi.org/10.1145/3366030.3366064
Publication status: Published - 2019 Dec 2
Event: 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Munich, Germany
Duration: 2019 Dec 2 → 2019 Dec 4

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019
Country: Germany
City: Munich
Period: 19/12/2 → 19/12/4

Keywords

  • Detection
  • Phishing
  • URL
  • Webpage

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

  • Cite this

    Aung, E. S., & Yamana, H. (2019). URL-based phishing detection using the entropy of non-alphanumeric characters. In M. Indrawan-Santiago, E. Pardede, I. L. Salvadori, M. Steinbauer, I. Khalil, & G. Anderst-Kotsis (Eds.), 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3366030.3366064