TY - JOUR
T1 - Analyzing spatial structure of IP addresses for detecting malicious websites
AU - Chiba, Daiki
AU - Tobe, Kazuhiro
AU - Mori, Tatsuya
AU - Goto, Shigeki
N1 - Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013
Y1 - 2013
N2 - Web-based malware attacks have become one of the most serious threats and need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware involve employing one of several blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites, which makes it difficult to keep blacklists up to date with information on new malicious websites. To tackle this problem, this paper proposes a new scheme for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS records. While the strings that form URLs or DNS records are highly variable, IP addresses are less variable, i.e., the IPv4 address space is mapped onto 4-byte strings. In this paper, a lightweight and scalable detection scheme based on machine learning techniques is developed and evaluated. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates for the drawbacks of existing approaches. The effectiveness of our approach is validated using real IP address data from existing blacklists and real traffic data from a campus network. The results demonstrate that our scheme can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.
KW - Computer viruses
KW - Drive-by-download attacks
KW - IP address
KW - Machine learning
KW - Web/Mail security
UR - http://www.scopus.com/inward/record.url?scp=84880158671&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84880158671&partnerID=8YFLogxK
U2 - 10.2197/ipsjjip.21.539
DO - 10.2197/ipsjjip.21.539
M3 - Article
AN - SCOPUS:84880158671
VL - 21
SP - 539
EP - 550
JO - Journal of Information Processing
JF - Journal of Information Processing
SN - 0387-5806
IS - 3
ER -