Analyzing spatial structure of IP addresses for detecting malicious websites

Daiki Chiba, Kazuhiro Tobe, Tatsuya Mori, Shigeki Goto

    Research output: Contribution to journal (Article)

    1 Citation (Scopus)

    Abstract

    Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware rely on blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites, and it is difficult to keep blacklists up to date with information on new malicious websites. To tackle this problem, this paper proposes a new scheme for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS records. While the strings that form URLs or DNS records are highly variable, IP addresses are less variable because the IPv4 address space is mapped onto 4-byte strings. In this paper, a lightweight and scalable detection scheme based on machine learning techniques is developed and evaluated. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates for the drawbacks of existing approaches. The effectiveness of our approach is validated using real IP address data from existing blacklists and real traffic data from a campus network. The results demonstrate that our scheme can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.
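The abstract's observation that an IPv4 address is a fixed 4-byte string suggests how an address could be turned into a fixed-length input for a machine learning model. The following is a minimal sketch under that assumption only; the function names (`octet_features`, `bit_features`, `shared_prefix_len`) and the choice of per-octet and per-bit encodings are illustrative and are not taken from the paper. The addresses come from the ranges reserved for documentation.

```python
import ipaddress


def octet_features(ip):
    """Encode an IPv4 address as its 4 byte values (hypothetical feature set)."""
    return list(ipaddress.IPv4Address(ip).packed)


def bit_features(ip):
    """Encode an IPv4 address as 32 binary features, most significant bit first."""
    value = int(ipaddress.IPv4Address(ip))
    return [(value >> shift) & 1 for shift in range(31, -1, -1)]


def shared_prefix_len(ip_a, ip_b):
    """Count the leading bits two addresses have in common (0 to 32).

    Used here as a simple stand-in for how close two addresses sit
    in the address space.
    """
    diff = int(ipaddress.IPv4Address(ip_a)) ^ int(ipaddress.IPv4Address(ip_b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    print(octet_features("192.0.2.1"))
    print(bit_features("192.0.2.1")[:8])
    print(shared_prefix_len("192.0.2.1", "192.0.2.200"))
```

Either encoding yields vectors of constant length regardless of the hosted URL or domain name, which is what would make such features cheap to compute and to feed to an off-the-shelf classifier.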

    Original language: English
    Pages (from-to): 539-550
    Number of pages: 12
    Journal: Journal of Information Processing
    Volume: 21
    Issue number: 3
    DOI: 10.2197/ipsjjip.21.539
    Publication status: Published - 2013

    Fingerprint

    Websites
    World Wide Web
    Learning systems
    Malware

    Keywords

    • Computer viruses
    • Drive-by-download attacks
    • IP address
    • Machine learning
    • Web/Mail security

    ASJC Scopus subject areas

    • Computer Science(all)

    Cite this

    Analyzing spatial structure of IP addresses for detecting malicious websites. / Chiba, Daiki; Tobe, Kazuhiro; Mori, Tatsuya; Goto, Shigeki.

    In: Journal of Information Processing, Vol. 21, No. 3, 2013, p. 539-550.

    Chiba, Daiki ; Tobe, Kazuhiro ; Mori, Tatsuya ; Goto, Shigeki. / Analyzing spatial structure of IP addresses for detecting malicious websites. In: Journal of Information Processing. 2013 ; Vol. 21, No. 3. pp. 539-550.
    @article{50e3cdf6fec24cddb1f42b0f62404d7d,
    title = "Analyzing spatial structure of IP addresses for detecting malicious websites",
    abstract = "Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware include employing one of several blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites. Thus, it is difficult to maintain up-to-date blacklists with information for new malicious websites. To tackle this problem, this paper proposes a new scheme for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS records. While the strings that form URLs or DNS records are highly variable, IP addresses are less variable, i.e., IPv4 address space is mapped onto 4-byte strings. In this paper, a lightweight and scalable detection scheme that is based on machine learning techniques is developed and evaluated. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates the drawbacks of existing approaches. The effectiveness of our approach is validated by using real IP address data from existing blacklists and real traffic data on a campus network. The results demonstrate that our scheme can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.",
    keywords = "Computer viruses, Drive-by-download attacks, IP address, Machine learning, Web/Mail security",
    author = "Daiki Chiba and Kazuhiro Tobe and Tatsuya Mori and Shigeki Goto",
    year = "2013",
    doi = "10.2197/ipsjjip.21.539",
    language = "English",
    volume = "21",
    pages = "539--550",
    journal = "Journal of Information Processing",
    issn = "0387-5806",
    publisher = "Information Processing Society of Japan",
    number = "3",

    }

    TY - JOUR

    T1 - Analyzing spatial structure of IP addresses for detecting malicious websites

    AU - Chiba, Daiki

    AU - Tobe, Kazuhiro

    AU - Mori, Tatsuya

    AU - Goto, Shigeki

    PY - 2013

    Y1 - 2013

    N2 - Web-based malware attacks have become one of the most serious threats that need to be addressed urgently. Several approaches that have attracted attention as promising ways of detecting such malware include employing one of several blacklists. However, these conventional approaches often fail to detect new attacks owing to the versatility of malicious websites. Thus, it is difficult to maintain up-to-date blacklists with information for new malicious websites. To tackle this problem, this paper proposes a new scheme for detecting malicious websites using the characteristics of IP addresses. Our approach leverages the empirical observation that IP addresses are more stable than other metrics such as URLs and DNS records. While the strings that form URLs or DNS records are highly variable, IP addresses are less variable, i.e., IPv4 address space is mapped onto 4-byte strings. In this paper, a lightweight and scalable detection scheme that is based on machine learning techniques is developed and evaluated. The aim of this study is not to provide a single solution that effectively detects web-based malware but to develop a technique that compensates the drawbacks of existing approaches. The effectiveness of our approach is validated by using real IP address data from existing blacklists and real traffic data on a campus network. The results demonstrate that our scheme can expand the coverage/accuracy of existing blacklists and also detect unknown malicious websites that are not covered by conventional approaches.

    KW - Computer viruses

    KW - Drive-by-download attacks

    KW - IP address

    KW - Machine learning

    KW - Web/Mail security

    UR - http://www.scopus.com/inward/record.url?scp=84880158671&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84880158671&partnerID=8YFLogxK

    U2 - 10.2197/ipsjjip.21.539

    DO - 10.2197/ipsjjip.21.539

    M3 - Article

    VL - 21

    SP - 539

    EP - 550

    JO - Journal of Information Processing

    JF - Journal of Information Processing

    SN - 0387-5806

    IS - 3

    ER -