Automating URL blacklist generation with similarity search approach

Bo Sun, Mitsuaki Akiyama, Takeshi Yagi, Mitsuhiro Hatada, Tatsuya Mori

    Research output: Contribution to journal (Article)

    2 Citations (Scopus)

    Abstract

    Modern web users may encounter drive-by-download attacks, a browser security threat, when surfing the Internet. Drive-by-download attacks use exploit code to take control of a user's web browser. Many web users do not take such underlying threats into account when clicking URLs. A URL blacklist is one of the practical approaches to thwarting browser-targeted attacks. However, a URL blacklist cannot cope with previously unseen malicious URLs. Therefore, to make a URL blacklist effective, it is crucial to keep its URLs updated. Given these observations, we propose a framework called automatic blacklist generator (AutoBLG) that automates the collection of new malicious URLs, starting from a given existing URL blacklist. The primary mechanism of AutoBLG is to expand the search space of web pages while reducing the number of URLs to be analyzed; it applies several pre-filters, such as similarity search, to accelerate blacklist generation. AutoBLG consists of three primary components: URL expansion, URL filtration, and URL verification. Through extensive analysis using a high-performance web client honeypot, we demonstrate that AutoBLG can successfully discover new and previously unknown drive-by-download URLs from the vast web space.
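The three components named in the abstract can be pictured as a small pipeline. The sketch below is illustrative only and is not the authors' implementation: the function names, the `neighbours` and `is_malicious` callables, the character-trigram Jaccard similarity, and the 0.5 threshold are all assumptions introduced here. In particular, the abstract does not say which features or similarity measure AutoBLG uses, and the verification step (a web client honeypot in the paper) is reduced to a caller-supplied predicate.

```python
from typing import Callable, Iterable, List, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a string (hypothetical feature)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def expand(seeds: Iterable[str],
           neighbours: Callable[[str], Iterable[str]]) -> List[str]:
    """URL expansion: gather candidates related to the seed URLs.

    `neighbours` is an assumed callable (for example a crawler or a
    search-engine lookup) that returns URLs related to a given URL.
    Seeds and duplicates are excluded from the result.
    """
    seeds = list(seeds)
    seen = set(seeds)
    candidates: List[str] = []
    for seed in seeds:
        for url in neighbours(seed):
            if url not in seen:
                seen.add(url)
                candidates.append(url)
    return candidates


def filtrate(candidates: Iterable[str], seeds: Iterable[str],
             threshold: float = 0.5) -> List[str]:
    """URL filtration: keep candidates similar to at least one seed URL.

    Trigram Jaccard similarity on the URL string stands in for whatever
    similarity search the real system applies.
    """
    seed_grams = [char_ngrams(s) for s in seeds]
    return [c for c in candidates
            if any(jaccard(char_ngrams(c), g) >= threshold for g in seed_grams)]


def verify(candidates: Iterable[str],
           is_malicious: Callable[[str], bool]) -> List[str]:
    """URL verification: keep only candidates the checker confirms.

    `is_malicious` is an assumed stand-in for an expensive analysis step.
    """
    return [c for c in candidates if is_malicious(c)]


def generate_blacklist(seeds: Iterable[str],
                       neighbours: Callable[[str], Iterable[str]],
                       is_malicious: Callable[[str], bool],
                       threshold: float = 0.5) -> List[str]:
    """Run expansion, filtration, and verification in sequence."""
    seeds = list(seeds)
    candidates = expand(seeds, neighbours)
    shortlisted = filtrate(candidates, seeds, threshold)
    return verify(shortlisted, is_malicious)
```

The point of this ordering is that the cheap similarity filter runs before the costly verification step, so only a shortlist of candidates reaches the expensive check. A caller would supply its own `neighbours` and `is_malicious` functions; none of these names come from the paper.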

    Original language: English
    Pages (from-to): 873-882
    Number of pages: 10
    Journal: IEICE Transactions on Information and Systems
    Volume: E99D
    Issue number: 4
    DOI: 10.1587/transinf.2015ICP0027
    Publication status: Published - 2016 Apr 1

    Fingerprint

    Websites
    World Wide Web
    Web browsers
    Internet

    Keywords

    • Drive-by-download
    • Machine learning
    • Search space
    • URL blacklist
    • Web client honeypot

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Software
    • Artificial Intelligence
    • Hardware and Architecture
    • Computer Vision and Pattern Recognition

    Cite this

    Automating URL blacklist generation with similarity search approach. / Sun, Bo; Akiyama, Mitsuaki; Yagi, Takeshi; Hatada, Mitsuhiro; Mori, Tatsuya.

    In: IEICE Transactions on Information and Systems, Vol. E99D, No. 4, 01.04.2016, p. 873-882.

    @article{dafcb7fb9eba4930b94fb9a9f014d4e7,
    title = "Automating URL blacklist generation with similarity search approach",
    keywords = "Drive-by-download, Machine learning, Search space, URL blacklist, Web client honeypot",
    author = "Bo Sun and Mitsuaki Akiyama and Takeshi Yagi and Mitsuhiro Hatada and Tatsuya Mori",
    year = "2016",
    month = "4",
    day = "1",
    doi = "10.1587/transinf.2015ICP0027",
    language = "English",
    volume = "E99D",
    pages = "873--882",
    journal = "IEICE Transactions on Information and Systems",
    issn = "0916-8532",
    publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
    number = "4",

    }

    TY - JOUR

    T1 - Automating URL blacklist generation with similarity search approach

    AU - Sun, Bo

    AU - Akiyama, Mitsuaki

    AU - Yagi, Takeshi

    AU - Hatada, Mitsuhiro

    AU - Mori, Tatsuya

    PY - 2016/4/1

    Y1 - 2016/4/1

    KW - Drive-by-download

    KW - Machine learning

    KW - Search space

    KW - URL blacklist

    KW - Web client honeypot

    UR - http://www.scopus.com/inward/record.url?scp=84962911290&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84962911290&partnerID=8YFLogxK

    U2 - 10.1587/transinf.2015ICP0027

    DO - 10.1587/transinf.2015ICP0027

    M3 - Article

    AN - SCOPUS:84962911290

    VL - E99D

    SP - 873

    EP - 882

    JO - IEICE Transactions on Information and Systems

    JF - IEICE Transactions on Information and Systems

    SN - 0916-8532

    IS - 4

    ER -