Building a scalable web tracking detection system

Implementation and the empirical study

Yumehisa Haga, Yuta Takata, Mitsuaki Akiyama, Tatsuya Mori

    Research output: Contribution to journal, Article

    Abstract

    Web tracking is widely used to track users' behavior on websites. While web tracking creates new opportunities for e-commerce, it also carries risks such as privacy infringement. Analyzing these risks in the wild is therefore a meaningful step toward making the state of user privacy transparent. This work aims to understand how web tracking has been adopted by prominent websites. We also aim to understand their resilience to ad-blocking techniques. Websites that enable web tracking collect information called web browser fingerprints, which can be used to identify users. We develop a scalable system that detects fingerprinting by using both dynamic and static analyses. If a tracking site makes use of many strong fingerprints, the site is likely to be resilient to ad-blocking techniques. We also analyze the connectivity of third-party tracking sites, which are linked from multiple websites. This link analysis allows us to extract groups of associated tracking sites and to understand how influential these sites are. Based on analyses of 100,000 websites, we quantify the potential risks of websites that enable web tracking. We reveal that 226 websites adopt fingerprints that cannot be detected by most off-the-shelf anti-tracking tools. We also reveal that a major, resilient third-party tracking site is linked to 50.0% of the top 100,000 popular websites.

    Original language: English
    Pages (from-to): 1663-1670
    Number of pages: 8
    Journal: IEICE Transactions on Information and Systems
    Volume: E100D
    Issue number: 8
    DOI: 10.1587/transinf.2016ICP0020
    Publication status: Published - 2017 Aug 1

    Fingerprint

    Websites
    World Wide Web
    Web browsers
    Internet

    Keywords

    • Web browser fingerprint
    • Web tracking

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Vision and Pattern Recognition
    • Artificial Intelligence
    • Electrical and Electronic Engineering

    Cite this

    Haga, Yumehisa; Takata, Yuta; Akiyama, Mitsuaki; Mori, Tatsuya. Building a scalable web tracking detection system: Implementation and the empirical study. In: IEICE Transactions on Information and Systems, Vol. E100D, No. 8, 01.08.2017, p. 1663-1670.

    @article{cb565f98efff41ab97535ebf7cc15822,
    title = "Building a scalable web tracking detection system: Implementation and the empirical study",
    keywords = "Web browser fingerprint, Web tracking",
    author = "Yumehisa Haga and Yuta Takata and Mitsuaki Akiyama and Tatsuya Mori",
    year = "2017",
    month = "8",
    day = "1",
    doi = "10.1587/transinf.2016ICP0020",
    language = "English",
    volume = "E100D",
    pages = "1663--1670",
    journal = "IEICE Transactions on Information and Systems",
    issn = "0916-8532",
    publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
    number = "8",

    }

    TY  - JOUR
    T1  - Building a scalable web tracking detection system
    T2  - Implementation and the empirical study
    AU  - Haga, Yumehisa
    AU  - Takata, Yuta
    AU  - Akiyama, Mitsuaki
    AU  - Mori, Tatsuya
    PY  - 2017/8/1
    Y1  - 2017/8/1
    KW  - Web browser fingerprint
    KW  - Web tracking
    UR  - http://www.scopus.com/inward/record.url?scp=85026509498&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=85026509498&partnerID=8YFLogxK
    U2  - 10.1587/transinf.2016ICP0020
    DO  - 10.1587/transinf.2016ICP0020
    M3  - Article
    VL  - E100D
    SP  - 1663
    EP  - 1670
    JO  - IEICE Transactions on Information and Systems
    JF  - IEICE Transactions on Information and Systems
    SN  - 0916-8532
    IS  - 8
    ER  - 