Clone or relative?

Understanding the origins of similar Android apps

Yuta Ishii, Takuya Watanabe, Mitsuaki Akiyama, Tatsuya Mori

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    8 Citations (Scopus)

    Abstract

    Since it is not hard to repackage an Android app, there are many cloned apps, which we call "clones" in this work. As previous studies have reported, clones are generated for bad purposes by malicious parties, e.g., adding malicious functions, injecting/replacing advertising modules, and piracy. Besides such clones, there are legitimate, similar apps, which we call "relatives" in this work. These relatives are not clones but are similar in nature; i.e., they are generated by the same app-building service or by the same developer using the same template. Given these observations, this paper aims to answer the following two research questions: (RQ1) How can we distinguish between clones and relatives? (RQ2) What is the breakdown of clones and relatives in the official and third-party marketplaces? To answer the first research question, we developed a scalable framework called APPraiser that systematically extracts similar apps and classifies them into clones and relatives. We note that our key algorithms, which leverage the sparseness of the data, have a time complexity of O(n) in practice. To answer the second research question, we applied the APPraiser framework to more than 1.3 million apps collected from official and third-party marketplaces. Our analysis revealed the following findings: in the official marketplace, 79% of similar apps were attributed to relatives, while in the third-party marketplace, 50% of similar apps were attributed to clones. The majority of relatives are apps developed by prolific developers in both marketplaces. We also found that, of the clones in the third-party market that were originally published in the official market, 76% are malware. To the best of our knowledge, this is the first work to clarify the breakdown of "similar" Android apps and to quantify their origins using a huge dataset equivalent in size to the official market.
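
The abstract does not spell out how APPraiser extracts similar apps, so the following is only a minimal sketch of how sparseness can keep similarity extraction close to linear, not the authors' algorithm. It assumes each app is described by a set of feature digests (for example, hashes of packaged files) and that most digests are shared by only a few apps. The names `similar_pairs` and `label_pair`, the Jaccard threshold, and the rule "same signer means relative" are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def similar_pairs(apps, threshold=0.8):
    """Return {(app_a, app_b): jaccard} for pairs at or above the threshold.

    apps maps an app id to a set of feature digests (a hypothetical
    representation; the source does not specify the features used).
    """
    # Inverted index: digest -> ids of the apps that contain it.
    index = defaultdict(list)
    for app_id, features in apps.items():
        for digest in features:
            index[digest].append(app_id)

    # Count shared digests only for pairs that co-occur in some bucket.
    # When buckets are small (sparse data), this avoids comparing all pairs.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    result = {}
    for (a, b), common in shared.items():
        union = len(apps[a]) + len(apps[b]) - common
        score = common / union
        if score >= threshold:
            result[(a, b)] = score
    return result


def label_pair(a, b, signer):
    """Assumed rule: a similar pair with the same signer is a relative."""
    return "relative" if signer[a] == signer[b] else "clone candidate"


if __name__ == "__main__":
    apps = {
        "a": {1, 2, 3, 4},
        "b": {1, 2, 3, 4, 5},
        "c": {9, 10},
    }
    signer = {"a": "dev1", "b": "dev2", "c": "dev1"}
    for (x, y), score in similar_pairs(apps).items():
        print(x, y, round(score, 2), label_pair(x, y, signer))
```

The cost of the pair-counting loop depends on bucket sizes, so the near-linear behaviour holds only while few apps share any given digest; a digest present in very many apps (such as a popular library file) would need separate handling.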

    Original language: English
    Title of host publication: IWSPA 2016 - Proceedings of the 2016 ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY 2016
    Publisher: Association for Computing Machinery, Inc
    Pages: 25-32
    Number of pages: 8
    ISBN (Print): 9781450340779
    DOI: https://doi.org/10.1145/2875475.2875480
    Publication status: Published - 2016 Mar 11
    Event: 2016 2nd ACM International Workshop on Security and Privacy Analytics, IWSPA 2016 - New Orleans, United States
    Duration: 2016 Mar 11 → …

    Other

    Other: 2016 2nd ACM International Workshop on Security and Privacy Analytics, IWSPA 2016
    Country: United States
    City: New Orleans
    Period: 16/3/11 → …

    Fingerprint

    Application programs
    Android (operating system)
    Marketing

    Keywords

    • Android
    • Large-scale data
    • Mobile security
    • Repackaging

    ASJC Scopus subject areas

    • Software
    • Computer Science Applications
    • Computational Theory and Mathematics
    • Computer Networks and Communications
    • Information Systems

    Cite this

    Ishii, Y., Watanabe, T., Akiyama, M., & Mori, T. (2016). Clone or relative? Understanding the origins of similar Android apps. In IWSPA 2016 - Proceedings of the 2016 ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY 2016 (pp. 25-32). Association for Computing Machinery, Inc. https://doi.org/10.1145/2875475.2875480

    @inproceedings{38caf11a718346af9aead62474b48d7e,
    title = "Clone or relative?: Understanding the origins of similar Android apps",
    keywords = "Android, Large-scale data, Mobile security, Repackaging",
    author = "Yuta Ishii and Takuya Watanabe and Mitsuaki Akiyama and Tatsuya Mori",
    year = "2016",
    month = "3",
    day = "11",
    doi = "10.1145/2875475.2875480",
    language = "English",
    isbn = "9781450340779",
    pages = "25--32",
    booktitle = "IWSPA 2016 - Proceedings of the 2016 ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY 2016",
    publisher = "Association for Computing Machinery, Inc",

    }

    TY - GEN

    T1 - Clone or relative?

    T2 - Understanding the origins of similar Android apps

    AU - Ishii, Yuta

    AU - Watanabe, Takuya

    AU - Akiyama, Mitsuaki

    AU - Mori, Tatsuya

    PY - 2016/3/11

    Y1 - 2016/3/11

    KW - Android

    KW - Large-scale data

    KW - Mobile security

    KW - Repackaging

    UR - http://www.scopus.com/inward/record.url?scp=84966621847&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84966621847&partnerID=8YFLogxK

    U2 - 10.1145/2875475.2875480

    DO - 10.1145/2875475.2875480

    M3 - Conference contribution

    SN - 9781450340779

    SP - 25

    EP - 32

    BT - IWSPA 2016 - Proceedings of the 2016 ACM International Workshop on Security and Privacy Analytics, co-located with CODASPY 2016

    PB - Association for Computing Machinery, Inc

    ER -