Discovering similar malware samples using API call topics

Akinori Fujino, Junichi Murakami, Tatsuya Mori

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    11 Citations (Scopus)

    Abstract

    To automate malware analysis, dynamic malware analysis systems have attracted increasing attention from both the industry and research communities. Of the various logs collected by such systems, the API call is a very promising source of information for characterizing malware behavior. This work aims to extract similar malware samples automatically using the concept of 'API call topics,' which represents a set of API calls that are intrinsic to a specific group of malware samples. We first convert Win32 API calls into 'API words.' We then apply non-negative matrix factorization (NMF) clustering analysis to the corpus of the extracted API words. NMF automatically generates the API call topics from the API words. The contributions of this work can be summarized as follows. We present an unsupervised approach to extract API call topics from a large corpus of API calls. Through analysis of the API call logs collected from thousands of malware samples, we demonstrate that the extracted API call topics can detect similar malware samples. The proposed approach is expected to be useful for automating the process of analyzing a huge volume of logs collected from dynamic malware analysis systems.
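
The pipeline described in the abstract (API calls to 'API words', then NMF over the resulting corpus) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy logs, the helper names (`to_api_words`, `nmf`, `dominant_topic`), treating each function name as one API word, and the use of plain multiplicative-update NMF with a fixed topic count of 2. The abstract does not specify the word conversion or the NMF variant.

```python
import random
from collections import Counter

# Hypothetical toy logs: sample name -> sequence of Win32 API calls.
logs = {
    "sample_a": ["CreateFileW", "WriteFile", "WriteFile", "CloseHandle"],
    "sample_b": ["CreateFileW", "WriteFile", "CloseHandle"],
    "sample_c": ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"],
    "sample_d": ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"],
}

def to_api_words(calls):
    # Assumption: one API word per call, namely the function name itself.
    return list(calls)

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (samples x words) into non-negative w (samples x k), h (k x words)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [squared_error(v, w, h)]  # error of the random starting point
    for _ in range(iters):
        # Multiplicative update for h: h *= (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Multiplicative update for w: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    errors.append(squared_error(v, w, h))  # error after the final iteration
    return w, h, errors

def dominant_topic(weights):
    # Index of the topic with the largest weight for one sample.
    return max(range(len(weights)), key=weights.__getitem__)

# Build the sample-by-word count matrix V from the logs.
samples = sorted(logs)
vocab = sorted({word for s in samples for word in to_api_words(logs[s])})
counts = [Counter(to_api_words(logs[s])) for s in samples]
V = [[float(c[word]) for word in vocab] for c in counts]

W, H, errors = nmf(V, k=2)
topics = {s: dominant_topic(W[i]) for i, s in enumerate(samples)}

for s in samples:
    print(s, "-> topic", topics[s])
for t, row in enumerate(H):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", t, "top API words:", [word for _, word in top])
```

In this sketch each row of `H` plays the role of an API call topic (a weighting over API words), and each row of `W` says how strongly a sample expresses each topic. Samples whose rows of `W` peak on the same topic are candidates for being similar. A real corpus would call for a tuned topic count and a library NMF solver.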

    Original language: English
    Title of host publication: 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 140-147
    Number of pages: 8
    ISBN (Print): 9781479963904
    DOI: https://doi.org/10.1109/CCNC.2015.7157960
    Publication status: Published - 2015 Jul 14
    Event: 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015 - Las Vegas, United States
    Duration: 2015 Jan 9 to 2015 Jan 12

    Other

    Other: 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015
    Country: United States
    City: Las Vegas
    Period: 15/1/9 to 15/1/12

    Fingerprint

    Application programming interfaces (API)
    Factorization
    Malware
    Dynamic analysis
    Computer systems

    ASJC Scopus subject areas

    • Computer Networks and Communications

    Cite this

    Fujino, A., Murakami, J., & Mori, T. (2015). Discovering similar malware samples using API call topics. In 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015 (pp. 140-147). [7157960] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CCNC.2015.7157960

    Discovering similar malware samples using API call topics. / Fujino, Akinori; Murakami, Junichi; Mori, Tatsuya.

    2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 140-147 7157960.

    Fujino, A, Murakami, J & Mori, T 2015, Discovering similar malware samples using API call topics. in 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015, 7157960, Institute of Electrical and Electronics Engineers Inc., pp. 140-147, 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015, Las Vegas, United States, 15/1/9. https://doi.org/10.1109/CCNC.2015.7157960
    Fujino A, Murakami J, Mori T. Discovering similar malware samples using API call topics. In 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 140-147. 7157960 https://doi.org/10.1109/CCNC.2015.7157960
    Fujino, Akinori ; Murakami, Junichi ; Mori, Tatsuya. / Discovering similar malware samples using API call topics. 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 140-147
    @inproceedings{8a8a59d0af744c29be15f6e45f1829b4,
    title = "Discovering similar malware samples using API call topics",
    abstract = "To automate malware analysis, dynamic malware analysis systems have attracted increasing attention from both the industry and research communities. Of the various logs collected by such systems, the API call is a very promising source of information for characterizing malware behavior. This work aims to extract similar malware samples automatically using the concept of 'API call topics,' which represents a set of API calls that are intrinsic to a specific group of malware samples. We first convert Win32 API calls into 'API words.' We then apply non-negative matrix factorization (NMF) clustering analysis to the corpus of the extracted API words. NMF automatically generates the API call topics from the API words. The contributions of this work can be summarized as follows. We present an unsupervised approach to extract API call topics from a large corpus of API calls. Through analysis of the API call logs collected from thousands of malware samples, we demonstrate that the extracted API call topics can detect similar malware samples. The proposed approach is expected to be useful for automating the process of analyzing a huge volume of logs collected from dynamic malware analysis systems.",
    author = "Akinori Fujino and Junichi Murakami and Tatsuya Mori",
    year = "2015",
    month = "7",
    day = "14",
    doi = "10.1109/CCNC.2015.7157960",
    language = "English",
    isbn = "9781479963904",
    pages = "140--147",
    booktitle = "2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",

    }

    TY - GEN

    T1 - Discovering similar malware samples using API call topics

    AU - Fujino, Akinori

    AU - Murakami, Junichi

    AU - Mori, Tatsuya

    PY - 2015/7/14

    Y1 - 2015/7/14

    N2 - To automate malware analysis, dynamic malware analysis systems have attracted increasing attention from both the industry and research communities. Of the various logs collected by such systems, the API call is a very promising source of information for characterizing malware behavior. This work aims to extract similar malware samples automatically using the concept of 'API call topics,' which represents a set of API calls that are intrinsic to a specific group of malware samples. We first convert Win32 API calls into 'API words.' We then apply non-negative matrix factorization (NMF) clustering analysis to the corpus of the extracted API words. NMF automatically generates the API call topics from the API words. The contributions of this work can be summarized as follows. We present an unsupervised approach to extract API call topics from a large corpus of API calls. Through analysis of the API call logs collected from thousands of malware samples, we demonstrate that the extracted API call topics can detect similar malware samples. The proposed approach is expected to be useful for automating the process of analyzing a huge volume of logs collected from dynamic malware analysis systems.

    AB - To automate malware analysis, dynamic malware analysis systems have attracted increasing attention from both the industry and research communities. Of the various logs collected by such systems, the API call is a very promising source of information for characterizing malware behavior. This work aims to extract similar malware samples automatically using the concept of 'API call topics,' which represents a set of API calls that are intrinsic to a specific group of malware samples. We first convert Win32 API calls into 'API words.' We then apply non-negative matrix factorization (NMF) clustering analysis to the corpus of the extracted API words. NMF automatically generates the API call topics from the API words. The contributions of this work can be summarized as follows. We present an unsupervised approach to extract API call topics from a large corpus of API calls. Through analysis of the API call logs collected from thousands of malware samples, we demonstrate that the extracted API call topics can detect similar malware samples. The proposed approach is expected to be useful for automating the process of analyzing a huge volume of logs collected from dynamic malware analysis systems.

    UR - http://www.scopus.com/inward/record.url?scp=84943196475&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84943196475&partnerID=8YFLogxK

    U2 - 10.1109/CCNC.2015.7157960

    DO - 10.1109/CCNC.2015.7157960

    M3 - Conference contribution

    SN - 9781479963904

    SP - 140

    EP - 147

    BT - 2015 12th Annual IEEE Consumer Communications and Networking Conference, CCNC 2015

    PB - Institute of Electrical and Electronics Engineers Inc.

    ER -