Automatically generating malware analysis reports using sandbox logs

Bo Sun, Akinori Fujino, Tatsuya Mori, Tao Ban, Takeshi Takahashi, Daisuke Inoue

    Research output: Contribution to journal (Article)

    Abstract

    Analyzing a malware sample requires far more time and cost than creating one. To understand the behavior of a given sample, security analysts often rely on API call logs collected by dynamic malware analysis tools such as sandboxes. Because the log generated for a single sample can become extremely large, inspecting it is time-consuming. Meanwhile, antivirus vendors usually publish malware analysis reports (vendor reports) on their websites. These reports are the result of careful analysis by security experts. The problem is that, even though such analyzed examples exist, associating vendor reports with sandbox logs is difficult, which prevents security analysts from retrieving the useful information the reports contain. To address this issue, we developed a system called AMAR-Generator, which aims to automate the generation of malware analysis reports from sandbox logs by making use of existing vendor reports. Intended as a convenient assistant tool for security analysts, our system employs techniques including template matching, API behavior mapping, and a malicious behavior database to produce concise, human-readable reports that describe the malicious behaviors of malware programs. Through a performance evaluation, we first demonstrate that AMAR-Generator can generate human-readable reports that a security analyst can use as the first step of malware analysis. We also demonstrate that AMAR-Generator can identify the malicious behaviors conducted by malware from the sandbox logs; the detection rates are up to 96.74%, 100%, and 74.87% on the sandbox logs collected in 2013, 2014, and 2015, respectively. We also show that it can detect malicious behaviors from unknown types of sandbox logs.
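
    To make the idea of API behavior mapping combined with sentence templates concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify AMAR-Generator's data structures, so the mapping table `API_BEHAVIOR_MAP`, the `TEMPLATES` dictionary, the behavior labels, and the `(api_name, target)` log format are all assumptions made for illustration. The API names themselves are ordinary Windows API functions.

```python
# Hypothetical mapping from API call names to behavior labels (an assumption,
# not taken from the paper).
API_BEHAVIOR_MAP = {
    "CreateFileW": "file_create",
    "DeleteFileW": "file_delete",
    "RegSetValueExW": "registry_set",
    "InternetOpenUrlW": "network_access",
}

# Hypothetical sentence templates, one per behavior label.
TEMPLATES = {
    "file_create": "Creates the file {target}.",
    "file_delete": "Deletes the file {target}.",
    "registry_set": "Modifies the registry value {target}.",
    "network_access": "Connects to {target}.",
}


def generate_report(log_entries):
    """Turn (api_name, target) pairs into de-duplicated report sentences.

    Sentences are returned in order of first appearance in the log.
    """
    sentences = {}  # dict preserves insertion order, so it doubles as an ordered set
    for api_name, target in log_entries:
        behavior = API_BEHAVIOR_MAP.get(api_name)
        if behavior is None:
            continue  # API call not covered by the mapping; skip it
        sentences[TEMPLATES[behavior].format(target=target)] = None
    return list(sentences)


if __name__ == "__main__":
    # Made-up log entries, purely for demonstration.
    sample_log = [
        ("CreateFileW", "a.exe"),
        ("CreateFileW", "a.exe"),  # repeated call collapses into one sentence
        ("Sleep", "1000"),         # unmapped API, ignored
        ("InternetOpenUrlW", "http://example.com/"),
    ]
    for line in generate_report(sample_log):
        print(line)
```

    Collapsing repeated calls into a single sentence reflects the abstract's goal of concise reports drawn from very large logs. A real system would presumably derive its templates and behavior entries from vendor reports instead of hard-coding them as done here.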

    Original language: English
    Pages (from-to): 2622-2632
    Number of pages: 11
    Journal: IEICE Transactions on Information and Systems
    Volume: E101D
    Issue number: 11
    DOI: 10.1587/transinf.2017ICP0011
    Publication status: Published - 2018 Nov 1

    Fingerprint

    Application programming interfaces (API)
    Malware
    Template matching
    Websites
    Costs

    Keywords

    • Automated report generating
    • Malware analysis
    • Natural language processing
    • Sandbox logs

    ASJC Scopus subject areas

    • Software
    • Hardware and Architecture
    • Computer Vision and Pattern Recognition
    • Electrical and Electronic Engineering
    • Artificial Intelligence

    Cite this

    Automatically generating malware analysis reports using sandbox logs. / Sun, Bo; Fujino, Akinori; Mori, Tatsuya; Ban, Tao; Takahashi, Takeshi; Inoue, Daisuke.

    In: IEICE Transactions on Information and Systems, Vol. E101D, No. 11, 01.11.2018, p. 2622-2632.

    Sun, Bo ; Fujino, Akinori ; Mori, Tatsuya ; Ban, Tao ; Takahashi, Takeshi ; Inoue, Daisuke. / Automatically generating malware analysis reports using sandbox logs. In: IEICE Transactions on Information and Systems. 2018 ; Vol. E101D, No. 11. pp. 2622-2632.
    @article{73a62cfb986345c69e06317cc2b11334,
    title = "Automatically generating malware analysis reports using sandbox logs",
    keywords = "Automated report generating, Malware analysis, Natural language processing, Sandbox logs",
    author = "Bo Sun and Akinori Fujino and Tatsuya Mori and Tao Ban and Takeshi Takahashi and Daisuke Inoue",
    year = "2018",
    month = "11",
    day = "1",
    doi = "10.1587/transinf.2017ICP0011",
    language = "English",
    volume = "E101D",
    pages = "2622--2632",
    journal = "IEICE Transactions on Information and Systems",
    issn = "0916-8532",
    publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
    number = "11",

    }

    TY - JOUR

    T1 - Automatically generating malware analysis reports using sandbox logs

    AU - Sun, Bo

    AU - Fujino, Akinori

    AU - Mori, Tatsuya

    AU - Ban, Tao

    AU - Takahashi, Takeshi

    AU - Inoue, Daisuke

    PY - 2018/11/1

    Y1 - 2018/11/1

    KW - Automated report generating

    KW - Malware analysis

    KW - Natural language processing

    KW - Sandbox logs

    UR - http://www.scopus.com/inward/record.url?scp=85056095905&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85056095905&partnerID=8YFLogxK

    U2 - 10.1587/transinf.2017ICP0011

    DO - 10.1587/transinf.2017ICP0011

    M3 - Article

    AN - SCOPUS:85056095905

    VL - E101D

    SP - 2622

    EP - 2632

    JO - IEICE Transactions on Information and Systems

    JF - IEICE Transactions on Information and Systems

    SN - 0916-8532

    IS - 11

    ER -