Automatic labeling of the elements of a vulnerability report CVE with NLP

Kensuke Sumoto, Kenta Kanakogi, Hironori Washizaki, Naohiko Tsuda, Nobukazu Yoshioka, Yoshiaki Fukazawa, Hideyuki Kanuka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities in software products and source code. If the individual elements of CVE descriptions can be extracted and structured, the data can be used to search and analyze those descriptions. Herein we propose a method that labels each element in a CVE description by applying Named Entity Recognition (NER). For NER, we use BERT, a transformer-based natural language processing model. Because the NER model is learned from data, it can label information in CVE descriptions even when the text contains some distortions. In an experiment using manually prepared labels for 1000 CVE descriptions, the proposed method achieves a precision of about 0.81 and a recall of about 0.89. In addition, we devise a way to train the model by dividing the data by label. The proposed method can be used to label each element of a CVE description automatically.
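To make the reported precision and recall concrete, the following is a minimal sketch of how NER output on a CVE description could be scored. It is not the authors' implementation. The label names (WEAKNESS, PRODUCT, VERSION, ATTACKER, IMPACT), the BIO tagging scheme, the example sentence, and the exact-match span scoring are all assumptions made for illustration; the abstract does not state the label set or the scoring rule.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, label)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float]:
    """Exact-match scoring: a predicted span counts only if the
    boundaries and the label both agree with a gold span."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical CVE-style description, tokenized by whitespace.
tokens = ["Buffer", "overflow", "in", "ExampleApp", "1.2", "allows",
          "remote", "attackers", "to", "execute", "arbitrary", "code"]

# Hypothetical manual labels (gold) and model output (pred).
gold_tags = ["B-WEAKNESS", "I-WEAKNESS", "O", "B-PRODUCT", "B-VERSION", "O",
             "B-ATTACKER", "I-ATTACKER", "O", "B-IMPACT", "I-IMPACT", "I-IMPACT"]
pred_tags = ["B-WEAKNESS", "I-WEAKNESS", "O", "B-PRODUCT", "I-PRODUCT", "O",
             "B-ATTACKER", "I-ATTACKER", "O", "B-IMPACT", "I-IMPACT", "I-IMPACT"]

gold_spans = extract_spans(gold_tags)
pred_spans = extract_spans(pred_tags)
precision, recall = precision_recall(gold_spans, pred_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model merges the product name and the version into one PRODUCT span, so three of its four predicted spans match the five gold spans, giving a precision of 0.75 and a recall of 0.60. In a real pipeline, the predicted tags would come from a fine-tuned token-classification model rather than a hand-written list.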

Original language: English
Title of host publication: Proceedings - 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science, IRI 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-165
Number of pages: 2
ISBN (Electronic): 9781665466035
Publication status: Published - 2022
Event: 23rd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2022 - Virtual, Online, United States
Duration: 2022 Aug 9 to 2022 Aug 11

Conference

Conference: 23rd IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2022
Country/Territory: United States
City: Virtual, Online
Period: 22/8/9 to 22/8/11

Keywords

  • BERT
  • CVE
  • Technological
  • named entity recognition
  • natural language processing
  • security knowledge repository

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Decision Sciences (miscellaneous)
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
