Exploration into gray area: Efficient labeling for malicious domain name detection

Naoki Fukushi, Daiki Chiba, Mitsuaki Akiyama, Masato Uchida

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

This paper presents a method for reducing the labeling cost of acquiring training data for a system that detects malicious domain names by supervised machine learning. The conventional system requires large quantities of both benign and malicious domain names as training data to obtain a classifier with high classification accuracy. In general, malicious domain names are observed less frequently than benign ones, so it is difficult to acquire a large number of them without a dedicated labeling method. We propose a method based on active learning that labels data around the decision boundary of classification, i.e., in the gray area, and we show that the classification accuracy can be improved while using only approximately 2.5% of the training data used by the conventional system. An additional disadvantage of the conventional system is that, if the classifier is trained with a small amount of training data, its generalization ability cannot be guaranteed. To address this, we propose a method based on ensemble learning that integrates multiple classifiers, and we show that it stabilizes and improves the classification accuracy.
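
The general idea of labeling in the gray area can be illustrated with a minimal sketch. The abstract does not specify the paper's query strategy or how its classifiers are integrated, so the code below is an assumption: it uses plain uncertainty sampling (pick the unlabeled items whose estimated malicious probability is closest to 0.5) and a simple average over classifiers. The names `Classifier`, `ensemble_probability`, and `select_gray_area` are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a classifier maps a feature vector to the
# estimated probability that the domain name is malicious.
Classifier = Callable[[Sequence[float]], float]


def ensemble_probability(classifiers: Sequence[Classifier],
                         x: Sequence[float]) -> float:
    """Average the malicious-probability estimates of several classifiers.

    Simple averaging is an assumed integration rule, not the paper's.
    """
    return sum(clf(x) for clf in classifiers) / len(classifiers)


def select_gray_area(classifiers: Sequence[Classifier],
                     pool: Sequence[Sequence[float]],
                     budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled items nearest the boundary.

    Distance to the decision boundary is approximated by |p - 0.5|,
    where p is the ensemble's estimated malicious probability.
    """
    scored = [(abs(ensemble_probability(classifiers, x) - 0.5), i)
              for i, x in enumerate(pool)]
    scored.sort()  # smallest distance first, i.e. most uncertain first
    return [i for _, i in scored[:budget]]
```

In an active learning loop, the returned indices would be sent to a human labeler, the newly labeled items added to the training set, and the classifiers retrained before the next round of selection.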

Original language: English
Title of host publication: Proceedings - 2019 IEEE 43rd Annual Computer Software and Applications Conference, COMPSAC 2019
Editors: Vladimir Getov, Jean-Luc Gaudiot, Nariyoshi Yamai, Stelvio Cimato, Morris Chang, Yuuichi Teranishi, Ji-Jiang Yang, Hong Va Leong, Hossian Shahriar, Michiharu Takemoto, Dave Towey, Hiroki Takakura, Atilla Elci, Susumu Takeuchi, Satish Puri
Publisher: IEEE Computer Society
Pages: 770-775
Number of pages: 6
ISBN (Electronic): 9781728126074
Publication status: Published - 2019 Jul
Event: 43rd IEEE Annual Computer Software and Applications Conference, COMPSAC 2019 - Milwaukee, United States
Duration: 2019 Jul 15 to 2019 Jul 19

Publication series

Name: Proceedings - International Computer Software and Applications Conference
Volume: 1
ISSN (Print): 0730-3157

Conference

Conference: 43rd IEEE Annual Computer Software and Applications Conference, COMPSAC 2019
Country/Territory: United States
City: Milwaukee
Period: 19/7/15 to 19/7/19

Keywords

  • Active learning
  • Data labeling
  • Ensemble learning
  • Malicious domain name

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
