Deep learning with data augmentation to add data around classification boundaries

Hideki Fujinami*, Gendo Kumoi, Masayuki Goto

*Corresponding author for this work

Research output: Article (peer-reviewed)

Abstract

Data augmentation methods are used in image classification to improve generalization by increasing the amount of training data. However, most of these methods are not data-driven algorithms, so the degree to which they improve generalization differs between the domains of the training images. Generative models have recently been studied as a means of augmenting data. In particular, Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), which can generate clean images, have attracted attention as an excellent innovation in machine learning. One extension of GANs, Conditional GANs (CGANs) (Mirza and Osindero, 2014), can be used for data augmentation. When the training data available for each class are insufficient for the classification model, they are also insufficient for training a CGAN. In such a case, the CGAN generates noisy images, which causes the classification model to underfit the original training data. Moreover, when a CGAN approximates the training data distribution, it generates new training data in the same regions where the training data are already dense. In such a case, the augmented data cannot reduce overfitting to the original training data. Our research therefore aims to augment data in a way that meets two requirements: the generated images should not be noisy, and they should not simply fall in regions where training data are already dense. In this study, we propose a method that generates data with class-specific GANs trained on small training data and uses the entropy of the classification model to selectively add to the training data set those generated data that improve classification accuracy. A key feature of the proposed method is that it focuses on the positional relationship between the data and the classification hyperplane in deep learning. The entropy of the classification model is used to measure the positional relationship between the classification boundary and the data. As a result, generalization performance is improved by adding data around the classification boundary as new training data.
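The entropy-based selection described in the abstract can be illustrated with a small sketch. The abstract does not give the selection rule, so everything below is an assumption made for illustration: the function names, the `top_fraction` parameter, and the rule of keeping the highest-entropy fraction of generated samples are hypothetical, and the toy probabilities stand in for the softmax outputs of a trained classifier on GAN-generated images. The sketch only shows the general idea that high predictive entropy can serve as a proxy for closeness to the classification boundary.

```python
import numpy as np


def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # eps guards against log(0) for classes with zero probability.
    return -np.sum(probs * np.log(probs + eps), axis=1)


def select_near_boundary(probs, top_fraction=0.5):
    """Return indices of the highest-entropy samples, most uncertain first.

    top_fraction is a hypothetical knob: the abstract does not state how
    many generated samples are kept or whether a threshold is used.
    """
    entropy = predictive_entropy(probs)
    n_keep = max(1, int(round(top_fraction * len(entropy))))
    order = np.argsort(entropy)[::-1]  # descending entropy
    return order[:n_keep], entropy


# Toy classifier outputs for four generated images over three classes.
generated_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction: far from the boundary
    [0.40, 0.35, 0.25],  # uncertain prediction: near the boundary
    [0.05, 0.90, 0.05],  # fairly confident prediction
    [0.34, 0.33, 0.33],  # almost uniform: closest to the boundary
])

keep, entropy = select_near_boundary(generated_probs, top_fraction=0.5)
print("selected indices:", keep)
print("entropies:", np.round(entropy, 3))
```

In this toy setting the two most uncertain samples (indices 3 and 1) are the ones that would be added to the training set; a fixed entropy threshold would be an equally plausible rule.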

Original language: English
Pages (from-to): 384-397
Number of pages: 14
Journal: Industrial Engineering and Management Systems
Volume: 20
Issue number: 3
DOI
Publication status: Published - Sep 2021

ASJC Scopus subject areas

  • Social Sciences (all)
  • Economics, Econometrics and Finance (all)
