Oversampling the minority class in a multi-linear feature space for imbalanced data classification

Peifeng Liang, Weite Li, Takayuki Furuzuki

Research output: Contribution to journal › Article

Abstract

This paper proposes a novel oversampling method for imbalanced data classification, in which minority class samples are synthesized in a feature space so that the generated minority samples do not fall into majority class regions. For this purpose, it introduces a multi-linear feature space (MLFS) based on a quasi-linear kernel composed from a pretrained neural network (NN). By using the quasi-linear kernel, the proposed MLFS oversampling method avoids directly computing the Euclidean distances among samples when oversampling the minority class, as well as explicitly mapping the samples to the high-dimensional feature space, which makes it easy to apply to the classification of high-dimensional datasets. Moreover, because kernel learning is used instead of representation learning with the NN, unsupervised learning, or even transfer learning, can easily be employed to pretrain the NN, since a kernel is usually less dependent on a specific problem; this makes it possible to avoid considering the imbalance problem at the pretraining stage. Finally, a method is developed to generate the synthetic minority samples by computing the quasi-linear kernel matrix rather than the very high-dimensional MLFS feature vectors directly. The proposed MLFS oversampling method is applied to several real-world datasets, including an image dataset, and simulation results confirm its effectiveness.
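
The abstract describes oversampling carried out entirely through kernel values rather than explicit feature vectors. The short Python sketch below is not from the paper; it only illustrates that general principle under the assumption that a synthetic minority point is a convex combination of two minority points in the kernel-induced feature space, so its kernel values against every training sample follow by linearity from a precomputed kernel matrix. The kernel itself (the paper's quasi-linear kernel composed from a pretrained NN) is treated as given, and the function and variable names are illustrative.

import numpy as np

def oversample_in_kernel_space(K, minority_idx, n_synthetic, rng=None):
    """Illustrative sketch: SMOTE-style interpolation carried out in a
    kernel-induced feature space, represented purely by kernel values.

    K            : (n, n) precomputed kernel matrix over all training samples
                   (any positive semidefinite kernel stands in for the
                   quasi-linear kernel here).
    minority_idx : indices of the minority-class samples in K.
    Returns      : (n_synthetic, n) matrix whose row s holds k(z_s, x_j) for
                   every training sample x_j, where the synthetic point is
                   phi(z_s) = (1 - d) * phi(x_i) + d * phi(x_p).
    """
    rng = np.random.default_rng() if rng is None else rng
    K_syn = np.zeros((n_synthetic, K.shape[0]))
    for s in range(n_synthetic):
        # Pick two distinct minority samples and an interpolation weight.
        i, p = rng.choice(minority_idx, size=2, replace=False)
        d = rng.uniform(0.0, 1.0)
        # Linearity of the inner product in feature space:
        # k(z_s, x_j) = (1 - d) * k(x_i, x_j) + d * k(x_p, x_j)
        K_syn[s] = (1.0 - d) * K[i] + d * K[p]
    return K_syn

Rows of K_syn could then be appended to a precomputed kernel matrix passed to an SVM trained with a precomputed kernel; kernel values between two synthetic points, if needed, expand by the same bilinearity into four entries of the original matrix, so the high-dimensional feature vectors never have to be formed.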

Original language: English
Journal: IEEJ Transactions on Electrical and Electronic Engineering
DOI: 10.1002/tee.22715
Publication status: Accepted/In press - 2018 Jan 1


Keywords

  • Imbalanced data classification
  • Kernel composition
  • Multi-linear feature space
  • Oversampling
  • Support vector machine

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

@article{929bf47b1348404e9fb074e94fd2002c,
title = "Oversampling the minority class in a multi-linear feature space for imbalanced data classification",
keywords = "Imbalanced data classification, Kernel composition, Multi-linear feature space, Oversampling, Support vector machine",
author = "Peifeng Liang and Weite Li and Takayuki Furuzuki",
year = "2018",
month = "1",
day = "1",
doi = "10.1002/tee.22715",
language = "English",
journal = "IEEJ Transactions on Electrical and Electronic Engineering",
issn = "1931-4973",
publisher = "John Wiley and Sons Inc.",

}
