A new segmented oversampling method for imbalanced data classification using quasi-linear SVM

Bo Zhou, Weite Li, Takayuki Furuzuki

Research output: Article

1 citation (Scopus)

Abstract

Data imbalance occurs in most real-world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many existing oversampling methods risk generating wrong and unnecessary synthetic samples, which makes the learning task difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom-up, minimal-spanning-tree-based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi-linear support vector machine is finally used to realize the classification by taking advantage of the local linear partitions. Simulation results on different real-world datasets show that the proposed segmented oversampling method is effective for imbalanced data classification.
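The middle step described above, oversampling only within a local partition, can be illustrated with a minimal sketch. It assumes the partition ids are already available (a stand-in for the paper's minimal-spanning-tree-based clustering, which is not reproduced here) and uses simple linear interpolation between two minority samples as the oversampler, which is an assumption and not necessarily the authors' choice. The function name `segmented_oversample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def segmented_oversample(minority, partition_ids, n_new, seed=0):
    """Create n_new synthetic minority points, each interpolated between
    two minority points that share the same partition id."""
    rng = random.Random(seed)

    # Group minority samples by their (assumed, precomputed) local partition.
    groups = defaultdict(list)
    for point, pid in zip(minority, partition_ids):
        groups[pid].append(point)

    # Interpolation needs at least two minority points in a partition.
    usable = [pts for pts in groups.values() if len(pts) >= 2]
    if not usable:
        raise ValueError("need a partition with at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        pts = rng.choice(usable)      # pick one partition
        a, b = rng.sample(pts, 2)     # two distinct samples inside it
        t = rng.random()              # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy data: two well-separated minority groups (hypothetical values).
minority = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
partition_ids = [0, 0, 1, 1]
new_points = segmented_oversample(minority, partition_ids, n_new=6)
```

Because both endpoints always come from the same partition, no synthetic point falls in the gap between the two groups, which is the kind of wrong sample that unsegmented interpolation could produce.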

Original language: English
Pages (from-to): 891-898
Number of pages: 8
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 12
Issue number: 6
DOI: 10.1002/tee.22480
Publication status: Published - 1 Nov 2017

Fingerprint

Support vector machines
Classifiers

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

@article{abf1ab33db5d408f86384d3b03483f6b,
title = "A new segmented oversampling method for imbalanced data classification using quasi-linear SVM",
abstract = "Data imbalance occurs on most real-world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom-up, minimal-spanning-tree-based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi-linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real-world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications.",
keywords = "imbalanced classification, kernel composition, local linear partition, oversampling method, support vector machine",
author = "Bo Zhou and Weite Li and Takayuki Furuzuki",
year = "2017",
month = "11",
day = "1",
doi = "10.1002/tee.22480",
language = "English",
volume = "12",
pages = "891--898",
journal = "IEEJ Transactions on Electrical and Electronic Engineering",
issn = "1931-4973",
publisher = "John Wiley and Sons Inc.",
number = "6",
}

TY - JOUR

T1 - A new segmented oversampling method for imbalanced data classification using quasi-linear SVM

AU - Zhou, Bo

AU - Li, Weite

AU - Furuzuki, Takayuki

PY - 2017/11/1

Y1 - 2017/11/1

N2 - Data imbalance occurs on most real-world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom-up, minimal-spanning-tree-based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi-linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real-world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications.

AB - Data imbalance occurs on most real-world classification problems and decreases the performance of classifiers. An oversampling method addresses the imbalance problem by generating synthetic samples to balance the data distribution. However, many of the existing oversampling methods have potential problems in generating wrong and unnecessary synthetic samples, which makes the learning tasks difficult. This paper proposes a new segmented oversampling method for imbalanced data classification. The input space is first partitioned into several linearly separable local partitions along the potential separation boundary by introducing a bottom-up, minimal-spanning-tree-based clustering method; an oversampling method is then applied within each local linear partition to prevent the generation of wrong and unnecessary synthetic samples; a quasi-linear support vector machine is finally used to realize the classification by taking advantages of the local linear partitions. Simulation results on different real-world datasets show that the proposed segmented oversampling method is effective for imbalanced data classifications.

KW - imbalanced classification

KW - kernel composition

KW - local linear partition

KW - oversampling method

KW - support vector machine

UR - http://www.scopus.com/inward/record.url?scp=85022322535&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85022322535&partnerID=8YFLogxK

U2 - 10.1002/tee.22480

DO - 10.1002/tee.22480

M3 - Article

AN - SCOPUS:85022322535

VL - 12

SP - 891

EP - 898

JO - IEEJ Transactions on Electrical and Electronic Engineering

JF - IEEJ Transactions on Electrical and Electronic Engineering

SN - 1931-4973

IS - 6

ER -