TY - GEN

T1 - Quasi-Linear SVM with Local Offsets for High-dimensional Imbalanced Data Classification

AU - Yanze, Li

AU - Ogai, Harutoshi

N1 - Publisher Copyright:
© 2020 The Society of Instrument and Control Engineers - SICE.

PY - 2020/9/23

Y1 - 2020/9/23

N2 - Class imbalance often arises in classification problems. A special case is within-class imbalance, which worsens the imbalanced distribution problem and increases the complexity of the concept to be learned. Most methods for imbalanced data classification focus on finding a global boundary to address between-class imbalance. We propose an effective quasi-linear network with local offset adjustment for imbalanced classification problems. First, we propose a gated piecewise linear network: an autoencoder-based partitioning method is modified for imbalanced datasets to divide the input space into multiple linearly separable partitions along the potential separation boundary. A quasi-linear SVM is then constructed based on the gate signals obtained from the autoencoder partitioning information. Next, a neural network is trained with the F-score as its loss function to generate local offsets for each local cluster. Finally, a quasi-linear SVM classifier with local offsets is constructed for the imbalanced datasets. The proposed method avoids calculating Euclidean distances, so it can be applied to high-dimensional datasets. Simulation results on several real-world datasets show that our method is effective for imbalanced data classification, especially on high-dimensional data.

AB - Class imbalance often arises in classification problems. A special case is within-class imbalance, which worsens the imbalanced distribution problem and increases the complexity of the concept to be learned. Most methods for imbalanced data classification focus on finding a global boundary to address between-class imbalance. We propose an effective quasi-linear network with local offset adjustment for imbalanced classification problems. First, we propose a gated piecewise linear network: an autoencoder-based partitioning method is modified for imbalanced datasets to divide the input space into multiple linearly separable partitions along the potential separation boundary. A quasi-linear SVM is then constructed based on the gate signals obtained from the autoencoder partitioning information. Next, a neural network is trained with the F-score as its loss function to generate local offsets for each local cluster. Finally, a quasi-linear SVM classifier with local offsets is constructed for the imbalanced datasets. The proposed method avoids calculating Euclidean distances, so it can be applied to high-dimensional datasets. Simulation results on several real-world datasets show that our method is effective for imbalanced data classification, especially on high-dimensional data.

KW - F-measure

KW - imbalanced data classification

KW - kernel composition

KW - support vector machine

KW - within-class imbalances

UR - http://www.scopus.com/inward/record.url?scp=85096359749&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85096359749&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85096359749

T3 - 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan, SICE 2020

SP - 882

EP - 887

BT - 2020 59th Annual Conference of the Society of Instrument and Control Engineers of Japan, SICE 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 59th Annual Conference of the Society of Instrument and Control Engineers of Japan, SICE 2020

Y2 - 23 September 2020 through 26 September 2020

ER -