TY - JOUR
T1 - A Hybrid Model for Nonlinear Regression with Missing Data Using Quasilinear Kernel
AU - Zhu, Huilin
AU - Tian, Yanling
AU - Ren, Yanni
AU - Hu, Jinglu
N1 - Publisher Copyright: © 2020 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
PY - 2020
Y1 - 2020
AB - In both research and engineering, missing data is a serious problem that cannot be overlooked, and datasets with missing values are therefore difficult to model with conventional global prediction models. In this paper, we propose a hybrid model consisting of an autoencoder and a gated linear network for solving regression problems in the presence of missing values, together with a modeling and identification algorithm developed in three steps. First, an extended affinity propagation (AP) clustering algorithm is applied to obtain a self-organized competitive net that partitions the dataset into several clusters. Second, a multiple imputation tool based on top-p% winner-take-all denoising autoencoders (DAEs) is introduced to predict missing values more accurately; rough estimates of the missing values, obtained by mean imputation and a similarity method within each cluster, serve as teacher signals for the DAEs. Finally, a gated linear network is designed to construct a piecewise linear regression model with interpolation, which is identified in exactly the same way as a support vector regression (SVR) with a quasilinear kernel composed from the cluster information obtained in the AP clustering step. Experiments on five datasets demonstrate the effectiveness and robustness of the proposed method in comparison with traditional kernels and state-of-the-art methods, even on datasets with a large percentage of missing values.
KW - affinity propagation algorithm
KW - denoising autoencoder
KW - missing data
KW - nonlinear regression
KW - quasilinear kernel
KW - support vector regression
UR - http://www.scopus.com/inward/record.url?scp=85091768953&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091768953&partnerID=8YFLogxK
U2 - 10.1002/tee.23253
DO - 10.1002/tee.23253
M3 - Article
AN - SCOPUS:85091768953
SN - 1931-4973
JO - IEEJ Transactions on Electrical and Electronic Engineering
JF - IEEJ Transactions on Electrical and Electronic Engineering
ER -