Missing data is a serious problem in both research and engineering and cannot be overlooked. Datasets with missing values are difficult to model with conventional global prediction models. In this paper, we propose a hybrid model consisting of an autoencoder and a gated linear network for solving regression problems in the presence of missing values. A sophisticated modeling and identification algorithm is developed. First, an extended affinity propagation (AP) clustering algorithm is applied to obtain a self-organized competitive net that divides the dataset into several clusters. Second, a multiple imputation tool based on top-p% winner-take-all denoising autoencoders (DAEs) is introduced to better predict the missing values; rough estimates of the missing values, obtained by mean imputation and a similarity method within the clusters, serve as teacher signals for the DAEs. Finally, a gated linear network is designed to construct a piecewise linear regression model with interpolation, identified in exactly the same way as a support vector regression with a quasilinear kernel composed from the cluster information obtained in the AP clustering step. In experiments on five datasets, the proposed method demonstrates its effectiveness and robustness compared with traditional kernels and state-of-the-art methods, even on datasets with a large percentage of missing values.
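The rough-estimate step mentioned in the abstract (mean imputation within clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `clusterwise_mean_impute`, the toy data, and the fallback to a global mean are assumptions made for illustration, and the cluster labels are taken as given rather than computed by the extended AP clustering.

```python
import math
from collections import defaultdict


def clusterwise_mean_impute(rows, labels):
    """Fill NaN entries with the mean of the same feature within the same cluster.

    rows   : list of equal-length lists of floats; NaN marks a missing value
    labels : one cluster index per row (assumed given by a prior clustering step)
    """
    n_features = len(rows[0])
    # Per-cluster and global sums/counts, accumulated over observed values only.
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(lambda: [0] * n_features)
    global_sum = [0.0] * n_features
    global_cnt = [0] * n_features
    for row, lab in zip(rows, labels):
        for j, v in enumerate(row):
            if not math.isnan(v):
                sums[lab][j] += v
                counts[lab][j] += 1
                global_sum[j] += v
                global_cnt[j] += 1

    filled = []
    for row, lab in zip(rows, labels):
        new_row = []
        for j, v in enumerate(row):
            if not math.isnan(v):
                new_row.append(v)  # observed value is kept unchanged
            elif counts[lab][j] > 0:
                new_row.append(sums[lab][j] / counts[lab][j])  # cluster mean
            elif global_cnt[j] > 0:
                # Assumption: fall back to the global mean if the cluster
                # has no observed value for this feature.
                new_row.append(global_sum[j] / global_cnt[j])
            else:
                new_row.append(v)  # feature never observed; stays NaN
        filled.append(new_row)
    return filled


# Toy data (hypothetical): two clusters, one missing value in each.
nan = math.nan
rows = [[1.0, 2.0], [3.0, nan], [10.0, 20.0], [nan, 40.0]]
labels = [0, 0, 1, 1]
filled = clusterwise_mean_impute(rows, labels)
```

In the pipeline the abstract describes, estimates of this kind would serve only as teacher signals for the denoising autoencoders, not as the final imputed values.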
|Journal||IEEJ Transactions on Electrical and Electronic Engineering|
|Publication status||Accepted/In press - 2020|
ASJC Scopus subject areas
- Electrical and Electronic Engineering