TY - JOUR

T1 - A study of learning a sparse metric matrix using l1 regularization based on supervised learning

AU - Mikawa, Kenta

AU - Kobayashi, Manabu

AU - Goto, Masayuki

PY - 2015

Y1 - 2015

N2 - In this paper, we focus on classification problems based on the vector space model. Among the methods for such problems, distance metric learning, which estimates a metric matrix suited to classification through an iterative optimization procedure, is known to be effective. However, distance metric learning on high-dimensional data tends to overfit the training dataset and to require long computation times. In addition, the number of parameters to be estimated is proportional to the square of the input data dimension. Therefore, as the input dimension grows, the amount of training data needed to estimate a sufficiently accurate metric matrix becomes enormous. These problems are especially pronounced when analyzing document data or the purchase history data stored on EC sites, both of which have a high-dimensional and sparse structure. To avoid these problems, we propose an l1-regularized distance metric learning method that uses the alternating direction method of multipliers (ADMM) algorithm. The effectiveness of the proposed method is demonstrated through classification experiments on newspaper articles, which have a high-dimensional and sparse structure, and on datasets from the UCI Machine Learning Repository, which have a low-dimensional and dense structure.

AB - In this paper, we focus on classification problems based on the vector space model. As one of the methods, distance metric learning which estimates an appropriate metric matrix for classification by using the iterative optimization procedure is known as an effective method. However, the distance metric learning for high dimensional data tends to cause the problems of overfitting to a training dataset and longer computational time. In addition, the number of parameters that need to be estimated is in proportion to the square of the input data dimension. Therefore, if the dimension of input data becomes high, the number of training data to acquire a metric matrix with enough accuracy becomes enormous. Especially, these problems are caused when analyzing the document data and purchase history data stored in the EC site with high dimensional and sparse structure. To avoid these problems, we propose the method of l1 regularized distance metric learning by introducing the alternating direction method of multiplier (ADMM) algorithm. The effectiveness of our proposed method is clarified by classification experiments using a newspaper article that has a highly dimensional and sparse structure and the UCI machine learning repository, which has a low and dense structure.

KW - ADMM

KW - Distance metric learning

KW - Document classification

KW - L1 regularization

KW - Vector space model

UR - http://www.scopus.com/inward/record.url?scp=84946057889&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946057889&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84946057889

VL - 66

SP - 230

EP - 239

JO - Journal of Japan Industrial Management Association

JF - Journal of Japan Industrial Management Association

SN - 0386-4812

IS - 3

ER -