### Abstract

The growth of information technology has put a huge amount of text data on the Internet. This study focuses on distance metric learning, a machine learning approach that estimates the metric matrix of the squared Mahalanobis distance from training data under an appropriate constraint. Mochihashi et al. proposed a method that derives the optimal metric matrix analytically. However, the vector space for document data is normally very high-dimensional and sparse, so applying this method to document data directly may cause over-fitting, because the number of estimated parameters is proportional to the square of the input dimension. To avoid over-fitting, this study introduces a regularization term. The aim is to formulate a regularized estimation of the metric matrix in which the optimal metric matrix can still be derived analytically. To verify the effectiveness of the proposed method, document classification experiments are conducted on Japanese newspaper articles.
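As a rough illustration of the quantities the abstract refers to, the sketch below computes a squared Mahalanobis distance for a given metric matrix and counts the entries of a full metric matrix. It is not the authors' estimation procedure: the function names `mahalanobis_sq` and `num_metric_parameters`, the example vectors, and the diagonal matrix are all assumptions chosen for illustration, and the regularized estimator itself is not reproduced here.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a metric matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    return sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))


def num_metric_parameters(dim):
    """Number of entries in a full dim x dim metric matrix."""
    return dim * dim


# Hypothetical three-term document vectors (e.g. term counts).
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0, 0.0]

# With the identity matrix, the distance is the squared Euclidean distance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
d_identity = mahalanobis_sq(x, y, identity)

# A hypothetical diagonal metric that up-weights the third term.
weighted = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
d_weighted = mahalanobis_sq(x, y, weighted)

# Entries in a full metric matrix for an assumed vocabulary of 10,000 terms.
params_10k = num_metric_parameters(10_000)
```

Because the number of entries grows with the square of the dimension, a vocabulary of 10,000 terms already implies a metric matrix with 100 million entries, which is the over-fitting risk that motivates the regularization term.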

Original language | English |
---|---|
Pages (from-to) | 190-203 |
Number of pages | 14 |
Journal | Journal of Japan Industrial Management Association |
Volume | 66 |
Issue number | 2E |
Publication status | Published - 2015 |

### Keywords

- Distance metric learning
- Document classification
- Regularization
- Vector space model

### ASJC Scopus subject areas

- Industrial and Manufacturing Engineering
- Applied Mathematics
- Management Science and Operations Research
- Strategy and Management

### Cite this

Mikawa, K., & Goto, M. (2015). Regularized distance metric learning for document classification and its application. *Journal of Japan Industrial Management Association*, *66*(2E), 190-203.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Regularized distance metric learning for document classification and its application

AU - Mikawa, Kenta

AU - Goto, Masayuki

PY - 2015

Y1 - 2015

KW - Distance metric learning

KW - Document classification

KW - Regularization

KW - Vector space model

UR - http://www.scopus.com/inward/record.url?scp=84940978828&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84940978828&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84940978828

VL - 66

SP - 190

EP - 203

JO - Journal of Japan Industrial Management Association

JF - Journal of Japan Industrial Management Association

SN - 0386-4812

IS - 2E

ER -