Regularized distance metric learning for document classification and its application

Kenta Mikawa, Masayuki Goto

Research output: Article, peer-reviewed

5 Citations (Scopus)

Abstract

With the development of information technologies, a huge amount of text data is now posted on the Internet. This study focuses on distance metric learning, a machine learning approach that estimates the metric matrix of the squared Mahalanobis distance from training data under an appropriate constraint. Mochihashi et al. proposed a method that derives the optimal metric matrix analytically. However, the vector space for document data is normally very high-dimensional and sparse. Therefore, when this method is applied directly to document data, over-fitting may occur because the number of estimated parameters is proportional to the square of the input dimension. To avoid over-fitting, this study introduces a regularization term. The purpose of the study is to formulate a regularized estimation of the metric matrix for which the optimal metric matrix can still be derived analytically. To verify the effectiveness of the proposed method, a document classification experiment using Japanese newspaper article data is conducted.
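The abstract does not state the objective function, so the following is only a minimal sketch of how a regularized metric matrix can have a closed form, under assumptions that are not taken from the paper. It assumes the metric M minimizes tr(MA) + λ tr(M) subject to det(M) = 1, where A is the within-class scatter matrix and λ is a hypothetical ridge parameter. Under that assumption the minimizer is M = det(A + λI)^(1/d) (A + λI)^(-1). All function names and the toy data are illustrative.

```python
import math

def scatter(points_by_class):
    """Within-class scatter A = sum over classes of (x - mean)(x - mean)^T."""
    dim = len(next(iter(points_by_class.values()))[0])
    A = [[0.0] * dim for _ in range(dim)]
    for points in points_by_class.values():
        mean = [sum(col) / len(points) for col in zip(*points)]
        for x in points:
            diff = [xi - mi for xi, mi in zip(x, mean)]
            for i in range(dim):
                for j in range(dim):
                    A[i][j] += diff[i] * diff[j]
    return A

def inverse_and_logdet(S):
    """Gauss-Jordan inverse and log-determinant of a positive definite matrix."""
    n = len(S)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(S)]
    logdet = 0.0
    for col in range(n):
        pivot = aug[col][col]
        if pivot <= 1e-12:
            raise ValueError("matrix is singular or not positive definite")
        logdet += math.log(pivot)
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], logdet

def fit_metric(points_by_class, ridge):
    """Assumed closed form: M = det(A + ridge*I)^(1/d) * (A + ridge*I)^(-1)."""
    A = scatter(points_by_class)
    n = len(A)
    S = [[A[i][j] + (ridge if i == j else 0.0) for j in range(n)] for i in range(n)]
    S_inv, logdet = inverse_and_logdet(S)
    scale = math.exp(logdet / n)  # enforces det(M) = 1
    return [[scale * v for v in row] for row in S_inv]

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))

# Toy term-count vectors (hypothetical data): 3 terms, 2 classes, 2 documents each.
docs = {
    "sports": [[2.0, 0.0, 1.0], [3.0, 0.0, 1.0]],
    "politics": [[0.0, 2.0, 1.0], [0.0, 3.0, 1.0]],
}

M = fit_metric(docs, ridge=0.1)
print(mahalanobis_sq(docs["sports"][0], docs["politics"][0], M))
```

In this toy data the third term never varies within a class, so the unregularized scatter matrix is singular and `fit_metric(docs, ridge=0.0)` raises `ValueError`; a positive ridge makes the inverse well defined. The full matrix has d × d entries, which is the quadratic parameter growth the abstract identifies as the source of over-fitting.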

Original language: English
Pages (from-to): 190-203
Number of pages: 14
Journal: Journal of Japan Industrial Management Association
Volume: 66
Issue number: 2E
Publication status: Published - 2015

ASJC Scopus subject areas

  • Strategy and Management
  • Management Science and Operations Research
  • Industrial and Manufacturing Engineering
  • Applied Mathematics
