A proposal of extended cosine measure for distance metric learning in text classification

Kenta Mikawa, Takashi Ishida, Masayuki Goto

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    11 Citations (Scopus)

    Abstract

    This paper discusses a new similarity measure between documents in the vector space model from the viewpoint of distance metric learning. Documents are represented as points in a vector space using the frequencies of the words that appear in each document. A similarity measure between two documents helps to identify how they are related and can be applied to classification or clustering. The cosine similarity and the Euclidean distance are commonly used to measure the similarity between points in Euclidean space. However, when the vector space model is applied to document analysis, these measures do not take into account the correlation among the words that appear in documents. In general, many words in documents are correlated with one another, depending on sentence structure, topic, and subject. It is therefore effective to construct a metric on the vector space that takes word correlation into account in order to improve the performance of document classification and clustering. This paper presents a new method for obtaining a distance measure on the document vector space based on an extended cosine measure. In addition, a distance metric learning procedure is proposed to obtain an appropriate metric in a supervised learning setting. The effectiveness of the proposal is demonstrated by simulation experiments on text classification problems using customer reviews posted on a website and newspaper articles.
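
    The abstract does not give the formula for the extended cosine measure. As a rough illustration only, the sketch below assumes one common generalization, in which the inner product is taken under a symmetric positive definite matrix M, so that sim(x, y) = xᵀMy / (√(xᵀMx) · √(yᵀMy)). The function names, the toy vocabulary, and the hand-set matrix are hypothetical; in the paper the metric is learned from labeled data, and that learning step is not shown here.

```python
import math


def bilinear(x, M, y):
    """Return x^T M y for vectors x, y (lists) and a square matrix M (list of rows)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def extended_cosine(x, y, M):
    """Cosine similarity under metric matrix M.

    Assumption: M is symmetric positive definite, so both norms are positive.
    With M equal to the identity this reduces to the ordinary cosine similarity.
    """
    denom = math.sqrt(bilinear(x, M, x)) * math.sqrt(bilinear(y, M, y))
    return bilinear(x, M, y) / denom


# Hypothetical toy vocabulary: ["cheap", "inexpensive", "broken"]
doc_a = [1.0, 0.0, 0.0]  # contains only "cheap"
doc_b = [0.0, 1.0, 0.0]  # contains only "inexpensive"

identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hand-set (not learned) matrix that treats the first two words as correlated.
correlated = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

plain = extended_cosine(doc_a, doc_b, identity)
extended = extended_cosine(doc_a, doc_b, correlated)
print(plain, extended)
```

    Under the identity matrix the two documents share no words and are judged completely dissimilar, while the off-diagonal entry lets the correlated matrix assign them a nonzero similarity. This is the kind of effect the abstract attributes to taking word correlation into account.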

    Original language: English
    Title of host publication: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
    Pages: 1741-1746
    Number of pages: 6
    DOI: https://doi.org/10.1109/ICSMC.2011.6083923
    Publication status: Published - 2011
    Event: 2011 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2011 - Anchorage, AK
    Duration: 2011 Oct 9 to 2011 Oct 12

    Fingerprint

    Vector spaces
    Supervised learning
    Websites
    Experiments

    Keywords

    • extended cosine measure
    • metric learning
    • similarity measure
    • text mining
    • vector space model

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Control and Systems Engineering
    • Human-Computer Interaction

    Cite this

    Mikawa, K., Ishida, T., & Goto, M. (2011). A proposal of extended cosine measure for distance metric learning in text classification. In Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics (pp. 1741-1746). [6083923] https://doi.org/10.1109/ICSMC.2011.6083923

    @inproceedings{803c978ddfda4bb3ac216d7c116db4d0,
    title = "A proposal of extended cosine measure for distance metric learning in text classification",
    keywords = "extended cosine measure, metric learning, similarity measure, text mining, vector space model",
    author = "Kenta Mikawa and Takashi Ishida and Masayuki Goto",
    year = "2011",
    doi = "10.1109/ICSMC.2011.6083923",
    language = "English",
    isbn = "9781457706523",
    pages = "1741--1746",
    booktitle = "Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics",

    }

    TY - GEN

    T1 - A proposal of extended cosine measure for distance metric learning in text classification

    AU - Mikawa, Kenta

    AU - Ishida, Takashi

    AU - Goto, Masayuki

    PY - 2011

    Y1 - 2011

    KW - extended cosine measure

    KW - metric learning

    KW - similarity measure

    KW - text mining

    KW - vector space model

    UR - http://www.scopus.com/inward/record.url?scp=83755186800&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=83755186800&partnerID=8YFLogxK

    U2 - 10.1109/ICSMC.2011.6083923

    DO - 10.1109/ICSMC.2011.6083923

    M3 - Conference contribution

    AN - SCOPUS:83755186800

    SN - 9781457706523

    SP - 1741

    EP - 1746

    BT - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics

    ER -