An error probability estimation of the document classification using Markov model

Manabu Kobayashi, Hiroshi Ninomiya, Toshiyasu Matsushima, Shigeichi Hirasawa

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    The document classification problem has been investigated with various techniques, such as the vector space model, support vector machines, and random forests. Alternatively, J. Ziv et al. have proposed a document classification method that uses the Ziv-Lempel algorithm to compress the data. Furthermore, the Context-Tree Weighting (CTW) algorithm has been proposed as an outstanding data compression algorithm, and experimental results have been reported for document classification using the CTW algorithm. In this paper, we assume that all documents in the same category are generated by a Markov model with the same parameters. We then propose an analysis method to estimate the classification error probability for documents of finite length.
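    The setting described in the abstract (one Markov source per category, documents of finite length) can be illustrated with a small simulation. This is not the authors' analysis method, which the abstract does not spell out; it is a minimal sketch assuming two hypothetical categories ("sports", "politics") whose documents come from known first-order Markov sources over a hypothetical two-letter alphabet. A document is assigned to the category with the highest likelihood, and the error probability for length-n documents is estimated by Monte Carlo sampling. All names and parameters below are illustrative assumptions.

```python
import math
import random

# Hypothetical alphabet and hypothetical first-order Markov sources, one per
# category: SOURCES[category][context][symbol] = P(symbol | context).
ALPHABET = "ab"
ORDER = 1
SOURCES = {
    "sports": {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.4, "b": 0.6}},
    "politics": {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.3, "b": 0.7}},
}


def sample(source, order, length, rng):
    """Draw a document of `length` symbols; the first `order` symbols are uniform (assumption)."""
    doc = "".join(rng.choice(ALPHABET) for _ in range(order))
    while len(doc) < length:
        dist = source[doc[len(doc) - order:]]  # distribution given the last `order` symbols
        symbols = list(dist)
        doc += rng.choices(symbols, weights=[dist[s] for s in symbols])[0]
    return doc


def log_likelihood(doc, source, order):
    """Log-probability of `doc` under `source`, conditioned on its first `order` symbols."""
    return sum(math.log(source[doc[i - order:i]][doc[i]]) for i in range(order, len(doc)))


def classify(doc, sources, order):
    """Maximum-likelihood category (equal priors assumed); ties go to the first category."""
    return max(sources, key=lambda c: log_likelihood(doc, sources[c], order))


def estimate_error(sources, order, length, trials, seed=0):
    """Monte Carlo estimate of the error probability for documents of `length` symbols."""
    rng = random.Random(seed)
    labels = list(sources)
    errors = 0
    for _ in range(trials):
        true_label = rng.choice(labels)  # equal category priors (assumption)
        doc = sample(sources[true_label], order, length, rng)
        errors += classify(doc, sources, order) != true_label
    return errors / trials


if __name__ == "__main__":
    for n in (2, 10, 50, 300):
        print(n, estimate_error(SOURCES, ORDER, n, trials=1000))
```

    With these hypothetical parameters the exact error probability at n = 2 is 0.4, so the first printed estimate should be close to that, and the estimates should shrink toward zero as n grows. The sketch treats the source parameters as known; in practice they would have to be estimated from training documents (for example with additive smoothing, or with a CTW-style mixture over context trees), which it does not attempt.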

    Original language: English
    Title of host publication: 2012 International Symposium on Information Theory and Its Applications, ISITA 2012
    Pages: 717-721
    Number of pages: 5
    Publication status: Published - 2012
    Event: 2012 International Symposium on Information Theory and Its Applications, ISITA 2012 - Honolulu, HI
    Duration: 2012 Oct 28 to 2012 Oct 31

    Fingerprint

    Data compression
    Vector spaces
    Support vector machines
    Error probability

    ASJC Scopus subject areas

    • Computer Science Applications
    • Information Systems

    Cite this

    Kobayashi, M., Ninomiya, H., Matsushima, T., & Hirasawa, S. (2012). An error probability estimation of the document classification using Markov model. In 2012 International Symposium on Information Theory and Its Applications, ISITA 2012 (pp. 717-721). [6401034]

    @inproceedings{e5ef243588f24174a324b0afb99fc0ed,
    title = "An error probability estimation of the document classification using Markov model",
    author = "Manabu Kobayashi and Hiroshi Ninomiya and Toshiyasu Matsushima and Shigeichi Hirasawa",
    year = "2012",
    language = "English",
    isbn = "9784885522673",
    pages = "717--721",
    booktitle = "2012 International Symposium on Information Theory and Its Applications, ISITA 2012",

    }

    TY - GEN

    T1 - An error probability estimation of the document classification using Markov model

    AU - Kobayashi, Manabu

    AU - Ninomiya, Hiroshi

    AU - Matsushima, Toshiyasu

    AU - Hirasawa, Shigeichi

    PY - 2012

    Y1 - 2012

    UR - http://www.scopus.com/inward/record.url?scp=84873556641&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84873556641&partnerID=8YFLogxK

    M3 - Conference contribution

    SN - 9784885522673

    SP - 717

    EP - 721

    BT - 2012 International Symposium on Information Theory and Its Applications, ISITA 2012

    ER -