Multi-valued classification of text data based on an ECOC approach using a ternary orthogonal table

Leona Suzuki, Kenta Mikawa, Masayuki Goto

    Research output: Contribution to journal, Article

    1 Citation (Scopus)

    Abstract

    Because of advances in information technology, a large amount of document data has been accumulated in various databases, and automatic multi-valued classification has become highly relevant. This paper focuses on a multi-valued classification technique that is based on Error-Correcting Output Codes (ECOC) and combines several binary classifiers. When predicting the category of a new document, the outputs of the binary classifiers are combined to produce a predicted value. It is a known problem that if the two category sets have imbalanced amounts of training data, the prediction accuracy of a binary classifier is low. To solve this problem, a previous study proposed employing Reed-Muller (RM) codes in the context of an ECOC approach to resolve the imbalance in the cardinality of the training data sets. However, RM codes can equalize the amount of training data between the two category sets only for specific numbers of categories. We want to provide a method that can be employed for multi-valued classification with an arbitrary number of categories. In this paper, we propose a new method for configuring the combination of binary classifiers in which some categories are not used for classification. This method allows us to reduce the amount of training data for each binary classifier while improving the balance of the training data between the two category sets for each binary classifier. As a result, the computational complexity can be decreased. We verify the effectiveness of our proposed method by conducting a document classification experiment.
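
The general idea of ECOC with a ternary code table can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 5-category `CODE_TABLE`, the function names, and the mismatch-count decoding rule are all assumptions chosen for illustration, and the table is not derived from an orthogonal table. Each row defines one binary classifier; `+1` and `-1` assign a category to one of the two category sets, and `0` means the category is not used by that classifier.

```python
# Hypothetical ternary code table (illustration only, not taken from the paper).
# Rows = binary classifiers, columns = categories.
# +1 / -1: category belongs to the first / second category set; 0: category unused.
CODE_TABLE = [
    [+1, +1, -1, -1,  0],
    [ 0, +1, +1, -1, -1],
    [-1,  0, +1, +1, -1],
    [-1, -1,  0, +1, +1],
    [+1, -1, -1,  0, +1],
]


def training_categories(classifier_index):
    """Return the two category sets used to train one binary classifier."""
    row = CODE_TABLE[classifier_index]
    positive = [c for c, v in enumerate(row) if v == +1]
    negative = [c for c, v in enumerate(row) if v == -1]
    return positive, negative


def decode(outputs):
    """Pick the category whose codeword disagrees least with the outputs.

    `outputs` holds one value in {+1, -1} per binary classifier.
    Positions where a category's code entry is 0 are ignored, because
    that classifier was never trained on that category.
    """
    n_categories = len(CODE_TABLE[0])
    best_category, best_distance = None, None
    for c in range(n_categories):
        distance = sum(
            1
            for row, y in zip(CODE_TABLE, outputs)
            if row[c] != 0 and row[c] != y
        )
        if best_distance is None or distance < best_distance:
            best_category, best_distance = c, distance
    return best_category


if __name__ == "__main__":
    # Classifier 0 is trained on categories {0, 1} versus {2, 3}; category 4 is left out.
    print(training_categories(0))
    # Outputs that agree with the codeword of category 2 on every used position.
    print(decode([-1, +1, +1, +1, -1]))
```

In this toy table every classifier is trained on two categories against two others, so the two category sets are balanced in the number of categories and each classifier sees only four of the five categories' training data.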

    Original language: English
    Pages (from-to): 155-164
    Number of pages: 10
    Journal: Industrial Engineering and Management Systems
    Volume: 16
    Issue number: 2
    DOIs: 10.7232/iems.2017.16.2.155
    Publication status: Published - 2017 Jun 1

    Fingerprint

    Classifier
    information technology
    experiment
    Computational complexity
    Prediction accuracy
    Imbalance
    Data base
    Document classification

    Keywords

    • Error-correcting output codes
    • Multi-valued classification
    • Ternary code table
    • Text data

    ASJC Scopus subject areas

    • Social Sciences(all)
    • Economics, Econometrics and Finance(all)

    Cite this

    Multi-valued classification of text data based on an ECOC approach using a ternary orthogonal table. / Suzuki, Leona; Mikawa, Kenta; Goto, Masayuki.

    In: Industrial Engineering and Management Systems, Vol. 16, No. 2, 01.06.2017, p. 155-164.

    @article{661949b8c3c34117ad0317283c754825,
    title = "Multi-valued classification of text data based on an ECOC approach using a ternary orthogonal table",
    abstract = "Because of advances in information technology, a large amount of document data has been accumulated in various databases, and automatic multi-valued classification has become highly relevant. This paper focuses on a multi-valued classification technique that is based on Error-Correcting Output Codes (ECOC) and combines several binary classifiers. When predicting the category of a new document, the outputs of the binary classifiers are combined to produce a predicted value. It is a known problem that if the two category sets have imbalanced amounts of training data, the prediction accuracy of a binary classifier is low. To solve this problem, a previous study proposed employing Reed-Muller (RM) codes in the context of an ECOC approach to resolve the imbalance in the cardinality of the training data sets. However, RM codes can equalize the amount of training data between the two category sets only for specific numbers of categories. We want to provide a method that can be employed for multi-valued classification with an arbitrary number of categories. In this paper, we propose a new method for configuring the combination of binary classifiers in which some categories are not used for classification. This method allows us to reduce the amount of training data for each binary classifier while improving the balance of the training data between the two category sets for each binary classifier. As a result, the computational complexity can be decreased. We verify the effectiveness of our proposed method by conducting a document classification experiment.",
    keywords = "Error-correcting output codes, Multi-valued classification, Ternary code table, Text data",
    author = "Leona Suzuki and Kenta Mikawa and Masayuki Goto",
    year = "2017",
    month = "6",
    day = "1",
    doi = "10.7232/iems.2017.16.2.155",
    language = "English",
    volume = "16",
    pages = "155--164",
    journal = "Industrial Engineering and Management Systems",
    issn = "1598-7248",
    publisher = "Korean Institute of Industrial Engineers",
    number = "2",

    }

    TY - JOUR

    T1 - Multi-valued classification of text data based on an ECOC approach using a ternary orthogonal table

    AU - Suzuki, Leona

    AU - Mikawa, Kenta

    AU - Goto, Masayuki

    PY - 2017/6/1

    Y1 - 2017/6/1

    AB - Because of advances in information technology, a large amount of document data has been accumulated in various databases, and automatic multi-valued classification has become highly relevant. This paper focuses on a multi-valued classification technique that is based on Error-Correcting Output Codes (ECOC) and combines several binary classifiers. When predicting the category of a new document, the outputs of the binary classifiers are combined to produce a predicted value. It is a known problem that if the two category sets have imbalanced amounts of training data, the prediction accuracy of a binary classifier is low. To solve this problem, a previous study proposed employing Reed-Muller (RM) codes in the context of an ECOC approach to resolve the imbalance in the cardinality of the training data sets. However, RM codes can equalize the amount of training data between the two category sets only for specific numbers of categories. We want to provide a method that can be employed for multi-valued classification with an arbitrary number of categories. In this paper, we propose a new method for configuring the combination of binary classifiers in which some categories are not used for classification. This method allows us to reduce the amount of training data for each binary classifier while improving the balance of the training data between the two category sets for each binary classifier. As a result, the computational complexity can be decreased. We verify the effectiveness of our proposed method by conducting a document classification experiment.

    KW - Error-correcting output codes

    KW - Multi-valued classification

    KW - Ternary code table

    KW - Text data

    UR - http://www.scopus.com/inward/record.url?scp=85030749912&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85030749912&partnerID=8YFLogxK

    U2 - 10.7232/iems.2017.16.2.155

    DO - 10.7232/iems.2017.16.2.155

    M3 - Article

    AN - SCOPUS:85030749912

    VL - 16

    SP - 155

    EP - 164

    JO - Industrial Engineering and Management Systems

    JF - Industrial Engineering and Management Systems

    SN - 1598-7248

    IS - 2

    ER -