Because of advances in information technology, a large amount of document data has accumulated in various databases, and automatic multi-valued classification has become highly relevant. This paper focuses on a multi-valued classification technique that is based on Error-Correcting Output Codes (ECOC) and combines several binary classifiers. When predicting the category of a new document, the outputs of the binary classifiers are combined to produce a predicted category. It is known that when the two category sets of a binary classifier have imbalanced amounts of training data, the prediction accuracy of that classifier is low. To address this problem, a previous study proposed using Reed-Muller (RM) codes within an ECOC approach to resolve the imbalance in the sizes of the training data sets. However, RM codes can equalize the amount of training data between the two category sets only for specific numbers of categories. We aim to provide a method that can be used for multi-valued classification with an arbitrary number of categories. In this paper, we propose a new configuration method for combining binary classifiers in which some categories are not used by a given binary classifier. This method reduces the amount of training data for each binary classifier while improving the balance of training data between its two category sets. As a result, the computational complexity can be reduced. We verify the effectiveness of the proposed method through a document classification experiment.
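To make the ECOC mechanics concrete, the following is a minimal Python sketch of a generic ternary code table, assuming rows are categories, columns are binary classifiers, and entries are +1 (positive set), -1 (negative set), or 0 (category not used by that classifier). The table, the per-category document counts, and the names `training_sizes`, `distance`, and `decode` are hypothetical illustrations, not the construction proposed in this paper. Decoding uses a common generalized Hamming rule in which a 0 entry contributes 1/2 to the distance.

```python
from typing import List, Sequence, Tuple

# Hypothetical ternary code table (illustrative only, not the paper's design).
# Rows = categories, columns = binary classifiers.
# +1 -> positive set, -1 -> negative set, 0 -> category unused by that classifier.
CODE_TABLE: List[List[int]] = [
    [+1, +1, 0],
    [+1, -1, +1],
    [-1, 0, -1],
    [-1, -1, 0],
]


def training_sizes(table: Sequence[Sequence[int]],
                   counts: Sequence[int]) -> List[Tuple[int, int]]:
    """For each binary classifier, return (#positive docs, #negative docs).

    Categories marked 0 in a column contribute no training data to that
    classifier, which is how a ternary table can shrink and rebalance it.
    """
    sizes = []
    for j in range(len(table[0])):
        pos = sum(counts[k] for k, row in enumerate(table) if row[j] == +1)
        neg = sum(counts[k] for k, row in enumerate(table) if row[j] == -1)
        sizes.append((pos, neg))
    return sizes


def distance(codeword: Sequence[int], outputs: Sequence[int]) -> float:
    """Generalized Hamming distance: agree -> 0, disagree -> 1, unused (0) -> 0.5.

    `outputs` holds the +1/-1 predictions of the binary classifiers.
    """
    return sum((1 - m * o) / 2 for m, o in zip(codeword, outputs))


def decode(table: Sequence[Sequence[int]], outputs: Sequence[int]) -> int:
    """Return the index of the category whose codeword is closest to `outputs`."""
    return min(range(len(table)), key=lambda k: distance(table[k], outputs))


if __name__ == "__main__":
    # Hypothetical document counts per category.
    print(training_sizes(CODE_TABLE, [100, 100, 100, 100]))
    print(decode(CODE_TABLE, [+1, -1, +1]))
```

With 100 documents per category, the three classifiers train on (200, 200), (100, 200), and (100, 100) documents respectively, so the unused categories visibly shrink each training set. Classifier outputs `[+1, -1, +1]` match the second codeword exactly and decode to category index 1.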
- Error-correcting output codes
- Multi-valued classification
- Ternary code table
- Text data
ASJC Scopus subject areas
- Social Sciences(all)
- Economics, Econometrics and Finance(all)