Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction

Benhui Chen, Takayuki Furuzuki

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. Gene function prediction is a complicated HMC problem with a large number of classes and usually strongly imbalanced class distributions. This paper proposes an improved HMC method based on over-sampling and hierarchy constraint for solving the gene function prediction problem. The HMC task is transformed into a set of binary support vector machine (SVM) classification tasks. Then, two measures are implemented to enhance the HMC performance by introducing the hierarchy constraint into the learning procedures. First, for imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step to improve SVM learning performance. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the results of the binary probabilistic SVM classifiers. This approach can improve the classification results while guaranteeing that the predictions satisfy the hierarchy constraint of classes. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms the basic TPR method and the Flat ensemble method.
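The abstract gives no implementation details, so the following is only a minimal Python sketch of the two generic building blocks it names, under stated assumptions. `smote` is plain SMOTE (interpolating between a minority sample and one of its k nearest minority neighbours), not the paper's hierarchical variant, whose rules the abstract does not specify. `tpr_combine` is a TPR-style combination of per-class probabilities, such as those produced by one binary probabilistic SVM per class: a bottom-up pass averages each class's score with its children that are predicted positive, and a top-down pass (an assumed rule) caps each child's score at its parent's so that a positive child always implies a positive parent. The function names, the threshold `t = 0.5`, and the class IDs are hypothetical and are not the authors' code.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Vector]:
    """Plain SMOTE (not the hierarchical variant): each synthetic point lies on
    the segment between a random minority sample and one of its k nearest
    minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def tpr_combine(scores: Dict[str, float], parent: Dict[str, Optional[str]],
                t: float = 0.5) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """TPR-style combination of per-class probabilities on a tree hierarchy.

    scores: raw probability from each class's binary classifier.
    parent: parent of each class (None for top-level classes).
    """
    children: Dict[str, List[str]] = {c: [] for c in scores}
    for c, p in parent.items():
        if p is not None:
            children[p].append(c)

    def depth(c: str) -> int:
        d = 0
        while parent[c] is not None:
            c = parent[c]
            d += 1
        return d

    order = sorted(scores, key=depth)  # top-level classes first
    combined = dict(scores)

    # Bottom-up: average a class's own score with its positive children
    for c in reversed(order):
        pos = [combined[ch] for ch in children[c] if combined[ch] > t]
        combined[c] = (scores[c] + sum(pos)) / (1 + len(pos))

    # Top-down (assumed rule): cap each child at its parent's score, so a
    # positive child always implies a positive parent (true path rule)
    for c in order:
        p = parent[c]
        if p is not None:
            combined[c] = min(combined[c], combined[p])

    labels = {c: combined[c] > t for c in combined}
    return combined, labels


if __name__ == "__main__":
    # Hypothetical FunCat-style class IDs and made-up SVM probabilities
    parent = {"01": None, "01.01": "01", "01.02": "01"}
    raw = {"01": 0.4, "01.01": 0.9, "01.02": 0.2}
    combined, labels = tpr_combine(raw, parent)
    print(labels)  # {'01': True, '01.01': True, '01.02': False}
```

With these made-up scores, flat thresholding at 0.5 would accept class 01.01 but reject its parent 01, violating the hierarchy. The bottom-up pass lifts 01 to (0.4 + 0.9) / 2 = 0.65 and the top-down pass caps 01.01 at 0.65, so the predicted path 01, 01.01 is consistent.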

Original language: English
Pages (from-to): 183-189
Number of pages: 7
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 7
Issue number: 2
DOI: 10.1002/tee.21714
Publication status: Published - 2012 Mar

Keywords

  • Consistency ensemble
  • Hierarchical multi-label classification
  • Hierarchical SMOTE
  • Imbalanced dataset learning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

@article{810d208541774c15a1f2a2f796f7ce85,
title = "Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction",
abstract = "Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. Gene function prediction is a complicated HMC problem with a large number of classes and usually strongly imbalanced class distributions. This paper proposes an improved HMC method based on over-sampling and hierarchy constraint for solving the gene function prediction problem. The HMC task is transformed into a set of binary support vector machine (SVM) classification tasks. Then, two measures are implemented to enhance the HMC performance by introducing the hierarchy constraint into the learning procedures. First, for imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step to improve SVM learning performance. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the results of the binary probabilistic SVM classifiers. This approach can improve the classification results while guaranteeing that the predictions satisfy the hierarchy constraint of classes. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms the basic TPR method and the Flat ensemble method.",
keywords = "Consistency ensemble, Hierarchical multi-label classification, Hierarchical SMOTE, Imbalanced dataset learning",
author = "Benhui Chen and Takayuki Furuzuki",
year = "2012",
month = mar,
doi = "10.1002/tee.21714",
language = "English",
volume = "7",
pages = "183--189",
journal = "IEEJ Transactions on Electrical and Electronic Engineering",
issn = "1931-4973",
publisher = "John Wiley and Sons Inc.",
number = "2",

}

TY - JOUR

T1 - Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction

AU - Chen, Benhui

AU - Furuzuki, Takayuki

PY - 2012/3

Y1 - 2012/3

N2 - Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. Gene function prediction is a complicated HMC problem with a large number of classes and usually strongly imbalanced class distributions. This paper proposes an improved HMC method based on over-sampling and hierarchy constraint for solving the gene function prediction problem. The HMC task is transformed into a set of binary support vector machine (SVM) classification tasks. Then, two measures are implemented to enhance the HMC performance by introducing the hierarchy constraint into the learning procedures. First, for imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step to improve SVM learning performance. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the results of the binary probabilistic SVM classifiers. This approach can improve the classification results while guaranteeing that the predictions satisfy the hierarchy constraint of classes. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms the basic TPR method and the Flat ensemble method.

AB - Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. Gene function prediction is a complicated HMC problem with a large number of classes and usually strongly imbalanced class distributions. This paper proposes an improved HMC method based on over-sampling and hierarchy constraint for solving the gene function prediction problem. The HMC task is transformed into a set of binary support vector machine (SVM) classification tasks. Then, two measures are implemented to enhance the HMC performance by introducing the hierarchy constraint into the learning procedures. First, for imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step to improve SVM learning performance. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the results of the binary probabilistic SVM classifiers. This approach can improve the classification results while guaranteeing that the predictions satisfy the hierarchy constraint of classes. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms the basic TPR method and the Flat ensemble method.

KW - Consistency ensemble

KW - Hierarchical multi-label classification

KW - Hierarchical SMOTE

KW - Imbalanced dataset learning

UR - http://www.scopus.com/inward/record.url?scp=84863079466&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863079466&partnerID=8YFLogxK

U2 - 10.1002/tee.21714

DO - 10.1002/tee.21714

M3 - Article

AN - SCOPUS:84863079466

VL - 7

SP - 183

EP - 189

JO - IEEJ Transactions on Electrical and Electronic Engineering

JF - IEEJ Transactions on Electrical and Electronic Engineering

SN - 1931-4973

IS - 2

ER -