Learning misclassification costs for imbalanced datasets, application in gene expression data classification

Huijuan Lu, Yige Xu, Minchao Ye, Ke Yan, Qun Jin, Zhigang Gao

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    Cost-sensitive algorithms have been widely used to solve imbalanced classification problems. However, the misclassification costs are usually determined empirically, leading to uncertain performance. Hence, an effective method is desired to automatically calculate the optimal cost weights. Targeting the highest weighted classification accuracy (WCA), we propose two approaches to search for the optimal cost weights: grid searching and function fitting. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Comprehensive experimental results show that function fitting is the more efficient approach and finds the optimal cost weights with acceptable WCA.
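
The grid-searching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this record does not give the paper's WCA formula or its extreme learning machine variant, so the code assumes WCA is a weighted mean of per-class recall and uses a generic cost-weighted ridge ELM. All function names, the synthetic data, and the parameters (`n_hidden`, `reg`, `w_pos`, the grid range) are hypothetical.

```python
import numpy as np

def train_weighted_elm(X, y, cost_pos, n_hidden=40, reg=1.0, seed=0):
    """Cost-weighted ELM: fixed random hidden layer, weighted ridge output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    t = np.where(y == 1, 1.0, -1.0)              # targets in {-1, +1}
    c = np.where(y == 1, cost_pos, 1.0)          # per-sample misclassification cost
    # Weighted ridge solution: beta = (H^T C H + reg * I)^-1 H^T C t
    A = H.T @ (c[:, None] * H) + reg * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ (c * t))
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta > 0).astype(int)

def weighted_accuracy(y_true, y_pred, w_pos=0.5):
    """Assumed WCA: weighted mean of minority (1) and majority (0) recall."""
    rec_pos = np.mean(y_pred[y_true == 1] == 1)
    rec_neg = np.mean(y_pred[y_true == 0] == 0)
    return w_pos * rec_pos + (1.0 - w_pos) * rec_neg

def grid_search_cost(X_tr, y_tr, X_val, y_val, grid):
    """Return (best_cost, best_wca) over candidate minority-class costs."""
    scores = []
    for cost in grid:
        model = train_weighted_elm(X_tr, y_tr, cost)
        scores.append(weighted_accuracy(y_val, predict_elm(model, X_val)))
    best = int(np.argmax(scores))
    return grid[best], scores[best]

def make_imbalanced(n_neg, n_pos, n_features=20, seed=0):
    """Synthetic stand-in for imbalanced expression data (two shifted Gaussians)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, (n_neg, n_features)),
                   rng.normal(0.8, 1.0, (n_pos, n_features))])
    y = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
    return X, y

X_tr, y_tr = make_imbalanced(180, 20, seed=1)
X_val, y_val = make_imbalanced(90, 10, seed=2)
grid = np.linspace(1.0, 20.0, 20)  # majority-class cost is fixed at 1.0
best_cost, best_wca = grid_search_cost(X_tr, y_tr, X_val, y_val, grid)
print(best_cost, best_wca)
```

Each grid point requires retraining the classifier, which is the cost that a function-fitting approach would aim to reduce; the functional form the authors fit is not stated in this record, so it is not sketched here.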

    Original language: English
    Title of host publication: Intelligent Computing Theories and Application - 14th International Conference, ICIC 2018, Proceedings
    Editors: Prashan Premaratne, Phalguni Gupta, De-Shuang Huang, Vitoantonio Bevilacqua
    Publisher: Springer-Verlag
    Pages: 513-519
    Number of pages: 7
    ISBN (Print): 9783319959290
    DOI: https://doi.org/10.1007/978-3-319-95930-6_47
    Publication status: Published - 2018 Jan 1
    Event: 14th International Conference on Intelligent Computing, ICIC 2018 - Wuhan, China
    Duration: 2018 Aug 15 to 2018 Aug 18

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10954 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 14th International Conference on Intelligent Computing, ICIC 2018
    Country: China
    City: Wuhan
    Period: 18/8/15 to 18/8/18

    Keywords

    • Correct classification rate
    • Cost-sensitive
    • Misclassification cost
    • Parameter fitting

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Lu, H., Xu, Y., Ye, M., Yan, K., Jin, Q., & Gao, Z. (2018). Learning misclassification costs for imbalanced datasets, application in gene expression data classification. In P. Premaratne, P. Gupta, D-S. Huang, & V. Bevilacqua (Eds.), Intelligent Computing Theories and Application - 14th International Conference, ICIC 2018, Proceedings (pp. 513-519). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10954 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-319-95930-6_47

    @inproceedings{9c40c8da143847f2bdcb1639ab07a251,
    title = "Learning misclassification costs for imbalanced datasets, application in gene expression data classification",
    keywords = "Correct classification rate, Cost-sensitive, Misclassification cost, Parameter fitting",
    author = "Huijuan Lu and Yige Xu and Minchao Ye and Ke Yan and Qun Jin and Zhigang Gao",
    year = "2018",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-319-95930-6_47",
    language = "English",
    isbn = "9783319959290",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "513--519",
    editor = "Prashan Premaratne and Phalguni Gupta and De-Shuang Huang and Vitoantonio Bevilacqua",
    booktitle = "Intelligent Computing Theories and Application - 14th International Conference, ICIC 2018, Proceedings",

    }

    TY - GEN
    T1 - Learning misclassification costs for imbalanced datasets, application in gene expression data classification
    AU - Lu, Huijuan
    AU - Xu, Yige
    AU - Ye, Minchao
    AU - Yan, Ke
    AU - Jin, Qun
    AU - Gao, Zhigang
    PY - 2018/1/1
    Y1 - 2018/1/1
    KW - Correct classification rate
    KW - Cost-sensitive
    KW - Misclassification cost
    KW - Parameter fitting
    UR - http://www.scopus.com/inward/record.url?scp=85051872559&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=85051872559&partnerID=8YFLogxK
    U2 - 10.1007/978-3-319-95930-6_47
    DO - 10.1007/978-3-319-95930-6_47
    M3 - Conference contribution
    AN - SCOPUS:85051872559
    SN - 9783319959290
    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    SP - 513
    EP - 519
    BT - Intelligent Computing Theories and Application - 14th International Conference, ICIC 2018, Proceedings
    A2 - Premaratne, Prashan
    A2 - Gupta, Phalguni
    A2 - Huang, De-Shuang
    A2 - Bevilacqua, Vitoantonio
    PB - Springer-Verlag
    ER -