Adaptive prediction method based on alternating decision forests with considerations for generalization ability

Shotaro Misawa, Kenta Mikawa, Masayuki Goto

    Research output: Contribution to journal, Article

    Abstract

    Many machine learning algorithms have been proposed and applied to a wide range of prediction problems in industrial management. As data volumes grow, there is an increasing need for machine learning algorithms with low computational cost and for efficient ensemble methods. The Alternating Decision Forest (ADF) is an efficient ensemble method known for its high performance and low computational cost. An ADF assigns each training sample a weight that reflects how accurately that sample is predicted, and it randomly selects the attribute variables considered at each node. In this way it builds an ensemble that predicts the training data accurately while each decision tree retains different characteristics. However, outliers can cause overfitting, and because the candidate branch conditions vary from node to node in an ADF, prediction accuracy may deteriorate when the fit to the training data is highly constrained. To improve prediction accuracy, we focus on the prediction results for new data. Specifically, we introduce bootstrap sampling so that the algorithm generates an out-of-bag (OOB) dataset for each tree during the training phase. We then construct an effective ensemble of decision trees that improves generalization ability by taking into account the prediction accuracy on the OOB data. To verify the effectiveness of the proposed method, we conduct simulation experiments using datasets from the UCI Machine Learning Repository. The proposed method provides robust and accurate predictions for datasets with many attribute variables.
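
    The bootstrap and OOB step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how OOB accuracy enters the ensemble, so the weighted vote below is only one plausible, assumed aggregation rule. The function names (`bootstrap_oob_split`, `oob_accuracy`, `weighted_vote`) are hypothetical, and each "tree" is represented by any callable that maps a feature row to a label.

```python
import random
from collections import defaultdict


def bootstrap_oob_split(n_samples, rng):
    """Draw n_samples row indices with replacement.

    Returns (in_bag, oob): in_bag may contain repeats; oob lists the
    rows that were never drawn and so are unseen by this tree.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def oob_accuracy(predict, X, y, oob):
    """Fraction of a tree's own OOB rows that it labels correctly."""
    if not oob:
        return 0.0
    hits = sum(1 for i in oob if predict(X[i]) == y[i])
    return hits / len(oob)


def weighted_vote(predictors, weights, x):
    """Assumed aggregation rule: each tree votes with its OOB accuracy."""
    scores = defaultdict(float)
    for predict, w in zip(predictors, weights):
        scores[predict(x)] += w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    in_bag, oob = bootstrap_oob_split(1000, rng)
    # Share of rows left out of this bootstrap sample.
    print(len(oob) / 1000)
```

    For a training set of n rows, a given row is left out of one bootstrap sample with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows, so each tree has a sizeable held-out set on which its accuracy can be estimated without a separate validation split.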

    Original language: English
    Pages (from-to): 384-391
    Number of pages: 8
    Journal: Industrial Engineering and Management Systems
    Volume: 16
    Issue number: 3
    DOI: 10.7232/iems.2017.16.3.384
    Publication status: Published - 2017 Sep 1

    Keywords

    • Alternating Decision Forests
    • Big Data
    • Data Mining
    • Prediction Model
    • Random Forests

    ASJC Scopus subject areas

    • Social Sciences(all)
    • Economics, Econometrics and Finance(all)

    Cite this

    Adaptive prediction method based on alternating decision forests with considerations for generalization ability. / Misawa, Shotaro; Mikawa, Kenta; Goto, Masayuki.

    In: Industrial Engineering and Management Systems, Vol. 16, No. 3, 01.09.2017, p. 384-391.

    @article{bc7435c9bc1b45229cd028e06e92db47,
    title = "Adaptive prediction method based on alternating decision forests with considerations for generalization ability",
    keywords = "Alternating Decision Forests, Big Data, Data Mining, Prediction Model, Random Forests",
    author = "Shotaro Misawa and Kenta Mikawa and Masayuki Goto",
    year = "2017",
    month = "9",
    day = "1",
    doi = "10.7232/iems.2017.16.3.384",
    language = "English",
    volume = "16",
    pages = "384--391",
    journal = "Industrial Engineering and Management Systems",
    issn = "1598-7248",
    publisher = "Korean Institute of Industrial Engineers",
    number = "3",

    }

    TY - JOUR

    T1 - Adaptive prediction method based on alternating decision forests with considerations for generalization ability

    AU - Misawa, Shotaro

    AU - Mikawa, Kenta

    AU - Goto, Masayuki

    PY - 2017/9/1

    Y1 - 2017/9/1

    KW - Alternating Decision Forests

    KW - Big Data

    KW - Data Mining

    KW - Prediction Model

    KW - Random Forests

    UR - http://www.scopus.com/inward/record.url?scp=85033549986&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85033549986&partnerID=8YFLogxK

    U2 - 10.7232/iems.2017.16.3.384

    DO - 10.7232/iems.2017.16.3.384

    M3 - Article

    AN - SCOPUS:85033549986

    VL - 16

    SP - 384

    EP - 391

    JO - Industrial Engineering and Management Systems

    JF - Industrial Engineering and Management Systems

    SN - 1598-7248

    IS - 3

    ER -