Abstract
Many machine learning algorithms have been proposed and applied to a wide range of prediction problems in industrial management. As the amount of available data grows, ensemble methods that combine high prediction accuracy with low computational cost are increasingly needed. The Alternating Decision Forest (ADF) is an efficient ensemble method known for its high performance and low computational cost. ADFs maintain a weight for each training instance that reflects how accurately it is currently predicted, and they randomly select candidate attribute variables at each node. This allows the ensemble to predict the training data accurately while keeping the individual decision trees diverse. However, outliers can cause overfitting, and because the candidate branch conditions vary from node to node in an ADF, the fit to the training data is strongly constrained and prediction accuracy may deteriorate. To improve prediction accuracy, we focus on predictive performance for new data: we introduce bootstrap sampling so that an out-of-bag (OOB) dataset is generated for each tree during training, and we construct an effective ensemble of decision trees that improves generalization ability by taking each tree's prediction accuracy on its OOB data into account. To verify the effectiveness of the proposed method, we conduct simulation experiments on datasets from the UCI Machine Learning Repository. The proposed method provides robust and accurate predictions for datasets with many attribute variables.
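The sketch below illustrates only the OOB-weighting idea summarized in the abstract, under stated assumptions; it is not the authors' ADF algorithm, which additionally maintains per-instance weights and draws attribute candidates at each node. It assumes Python with NumPy and scikit-learn, and the helper names `fit_oob_weighted_forest` and `predict_oob_weighted_forest`, as well as the use of `DecisionTreeClassifier` with `max_features="sqrt"` as a stand-in for random attribute selection, are illustrative choices rather than details from the paper.

```python
# Minimal sketch (not the paper's exact ADF method): each tree is trained on a
# bootstrap sample, its accuracy on its own out-of-bag (OOB) rows is measured,
# and that OOB accuracy is used as the tree's voting weight on new data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_oob_weighted_forest(X, y, n_trees=50, seed=None):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees, weights = [], []
    for _ in range(n_trees):
        # Bootstrap sample; the rows not drawn form this tree's OOB set.
        boot = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), boot)
        tree = DecisionTreeClassifier(
            max_features="sqrt",                      # random attribute subset per split
            random_state=int(rng.integers(1 << 31)),
        )
        tree.fit(X[boot], y[boot])
        # Weight the tree by its OOB accuracy (a proxy for generalization ability).
        acc = tree.score(X[oob], y[oob]) if len(oob) > 0 else 0.0
        trees.append(tree)
        weights.append(acc)
    return trees, np.asarray(weights)


def predict_oob_weighted_forest(trees, weights, X, n_classes):
    # Weighted vote; assumes class labels are 0 .. n_classes - 1.
    votes = np.zeros((X.shape[0], n_classes))
    rows = np.arange(X.shape[0])
    for tree, w in zip(trees, weights):
        votes[rows, tree.predict(X)] += w
    return votes.argmax(axis=1)


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    trees, weights = fit_oob_weighted_forest(X_tr, y_tr, seed=0)
    pred = predict_oob_weighted_forest(trees, weights, X_te, len(np.unique(y)))
    print("test accuracy:", (pred == y_te).mean())
```

Weighting each tree by its OOB accuracy, rather than counting all trees equally, down-weights trees that happen to fit outliers in their bootstrap sample, which is one way to read the generalization argument in the abstract.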
| Original language | English |
| --- | --- |
| Pages (from-to) | 384-391 |
| Number of pages | 8 |
| Journal | Industrial Engineering and Management Systems |
| Volume | 16 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - 2017 Sep 1 |
Keywords
- Alternating Decision Forests
- Big Data
- Data Mining
- Prediction Model
- Random Forests
ASJC Scopus subject areas
- Social Sciences (all)
- Economics, Econometrics and Finance (all)