Abstract
Training a standard support vector machine (SVM) has $O(n^3)$ time complexity and $O(n^2)$ space complexity, where n is the size of the training dataset, so it is computationally infeasible for very large datasets. Reducing the size of the training dataset is a natural way to address this problem. An SVM classifier is constructed from the training samples that lie close to the separation boundary, called support vectors (SVs). Removing samples that are not relevant to the SVs should therefore have little or no effect on the resulting separation boundary. In other words, we need to retain the samples that are likely to be SVs. To this end, we propose a method based on edge detection techniques that extracts the samples near the separation boundary. To avoid overfitting, we also use a clustering algorithm to preserve the distribution properties of the training dataset. The samples selected by the edge detector and the cluster centroids together form the reconstructed training dataset. In the proposed approach, the edge detection technique extracts the local properties around the separation boundary, and the clustering algorithm preserves the global properties of the data. Because the reconstructed training dataset contains fewer samples, it can make training much faster without degrading classification accuracy.
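
The reduction pipeline described above can be illustrated with a minimal sketch. The abstract does not say which edge detector or clustering algorithm is used, so the code below rests on two assumptions that are not taken from the paper: a sample counts as an "edge" sample if at least one of its k nearest neighbors carries a different label, and the distribution of each class is summarized by plain k-means centroids. All function names and parameters (`knn_edge_indices`, `kmeans_centroids`, `reduce_training_set`, `k`, `clusters_per_class`) are hypothetical, and the brute-force neighbor search is for illustration only.

```python
import math
import random


def knn_edge_indices(X, y, k=5):
    """Assumed edge criterion: indices of samples that have at least one
    differently labeled point among their k nearest neighbors."""
    edges = []
    for i, xi in enumerate(X):
        # Brute-force distances from sample i to every other sample.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        if any(y[j] != y[i] for _, j in dists[:k]):
            edges.append(i)
    return edges


def kmeans_centroids(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(n_clusters, len(points)))
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Mean of each group; an empty group keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def reduce_training_set(X, y, k=5, clusters_per_class=3):
    """Edge samples (local boundary structure) plus per-class centroids
    (overall distribution) form the reduced training set."""
    edge_idx = knn_edge_indices(X, y, k)
    X_red = [tuple(X[i]) for i in edge_idx]
    y_red = [y[i] for i in edge_idx]
    for label in sorted(set(y)):
        members = [tuple(x) for x, lab in zip(X, y) if lab == label]
        for centroid in kmeans_centroids(members, clusters_per_class):
            X_red.append(centroid)
            y_red.append(label)
    return X_red, y_red


# Toy data: two adjacent 10 x 10 grids, class 0 on the left, class 1 on the right.
X = [(float(i), float(j)) for i in range(20) for j in range(10)]
y = [0 if x[0] < 10 else 1 for x in X]

X_red, y_red = reduce_training_set(X, y, k=4, clusters_per_class=3)
print(len(X), "samples reduced to", len(X_red))
```

The reduced pair `X_red`, `y_red` can then be passed to any SVM trainer, for example scikit-learn's `SVC().fit(X_red, y_red)`. On the toy grid, only the two columns facing the class boundary are kept as edge samples, and the centroids stand in for the interior of each class.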
Original language | English |
---|---|
Pages (from-to) | 229-237 |
Number of pages | 9 |
Journal | IEEJ Transactions on Electrical and Electronic Engineering |
Volume | 8 |
Issue number | 3 |
Publication status | Published - 2013 May |
Keywords
- Edge detection
- Fast SVM training
- Support vector machine
- Training data reduction
ASJC Scopus subject areas
- Electrical and Electronic Engineering