Fast SVM training using edge detection on very large datasets

Boyang Li, Qiangwei Wang, Jinglu Hu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

In a standard support vector machine (SVM), training has O(n³) time and O(n²) space complexity, where n is the number of training samples, which makes training computationally infeasible on very large datasets. A natural remedy is to reduce the size of the training dataset. An SVM classifier is constructed from the training samples called support vectors (SVs), which lie close to the separation boundary, so removing samples that are unlikely to become SVs may have no effect on the resulting boundary. In other words, only the samples that are likely to be SVs need to be retained. We therefore propose a method based on edge detection techniques to extract such samples near the separation boundary. To avoid overfitting, we also apply a clustering algorithm that preserves the distribution properties of the training dataset. The samples selected by the edge detector, together with the cluster centroids, are used to reconstruct the training dataset. In the proposed approach, edge detection captures the local properties around the separation boundary, while clustering preserves the properties of the entire dataset. Because the reconstructed training dataset contains far fewer samples, training becomes much faster without degrading classification accuracy.
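The data-reduction pipeline described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the edge detector or the clustering algorithm, so here a sample is treated as an "edge" sample if any of its k nearest neighbours has a different label (a neighbourhood-label test standing in for the paper's edge detection), and plain k-means is run separately on each class so that every centroid inherits a class label. The function names `edge_samples`, `kmeans`, and `reduce_training_set` and the parameters `k` and `clusters_per_class` are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def edge_samples(X: List[Point], y: List[int], k: int = 5) -> List[int]:
    """Indices of samples whose k nearest neighbours include another label.

    Assumption: this neighbourhood-label test is a stand-in for the paper's
    edge detector; it flags samples lying near the class boundary.
    """
    edges = []
    for i, xi in enumerate(X):
        # Brute-force neighbour search: sort all other samples by distance to xi.
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: sq_dist(xi, X[j]))[:k]
        if any(y[j] != y[i] for j in neighbours):
            edges.append(i)
    return edges


def kmeans(points: List[Point], n_clusters: int,
           iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(n_clusters, len(points)))
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[idx]
                     for idx, g in enumerate(groups)]
    return centroids


def reduce_training_set(X: List[Point], y: List[int], k: int = 5,
                        clusters_per_class: int = 3):
    """Edge samples plus per-class cluster centroids (assumed labelling scheme)."""
    keep = sorted(set(edge_samples(X, y, k)))
    X_new = [X[i] for i in keep]
    y_new = [y[i] for i in keep]
    for label in sorted(set(y)):
        # Assumption: cluster each class separately so centroids carry a label.
        members = [x for x, lab in zip(X, y) if lab == label]
        for c in kmeans(members, clusters_per_class):
            X_new.append(c)
            y_new.append(label)
    return X_new, y_new


if __name__ == "__main__":
    # Toy data: a 10 x 5 grid split into two classes at x = 0.
    X = [(float(i), float(j)) for i in range(-5, 5) for j in range(5)]
    y = [0 if p[0] < 0 else 1 for p in X]
    X_red, y_red = reduce_training_set(X, y)
    print(len(X), "->", len(X_red))  # prints: 50 -> 16
```

On the toy grid, only the two columns adjacent to the boundary are kept as edge samples, and three centroids per class summarise the rest. The reduced set would then be passed to any standard SVM trainer in place of the full dataset. Note that the brute-force neighbour search here is itself quadratic in n, so a practical version would need a faster search structure; the sketch only illustrates which samples are kept.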

Original language: English
Pages (from-to): 229-237
Number of pages: 9
Journal: IEEJ Transactions on Electrical and Electronic Engineering
Volume: 8
Issue number: 3
Publication status: Published - 2013 May

Keywords

  • Edge detection
  • Fast SVM training
  • Support vector machine
  • Training data reduction

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
