A fast SVM training method for very large datasets

Boyang Li, Qiangwei Wang, Takayuki Furuzuki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

19 Citations (Scopus)

Abstract

In a standard support vector machine (SVM), the training process has O(n³) time and O(n²) space complexity, where n is the size of the training dataset, so it is computationally infeasible for very large datasets. Reducing the size of the training dataset is a natural way to address this problem. An SVM classifier depends only on the support vectors (SVs), which lie close to the separation boundary; therefore, we need to retain the samples that are likely to be SVs. In this paper, we propose a method based on an edge detection technique to detect these samples. To preserve the overall distribution of the data, we also use a clustering algorithm such as K-means to compute the cluster centroids. The samples selected by the edge detector and the cluster centroids are then used to reconstruct the training dataset. The reconstructed, smaller training dataset makes the training process much faster without degrading classification accuracy.
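The abstract describes the data-reduction scheme only at a high level and does not specify the exact edge detection technique, so the sketch below is an illustration of the general idea rather than the authors' method: a k-nearest-neighbour label-disagreement heuristic stands in for the paper's edge detector, and scikit-learn's KMeans and SVC stand in for their clustering and SVM components. The function name reduce_training_set and all parameter values are hypothetical.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

def reduce_training_set(X, y, k_neighbors=10, n_clusters=50, random_state=0):
    # --- Step 1: boundary-sample detection (stand-in for the paper's edge detector).
    # A sample whose k nearest neighbours include a different class label is
    # assumed to lie near the separation boundary and is therefore a likely SV.
    nn = NearestNeighbors(n_neighbors=k_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the sample itself
    neighbor_labels = y[idx[:, 1:]]
    boundary_mask = (neighbor_labels != y[:, None]).any(axis=1)
    X_edge, y_edge = X[boundary_mask], y[boundary_mask]

    # --- Step 2: per-class K-means centroids to preserve the overall distribution.
    Xc_parts, yc_parts = [], []
    for label in np.unique(y):
        X_class = X[y == label]
        k = min(n_clusters, len(X_class))
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_class)
        Xc_parts.append(km.cluster_centers_)
        yc_parts.append(np.full(k, label))

    # --- Step 3: reconstruct a smaller training set from both groups of samples.
    X_small = np.vstack([X_edge] + Xc_parts)
    y_small = np.concatenate([y_edge] + yc_parts)
    return X_small, y_small

if __name__ == "__main__":
    # Toy linearly separable data: the reduced set trains an SVM on far fewer
    # samples than the original 5000 while accuracy is measured on the full set.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    X_small, y_small = reduce_training_set(X, y)
    clf = SVC(kernel="rbf").fit(X_small, y_small)
    print(f"reduced size: {len(X_small)}, accuracy on full set: {clf.score(X, y):.3f}")

Because the reduced set is used exactly like an ordinary training set, any standard SVM solver can be applied to the output of Step 3 without modification.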

Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Pages: 1784-1789
Number of pages: 6
DOIs: 10.1109/IJCNN.2009.5178618
Publication status: Published - 2009
Event: 2009 International Joint Conference on Neural Networks, IJCNN 2009 - Atlanta, GA
Duration: 2009 Jun 14 - 2009 Jun 19

Other

Other: 2009 International Joint Conference on Neural Networks, IJCNN 2009
City: Atlanta, GA
Period: 09/6/14 - 09/6/19

Fingerprint

Support vector machines
Edge detection
Clustering algorithms
Classifiers
Detectors

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Li, B., Wang, Q., & Furuzuki, T. (2009). A fast SVM training method for very large datasets. In Proceedings of the International Joint Conference on Neural Networks (pp. 1784-1789). [5178618] https://doi.org/10.1109/IJCNN.2009.5178618

@inproceedings{7cb2dfdd110b4576820da3b94ba1d993,
title = "A fast SVM training method for very large datasets",
abstract = "In a standard support vector machine (SVM), the training process has O(n³) time and O(n²) space complexities, where n is the size of training dataset. Thus, it is computationally infeasible for very large datasets. Reducing the size of training dataset is naturally considered to solve this problem. SVM classifiers depend on only support vectors (SVs) that lie close to the separation boundary. Therefore, we need to reserve the samples that are likely to be SVs. In this paper, we propose a method based on the edge detection technique to detect these samples. To preserve the entire distribution properties, we also use a clustering algorithm such as K-means to calculate the centroids of clusters. The samples selected by edge detector and the centroids of clusters are used to reconstruct the training dataset. The reconstructed training dataset with a smaller size makes the training process much faster, but without degrading the classification accuracies.",
author = "Boyang Li and Qiangwei Wang and Takayuki Furuzuki",
year = "2009",
doi = "10.1109/IJCNN.2009.5178618",
language = "English",
isbn = "9781424435531",
pages = "1784--1789",
booktitle = "Proceedings of the International Joint Conference on Neural Networks",

}
