Two-stage incremental working set selection for fast support vector training on large datasets

DucDung Nguyen, Kazunori Matsumoto, Yasuhiro Takishima, Kazuo Hashimoto, Masahiro Terabe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

We propose iSVM, an incremental algorithm that achieves high speed in training support vector machines (SVMs) on large datasets. Within the common decomposition framework, iSVM starts with a minimal working set (WS) and then iteratively selects one training example to update the WS in each optimization loop. iSVM employs a two-stage strategy in processing the training data. In the first stage, the most prominent vector among randomly sampled data is added to the WS; this stage yields an approximate SVM solution. The second stage uses the intermediate solutions to scan through the whole training data once more to find the remaining support vectors (SVs). We show that iSVM is especially efficient for training SVMs in applications where the data size is much larger than the number of SVs. On the KDD-CUP 1999 network intrusion detection dataset, with nearly five million training examples, iSVM takes less than one hour to train an SVM with 94% testing accuracy, compared to seven hours with LibSVM, one of the state-of-the-art SVM implementations. We also provide analysis and experimental comparisons between iSVM and related algorithms.
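The two-stage scheme described in the abstract can be sketched in code. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the RBF kernel, the subproblem solver (SciPy's SLSQP on the dual restricted to the working set), the subsample size, the iteration budget, and the margin-violation criterion are all choices made here for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def solve_ws_dual(X, y, C=1.0, gamma=0.5):
    # Solve the soft-margin SVM dual restricted to the working set.
    n = len(y)
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    obj = lambda a: 0.5 * a @ Q @ a - a.sum()
    cons = {"type": "eq", "fun": lambda a: a @ y}
    res = minimize(obj, np.full(n, C / 2), jac=lambda a: Q @ a - np.ones(n),
                   bounds=[(0.0, C)] * n, constraints=cons, method="SLSQP")
    alpha = res.x
    # Bias from a free support vector (0 < alpha < C) when one exists.
    free = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]
    k = free[0] if len(free) else int(np.argmax(alpha))
    b = y[k] - (alpha * y) @ rbf_kernel(X, X[k:k + 1], gamma).ravel()
    return alpha, b

def decision(Xws, yws, alpha, b, X, gamma=0.5):
    # Decision values of the working-set model on arbitrary points.
    return (alpha * yws) @ rbf_kernel(Xws, X, gamma) + b

def isvm_sketch(X, y, C=1.0, gamma=0.5, sample=16, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    ws = [int(i) for i in rng.choice(len(y), size=2, replace=False)]
    if y[ws[0]] == y[ws[1]]:                 # minimal WS needs both classes
        ws[1] = int(np.where(y != y[ws[0]])[0][0])
    alpha, b = solve_ws_dual(X[ws], y[ws], C, gamma)
    # Stage 1: grow the WS one example at a time from random subsamples,
    # adding the worst margin violator found in each subsample.
    for _ in range(iters):
        cand = rng.choice(len(y), size=sample, replace=False)
        margins = y[cand] * decision(X[ws], y[ws], alpha, b, X[cand], gamma)
        worst = int(cand[int(np.argmin(margins))])
        if margins.min() >= 1 or worst in ws:    # no new violator sampled
            continue
        ws.append(worst)
        alpha, b = solve_ws_dual(X[ws], y[ws], C, gamma)
    # Stage 2: one full scan to pick up the remaining margin violators.
    margins = y * decision(X[ws], y[ws], alpha, b, X, gamma)
    for i in np.where(margins < 1)[0]:
        if int(i) not in ws:
            ws.append(int(i))
    alpha, b = solve_ws_dual(X[ws], y[ws], C, gamma)
    return np.array(ws), alpha, b
```

The point of the structure is the one named in the abstract: when the number of SVs is much smaller than the dataset, stage 1 touches only small random subsamples, and the single stage-2 pass is the only full scan of the data.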

Original language: English
Title of host publication: RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies
Pages: 221-226
Number of pages: 6
DOI: 10.1109/RIVF.2008.4586359
Publication status: Published - 2008
Externally published: Yes
Event: RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies - Ho Chi Minh City
Duration: 2008 Jul 13 - 2008 Jul 17

Other

Other: RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies
City: Ho Chi Minh City
Period: 08/7/13 - 08/7/17

Fingerprint

Support vector machines
Intrusion detection
Decomposition
Testing
Processing

Keywords

  • Decomposition method
  • Optimization
  • Sequential minimal optimization
  • Support vector machine

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition

Cite this

Nguyen, D., Matsumoto, K., Takishima, Y., Hashimoto, K., & Terabe, M. (2008). Two-stage incremental working set selection for fast support vector training on large datasets. In RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies (pp. 221-226). [4586359] https://doi.org/10.1109/RIVF.2008.4586359

Two-stage incremental working set selection for fast support vector training on large datasets. / Nguyen, DucDung; Matsumoto, Kazunori; Takishima, Yasuhiro; Hashimoto, Kazuo; Terabe, Masahiro.

RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies. 2008. p. 221-226 4586359.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Nguyen, D, Matsumoto, K, Takishima, Y, Hashimoto, K & Terabe, M 2008, Two-stage incremental working set selection for fast support vector training on large datasets. in RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies., 4586359, pp. 221-226, RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, Ho Chi Minh City, 08/7/13. https://doi.org/10.1109/RIVF.2008.4586359
Nguyen D, Matsumoto K, Takishima Y, Hashimoto K, Terabe M. Two-stage incremental working set selection for fast support vector training on large datasets. In RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies. 2008. p. 221-226. 4586359 https://doi.org/10.1109/RIVF.2008.4586359
Nguyen, DucDung ; Matsumoto, Kazunori ; Takishima, Yasuhiro ; Hashimoto, Kazuo ; Terabe, Masahiro. / Two-stage incremental working set selection for fast support vector training on large datasets. RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies. 2008. pp. 221-226
@inproceedings{57d8344f43854e899513758f2b2f2b19,
title = "Two-stage incremental working set selection for fast support vector training on large datasets",
abstract = "We propose iSVM, an incremental algorithm that achieves high speed in training support vector machines (SVMs) on large datasets. Within the common decomposition framework, iSVM starts with a minimal working set (WS) and then iteratively selects one training example to update the WS in each optimization loop. iSVM employs a two-stage strategy in processing the training data. In the first stage, the most prominent vector among randomly sampled data is added to the WS; this stage yields an approximate SVM solution. The second stage uses the intermediate solutions to scan through the whole training data once more to find the remaining support vectors (SVs). We show that iSVM is especially efficient for training SVMs in applications where the data size is much larger than the number of SVs. On the KDD-CUP 1999 network intrusion detection dataset, with nearly five million training examples, iSVM takes less than one hour to train an SVM with 94{\%} testing accuracy, compared to seven hours with LibSVM, one of the state-of-the-art SVM implementations. We also provide analysis and experimental comparisons between iSVM and related algorithms.",
keywords = "Decomposition method, Optimization, Sequential minimal optimization, Support vector machine",
author = "DucDung Nguyen and Kazunori Matsumoto and Yasuhiro Takishima and Kazuo Hashimoto and Masahiro Terabe",
year = "2008",
doi = "10.1109/RIVF.2008.4586359",
language = "English",
isbn = "9781424423798",
pages = "221--226",
booktitle = "RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies",

}

TY - GEN

T1 - Two-stage incremental working set selection for fast support vector training on large datasets

AU - Nguyen, DucDung

AU - Matsumoto, Kazunori

AU - Takishima, Yasuhiro

AU - Hashimoto, Kazuo

AU - Terabe, Masahiro

PY - 2008

Y1 - 2008

N2 - We propose iSVM, an incremental algorithm that achieves high speed in training support vector machines (SVMs) on large datasets. Within the common decomposition framework, iSVM starts with a minimal working set (WS) and then iteratively selects one training example to update the WS in each optimization loop. iSVM employs a two-stage strategy in processing the training data. In the first stage, the most prominent vector among randomly sampled data is added to the WS; this stage yields an approximate SVM solution. The second stage uses the intermediate solutions to scan through the whole training data once more to find the remaining support vectors (SVs). We show that iSVM is especially efficient for training SVMs in applications where the data size is much larger than the number of SVs. On the KDD-CUP 1999 network intrusion detection dataset, with nearly five million training examples, iSVM takes less than one hour to train an SVM with 94% testing accuracy, compared to seven hours with LibSVM, one of the state-of-the-art SVM implementations. We also provide analysis and experimental comparisons between iSVM and related algorithms.

AB - We propose iSVM, an incremental algorithm that achieves high speed in training support vector machines (SVMs) on large datasets. Within the common decomposition framework, iSVM starts with a minimal working set (WS) and then iteratively selects one training example to update the WS in each optimization loop. iSVM employs a two-stage strategy in processing the training data. In the first stage, the most prominent vector among randomly sampled data is added to the WS; this stage yields an approximate SVM solution. The second stage uses the intermediate solutions to scan through the whole training data once more to find the remaining support vectors (SVs). We show that iSVM is especially efficient for training SVMs in applications where the data size is much larger than the number of SVs. On the KDD-CUP 1999 network intrusion detection dataset, with nearly five million training examples, iSVM takes less than one hour to train an SVM with 94% testing accuracy, compared to seven hours with LibSVM, one of the state-of-the-art SVM implementations. We also provide analysis and experimental comparisons between iSVM and related algorithms.

KW - Decomposition method

KW - Optimization

KW - Sequential minimal optimization

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=51949094920&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51949094920&partnerID=8YFLogxK

U2 - 10.1109/RIVF.2008.4586359

DO - 10.1109/RIVF.2008.4586359

M3 - Conference contribution

SN - 9781424423798

SP - 221

EP - 226

BT - RIVF 2008 - 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies

ER -