On Coresets for Support Vector Machines

02/15/2020
by Murad Tukan, et al.

We present an efficient coreset construction algorithm for large-scale Support Vector Machine (SVM) training in Big Data and streaming applications. A coreset is a small, representative subset of the original data points such that models trained on the coreset are provably competitive with those trained on the original data set. Since the coreset is generally much smaller than the original set, our preprocess-then-train scheme has the potential to yield significant speedups when training SVM models. We prove lower and upper bounds on the size of the coreset required to obtain small data summaries for the SVM problem. As a corollary, we show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings. We evaluate the performance of our algorithm on real-world and synthetic data sets. Our experimental results reaffirm the favorable theoretical properties of our algorithm and demonstrate its practical effectiveness in accelerating SVM training.
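
The preprocess-then-train idea can be illustrated with a minimal sketch. It is not the paper's construction: the abstract does not state the sampling distribution, so the `probs` argument below is a placeholder (uniform in the demo), and the names `svm_cost` and `sample_coreset` are hypothetical. The sketch only shows the generic pattern of drawing a small weighted sample whose weighted SVM objective estimates the objective on the full data.

```python
import random

def svm_cost(points, labels, weights, w, b, lam=1.0):
    """Weighted soft-margin SVM objective:
    0.5 * ||w||^2 + lam * sum_i u_i * max(0, 1 - y_i * (<w, x_i> + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    hinge = 0.0
    for x, y, u in zip(points, labels, weights):
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        hinge += u * max(0.0, 1.0 - margin)
    return reg + lam * hinge

def sample_coreset(points, labels, probs, m, rng):
    """Draw m points i.i.d. according to probs (assumed to sum to 1) and
    weight each draw by 1 / (m * p_i), so the weighted hinge term is an
    unbiased estimate of the hinge term on the full data.
    ASSUMPTION: probs is supplied by the caller; the paper's own sampling
    distribution is not given in this abstract."""
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    core_points = [points[i] for i in idx]
    core_labels = [labels[i] for i in idx]
    core_weights = [1.0 / (m * probs[i]) for i in idx]
    return core_points, core_labels, core_weights

# Demo on synthetic data, with uniform probabilities as a stand-in.
rng = random.Random(0)
n, m = 1000, 100
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
labels = [1 if x[0] + x[1] > 0 else -1 for x in points]
core_p, core_y, core_u = sample_coreset(points, labels, [1.0 / n] * n, m, rng)

# Compare the objective of one candidate classifier on both sets.
w, b = [1.0, 1.0], 0.0
full_cost = svm_cost(points, labels, [1.0] * n, w, b)
core_cost = svm_cost(core_p, core_y, core_u, w, b)
print(full_cost, core_cost)
```

Any solver that accepts per-sample weights could then be trained on the weighted sample instead of the full set. How close the two costs are for every candidate `(w, b)` depends on the sampling distribution and the sample size, which is what the coreset size bounds mentioned in the abstract address.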

research
05/01/2019

High-Performance Support Vector Machines and Its Applications

The support vector machines (SVM) algorithm is a popular classification ...
research
04/10/2022

Coreset of Hyperspectral Images on Small Quantum Computer

Machine Learning (ML) techniques are employed to analyze and process big...
research
08/10/2022

Classifier Transfer with Data Selection Strategies for Online Support Vector Machine Classification with Class Imbalance

Objective: Classifier transfers usually come with dataset shifts. To ove...
research
07/28/2021

Chance constrained conic-segmentation support vector machine with uncertain data

Support vector machines (SVM) is one of the well known supervised classe...
research
12/05/2018

GADGET SVM: A Gossip-bAseD sub-GradiEnT Solver for Linear SVMs

In the era of big data, an important weapon in a machine learning resear...
research
05/18/2018

Wasserstein Coresets for Lipschitz Costs

Sparsification is becoming more and more relevant with the proliferation...
research
09/29/2020

Efficient SVDD Sampling with Approximation Guarantees for the Decision Boundary

Support Vector Data Description (SVDD) is a popular one-class classifier...
