Scalable Alignment Kernels via Space-Efficient Feature Maps

02/18/2018
by   Yasuo Tabei, et al.

String kernels are attractive tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracy in string classification when combined with SVMs in various applications. However, alignment kernels have a crucial drawback: they scale poorly because their computational complexity is quadratic in the number of input strings, which limits large-scale applications in practice. We present ESP+SFM, the first approximation for alignment kernels, which leverages a metric embedding called edit-sensitive parsing (ESP) and space-efficient feature maps (SFM) for random Fourier features (RFF) in large-scale string analyses. Input strings are projected into RFF vectors by means of ESP and SFM. SVMs are then trained on the projected vectors, which significantly improves the scalability of alignment kernels while preserving their prediction accuracy. We experimentally test the ability of ESP+SFM to learn SVMs for large-scale string classification on various massive string datasets, and we demonstrate its superior performance with respect to prediction accuracy, scalability, and computational efficiency.
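
The projection step described above can be illustrated with a minimal sketch of plain random Fourier features. It is not the authors' ESP+SFM implementation. Several things here are assumptions made only for illustration: `toy_embed` is a hypothetical stand-in for ESP that merely counts characters, the target kernel is taken to be a Laplacian kernel exp(-||x - y||_1 / beta) on the embedded vectors, and the space-saving SFM construction is omitted entirely. The names `make_rff`, `dot`, and `beta` are likewise hypothetical.

```python
import math
import random


def toy_embed(s, alphabet="ACGT"):
    """Hypothetical stand-in for ESP (NOT the real algorithm):
    maps a string to a vector of per-character counts."""
    return [float(s.count(c)) for c in alphabet]


def make_rff(dim, num_features, beta, seed=0):
    """Return a feature map z such that dot(z(x), z(y)) approximates
    the assumed Laplacian kernel exp(-||x - y||_1 / beta)."""
    rng = random.Random(seed)
    # For this kernel the random frequencies follow a Cauchy distribution
    # with scale 1/beta; tan(pi * (u - 0.5)) is a standard Cauchy sample.
    omega = [
        [math.tan(math.pi * (rng.random() - 0.5)) / beta for _ in range(dim)]
        for _ in range(num_features)
    ]
    scale = 1.0 / math.sqrt(num_features)

    def z(x):
        feats = []
        for w in omega:
            p = sum(wi * xi for wi, xi in zip(w, x))
            # One cosine and one sine coordinate per random frequency.
            feats.append(scale * math.cos(p))
            feats.append(scale * math.sin(p))
        return feats

    return z


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Usage: compare the approximate and exact kernel values for two strings.
x = toy_embed("ACGTAC")
y = toy_embed("ACGTTT")
beta = 4.0
z = make_rff(dim=len(x), num_features=2000, beta=beta)
approx = dot(z(x), z(y))
exact = math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / beta)
```

In this sketch each string becomes a fixed-length vector whose inner products approximate the kernel, so a linear SVM can be trained on the vectors directly instead of on a kernel matrix that grows quadratically with the number of strings. The sketch stores every random frequency explicitly, which is the memory cost that the abstract's space-efficient feature maps are meant to reduce.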

research
11/25/2019

Efficient Global String Kernel with Random Features: Beyond Counting Substructures

Analysis of large-scale sequential data has been one of the most crucial...
research
03/13/2020

Knowledge Graph Alignment using String Edit Distance

In this work, we propose a novel knowledge base alignment technique base...
research
10/14/2022

Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels

Motivation: Bacterial community composition is commonly quantified using...
research
01/12/2018

Cosmic String Detection with Tree-Based Machine Learning

We explore the use of random forest and gradient boosting, two powerful ...
research
03/02/2017

A Unifying View of Explicit and Implicit Feature Maps for Structured Data: Systematic Studies of Graph Kernels

Non-linear kernel methods can be approximated by fast linear ones using ...
research
06/26/2019

String Sanitization: A Combinatorial Approach

String data are often disseminated to support applications such as locat...
research
10/01/2019

Scalable String Reconciliation by Recursive Content-Dependent Shingling

We consider the problem of reconciling similar, but remote, strings with...
