A New Burrows Wheeler Transform Markov Distance

12/30/2019
by Edward Raff, et al.

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.
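The abstract says only that BWMD embeds sequences into a fixed-length feature vector derived from the Burrows Wheeler Transform and a Markov model. The Python snippet below is a minimal, illustrative sketch of that general idea and is not the authors' method. Several choices are assumptions that the abstract does not specify: a naive rotation-sort BWT with a `$` sentinel, a first-order Markov model over the transformed string, add-one smoothing, square-root probabilities, and a Euclidean (Hellinger-style) distance. The function names `bwt`, `markov_embedding`, and `bwmd_like_distance` are hypothetical.

```python
import math


def bwt(seq: str, sentinel: str = "$") -> str:
    """Naive Burrows Wheeler Transform: last column of the sorted rotations."""
    s = seq + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def markov_embedding(seq: str, alphabet: str, sentinel: str = "$") -> list[float]:
    """Hypothetical embedding: sqrt of first-order transition probabilities
    estimated on the BWT of `seq`. The length is always len(alphabet) ** 2,
    regardless of how long `seq` is (assumed design, not from the paper)."""
    transformed = bwt(seq, sentinel)
    index = {c: i for i, c in enumerate(alphabet)}
    k = len(alphabet)
    # Add-one (Laplace) smoothing so every row is a valid distribution (assumption).
    counts = [[1.0] * k for _ in range(k)]
    for a, b in zip(transformed, transformed[1:]):
        if a in index and b in index:  # skip transitions touching the sentinel
            counts[index[a]][index[b]] += 1.0
    vec = []
    for row in counts:
        total = sum(row)
        vec.extend(math.sqrt(c / total) for c in row)
    return vec


def bwmd_like_distance(x: str, y: str, alphabet: str) -> float:
    """Euclidean distance between sqrt-probability vectors (Hellinger-style, assumed)."""
    u = markov_embedding(x, alphabet)
    v = markov_embedding(y, alphabet)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    print(bwt("banana"))                                   # annb$aa
    print(len(markov_embedding("ACGTACGTTTGA", "ACGT")))   # 16
    print(bwmd_like_distance("ACGTAC", "ACGTAC", "ACGT"))  # 0.0
```

Every sequence maps to a vector of the same length, whether it is a short read or a long genome. Standard vector-space clustering algorithms can therefore be applied directly to variable-length inputs, which is the property the abstract highlights.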

research
08/10/2022

Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

Malicious software (malware) causes much harm to our devices and life. W...
research
08/20/2015

Review and Perspective for Distance Based Trajectory Clustering

In this paper we tackle the issue of clustering trajectories of geolocal...
research
06/07/2019

Unsupervised Representation Learning of DNA Sequences

Recently several deep learning models have been used for DNA sequence ba...
research
07/10/2020

Third-Order Asymptotics of Variable-Length Compression Allowing Errors

This study investigates the fundamental limits of variable-length compre...
research
09/16/2019

Unaligned Sequence Similarity Search Using Deep Learning

Gene annotation has traditionally required direct comparison of DNA sequ...
research
02/23/2020

Efficient Compression of Long Arbitrary Sequences with No Reference at the Encoder

In a distributed information application an encoder compresses an arbitr...
research
06/30/2011

On Prediction Using Variable Order Markov Models

This paper is concerned with algorithms for prediction of discrete seque...