Fuzzy Hashing as Perturbation-Consistent Adversarial Kernel Embedding

12/17/2018
by Ari Azarafrooz, et al.

Measuring the similarity of two files is an important task in malware analysis, and fuzzy hash functions are a popular approach. Traditional fuzzy hash functions are data agnostic: they do not learn from a particular dataset how to determine similarity, so their behavior is fixed across all datasets. In this paper, we demonstrate that fuzzy hash functions can be learned in a novel minimax training framework and that these learned fuzzy hash functions outperform traditional fuzzy hash functions at the file similarity task for Portable Executable files. In our approach, hash digests are extracted from the kernel embeddings of two kernel networks trained in a minimax framework, where the roles of the players during training (i.e., adversary versus generator) alternate along with the input data. We refer to this new minimax architecture as perturbation-consistent. The similarity score for a pair of files is the utility of the minimax game in equilibrium. Our experiments show that learned fuzzy hash functions generalize well: they can determine that two files are similar even when one of those files was generated using insertion and deletion operations.
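
To make the notion of a data-agnostic similarity measure concrete, the following is a minimal Python sketch of a fixed, non-learned file similarity score based on byte n-gram overlap (Jaccard similarity). It is not the method proposed in the paper, nor the algorithm of any particular fuzzy hashing tool; the function names `byte_ngrams` and `ngram_similarity`, the choice of n = 4, and the toy byte strings are all assumptions made for illustration. Deriving the edited input from the original by one deletion and one insertion is likewise an assumption of this sketch.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all length-n byte windows in data (hypothetical helper)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def ngram_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of byte n-gram sets: a fixed rule with no training step."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to yield n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Toy inputs standing in for file contents (illustrative only, not real PE files).
original = bytes(range(256)) * 4
# Delete 20 bytes and insert 8 new ones in their place.
edited = original[:100] + b"INSERTED" + original[120:]
# A byte sequence that shares no 4-gram with the original.
unrelated = bytes(reversed(range(256))) * 4

score_edited = ngram_similarity(original, edited)
score_unrelated = ngram_similarity(original, unrelated)
print(f"original vs edited:    {score_edited:.3f}")
print(f"original vs unrelated: {score_unrelated:.3f}")
```

Because the scoring rule is the same for every dataset, it cannot adapt to which byte patterns matter for a given corpus. That fixed behavior is what the abstract contrasts with learned fuzzy hash functions.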


research
11/02/2011

Kernel diff-hash

This paper presents a kernel formulation of the recently introduced diff...
research
08/24/2022

Transformer-Boosted Anomaly Detection with Fuzzy Hashes

Fuzzy hashes are an important tool in digital forensics and are used in ...
research
01/08/2019

Using fuzzy bits and neural networks to partially invert few rounds of some cryptographic hash functions

We consider fuzzy, or continuous, bits, which take values in [0;1] and (...
research
04/14/2020

Topology-Aware Hashing for Effective Control Flow Graph Similarity Analysis

Control Flow Graph (CFG) similarity analysis is an essential technique f...
research
09/18/2021

When Similarity Digest Meets Vector Management System: A Survey on Similarity Hash Function

The booming vector manage system calls for feasible similarity hash func...
research
04/30/2018

Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

In this paper, we propose a novel deep generative approach to cross-moda...
