Hierarchical Bloom Filter Trees for Approximate Matching

12/12/2017
by David Lillis, et al.

Bytewise approximate matching algorithms have in recent years shown significant promise in detecting files that are similar at the byte level. This is very useful for digital forensic investigators, who regularly need to search a seized device for pertinent data. In a common scenario, an investigator holds a collection of "known-illegal" files (e.g. a collection of child abuse material) and wishes to determine whether copies of these are stored on the seized device. Traditional hashing can only find identical files. Approximate matching addresses this shortcoming by also handling merged files, embedded files, partial files, and files that have been modified in any way. Most approximate matching algorithms work by comparing pairs of files, which does not scale to large corpora. This paper demonstrates the effectiveness of using a "Hierarchical Bloom Filter Tree" (HBFT) data structure to reduce the running time of collection-against-collection matching, with a specific focus on the MRSH-v2 algorithm. Three experiments are discussed, which explore the effects of different configurations of HBFTs. The proposed approach dramatically reduces the number of pairwise comparisons required and demonstrates substantial speed gains, while maintaining effectiveness.
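To make the idea of filtering candidates through a tree of Bloom filters concrete, the following Python sketch may help. It is only an illustration under several assumptions and is not the paper's implementation: the names `BloomFilter`, `HBFTNode`, `chunks` and `candidates` are hypothetical, features are naive fixed-size 64-byte chunks rather than the features MRSH-v2 extracts, the tree is a simple binary split over sorted file names, and the filter size, hash count and `min_hits` threshold are arbitrary. Each node's filter holds the features of every file beneath it, so a query file only descends into subtrees whose filter reports at least `min_hits` matching chunks.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; size and hash count are arbitrary assumptions."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            word = digest[4 * i:4 * i + 4]
            yield int.from_bytes(word, "big") % self.size_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def chunks(data, size=64):
    # Assumption: fixed-size chunks stand in for real feature extraction.
    return [data[i:i + size] for i in range(0, len(data), size)]


class HBFTNode:
    """Node whose filter covers every file stored in its subtree."""

    def __init__(self, files):
        # files: dict mapping file name -> file content (bytes)
        self.names = sorted(files)
        self.filter = BloomFilter()
        for data in files.values():
            for chunk in chunks(data):
                self.filter.add(chunk)
        self.children = []
        if len(self.names) > 1:
            mid = len(self.names) // 2
            self.children = [
                HBFTNode({n: files[n] for n in self.names[:mid]}),
                HBFTNode({n: files[n] for n in self.names[mid:]}),
            ]

    def candidates(self, data, min_hits=1):
        """Return names of known files worth a full pairwise comparison."""
        hits = sum(1 for chunk in chunks(data) if chunk in self.filter)
        if hits < min_hits:
            return []  # prune: nothing below this node can match
        if not self.children:
            return list(self.names)
        found = []
        for child in self.children:
            found.extend(child.candidates(data, min_hits))
        return found
```

A tree would be built once from the known collection, for example `root = HBFTNode(known_files)`, and each file on the seized device passed to `root.candidates(...)`. Only the returned names would then need a full pairwise comparison with an approximate matching algorithm. Because Bloom filters can give false positives but not false negatives, the result may contain unrelated files but will not miss a file that shares at least `min_hits` aligned chunks with the query.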

