A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

04/14/2023
by Wenyang Liu, et al.

File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1D byte signals and classify them using the captured inter-byte features, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This makes them inherently ill-suited to classifying files that use variable-length coding, whose symbols are represented by a variable number of bits. In contrast, we propose Byte2Image, a novel data augmentation technique that introduces the neglected intra-byte information into file fragments and re-treats them as 2D gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2D images, we employ a sliding byte window to expose the neglected intra-byte information and stack the resulting n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which jointly models the raw 1D byte sequence and the converted 2D image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method achieves notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image.
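
To make the idea of exposing intra-byte information more concrete, the following is a minimal sketch of one way a bit-shifted image could be built from a fragment. It is not the authors' implementation: the abstract does not specify the window size or the n-gram embedding, so the function name `bit_shift_image`, the two-byte window, and the choice of eight shifts are all assumptions made for illustration. Each row re-reads the fragment's bit stream starting at a different bit offset, so byte boundaries that hide variable-length codes become visible to a 2D CNN.

```python
from typing import List


def bit_shift_image(fragment: bytes, num_shifts: int = 8) -> List[List[int]]:
    """Hypothetical sketch: stack bit-shifted views of a fragment as image rows.

    Row s holds the bytes obtained by reading the fragment's bit stream
    starting s bits after each byte boundary (0 <= s < num_shifts <= 8).
    Every row has len(fragment) - 1 pixels so the rows form a rectangle.
    """
    if not 1 <= num_shifts <= 8:
        raise ValueError("num_shifts must be between 1 and 8")
    if len(fragment) < 2:
        raise ValueError("fragment must contain at least two bytes")

    image = []
    for shift in range(num_shifts):
        row = []
        for j in range(len(fragment) - 1):
            # Assumed two-byte sliding window: join neighbouring bytes,
            # then cut out the 8 bits that start `shift` bits into it.
            window = (fragment[j] << 8) | fragment[j + 1]
            row.append((window >> (8 - shift)) & 0xFF)
        image.append(row)
    return image


if __name__ == "__main__":
    demo = bytes([0b10110000, 0b00001111, 0xFF])
    for shift, row in enumerate(bit_shift_image(demo)):
        print(shift, [format(pixel, "08b") for pixel in row])
```

Row 0 reproduces the original bytes (minus the last one), while the remaining rows show the same data at sub-byte offsets. In the described method these views would feed n-gram features and a CNN; that stage is omitted here because the abstract gives no details about it.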


