NapierOne: A modern mixed file data set alternative to Govdocs1

01/20/2022
by Simon R. Davies, et al.

A review of the ransomware detection research literature found that almost no proposal provided enough detail on how its test data set was created, or a sufficient description of its actual content, to allow other researchers to recreate the data set, reconstruct the test environment, and validate the research results. A modern cybersecurity mixed file data set called NapierOne is presented, primarily aimed at, but not limited to, ransomware detection and forensic analysis research. NapierOne was designed to address this deficiency in reproducibility and improve consistency by facilitating research replication and repeatability. The methodology used in the creation of this data set is also described in detail. The data set was inspired by the Govdocs1 data set and it is intended that NapierOne be used as a complement to this original data set. An investigation was performed with the goal of determining the common file types currently in use. No specific research was found that explicitly provided this information, so an alternative consensus approach was employed. This involved combining the findings from multiple sources of file type usage into an overall ranked list. Subsequently, 5000 real-world example files were gathered for each of the common file types identified, and a specific data subset was created for each type. In some circumstances, multiple data subsets were created for a specific file type, each subset representing a specific characteristic of that file type. For example, there are multiple data subsets for the ZIP file type, with each subset containing examples of a specific compression method. Ransomware execution tends to produce files that have high entropy, so examples of file types that naturally have this attribute are also present.
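To make the entropy remark concrete, the following is a minimal sketch of how the byte-level Shannon entropy of a file's content might be measured. It is an illustration only and is not taken from the NapierOne paper: the function name `shannon_entropy` and the sample inputs are assumptions, and random bytes are used merely as a stand-in for encrypted output.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0).

    Hypothetical helper for illustration; not part of any NapierOne tooling.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value present
    # H = sum over byte values of p * log2(1 / p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    samples = {
        # Plain repeated text: few distinct byte values, so low entropy.
        "repeated text": b"The quick brown fox jumps over the lazy dog. " * 200,
        # Random bytes as an assumed stand-in for encrypted content.
        "random bytes": os.urandom(64 * 1024),
    }
    for label, blob in samples.items():
        print(f"{label:14s} {shannon_entropy(blob):.3f} bits/byte")
```

Content that is encrypted or already compressed would be expected to score close to the 8 bits/byte maximum, which is why naturally high-entropy file types are useful when testing entropy-based detection for false positives.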

research
06/28/2021

Differential Area Analysis for Ransomware Attack Detection within Mixed File Datasets

The threat from ransomware continues to grow both in the number of affec...
research
03/17/2020

An Exploratory Study of Bot Commits

Background: Bots help automate many of the tasks performed by software d...
research
03/03/2021

Robust PDF Files Forensics Using Coding Style

Identifying how a file has been created is often interesting in security...
research
11/03/2017

Decentralised firewall for malware detection

This paper describes the design and development of a decentralized firew...
research
01/21/2021

Content-Based Textual File Type Detection at Scale

Programming language detection is a common need in the analysis of large...
research
03/19/2021

Fight Virus Like a Virus: A New Defense Method Against File-Encrypting Ransomware

Nowadays ransomware has become a new profitable form of attack. This typ...
research
07/02/2019

Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts

The ever increasing volume of data in digital forensic investigation is ...
