Robust PDF Files Forensics Using Coding Style

03/03/2021
by Supriya Adhatarao, et al.

Identifying how a file was created is often useful in security, for both attackers and defenders. Attackers can exploit this information to tune their attacks, and defenders can use it to understand how a malicious file was created after an incident. In this work, we want to identify how a PDF file was created. This problem is important because PDF files are extremely popular: many organizations publish PDF files online, and malicious PDF files are commonly used by attackers. Our approach to detecting which software produced a PDF file is based on coding style: patterns that are only created by certain PDF producers. We analyzed the coding style of 900 PDF files produced using 11 PDF producers on 3 different operating systems, and obtained a set of 192 rules that can be used to identify these 11 PDF producers. We tested our detection tool on 508836 PDF files published on scientific preprint servers. Our tool is able to detect certain producers with an accuracy of 100%. We were also able to apply our tool to identify how online PDF services work and to spot inconsistencies.
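To make the idea of coding-style rules concrete, here is a minimal sketch of rule-based producer detection. It is not the authors' tool: the producer names (`producer_a`, `producer_b`), the four regular expressions, and the scoring scheme are hypothetical placeholders chosen for illustration, whereas the paper derives 192 rules from real producer output. The sketch only assumes that different producers serialize the same PDF objects with different byte-level habits, such as line endings and spacing between tokens.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rules for two made-up producers. Each rule is a byte-level
# pattern describing a serialization habit (line endings, token spacing).
# These are illustrative assumptions, not rules taken from the paper.
RULES: Dict[str, List[bytes]] = {
    "producer_a": [
        rb"\d+ 0 obj\r\n<<",   # CRLF after "obj", dictionary opened with no space
        rb"/Type/Catalog",     # no space between consecutive name tokens
    ],
    "producer_b": [
        rb"\d+ 0 obj\n<< ",    # LF after "obj", space after "<<"
        rb"/Type /Catalog",    # space between consecutive name tokens
    ],
}


def score_producers(data: bytes) -> Dict[str, int]:
    """Count, for each producer, how many of its rules match the raw bytes."""
    return {
        name: sum(1 for pattern in patterns if re.search(pattern, data))
        for name, patterns in RULES.items()
    }


def guess_producer(data: bytes) -> Optional[str]:
    """Return the single best-scoring producer, or None if no rule matches
    or several producers tie for the best score."""
    scores = score_producers(data)
    best = max(scores.values())
    if best == 0:
        return None
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Two tiny hand-written fragments that differ only in coding style.
sample_a = b"%PDF-1.4\r\n1 0 obj\r\n<</Type/Catalog>>\r\nendobj\r\n"
sample_b = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

print(guess_producer(sample_a))
print(guess_producer(sample_b))
```

Because the rules operate on raw bytes rather than on declared metadata, a sketch like this does not depend on the file's Producer field. Returning `None` on a tie is a deliberately conservative design choice for this example; a real tool might report all candidates instead.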


research
02/01/2021

Can You Accept LaTeX Files from Strangers? Ten Years Later

It is well-known that Microsoft Word/Excel compatible documents or PDF f...
research
03/03/2021

Exploitation and Sanitization of Hidden Data in PDF Files

Organizations publish and share more and more electronic documents like ...
research
07/15/2020

Static analysis of executable files by machine learning methods

The paper describes how to detect malicious executable files based on st...
research
02/15/2022

Crypto-ransomware detection using machine learning models in file-sharing network scenario with encrypted traffic

Ransomware is considered as a significant threat for most enterprises si...
research
08/29/2021

Making Honey Files Sweeter: SentryFS – A Service-Oriented Smart Ransomware Solution

The spread of ransomware continues to cause devastation and is a major c...
research
01/20/2022

NapierOne: A modern mixed file data set alternative to Govdocs1

It was found when reviewing the ransomware detection research literature...
research
09/14/2021

Detecting Layout Templates in Complex Multiregion Files

Spreadsheets are among the most commonly used file formats for data mana...
