Exploitation and Sanitization of Hidden Data in PDF Files

03/03/2021
by Supriya Adhatarao, et al.

Organizations publish and share more and more electronic documents such as PDF files. Unfortunately, most organizations are unaware that these documents can expose sensitive information such as authors' names and details about their information systems and architecture. Attackers can easily exploit this information to footprint and later attack an organization. In this paper, we analyze hidden data found in the PDF files published by an organization. We gathered a corpus of 39664 PDF files published by 75 security agencies from 47 countries. We were able to measure the quality and quantity of the information exposed in these PDF files. This information can be used effectively to find weak links in an organization: employees who are running outdated software. We also measured the adoption of PDF file sanitization by security agencies. We identified only 7 security agencies that sanitize a few of their PDF files before publishing. Unfortunately, we were still able to find sensitive information within 65% of these sanitized PDF files. Some agencies use weak sanitization techniques: proper sanitization requires removing all hidden sensitive information from the file, not just the data at the surface. Security agencies need to change their sanitization methods.
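
To make the kind of hidden data discussed above concrete, here is a minimal Python sketch that scans the raw bytes of a PDF for a few document information entries (Author, Creator, Producer). It is an illustration only, not the tooling used in the paper. It assumes the entries are stored uncompressed as simple literal strings; it does not parse XMP metadata, object streams, hex strings, or encrypted files. The function name, the key list, and the sample bytes are hypothetical.

```python
import re

# Info-dictionary keys that commonly reveal people or software (illustrative choice).
KEYS = ("Author", "Creator", "Producer")


def extract_info_fields(pdf_bytes: bytes) -> dict:
    """Return every literal-string value found for the selected keys in raw PDF bytes."""
    found = {}
    for key in KEYS:
        # Match "/Key (value)", where value is a PDF literal string that may
        # contain backslash escapes but no unescaped nested parentheses.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        values = [m.decode("latin-1") for m in re.findall(pattern, pdf_bytes)]
        if values:
            found[key] = values
    return found


# Hypothetical, hand-written fragment standing in for a real file's contents.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Author (Jane Doe) /Creator (ExampleWriter 1.0) "
    b"/Producer (ExamplePDF Library 2.3) >>\n"
    b"endobj\n"
)

print(extract_info_fields(sample))
```

The function collects every occurrence of each key rather than only the first. PDF files can be saved incrementally, which appends new objects and leaves older ones in the file, so a scan of the raw bytes may still surface values that no longer appear in a viewer's document properties. This is one way that removing data only "at the surface" can fall short.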

