Exploitation and Sanitization of Hidden Data in PDF Files

by Supriya Adhatarao, et al.

Organizations publish and share more and more electronic documents such as PDF files. Unfortunately, most organizations are unaware that these documents can leak sensitive information such as authors' names and details of their information systems and architecture. Attackers can easily exploit this information to footprint an organization and later attack it. In this paper, we analyze the hidden data found in PDF files published by organizations. We gathered a corpus of 39664 PDF files published by 75 security agencies from 47 countries, and we measured the quality and quantity of information exposed in these files. This information can be used effectively to find weak links in an organization: employees who are running outdated software. We also measured the adoption of PDF sanitization by security agencies. We identified only 7 security agencies that sanitize a few of their PDF files before publishing. Unfortunately, we were still able to find sensitive information within 65% of these sanitized PDF files. Some agencies are using weak sanitization techniques: proper sanitization requires removing all hidden sensitive information from the file, not just the data at the surface. Security agencies need to change their sanitization methods.
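
As a rough illustration of the kind of hidden data discussed above, the following minimal Python sketch scans raw PDF bytes for a few document information entries (such as /Author and /Producer) that can reveal author names and software versions. It is not the paper's tooling: the function name `extract_info_fields`, the chosen key list, and the sample bytes are hypothetical, and the sketch assumes uncompressed, unencrypted literal strings without nested parentheses or escapes.

```python
import re

# Info-dictionary keys that commonly reveal who produced a file and with what software.
# The selection of keys is an assumption made for this illustration.
INFO_KEYS = ("Author", "Creator", "Producer", "CreationDate", "ModDate")


def extract_info_fields(pdf_bytes: bytes) -> dict:
    """Return literal-string values of selected Info keys found in raw PDF bytes.

    Simplification: only matches entries of the form /Key (value) stored as
    plain, uncompressed text, with no nested parentheses or escape sequences.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        matches = re.findall(pattern, pdf_bytes)
        if matches:
            # Keep every occurrence rather than only the last one: a file that
            # was saved incrementally can still contain earlier values.
            found[key] = [m.decode("latin-1") for m in matches]
    return found


# Hypothetical sample: two objects, each carrying an /Author entry.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Author (Alice Example) /Producer (ExampleWriter 1.0) >>\nendobj\n"
    b"2 0 obj\n<< /Author (Bob Example) >>\nendobj\n"
)
fields = extract_info_fields(sample)
print(fields)
```

Collecting every occurrence, instead of only the most recent one, reflects the abstract's point that removing data "at the surface" is not enough: values that are no longer displayed may still be present in the file's bytes. A real analysis would need a full PDF parser to handle compressed streams, XMP metadata, and encoded strings.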


