Detecting malicious PDF using CNN

07/24/2020
by   Raphael Fettaya, et al.

Malicious PDF files are one of the biggest threats to computer security. Significant research has gone into detecting them, using either handwritten signatures or machine learning based on manual feature extraction. Both approaches are time-consuming and require significant prior knowledge, and the list of features has to be updated with each newly discovered vulnerability. In this work, we propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) operating on the byte level of the file, without any handcrafted features. Using a data set of 90,000 files downloadable online, we show that our approach maintains a high detection rate (94%) and detects new malicious files that are still undetected by most antiviruses. Using features automatically generated by our CNN and applying a clustering algorithm, we also obtain high similarity between the antiviruses' labels and the resulting clusters.
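To make the byte-level idea concrete, here is a minimal sketch of how a file's raw bytes might be turned into a fixed-length input and how several model scores might be combined. It is not the authors' implementation: the padding token `PAD_TOKEN`, the helper names `bytes_to_tokens` and `ensemble_score`, and the choice of plain averaging are all assumptions made for illustration, and the "models" are stand-in callables rather than trained CNNs.

```python
from typing import Callable, List, Sequence

# Hypothetical padding id, chosen outside the valid byte range 0-255.
PAD_TOKEN = 256


def bytes_to_tokens(data: bytes, max_len: int) -> List[int]:
    """Truncate or right-pad raw file bytes to exactly max_len integer tokens."""
    tokens = list(data[:max_len])
    tokens.extend([PAD_TOKEN] * (max_len - len(tokens)))
    return tokens


def ensemble_score(
    models: Sequence[Callable[[List[int]], float]], tokens: List[int]
) -> float:
    """Average the maliciousness scores of several models (assumed combination rule)."""
    scores = [model(tokens) for model in models]
    return sum(scores) / len(scores)


# Stand-in "models": each maps a token sequence to a score in [0, 1].
def share_of_padding(tokens: List[int]) -> float:
    return tokens.count(PAD_TOKEN) / len(tokens)


def constant_half(tokens: List[int]) -> float:
    return 0.5


tokens = bytes_to_tokens(b"%PDF-1.4", max_len=16)
score = ensemble_score([share_of_padding, constant_half], tokens)
```

In a real pipeline each stand-in callable would be replaced by a trained network that consumes the token sequence, typically through an embedding layer followed by one-dimensional convolutions.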


