JABBERWOCK: A Tool for WebAssembly Dataset Generation and Its Application to Malicious Website Detection

06/09/2023
by   Chika Komiya, et al.

Machine learning is often used for malicious website detection, but, to the best of our knowledge, an approach that incorporates WebAssembly as a feature has not been explored because only a limited number of samples are available. In this paper, we propose JABBERWOCK (JAvascript-Based Binary EncodeR by WebAssembly Optimization paCKer), a tool that generates WebAssembly datasets in a pseudo fashion from JavaScript. Loosely speaking, JABBERWOCK automatically gathers real-world JavaScript code, converts it into WebAssembly, and then outputs vectors of the WebAssembly as samples for malicious website detection. We also evaluate JABBERWOCK experimentally in terms of the processing time for dataset generation, a comparison of the generated samples with actual WebAssembly samples gathered from the Internet, and an application to malicious website detection. Regarding processing time, we show that JABBERWOCK can construct a dataset in 4.5 seconds per sample regardless of the number of samples. Next, comparing 10,000 samples output by JABBERWOCK with 168 gathered WebAssembly samples, we believe that the samples generated by JABBERWOCK are similar to those in the real world. We then show that JABBERWOCK enables malicious website detection with a 99% F1-score, which we attribute to the gap it creates between benign and malicious samples. We also confirm that JABBERWOCK can be combined with an existing malicious website detection tool to improve F1-scores. JABBERWOCK is publicly available via GitHub (https://github.com/c-chocolate/Jabberwock).
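
To make the last step of the pipeline (WebAssembly binary to feature vector) more concrete, the following is a minimal sketch under stated assumptions. The abstract does not say how JABBERWOCK vectorizes WebAssembly, so the normalized byte histogram used here is a stand-in chosen purely for illustration, and the helper names `is_wasm` and `byte_histogram` are hypothetical. The only format fact relied upon is that a WebAssembly binary module begins with the magic bytes `\0asm` followed by a version field.

```python
from collections import Counter

# Every WebAssembly binary module starts with these four magic bytes.
WASM_MAGIC = b"\x00asm"


def is_wasm(module: bytes) -> bool:
    """Return True if the bytes start with the WebAssembly magic number."""
    return module[:4] == WASM_MAGIC


def byte_histogram(module: bytes) -> list:
    """Map a Wasm binary to a 256-dimensional normalized byte histogram.

    ASSUMPTION: this is an illustrative vectorization only; it is not
    claimed to be the feature representation JABBERWOCK itself uses.
    """
    if not is_wasm(module):
        raise ValueError("not a WebAssembly binary module")
    counts = Counter(module)  # iterating over bytes yields ints in 0..255
    total = len(module)
    return [counts.get(value, 0) / total for value in range(256)]


# Smallest valid module: the magic number followed by version 1 (little-endian).
empty_module = b"\x00asm\x01\x00\x00\x00"
vector = byte_histogram(empty_module)
```

A fixed-length vector of this kind could be passed to any standard classifier; whether such a simple representation separates benign from malicious samples as well as the features reported in the paper is not something this sketch establishes.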
