Automatic Yara Rule Generation Using Biclustering

09/06/2020
by   Edward Raff, et al.
0

Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams (n ≥ 8) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing by malware analysts indicates AutoYara could reduce analyst time spent constructing Yara rules by 44-86 more advanced malware that current tools can't handle. Code will be made available at https://github.com/NeuromorphicComputationResearchProgram .

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 10

08/09/2021

Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints

The detection of malware is a critical task for the protection of comput...
01/04/2021

Echelon: Two-Tier Malware Detection for Raw Executables to Reduce False Alarms

Existing malware detection approaches suffer from a simplistic trade-off...
08/02/2016

Improving Zero-Day Malware Testing Methodology Using Statistically Significant Time-Lagged Test Samples

Enterprise networks are in constant danger of being breached by cyber-at...
12/02/2021

Editing a classifier by rewriting its prediction rules

We present a methodology for modifying the behavior of a classifier by d...
01/20/2022

Android Malware Detection using Feature Ranking of Permissions

We investigate the use of Android permissions as the vehicle to allow fo...
06/08/2021

Automatic Generation of Machine Learning Synthetic Data Using ROS

Data labeling is a time intensive process. As such, many data scientists...
10/26/2018

Integrating Transformer and Paraphrase Rules for Sentence Simplification

Sentence simplification aims to reduce the complexity of a sentence whil...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.