Automatic Yara Rule Generation Using Biclustering

by Edward Raff, et al.

Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams (n ≥ 8) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing by malware analysts indicates AutoYara could reduce analyst time spent constructing Yara rules by 44-86%, allowing them to spend their time on the more advanced malware that current tools can't handle. Code will be made available at .
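
To make the idea of building a rule from large byte n-grams concrete, here is a minimal Python sketch. It is not the AutoYara algorithm: the biclustering step described in the abstract is not reproduced, and a plain "appears in at least this fraction of samples" threshold stands in for it. The function names (`byte_ngrams`, `shared_ngrams`, `to_yara_rule`) and the `min_fraction` and `min_matches` parameters are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import Counter


def byte_ngrams(data: bytes, n: int = 8) -> set[bytes]:
    """Return the distinct n-byte substrings of one sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def shared_ngrams(samples: list[bytes], n: int = 8,
                  min_fraction: float = 1.0) -> list[bytes]:
    """Return n-grams present in at least min_fraction of the samples.

    Assumption: this frequency threshold is a simple stand-in for the
    paper's biclustering step, which is not implemented here.
    """
    counts: Counter[bytes] = Counter()
    for sample in samples:
        # Each sample contributes an n-gram at most once.
        counts.update(byte_ngrams(sample, n))
    needed = min_fraction * len(samples)
    return sorted(g for g, c in counts.items() if c >= needed)


def to_yara_rule(name: str, grams: list[bytes], min_matches: int) -> str:
    """Render the n-grams as Yara hex strings with an 'N of them' condition."""
    if not grams:
        raise ValueError("at least one n-gram is needed to build a rule")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, gram in enumerate(grams):
        hex_bytes = " ".join(f"{b:02X}" for b in gram)
        lines.append(f"        $s{i} = {{ {hex_bytes} }}")
    lines.append("    condition:")
    lines.append(f"        {min_matches} of them")
    lines.append("}")
    return "\n".join(lines)
```

Calling `shared_ngrams` on a handful of samples from one family and passing the result to `to_yara_rule` yields rule text that can be saved to a `.yar` file. In practice the candidate n-grams would also need to be checked against benign files to keep the false-positive rate low, which this sketch does not attempt.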

