DeepAI AI Chat
Log In Sign Up

Automatic Yara Rule Generation Using Biclustering

by   Edward Raff, et al.

Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams (n ≥ 8) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing by malware analysts indicates AutoYara could reduce analyst time spent constructing Yara rules by 44-86 more advanced malware that current tools can't handle. Code will be made available at .


Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints

The detection of malware is a critical task for the protection of comput...

A Benchmark Comparison of Python Malware Detection Approaches

While attackers often distribute malware to victims via open-source, com...

Echelon: Two-Tier Malware Detection for Raw Executables to Reduce False Alarms

Existing malware detection approaches suffer from a simplistic trade-off...

On deceiving malware classification with section injection

We investigate how to modify executable files to deceive malware classif...

Improving Zero-Day Malware Testing Methodology Using Statistically Significant Time-Lagged Test Samples

Enterprise networks are in constant danger of being breached by cyber-at...

Automatic Generation of Machine Learning Synthetic Data Using ROS

Data labeling is a time intensive process. As such, many data scientists...

Toward an Evidence-based Design for Reactive Security Policies and Mechanisms

As malware, exploits, and cyber-attacks advance over time, so do the mit...