The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware Development

11/16/2018
by Alejandro Calleja, et al.

Over the last decades, malicious and unwanted software (malware) has grown in both number and sophistication. Malware plays a key role in most of today's cyber attacks and has become established as a commodity in the underground economy. In this work, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increase of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software. We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are most commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry.
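To make the growth claim concrete: an increase of one order of magnitude per decade is equivalent to multiplying by 10^(1/10), roughly 1.26, every year. The sketch below works through that arithmetic only. The function names and the baseline of 1,000 source lines are hypothetical values chosen for illustration, not figures from the dataset, and the abstract reports "nearly" one order of magnitude, so real values would sit somewhat below these.

```python
import math


def annual_growth_factor(orders_per_decade: float = 1.0) -> float:
    """Yearly multiplier implied by a number of orders of magnitude per decade."""
    return 10 ** (orders_per_decade / 10)


def projected_value(baseline: float, years_elapsed: float,
                    orders_per_decade: float = 1.0) -> float:
    """Value after `years_elapsed` years of pure exponential growth."""
    return baseline * 10 ** (orders_per_decade * years_elapsed / 10)


def doubling_time_years(orders_per_decade: float = 1.0) -> float:
    """Years needed for the quantity to double at the given growth rate."""
    return 10 * math.log10(2) / orders_per_decade


if __name__ == "__main__":
    # Hypothetical baseline: 1,000 source lines (illustrative, not from the paper).
    baseline_sloc = 1_000
    print(f"annual factor: {annual_growth_factor():.3f}")
    print(f"doubling time: {doubling_time_years():.2f} years")
    for years in (10, 20, 30):
        print(f"after {years} years: {projected_value(baseline_sloc, years):,.0f}")
```

Under these assumptions, the quantity grows by about 26% per year and doubles roughly every three years, which is what "one order of magnitude per decade" implies.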


