MARFCAT: Transitioning to Binary and Larger Data Sets of SATE IV

We present a second iteration of a machine learning approach to static code analysis and fingerprinting for weaknesses related to security, software engineering, and other domains, using the open-source MARF framework and the MARFCAT application based on it, applied to the data sets of NIST's SATE IV static analysis tool exposition workshop, which include additional test cases, among them new large synthetic cases. To aid detection of weak or vulnerable code, including source or binary code on different platforms, the machine learning approach proved to be fast and accurate for tasks where other tools are either much slower or have much smaller recall of known vulnerabilities. We use signal and NLP processing techniques in our approach to accomplish the identification and classification tasks. MARFCAT's design from the beginning in 2010 made it independent of the language being analyzed, be it source code, bytecode, or binary. In this follow-up work we explore some preliminary results in this area. We also evaluated additional algorithms that were used to process the data.


0.1 Introduction

This is a follow-up work on the first incarnation of MARFCAT detailed in [Mok10d, Mok11]. Thus, the majority of the content here addresses the newer iteration, duplicating only the necessary background and methodology information in reduced form. The reader is referred to the expanded background information and results in that previous work, which is freely accessible online (and whose arXiv version is still occasionally updated).

We elaborate on the details of the expanded methodology and the corresponding results of applying machine learning techniques, along with signal and NLP processing, to static source and binary code analysis in search of weaknesses and vulnerabilities. We use the tool, named MARFCAT, a MARF-based Code Analysis Tool [Mok13], first exhibited at the Static Analysis Tool Exposition (SATE) workshop in 2010 [ODBN10], to machine-learn from the Common Vulnerabilities and Exposures (CVE)-based vulnerable cases as well as the synthetic CWE-based cases, in order to verify the fixed versions as well as the non-CVE-based cases from projects written in the same programming languages. This iteration of the work was prepared based on SATE IV [ODBN12] and uses its updated data set and application. On the NLP side, we employ simple classical NLP techniques (n-grams and various smoothing algorithms), also combined with machine learning, for novel non-NLP applications of detection, classification, and reporting of weaknesses related to vulnerabilities or bad coding practices found in artificial constrained languages, such as programming languages and their compiled counterparts. We compare and contrast the NLP approach with the signal processing approach in our results summary and illustrate concrete results for the same test cases.

We claim that the presented machine learning approach is novel and highly beneficial in static analysis and routine testing of any kind of code, including source code and binary deployments, for its efficiency in terms of speed, relatively high precision, robustness, and its role as a complementary tool to other approaches that do in-depth semantic analysis, by prioritizing those tools' targets. All of this can be used in an automatic manner in distributed and scalable diverse environments to ensure code safety, especially for mission-critical software code in all kinds of systems. It uses spectral, acoustic, and language models to learn and classify such code.

This document, like its predecessor, is a “rolling draft” with several updates expected to be made as the project progresses beyond SATE IV. It is accompanied by updates to the open-source MARFCAT tool itself [Mok13].

Organization

The related work, on which some of the present methodology is based, is referenced in Section 0.2. The data sets are described in Section 0.3. The methodology summary is in Section 0.4. We present some of the results in Section 0.5 from the SAMATE reference test data set. Then we present a brief summary, a description of the limitations of the current realization of the approach, and concluding remarks in Section 0.6. In the Appendix there are classification result tables for specific test cases illustrating the top results by precision.

0.2 Related Work

To our knowledge, this was the first time a machine learning approach was applied to static code analysis, with the first results demonstrated during the SATE2010 workshop [Mok10d, Mok13, ODBN10]. In the same year, a somewhat similar approach was independently presented [BSSV10] for vulnerability classification and prediction using machine learning and SVMs, but working with a different set of data.

Additional related work (of varying degrees of relevance or use) is listed below (this list is not exhaustive). A taxonomy of Linux kernel vulnerability solutions in terms of patches and source code, as well as categories for both, is found in [MLB07]. The core ideas and principles behind MARF's pipeline and testing methodology for the various algorithms in the pipeline, adapted to this case, are found in [Mok08b, Mok10b], as it was the easiest implementation available to accomplish the task. There one can also find the majority of the core options used to set the configuration for the pipeline in terms of the algorithms used. A binary analysis using a machine learning approach for quick scans of files of known types in a large collection of files is described in [MD08]. This includes the NLP and machine learning for NLP tasks in DEFT2010 [Mok10c, Mok10a] with the corresponding DEFT2010App and its predecessor for hand-written image processing, WriterIdentApp [MSS09]. Tlili's 2009 PhD thesis covers topics on automatic detection of safety and security vulnerabilities in open source software [Tli09]. Statistical analysis, ranking, approximation, dealing with uncertainty, and specification inference in static code analysis are found in the works of Engler's team [KTB06, KAYE04, KE03]. Kong et al. further advance static analysis (using parsing, etc.) and specifications to eliminate human specification from static code analysis in [KZL10]. Spectral techniques are used for pattern scanning in malware detection by Eto et al. in [ESI09]. Some researchers propose a general data mining system for incident analysis with data mining engines in [IYE09]. Hanna et al. describe a synergy between static and dynamic analysis for the detection of software security vulnerabilities in [HLYD09], paving the way to unify the two analysis methods. Other researchers propose a MEDUSA system for metamorphic malware dynamic analysis using API signatures in [NJG10]. Some of the statistical NLP techniques we used are described at length in [MS02]. BitBlaze (and its web counterpart, WebBlaze) are other recent tools that do fast static and dynamic binary code analysis for vulnerabilities, developed at Berkeley [Son10a, Son10b]. For wavelets, for example, Li et al. [LjXP09] have shown that wavelet transforms and k-means classification can be used to quickly identify communicating applications on a network, which is relevant to our study of code in any form, text or binary.

0.3 Data Sets

We use the SAMATE data set to practically validate our approach. The SAMATE reference data set contains C/C++, Java, and PHP language tracks comprising CVE-selected cases as well as stand-alone cases and large generated synthetic C and Java test cases (CWE-based, with many variants of different known weaknesses). SATE IV expanded some cases from SATE2010 by increasing the version numbers, and dropped some other cases (e.g., Chrome).

The C/C++ and Java test cases of various client and server OSS software are compilable into binary and object code, while the synthetic C and Java cases generated for various CWE entries provide for greater scalability testing (and are also compilable). The CVE-selected cases had a vulnerable version of the software in question with a list of CVEs attached to it, as well as the most recent known fixed version within the same minor revision number. One of the goals for the CVE-based cases is to detect the known weaknesses outlined in the CVEs using static code analysis and also to verify whether they were really fixed in the “fixed version” [ODBN12]. The cases with known CVEs and CWEs were used as the training models described in the methodology. The summary below is a union of the data sets from SATE2010 and SATE IV. The preliminary list of the CVEs that the organizers expect to be located in the test cases was collected from the NVD [NIS13a, ODBN12] for Wireshark 1.2.0, Dovecot, Tomcat 5.5.13, Jetty 6.1.16, and Wordpress 2.0. The specific test cases with versions and languages at the time included, CVE-selected:

  • C: Wireshark 1.2.0 (vulnerable) and Wireshark 1.2.18 (fixed, up from Wireshark 1.2.9 in SATE2010)

  • C: Dovecot (vulnerable) and Dovecot (fixed)

  • C++: Chrome 5.0.375.54 (vulnerable) and Chrome 5.0.375.70 (fixed)

  • Java: Tomcat 5.5.13 (vulnerable) and Tomcat 5.5.33 (fixed, up from Tomcat 5.5.29 in SATE2010)

  • Java: Jetty 6.1.16 (vulnerable) and Jetty 6.1.26 (fixed)

  • PHP: Wordpress 2.0 (vulnerable) and Wordpress 2.2.3 (fixed)

originally non-CVE selected in SATE2010:

  • C: Dovecot

  • Java: Pebble 2.5-M2

Synthetic CWE cases produced by the SAMATE team:

  • C: Synthetic C covering 118 CWEs and K files

  • Java: Synthetic Java covering CWEs and K files

0.4 Methodology

In this section we outline the methodology of our approach to static source code analysis. Most of this methodology is an updated description from [Mok10d]. The line number determination methodology is also detailed in [Mok10d, ODBN10], but is not replicated here. Thus, the methodology’s principles overview is described in Section 0.4.1, the knowledge base construction is in Section 0.4.2, machine learning categories in Section 0.4.3, and the high-level algorithmic description is in Section 0.4.4.

0.4.1 Methodology Overview

The core methodology principles include:

  • Machine learning and dynamic programming

  • Spectral and signal processing techniques

  • NLP n-gram and smoothing techniques (add-δ, Witten-Bell, MLE, etc.)

We use signal processing techniques, i.e. presently we do not parse or otherwise work at the syntax and semantics levels. We treat the source code as a “signal”, equivalent to binary, where each n-gram (n=2 presently, i.e. two consecutive characters or, more generally, bytes) is used to construct a sample amplitude value in the signal. In the NLP pipeline, we similarly treat the source code as “characters”, where each n-gram is used to construct the language model.
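To make the loading step concrete, the sketch below (a minimal illustration under the stated assumptions, not the actual MARF loader code) folds consecutive byte pairs of a source or binary file into normalized amplitude samples, as if the file were 16-bit PCM audio:

// Sketch: interpret a code file as a "signal" of bigram-derived amplitudes.
// Illustrative only; the real MARF loaders differ in details (PCM encoding,
// normalization, sampling-frequency assumptions, etc.).
import java.nio.file.Files;
import java.nio.file.Paths;

public class BigramSignalSketch {
    public static double[] toSignal(String path) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        double[] signal = new double[Math.max(0, bytes.length - 1)];
        for (int i = 0; i + 1 < bytes.length; i++) {
            // Fold two consecutive bytes (a bigram) into one amplitude sample,
            // normalized into [-1, 1] as if it were 16-bit PCM audio.
            int sample = (bytes[i] << 8) | (bytes[i + 1] & 0xFF);
            signal[i] = sample / 32768.0;
        }
        return signal;
    }

    public static void main(String[] args) throws Exception {
        double[] s = toSignal(args[0]);
        System.out.println("Loaded " + s.length + " amplitude samples");
    }
}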

We show the system examples of files with weaknesses and MARFCAT learns them by computing spectral signatures using signal processing techniques, or various language models (depending on the options), from the CVE-selected test cases. When some of the mentioned techniques are applied (e.g., filters, silence/noise removal, other preprocessing and feature extraction techniques), the line number information is lost as a part of this process.

When we test, we compute either how similar to or distant each file is from the known trained-on weakness-laden files, or we compare the trained language models with the unseen language fragments in the NLP pipeline. In part, the methodology can approximately be seen as analogous to how fuzzy signature-based “antivirus” or IDS software systems detect bad signatures, except that with a large number of machine learning and signal processing algorithms and fuzzy matching, we test to find out which combination gives the highest precision and best run-time.
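For intuition, the testing step amounts to a nearest-cluster comparison of an unseen file's feature vector against the per-CVE/CWE vectors learned during training. The sketch below is illustrative only (the class and method names are hypothetical and do not reflect MARF's actual classification API); it uses the Chebyshev distance, one of the distance classifiers that appears in the results:

import java.util.Map;

// Sketch: compare an unseen file's feature vector to trained per-CVE/CWE
// mean ("cluster") vectors and pick the closest one. Hypothetical structures;
// MARF's classification modules are more elaborate.
public class NearestClusterSketch {
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    static String classify(double[] unseen, Map<String, double[]> trainedClusters) {
        String best = null;
        double bestDistance = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : trainedClusters.entrySet()) {
            double d = chebyshev(unseen, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey(); // e.g., "CVE-2009-2562" or "CWE-119"
            }
        }
        return best;
    }
}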

At present, however, we are looking at whole files instead of parsing the finer-grained details of patches and weak code fragments. This aspect lowers the precision, but makes it relatively fast to scan all the code files.

0.4.2 CVEs and CWEs – the Knowledge Base

The CVE-selected test cases serve as a source of the knowledge base used to gather information on what known weak code “looks like” in signal form [Mok10d], which we store as spectral signatures clustered per CVE or CWE (Common Weakness Enumeration). The introduction by the SAMATE team of a large synthetic code base with CWEs serves as a part of the knowledge base learning as well. Thus, we:

  • Teach the system from the CVE-based cases

  • Test on the CVE-based cases

  • Test on the non-CVE-based cases

For synthetic cases we do similarly:

  • Teach the system from the CWE-based synthetic cases

  • Test on the CWE-based synthetic cases

  • Test on the CVE and non-CVE-based cases for CWEs from synthetic cases

We create index files in XML, in a format similar to that of SATE, to index all the files of the test case under study. The CVE-based cases, after the initial index generation, are manually annotated from the NVD database before being fed to the system. The script that does the initial index gathering in the OSS distribution of MARFCAT is called collect-files-meta.pl, written in Perl. The synthetic cases required a special modification to it, resulting in collect-files-meta-synthetic.pl, where there are no CVEs to fill in but CWEs alone, with the explanations auto-prefilled, since the information in the synthetic cases is not arbitrary and is controlled for identification.

0.4.3 Categories for Machine Learning

The two primary groups of classes we train and test on are naturally the CVEs [NIS13a, NIS13b] and CWEs [VM13]. The advantage of CVEs is their precision, and the associated meta knowledge from [NIS13a, NIS13b] can all be aggregated and used to scan successive versions of the same software or derived products (e.g., WebKit in multiple browsers). CVEs are also generally uniquely mapped to CWEs. The CWEs as a primary class, however, offer broader categories of the kinds of weaknesses there may be, but are not yet well assigned and associated with CVEs, so we observe a loss of precision. Since we do not parse, we generally cannot deduce weakness types or even simple-looking aspects like the line numbers where the weak code may be. So we resort to the secondary categories, which are usually tied into the first two and which we also machine-learn along, such as issue types (sink, path, fix) and line numbers.

0.4.4 Algorithms

In our methodology we systematically test and select the best (a tradeoff between speed and accuracy) combination(s) of the algorithm implementations available to us and then use only those for subsequent testing. This methodology is augmented with the cases when the knowledge base for the same code type is learned from multiple sources (e.g., several independent C test cases).

Signal Pipeline

Algorithmically speaking, the steps that are performed in the machine-learning signal-based analysis are shown in Algorithm 1. The specific algorithms come from the classical literature and other sources and are detailed in [Mok08b] and the related works. To be more specific for this work, the loading typically refers to the interpretation of the files being scanned in terms of bytes forming amplitude values in a signal (at an assumed 8kHz or 16kHz sampling frequency, for example) using either a uni-gram, bi-gram, or tri-gram approach. Then, the preprocessing is allowed to be none at all (“raw”, the fastest), normalization, traditional frequency-domain filters, wavelet-based filters, etc. Feature extraction involves reducing an arbitrary-length signal to a fixed-length feature vector of what are thought to be the most relevant features in the signal (e.g., spectral features via FFT or LPC, min-max amplitudes, etc.). The classification stage is then separated into either training, by learning the incoming feature vectors (usually k-means clusters, median clusters, or plain feature vector collection, combined with, e.g., neural network training), or testing them against the previously learned models.
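As a toy illustration of the feature extraction step described above, the sketch below reduces an arbitrary-length signal to a fixed-length vector of spectral magnitudes using a naive DFT (an assumption-laden stand-in; MARF uses an actual FFT implementation and other feature extractors such as LPC and min-max):

// Sketch: reduce an arbitrary-length signal to a fixed-length feature vector
// of spectral magnitudes via a naive DFT. Illustrative only.
public class SpectralFeaturesSketch {
    static double[] extractFeatures(double[] signal, int numFeatures) {
        double[] features = new double[numFeatures];
        int n = signal.length;
        for (int k = 0; k < numFeatures; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im -= signal[t] * Math.sin(angle);
            }
            features[k] = Math.sqrt(re * re + im * im); // magnitude of the k-th bin
        }
        return features;
    }
}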

// Construct an index mapping CVEs to files and locations within files
1 Compile meta-XML index files from the CVE reports (line numbers, CVE, CWE, fragment size, etc.). Partly done by a Perl script and partly annotated manually;
2 foreach source code base, binary code base do
       // Presently in these experiments we use simple mean clusters of feature vectors or unigram language models per default MARF specification ([Mok08b])
3       Train the system based on the meta index files to build the knowledge base (learn);
4       begin
5             Load (interpret as a wave signal of uni-/bi-/tri-gram bytes);
6             Preprocess (none, FFT-filters, wavelets, normalization, etc.);
7             Extract features (FFT, LPC, min-max, etc.);
8             Train (Similarity, Distance, Neural Network, etc.);
9            
10       end
11      
12      Test on the training data for the same case (e.g., Tomcat 5.5.13 on Tomcat 5.5.13) with the same annotations to make sure the results make sense by being high and deduce the best algorithm combinations for the task;
13       begin
14             Load (same);
15             Preprocess (same);
16             Extract features (same);
17             Classify (compare to the trained k-means clusters, medians, or language models);
18             Report;
19            
20       end
21      
22      Similarly test on the testing data for the same case (e.g., Tomcat 5.5.13 on Tomcat 5.5.13) without the annotations as a sanity check;
23       Test on the testing data for the fixed case of the same software (e.g., Tomcat 5.5.13 on Tomcat 5.5.33);
24       Test on the testing data for the general non-CVE case (e.g., Tomcat 5.5.13 on Pebble or synthetic);
25      
26 end foreach
Algorithm 1 Machine-learning-based static code analysis testing algorithm using the signal pipeline

NLP Pipeline

The steps that are performed in the NLP and machine-learning based analysis are presented in Algorithm 2. The specific algorithms again come from the classical literature (e.g., [MS02]) and are detailed in [Mok10b] and the related works. To be more specific for this work, the loading typically refers to the interpretation of the files being scanned in terms of n-grams (a uni-gram, bi-gram, or tri-gram approach) and the associated statistical smoothing algorithms, the results of which (a vector, 2D, or 3D matrix) are stored.

1 Compile meta-XML index files from the CVE reports (line numbers, CVE, CWE, fragment size, etc.). Partly done by a Perl script and partly annotated manually;
2 foreach source code base, binary code base do
       // Presently in these experiments we use simple unigram language models per default MARF specification ([Mok10b])
3       Train the system based on the meta index files to build the knowledge base (learn);
4       begin
5             Load (n-gram);
6             Train (statistical smoothing estimators);
7            
8       end
9      
10      Test on the training data for the same case (e.g., Tomcat 5.5.13 on Tomcat 5.5.13) with the same annotations to make sure the results make sense by being high and deduce the best algorithm combinations for the task;
11       begin
12             Load (same);
13             Classify (compare to the trained language models);
14             Report;
15            
16       end
17      
18      Similarly test on the testing data for the same case (e.g., Tomcat 5.5.13 on Tomcat 5.5.13) without the annotations as a sanity check;
19       Test on the testing data for the fixed case of the same software (e.g., Tomcat 5.5.13 on Tomcat 5.5.33);
20       Test on the testing data for the general non-CVE case (e.g., Tomcat 5.5.13 on Pebble or synthetic);
21      
22 end foreach
Algorithm 2 Machine-learning-based static code analysis testing algorithm using the NLP pipeline
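To illustrate the NLP pipeline's core idea, the sketch below implements a character-bigram language model with add-δ smoothing, which can score how well an unseen code fragment fits a model trained on weakness-laden files (a minimal sketch; MARF's NLP modules, smoothing estimators, and storage of 2D/3D n-gram matrices are more elaborate):

import java.util.HashMap;
import java.util.Map;

// Sketch: a character-bigram language model with add-delta smoothing.
// Illustrative of the NLP pipeline's idea; not MARF's actual implementation.
public class AddDeltaBigramSketch {
    private final Map<String, Integer> bigramCounts = new HashMap<>();
    private final Map<Character, Integer> unigramCounts = new HashMap<>();
    private final double delta;
    private static final int VOCAB = 256; // assumed byte/character vocabulary size

    public AddDeltaBigramSketch(double delta) { this.delta = delta; }

    public void train(String text) {
        for (int i = 0; i + 1 < text.length(); i++) {
            bigramCounts.merge(text.substring(i, i + 2), 1, Integer::sum);
            unigramCounts.merge(text.charAt(i), 1, Integer::sum);
        }
    }

    // Smoothed P(next | prev) = (count(prev,next) + delta) / (count(prev) + delta * |V|)
    public double probability(char prev, char next) {
        int bi = bigramCounts.getOrDefault("" + prev + next, 0);
        int uni = unigramCounts.getOrDefault(prev, 0);
        return (bi + delta) / (uni + delta * VOCAB);
    }

    // Log-likelihood of an unseen fragment under this model; higher means
    // the fragment is more similar to the trained-on (e.g., weak) code.
    public double logLikelihood(String text) {
        double ll = 0.0;
        for (int i = 0; i + 1 < text.length(); i++) {
            ll += Math.log(probability(text.charAt(i), text.charAt(i + 1)));
        }
        return ll;
    }
}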

0.4.5 Binary and Bytecode Analysis

In this iteration we also perform preliminary Java bytecode and compiled C code static analysis and produce results using the same signal processing and NLP techniques, combined with machine learning and data mining. At this writing, the NIST SAMATE synthetic reference data set for Java and C was used. The algorithms presented in Section 0.4.4 are used as-is in this scenario, with modifications to the index files. The modifications include removal of the line numbers, source code fragments, and lines-of-text counts (which are largely meaningless for binaries and are ignored). The byte counts may be recomputed, and capturing a byte offset instead of a line number was projected. The filenames of the index files were updated to include -bin in them to differentiate them from the original index files describing the source code. Another point is that, at the moment, the simplifying assumption is that each compilable source file, e.g., .java or .c, produces a corresponding .class or .o file that we examine. We do not examine inner classes or linked executables or libraries at this point.

0.4.6 Wavelets

As a part of a collaboration project with Dr. Yankui Sun from Tsinghua University, wavelet-based signal processing for the purposes of noise filtering is being introduced with this work to compare it to no filtering or to FFT-based classical filtering. It has also been shown in [LjXP09] that wavelet-aided filtering could be used as a fast preprocessing method for network application identification and traffic analysis [LKW08].

We rely in part on the algorithm and methodology found in [AS01, SCL03, KBC05, KBC06], and at this point only a separating 1D discrete wavelet transform (SDWT) has been tested (see Section 0.5.4).

Since the original wavelet implementation [SCL03] is in MATLAB [Mat12a, Sch07], we used in part the codegen tool from the MATLAB Coder toolbox [Mat12b, Mat12c] to generate a rough C/C++ equivalent in order to (manually) translate some fragments into Java (the language of MARF and MARFCAT). The specific function for up/down sampling used by the wavelets function in [Mot09], also written in C/C++, was translated to Java in MARF as well, with unit tests added.
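For readers unfamiliar with the wavelet preprocessing step, the sketch below shows one decomposition level of a simple 1D Haar DWT, splitting a signal into low-pass (approximation) and high-pass (detail) halves (an illustrative stand-in only; the separating DWT and the MATLAB-derived code actually used in MARF are more involved):

// Sketch: one level of a 1D Haar discrete wavelet transform.
// Illustrative only; MARF's SDWT implementation differs.
public class HaarDwtSketch {
    public static double[][] decompose(double[] signal) {
        int half = signal.length / 2;
        double[] approx = new double[half];
        double[] detail = new double[half];
        for (int i = 0; i < half; i++) {
            double a = signal[2 * i];
            double b = signal[2 * i + 1];
            approx[i] = (a + b) / Math.sqrt(2.0); // low-frequency (approximation) content
            detail[i] = (a - b) / Math.sqrt(2.0); // high-frequency (detail/"noise") content
        }
        return new double[][] { approx, detail };
    }
}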

0.4.7 Demand-Driven Distributed Evaluation with GIPSY

To enhance the scalability of the approach, we convert the MARFCAT stand-alone application into a distributed one using an eductive (demand-driven) model of computation implemented in the General Intensional Programming System (GIPSY)'s multi-tier run-time system [Han10, Ji11, Vas05, Paq09], which can be executed distributively using Jini (Apache River) or JMS [JMP13].

To adapt the application to the GIPSY's multi-tier architecture, we create problem-specific generator and worker tiers (PS-DGT and PS-DWT, respectively) for the MARFCAT application. The generator(s) produce demands of what needs to be computed in the form of a file (a source code file or a compiled binary) to be evaluated and deposit such demands into a store managed by the demand store tier (DST) as pending. Workers pick up pending demands from the store and then process them (all tiers run on multiple nodes) using a traditional MARFCAT instance. Once the result (a Warning instance) is computed, the PS-DWT deposits it back into the store with the status set to computed. The generator “harvests” all computed results (warnings) and produces the final report for a test case. Multiple test cases can be evaluated simultaneously, or a single case can be evaluated distributively. This approach helps to cope with large amounts of data and to avoid recomputing warnings that have already been computed and cached in the DST.

The initial basic experiment assumes the PS-DWTs have the training set data and the test cases available to them from the start (either by a copy or via NFS/CIFS-mounted volumes); thus, the distributed evaluation concerns only the classification task as of this version. The follow-up work will remove this limitation.

In this setup a demand represents a file (a path) to scan (actually an instance of the FileItem object), which is deposited into the DST. The PS-DWT picks up the demand, checks the file against the training set that is already there, and returns a ResultSet object back into the DST under the same demand signature that was used to deposit the path to scan. The result set is sorted from the most likely to the least likely, with a value corresponding to the distance or similarity. The PS-DGT picks up the result sets and does the final output aggregation and saves the report in one of the desired report formats (see Section 0.4.8), picking up the top two results from the result set and testing against a threshold to accept or reject the file (path) as vulnerable or not. This effectively splits the monolithic MARFCAT application into two halves, distributing the work to be done, where the classification half is embarrassingly parallel. A simplified sketch of this generator/worker interaction is given after the list of assumptions below.

Simplifying assumptions:

  • Test case data and training sets are present on each node (physical or virtual) in advance (via a copy or a CIFS or NFS volume), so no demand-driven training occurs, only classification

  • The demand is assumed to contain only the file information to be examined (FileItem)

  • The PS-DWT assumes a single pre-defined configuration, i.e. the configuration for MARFCAT's options is not a part of the demand

  • The PS-DWT assumes CVE- or CWE-based testing based on its local settings and not via the configuration in a demand
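The sketch below (referred to above) caricatures the generator/worker split with a local queue standing in for the demand store; the class names FileDemand and Warning are used loosely for illustration and do not reflect the actual GIPSY DST/PS-DGT/PS-DWT or MARFCAT APIs:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the demand-driven split: a generator deposits file "demands",
// a worker classifies them, and results flow back for aggregation.
// Illustrative only; the real GIPSY tiers are distributed, not a local queue.
public class DemandDrivenSketch {
    record FileDemand(String path) {}
    record Warning(String path, String cveOrCwe, double distance) {}

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<FileDemand> pending = new LinkedBlockingQueue<>();
        BlockingQueue<Warning> computed = new LinkedBlockingQueue<>();

        // PS-DGT role: deposit one demand per file to scan.
        for (String path : args) {
            pending.put(new FileDemand(path));
        }

        // PS-DWT role: pick up pending demands and classify them against the
        // locally available training set (placeholder classification here).
        Thread worker = new Thread(() -> {
            FileDemand d;
            while ((d = pending.poll()) != null) {
                computed.add(new Warning(d.path(), "CWE-119", 0.42));
            }
        });
        worker.start();
        worker.join();

        // PS-DGT role again: harvest computed warnings into a final report.
        for (Warning w : computed) {
            System.out.println(w.path() + " -> " + w.cveOrCwe() + " (" + w.distance() + ")");
        }
    }
}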

0.4.8 Export

SATE

By default MARFCAT produces the report data in the SATE XML format, according to the SATE IV requirements. In this iteration other formats are being considered and realized. To enable multiple-format output, the MARFCAT report generation data structures were adapted for case-based output.

Forensic Lucid

The first of these is Forensic Lucid, the topic of author Mokhov's PhD, a language to specify and evaluate digital forensic cases by uniformly encoding the evidence and witness accounts (evidential statement or knowledge base) of any case from multiple sources (system specs, logs, human accounts, etc.) as a description of an incident on which to further perform investigation and event reconstruction. Following the data export in Forensic Lucid in the preceding work [MPD08, MPD10, Mok08a], we use it as a format for evidential processing of the results produced by MARFCAT. The work [MPD08] provides details of the language; it will suffice to mention here that the report generated by MARFCAT in Forensic Lucid is a collection of warnings as observations with the hierarchical notion of nested contexts of warning and location information. These form an evidential statement in Forensic Lucid. An example scenario where such evidence compiled via a MARFCAT Forensic Lucid report could be used is web-based application and web browser-based incident investigations of fraud, XSS, buffer overflows, etc., linking CVE/CWE-based evidence analysis of the code (binary or source) security bugs with the associated web-based malware propagation or attacks, to provide possible events where specific attacks can be traced back to specific security vulnerabilities.

SAFES

The third format, SAFES, for which the export functionality is not done as of this writing, is becoming a standard for reporting such information, and the SATE organizers began endorsing it as an alternative during SATE IV.

0.4.9 Experiments

Below is the current summary of the conducted experiments:

  • Re-testing of the newer fixed versions such as Wireshark 1.2.18 and Tomcat 5.5.33.

  • Half-based testing of the previous versions by reducing the training set by half but testing for all known CVEs or CWEs for Wireshark 1.2.18, Tomcat 5.5.33, and Chrome 5.0.375.54.

  • Testing the new test cases of Dovecot, Jetty 6.1.x, and Wordpress 2.x as well as Synthetic C and Synthetic Java.

  • Binary test on the Synthetic C and Synthetic Java test cases.

  • Performing tests using wavelets for preprocessing.

0.5 Results

The preliminary results of applying our methodology are outlined in this section. We summarize the top precisions per test case using either signal processing or NLP processing of the CVE-based and synthetic cases and their application to the general cases. Subsequent sections detail some of the findings and issues of MARFCAT's result releases across different versions. In some experiments we compare the results with the previously obtained ones [Mok10d] where compatible and appropriate.

The results are currently being released gradually, in an iterative manner, as they were obtained through the corresponding versions of MARFCAT while it was being designed and developed.

0.5.1 Preliminary Results Summary

The results below summarize the half-training/full-testing data vs. the regular results reported in [Mok10d].

  • Wireshark:

    • CVEs (signal): 92.68%, CWEs (signal): 86.11%,

    • CVEs (NLP): 83.33%, CWEs (NLP): 58.33%

  • Tomcat:

    • CVEs (signal): 83.72%, CWEs (signal): 81.82%,

    • CVEs (NLP): 87.88%, CWEs (NLP): 39.39%

  • Chrome:

    • CVEs (signal): 90.91%, CWEs (signal): 100.00%,

    • CVEs (NLP): 100.00%, CWEs (NLP): 88.89%

  • Dovecot (new, 2.x):

    • 14 warnings; but they all appear to be code quality issues or false positives

    • (the code is very hard to follow and is severely undocumented)

  • Pebble:

    • none found during quick testing

  • Dovecot 1.2.x: (ongoing as of this writing)

  • Jetty: (ongoing as of this writing)

  • Wordpress: (ongoing as of this writing)

What follows are some select statistical measurements of the precision in recognizing CVEs and CWEs under different configurations using the signal processing and NLP processing techniques.

“Second guess” statistics are provided to test the hypothesis that if our first estimate of a CVE/CWE is incorrect, the next one in line is probably the correct one. Both guesses are counted as correct if the first guess is correct.
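Concretely, under this counting rule the first- and second-guess precisions can be computed roughly as in the sketch below (an illustration of the statistic only, not MARFCAT's actual reporting code):

import java.util.List;

// Sketch: first- vs. second-guess precision. A second guess counts as correct
// if either the top-ranked or the runner-up class matches the expected one;
// a correct first guess is counted for both statistics.
public class GuessStatsSketch {
    record Outcome(String expected, String firstGuess, String secondGuess) {}

    static double[] precisions(List<Outcome> outcomes) {
        int first = 0, second = 0;
        for (Outcome o : outcomes) {
            if (o.expected().equals(o.firstGuess())) {
                first++;
                second++; // counted for both when the first guess is correct
            } else if (o.expected().equals(o.secondGuess())) {
                second++;
            }
        }
        double n = outcomes.size();
        return new double[] { 100.0 * first / n, 100.0 * second / n };
    }
}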

A sample signal visualization of the middle of a vulnerable file, packet-afs.c in Wireshark 1.2.0, corresponding to CVE-2009-2562, is shown in Figure 1 in wave form. The low “dips” represent the text line endings (coupled with a preceding character (byte) in bigrams (two PCM-signed bytes, assumed encoded at 8kHz, representing the amplitude; normalized), which are often semicolons, closing or opening braces, brackets, or parentheses). Only a small fragment of roughly 300 bytes in length is shown, to give a visual sense of the nature of the signal we are dealing with.

In Figure 2, there are 3 spectrograms generated for the same file packet-afs.c. The first two columns represent the CVE-2009-2562-vulnerable file; both versions are the same, with enhanced contrast to show the detail. The subsequent pairs are of the same file in Wireshark 1.2.9 and Wireshark 1.2.18, where CVE-2009-2562 is no longer present. Small changes are noticeable primarily in the bottom-left and top-right corners of the images, and even smaller ones elsewhere in the images.

Figure 1: A wave graph of a fraction of the CVE-2009-2562-vulnerable packet-afs.c in Wireshark 1.2.0

Figure 2: Spectrograms of CVE-2009-2562-vulnerable packet-afs.c in Wireshark 1.2.0, fixed Wireshark 1.2.9 and Wireshark 1.2.18

0.5.2 Version SATE-IV.1

Half-Training Data For Training and Full For Testing

This is one of the experiments run per discussion with Aurelien Delaitre and the SATE organizers. The main idea is to test the robustness and precision of the MARFCAT approach by artificially reducing the known weaknesses (their locations) to learn from by 50%, but testing on the whole 100%, to see how much precision degrades with such a reduction. Specifically, we supply only CWE-class testing for this experiment (CVE classes make little sense here). Only the first 50% of the entries were used for training for Wireshark 1.2.0, Tomcat 5.5.13, and Chrome 5.0.375.54, while the full 100% were used to test the precision changes. The results are below.

It should be noted that CWE classification is generally less accurate due to a lot of dissimilar items being “stuffed” (by the NVD) into very broad categories such as NVD-CWE-Other and NVD-CWE-noinfo when the data were collected. Additionally, since we arbitrarily picked the first 50% of the training data, some of the CWEs were simply left out completely and not trained on if they fell entirely in the omitted half, so their individual precision is obviously 0% when tested for.

The archive contains the .log and the .xml files (the latter for now are in SATE format only with the scientific notation +E3 removed). The best reports are:

The experiments are subdivided into regular (signal) and NLP based testing.

Signal
  • Wireshark 1.2.0:

Reduction of the training data by half resulted in a precision drop compared to the previous result (best 86.11%, see the NIST report [Mok11], vs. 72.22% overall now).

    New results (by algorithms, then by CWEs):

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -raw -fft -cheb 26 10 72.22
    1st 2 -cweid -nopreprep -raw -fft -diff 26 10 72.22
    1st 3 -cweid -nopreprep -raw -fft -eucl 22 14 61.11
    1st 4 -cweid -nopreprep -raw -fft -cos 25 23 52.08
    1st 5 -cweid -nopreprep -raw -fft -mink 17 19 47.22
    1st 6 -cweid -nopreprep -raw -fft -hamming 17 19 47.22
    2nd 1 -cweid -nopreprep -raw -fft -cheb 30 6 83.33
    2nd 2 -cweid -nopreprep -raw -fft -diff 30 6 83.33
    2nd 3 -cweid -nopreprep -raw -fft -eucl 24 12 66.67
    2nd 4 -cweid -nopreprep -raw -fft -cos 32 16 66.67
    2nd 5 -cweid -nopreprep -raw -fft -mink 23 13 63.89
    2nd 6 -cweid -nopreprep -raw -fft -hamming 24 12 66.67
    guess run class good bad %
    1st 1 NVD-CWE-noinfo 68 39 63.55
    1st 2 CWE-20 38 22 63.33
    1st 3 CWE-119 18 14 56.25
    1st 4 NVD-CWE-Other 9 8 52.94
    1st 5 CWE-189 0 12 0.00
    2nd 1 NVD-CWE-noinfo 84 23 78.50
    2nd 2 CWE-20 39 21 65.00
    2nd 3 CWE-119 29 3 90.62
    2nd 4 NVD-CWE-Other 11 6 64.71
    2nd 5 CWE-189 0 12 0.00
  • Tomcat 5.5.13:

A drop from 81.82% (see the NIST report's Table 7, p. 70) to a top result of 75% (about 7 percentage points) as a result of the training data reduction by 50%.

    New precision estimates:

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -raw -fft -diff 6 2 75.00
    1st 2 -cweid -nopreprep -raw -fft -hamming 5 9 35.71
    2nd 1 -cweid -nopreprep -raw -fft -diff 6 2 75.00
    2nd 2 -cweid -nopreprep -raw -fft -hamming 8 6 57.14
    guess run class good bad %
    1st 1 CWE-264 1 0 100.00
    1st 2 CWE-255 2 0 100.00
    1st 3 CWE-200 1 0 100.00
    1st 4 CWE-22 6 3 66.67
    1st 5 CWE-79 1 4 20.00
    1st 6 CWE-119 0 2 0.00
    1st 7 CWE-20 0 2 0.00
    2nd 1 CWE-264 1 0 100.00
    2nd 2 CWE-255 2 0 100.00
    2nd 3 CWE-200 1 0 100.00
    2nd 4 CWE-22 7 2 77.78
    2nd 5 CWE-79 3 2 60.00
    2nd 6 CWE-119 0 2 0.00
    2nd 7 CWE-20 0 2 0.00
  • Chrome 5.0.375.54:

The Chrome result is included for completeness even though it is not a test case for SATE IV. Chrome's result is poor for some reason: a drop from 100% (Table 5, p. 68) to 44.44%, but it covers only 9 entries. The first result in the table is erroneous, i.e., it has poor recall (the sum of good and bad should total 9).

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -raw -fft -cos 2 0 100.00
    1st 2 -cweid -nopreprep -raw -fft -eucl 4 5 44.44
    1st 3 -cweid -nopreprep -raw -fft -cheb 3 6 33.33
    1st 4 -cweid -nopreprep -raw -fft -hamming 3 6 33.33
    1st 5 -cweid -nopreprep -raw -fft -mink 2 7 22.22
    2nd 1 -cweid -nopreprep -raw -fft -cos 2 0 100.00
    2nd 2 -cweid -nopreprep -raw -fft -eucl 4 5 44.44
    2nd 3 -cweid -nopreprep -raw -fft -cheb 4 5 44.44
    2nd 4 -cweid -nopreprep -raw -fft -hamming 4 5 44.44
    2nd 5 -cweid -nopreprep -raw -fft -mink 3 6 33.33
    guess run class good bad %
    1st 1 CWE-94 6 3 66.67
    1st 2 CWE-20 3 2 60.00
    1st 3 CWE-79 2 2 50.00
    1st 4 NVD-CWE-noinfo 2 2 50.00
    1st 5 NVD-CWE-Other 1 7 12.50
    1st 6 CWE-399 0 4 0.00
    1st 7 CWE-119 0 4 0.00
    2nd 1 CWE-94 6 3 66.67
    2nd 2 CWE-20 3 2 60.00
    2nd 3 CWE-79 3 1 75.00
    2nd 4 NVD-CWE-noinfo 3 1 75.00
    2nd 5 NVD-CWE-Other 2 6 25.00
    2nd 6 CWE-399 0 4 0.00
    2nd 7 CWE-119 0 4 0.00
NLP

Generally this genre of classification was poor, as before, in this experiment, with all results around 40-45% precision.

  • Wireshark 1.2.0:

    New results (by algorithms, then by CWEs):

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -char -unigram -add-delta 15 21 41.67
    2nd 1 -cweid -nopreprep -char -unigram -add-delta 23 13 63.89
    guess run class good bad %
    1st 1 NVD-CWE-noinfo 11 7 61.11
    1st 2 NVD-CWE-Other 1 1 50.00
    1st 3 CWE-119 2 3 40.00
    1st 4 CWE-20 1 9 10.00
    1st 5 CWE-189 0 1 0.00
    2nd 1 NVD-CWE-noinfo 17 1 94.44
    2nd 2 NVD-CWE-Other 1 1 50.00
    2nd 3 CWE-119 4 1 80.00
    2nd 4 CWE-20 1 9 10.00
    2nd 5 CWE-189 0 1 0.00
  • Tomcat 5.5.13:

Intriguingly, the best result is higher than with all of the data in the past report (42.42% below vs. the previous 39.39%).

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -char -unigram -add-delta 14 19 42.42
    2nd 1 -cweid -nopreprep -char -unigram -add-delta 18 15 54.55
    guess run class good bad %
    1st 1 CWE-255 1 0 100.00
    1st 2 CWE-264 2 0 100.00
    1st 3 CWE-119 1 0 100.00
    1st 4 CWE-20 1 0 100.00
    1st 5 CWE-22 7 9 43.75
    1st 6 CWE-200 1 3 25.00
    1st 7 CWE-79 1 6 14.29
    1st 8 CWE-16 0 1 0.00
    2nd 1 CWE-255 1 0 100.00
    2nd 2 CWE-264 2 0 100.00
    2nd 3 CWE-119 1 0 100.00
    2nd 4 CWE-20 1 0 100.00
    2nd 5 CWE-22 11 5 68.75
    2nd 6 CWE-200 1 3 25.00
    2nd 7 CWE-79 1 6 14.29
    2nd 8 CWE-16 0 1 0.00
  • Chrome 5.0.375.54:

Here the drop is roughly twice as large (44.44% vs. the previous 88.89%).

    guess run algorithms good bad %
    1st 1 -cweid -nopreprep -char -unigram -add-delta 4 5 44.44
    2nd 1 -cweid -nopreprep -char -unigram -add-delta 5 4 55.56
    guess run class good bad %
    1st 1 NVD-CWE-noinfo 1 0 100.00
    1st 2 CWE-79 1 0 100.00
    1st 3 CWE-20 1 0 100.00
    1st 4 CWE-94 1 1 50.00
    1st 5 CWE-399 0 1 0.00
    1st 6 NVD-CWE-Other 0 2 0.00
    1st 7 CWE-119 0 1 0.00
    2nd 1 NVD-CWE-noinfo 1 0 100.00
    2nd 2 CWE-79 1 0 100.00
    2nd 3 CWE-20 1 0 100.00
    2nd 4 CWE-94 1 1 50.00
    2nd 5 CWE-399 0 1 0.00
    2nd 6 NVD-CWE-Other 0 2 0.00
    2nd 7 CWE-119 1 0 100.00

0.5.3 Version SATE-IV.2

These runs use the same SATE2010 training data for Tomcat 5.5.13 and Wireshark 1.2.0 to test the updated fixed versions (relative to SATE2010), Tomcat 5.5.33 and Wireshark 1.2.18, using the same settings. For this run, no new CVEs that may have appeared since the previous fixed versions of Tomcat 5.5.29 and Wireshark 1.2.9 in 2010 were added to the training data for the versions being tested, so as to see whether any old issues reoccur or not. In this short summary, both signal and NLP testing reveal none of the same known issues.

Verbose log files and input index files are also supplied for most cases.

[TODO]

0.5.4 Version SATE-IV.5

Wavelet Experiments

The preliminary experiments using the separating discrete wavelet transform (SDWT) filter are summarized in Table 2 and Table 3 for CVEs and CWEs, respectively. For comparison, the low-pass FFT filter is used for the same purpose, as shown in Table 1 and Table 4, respectively. For the CVE experiments, the wavelet transform overall produces better precision across configurations (a larger number of configurations produce a higher-precision result) than the low-pass FFT filter. While the top precision result remains the same, this shows that when filtering is wanted, the wavelet transform is perhaps a better choice for some configurations, e.g., from rank 4 and below, as well as for the second-guess statistics. The very top result for the CWE-based processing with the separating DWT so far exceeds that of the low-pass FFT, but then drops below it for the subsequent configurations. -cos was dropped from Table 3 for technical reasons. In Figure 3 there is a spectrogram with the SDWT preprocessing in the pipeline. More exploration in this area is under way with more advanced wavelet filters than the simple separating DWT filter, to see whether they would outperform -raw or not, while at the same time minimizing the run-time performance decrease caused by the extra filtering.

Figure 3: A spectrogram of CVE-2009-2562-vulnerable packet-afs.c in Wireshark 1.2.0, after SDWT

0.6 Conclusion

We review the current results of this experimental work, its current shortcomings, advantages, and practical implications.

0.6.1 Shortcomings

Following is a list of the most prominent issues with the presented approach. Some of them are more “permanent”, while others are solvable and intended to be addressed in the future work. Specifically:

  • Looking at a signal is less intuitive visually for code analysis by humans. (However, a “spectrogram” of the problematic code can be produced in some cases.)

  • Line numbers are a problem (they are easily “filtered out” as high-frequency “noise”, etc.). A whole “relativistic” and machine learning methodology was developed for the line numbers in [Mok10d] to compensate for that. Generally, when CVEs are the primary class, by accurately identifying the CVE number one can get all the other pertinent details from the CVE database, including patches and line numbers, making this a lesser issue.

  • Accuracy depends on the quality of the knowledge base (see Section 0.4.2) collected. Some of this collection and annotation is manual in order to get the indexes right, and is hence error-prone. “Garbage in – garbage out.”

  • To detect more of the useful CVE or CWE signatures in non-CVE and non-CWE cases requires large knowledge bases (human-intensive to collect), which can perhaps be shared by different vendors via a common format, such as SATE, SAFES or Forensic Lucid.

  • No path tracing (since no parsing is present); no slicing, semantic annotations, context, locality of reference, etc. The “sink”, “path”, and “fix” results in the reports also have to be machine-learned.

  • A lot of algorithms and their combinations to try (currently permutations) to get the best top N. This is, however, also an advantage of the approach as the underlying framework can quickly allow for such testing.

  • File-level training vs. fragment-level training – presently the classes are trained based on the entire files where weaknesses are found instead of on the known file fragments from CVE-reported patches. The latter would be more fine-grained and precise than whole-file classification, but slower. However, overall the file-level processing is more of a man-hour limitation than a technological one.

  • The separating wavelet filter rather adversely affects the precision in some configurations, reducing it to low levels.

  • No nice GUI. Presently the application is script/command-line based.

0.6.2 Advantages

There are some key advantages of the approach presented. Some of them follow:

  • Relatively fast (e.g., Wireshark’s files train and test in about 3 minutes) on a now-commodity desktop or a laptop.

  • Language-independent (no parsing) – given enough examples, the approach can apply to any language, i.e. the methodology is the same no matter whether C, C++, Java, or any other source or binary language (PHP, C#, VB, Perl, bytecode, assembly, etc.) is used.

  • Can automatically learn a large knowledge base to test on known and unknown cases.

  • Can be used to quickly pre-scan projects for further analysis by humans or other tools that do in-depth semantic analysis as a means to prioritize.

  • Can learn from SATE’08, SATE’09, SATE’10, and SATE IV reports.

  • Generally, high precision (and recall) in CVE and CWE detection, even at the file level.

  • A lot of algorithms and their combinations to select the best for a particular task or class (see Section 0.4.3).

  • Can cope with altered code or code used in other projects (e.g., a lot of problems in Chrome were actually found in WebKit, which is used by several browsers).

0.6.3 Practical Implications

Most practical implications of all static code analyzers are obvious—to detect source code weaknesses and report them appropriately to the developers. We outline additional implications this approach brings to the arsenal below:

  • The approach can be used on any target language without modifications to the methodology or knowing the syntax of the language. Thus, it scales to any popular and new language analysis with a very small amount of effort.

  • The approach can be nearly identically transposed onto compiled binaries and bytecode, detecting vulnerable deployments and installations, akin to virus scanning of binaries; but instead of scanning for infected binaries, one would scan for security-weak binaries on site deployments to alert system administrators to upgrade their packages.

  • Can learn from binary signatures from other tools like Snort [Sou13].

  • The approach is easily extendable to the embedded code and mission-critical code found in aircraft, spacecraft, and various autonomous systems.

0.6.4 Future Work

There is a great number of possibilities for future work. This includes improvements to the code base of MARFCAT as well as resolving unfinished scenarios and results, addressing the shortcomings in Section 0.6.1, testing more algorithms and combinations from the related work, and moving on to other programming languages (e.g., ASP, C#). Furthermore, we plan to foster collaboration with academic, industry (such as VeraCode, Coverity), and government vendors and others who have vast data sets, to test the full potential of the approach with others and the community as a whole. Then, we plan to move on to dynamic code analysis as well, applying similar techniques there. Other near-future work items include realization of the SVM-based classification, data export in the SAFES and Forensic Lucid formats, a number of wavelet filtering improvements, and distributed GIPSY cluster-based evaluation.

To improve detection and classification of malware in network traffic or otherwise, we employ the machine learning approach to static pcap payload malicious code analysis and fingerprinting using the open-source MARF framework and its MARFCAT application, originally designed for the SATE static analysis tool exposition workshop. We first train on the known malware pcap data and measure the precision, and then test on unseen, but known, data and select the best available machine learning combination to do so. This work elaborates on the details of the methodology and the corresponding results of applying the machine learning techniques, along with signal processing and NLP alike, to static network packet analysis in search of malicious code in the packet capture (pcap) data, drawing on the related malicious code analysis work [BOB10, SEZS01, SXCM04, HJ07, HRSS07, Sue07, RM08, BOA07]. We show the system examples of pcap files with malware and MARFCAT learns them by computing spectral signatures using signal processing techniques. When we test, we compute how similar or distant each file is from the known trained-on malware-laden files. In part, the methodology can approximately be seen as analogous to how signature-based “antivirus” or IDS software systems detect bad signatures, except that with a large number of machine learning and signal processing algorithms, we test to find out which combination gives the highest precision and best run-time. At present, however, we are looking at whole pcap files. This aspect lowers the precision, but is fast for scanning all the files. The malware database with known malware, the reports, etc. serves as a knowledge base to machine-learn from. Thus, we primarily:

  • Teach the system from the known cases of malware from their pcap data

  • Test on the known cases

  • Test on the unseen cases

0.6.5 Acknowledgments

The authors would like to express thanks and gratitude to the following for their help, resources, advice, and otherwise support and assistance:

  • NIST SAMATE group

  • Dr. Brigitte Jaumard

  • Sleiman Rabah

  • Open-Source Community

This work is partially supported by the Faculty of ENCS, Concordia University, NSERC, and the 2011-2012 CCSEP scholarship. The wavelet-related work of Yankui Sun is partially supported by the National Natural Science Foundation of China (No. 60971006).

.7 Classification Result Tables

What follows are result tables with the top classification results ranked from most precise at the top. These include the configuration settings for MARF by means of options (the algorithm implementations are at their defaults [Mok08b]).

guess run algorithms good bad %
1st 1 -nopreprep -low -fft -cheb -flucid 37 4 90.24
1st 2 -nopreprep -low -fft -diff -flucid 37 4 90.24
1st 3 -nopreprep -low -fft -eucl -flucid 27 14 65.85
1st 4 -nopreprep -low -fft -hamming -flucid 23 18 56.10
1st 5 -nopreprep -low -fft -mink -flucid 22 19 53.66
1st 6 -nopreprep -low -fft -cos -flucid 36 114 24.00
2nd 1 -nopreprep -low -fft -cheb -flucid 38 3 92.68
2nd 2 -nopreprep -low -fft -diff -flucid 38 3 92.68
2nd 3 -nopreprep -low -fft -eucl -flucid 34 7 82.93
2nd 4 -nopreprep -low -fft -hamming -flucid 26 15 63.41
2nd 5 -nopreprep -low -fft -mink -flucid 31 10 75.61
2nd 6 -nopreprep -low -fft -cos -flucid 39 111 26.00
guess run class good bad %
1st 1 CVE-2009-3829 6 0 100.00
1st 2 CVE-2009-4376 6 0 100.00
1st 3 CVE-2010-0304 6 0 100.00
1st 4 CVE-2010-2286 6 0 100.00
1st 5 CVE-2010-2283 6 0 100.00
1st 6 CVE-2009-3551 6 0 100.00
1st 7 CVE-2009-3549 6 0 100.00
1st 8 CVE-2009-3241 15 9 62.50
1st 9 CVE-2009-2560 9 6 60.00
1st 10 CVE-2010-1455 30 24 55.56
1st 11 CVE-2009-2563 6 5 54.55
1st 12 CVE-2009-2562 6 5 54.55
1st 13 CVE-2009-2561 6 7 46.15
1st 14 CVE-2009-4378 6 7 46.15
1st 15 CVE-2010-2287 6 7 46.15
1st 16 CVE-2009-3550 6 8 42.86
1st 17 CVE-2009-3243 13 23 36.11
1st 18 CVE-2009-4377 12 22 35.29
1st 19 CVE-2010-2285 6 11 35.29
1st 20 CVE-2009-2559 6 11 35.29
1st 21 CVE-2010-2284 6 12 33.33
1st 22 CVE-2009-3242 7 16 30.43
2nd 1 CVE-2009-3829 6 0 100.00
2nd 2 CVE-2009-4376 6 0 100.00
2nd 3 CVE-2010-0304 6 0 100.00
2nd 4 CVE-2010-2286 6 0 100.00
2nd 5 CVE-2010-2283 6 0 100.00
2nd 6 CVE-2009-3551 6 0 100.00
2nd 7 CVE-2009-3549 6 0 100.00
2nd 8 CVE-2009-3241 16 8 66.67
2nd 9 CVE-2009-2560 10 5 66.67
2nd 10 CVE-2010-1455 44 10 81.48
2nd 11 CVE-2009-2563 6 5 54.55
2nd 12 CVE-2009-2562 6 5 54.55
2nd 13 CVE-2009-2561 6 7 46.15
2nd 14 CVE-2009-4378 6 7 46.15
2nd 15 CVE-2010-2287 13 0 100.00
2nd 16 CVE-2009-3550 6 8 42.86
2nd 17 CVE-2009-3243 13 23 36.11
2nd 18 CVE-2009-4377 12 22 35.29
2nd 19 CVE-2010-2285 6 11 35.29
2nd 20 CVE-2009-2559 6 11 35.29
2nd 21 CVE-2010-2284 6 12 33.33
2nd 22 CVE-2009-3242 8 15 34.78
Table 1: CVE Stats for Wireshark 1.2.0, Low-Pass FFT Filter Preprocessing
guess run algorithms good bad %
1st 1 -nopreprep -sdwt -fft -diff -spectrogram -graph -flucid 37 4 90.24
1st 2 -nopreprep -sdwt -fft -cheb -spectrogram -graph -flucid 37 4 90.24
1st 3 -nopreprep -sdwt -fft -eucl -spectrogram -graph -flucid 27 14 65.85
1st 4 -nopreprep -sdwt -fft -hamming -spectrogram -graph -flucid 26 15 63.41
1st 5 -nopreprep -sdwt -fft -mink -spectrogram -graph -flucid 22 19 53.66
1st 6 -nopreprep -sdwt -fft -cos -spectrogram -graph -flucid 38 65 36.89
2nd 1 -nopreprep -sdwt -fft -diff -spectrogram -graph -flucid 39 2 95.12
2nd 2 -nopreprep -sdwt -fft -cheb -spectrogram -graph -flucid 39 2 95.12
2nd 3 -nopreprep -sdwt -fft -eucl -spectrogram -graph -flucid 35 6 85.37
2nd 4 -nopreprep -sdwt -fft -hamming -spectrogram -graph -flucid 29 12 70.73
2nd 5 -nopreprep -sdwt -fft -mink -spectrogram -graph -flucid 31 10 75.61
2nd 6 -nopreprep -sdwt -fft -cos -spectrogram -graph -flucid 39 64 37.86
guess run class good bad %
1st 1 CVE-2009-3829 6 0 100.00
1st 2 CVE-2009-2562 6 0 100.00
1st 3 CVE-2009-4378 6 0 100.00
1st 4 CVE-2010-2286 6 0 100.00
1st 5 CVE-2010-0304 6 0 100.00
1st 6 CVE-2009-4376 6 0 100.00
1st 7 CVE-2010-2283 6 0 100.00
1st 8 CVE-2009-3551 6 0 100.00
1st 9 CVE-2009-3550 6 0 100.00
1st 10 CVE-2009-3549 6 0 100.00
1st 11 CVE-2009-2563 6 2 75.00
1st 12 CVE-2009-2560 11 4 73.33
1st 13 CVE-2009-3241 15 9 62.50
1st 14 CVE-2010-1455 31 23 57.41
1st 15 CVE-2009-2561 6 6 50.00
1st 16 CVE-2010-2287 6 6 50.00
1st 17 CVE-2009-2559 6 6 50.00
1st 18 CVE-2009-3243 16 16 50.00
1st 19 CVE-2010-2285 6 7 46.15
1st 20 CVE-2009-4377 12 16 42.86
1st 21 CVE-2010-2284 6 9 40.00
1st 22 CVE-2009-3242 6 17 26.09
2nd 1 CVE-2009-3829 6 0 100.00
2nd 2 CVE-2009-2562 6 0 100.00
2nd 3 CVE-2009-4378 6 0 100.00
2nd 4 CVE-2010-2286 6 0 100.00
2nd 5 CVE-2010-0304 6 0 100.00
2nd 6 CVE-2009-4376 6 0 100.00
2nd 7 CVE-2010-2283 6 0 100.00
2nd 8 CVE-2009-3551 6 0 100.00
2nd 9 CVE-2009-3550 6 0 100.00
2nd 10 CVE-2009-3549 6 0 100.00
2nd 11 CVE-2009-2563 6 2 75.00
2nd 12 CVE-2009-2560 12 3 80.00
2nd 13 CVE-2009-3241 16 8 66.67
2nd 14 CVE-2010-1455 43 11 79.63
2nd 15 CVE-2009-2561 6 6 50.00
2nd 16 CVE-2010-2287 12 0 100.00
2nd 17 CVE-2009-2559 6 6 50.00
2nd 18 CVE-2009-3243 19 13 59.38
2nd 19 CVE-2010-2285 6 7 46.15
2nd 20 CVE-2009-4377 12 16 42.86
2nd 21 CVE-2010-2284 6 9 40.00
2nd 22 CVE-2009-3242 8 15 34.78
Table 2: CVE Stats for Wireshark 1.2.0, Separating DWT Wavelet Filter Preprocessing
guess run algorithms good bad %
1st 1 -cweid -nopreprep -sdwt -fft -diff -flucid 31 5 86.11
1st 2 -cweid -nopreprep -sdwt -fft -eucl -flucid 29 7 80.56
1st 3 -cweid -nopreprep -sdwt -fft -mink -flucid 17 19 47.22
1st 4 -cweid -nopreprep -sdwt -fft -hamming -flucid 14 22 38.89
2nd 1 -cweid -nopreprep -sdwt -fft -diff -flucid 33 3 91.67
2nd 2 -cweid -nopreprep -sdwt -fft -eucl -flucid 34 2 94.44
2nd 3 -cweid -nopreprep -sdwt -fft -mink -flucid 27 9 75.00
2nd 4 -cweid -nopreprep -sdwt -fft -hamming -flucid 23 13 63.89
guess run class good bad %
1st 1 CWE399 4 0 100.00
1st 2 CWE189 4 0 100.00
1st 3 NVD-CWE-Other 11 1 91.67
1st 4 CWE20 30 10 75.00
1st 5 NVD-CWE-noinfo 34 34 50.00
1st 6 CWE119 8 8 50.00
2nd 1 CWE399 4 0 100.00
2nd 2 CWE189 4 0 100.00
2nd 3 NVD-CWE-Other 11 1 91.67
2nd 4 CWE20 34 6 85.00
2nd 5 NVD-CWE-noinfo 53 15 77.94
2nd 6 CWE119 11 5 68.75
Table 3: CWE Stats for Wireshark 1.2.0, Separating DWT Wavelet Filter Preprocessing
guess run algorithms good bad %
1st 1 -cweid -nopreprep -low -fft -diff -flucid 30 6 83.33
1st 2 -cweid -nopreprep -low -fft -cheb -flucid 30 6 83.33
1st 3 -cweid -nopreprep -low -fft -eucl -flucid 25 11 69.44
1st 4 -cweid -nopreprep -low -fft -mink -flucid 20 16 55.56
1st 5 -cweid -nopreprep -low -fft -cos -flucid 36 40 47.37
1st 6 -cweid -nopreprep -low -fft -hamming -flucid 12 24 33.33
2nd 1 -cweid -nopreprep -low -fft -diff -flucid 31 5 86.11
2nd 2 -cweid -nopreprep -low -fft -cheb -flucid 31 5 86.11
2nd 3 -cweid -nopreprep -low -fft -eucl -flucid 30 6 83.33
2nd 4 -cweid -nopreprep -low -fft -mink -flucid 22 14 61.11
2nd 5 -cweid -nopreprep -low -fft -cos -flucid 48 28 63.16
2nd 6 -cweid -nopreprep -low -fft -hamming -flucid 16 20 44.44
guess run class good bad %
1st 1 CWE399 6 1 85.71
1st 2 CWE20 48 12 80.00
1st 3 NVD-CWE-Other 18 7 72.00
1st 4 CWE189 6 3 66.67
1st 5 NVD-CWE-noinfo 61 61 50.00
1st 6 CWE119 14 19 42.42
2nd 1 CWE399 6 1 85.71
2nd 2 CWE20 48 12 80.00
2nd 3 NVD-CWE-Other 18 7 72.00
2nd 4 CWE189 6 3 66.67
2nd 5 NVD-CWE-noinfo 78 44 63.93
2nd 6 CWE119 22 11 66.67
Table 4: CWE Stats for Wireshark 1.2.0, Low-Pass FFT Filter Preprocessing

.8 Forensic Lucid Report Example

An example report encoding the reported data in Forensic Lucid for Wireshark 1.2.0 after using simple FFT-based feature extraction and the Chebyshev distance as a classifier. The report provides the same data, compressed, as the SATE XML, but in the Forensic Lucid syntax for automated reasoning and event reconstruction during a digital investigation. The example is an evidential statement context encoded for use in the investigator's knowledge base for a particular case.

results/report-noprepreprawfftchebflucid-training-data.ipl

References

  • [AS01] A. F. Abdelnour and I. W. Selesnick. Nearly symmetric orthogonal wavelet bases. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), May 2001.
  • [BOA07] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario. Automated classification and analysis of Internet malware. Technical report, University of Michigan, April 2007. http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-530-07.pdf.
  • [BOB10] Hamad Binsalleeh, Thomas Ormerod, Amine Boukhtouta, Prosenjit Sinha, Amr M. Youssef, Mourad Debbabi, and Lingyu Wang. On the analysis of the Zeus botnet crimeware toolkit. In Eighth Annual Conference on Privacy, Security and Trust, PST 2010, August 17-19, 2010, Ottawa, Ontario, Canada, pages 31–38. IEEE, 2010.
  • [BSSV10] Mehran Bozorgi, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker. Beyond heuristics: Learning to classify vulnerabilities and predict exploits. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, KDD’10, pages 105–114, New York, NY, USA, 2010. ACM.
  • [ESI09] Masashi Eto, Kotaro Sonoda, Daisuke Inoue, Katsunari Yoshioka, and Koji Nakao. A proposal of malware distinction method based on scan patterns using spectrum analysis. In Proceedings of the 16th International Conference on Neural Information Processing: Part II, ICONIP’09, pages 565–572, Berlin, Heidelberg, 2009. Springer-Verlag.
  • [Han10] Bin Han. Towards a multi-tier runtime system for GIPSY. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, 2010.
  • [HJ07] K. Hwang and D. Jung. Anti-malware expert system. In H. Martin, editor, Proceedings of the 17th Virus Bulletin International Conference, pages 9–17, Vienna, Austria: The Pentagon, Abingdon, OX143YP, England, September 2007.
  • [HLYD09] Aiman Hanna, Hai Zhou Ling, Xiaochun Yang, and Mourad Debbabi. A synergy between static and dynamic analysis for the detection of software security vulnerabilities. In Robert Meersman, Tharam S. Dillon, and Pilar Herrero, editors, OTM Conferences (2), volume 5871 of Lecture Notes in Computer Science, pages 815–832. Springer, 2009.
  • [HRSS07] N. Hnatiw, T. Robinson, C. Sheehan, and N. Suan. Pimp my PE: Parsing malicious and malformed executables. In H. Martin, editor, Proceedings of the 17th Virus Bulletin International Conference, pages 9–17, Vienna, Austria: The Pentagon, Abingdon, OX143YP, England, September 2007.
  • [IYE09] Daisuke Inoue, Katsunari Yoshioka, Masashi Eto, Masaya Yamagata, Eisuke Nishino, Jun’ichi Takeuchi, Kazuya Ohkouchi, and Koji Nakao. An incident analysis system NICTER and its analysis engines based on data mining techniques. In Proceedings of the 15th International Conference on Advances in Neuro-Information Processing – Volume Part I, ICONIP’08, pages 579–586, Berlin, Heidelberg, 2009. Springer-Verlag.
  • [Ji11] Yi Ji. Scalability evaluation of the GIPSY runtime system. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, March 2011.
  • [JMP13] Yi Ji, Serguei A. Mokhov, and Joey Paquet. Unifying and refactoring DMF to support concurrent Jini and JMS DMS in GIPSY. In Bipin C. Desai, Sudhir P. Mudur, and Emil I. Vassev, editors, Proceedings of the Fifth International C* Conference on Computer Science and Software Engineering (C3S2E’12), pages 36–44, New York, NY, USA, June 2010–2013. ACM. Online e-print http://arxiv.org/abs/1012.2860.
  • [KAYE04] Ted Kremenek, Ken Ashcraft, Junfeng Yang, and Dawson Engler. Correlation exploitation in error ranking. In Foundations of Software Engineering (FSE), 2004.
  • [KBC05] Manesh Kokare, P. K. Biswas, and B. N. Chatterji. Texture image retrieval using new rotated complex wavelet filters. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 6(35):1168–1178, 2005.
  • [KBC06] Manesh Kokare, P. K. Biswas, and B. N. Chatterji. Rotation-invariant texture image retrieval using rotated complex wavelet filters. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 6(36):1273–1282, 2006.
  • [KE03] Ted Kremenek and Dawson Engler. Z-ranking: Using statistical analysis to counter the impact of static analysis approximations. In SAS 2003, 2003.
  • [KTB06] Ted Kremenek, Paul Twohey, Godmar Back, Andrew Ng, and Dawson Engler. From uncertainty to belief: Inferring the specification within. In Proceedings of the 7th Symposium on Operating System Design and Implementation, 2006.
  • [KZL10] Ying Kong, Yuqing Zhang, and Qixu Liu. Eliminating human specification in static analysis. In Proceedings of the 13th international conference on Recent advances in intrusion detection, RAID’10, pages 494–495, Berlin, Heidelberg, 2010. Springer-Verlag.
  • [LjXP09] Ru Li, Ou jie Xi, Bin Pang, Jiao Shen, and Chun-Lei Ren. Network application identification based on wavelet transform and k-means algorithm. In Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS2009), volume 1, pages 38–41, November 2009.
  • [LKW08] Kriangkrai Limthong, Fukuda Kensuke, and Pirawat Watanapongse. Wavelet-based unwanted traffic time series analysis. In 2008 International Conference on Computer and Electrical Engineering, pages 445–449. IEEE Computer Society, 2008.
  • [Mat12a] MathWorks. MATLAB. [online], 2000–2012. http://www.mathworks.com/products/matlab/.
  • [Mat12b] MathWorks. MATLAB Coder. [online], 2012. http://www.mathworks.com/help/toolbox/coder/coder_product_page.html, last viewed June 2012.
  • [Mat12c] MathWorks. MATLAB Coder: codegen – generate C/C++ code from MATLAB code. [online], 2012. http://www.mathworks.com/help/toolbox/coder/ref/codegen.html, last viewed June 2012.
  • [MD08] Serguei A. Mokhov and Mourad Debbabi. File type analysis using signal processing techniques and machine learning vs. file unix utility for forensic analysis. In Oliver Goebel, Sandra Frings, Detlef Guenther, Jens Nedon, and Dirk Schadt, editors, Proceedings of the IT Incident Management and IT Forensics (IMF’08), LNI140, pages 73–85. GI, September 2008.
  • [MLB07] Serguei A. Mokhov, Marc-André Laverdière, and Djamel Benredjem. Taxonomy of Linux kernel vulnerability solutions. In Innovative Techniques in Instruction Technology, E-learning, E-assessment, and Education, pages 485–493, University of Bridgeport, U.S.A., 2007. Proceedings of CISSE/SCSS’07.
  • [Mok08a] Serguei A. Mokhov. Encoding forensic multimedia evidence from MARF applications as Forensic Lucid expressions. In Tarek Sobh, Khaled Elleithy, and Ausif Mahmood, editors, Novel Algorithms and Techniques in Telecommunications and Networking, proceedings of CISSE’08, pages 413–416, University of Bridgeport, CT, USA, December 2008. Springer. Printed in January 2010.
  • [Mok08b] Serguei A. Mokhov. Study of best algorithm combinations for speech processing tasks in machine learning using median vs. mean clusters in MARF. In Bipin C. Desai, editor, Proceedings of C3S2E’08, pages 29–43, Montreal, Quebec, Canada, May 2008. ACM.
  • [Mok10a] Serguei A. Mokhov. Complete complimentary results report of the MARF’s NLP approach to the DEFT 2010 competition. [online], June 2010. http://arxiv.org/abs/1006.3787.
  • [Mok10b] Serguei A. Mokhov. Evolution of MARF and its NLP framework. In Proceedings of C3S2E’10, pages 118–122. ACM, May 2010.
  • [Mok10c] Serguei A. Mokhov. L’approche MARF à DEFT 2010: A MARF approach to DEFT 2010. In Proceedings of the 6th DEFT Workshop (DEFT’10), pages 35–49. LIMSI / ATALA, July 2010. DEFT 2010 Workshop at TALN 2010; online at http://deft.limsi.fr/actes/2010/pdf/2_clac.pdf.
  • [Mok10d] Serguei A. Mokhov. The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT. [online], October 2010. Online at http://arxiv.org/abs/1010.2511.
  • [Mok11] Serguei A. Mokhov. The use of machine learning with signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT. Technical Report NIST SP 500-283, NIST, October 2011. Report: http://www.nist.gov/manuscript-publication-search.cfm?pub_id=909407, online e-print at http://arxiv.org/abs/1010.2511.
  • [Mok13] Serguei A. Mokhov. MARFCAT – MARF-based Code Analysis Tool. Published electronically within the MARF project, http://sourceforge.net/projects/marf/files/Applications/MARFCAT/, 2010–2013. Last viewed April 2012.
  • [Mot09] Motorola. Efficient polyphase FIR resampler for numpy: Native C/C++ implementation of the function upfirdn(). [online], 2009. http://code.google.com/p/upfirdn/source/browse/upfirdn.
  • [MPD08] Serguei A. Mokhov, Joey Paquet, and Mourad Debbabi. Formally specifying operational semantics and language constructs of Forensic Lucid. In Oliver Göbel, Sandra Frings, Detlef Günther, Jens Nedon, and Dirk Schadt, editors, Proceedings of the IT Incident Management and IT Forensics (IMF’08), LNI140, pages 197–216. GI, September 2008. Online at http://subs.emis.de/LNI/Proceedings/Proceedings140/gi-proc-140-014.pdf.
  • [MPD10] Serguei A. Mokhov, Joey Paquet, and Mourad Debbabi. Towards automatic deduction and event reconstruction using Forensic Lucid and probabilities to encode the IDS evidence. In S. Jha, R. Sommer, and C. Kreibich, editors, Proceedings of RAID’10, LNCS 6307, pages 508–509. Springer, September 2010.
  • [MS02] Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 2002.
  • [MSS09] Serguei A. Mokhov, Miao Song, and Ching Y. Suen. Writer identification using inexpensive signal processing techniques. In Tarek Sobh and Khaled Elleithy, editors, Innovations in Computing Sciences and Software Engineering; Proceedings of CISSE’09, pages 437–441. Springer, December 2009. ISBN: 978-90-481-9111-6, online at: http://arxiv.org/abs/0912.5502.
  • [NIS13a] NIST. National Vulnerability Database. [online], 2005–2013. http://nvd.nist.gov/.
  • [NIS13b] NIST. National Vulnerability Database statistics. [online], 2005–2013. http://web.nvd.nist.gov/view/vuln/statistics.
  • [NJG10] Vinod P. Nair, Harshit Jain, Yashwant K. Golecha, Manoj Singh Gaur, and Vijay Laxmi. MEDUSA: MEtamorphic malware dynamic analysis using signature from API. In Proceedings of the 3rd International Conference on Security of Information and Networks, SIN’10, pages 263–269, New York, NY, USA, 2010. ACM.
  • [ODBN10] Vadim Okun, Aurelien Delaitre, Paul E. Black, and NIST SAMATE. Static Analysis Tool Exposition (SATE) 2010. [online], 2010. See http://samate.nist.gov/SATE2010Workshop.html.
  • [ODBN12] Vadim Okun, Aurelien Delaitre, Paul E. Black, and NIST SAMATE. Static Analysis Tool Exposition (SATE) IV. [online], March 2012. See http://samate.nist.gov/SATE.html.
  • [Paq09] Joey Paquet. Distributed eductive execution of hybrid intensional programs. In Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference (COMPSAC’09), pages 218–224, Seattle, Washington, USA, July 2009. IEEE Computer Society.
  • [RM08] A. Newaz M. E. Rafiq and Yida Mao. A novel approach for automatic adjudification of new malware. In Nagib Callaos, William Lesso, C. Dale Zinn, Jorge Baralt, Jaouad Boukachour, Christopher White, Thilidzi Marwala, and Fulufhelo V. Nelwamondo, editors, Proceedings of the 12th World Multi-Conference on Systemics, Cybernetics and Informatics (WM-SCI’08), volume V, pages 137–142, Orlando, Florida, USA, June 2008. IIIS.
  • [Sch07] Rob Schreiber. MATLAB. Scholarpedia, 2(6):2929, 2007. http://www.scholarpedia.org/article/MATLAB.
  • [SCL03] Ivan Selesnick, Shihua Cai, Keyong Li, Levent Sendur, and A. Farras Abdelnour. MATLAB implementation of wavelet transforms. Technical report, Electrical Engineering, Polytechnic University, Brooklyn, NY, 2003. Online at http://taco.poly.edu/WaveletSoftware/.
  • [SEZS01] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Proceedings of IEEE Symposium on Security and Privacy, pages 38–49, Oakland, 2001.
  • [Son10a] Dawn Song. BitBlaze: Security via binary analysis. [online], 2010. Online at http://bitblaze.cs.berkeley.edu.
  • [Son10b] Dawn Song. WebBlaze: New techniques and tools for web security. [online], 2010. Online at http://webblaze.cs.berkeley.edu.
  • [Sou13] Sourcefire. Snort: Open-source network intrusion prevention and detection system (IDS/IPS). [online], 1999–2013. http://www.snort.org/.
  • [Sue07] M. Suenaga. Virus linguistics – searching for ethnic words. In H. Martin, editor, Proceedings of the 17th Virus Bulletin International Conference, pages 9–17, Vienna, Austria: The Pentagon, Abingdon, OX143YP, England, September 2007.
  • [SXCM04] A. H. Sung, J. Xu, P. Chavez, and S. Mukkamala. Static analyzer of vicious executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications Conference, pages 326–334, December 2004.
  • [The13] The MARF Research and Development Group. The Modular Audio Recognition Framework and its Applications. [online], 2002–2013. http://marf.sf.net and http://arxiv.org/abs/0905.1235, last viewed April 2012.
  • [Tli09] Syrine Tlili. Automatic detection of safety and security vulnerabilities in open source software. PhD thesis, Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada, 2009. ISBN: 9780494634165.
  • [Vas05] Emil Iordanov Vassev. General architecture for demand migration in the GIPSY demand-driven execution engine. Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada, June 2005. ISBN 0494102969.
  • [VM13] Various contributors and MITRE. Common Weakness Enumeration (CWE) – a community-developed dictionary of software weakness types. [online], 2006–2013. See http://cwe.mitre.org.