Integration of the Static Analysis Results Interchange Format in CogniCrypt

07/04/2019
by   Sriteja Kummita, et al.
Universität Paderborn
Fraunhofer

Background - Software companies increasingly rely on static analysis tools to detect potential bugs and security vulnerabilities in their software products. Over the past decade, a growing number of commercial and open-source static analysis tools have been developed and maintained. Each tool comes with its own reporting format, which prevents the easy integration of multiple analysis tools into a single interface such as the Static Analysis Server Protocol (SASP). In 2017, a collaborative industry effort, including Microsoft and GrammaTech, proposed the Static Analysis Results Interchange Format (SARIF) to address this issue. SARIF is a standardized format for encoding static analysis warnings, which allows analysis reports to be imported and exported between different tools.

Purpose - This paper explains the SARIF format through examples and presents a proof of concept of a connector that allows the static analysis tool CogniCrypt to generate and export its results in SARIF format.

Design/Approach - We conduct a cross-sectional study between the SARIF format and CogniCrypt's output format before detailing the implementation of the connector. The study aims to identify the components of interest in CogniCrypt that the SARIF export module can complete.

Originality/Value - The integration of SARIF into CogniCrypt described in this paper can be reused to integrate SARIF into other static analysis tools.

Conclusion - After detailing the SARIF format, we present an initial implementation that integrates SARIF into CogniCrypt. Once it takes advantage of all the features provided by SARIF, CogniCrypt will be able to support SASP.

1. Introduction

In order to detect errors in their programs, software companies and individual developers use static analysis tools to analyze their software. From the correctness of the program, to security vulnerabilities, to compliance with given standards, to performance, static analysis is widely used in practice. Current tools typically generate reports in their own format for their own interface, or provide means to export general reports in XML or PDF format for example. As a result, software developers often experience a significant overhead parsing and aggregating the reports generated by different analysis tools in order to obtain one complete report. To address this problem, CA Technologies (Technologies, [n. d.]), Cryptsoft (Ltd., [n. d.]), FireEye (FireEye, [n. d.]), GrammaTech (GrammaTech, [n. d.]), Hewlett Packard Enterprise (HPE) ((HPE), [n. d.]), Micro Focus (Focus, [n. d.]), Microsoft (https://www.microsoft.com, [n. d.]), Semmle (Semmle, [n. d.]), and others, proposed a common reporting format for all static analysis tools, the Static Analysis Results Interchange Format, abbreviated as Sarif.

Figure 1. An illustration of Sasp integrated with static analysis clients. When a client (e.g., Eclipse) requests static analysis results from Sasp (1), Sasp requests results from all other clients (2a–c). It receives them (3a–c), and sends the aggregated report back to Eclipse (4).

Sarif is a standard developed under OASIS (GrammaTech, 2018). The technical committee of Sarif includes members from several static analysis tool vendors, including GrammaTech, and from large-scale users (GrammaTech, 2018). Sarif is a JSON-based format designed to report not only the results of an analysis but also its metadata, including schema, URI, and version. It was created with the goal of unifying the output format of different static analysis tools, making it easy to integrate the reports into a single interface, which is the main objective of the Static Analysis Server Protocol (Sasp) (GrammaTech, 2018).

Sasp acts as a service where clients, such as the Eclipse Integrated Development Environment (IDE) (https://www.eclipse.org/ide/), IntelliJ IDEA (https://www.jetbrains.com/idea/), or Visual Studio Code (https://code.visualstudio.com/), can request static analysis results obtained from other analysis tools for a given program to analyze, as illustrated in Figure 1. For such a service to respond to a query quickly, it is necessary to enforce a common output standard so that all analysis warnings can be aggregated efficiently. Sasp achieves this by leveraging Sarif.

We explore how to make an analysis tool support Sarif, in order to eventually incorporate it in the Sasp system, thus enabling interoperability and potential integration with other static analysis tools. In particular, we focus on CogniCrypt (Ram Kamath, 2017), a static analysis tool that detects misuses of cryptographic APIs in Java programs. The current version of CogniCrypt returns its results in its own format, which is used to display warning traces in Eclipse. CogniCrypt is implemented as an Eclipse plugin, and provides software developers with two main functionalities:

  • generating secure implementations of common cryptographic programming tasks,

  • and analyzing developer code in the IDE and reporting existing misuses of cryptographic libraries.

In this paper, we first present CogniCrypt’s original reporting format in Section 2. We then detail the Sarif format and explain its structure and syntax in Section 3. Then, Section 4 describes our implementation of the connector that exports CogniCrypt results in Sarif format. Finally, Section 5 summarizes the outcomes of this paper and presents future work.

2. The CogniCrypt Report Format

public class Crypto {
  public void getKey(int keySize) throws NoSuchAlgorithmException{
    KeyGenerator c = KeyGenerator.getInstance("AES");
    if (keySize > 0)
      c.init(512);
    else
      c.generateKey();
    c.generateKey();
  }
}
Listing 1: Code example containing a ConstraintError and a TypestateError

Cryptography is used for many different purposes. From hashing to encrypting, complex cryptographic libraries are used in many applications. However, using those libraries is not straightforward. Recent studies indicate that software developers have limited or no knowledge of how to use the APIs of cryptographic libraries. Lazar et al. (X. Wang and N. Zeldovich, [n. d.]) investigated 269 cryptography-related vulnerabilities and found that 83% of them resulted from application developers misusing cryptographic libraries. Nadi et al. (Ram Kamath, 2016) show that most cryptographic misuses are due to developers' insufficient knowledge of library usage, and that developers require debugging tools in their development environments to support them.

In order to detect cryptographic API misuses, CogniCrypt uses a set of cryptographic rules encoded in the CrySL format, a definition language that allows cryptographic experts to encode the secure usage of cryptographic libraries in a lightweight syntax. CogniCrypt automatically converts those rules into an efficient flow-sensitive and context-sensitive static data-flow analysis, which it then runs to detect the API misuses described by the rules. In its current state, CogniCrypt contains a complete ruleset for the APIs of the Java Cryptography Architecture (JCA).

Findings in Java Class: Example.Crypto

   in Method: void getKey(int)
    ConstraintError violating CrySL rule for javax.crypto.KeyGenerator (on Object #bfd7ff31836bf8643830e32ce26e9ef954d0522793ed0e9722ce44f0b255d4ef)
      First parameter (with value 512) should be any of {128, 192, 256}
      at statement: virtualinvoke r1.<javax.crypto.KeyGenerator: void init(int)>(varReplacer29)
      at line: 5

    TypestateError violating CrySL rule for javax.crypto.KeyGenerator (on Object #bfd7ff31836bf8643830e32ce26e9ef954d0522793ed0e9722ce44f0b255d4ef)
      Unexpected call to method generateKey on object of type javax.crypto.KeyGenerator.
      at statement: virtualinvoke r1.<javax.crypto.KeyGenerator: javax.crypto.SecretKey generateKey()>()
      at line: 7
Listing 2: CogniCrypt console output for Listing 1

In CogniCrypt, each CrySL rule defines the correct use of a specific Java class of a cryptography library, by encoding constraints on the usage order of API calls and on parameter types. Error types and reporting are also encoded in CrySL. When CogniCrypt analyses a Java program, a listener waits for the generation of analysis results and prints them to the command line as they are returned. A developer can change the reporting format by implementing their own custom reporting listener and using it in place of the default command-line listener. CogniCrypt supports seven types of errors:

  • ConstraintError: This type of error refers to the wrong parameters being supplied to particular method calls. For example, calling Cipher.getInstance("AES"), which falls back to the insecure ECB mode in the standard JCA providers, instead of naming a secure mode explicitly, as in Cipher.getInstance("AES/GCM/NoPadding").

  • NeverTypeOfError: This error is reported when a variable is of an insecure type, such as a password contained in a string instead of a char array.

  • ForbiddenMethodError: This error is raised when a deprecated or insecure method is called, such as the constructor PBEKeySpec(char[] password).

  • TypestateError: When a call to a method is issued when it shouldn’t be, CogniCrypt raises a TypestateError. For example, calling Cipher.doFinal() when no call to Cipher.init() has been issued before.

  • RequiredPredicateError: This error refers to a second-degree ConstraintError: when an object requires another object to be used in a specific way, and this was not the case. For example, a Cipher object receiving a hardcoded key will raise an error, since such keys should not be hardcoded.

  • ImpreciseValueExtractionError: This error is used when the analysis could not retrieve the parameter passed to a cryptographic method, for example when a key size is supplied in a configuration file instead of in the code. Since the parameter could be faulty, an error of lesser importance is raised.

  • IncompleteOperationError: This error relates to the TypestateError, but instead of referring to a wrong method call, it is raised when a missing call is detected. An example is never calling Cipher.doFinal() on a cipher object.

We illustrate a ConstraintError and a TypestateError in Listing 1, with CogniCrypt’s corresponding report shown in Listing 2. Listing 1 presents a Java method which generates a cryptographic key using an instance of KeyGenerator. Two errors are made here: first, init() of KeyGenerator is called using an incorrect parameter: 512 instead of the secure 128, 192, or 256 values. Second, along the else path, the key generator object is never initialized before generateKey() is called. Using the CrySL rules that describe the usage of KeyGenerator, CogniCrypt thus detects the two errors as a ConstraintError and a TypestateError. We show the corresponding CrySL rules in Appendix A.1.
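
For contrast with Listing 1, the following is a minimal sketch of a usage that follows the two points discussed above: init() receives one of the permitted AES key sizes on every path, and generateKey() is called exactly once. The class and method names are illustrative and not part of CogniCrypt, and the snippet has not been checked with CogniCrypt itself.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Illustrative counterpart to Listing 1; the names are hypothetical.
class CryptoFixed {
    // Generates a 128-bit AES key with a constant, permitted key size.
    static SecretKey generateAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);            // one of {128, 192, 256}, on every path
            return generator.generateKey(); // called once, after init
        } catch (NoSuchAlgorithmException e) {
            // "AES" is a standard JCA algorithm name, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = generateAesKey();
        System.out.println(key.getAlgorithm() + " key, "
                + key.getEncoded().length * 8 + " bits");
    }
}
```

The key size is written as a constant on purpose: as described for ImpreciseValueExtractionError above, a value that the analysis cannot retrieve statically may itself lead to a report.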

When reporting an error, CogniCrypt provides:

  • The error type.

  • The error location, as a line number and file name.

  • A customized error message. For example, for the ConstraintError in Listing 2, the error message contains the erroneous value (512) passed as the first parameter of init(), and lists the values that should be used instead.

3. The Sarif Format

We now detail the Sarif specification, with respect to reporting warnings. The complete Sarif documentation is found online (GrammaTech, 2018).

Sarif is a JSON format standard (OASIS, 2018). Its three main root keys, shown in Listing 3, are: version, which specifies the version of the Sarif format; $schema, which specifies the URI of the predefined JSON schema corresponding to the version; and runs, an array containing the results of the analysis runs. The six main subkeys of an individual run are shown in Listing 4.

{
  "version": "2.0.0",
  "$schema": "..",
  "runs": [{..}]
}
Listing 3: Root key-value pairs of Sarif
"runs": [{
  "tool": {..},
  "invocations": [{..}],
  "files": {..},
  "logicalLocations": {..},
  "results": [{..}],
  "resources": {..}
}]
Listing 4: Subkeys of the key runs in Sarif
"invocations": [{
  "commandLine": "java -cp CryptoAnalysis-1.0.0-jar-with-dependencies.jar crypto.HeadlessCryptoScanner --rulesDir=src/test/resources/ --applicationCp=CogniCryptDemoExample/Examples.jar --sarifReport --reportDir=CogniCrypt/reports",

  "responseFiles": [{
    "uri": "CryptoAnalysis/build/CryptoAnalysis-jar-with-dependencies.jar"
  }, {
    "uri": "CryptoAnalysis/src/test/resources/"
  }, {
    "uri": "CryptoAnalysisTargets/CogniCryptDemoExample/Examples.jar"
  }],
  "startTime": "2016-07-16T14:18:25Z",
  "endTime": "2016-07-16T14:19:01Z",
  "fileName": "CryptoAnalysis/build/CryptoAnalysis-jar-with-dependencies.jar",
  "workingDirectory": "/home/CryptoAnalysis/",
  "environmentVariables": {
    "PATH": "..",
    "HOME": ".."
  },
  "configurationNotifications": [{
    "level": "error",
    "message": {
      "text": "ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console)."
    }
  }],
  "toolNotifications": [{
    "level": "note",
    "message": {
      "text": "Finished initializing soot."
    }
  }, {
    "level": "warning",
    "message": {
      "text": "Couldn't find any method for CryptSLMethod: keyMaterial = javax.crypto.SecretKey.getEncoded();"
    }
  }, {
    "level": "note",
    "message": {
      "text": "Static Analysis took 1 seconds!."
    }
  }]
}]
Listing 5: Subkeys of the key invocations in Sarif

The syntax of the runs key can be separated into two categories:

  • reporting analysis results (invocations, files, results, and logicalLocations keys), which we detail in Section 3.1,

  • analysis metadata (tool and resources keys), which we explore in Section 3.2.

3.1. Reporting Analysis Results

In this section, we detail the invocations, files, results, and logicalLocations keys and their subkeys.

invocations

The invocations key describes how the static analysis tool was invoked. Invocation information mainly includes the start time and the end time of the analysis, the environment variables used to run the analysis, the command used to invoke the analysis, and the notifications displayed during the analysis. Those notifications are categorized into configuration notifications and tool notifications. The former contain notification objects describing the conditions relevant to the tool configuration, while the latter describe the runtime environment after the static analysis is invoked. A snippet of a CogniCrypt invocation object is shown in Listing 5.

files

The files key contains information on all the files relevant to the run: either the files in which analysis results were detected, or all files examined by the analysis tool. In some cases, a file might be nested inside another file (for example, in a compressed container), which is then referred to as its parent. For nested files, the parent's name is separated from the nested fragment by the character "#", and the nested fragment starts with "/". An example where the file "intro.docx" is located in the file "app.zip" is shown in Listing 6.

"files": {
  "collections/list.cpp": {
    "mimeType": "text/x-c",
    "length": 980
  },
  "app.zip#/docs/intro.docx": {
    "uri": "/docs/intro.docx",
    "mimeType": "wordprocessingml.document",
    "parentKey": "app.zip",
    "length": 4050
  }
}
Listing 6: Subkeys of the key files in Sarif
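
The naming convention for nested files can be sketched as follows. The class and method names are hypothetical helpers, not part of Sarif or CogniCrypt, and the use of the last "#" for deeper nesting is an assumption of this sketch rather than something stated above.

```java
// Hypothetical helper illustrating the "parent#/fragment" key convention.
class NestedFileKey {
    // Builds the key of a file nested inside a parent, e.g. an archive.
    static String of(String parentKey, String nestedPath) {
        if (!nestedPath.startsWith("/")) {
            throw new IllegalArgumentException("nested fragment must start with '/'");
        }
        return parentKey + "#" + nestedPath;
    }

    // Returns the parent key, or null when the file is not nested.
    // Assumption: with several levels of nesting, the last '#' marks the
    // boundary between the direct parent and the nested fragment.
    static String parentOf(String fileKey) {
        int separator = fileKey.lastIndexOf('#');
        return separator < 0 ? null : fileKey.substring(0, separator);
    }

    public static void main(String[] args) {
        String key = of("app.zip", "/docs/intro.docx");
        System.out.println(key + " has parent " + parentOf(key));
    }
}
```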

logicalLocations

The optional key logicalLocations is used when the analysis tool yields results that include both physical location information (e.g., source file name, line and column numbers) and logical location information (e.g., namespace, type, and method name). In some cases, a logical location might be nested in another logical location, referred to as its parent. In such cases, logicalLocations should contain properties describing each of its parents, up to the top-level logical location. An example of a warning detected in the C++ class namespaceA::namespaceB::classC is shown in Listing 7. The corresponding logicalLocations object contains the properties describing the class along with its containing namespaces.

"logicalLocations": {
  "namespaceA::namespaceB::classC": {
    "name": "classC",
    "kind": "type",
    "parentKey": "namespaceA::namespaceB"
  },
  "namespaceA::namespaceB": {
    "name": "namespaceB",
    "kind": "namespace",
    "parentKey": "namespaceA"
  },
  "namespaceA": {
    "name": "namespaceA",
    "kind": "namespace"
  }
}
Listing 7: Subkeys of the key logicalLocations in Sarif
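
The parent chain of Listing 7 can be derived mechanically from a fully qualified name. The sketch below assumes "::" as the separator, as in the C++ example; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists a logical location and all of its parents.
class LogicalLocationChain {
    // Returns the given name followed by each enclosing parent, innermost first.
    static List<String> withParents(String fullyQualifiedName) {
        List<String> chain = new ArrayList<>();
        String current = fullyQualifiedName;
        while (true) {
            chain.add(current);
            int separator = current.lastIndexOf("::");
            if (separator < 0) {
                break; // reached the top-level logical location
            }
            current = current.substring(0, separator);
        }
        return chain;
    }

    public static void main(String[] args) {
        // One line per key that logicalLocations would need to describe.
        for (String key : withParents("namespaceA::namespaceB::classC")) {
            System.out.println(key);
        }
    }
}
```

Each entry after the first is the parentKey of the entry before it, which matches the three keys of Listing 7.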
"results": [{
  "ruleId": "C2001",
  "ruleMessageId": "default",
  "richMessageId": "richText",
  "message": {
    "text": "Deleting member 'x' of variable 'y' may compromise performance on subsequent accesses of 'y'."
  },
  "suppressionStates": [ "suppressedExternally" ],
  "baselineState": "existing",
  "level": "error",
  "analysisTarget": {
    "uri": "collections/list.cpp"
  },
  "locations": [{..}],
  "codeFlows": [{..}],
  "stacks": [{..}],
  "fixes": [{..}],
  "workItemUris": [..]
}]
Listing 8: Subkeys of the key results in Sarif
"locations": [{
  "physicalLocation": {
    "fileLocation": {
      "uri": "collections/list.h"
    },
    "region": {
      "startLine": 15,
      "startColumn": 9,
      "endLine": 15,
      "endColumn": 10,
      "charLength": 1,
      "charOffset": 254,
      "snippet": {
        "text": "add_core(ptr, offset, val);\n    return;"
      }
    }
  },
  "fullyQualifiedLogicalName": "collections::list:add"
}]
Listing 9: Subkeys of the key locations in Sarif

results

Each run object contains an array of result objects, under the key results. Each result represents a warning reported by the analysis, an example of which is shown in Listing 8. We now detail the subkeys of a result object.

"codeFlows": [{
    "message": {
      "text": "Path from declaration to usage"
    },
    "threadFlows": [
      {
        "id": "thread-52",
        "locations": [
          {
            "step": 1,
            "importance": "essential",
            "message": {
              "text": "Variable \"ptr\" declared."
            },
            "location": {
              "physicalLocation": {...
                "region": {
                  "startLine": 15,
                  "snippet": {
                    "text": "int *ptr;"
                  }
                }
              }
            },
            "module": "platform"
          },
          {
            "step": 2,
            "importance": "essential",
            "message": {
              "text": "Uninitialized variable \"ptr\" passed to method \"add_core\".",
              "richText": "Uninitialized variable `ptr` passed to method `add_core`."
            },
            "location": {
              "physicalLocation": {...
                "region": {
                  "startLine": 25,
                  "snippet": {
                    "text": "add_core(ptr, offset, val)"
                  }
                }
              }
            }
          }
        ]
      }
    ]
  }
]
Listing 10: Subkeys of the key codeFlows in Sarif
"stacks": [{
    "message": {
      "text": "Call stack resulting from usage of uninitialized variable."
    },
    "frames": [
      {
        "message": {
          "text": "Exception thrown."
        },
        "location": {
          "physicalLocation": {...
            "region": {
              "startLine": 110,
              "startColumn": 15
            }
          }
        },
        "threadId": 52,
        "address": 10092852,
        "parameters": [ "null", "0", "14" ]
      },
      {
        "location": {
          "physicalLocation": {...
            "region": {
              "startLine": 43,
              "startColumn": 15
            }
          }
        },
        "threadId": 52,
        "address": 10092176,
        "parameters": [ "14" ]
      },
      {
        "location": {
          "physicalLocation": {...
            "region": {
              "startLine": 28,
              "startColumn": 9
            }
          }
        },
        "threadId": 52,
        "address": 10091200
      }
    ]
  }
]
Listing 11: Subkeys of the key stacks in Sarif
  • ruleId is the unique identifier of the analysis rule that was evaluated to produce the result.

  • ruleMessageId refers to a message in the metadata.

  • richMessageId refers to a more descriptive message in the metadata.

  • message describes the warning. If the message is not specified, the ruleMessageId is used instead.

  • baselineState describes the state of the result with respect to a previous baseline run (i.e., new, existing, or absent).

  • level indicates the severity of the result (e.g., error, warning).

  • locations contains one or more unique location objects marking the exact location of the warning, as shown in Listing 9. Each object contains the physical location (e.g., file name, line and column), including the region in the file where the result is found, or the logical location (such as namespace, type, and method name). If the physical location information is absent, the fullyQualifiedLogicalName property is used instead.

  • codeFlows is an array of individual code flows, which describe the execution path of the warning step by step. An example is shown in Listing 10.

  • stacks is an array of call stacks created by the analysis tool. Each call stack consists of frames, and each frame contains its location information, a thread id, parameter values, a memory address, etc. This is illustrated in Listing 11.

  • fixes is an array of fix suggestions. For each file in a fix object, the format describes regions that can be removed and new contents to be added. An example is found in Listing 12.

  • workItemUris is an array of URIs to existing work items associated with the warning. Work items can be GitHub issues or JIRA tickets for example.

"fixes": [{
  "description": {
    "text": "Initialize the variable to null"
  },
  "fileChanges": [{
    "fileLocation": {
      "uri": "collections/list.h"
    },
    "replacements": [{
      "deletedRegion": {
        "startLine": 42
      },
      "insertedContent": {
        "text": "A different line\n"
      }
    }]
  }]
}]
Listing 12: Subkeys of the key fixes in Sarif

3.2. Metadata

We now detail the tool and resources keys and their subkeys, which are used in Sarif to store analysis metadata.

tool

The key tool contains information regarding the static analysis tool that performed the analysis and produced the report. Its self-descriptive keys are shown in Listing 13.

resources

The resources key contains resource objects, i.e., localized items such as rule metadata and the message strings associated with the rules. This prevents data duplication if, for example, multiple warnings refer to the same rule. Each rule object contains rule information such as rule id, rule description, and message strings. This is illustrated in Listing 14. Note that the subkeys messageStrings and richMessageStrings contain all of the message strings and rich message strings referred to by the result objects.
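
The deduplication idea can be sketched as a small lookup table: results only carry a message id, and the text is resolved from the shared rule metadata. The class and method names are hypothetical, and filling the "{0}" placeholder of Listing 14 with positional arguments is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store for the message strings of a single rule.
class RuleMessages {
    private final Map<String, String> messageStrings = new HashMap<>();

    // Registers a template under a message id, e.g. "default".
    void register(String messageId, String template) {
        messageStrings.put(messageId, template);
    }

    // Resolves a message id and fills the placeholders {0}, {1}, ...
    // (assumption: placeholders are positional).
    String render(String messageId, String... arguments) {
        String template = messageStrings.get(messageId);
        if (template == null) {
            throw new IllegalArgumentException("Unknown message id: " + messageId);
        }
        String text = template;
        for (int i = 0; i < arguments.length; i++) {
            text = text.replace("{" + i + "}", arguments[i]);
        }
        return text;
    }

    public static void main(String[] args) {
        RuleMessages rule = new RuleMessages();
        rule.register("default", "Variable \"{0}\" was used without being initialized.");
        // Two warnings can share the same template and differ only in arguments.
        System.out.println(rule.render("default", "ptr"));
        System.out.println(rule.render("default", "offset"));
    }
}
```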

4. From the CogniCrypt Reporting Format to Sarif

In this section, we detail our approach for converting CogniCrypt results to the Sarif format, following the requirements of Section I.2 of the SARIF documentation (http://docs.oasis-open.org/sarif/sarif/v2.0/csprd01/sarif-v2.0-csprd01.html#_Toc517436281) (OASIS, 2018). To illustrate our implementation, we use the example CogniCrypt report in Listing 15, obtained after analysing an example file from CogniCrypt, Examples.jar (https://github.com/CROSSINGTUD/CryptoAnalysis/blob/master/CryptoAnalysisTargets/CogniCryptDemoExample/Examples.jar). The listing contains two warnings: a ConstraintError in the method getPrivateKey and a TypestateError in the method main. Listings 16 and 17 are snippets of the same report in Sarif format, with the latter describing the warnings, and the former containing all of the remaining data and metadata.

4.1. Mapping CogniCrypt Data to Sarif Keys

To write a Sarif exporter for CogniCrypt, it is important to first identify which information to export from the CogniCrypt error format. We detail this information in this section.

The first level of the Sarif JSON hierarchy contains the version and $schema information. In our implementation, this data is populated based on the current Sarif version, 2.0.0 (the version key in Listing 16), and its respective schema reference (the $schema key). This information is hardcoded in our converter.

"tool": {
  "name": "CodeScanner",
  "fullName": "CodeScanner 1.1 for Unix (en-US)",
  "version": "2.1",
  "semanticVersion": "2.1.0",
  "language": "en-US",
  "properties": {
    "copyright": "Copyright (c) 2017 by Example Corporation. All rights reserved."
  }
}
Listing 13: Subkeys of the key tool in Sarif
"resources": {
  "rules": {
    "C2001": {
      "id": "C2001",
      "shortDescription": {
        "text": "A variable was used without being initialized."
      },
      "fullDescription": {
        "text": "A variable was used without being initialized. This can result in runtime errors such as null reference exceptions."
      },
      "messageStrings": {
        "default": "Variable \"{0}\" was used without being initialized."
      },
      "richMessageStrings": {
        "richText": "Variable `{0}` was used without being initialized."
      }
    }
  }
}
Listing 14: Subkeys of the key resources in Sarif
Findings in Java Class: example.TypestateErrorExample

   in Method: getPrivateKey
    ConstraintError violating CrySL rule for KeyPairGenerator (on Object #9367df75558b10b537d558f11cb7a523f082e7e256ab7ba827a36db283cf940e)
      First parameter (with value 1024) should be any of {2048, 4096}
      at line: 29


   in Method: main
    TypestateError violating CrySL rule for Signature (on Object #9c822ffdf2268ba2e0ff61f394b200a7510d25a3d4a558ae811e624191c3583b)
      Unexpected call to method sign on object of type java.security.Signature. Expect a call to one of the following methods initSign,update
      at line: 24
Listing 15: Example CogniCrypt output
{
  "version": "2.0.0",
  "$schema": "..",
  "runs": [{
    "tool": {
      "semanticVersion": "1.0.0",
      "fullName": "CogniCrypt (en-US)",
      "language": "en-US",
      "version": "1.0.0"
    },
    "files": {
      "example/TypestateErrorExample.java": {
        "mimeType": "text/java"
      }
    },
    "results": [...],
    "resources": {
      "rules": {
        "TypestateError": {
          "id": "TypestateError",
          "fullDescription": {
            "text": "The ORDER block of CrySL is violated, i.e., the expected method sequence call to be made is incorrect. For example, a Signature object expects a call to initSign(key) prior to update(data)."
          }
        },
        "ConstraintError": {
          "id": "ConstraintError",
          "fullDescription": {
            "text": "A constraint of a CrySL rule is violated, e.g., a key is generated with the wrong key size."
          }
        }
      }
    }
  }]
}
Listing 16: Sarif output example for Listing 15 (1/2)

To fill the runs information (the runs key in Listing 16), we map the data found in the CogniCrypt error format to the keys of the Sarif format, as illustrated in Listings 16 and 17.

"results": [{
  "locations": [{
    "physicalLocation": {
      "fileLocation": {
        "uri": "example/TypestateErrorExample.java"
      },
      "region": {
        "startLine": 29
      }
    },
    "fullyQualifiedLogicalName": "example::TypestateErrorExample::getPrivateKey"
  }],
  "ruleId": "ConstraintError",
  "message": {
    "text": "First parameter (with value 1024) should be any of {2048, 4096}.",
    "richText": "ConstraintError violating CrySL rule for KeyPairGenerator."
  }
}, {
  "locations": [{
    "physicalLocation": {
      "fileLocation": {
        "uri": "example/TypestateErrorExample.java"
      },
      "region": {
        "startLine": 24
      }
    },
    "fullyQualifiedLogicalName": "example::TypestateErrorExample::main"
  }],
  "ruleId": "TypestateError",
  "message": {
    "text": "Unexpected call to method sign on object of type java.security.Signature. Expect a call to one of the following methods initSign,update.",
    "richText": "TypestateError violating CrySL rule for Signature."
  }
}]
Listing 17: Sarif output example for Listing 15 (2/2)

4.2. Implementation Details

Our implementation of the CogniCrypt-to-Sarif converter is integrated in the CogniCrypt repository (Kummita, [n. d.]). It is enabled with the --sarifReport option, together with the --reportDir option that specifies the directory in which the generated report is stored. An example is shown in the commandLine value of Listing 5.

The results of CogniCrypt are available through the class crypto.reporting.ErrorMarkerListener. Each object of this class contains an errorMarkers field containing all warnings. The main class of our converter is crypto.reporting.SARIFReporter, which extends ErrorMarkerListener. In this class, we have overridden the method afterAnalysis(), in which we iterate through the CogniCrypt warnings and convert them into Sarif. Since CogniCrypt stores its results in a Google Guava Table, the complexity of our connector is linear with respect to the number of findings.
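
The conversion loop can be sketched in a self-contained way as follows. The Finding class is a simplified, hypothetical stand-in for a CogniCrypt warning and does not reproduce the actual ErrorMarkerListener or SARIFReporter classes. JSON is assembled by hand only to keep the sketch free of dependencies; a JSON library would be the more robust choice in practice.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of a findings-to-Sarif conversion; all names are hypothetical.
class SarifSketch {
    // Stand-in for one warning: error type, file, line, and message.
    static final class Finding {
        final String errorType;
        final String fileUri;
        final int line;
        final String message;

        Finding(String errorType, String fileUri, int line, String message) {
            this.errorType = errorType;
            this.fileUri = fileUri;
            this.line = line;
            this.message = message;
        }
    }

    // Escapes the characters that may not appear raw in a JSON string.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Maps one finding to a result object with the keys used in Listing 17.
    static String toResult(Finding f) {
        return "{\"ruleId\":\"" + escape(f.errorType) + "\","
             + "\"message\":{\"text\":\"" + escape(f.message) + "\"},"
             + "\"locations\":[{\"physicalLocation\":{"
             + "\"fileLocation\":{\"uri\":\"" + escape(f.fileUri) + "\"},"
             + "\"region\":{\"startLine\":" + f.line + "}}}]}";
    }

    // Visits every finding exactly once, so the work is linear in their number.
    static String toResults(List<Finding> findings) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < findings.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append(toResult(findings.get(i)));
        }
        return json.append(']').toString();
    }

    public static void main(String[] args) {
        List<Finding> findings = Arrays.asList(
            new Finding("ConstraintError", "example/TypestateErrorExample.java", 29,
                "First parameter (with value 1024) should be any of {2048, 4096}."));
        System.out.println(toResults(findings));
    }
}
```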

4.3. Evaluation

We verified the implementation of our CogniCrypt converter using an online Sarif validator (http://sarifweb.azurewebsites.net) (sar, [n. d.]). The validator takes the generated Sarif file as input, scans it, and reports format issues when the generated Sarif report does not follow the standard specified in (OASIS, 2018). We generated the Sarif files for all of the CogniCrypt test cases (https://github.com/CROSSINGTUD/CryptoAnalysis), including the one used in this report (Listings 16 and 17). The validation of the Sarif format passed.

A threat to validity is that the validator was in beta-testing phase at the time. Thus, in addition, we manually verified the JSON format of our reports according to the Sarif standard. All of our Sarif reports were correct.

5. Conclusion and Future Work

In this paper, we explored how to convert the CogniCrypt error format into the more general Sarif format. After detailing the two formats, we described our implementation. In our evaluation, we confirmed the correctness of our converter on the CogniCrypt test cases. The current implementation of our connector is available online as part of the official CogniCrypt implementation on GitHub (Kummita, [n. d.]). Since this is an initial prototype, there is still room for improvement. One such improvement is to finish the implementation of the converter so that it includes invocation and logical location information. Another improvement concerns the CogniCrypt error format, which does not encode as many details as it could. For example, call-graph information is available in the analysis and could be encoded in Sarif, but the data is lost in the CogniCrypt report. The connector can be improved to retrieve this information directly from the analysis. As a follow-up to this work, CogniCrypt also needs full support for Sasp, since it is now able to export its results in Sarif.

Acknowledgements.
This research was conducted under the supervision of Eric Bodden as part of the Secure Systems Engineering seminar at Paderborn University, organized by Lisa Nguyen Quang Do. It was partially funded by the Heinz Nixdorf Foundation and by the NRW Research Training Group on Human Centered Systems Security (nerd.nrw).

References

Appendix A CrySL Rules

A.1. KeyGenerator

SPEC javax.crypto.KeyGenerator
OBJECTS
    int keySize;
    java.security.spec.AlgorithmParameterSpec params;
    javax.crypto.SecretKey key;
    java.lang.String alg;
    java.security.SecureRandom ranGen;

EVENTS
    g1: getInstance(alg);
    g2: getInstance(alg, _);
    Gets := g1 | g2;

    i1: init(keySize);
    i2: init(keySize, ranGen);
    i3: init(params);
    i4: init(params, ranGen);
    i5: init(ranGen);
    Inits := i1 | i2 | i3 | i4 | i5;

    gk: key = generateKey();

ORDER
    Gets, Inits?, gk

CONSTRAINTS
    alg in {"AES", "HmacSHA224", "HmacSHA256", "HmacSHA384", "HmacSHA512"};
    alg in {"AES"} => keySize in {128, 192, 256};

REQUIRES
    randomized[ranGen];

ENSURES
    generatedKey[key, alg];