Designing Robust API Monitoring Solutions

05/01/2020
by Daniele Cono D'Elia, et al.
Sapienza University of Rome

Tracing the sequence of library and system calls that a program makes is very helpful in the characterization of its interactions with the surrounding environment and ultimately of its semantics. Due to entanglements of real-world software stacks, accomplishing this task can be surprisingly challenging as we take accuracy, reliability, and transparency into the equation. To manage these dimensions effectively, we identify six challenges that API monitoring solutions should overcome and outline actionable design points for them, reporting insights from our experience in building API tracers for software security research. We detail two implementation variants, based on hardware-assisted virtualization (realizing the first general-purpose user-space tracer of this kind) and on dynamic binary translation, that achieve API monitoring robustly. We share our SNIPER system as open source.

1 Introduction

Modern operating systems come with large, heterogeneous software components that developers can build on when writing a software program. They expose their functionalities through Application Programming Interfaces (APIs) that compiled code accesses through well-defined prototypes and calling conventions. The sequence of APIs that a piece of code may invoke during its execution can be representative of its externally observable behavior, and ultimately of its semantics. In the presence of complex code, however, static program analyses may fall short in providing an immediate characterization of such high-level behaviors.

Security researchers can resort to dynamic analysis to interpose on API calls. For instance, monitoring API calls is useful in malware analysis and code reverse engineering activities to track how an untrusted piece of software interacts with the surrounding environment [17]. Tracking API calls is also valuable in dependability research, e.g., for run-time monitoring [18] and troubleshooting [20] of programs. As API monitoring implies an underlying interception mechanism for calls, the process also goes under the colloquial name of API hooking.

Security researchers and practitioners use different forms of API monitoring, with implementations tailored to different contexts. For instance, a malware sandbox may interpose mainly on system calls so as to collect the events of interest in a single spot, justified by the desire not to miss behaviors exercised by unaccounted-for library APIs. But when analysts dissect a sample, they resort to monitoring solutions for library calls to understand how its code achieves some behavior. Consider an application using an HTTP-related API: intercepting a packet transmission down in the software stack is not as informative as logging the API call that originated it. Tracking high-level facts is in general valuable for many monitoring, troubleshooting, and reverse engineering activities.

1.0.1 Contributions.

We observed that prior literature only touches lightly on the design space and on the accuracy, reliability, and transparency dimensions of the API monitoring problem in the general case. We also found that currently available tracing tools often fall short in one or more of these respects. Motivated by these observations, in this paper we report on our experience in building robust and accurate tracers, presenting a general design that works for different instrumentation technologies and addresses the three above-mentioned dimensions.

Our implementations target Windows applications, covering a large collection of DLLs and system calls, and can be extended to other systems. We devise two variants: one builds on dynamic binary instrumentation and can be used either standalone or as a support library for existing dynamic analysis systems; the other builds on virtualization extensions for more efficient and transparent instrumentation, and represents the first open solution of this kind. Although general-purpose, we incubated them as part of our malware analysis research: the tricky patterns found in this realm combined with quirks of Windows internals have been a tough training ground for their development. We make the code from our project SNIPER available at https://github.com/dcdelia/sniper.

2 Preliminaries

Before describing our tracing solutions, we first present relevant traits of the API call resolution process in the Windows realm, and illustrate the instrumentation technologies available to date for implementing an API monitoring system.

2.1 Windows API Resolution and Internals

Windows applications can access functionalities of the surrounding environment by loading functions from DLL (dynamic-link library) modules. To solve addresses for external symbols, Windows executables define a static Import Address Table (IAT) that the loader populates at run time with pointers to the desired functions, which are imported from known DLL modules.

Every DLL then maintains an export table for its public functions and storage. Each entry, also dubbed an export, is associated with a relative virtual address (RVA), that is, its offset from the base address of the module. For an executable importing one or more exports from a DLL, the loader will populate the involved IAT entries by looking up the corresponding RVAs and summing them with the base address chosen for the DLL by the system when loading it.

There are however alternative methods to locate API addresses. A program may manually load a DLL and locate its exports using the GetProcAddress API, which does not touch the IAT. Furthermore, regardless of how DLL loading happens, a program may manually resolve symbols by navigating the loaded code modules and parsing their export tables. This happens frequently with applications wrapped by executable packers and protectors, which are popular in both malicious and benign programs [17].
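
To make the IAT-less resolution concrete, the sketch below (our own illustration, not code from the paper) walks the export table of an already loaded module to resolve a function by name, mirroring how RVAs translate into run-time addresses; forwarder exports and error handling are omitted for brevity.

#include <windows.h>
#include <cstring>

// Resolve an export by parsing the module's export directory: the run-time
// address is the module base plus the RVA stored in the function table.
void* resolveExport(HMODULE module, const char* name) {
    BYTE* base = reinterpret_cast<BYTE*>(module);
    auto dos = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt  = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    auto& dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    auto exp = reinterpret_cast<IMAGE_EXPORT_DIRECTORY*>(base + dir.VirtualAddress);

    auto names    = reinterpret_cast<DWORD*>(base + exp->AddressOfNames);
    auto ordinals = reinterpret_cast<WORD*>(base + exp->AddressOfNameOrdinals);
    auto funcs    = reinterpret_cast<DWORD*>(base + exp->AddressOfFunctions);

    for (DWORD i = 0; i < exp->NumberOfNames; ++i) {
        const char* expName = reinterpret_cast<const char*>(base + names[i]);
        if (strcmp(expName, name) == 0)
            return base + funcs[ordinals[i]];   // base address + RVA
    }
    return nullptr;                             // symbol not exported by name
}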

When it comes to the internal structure of a DLL, an exported API can be of different kinds. The base case is when its logic is fully contained in the code starting at the given RVA. In other cases the code is partial and ends with a tail jump to another function (either private or exported) or to an export from another DLL. The latter case is frequent for instance in kernel32.dll, which relies on kernelbase.dll. In other cases the RVA does not point to code, but represents a forwarder export [31]: this instructs the loader to silently rewire any IAT entry referencing it to point to another export from another DLL. Due to these factors it is not always easy to determine what the “exit points” of an export are.

Export tables also do not contain prototype information. Header files from the Windows SDK specify the calling convention (typically stdcall) and the input modifier of each function argument, that is, whether the parameter identifies an input (IN) or an output (OUT) value, or both (INOUT) [32]. As argument passing is by value, an output argument takes the form of a pointer. Headers also introduce a large number of type aliases and structure declarations.

System calls, or syscalls for short, are normally invoked by a program through user-space wrappers from ntdll.dll that set the syscall ordinal in register EAX and trigger a context switch. A program however can elude their monitoring by extracting the ordinals for the Windows version in use from ntdll.dll and triggering the switch with custom code, realizing a so-called direct syscall. Experienced coders can also use undocumented syscalls to make the analysis harder, and prototypes for them may only be found in reverse engineering forums.

2.2 Instrumentation Technologies

As we will see throughout the paper, the type of instrumentation a tracer uses to interpose on API calls impacts several dimensions of the hooking process.

When operating in user space, one possibility is to rewire each IAT entry to point to a stub that logs the call before invoking the intended API function. This approach, known as IAT hooking, however misses calls to APIs resolved without using the IAT (§2.1). In terms of recall, a better alternative is to move instrumentation to the API code, modifying the initial bytes of each monitored function with the insertion of a trampoline to an analysis stub. A common weakness of both approaches is that their modifications are visible to an adversary.
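
For illustration, a minimal sketch of the first approach follows (our own, heavily simplified; the hooked API and the logging stub are just examples): it rewires one IAT entry of a module to a stub that logs the call and forwards it. Imports by ordinal and error handling are skipped.

#include <windows.h>
#include <cstdio>
#include <cstring>

static decltype(&MessageBoxA) realMessageBoxA = nullptr;

int WINAPI hookedMessageBoxA(HWND h, LPCSTR text, LPCSTR cap, UINT type) {
    printf("[hook] MessageBoxA(\"%s\")\n", text);         // log the call
    return realMessageBoxA(h, text, cap, type);           // then forward it
}

void hookIAT(HMODULE module, const char* dllName, const char* funcName,
             void* stub, void** original) {
    BYTE* base = reinterpret_cast<BYTE*>(module);
    auto dos  = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt   = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    auto& dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto desc = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(base + dir.VirtualAddress);

    for (; desc->Name; ++desc) {
        if (_stricmp(reinterpret_cast<char*>(base + desc->Name), dllName)) continue;
        auto thunk = reinterpret_cast<IMAGE_THUNK_DATA*>(base + desc->FirstThunk);
        auto names = reinterpret_cast<IMAGE_THUNK_DATA*>(base + desc->OriginalFirstThunk);
        for (; names->u1.AddressOfData; ++thunk, ++names) {
            if (IMAGE_SNAP_BY_ORDINAL(names->u1.Ordinal)) continue;   // skip ordinal imports
            auto byName = reinterpret_cast<IMAGE_IMPORT_BY_NAME*>(base + names->u1.AddressOfData);
            if (strcmp(reinterpret_cast<char*>(byName->Name), funcName)) continue;
            DWORD old;
            VirtualProtect(&thunk->u1.Function, sizeof(void*), PAGE_READWRITE, &old);
            *original = reinterpret_cast<void*>(thunk->u1.Function);  // save real target
            thunk->u1.Function = reinterpret_cast<ULONG_PTR>(stub);   // rewire IAT entry
            VirtualProtect(&thunk->u1.Function, sizeof(void*), old, &old);
            return;
        }
    }
}

// Example usage:
//   hookIAT(GetModuleHandle(nullptr), "user32.dll", "MessageBoxA",
//           (void*)hookedMessageBoxA, (void**)&realMessageBoxA);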

Artifacts are a well-known problem for dynamic analyses that operate through binary patching. For this reason other technologies have gained in popularity in security research [8]. Dynamic binary translation (DBT) systems can trap execution at arbitrary instructions based on their type (e.g. control transfer instructions) or address, while providing the running code with the illusion that instrumentation is not present. Dynamic binary instrumentation (DBI) is a popular DBT technique for user-space monitoring of programs [5, 29, 8]. When an analysis has to deal with system-wide or kernel-level flows, researchers have used whole-system emulators like QEMU [3] as a DBT system to instrument entire virtual machines. Virtual machine introspection (VMI) tools then come into play to overcome the semantic gap when inspecting high-level features of the underlying OS and processes by reading the raw memory of the guest [14].

The advent of CPU virtualization extensions (VT) has recently favored new instrumentation schemes with better performance and transparency. By maintaining a split view of code and data in the Extended Page Table, Deng et al. [12] show how to create invisible breakpoints for registering analysis callbacks at specific instruction addresses. To insert a breakpoint they create a code page used only for instruction fetching, while the original bytes are left untouched in a data page available for read/write operations: this design defeats introspective attacks by an adversary. Modern sandboxes like [26] use variants of this mechanism to hook system calls and a few selected library calls. Later on we will see, however, that lazy loading mechanisms and other OS entanglements can get in the way when one wishes to use this technology to trace calls to arbitrary APIs.

3 Design Space

In this section we identify general problems in API tracing systems and discuss fundamental aspects in the design space of such a system to cope with them.

3.1 Challenges in API Monitoring

When starting our project we observed that publicly available, general-purpose API tracing systems fall short in one or more of the following aspects:

[C1] Transparency.

Adding probes or other instrumentation for intercepting calls to an API may introduce artifacts that an adversary can look for [17].

[C2] Recall.

The points in the software stack where instrumentation takes place also determine how many of the actual calls a tracer can capture, as we have seen for instance with the limitations of IAT hooking (§2.2).

[C3] Coverage.

Tracing parameter values used in an API call is more informative than tracing API names alone. This requires a programmatic approach to extract prototypes and non-primitive data type declarations for an ample universe of libraries, so as to avoid retrieving incomplete information.

[C4] Output values.

A tracer should capture the return value of an API call, but also any data written by it to output buffers supplied by the caller.

[C5] Relevance of calls.

A tracer should filter out intra- and inter-component calls that happen down the software stack: these internal calls make up a large fraction of the logs but are not informative for users.

[C6] Derived flows.

A tracer should cover derived execution flows such as child processes and remote threads injected in other processes, as an adversarial program may use them to hide API calls from the analysis [23].

We observe that [C1,C6] are compelling in several security settings, [C2-4] impact the soundness and completeness of API monitoring-based analyses, and [C5] affects the usability of the system when a human agent is involved. To come up with robust solutions for API tracing, we reasoned on the key factors behind these challenges and how a hooking design can meet them. In particular, we wanted to pursue accuracy and reliability in the tracing process without tying the design to an instrumentation technology (and its transparency properties).

For [C1] as threat model we assume that the software under analysis may run tests that perform checksumming operations on its code and data (T1), and verify the integrity of DLL code loaded in memory by comparing it with its counterpart on disk or using precomputed signatures (T2). Both behaviors are often found in malware and in programs shielded by executable protectors [1]. We wrote an adversarial program that for test T1 compares each IAT entry of the running program with the expected address for its symbol (found in the export table of the DLL exporting it), and for T2 reads the DLL contents from disk to a buffer, applies relocations (to match where Windows loaded the real DLL), and compares the first 8 bytes of every imported DLL symbol with our copy. Test T1 will expose stubs for IAT hooking, and T2 hooking trampolines.
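
As a hypothetical sketch of what a check in the spirit of test T2 boils down to, the snippet below compares the in-memory prologue of an API against a reference copy of its first 8 bytes (e.g., computed from the on-disk image after applying relocations); an inline trampoline would alter those bytes.

#include <windows.h>
#include <cstring>

// Returns true if the in-memory prologue of dll!func differs from a
// precomputed reference copy of its first 8 bytes, hinting at a trampoline.
bool looksPatched(const char* dll, const char* func, const unsigned char expected[8]) {
    FARPROC p = GetProcAddress(GetModuleHandleA(dll), func);
    return p != nullptr && memcmp(reinterpret_cast<const void*>(p), expected, 8) != 0;
}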

In Table 1 we report the outcome of the two tests on three popular tracing products used by security professionals and two open-source research tools based on DBT. The first three products were reliably detected by our program. We defer the discussion of the [C2-6] dimensions for the five tools to §4.5, since to better frame that discussion we first need to detail the key points of our design.

Tracing system          Technology
API Monitor v2-r13      IAT hooking
SpyStudio v2.9.2        Trampolines
WinAPIOverride 6.6.6    Trampolines
drltrace [40]           DBI
PyREBox [7]             QEMU-TCG
SNIPER                  DBI, VT

[The per-tool ratings for tests T1-T2 and challenges C2-C6 appear as filled circles in the original table and are not reproduced here.]

Table 1: Commodity tools and research systems with generic API tracing capabilities. Circles are filled to indicate if an aspect is met to a basic, good, or optimal extent.

3.2 Scope of Monitoring

Prototypes.

We mentioned that a DLL only provides names and relative locations of its exported functions. For obtaining parameter information, a sound way to go is to cross-reference exported names with function declarations from the header files of the Windows SDK (plus manually assembled headers for undocumented structures and syscalls): the Deviare engine [34] offers an infrastructure to this end. This approach is general and can also be applied to third-party libraries when their headers are available. The programmatic extraction shall include the size of each parameter (pointers require a recursive evaluation), as it is needed to fetch values at run time. Input modifiers are not present in headers, but can be extracted from the MSDN documentation using a crawler [42].

Relevant calls.

We then reasoned on what would make a traced call informative for a user [C5]: internal calls happening within a DLL or spanning multiple modules do not provide valuable insights to the user, but only describe how the OS implements the outer high-level API that triggers them.

We deem the scope of a call relevant when the call is made in code belonging to the program under analysis and eventually returns to it. This definition rules out internal calls or jumps to other exports, and redirections within API code (§2.1), but captures syscalls that programs invoke from their code. By program under analysis we mean the process where the code first runs, the child processes and remote threads it creates, and any recursive byproducts [C6].

3.3 Hook Insertion and Accuracy of the Tracer

As part of its operation, a tracer interposes on specific events: obviously the invocation of an API (API entry event), and the moment it returns to its caller (API exit) for output values [C4]. The placement of hooks through instrumentation affects both the recall of a tracer [C2] and call parameter extraction [C3, C4].

API entry events.

Given a generic program, a static analysis to determine where it may invoke an API is not straightforward; obfuscations and anti-analysis measures make this problem harder [17]. A reliable choice to intercept an API entry event is thus to monitor when execution reaches code from an API function.

One way is to interpose on control transfer instructions: for instance, PyREBox hooks every call and jmp in the code and checks their target against a list of API addresses. This approach unfortunately introduces unnecessary overhead for transfers unrelated to APIs (which are dominant) and may miss unconventional patterns (consider for instance a push-push-ret sequence: it writes the return address and the address of the API to invoke to the stack and uses ret to trigger the transfer). It also requires an instrumentation facility that can hook instructions by type as the CPU sees them during fetching, ruling out static rewriting (self-modifying code breaks it [8]) and VT-assisted instrumentation.

We argue instead for placing instrumentation in the prologue of API functions: for a DLL loaded in memory we hook every unique RVA that appears in its export table and is not a forwarder. For a forwarder, we instead instrument the function hosting its actual code. This combination maximizes recall [C2].

API exit events.

Intercepting when execution returns to the caller of an API is needed to log return values and output arguments [C4]. Figure 1 depicts two viable options to catch API exit events: (a) placing hooks at DLL load time on the exit points of each API, or (b) doing that dynamically—during an API entry event—for the instruction located at the return address for the call.

Strategy (a) of chasing exit instructions is not immediate due to the redirections present in many API implementations (§2.1). Based on the insights from analyzing Windows DLLs, we wrote a static analysis that processes partial implementation stubs and tail calls in APIs to determine the exit points. We found exotic cases where an export makes a tail call to an export from another DLL that in turn leads to a tail call to another export from a third DLL.

Strategy (b) of chasing return addresses looks easier at first, but the hooking logic should be carefully designed to process only a real API exit. In fact, the instruction following the call may be a join point in the control flow graph, and later on be reached from basic blocks that do not end with an API call. We found many instances of this pattern for example in Microsoft utilities (Appendix 0.A).

Unless a program contains a pathological number of these patterns, strategy (b) is more attractive, as it brings fewer invocations of the analysis callback for API exit events since our design discards internal calls. Strategy (a) would instead trigger the callback also for them, and the analysis code would have to ignore them.

Choosing one scheme over the other has no impact on the other components of the design, but depends mainly on the capabilities and efficiency of the chosen instrumentation technology: we detail this aspect in §4.3 and explain why the two schemes may profitably coexist in the development of a tracer.

Parameter extraction.

To support retrieval of input and output arguments for an API call, the instrumentation facility should expose the CPU and memory context to the tracer. Upon API entry events, the stack pointer value suffices to locate the arguments under the 32-bit stdcall (Windows) and cdecl (UNIX-like systems) calling conventions. 64-bit code requires accessing dedicated registers for the first 4 parameters, and any additional one is passed via stack. For exit events the return value is available in register EAX (plus EDX for wider data types). 64-bit output parameters passed in registers can be saved safely during the entry event. Prototype information is essential on both entry and exit for computing offsets for stack arguments by taking into account their size and order.
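
As a concrete illustration of the 32-bit case, the sketch below (our own; the Param record is a simplified stand-in for the prototype database entries) locates and fetches argument values given the stack pointer observed at an API entry event.

#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for a prototype database entry (hypothetical): only
// the on-stack size of each argument matters to locate the next one.
struct Param { uint32_t size; };

// Fetch argument values for a 32-bit stdcall/cdecl API entry event: esp
// points at the return address, and arguments follow it in declaration order.
std::vector<uint64_t> readArgs(const uint8_t* esp, const std::vector<Param>& proto) {
    std::vector<uint64_t> values;
    const uint8_t* slot = esp + 4;                       // skip the return address
    for (const Param& p : proto) {
        uint64_t v = 0;
        size_t n = p.size < sizeof(v) ? p.size : sizeof(v);
        std::memcpy(&v, slot, n);                        // fetch by value (pointers included)
        values.push_back(v);
        slot += ((p.size + 3) / 4) * 4;                  // stack slots are 4-byte aligned
    }
    return values;
}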

4 Implementation

Figure 2 portrays our tracing proposal, which embodies what emerged from the discussions of the design space from the previous sections. We now detail its DBI and VT-based implementations in their common traits and distinctive features.

Figure 1: Handling API calls under strategy (a) and (b) for exits. Arrows represent hooks.

4.1 Instrumentation Technologies

The design options outlined in the previous sections are general, that is, they can be implemented over different instrumentation technologies. Choosing one technology over another however can lead to transparency concerns [C1].

We implement the first variant of SNIPER in Pin [29], a popular DBI choice in programming language, software testing, and security research. The variant ships as a high-level library suitable either for standalone usage as an in-guest API monitoring tool, or for being plugged into existing analysis systems based on Pin, which are numerous in security research [8] and other fields as well.

Figure 2: Bird’s eye-view of the proposed design.

The DBI abstraction ensures that every address or value manipulated by the program matches the one expected in a native execution. Under the hood, Pin operates by JIT-compiling and instrumenting code in a designated cache area: any hook we insert will not be visible to introspective attempts from the threat model of [C1] (§3.1). We also augment Pin with recent mitigations for DBI artifacts [8]. These factors contribute to making our tracer less conspicuous than commodity tools operating in user space through binary patching (§2.2).

The second variant brings a new piece to the research landscape: a tracer compatible with modern designs for efficient out-of-guest analyses via VT extensions. This variant is particularly suitable for scenarios where minimal invasiveness is desirable (e.g., with code sensitive to environment artifacts [26] or slowdowns [6]) and for monitoring system-wide flows. We build on the VMI features of libvmi [27] and the invisible breakpoints (§2.2) of DRAKVUF [26], a whole-system analysis framework based on the Xen hypervisor. Although DRAKVUF can accommodate generic analyses, its standard hooks monitor every active process and presently are mostly confined to selected syscalls for malware analysis.

4.2 Relevant Calls and Execution Units

In §3.2 we mentioned the advantages of restricting tracing to calls explicitly made by the program under analysis [C5]. The first dimension of the problem is however to identify processes and threads relevant to this end [C6].

Pin operates on a single process, but offers APIs to intercept the creation of a child process or a remote thread in an existing process [8]: we use them to extend the instrumentation to such flows automatically.

In the VT-based scenario the object of the analysis is the entire system, so we carry out bookkeeping work to identify relevant execution units. We wrote a component that, starting from a process under analysis, tracks the creation of child processes and remote threads recursively. To this end we hook the NtCreateThreadEx and NtTerminateThread syscalls, walking the _EPROCESS and _ETHREAD structures to retrieve the involved thread IDs. We maintain a pool of monitored threads and, when execution hits an invisible breakpoint from a hook, we check from the raised callback whether the current execution unit belongs to the pool. In case of code injection patterns, this design also lets us ignore activities from the “authentic” threads of a victim process.
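
A minimal sketch of this bookkeeping follows (our own simplification: the real component reads process and thread IDs from the _EPROCESS and _ETHREAD structures via VMI, whereas here they are plain integers).

#include <cstdint>
#include <set>
#include <utility>

// Pool of monitored execution units: child processes and remotely created
// threads are added recursively; hooks then ask whether the currently
// running (pid, tid) pair belongs to the pool.
class MonitorPool {
    std::set<std::pair<uint32_t, uint32_t>> units_;       // (pid, tid) pairs under analysis
public:
    void addThread(uint32_t pid, uint32_t tid)  { units_.insert({pid, tid}); }
    void dropThread(uint32_t pid, uint32_t tid) { units_.erase({pid, tid}); }  // on NtTerminateThread
    bool isMonitored(uint32_t pid, uint32_t tid) const {
        return units_.count({pid, tid}) != 0;
    }
};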

We can then rule out internal calls happening in Windows components [C5] by checking in which region the return address of an API call falls. We observe that a whitelisting approach for code regions belonging to the program can be a slippery road: not only malware and protected executables, but even COTS programs can exercise exotic behaviors like executing code from the heap or changing section permissions [6]. We find it safer instead to build a blacklist of return ranges for calls to discard, and populate it with the code section addresses of Windows DLLs by intercepting their loading and unloading. This scheme turned out to be robust and efficient: as the intervals are disjoint, we use an interval tree with logarithmic lookup cost. We complement the range lookup operation with ad-hoc measures for DLL tail jumps (§2.1) that we present in the next section.
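
Since the blacklisted ranges are disjoint, an ordered map keyed by range start already gives the logarithmic lookup mentioned above; the sketch below (our own simplification of the idea, not SNIPER's code) shows the operations involved.

#include <cstdint>
#include <map>

// Blacklist of [start, end) code ranges for loaded DLLs, assumed disjoint.
class RangeBlacklist {
    std::map<uint64_t, uint64_t> ranges_;                // start -> end
public:
    void add(uint64_t start, uint64_t end) { ranges_[start] = end; }
    void remove(uint64_t start)            { ranges_.erase(start); }   // on DLL unload
    bool contains(uint64_t addr) const {
        auto it = ranges_.upper_bound(addr);             // first range starting after addr
        if (it == ranges_.begin()) return false;
        --it;                                            // range starting at or before addr
        return addr >= it->first && addr < it->second;
    }
};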

4.3 Hook Insertion and Callbacks

4.3.1 API Entry.

As motivated in §3.3, we target RVAs of exported symbols from loaded DLLs in order to place hooks that interpose on API entry events.

DBI engines offer facilities to intercept system loader activities. Once we identify the base address of a DLL module of interest, we locate its export table, cross-reference the names of exported functions with a database of prototypes (the authors of PyREBox released a large DLL database that we use and refine in a few respects, e.g., to correctly distinguish INOUT arguments from OUT ones), and compute the run-time addresses of such APIs using their RVAs.

We instrument the first instruction in each function and register an onEntry analysis callback, hard-coding the address of the prototype information for the API as argument to avoid a run-time lookup. This approach is independent of the Windows version in use, and has performance advantages as we can use the IMAGE mode of Pin to place efficient ahead-of-time instrumentation [22].
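
A skeleton of this step as a Pin tool might look as follows; it is a sketch under simplifying assumptions rather than SNIPER's actual code: lookupPrototype is a hypothetical stand-in for the prototype database, and forwarders, 64-bit argument registers, and exit handling are ignored.

#include "pin.H"
#include <string>

// Hypothetical helper: maps (module, export name) to a prototype record from
// the prototype database, or nullptr if the export is not tracked.
VOID* lookupPrototype(const std::string& dllPath, const std::string& apiName) {
    return nullptr;   // stub: plug in the real prototype database here
}

// Analysis callback raised on every API entry event (body elided: it checks
// the range blacklist, updates the shadow stack, and logs input arguments).
VOID onEntry(THREADID tid, ADDRINT esp, VOID* prototype) { }

// Image instrumentation: runs once per loaded module, ahead of execution.
VOID onImage(IMG img, VOID*) {
    for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec)) {
        for (RTN rtn = SEC_RtnHead(sec); RTN_Valid(rtn); rtn = RTN_Next(rtn)) {
            VOID* proto = lookupPrototype(IMG_Name(img), RTN_Name(rtn));
            if (proto == nullptr) continue;              // not an exported API we track
            RTN_Open(rtn);
            RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)onEntry,
                           IARG_THREAD_ID,
                           IARG_REG_VALUE, REG_STACK_PTR,   // ESP at the API prologue
                           IARG_PTR, proto,                 // hard-coded prototype pointer
                           IARG_END);
            RTN_Close(rtn);
        }
    }
}

int main(int argc, char* argv[]) {
    PIN_InitSymbols();                                   // needed to resolve routine names
    if (PIN_Init(argc, argv)) return 1;
    IMG_AddInstrumentFunction(onImage, nullptr);
    PIN_StartProgram();                                  // never returns
    return 0;
}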

In the VT-based scenario we can equally parse export tables for RVAs, or load precomputed Rekall profiles [36] for the current Windows version. We insert an invisible breakpoint at the first instruction in each function, associating to it an onEntry analysis callback with the address of its prototype information as hard-coded argument. Compared to the DBI case, the callback will first determine whether the intercepted thread belongs to the pool of threads to be monitored.

Invisible breakpoints operate on physical pages: adding instrumentation at logical addresses requires their translation to physical ones. As Windows implements a lazy loading mechanism, when we intercept a DLL loading event not all of its pages may be amenable to hook insertion: in other words, for a given logical address there might be no physical page yet [15]. We wrote a component that loads DLLs of interest in a separate process and reads code from their sections, forcing page materialization: since such pages are normally shared among processes, we can place hooks also for the program of interest. This scheme still misses a few corner cases, but luckily other researchers concurrently developed a mechanism to force page faults and materialize pages upon DLL loading [16], using a new feature of libvmi that meanwhile became available. We have started to extend our implementation to integrate their technique.

function onEntry(threadID, ESP, prototype, ...):
 1    if *ESP in RangeBlacklist then return
 2    SStack = getTLS(threadID)
 3    if not SStack.empty() then
 4        cInfo = SStack.peek()
 5        if *ESP == cInfo.ra and ESP == cInfo.esp then return
 6    hookReturnAddress(*ESP)
 7    removeStaleEntries(SStack, ESP)
 8    SStack.push(*ESP, ESP, prototype)
 9    parseArgsOnEntry(ESP, prototype, ...)

function onExit(threadID, ESP, EIP, EAX, ...):
10    SStack = getTLS(threadID)
11    if SStack.empty() then return
12    idx = SStack.size() - 1
13    cInfo = SStack[idx]
14    while true do
15        if cInfo.ra == EIP or --idx < 0 then break
16        cInfo = SStack[idx]
17    if idx == -1 then return
18    if ESP == cInfo.esp + 4 + cInfo.prototype.retN then
19        parseArgsOnExit(cInfo.esp, cInfo.prototype, EAX, ...)
20        SStack.resize(idx)
                              
Figure 3: Analysis callbacks executed upon API entry and exit events.
onEntry callback.

The analysis callback takes as input the value of register ESP (to access the return address and parameters on the stack), a pointer to the prototype information for the current API, and further registers where needed (e.g., with 64-bit code). We provide its simplified pseudocode in Figure 3.

We maintain a thread-local shadow call stack of currently monitored functions (we use the plural as, within a thread, the concurrently active functions that return to user code may be multiple: this happens for instance when a program, e.g., a malware sample, dynamically loads a custom DLL file with LoadLibrary and its DllEntryPoint function invokes one or more APIs before LoadLibrary returns). Line 1 restricts logging to calls made in user code, or we would be mirroring also API calls within DLLs. Line 5 discards internal jumps and tail calls to other exported functions, which would see the same return address in program code as their caller (the current top stack entry). Line 6 deals with hooking the return address when we use strategy (b) for handling API exit events. Line 7 performs sanitization of stale stack frames in case of instrumentation glitches (if ESP is at a higher address than the ones stored, those calls likely returned already, since the stack grows downwards to lower addresses). Finally, lines 8 and 9 update the shadow stack and log the call, respectively.

4.3.2 API Exit.

In §3.3 we presented two strategies for tracing API returns: hooking exit instructions (a) or return addresses (b). For strategy (a) we can place hooks at DLL load time at the exit points identified with static analysis, while for strategy (b) we place them dynamically upon API entry events (line 6 of Figure 3).

In the VT-based scenario invisible breakpoints naturally support both strategies, as they can target arbitrary addresses. For Pin, strategy (a) was initially the only efficient option: Pin lacked a neat way to place hooks on specific instructions during execution without resorting to heavy-duty features like TRACE mode, while its ROUTINE mode has known reliability issues for catching routine exits [22]. Once the PIN_RemoveInstrumentationInRange API became available with Pin 3.2, we implemented strategy (b) by forcing Pin to recompile and reanalyze only the instruction at the return address. Such recompilation is needed only the first time a new return address is hit at line 6: as we will explain shortly, the analysis callback will ignore any subsequent spurious raises of the hook.
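
One possible way to wire this up in Pin, sketched below under our own assumptions (not necessarily SNIPER's exact mechanics), is to record newly seen return addresses, invalidate the corresponding code cache entry, and let an instrumentation-time callback attach onExit when the spot gets recompiled; hookReturnAddress corresponds to line 6 of Figure 3.

#include "pin.H"
#include <set>

static std::set<ADDRINT> hookedReturns;                  // return addresses seen so far
                                                         // (locking omitted for brevity)

// Analysis callback for API exit events; body elided (see Figure 3).
VOID onExit(THREADID tid, ADDRINT esp, ADDRINT eip, ADDRINT eax) { }

// Called from onEntry: remember the return address and, the first time we
// see it, force Pin to recompile that single spot.
VOID hookReturnAddress(ADDRINT ra) {
    if (hookedReturns.insert(ra).second)
        PIN_RemoveInstrumentationInRange(ra, ra);
}

// Instrumentation-time callback (registered with TRACE_AddInstrumentFunction):
// when a hooked return address is (re)compiled, attach the onExit analysis call.
VOID onTrace(TRACE trace, VOID*) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
            if (hookedReturns.count(INS_Address(ins)))
                INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)onExit,
                               IARG_THREAD_ID,
                               IARG_REG_VALUE, REG_STACK_PTR,
                               IARG_INST_PTR,
                               IARG_REG_VALUE, REG_GAX,
                               IARG_END);
}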

In §3.3 we also mentioned that strategy (b) reduces the fraction of times the onExit callback is invoked for uninteresting events that have to be discarded. There could be cases however where it may be more intrusive for the program under analysis (e.g., due to recompilation events in Pin), or an adversary knowing the details of the system may tamper with return addresses on the stack. We retain support for both schemes and leave the choice to the user. The two can coexist seamlessly, for instance when dealing with a new DLL or function for which we did not precompute exit points: we can instrument the return addresses for such calls, and use the other scheme for the rest of the APIs. This choice also helped us when developing the VT-based version, as for some exit instructions the insertion of an invisible breakpoint in DRAKVUF failed for no apparent reason, but we could fall back to hooking return addresses for that API.

onExit callback.

The analysis routine initially looks for the most recent stack entry matching the current return address, represented by the instruction pointer EIP. The pseudocode shown in Figure 3 is for strategy (b): for this reason lines 11 and 17 can be hit when a previously hooked return address is reached by blocks that did not make an API call (§3.3). This logic is semantically equivalent to turning off instrumentation at the return address, which may not always be cheap (and we wanted to minimize recompilations in Pin). A sanity check at line 18 compares the current ESP value against the one stored by onEntry for the frame, “undoing” the effects of the return instruction: after the return, the stack pointer value will be higher than the ESP seen by onEntry by r+N bytes, where r is 4 for 32-bit code and 8 for 64-bit code, while N is the space used on the stack in argument passing for stdcall APIs (in Windows APIs it is the callee’s responsibility to clean the stack for the caller) and zero for cdecl functions. We observed that in practice simply checking that ESP is higher than the stored cInfo.esp is a reliable approximation.

Once we located the shadow stack entry for the current API call, we invoke the routine for processing the return value and output parameters, passing to it the stack pointer value seen on entry, register EAX and where needed EDX for the return value, and any saved output arguments for 64-bit code.

When using strategy (a), as the callback triggers on an exit point, the instruction pointer EIP has not been diverted yet to the return address (which can however be found at *ESP), and the stack pointer has not been adjusted with the displacement associated with the return sequence. We can then adapt the code reported for onExit in Figure 3 by replacing EIP with *ESP at line 15 and by using ESP == cInfo.esp as the condition at line 18.

4.3.3 Parameter extraction.

The routines called by onEntry and onExit at lines 9 and 19, respectively, are conceptually similar. Both may have to locate data on the stack, computing offsets based on the size of each previous argument in the prototype. Logging primitive types is straightforward, while in the case of pointers we need to distinguish the type and size of the pointed object, but first of all verify whether a pointer is meaningful, i.e., if it points to valid data.

Ideally, a sound way would be to take into account the API semantics (e.g., check its return code to discriminate errors), but this may be unrealistic for a general-purpose tracer. We check instead if the pointed object falls into valid memory and call a print helper for its contents. This check is immediate for fixed-size objects such as primitive types or structures. For variable-size objects like strings we cannot rely on the presence of some terminator when an API fails: we conservatively fetch a predefined amount of bytes from the address, reducing it if the resulting chunk would span two pages with the second being invalid.
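
For the in-process (DBI) variant, the conservative fetch can be sketched as follows (our own simplification assuming 4 KB pages and direct reads, since the tracer shares the address space of the program; the VT-based variant performs equivalent checks through VMI reads).

#include <windows.h>
#include <cstdint>
#include <cstring>

// Read up to maxLen bytes from src into dst, but stop at the 4 KB page
// boundary if the next page is not committed; returns the bytes copied
// (0 if src itself is not backed by valid memory). Page protection flags
// are not checked here for brevity.
size_t safeFetch(const void* src, void* dst, size_t maxLen) {
    MEMORY_BASIC_INFORMATION mbi;
    if (!VirtualQuery(src, &mbi, sizeof(mbi)) || mbi.State != MEM_COMMIT)
        return 0;
    uintptr_t addr = reinterpret_cast<uintptr_t>(src);
    uintptr_t pageEnd = (addr & ~static_cast<uintptr_t>(0xFFF)) + 0x1000;
    size_t len = maxLen;
    if (addr + maxLen > pageEnd) {                       // read would spill onto the next page
        MEMORY_BASIC_INFORMATION next;
        if (!VirtualQuery(reinterpret_cast<void*>(pageEnd), &next, sizeof(next)) ||
            next.State != MEM_COMMIT)
            len = pageEnd - addr;                        // truncate at the boundary
    }
    memcpy(dst, src, len);
    return len;
}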

4.4 System Calls

In the presentation we postponed the discussion of how our implementations address syscalls, as there are some unique aspects to their handling. From the program’s perspective syscalls are self-contained: they happen between two context switches (to kernel mode when invoked, back to user mode upon termination), and no shadow stack update is needed. For prototype information we use the database from the DrSyscall module of the DynamoRIO DBI system [5]: it covers many undocumented syscalls, and also auxiliary data for distinguishing the parameter type in cases where it depends on another parameter of the call.

DBT systems can intercept when context-switching instructions are about to execute. In Pin we register two analysis callbacks for syscall entry and exit events: from there we extract the syscall ordinal, retrieve the corresponding prototype from the DrSyscall database, and extract the parameters for the call. In the VT-based scenario we cannot interpose on instructions by type, but we follow the design proposed by the authors of DRAKVUF and detailed in [26], that is, we instrument the entry and exit instructions of syscalls in the NT kernel of Windows (e.g., ntkrnlpa.exe when Physical Address Extension is enabled).
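
In Pin, the interposition on syscall entry and exit can be sketched as follows; lookupSyscallPrototype is a hypothetical stand-in for the DrSyscall-derived database.

#include "pin.H"
#include <iostream>

// Hypothetical helper: returns the prototype record for a syscall ordinal,
// or nullptr if the ordinal is unknown.
VOID* lookupSyscallPrototype(ADDRINT ordinal) { return nullptr; /* stub */ }

VOID onSyscallEntry(THREADID tid, CONTEXT* ctxt, SYSCALL_STANDARD scStd, VOID*) {
    ADDRINT ordinal = PIN_GetSyscallNumber(ctxt, scStd);
    VOID* proto = lookupSyscallPrototype(ordinal);
    if (proto == nullptr) return;
    // Input arguments are fetched by index according to the prototype:
    ADDRINT firstArg = PIN_GetSyscallArgument(ctxt, scStd, 0);
    std::cerr << "syscall " << ordinal << " arg0=" << firstArg << std::endl;
}

VOID onSyscallExit(THREADID tid, CONTEXT* ctxt, SYSCALL_STANDARD scStd, VOID*) {
    // Return value (and output arguments saved on entry) are logged here.
    ADDRINT ret = PIN_GetSyscallReturn(ctxt, scStd);
    std::cerr << "syscall returned " << ret << std::endl;
}

// Registered from main() with:
//   PIN_AddSyscallEntryFunction(onSyscallEntry, nullptr);
//   PIN_AddSyscallExitFunction(onSyscallExit, nullptr);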

We deem a syscall relevant if it returns either to program code directly (as with the direct syscalls commonly found in malware) or to some Nt library wrapper from ntdll.dll that was called from program code. For the latter case we walk back the stack mimicking the effects of the epilogue instructions of the wrapper: we check if the stack frame of the wrapper returns to program code or to an address in the DLL range blacklist, discarding the call in the second case.

4.5 Comparison with Other Tracing Solutions

We can now resume the discussion of Table 1. User-space monitoring solutions (the first three entries) introduce classic artifacts [C1] and have other limitations. In terms of recall [C2], none of them can catch direct syscalls, API Monitor misses calls to APIs resolved without using the IAT, and SpyStudio hooks only a selected number of APIs used deep in the software stack. The three tools have accurate prototypes [C3], and can trace output arguments [C4] as their stubs wrap API returns too. Filtering out internal calls is a task left to the user [C5]. Processes are traced as a whole, and users have to add child and injected processes to the monitoring manually (or using filters in WinAPIOverride) [C6].

drltrace and PyREBox fare well in terms of artifacts [C1], as they use DBT for hooking: drltrace builds on DBI with DynamoRIO, while PyREBox uses whole-system QEMU-TCG emulation. drltrace iterates on export tables to hook API prologues [C2] like we do. As limitations, it supports fewer APIs than other tools [C3], does not trace return values and output parameters [C4], and can only follow child processes [C6]. It has automatic filtering capabilities for internal calls [C5] by whitelisting the text and heap regions (we mention possible pitfalls in §4.2), but does not expunge tail transfers used for internal calls.

PyREBox interposes on every call and jmp instruction to hook API entry events [C2]: as explained in §3.3, this can add important overheads to the (already high) ones of QEMU [10]. PyREBox has a remarkable collection of prototypes [C3] that we borrow, and logs output parameters by hooking ret instructions in the execution [C4]. For internal calls users may only specify manual filtering policies for calls across specific pairs of libraries [C5]. For derived flows [C6] it follows child processes and conservatively monitors any process that the program interacts with via NtOpenProcess, while our solution is less noisy as it tracks only injected threads, ignoring the normal activities of a victim process.

5 Experimental Results

5.0.1 Validation.

To debug and stress our implementations we initially used system utilities and programs shipped with Windows, as they use numerous, heterogeneous APIs and occasionally syscalls. We then ran and verified the logs for deterministic programs such as the conformance tests of the Wine emulator, tools for assessing the transparency of malware sandboxes (as they exercise many low-level, occasionally undocumented primitives), and synthetic tests that we wrote and wrapped with state-of-the-art executable protectors. Our tracers currently run on top of Pin 3.11, Xen 4.12, and DRAKVUF commit 376c03d, and can track nearly 19K distinct APIs from 194 DLLs, plus 446 syscalls.

5.0.2 Statistics.

In Table 2 we report figures from running 11 malware samples (used in a recent work [9] for their assorted anti-analysis patterns) and 6 classic productivity programs. We tracked APIs from 12 popular DLLs covering different OS features: advapi32, crypt32, gdi32, iphlpapi, kernel32, kernelbase, ole32, oleaut32, shell32, user32, wininet, and ws2_32.dll. For our tests we used an Intel i9-8950HK CPU, 3 GB of RAM, Windows 7 SP1 build 7601 32-bit, and strategy (b) for handling API exits. The DBI and VT-based variants yielded consistent results in the events they recorded. We observed no significant changes when repeating the experiments under Windows 10.

Subject         # of syscalls        # of DLL calls                              DLL APIs (from progr.)         Avg call processing time (µs)
                progr.    internal   from progr.   int. (tail)   int. (normal)   distinct  out args  avg args   pr. entry  pr. exit  int. entry  sc. enter  sc. exit
APT28                0     408,045        50,577         1,934       1,153,200        130        29      3.20       14.38     15.76        3.16       3.28      2.33
BlackSquid           0      12,172         4,667           988          55,715        151        38      2.82       17.42     17.81       14.27      10.76      3.71
Furtim              88       1,511           541           371       2,365,887         71        25      2.49       16.53     30.79        2.69       3.61      4.27
Gootkit              0       3,068         4,737         4,478          31,507         79        23      1.37        5.61      8.27        8.69       4.17      2.86
Gozi-ISFB           19       1,509        13,449        11,019          22,180         75        28      1.54        5.60      6.45        3.18       4.13      9.45
Grobios              4         419           225           144           1,275         27        10      2.97       19.18     27.17        5.39       6.36      2.40
Olympic              0       1,129           434           298           4,726         64        26      3.26       18.38     23.36        4.59       8.22     16.83
SmokeLoader         15         485            49            27           1,019         28        10      3.94       29.50     21.20        9.86       5.39      2.53
softpulse            1       1,552         1,163           628          10,702         83        26      2.82       15.37     20.65        8.61       6.39      2.44
Swisyin              0       7,058            55            21          81,456         22         8      4.38       27.52     20.76        3.47      17.17      1.83
Untukmu              0     105,646        23,978        21,195       4,459,691         25         8      2.19       10.63     11.29        2.51       3.46      1.92
7zip                 0      28,398         5,922           294         139,152        112        26      3.80       24.36     23.37        4.55       3.89      2.41
BitTorrent           0     113,742       268,804       109,608         913,214        366       109      2.73       16.44     16.87        4.98       6.83      3.26
Chrome           1,821     263,839     1,586,718       684,236         755,054        398       145      2.99       19.68     21.56        8.83       9.21      2.64
Foxit Reader         2     150,490       946,903       205,319         818,568        396        93      3.45       17.78     19.69        4.33       5.23      2.12
Notepad++            0     315,440     2,955,873     1,645,034         725,638        231        36      2.04        8.63     10.57        3.90       3.15      2.24
TeamViewer           0     307,126       489,341        52,778       1,795,308        328        87      3.36       21.75     23.95        4.22       6.75      7.80

(Legend: "int. (tail)" and "int. (normal)" split internal DLL calls into tail-call and normal invocations; "pr." refers to calls from program code, "int. entry" to onEntry for internal calls, and "sc." to enter/exit analysis for internal syscalls.)

Table 2: API calls recorded on malicious samples and common productivity software.

The collected figures back the importance of distinguishing calls originating in program code from internal ones [C5], as the latter may be orders of magnitude more numerous. Syscalls from program code are few, but can reveal interesting details: for instance, for the Furtim malware they were critical for analysts to understand its adversarial strategies [37, 9]. For DLL functions we divide internal calls into normal and tail-call invocations: the ability to discard tail calls in onEntry (line 5 of Figure 3) seems valuable, as they can be as numerically relevant as calls from program code (e.g., Gootkit, Gozi-ISFB). The table also reports how many distinct DLL APIs program code invokes, how many of them have output arguments, and the average number of arguments of all kinds from their prototypes. Output arguments seem relevant in practice [C4], as they are present in 15–40% of the APIs that we observed.

We then measured the time spent executing our callbacks. The numbers shown in Table 2 refer to analysis code only, as probe insertion is a well-studied problem in DBI and VT-based research [5, 12, 18] that leaves us little optimization room. For DLL functions invoked from program code the average processing time was 5-31 µs for each API entry or exit event, with an apparent correlation with the number of arguments to process. Filtering out internal calls is cheaper: onEntry takes 3–15 µs to terminate after the range (line 1) or the tail call (line 5) check; onExit executes faster, likely due to locality effects, and we omit its figures since they are hardly significant (about 1 µs). As for syscalls, we report figures only for internal ones as they are dominant. Note that the enter analysis includes the cost of verifying the return address (§4.4). The extra cost when logging the arguments for syscalls from program code was on the order of a couple dozen µs.

6 Discussion

6.0.1 Limitations.

Our design does not make use of primitives restricted to specific instrumentation technologies. For this reason it is amenable to different implementations, and we chose two systems that can cope well with the threat model of [C1] and the other requirements [C2-6]. (Instrumenting QEMU for whole-system analysis did not seem appealing to us, as native execution under VT-based schemes is faster and brings fewer artifacts. We foresee however no major obstacle to accommodating our design: popular QEMU-based projects like [10] offer infrastructure to hook specific addresses that we could borrow, and our VMI-based thread tracking mechanism would work in QEMU as well.) Our design does not counter evasions targeting the peculiarities of the underlying chosen system. For this well-studied problem the implementation can resort to existing mitigations, such as patching the Time Stamp Counter in the VM monitor upon VM exit events [4], or hiding DBI runtime artifacts as we did by using the mitigation library of [8].

Kawakoya et al. [23] resort to taint analysis for an adversarial model of API monitoring in which an attacker can evade hooks by emulating with its own instructions the initial portion of an API before jumping into the middle of its canonical implementation. We can cope with popular forms of such stolen code attacks by moving the hook from the initial instruction of an API to a later basic block (for instance, one that post-dominates the entry block in the control flow graph) where parameters are still visible. The authors also consider code injection attacks that elude monitoring using other processes: we tackle this surface by tracking child processes and remote threads. As we already hook system APIs, the implementation can be extended to recognize new exotic injections [25].

Our tracers are deceived by non-standard library loading. One attack described in [23]—and countered by the authors using disk-level taint analysis—consists in copying a system DLL to a non-standard path and obfuscating the symbols in its export table before loading. [24] describes a more complex attack where the adversary reimplements the Windows loader, and recursively rewires every import referencing other DLLs to use stealth copies of such libraries, so that calls to “standard” API functions are never made in the program. As future work we are thinking of exploring the countermeasures suggested in [24] to extend our design and handle also these attacks against API name resolution.

6.0.2 Other Related Works.

For dynamic analysis of binary programs, researchers have long used designs operating alongside the object of the analysis. In the context of monitoring interactions with the environment, systems of this kind have ranged from operation-specific tracers (e.g., [33]) to full-fledged sandboxes. A common strategy was to patch the functions of interest [17]. Microsoft Detours [21] offered general API hooking primitives based on trampolines to invoke a user-defined function, for instance to sanitize sensitive arguments. Similarly, in its day the pioneering CWSandbox [41] replaced the first 5 bytes of each API under observation with a trampoline to an analysis callback.

Code changes and artifacts introduced by such approaches worried researchers and practitioners [17]. In a seminal work [19] Garfinkel and Rosenblum proposed to move an intrusion detection system from the guest to the VM monitor, using VMI techniques to inspect the guest with better transparency and isolation. VMI was later adopted in many other scenarios, first and foremost malware analysis and memory forensics. Ether [13] pioneered low-artifact malware analysis with a system based on VT extensions with syscall tracing capabilities. Many works (e.g., [12]) have then followed in its footsteps, with important contributions in terms of improved transparency and flexibility.

To the best of our knowledge, SNIPER is the first attempt to extend VT-based monitoring to user-space API calls in an automated, general-purpose manner. Our project shares similarities with hprobes [18], a framework that uses VT extensions for warm hook insertion in user space. The work discusses three software dependability case studies: an emergency exploit detector, a watchdog, and an infinite-loop detector. To insert hooks hprobes overwrites instructions of interest with int3: this causes a VM exit upon execution, and the VM monitor kicks in to carry out the analysis. We find hprobes to serve a purpose orthogonal to ours, as it means to back generic, user-supplied analyses for specific events. The technique is also not transparent to checksumming attempts [C1].

This limitation is shared by designs for secure hook insertion inside a VM with OS modifications (e.g., [35, 38]), which are alternative to invisible breakpoints. Recent developments in this area [28, 39] feature more efficient isolation using the VMFUNC feature of VT extensions but introduce distinguishable code artifacts.

Dynamic binary translation systems are a popular choice for security analyses that require fine-grained instrumentation capabilities, such as tracking instructions by their semantics (e.g., reading memory) or performing substantial code modifications. DBT systems usually offer better transparency and flexibility than binary patching [8], although they may incur emulation artifacts [30]. A recent work [11] uses DBI for real-time function call detection for the internal functions of a program. Unlike API functions, entry points for those are not declared in the executable. The authors show how to scrutinize control transfer instructions to identify function calls reliably. It would be interesting to compare this approach in terms of recall and efficiency with a solution combining our design for on-entry hooks with recent advances for function detection in binaries [2].

7 Conclusions

API monitoring is a valuable technique in many security scenarios. We have shared our experience in tracing solutions for Windows binaries and their multifaceted universe of challenges, discussing general design options amenable to different implementation technologies. We also detailed the first tracer that builds on VT extensions, today popular for their efficiency and transparency. Our techniques are general: they make no assumption on how a program is compiled or obfuscated, but only on the calling conventions in use. Thus they may be applied also to other systems such as Linux and MacOS with some adaptations for the structure of libraries and their loading. We hope our readers will find the design points and technical insights presented in the paper useful for their research.

References

  • [1] A. Afianian, S. Niksefat, B. Sadeghiyan, and D. Baptiste (2019-11) Malware dynamic analysis evasion techniques: a survey. ACM Comput. Surv. 52 (6). External Links: ISSN 0360-0300, Document Cited by: §3.1.
  • [2] D. Andriesse, A. Slowinska, and H. Bos (2017-04) Compiler-Agnostic Function Detection in Binaries. In Proceedings of the 2nd IEEE European Symposium on Security and Privacy (EuroS&P’17), Paris, France. Cited by: §6.0.2.
  • [3] F. Bellard (2005) QEMU, a fast and portable dynamic translator. In Proceedings of the Annual Conference on USENIX Annual Technical Conference, ATEC ’05. External Links: Link Cited by: §2.2.
  • [4] M. Brengel, M. Backes, and C. Rossow (2016) Detecting hardware-assisted virtualization. In Detection of Intrusions and Malware, and Vulnerability Assessment, J. Caballero, U. Zurutuza, and R. J. Rodríguez (Eds.), Cham, pp. 207–227. External Links: ISBN 978-3-319-40667-1 Cited by: §6.0.1.
  • [5] D. Bruening, T. Garnett, and S. Amarasinghe (2003) An infrastructure for adaptive dynamic optimization. In Proc. of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO ’03, pp. 265–275. External Links: ISBN 0-7695-1913-X, Link Cited by: §2.2, §4.4, §5.0.2.
  • [6] D. Bruening, Q. Zhao, and S. Amarasinghe (2012) Transparent dynamic instrumentation. In Proceedings of the 8th ACM SIGPLAN/SIGOPS Conference on Virtual Execution Environments, VEE ’12, pp. 133–144. External Links: ISBN 978-1-4503-1176-2, Document Cited by: §4.1, §4.2.
  • [7] Cisco Talos PyREBox: Python scriptable reverse engineering sandbox. Note: https://talosintelligence.com/pyrebox (Accessed: April 20, 2020) Cited by: Table 1.
  • [8] D. C. D’Elia, E. Coppa, S. Nicchi, F. Palmaro, and L. Cavallaro (2019) SoK: using dynamic binary instrumentation for security (and how you may get caught red handed). In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, Asia CCS ’19, pp. 15–27. External Links: Document Cited by: §2.2, §3.3, §4.1, §4.1, §4.2, §6.0.1, §6.0.2.
  • [9] D. C. D’Elia, E. Coppa, F. Palmaro, and L. Cavallaro (2020) On the dissection of evasive malware. IEEE Transactions on Information Forensics and Security 15, pp. 2750–2765. Cited by: §5.0.2, §5.0.2.
  • [10] A. Davanian, Z. Qi, Y. Qu, and H. Yin (2019-09) DECAF++: elastic whole-system dynamic taint analysis. In 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019), Chaoyang District, Beijing, pp. 31–45. External Links: ISBN 978-1-939133-07-6, Link Cited by: §4.5, footnote 6.
  • [11] F. de Goër, S. Rawat, D. Andriesse, H. Bos, and R. Groz (2018) Now you see me: real-time dynamic function call detection. In Proceedings of the 34th Annual Computer Security Applications Conference, ACSAC ’18, pp. 618–628. External Links: ISBN 978-1-4503-6569-7, Document Cited by: §6.0.2.
  • [12] Z. Deng, X. Zhang, and D. Xu (2013) SPIDER: stealthy binary program instrumentation and debugging via hardware virtualization. In Proceedings of the 29th Annual Computer Security Applications Conference, ACSAC ’13, pp. 289–298. External Links: ISBN 978-1-4503-2015-3, Document Cited by: §2.2, §5.0.2, §6.0.2.
  • [13] A. Dinaburg, P. Royal, M. Sharif, and W. Lee (2008) Ether: malware analysis via hardware virtualization extensions. In Proc. of the 15th ACM Conference on Computer and Communications Security, CCS ’08, New York, NY, USA, pp. 51–62. External Links: ISBN 978-1-59593-810-7, Document Cited by: §6.0.2.
  • [14] B. Dolan-Gavitt, T. Leek, M. Zhivich, J. Giffin, and W. Lee (2011) Virtuoso: narrowing the semantic gap in virtual machine introspection. In Proceedings of the 2011 IEEE Symposium on Security and Privacy, SP ’11, pp. 297–312. External Links: ISBN 978-0-7695-4402-1, Document Cited by: §2.2.
  • [15] DRAKVUF project page Discussion on improvements around usermode hooking. Note: https://github.com/tklengyel/drakvuf/issues/669 (Accessed: April 20, 2020) Cited by: §4.3.1.
  • [16] DRAKVUF project page Memdump: dumps based on user mode API calls. Note: https://github.com/tklengyel/drakvuf/pull/675 (Accessed: April 20, 2020) Cited by: §4.3.1.
  • [17] M. Egele, T. Scholte, E. Kirda, and C. Kruegel (2008-03) A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv. 44 (2), pp. 6:1–6:42. External Links: ISSN 0360-0300, Document Cited by: §1, §2.1, item [C1] Transparency., §3.3, §6.0.2, §6.0.2.
  • [18] Z. J. Estrada, C. Pham, F. Deng, L. Yan, Z. Kalbarczyk, and R. K. Iyer (2015-Sep.) Dynamic vm dependability monitoring using hypervisor probes. In 2015 11th European Dependable Computing Conference (EDCC), Vol. , pp. 61–72. External Links: Document, ISSN null Cited by: §1, §5.0.2, §6.0.2.
  • [19] T. Garfinkel and M. Rosenblum (2003) A virtual machine introspection based architecture for intrusion detection. NDSS ’03. Cited by: §6.0.2.
  • [20] V. Golender, I. Ben Moshe, and S. Wygodny (US Patent 7386839B1, Jun 2008) SYSTEM and method for troubleshooting software configuration problems using application tracing. Cited by: §1.
  • [21] G. Hunt and D. Brubacher (1999-07) Detours: binary interception of win32 functions. In Third USENIX Windows NT Symposium, Third USENIX Windows NT Symposium edition, pp. 8. External Links: Link Cited by: §6.0.2.
  • [22] Intel Instrumentation granularity. In Pin official documentation (release 97998), Note: https://software.intel.com/sites/landingpage/pintool/docs/97998/Pin/html/ (Accessed: April 20, 2020) Cited by: §4.3.1, §4.3.2.
  • [23] Y. Kawakoya, M. Iwamura, E. Shioji, and T. Hariu (2013) API Chaser: anti-analysis resistant malware analyzer. In Research in Attacks, Intrusions, and Defenses, RAID ’13, pp. 123–143. External Links: ISBN 978-3-642-41284-4 Cited by: item [C6] Derived flows., §6.0.1, §6.0.1.
  • [24] Y. Kawakoya, E. Shioji, Y. Otsuki, M. Iwamura, and T. Yada (2017) Stealth loader: trace-free program loading for api obfuscation. In Research in Attacks, Intrusions, and Defenses, RAID ’17, pp. 217–237. Cited by: §6.0.1.
  • [25] A. Klein and I. Kotler (2019) Windows process injection in 2019 (process injection techniques - gotta catch them all). Black Hat USA. Note: https://i.blackhat.com/USA-19/Thursday/us-19-Kotler-Process-Injection-Techniques-Gotta-Catch-Them-All-wp.pdf (Accessed: April 20, 2020) Cited by: §6.0.1.
  • [26] T. K. Lengyel, S. Maresca, B. D. Payne, G. D. Webster, S. Vogl, and A. Kiayias (2014) Scalability, fidelity and stealth in the DRAKVUF dynamic malware analysis system. In Proceedings of the 30th Annual Computer Security Applications Conf., ACSAC ’14, pp. 386–395. External Links: ISBN 978-1-4503-3005-3, Document Cited by: §2.2, §4.1, §4.4.
  • [27] LibVMI Note: https://github.com/libvmi/libvmi (Accessed: April 20, 2020) Cited by: §4.1.
  • [28] Y. Liu, T. Zhou, K. Chen, H. Chen, and Y. Xia (2015) Thwarting memory disclosure with efficient hypervisor-enforced intra-domain isolation. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, New York, NY, USA, pp. 1607–1619. External Links: ISBN 9781450338325, Document Cited by: §6.0.2.
  • [29] C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood (2005) Pin: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pp. 190–200. External Links: ISBN 1-59593-056-6, Document Cited by: §2.2, §4.1.
  • [30] L. Martignoni, R. Paleari, A. Reina, G. F. Roglia, and D. Bruschi (2013-10) A methodology for testing cpu emulators. ACM Trans. Softw. Eng. Methodol. 22 (4). External Links: ISSN 1049-331X, Document Cited by: §6.0.2.
  • [31] Microsoft Developer Blogs Exported functions that are really forwarders. Note: https://devblogs.microsoft.com/oldnewthing/?p=30473 (Accessed: April 20, 2020) Cited by: §2.1.
  • [32] Microsoft Header annotations. Note: https://docs.microsoft.com/en-us/windows/win32/winprog/header-annotations (Accessed: April 20, 2020) Cited by: §2.1.
  • [33] Microsoft SysInternals suite. Note: https://docs.microsoft.com/en-us/sysinternals/downloads/regmon (Accessed: April 20, 2020) Cited by: §6.0.2.
  • [34] Nektra Deviare API hook. Note: https://github.com/nektra/deviare2/ (Accessed: April 20, 2020) Cited by: §3.2.
  • [35] B. D. Payne, M. Carbone, M. Sharif, and W. Lee (2008-05) Lares: an architecture for secure active monitoring using virtualization. In 2008 IEEE Symposium on Security and Privacy (sp 2008), Vol. , pp. 233–247. External Links: Document, ISSN 2375-1207 Cited by: §6.0.2.
  • [36] Rekall Note: http://www.rekall-forensic.com/ (Accessed: April 20, 2020) Cited by: §4.3.1.
  • [37] SentinelOne (2016) SFG: furtim malware analysis. Technical report Note: https://www.sentinelone.com/blog/sfg-furtims-parent/ (Accessed: April 20, 2020) Cited by: §5.0.2.
  • [38] M. I. Sharif, W. Lee, W. Cui, and A. Lanzi (2009) Secure in-vm monitoring using hardware virtualization. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS ’09, pp. 477–487. External Links: ISBN 978-1-60558-894-0, Document Cited by: §6.0.2.
  • [39] B. Shi, L. Cui, B. Li, X. Liu, Z. Hao, and H. Shen (2018) ShadowMonitor: an effective in-vm monitoring framework with hardware-enforced isolation. In Research in Attacks, Intrusions, and Defenses, M. Bailey, T. Holz, M. Stamatogiannakis, and S. Ioannidis (Eds.), Cham, pp. 670–690. External Links: ISBN 978-3-030-00470-5 Cited by: §6.0.2.
  • [40] M. Shudrak, D. Bruening, and J. Testa Drltrace. Note: https://github.com/mxmssh/drltrace (Accessed: April 20, 2020) Cited by: Table 1.
  • [41] C. Willems, T. Holz, and F. Freiling (2007-03) Toward automated dynamic malware analysis using CWSandbox. IEEE Security Privacy 5 (2), pp. 32–39. External Links: Document, ISSN 1558-4046 Cited by: §6.0.2.
  • [42] Zynamics MSDN crawler. Note: https://github.com/zynamics/msdn-crawler/ (Accessed: April 20, 2020) Cited by: §3.2.

Appendix 0.A Additional Material

In §3.3 we have mentioned that when chasing return addresses with strategy (b) to hook API exit events, the instruction corresponding to the return address for some API call may be a join point in the control flow graph of the caller. If we insert a hook there and do not remove it after the call terminates—for instance because hook deletion brings overhead that we wish to avoid—the analysis callback should distinguish whether it is intercepting a real API exit event.

In the example below, taken from the 32-bit calc.exe shipped with Windows 7 SP1 64-bit (file version 6.1.7601.17514), we instrumented the instruction at address 10020cf when we first intercepted the call to the LocalFree API (kernel32.dll) from its enclosing function. However, subsequent invocations of the latter eventually reach this address also from another basic block, namely the entry block, which does not end with an API call. The logic of the analysis should discard these events: our implementation would not find a valid shadow call stack entry for it. We found other instances of this pattern in calc.exe (e.g., at 100367e, 100aaba, and 100cec3) and in several other Windows utilities.

Figure 4: Address 10020cf in calc.exe is a join point in the control flow graph of its enclosing function: it can be reached either by a conditional jump from the entry basic block of its function, or as a fall-through for the call to the LocalFree API function.