National Vulnerability Database (NVD) records show that in the past two years, the number of buffer overflow cases was almost three times that of previous years [NIST]. Several C library functions perform no bounds checking on their arguments, which can silently lead to memory corruption. Many operating system kernels and drivers are written in such unsafe languages, which allows an attacker to exploit the system. An attacker typically overflows the space of one or more variables to modify control data and hijack the program’s control flow.
Ever since stack smashing [Smashing] was introduced in the 1990s, the stack has been the target of multiple attacks. Many software and hardware approaches have been proposed to prevent such attacks. Software-based approaches perform static and lexical analysis of the source code or binary to find vulnerable functions, function calls, and illegal accesses to array elements. Tools such as FlawFinder [Flawfinder], RATS [RATS], and LibSafe [libsafe] perform an exhaustive search that matches tokens in the program against a database of known vulnerabilities. However, software-only approaches are largely limited to debugging and can incur performance overhead as high as 30X [30x]. In contrast, hardware solutions are faster and transparent to the running process. A non-executable stack [nonex], Address Space Layout Randomization (ASLR) [ASLR], and a canary word placed next to the return address [StackGuard] have been effective, albeit with extra instruction overhead. Maintaining a shadow stack [ASLR] incurs performance overhead as high as 10%. Most solutions proposed so far either insert additional code to check bounds or maintain an expensive memory isolation strategy.
In this paper, we propose a novel approach that extracts the memory space of program variables by instrumenting instructions at runtime. We store this information in a table called the Variable Record Table (VRT). To extract the space of static variables in the stack, we track frame pointer operations in instructions. For heap space, we instrument the argument and return registers when an instruction calls a dynamic memory function. The VRT is built at runtime solely by instrumenting the executing object file. With this variable-level information, the processor itself can handle memory-related security issues. We use VRT entries to check for out-of-bound accesses during a buffer overflow. The contributions of this work are as follows:
Extraction of the runtime memory space of program variables.
Detection of buffer overflows using VRT entries.
Experiments on the MIT Static Corpus benchmark [Kartkiewicz2009] (290 C programs) using the SimpleScalar toolset [SimpleScalar] show that our approach successfully detects different buffer overflow cases. We also demonstrate that it detects buffer overflows with no additional instruction overhead. The memory overhead of maintaining the VRT for a program with 324 variables was only 13 Kb.
The remainder of the paper is organized as follows. Section II presents background on buffer overflows and attacks on process memory. Section III describes the proposed approach in detail. Section IV reports the experimental results. Section V draws conclusions and outlines future directions.
II-A Buffer Overflow
During a buffer overflow, an access to a variable goes beyond its allocated space and may overwrite data in adjacent memory. Such illegal accesses pose a security threat during program execution. Fig. 1 shows an exploitable program with an overflow. The strcpy() function performs no bounds check on the destination variable p, so unrestricted source data supplied by the user through argv can overrun p’s heap space and corrupt the adjacent memory.
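The missing check can be made concrete with a short C sketch. strcpy() writes strlen(src) + 1 bytes no matter how large the destination is; the hypothetical helper checked_copy() below (not part of Fig. 1 or of any standard library) performs the bound check that strcpy() omits and refuses the copy instead of overflowing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, including the
 * terminating NUL. dst_size is the size of the destination buffer in bytes.
 * Returns 0 on success, -1 if the copy would overflow dst. */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* bytes strcpy() would write */
    if (dst == NULL || needed > dst_size)
        return -1;                     /* reject instead of overflowing */
    memcpy(dst, src, needed);
    return 0;
}
```

For an 8-byte buffer, a 7-character string fits together with its terminator, while an 8-character string is rejected.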
II-B Process Memory
Fig. 2(b) depicts the main memory space of a running process. It consists of code and data segments that store program instructions and data, respectively. The stack segment, shown in Fig. 2(c), stores the local variables of each function, whose sizes are fixed at compile time (referred to as static variables in this paper). It also stores additional information needed to execute each function, such as its arguments, return address, and the previous frame pointer. In contrast, the heap segment, shown in Fig. 2(a), stores data allocated dynamically at runtime. Each heap block is linked to a pointer variable in the stack space (color-coded in the figure). Heap space can be allocated or freed dynamically through library functions, whereas the stack space of a function call is fixed.
III Proposed Approach and Implementation
In this section, we first present the extraction of static and dynamic memory space, followed by a discussion of the VRT and its format. We then discuss how the VRT is used in different cases of buffer overflow.
III-A Static Variable Space
In the stack, the compiler lays out the static variable space of a function sequentially. This space begins after the return address, frame pointer, saved registers, and argument(s) from the previous function. To find the base address of each local variable, we track the frame pointer offsets in load (lw) and store (sw) instructions. The difference between the addresses of consecutive variables gives the size of the allocated space.
In Fig. 3, three variables of int type are declared (a basic variable, an array, and a pointer); the corresponding MIPS assembly code is shown in Fig. 4. Once the frame pointer is initialized with the stack pointer address (line 7), the local variables’ space is allocated based on their size and order of use in the program. The frame pointer is used with offset 16 for the array (line 9), 40 for the pointer (line 10), and 44 for the variable i (line 11). To find a bound, we subtract the base addresses of two adjacent variables. For example, the bound of the array is 40 - 16 = 24. For the last variable (i), the bound is obtained by subtracting its offset from the offset at line 6, where the function finishes storing the previous function’s data.
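The bound computation above can be sketched in C as a software illustration of the arithmetic, not of the hardware mechanism. bounds_from_offsets() is a hypothetical name, and frame_top (the offset where the previous function’s saved data begins, i.e., the line 6 offset) is an assumed input because its value is not given in the text.

```c
#include <stddef.h>

/* Hypothetical sketch: offsets[] holds the frame-pointer offsets of the
 * local variables in increasing order (as observed in lw/sw instructions).
 * Each bound is the distance to the next variable's offset; the last
 * variable is bounded by frame_top, an assumed input giving the offset
 * where the saved data of the previous function begins. */
void bounds_from_offsets(const int *offsets, size_t n, int frame_top, int *bounds)
{
    for (size_t i = 0; i < n; i++) {
        int next = (i + 1 < n) ? offsets[i + 1] : frame_top;
        bounds[i] = next - offsets[i];
    }
}
```

With the Fig. 4 offsets 16, 40, and 44 and an assumed frame_top of 48, the computed bounds are 24 (array), 4 (pointer), and 4 (i).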
III-B Dynamic Variable Space
Dynamic variable space is reserved or freed using library functions. malloc(), realloc(), free(), and their variants are the dynamic memory allocation (DMA) functions in C. We now present the technique for extracting the variable space from the assembly code of these functions.
malloc(): This function takes a size in bytes and returns the base address of a newly allocated heap block.
The instruction-level decomposition of the above malloc() call is shown below. At the function call (jal), the argument register ($4) holds the requested size, and upon return, the return register ($2) holds the heap address.
In realloc(), the argument registers $4 and $5 hold the previous heap address and the new size, respectively, at the time of the call, and $2 holds the new heap address after the function returns. Similarly, for free(), argument register $4 holds the base address of the heap block to be freed. Thus, from the contents of the argument and return registers, we can track each DMA function call and distinguish between them.
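The register-based tracking described above can be modeled in software. The sketch below is hypothetical: dma_call, heap_table, and track_dma() are illustrative names, the table capacity is arbitrary, and the three register arguments stand for the values of $4 and $5 at the jal and of $2 after return. It does not model calloc() or realloc(NULL, n).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: which DMA function a jal targets. */
typedef enum { CALL_MALLOC, CALL_REALLOC, CALL_FREE } dma_call;

/* One tracked heap block: base address and size in bytes. */
typedef struct { uint32_t base; uint32_t size; } heap_rec;

#define MAX_HEAP_RECS 64 /* arbitrary capacity for this sketch */

typedef struct { heap_rec rec[MAX_HEAP_RECS]; size_t n; } heap_table;

/* Return the index of the record with the given base, or -1. */
static int find_base(const heap_table *t, uint32_t base)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->rec[i].base == base)
            return (int)i;
    return -1;
}

/* Update the table from one observed call.
 * a0, a1: values of $4 and $5 at the jal; v0: value of $2 after return.
 * Returns 0 on success, -1 if the call could not be tracked. */
int track_dma(heap_table *t, dma_call call, uint32_t a0, uint32_t a1, uint32_t v0)
{
    int i;
    switch (call) {
    case CALL_MALLOC:   /* $4 = size, $2 = base of the new block */
        if (v0 == 0 || t->n == MAX_HEAP_RECS)
            return -1;  /* failed allocation or table full */
        t->rec[t->n].base = v0;
        t->rec[t->n].size = a0;
        t->n++;
        return 0;
    case CALL_REALLOC:  /* $4 = old base, $5 = new size, $2 = new base */
        i = find_base(t, a0);
        if (i < 0 || v0 == 0)
            return -1;  /* unknown block, or realloc failed (old block kept) */
        t->rec[i].base = v0;
        t->rec[i].size = a1;
        return 0;
    case CALL_FREE:     /* $4 = base of the block being freed */
        i = find_base(t, a0);
        if (i < 0)
            return -1;
        t->rec[i] = t->rec[t->n - 1]; /* delete by moving the last record */
        t->n--;
        return 0;
    }
    return -1;
}
```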
III-C Variable Record Table (VRT)
Table I shows the VRT layout, which has three columns: an associated bit (1 bit), the base address (32 bits), and the bound value (8 bits). Each VRT entry therefore uses 41 bits. The associated bit distinguishes the entries of one function from those of another. As shown in Table I, the top three entries, which belong to one function, have an associated bit different from that of the function with two entries. Because the current function’s entries are at the top of the table, the entries of a newly called function take the opposite associated bit so that they remain distinguishable from the current function’s entries. The associated bit also links entries to a particular function: when the function returns, all entries at the top of the table that share its associated bit are flushed.
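One possible software encoding of a 41-bit entry packs the three Table I fields into a 64-bit word. The bit positions and helper names below are assumptions for illustration; the text specifies only the field widths.

```c
#include <stdint.h>

/* Hypothetical packing of one 41-bit VRT entry into a uint64_t:
 * bit 40 = associated bit, bits 39..8 = 32-bit base address,
 * bits 7..0 = 8-bit bound. The bit order is an assumption. */
#define VRT_ENTRY_BITS 41

static inline uint64_t vrt_pack(unsigned assoc, uint32_t base, uint8_t bound)
{
    return ((uint64_t)(assoc & 1u) << 40) | ((uint64_t)base << 8) | bound;
}

/* Field extractors for the layout above. */
static inline unsigned vrt_assoc(uint64_t e) { return (unsigned)((e >> 40) & 1u); }
static inline uint32_t vrt_base(uint64_t e)  { return (uint32_t)(e >> 8); } /* truncation drops bit 40 */
static inline uint8_t  vrt_bound(uint64_t e) { return (uint8_t)(e & 0xFFu); }
```

Because the bound field is 8 bits wide, the largest bound it can hold is 255.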
III-C1 Populating VRT
During program execution, each jump to a function with local variables, or to a DMA function such as malloc(), populates the VRT. Each new stack address generated from the frame pointer is stored as an entry whose bound is provisionally measured relative to the upper bound of the frame; when the next variable’s entry is added, we update this bound. For malloc(), we push a new entry into the table. For realloc(), we search the entries for a matching base address and update that entry with the new base address and bound.
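A minimal software model of this population rule follows, assuming that each observed frame-pointer address is a variable’s base, that the current function’s variables are observed in increasing address order, and that the upper bound of the frame (frame_top) is known. The vrt type and vrt_add_stack_var() are hypothetical names, and heap entries are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software VRT entry and table (newest entry on top). */
typedef struct { uint32_t base; uint32_t bound; unsigned assoc; } vrt_entry;

#define VRT_CAP 128 /* arbitrary capacity for this sketch */

typedef struct { vrt_entry e[VRT_CAP]; size_t top; } vrt; /* top = entry count */

/* Record a frame-pointer-relative address observed in an lw/sw.
 * Returns 0 on success (or if addr is already recorded), -1 otherwise. */
int vrt_add_stack_var(vrt *t, uint32_t addr, uint32_t frame_top, unsigned assoc)
{
    /* Skip addresses already recorded for the current function. */
    for (size_t i = t->top; i > 0 && t->e[i - 1].assoc == assoc; i--)
        if (t->e[i - 1].base == addr)
            return 0;
    if (t->top == VRT_CAP || addr >= frame_top)
        return -1;
    /* Trim the previous variable of this function so it ends at addr. */
    if (t->top > 0) {
        vrt_entry *prev = &t->e[t->top - 1];
        if (prev->assoc == assoc && prev->base < addr)
            prev->bound = addr - prev->base;
    }
    /* New entry: provisional bound up to the frame's upper bound. */
    t->e[t->top] = (vrt_entry){ addr, frame_top - addr, assoc };
    t->top++;
    return 0;
}
```

Using the Fig. 4 offsets relative to a hypothetical frame at 0x7fff0000 with frame_top at offset 48 (0x30), the array’s provisional bound of 32 bytes is trimmed to 24 when the pointer’s entry is added.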
III-C2 Deleting Entries from VRT
Entries must be removed from the table as the program releases memory or exits functions. A call to free() deletes the corresponding entry, while a return from a function requires deleting all of that function’s entries; in this case, the entries at the top of the table that share the same associated bit are deleted.
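Both deletion rules can be sketched over a software table of the same kind. The names are hypothetical, and the sketch keeps stack and heap entries in one table without modeling how a heap entry outlives the function in which it was created, which the text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software VRT: entries stacked with the newest on top. */
typedef struct { uint32_t base; uint32_t bound; unsigned assoc; } vrt_entry;

#define VRT_CAP 128 /* arbitrary capacity for this sketch */

typedef struct { vrt_entry e[VRT_CAP]; size_t top; } vrt; /* top = entry count */

/* free(): delete the entry whose base equals the freed address ($4).
 * Later entries shift down so the table keeps its order. */
int vrt_free(vrt *t, uint32_t base)
{
    for (size_t i = t->top; i > 0; i--) {
        if (t->e[i - 1].base == base) {
            for (size_t j = i - 1; j + 1 < t->top; j++)
                t->e[j] = t->e[j + 1];
            t->top--;
            return 0;
        }
    }
    return -1; /* no entry with this base */
}

/* Function return: flush every top-of-table entry that shares the top
 * entry's associated bit. Returns the number of entries removed. */
size_t vrt_flush_on_return(vrt *t)
{
    if (t->top == 0)
        return 0;
    unsigned a = t->e[t->top - 1].assoc;
    size_t removed = 0;
    while (t->top > 0 && t->e[t->top - 1].assoc == a) {
        t->top--;
        removed++;
    }
    return removed;
}
```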
III-D VRT Overheads
The VRT size is proportional to the number of entries it holds at any time. The VRT grows and shrinks dynamically during program execution: when a function returns, its entries are deleted, and the VRT relinquishes space that is reused for the entries of new functions.
The overhead of searching for a particular entry can be greatly reduced by restricting the search to the current function’s entries, which lie at the top of the table; the associated bit identifies where those entries end.
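The restricted search can be sketched as a scan from the top of the table that stops at the first entry whose associated bit differs from the top entry’s. vrt_lookup_current() and the types are hypothetical and describe a software model rather than the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software VRT: entries stacked with the newest on top. */
typedef struct { uint32_t base; uint32_t bound; unsigned assoc; } vrt_entry;

#define VRT_CAP 128 /* arbitrary capacity for this sketch */

typedef struct { vrt_entry e[VRT_CAP]; size_t top; } vrt; /* top = entry count */

/* Search only the current function's entries: scan down from the top
 * while the associated bit matches the top entry's. Returns the index of
 * the entry whose [base, base + bound) contains addr, or -1. */
long vrt_lookup_current(const vrt *t, uint32_t addr)
{
    if (t->top == 0)
        return -1;
    unsigned a = t->e[t->top - 1].assoc;
    for (size_t i = t->top; i > 0 && t->e[i - 1].assoc == a; i--) {
        const vrt_entry *x = &t->e[i - 1];
        if (addr >= x->base && addr - x->base < x->bound)
            return (long)(i - 1);
    }
    return -1;
}
```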
III-E Buffer Overflow and VRT
Once the base and bound of the local variables are populated in the VRT, we can check each array offset and pointer operation for invalid memory addresses. In the following subsections, we discuss two representative cases of illegal access.
III-E1 Constant Variable Index
Directly accessing an array with a constant index beyond the array’s range is an out-of-bound access. If unchecked, such an access can corrupt data outside the variable’s space.
The assembly code for such an array access is shown below. The offset in the load (ld) instruction produces an address beyond the space of the variable recorded in its VRT entry.
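The check for this case amounts to testing whether the accessed bytes lie within [base, base + bound) of the variable’s entry. In the sketch below, access_in_bounds() is a hypothetical helper, and the access width parameter is an assumption added so that a multi-byte load straddling the bound is also caught.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an access of `width` bytes at `addr`
 * lies entirely inside the VRT entry [base, base + bound). */
bool access_in_bounds(uint32_t base, uint32_t bound, uint32_t addr, uint32_t width)
{
    if (addr < base)
        return false;
    uint64_t end = (uint64_t)addr + width;   /* widen to avoid wrap-around */
    return end <= (uint64_t)base + bound;
}
```

For the 24-byte array of the earlier example (six 4-byte ints, assuming no padding), a load of element 5 is within bounds, while element 6 starts exactly at base + 24 and is rejected.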
III-E2 Loop Operation on Array or Pointer Variable
This case is common in buffer overflows: string library functions such as strcpy() overflow inside such loops. An unchecked increment (line 4 of the C snippet shown below) on a pointer variable (ptr) produces an address beyond the space of variable X.
In a MIPS-like architecture, the ++ptr operation decomposes into two instructions. The first loads the pointer’s value (an address) into a register; the second increments the register to produce a new address. In the second instruction (line 5), $2 acts as both the source and the destination of the increment. In an out-of-bound case, the values of $2 before and after the increment fall in two different VRT entries, whereas during regular program execution both fall in the same entry.
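The same-entry rule can be sketched as follows; vrt_region, region_of(), and increment_in_bounds() are hypothetical names, and the table is a plain array searched linearly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VRT entry reduced to base and bound. */
typedef struct { uint32_t base; uint32_t bound; } vrt_region;

/* Index of the entry whose [base, base + bound) contains addr, or -1. */
static long region_of(const vrt_region *r, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].bound)
            return (long)i;
    return -1;
}

/* Check for the increment in ++ptr: the value of $2 before (old_addr) and
 * after (new_addr) the addition must fall in the same entry. */
bool increment_in_bounds(const vrt_region *r, size_t n, uint32_t old_addr, uint32_t new_addr)
{
    long before = region_of(r, n, old_addr);
    return before >= 0 && before == region_of(r, n, new_addr);
}
```

One design consideration: C permits a pointer to point one element past the end of an array, so a check applied at the increment rather than at the dereference may flag loops that stop at such a pointer without ever accessing it.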
We used the pipeline micro-architecture of [sah] to implement the VRT. Fig. 5 shows the five-stage pipeline architecture with the VRT.
The VRT is implemented by augmenting the fetch and execute stages of the processor pipeline with a VRT extraction unit and a memory space unit. The extraction unit monitors lw and sw instructions that use the frame pointer register as an operand, and the new address generated in the execute stage is stored in the VRT.
Fig. 6 shows the overflow detection architecture using the VRT. During the line 5 operation in the assembly code above, the VRT supplies the bound information of the first operand, which is compared with the second operand. If the second operand is less than the bound value, the access is considered within bounds; otherwise, it is an out-of-bound access.
IV Experimental Results
We modified the sim-outorder simulator in the SimpleScalar toolset [SimpleScalar] to validate the proposed approach. Sim-outorder is a detailed, pipelined micro-architectural simulator that models runtime behavior including instruction profiling, branch prediction, caches, and external memory. We chose the PISA RISC architecture with sequential (i.e., in-order) fetch and decode stages that preserve instruction order. Because our approach relies on the offset value of a previous instruction, preserving sequential instruction order was necessary.
To verify the proposed approach, we first populate the VRT. We used six selected programs from the MiBench benchmark suite to extract static variable space, and six different benchmarks rich in DMA function calls to exercise heap space operations. Heap and stack space extraction were performed separately. Table II shows the static variable counts and DMA function call counts, respectively, for these programs.
| Benchmark | # Variables | Benchmark | # malloc() | # calloc() | # realloc() | # free() |
For the office suite of MiBench, we observed a maximum of 324 VRT entries. Since each VRT entry consists of a 1-bit associated bit, a 32-bit base address, and an 8-bit bound value, i.e., 41 bits in total, the total VRT memory is 324 × 41 = 13,284 bits, or about 13 Kb.
We also evaluated the VRT on the MIT Corpus suite of 290 C programs for buffer overflow cases. Each of the 290 test cases in the MIT Corpus has four variants, distinguished by the suffixes ok, large, medium, and min, which cover different buffer overflow attributes [Kartkiewicz2009]. In each case, the overflow was successfully detected. Table III shows the average instruction count for each class of program.
| MIT Corpus Program Class | Instruction Count (Avg.) | Attack Detected? |
V Conclusions and Future Work
We have proposed the Variable Record Table (VRT) approach, with zero instruction overhead, as a countermeasure to buffer overflow attacks. We showed how frame pointer operations can be used to extract variable space information. With the VRT, we can successfully detect common forms of buffer overflow attacks. In the future, we plan to use the VRT for control flow integrity, using the recorded variables for verification. The VRT can also be useful for a smart data prefetcher, which could prefetch data blocks based on variable size rather than block size.