Interprocess Use of Uninitialized Memory detection using REVEN


Mar 09, 2021
by Quentin and Louis
Categories: REVEN
Tags: Reverse Engineering - Vulnerability Detection - Analysis API - Taint - REVEN




Continuing in the series of vulnerability detection scripts, such as the BoF article and the UAF article, today’s article introduces a notebook to detect uses of uninitialized heap memory in REVEN scenarios.

In memory-unsafe languages such as C, it is common for variables to start their life uninitialized. Some allocators, such as malloc, return memory in an uninitialized state:

int* x = malloc(sizeof(int));
printf("%i\n", *x); // reads the value of *x before it has been initialized

Uses of uninitialized heap memory can cause all sorts of errors, including some form of dangling pointer (if the malloc’ed type has invariants that happen to be broken by the returned value of malloc) and information leakage (if the allocated memory happens to contain uncleared, possibly confidential, information).

Today’s notebook is interested in finding such uses of uninitialized memory using timeless analysis in a REVEN scenario, where we regard memory as uninitialized as long as it has not been written to after allocation.

Detecting uninitialized memory using REVEN

The first step in the provided notebook is to detect reads of uninitialized heap memory. This is done with the following two substeps:

  1. Allocation detection: allocation detection works similarly to the UAF and BoF notebooks.

  2. Read before write detection: For each detected allocation, find the first read and first write to the newly allocated memory range using memory history. If the first read is before the first write, or there is no write, then we have a use of uninitialized memory!
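The read-before-write check can be illustrated without the REVEN API. The sketch below is a toy model, not the notebook's code: the `Access` record and the `reads_before_write` function are hypothetical names, and the list of accesses stands in for what a memory history query would return. It also tracks initialization per byte, which is a little finer than comparing only the first read with the first write.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Access:
    """One memory access, as a memory-history query might report it (hypothetical model)."""
    transition: int   # position in the trace
    address: int      # first byte touched
    size: int         # number of bytes touched
    is_write: bool    # True for a write, False for a read


def reads_before_write(alloc_start: int, alloc_size: int,
                       accesses: Iterable[Access]) -> List[Access]:
    """Return the reads touching at least one byte not written since allocation."""
    alloc_end = alloc_start + alloc_size
    initialized: Set[int] = set()
    findings: List[Access] = []
    for access in sorted(accesses, key=lambda a: a.transition):
        # Clamp the access to the allocated range; the range is empty if it lies outside.
        first = max(access.address, alloc_start)
        last = min(access.address + access.size, alloc_end)
        touched = set(range(first, last))
        if not touched:
            continue
        if access.is_write:
            initialized |= touched
        elif touched - initialized:
            findings.append(access)
    return findings


history = [
    Access(transition=10, address=0x1000, size=4, is_write=True),
    Access(transition=20, address=0x1000, size=8, is_write=False),  # bytes 4..7 never written
    Access(transition=30, address=0x1004, size=4, is_write=True),
    Access(transition=40, address=0x1004, size=4, is_write=False),  # fully initialized by now
]
uum = reads_before_write(0x1000, 8, history)
print([a.transition for a in uum])  # [20]
```

Only the read at transition 20 is reported: it covers four bytes that no earlier write has touched, while the read at transition 40 happens after the whole range was written.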

The way uninitialized memory is used is important

This simple algorithm leverages the memory history feature of REVEN to get results immediately. However, it has some limitations, because not all reads of uninitialized memory constitute a vulnerability at the ASM level.

In some circumstances, uninitialized memory can be a normal occurrence produced by the compiler because it has no impact on the executed program.

For example, the following C struct must have its x field aligned on 4 bytes (due to processor alignment constraints), so the compiler adds padding to the middle of the struct, which makes it 8 bytes in total:

struct Padded {
    char b; // size 1, alignment 1
    // added 3-bytes padding to respect an alignment of 4 for the x field
    int x; // size 4, alignment 4
};

The padding bytes will usually not be initialized, and thus will be read uninitialized, e.g. whenever the struct is copied.
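The layout of such a struct can be checked from Python with the standard ctypes module, which follows the platform's native C alignment rules. This is only a sketch to make the padding visible; the `padding_before_x` helper is a name introduced here for illustration, and the numbers assume a platform where int is 4 bytes with 4-byte alignment.

```python
import ctypes


class Padded(ctypes.Structure):
    # Same fields as the C struct, laid out with the platform's native alignment.
    _fields_ = [
        ("b", ctypes.c_char),  # size 1, alignment 1
        ("x", ctypes.c_int),   # size 4, alignment 4 on common platforms
    ]


def padding_before_x() -> int:
    """Number of padding bytes between the end of b and the start of x."""
    return Padded.x.offset - (Padded.b.offset + ctypes.sizeof(ctypes.c_char))


# On common platforms this prints: 8 4 3
print(ctypes.sizeof(Padded), Padded.x.offset, padding_before_x())
```

The three padding bytes belong to the struct but to none of its fields, so assigning b and x never writes them.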

Additionally, the operating system may copy pages around on some occasions, again accessing the uninitialized bytes regardless of whether or not they were written to.

All these accesses have in common that they do not really affect the execution of the program, so the actual value of the memory cannot easily be recovered by an attacker.

However, when the uninitialized memory is actually used in the program logic, for example if it affects control flow, then it becomes possible to infer its value. Often, it is also a bug impacting correctness, and that alone warrants fixing.

Looking for impacts on the control flow

From the uninitialized memory accesses found in the read before write detection step of our algorithm, how can we filter the accesses that are actually used, e.g. to influence control flow? Using the taint, of course!

To detect the influence of the uninitialized memory access on control flow, we run a first taint on the instruction that performs the uninitialized access, in order to find its influence on registers.

# Start a taint of just the first instruction (the memory access)
taint = tainter.simple_taint(
    tag0=taint_tags,
    from_context=self.memaccess.transition.context_before(),
    to_context=self.memaccess.transition.context_after(),
    is_forward=True
)

state_after_first_instruction = taint.state_at(self.memaccess.transition.context_after())

From there, we can drop the memory tainted by the uninitialized access and restart the taint, keeping only the tainted registers. This avoids handling multiple uninitialized memory accesses at once.

# We don't want to keep the memory tainted: if it is accessed again later, that
# access will be reported as another UUM anyway. So we take the state of the first
# taint, drop the initially tainted memory, and start a new taint from just after
# the first instruction to the end of the trace, seeded with the tainted registers only.
taint = tainter.simple_taint(
    tag0=[entry[0] for entry in state_after_first_instruction.tainted_registers()],
    from_context=self.memaccess.transition.context_after(),
    to_context=None,
    is_forward=True
)

From there, we iterate through the instructions that involve tainted data, and use the Capstone disassembler to find instructions that make use of flags (such as conditional moves or jumps). If these flags are tainted, then we have a use of uninitialized memory that impacts the control flow!

conditionals = []
for access in taint.accesses(changes_only=False).all():
    ctx = access.transition.context_before()
    # Pick the Capstone disassembler matching the CPU mode of this transition
    md = md_64 if ctx.is64b() else md_32
    cs_insn = next(md.disasm(access.transition.instruction.raw, access.transition.instruction.size))
    # Test conditional jump & move: test_eflags maps each tested flag to its register
    for flag, reg in self.test_eflags.items():
        if not cs_insn.eflags & flag:
            # The instruction does not test this flag
            continue
        if not UUM._is_register_tainted_in_taint_state(
            taint.state_at(access.transition.context_after()),
            reg
        ):
            # The flag is tested, but it does not carry the taint
            continue
        conditionals.append({
            'transition': access.transition,
            'flag': reg,
        })
# The list of transitions that used tainted flags => UUM
return conditionals
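To see the idea independently of the REVEN and Capstone APIs, here is a toy forward taint over a hand-written trace. It is a sketch under strong simplifying assumptions, not the notebook's implementation: the `Insn` record, the abstract location names ("mem", "eax", "zf") and the `tainted_conditionals` function are all hypothetical. It mirrors the two steps described above: taint the initial access, drop the memory taint, then look for conditionals that read a tainted flag.

```python
from typing import List, NamedTuple, Set, Tuple


class Insn(NamedTuple):
    """A toy instruction: its text plus the abstract locations it reads and writes."""
    text: str
    reads: Tuple[str, ...]
    writes: Tuple[str, ...]
    tests_flags: bool = False  # True for conditional jumps and conditional moves


def tainted_conditionals(trace: List[Insn], uninitialized: Set[str]) -> List[str]:
    """Forward-taint from trace[0] and report conditionals that read tainted data."""
    tainted = set(uninitialized)
    findings = []
    for index, insn in enumerate(trace):
        source_tainted = any(loc in tainted for loc in insn.reads)
        if insn.tests_flags and source_tainted:
            findings.append(insn.text)
        for loc in insn.writes:
            # A write either spreads the taint or overwrites it with clean data.
            if source_tainted:
                tainted.add(loc)
            else:
                tainted.discard(loc)
        if index == 0:
            # After the initial access, keep only what it tainted (the registers).
            tainted -= uninitialized
    return findings


trace = [
    Insn("movzx eax, byte ptr [rax]", reads=("mem",), writes=("eax",)),
    Insn("test eax, eax", reads=("eax",), writes=("zf",)),
    Insn("jnz target", reads=("zf",), writes=(), tests_flags=True),
    Insn("xor eax, eax", reads=(), writes=("eax", "zf")),
    Insn("jnz other", reads=("zf",), writes=(), tests_flags=True),
]
print(tainted_conditionals(trace, {"mem"}))  # ['jnz target']
```

The second jump is not reported because the xor overwrites both eax and zf with values that no longer depend on the uninitialized byte.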

Since there can be a lot of UUMs that do not affect the control flow, the notebook offers a flag option to display UUMs only when they do affect the control flow.

Ignoring mask operations

Some reads and writes occur when performing an "and" or an "or" instruction between the memory and a mask, such as 0x00 or 0xFF.

Depending on the instruction and the mask, we may want to ignore the read operation and/or the write operation. For example, in the instruction "and memory, 0x00", although the value in memory is technically read by the "and", it is immediately discarded and replaced by 0. This instruction should be ignored for the purpose of detecting uses of uninitialized memory.

Similarly, "and memory, 0xFF" is just a strange way of expressing a no-op, and should be ignored both as a read and a write operation: it does not really access uninitialized memory in any significant way, nor does it initialize anything.

The script detects these patterns at the byte level.
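A byte-level classification along these lines could look like the following sketch. The function name `classify_mask_bytes` is hypothetical and this is not the script's actual code. The two "and" cases come from the discussion above; the "or" cases are filled in by symmetry, and treating every other mask byte as a real read and write is an assumption.

```python
from typing import List, Tuple


def classify_mask_bytes(op: str, mask: bytes) -> List[Tuple[bool, bool]]:
    """For each mask byte, return (read_matters, write_matters).

    read_matters: the result depends on the old memory byte.
    write_matters: the instruction changes, and thus initializes, the byte.
    """
    absorbing = {"and": 0x00, "or": 0xFF}[op]  # result is the mask, whatever memory held
    identity = {"and": 0xFF, "or": 0x00}[op]   # result is the old byte, unchanged
    result = []
    for byte in mask:
        if byte == absorbing:
            result.append((False, True))    # old value discarded, byte gets initialized
        elif byte == identity:
            result.append((False, False))   # a no-op: neither a real read nor a write
        else:
            result.append((True, True))     # assumption: a genuine read and write
    return result


print(classify_mask_bytes("and", bytes([0x00, 0xFF, 0x0F])))
# [(False, True), (False, False), (True, True)]
```

Working byte by byte matters because a single wide mask such as 0x00FF can discard one byte of the operand while leaving the other untouched.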

Detecting interprocess UUM

Since REVEN performs system-wide analysis, an interesting capability of the notebook is to detect interprocess UUMs.

To test for this, we recorded an example scenario containing a toy server and client, running locally on one machine. Under some circumstances, when the client makes a request, the server responds with a buffer that is only partially initialized. The client then uses the response to influence its control flow.

In pseudo-code, the server does something similar to the following:

// ... connection setup, buffer allocations ...
int byte_count = recv(socket, request_buf, request_buf_len, 0);
if (byte_count > 0) {
    char* resp_buf = malloc(RESPONSE_SIZE);
    write_header(resp_buf);
    if (parse_request(request_buf, byte_count)) {
        write_response(resp_buf);
    } // else: part of resp_buf is left uninitialized in this case!
    finalize_response(resp_buf);
    int send_result = send(socket, resp_buf, RESPONSE_SIZE, 0);
    // check send_result...
}
// ... error handling, connection closing ...

and on the client side:

// connection setup, buffer allocations ...
int send_result = send(socket, unparseable_request_buf, request_len, 0);
// check send_result...
int byte_count = recv(socket, resp_buf, resp_buf_size, 0);
if (byte_count > 0) {
    // ...
    if (resp_buf[SHOULD_BAR]) { // <- use of uninitialized memory in the client
        bar();
    }
    // ...
}
// ... error handling, connection closing ...

And there it is: we were able to detect the UUM impacting the control flow in the client, although it originated in the server!

Below is some sample output of the script. In particular, we can see that the first detected UUM corresponds to memory allocated in the server, and influences the control flow in the client.

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 6 byte(s) first read at:
		afd!memcpy+0x1c0 / server.exe (1680)
		#287192480 mov rax, qword ptr [rcx+rdx*1-0x8]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#346681985 jnz 0x7fff3ae20039 ($+0xa)

		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#351209780 jnz 0x7fff3ae20039 ($+0xa)

		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#352916410 jnz 0x7fff3ae20039 ($+0xa)

		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#356566848 jnz 0x7fff3ae20039 ($+0xa)

		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#358248452 jnz 0x7fff3ae20039 ($+0xa)

		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / client.exe (2240)
		#360023296 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#345319567 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#346500050 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#347471346 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#348008796 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#351293130 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#351769951 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#354187897 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#355832210 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#357248188 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#358069987 jnz 0x7fff3ae20039 ($+0xa)

Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
	Alloc in: server+0x3258 / server.exe (1680)

	UUM of 1 byte(s) first read at:
		server+0x1a8e / server.exe (1680)
		#359338025 movzx eax, byte ptr [rax]

	The control flow depends on uninitialized value(s):
		Flag 'zf' depends on uninitialized memory
		msvcrt!_write_nolock+0x3db / server.exe (1680)
		#359847648 jnz 0x7fff3ae20039 ($+0xa)

---------------------------------------------------------------------------------
7 UUM(s) found on the whole trace (386055746 transitions) with 396 searched memory addresses
---------------------------------------------------------------------------------
CPU times: user 13.1 s, sys: 1.78 s, total: 14.9 s
Wall time: 1min 34s
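Interprocess findings can be picked out of a report like this one mechanically, by comparing the process that allocated the memory with the processes listed under the control flow section. The sketch below does this on the text format shown above; the `interprocess_findings` function is a hypothetical helper written for this article's output, not part of the notebook, and it assumes process names without spaces.

```python
import re
from typing import List, Tuple

# Matches the trailing "/ process.exe (pid)" part of a location line.
PROCESS = re.compile(r"/ (\S+) \((\d+)\)\s*$")


def interprocess_findings(report: str) -> List[Tuple[str, str]]:
    """Return (allocating process, using process) pairs where the two differ."""
    pairs: List[Tuple[str, str]] = []
    alloc_process = None
    in_control_flow = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Phys:"):
            # A new finding starts: forget the previous one.
            alloc_process, in_control_flow = None, False
        elif stripped.startswith("The control flow depends"):
            in_control_flow = True
        match = PROCESS.search(stripped)
        if not match:
            continue
        process = f"{match.group(1)} ({match.group(2)})"
        if stripped.startswith("Alloc in:"):
            alloc_process = process
        elif in_control_flow and alloc_process and process != alloc_process:
            if (alloc_process, process) not in pairs:
                pairs.append((alloc_process, process))
    return pairs


sample = """\
Phys:0x6ca7e3c0: Allocated at #44423414 (0x8113c0 of size 0xa) and freed at N/A
    Alloc in: server+0x3258 / server.exe (1680)

    UUM of 6 byte(s) first read at:
        afd!memcpy+0x1c0 / server.exe (1680)
        #287192480 mov rax, qword ptr [rcx+rdx*1-0x8]

    The control flow depends on uninitialized value(s):
        Flag 'zf' depends on uninitialized memory
        msvcrt!_write_nolock+0x3db / client.exe (2240)
        #346681985 jnz 0x7fff3ae20039 ($+0xa)
"""
print(interprocess_findings(sample))  # [('server.exe (1680)', 'client.exe (2240)')]
```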

Conclusion

This notebook is already available to detect UUM vulnerabilities.

Future work includes tracking uses of uninitialized memory on the stack. While we already know that we cannot possibly detect all of them (see this list of uninitialized scalar accesses that result in undefined behavior, which often completely elides the uninitialized variable at runtime… with Undefined Behavior, anything is possible), a good chunk remains for us to find!

With UAF, BoF, and now UUM detection, REVEN possesses a wide arsenal of vulnerability detection tools. They all rely on fundamental Timeless Debugging and Analysis features such as symbol search, memory history or the taint, which serve as building blocks for these advanced analyses. What high-level analysis would you like to see next?
