When working on traces of millions of instructions, one of the biggest challenges is detecting the small portions of the code that are actually interesting. In this article, we have an application that reads from the network. We will show how to quickly find where the network frames are used, and whether it is possible to cause a buffer overflow by writing more data than is allocated. As an example, we'll be analyzing the following command:
wget -O /dev/shm/result.html http://192.168.56.1/index.html
We've set the output to shared memory so that the scenario is smaller and contains fewer unrelated kernel events.
Search: hardware devices
Since our point of interest is the network traces, we start by searching the device accesses of the network card. We're looking for PCI writes here, since the network card writes to the physical RAM. One interesting PCI write we can find is the one that finds its way into the output file: the response of the web server. The following image shows the overview of the device accesses during the executed scenario.
Here, we're interested in the last PCI write, so let's select it in the combo box. We can see that at sequence 142546, there is a PCI write to physical address 0x639f040. In 32-bit Linux, this is mapped in kernel space at the linear address 0xc639f040.
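The physical-to-linear mapping used above can be sketched in a few lines of Python. This is only an illustration: it assumes the common 3G/1G split of 32-bit Linux, where low physical memory is mapped linearly at PAGE_OFFSET = 0xC0000000, and a typical low-memory limit of 896 MiB. Both values are assumptions about the guest kernel configuration, and the function names are ours.

```python
# Assumption: default 32-bit Linux layout (3G/1G split), where low
# physical memory is mapped linearly starting at PAGE_OFFSET.
PAGE_OFFSET = 0xC0000000
# Assumption: typical size of the directly mapped "lowmem" region.
LOWMEM_LIMIT = 896 * 1024 * 1024


def phys_to_virt(phys: int) -> int:
    """Return the kernel linear address of a low-memory physical address."""
    if not 0 <= phys < LOWMEM_LIMIT:
        raise ValueError("address is outside the directly mapped region")
    return phys + PAGE_OFFSET


def virt_to_phys(virt: int) -> int:
    """Inverse mapping, valid only for directly mapped kernel addresses."""
    if not PAGE_OFFSET <= virt < PAGE_OFFSET + LOWMEM_LIMIT:
        raise ValueError("address is outside the directly mapped region")
    return virt - PAGE_OFFSET


print(hex(phys_to_virt(0x639F040)))
```

With the address of the PCI write, phys_to_virt(0x639f040) gives the linear address quoted in the text.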
Let's request to see the contents of the memory right after the PCI write. Here, we find what looks like a normal Ethernet dump of an HTTP response.
Let's dump it into a file using the integrated Python console:
Let's open it in Wireshark after converting it to a proper format using the hd command:
This confirms that we've found a complete Ethernet frame, and that we can browse it using other tools. With some scripting, we could even rebuild the entire communication between the two hosts, as if we had listened to the interfaces the whole time.
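The conversion step can also be done in Python instead of hd. The sketch below is not the command used in the article: it assumes the raw frame bytes are already available as a bytes object, and it produces an "offset followed by hex bytes" text dump, a layout that Wireshark's hex dump import and text2pcap are documented to accept. The function name and the sample bytes are illustrative.

```python
def to_hexdump(data: bytes, width: int = 16) -> str:
    """Format raw bytes as 'offset  hex bytes' lines, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{byte:02x}" for byte in chunk)
        # A 6-digit hexadecimal offset starts each line.
        lines.append(f"{offset:06x} {hex_bytes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative bytes only, not the frame captured in the article.
    sample = bytes(range(18))
    print(to_hexdump(sample), end="")
```

The resulting text can be written to a file and imported into Wireshark as a hex dump.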
Now, let's look for a potential buffer overflow. We have a PCI buffer that is written, so we need to find where it is read from. To display where and when an address is used, just check "Show access history":
I've selected the H of HTTP so that we may find the user code parsing the HTTP header. We can also see the kernel code parsing the Ethernet headers if we monitor other parts of this frame.
The first write corresponds to the PCI write. That's why there is no logical address for the write: it was not done using logical addressing, but was caused by a PCI access directly into the RAM. We can see that the PCI bus writes 560 bytes in one batch (which is why I exported a 560-byte file earlier).
There are three other reads here. The first one is a read inside an IRQ handler (this is IRQ 10, which is mapped at vector #3a). If we double-click the first one, we get this stack trace:
Actually, in this case, the IRQ handler is just comparing the checksum of the received data with the checksum received in the TCP header. It is not particularly interesting here, unless we're looking for kernel bugs. Let's take a look at the next read from this kernel buffer:
In the backtrace widget, we can see that userspace called the recv syscall. By clicking the parent sequence at 143290, we could find the parent trace. But before doing this, we should locate where the buffer is written.
Looking at the assembly code of the function __copy_user_intel, it is quite clear that the source is in ESI, the destination in EDI, and the byte count in ECX, just like a REP MOVSB.
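For readers less familiar with x86 string instructions, here is a minimal Python model of what REP MOVSB does with these three registers. It assumes the direction flag is clear (addresses increase) and uses a flat bytearray as a stand-in for memory. It models the instruction only, not the actual body of __copy_user_intel.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int):
    """Copy ecx bytes from memory[esi] to memory[edi], one byte at a time.

    Returns the final (esi, edi, ecx), as the CPU would leave them.
    Assumes the direction flag is clear, so both pointers increase.
    """
    while ecx > 0:
        memory[edi] = memory[esi]  # MOVSB: byte copy from [ESI] to [EDI]
        esi += 1
        edi += 1
        ecx -= 1                   # REP: repeat until ECX reaches zero
    return esi, edi, ecx


if __name__ == "__main__":
    mem = bytearray(b"HTTP....")
    print(rep_movsb(mem, esi=0, edi=4, ecx=4), bytes(mem))
```

After the copy, EDI points just past the last byte written, which is why following EDI leads to the destination buffer.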
Since we want to know where the memory is written to, let's follow the memory pointed by EDI, at 0x80aaf72 in our example.
Previous / next
By double-clicking DWORD PTR [EDI] on the 5th instruction, we can quickly see the content of this memory and access its history if wanted, just like before.
There is an even quicker way, though. By selecting the memory using F7 (first operand), we can navigate to its next read or next write. For example, let's say that we want to check for a possible overflow in the recv function. We'll have three things to check:
- where does the length given to recv come from,
- where is the destination buffer allocated, and what is its size,
- if both sizes are unrelated, is it possible to have an overflow, and under which conditions.
To find out where the length of recv comes from, the easiest way is to go through the backtrace to the recv call and to use the argument displaying feature of the backtrace:
We can see that the parameters are:
- socket: 0x4
- buf: 0x80aaf70
- len: 0x1ff
- flags: 2 (MSG_PEEK)
Note: since the flags are MSG_PEEK, we can expect another read to return the same data later (remember at the previous step, there was one more read from the PCI buffer).
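The effect of MSG_PEEK can be reproduced outside of the trace with a few lines of Python. This sketch uses a local socket pair rather than a real HTTP connection, and the payload is made up. It only shows that a peeked receive leaves the data queued, so that a later read returns the same bytes.

```python
import socket


def peek_then_read(payload: bytes, peek_len: int):
    """Send payload over a local socket pair, peek at it, then consume it."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(payload)
        # MSG_PEEK returns the data without removing it from the queue.
        peeked = receiver.recv(peek_len, socket.MSG_PEEK)
        # A normal receive then returns the same bytes and consumes them.
        consumed = receiver.recv(len(peeked))
        return peeked, consumed
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    peeked, consumed = peek_then_read(b"HTTP/1.1 200 OK\r\n\r\n", 0x1FF)
    print(peeked == consumed)
```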
Once we know that the size is 0x1ff, the next objective is to find out how it is calculated. The value was given by EDI, so let's see where it came from.
We just have to press F8 (second operand) to select EDI. Then a click on previous (shortcut: Shift-N) leads us to its previous write. In this case, the previous write is a pop. We can either use the "percent" plugin (see the associated blog post), or just go to the function start (assuming that nobody modifies the stack). Either way, we find the corresponding push, which is a push edi.
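The "previous write" navigation amounts to a backward walk over the recorded writes. The toy model below illustrates the idea on a hand-written trace. Everything in it is hypothetical: the Write record, the location names, the addresses and the sequence numbers are invented for the example and are not REVEN's data model or API.

```python
from typing import List, NamedTuple, Optional, Tuple


class Write(NamedTuple):
    seq: int        # position in the (hypothetical) trace
    location: str   # what is written, e.g. "edi" or "mem:0x2000"
    source: str     # where the value came from, e.g. "edi" or "const:0x1ff"


def previous_write(trace: List[Write], location: str, before: int) -> Optional[Write]:
    """Return the last write to `location` strictly before sequence `before`."""
    candidates = [w for w in trace if w.location == location and w.seq < before]
    return max(candidates, key=lambda w: w.seq) if candidates else None


def trace_origin(trace: List[Write], location: str, start: int) -> Tuple[List[int], str]:
    """Follow previous writes backwards until a constant (or nothing) is reached."""
    visited = []
    seq = start
    while True:
        write = previous_write(trace, location, seq)
        if write is None:
            return visited, location
        visited.append(write.seq)
        if write.source.startswith("const:"):
            return visited, write.source
        location, seq = write.source, write.seq


# Hypothetical trace: a constant is stored, loaded into edi, pushed,
# edi is clobbered, then restored by a pop before the call.
TOY_TRACE = [
    Write(1, "mem:0x1000", "const:0x1ff"),
    Write(2, "edi", "mem:0x1000"),
    Write(3, "mem:0x2000", "edi"),       # push edi
    Write(4, "edi", "const:0x0"),        # edi reused for something else
    Write(5, "edi", "mem:0x2000"),       # pop edi
]

if __name__ == "__main__":
    print(trace_origin(TOY_TRACE, "edi", 6))
```

Starting from EDI at the call site, the walk goes through the pop, the matching push and the earlier load, and skips the unrelated write to EDI that happened between the push and the pop.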
Clicking once more on the previous write to EDI leads us to this memory read, that we can browse again with F8:
We can request the previous write of the stack memory, and so on. After going up several times (using only F7/F8 and Shift-N), we finally reach the origin of the size:
So the received size is static. It is always 0x200, and the only calculation done here is a -1 (giving the 0x1ff passed to recv), probably to make sure there is room for a terminating null byte.
The exact same steps can be taken with the buffer value, backtracking the pointer origin.
We find that the pointer is allocated inside the same function that called recv, with the same size (obviously without the -1, this time). In this case, we can see clearly that an overflow is not possible, even if we could control the server side.
Earlier, we noticed that the buffer was read one more time. Let's focus on this other read now.
We know this read occurs at 145455. This is another copy_from_user call. The interesting part is that the backtrace now shows a read call:
The arguments of the read are:
- fd: 4
- buffer: 0x80aaf70
- length: 0x139
It is interesting that the length is not the same as before (0x1ff). We might be interested in how that particular size is computed. In particular, is it possible to make this size bigger than 0x200, the size of the buffer? After repeating the same process to trace the origin of the length, we find that it is computed by subtracting the buffer start address from a pointer (let's call that pointer buf_end) returned by a function. The interesting part of that function is right before it returns:
The loop (red section) searches for a 0xa (newline character). It increments eax as long as eax does not go past edx.
Once we find the sequence 0xa 0xd 0xa (\n\r\n), we return the pointer right after these three characters (green section). Since we ensure that eax < edx, we can read at most up to edx + 2.
In the "end of buffer" case (blue rectangle), if we're not at a newline, we return NULL. Otherwise, we return the pointer right after the last newline character (with some code to handle two consecutive newline characters). We cannot get past edx + 2 either.
Here the interesting part is the central role of edx, which is probably near the end of the buffer. Once again with the previous write feature, we can see that edx is computed as follows:
edx = buf_start + size - 2
That explains why we could read up to edx + 2.
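To make the bound easier to check, here is a Python model of the search as described above. It is a reconstruction from the description of the disassembly, not the original source: offsets replace pointers, `limit` plays the role of edx (as an offset from buf_start), and the function name is ours.

```python
from typing import Optional


def find_header_end(buf: bytes, size: int) -> Optional[int]:
    """Return the offset just past "\\n\\r\\n" or "\\n\\n", or None if absent.

    Only the first `size` bytes of `buf` are examined.
    """
    limit = size - 2              # role of edx = buf_start + size - 2
    i = 0                         # role of eax
    while i < limit:              # the loop never lets i go past limit
        if buf[i] == 0x0A:
            if buf[i + 1] == 0x0D and buf[i + 2] == 0x0A:
                return i + 3      # right after "\n\r\n"
            if buf[i + 1] == 0x0A:
                return i + 2      # right after "\n\n"
        i += 1
    # "End of buffer" case: only the last two bytes remain to be checked.
    if size >= 2 and buf[i] == 0x0A and buf[i + 1] == 0x0A:
        return i + 2
    return None


if __name__ == "__main__":
    data = b"HTTP/1.1 200 OK\r\nServer: x\r\n\r\n<html>"
    print(find_header_end(data, len(data)))
```

In this model the highest index ever read is size - 1 and the largest value returned is size, that is limit + 2, which matches the bound derived from the disassembly.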
Here, size is 0x1ea. It is bigger than 0x139, but why isn't it 0x1ff? Let's go backwards a bit more. I'm going to spare you the details because I used the same process as before: 0x1ea is actually the return value of the recv function.
This means that wget first calls recv with MSG_PEEK and 0x1ff as the buffer size. Once it has the result, it re-reads the same data, but only up to the length returned by recv (which is never more than the requested size). This means that in the second read too, there are probably no obvious overflows to discover.
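The reasoning of the last two sections boils down to a chain of inequalities, which can be written down and checked with the values observed in the trace. The helper below is only a restatement of that argument. Its name is ours, and it assumes, as found above, that the buffer is 0x200 bytes long.

```python
def read_fits_in_buffer(alloc_size: int, peek_len: int,
                        peeked: int, read_len: int) -> bool:
    """Check read_len <= peeked <= peek_len < alloc_size.

    alloc_size: size of the destination buffer
    peek_len:   length passed to recv(..., MSG_PEEK)
    peeked:     value returned by that recv call
    read_len:   length passed to the following read
    """
    return 0 <= read_len <= peeked <= peek_len < alloc_size


if __name__ == "__main__":
    # Values observed in the trace analyzed in this article.
    print(read_fits_in_buffer(0x200, 0x1FF, 0x1EA, 0x139))
```

Each link of the chain corresponds to one finding: the -1 applied to the allocation size, the return value of recv, and the bounded newline search.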
By the way, the function we analyzed here merely looks for two newlines to detect the end of the headers in an HTTP request.
Throughout this article, we analyzed a small part of the reception of wget.
We saw how to quickly find how data coming from the network is used. We can easily find where this data is read or erased, where the reception buffer is allocated and so on.
This is a major tool within REVEN-Axion: the ability to view the memory history in both directions, up to the physical layer.
PS: you'll find a global description of the REVEN technology here.