Monday, July 7, 2014

Pointer overflow : an analysis of LZ4 literal attack

 Last week, when a blog announced to the wild that it was possible to overflow a pointer within LZ4, I immediately produced a fix within the next few hours to protect users, without checking how the code would naturally behave in such circumstance. After all, one assumption of 32-bits memory allocation was broken, so as a rule of thumb, I accepted the idea that it must have broken something.

With the fix available, I was curious to have a deeper look at the technical behavior of the overflow. What follows is a continuation of an attack scenario presented here, which incidentally match an observation I made a long time ago, while assessing the level of issue 52, and totally forgot last week. Since current code is protected against overflow scenario, I will look at this issue from an "old version" perspective, such as, for example, the relatively old r91 (march 2013). The behavior analyzed concerns the function LZ4_decompress_safe(), which is the one designed to be protected against malicious packets. Note that an unsafe version also exists, which is called LZ4_decompress_fast() and is not protected against malicious packets, and therefore offers no such guarantee.
(Note : the safe function is also mapped to LZ4_uncompress_unknownOutputSize(), the unsafe one to LZ4_uncompress()).

A key claim is that it is possible to achieve a Remote Code Execution on such older version. An RCE attack requires to deliver a crafted payload at a selected address in memory (Edit : see some relevant comments on this). The proposed attack is described here. A later version would add that it is possible to do the same with less payload if the destination buffer get allocated within high address region, but ultimately uses the same logic. The present article starts from there.

We will suppose that the target OS has no memory protection in place, such as detection of illegal reading or writing, which would make the attack pointless.

At the start of the attack, we have the destination pointer op, which points into a valid buffer region. If the malicious payload wants to trick the library into writing into an unauthorized region, it looks good enough to cheat on the length of data to be copied. Unfortunately, this condition is checked, and if the pointer gets beyond the valid destination buffer, the LZ4 decoder stops right there and output an error code.

For the attack to have a chance to succeed, the objective is to provide a length which is so large that it makes the pointer wraps beyond the last address 0xFFFFFFFF (note : this is only possible in 32-bits mode, 64-bits address space is way too large to overflow). It requires a lot on input data, but we'll suppose that this condition is met too. This is then possible because the end of literal area is calculated as a pointer called cpy :
cpy = op + length ;  // if length is large enough, we may have cpy < op because of overflow

OK, so what happens next ? The article claims that it delivers a payload of one's choice at the cpy address, basically the code to execute.
What's going to happen is a bit more complex. To understand why, let's follow the code. The relevant lines, for r91, are copied below :

        // get runlength
        token = *ip++; /* ... calculate length ... */
        // copy literals
        cpy = op+length; /* ... check buffer limits ... */
        LZ4_WILDCOPY(ip, op, cpy); ip -= (op-cpy); op = cpy;

        // get offset
        LZ4_READ_LITTLEENDIAN_16(ref,cpy,ip); ip+=2;
        if unlikely(ref < (BYTE* const)dest) goto _output_error;   // Error : offset outside of destination buffer

        // get matchlength

Since cpy < op, the tests checking for the end of output buffer will pass. We also suppose that the test for input buffer pass, so we move to the next line.

        LZ4_WILDCOPY(ip, op, cpy);

This macro is not too easy to understand. Basically, it will speculatively copy 8 bytes from ip (which is supposed to be valid, otherwise the decoder would have already stopped) to op, which is still valid. Yes, only 8 bytes, whatever the value of length. Why ? Because cpy < op,  so after 8 bytes it just stops there.

ip -= (op-cpy); op = cpy;

That's where it starts to become nasty. With op = cpy, the destination pointer is now at a forbidden area. Note that ip has changed too ! Basically, both ip and op have been translated by the same amount.

ip is now somewhere in memory. We don't know where exactly, but it's likely to be an unauthorized memory segment. So the next line of code matters :

LZ4_READ_LITTLEENDIAN_16(ref,cpy,ip); ip+=2;

In LZ4, a literal copy is necessarily followed by a match copy. This line calculates where the data to be copied is, and stores it into the pointer ref. At this point, some OS will crash, due to unauthorized or unmapped access, but we'll take the assumption that the read will silently pass.
By definition of the format, ref = op - distance;
op is currently the same as cpy, so points to an unauthorized memory area.
distance has just been provided by ip. But ip, too, is into an unauthorized memory area. So that means that we don't know what ip is reading. distance is basically a random 16-bits number.

So now, we have ref < op = cpy < validBuffer.
At this stage comes the next line of code :

if unlikely(ref < (BYTE* const)dest) goto _output_error;

Translated, it means that if ref < validBuffer, the decoder detects a problem, and immediately stops decoding. Which is the case here. As a consequence, the overflow finally gets detected here, and the decoder exits with an error code.

OK, so if the issue was already solved, why keeping issue 52 opened on the board ? Well, I confusely remember that I almost believed for a time that the issue was solved, but then realized that this check wasn't good enough. Indeed, after some more scrutiny, here is one border case scenario which requires some additional conditions.

In this specific scenario, it is possible to make the detection fail with the help of a bit of luck : suppose that cpy is so small that its value is < 64K. This requires to either be extremely lucky, or to know the start address op before even generating the crafted payload. This is a very strong hypothesis, and suggests either the existence of another exploit, or a level of insider knowledge associated with predictable allocation practices. But for this exercise, we will nonetheless suppose just that.
Now, let's also suppose that we get lucky, and distance (which is not controlled by the attacker, so is basically a random number) is large enough to make ref underflow memory space. Now ref stands at very high address, and therefore passes the detection test.

What follows is a match copy, which size is controllable by the attacker, thanks to the token, from 4 to 18 bytes (beyond that, size is no longer controllable). Let's suppose we'll only copy 4 bytes, from ref, a very high address presumed unauthorized, to cpy, a very low address < 64K.
This is starting to spell trouble, since an unauthorized memory area gets modified.
Note however that the attacker still does not control what is being copied.

The next stage therefore is another literal copy. It seems it is the right moment to deliver the code to execute ?
Unfortunately, ip is currently lost, somewhere in memory. It will deliver a random token, followed by a random literal sequence.

As long as both ip & op remain small enough, they will evade the detection, and the algorithm will continue to write some random noise at op position, but as soon as ref stops underflowing memory address space, which is probable at each step and necessary beyond 64K, it will get detected, and trigger an error code.

So, the conclusion is :
  • A blind overflow attempt of literal length is extremely likely to get detected
  • To remain undetected, the overflow must be accurate, and finish into the 0-64K memory segment (guaranteeing the success of this operation requires complementary knowledge)
  • It's not possible to start writing beyond address 64K without being detected.
  • Whatever get written into this low memory segment is some random copy & literals operations  from other parts of the memory, which are not under the control of the attacker.
In this attack scenario, DCE is nowhere in sight. But this situation is not the same as safe : writing noise into the low-memory area can result in bad consequences, likely a process crash. Furthermore, the indirect protection requires reading a few bytes from unauthorized memory area, which is also susceptible to crash the program. These are good reasons to update today to r119 for 32-bits applications.