Monday, July 7, 2014

Pointer overflow : an analysis of LZ4 literal attack

 Last week, when a blog announced to the wild that it was possible to overflow a pointer within LZ4, I immediately produced a fix within a few hours to protect users, without checking how the code would naturally behave in such circumstances. After all, one assumption of 32-bit memory allocation was broken, so as a rule of thumb, I accepted the idea that it must have broken something.

With the fix available, I was curious to take a deeper look at the technical behavior of the overflow. What follows is a continuation of an attack scenario presented here, which incidentally matches an observation I made a long time ago, while assessing the severity of issue 52, and had totally forgotten last week. Since the current code is protected against the overflow scenario, I will look at this issue from an "old version" perspective, using, for example, the relatively old r91 (March 2013). The behavior analyzed concerns the function LZ4_decompress_safe(), which is the one designed to be protected against malicious packets. Note that an unsafe version also exists, called LZ4_decompress_fast(), which is not protected against malicious packets and therefore offers no such guarantee.
(Note : the safe function is also mapped to LZ4_uncompress_unknownOutputSize(), the unsafe one to LZ4_uncompress()).

A key claim is that it is possible to achieve Remote Code Execution on such an older version. An RCE attack requires delivering a crafted payload at a selected address in memory (Edit : see some relevant comments on this). The proposed attack is described here. A later version would add that it is possible to do the same with a smaller payload if the destination buffer gets allocated within a high address region, but it ultimately uses the same logic. The present article starts from there.

We will suppose that the target OS has no memory protection in place (such as detection of illegal reads or writes), since any such protection would make the attack pointless.

At the start of the attack, we have the destination pointer op, which points into a valid buffer region. If the malicious payload wants to trick the library into writing into an unauthorized region, it seems sufficient to cheat on the length of data to be copied. Unfortunately, this condition is checked, and if the pointer gets beyond the valid destination buffer, the LZ4 decoder stops right there and outputs an error code.

For the attack to have a chance to succeed, the objective is to provide a length so large that it makes the pointer wrap beyond the last address 0xFFFFFFFF (note : this is only possible in 32-bit mode; a 64-bit address space is way too large to overflow). It requires a lot of input data, but we'll suppose that this condition is met too. This is possible because the end of the literal area is calculated as a pointer called cpy :
cpy = op + length ;  // if length is large enough, we may have cpy < op because of overflow

OK, so what happens next ? The article claims that it delivers a payload of one's choice at the cpy address, basically the code to execute.
What's going to happen is a bit more complex. To understand why, let's follow the code. The relevant lines, for r91, are copied below :

        // get runlength
        token = *ip++; /* ... calculate length ... */
        // copy literals
        cpy = op+length; /* ... check buffer limits ... */
        LZ4_WILDCOPY(ip, op, cpy); ip -= (op-cpy); op = cpy;

        // get offset
        LZ4_READ_LITTLEENDIAN_16(ref,cpy,ip); ip+=2;
        if unlikely(ref < (BYTE* const)dest) goto _output_error;   // Error : offset outside of destination buffer

        // get matchlength

Since cpy < op, the tests checking for the end of the output buffer will pass. We also suppose that the test for the input buffer passes, so we move to the next line.

        LZ4_WILDCOPY(ip, op, cpy);

This macro is not too easy to understand. Basically, it speculatively copies 8 bytes from ip (which is supposed to be valid, otherwise the decoder would already have stopped) to op, which is still valid. Yes, only 8 bytes, whatever the value of length. Why ? Because the copy loop runs once and then tests op < cpy; since cpy < op from the start, it stops right after the first 8-byte packet.

ip -= (op-cpy); op = cpy;

That's where it starts to become nasty. With op = cpy, the destination pointer now points into a forbidden area. Note that ip has changed too ! Both ip and op have been translated by the same amount.

ip is now somewhere in memory. We don't know where exactly, but it's likely to be an unauthorized memory segment. So the next line of code matters :

LZ4_READ_LITTLEENDIAN_16(ref,cpy,ip); ip+=2;

In LZ4, a literal copy is necessarily followed by a match copy. This line calculates where the data to be copied is located, and stores it into the pointer ref. At this point, some OSes will crash, due to an unauthorized or unmapped access, but we'll assume that the read silently passes.
By definition of the format, ref = op - distance;
op is currently the same as cpy, so points to an unauthorized memory area.
distance has just been provided by ip. But ip, too, is in an unauthorized memory area. That means we don't know what ip is reading. distance is basically a random 16-bit number.

So now, we have ref < op = cpy < validBuffer.
At this stage comes the next line of code :

if unlikely(ref < (BYTE* const)dest) goto _output_error;

Translated, it means that if ref < validBuffer, the decoder detects a problem, and immediately stops decoding. Which is the case here. As a consequence, the overflow finally gets detected here, and the decoder exits with an error code.


OK, so if the issue was already solved, why keep issue 52 open on the board ? Well, I vaguely remember that I almost believed for a time that the issue was solved, but then realized that this check wasn't good enough. Indeed, after some more scrutiny, here is one border case scenario, which requires some additional conditions.

In this specific scenario, it is possible to make the detection fail with a bit of luck : suppose that cpy is so small that its value is < 64K. This requires either being extremely lucky, or knowing the start address op before even generating the crafted payload. This is a very strong hypothesis, and suggests either the existence of another exploit, or a level of insider knowledge associated with predictable allocation practices. But for this exercise, we will nonetheless suppose just that.
Now, let's also suppose that we get lucky, and distance (which is not controlled by the attacker, so is basically a random number) is large enough to make ref underflow the memory address space. Now ref stands at a very high address, and therefore passes the detection test.

What follows is a match copy, whose size is controllable by the attacker, thanks to the token, from 4 to 18 bytes (beyond that, the size is no longer controllable). Let's suppose we only copy 4 bytes, from ref, a very high address presumed unauthorized, to cpy, a very low address < 64K.
This is starting to spell trouble, since an unauthorized memory area gets modified.
Note however that the attacker still does not control what is being copied.

The next stage therefore is another literal copy. It seems like the right moment to deliver the code to execute ?
Unfortunately, ip is currently lost somewhere in memory. It will deliver a random token, followed by a random literal sequence.

As long as both ip & op remain small enough, they will evade detection, and the algorithm will continue to write random noise at the op position. But as soon as ref stops underflowing the memory address space, which is probable at each step and guaranteed beyond 64K, the problem gets detected, and an error code is triggered.

So, the conclusion is :
  • A blind overflow attempt of the literal length is extremely likely to get detected
  • To remain undetected, the overflow must be accurate, and finish inside the 0-64K memory segment (guaranteeing the success of this operation requires complementary knowledge)
  • It's not possible to start writing beyond address 64K without being detected.
  • Whatever gets written into this low memory segment is a random mix of literal and match copies from other parts of memory, which are not under the control of the attacker.
In this attack scenario, RCE is nowhere in sight. But this situation is not safe either : writing noise into the low-memory area can have bad consequences, most likely a process crash. Furthermore, the indirect protection requires reading a few bytes from an unauthorized memory area, which may also crash the program. These are good reasons to update today to r119 for 32-bit applications.

15 comments:

  1. One blogger just said he demonstrated an attack using Python-LZ4. How does it match with what you are presenting here ?

    ReplyDelete
    Replies
    1. He demonstrated an attack against the CERN Python version of LZ4. As far as I know, this code is based on the *unsafe* version of LZ4 decoder, and therefore, it is unprotected, obviously.

      One thing to keep in mind is that the CERN guys did this work quite some time ago, at a time when an attack on the LZ4 vector was probably considered unlikely. I guess they can probably solve their problem by simply integrating the latest Python version by Steeve Morin, which uses the safe version of the decoder.

      Delete
    2. Wow ! Did the blogger know that he was presenting an attack on an unsafe version ? (basically, a defenseless one ?)

      Delete
    3. I don't know, he probably selected this version for a reason. But I think it does not truly matter. At the end of the day, the message is : for any application receiving data from an external source, use the safe version, only.

      Delete
    4. yea it does matter ! he said he presented an attack on LZ4, not the unsafe version ! damn, what a cheat !

      Delete
    5. The guy says you're a liar, and that CERN is using the real Python version of Steeve

      Delete
    6. Those statements don't necessarily contradict each other. CERN is using an old version of Steeve's Python binder, and old versions of the Python binder were indeed mapped to the unsafe decoder version.

      As hinted previously, when the binder was created, it was probably not expected it would become so successful. It's only natural that security became a topic as a consequence of this success.

      Steeve's latest version uses the safe decoder version, and is a recommended upgrade.

      Delete
  2. Clarification, since there are too many questions on this :

    The present article was written and posted *before* the mentioned python-lz4 demo was published, so it obviously cannot be an answer to it. Please try to keep these issues separate.

    As far as I can tell, any demo that uses the specific vulnerability conditions identified from the beginning should work : a 32-bit application, an untrusted input data supply, and large input data blocks (specifically, input blocks of ~16 MB for useful predictability) (and obviously an unpatched system).

    ReplyDelete
  3. http://blog.securitymouse.com/2014/07/a-final-lz4-act-hacking-erlang.html

    ReplyDelete
    Replies
    1. Sigh, it's the PoC demo all over again, proving that it's possible to create an application designed to fail ... What does this teach us beyond what the C PoC already did at the beginning ?

      It's all the more pointless as the new PoC is now using an unsafe function of the library ! It's not even related to the bug anymore ...

      Considering the huge amount of time he spent on the algorithm, it's difficult to imagine he hasn't noticed.

      Delete
    2. baah, he delivers the same PoC multiple times, pretending he "cracked" a language ... :) Next time, he should tell how he just cracked C by using a bad pointer ^^

      Delete
  4. > RCE attack requires to deliver a crafted payload at a selected address in memory.
    Inherently wrong assumption. It's basically enough to get some ability to change memory. Then the attacker can try countless ways to trick legitimate code into doing something useful, even if there is no direct way to upload proper code. For example, on ARM, it's mostly impossible to do classic overruns due to architecture specifics. Hackers invented ret2libc, which works after a series of tricks, even in these circumstances. So the ability of an attacker to cause some unexpected memory corruption is a generally bad thing. Even if it's not obvious how to exploit it, some smartass can eventually find some strange way of doing things which would allow something meaningful after abusing a certain bug.

    Basically, you'd better think about it this way : your worst enemy can feed your decompression algo with whatever crap they can invent. So you'd better assume the input is extremely hostile.

    ReplyDelete
    Replies
    1. Thanks for sharing your thoughts

      Delete
    2. If "Its basically enough to get some ability to change memory",
      why not just deliver the payload directly into the output buffer,
      without bothering about overflow effects ?
      (and then do something with it later on, if I understand the ret2libc example ?)

      Delete
    3. Imagine the example of a web server. The attacker can't (absent unpatched bugs in the webserver) modify arbitrary memory locations at their whim, but can certainly put specific payloads all over memory (e.g., via repeated form submission) as part of the normal operation of the webserver.

      Used in conjunction with an attack that reads "random" parts of memory, it's entirely possible to bias that "random" memory towards an interesting payload.

      One thing to keep in mind is that it often seems that attackers have to get very "lucky" for a string of things to come together for an exploit to work. Of course, in practice, attackers generate their own luck. They will analyze and understand the behavior of memory allocators, figure out what is already in memory, populate memory with payloads of their choosing, try their attack repeatedly in an automated fashion, etc.

      Delete