Wednesday, July 2, 2014

Vulnerability disclosure - how it's supposed to work

Note: there is a follow-up to this story

In the lifetime of LZ4, I've received tons of feedback, feature requests, warm thanks, and bug disclosures, often with a bugfix freely included.

This is the essence of open source: through disseminated usage and reports, the little piece of software gets exposed to more and more situations, becoming stronger, and reaching over time a level of perfection that most closed-source implementations can only dream of, given the limited reach of internal testing alone. The strength of the formula relies entirely on such feedback, since it is impossible for a single organization to guess and test all possible situations.
It is an enlightening experience, and I can only encourage every developer to follow a similar journey. It will open your eyes and perspective onto a much larger, interconnected world.

Among these issues, there were more than a few security bugfixes. I can't say how much I thank people for their welcome contributions in this critical area. Disclosures were received by email, or through the LZ4 issue board. Typical contributors were modest professional developers, providing their piece of advice to a larger and larger edifice, and getting a "thanks notice" in the NEWS log, a perk they were not even requesting. It is this modesty and constructive mindset which really touched me and drove me forward.

Such early security fixes deserve high praise today, way more than a recent "integer overflow hype" created to receive inflated media coverage. It's a sad state of affairs, but much more important issues were fixed far more politely and securely, out of a sincere desire to improve the situation. Over time, such security bugs became less and less common, and almost disappeared, basically proving the strength of the open source formula: an implementation becomes stronger after each round.


When an implementation becomes widespread, the rules of security disclosure gradually evolve, because the number of systems and users potentially exposed becomes too large to game with. A good illustration of a proper reporting process is detailed on the CERT (Computer Emergency Response Team) website. The process is both transparent and clean. The security auditing firm starts by contacting the upstream developer team, ensuring the bug is understood and a fix undertaken. It sets a typical disclosure delay of 45 days, so that the upstream organization has an incentive to solve the problem fast enough (without the delay, some organizations would opt for doing nothing at all!). After that delay, a public disclosure can be sent. Hype is optional, but not forbidden either, as long as users are correctly protected beforehand, which is the whole point.


That's why, in retrospect, I've been so angry last time about the auditor's code of conduct. I initially reacted to the overblown vulnerability description, and only realized later that it was also a question of disclosure policy. With a little bit of communication, everything would have been fine, as usual.
Unfortunately, it did not happen. The auditor barely left a footnote on the issue board to request a severity raise on an item (which was accepted), and then simply lost contact, focusing instead on overselling a security risk for maximum media coverage. In doing so, he never looked back to ensure that a fix was underway or being deployed to protect users before disclosure.

This behavior is totally atypical for a would-be respectable security firm. In fact, it is in total contradiction with the core values of security auditing. To willingly sacrifice public safety for some selfish media coverage feels just insane. I kept thinking: "Damn, suppose he'd been right..."

Fortunately, the risk was not as large as advertised. And since then, I had been led to believe it was just a genuine communication lapse, thanks to reassuring words from the auditor himself, acknowledging the lack of communication and wishing to improve the situation in the future (and linking his nicer words to Twitter for a more positive image). Considering that the previous vulnerability couldn't result in anything serious, and willing to grant the benefit of the doubt, I closed the story with a more neutral-toned statement. I would then expect that, from then on, a normal level of communication would be restored, with future bugs disclosed "as usual", that is, starting with an issue notification and a discussion.

Foolish assumption.

In total contradiction with his own logged commitment, donb doubled down earlier today, broadcasting a new vulnerability directly to the wild, without a single notification to the upstream developer:
http://blog.securitymouse.com/2014/07/i-was-wrong-proving-lz4-exploitable.html

The new vulnerability could be correct this time. I have not been able to prove or disprove it myself, but I have no reason to disbelieve it. Some specific hardware and debugging capabilities seem required to observe it, though.

Apparently, the risk is valid for ARM platforms (maybe some specific versions, or a set of additional platform-specific rules, the exact scope of which I don't know). I doubt it is only hardware-related; I believe it must be OS-driven instead, or a combination of both.
The key point is the ability of a 32-bit system to allocate memory blocks at high addresses, specifically beyond 0x80000000. This situation never happened in previous tests. Each 32-bit process was encapsulated by the OS into its own virtual address space, whose maximum size is 2 GB, irrespective of the total amount of RAM available (this was tested on 4 GB machines). With no counter-example at hand, it became an assumption. The assumption was key in the discussions assessing the severity level, and remained undisputed up to now. Today's new information is that this situation can in fact be different for some combinations of OS and hardware, the precise list of which is not clearly established.
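
To make the condition concrete, here is a minimal sketch of the failure mode (my own illustration, not LZ4's actual code, and all values are hypothetical), simulating 32-bit pointers with uint32_t:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t op     = 0xC0000000u;  /* hypothetical output pointer, allocated high */
        uint32_t oend   = 0xC0010000u;  /* end of a 64 KB output buffer */
        uint32_t length = 0x50000000u;  /* oversized, attacker-controlled length */

        uint32_t cpy = op + length;     /* wraps around 2^32, yielding 0x10000000 */

        /* The naive bounds check passes even though the copy would run far
           past the buffer, because cpy wrapped below oend. */
        printf("cpy = 0x%08X, naive check (cpy <= oend): %s\n",
               (unsigned)cpy, (cpy <= oend) ? "passes (wrong!)" : "fails");
        return 0;
    }

With op below 0x80000000, the addition can only wrap for lengths of 2 GB or more, which is why the assumption above mattered so much.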
Should you own a configuration able to generate such a condition, you're very welcome to test the proposed fix on it. The quality of the final fix for this use case will depend on such tests.
The issue tracker is at this address:
https://code.google.com/p/lz4/issues/detail?id=134
A first quickfix is proposed there.
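For the curious, the heart of such a guard is an explicit wrap-around check before trusting the usual pointer comparison. A minimal sketch with assumed names (op, oend, length are placeholders; this is not the actual patch):

    #include <stdint.h>

    /* Sketch only: validate a copy of 'length' bytes starting at 'op'
       within a buffer ending at 'oend', simulating 32-bit pointers
       with uint32_t. Returns 1 when the copy is provably in bounds. */
    int copy_is_safe(uint32_t op, uint32_t oend, uint32_t length)
    {
        uint32_t cpy = op + length;   /* may wrap on a 32-bit system */
        if (cpy < op) return 0;       /* wrap-around detected: reject input */
        return (cpy <= oend);         /* classic out-of-bounds check */
    }

The guard costs one extra comparison, a small price for a decoder that must survive untrusted input.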
[Edit]: The existence of the vulnerability can now be tested, using the new fuzzer available at https://github.com/Cyan4973/lz4/tree/dev.


In normal circumstances, a vulnerability disclosure is welcome. For the open source movement, it translates into better and safer code. That's a view I totally embrace.
But obviously, everything depends on the way the vulnerability is disclosed. Even a simple mail or a note on the issue board is a good first step. At the end of the day, the objective is to get the issue fixed and deployed first, before any user gets exposed, to reduce the window of opportunity for malware spreaders. It's just plain common sense.

This latest disclosure shares none of these goodwill elements. By selecting direct wide broadcasting without ever notifying the upstream developer of his finding, the security auditor did the exact opposite of his social mission, ensuring maximum exposure for all users and systems. Of course, this choice will earn him a "hacker gangsta" (in his own words) reputation within his professional circle, which he believes is good for him. But that's questionable: can a paying customer entrust their critical security vulnerabilities to a self-made security auditor with borderline business conduct like this?


As far as we are concerned, the goal of the game is to get a safer implementation of LZ4 available to the general public, keeping the window of opportunity as small as possible. In the longer run, the episode will serve as another reinforcing stone, providing a security benefit to the open source edifice. But in the short term, we suffer exposure.

The new status is as follows:
  • The vulnerability can affect 32-bit applications (64-bit ones are safe)
  • The vulnerability is only possible on 32-bit systems allocating memory beyond address 0x80000000 (for the legacy format) or 0xB0000000 (for the regular frame format)
  • It still requires very large blocks for a chance to trigger it.
The second point is very difficult to verify. It seems Windows systems are safe, for example, but that still leaves a lot of other systems to check. The new fuzzer tool is now designed to test for the existence of this vulnerability, and to check the efficiency of the latest fix against this new exploit scenario.
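If you want to check your own system, a small probe can at least reveal whether allocations beyond 0x80000000 ever happen. This is a hypothetical helper of my own, not part of LZ4; build it as a 32-bit binary (e.g. gcc -m32) before drawing any conclusion:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    int main(void)
    {
        void* blocks[64];
        int   count = 0;
        int   high  = 0;

        if (sizeof(void*) != 4) {
            puts("not a 32-bit process: rebuild with -m32 to probe");
            return 0;
        }

        /* Grab memory in 32 MB chunks and inspect the returned addresses. */
        while (count < 64) {
            void* p = malloc(32u * 1024 * 1024);
            if (p == NULL) break;
            blocks[count++] = p;
            if ((uintptr_t)p >= 0x80000000u) high = 1;
        }

        puts(high ? "allocations beyond 0x80000000 observed: worth testing the fix"
                  : "no allocation beyond 0x80000000 observed in this run");

        while (count > 0) free(blocks[--count]);
        return 0;
    }

A negative result is not a proof of safety, of course: allocation behavior can change with load, ASLR, and allocator version.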

You can get the new fuzzer tool and the proposed fix at: https://github.com/Cyan4973/lz4/tree/dev
The fix seems to provide good results so far; don't hesitate to test it on your target system, should it match the above conditions.

[Edit]: the fix is good to go to master, hello r119!
[Edit 2]: since the second condition is relatively difficult to assess, the fix is recommended for any 32-bit application.

[Edit 3]: After further analysis, it seems the new overflow issue remains relatively difficult to exploit usefully. It has opened new fronts, but still requires some favorable conditions outside of attacker control, such as an allocation at the very end of the available memory address range. Relatively large data blocks remain a necessity for a good chance of success. Previously published conditions still apply to design an interesting attack scenario. With most LZ4 programs using small blocks, this makes the overflow risk a rarity, if not provably impossible, depending on allocation rules.
Still, with a fix available, updating your 32-bit applications to r119+ remains a recommended move.

[Edit 4]: End of the Linux kernel risk assessment. This potential overflow has no useful attack scenario for now. It is nonetheless getting fixed, to protect future applications. Given the current list of applications using LZ4 within the kernel, the only remaining attack scenario is a boot image modification. And when such a scenario is possible, you've got a lot more to worry about: a non-guaranteed potential overflow under cooperative conditions pales in comparison to a direct modification of the boot image, typically to insert some worm code.

--------------------------------------------------------------------------------

[Edit]: Sigh, and now on to the next aggression level: an organized straw-man campaign launched over social media, attributing to me words I never said or that I strongly disagree with. I can only wish such a thing never happens to you. Today, I feel compelled to set some records straight for the public:
  • Analyzing software for vulnerabilities is not just nice, it's great
  • And of course, vulnerabilities must be fixed, and timely disclosure is a great tool to reach that goal
OK, so where is the problem?
  • Broadcasting a vulnerability to the wild without even providing a single notification to the upstream developer is a deliberately harmful move; it cannot come close to the definition of ethical disclosure, under any possible metric
  • Conflating severity levels to spread meritless fears and harvest free advertising is a despicable scare-mongering practice
The debate over Responsible Disclosure is not new. In fact, it is gaining strength, precisely because software is becoming ubiquitous. Long gone are the days when a vulnerability would put at risk a few isolated computers primarily used to play games. Computing is now interconnected, and the backbone of our most critical services. With the Internet of Things, it is going to be present in everyday devices, including medical equipment, surveillance systems, smart-grid probes, etc.

In Responsible Disclosure, there is Disclosure, which is a good thing. There is also Responsible. For CERT, it translates into calm vulnerability classification, a notification, and a fix delay. There is a huge difference between a notification, even on a public issue board, and a communication campaign launched directly to the wild, designed to bank on a vulnerability which is not yet fixed since it was not yet identified, putting as many people and systems as possible at risk in order to strengthen its "fear factor".

In Europe, the law has picked its side, ruling in simple terms that providing a manual to launch a cyber attack is about as acceptable as providing plans for a bomb. Obviously, only the most blatant cases ever meet a judge, resulting in few convictions, mostly when the specific charge of "willful harm" was on the table. Hence, justice gets involved when an offender explicitly targets a plaintiff, but the public gets little protection. I believe that someday, it will simply no longer be acceptable to treat public safety as a dispensable collateral victim, exposed to the fire of an advertising exercise.

4 comments:

  1. There are ways [e.g. see http://msdn.microsoft.com/en-us/library/windows/hardware/ff556232(v=vs.85).aspx] to change the behavior of 32-bit Windows, so it is possible to allocate beyond 0x80000000 on some Windows machines. This is not a common configuration though; I remember somebody at my workplace doing that some years ago and then promptly hitting defects in software that could not handle addresses with the top-most bit set.

    Apart from that, I fully agree with everything you wrote Yann :).

    Replies
    1. Yeah, I guess it has probably become an implicit assumption for many 32-bit Windows programs.

  2. However, not everyone out there programs on Windows, Linux, MacOS, Android or iOS.

    The code I am working on does allocate memory above the 2 GB range. This means that the 32-bit OS version can be affected.

    Earl Colby Pottinger

    Replies
    1. Yes, you're right. That's why updating to r119 is recommended for all 32-bit systems: it's too complex to check the "high address range" condition.
