Recently, a significant "bad-code attack" affected many organizations due to a problematic update from CrowdStrike Falcon, a well-regarded cybersec...
The software supply chain is broken.
One company ships updates that affect the kernel of machines that control critical systems. I mean, you can blame it on the bad commits, but these are only the loud consequences.
I choose my words. Whether you validate me or not is irrelevant.
I never said CrowdStrike updates the kernel, only "that affect". Cheers.
CrowdStrike doesn't "affect" the kernel either. Their code did not cause the kernel code to crash nor affect the kernel in any way. It was CrowdStrike's code that crashed because it dereferenced a null pointer.
The only reason the word "kernel" is being used at all here is because CrowdStrike's code, as a driver, runs in "kernel mode" as opposed to "user mode." Just because code runs in kernel mode doesn't mean it affects the kernel. "Kernel mode" and "kernel" are two different things.
Perhaps choose words more carefully.
The supply chain is globally broken, to me, whether it's because of Windows or not:
Their crash crashed the machines. That should not be possible, and it means any other solution with the same patterns/privileges could have caused this mess, which means it's an attack vector (complex but possible).
My bad for the "kernel mode" vs "kernel," though. While affecting the core with the same privileges as the kernel and damaging the kernel itself are two different things, it does not change my point.
A "supply chain" isn't what you think it is. This is. Windows is wholly made in Redmond; CrowdStrike's software is wholly made by them. Neither is sourcing raw materials to make software from all over the globe.
True, but not relevant to my only points that I made in my original post.
When I say stupid things, I do recognize it.
Supply chain is that for me in this case, though.
Even if it's unfortunate here, the "threat" comes from a vendor, which is enough to include the word "supply chain" in the debate:
Except in this case, there is no "someone" that infiltrated any systems. No person or group used CrowdStrike's software to infiltrate Windows systems. It was entirely CrowdStrike's fault.
yes, but it's quite the same result. A supply chain problem can be intentional or the result of an accident.
I view the "someone" as a necessary condition. Anyway, I'm not going to argue it further. Believe whatever you want.
no, it's part of the risk.
too bad :(
The driver itself wasn't updated, so while of course such should be tested, it's irrelevant here. The problem was their driver apparently does no file validation to ensure the format is sane.
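To make "file validation" concrete, here is a minimal sketch of the kind of sanity check I mean. The file layout (magic bytes, version, entry count, fixed-size entries) and all the names are invented for illustration; this is not CrowdStrike's real format or code.

```python
import struct

# Hypothetical layout, invented for this example:
# 4-byte magic, 2-byte version, 2-byte entry count, then fixed-size entries.
MAGIC = b"CHNL"
HEADER = struct.Struct("<4sHH")
ENTRY_SIZE = 16


def validate_content_file(data):
    """Return the list of entries if the file is well-formed, else raise ValueError."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for header")
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic bytes")
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    expected = HEADER.size + count * ENTRY_SIZE
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    body = data[HEADER.size:]
    return [body[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE] for i in range(count)]


def load_or_keep_previous(data, previous):
    """Use the new file only if it validates; otherwise keep the last good one."""
    try:
        return validate_content_file(data)
    except ValueError:
        return previous
```

The point is the fallback: a malformed file gets rejected and the last known-good content stays in use, instead of the parser walking off into garbage.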
Their apology is pretty worthless. It's just damage control. Their issue is one of naïveté, arrogance, or otherwise good developers being forced to “ship it” by management.
"No file validation" - agreed, that's a key issue.
But damage control is important too, though, for managing the fallout and maintaining trust, right?
Thanks for the insight, by the way!
Damage control is important to prevent the stock price from tanking too much.
Trust? They've lost that. Time and the market will tell if they can regain it. It depends on how many competitors they have and how reluctant companies are to switch to another vendor.
Great overview, especially with regard to the technical details (kernel mode drivers etc), but this leaves one big question:
HOW on earth could their testing not have picked this up? I mean, if this update crashed all of their customers' computers, it would surely crash any of their "test" computers when the update got installed on it ...
There's one theory I can come up with: they produced the update, and tested it extensively, and then there was one "cowboy" in their organization who thought:
"Oh, let me just quickly add this last-minute 'improvement' that I've put together - for sure it's harmless, and I'm sure it works, no need to go through full QA/testing again!"
Meaning they probably need to further tighten their (presumably already tight) procedures ...
By the way, I totally agree with the "incremental rollout" suggestion - I was astonished when I realized that they rolled out this update to ALL of their customers at the same moment ...
I mean, it's so obvious:
Roll it out to a few customers first ... let it simmer for a day or two ... roll it out to a bigger group ... only then roll it out to everyone else.
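Roughly what I have in mind, as a sketch only: `deploy` and `healthy` are placeholder callbacks I made up, and the stage percentages and failure threshold are arbitrary. A real rollout would also wait between stages (the "simmer" part), which is left out here.

```python
def staged_rollout(hosts, stage_percents, deploy, healthy, max_failure_rate=0.01):
    """Deploy to growing percentages of hosts; halt if a stage looks unhealthy.

    Returns (hosts_updated, completed).
    """
    updated = []
    for percent in stage_percents:
        # Cumulative target: e.g. 1%, then 10%, then 100% of all hosts.
        target = len(hosts) * percent // 100
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # A real system would let the batch run for a while before checking.
        failures = sum(1 for host in batch if not healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            return updated, False  # stop before touching the remaining hosts
    return updated, True
```

With 1,000 hosts and stages of 1%, 10%, 100%, a broken update would stop after the first 10 machines instead of reaching all 1,000.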
P.S. was it really a billion computers? I read somewhere that one percent (1%) of the Windows computers worldwide was affected by this - does this mean that ONE HUNDRED BILLION Windows computers exist globally?
The rationale for doing an update to every customer simultaneously is that you want to protect them from threats ASAP.
Suppose there's a new threat in the wild. CrowdStrike codes up an update to neutralize the new threat. But suppose CrowdStrike instead did gradual updates. Suppose you're one of the customers who did not get the update first. Your computers remain vulnerable. Now suppose your computers are compromised by the threat. You blame CrowdStrike for not updating your computers in a timely manner.
Basically, CrowdStrike is damned either way.
They're not damned either way - they should simply ALWAYS do a manual (not just automated) sanity check by installing any update, no matter how tiny, on a few of their test computers before releasing it - that's what went wrong here ... but yeah it's always easy in hindsight :D
P.S. they've now announced that they are considering more gradual rollouts. Even if the stages are spaced apart by just a few hours, not days or weeks, it's probably worth the tradeoff re vulnerabilities.
I agree that they should have done better in-house testing. However, that's not what my comment was about. My comment was only about the policy of rolling out updates simultaneously to customers.
As for their updated policy to do skewed roll-outs, only time will tell if it's a better policy. Certainly if the rollout is spread across hours and not days, the likelihood that some customer somewhere will (a) get compromised during that roll-out window because they didn't get the update yet, (b) realize they didn't get the update as soon as they possibly could have, and (c) sue CrowdStrike as a result is extremely small, but non-zero.
Of course, if I were CrowdStrike, I'd put something into their customer contract that guards against that possibility just to be legally bullet-proof.
Yeah just put that into their contract ... there will ALWAYS be a vulnerability window no matter how short, so yeah :)
I too had the same thought, adding to that - or was it some work of an intern?
Loved that you pointed it out. Actually, the term "billion" in the article was metaphorical and meant to emphasize the scale of the issue. I already know that the real number is far lower.
Thanks for the feedback!
Two days later, and the mystery is solved - it slipped through their QA because there was a bug in their test/QA software itself! So their QA program/software said "yes it's good", but in reality it wasn't ...
Still baffles me that they didn't do a quick manual sanity check as well, and ONLY relied on the automated testing, but okay ... they (the supplier) have now also indicated that in the future they want to move away from "big bang" updates, and towards more gradual rollouts - sounds like a good idea to me ;-)
It must have slipped through their QA procedures SOMEHOW but yeah it's baffling ...
It's estimated that about 8.5 million computers were directly affected by the CrowdStrike crash, which is a long way off from 1 billion. 😂
Other users would have been indirectly affected by the services that were directly affected, like AWS, Gmail, etc. but the estimates are still far less than a billion.
Nice article though. :)
Actually, the term "billion" in the article was indeed metaphorical and meant to emphasize the scale of the issue.
I already know that the real number is far lower.
Thanks!
Loved reading the complete details :)
Appreciate the kind words, Anmol!
PS: I have fun reading your articles too. Keep up the great work, man.
This is why monkey patching a system that is insecure by design is a bad idea. MS is famous for being cheap on development, and CrowdStrike is upholding the tradition.
This isn't really a case of monkey patching; that applies mostly to non-compiled languages. The CrowdStrike case is closer to either dynamic loading or using data as code, i.e., pseudo-code.
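A toy illustration of what I mean by "data as code", with a made-up rule format and made-up names (nothing here reflects CrowdStrike's actual engine): the engine is fixed code, the rules arrive later as plain data, and a bad rule can still crash the engine unless the engine checks it first.

```python
# The "engine" is fixed code; the rules are shipped separately as data.
def matches(rule, event):
    """Interpret one rule: a list of (field_index, expected_value) pairs."""
    return all(event[index] == expected for index, expected in rule)


def matches_checked(rule, event):
    """Same interpreter, but it rejects rules that point outside the event."""
    for index, expected in rule:
        if not 0 <= index < len(event):
            return False  # reject the rule instead of crashing
        if event[index] != expected:
            return False
    return True


event = ("cmd.exe", "powershell", "-enc")
good_rule = [(0, "cmd.exe"), (2, "-enc")]
bad_rule = [(5, "anything")]  # refers to a field the event does not have
```

Here `matches(bad_rule, event)` raises an `IndexError`, while `matches_checked(bad_rule, event)` just returns `False`. No code was patched in either case; only the data changed.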