On the morning of Friday, 19 July 2024, a faulty update from the security firm CrowdStrike put millions of corporate computers into a BSOD loop.
I will not go into the details here; other articles explain it better than I can. You can find them here and here.
The root cause of the bug is a set of updated driver files matching C-00000291*.sys, which trigger the BSOD. Removing these files solves the problem.
In the physical world, you need to start Windows in Safe Mode and remove these drivers, but it is not so easy in the cloud.
Fortunately, Microsoft has come up with a solution for Azure.
The solution is based on a few Azure CLI commands. They copy the OS disk (VHD) of the affected VM, deploy a repair VM that can access that disk, remove the faulty driver, and then restore the disk to the original VM.
az vm repair create -g <TargetVMResourceGroup> -n <TargetVM> --verbose
az vm repair run -g <TargetVMResourceGroup> -n <TargetVM> --run-id win-crowdstrike-fix-bootloop --run-on-repair --verbose
az vm repair restore -g <TargetVMResourceGroup> -n <TargetVM> --verbose
These commands will download the CrowdStrike Fix payload, copy the disk, create a new VM in a separate resource group, perform the fix, and replace the disk of the target VM.
The operation is very manual and fully interactive: you will be asked to confirm with Yes, and you will need to provide a username and a password for the repair VM (the password must be at least 12 characters long and contain a special character, a number, and both upper- and lowercase letters).
It works well, but it takes at least 40 minutes per VM, so scaling is difficult.
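If you still want to run this sequence against several VMs, a minimal sketch with a PowerShell loop could look like the following. The resource group and VM names are placeholders, and each iteration will still pause at the interactive prompts unless you supply the vm-repair extension's credential and confirmation options (check the extension documentation for those).

$resourceGroup = "MyResourceGroup"              # placeholder: your resource group
$vmNames = @("vm-app-01", "vm-app-02")          # placeholder: list of affected VMs

foreach ($vmName in $vmNames) {
    # Same three commands as above, run once per VM.
    az vm repair create -g $resourceGroup -n $vmName --verbose
    az vm repair run -g $resourceGroup -n $vmName --run-id win-crowdstrike-fix-bootloop --run-on-repair --verbose
    az vm repair restore -g $resourceGroup -n $vmName --verbose
}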
There is another solution. Because the VM is in working condition for 1 to 2 minutes after a restart before entering the BSOD loop, it is possible to use a script to delete the faulty driver during that window.
You can imagine a scenario where you stop the Azure VM, start it again, and then execute a script via Run Command.
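As a minimal orchestration sketch with the Az PowerShell module (assuming you are already signed in with Connect-AzAccount; the resource group, VM name, and script path are placeholders), this could look like the following:

$resourceGroup = "MyResourceGroup"            # placeholder
$vmName        = "vm-app-01"                  # placeholder
$scriptPath    = ".\Remove-FaultyDriver.ps1"  # the cleanup script shown below

# Restart the VM to get a fresh 1-2 minute working window.
Stop-AzVM -ResourceGroupName $resourceGroup -Name $vmName -Force
Start-AzVM -ResourceGroupName $resourceGroup -Name $vmName

# Push the cleanup script through Run Command; it must execute before the BSOD loop resumes.
Invoke-AzVMRunCommand -ResourceGroupName $resourceGroup -VMName $vmName `
    -CommandId 'RunPowerShellScript' -ScriptPath $scriptPath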
Here is an example of a script you can apply via Run Command:
# Default CrowdStrike installation folder.
$crowdStrikeDefaultFolder = "C:\Windows\System32\drivers\CrowdStrike\"
$crowdStrikeFolderExist = Test-Path -Path $crowdStrikeDefaultFolder -ErrorAction SilentlyContinue

If ($crowdStrikeFolderExist) {
    # Pattern matching the faulty driver files to remove.
    $faultyDriverFiles = "$($crowdStrikeDefaultFolder)C-00000291*.sys"
    try {
        # -ErrorAction Stop makes failures terminating so the catch block actually runs.
        $faultyDriverFilesList = Get-ChildItem -Path $faultyDriverFiles -ErrorAction Stop
    }
    catch {
        Write-Error "Unable to get driver list"
        Write-Error -Message " Exception Type: $($_.Exception.GetType().FullName) $($_.Exception.Message)"
        exit 0
    }
    foreach ($faultyDriver in $faultyDriverFilesList) {
        try {
            Remove-Item -Path $faultyDriver.FullName -Force -ErrorAction Stop
        }
        catch {
            Write-Error "Unable to delete driver file $($faultyDriver.FullName)"
            Write-Error -Message " Exception Type: $($_.Exception.GetType().FullName) $($_.Exception.Message)"
        }
    }
}
It will not correct all the VMs, because success depends on how fast the VM boots and whether the Azure VM agent can run the script before the CrowdStrike agent is fully started, but it will reduce the number of affected VMs. This way you can scale the fix across several VMs.
The solution is provided "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.