Following its July 19 software update debacle, which affected Windows computers worldwide, and the ensuing disruptions, cybersecurity company CrowdStrike has released a Preliminary Post Incident Review (PIR) on the Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System (BSOD). According to CrowdStrike, the findings will be covered in more detail in the company's forthcoming Root Cause Analysis, which will be released publicly.
"On Friday, July 19, 2024 at 04:09 UTC, as part of regular operations, CrowdStrike released a content configuration update for the Windows sensor to gather telemetry on possible novel threat techniques," the company wrote in the preliminary review. "These updates are a regular part of the dynamic protection mechanisms of the Falcon platform. The problematic Rapid Response Content configuration update resulted in a Windows system crash. Systems in scope include Windows hosts running sensor version 7.11 and above that were online between Friday, July 19, 2024 04:09 UTC and Friday, July 19, 2024 05:27 UTC and received the update. Mac and Linux hosts were not impacted. The defect in the content update was reverted on Friday, July 19, 2024 at 05:27 UTC. Systems coming online after this time, or that did not connect during the window, were not impacted."
As to what went wrong and why, the company writes: "CrowdStrike delivers security content configuration updates to our sensors in two ways: Sensor Content that is shipped with our sensor directly, and Rapid Response Content that is designed to respond to the changing threat landscape at operational speed. The issue on Friday involved a Rapid Response Content update with an undetected error."
The report continues:
"Sensor Content provides a wide range of capabilities to assist in adversary response. It is always part of a sensor release and not dynamically updated from the cloud. Sensor Content includes on-sensor AI and machine learning models, and comprises code written expressly to deliver longer-term, reusable capabilities for CrowdStrike’s threat detection engineers.
"These capabilities include Template Types, which have pre-defined fields for threat detection engineers to leverage in Rapid Response Content. Template Types are expressed in code. All Sensor Content, including Template Types, go through an extensive QA process, which includes automated testing, manual testing, validation and rollout steps.
"The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing. This culminates in a staged sensor rollout process that starts with dogfooding internally at CrowdStrike, followed by early adopters. It is then made generally available to customers. Customers then have the option of selecting which parts of their fleet should install the latest sensor release (‘N’), or one version older (‘N-1’) or two versions older (‘N-2’) through Sensor Update Policies.
"The event of Friday, July 19, 2024 was not triggered by Sensor Content, which is only delivered with the release of an updated Falcon sensor. Customers have complete control over the deployment of the sensor -- which includes Sensor Content and Template Types."
Microsoft, in a blog post, also examined the CrowdStrike outage and provided a technical overview of the root cause.
The computing giant explains why security products use kernel-mode drivers today and describes the safety measures Windows provides for third-party solutions. It also shares how customers and security vendors can better leverage the integrated security capabilities of Windows for increased security and reliability, and provides a look into how Windows will enhance extensibility for future security products.
Microsoft also confirms CrowdStrike’s analysis that this was a read-out-of-bounds memory safety error in the CSagent.sys driver developed by the cybersecurity company.
CrowdStrike Software Bug Causes Global IT Outage, Disruptions In Aviation, Other Sectors
A software update from United States cybersecurity firm CrowdStrike on Friday (July 19) caused a widespread IT outage and 'blue screens of death,' affecting millions of Microsoft Windows devices worldwide. The incident resulted in significant disruptions to various industries, including aviation.
Hundreds of flights were canceled or delayed globally, with Delta Air Lines being particularly affected. The outage impacted airport systems, including baggage handling and security screening, causing long lines and congestion at the airports, as passengers were unable to check in or access flight information.
Many Fortune 500 companies, including airlines, are estimated to have lost up to $5.4 billion in revenues and gross profit due to the outage. The health care and banking sectors were also severely affected, with estimated losses of $1.94 billion and $1.15 billion, respectively.
In the United Kingdom, some hospitals experienced issues with electronic patient records and medical equipment. Flights were canceled or delayed, with British Airways and EasyJet among the airlines affected. Firms relying on CrowdStrike’s cybersecurity services, such as security monitoring and incident response, were also affected. Ambulance and fire services faced difficulties with communication and dispatch systems.
Also impacted in the UK were thousands of businesses and organizations using Microsoft products, such as Windows and Office. Amazon Web Services (AWS) users also experienced issues with their cloud services.
The cybersecurity firm has since released a software update to fix the bug.
CrowdStrike CEO George Kurtz faced backlash for his initial response to the debacle on X. “CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts,” Kurtz wrote Friday. “Mac and Linux hosts are not impacted. This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed.”
Kurtz initially struggled to provide a timeline for when systems would be restored, leaving customers and regulators in the dark. His response was criticized for being too technical and lacking a personal touch.
Lulu Meservey, chief executive of public relations company Rostra, posted a scathing critique of the statement on social media platform X, earning over 15,000 likes as she lambasted Kurtz for using “weapons-grade corpo speak.”
“Let’s be clear. Legalese doublespeak is designed to dodge and obfuscate rather than inform or communicate,” said Meservey. “This statement was obviously written by a committee of lawyers and middle managers whose only goal was to avoid legal risk and threats to their own job security. If you can’t understand what the statement is even saying, it’s working as intended.”
She criticized Kurtz for adopting a “passive voice” and described the statement as “almost comical in its efforts to dodge assigning responsibility,” before pointing out the lack of an apology.
“The first words should be ‘I’m sorry,’” she said. “This outage knocked out 911 call centres and hospitals. People literally might have died. And the company’s CEO is out here playing it down as if it’s not a big deal.”
To make matters worse, CrowdStrike offered a $10 UberEats voucher as a token of apology to its staff and partners. This gesture was widely panned as insufficient and insensitive, particularly given the significant financial losses incurred by affected businesses, estimated to be around $5.4 billion.
Kurtz, in a statement on the company’s website late on Friday afternoon, apologized once again for the outage and said that CrowdStrike was working to help restore systems.
“Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike,” Kurtz said. “As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”
The CEO told NBC’s Today Show in the US that the problem was down to a bug in a single update. “We identified this very quickly and remediated the issue,” he said, adding that CrowdStrike was now “working with each and every customer to make sure that we can bring them back online.”
Kurtz said there had been a “negative interaction” between the update and Microsoft’s operating system, which had then caused computers to crash, sparking the global outage, which remains ongoing.
Asked how one faulty update could cause such global chaos, he said: “We have to go back and see what happened here, our systems are always looking for the latest attacks from adversaries that are out there.”
He reiterated that there was no possibility it was a cyber-attack. However, although the problem had been identified and a fix issued, Kurtz said “it could be some time for some systems” to return to normal, stressing that they would not “just automatically recover.”
Authorities in the UK and the US Department of Transportation are investigating the incident, and airlines are reviewing their contingency plans to mitigate the impact of future outages. Kurtz is due to testify in a US congressional hearing.