Widespread IT Outage Tests Organizations’ Resilience, Highlights Fragility of Digital Infrastructure
ExtraHop
July 22, 2024
On Friday, July 19, a Falcon Sensor software update released by CrowdStrike caused widespread IT outages at organizations including hospitals, banks, airlines, emergency responders, shipping hubs, and media companies around the world. In an official statement, CrowdStrike said “this configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.”
CrowdStrike stressed that the outages, which impacted Windows devices, were “not the result of or related to a cyberattack,” according to the statement. macOS and Linux hosts were not affected. ExtraHop was also not impacted by the update or the subsequent IT outages.
CrowdStrike immediately deployed a “fix” and said in a LinkedIn post that a significant number of the affected Windows devices were “back online and operational.” Nevertheless, the fact that the outages stemmed from a defect in a single content update highlights a pressing need for improved organizational resilience to the risks that arise from increased digitization, cloud adoption, and the interconnectedness of the modern IT supply chain.
As highlighted by a Microsoft blog discussing the CrowdStrike incident, “this incident demonstrates the interconnected nature of our broad ecosystem — global cloud providers, software platforms, security vendors and other software vendors, and customers. It’s also a reminder of how important it is for all of us across the tech ecosystem to prioritize operating with safe deployment and disaster recovery using the mechanisms that exist.”
Indeed, this event underscores organizations’ collective dependence on a wide variety of automatic software updates and the significant disruption that can occur when those auto-updates contain flaws or otherwise don’t execute as intended across an expansive customer base.
Friday’s incident is not the first example of significant business disruption caused by IT outages and other technology failures. The fragility of the API- and open-source-based digital ecosystem on which we all depend was previously exposed by the SolarWinds hack, the Log4j vulnerability disclosure, and the revelation of a malicious backdoor planted in xz Utils, a data compression utility that is ubiquitous across Linux and other Unix-like operating systems.
Collectively, these events reveal the inherent fragility of the interconnected technology that underpins the web and drives organizations’ day-to-day operations. The flawed CrowdStrike update and the outages that followed are at least in part a byproduct of centralized cloud dependencies and organizations’ lack of visibility into them. Left unchecked, this lack of visibility could lead to disaster.
This most recent IT crisis underscores the need for more conscientious business resilience and disaster recovery planning, for phased software updates, and for a greater focus on IT, cyber, and resiliency risks at the board level.
ExtraHop urges organizations to deepen their understanding of the complexities associated with their software supply chains and third-party dependencies. We also encourage enterprises to develop contingency plans in the event that critical vendor updates or other unexpected single points of failure cause system crashes.
Additionally, we recommend that systems administrators consider deploying software updates in stages. That way, administrators can see how a sample cluster of their organization’s workstations responds to an update without impacting the entire enterprise network. From a security perspective, we’re also aware of threat actors’ propensity to exploit crises like this and launch opportunistic attack campaigns, so organizations should remain vigilant against phishing attacks.
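The staged approach recommended above can be illustrated with a minimal Python sketch. It is not any vendor's deployment tooling. The function names (`split_into_rings`, `staged_rollout`), the `apply_update` and `is_healthy` callbacks, and the 1% and 10% ring sizes are all hypothetical choices made for illustration. In practice, the callbacks would wrap whatever patch management and monitoring tools an organization already uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                # True if every ring received the update
    updated: List[str]             # hosts updated before the rollout ended
    halted_at_ring: Optional[int]  # index of the ring that failed, if any


def split_into_rings(hosts: Sequence[str],
                     fractions: Sequence[float]) -> List[List[str]]:
    """Partition hosts into successive rings; the last ring takes the rest."""
    rings = []
    start = 0
    for fraction in fractions:
        # Every early ring gets at least one host so it can act as a canary.
        size = max(1, round(len(hosts) * fraction))
        rings.append(list(hosts[start:start + size]))
        start += size
    rings.append(list(hosts[start:]))
    return [ring for ring in rings if ring]


def staged_rollout(hosts: Sequence[str],
                   apply_update: Callable[[str], object],
                   is_healthy: Callable[[str], bool],
                   fractions: Sequence[float] = (0.01, 0.10),
                   max_failure_rate: float = 0.0) -> RolloutResult:
    """Update one ring at a time and stop as soon as a ring looks unhealthy.

    apply_update and is_healthy are hypothetical hooks supplied by the caller.
    """
    updated: List[str] = []
    for index, ring in enumerate(split_into_rings(hosts, fractions)):
        for host in ring:
            apply_update(host)
            updated.append(host)
        # Check the ring that was just updated before touching the next one.
        failures = sum(1 for host in ring if not is_healthy(host))
        if failures / len(ring) > max_failure_rate:
            return RolloutResult(False, updated, index)
    return RolloutResult(True, updated, None)
```

With 100 workstations and the default fractions, the rings contain 1, 10, and 89 hosts. If the first host fails its health check after the update, the rollout stops there and the remaining 99 machines are never touched, which is the point of staging: a defective update costs a small sample rather than the whole fleet.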
Finally, organizations need to seek out board leadership that is cognizant of cloud and related cyber risks to ensure that the provisions highlighted above are taken seriously by company executives and their employees. In the end, this unfortunate incident transcends one vendor’s glitchy update. Organizations must take greater care to understand their exposure to intertwined software dependencies in the cloud and the turbulence those dependencies can cause if a critical component fails.