Proactive preparation: Learnings from the CrowdStrike outage

Many of us will be familiar with the inconvenience caused by IT failures.

From having to leave virtual meeting rooms to losing important files, a failing IT system can act as an annoying reminder of our reliance on tech. These issues were dwarfed, however, by the recent CrowdStrike outage, which saw the impact of a faulty software update amplified on a global scale, affecting thousands and causing severe disruption to flights, banks, healthcare services and more.

The July outage affected as many as 8.5 million Windows devices worldwide, rendering them inoperable for hours – days, in some cases. Affected devices required manual reboots, placing a significant strain on business time and resources. This meant that, whilst the faulty update was fixed relatively quickly, the consequences had a significant knock-on effect across the operations of multiple industries – from banking to aviation.

These ramifications are a stark reminder of the complex software infrastructure that has become integral to the day-to-day functioning of many businesses. As more organisations rely on digital systems to run their operations, situations like those caused by the CrowdStrike IT outage could become more common. At the same time, the supply chains underpinning this infrastructure are becoming knottier. One domino falls – in this case, CrowdStrike – and the rest can quickly follow.

So it’s vital that businesses have robust contingency plans to protect digital infrastructure and withstand the evolving technology risks facing them, particularly in the event of third-party disruption.

Small fault, huge impact

In today’s interconnected world, a failure or disruption in one part of an industry’s network can have a ripple effect, potentially compromising the entire supply chain and leading to contagion risk. The impact of such disruption is further exacerbated by organisations relying on a small number of third-party software suppliers for a critical element of their business, such as core data storage or cyber security – creating a potential single point of failure, whereby you cannot access a key service if that supplier is down or goes bust.

The CrowdStrike outage exemplified the levels of disruption this can cause. Alongside postponed hospital procedures amidst a continuing NHS backlog, more than 4,000 flights were cancelled and 35,000 delayed as a result of the issue. The incident is estimated to have cost the top 500 US companies an eye-watering £4.1 billion in financial losses.

When businesses are reliant on third-party vendors for critical services, a disruption to that supplier can leave them in limbo, unable to access the crucial technology they need for everyday processes.

Putting robust digital operational resilience plans in place can help to mitigate these risks, but businesses first need to understand where, exactly, their vulnerability points are. In today’s complex digital landscape, some weak spots can be easy to overlook.

The risk of future disruption

Preparing for future disruptions is necessary for all businesses. July’s incident was far from an isolated occurrence; in fact, it’s a relatively minor example of the vulnerabilities that pose a risk to all industries, including critical national infrastructure such as finance, healthcare and transport.

Such disruption, though different in cause, is happening constantly, and there have been major examples in recent memory. In May, a ‘one of a kind’ Google Cloud misconfiguration left 500,000 members of Australian superannuation fund UniSuper unable to access their accounts. We also recall the 2023 collapse of Silicon Valley Bank (SVB). SVB provided banking support to approximately half of all US venture-backed technology and healthcare companies, and its failure demonstrated how business-critical services, such as financial operations, can and do suddenly fail.

The collapse or failure of a third-party supplier might feel out of a business’s control, but there are contingency plans that can be put in place to help mitigate the impact on operations.

Fostering digital resilience

We’ve already seen industry-focused pushes for greater digital operational resilience, often led by regulators.

The EU’s Digital Operational Resilience Act (DORA) is a good example of this. By requiring financial institutions to follow rules for protection, detection, containment, recovery, and repair capabilities around IT-related incidents, DORA is designed to ensure that clear, actionable recovery plans are in place to mitigate software supplier disruption in the financial sector. Similar action is being taken in Australia, which recently finalised its Prudential Practice Guide CPG 230 on Operational Risk Management.

However, the CrowdStrike outage drew attention to the global need for this level of preparation, across every industry. Regulators should not be solely responsible for encouraging this, either. All organisations need to have clear recovery plans in place that can be activated quickly in the event of situations like software supplier disruption. For effective and robust protection, businesses should take the following steps:

1. Create contingency plans

You can’t plan for every scenario. However, having contingency plans can significantly minimise disruption if worst-case scenarios occur. Clear guidance, such as knowing who to contact during an outage and when, can help financial organisations quickly identify faults in their supply chains and restore services.

2. Consider strengthened contractual agreements

Contractual obligations with software suppliers provide an added layer of protection if issues arise. They put a legally binding agreement in place that requires suppliers to handle the issue effectively.

Escrow agreements are also key. They protect the critical source code behind applications by keeping a current copy in escrow and can help organisations manage risk if a supplier can no longer provide software or updates. Tri-party escrow agreements between businesses, software firms, and escrow providers help ensure that business-critical data and digital systems remain accessible even in the event of supplier failure. This gives financial institutions peace of mind, especially given the future possibility of complete supplier failure.

3. Understand your supplier’s supply chain

As mentioned earlier, supply chains are complex. Software providers also rely on their own suppliers, creating an interconnected web of dependencies. Organisations should understand the contingency plans their suppliers have in place to handle disruptions in their wider supply chain.

Knowing these plans provides peace of mind that suppliers are also prepared for disruptions and have effective steps in place to minimise any impact.
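
For teams that want to make this mapping exercise concrete, one option is to record the dependencies as a simple graph and look for suppliers that several business services ultimately rely on. The short Python sketch below is illustrative only: the service and supplier names, the DEPENDS_ON map and the threshold are all hypothetical, and a real inventory would come from an organisation's own supplier records.

```python
from collections import defaultdict

# Hypothetical dependency map: each service or supplier lists what it relies on.
DEPENDS_ON = {
    "payments": ["endpoint-security", "cloud-host"],
    "customer-portal": ["cloud-host"],
    "reporting": ["data-warehouse"],
    "data-warehouse": ["cloud-host"],
    "endpoint-security": [],
    "cloud-host": [],
}


def transitive_dependencies(name, graph):
    """Return everything `name` relies on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(name, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def single_points_of_failure(services, graph, threshold):
    """Map each supplier to the services relying on it, keeping only
    suppliers that at least `threshold` services depend on."""
    affected = defaultdict(set)
    for service in services:
        for supplier in transitive_dependencies(service, graph):
            affected[supplier].add(service)
    return {s: sorted(v) for s, v in affected.items() if len(v) >= threshold}


# Hypothetical list of business-critical services to assess.
BUSINESS_SERVICES = ["payments", "customer-portal", "reporting"]
hotspots = single_points_of_failure(BUSINESS_SERVICES, DEPENDS_ON, threshold=2)
print(hotspots)
```

In this made-up example, the reporting service never contracts with the cloud host directly, yet it still depends on it through the data warehouse supplier. That kind of indirect dependency is easy to overlook without mapping a supplier's own supply chain.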

Is your business prepared for future risks?

The CrowdStrike outage is a powerful reminder that third-party risk management is a vital part of ensuring true digital operational resilience in today’s world. It needs to be a strategic priority at every level of an organisation. This includes ensuring that risk management frameworks are robust enough to withstand unforeseen disruption or loss of business-critical services. The key is taking a proactive, preventative approach to the increasingly complex tech risk landscape businesses face.


About the Author

Adrian Ah-Chin-Kow is Global Commercial Director of Escode. Formerly Software Resilience, Escode, part of NCC Group, is the world’s largest provider of Software Escrow and Verification Services. As market leaders of the Software Escrow category, our story began in 1988 when we identified that businesses were becoming increasingly technology-dependent, but no safety framework existed to protect those using the software or even the vendors themselves. Enter the Software Escrow Agreement, which is a tri-party agreement between the licensee, the vendor and ourselves that verifies and safeguards the software source code in our vaults for the ultimate protection of your business-critical systems and intellectual property.
