More transparency and more resilience needed

Quantum security wouldn’t make the slightest difference in most common outages, especially those we’ve seen recently.

The real danger is that we lean too heavily on innovation and convince ourselves either that all our problems will be solved in the future, or that the fixes are so complex they must inevitably rely on third parties. In truth, the problem we face in IT is the same one we’ve always faced: how do we get data from A to B? Quantum security is akin to using an armoured tank instead of a Ford Escort to transfer the data. We’re still moving the same data, but in a way that’s cumbersome, expensive and over-engineered.

The reality is that IT outages cannot be foreseen. In today’s global world we are inextricably linked through IT services and infrastructure, and a critical error at one of the global IT service providers can cause one system after another to collapse worldwide, like a row of dominoes. So no matter where a company resides, from Australia to Europe to the USA, the impact of a widespread outage is broad and severe.

IT managers must therefore create more transparency to better understand these dependencies and implement countermeasures. Real cyber resilience can only be achieved if companies can bridge the failure of one of their core service providers by automatically switching to an alternative. These disaster recovery concepts are the key to keeping a company’s core services running and to giving IT teams enough time to analyse the incident. These are the basic foundations of IT resilience. There are innovative ways in which we can utilise AI too. For instance, AI can be used to test a company’s resiliency and recovery by running scenarios such as disasters and cyber incidents against its knowledge base. This can highlight where challenges may occur, how to recover, and what to learn from these events in order to improve.
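The idea of automatically switching to an alternative when a core provider fails can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `call_with_failover`, the exception `AllProvidersFailed` and the two stand-in providers are hypothetical names invented for this example, not part of any product or library.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when neither the primary nor any alternative provider responds."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, result) from the first that works."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for a core service provider and its alternative.
def primary() -> str:
    raise ConnectionError("primary provider unreachable")


def alternative() -> str:
    return "ok"


used, result = call_with_failover([("primary", primary), ("alternative", alternative)])
print(f"served by {used}: {result}")
```

The ordering of the list is the dependency map made explicit: writing it down forces the question of which alternative exists for each core service, which is the transparency the paragraph above asks for.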

The real failures are in testing and resilience. 

On the vendor side, outages are most likely to stem from a failure to test software rigorously and from a poor release management strategy that pushes updates to 100% of end users at once. A staggered approach would instead identify any unforeseen issues and contain them to a small number of users or machines. To help address this, companies can use AI to review change requests. By pre-playing a change through AI, they can learn its effects on the ecosystem: how many outages occur when the change is made, and whether a downstream event occurs that was not foreseen because of an outdated configuration management database.
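A staggered release of the kind described above might look roughly like the following Python sketch. The stage percentages, the failure threshold and the names `bucket`, `in_stage` and `staged_rollout` are all assumptions made for illustration; they do not describe any vendor’s actual release process.

```python
import hashlib
from typing import Callable, Iterable, Set, Tuple

STAGES = (1, 10, 50, 100)   # hypothetical rollout percentages
MAX_FAILURE_RATE = 0.02     # hypothetical threshold for halting the rollout


def bucket(machine_id: str) -> int:
    """Map a machine to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_stage(machine_id: str, percent: int) -> bool:
    """A machine belongs to a stage if its bucket falls below the stage percentage."""
    return bucket(machine_id) < percent


def staged_rollout(
    machines: Iterable[str], apply_update: Callable[[str], bool]
) -> Tuple[int, Set[str], bool]:
    """Roll out stage by stage; return (last stage percent, machines touched, halted)."""
    machines = list(machines)
    touched: Set[str] = set()
    for percent in STAGES:
        # Only machines newly admitted by this stage receive the update.
        batch = [m for m in machines if in_stage(m, percent) and m not in touched]
        failures = sum(0 if apply_update(m) else 1 for m in batch)
        touched.update(batch)
        if batch and failures / len(batch) > MAX_FAILURE_RATE:
            return percent, touched, True  # halt: later stages are never touched
    return STAGES[-1], touched, False


fleet = [f"host-{i}" for i in range(1000)]
percent, touched, halted = staged_rollout(fleet, lambda machine: False)  # a bad update
print(f"halted={halted} at {percent}% after touching {len(touched)} of {len(fleet)} machines")
```

Because the bucket is derived from a hash of the machine name, the same machines always land in the early stages, so a faulty update is contained to that small group rather than reaching the whole fleet.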

On the customer side, it’s a matter of resilience. There’s an obvious issue of placing too much trust in third-party software and allowing it to be released into production before it has been scrutinised and deemed safe within the enterprise.

Worse, poor operational resiliency sees 100% of systems go down, when operations should be split so that any impact from a release affects only a portion of production. Companies should look at using the intelligence from their AI analysis to reposition applications, or to split workloads to cater for unplanned outages, so that staff can keep working throughout.

Finally, these companies showed poor recovery plans. Most common outages are easily fixed, with updates from the vendor available within minutes or hours. It should not take days to recover. AI can be utilised to help companies monitor hardware, flagging failures before they occur and switching routes to avoid the problem areas. This level of machine learning is now a common feature, but one that often goes unused.


About the Author

Mark Molyneux is EMEA CTO at Cohesity.

A modern platform for the AI era. Our mission at Cohesity is simple: to protect, secure, and provide insights into the world’s data. The largest organizations around the globe rely on us to strengthen their business resilience. With the Cohesity Data Cloud, we are able to deliver on that mission. Our customers can recover from cyber events faster, manage and secure their data at enterprise scale, and gain valuable insights with our industry-leading AI capabilities.

Featured image: Adobe
