To the cloud and back: Why businesses are repatriating ML workloads

The last few years have seen many enterprises rush to the cloud as part of their growth and agility strategies, keen to be known for a “cloud first” approach.

According to a study from Canalys, companies spent $107 billion on cloud infrastructure globally in 2019.

However, as the realities of the cloud become better understood, a growing number of companies are moving their workloads off the public cloud. In fact, a recent IDC report shows that 80% of companies plan to repatriate at least some of the workloads they currently run in the public cloud. After the initial enthusiasm for the cloud's benefits, many IT managers are starting to realise that deciding where to run each workload is not always straightforward, and that a cloud-only approach is less appealing than it first seemed. This is especially true of machine learning (ML) workloads, whose compute requirements can vary dramatically over the course of the ML lifecycle.

From cost to data security and sovereignty, there are a number of factors to consider, and the answer is unlikely to be “all cloud” or “all on premises”.

The cloud repatriation wave 

Cost has been one of the leading drivers of repatriation, and it also happens to be the biggest advantage of on-premises infrastructure. In the initial rush to the cloud, many business leaders were drawn by the easy add-ons cloud providers could offer: new tools and solutions that could be scaled up quickly and easily. Now, though, companies are dealing with a case of “bill shock”, because keeping that infrastructure running in the cloud for long periods is proving expensive. The move to the cloud has also forced businesses to invest in reskilling or upskilling staff, since it requires different capabilities from managing a data center. This workforce preparation has added costs to the ‘cloud bill’ that IT managers may not have accounted for at the outset. A well-constructed total cost of ownership calculation is little consolation when you actually have to hand over the money at the end of the month.

Security concerns are another reason IT professionals are leaving the public cloud. A recent survey conducted by IDC and Ermetic revealed that nearly 80% of companies had experienced a cloud data breach in the past 18 months. The consequences of a breach in the cloud can be worse than those of a breach in your own data center because, unlike on premises, you can't stop an attack by simply ‘unplugging’ the network. The public cloud has also presented challenges as companies revisit their data sovereignty strategies to ensure they comply with regulations such as the General Data Protection Regulation (GDPR). In fact, I've spoken to customers in EMEA countries who have opted out of the public cloud entirely because providers don't offer scalable, accessible infrastructure within their territory, which more recent privacy regulations require.

When deciding whether to stay in the cloud, businesses should assess their ML workloads on a case-by-case basis.

Which ML workloads should businesses run where?

Different ML workloads involve different levels of complexity, which in turn lead to vastly different costs for hosting them in the cloud. So when deciding where to run a machine learning workload, the first question to ask is “what” type of workload it is, rather than “where” it should run.

For example, take an ML workload used to train an image classification model. This workload won't run constantly, but when it does, it needs much larger infrastructure to handle the complexity of the model. This is where the public cloud can be advantageous, as it offers robust infrastructure with significant amounts of compute power. And because the workload runs only occasionally, businesses don't have to worry about exorbitant costs, since providers charge on an on-demand basis.

Once the model is trained to classify new images, the next step is to run scoring workloads, in real time or in batches, that analyse and accurately label new images as data comes in. These workloads tend to run much more frequently but don't require as much infrastructure, which makes it more cost effective to run them on premises or on a private cloud (hosted or on premises), as both are cheaper options. This also gives more control over the models being run, easing business leaders' concerns about data sovereignty and security. As a rule of thumb, workloads that need lots of compute power but only run occasionally should run on a public cloud, while workloads that run 24/7 on a steady amount of compute can run on premises.
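The rule of thumb comes down to simple arithmetic: on-demand cloud capacity is billed only for the hours a workload runs, while owned capacity costs roughly the same each month whether it is busy or idle. The Python sketch below compares the two for a bursty training job and an always-on scoring service. The prices, GPU counts and run times are hypothetical placeholders rather than real quotes, and the model deliberately ignores factors such as reserved-instance discounts, data egress fees and staff reskilling costs.

```python
from dataclasses import dataclass

# Hypothetical, illustrative prices; substitute your own quotes and TCO figures.
CLOUD_PRICE_PER_GPU_HOUR = 3.00       # on-demand rate, paid only while running
ON_PREM_COST_PER_GPU_MONTH = 1200.00  # amortized hardware, power, space and staff


@dataclass
class Workload:
    name: str
    hours_per_month: float  # how long the workload actually runs each month
    gpus: int               # GPUs needed while it runs (on prem, sized for this peak)


def cloud_cost(w: Workload) -> float:
    """On-demand cloud: cost scales with the hours actually used."""
    return w.hours_per_month * w.gpus * CLOUD_PRICE_PER_GPU_HOUR


def on_prem_cost(w: Workload) -> float:
    """Owned capacity: paid for every month, busy or idle."""
    return w.gpus * ON_PREM_COST_PER_GPU_MONTH


def break_even_hours() -> float:
    """Monthly run time above which owning a GPU is cheaper than renting one."""
    return ON_PREM_COST_PER_GPU_MONTH / CLOUD_PRICE_PER_GPU_HOUR


def recommend(w: Workload) -> str:
    """Pick the cheaper placement under this simplified cost model."""
    return "public cloud" if cloud_cost(w) < on_prem_cost(w) else "on premises"


if __name__ == "__main__":
    workloads = [
        Workload("image-classifier training", hours_per_month=60, gpus=64),
        Workload("real-time scoring", hours_per_month=730, gpus=4),  # 730 h = 24/7
    ]
    print(f"break-even: {break_even_hours():.0f} run-hours per month")
    for w in workloads:
        print(f"{w.name}: cloud ${cloud_cost(w):,.0f}, "
              f"on prem ${on_prem_cost(w):,.0f} -> {recommend(w)}")
```

With these illustrative inputs, a workload that keeps its GPUs busy for more than about 400 hours a month is cheaper on owned hardware, which is the rule of thumb expressed as a threshold; with real figures the break-even point will be different.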

The public cloud is still undoubtedly the quickest way to get workloads running. Public cloud providers often have newer technologies and hardware that are easier to access and scale than those of a typical corporate IT department. However, the public cloud comes with its own set of security risks. Hosting your own infrastructure on premises (including private cloud) is slower to scale, but it is a cheaper way to run always-on models with modest infrastructure needs. Each option has pros and cons, so it's worth taking a step back to analyse whether the cloud, and if so which cloud, is the right choice for what you are trying to achieve.

An “all cloud” strategy may not be right, but that doesn't mean moving everything back on premises is the answer. In fact, many companies are finding that their needs are best met by a hybrid cloud infrastructure that combines on-premises, private cloud and public cloud resources, making the most of each through a tailored mix.

Making the most of the mix

A hybrid infrastructure addresses many of the issues associated with public-cloud-only or on-premises-only solutions, but there are several things to consider to get the most out of the mix. First, it's important to take the time to build a machine learning lifecycle strategy that gives you access to, and control over, all your data and ML models in the future, no matter where they sit. When you're working across many platforms, it's easy to end up with silos as different people work on different platforms, and these silos can cause problems from a regulatory, cost and data governance perspective.

That's why many businesses are seeing the benefits of investing in an enterprise data cloud (EDC) to get the most from their data. An EDC is essentially a unified, hybrid and multi-cloud point of control for managing a business's IT infrastructure, data and analytic workloads, all on a single platform and regardless of where the data lives. A platform optimised to run analytics in any cloud gives businesses the flexibility they need, with the added benefit of not being locked in to a single provider. Consistent data security, governance and control across all data and infrastructure also help meet data privacy and regulatory compliance requirements. Solutions like this help ensure that any decision you make now about where your data lives won't become a problem in the future.
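To make the idea of a single point of control more concrete, here is a minimal Python sketch of one inventory that records every dataset and model together with the environment and region it lives in, so that a data residency rule can be checked in one place rather than platform by platform. It is a hypothetical illustration only: the `Catalog` and `Asset` names, the region labels and the residency rule are assumptions made for this example, not the API of any enterprise data cloud product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str            # "dataset" or "model"
    environment: str     # hypothetical labels: "on-prem", "private-cloud", "public-cloud"
    region: str          # hypothetical jurisdiction label, e.g. "eu", "us"
    personal_data: bool  # True if the asset contains personal data


@dataclass
class Catalog:
    """Hypothetical single inventory spanning every environment."""
    allowed_personal_data_regions: set[str]
    assets: list[Asset] = field(default_factory=list)

    def register(self, asset: Asset) -> None:
        self.assets.append(asset)

    def by_environment(self) -> dict[str, list[str]]:
        """Group asset names by where they run, preserving registration order."""
        grouped: dict[str, list[str]] = {}
        for asset in self.assets:
            grouped.setdefault(asset.environment, []).append(asset.name)
        return grouped

    def residency_violations(self) -> list[Asset]:
        """Assets holding personal data outside the permitted regions."""
        return [
            a for a in self.assets
            if a.personal_data and a.region not in self.allowed_personal_data_regions
        ]


if __name__ == "__main__":
    catalog = Catalog(allowed_personal_data_regions={"eu"})
    catalog.register(Asset("customer-images", "dataset", "on-prem", "eu", True))
    catalog.register(Asset("image-classifier", "model", "public-cloud", "us", False))
    catalog.register(Asset("scoring-logs", "dataset", "public-cloud", "us", True))

    for env, names in catalog.by_environment().items():
        print(f"{env}: {', '.join(names)}")
    for a in catalog.residency_violations():
        print(f"residency violation: {a.name} ({a.environment}, {a.region})")
```

A real platform would also need access control, lineage and policy enforcement; the point of the sketch is only that governance checks become straightforward once every asset, wherever it runs, is registered in one place.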

Ultimately, the key is to think not just about the immediate effects of what you need to do with your data, but about the potential future impact of any decision. If you're interested in this topic, check out the talk I presented at the Virtual AI Summit, available here. Hopefully, it will help you make sound decisions about your data now and avoid creating problems for future IT teams.

About the Author

Jeff Fletcher is ML Cloud Lead at Cloudera. Cloudera delivers an enterprise data cloud for any data, anywhere, from the Edge to AI. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises. Learn more at

Featured image: ©Gorodenkoff Productions