Prioritising Proper Protection: Managing Big Data and Open Source Workloads

Organisations are dealing with more diverse data forms than ever before

Ranging from images to videos to social media posts, this data is often untagged, unknown, and unstructured. However, new data forms are only a part of the new data management challenge. As organisations grow in size, complexity and the services they offer, so too does the diversity of the workloads they must run. Modern companies operate huge applications, some of which must process billions of requests each minute from all around the world.

Fortunately, there are solutions and systems that enable businesses to cope with these new demands. Organisations are increasingly embracing big data and open source databases that are advanced and flexible enough to scale with their needs. However, outdated attitudes to data protection could end up being their Achilles’ heel.

Modern workloads require modern data protection measures. Older approaches, such as replication, are not suited to big data or open source databases and do not guarantee that the data they hold is protected.

While many big data and open source databases offer some form of protection – including snapshots and built-in recovery tools – they lack the point-in-time backup and recovery capabilities needed to achieve enterprise-grade data protection. The stakes are too high to let a workload go down, so organisations must cover all the bases from backup and recovery to analysis and data management.

Carrying critical data

Time is of the essence. Big data and open source solutions have already become crucial to businesses. According to IDC, big data will soon become a $260 billion market and more than a third of organisations work with scalable big data solutions. At the same time, open source databases rank among the most popular choices for mission-critical applications.

Big data and open source environments come with a range of protection requirements, including point-in-time recovery for historical analysis and fast data recovery. Backup is critical – these workloads carry data and services no organisation can afford to lose.

The primary reason to create backups is to protect against accidental or malicious data loss due to logical or human error. In today’s digital economy, data is precious. Losing it can lead to gaps in valuable insights or missed opportunities. With GDPR now in force, data loss also carries the risk of severe regulatory penalties and reputational damage. Backups with off-site copies additionally protect against site outages and complete disaster situations that result in the destruction of equipment and data environments.

Yet there are also benefits from a resilience point of view. According to a Ponemon Institute study, the mean cost of an unplanned outage per minute is $8,851, or approximately $530,000 per hour. As many big data or open source database environments are utilised for mission-critical applications, an outage could have a significant compliance or financial impact on your organisation.
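The per-hour figure follows directly from the per-minute one. A minimal sketch of the arithmetic is shown here; the function name and the 90-minute example are illustrative assumptions, and the calculation simply assumes cost scales linearly with downtime.

```python
# Per-minute outage cost as quoted from the Ponemon Institute study in the text.
COST_PER_MINUTE_USD = 8_851


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate outage cost, assuming it grows linearly with duration (a simplification)."""
    return minutes * cost_per_minute


# One hour of downtime: 8,851 x 60, which the article rounds to about $530,000.
print(f"60 minutes: ${outage_cost(60):,.0f}")

# A hypothetical 90-minute outage, for comparison.
print(f"90 minutes: ${outage_cost(90):,.0f}")
```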

Many companies may see replication solutions as sufficient to the task, but they don’t go quite far enough. Replication does provide real-time to near-real-time protection, but it doesn’t protect from logical or human error – whether accidental or malicious – that can result in data loss. Replication can also lead to expensive and resource-hungry clustering, using up unnecessary storage space when it may already be in short supply.
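The difference between replication and point-in-time backup can be illustrated with a toy model. The sketch below is not any vendor's implementation: the `ReplicatedStore` class and its method names are hypothetical, and the "database" is just an in-memory dictionary. It shows that a mistaken delete is faithfully copied to the replica, while a snapshot taken beforehand still allows recovery.

```python
import copy


class ReplicatedStore:
    """Toy key-value store whose writes and deletes are mirrored to a replica at once."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = {}  # label -> frozen copy of the primary at that moment

    def put(self, key, value):
        self.primary[key] = value
        self.replica[key] = copy.deepcopy(value)  # replication: change is mirrored

    def delete(self, key):
        self.primary.pop(key, None)
        self.replica.pop(key, None)  # replication: the deletion is mirrored too

    def snapshot(self, label):
        # Point-in-time copy, independent of later changes to the primary.
        self.snapshots[label] = copy.deepcopy(self.primary)

    def restore(self, label):
        # Roll both copies back to the state captured under `label`.
        state = self.snapshots[label]
        self.primary = copy.deepcopy(state)
        self.replica = copy.deepcopy(state)


store = ReplicatedStore()
store.put("orders", [101, 102, 103])
store.snapshot("before-cleanup")

store.delete("orders")  # simulated human error
replica_still_has_orders = "orders" in store.replica
print("Replica kept the data after the mistake:", replica_still_has_orders)

store.restore("before-cleanup")
print("Data after point-in-time restore:", store.primary["orders"])
```

The replica protects against losing a node, but because it mirrors every change, it cannot undo a change that should never have been made.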

Manual recovery similarly has its shortcomings as a protection measure. While it’s sometimes possible to reconstruct data from the original data sources through manual recovery, in most situations the data will either be lost or unavailable from the source. Even where the source data is still available, the reconstruction process is often too slow to be practical.

Prioritising proper protection

Today, some organisations see data growth rates of 40% to 60% each year. Combined with increasing reliance on non-traditional workloads for mission-critical applications, there has never been a more crucial time to understand the need for reliable backup and recovery. To reduce complexity and keep up with data growth, a single unified strategy is crucial.
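Growth at these rates compounds quickly. As a rough sketch, assuming a hypothetical starting volume of 100 TB and a constant annual growth rate (both assumptions chosen purely for illustration), the volume to be protected after a few years can be estimated as follows.

```python
def projected_size(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: size after `years` at a constant `annual_growth` rate."""
    return current_tb * (1 + annual_growth) ** years


# Hypothetical 100 TB estate, at the low and high ends of the range quoted in the text.
for rate in (0.40, 0.60):
    size = projected_size(100, rate, 3)
    print(f"{rate:.0%} annual growth: {size:.1f} TB after 3 years")
```

Even at the lower rate, the amount of data to back up more than doubles within three years, which is why backup capacity and throughput need to be planned alongside the workload itself.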

Ideally, organisations need backups that run as fast as possible without disrupting production activity. Businesses should look to modern, parallel streaming architectures to eliminate bottlenecks and optimise storage for these demanding scale-out, multi-node workloads. Big data workload activity can grow drastically in a short space of time, so it’s also important that solutions can scale automatically and be responsive as the needs of these workloads evolve.
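The idea behind parallel streaming is that each node in a scale-out cluster is backed up over its own stream, rather than funnelling everything through a single path. The sketch below is only an illustration of that idea using Python's standard library: the node names, the in-memory payloads, and the `backup_node` function are all hypothetical, and a real product would stream from disk or over the network.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # deliberately tiny so the chunked streaming is visible


def backup_node(item):
    """Copy one node's data chunk by chunk and return (name, copy, checksum)."""
    name, payload = item
    digest = hashlib.sha256()
    copied = bytearray()
    for start in range(0, len(payload), CHUNK_SIZE):
        chunk = payload[start:start + CHUNK_SIZE]
        digest.update(chunk)   # checksum computed while streaming
        copied.extend(chunk)   # stands in for writing to backup storage
    return name, bytes(copied), digest.hexdigest()


# Hypothetical three-node cluster with in-memory data.
nodes = {
    "node-a": b"customer records",
    "node-b": b"clickstream events",
    "node-c": b"sensor readings",
}

# One worker per node, so the streams run concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    results = list(pool.map(backup_node, nodes.items()))

backup = {name: data for name, data, _ in results}
for name, data, checksum in results:
    print(name, len(data), "bytes", checksum[:12])
```

Computing a checksum as the data streams through means each copy can be verified later without a second pass over the source.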

What’s best is that proper protection doesn’t have to be just another cost of business. By connecting all environments under one system, it becomes easier to source and utilise previously siloed data. According to IDC, organisations that can analyse all relevant data and deliver actionable insight will generate $430 billion more in productivity benefits by 2020 than their less analytically advanced peers. Backup, once an afterthought for big data volumes, is now invaluable.

The good news is that organisations can reduce complexity, keep up with data growth and enable digital transformation at the same time. Next generation big data and open source workloads allow businesses to generate insights and develop innovative features and applications they need for a competitive edge. By prioritising proper protection, organisations can stay competitive and remain relevant to their customers.

About the Author

Mark Coletta is Senior Product Manager at Veritas Technologies. At Veritas Technologies, we empower businesses of all sizes to discover the truth in information—their most important digital asset. Using our platform, customers can accelerate their digital transformation and solve pressing challenges like multi-cloud data management, data protection, storage optimization, compliance readiness and workload portability—with no cloud vendor lock-in.

Featured image: Suebsiri