Data replication is a method of copying data to ensure that all information stays identical in real time across all data resources.
Think of database replication as the net that catches your information and keeps it from falling through the cracks and getting lost. Data, however, rarely stays static. It's ever-changing. Replication is an ongoing process that ensures data from a primary database is mirrored in a replica, even one located on the other side of the planet. This helps organisations achieve sub-millisecond latency and ultimately provide users with the near-real-time experiences they desire.
Explaining the process
Data replication is copying data from one host to another, for example between two on-premises servers, or from an on-premises server to the cloud. The point is to achieve real-time consistency for all users, wherever they're accessing the data from. Data-driven business models (DDBMs), such as those in the gaming industry, rely heavily on analytics derived from real-time data.
The Benefits of Data Replication
Reliability / Disaster Recovery
In case of an emergency, should your primary instance be compromised, it's vital to have mission-critical applications safeguarded with a replica that can be swapped in its place. Disaster recovery replication works like a backup generator: if you blow a critical fuse or your power grid goes dark, you won't have to worry, because the generator kicks in as a substitute and keeps your lights on.
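The "backup generator" idea can be sketched in a few lines of Python. This is a toy illustration, not a real client library: the class and method names are invented for this example, and plain dictionaries stand in for database instances.

```python
class FailoverClient:
    """Toy disaster-recovery failover: if the primary is unreachable,
    promote a standby replica and keep serving requests."""

    def __init__(self, primary, standbys):
        self.active = primary
        self.standbys = list(standbys)

    def get(self, key):
        try:
            return self.active.get(key)
        except ConnectionError:
            # Primary is compromised: swap a replica into its place and retry.
            self.active = self.standbys.pop(0)
            return self.active.get(key)


class DeadStore:
    """Simulates a primary whose connection has gone dark."""

    def get(self, key):
        raise ConnectionError("primary is down")


client = FailoverClient(DeadStore(), [{"config": "v1"}])
print(client.get("config"))  # -> v1  (served by the promoted replica)
```

Because the replica already holds an up-to-date copy, the swap is invisible to the caller.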
Performance: By spreading data across multiple instances, you optimise read performance. Having your data accessible in multiple locations also minimises latency. And when replicas are directed to process most of your reads, the primary is freed to tackle the heavy lifting of writes.
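The read/write split described above can be sketched as follows. This is a minimal in-memory model, assuming synchronous replication for simplicity; the `ReplicatedStore` name and its methods are illustrative, not any particular product's API.

```python
import itertools


class ReplicatedStore:
    """Toy read/write splitter: writes go to the primary, reads are
    round-robined across replicas (kept in sync synchronously here)."""

    def __init__(self, n_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self._next = itertools.cycle(range(n_replicas))

    def write(self, key, value):
        # Writes hit only the primary...
        self.primary[key] = value
        # ...and replication propagates the change to each replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across replicas, freeing the primary for writes.
        return self.replicas[next(self._next)].get(key)


store = ReplicatedStore()
store.write("score", 42)
print(store.read("score"))  # -> 42
```

In production the propagation step would be asynchronous, which is exactly why replication lag and consistency trade-offs matter.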
Manpower: A reduction in the IT labour needed to manually replicate data.
Examples of Data Replication
Snapshot Replication: As its name suggests, snapshot replication takes a "snapshot" of the data on the primary as it appears at a specific moment and moves it along to the replica. Like a photograph, snapshot replication captures what the data looks like at a point in time, but doesn't account for how it is later updated. Thus, don't rely on snapshot replication alone as a backup: anything changed after the snapshot won't be reflected in the replica.
This method is helpful for recoveries in the event of accidental deletion. Think of it like your Version History on Google Docs. Wish you could work on your presentation the way it looked four hours ago? If Google Docs takes a snapshot of your work at hourly intervals, you could click back on that version, or “snapshot,” from four hours ago and see what your information looked like then.
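The Version History analogy can be made concrete with a small sketch. This is a toy model, assuming the whole dataset fits in memory; the `SnapshotReplica` name and its timestamps are invented for illustration.

```python
import copy


class SnapshotReplica:
    """Toy snapshot replication: capture the full state of the primary at
    points in time, so any captured moment can be restored later."""

    def __init__(self):
        self.snapshots = []  # list of (timestamp, state) pairs

    def take_snapshot(self, primary, ts):
        # Deep-copy so later writes to the primary don't alter the snapshot.
        self.snapshots.append((ts, copy.deepcopy(primary)))

    def restore(self, ts):
        # Return the most recent snapshot taken at or before `ts`.
        candidates = [state for t, state in self.snapshots if t <= ts]
        return candidates[-1] if candidates else None


primary = {"title": "v1"}
replica = SnapshotReplica()
replica.take_snapshot(primary, ts=100)
primary["title"] = "v2"            # edit made after the first snapshot...
replica.take_snapshot(primary, ts=200)
print(replica.restore(150))        # -> {'title': 'v1'}  (the earlier view)
```

Restoring `ts=150` returns the hourly "photograph" from before the edit, exactly like clicking back to an older version in Google Docs.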
Merge Replication: This method typically begins with a snapshot of the primary's data, distributes it to the replicas, and maintains synchronisation of data across the entire system. What makes merge replication different is that it allows each node to make changes to the data independently, then merges all those updates into a unified whole.
Merge replication also accounts for each change made at each node. To go back to our previous Google Docs example, if you’ve ever shared a document with co-workers who then leave comments and edits on your document, you’ll see who made what changes and at what time. Merge replication functions in a very similar way.
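A simplified merge step might look like the sketch below. This is an illustration only, assuming a last-writer-wins rule for conflicting keys; real merge replication systems offer configurable conflict resolution, and the node names and log format here are invented.

```python
def merge(*node_logs):
    """Merge independent edits from several nodes into one unified state.
    Each log entry is (timestamp, author, key, value); on a conflicting
    key the latest edit wins, and attribution (who, when) is preserved."""
    merged = {}
    for ts, author, key, value in sorted(e for log in node_logs for e in log):
        merged[key] = {"value": value, "author": author, "at": ts}
    return merged


# Two nodes edit the same document independently, like co-workers
# leaving edits on a shared doc.
node_a = [(1, "alice", "title", "Q3 Plan"), (4, "alice", "owner", "alice")]
node_b = [(2, "bob", "title", "Q3 Roadmap")]

state = merge(node_a, node_b)
print(state["title"])  # -> {'value': 'Q3 Roadmap', 'author': 'bob', 'at': 2}
```

As in the Google Docs analogy, the merged result records who made each change and when, not just the final value.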
Key-Based Replication: Also known as key-based incremental data replication, this method leverages a replication key to identify, locate and alter only the specific data that has changed since the last update. By isolating that information, it streamlines the backup process, working with only as much load as it has to. Though key-based replication makes for a speedy method of refreshing new data, it comes with the disadvantage of failing to replicate deleted data.
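A minimal sketch of the idea, using an `updated_at` column as the replication key. The function name and row layout are hypothetical, but the logic shows both the strength (only changed rows move) and the documented weakness (deletes never propagate).

```python
def incremental_sync(primary_rows, replica_rows, last_sync):
    """Key-based incremental replication sketch: copy only the rows whose
    replication key (`updated_at`) has advanced past the last checkpoint.
    Rows deleted from the primary are never removed from the replica --
    the known drawback of this method."""
    for row_id, row in primary_rows.items():
        if row["updated_at"] > last_sync:
            replica_rows[row_id] = dict(row)
    # New checkpoint: the highest replication-key value seen so far.
    values = [row["updated_at"] for row in primary_rows.values()]
    return max(values, default=last_sync)


primary = {
    1: {"name": "old", "updated_at": 10},
    2: {"name": "fresh", "updated_at": 90},
}
replica = {1: {"name": "old", "updated_at": 10}}

checkpoint = incremental_sync(primary, replica, last_sync=50)
print(replica[2]["name"], checkpoint)  # -> fresh 90
```

Only row 2 crosses the wire; if row 1 were deleted from the primary, nothing in this loop would tell the replica to drop it.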
Active-Active Geo-Replication: Active-Active Geo-Replication, also known as peer-to-peer replication, works somewhat like transactional replication, as it relies on a constant stream of transactional data between nodes. With active-active, all the nodes in the same network are constantly sending data to one another, syncing the database across all the corresponding nodes. All the nodes are also writable, meaning anyone can change the data, anywhere in the world, and the change will be reflected in all the other nodes. This guarantees real-time consistency, no matter where on the globe the change may occur.
Conflict-free Replicated Data Types (CRDTs) define how this data is replicated. In the event of a network failure affecting one of the replicas, or nodes, the other replicas will hold all the necessary data, ready to bring that node back up to date once it comes back online. This is a solid solution for enterprises that need several data centres located across the globe.
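One of the simplest CRDTs is the grow-only counter, sketched below. This toy version illustrates the core CRDT property: merging is a commutative, idempotent operation (element-wise maximum), so replicas converge regardless of the order in which updates arrive. The node names are invented for the example.

```python
class GCounter:
    """Grow-only counter CRDT: each node increments only its own slot;
    merging takes the element-wise max of the per-node counts."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Element-wise max: safe to apply repeatedly and in any order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


london, tokyo = GCounter("london"), GCounter("tokyo")
london.increment(3)   # writes happen independently at each site
tokyo.increment(2)

london.merge(tokyo)   # when the partition heals, replicas exchange state
tokyo.merge(london)
print(london.value(), tokyo.value())  # -> 5 5  (both converge)
```

Because each node only ever grows its own slot, no write from one data centre can conflict with a write from another; the replicas agree without any coordination.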
These days, "instantaneous" isn't quick enough. Cutting latency down to sub-millisecond intervals is the universal objective. We've all seen this situation before: pressing the refresh button on a website and waiting what feels like an eternity (seconds) to see your information update. Latency decreases productivity for users. Achieving near-real-time is the goal, and for organisations looking to get there, data replication, especially active-active geo-replication, is a great tool.
About the Author
Paula Dallabetta is Senior Product Marketing Manager at Redis. Redis makes apps faster by creating a data foundation for a real-time world. It is the driving force behind open-source Redis, the world's most loved in-memory database, and the commercial provider of Redis Enterprise, a real-time data platform. Redis Enterprise powers real-time services for over 8,000 organizations globally.