All write requests go to the master and all read requests go to the replicas. The difficulty is that the master and the replicas must stay in sync, otherwise consistency problems occur. So let's look at the different replication strategies and how each one addresses consistency.
There are three types of replication methods:
- Synchronous replication
- Asynchronous replication
- Semi-synchronous replication
In the synchronous replication method, when the master gets a write request from the application, it first writes the data into its own database and then sends the changes to all replicas. The master then waits until every replica has written the changes into its database and acknowledged them. Only then does the master respond to the application's write request with a success message.
Let's consider a write operation. When the user makes a write request, the application sends it to the master at time T1. The master database writes the changes at time T2 and sends them to the replicas.
The replicas receive the changes at T3 and T4 and apply them, while the master waits for their responses. As soon as replica1 finishes, it sends an acknowledgment to the master at T5; the master is still waiting for replica2. At T6, replica2 finishes and sends its acknowledgment. The master now has acknowledgments from all replicas, so it responds to the user with a success message.
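The timeline above can be sketched as a small simulation. This is only an illustration, not how any particular database implements it: the class names `Replica` and `SyncMaster` and the delay values are hypothetical, and it assumes the master sends the changes to all replicas in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    apply_delay: float  # hypothetical: simulated seconds to apply a change and acknowledge
    data: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Apply the change and return how long the acknowledgment took."""
        self.data[key] = value
        return self.apply_delay


@dataclass
class SyncMaster:
    replicas: list
    local_write_delay: float = 0.01  # hypothetical: time for the master's own write
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Return the simulated time until the client gets its success response."""
        self.data[key] = value
        # Assumption: changes go out to all replicas in parallel, so the master
        # waits for the slowest acknowledgment before answering the client.
        ack_times = [replica.apply(key, value) for replica in self.replicas]
        return self.local_write_delay + max(ack_times, default=0.0)


master = SyncMaster([Replica("replica1", 0.02), Replica("replica2", 0.05)])
latency = master.write("user:1", "alice")
print(f"client waited {latency:.2f}s")
```

In this sketch the client's waiting time is the master's own write plus the slowest replica's acknowledgment, which is why a single slow replica delays every write.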
You can easily see that the problem is time. From T2 to T7 the master is just waiting for the replicas' acknowledgments, and the user is waiting too, which hurts the user experience. And what if replica2 has a problem and takes longer to respond? The master database sits idle for a long time, which is unnecessary waiting for the user. With this method, every write is as slow as the slowest replica, so the waiting time tends to grow as the replica count increases, because the master has to wait until all replicas acknowledge.
In the asynchronous replication method, the master sends the confirmation to the application as soon as it has received the write and successfully written it into its own database. It then sends the replication request to all replicas.
This looks great. We don’t notice any performance impact, because replication happens in the background after we have already got a response. If a replica is dead or slow we won’t even notice, as the response was already sent back to the client. Life is good.
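A minimal sketch of the asynchronous flow might look like the following. The `AsyncMaster` class and its `replicate_pending` method are hypothetical names for illustration, replicas are modelled as plain dictionaries, and the background step is called by hand instead of running in a real background worker.

```python
from collections import deque


class AsyncMaster:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas  # each replica is just a dict in this sketch
        self.pending = deque()    # changes not yet shipped to the replicas

    def write(self, key, value):
        """Write locally, queue the change, and answer the client right away."""
        self.data[key] = value
        self.pending.append((key, value))
        return "success"  # returned before any replica has seen the change

    def replicate_pending(self):
        """Background step: ship every queued change to every replica."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


replica1, replica2 = {}, {}
master = AsyncMaster([replica1, replica2])
status = master.write("user:1", "alice")   # the client already has its answer here
master.replicate_pending()                 # the replicas catch up afterwards
```

The client's response does not depend on the replicas at all, which is where both the speed and the risks of this method come from.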
But there are two problems with this method: one is replication lag, and the other is that we are weakening our durability guarantees. Let's focus on the durability issue first.
The problem is that if the master fails before it sends the changes to the replicas, we lose those changes, even though the client was already told that the write succeeded.
Yes, it may be totally fine to take that risk, but think about dealing with financial transactions. It is a nightmare.
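To make the durability risk concrete, here is a small sketch that compares what clients were told succeeded with what a replica actually received before the master failed. The function name `writes_lost_on_failover` and the account data are made up for illustration.

```python
def writes_lost_on_failover(acknowledged, replicated):
    """Return writes confirmed to clients that never reached the replica."""
    replicated_set = set(replicated)
    return [write for write in acknowledged if write not in replicated_set]


# Hypothetical writes the master confirmed to clients, in order.
acknowledged = [("acct:1", 100), ("acct:2", 250), ("acct:3", 80)]
# Assumption: the master crashed after shipping only the first write.
replicated = acknowledged[:1]

lost = writes_lost_on_failover(acknowledged, replicated)
print(lost)
```

If a replica is promoted to master at this point, the two writes in `lost` are gone even though their clients saw a success message.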
Another issue is that there can be a delay before changes on the master reach the replicas. This delay is called replication lag, and because a replica that has not caught up may return stale data, we cannot send read requests to replicas that are not in sync.
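One way to act on this is to route reads only to replicas that have caught up. The sketch below assumes each server exposes a monotonically increasing log position that can be compared; the function `pick_read_targets`, the `max_lag` parameter, and the position values are hypothetical.

```python
def pick_read_targets(master_position, replica_positions, max_lag=0):
    """Return the replicas whose applied position is within max_lag of the master."""
    return [
        name
        for name, position in replica_positions.items()
        if master_position - position <= max_lag
    ]


# Hypothetical log positions: replica2 is 5 changes behind the master.
positions = {"replica1": 120, "replica2": 115}

in_sync = pick_read_targets(120, positions)              # only fully caught-up replicas
tolerant = pick_read_targets(120, positions, max_lag=10)  # allow slightly stale reads
```

With `max_lag=0` only `replica1` qualifies; raising the tolerance trades freshness for more replicas available to serve reads.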
There is a middle ground: we can configure some replicas to replicate synchronously and let the others replicate asynchronously. This is called semi-synchronous replication.
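As a rough sketch of the idea, the master only waits for the replicas marked as synchronous, while the rest catch up in the background. The function `semi_sync_write`, its parameters, and the delay values are assumptions made for illustration.

```python
def semi_sync_write(local_write_delay, replica_delays, sync_replicas):
    """Simulate one write.

    replica_delays maps a replica name to its simulated acknowledgment delay.
    sync_replicas is the set of replica names the master waits for.
    Returns (client latency, replicas guaranteed to hold the write on response).
    """
    waited = [replica_delays[name] for name in sync_replicas]
    client_latency = local_write_delay + max(waited, default=0.0)
    return client_latency, sorted(sync_replicas)


# Hypothetical setup: replica1 is nearby and synchronous, replica2 is slow and asynchronous.
delays = {"replica1": 0.02, "replica2": 0.50}
latency, guaranteed = semi_sync_write(0.01, delays, {"replica1"})
print(f"client waited {latency:.2f}s, write is safe on {guaranteed}")
```

The slow `replica2` no longer affects the client's waiting time, yet the write already exists on at least one replica when the client gets its response.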
When to use what
There is no clear answer to this question; your choice depends entirely on your business priorities. Asynchronous replication works best for projects that span long distances and have a minimal budget. It is also suitable for businesses that can afford partial data loss. On the other hand, synchronous replication is the choice when reliable, long-term storage is necessary and the business cannot afford to lose any critical data. Semi-synchronous replication can be used to strike a balance between the two.
In the next article, we will talk about how to create a database replication architecture for the Postgres database.