CQRS is about separating the write model from the read model in our system architecture. The mainstream approach to building information systems is CRUD-based thinking, where we think in terms of creating, reading, updating, and deleting records. For complicated systems this is not sufficient, especially when we need to combine different pieces of information to provide complex or varied representations of it.
One way to cope with this complexity is to separate the read concerns from the domain model, so that only the write model passes through the domain logic that ensures the correctness of data states.
So CQRS comes to help!
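This separation can be sketched in a few lines of Python. This is an illustrative in-memory model only, not a production implementation, and all names (`handle_place_order`, `query_order_summary`, the dict-based stores) are hypothetical:

```python
# A minimal CQRS sketch: the command side goes through domain logic,
# the query side only reads a denormalized view (in-memory stand-ins).
write_store = {}  # normalized "write model"
read_store = {}   # denormalized "read model"

def handle_place_order(order_id, customer, items):
    """Command: validate through domain logic, then persist and project."""
    if not items:
        raise ValueError("an order must contain at least one item")
    write_store[order_id] = {"customer": customer, "items": items}
    # Projection: keep a flattened view optimized for reads.
    read_store[order_id] = {
        "customer": customer,
        "item_count": len(items),
        "total": sum(price for _, price in items),
    }

def query_order_summary(order_id):
    """Query: never touches the domain model, just reads the view."""
    return read_store[order_id]

handle_place_order("o-1", "alice", [("book", 12.0), ("pen", 2.5)])
print(query_order_summary("o-1"))
```

Note that here the projection is updated synchronously; the techniques below are about keeping that read side up to date when it lives in a separate store.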
The first thing that we must accept is eventual consistency. How much eventual consistency your system can tolerate is an important question you have to answer in order to judge how suitable CQRS is for you.
Regardless of how it is implemented, CQRS brings many advantages to our system, such as:
- It is possible to use the most suitable technology for each kind of data by leveraging polyglot persistence
- We can easily move to a task-based UI
- Independent scaling. There are many situations where the read model has to tolerate a high load of requests, or vice versa.
- An optimized data schema. It is possible to use the third normal form just for the write side.
- Separation of concerns.
- Simpler queries
The big question is: How could we build a Read Service for our read model?
The answer is that there are different techniques to do that, and depending on the situation, you can choose one of them.
I will describe them, as far as I know them, in this article.
1- Dual Writes
It seems to be a straightforward solution to a complex problem: in one transaction, write to two destinations.
But there are some drawbacks to this technique. The first one is that the second database may not support transactions. Also, if the second destination is a separate service or process, a failure of its local transaction leaves your system in an inconsistent state. In the past, when we built monoliths, we used distributed transactions to avoid this situation. Distributed transactions use the two-phase commit protocol: it splits the commit into two steps and ensures the ACID properties across all participating systems. But in microservices it doesn't scale well, as it requires locks in its transaction management.
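The hazard can be sketched as follows, with hypothetical in-memory dicts standing in for the two databases:

```python
# A sketch of the dual-write hazard: the first write commits, the second
# fails, and without a distributed transaction nothing rolls the first back.
write_db = {}
read_db = {}

class ReadDbDown(Exception):
    pass

def save_order_dual_write(order_id, data, read_db_available=True):
    write_db[order_id] = data      # write #1 commits locally
    if not read_db_available:
        raise ReadDbDown("read store unreachable")
    read_db[order_id] = data       # write #2 may never happen

try:
    save_order_dual_write("o-1", {"total": 10}, read_db_available=False)
except ReadDbDown:
    pass

# The two stores now disagree: the write model has the order, the read model doesn't.
print("o-1" in write_db, "o-1" in read_db)  # True False
```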
2- Database Sync
It’s a familiar technique for most developers: a scheduler periodically runs a batch operation that copies transformed data into a denormalized view. Another approach is a background service that polls the database for changes and applies them to the destination database.
Mostly, when it is hard to use event-driven approaches, this is a good solution, but it comes with some costs.
One disadvantage of this approach is the extra pressure it puts on the write database. There is also a large inconsistency window if the scheduler interval is set to a long period (e.g. every 24 hours). Depending on the business, this inconsistency window can matter a lot.
The background scheduler service is also a point of failure itself!
3- Event-Driven Approach
It is a very popular approach today. When some change happens to state in the write database, an event is raised; consumers then use it to apply that change to their own underlying databases.
Consumers subscribe to receive specific events from an intermediary message bus such as RabbitMQ or Kafka.
It also corresponds well to the DDD approach, so by emitting domain events it is easy to implement.
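The flow can be sketched with a hypothetical in-memory bus standing in for RabbitMQ or Kafka; the event name and handler are illustrative:

```python
# A sketch of the event-driven approach: the write side emits a domain
# event after committing state, and a read-side projector subscribes to
# keep the read model up to date.
from collections import defaultdict

subscribers = defaultdict(list)  # event type -> handlers (the "bus")
read_model = {}

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)

# Read-side projector.
def on_order_placed(event):
    read_model[event["order_id"]] = {"status": "placed", "total": event["total"]}

subscribe("OrderPlaced", on_order_placed)

# Write side: after committing its own state, it emits a domain event.
publish("OrderPlaced", {"order_id": "o-1", "total": 42.0})
print(read_model["o-1"])
```

With a real broker the publish and the projection happen in different processes, which is where the eventual consistency comes from.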
4- Change Data Capture (CDC)
Change data capture refers to the process or technology for identifying and capturing changes made to a database. By allowing you to detect, capture, and deliver changed data, CDC reduces the time required for and resource costs of data warehousing while enabling continuous data integration. CDC eliminates the need for bulk load updating and inconvenient batch windows by enabling incremental loading or real-time streaming of data changes into your event stream.
It relies on the database's replication log to act as a stream of events emitted by the database; we can then do stream processing on top of it. Kafka Connect with Debezium is one useful technology that brings everything needed to implement CDC.
There are situations where using an event-driven approach is impossible or hard to adopt, e.g. when the write model is a legacy system that you cannot modify to adapt it to EDA. In such situations, CDC can be a suitable approach.
The event-driven approach lives in the application layer, while CDC lives in the database layer. So when it is hard to add EDA to your application, CDC is a better choice.
But this approach has a drawback: it couples other systems to our system's physical data model, and we would have to keep our public entities the same as the database model forever.
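On the consuming side, a projector that applies change events might look like the sketch below. The envelope is simplified but follows Debezium's `op`/`before`/`after` convention; in practice these events would arrive from a Kafka topic populated by Kafka Connect, and the table and field names here are hypothetical:

```python
# A sketch of the CDC consumer side: apply Debezium-style change events
# (op "c"=create, "u"=update, "d"=delete) to a read model.
read_model = {}

def apply_change_event(event):
    op = event["payload"]["op"]
    if op in ("c", "u"):
        row = event["payload"]["after"]     # new row state
        read_model[row["id"]] = row
    elif op == "d":
        row = event["payload"]["before"]    # last known state before delete
        read_model.pop(row["id"], None)

apply_change_event({"payload": {"op": "c", "before": None,
                                "after": {"id": 1, "name": "book"}}})
apply_change_event({"payload": {"op": "u", "before": {"id": 1, "name": "book"},
                                "after": {"id": 1, "name": "notebook"}}})
print(read_model)
```

Note how the projector consumes raw row shapes: this is the coupling to the physical data model described above.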
Depending on the situation we are in, we use different techniques to cope with the consistency problem between the write model and the read model in CQRS. The event-driven approach is the preferred one, especially when DDD is used in our microservices. But there are situations where adopting an event-driven approach is costly or even impossible; then CDC can be a better alternative, because its near real-time event streaming leads us to a smaller inconsistency window and better performance.