1. Introduction
Microservices architecture keeps growing in popularity. It brings many benefits, especially compared with a traditional monolithic architecture. On the other hand, developing a project with microservices poses multiple challenges. One of the most important concerns is database design. There are two crucial questions when it comes to data design: how to organize the data, and where to store it?
In this tutorial, we’ll try to answer them.
2. Database per Service
There are two main options for organizing the databases when using microservices architecture:
- Database per service
- Shared database
In this section, we’ll describe the first one.
2.1. Fundamentals
By definition, microservices should be loosely coupled, scalable, and independent in terms of development and deployment. Therefore, the database per service is the preferred approach, as it meets those requirements well. Let’s see how it looks:
The idea is simple. Each microservice has its own data store (whole schema or a table). Other services can’t access the data stores that they don’t own. Such a solution brings a lot of benefits.
First of all, changes to an individual database don’t impact other services. Thus, the database isn’t a single point of failure for the application, which makes the application more resilient.
Secondly, individual data stores are easier to scale. Moreover, the domain’s data is encapsulated within the microservice. Therefore, it’s easier to understand the service with its data as a whole. It’s especially important for new members of a development team. It will take less time and effort for them to fully understand the area they’re responsible for.
Finally, with the database per service, we’re able to use polyglot persistence. It means that we can use different database technologies for different microservices. So one service may use an SQL database and another one a NoSQL database. That feature allows using the most efficient database depending on the service requirements and functionality.
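To make the idea more concrete, here’s a minimal Python sketch of two services that each own their data store. The service names, table layout, and methods are hypothetical and only serve as an illustration. An in-memory SQLite database stands in for an SQL store, and a plain dictionary stands in for a NoSQL key-value store:

```python
import sqlite3


class OrderService:
    """Owns a relational store; no other service touches it directly."""

    def __init__(self):
        # In-memory SQLite stands in for the service's private SQL database.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
        )

    def place_order(self, customer, total):
        with self._db:  # commits on success, rolls back on an exception
            cursor = self._db.execute(
                "INSERT INTO orders (customer, total) VALUES (?, ?)",
                (customer, total),
            )
        return cursor.lastrowid

    def get_order(self, order_id):
        row = self._db.execute(
            "SELECT id, customer, total FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "customer": row[1], "total": row[2]}


class SessionService:
    """Owns a key-value store (a dict standing in for a NoSQL database)."""

    def __init__(self):
        self._store = {}

    def save_session(self, token, data):
        self._store[token] = dict(data)

    def get_session(self, token):
        return self._store.get(token)


orders = OrderService()
sessions = SessionService()
order_id = orders.place_order("alice", 42.5)
sessions.save_session("token-1", {"user": "alice", "cart": [order_id]})
```

Each store is a private attribute, so the only way to reach the data is through the owning service’s methods. In a real system, those methods would be exposed over a network API rather than called in-process.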
2.2. Drawbacks
Despite all of those benefits, there are some serious drawbacks and challenges regarding the database per service approach. As we mentioned earlier, each microservice can directly access only its own data store. Therefore, services need a communication method to exchange data. So, each service must provide a clear API.
Consequently, we need a failure protection mechanism in case the communication fails. Let’s say service A sends payment requests to service B. Service A waits for the response so it can take the appropriate action based on the result. Meanwhile, service B goes offline. We need to handle the situation and inform service A about the result when B is back online. The circuit breaker mechanism can help here.
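As an illustration, here’s a simplified circuit breaker sketch in Python. The class, its parameters (max_failures, reset_timeout), and the call_service_b function are assumptions made for this example, not part of any specific library. Note that the breaker only covers failing fast and probing for recovery. Delivering the pending result once service B is back online would need an additional mechanism, such as retries or a message queue:

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("service unavailable, failing fast")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # A successful call closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def call_service_b():
    # Hypothetical remote call; it always fails here, as if B were offline.
    raise ConnectionError("service B is offline")


breaker = CircuitBreaker(max_failures=2, reset_timeout=30.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(call_service_b)
    except CircuitOpenError:
        outcomes.append("rejected by breaker")
    except ConnectionError:
        outcomes.append("call failed")
```

After two failed calls, the third attempt is rejected immediately without contacting service B, so service A can react right away instead of waiting for a timeout.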
The next important problem is transactions. Spanning transactions across microservices can negatively impact consistency and atomicity. A similar drawback is related to complex queries. There isn’t a simple way to execute join queries on multiple data stores.
Finally, data-related operations that span multiple microservices can be hard to debug when problems occur.
3. Shared Database
A shared database is considered an anti-pattern, although that’s debatable. The point is that when using a shared database, the microservices lose their core properties: scalability, resilience, and independence. Therefore, a shared database is rarely used with microservices.
When a shared database seems to be the best option for a microservices project, we should reconsider whether we really need microservices. Maybe a monolith would be the better choice. Let’s see what a shared database approach looks like:
Use cases for a shared database with microservices aren’t common. One example is a temporary state while migrating a monolith to microservices. The primary benefit of a shared database over the database per service is transaction management: there’s no need to span transactions across services.
Moreover, the data is fully constrained, and the appropriate relations are preserved. Consequently, redundancy decreases. We can also easily execute complicated queries with joins.
Another important benefit is that there’s no need to exchange stored data between microservices. So, the API is simpler, and there’s no problem with the consistency of data and state in case the communication fails. There are some serious drawbacks, though.
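A short Python sketch can show these benefits. It assumes a hypothetical schema with orders and payments tables, and uses an in-memory SQLite database as the shared store. The order and its payment are written in one local transaction, and a plain join reads them back:

```python
import sqlite3

# One database shared by the (hypothetical) order and payment services.
shared_db = sqlite3.connect(":memory:")
shared_db.execute("PRAGMA foreign_keys = ON")  # enforce the relation below
shared_db.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        amount REAL NOT NULL CHECK (amount > 0)
    );
    """
)


def place_paid_order(db, customer, amount):
    """Insert the order and its payment atomically in one local transaction."""
    with db:  # commits if the block succeeds, rolls back if it raises
        order_id = db.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,)
        ).lastrowid
        db.execute(
            "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
            (order_id, amount),
        )
    return order_id


def orders_with_payments(db):
    """A plain SQL join, possible because both tables live in one database."""
    return db.execute(
        "SELECT o.customer, p.amount FROM orders o "
        "JOIN payments p ON p.order_id = o.id ORDER BY o.id"
    ).fetchall()


place_paid_order(shared_db, "alice", 25.0)
try:
    place_paid_order(shared_db, "bob", -5.0)  # violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # bob's order row is rolled back together with the failed payment
```

Because the invalid payment fails inside the same transaction, no orphaned order is left behind, and no compensation logic is needed.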
Microservices with shared databases can’t easily scale. What is more, the database will be a single point of failure. Changes related to the database could impact multiple services. Besides, microservices won’t be independent in terms of development and deployment as they connect to and operate on the same database.
This pattern could be considered in cases like:
- an existing data store should be preserved
- the existing data layer codebase shouldn’t be changed
- transactions are crucial for the application
4. Data-Related Patterns
There are a variety of patterns that are used for managing data within a microservices architecture. In this section, we’ll briefly introduce the essential ones.
4.1. Saga Pattern
We mentioned earlier that spanning transactions across microservices can be problematic. In simple words, the transaction will be successful only if all related services successfully execute their own part. In case of a failure in one service, the whole transaction should fail. Moreover, in that case, services that already did their part should roll back the changes.
In general, that’s what the saga pattern is responsible for. A saga is a sequence of local transactions that together represent a single distributed transaction. Each service executes a local transaction. If the local transaction ends successfully, an event or message is published that triggers the next local transaction in the sequence. In case of failure, the saga runs compensating transactions that undo the changes made by the local transactions that already completed.
There are two ways to implement the saga pattern:
- Orchestration – central controller (orchestrator) manages all interactions between microservices
- Choreography – decentralized technique of broadcasting events
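To illustrate the orchestration variant, here’s a minimal in-process Python sketch. The SagaStep and run_saga names, as well as the stock and payment steps, are hypothetical. In a real system, each action would be a call or message to a separate service:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]        # the local transaction
    compensation: Callable[[], None]  # undoes the local transaction


def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Hypothetical services, modelled as in-memory state.
log = []
stock = {"book": 1}
balance = {"alice": 5}


def reserve_stock():
    if stock["book"] < 1:
        raise RuntimeError("out of stock")
    stock["book"] -= 1
    log.append("stock reserved")


def release_stock():
    stock["book"] += 1
    log.append("stock released")


def charge_customer():
    if balance["alice"] < 20:
        raise RuntimeError("insufficient funds")
    balance["alice"] -= 20
    log.append("customer charged")


def refund_customer():
    balance["alice"] += 20
    log.append("customer refunded")


saga_ok = run_saga([
    SagaStep("reserve stock", reserve_stock, release_stock),
    SagaStep("charge customer", charge_customer, refund_customer),
])
```

Here, the payment step fails because the balance is too low, so the orchestrator runs the compensation for the stock reservation and reports the saga as failed.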
4.2. CQRS
CQRS (Command Query Responsibility Segregation) helps with another important challenge: querying related data from multiple data stores. Moreover, it reduces the complexity of business logic by separating concerns. Additionally, it helps with the scalability of microservices.
The idea is simple. We’re separating the data layer from the business logic layer. Further, classes can only write to the database (Command) or read from it (Query). So, a single class can’t do both. That approach results in many benefits. The code is clearer and easier to maintain or extend. Different components can be optimized, developed, and, most importantly, scaled separately.
Consequently, the components are loosely coupled, and work can be effectively split between developers or teams. Finally, an application divided into components is easier to test. There isn’t one correct way to implement the CQRS pattern. The implementation can depend on the domain, requirements, framework, actual state of the project, etc. CQRS is often used alongside the Event Sourcing pattern. Let’s describe that one.
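The following Python sketch shows one possible shape of that separation. All class names are hypothetical, and a shared dictionary stands in for the database. In practice, the read side often uses its own store that’s optimized for queries:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateProduct:  # command: expresses an intent to change state
    sku: str
    name: str
    price: float


@dataclass(frozen=True)
class ProductsCheaperThan:  # query: asks for data and changes nothing
    max_price: float


class ProductCommandHandler:
    """Write side: only mutates the store and never returns data."""

    def __init__(self, store):
        self._store = store

    def handle(self, command):
        if command.price < 0:
            raise ValueError("price must not be negative")
        self._store[command.sku] = {"name": command.name, "price": command.price}


class ProductQueryHandler:
    """Read side: only reads from the store."""

    def __init__(self, store):
        self._store = store

    def handle(self, query):
        return sorted(
            product["name"]
            for product in self._store.values()
            if product["price"] < query.max_price
        )


store = {}  # stands in for the database
commands = ProductCommandHandler(store)
queries = ProductQueryHandler(store)

commands.handle(CreateProduct("sku-1", "pen", 1.5))
commands.handle(CreateProduct("sku-2", "laptop", 900.0))
cheap = queries.handle(ProductsCheaperThan(10.0))
```

Since neither handler depends on the other, each side can be tested, optimized, and deployed on its own.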
4.3. Event Sourcing
A lot of modern applications rely on events for various purposes. For example, as we mentioned earlier, a service in a saga sequence atomically updates the database and publishes an event or message. Event Sourcing makes use of such application events.
Event Sourcing is a technique of representing the state by persisting state-changing events. Every time the business entity changes, the event is persisted in the event store.
As the name suggests, the event store is a database for events. It can be SQL, NoSQL, or any other kind of storage that is suitable for the project. Moreover, the event store can act as a message broker. All interested components subscribe to it. When an event is persisted, the event store delivers information to all subscribers. Publishing an event is a single atomic operation. Therefore, it provides reliability and atomicity of database operations across microservices.
Furthermore, it creates a complete audit log. In case of any problem or bug, it’s easy to research the state changes and eventually restore the valid state. Thus, debugging is less complex. Additionally, event sourcing can avoid impedance mismatch between object-oriented and relational data. To sum up, event sourcing can be a great help in microservices architecture or any event-driven application.
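As a rough illustration, here’s a minimal in-memory event store in Python. The EventStore class, the event types, and the account example are assumptions made for this sketch. A production event store would persist events durably and deliver them to subscribers reliably:

```python
from collections import defaultdict


class EventStore:
    """Append-only log of events that also notifies subscribers."""

    def __init__(self):
        self._events = defaultdict(list)  # entity id -> ordered list of events
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def append(self, entity_id, event_type, data):
        event = {"entity_id": entity_id, "type": event_type, "data": data}
        self._events[entity_id].append(event)
        for callback in self._subscribers:
            callback(event)

    def events_for(self, entity_id):
        # Return a copy so callers can't modify the stored history.
        return list(self._events.get(entity_id, []))


def account_balance(events):
    """Rebuild the current state by replaying every event in order."""
    balance = 0
    for event in events:
        if event["type"] == "Deposited":
            balance += event["data"]["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["data"]["amount"]
    return balance


event_store = EventStore()
audit_log = []
event_store.subscribe(lambda event: audit_log.append(event["type"]))

event_store.append("acc-1", "Deposited", {"amount": 100})
event_store.append("acc-1", "Withdrawn", {"amount": 30})

history = event_store.events_for("acc-1")
current_balance = account_balance(history)
balance_after_first_event = account_balance(history[:1])
```

The state is never stored directly. It’s derived from the history, so replaying only a prefix of the events reconstructs any earlier state, which is what makes the audit log useful for debugging.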
5. How to Choose the Database?
The first step when planning a database design in microservices is to choose the model. We already mentioned the database per service and shared database models. Also, we considered their pros, cons, and common use cases.
The second step is to pick the specific database technology (or technologies) that will be most efficient for the project or service. To do that, we need to consider a few properties.
The first important property is read performance. Read performance can be measured either as the number of operations per second or as the speed of fetch queries. Applications or services related to e-commerce, CRM, or banking typically contain features that require fetching data fast and often.
The second important property is write performance. It’s similar to the previous one, except that we’re writing to the database instead of reading from it. If the services need to persist a lot of data or even store big blobs, this can be a core parameter.
The next one is latency. It’s the delay between a user action and the server response. This is especially important in user experience-related components. Good examples are live streaming applications or real-time gaming.
Another important property is resource efficiency. Usually, the fewer resources are consumed, the better. It may result in faster execution, decreased host load, and, depending on the platform, lower costs.
Last but not least, we should consider provisioning efficiency. In general, it’s how the database impacts the development, deployment, and testing of the microservices. As we mentioned earlier, the independence of microservices in those terms is really important.
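Read and write performance are the easiest of these properties to measure. As a rough sketch, the following Python snippet times simple inserts and lookups against an in-memory SQLite database. The table, the payload size, and the ops_per_second helper are assumptions for this example, and the numbers it produces only make sense when compared on the same machine with a realistic workload:

```python
import sqlite3
import time


def ops_per_second(operation, repetitions):
    """Run operation(i) repeatedly and return the observed throughput."""
    start = time.perf_counter()
    for i in range(repetitions):
        operation(i)
    elapsed = time.perf_counter() - start
    return repetitions / elapsed if elapsed > 0 else float("inf")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")


def write_item(i):
    db.execute("INSERT INTO items (id, payload) VALUES (?, ?)", (i, "x" * 100))


def read_item(i):
    db.execute("SELECT payload FROM items WHERE id = ?", (i,)).fetchone()


write_rate = ops_per_second(write_item, 1000)
db.commit()
read_rate = ops_per_second(read_item, 1000)

print(f"writes/s: {write_rate:.0f}, reads/s: {read_rate:.0f}")
```

Running the same operations against each candidate database gives a first, coarse comparison before committing to a technology.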
5.1. SQL vs. NoSQL
Most often, two technologies are considered for a project or service: SQL and NoSQL. In reality, the picture is more complicated, especially when it comes to NoSQL, since there is a variety of NoSQL database implementations. However, in this article, we won’t elaborate on the databases’ low-level implementation. Let’s compare SQL and NoSQL in general.
| SQL | NoSQL |
|---|---|
| Relational | Non-relational |
| A single way to store the data: tables | Various implementations: column, document, graph, key-value |
| Strong support for transactions | Not suitable for a heavy load of transactions |
| Pre-defined schema. Changes to the schema require migration | Flexible schema |
| Best for vertical scaling | Best for horizontal scaling |
| Based on ACID | Based on the CAP theorem |
| Not suitable for large datasets | Preferred for large datasets |
| Synchronous execution of inserts and updates | Asynchronous execution of inserts and updates |
| Suitable for complex queries | Lacks features for composing complex queries |
6. Conclusion
In this article, we explored database design in a microservices architecture. As we can see, it’s a very complex task. All elements should be carefully planned and suited to the project’s needs to maximize its efficiency.