1. Introduction

Microservices architecture keeps growing in popularity. It brings many benefits, especially compared with the older monolithic architecture. On the other hand, developing a project with microservices poses multiple challenges. One of the most important is database design. Two questions are crucial when it comes to data design: how should we organize the data, and where should we store it?

In this tutorial, we’ll try to answer them.

2. Database per Service

There are two main options for organizing the databases when using microservices architecture:

  1. Database per service
  2. Shared database

In this section, we’ll describe the first one.

2.1. Fundamentals

By definition, microservices should be loosely coupled, scalable, and independent in terms of development and deployment. Therefore, database per service is the preferred approach, as it meets those requirements well. Let’s see how it looks:

[Diagram: database per service]

The idea is simple. Each microservice has its own data store (whole schema or a table). Other services can’t access the data stores that they don’t own. Such a solution brings a lot of benefits.

First of all, changes to an individual database don’t impact other services. Thus, there isn’t a single point of failure in the application. In other words, the application is more resilient.

Secondly, individual data stores are easier to scale. Moreover, the domain’s data is encapsulated within the microservice. Therefore, it’s easier to understand the service with its data as a whole. It’s especially important for new members of a development team. It will take less time and effort for them to fully understand the area they’re responsible for.

Finally, with the database per service, we’re able to use polyglot persistence. It means that we can use different database technologies for different microservices. So one service may use an SQL database and another one a NoSQL database. That feature allows us to use the most efficient database for each service, depending on its requirements and functionality.
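As a rough illustration of database per service combined with polyglot persistence, the sketch below gives two hypothetical services their own private stores. OrderService uses a relational store (in-memory SQLite), while CatalogService uses a plain dictionary standing in for a document database. The service names, the schema, and the methods are assumptions made up for this example.

```python
import sqlite3


class OrderService:
    """Owns a relational store; nothing outside this class touches it."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id TEXT, quantity INTEGER)"
        )

    def place_order(self, product_id, quantity):
        with self._db:  # commits on success, rolls back on error
            cur = self._db.execute(
                "INSERT INTO orders (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
        return cur.lastrowid

    def get_order(self, order_id):
        row = self._db.execute(
            "SELECT id, product_id, quantity FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "product_id": row[1], "quantity": row[2]}


class CatalogService:
    """Owns a schemaless store (a dict stands in for a NoSQL database)."""

    def __init__(self):
        self._documents = {}

    def add_product(self, product_id, **attributes):
        self._documents[product_id] = dict(attributes)

    def get_product(self, product_id):
        doc = self._documents.get(product_id)
        return None if doc is None else dict(doc)


catalog = CatalogService()
catalog.add_product("p-1", name="Keyboard", tags=["usb", "mechanical"])

orders = OrderService()
order_id = orders.place_order("p-1", 2)
```

Each store is reachable only through the public methods of its owning service, so either service could switch its database technology without the other noticing.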

2.2. Drawbacks

Despite all of those benefits, the database per service approach comes with some serious drawbacks and challenges. As we mentioned earlier, each microservice can directly access only its own data store. Therefore, services need a communication method to exchange data. So, each service must provide a clear API.

Consequently, we need a failure protection mechanism in case the communication fails. Let’s say we send payment requests from service A to service B. Service A waits for the response so that it can take the appropriate action based on the result. Meanwhile, service B goes offline. We need to handle that situation and inform service A about the result once B is back online. The circuit breaker mechanism can help here.
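To make the circuit breaker idea more concrete, here is a minimal sketch. It assumes a simple policy: after a number of consecutive failures, the breaker "opens" and rejects calls immediately, and after a timeout it lets one trial call through. The class name, the thresholds, and the injectable clock are all assumptions for illustration, not a reference implementation.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so the timeout can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of waiting on a service assumed to be down.
                raise CircuitOpenError("remote service is assumed to be offline")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure counter.
        self._failures = 0
        self._opened_at = None
        return result
```

Service A would wrap each request to service B in `breaker.call(...)` and treat `CircuitOpenError` as a signal to retry later or fall back, rather than blocking on a service that is known to be unavailable.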

The next important problem is transactions. Spanning transactions across microservices can negatively impact consistency and atomicity. A similar drawback is related to complex queries. There isn’t a simple way to execute join queries on multiple data stores.
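Without a shared database, a join has to happen in application code. One possible workaround (an assumption added here, often called API composition) is to fetch data from each service and combine it in memory. The sketch below uses hard-coded lists in place of real service responses, and all field names are hypothetical.

```python
def compose_order_view(orders, customers_by_id):
    """Join orders with customer names in memory, similar to a SQL LEFT JOIN."""
    view = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        view.append(
            {
                "order_id": order["id"],
                "total": order["total"],
                "customer_name": customer["name"] if customer else None,
            }
        )
    return view


# Hypothetical responses from two services, each backed by its own data store.
orders_from_order_service = [
    {"id": 1, "customer_id": 10, "total": 25.0},
    {"id": 2, "customer_id": 11, "total": 40.0},
]
customers_from_customer_service = {10: {"name": "Alice"}}

order_view = compose_order_view(orders_from_order_service, customers_from_customer_service)
```

The second order has no matching customer, which shows one cost of this approach: the composing code has to deal with missing or stale data that a database join with constraints would have ruled out.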

Finally, data-related operations that span several microservices can be hard to debug when problems occur.

3. Shared Database

A shared database is often considered an anti-pattern, although this is debatable. The point is that when using a shared database, the microservices lose their core properties: scalability, resilience, and independence. Therefore, a shared database is rarely used with microservices.

When a shared database seems to be the best option for a microservices project, we should reconsider whether we really need microservices. Maybe a monolith would be the better choice. Let’s see what the shared database approach looks like:

[Diagram: shared database]

Use cases for a shared database with microservices aren’t common. One example is a temporary state while migrating a monolith to microservices. The primary benefit of the shared database over database per service is transaction management. There is no need to span transactions across services.

Moreover, the data is fully constrained, and the appropriate relations are preserved. As a result, redundancy decreases. We can easily execute complicated queries with joins.
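The following sketch illustrates those benefits with an in-memory SQLite database standing in for the shared database. The tables, columns, and function name are assumptions for the example. It shows a single local transaction that writes data for two domains, a foreign key that preserves the relation, and a join in one query.

```python
import sqlite3

# One database shared by the (hypothetical) customer and order services.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    """
)


def register_customer_with_order(name, total):
    """Write to both tables in one local transaction."""
    with db:  # commit on success, roll back if anything raises
        cur = db.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        db.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (cur.lastrowid, total),
        )


register_customer_with_order("Alice", 25.0)

# The foreign key rejects an order for a customer that doesn't exist.
try:
    with db:
        db.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (999, 5.0))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# A join across both domains is a single query.
rows = db.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
```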

Another important advantage is that there is no need to exchange stored data between microservices. So, the API is simplified, and there is no problem with the consistency of data and state in case the communication fails. There are some serious drawbacks, though.

Microservices with shared databases can’t easily scale. What is more, the database will be a single point of failure. Changes related to the database could impact multiple services. Besides, microservices won’t be independent in terms of development and deployment as they connect to and operate on the same database.

This pattern could be considered in cases like:

  • existing data store should be preserved
  • existing data layer codebase shouldn’t be changed
  • the transactions are crucial for the application

4. Data Management Patterns

There is a variety of patterns for managing data within a microservices architecture. In this section, we’ll briefly introduce the essential ones.

4.1. Saga Pattern

We mentioned earlier that spanning transactions across microservices can be problematic. In simple words, the transaction will be successful only if all related services successfully execute their own part. In case of a failure in one service, the whole transaction should fail. Moreover, in that case, services that already did their part should roll back the changes.

In general, that’s what the saga pattern is responsible for. A saga is a sequence of local transactions that together represent a single distributed transaction. Each service executes a local transaction. If the local transaction ends successfully, an event or message is published that triggers the next local transaction in the sequence. In case of failure, the saga runs compensating transactions that roll back the changes already made.

There are two ways to implement the saga pattern:

  • Orchestration – central controller (orchestrator) manages all interactions between microservices
  • Choreography – decentralized technique of broadcasting events
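The orchestration variant described above can be sketched as follows. This is a simplified, in-process illustration under stated assumptions: each "service" is just a function, the step names are invented, and a real orchestrator would also have to persist its progress and handle failures of the compensations themselves.

```python
class SagaStep:
    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action              # the local transaction
        self.compensation = compensation  # undoes the action if a later step fails


def run_saga(steps, context):
    """Orchestrator: run steps in order; on failure, compensate in reverse order."""
    completed = []
    for step in steps:
        try:
            step.action(context)
        except Exception:
            for done in reversed(completed):
                done.compensation(context)
            return False
        completed.append(step)
    return True


# Hypothetical local transactions of three services, recorded in a shared log.
log = []


def reserve_stock(ctx):
    log.append("stock reserved")


def release_stock(ctx):
    log.append("stock released")


def charge_payment(ctx):
    log.append("payment charged")


def refund_payment(ctx):
    log.append("payment refunded")


def ship_order(ctx):
    raise RuntimeError("no courier available")


def cancel_shipment(ctx):
    log.append("shipment cancelled")


succeeded = run_saga(
    [
        SagaStep("stock", reserve_stock, release_stock),
        SagaStep("payment", charge_payment, refund_payment),
        SagaStep("shipping", ship_order, cancel_shipment),
    ],
    {"order_id": 1},
)
```

Because the shipping step fails, the orchestrator refunds the payment and then releases the stock. The failed step itself is not compensated, since its local transaction never completed.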

4.2. CQRS

CQRS (Command Query Responsibility Segregation) helps with another important challenge: querying related data from multiple data stores. Moreover, it reduces the complexity of business logic by separating concerns. Additionally, it helps with the scalability of microservices.

The idea is simple. We’re separating the data layer from the business logic layer. Further, classes can only write to the database (Command) or read from it (Query). So, a single class can’t do both. That approach results in many benefits. The code is clearer and easier to maintain or extend. Different components can be separately optimized, developed, and what’s especially important, scaled.
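A minimal sketch of that separation might look like the following. It assumes a single in-memory SQLite database shared by both sides for brevity. In practice, the query side often reads from a separate, read-optimized store. The class names, methods, and table are hypothetical.

```python
import sqlite3


class ProductCommands:
    """Command side: changes state and returns no data."""

    def __init__(self, db):
        self._db = db

    def add_product(self, name, price):
        with self._db:
            self._db.execute(
                "INSERT INTO products (name, price) VALUES (?, ?)", (name, price)
            )

    def change_price(self, name, price):
        with self._db:
            self._db.execute(
                "UPDATE products SET price = ? WHERE name = ?", (price, name)
            )


class ProductQueries:
    """Query side: reads state and never modifies it."""

    def __init__(self, db):
        self._db = db

    def price_of(self, name):
        row = self._db.execute(
            "SELECT price FROM products WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]

    def cheaper_than(self, limit):
        rows = self._db.execute(
            "SELECT name FROM products WHERE price < ? ORDER BY name", (limit,)
        )
        return [name for (name,) in rows]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT PRIMARY KEY, price REAL)")
commands, queries = ProductCommands(db), ProductQueries(db)

commands.add_product("keyboard", 50.0)
commands.add_product("mouse", 20.0)
commands.change_price("keyboard", 45.0)
```

Keeping the two classes apart means the read path can later be scaled or optimized on its own without touching the write path.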

As a result, the components are loosely coupled, and work can be effectively split between developers or teams. Finally, an application divided into components is easier to test. There isn’t one correct way to implement the CQRS pattern. The implementation can depend on the domain, requirements, framework, current state of the project, etc. CQRS is often used alongside the Event Sourcing pattern. Let’s describe that one.

4.3. Event Sourcing

A lot of modern applications rely on events for various purposes. For example, as we mentioned earlier, a service in a saga sequence atomically updates the database and publishes an event or message. Event Sourcing builds on such application events.

Event Sourcing is a technique of representing the state by persisting state-changing events. Every time the business entity changes, the event is persisted in the event store.

As the name suggests, the event store is a database for events. It can be SQL, NoSQL, or any other storage that is suitable for the project. Moreover, the event store can act as a message broker. All interested components subscribe to it. When an event is persisted, the event store delivers information to all subscribers. Publishing an event is a single atomic operation. Therefore, it provides reliability and atomicity of database operations across microservices.
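As a small illustration, the sketch below models an event store as an append-only in-memory list that also notifies subscribers, and rebuilds an entity's state by replaying its events. The account domain, the event names, and the class are assumptions for the example. A real event store would persist events durably and deliver them reliably.

```python
class EventStore:
    """Append-only store that also notifies subscribers, as a broker would."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def append(self, entity_id, event_type, data):
        event = {"entity_id": entity_id, "type": event_type, "data": data}
        self._events.append(event)         # persist first
        for handler in self._subscribers:  # then deliver to subscribers
            handler(event)

    def events_for(self, entity_id):
        return [e for e in self._events if e["entity_id"] == entity_id]


def account_balance(store, account_id):
    """Rebuild the current state by replaying the entity's events in order."""
    balance = 0
    for event in store.events_for(account_id):
        if event["type"] == "Deposited":
            balance += event["data"]["amount"]
        elif event["type"] == "Withdrawn":
            balance -= event["data"]["amount"]
    return balance


store = EventStore()
notifications = []
store.subscribe(lambda event: notifications.append(event["type"]))

store.append("acc-1", "Deposited", {"amount": 100})
store.append("acc-1", "Withdrawn", {"amount": 30})
store.append("acc-2", "Deposited", {"amount": 5})
```

The balance is never stored directly. It is derived from the event history, which is what makes the history usable as an audit log.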

Furthermore, it creates a complete audit log. In case of any problem or bug, it’s easy to trace the state changes and, if needed, restore a valid state. Thus, debugging is less complex. Additionally, event sourcing can avoid the impedance mismatch between object-oriented and relational data. To sum up, event sourcing can be a great help in a microservices architecture or any event-driven application.

5. How to Choose the Database?

The first step when planning a database design in microservices is to choose the model. We already mentioned the database per service and shared database models. Also, we considered their pros, cons, and common use cases.

The second step is to pick specific database technology (or technologies) that will be most efficient for the project or service. To do that, we need to consider a few properties.

The first important parameter is read performance. It can be measured either as the number of operations per second or as the speed of fetch queries. Applications or services related to e-commerce, CRM, or banking typically contain features that require fetching data quickly and often.

The second important property is write performance. It’s similar to the previous one, except that we’re writing to the database rather than reading from it. If the services need to persist a lot of data or even store big blobs, this can be a core parameter.
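One rough way to compare candidates on read and write performance is a small micro-benchmark. The sketch below measures operations per second against an in-memory SQLite database, which is only a stand-in. The helper name, the table, and the workload are assumptions, and the numbers will vary by machine and say nothing about other database engines.

```python
import sqlite3
import time


def ops_per_second(operation, repetitions):
    """Run `operation` repeatedly and return the observed operations per second."""
    start = time.perf_counter()
    for i in range(repetitions):
        operation(i)
    elapsed = time.perf_counter() - start
    return repetitions / elapsed if elapsed > 0 else float("inf")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")


def write_one(i):
    db.execute("INSERT INTO items (id, payload) VALUES (?, ?)", (i, "x" * 100))


def read_one(i):
    db.execute("SELECT payload FROM items WHERE id = ?", (i,)).fetchone()


write_rate = ops_per_second(write_one, 1000)
db.commit()
read_rate = ops_per_second(read_one, 1000)

print(f"writes/s: {write_rate:.0f}, reads/s: {read_rate:.0f}")
```

For a meaningful comparison, the same workload would need to run against each candidate database with realistic data sizes and access patterns.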

The next one is latency. It’s the delay between a user action and the server response. This is especially important in components that shape the user experience. Good examples are live streaming applications or real-time gaming.

Another important property is resource efficiency. Usually, the fewer resources consumed, the better. Lower consumption may result in faster execution, decreased host load, and, depending on the platform, lower costs.

Last but not least, we should consider provisioning efficiency. In general, it’s how the database impacts the development, deployment, and testing of the microservices. As we mentioned earlier, the independence of microservices in those terms is really important.

5.1. SQL vs. NoSQL

Most often, two technologies are considered for a project or service: SQL and NoSQL. In reality, the picture is more complicated, especially when it comes to NoSQL, which covers a variety of database implementations such as column, document, graph, and key-value stores. In this article, however, we won’t elaborate on the databases’ low-level implementation. Let’s compare SQL and NoSQL in general.

SQL                                                         | NoSQL
------------------------------------------------------------+------------------------------------------------------------
Relational                                                  | Non-relational
A single way to store the data: tables                      | Various implementations: column, document, graph, key-value
Heavily supports transactions                               | Not suited to heavy transactional loads
Pre-defined schema. Changes to the schema require migration | Flexible schema
Best for vertical scaling                                   | Best for horizontal scaling
Based on ACID                                               | Based on the CAP theorem
Not suitable for large datasets                             | Preferred for large datasets
Synchronous execution of inserts and updates                | Asynchronous execution of inserts and updates
Suitable for complex queries                                | Lacks features for composing complex queries
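The schema row of the comparison can be illustrated with a short sketch. It uses in-memory SQLite for the relational side and a list of dictionaries as a stand-in for a document store. The table and field names are hypothetical.

```python
import sqlite3

# Relational side: the schema is fixed, so a new attribute needs a migration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('Alice')")

try:
    db.execute("INSERT INTO users (name, nickname) VALUES ('Bob', 'bobby')")
    needs_migration = False
except sqlite3.OperationalError:  # the users table has no nickname column yet
    needs_migration = True

db.execute("ALTER TABLE users ADD COLUMN nickname TEXT")  # the migration
db.execute("INSERT INTO users (name, nickname) VALUES ('Bob', 'bobby')")
sql_rows = db.execute("SELECT name, nickname FROM users ORDER BY id").fetchall()

# Document side (a list of dicts stands in for a document store):
# records with different shapes can live side by side.
documents = [
    {"name": "Alice"},
    {"name": "Bob", "nickname": "bobby"},
]
nicknames = [doc.get("nickname") for doc in documents]
```

The flexibility on the document side comes at a price: the reading code, not the database, has to cope with fields that may be absent.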

6. Conclusion

In this article, we elaborated on database design in a microservices architecture. As we can see, it’s a very complex task. All elements should be carefully planned and suited to the project’s needs to maximize its efficiency.