1. Overview

Docker is a platform designed to ease the creation, deployment, and general management of containers. Since a container can run almost any kind of application, we might also want to use one for a database management system (DBMS).

In this tutorial, we’ll review how to work with Docker to handle databases.

In the first part, we’ll cover the installation of a database management system on the local machine. Then, we’ll discover how data persistence works across containers. To conclude, we’ll discuss the reliability of implementing databases in Docker production environments.

2. Running a Docker Image Locally

To begin with, we need a DBMS within a container.

2.1. Starting With a Standard Docker Image

First, we install Docker Desktop. Then, we find an existing image of the chosen DBMS on Docker Hub. Once we locate it, we copy the docker pull command from the top right corner of the page.

In this case, we work with PostgreSQL:

$ docker pull postgres

When the download is complete, the docker run command creates a running database within a Docker container. For PostgreSQL, the POSTGRES_PASSWORD environment variable can be specified with the -e option:

$ docker run -e POSTGRES_PASSWORD=password postgres
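The command above runs in the foreground and keeps the terminal attached. Optionally, we can add the -d option to run the container in the background and --name to give it a name of our own choosing (db here is arbitrary), so we don't have to look up its ID later:

$ docker run -d --name db -e POSTGRES_PASSWORD=password postgres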

Next, we test the database container connection.

2.2. Connecting a Java Project to the Database

Let’s quickly verify the connection to the DBMS from a local Java project using a JDBC data source.

The connection string uses the default PostgreSQL port 5432 on localhost:

jdbc:postgresql://localhost:5432/postgres?user=postgres&password=password

An error should inform us that the port isn’t open. Indeed, the database listens for connections inside the container network, while the Java project runs outside of it.

To fix this issue, we map the container port to a port on the localhost. Here, we keep the same number and map port 5432 on the local machine to port 5432 in the container:

$ docker run -p 5432:5432 -e POSTGRES_PASSWORD=password postgres

The connection should be working now, so we’re able to use the JDBC data source.
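As an extra check outside of Java, and assuming a psql client is installed on the host machine, we can also connect through the mapped port directly:

$ psql -h localhost -p 5432 -U postgres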

2.3. Running SQL Scripts

Next, let’s connect to the DBMS from a shell so that we can, for example, run an initialization script.

First, we find the running container ID:

$ docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS          PORTS                    NAMES
65d9163eece2   postgres   "docker-entrypoint.s…"   27 minutes ago   Up 27 minutes   0.0.0.0:5432->5432/tcp   optimistic_hellman

Then, the docker exec command with the -it options runs an interactive shell inside the container:

$ docker exec -it 65d9163eece2 bash

Finally, we can connect to the database instance with the command-line client and paste an SQL script:

root@65d9163eece2:/# psql -U postgres
postgres=# CREATE DATABASE TEST;
postgres=# \c test
test=# CREATE TABLE PERSON(
  ID INTEGER PRIMARY KEY,
  FIRST_NAME VARCHAR(1000),
  LAST_NAME VARCHAR(1000)
);
...

If we have a large dump file to load, we usually avoid copy-pasting. Instead, we can run the import directly from the host with the docker exec command, adding the -i option so that the file is piped through the container's standard input:

$ docker exec -i 65d9163eece2 psql -U postgres < dump.sql

This way, we also facilitate automation.
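Alternatively, we can first copy the dump into the container with docker cp and then execute it with the -f option of psql; /tmp/dump.sql is simply a scratch location we picked:

$ docker cp dump.sql 65d9163eece2:/tmp/dump.sql
$ docker exec 65d9163eece2 psql -U postgres -f /tmp/dump.sql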

3. Persisting Data With a Docker Volume

Of course, databases usually need to preserve data across restarts.

3.1. Why Do We Need Volumes?

The basic setup works as long as we keep the same container, stopping and starting it with docker container stop and start each time we need to reboot. Indeed, Docker persists the data inside an anonymous volume on the host by default.

However, if we use docker run again, Docker creates a new, empty container with its own anonymous volume, and the old data is no longer accessible.
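By contrast, reusing the container ID from earlier, we can stop and restart the same container without losing anything:

$ docker stop 65d9163eece2
$ docker start 65d9163eece2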

3.2. Docker Volumes Setup

The first task is to inspect the container and see which volume is used by the database:

$ docker inspect -f "{{ .Mounts }}" 65d9163eece2
[{volume f1033d3 /var/lib/docker/volumes/f1033d3/_data /var/lib/postgresql/data local true }] 

In this case, the anonymous volume f1033d3 maps the container directory /var/lib/postgresql/data to the directory /var/lib/docker/volumes/f1033d3/_data that Docker created on the host filesystem.

By using the -v option of docker run, we can modify this mapping:

$ docker run -v C:\docker-db-volume:/var/lib/postgresql/data -e POSTGRES_PASSWORD=password postgres

Now, we can see the database files created in the C:\docker-db-volume directory.

As a result, each time we use the docker run command with this mapping, the data persists across containers.
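Instead of binding a host directory, we can also let Docker manage the storage through a named volume; pgdata below is an arbitrary name we chose:

$ docker volume create pgdata
$ docker run -p 5432:5432 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=password postgres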

Also, we may want to share the configuration between team members or across different environments. For that, we can use a Docker Compose file; since Compose creates new containers on each run, volumes become mandatory to keep the data.

4. Working With Docker in Production

Docker Compose is great for sharing configuration and managing containers as stateless services. If a service fails or can’t handle the workload, we can configure Docker Compose to create new containers automatically. This is very useful for building a production cluster for implementations like REST back-ends, which are stateless by design.

However, databases are stateful, and their management can be more complex. So, let’s review different setups.

4.1. Single Instance Database

Let’s suppose we’re building a non-critical environment, for testing or even production, that tolerates periods of downtime during deployments, backups, or failures.

In this case, we don’t need a high-availability cluster, and we can simply leverage Docker Compose for a single-instance database with a basic setup:

  • a simple volume for data storage, since the containers run on the same machine
  • a limit of one running container at a time, using the global deploy mode

Let’s see a minimal working example:

$ cat docker-compose.yaml
version: '3'
services:       
  database:
    image: 'postgres'
    deploy:
      mode: global
    environment:
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
    volumes:
      - "C:/docker-db-volume:/var/lib/postgresql/data"

Using this configuration, Docker Compose creates only one container at a time and reuses the data files from the C:\docker-db-volume directory.
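To start the service, we bring the stack up in the background; on older installations, the command is docker-compose instead of docker compose:

$ docker compose up -d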

However, it’s even more important in this configuration to make regular backups. In case of a configuration error, this directory could be erased or corrupted by the container.
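For example, a minimal backup can pipe pg_dump from the running container into a file on the host; here, the container ID is the one from earlier, and backup.sql is a name we chose:

$ docker exec 65d9163eece2 pg_dump -U postgres postgres > backup.sql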

4.2. Replicated Databases

Now, let’s assume that the production environment is critical.

In this case, orchestration tools like Docker Swarm and Kubernetes are beneficial for stateless containers: they offer vertical and horizontal clustering, with load-balancing, fail-over, and auto-scaling capabilities.

Unfortunately, since database containers are stateful and these solutions don’t provide a volume replication mechanism, they won’t work in this case.

On the other hand, building a homemade configuration can be dangerous because it can lead to severe data loss:

  • using shared storage like NFS or NAS for volumes doesn’t prevent data loss when the database is restarted on another instance
  • on master-slave clusters, it’s a common error to let a Docker orchestration tool elect more than one master node, which leads to data corruption

So, there are several different options:

  • implement a database-specific or hardware replication mechanism
  • subscribe to Platform-as-a-Service solutions like OpenShift, Amazon AWS, or Azure
  • use a Docker-specific replication mechanism like KubeDB or Portworx

While these options aren’t optimal if we want to use plain Docker directly, it’s always better to protect the setup than to force a tool onto a problem it wasn’t designed for.

5. Whether to Use a Docker DBMS

After getting to know how we might leverage Docker alone for a DBMS setup, let’s weigh the benefits against the drawbacks.

5.1. Benefits of Docker DBMS

To begin with, using a container provides an isolated minimal environment that runs the DBMS. This way, we can reproduce the same context on many machines and easily upgrade, migrate, and generally handle the setup.

In fact, ready-made images of most major DBMSs are available for download on registries like Docker Hub:

$ docker pull postgres

Since the data context is anonymous and ephemeral, we don’t need to worry about transfers beyond the basic configuration and startup options.

Importantly, containers add little overhead to database performance, even for input and output: volumes bypass the storage driver’s copy-on-write layer, so disk access stays close to native speed.

5.2. Drawbacks of Docker DBMS

On the negative side, not all DBMS vendors provide support for all containerized environments. This is mainly due to the fast-paced development of Docker and container systems in general. Still, it potentially restricts container-based database solutions in production.

Part of the reason is also that split-brain scenarios can be more common with containers.

Further, we might be tempted to create several DBMS instances of the same kind that work with the same files. This is an issue, since Docker can’t resolve race conditions beyond what the database system itself can do, and concurrent writers can corrupt the data files.

On a related note, we shouldn’t kill a database container as we would a stateless service container: the database may not have flushed its writes to disk, so abrupt termination can result in serious data loss.

Finally, although container storage drivers are now mature, they still represent an abstraction that can be another point of failure.

6. Conclusion

In this article, we reviewed basic Docker database configurations suitable for development, testing, and non-critical production.

Finally, we concluded that Docker has drawbacks when used in high-availability environments. Therefore, it should be avoided or coupled with solutions specialized for database clusters.

