Posted on 20 Dec 2018
This is the sixth article of the Getting started with Docker series. In this article, I want to discuss how docker volumes work and how to use them to separate an application from its data.
In the first article of this series, in the “Why Docker?” section, we discussed why Docker became so popular in recent years. In a modern application infrastructure, Docker lets us manage the application lifecycle easily and simplifies deployment. One of the major pain points in application deployment is the upgrade. With Docker, it is a simple container replacement.
However, in our PostgreSQL cluster, we have a problem. The data folder (/home/postgres/data) is inside the container, so if we replace a container we lose the data directory and the configuration.
How can we solve this issue? Docker allows the separation of binaries, data, and configuration through volumes. Basically, you can allocate space on the host system and share it with the container; when the container is destroyed, that space still exists.
Docker offers three ways to mount storage into a container. Volumes and bind mounts store files on the host machine, so the files persist even after the container stops, while tmpfs mounts live only in memory.
Volumes are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Bind mounts are folders on the host system that are shared with the container. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
Photo from https://docs.docker.com
A tmpfs mount is a filesystem held in the host's memory, which is very useful when the container needs to manipulate files quickly. Imagine, for example, an application that receives zip files that must be expanded before they are processed. In this scenario, a temporary filesystem is perfect to improve performance.
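As a minimal sketch of the tmpfs case, the helper below starts a container with an in-memory scratch directory through the `--tmpfs` option of `docker run` (available on Linux hosts). The function name `run_with_tmpfs`, the scratch path `/tmp/unzip`, and the image name passed to it are hypothetical and only serve the illustration.

```shell
# Start a container whose /tmp/unzip directory lives in memory only.
# $1 = image name (hypothetical, for example "zip-worker-demo")
run_with_tmpfs() {
  # Everything written under /tmp/unzip disappears when the container stops
  docker run -d --tmpfs /tmp/unzip "$1"
}
```

Calling `run_with_tmpfs zip-worker-demo` would start the container in the background; nothing written to `/tmp/unzip` ever reaches the host disk.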
For our PostgreSQL cluster, both volumes and bind mounts are good solutions to persist data outside the container and separate it from the binaries. However, there are some benefits in using volumes over bind mounts. According to the official documentation, volumes are easier to back up or migrate, can be managed with Docker CLI commands or the Docker API, and work with both Linux and Windows containers.
However, if we need storage on a specific filesystem, we cannot use volumes, because they live on whatever filesystem backs the folder /var/lib/docker/volumes on the host system, and we cannot change it. If the application needs a specific filesystem, we need to allocate it on the host system and share it with the container via a bind mount.
There are only a few Docker commands that we need to learn to manage volumes. To create a Docker volume you can use the command:
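The basic form is `docker volume create <name>`. The sketch below wraps it in a small helper so the name becomes a parameter; the function name `create_volume` and the example volume name `pgdata_demo` are hypothetical.

```shell
# Create a named volume managed by Docker.
# $1 = volume name (hypothetical example: "pgdata_demo")
create_volume() {
  # On success Docker prints the name of the new volume
  docker volume create "$1"
}
```

For example, `create_volume pgdata_demo` runs `docker volume create pgdata_demo`.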
You can check the volume created with the command:
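Two commands are useful here: `docker volume ls` lists every volume, and `docker volume inspect <name>` prints the details of one of them. A minimal sketch, with hypothetical helper names:

```shell
# List all volumes known to the Docker daemon
list_volumes() {
  docker volume ls
}

# Print the details of a single volume (driver, mountpoint, and so on) as JSON.
# $1 = volume name
show_volume() {
  docker volume inspect "$1"
}
```

On Linux, the mountpoint reported by `show_volume` should sit under /var/lib/docker/volumes.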
Once a volume is created, its lifecycle is independent of any container's lifecycle. If the volume is no longer required, you can remove it with the command:
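The basic form is `docker volume rm <name>`. A minimal sketch with a hypothetical helper name:

```shell
# Remove a named volume together with the data it holds.
# $1 = volume name
remove_volume() {
  # Docker refuses to remove a volume that a container still references
  docker volume rm "$1"
}
```

Because removal deletes the data, it is worth checking with `docker volume ls` that the right name is being passed.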
If you want to mount a host folder or a volume into a container, use the following command:
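A minimal sketch using the `-v` option of `docker run`, whose value has the form `source:target`. When the source is an absolute host path, Docker creates a bind mount; when it is a plain name, Docker uses the named volume. The helper name `run_with_mount` and the image name in the usage example are hypothetical.

```shell
# Start a container with a volume or a host folder mounted inside it.
# $1 = volume name (named volume) or absolute host path (bind mount)
# $2 = directory inside the container
# $3 = image name
run_with_mount() {
  source_ref="$1"
  target_dir="$2"
  image="$3"
  docker run -d -v "$source_ref:$target_dir" "$image"
}
```

For example, `run_with_mount pgdata_demo /home/postgres/data my-postgres-image` mounts a named volume, while `run_with_mount /srv/pgdata /home/postgres/data my-postgres-image` bind mounts a host folder.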
The same -v option can be used with the “docker create” command:
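As a sketch, the helper below prepares a container with a volume attached but does not start it. The function name, container name, and image name are hypothetical; /home/postgres/data is the data folder mentioned earlier in this article.

```shell
# Create (but do not start) a container with a volume mounted on the data folder.
# $1 = container name, $2 = volume name, $3 = image name
create_with_volume() {
  docker create --name "$1" -v "$2:/home/postgres/data" "$3"
}
```

After `create_with_volume pg_demo pgdata_demo my-postgres-image`, the container can be started later with `docker start pg_demo`.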
In the fifth article, we discussed how to create a PostgreSQL cluster with three containers. In order to have an upgradable cluster, we need to separate PostgreSQL binaries from data and logs.
As a first step, we need the script build_volumes.sh to create the three volumes.
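What follows is a sketch of what such a script could look like, not the original build_volumes.sh. The volume names are hypothetical, one per cluster node, and the function only creates the volumes that do not exist yet.

```shell
# Hypothetical volume names, one per node of the cluster
VOLUMES="pg_node1_data pg_node2_data pg_node3_data"

# Create every volume in $VOLUMES unless it already exists
build_volumes() {
  for volume in $VOLUMES; do
    # "docker volume inspect" exits with an error when the volume is unknown
    if docker volume inspect "$volume" >/dev/null 2>&1; then
      echo "volume $volume already exists"
    else
      docker volume create "$volume" >/dev/null && echo "volume $volume created"
    fi
  done
}
```

Saved as a script, a final `build_volumes` line would run it. Running it twice is harmless because existing volumes are left untouched.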
Here is the code of the corresponding clean_volumes.sh script.
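A possible shape for the cleanup, offered as a sketch rather than the original script. The volume names are hypothetical and must match the ones used when the volumes were created.

```shell
# Hypothetical volume names, one per node of the cluster
VOLUMES="pg_node1_data pg_node2_data pg_node3_data"

# Remove every volume in $VOLUMES, skipping the ones that do not exist
clean_volumes() {
  for volume in $VOLUMES; do
    if docker volume inspect "$volume" >/dev/null 2>&1; then
      # Removal fails while a container still references the volume
      docker volume rm "$volume" >/dev/null && echo "volume $volume removed"
    else
      echo "volume $volume not found, skipping"
    fi
  done
}
```

Saved as a script, a final `clean_volumes` line would run it. The containers that use the volumes have to be removed first, otherwise Docker rejects the removal.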
We need to modify the start_containers.sh to bind mount the volumes to the three containers.
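The sketch below shows one way to do this; it is not the original start_containers.sh. The image name, the node names, and the `<node>_data` volume naming scheme are hypothetical, while /home/postgres/data is the data folder used throughout this series.

```shell
# Hypothetical image name; the data folder comes from the article
IMAGE="postgresql-cluster-demo"
DATA_DIR="/home/postgres/data"

# Start one node with its volume mounted on the data folder.
# $1 = node name; the volume is assumed to be named "<node>_data"
start_node() {
  node="$1"
  volume="${node}_data"
  # Stop early if the volume has not been created yet
  if ! docker volume inspect "$volume" >/dev/null 2>&1; then
    echo "volume $volume is missing, create it first" >&2
    return 1
  fi
  docker run -d --name "$node" -v "$volume:$DATA_DIR" "$IMAGE"
}

# Start the three nodes of the cluster, stopping at the first failure
start_containers() {
  for name in pg_node1 pg_node2 pg_node3; do
    start_node "$name" || return 1
  done
}
```

Checking for the volume first matters because `docker run -v` silently creates a missing named volume, which would hide a forgotten setup step.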
The code first checks that the volumes exist and then bind mounts them to the containers.
The process is exactly the same as in the previous article. This time, even if you stop a container, no data is lost. The separation of data from binaries allows you to run two new scenarios: upgrade and failover. You can find how to implement these scenarios here.
In this article, we discussed how to use volumes or bind mounts to separate the binaries of an application from its data and logs. You can download the source code here in the postgresql-cluster-volume folder. In the next article, we will see how to manage the Docker build and runtime configuration with Docker Compose instead of helper scripts (i.e. build_image.sh, clean_image.sh, build_volumes.sh, clean_volumes.sh, start_containers.sh, and stop_containers.sh).