Getting started with Docker

Virtualisation and containerisation are an important part of today’s computing landscape. Containers offer several benefits compared to software running directly on the host operating system:

  • They are isolated environments, which helps with issues like security and dependency conflicts. Running, for example, multiple versions of the same database software on a single OS can be very difficult, but with containers this is no problem. You can even run containers based on different Linux distributions on the same system.
  • Once a container is configured, it can easily be deployed on multiple systems.
  • Compared to virtual machines, containers require fewer resources, as they share the host’s kernel instead of running their own.
  • Perhaps most importantly, combined with an orchestrator like Kubernetes it becomes possible to build redundant and thus fault-tolerant services.

Most of the tutorials that I’ve seen explain how to containerise some service like a web or database server, but I wanted to do something else: run a Python script for data processing. Docker is the de facto standard for containers, with Podman being an alternative; I’ll stick to Docker here. You need Docker Desktop, whose personal version is free. On Windows, Docker Desktop uses the Windows Subsystem for Linux 2 (WSL2), so make sure to enable this first, which may also require enabling virtualisation in the BIOS.
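On a reasonably recent Windows 10 or 11 system, WSL2 can usually be enabled with a single command in an elevated PowerShell (the exact steps depend on your Windows version):

wsl --install

Afterwards, wsl -l -v lists the installed distributions and confirms that they run under WSL version 2.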

Docker containers usually use Linux as their operating system, so you need to pick a suitable Linux base image from Docker Hub.
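As a side note: for a Python workload you could also start from one of the official Python images, e.g. python:3.11-slim, which already ships with the interpreter and pip:

FROM python:3.11-slim

I’ll start from a plain Ubuntu image instead, which makes the individual steps more explicit.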

Dockerfile

A Docker image, from which containers are then started, is described by a text configuration file named Dockerfile. It’s easiest to place this in the same folder as the Python script. Here are the contents of my Dockerfile:

FROM ubuntu:22.04
RUN apt-get -y update && \
    apt-get -y install python3 python3-pip

RUN pip install geopandas laspy[lazrs,laszip] scipy
RUN mkdir -p /home/pipeline/data
COPY create_laz_hull.py /home/pipeline
WORKDIR /home/pipeline
ENTRYPOINT ["python3", "create_laz_hull.py"]
CMD ["/home/pipeline/data"]

This Dockerfile specifies the following things:

  • Ubuntu 22.04 is chosen as base image.
  • RUN commands are executed during the image build. Here, Python and pip are installed first (the Ubuntu package providing pip is python3-pip), then the necessary Python packages are added.
  • The environment is then prepared by creating directories and copying required files (in this case only the Python script) from the host to the container. Remember that this is a Linux system, so you’ll have the usual Linux directory tree with /home, /opt, /etc and so on.
  • The WORKDIR is set, which is the directory in which subsequent instructions and, later, the container’s command are executed.
  • ENTRYPOINT defines which command to run with which arguments once the container is started. In this case, it’s running the Python interpreter with the script file as argument.
  • Finally, CMD specifies the default arguments that are passed to the Python script; additional arguments could be listed here if the script expected them. Anything appended to docker run after the image name replaces these defaults, as shown further below.
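The script itself is not the focus of this post, but as a rough sketch (the glob pattern and the processing step are placeholders, not the actual implementation), this is how it might pick up the directory passed via CMD:

import sys
from pathlib import Path

# The data directory arrives as the first command-line argument,
# supplied by CMD (or by whatever is appended to docker run).
data_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/home/pipeline/data")

for laz_file in sorted(data_dir.glob("*.laz")):
    print(f"Processing {laz_file} ...")
    # ... read the point cloud with laspy and compute the hull here ...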

You can then build the image by opening a terminal in the folder where the Dockerfile resides and running docker build -t test .

When this is done for the first time, it may take a while, as Docker first needs to download the Ubuntu image and then install Python and several Python packages. If you afterwards make small changes, like changing the ENTRYPOINT, the build will be much faster thanks to Docker’s layer cache.
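You can verify that the image has been created by listing the locally available images:

docker image ls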

Technically the container can now be run with docker run test, but we’ll use docker compose instead.
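As an aside: since arguments after the image name replace the CMD defaults, you could point the script at a different directory inside the container, e.g. docker run test /home/pipeline/other (the path is only an illustration, and it would of course have to exist in the container).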

Compose file

Using only Dockerfiles and docker build/run is fine for single containers, but when you want to run multiple containers or enable access to files via volumes, it makes more sense to use docker compose. The configuration is done in a YAML file called docker-compose.yaml:

services:
  computehull:
    build: .
    volumes:
      - C:/work/LAZ/:/home/pipeline/data/

With only a single container and no ports to map etc., this is quite simple.

  • computehull is the name of the service; I chose it because the Python script computes the convex hull of the supplied point cloud files.
  • build specifies where to find the build context with the Dockerfile. As the Dockerfile is in the same directory as the compose file, it’s simply a dot (.).
  • volumes map host paths to container paths. The Dockerfile above assumes that the Python script looks for files in /home/pipeline/data, so this is specified as the target for the directory that is C:\work\LAZ on the host’s Windows file system.
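On Linux or macOS, the host side of the mapping would simply be a native path such as /home/user/LAZ (an illustrative path). If you’re unsure whether the file is well-formed, the following command parses it and prints the resolved configuration:

docker compose config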

You can then build and run the container(s) with docker compose up. I’ve found that error messages during the build process aren’t very clear, so if your container isn’t building correctly, I’d run plain docker build first. docker compose may also fail to pick up configuration changes, which I managed to solve by running docker compose up --build instead.

If everything is OK, the container will run and terminate once the Python script has completed.
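To revisit the script’s output afterwards or to remove the stopped container, the following commands are useful (computehull being the service name from the compose file):

docker compose logs computehull
docker compose down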

Where to go from here

If you want to run a container on a different system, the recommended way is to rebuild the image there. It is also possible to push the image to a container registry (either Docker Hub or a private one) and pull it on the target system. Multiple containers can be combined to provide services that depend on each other, such as a database and a web server. At some point, you’ll run into Kubernetes for running containers instead of bare Docker…
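The registry route looks roughly like this; myuser and the repository name are placeholders for your own Docker Hub account:

docker login
docker tag test myuser/computehull:latest
docker push myuser/computehull:latest

On the target system, docker pull myuser/computehull:latest then fetches the image.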

The Docker documentation can be found at https://docs.docker.com/.