Simple Introduction to Docker Swarm

My notes on Docker Swarm and why it's my favorite orchestration tool that automagically scales and manages your services with just a few commands.

I've been looking for something that could help me scale my Node.js apps on bare metal without using any Cloud services.

My requirements were:

it has to be super super simple to use and manage
it has to be able to run Docker containers
it has to be able to run on one machine but also scale to multiple machines if needed

And I found a perfect match - Docker Swarm. It's a (fancy word) orchestration tool that does exactly that. You can think of it as an easier and more approachable alternative to Kubernetes, or even to k3s.

But... it wasn't that easy to get started. The biggest issue for me was researching it. Why? Because the Docker team is horrible at naming things. A few years back there was a standalone product called Docker Swarm, which was later renamed Docker Swarm Classic, but most of the content on the internet still used "Docker Swarm" to refer to that old product. Later, Docker released a new version of Swarm that is built into Docker itself. If you have Docker installed, you already have it; you just need to activate "swarm mode".
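Activating swarm mode on the first machine could look roughly like the sketch below. The helper name init_swarm and the example address are made up for illustration, and --advertise-addr should only be strictly necessary when the machine has more than one network address.

```shell
#!/bin/sh
# Hypothetical helper: turn the current machine into the first manager of a new swarm.
# $1 = the address other machines will use to reach this one (an assumption of this sketch).
init_swarm() {
  # Enable swarm mode, then list the nodes (right after init there is only this manager).
  docker swarm init --advertise-addr "$1" && docker node ls
}

# Example: init_swarm 192.168.1.10
```

The output of docker swarm init also includes a ready-made join command for adding other machines.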

Another confusing thing was that Docker Swarm uses the Docker Compose file format, so you define Stacks (same term as in IaC) in Compose format, even in a file named docker-compose.yml, but... you never run any docker compose commands in a production environment. I'm guessing they did that to support local development, but this is not something I would ever be interested in! :)

Some definitions:

Swarm - a cluster of machines running Docker in swarm mode
Node - a machine that is part of the swarm; every node is either a manager or a worker
Service - a description of what to run (image, number of replicas, ports); the swarm runs it as one or more containers spread across the nodes
Stack - a collection of services that are deployed together

There are two types of nodes: managers and workers. Managers can do the same things as workers (by default they run containers too), but they have additional responsibilities: managing the swarm and the other nodes. Managers are often used for handling persistent services like databases or observability tools.
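One way to keep a persistent service on the managers is a placement constraint. This is only a sketch: the helper name pin_to_managers and the service name mystack_db are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: restrict a service so its replicas are only scheduled on managers.
# $1 = full service name, e.g. mystack_db (made up for this example).
pin_to_managers() {
  docker service update --constraint-add 'node.role==manager' "$1"
}

# Example: pin_to_managers mystack_db
```

The same constraint can also be written in the stack file under deploy.placement.constraints, which keeps it next to the rest of the service definition.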

[Image: Docker Swarm visualization]

What are the coolest features of Docker Swarm though?

scaling the whole swarm is as simple as running docker swarm join --token <token> <ip>
scaling a service is as simple as running docker service scale <service-name>=<number-of-replicas> and it will automatically distribute the replicas across the swarm. And even if you detach/disconnect one worker from the swarm, it will still try to keep the specified number of replicas, automatically
since you can run multiple replicas of any service, you can also achieve almost zero-downtime deployments. You can configure it to update only one replica at a time, and while a replica is being replaced, the load balancer won't route traffic to it. It's not perfect though: if your service takes a long time to start and an unlucky user gets routed to a freshly created replica before it has fully initialized, that request can fail. The chance of this is very low, so it's not a big deal, at least for me, and you can use health checks to avoid the issue.
you don't have to deploy a stack over and over again, you can simply update a single service's image and it will automatically update all the replicas.
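The health check idea from the list above can be sketched from the command line. Everything specific here is an assumption: the helper name, the service name, that the image contains curl, and that the app answers on /health at port 3000. With a health check in place, Swarm should only treat a new replica as ready once the check passes.

```shell
#!/bin/sh
# Hypothetical helper: attach a health check to an existing service.
# $1 = full service name, e.g. mystack_webapp (made up for this example).
add_health_check() {
  # Assumes curl exists inside the image and the app exposes /health on port 3000.
  docker service update \
    --health-cmd 'curl -fsS http://localhost:3000/health || exit 1' \
    --health-interval 10s \
    --health-timeout 3s \
    --health-retries 3 \
    --health-start-period 30s \
    "$1"
}

# Example: add_health_check mystack_webapp
```

The --health-start-period value is the grace period for slow starters, so it is the one to tune if your service takes a while to initialize.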

Here's an example Docker Swarm stack:

version: '3.7'
 
services:
  nginx:
    image: your.docker.registry.com/nginx:latest
    volumes:
      - ./conf.d:/etc/nginx/conf.d
    # note: docker stack deploy ignores depends_on, so start order is not guaranteed
    depends_on:
      - webapp
    ports:
      - 3005:3005
  webapp:
    image: your.docker.registry.com/webapp:master
    deploy:
      replicas: 10
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        delay: 20s
      rollback_config:
        parallelism: 2
        order: stop-first
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
        window: 120s

If you look at the deploy.update_config section, you can see a few things:

parallelism - how many replicas can be updated at the same time; setting it to 1 gives the one-at-a-time updates I mentioned earlier
order - either start-first (start the new replica first, then stop the old one) or stop-first (the default)
failure_action - what to do if the update fails: rollback reverts to the previous version, continue carries on updating the remaining replicas anyway, and pause (the default) stops the update where it is
delay - how long to wait between updating one replica (or batch of replicas) and the next
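The same update settings can also be applied to a service that is already running, without editing the stack file. A minimal sketch, with a hypothetical helper name and service name:

```shell
#!/bin/sh
# Hypothetical helper: apply the update_config values from the stack above via the CLI.
# $1 = full service name, e.g. mystack_webapp (made up for this example).
set_update_policy() {
  docker service update \
    --update-parallelism 1 \
    --update-order start-first \
    --update-failure-action rollback \
    --update-delay 20s \
    "$1"
}

# Example: set_update_policy mystack_webapp
```

Keep in mind that the next docker stack deploy will overwrite these values with whatever the stack file says, so the file is still the better place for anything permanent.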

To deploy the stack, you would simply run

docker stack deploy <stack_name> --with-registry-auth --compose-file=docker-compose.yml

and you could preview all the services created with

docker service ls
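For a closer look at one stack, something like the following sketch can help. The helper name is hypothetical; the only thing it relies on is that Swarm names each service as the stack name, an underscore, and the service name from the file.

```shell
#!/bin/sh
# Hypothetical helper: quick overview of a deployed stack.
# $1 = stack name, $2 = service name as written in the stack file (e.g. webapp).
stack_overview() {
  # Services of this stack with their current/desired replica counts.
  docker stack services "$1"
  # Every task (container) of the stack and the node it runs on.
  docker stack ps "$1"
  # Recent log lines of one service; deployed services are named <stack>_<service>.
  docker service logs --tail 20 "${1}_${2}"
}

# Example: stack_overview mystack webapp
```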

Then, to deploy a new version of your application, you would push a new image to your Docker registry and run:

docker service update --force --image=your.docker.registry.com/webapp:master <yourStackName_serviceName>

and that's it!
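If you want to watch the rollout, or undo it by hand, a sketch like this should do. The helper names and the service name are made up for illustration.

```shell
#!/bin/sh
# $1 = full service name, e.g. mystack_webapp (hypothetical).

# List the service's tasks; during an update, old and new tasks show up side by side.
show_rollout() {
  docker service ps "$1"
}

# Revert the service to its previous definition (including the previous image).
rollback_service() {
  docker service update --rollback "$1"
}

# Example: show_rollout mystack_webapp
# Example: rollback_service mystack_webapp
```

With failure_action set to rollback, as in the stack above, Swarm should trigger the rollback on its own when an update fails, so the manual version is mostly for the case where the update succeeds but the new version misbehaves.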

If your app went viral you could quickly scale it up by either creating new replicas

docker service scale <service-name>=<number-of-replicas>

and/or by connecting new workers to the swarm.
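Connecting a new worker could look like the sketch below. The helper names and the example address are hypothetical; 2377 is the default port that swarm managers listen on.

```shell
#!/bin/sh
# Run on a manager: prints the complete "docker swarm join ..." command for new workers.
print_worker_join_command() {
  docker swarm join-token worker
}

# Run on the new machine.
# $1 = join token printed by the manager, $2 = manager address (hypothetical, e.g. 192.168.1.10).
join_as_worker() {
  docker swarm join --token "$1" "${2}:2377"
}

# Example: join_as_worker <token> 192.168.1.10
```

Once the new node has joined, existing replicas stay where they are; new or rescheduled replicas (for example after docker service scale) are the ones that land on it.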

Published on August 19, 2024 • 4 min read