Simple Introduction to Docker Swarm
My notes on Docker Swarm and why it's my favorite orchestration tool that automagically scales and manages your services with just a few commands.
I've been looking for something that could help me scale my Node.js apps on bare metal without using any cloud services.
My requirements were:
- it has to be super super simple to use and manage
- it has to be able to run Docker containers
- it has to be able to run on one machine but also scale to multiple machines if needed
And I found a perfect match - Docker Swarm. It's a (fancy word) orchestration tool that does exactly that. You can think of it as the easier and more approachable version of Kubernetes, or even of k3s.
But... it wasn't that easy to get started. The biggest issue for me was researching that stuff. Why? Because the Docker team is horrible at naming things. A few years back there was a standalone product called Docker Swarm, which was later renamed to Docker Swarm Classic, but the majority of the content on the internet was still using "Docker Swarm" to refer to this old product. Later Docker released a new version of Docker Swarm that is built into Docker itself. If you have Docker installed, you already have it, you just need to activate "swarm mode".
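Activating swarm mode is a one-off step on the first machine. Here's a minimal sketch, assuming the machine's address is 192.168.1.10 (a made-up example value). I wrapped it in a function called init_swarm (my name, nothing official) so nothing runs until you call it.

```shell
#!/bin/sh
# Sketch: turn the current machine into the first manager of a new swarm.
# MANAGER_IP is an assumed example address; use the machine's real IP.
MANAGER_IP="${MANAGER_IP:-192.168.1.10}"

init_swarm() {
  # Enable swarm mode and tell other nodes which address to reach us on.
  docker swarm init --advertise-addr "$MANAGER_IP"
  # Print the ready-to-paste join command (including the token) for workers.
  docker swarm join-token worker
  # List the nodes that are currently part of the swarm.
  docker node ls
}
```

Run init_swarm once on the machine that should become the manager, then paste the printed join command on every other machine.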
Another confusing thing was that Docker Swarm uses the Docker Compose file format, so you can define Stacks (same term as in IaC) in a docker-compose.yml file, but... you never run any docker compose commands in a production environment, you deploy the file with docker stack deploy instead. I'm guessing they did that to support local development, but this is not something I would ever be interested in! :)
Some definitions:
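- Swarm - a cluster of machines that are running Docker in swarm mode
- Node/Worker - a machine that has joined the swarm
- Service - a description of what to run (image, number of replicas, ports and so on); the swarm runs it as one or more containers (replicas) spread across the machines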
- Stack - a collection of services that are deployed together
There are two types of nodes: managers and standard workers. Managers can do the same things as standard workers (by default they run containers too), but they have additional responsibilities - they keep track of the cluster state and schedule services onto the other nodes. Managers are often used for handling persistent services like databases or observability tools.
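To keep a persistent service on a manager, you can add a placement constraint. This is only a sketch: the service name db, the volume name db-data, the password and the postgres:16 image are placeholders I picked for illustration, and create_db_service is a made-up helper name.

```shell
#!/bin/sh
# Sketch: pin a database service to manager nodes with a placement constraint.
create_db_service() {
  # --constraint limits scheduling to nodes whose role is "manager".
  # --mount attaches a named volume so data survives container restarts.
  # The password is a placeholder; a real setup would use docker secrets.
  docker service create \
    --name db \
    --constraint 'node.role==manager' \
    --mount type=volume,source=db-data,target=/var/lib/postgresql/data \
    --env POSTGRES_PASSWORD=change-me \
    postgres:16
}
```

With a single manager this keeps the database and its volume on one known machine. With several managers the volume is still local to each node, so you would want a stricter constraint (for example on the node's hostname).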
What are the coolest features of Docker Swarm though?
- scaling the whole swarm is as simple as running this on the new machine (2377 is the default swarm port)
docker swarm join --token <token> <manager-ip>:2377
- scaling a service is as simple as running
docker service scale <service-name>=<number-of-replicas>
and it will automatically distribute the replicas across the swarm. And even if you detach/disconnect one worker from the swarm, it will still try to keep the specified number of replicas, automatically
- since you can run multiple replicas of any service, you can also achieve almost zero downtime deployments. You can configure it to update only one replica at a time, and while a replica is being replaced, the load balancer won't route traffic to it. It's not perfect though: if your service takes a long time to start and an unlucky user gets routed to the freshly created replica before it has fully initialized, their request can fail. The chance of this is very low, so it's not a big deal, at least for me. You could use health checks to overcome that issue, because the swarm waits for a replica to report healthy before sending traffic to it.
- you don't have to deploy a stack over and over again, you can simply update a single service's image and it will automatically update all the replicas.
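The health check idea can be sketched with flags on docker service update. I'm assuming here that the app listens on port 3000, exposes a /health endpoint and that wget exists inside the image. All three are assumptions about your app, not something Docker provides, and add_health_check is just a helper name I made up.

```shell
#!/bin/sh
# Sketch: add a health check and one-at-a-time, start-first updates
# to an existing service. Usage: add_health_check <service-name>
add_health_check() {
  # The health command runs inside the container; a non-zero exit = unhealthy.
  # --health-start-period gives a slow-starting app time to boot.
  # --update-parallelism 1 replaces a single replica at a time.
  # --update-order start-first starts the new replica before stopping the old.
  docker service update \
    --health-cmd 'wget -q --spider http://localhost:3000/health || exit 1' \
    --health-interval 10s \
    --health-timeout 3s \
    --health-retries 3 \
    --health-start-period 20s \
    --update-parallelism 1 \
    --update-order start-first \
    "$1"
}
```

You would call it once per service, for example add_health_check myapp_web, where myapp_web is a hypothetical service name.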
Here's the example Docker Swarm stack:
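Treat the following as a minimal sketch rather than a production config. It writes the stack file with a shell heredoc so you can paste it into a terminal. The service name web, the image registry.example.com/myapp, the port mapping and the replica count are all placeholders I made up.

```shell
#!/bin/sh
# Sketch: write an example stack file to docker-compose.yml.
# All names and values below are illustrative placeholders.
cat > docker-compose.yml <<'EOF'
# Older Docker releases need a version of 3.x for "docker stack deploy".
version: "3.8"

services:
  web:
    image: registry.example.com/myapp:latest
    ports:
      - "80:3000"            # host port 80 -> container port 3000
    deploy:
      replicas: 3            # how many copies of the container to run
      update_config:
        parallelism: 1       # update one replica at a time
        order: start-first   # start the new replica before stopping the old
        failure_action: rollback
        delay: 10s           # wait between replica updates
      restart_policy:
        condition: on-failure
EOF
```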
If you look at the deploy.update_config section, you can see a few things:
- parallelism - how many replicas can be updated at the same time. Setting it to 1 gives you the one-at-a-time updates I mentioned earlier about zero downtime deployments
- order - you can set either start-first (first start the new replica, then kill the old one) or stop-first (the default)
- failure_action - what to do if the update fails. You can set it to rollback to revert the changes, pause (the default) to stop the update where it is, or continue to carry on with the remaining replicas
- delay - the delay between each replica update
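If you wanted to deploy the stack you would simply run docker stack deploy with the compose file and a stack name, and you could preview all the services it created with docker service ls.
Then if you wanted to deploy a new version of your application, you would push a new image to your Docker registry and then run docker service update with the --image flag, and that's it!
If your app went viral you could quickly scale it up by creating new replicas with docker service scale and/or by connecting new workers to the swarm with docker swarm join.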
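Put together, the day-to-day workflow might look like the sketch below. The stack name myapp, the service name web and the image registry.example.com/myapp are hypothetical, and the three helper functions are just my way of grouping the commands. Swarm names each service <stack>_<service>, which is where myapp_web comes from.

```shell
#!/bin/sh
# Sketch of a deploy workflow. STACK and IMAGE are assumed example values.
STACK="${STACK:-myapp}"
IMAGE="${IMAGE:-registry.example.com/myapp}"

# Create or update the whole stack from the compose file, then list its services.
deploy_stack() {
  docker stack deploy -c docker-compose.yml "$STACK"
  docker stack services "$STACK"
}

# Push a new image tag and roll it out. Usage: release <tag>
release() {
  docker push "$IMAGE:$1"
  docker service update --image "$IMAGE:$1" "${STACK}_web"
}

# Change the number of replicas. Usage: scale_web <replicas>
scale_web() {
  docker service scale "${STACK}_web=$1"
}
```

For example: deploy_stack once, release v2 after building and tagging a new image, and scale_web 10 when the traffic spikes.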
Published on August 19, 2024 • 4 min read