Docker Swarm
After you've decentralized your database and storage, you can move on to Docker Swarm. If you haven't done so yet, make sure to do that first.
Even though you can use Docker Swarm to deploy stateful containers such as MariaDB, in this tutorial you'll go with a decentralized approach for the volumes and the applications.
Good to know
All the docker-compose.yml and .env files are available in the book repo.
What is Docker Swarm?
The best way to understand Docker Swarm is to think of it as Docker's way of treating a bunch of servers as one giant computer. That means you're no longer creating containers but services. Each service has tasks, which are the actual deployed containers.
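As a quick illustration, here is a minimal, throwaway example (using the public nginx:alpine image, not part of the Appwrite stack):
# Create a service; Swarm schedules 3 tasks (containers) across the nodes
docker service create --name web --replicas 3 nginx:alpine
# List the tasks backing the service and the node each one landed on
docker service ps web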
Docker Swarm has many pieces; here is the must-know terminology:
Terminology | Meaning |
---|---|
Node | A physical (or virtual) server in the Swarm. |
Manager node | A server ranked as a manager. It can change the deployed services and remove other nodes. |
Worker node | A server ranked as a worker. It can't change anything; it just receives new tasks from a manager. |
Stack | A list of services defined inside a docker-compose.yml file. |
Service | A declaration of a desired state for a given container. The state includes the container image version and various deploy options. For example, setting the replicas deploy option to 4 declares that the service's desired state is to have 4 replicas running. |
Task | An instruction sent from the managers to a worker, containing the details of a container the worker needs to run. |
Replica | The number of copies a given service will have in the Swarm. All the replicas can run on a single node, or they can be scattered across the cluster. |
Scaling | Changing the number of replicas for a given service in real time. |
Rolling update | Changing the image version (upgrade or downgrade) of a service in real time. |
Quorum | When there is more than one manager, the majority of the managers must be available before rolling any update. For example, if you have 3 managers and 2 are down, you won't be able to roll any update. The managers need to "consult" each other and reach a verdict before changing settings, and to make sure the decision was made by you, Docker Swarm requires the majority of the managers to be available. |
For example, if a worker node that is in charge of 3 tasks for some service stops responding, the managers will redeploy those 3 tasks to other nodes.
Manage nodes
To get a list of all available nodes and their status in the Swarm, run:
docker node ls
Drain manager
By default, manager nodes get assigned tasks like any other node. In most cases this is the best option. In a big Swarm, you may want to keep tasks off a given manager node by running:
docker node update --availability drain NODE_ID
Running this command with the manager's node ID will move all the node's current tasks to other nodes and block future tasks from being assigned to it.
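When the maintenance is done, you can return the node to active duty:
docker node update --availability active NODE_ID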
Quorum
One of the hard-to-swallow pills when it comes to Docker Swarm is the whole quorum concept.
When you're running Docker in Swarm mode, you're no longer deploying containers but services. When you have more than one manager, a vote takes place between the managers, and the one that gets the majority becomes the leader.
The leader manager node goes over the list of requested services and their desired state, and updates, deletes, or creates whatever is needed using tasks.
For example, say you've deployed a service that uses appwrite:1.3.4, and now you run a command on a manager to change the image for that service to appwrite:1.3.8. Docker Swarm will use the leader node to manage the upgrade process. So far, so good.
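If you prefer the CLI over editing the stack file, here is a sketch of the same rolling update; the service name appwrite_appwrite assumes the stack was deployed under the name appwrite, as done later in this chapter:
docker service update --image appwrite/appwrite:1.3.8 appwrite_appwrite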
In case the leader node becomes unavailable, a new vote must take place. If the number of remaining manager nodes is less than the majority of the total manager nodes, Docker won't be able to update the service. This is because no manager node can get the majority of votes needed to be declared the leader; no leader, no updates.
In this case, you must do whatever you can to restore enough manager nodes to get back the quorum majority. If that's not possible, recover the Swarm.
Here is a good interactive tutorial to understand the Raft algorithm.
Deploying
To deploy Appwrite to a Swarm, you'll need to follow these steps.
- Init the Swarm.
- Join nodes to the Swarm as managers or workers.
- Set the stack .env and docker-compose.yml files.
- Deploy the stack.
- Set up a load balancer. Docker does the rest for you.
For this example, you'll use a small cluster composed of 5 servers:
- Decentralized server (not inside the Swarm) containing the databases and storage volumes.
- Manager node
- Worker 1 node
- Worker 2 node
- Worker 3 node
Make sure all the servers have Docker installed.
Init the Swarm
Log in to your manager node and run this to initialize the Swarm:
docker swarm init --advertise-addr 10.0.0.1
The --advertise-addr flag sets the IP address that other nodes use to connect to the manager. You can replace the 10.0.0.1 IP with either:
- The manager's internal IP, which is accessible to other nodes - recommended for most use cases.
- The manager's external IP - useful when you want to join nodes that don't share the same network as the manager.
The above command will output something like this:
Swarm initialized: current node (j3wvahq4bf3zy05892grcq4fu) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-token-token 10.0.0.1:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
The line that starts with docker swarm join is the command you need to run to join nodes to your Swarm as workers.
You can get this command again any time by running:
docker swarm join-token worker
In case you want to add another manager, you can run:
docker swarm join-token manager
Hostname
Run this command to set the server hostname to manager-1:
hostname manager-1
This is necessary because some of the services you'll deploy will be pinned to the node with manager-1 as its hostname.
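Note that hostname alone may not survive a reboot; on systemd-based distributions, you can make the change persistent with:
hostnamectl set-hostname manager-1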
Join nodes
On each of your nodes, run the join command from the previous step.
docker swarm join --token SWMTKN-1-token-token 10.0.0.1:2377
That's all you need to do when you want to join nodes.
Setting the stack
In your root folder, create a folder named appwrite. Inside that folder, create the two files using the following links.
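For example, assuming your root folder is /root, as used later in this chapter:
mkdir -p /root/appwrite
cd /root/appwrite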
Swarm uses stacks to deploy a list of services across all the Swarm nodes. A stack uses the same YAML syntax as Docker Compose; Docker knows which fields to ignore when you're deploying the file to a Swarm, and vice versa.
This docker-compose.yml file has been adapted to be used as a Swarm stack. While most of the file is just like the regular Appwrite docker-compose.yml file, here are the differences.
Removing fields
When using docker-compose syntax for Swarm, there are some attributes you'll need to remove.
The container_name attribute is not relevant when using Swarm, as access is service-based rather than per-container.
The restart attribute is replaced by the restart_policy setting inside the deploy attribute, which is unique to Swarm.
container_name: appwrite
restart: unless-stopped
Setting deploy
Inside each service you can see the deploy attribute, which contains all the information about how Swarm should deploy that service across the node mesh.
There are two types of deployment inside Swarm:
- replicated (default) - the service is replicated across the cluster, with the number of copies set by replicas.
- global - the service is deployed on every node.
For example, the appwrite service:
deploy:
  mode: global
  restart_policy:
    condition: on-failure
You can see the service mode is set to global, meaning each node will run one appwrite container.
On the other hand, other services will have something like this:
deploy:
  mode: replicated
  replicas: 1
  restart_policy:
    condition: on-failure
Both services use the restart_policy to declare that the service should restart on-failure.
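For replicated services, you can also change the number of replicas at runtime without touching the stack file; for example, with SERVICE_NAME as a placeholder:
docker service scale SERVICE_NAME=3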
Adjusting Traefik
Traefik is the reverse proxy service used by Appwrite to route traffic between the appwrite and appwrite-realtime containers.
To use Traefik with Swarm, you'll need to configure it like so:
services:
  traefik:
    image: traefik:2.7
    <<: *x-logging
    command:
      - ...
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
    ports:
      - target: 80
        published: 80
        mode: host
        protocol: tcp
      - target: 443
        published: 443
        mode: host
        protocol: tcp
The deploy mode is set to global, and the port mode is changed to host, so each Traefik container handles requests on its own. Notice that the protocol is set to tcp, as this is necessary for routing the realtime WebSocket traffic.
Using the hostname
In the appwrite-executor and appwrite-worker-functions services, there's another field:
deploy:
  replicas: 1
  placement:
    constraints:
      - "node.hostname==manager-1"
The use of constraints inside the placement attribute makes sure that these two services will be deployed only to the node with manager-1 as its hostname.
The reason is that these two containers can't be replicated without adjustments, and the best way is to have them on the same node.
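To double-check the hostname Swarm has recorded for a node before relying on this constraint, you can run:
docker node inspect NODE_ID --format '{{ .Description.Hostname }}'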
Deploying the stack
Inside /root/appwrite, run this command.
export $(grep -v '^#' .env | xargs) && docker stack config -c docker-compose.yml
This command loads all the .env variables into the local environment; unlike Docker Compose, docker stack deploy doesn't read the .env file on its own. The docker stack config part then renders the final file, so you can verify the variable substitution worked.
Then, run:
docker stack deploy -c docker-compose.yml appwrite
This command will use the docker-compose.yml file to deploy a stack named appwrite.
In case you change any of the values inside these two files, you can simply run these two commands again; Docker Swarm will know you're just updating the settings.
To explore the deployed services, you can use these commands.
# Show all services
docker service ls
# Get service logs
docker service logs SERVICE_NAME
# Get service containers details and node placement
docker service ps SERVICE_NAME
Load balancer
If you access the IP of any node in the Swarm, Traefik will route you to Appwrite.
To take advantage of that, create a load balancer and add all the Swarm nodes as targets.
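Before adding a node as a target, you can sanity-check that it answers on port 80; for example, using the manager IP from earlier:
curl -I http://10.0.0.1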
Upgrade
Upgrading is separated into two steps.
1. Update the image version
Edit the image version inside your docker-compose.yml file, for example:
# Before
image: appwrite/appwrite:1.3.7
# After
image: appwrite/appwrite:1.3.8
Then run the deployment commands.
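That is, the same two commands you used when deploying the stack:
export $(grep -v '^#' .env | xargs) && docker stack config -c docker-compose.yml
docker stack deploy -c docker-compose.yml appwrite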
2. Migrate
In case there's a need to migrate the database, run this command on any of your nodes:
docker ps
This command lists the local containers. Search for the main appwrite container; it should be named something like appwrite_appwrite.1.a3faf3e.
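You can narrow down the list instead of scanning the full output:
docker ps --filter "name=appwrite_appwrite"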
Then run this command, replacing appwrite_appwrite.1.a3faf3e with your local appwrite container name.
docker exec appwrite_appwrite.1.a3faf3e migrate
Backup & Restore
These instructions are for backing up and restoring the Swarm data.
Backup
On any manager node, run:
# Stop Docker to prevent discrepancy
systemctl stop docker
# Backup the swarm folder (using zip)
zip -r swarm.zip /var/lib/docker/swarm
# Start Docker back again
systemctl start docker
Restore
Create a new manager node, then run:
# Stop docker
systemctl stop docker
# Delete the newly created Swarm data
rm -rf /var/lib/docker/swarm
# Restore from previous backup
unzip swarm.zip -d /var/lib/docker/
# Start docker
systemctl start docker
# Force using the restored cluster.
docker swarm init --force-new-cluster
Recover from losing the quorum
In case you can't bring up the majority of the managers, you can run this command from a working manager node.
docker swarm init --force-new-cluster --advertise-addr 10.0.0.1:2377
Replace 10.0.0.1 with the node's IP.
This special command will remove all other managers and make the current node the leader. All services and workers will get attached to that node. As for the managers, you'll need to reconnect them to the new leader.
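To reconnect them, generate a fresh manager join command on the new leader and run it on each of the old managers (you may need to run docker swarm leave --force on them first):
docker swarm join-token manager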
Benchmarks
Go to Benchmarks to see how Appwrite handles requests when scaling horizontally using Swarm.
Ansible
In the book repo, inside the swarm folder, you'll find an Ansible playbook that automates the whole Swarm installation process.
You just need to set the servers' IPs and run:
ansible-playbook appwrite.yml --ask-vault-pass
I love it!
Pocket size instructions
# Step-by-step summary, checklist style.
1. Create a decentralized **server** containing the databases & storage volumes.
2. Create **server**
1. Create swap file
2. Install docker
3. Mount the decentralized server `share` folder
4. Create a *snapshot* and name it `swap_plus_docker`
5. Init Swarm
6. Add the `swarm` and `swarm-manager` tags to the server.
3. Create another 2 **servers** using the `swap_plus_docker` snapshot.
1. Connect to the Swarm as manager
2. Add the `swarm` and `swarm-manager` tags to the server.
4. Create 5 more **servers** using the `swap_plus_docker`.
1. Connect as worker
2. Add the `swarm` and `swarm-worker` tags to the server.
5. Create `docker-compose.yml` using the Swarm `docker-compose.yml` file
6. Create `.env` file using the Swarm `.env` file
7. Update, set, and back up the `.env` environment variables.
8. Run `export $(grep -v '^#' .env | xargs) && docker stack config -c docker-compose.yml`
9. Run `docker stack deploy -c docker-compose.yml appwrite`
10. Create a **Load-balancer** and balance traffic across all the `swarm`-tagged servers.
# 🚀 Your Swarm has been deployed