First Look at Docker SwarmKit

I was planning to deploy a test environment for a new application today when Docker released SwarmKit. I saw this as the perfect opportunity to spend part of the day giving SwarmKit a try. This post is a very early look at my experience installing SwarmKit on EC2 servers.

At Replicated we build a platform that allows SaaS companies (including several dev-tools vendors) to deploy into private environments using Docker. I’ve become quite familiar with the ins and outs of schedulers and orchestrators while building our platform over the past 2 years. We’ve even built our own scheduler and orchestration runtime for Docker containers to support some of our early customers. I’ve set up and run Kubernetes and Mesosphere clusters, and I know what it takes to run a containerized production environment.

After going through the process of deploying an application using swarmctl, I did a quick analysis of what I like about SwarmKit vs. the missing tooling that I would have to find (or likely build) to use it in a production environment. While I was writing this, more commits landed in the README showing additional use cases and how to use what’s built in, so I’d recommend checking out the latest SwarmKit docs to see what’s new there.

What’s Included:


Provisioning

Setting up a cluster was quite easy: I ran a couple of commands, and everything synced up. Docker’s decision to build service discovery in was a great one. Comparatively, setting up a Kubernetes cluster takes much more effort unless you want to use Google Container Engine.
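For reference, bringing up a cluster boils down to roughly two commands (the full walkthrough is in the setup section below), one swarmd on the manager:

$ swarmd -d /tmp/node-mgmt-01 --listen-control-api /tmp/mgmt-01/swarm.sock --hostname mgmt-01

and one on each worker:

$ swarmd -d /tmp/node --hostname work-1 --join-addr <mgmt_01_address>:4242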

Running Containers

Obviously, running containers is the reason I set up a cluster. I was able to run my container, scale instances up and down, and edit environment variables and other container properties without much effort. Even when a bug in my container caused it to restart, swarmd did a really good job of cleaning up the old stopped containers after I ran swarmctl service rm.
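To give a sense of how little ceremony is involved, the whole lifecycle looks roughly like this (using redis as a stand-in image):

$ swarmctl service create --name redis --image redis
$ swarmctl service update redis --instances 3
$ swarmctl service rm redis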

Familiarity

I’ve been using Docker for a while, and swarmctl did not introduce a big learning curve. I know it’s early and will get more complex quickly as new features are added. But right now it’s very approachable, and getting a cluster running is easy.

Secure By Default

I didn’t have to manually create or transfer any TLS certs. The cluster provisioned itself easily and just worked, and all communication happens over TLS-encrypted connections. This matters: if securing a cluster were an optional step that required shuffling certs around, there would be a lot of insecure clusters out there.

Stability?

Honestly, I didn’t play with it long enough to judge stability. SwarmKit was announced less than 24 hours ago, so production-grade stability will come with time. It is built on proven technology (Docker, Raft), though, so the foundation for stability and security looks solid.

Node Management

You can add and remove nodes, and drain the tasks off of them. These are building blocks I can use for a custom update policy: if I have a complex upgrade strategy, I think I could build rolling upgrades on top of this functionality.
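As a sketch (assuming the node drain and activate subcommands behave the way the README describes), a per-node maintenance cycle could look like:

$ swarmctl node drain work-1
# upgrade or replace the host
$ swarmctl node activate work-1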

What’s Missing:


ServiceSpec files

To get this deployed into a good production environment with change management, release history, and all of the features I’d want, I need to be able to define a ServiceSpec in a file and pass it to the service create/update commands. Instead of swarmctl service create --name redis --image=redis --env KEY=VALUE, I want to use swarmctl service create --file redis.yaml. When I run swarmctl service create --help, a --file flag is listed, but after looking at the code I don’t think it’s implemented yet. There is some discussion at https://github.com/docker/swarmkit/issues/537, but it’s a little confusing because it doesn’t match up with the code; I think some old issues are no longer relevant and need to be cleaned up. I’m not positive that the help text of the CLI matches what is actually supported.

Load Balancers / Ingress

It’s not immediately obvious what Docker’s plan is here, and I’m not sure how I’m expected to set up ingress to these containers. There’s an ingress network defined by default, but I’m running in EC2, so I’d like to use an ELB. My servers don’t have public IPs, so how can I expose my service? I’d really like swarmd to manage this for me: it’s the scheduler, so it knows where the containers are placed. I couldn’t make this work yet, and I’m not sure whether I just couldn’t figure it out or whether it isn’t fully supported.
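The closest workaround I can think of (untested, and the --ports format is a guess): publish the service’s port on every node, then register the worker instances behind a classic ELB with the AWS CLI. The ELB name and instance IDs here are placeholders.

$ ./swarmctl service update api --ports 3000
$ aws elb register-instances-with-load-balancer --load-balancer-name api-elb --instances i-0aaa1111 i-0bbb2222 i-0ccc3333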

Upgrade Strategies

While I can swarmctl service update api ..., I’d like to define very specific, custom rolling update strategies: safely drain my container, stop it, and start the new version, with some control to halt the rollout if the upgrade isn’t working. I know I’ll have to write some code to support this, and I’m eager to; I just need to figure out how to integrate it with SwarmKit. Some defaults and samples would be a great addition here.
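Today the closest I can get is a manual rollout (assuming service update accepts --image the way service create does; the new tag below is hypothetical): push the new image, then watch the task list and bail out by hand if tasks don’t reach RUNNING.

$ ./swarmctl service update api --image quay.io/my_org/api:8f3c2d1
$ watch -n 2 ./swarmctl service inspect api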

System Services

I want to run swarmd as a system service, built from a known release, that I can push out and manage with upstart or systemd. This isn’t hard to do myself, and it doesn’t need to be a core part of SwarmKit.
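A minimal sketch of the systemd unit I have in mind for a worker, assuming the binary is copied to /usr/local/bin and state lives in /var/lib/swarmd (the join address is a placeholder):

[Unit]
Description=SwarmKit agent
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/local/bin/swarmd -d /var/lib/swarmd --hostname %H --join-addr <mgmt_01_address>:4242
Restart=always

[Install]
WantedBy=multi-user.target

Drop that into /etc/systemd/system/swarmd.service, then systemctl enable swarmd && systemctl start swarmd.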

Remote Cluster Logs

With Kubernetes, I can run kubectl logs <podname>. I’d love to be able to run swarmctl service <service-name> logs -f to debug or monitor a running system.
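For now, the workaround is SSHing to whichever node the task landed on and using plain docker commands:

$ ssh ubuntu@work-1
$ docker ps
$ docker logs -f <container_id>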

Private Repository Credentials

I need to deploy private images that I store on Docker Hub and quay.io. I think I can manage this with docker-machine, but it wasn’t immediately obvious. In the deployment below, I manually pulled my image from quay.io on each node in the cluster, which wouldn’t work in a production environment.
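That manual step amounted to something like this (each node also needs a docker login against quay.io first; credentials omitted):

$ for node in work-1 work-2 work-3; do ssh ubuntu@$node "docker pull quay.io/my_org/api:7dab5f6"; done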

My Initial Setup & Deploy Process

How did I come to these conclusions? I went through the process of setting up and deploying a relatively standard SaaS environment on SwarmKit; below is a chronicle of that initial experience.

The deployment for this application shouldn’t be a difficult one. It involves a few components:

  • MySQL
  • Elasticsearch
  • RabbitMQ
  • A static, React-built site
  • Another static, React-built site
  • An API written in Golang
  • A worker, also written in Golang

I can scratch MySQL, Elasticsearch, and RabbitMQ from today’s deployment by swapping them out for hosted services (RDS, elastic.co, and SQS). But I want to make sure that anything I deploy to my own instances is managed in a scalable Docker cluster. And since I will eventually ship this to private environments, no significant proprietary tools are allowed.

Environment

I decided to set up an entirely new VPC in us-west-1 (Northern California) for this. I set up 6 subnets (2 public, 2 private, and 2 db), giving me address space in both us-west-1a and us-west-1c. Servers in the private subnets don’t have public IP addresses, so I set up an OpenVPN server to reach them. This will make life easy for now.
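For anyone reproducing this, the equivalent AWS CLI calls look roughly like this (the CIDRs and VPC ID are placeholders, with one create-subnet per tier/AZ pair):

$ aws ec2 create-vpc --cidr-block 10.0.0.0/16
$ aws ec2 create-subnet --vpc-id vpc-1234abcd --cidr-block 10.0.1.0/24 --availability-zone us-west-1a
$ aws ec2 create-subnet --vpc-id vpc-1234abcd --cidr-block 10.0.2.0/24 --availability-zone us-west-1c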

Great, I’m in. But I also need a manager machine, so I turned on a t2.medium instance in one of my private subnets to get started. This is the machine I will use to manage the cluster, push out updates, and troubleshoot/monitor the containers.

Installing SwarmKit

This is the part I’ve been waiting for. The last hour was just setting up some infrastructure; now it’s time to play with the new stuff!

Next, I install Docker 1.11.2 on the management node. This is the current release of Docker, and the one I would expect SwarmKit to be compatible with. I also need to build the SwarmKit binaries from master (the only branch) of the repo to get started.
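Installing Docker on Ubuntu is one line with Docker’s install script (which currently installs 1.11.2):

$ curl -fsSL https://get.docker.com/ | sh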

$ git clone https://github.com/docker/swarmkit.git
Cloning into 'swarmkit'...  
remote: Counting objects: 11236, done.  
remote: Compressing objects: 100% (27/27), done.  
remote: Total 11236 (delta 8), reused 0 (delta 0), pack-reused 11209  
Receiving objects: 100% (11236/11236), 6.94 MiB | 1.49 MiB/s, done.  
Resolving deltas: 100% (7199/7199), done.  
Checking connectivity... done.

$ docker run -it -v `pwd`/swarmkit:/go/src/github.com/docker/swarmkit golang:1.6 /bin/bash
Unable to find image 'golang:1.6' locally  
1.6: Pulling from library/golang  
51f5c6a04d83: Pull complete  
a3ed95caeb02: Pull complete  
7004cfc6e122: Pull complete  
5f37c8a7cfbd: Pull complete  
e0297283ad9f: Pull complete  
a7164db3234c: Pull complete  
6bb08da223d8: Pull complete  
c718b2eba451: Pull complete  
Digest: sha256:66618c0274d300e897bcd2cb83584783e66084ea636b88cb49eeffbeb7f9b508  
Status: Downloaded newer image for golang:1.6

root@<container_id>:/go# cd /go/src/github.com/docker/swarmkit/

root@<container_id>:/go/src/github.com/docker/swarmkit# make binaries
🐳 bin/swarmd
🐳 bin/swarmctl
🐳 bin/swarm-bench
🐳 bin/protoc-gen-gogoswarm
🐳 binaries

root@<container_id>:/go/src/github.com/docker/swarmkit# exit

Using these newly built binaries:

$ swarmkit/bin/swarmd -d /tmp/node-mgmt-01 --listen-control-api /tmp/mgmt-01/swarm.sock --hostname mgmt-01
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.  
INFO[0000] 4a678cf4eff2b943 became follower at term 2  
INFO[0000] newRaft 4a678cf4eff2b943 [peers: [], term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]  
WARN[0000] ignoring request to join cluster, because raft state already exists  
INFO[0000] 4a678cf4eff2b943 became follower at term 2  
INFO[0000] newRaft 4a678cf4eff2b943 [peers: [], term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]  
INFO[0000] Listening for local connections               addr=/tmp/mgmt-01/swarm.sock proto=unix  
INFO[0000] Listening for connections                     addr=[::]:4242 proto=tcp  
INFO[0005] 4a678cf4eff2b943 is starting a new election at term 2  
INFO[0005] 4a678cf4eff2b943 became candidate at term 3  
INFO[0005] 4a678cf4eff2b943 received vote from 4a678cf4eff2b943 at term 3  
INFO[0005] 4a678cf4eff2b943 became leader at term 3  
INFO[0005] raft.node: 4a678cf4eff2b943 elected leader 4a678cf4eff2b943 at term 3  
INFO[0005] node is ready  

Great. I have a management node running.

This application isn’t going to receive much traffic immediately, so I decided to start with a relatively small, 3-node SwarmKit cluster of t2.medium instances. I turned on these servers in the private subnets and installed docker-engine. But it’s not a cluster yet; it’s just 3 servers sitting in a VPC that can communicate on an internal network.

I didn’t want to build the SwarmKit binaries each time, so I copied the bins to my 3 new servers and bootstrapped them as workers 1, 2 and 3 using these commands:

$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmd .
$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmctl .
$ swarmd -d /tmp/node --hostname work-<N> --join-addr <mgmt_01_address>:4242
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.  
INFO[0000] node is ready  

Amazingly simple. I think this worked: nothing says it failed to connect, but there’s no explicit success message either. That’s OK, I’m going to push forward.

Back to the management node, I run:

$ export SWARM_SOCKET=/tmp/mgmt-01/swarm.sock
$ ./swarmctl node ls
ID             Name     Membership  Status  Availability  Manager status  
--             ----     ----------  ------  ------------  --------------
0fd1wrr78xdld  work-1   ACCEPTED    READY   ACTIVE  
14qektqj267gj  mgmt-01  ACCEPTED    READY   ACTIVE        REACHABLE *  
2mi1lv4edolas  work-3   ACCEPTED    READY   ACTIVE  
2rvbyfbhcgi2h  work-2   ACCEPTED    READY   ACTIVE  

We have a cluster! Let’s deploy my container!

Wait, the README sort of tapers off here. I can deploy redis, but what fun is that? I want to deploy my own custom image. I have a few services to deploy, and I plan to start by creating a service to describe and run my API on multiple workers in my new cluster. I suspect that swarmctl service create -f <filename> takes the same definition as a service in a docker-compose YAML. After experimenting and digging into the code, I just don’t think this is implemented. It really doesn’t look like I can create a service from a spec file, although it’s listed in the CLI help:

$ ./swarmctl service create --help
Create a service

Usage:  
  ./swarmctl service create [flags]

Flags:  
      --args value       Args (default [])
      --env value        Env (default [])
  -f, --file string      Spec to use
      --image string     Image
      --instances uint   Number of instances for the service Service (default 1)
      --mode string      one of replicated, global (default "replicated")
      --name string      Service name
      --network string   Network name
      --ports value      Ports (default [])

Global Flags:  
  -n, --no-resolve      Do not try to map IDs to Names when displaying them
  -s, --socket string   Socket to connect to the Swarm manager (default "/tmp/mgmt-01/swarm.sock")

This isn’t a big deal right now. I’m going to push forward and deploy my service manually:

$ ./swarmctl service create --name api --image quay.io/my_org/api:7dab5f6 --env PROJECT_NAME=api
0icvt9xvf7ja0yspn26yfvvn8  

I’m skipping over some manual steps required to get that private image pulled, and that’s only one env var. But it works, and I feel confident this can be extended to cover all of the required environment variables, volumes, and ports.
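For example, extending the running service with more variables and a published port could look like this (comma-separated env vars, as used later in this post; the --ports format is my guess, and LOG_LEVEL is hypothetical):

$ ./swarmctl service update api --env PROJECT_NAME=api,LOG_LEVEL=info --ports 3000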

$ ./swarmctl service ls
ID                         Name  Image                           Instances  
--                         ----  -----                           ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  1  

My container is deployed and running. Let’s start to figure out what swarmctl can do.

Scaling this service up:

$ ./swarmctl service update api --instances 2
0icvt9xvf7ja0yspn26yfvvn8

$ ./swarmctl service ls
ID                         Name  Image                           Instances  
--                         ----  -----                           ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  2

$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8  
Name              : api  
Instances         : 2  
Template  
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api]

Task ID                      Service    Instance    Image                             Desired State    Last State                Node  
-------                      -------    --------    -----                             -------------    ----------                ----
bartp5krui1815paw2srmtd28    api        1           quay.io/my_org/api:7dab5f6    RUNNING          RUNNING 3 minutes ago     work-1  
70kexpn10suulum0hxursil28    api        2           quay.io/my_org/api:7dab5f6    RUNNING          RUNNING 41 seconds ago    work-2  

Cool. What else can it do? Can I update the environment variables? Yep, I can, but obviously it restarts the container(s):

$ ./swarmctl service update api --env PROJECT_NAME=api,TEST=1
0icvt9xvf7ja0yspn26yfvvn8  
$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8  
Name              : api  
Instances         : 2  
Template  
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api, TEST=1]

Task ID                      Service    Instance    Image                             Desired State    Last State                Node  
-------                      -------    --------    -----                             -------------    ----------                ----
0q3kiohsucrimhcl40xomi47h    api        1           quay.io/my_org/api:7dab5f6    RUNNING          RUNNING 1 second ago      work-1  
c1kq8hx4whdbtldb7gtapn0ut    api        2           quay.io/my_org/api:7dab5f6    RUNNING          ACCEPTED 5 seconds ago    work-2  

Wrapping It Up

Docker SwarmKit looks to be an interesting addition to the scheduling and orchestration ecosystem. Until now, any reasonable-scale production infrastructure would use either Kubernetes or Mesosphere (or something homegrown) to manage containers at scale. The current release of SwarmKit appears to be an early building block that we can continue to extend to support different environments. It’s not currently as feature-rich as the other popular schedulers, but it’s built on solid, proven technology and doesn’t seem to be trying to solve everything for everyone. I’m excited to contribute to SwarmKit and help deliver the features I need to deploy and manage this new application.