I was planning to deploy a test environment for a new application today, then the release of Docker SwarmKit came. I saw this as the perfect opportunity to spend part of the day giving SwarmKit a try. This post is a very early look at my experience installing SwarmKit on EC2 servers.
At Replicated we write a platform which allows SaaS companies (including several dev tools) to deploy into private environments by using Docker. I’ve become quite familiar with the ins and outs of schedulers and orchestrators while building our platform over the past 2 years. We’ve even built our own scheduler and orchestration runtime for Docker containers to support some of our early customers. I’ve set up and run Kubernetes and Mesosphere clusters, and am familiar with running a containerized production environment.
After going through the process of deploying an application using `swarmctl`, I did a quick analysis of what I like about SwarmKit vs. the missing tooling that I would have to find (or likely build) to get it working in a production environment. While I was writing this, more commits landed in the README showing additional use cases and how to use what’s built in. I’d recommend checking out the latest SwarmKit docs to see what’s new there.
Setting up a cluster was quite easy: I ran a couple of commands, and everything synced up. Docker’s decision to build service discovery in here was a great one. By comparison, setting up a Kubernetes cluster takes considerably more effort unless you want to use Google Container Engine.
Obviously, running containers is the reason I set up a cluster. I was able to easily run my container, scale instances up and down, and edit environment variables and other properties of the containers without much effort. Even when something was wrong in my container that caused it to restart, `swarmd` did a really good job of cleaning up the old stopped containers after I ran `swarmctl service rm`.
I’ve been using Docker for a while, and `swarmctl` did not introduce a big learning curve. I know it’s early and will get more complex quickly as new features are added, but right now this is very approachable, and it’s easy to get a cluster running.
Secure By Default
I didn’t have to manually create or transfer any TLS certs. The cluster auto-provisioned itself easily and just worked: a great example of making a cluster secure by default. If distributing certs were an optional, manual step, there would be a lot of insecure clusters out there. I like that all communication happens over TLS-encrypted connections.
Honestly, I didn’t play with it long enough to judge stability. SwarmKit was announced less than 24 hours ago, so production-grade stability will come with time. But it’s built on proven technology (Docker, Raft), and the foundations for stability and security look very solid.
You can add, remove, and drain traffic from a node. This is a building block that I can use to build a custom update policy on. If I have a custom and complex upgrade strategy, I think I could build on top of this functionality to create rolling upgrade functionality.
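As a sketch of how those building blocks might compose for a single-node maintenance cycle (I believe `node drain` and `node activate` are the relevant subcommands, but I’d verify them against `swarmctl node --help` on the current build):

```
$ ./swarmctl node drain work-1      # stop scheduling here; existing tasks move off
$ # ...upgrade or maintain work-1 while it holds no tasks...
$ ./swarmctl node activate work-1   # return the node to the scheduling pool
```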
Service Spec Files
To get this deployed into a good production environment with change management, release history, and all of the features I’d want, I need to be able to define a ServiceSpec in a file and pass it to the service create/update commands. Instead of `swarmctl service create --name redis --image=redis --env KEY=VALUE`, I want to use `swarmctl service create --file redis.yaml`. When I run `swarmctl service create --help`, it shows this as an argument, but I looked at the code and I don’t think it’s implemented yet. There is some discussion at https://github.com/docker/swarmkit/issues/537, but it’s a little confusing because it doesn’t match up with the code; I think some old issues are no longer relevant and need to be cleaned up. I’m not positive that the help text of the CLI matches what is actually supported.
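For reference, here’s the shape I’d expect such a spec file to take. This is purely hypothetical – since `--file` doesn’t appear to be wired up yet, there is no documented format – but something compose-like would feel natural:

```
# redis.yaml – hypothetical ServiceSpec; not a format SwarmKit defines today
name: redis
image: redis
instances: 3
env:
  - KEY=VALUE
```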
Load Balancers / Ingress
It’s not immediately obvious what Docker’s plan is here, and I’m not sure how I’m expected to set up ingress to these containers. There’s an ingress network defined by default, but I’m running in EC2, so I’d like to use an ELB. My servers don’t have public IPs, so how can I expose my service? I definitely would like `swarmd` to manage this for me: it’s the scheduler, and it places the containers. I couldn’t make this work yet, and I’m not sure whether I failed to figure it out or it just isn’t completely supported yet.
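In the meantime, the fallback is managing the load balancer out of band: publish a fixed host port and register the worker instances with a classic ELB myself. A sketch with the AWS CLI (the ELB name and instance IDs are placeholders):

```
$ aws elb register-instances-with-load-balancer \
    --load-balancer-name api-elb \
    --instances i-0aaa1111 i-0bbb2222 i-0ccc3333
```

This obviously doesn’t track the scheduler as it moves containers around, which is exactly why I’d prefer `swarmd` to own it.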
Custom Rolling Updates
While I can `swarmctl service update api ...`, I’d like to define very specific, custom rolling-update strategies: safely drain my container, stop it, start the new version, and halt the rollout if the upgrade isn’t working. I know I’ll have to write some code to support this, and I’m eager to; I just need to figure out how to integrate it with SwarmKit. Some defaults and samples would be a great addition here.
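The driver I have in mind is simple: push the new image with `service update`, poll until every task reports RUNNING, and roll back on timeout. The convergence check at its heart is just text processing over `swarmctl service inspect` output. A minimal sketch (`count_running` is my own hypothetical helper; the sample lines mimic the inspect output later in this post):

```shell
#!/bin/sh
# count_running: count tasks whose Desired State and Last State are both
# RUNNING in `swarmctl service inspect` output read from stdin. A rollout
# driver would poll this until it equals the requested instance count,
# and roll the image back if it never converges.
count_running() {
  grep -Ec 'RUNNING[[:space:]]+RUNNING'
}

# Demo on a captured two-task snippet: one task converged, one still ACCEPTED.
printf '%s\n%s\n' \
  'task1  api  1  img  RUNNING  RUNNING 3 minutes ago   work-1' \
  'task2  api  2  img  RUNNING  ACCEPTED 5 seconds ago  work-2' | count_running
# prints 1
```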
Run swarmd as a System Service
I want to run `swarmd` as a system service, from a known release, pushed out and managed with upstart or systemd. This isn’t hard to do myself and doesn’t need to be a core part of SwarmKit.
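A minimal systemd unit would look something like this (the flags mirror my manual invocation below; the unit name, data directory, and binary location are my own choices):

```
# /etc/systemd/system/swarmd.service – minimal sketch
[Unit]
Description=SwarmKit worker daemon
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/local/bin/swarmd -d /var/lib/swarmd --hostname %H --join-addr <mgmt_01_address>:4242
Restart=always

[Install]
WantedBy=multi-user.target
```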
Remote Cluster Logs
With Kubernetes, I can run `kubectl logs <podname>`. I’d love to be able to run `swarmctl service <service-name> logs -f` for debugging or monitoring a running system.
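Until something like that exists, the workaround is to find the node running the task and use plain `docker logs` over SSH:

```
$ ./swarmctl service inspect api             # the Node column shows where each task runs
$ ssh work-1 docker ps --filter name=api     # find the container ID on that node
$ ssh work-1 docker logs -f <container-id>
```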
Private Repository Credentials
I need to deploy private images that I store on Docker Hub and quay.io. I think I can manage this with `docker-machine`, but it wasn’t immediately obvious. In the deployment below, I manually pulled my image from quay.io on each node in the cluster, which wouldn’t work in a production environment.
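That manual step amounted to distributing registry credentials and pre-pulling on every worker, roughly like this (it assumes SSH access to the nodes and a working `docker login` on the management box):

```
$ for host in work-1 work-2 work-3; do
>   scp ~/.docker/config.json "$host":~/.docker/config.json   # copy registry credentials
>   ssh "$host" docker pull quay.io/my_org/api:7dab5f6
> done
```

It works, but nothing here tracks the cluster as nodes are added or removed.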
My Initial Setup & Deploy Process
How did I come to those conclusions? I went through the process of setting up and deploying a relatively standard SaaS environment on SwarmKit; below is a chronicle of that initial experience.
The deployment for this application shouldn’t be a difficult one. It involves a few components:
- A static site built with React
- Another static site built with React
- An API written in Golang
- A worker, also written in Golang
I can scratch MySQL, Elasticsearch, and RabbitMQ from today’s deployment; I’ll swap them out for hosted services (RDS, elastic.co and SQS). But I want to make sure that anything I deploy to my own instances is managed in a scalable Docker cluster. And since I will eventually ship this to private environments, no significant proprietary tools are allowed.
I decided to set up an entirely new VPC in us-west-1 (Northern California) to do this. I set up 6 subnets (2 public, 2 private and 2 db). Now I have address space in us-west-1a and us-west-1c available for everything. I set up an OpenVPN server and configured it so that servers in the private subnets don’t have public IP addresses but I can access them over the VPN. This will make life easy for now.
Great, I’m in. But I need a manager machine also. So I turned on a t2.medium instance in one of my private subnets to get started. This is the machine I will use to manage the cluster, push out updates, and troubleshoot/monitor the containers.
This is the part I’ve been waiting for. The last hour was just setting up some infrastructure; now it’s time to play with the new stuff!
Next, I install Docker 1.11.2 on the management node – the current release of Docker, and the one I’d expect SwarmKit to be compatible with. I also need to build the SwarmKit binaries from master (the only branch) of the repo to get started.
```
$ git clone https://github.com/docker/swarmkit.git
Cloning into 'swarmkit'...
remote: Counting objects: 11236, done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 11236 (delta 8), reused 0 (delta 0), pack-reused 11209
Receiving objects: 100% (11236/11236), 6.94 MiB | 1.49 MiB/s, done.
Resolving deltas: 100% (7199/7199), done.
Checking connectivity... done.
$ docker run -it -v `pwd`/swarmkit:/go/src/github.com/docker/swarmkit golang:1.6 /bin/bash
Unable to find image 'golang:1.6' locally
1.6: Pulling from library/golang
51f5c6a04d83: Pull complete
a3ed95caeb02: Pull complete
7004cfc6e122: Pull complete
5f37c8a7cfbd: Pull complete
e0297283ad9f: Pull complete
a7164db3234c: Pull complete
6bb08da223d8: Pull complete
c718b2eba451: Pull complete
Digest: sha256:66618c0274d300e897bcd2cb83584783e66084ea636b88cb49eeffbeb7f9b508
Status: Downloaded newer image for golang:1.6
root@<container-id>:/go# cd /go/src/github.com/docker/swarmkit/
root@<container-id>:/go/src/github.com/docker/swarmkit# make binaries
🐳 bin/swarmd
🐳 bin/swarmctl
🐳 bin/swarm-bench
🐳 bin/protoc-gen-gogoswarm
🐳 binaries
root@<container-id>:/go/src/github.com/docker/swarmkit# exit
```
Using these newly built binaries:
```
$ swarmkit/bin/swarmd -d /tmp/node-mgmt-01 --listen-control-api /tmp/mgmt-01/swarm.sock --hostname mgmt-01
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO 4a678cf4eff2b943 became follower at term 2
INFO newRaft 4a678cf4eff2b943 [peers: , term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]
WARN ignoring request to join cluster, because raft state already exists
INFO 4a678cf4eff2b943 became follower at term 2
INFO newRaft 4a678cf4eff2b943 [peers: , term: 2, commit: 8, applied: 0, lastindex: 8, lastterm: 2]
INFO Listening for local connections addr=/tmp/mgmt-01/swarm.sock proto=unix
INFO Listening for connections addr=[::]:4242 proto=tcp
INFO 4a678cf4eff2b943 is starting a new election at term 2
INFO 4a678cf4eff2b943 became candidate at term 3
INFO 4a678cf4eff2b943 received vote from 4a678cf4eff2b943 at term 3
INFO 4a678cf4eff2b943 became leader at term 3
INFO raft.node: 4a678cf4eff2b943 elected leader 4a678cf4eff2b943 at term 3
INFO node is ready
```
Great. This is better. I have a management node running.
This application isn’t going to receive much traffic immediately, so I decided to start with a relatively small 3-node SwarmKit cluster of t2.medium instances. I turned on these servers in the private subnets and installed docker-engine. The servers are up, but they’re not a cluster yet – just 3 machines sitting in a VPC that can communicate on an internal network.
I didn’t want to build the SwarmKit binaries each time, so I copied the bins to my 3 new servers and bootstrapped them as workers 1, 2 and 3 using these commands:
```
$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmd .
$ scp ubuntu@<mgmt_01_address>:/home/ubuntu/swarmkit/bin/swarmctl .
$ swarmd -d /tmp/node --hostname work-<N> --join-addr <mgmt_01_address>:4242
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO node is ready
```
Amazingly simple. I think this worked. I don’t see anything that says it failed to connect, but it also doesn’t have a successfully connected message. It’s ok, I’m going to push forward.
Back to the management node, I run:
```
$ export SWARM_SOCKET=/tmp/mgmt-01/swarm.sock
$ ./swarmctl node ls
ID             Name     Membership  Status  Availability  Manager status
--             ----     ----------  ------  ------------  --------------
0fd1wrr78xdld  work-1   ACCEPTED    READY   ACTIVE
14qektqj267gj  mgmt-01  ACCEPTED    READY   ACTIVE        REACHABLE *
2mi1lv4edolas  work-3   ACCEPTED    READY   ACTIVE
2rvbyfbhcgi2h  work-2   ACCEPTED    READY   ACTIVE
```
We have a cluster! Let’s deploy my container!
Wait – the README sort of tapers off here. I can deploy redis, but what fun is that? I want to deploy my own custom image. I have a few services to deploy, and I plan to start by creating a service to describe and run my API on multiple workers in my new cluster. I suspected that the spec passed to `swarmctl service create -f <filename>` would use the same definition as a service in a docker-compose YAML. But after experimenting and reading the code, I just don’t think this is implemented. It really doesn’t look like I can create a service from a spec file, although it’s listed in the CLI help:
```
$ ./swarmctl service create --help
Create a service

Usage:
  ./swarmctl service create [flags]

Flags:
      --args value       Args (default )
      --env value        Env (default )
  -f, --file string      Spec to use
      --image string     Image
      --instances uint   Number of instances for the service Service (default 1)
      --mode string      one of replicated, global (default "replicated")
      --name string      Service name
      --network string   Network name
      --ports value      Ports (default )

Global Flags:
  -n, --no-resolve      Do not try to map IDs to Names when displaying them
  -s, --socket string   Socket to connect to the Swarm manager (default "/tmp/mgmt-01/swarm.sock")
```
This isn’t a big deal right now. I’m going to push forward and deploy my service manually:
```
$ ./swarmctl service create --name api --image quay.io/my_org/api:7dab5f6 --env PROJECT_NAME=api
0icvt9xvf7ja0yspn26yfvvn8
```
I’m skipping over some manual steps required to get that private image pulled. And that’s only one env var. But it works and I feel very confident in extending this to support all required environment variables and volumes and ports.
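Extending that create call looks straightforward. A sketch of what a fuller one might look like (the extra env var is hypothetical, and I haven’t verified the exact value format `--ports` expects):

```
$ ./swarmctl service create --name api \
    --image quay.io/my_org/api:7dab5f6 \
    --env PROJECT_NAME=api,LOG_LEVEL=debug \
    --ports 3000/tcp \
    --instances 2
```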
```
$ ./swarmctl service ls
ID                         Name  Image                       Instances
--                         ----  -----                       ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  1
```
My container is deployed and running. Let’s start to figure out what swarmctl can do.
Scaling this service up:
```
$ ./swarmctl service update api --instances 2
0icvt9xvf7ja0yspn26yfvvn8
$ ./swarmctl service ls
ID                         Name  Image                       Instances
--                         ----  -----                       ---------
0icvt9xvf7ja0yspn26yfvvn8  api   quay.io/my_org/api:7dab5f6  2
$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8
Name              : api
Instances         : 2
Template
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api]

Task ID                    Service  Instance  Image                       Desired State  Last State              Node
-------                    -------  --------  -----                       -------------  ----------              ----
bartp5krui1815paw2srmtd28  api      1         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 3 minutes ago   work-1
70kexpn10suulum0hxursil28  api      2         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 41 seconds ago  work-2
```
Cool. What else can it do? Can I update the environment variables? Yep, I can, but obviously it restarts the container(s):
```
$ ./swarmctl service update api --env PROJECT_NAME=api,TEST=1
0icvt9xvf7ja0yspn26yfvvn8
$ ./swarmctl service inspect api
ID                : 0icvt9xvf7ja0yspn26yfvvn8
Name              : api
Instances         : 2
Template
 Container
  Image           : quay.io/my_org/api:7dab5f6
  Env             : [PROJECT_NAME=api, TEST=1]

Task ID                    Service  Instance  Image                       Desired State  Last State             Node
-------                    -------  --------  -----                       -------------  ----------             ----
0q3kiohsucrimhcl40xomi47h  api      1         quay.io/my_org/api:7dab5f6  RUNNING        RUNNING 1 second ago   work-1
c1kq8hx4whdbtldb7gtapn0ut  api      2         quay.io/my_org/api:7dab5f6  RUNNING        ACCEPTED 5 seconds ago  work-2
```
Wrapping It Up
Docker SwarmKit looks to be an interesting addition to the scheduling and orchestration ecosystem. Until now, any reasonable-scale production infrastructure would use Kubernetes or Mesosphere (or roll its own tooling) to manage containers at scale. The current release of SwarmKit appears to be an early building block that we can continue to extend to support different environments. It’s not yet as feature-rich as the other popular schedulers, but it’s built on solid, proven technology and doesn’t seem to be trying to solve everything for everyone. I’m excited to contribute to SwarmKit and help deliver the features I need to deploy and manage this new application.
First Look at Docker SwarmKit