Hands on with AWS Elastic Container Service for Kubernetes

Amazon’s long-awaited Elastic Container Service for Kubernetes (EKS) is here, which means everybody operating in the world of cloud-native applications and Amazon Web Services should probably develop at least a baseline understanding of what it does and how it works. Replicated is no exception: lots of end-users deploy Replicated-powered applications on AWS, and we’re continuously working to improve our support for Kubernetes as it becomes a more popular development platform for our customers. So, we did our due diligence by examining EKS, and are sharing our first impressions here.

We’ll continue to follow EKS as it matures but, in the meantime, we hope this can be a valuable resource to anybody just diving into it.

[Figure: EKS diagram]

What is EKS?

In a nutshell, EKS makes it possible to click a button and get a Kubernetes control plane running in your AWS account.

More specifically, with EKS, AWS deploys and manages a set of Kubernetes (1.10.3) components for you on managed infrastructure; these are not instances you control or can access. EKS automates the deployment and keeps the kube-apiserver, etcd, the scheduler and more running reliably in a highly available environment. You pay an hourly price, and none of your pods will be scheduled on these nodes. There also aren’t any sizing options: every EKS cluster has the same size control plane.

This is definitely a case where you’re trading control for convenience, and the convenience is real: provisioning and setting up a new Kubernetes cluster is not trivial.

At first look, it appears that Amazon EKS is missing a lot of features, but it’s interesting to think about the transparency it’s providing. It’s not quite as easy as one-click-to-Kubernetes, but you can actually get a fully running, production-grade Kubernetes cluster more easily than with kops or any other tool. But be warned: you still have to do some of the work yourself, and you should come prepared to use Terraform, CloudFormation, Ansible or some other tool to automate the pieces that Amazon doesn’t.

Jeff Barr wrote a walkthrough of setting up a new cluster using the AWS Console here. It’s good to read through this to understand how it works, but HashiCorp also shipped a Terraform provider at the same time AWS launched EKS. If you want to deploy an EKS cluster, I’d recommend that you use Terraform.

Setting up a new test cluster using the AWS Console is a little rough. There are a few manual steps you have to do in the Console such as creating an IAM role, deploying the cluster, creating an autoscaling group, and deploying nodes. Then you have to manually create a .kube/config entry to access the cluster using kubectl. The team at Weave shipped a nice command line utility called eksctl that wraps all of this into a single command. Using eksctl is the fastest way to create a test cluster to experiment with EKS.
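To make the manual .kube/config step concrete, here is a minimal Python sketch that builds the kind of entry kubectl needs. It assumes the exec-based layout that the EKS getting-started docs described at launch, where kubectl shells out to heptio-authenticator-aws for a token. The cluster name, endpoint and certificate data are hypothetical placeholders you would copy from your own cluster’s details; nothing here calls AWS.

```python
import json


def build_kubeconfig(cluster_name, endpoint, ca_data):
    """Return a kubeconfig dict with one cluster, one user and one context.

    endpoint and ca_data are the API server URL and the base64-encoded
    certificate authority shown for the cluster in the AWS Console.
    """
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": cluster_name,
            "cluster": {
                "server": endpoint,
                "certificate-authority-data": ca_data,
            },
        }],
        "users": [{
            "name": "aws",
            "user": {
                # kubectl runs this command to obtain a bearer token.
                "exec": {
                    "apiVersion": "client.authentication.k8s.io/v1alpha1",
                    "command": "heptio-authenticator-aws",
                    "args": ["token", "-i", cluster_name],
                },
            },
        }],
        "contexts": [{
            "name": cluster_name,
            "context": {"cluster": cluster_name, "user": "aws"},
        }],
        "current-context": cluster_name,
    }


if __name__ == "__main__":
    config = build_kubeconfig(
        cluster_name="demo-cluster",                   # hypothetical
        endpoint="https://EXAMPLE.eks.amazonaws.com",  # hypothetical
        ca_data="BASE64_CA_CERT",                      # hypothetical
    )
    # JSON is valid YAML, so the output can be saved as a kubeconfig file.
    print(json.dumps(config, indent=2))
```

Saving the output to a file and pointing the KUBECONFIG environment variable at it should be enough for kubectl to find the cluster, provided the authenticator binary is on your PATH.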

Comparing EKS to GKE and AKS

For anyone who has some experience with the Google Cloud or Azure managed Kubernetes services (GKE and AKS, respectively), here’s a quick comparison of how EKS stacks up on features.

Feature | GKE | AKS (Preview) | EKS
Automatic worker node provisioning | yes | yes | no
Default storage class for disk allocation | yes | yes | self-deployed
Highly available, managed master nodes | yes | yes | yes
Private clusters | yes | no | yes
Kubernetes versions supported for new clusters (as of 06/06/2018) | 1.8.8 - 1.10.2 | 1.7.7 - 1.9.6 | 1.10.3
Ingress provisions cloud load balancer | yes | yes | ??
Rolling node updates (automatically moving to patch releases of k8s) | yes | yes | TBD
CNI (networking) | Custom for GCP | Azure CNI | VPC CNI or Calico
Multi-AZ nodes | yes | ?? | yes
Auto scaling | yes | yes (but needs K8S 1.10.x which isn't available yet) | no (but you could create an ASG manually)
Native kubectl support | yes | yes | yes

What EKS does

Provisions and operates the master nodes

Provisioning etcd and the Kubernetes API server isn’t trivial. Amazon has automated this, as well as bootstrapping consensus between these nodes. This is a convenient feature to offer; it looks like they are also provisioning a load balancer with a static CNAME record, so you can configure kubectl once and let Amazon make sure the services continue to resolve.

Allows you to create Kubernetes nodes in a private VPC

Launching with private VPC support on day 0 is great. Most people should have private clusters, and use an internet gateway for outbound and ELB/ALB for inbound. Amazon doesn’t create that VPC for you, but relies on you having a VPC already running. Because so many people have invested in building AWS infrastructure, Amazon chose to offer support for existing VPCs and can lean on the fact that there’s a community of tools and knowledge around provisioning and connecting to VPCs.

Calico or VPC CNI for pod networking

Amazon has two offerings for container (pod) networking: Calico and a custom VPC CNI plugin. The custom VPC CNI plugin allocates IP addresses for pods right from the VPC subnet. This is nice for transparency and the ability to have a single networking layer between pods and any other service on the VPC.

However, there are limits on the number of IP addresses that can be assigned per instance, and this can be restrictive when running large workloads on EKS. If you expect to deploy more pods than those limits allow, then you’ll want to configure Calico as an overlay network and not use the provided VPC CNI plugin.
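To get a feel for the per-instance limit, here is a rough back-of-the-envelope calculation in Python. It assumes the commonly cited VPC CNI sizing rule: each network interface (ENI) gives its secondary IP addresses to pods, and two extra pods are allowed for those running on the host network. The ENI and IP counts for the two instance types are recalled from the EC2 limits table and should be checked against the current AWS documentation before you rely on them.

```python
def max_pods(enis, ips_per_eni):
    """Estimate how many pods a node can run with the VPC CNI plugin.

    The primary IP of each ENI is kept for the node, so only
    (ips_per_eni - 1) addresses per ENI are available for pods.
    The +2 covers pods that use the host network instead of a VPC IP.
    """
    return enis * (ips_per_eni - 1) + 2


# Assumed limits per instance type: (ENIs, IPv4 addresses per ENI).
ASSUMED_LIMITS = {
    "t2.medium": (3, 6),
    "m5.large": (3, 10),
}

if __name__ == "__main__":
    for instance_type, (enis, ips) in ASSUMED_LIMITS.items():
        print(instance_type, max_pods(enis, ips))
    # t2.medium 17
    # m5.large 29
```

If the estimate for your instance type is well below the number of pods you plan to schedule per node, that is the signal to look at larger instances or at Calico.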

Integrates in-cluster RBAC with IAM

Amazon is deploying Heptio Authenticator into EKS clusters to enable a tight integration between IAM and Kubernetes RBAC. This is a nice feature that allows existing AWS accounts (including SAML) to authenticate into kubectl for management tasks. For example, it’s great that I can give folks on the team kubectl access to get logs, but not to deploy new resources.
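As an illustration of the "logs but no deploys" case, here is a minimal Python sketch that generates a read-only RBAC Role and a RoleBinding. It assumes the IAM identities in question have already been mapped to a Kubernetes group in the authenticator’s configuration; the group name log-readers, the role names and the namespace are all hypothetical.

```python
import json


def log_reader_role(namespace):
    """Role that can read pods and their logs, and nothing else."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "log-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],                  # "" is the core API group
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list"],           # no create/update/delete
        }],
    }


def bind_group(namespace, group):
    """RoleBinding that grants the log-reader Role to a group."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "log-reader-binding", "namespace": namespace},
        "subjects": [{
            "kind": "Group",
            "name": group,
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        "roleRef": {
            "kind": "Role",
            "name": "log-reader",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and group name.
    for manifest in (log_reader_role("default"),
                     bind_group("default", "log-readers")):
        print(json.dumps(manifest, indent=2))
```

Each printed manifest can be saved to its own file and applied with kubectl apply -f, since kubectl accepts JSON as well as YAML.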

What EKS doesn’t do

Node provisioning

Unlike other managed Kubernetes services, EKS leaves the task of provisioning nodes to the user. However, its docs do include CloudFormation templates for provisioning the remote nodes and creating an autoscaling group. While it’s sort of great that you have access to all of these underlying AWS items, it’s not really a managed service if you have to manage all of this yourself.

Because AWS has a mature and widely adopted autoscaling group product, it’s pretty easy to see why it decided to let operators manage this on their own. Perhaps Amazon is holding off on this feature to encourage more Fargate use when they launch an integration between Fargate and EKS later in 2018. Managing the autoscaling group yourself also gives you autoscaling based on your own criteria and provides additional ways to control costs.

Storage classes

Kubernetes uses storage classes to provision persistent volumes when a persistent volume claim is deployed to a cluster. This is a nice feature, as it allows a Kubernetes YAML file to allocate some storage, even if it doesn’t already exist.

Amazon has programmable storage (EBS) as a core and mature component of its cloud offering. But, for some reason, it requires everyone to manually define the storage class in their EKS clusters, following the instructions here. Amazon could have deployed this as a default storage class to provide a “batteries included” cluster, and allowed users who didn’t want EBS storage to remove or edit it. It seems like the common use case will be to deploy this, and leaving it out of the default clusters will probably cause some early challenges when adopting EKS.
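For reference, the storage class in question is small. The Python sketch below generates one that uses the in-tree kubernetes.io/aws-ebs provisioner and marks it as the cluster default, plus a persistent volume claim that would be satisfied by it. The class name gp2, the claim name and the requested size are illustrative choices, not values prescribed by Amazon.

```python
import json


def ebs_storage_class(name="gp2", default=True):
    """StorageClass backed by EBS volumes of the given type."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "kubernetes.io/aws-ebs",
        "parameters": {"type": "gp2"},
    }
    if default:
        # Claims that name no class fall back to the default class.
        manifest["metadata"]["annotations"] = {
            "storageclass.kubernetes.io/is-default-class": "true",
        }
    return manifest


def volume_claim(name, size, storage_class=None):
    """PersistentVolumeClaim; omit storage_class to use the default."""
    spec = {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": size}},
    }
    if storage_class is not None:
        spec["storageClassName"] = storage_class
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(ebs_storage_class(), indent=2))
    print(json.dumps(volume_claim("demo-data", "10Gi"), indent=2))  # hypothetical claim
```

With the class marked as default, a claim like the second manifest should trigger creation of an EBS volume without naming a class at all, which is the behavior a “batteries included” cluster would give you out of the box.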

Documentation

There’s simply not much documentation on EKS, and there are no docs on how to set up ingress or many other common Kubernetes tasks. The troubleshooting guide suggests there wasn’t a lot of customer feedback on the project while EKS was in preview.

Amazon doesn’t have to document Kubernetes, but it does need to provide detailed documentation on best practices for integrating an EKS cluster with other native AWS services. Documentation should at least have results when searching for Kubernetes terms like “ingress” and “pvc”. Amazon is a little late to the managed Kubernetes game, and it would be a good idea to take advantage of the existing developer knowledge.

More regions

Currently, EKS is only available in us-east-1 and us-west-2. Regulation will prevent a lot of adoption of EKS until non-US-based regions are supported. Azure’s AKS service operates in 4 regions, but they are all in North America at this time. By comparison, Google Cloud’s GKE service supports zones across the globe and has the most locations, by far.

Fargate

EKS will be great when Amazon can auto-provision nodes using its Fargate service. I don’t want to manage a set of workers and deal with scaling and updating them. Fargate should solve this problem.

Open Questions

  • When new versions of Kubernetes are released, how will this be managed and applied?
  • How current will Amazon keep its Kubernetes version? Kubernetes is a fast-moving project, and it’s pretty important to stay current.
  • How will Fargate and EKS integration look when it ships?
  • When this matures a little more, what does it mean for Elastic Container Service? Is there a future for ECS when it’s just as easy to deploy an EKS cluster?
  • When will EKS support non-US regions?