Refactoring a Dockerfile for Image Size

Marc Campbell | Feb 4, 2016

Update

Since this post, Docker has released improved support for writing complex and still maintainable Dockerfiles. Check out our blog post on multi-stage Docker builds.

Original Post

There’s been a welcome focus in the Docker community recently around image size. Smaller image sizes are being championed by Docker and by the community. When many images clock in at several hundred MB and ship with a large Ubuntu base, it’s greatly needed. Here are the sizes of the top 10 images (latest tag) on Docker Hub today:

[.pre]IMAGE NAME   SIZE
busybox      1 MB
ubuntu       188 MB
swarm        17 MB
nginx        134 MB
registry     423 MB
redis        151 MB
mysql        360 MB
mongo        317 MB
node         643 MB
debian       125 MB[.pre]

A lot of the benefit can be had by simply using a small base image (Alpine Linux, BusyBox, etc.). Enough has been written about using these base images, so I assume you’ve already picked a good one. After that, it’s up to the maintainer of the Dockerfile to know some best practices and keep the image size small. Specifically, we’ll examine the image size implications of joining multiple RUN commands onto one line, and some practical examples of best practices for apt-get (i.e., removing the apt-get cache and using --no-install-recommends).

Remove cruft in the same Dockerfile line where you added it

Docker images are built from a layered filesystem. Each layer only contains the differences between it and the one below it. At the top, you see a unified view, but the history of how it was built is maintained. Each instruction in a Dockerfile (one line, counting any backslash-continued lines as part of it) creates a new layer on top of the existing stack.

For example, let’s start with a Dockerfile snippet that looks like this:

[.pre]ADD https://storage.googleapis.com/golang/go1.5.3.src.tar.gz /tmp/
# do some things with that file
RUN rm /tmp/go1.5.3.src.tar.gz[.pre]

You might think you are doing a good and responsible thing by deleting the .tar.gz file when you are done. But the layer containing that file is still part of the image. The rm command masks it from the final image’s filesystem, but the contents of that .tar.gz file are still in the image layer, and will still be downloaded by everyone who runs docker pull on your image.

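It’s better to download, use, and delete the file within a single RUN instruction, so the file is never committed to any layer. Since ADD always commits its result as a layer of its own, that means fetching the file with a tool like curl inside the RUN instead. For example, a small rewrite of the snippet above would be: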

[.pre]RUN curl -o /tmp/go1.5.3.src.tar.gz \
        https://storage.googleapis.com/golang/go1.5.3.src.tar.gz && \
    <do some things with the file> && \
    rm /tmp/go1.5.3.src.tar.gz[.pre]

It’s not as pretty to look at, but it results in a much smaller image. If that line really annoys you, move the steps into a shell script, ADD the script to the image, and RUN it from the Dockerfile.
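A minimal sketch of the script variant, assuming a hypothetical build-go.sh in the build context that holds the same download, use, and rm steps as the RUN above. COPY behaves like ADD for a plain local file like this one. The layer created by COPY still contains the script itself, but a small shell script costs far less than a source tarball.

```dockerfile
# build-go.sh is a hypothetical script holding the download, use, and
# delete steps. It should begin with "set -e" so that a failing step
# fails the whole RUN instead of being silently skipped.
COPY build-go.sh /tmp/build-go.sh

# The tarball is fetched and deleted inside this one RUN, so it never
# reaches a committed layer.
RUN sh /tmp/build-go.sh && rm /tmp/build-go.sh
```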

Remove your apt/yum cache, but do it right!

Most Dockerfile authors know that you should apt-get remove any unnecessary packages. One common example is an image that’s built with curl and/or wget to download files. You can apt-get remove curl afterwards, but the layer containing it will remain present in the final image. Remove such packages (and all automatically installed dependencies) in the same Dockerfile line where you added them.
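Here is a sketch of that pattern, reusing the Go tarball from earlier. The extraction into /usr/local is only a stand-in for whatever you actually do with the download, and apt-get purge --auto-remove is our choice for also dropping the packages apt installed automatically for curl. Check the result in your own image: --auto-remove removes every automatically installed package that has become unneeded, not just curl’s.

```dockerfile
FROM ubuntu:14.04

# Install curl, use it, and purge it within one RUN instruction, so neither
# curl nor the downloaded tarball is committed to a layer.
# --auto-remove also removes automatically installed packages that are no
# longer needed once curl is gone (e.g. the libraries curl pulled in).
RUN apt-get update && \
    apt-get install -y curl && \
    curl -fsSL -o /tmp/go1.5.3.src.tar.gz \
        https://storage.googleapis.com/golang/go1.5.3.src.tar.gz && \
    tar -xzf /tmp/go1.5.3.src.tar.gz -C /usr/local && \
    rm /tmp/go1.5.3.src.tar.gz && \
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/*
```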

This is especially tricky for complex Dockerfiles, so let’s walk through an example.

In practice, let’s see an example

Here’s a simplified version of a typical Dockerfile that might run a python service. Don’t worry, we will optimize this.

[.pre]FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"][.pre]

my_service.py is a python script that simply contains:

[.pre]#!/usr/bin/python
print 'Hello, world!'[.pre]

Time to build and check the image size:

[.pre]$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   da8a9be731ac   4 seconds ago    360.5 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB[.pre]

Yikes. The 188 MB base image makes sense from the table above, but we’ve practically doubled the image size to run a hello-world python script. What exactly is being reported in the 360.5 MB number? It’s the total of the “visible” layer (the top one, da8… in my example) and all layers that were used to create this top layer.

Adding a cleanup layer

We should probably clean up after ourselves. Let’s try a Dockerfile that looks like this:

[.pre]FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y curl python-pip
RUN pip install requests
## Clean up
RUN apt-get remove -y python-pip curl
RUN rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"][.pre]

Building and checking on that yields:

[.pre]$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   c6dacdd00660   2 seconds ago    361.3 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB[.pre]

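It grew larger (slightly)! Cleaning up after ourselves has backfired! The two cleanup RUN lines can’t shrink the layers beneath them. They only add new layers on top, which record the deletions and also carry any files the commands rewrite along the way (apt-get remove, for instance, updates dpkg’s package database), so the total goes up.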

Cleaning up in the same layer

Let’s try collapsing the apt operations into a single line:

[.pre]FROM ubuntu:14.04
RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"][.pre]

Building and running this version yields:

[.pre]$ sudo docker build -t size .
$ sudo docker images
REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   e531f8674f33   9 seconds ago    338 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB[.pre]

Ok, that made it smaller. But why is it still so huge? I was expecting a lot less.

More apt-optimizations

It turns out that apt-get install brings along a handful of other “recommended” packages. Recommended packages for apt are simply dependencies that may or may not be required. Some users will require them because of their environment or how they use the package, but it’s not always a requirement.

For pip on Ubuntu 14.04, it’s easy to confirm that skipping the recommended packages has no side effects, but this is something you should definitely test before you ship an image off to production. A quick scan of the official images on Docker Hub shows that redis, mysql, mongo, postgres, elasticsearch and more use this technique to make their images smaller.
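One lightweight way to do that testing is to let the build itself fail when trimming packages breaks something. A sketch, assuming your real service imports requests (the hello-world my_service.py in this post doesn’t): add a RUN like this after the single install-and-cleanup RUN.

```dockerfile
# Build-time smoke test: if the requests module can no longer be imported,
# this RUN exits non-zero and docker build stops.
# -B tells Python not to write .pyc files, so the check leaves no new files behind.
RUN python -B -c "import requests"
```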

Let’s try it again, this time passing --no-install-recommends to apt-get install:

[.pre]FROM ubuntu:14.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"][.pre]

Building and running this version yields:

[.pre]REPOSITORY   TAG      IMAGE ID       CREATED          VIRTUAL SIZE
size         latest   fddc30aee4dc   6 seconds ago    229.2 MB
ubuntu       14.04    6cc0fc2a5ee3   2 weeks ago      187.9 MB[.pre]

Ok, that took the image from 338 MB down to 229.2 MB, roughly another 109 MB (and about 131 MB below the 360.5 MB we started with). This looks good.
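If you would rather not repeat the flag on every apt-get install line, apt can also be configured to skip recommended packages by default. A minimal sketch, assuming a config file name of our own choosing; we haven’t re-measured the image with this variant:

```dockerfile
FROM ubuntu:14.04

# Make every later apt-get install in this image behave as if
# --no-install-recommends were passed. The file name is our own choice.
RUN echo 'APT::Install-Recommends "false";' > /etc/apt/apt.conf.d/99-no-install-recommends

RUN apt-get update && \
    apt-get install -y curl python-pip && \
    pip install requests && \
    apt-get remove -y python-pip curl && \
    rm -rf /var/lib/apt/lists/*
ADD ./my_service.py /my_service.py
ENTRYPOINT ["python", "/my_service.py"]
```

The trade-off: the setting stays in the final image (so it also applies to anyone who runs apt-get in an image built FROM yours), and it is easy to overlook when reading the Dockerfile later, whereas the explicit flag documents itself on each install line.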

Create a Dockerfile strategy in your organization to control this. The Dockerfile syntax is easy to learn, but very nuanced when it comes to optimization.
