A Generic Structure for Efficient and Minimal Docker Image Builds
30 Apr 2021

As a growing software-first company, you will, sooner or later, notice that using VMs with provisioning is insufficient for actual horizontal scaling. The solution is to follow the cloud-based approach: create multiple instances of your software from container images. However, to go to the cloud, your software has to be bundled smartly and reliably to produce minimal and secure images. We show a generic approach that can help you structure your packaging. Furthermore, we extend that approach to deliver container images that require a one-time creation of runtime artifacts. With this approach, we at AISLER were able to reduce our deploy time by 50%.
A long story short: The rise of cloud computing
Traditionally, software was deployed on-premise in VMs. This required a manual creation of VMs followed by the deployment of the software. As it turned out, this approach is unreliable since deployments are not reproducible. Documentation of the changes to the host VM in wikis or other documents easily gets out of sync with reality. At this point, various tools jumped in, e.g., Ansible, Chef, or Puppet, enabling VM provisioning. Their configuration files fill the gap of missing system documentation and are executable for deployment at the same time. However, VM provisioning is insufficient to achieve actual on-demand horizontal scalability. All new resources have to be configured at least once, creating a time and management overhead. Furthermore, isolation between different services is hard to achieve: package and configuration spill-overs between services are hard to prevent.
A cloud-based approach with containers enables automatic deployment by going the other way around: configure and deploy everything once, save it as an image, and then only create running copies, aka containers. The creation process for suitable images sounds easy but poses challenges. In the following, we will present what these challenges were for us in our daily work and how we solved them.
A Swiss Army knife for containers
When building images, you cannot avoid one name: Docker. Docker is the popular choice to build and share images and to deploy them as containers. Although Docker invented neither images nor containers, and one does not have to use Docker for the deployment of images, the Docker build tools are extremely helpful. They facilitate the creation of custom images. Furthermore, the vibrant community provides a large stock of images to build on, making life even easier. Therefore, we use Docker to create our images.
Criteria for good images
We want to build good container images. But what makes a good image? Without a sufficient definition, it is hard to recognize an actual success. Therefore, we first set up a list of criteria to fulfill, for which we can then optimize. In general, criteria for image creation do not differ that much from traditional provisioning and deployment. However, we have to be more careful since the full provisioning of the system runs every time we package our software. Therefore, the prioritization has to be adapted. We see the following criteria for a container build process:
- Security: We see this as the number one priority. Without a properly secured image, you should not do anything else. Especially in times of copy-paste solutions, you quickly shoot yourself in the foot. We consider an image secure if it provides minimal attack vectors and knows only as much as it needs to, i.e., an attacker has no access to credentials when obtaining the image itself. As with traditional deploys, where the host system has to be chosen and configured, this criterion is independent of the security of the software you run.
- Size: Size is crucial for distribution, deployment, and scalability. Everything except your software and what it requires is superfluous. Note that this also has an impact on the next criterion.
- Time: When you are living DevOps and Continuous Delivery, you deploy often. This means your build pipeline assembles the image and deploys your software multiple times a day. The time required should be as low as possible.
Structure is the key
A Dockerfile fulfilling the criteria varies with your project’s requirements and tech stack. Therefore, we first discuss the generic structure we followed for a long time before diving into actual code. We think that the correct structure is crucial, since almost all specific requirements should be assignable to one generic step if you do not want to start from scratch on each change. For quick builds with Docker, the order is especially important: Docker reuses unchanged layers, so we can easily save computation time and network load on each run.
Our structure to leverage this and produce good images is, with four steps, quite short:
FROM $GOOD_BASE_IMAGE:$USED_VERSION
# Basic configuration
# Install dependencies
# Import code
# Configure start
Let’s discuss these steps briefly.
The heart of every image is the base image, so you should choose wisely. It has a big effect on the security, the work you have to do, and the memory and computational footprint of your containers. In general, I recommend choosing a well-maintained base image of your preference that gets updates and is based on a distro you know how to handle. Luckily, the Docker community provides a lot of images that are well maintained. A few words on the variant: most images are in turn based on Debian and/or Alpine. If you are used to the former, you can take it; the latter is heavily optimized for size and may require further work to install everything you need. Since Debian is quite big, you often get slim images. I recommend taking them since there is no easier way to reduce size. However: Alpine is a good choice if you are willing to go the extra mile. Independent of the choice, but very important for reproducibility, is to use a version tag. Nothing is worse than using latest and getting a failing build because, e.g., the Debian base version changed.
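To make this concrete, a pinned base image could look like the following minimal sketch (the image name and tag are just examples):
# Pin a specific slim tag for reproducible builds
FROM ruby:2.7.6-slim
# Avoid the moving target; a rebuild may silently pull a different base
# FROM ruby:latest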
The first step is basic configuration. Put everything here that you will rarely if ever change. This sets the foundation. We use it for setting our working directory and exposing the port. We also set up an unprivileged user that runs our software. This is a security feature. As with normal local execution: run things with the lowest permissions possible. Depending on the mapping to local users, you otherwise risk your host system in the worst case. Important: Make sure the user has the permissions to do what is necessary during runtime. It should be able to access temporary folders or caches, BUT it should never be allowed to change code! If you are interested in what else you can do, I recommend the OWASP cheatsheet for Docker or these Dockerfile best practices.
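Assuming an image built from such a Dockerfile and tagged my-app (the tag is just a placeholder), you can quickly verify that containers do not run as root:
docker run --rm my-app id
# should report the unprivileged user, not uid=0(root)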
Next, one should install the dependencies. Copy ONLY the files for the dependency management tool of your development stack into the container. Then run one big command that installs the distro’s package dependencies and the development stack’s dependencies AND cleans up. The last bit is important since the installation usually leaves caches behind that you do not need. What you have to install depends on your software and your distro: it is trial and error. My recommendation here: better spend some time finding the minimal set of required packages instead of installing one big package allocating gigabytes of space.
Now import the code. Why do we split up copying the code and the dependency files? It is our first step to save build time. Changes to the code alone do not require a reinstall of the dependencies; the cached layer can simply be reused.
Finally, configure the container start. Set environment variables such as the runtime mode, e.g., production or staging. Do not forget to switch to the previously created unprivileged runtime user here. Last but not least, set the ports and the default command. We do all of these things last since they add little overhead and can quickly be changed without triggering a full rebuild.
Real-world implementation
We use Ruby and Ruby on Rails as our primary language and framework. Therefore, our real-world example shows the setup of a Rails application. Although we use Ruby and Rails-specific commands to configure the image, there should exist similar or corresponding commands in most other languages and frameworks. Now, how does that look in code? Our Dockerfile looks as follows:
# Good base image with version
FROM ruby:2.7.6-slim
# Basic configuration
WORKDIR /app
EXPOSE 3000
RUN groupadd -r service-user && \
    useradd -r -m -g service-user service-user && \
    mkdir -p /app/tmp && \
    chown -R service-user:service-user /app/tmp
# Install dependencies
COPY Gemfile Gemfile.lock ./
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      a \
      list \
      of \
      packages \
    && \
    bundle install \
      --clean \
      --deployment \
      --frozen \
      --jobs=$(nproc) \
      --without development test \
    && \
    rm -rf $GEM_HOME/cache && \
    apt-get clean && \
    rm -rf vendor/cache/* && \
    rm -rf /var/lib/apt/lists/*
# Import code
COPY . .
# Configure start
USER service-user
ARG ENV=production
ENV RAILS_ENV=$ENV \
RAKE_ENV=$ENV \
RAILS_LOG_TO_STDOUT=true
CMD ["bin/rails", "server", "-b", "0.0.0.0", "-p", "3000"]
We use a Docker community image with the Ruby version we currently use as a base. Bound by our dependencies, we went with Debian instead of Alpine. First, we configure the working directory and set up a non-root user to run our software. Important: we use Rails, so we have to make sure that user can read and write the temporary folder. In the second step, the dependencies are installed after installing the packages they require. We clean up after ourselves by wiping the caches. We can then copy the source code. Finally, we switch to our user, set the environment variables, and configure the command.
So the usage of a good base image and a dedicated user with limited permissions forms a solid foundation for a secure image. Furthermore, only our software and its requirements are present in the image. Combined with compact commands and a good order, we achieve a small image size and a reduced build time.
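If you want to see the effect for yourself, build the image and inspect its size and layers (the tag my-app is again just a placeholder):
docker build -t my-app .
docker image ls my-app   # check the resulting image size
docker history my-app    # inspect the individual layers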
Taking it one step further: Building static assets
The presented approach works and we used it for a long time. It comes, however, with a caveat: it assumes that everything we install is used during runtime. This limits its usability when external build tools come into play that are only needed once, e.g., tools that compile code, generate code, or bundle web assets such as webpack. We could add everything to the Dockerfile and install, build, and finally remove these tools. However, we have to do it in one command, otherwise the tools end up in the final image anyway. We thereby can no longer benefit from layer caching: no reuse is possible since the command has to run every time. The larger the build toolset and its dependencies are, the worse this disadvantage gets.
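Such a single command could look roughly like this sketch, assuming a Node.js-based asset build (the package list is illustrative):
RUN apt-get update && \
    apt-get install -y --no-install-recommends nodejs && \
    bin/rails assets:precompile && \
    apt-get purge -y nodejs && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*
# Everything has to happen in one RUN, so this layer is rebuilt on every code change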
Furthermore, one additional issue with the previous approach popped up over time for us: internal repositories. Passing credentials for accessing them, e.g., as build arguments, does not prevent them from being stolen: Docker bakes them into the image. Even though we do not publish internal images, we prefer not to have the credentials in there.
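You can check this yourself on any image built that way (the image name is again a placeholder): build arguments consumed by a RUN instruction show up in the layer metadata, including their values.
docker history --no-trunc my-app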
Docker comes to help with multi-stage builds. They enable you to define multiple images in one Dockerfile, copy files between them, and benefit from layer reuse at the same time. If you want to learn more about it, take a look at the official Docker multi-stage build documentation. We use it to introduce one image that is only used for building assets. We copy the results into the final runtime image. Thereby, our build toolset does not end up in the final image. And: build arguments are only baked into the building image. Since only files are copied over, the credentials are left behind.
We arrive at the following generic structure for this approach:
ARG BUILD_IMAGE=$GOOD_BUILD_BASE_IMAGE:$USED_VERSION
ARG BASE_IMAGE=$GOOD_BASE_IMAGE:$USED_VERSION
FROM $BUILD_IMAGE AS build-cache
# Install build dependencies
# Build your runtime artifacts
FROM $BASE_IMAGE AS service
# Basic configuration
# Install runtime dependencies
# Copy artifacts from build-cache
# Import code
# Configure start
At first glance, not a lot has changed. We introduce a secondary image we call build-cache, based on a good base image for our toolset. Again, the image should be chosen wisely to avoid installing unnecessary dependencies, and it should be pinned to a version for reproducibility. In it, we install everything we need for building our runtime artifacts. The commands for creating the final image can therefore be reduced to the runtime dependencies. For the final image, copy commands now handle importing the artifacts from the build-cache. The result: no more wasted space, only the needed files are included in the final runtime image.
In our environment, we ended up with this:
ARG BUILD_BASE_IMAGE=ruby:2.7.6-slim
ARG BASE_IMAGE=ruby:2.7.6-slim
FROM $BUILD_BASE_IMAGE AS build-cache
WORKDIR /app
# Install build dependencies
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      build \
      packages
# Build your runtime artifacts aka gems and assets
ARG BUNDLE_PRIVATE_GEM_SERVER_SECRET
COPY Gemfile Gemfile.lock ./
RUN bundle install \
    --clean \
    --deployment \
    --frozen \
    --jobs=$(nproc) \
    --without development test
COPY . .
RUN bin/rails assets:precompile
FROM $BASE_IMAGE AS parts
# Basic configuration
WORKDIR /app
EXPOSE 3000
RUN groupadd -r service-user && \
    useradd -r -m -g service-user service-user && \
    mkdir -p /app/tmp && \
    chown -R service-user:service-user /app/tmp
# Install runtime dependencies
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      runtime \
      packages \
      only \
    && \
    apt-get clean && \
    rm -rf vendor/cache/* && \
    rm -rf /var/lib/apt/lists/*
# Copy artifacts from build-cache
COPY Gemfile Gemfile.lock ./
COPY --from=build-cache /app/vendor/bundle /app/vendor/bundle
COPY --from=build-cache /app/public/ ./public/
RUN bundle install \
    --clean \
    --deployment \
    --frozen \
    --jobs=$(nproc) \
    --without development test
# Copy code
COPY . .
# Configure start
USER service-user
ARG ENV=production
ENV RAILS_ENV=$ENV \
RAKE_ENV=$ENV \
RAILS_LOG_TO_STDOUT=true
CMD ["bin/rails", "server", "-b", "0.0.0.0", "-p", "3000"]
One can see the changes following the new scheme. Since we now use a separate image for building, we do not have to care as much about condensing commands. We can split up the commands for installing dependencies and building the runtime artifacts. On a rebuild, Docker reuses the layers, improving build speed even more. One thing that is special to Ruby and its package manager bundler: we have to call the install command again in the final image to link the executables, e.g., for Rails. The rest reuses the earlier work from the first approach.
With the rewrite, we still benefit regarding security, size, and time as in the first approach. We reduce our image size by the space that would have been wasted on build tools in the first approach. Furthermore, we get rid of any attack vectors these could introduce and prevent leaking of internal credentials. In general terms, the multi-stage approach allows us to build anything in any image and ensures that we only transfer what we need into the final image. Because of layer caching, the commands of the build image do not have to be compacted, in turn improving build speed and making them easier to write.
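For completeness, a build passes the credentials as a build argument; as described above, they only end up in the build-cache image, not in the final one (the secret value is of course a placeholder):
docker build \
  --build-arg BUNDLE_PRIVATE_GEM_SERVER_SECRET="$SECRET" \
  -t my-app .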
Summing it up
We showed our approach to building Docker images that are minimal and secure. Of course, we cut down our code for the examples a bit, but the principle stays the same. We can reduce the weight of our images by taking only what we need and following general security recommendations. In combination with a good command order, this allows us to save build time and deploy fast and often! Of course, one could improve the process further, e.g., by pinning the versions of distro dependencies. Nevertheless, the core scheme stays the same and allows us to benefit in different scenarios. With the approach, we managed to cut down our deploy time by 50% when combining it with caching.
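Such pinning could, for example, look like this inside the dependency step (the package name and version are purely illustrative):
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      some-package=1.2.3-1 \
    && \
    rm -rf /var/lib/apt/lists/*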