Writing Dockerfiles: Best Practices for Small, Fast Images
Write Dockerfiles that build fast and ship small: layer caching order, multi-stage builds, base image choices, .dockerignore, non-root users, and the mistakes everyone makes.
A Dockerfile looks like a shell script, and that is exactly the mental trap. It is not a script that runs top to bottom once - it is a recipe where every instruction produces a cached filesystem layer, and the whole game is controlling which layers get rebuilt and how big each one is. Get the mental model right and your builds go from three minutes to three seconds on a code change, and your images go from 1.2GB to 40MB. Get it wrong and you ship a bloated image with your AWS keys baked into layer 4, cached forever. This guide is about the two things that actually matter - order and size - and the instructions, patterns, and mistakes that flow from them. If you have not met the container basics yet, the Docker fundamentals guide covers images, layers, and containers; this guide assumes you know what a layer is and want to build good ones.
The instructions that matter
A Dockerfile is a short list of instructions, and you can go a long way with about eight of them. The important thing is not memorizing the list - it is understanding what each one does to the resulting image and its layers.
- FROM - the base image you build on top of. Every Dockerfile starts here. FROM node:20-slim says "start from this filesystem and toolchain." It sets the OS, the package manager, and often the language runtime. This choice drives your final image size more than anything else you do.
- WORKDIR - sets the working directory for everything that follows (RUN, COPY, CMD) and creates it if it does not exist. Use it instead of RUN cd /app, which does not persist across instructions because each RUN starts in a fresh shell. Set WORKDIR /app once and stop writing absolute paths everywhere.
- COPY - copies files from your build context (your project directory) into the image. This is what you want 99% of the time.
- ADD - like COPY but with two extra behaviors: it can fetch a URL, and it auto-extracts local tar archives. Those behaviors are surprising and easy to misuse. The rule: use COPY unless you specifically need tar auto-extraction. Do not use ADD to download things - use RUN curl or a build stage so you control caching and can verify what you fetched.
- RUN - executes a command at build time and commits the result as a new layer. This is where you install packages, compile, and run build steps. Every RUN is a layer, so how you group them matters (more on that below).
- ENV - sets an environment variable that persists into the running container. Good for NODE_ENV=production or a default PORT. Note that ENV values are baked into the image and visible in docker history - never put secrets here.
- ARG - a build-time variable, available only during the build, NOT in the running container. Use it for things like a version number you want to parameterize: ARG NODE_VERSION=20. The distinction from ENV trips people up: ARG disappears when the build finishes, ENV ships in the image. (And ARG is still visible in build history, so it is also not for secrets.)
- EXPOSE - documentation, essentially. It records which port the app listens on. It does NOT publish the port - you still need -p 8080:8080 at docker run, or ports in your Compose file. Treat EXPOSE as a hint to humans and tooling, not a networking instruction.
- CMD and ENTRYPOINT - what runs when the container starts. These two cause the most confusion, so they get their own section.
CMD vs ENTRYPOINT, and exec vs shell form
Both CMD and ENTRYPOINT define the process a container runs, but they compose differently. The clean way to think about it: ENTRYPOINT is the command, CMD is the default arguments.
ENTRYPOINT ["python", "app.py"]
CMD ["--port", "8080"]
Run this with no arguments and it executes python app.py --port 8080. Run docker run myimage --port 9000 and Docker replaces the CMD, giving you python app.py --port 9000 - the ENTRYPOINT stays fixed. That is the pattern for a container that is fundamentally "this one program," where you want to let callers pass flags but not swap the program out.
If you only set CMD, the whole thing is overridable: CMD ["python", "app.py"] runs by default, but docker run myimage bash replaces it entirely and gives you a shell. That is the right choice for a general image where you sometimes want to run something else. For most application images, a plain CMD is fine and simpler; reach for ENTRYPOINT when the image wraps a single tool.
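The composition rule is small enough to write down. Here is a toy Python model of how exec-form ENTRYPOINT and CMD combine with whatever you pass after the image name in docker run. It illustrates the rule and is not Docker's code. It also ignores shell-form instructions and the docker run --entrypoint override.

```python
from typing import List, Optional, Sequence


def effective_argv(entrypoint: Optional[Sequence[str]],
                   cmd: Optional[Sequence[str]],
                   run_args: Sequence[str] = ()) -> List[str]:
    """Toy model of the argv a container starts with (exec form only).

    Arguments given after the image name in `docker run` replace CMD
    entirely; ENTRYPOINT, if set, is always kept in front.
    """
    args = list(run_args) if run_args else list(cmd or [])
    return list(entrypoint or []) + args


# ENTRYPOINT ["python", "app.py"] + CMD ["--port", "8080"], no run args
print(effective_argv(["python", "app.py"], ["--port", "8080"]))
# ['python', 'app.py', '--port', '8080']

# docker run myimage --port 9000
print(effective_argv(["python", "app.py"], ["--port", "8080"], ["--port", "9000"]))
# ['python', 'app.py', '--port', '9000']

# CMD ["python", "app.py"] only, then: docker run myimage bash
print(effective_argv(None, ["python", "app.py"], ["bash"]))
# ['bash']
```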
The second distinction is exec form vs shell form, and this one bites people in production.
# exec form: a JSON array
CMD ["nginx", "-g", "daemon off;"]
# shell form: a bare string
CMD nginx -g "daemon off;"
Exec form runs your process directly as PID 1. Shell form wraps it in /bin/sh -c "...", so your process becomes a child of the shell. The consequence that matters: signals. When Docker or Kubernetes stops a container, it sends SIGTERM to PID 1. In exec form that reaches your app, which can shut down cleanly. In shell form the signal goes to sh, which usually does not forward it, so your app never gets the SIGTERM - it just gets SIGKILLed after the grace period, dropping in-flight requests. Always use the exec (JSON array) form for CMD and ENTRYPOINT. Use shell form only when you genuinely need shell features like variable expansion or pipes, and even then consider an explicit ENTRYPOINT ["sh", "-c", ...] so the choice is visible.
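On the application side, "shut down cleanly" means installing a SIGTERM handler. This is one more reason exec form matters. Linux does not apply the default terminate action to PID 1, so a PID 1 process with no handler simply ignores SIGTERM until the SIGKILL arrives. Below is a minimal Python sketch of the pattern. The polling loop is a hypothetical stand-in for a real server.

```python
import signal
import threading

# Set by the handler; the main loop checks it to know when to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # With exec-form CMD this process is PID 1, so the SIGTERM sent by
    # `docker stop` or Kubernetes is delivered here directly.
    shutdown_requested.set()


def install_handlers() -> None:
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_seconds: float = 0.5) -> None:
    # Hypothetical work loop standing in for a real server.
    while not shutdown_requested.wait(poll_seconds):
        pass  # accept and handle requests here
    # Stop taking new work, let in-flight requests finish, close resources.
    print("SIGTERM received, shutting down cleanly")
```

At startup the app would call install_handlers() and then serve(). Under shell-form CMD the handler never runs, because sh rather than Python receives the signal.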
Layer caching: order is everything
Here is the single most important thing to understand about Dockerfiles. Each instruction creates a layer, and Docker caches every layer. On a rebuild, Docker walks your instructions top to bottom and reuses the cached layer for each one - until it hits an instruction whose inputs changed. From that point down, every layer is rebuilt, because each layer is built on top of the one before it. The cache invalidates from the first changed layer to the bottom.
For a COPY, "changed" means the copied files changed. For a RUN, "changed" means the command string changed (or an earlier layer it depends on changed). This has one enormous practical implication: order your instructions from least-frequently-changing to most-frequently-changing. Things that rarely change (the base image, system packages, dependency installs) go near the top so they stay cached. Things that change on every commit (your application source) go near the bottom.
The canonical mistake is copying your source code in before installing dependencies. Watch what it costs:
# BAD - dependency install is invalidated by every code change
FROM node:20-slim
WORKDIR /app
# copies package.json AND all your source
COPY . .
# this layer depends on the COPY above
RUN npm ci
CMD ["node", "server.js"]
Because COPY . . grabs everything, editing a single line in server.js changes the input to that COPY layer. The cache breaks there, so the RUN npm ci below it reruns from scratch - reinstalling every dependency on every code change, even though package.json did not move. On a real project that is minutes of npm ci (or pip install, or go mod download) on every single build.
The fix is to copy the dependency manifests first, install, and only then copy the rest of the source:
# GOOD - deps are cached until the manifest actually changes
FROM node:20-slim
WORKDIR /app
# only the manifests
COPY package.json package-lock.json ./
# cached unless the manifests change
RUN npm ci
# source changes land HERE, below the install
COPY . .
CMD ["node", "server.js"]
Now a code change only invalidates the COPY . . layer and everything below it. The npm ci layer stays cached and is reused instantly. You only pay the full install when you actually change package.json or the lockfile. The same pattern applies to every ecosystem: COPY go.mod go.sum ./ then RUN go mod download before COPY . .; COPY requirements.txt . then RUN pip install -r requirements.txt before the app; COPY pom.xml . then a dependency-resolve step for Maven. Whatever your stack, split "the thing that lists dependencies" from "the code," and put the code last.
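To see why the order matters, here is a toy Python model of the cache. It is a deliberate simplification, not BuildKit's actual cache-key algorithm. Each step's key hashes three things: its parent's key, its instruction text, and the contents of any files it copies. Comparing keys before and after a one-line edit to server.js shows which steps would rerun under each ordering. The project file contents are made up.

```python
import hashlib
from typing import Dict, List, Tuple

# A build step: (instruction text, {path: contents} of context files it copies).
Step = Tuple[str, Dict[str, str]]


def cache_keys(steps: List[Step]) -> List[str]:
    """Toy cache key per step: hash(parent key, instruction, copied files)."""
    keys: List[str] = []
    parent = ""
    for instruction, files in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for path in sorted(files):
            h.update(path.encode())
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rerun_steps(old: List[Step], new: List[Step]) -> List[str]:
    """Instructions whose key changed, i.e. cache misses (same-length builds)."""
    return [instr for (instr, _), a, b
            in zip(new, cache_keys(old), cache_keys(new)) if a != b]


# Hypothetical project contents.
MANIFESTS = {"package.json": '{"name": "demo"}', "package-lock.json": "{}"}


def bad_order(src: Dict[str, str]) -> List[Step]:
    return [("FROM node:20-slim", {}), ("WORKDIR /app", {}),
            ("COPY . .", src), ("RUN npm ci", {})]


def good_order(src: Dict[str, str]) -> List[Step]:
    return [("FROM node:20-slim", {}), ("WORKDIR /app", {}),
            ("COPY package.json package-lock.json ./", MANIFESTS),
            ("RUN npm ci", {}), ("COPY . .", src)]


v1 = {**MANIFESTS, "server.js": "console.log('v1')"}
v2 = {**MANIFESTS, "server.js": "console.log('v2')"}  # a one-line code edit
print(rerun_steps(bad_order(v1), bad_order(v2)))    # ['COPY . .', 'RUN npm ci']
print(rerun_steps(good_order(v1), good_order(v2)))  # ['COPY . .']
```

In the good ordering, the npm ci key depends only on MANIFESTS, so it changes only when the manifests do.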
Multi-stage builds: the highest-leverage technique
If you take one technique from this guide, take this one. A multi-stage build lets you use a big, fully-loaded image to compile or bundle your app, then copy ONLY the finished artifact into a tiny, clean final image. Your build tools, compilers, dev dependencies, and intermediate files never make it into what you ship.
The problem it solves: to build software you need a lot - compilers, headers, dev packages, a full package manager, sometimes hundreds of megabytes of build dependencies. But to RUN the result you often need almost none of that. A Go binary needs no Go toolchain at runtime. A bundled frontend needs no Node.js, just a web server serving static files. Without multi-stage, all that build machinery ships in your image, bloating it and widening the attack surface.
Here is a Go service. The build stage has the whole Go toolchain; the final stage has just the compiled binary on a minimal base:
# ---- build stage ----
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download # cached until go.mod/go.sum change
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/server
# ---- final stage ----
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/app /app
USER nonroot
EXPOSE 8080
ENTRYPOINT ["/app"]
The magic is COPY --from=build: it reaches into the earlier stage and pulls out just the one file you named. The golang:1.22 image is around 800MB; the final image here is a handful of megabytes - the binary plus a near-empty distroless base. The Go compiler, module cache, and source code all stay behind in the build stage and are discarded. (Note CGO_ENABLED=0 to produce a static binary so it runs on the minimal base with no libc.)
The same shape works for a Node frontend, where you build with the full Node image and serve the static output from nginx:
# ---- build stage ----
FROM node:20-slim AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build # produces ./dist
# ---- final stage ----
FROM nginx:1.27-alpine
COPY --from=build /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
The final image has no Node, no npm, no node_modules, no source - just the built static assets and nginx. You can have more than two stages (a deps stage, a build stage, a test stage, a final stage) and target any of them with docker build --target build. But the core move is always the same: build fat, ship thin, copy only the artifact across the boundary. This single technique routinely cuts image size by 10x and removes most of the CVEs that scanners flag, because most of those CVEs live in build tools you were never running.
Base image choices: full vs slim vs alpine vs distroless
Your FROM line is the biggest single lever on final size and security. The common options, from largest to smallest:
- Full (node:20, python:3.12, debian:12) - the complete OS with a package manager, shell, and lots of tooling. Big (often 400MB to 1GB+) but convenient: everything you might need for building is there. Good as a build stage; rarely what you want to ship.
- Slim (node:20-slim, python:3.12-slim) - the same Debian base with the docs, extra locales, and rarely-used packages stripped out. Usually a fraction of the full size while keeping glibc and a normal apt, so most things "just work." This is the sensible default final base for many apps.
- Alpine (node:20-alpine, alpine:3.20) - built on musl libc and BusyBox instead of glibc and GNU coreutils. Tiny (the base alpine image is around 5MB). The catch is the musl vs glibc gotcha: some software (especially anything with precompiled native binaries or C extensions - Python wheels, some Node native modules) is built against glibc and either fails to run on musl or silently misbehaves, and you end up recompiling from source, which is slow and sometimes painful. Alpine is excellent when your stack is musl-clean; it can cost you a day when it is not. Test, do not assume.
- Distroless (gcr.io/distroless/static, gcr.io/distroless/base) - Google's images that contain your app and its runtime dependencies and nothing else: no shell, no package manager, no coreutils. Smallest attack surface you can get short of a scratch image, and a great final stage for a compiled binary or a self-contained app. The tradeoff is debuggability: with no shell you cannot docker exec -it ... sh to poke around, so debugging shifts to logs, ephemeral debug containers, and getting it right before it ships. Distroless has a :debug variant with a BusyBox shell for when you genuinely need to get inside.
The pragmatic play: use a full image for your build stage (you want all the tools), and the smallest base you can tolerate operationally for the final stage - slim if you want a normal shell and glibc, distroless if you want minimum surface and can live without a shell, alpine if your stack is proven musl-clean. Do not chase the last few megabytes at the cost of being unable to debug an incident at 2am.
.dockerignore: keep junk out of the build context
Before Docker builds anything, it packages up your build context - the directory you point docker build at - and sends it to the builder. Suppose that directory contains node_modules, a .git folder with your whole history, build output, and a few gigabytes of test fixtures. The classic builder bundles and ships all of it on every build, even the parts no instruction copies. BuildKit, the default builder in current Docker releases, skips transferring files no instruction uses, but a COPY . . uses all of them. That slows every build and, worse, means a careless COPY . . can pull secrets and cruft into your image.
A .dockerignore file excludes paths from the context entirely. It uses glob patterns similar to, but not identical with, .gitignore, and lives at the root of the build context, usually next to your Dockerfile. It does three things: shrinks the context so builds start faster, stops COPY . . from copying things you never meant to ship, and keeps the cache stable (files excluded from the context cannot invalidate a COPY layer). A reasonable starting point for a Node project:
.git
node_modules
npm-debug.log
dist
build
coverage
.env
.env.*
*.md
Dockerfile
.dockerignore
.vscode
.idea
The two lines that matter most for safety are .env and .git. Copying a .env full of credentials into your image is a classic leak. And a .git directory can be hundreds of megabytes of history that has no business in a runtime image. Note that excluding node_modules is also correct even though you need dependencies - you install them fresh inside the image with npm ci, so shipping your host's node_modules (built for your OS, possibly stale) would be both wasteful and wrong. Write the .dockerignore at the same time you write the Dockerfile, not after you notice the image is 900MB.
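.dockerignore patterns are not quite .gitignore globs. Docker anchors them at the root of the build context and builds on Go's filepath.Match, where * does not cross a /. So *.md in the file above excludes README.md at the root but not docs/guide.md; matching at any depth needs **/*.md. The simplified Python checker below illustrates that root-anchored matching. It is an approximation that ignores ! exception lines and the ** wildcard.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Approximate .dockerignore matching (no '!' exceptions, no '**').

    Patterns are anchored at the context root and compared one path
    component at a time, so '*' never crosses a '/'. A pattern that matches
    a directory also excludes everything below it.
    """
    parts = path.strip("/").split("/")
    for pattern in patterns:
        pat_parts = pattern.strip("/").split("/")
        if len(pat_parts) <= len(parts) and all(
            fnmatchcase(name, pat) for name, pat in zip(parts, pat_parts)
        ):
            return True
    return False


PATTERNS = [".git", "node_modules", ".env", ".env.*", "*.md", "dist"]
for p in ["node_modules/express/index.js", ".env.production",
          "README.md", "docs/guide.md", "src/server.js"]:
    print(f"{p}: {'excluded' if is_ignored(p, PATTERNS) else 'kept'}")
```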
Security and correctness
Small images tend to be more secure images (less surface), but a few things need explicit attention.
Run as a non-root user. By default a container runs as root. Unless user-namespace remapping is enabled, root inside the container is UID 0 on the host as well. Combined with a kernel escape or a mounted host path, that is a real risk. Create a user and switch to it before the app runs:
RUN addgroup --system app && adduser --system --ingroup app app
USER app
After the USER instruction everything runs as that unprivileged user. Do your privileged setup (installing packages, chowning files the app needs to write) before the USER line, and make sure the app does not need to write to root-owned paths or bind to a port below 1024 - listen on 8080, not 80, inside the container. Distroless :nonroot images and many official images provide a ready-made non-root user you can just USER into.
Pin versions, avoid :latest. FROM node:latest means your build is non-reproducible. The same Dockerfile builds a different image next week when latest moves, and a build that passed CI can break on the next run with no change from you. Pin at least the major version (node:20-slim). For real reproducibility, pin the exact version and digest:
# exact, immutable
FROM node:20.11.1-slim@sha256:a1b2c3...
The digest guarantees byte-for-byte the same base image forever, which is what you want for reproducible builds and supply-chain integrity. Do the same for OS packages where it matters (apt-get install -y curl=7.88.1-10) if you need strict reproducibility. Unpinned bases and apt-get install curl with no version are the source of "it built fine yesterday" mysteries.
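A pinning policy is easy to enforce in CI. The hypothetical Python helper below splits an image reference into name, tag, and digest, then classifies how tightly it is pinned. A missing tag counts as floating because Docker then defaults to :latest. This is a simplified parser that does no validation. The digest in the example is the same elided placeholder used above.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    name: str
    tag: Optional[str]
    digest: Optional[str]


def parse_ref(ref: str) -> ImageRef:
    """Split 'name[:tag][@digest]' without validating any part."""
    name, _, digest = ref.partition("@")
    # A ':' after the last '/' starts a tag; an earlier one is a registry port.
    colon = name.rfind(":")
    tag = None
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    return ImageRef(name, tag, digest or None)


def pin_level(ref: str) -> str:
    r = parse_ref(ref)
    if r.digest:
        return "digest"    # immutable: the same bytes on every build
    if r.tag in (None, "latest"):
        return "floating"  # moves whenever upstream pushes a new latest
    return "tag"           # mutable: upstream can re-point it on new releases


for ref in ["node", "node:latest", "node:20-slim",
            "node:20.11.1-slim@sha256:a1b2c3...", "localhost:5000/app"]:
    print(ref, "->", pin_level(ref))
```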
Never bake secrets into layers. This is the one that burns people. Layers are immutable and persist in the image history. If you do this:
# NEVER do this
# secret now lives in a layer, forever
COPY .env .
# so does this
RUN echo "$API_KEY" > /tmp/key
# this does NOT remove it from the earlier layer
RUN rm /tmp/key
deleting the file in a later RUN does not help. The earlier layer still contains it, and anyone with the image can extract that layer (for example with docker save) and pull the file out. ARG and ENV are no better; both show up in build metadata such as docker history. The correct tools are BuildKit build secrets and multi-stage builds. A build secret is mounted only for one RUN and never written to any layer:
# with BuildKit: docker build --secret id=npmrc,src=$HOME/.npmrc .
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
The .npmrc (with your registry token) is available during that npm ci and vanishes with the RUN - it is never committed to a layer. For a secret needed only to fetch or build something, do it in an early build stage and copy across only the result; the secret and everything derived from it stay in the discarded stage. The rule is simple: a secret must never be the input to a COPY, ENV, ARG, or a plain RUN that persists it. If a secret ever did land in a pushed image, rotating it is the only real fix - you cannot scrub a layer that is already out there.
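Deletion does not help because of how layers stack. Each layer stores only its changes. A deletion is recorded as a marker (a .wh. "whiteout" file in OCI image layers) that hides the file in the merged view without touching the layer below. The toy Python model below uses None as the whiteout and a made-up secret value. It shows the file vanishing from what the container sees while the earlier copy is still stored in the image.

```python
from typing import Dict, List, Optional

WHITEOUT = None  # stands in for a deletion marker

# A layer records only what changed: path -> new contents, or WHITEOUT.
Layer = Dict[str, Optional[str]]


def merged_view(layers: List[Layer]) -> Dict[str, str]:
    """The filesystem a container sees: later layers override earlier ones."""
    view: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)
            else:
                view[path] = content
    return view


def stored_copies(layers: List[Layer], path: str) -> List[str]:
    """Every version of `path` still shipped inside the image's layers."""
    return [layer[path] for layer in layers if layer.get(path) is not None]


layers: List[Layer] = [
    {"/app/server.js": "console.log('hi')"},  # COPY . .
    {"/tmp/key": "hypothetical-secret"},      # RUN echo "$API_KEY" > /tmp/key
    {"/tmp/key": WHITEOUT},                   # RUN rm /tmp/key
]
print("/tmp/key" in merged_view(layers))  # False: hidden at runtime
print(stored_copies(layers, "/tmp/key"))  # ['hypothetical-secret']: still in layer 2
```

The same model explains why package-manager cleanup must happen in the same RUN. A later rm -rf only adds whiteouts on top of a layer that still stores the files.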
Minimize the attack surface. Everything above compounds here: fewer packages, a smaller base, no shell where you can avoid one, no build tools in the final stage, and running as non-root all shrink what an attacker can use. Install only what the app needs at runtime, and prefer --no-install-recommends on Debian bases so apt does not drag in a pile of extras.
Common mistakes
Most bad Dockerfiles fail in the same handful of ways. Here is the checklist to grep your own files against.
- COPY . . before installing dependencies. The big one from the caching section. It ties your slow dependency install to every code change. Copy manifests, install, then copy source.
- COPY . . with no .dockerignore. Now you are copying .git, node_modules, .env, and build output into the image. Slow, bloated, and a secret leak waiting to happen. Always pair a broad COPY with a .dockerignore.
- One giant RUN, or too many tiny ones - and not cleaning the package cache. Each RUN is a layer. Chain related install steps into one RUN with && so intermediate state does not become a permanent layer, and clean the package manager cache in the SAME RUN. This:
  # BAD - the apt cache lives in a layer forever, adding tens of MB
  RUN apt-get update
  RUN apt-get install -y curl
  becomes:
  # GOOD - update, install, and cleanup in one layer; nothing left behind
  RUN apt-get update \
      && apt-get install -y --no-install-recommends curl \
      && rm -rf /var/lib/apt/lists/*
  Cleaning up in a LATER RUN does nothing, for the same reason deleting a secret later does nothing: the cache was already committed to the earlier layer. It has to happen in the same RUN that created it.
- Running as root. The default, and easy to forget. Add a non-root USER before the app runs.
- Unpinned base images (:latest). Non-reproducible builds that break silently when upstream moves. Pin the tag, and the digest if you need certainty.
- Shell-form CMD/ENTRYPOINT. Your app runs under sh -c and never receives SIGTERM, so shutdowns turn into SIGKILLs and dropped requests. Use the JSON array (exec) form.
- Using ADD for everything, or to download URLs. Surprising behavior and worse caching. Use COPY; use RUN curl (or a build stage) when you actually need to fetch something.
Fix those seven and you are ahead of most Dockerfiles in production.
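Several items on this checklist lend themselves to a quick automated pass. Below is a rough Python linter for five of them:

- shell-form CMD/ENTRYPOINT
- ADD
- COPY . . before a dependency install
- apt lists not removed in the same RUN
- no non-root USER in the final stage

It is a heuristic sketch, not a real Dockerfile parser. The list of install commands is an assumption, and heredocs, instruction flags, and parser directives are ignored. It also flags a JSON-array CMD with a trailing # comment, because Docker only treats # as a comment at the start of a line.

```python
import json
import re
from typing import Iterator, List, Tuple

# Substrings taken to mean "installs dependencies" (an illustrative assumption).
INSTALL_CMDS = ("npm ci", "npm install", "pip install", "go mod download")


def logical_lines(text: str) -> Iterator[Tuple[int, str]]:
    """Join backslash-continued lines; yield (first line number, joined text)."""
    buf, start = "", 0
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not buf:
            start = n
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        yield start, buf + line
        buf = ""
    if buf:
        yield start, buf


def lint_dockerfile(text: str) -> List[str]:
    """Flag a few mistakes from the checklist above. Heuristic, not a parser."""
    warnings: List[str] = []
    copied_everything = False
    non_root_user = False
    for lineno, line in logical_lines(text):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instr = parts[0].upper()
        args = parts[1].strip() if len(parts) > 1 else ""
        if instr == "FROM":
            # New stage: only the last stage's USER matters for the final image.
            copied_everything = False
            non_root_user = False
        elif instr in ("CMD", "ENTRYPOINT"):
            try:
                exec_form = isinstance(json.loads(args), list)
            except ValueError:  # not valid JSON, so Docker uses shell form
                exec_form = False
            if not exec_form:
                warnings.append(f"{lineno}: shell-form {instr}; use a JSON array")
        elif instr == "ADD":
            warnings.append(f"{lineno}: ADD; prefer COPY unless you need tar extraction")
        elif instr == "USER":
            non_root_user = args.split(":")[0] not in ("root", "0")
        elif instr == "COPY" and re.fullmatch(r"\.\s+\./?", args):
            copied_everything = True
        elif instr == "RUN":
            if copied_everything and any(cmd in args for cmd in INSTALL_CMDS):
                warnings.append(f"{lineno}: dependency install after COPY . .")
            if "apt-get install" in args and "rm -rf /var/lib/apt/lists" not in args:
                warnings.append(f"{lineno}: apt lists not removed in the same RUN")
    if not non_root_user:
        warnings.append("final stage never switches to a non-root USER")
    return warnings


BAD = """\
FROM node:20-slim
WORKDIR /app
COPY . .
RUN npm ci
RUN apt-get update && apt-get install -y curl
CMD node server.js
"""
for warning in lint_dockerfile(BAD):
    print(warning)
```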
The shape of it
A Dockerfile is a recipe where every instruction is a cached layer, and two properties decide whether it is good: order and size. Order your instructions least- to most-frequently-changing so a code edit does not blow your dependency cache - manifests and installs up top, source at the bottom. Use multi-stage builds to compile in a fat image and ship only the artifact in a tiny one; it is the single highest-leverage move for both size and security. Choose your final base deliberately along the size/debuggability axis, keep the build context clean with .dockerignore, run as a non-root user, pin your bases, and keep secrets out of layers with BuildKit secrets and multi-stage. Do those things and your images build in seconds, ship in megabytes, and do not leak. Once your images are solid, wiring them together for local development is the job of Compose - see the Docker Compose guide.