Docker Fundamentals for DevOps

The Docker mental model that makes everything click: an image is a read-only template, a container is a running instance of it - just an isolated process, not a VM. Plus layers, registries, volumes, and the commands you actually use.


Docker gets explained backwards. Most tutorials open with a wall of commands and leave you memorizing flags without a model to hang them on. But almost everything in Docker falls out of one distinction: an image is a read-only template, and a container is a running instance of that image. Get that clear and the rest - layers, volumes, ports, the command set - stops being trivia and becomes obvious. The second thing to internalize is what a container actually is: not a tiny virtual machine, but a normal process on your host that the kernel has been told to isolate. This guide builds both ideas, then gives you the working command set, the data model, and the housekeeping you need to not fill your disk.

The one distinction: images vs containers

Start here, because everything else depends on it. An image is a read-only, packaged template: your application plus its dependencies, libraries, and a minimal filesystem, frozen into an artifact. It does nothing on its own. A container is what you get when you run an image - a live instance with its own isolated process, filesystem view, and network.

The relationship is exactly like a class and an object, or a program on disk and a process running it. One image can spawn many containers, all identical at birth, each running independently. When you docker run nginx, Docker takes the read-only nginx image and starts a container from it. Run it three more times and you have four containers, all backed by the same image, each with its own life.

docker pull nginx:1.25       # download the image (the template)
docker run nginx:1.25        # start a container (an instance)
docker run nginx:1.25        # another container, same template

Hold onto this: you build and ship images, you run and throw away containers. Images are the durable artifact you version and push to a registry; containers are cheap, disposable instances you start and stop constantly. Almost every confusing thing about Docker resolves once you know which of the two you are looking at.
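
To see the one-image, many-containers relationship on your own machine, a small helper along these lines can list every container that was started from a given image. Treat it as a sketch: the function name containers_of is made up for illustration, and it relies only on docker ps with its ancestor filter and a --format template.

```shell
# Hypothetical helper: list all containers (running or stopped) created from one image.
containers_of() {
  # --filter ancestor=IMAGE keeps only containers that share IMAGE as an ancestor.
  # -a includes stopped containers, which still count as instances of the image.
  docker ps -a --filter "ancestor=$1" --format '{{.ID}}  {{.Names}}  {{.Status}}'
}

# Usage (needs a Docker daemon and at least one container started from the image):
#   containers_of nginx:1.25
```

Every row it prints is a separate container; the image they all point back to exists only once.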

Layers: why images are built the way they are

An image is not one monolithic blob. It is a stack of read-only layers, each one a set of filesystem changes, stacked on top of each other by a union filesystem (overlayfs on modern Linux) that presents the whole stack as a single merged filesystem. Each build instruction that changes the filesystem (install a package, copy your code) adds a layer on top of the previous ones.

When you run a container, Docker adds one more layer on top of the image's read-only stack: a thin, writable container layer. Every file the container creates or modifies goes here. The image layers underneath never change - that is what lets many containers share the same image safely. Each container just gets its own private writable layer, and the shared read-only layers are stored on disk exactly once.

This design is not academic. It drives two things you will care about daily:

  • Size. Two images built on the same base (say both FROM python:3.12-slim) share those base layers on disk. Pull a second image with the same base and only the new layers download. This is why base-image choices matter and why "everyone uses the same base" saves real space and bandwidth.
  • Caching. When you rebuild an image, Docker reuses cached layers for any step whose inputs have not changed and only rebuilds from the first changed step down. This is the single most important thing to understand for fast builds, and it is why Dockerfile instruction order matters so much - a topic the Dockerfiles guide covers in depth.

The practical takeaway for now: images are layered and shared, containers add one writable layer on top, and that writable layer disappears when the container is removed. Which is the perfect segue to what a container really is.
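
One way to make layer sharing visible is to compare the layer digests of two local images. The sketch below assumes both images are already pulled; the function names layer_digests and shared_layers are hypothetical, and the only Docker feature used is docker image inspect with a --format template over the RootFS.Layers field.

```shell
# Hypothetical helper: print one layer digest per line for a local image.
layer_digests() {
  # RootFS.Layers is the ordered list of layer digests in `docker image inspect` output.
  docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# Hypothetical helper: print the layer digests two local images have in common.
shared_layers() {
  layers_a=$(layer_digests "$1") || return 1
  layers_b=$(layer_digests "$2") || return 1
  printf '%s\n' "$layers_a" | while IFS= read -r layer; do
    [ -n "$layer" ] || continue
    # -F fixed string, -x whole-line match, -q quiet: is this digest also in image B?
    if printf '%s\n' "$layers_b" | grep -Fqx -- "$layer"; then
      printf '%s\n' "$layer"
    fi
  done
}

# Usage: shared_layers IMAGE_A IMAGE_B
```

Any digest it prints is a layer that both images reference and that sits on disk once.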

What a container actually is: a process, not a VM

Here is the thing that trips up nearly everyone coming from VMs. A container is not a small virtual machine. There is no guest operating system inside it, no virtualized hardware, no hypervisor. A container is just a normal process running on your host, sharing your host's kernel, that the kernel has been asked to isolate and constrain. Two Linux kernel features do the work:

  • Namespaces provide isolation - they change what a process can see. There is a separate namespace for each kind of resource: PID (the container sees its own process tree starting at PID 1, not the host's), NET (its own network interfaces and IP), MNT (its own filesystem mount view - this is what the image layers become), UTS (its own hostname), IPC, and user. Wrap a process in these namespaces and it genuinely believes it has a machine to itself, while actually running right alongside your other processes on the same kernel.
  • cgroups (control groups) provide limits - they control what a process can use: how much CPU, memory, and I/O it is allowed. This is how --memory=512m is enforced. Exceed the memory limit and the kernel's OOM killer kills the process.

That is the whole trick. A container is a process (or a small tree of processes) wrapped in namespaces for isolation and cgroups for limits, with a filesystem assembled from image layers. Run docker run nginx on a Linux host and then ps aux on that same host: you will find the nginx processes sitting there in the host's process list, just isolated.
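
On a Linux host you can look at this isolation directly, because the kernel exposes each process's namespaces and cgroup membership under /proc. The helpers below are a minimal sketch with made-up names (ns_ids, cgroup_of); they read only standard /proc entries and do not call Docker at all.

```shell
# Hypothetical helper: print the namespace IDs a process lives in (Linux only).
ns_ids() {
  pid=${1:-$$}
  for ns in pid net mnt uts ipc user; do
    # Each entry under /proc/<pid>/ns is a symlink such as "pid:[4026531836]".
    printf '%-5s %s\n' "$ns" "$(readlink "/proc/$pid/ns/$ns")"
  done
}

# Hypothetical helper: print the cgroup(s) a process belongs to (Linux only).
cgroup_of() {
  cat "/proc/${1:-$$}/cgroup"
}

# Usage on a Linux host with a running container named "web" (may need sudo):
#   ns_ids $$                                                 # your shell
#   ns_ids "$(docker inspect --format '{{.State.Pid}}' web)"  # the container's main process
```

Where the two outputs show different IDs for the same namespace type, the container process really is in a namespace of its own, even though it is an ordinary process on the same kernel.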

This is exactly why containers are so much lighter than VMs. A VM boots a full guest OS with its own kernel on top of virtualized hardware - that costs gigabytes of disk, hundreds of megabytes of RAM, and tens of seconds to boot. A container starts a process on the kernel you already have running, so it typically starts in well under a second, weighs megabytes, and packs far more densely onto a host. The trade-off is twofold. Containers share the host kernel, so isolation is weaker than a VM's hardware-level boundary. And Linux containers need a Linux kernel, which is why Docker Desktop on Mac and Windows quietly runs a Linux VM to host them.

Registries, image names, and tags

Images live in registries - servers that store and distribute them. Docker Hub is the default public one; every cloud has its own (Amazon ECR, Google Artifact Registry, GitHub's GHCR), and teams run private registries. docker pull downloads from a registry; docker push uploads to it.

An image reference has structure. Read nginx:1.25 as repository:tag. The full form is registry/namespace/repository:tag:

registry.example.com/myteam/api:2.3.1
|------------------| |----| |-| |---|
      registry      namespace repo  tag

When you omit the registry, Docker assumes Docker Hub. When you omit the tag, Docker assumes :latest. That default is a trap worth calling out directly.

:latest is not "the newest version." It is just the tag a publisher happened to name latest. It is a mutable pointer - the image behind myapp:latest today can be a completely different image tomorrow, because someone pushed a new one to that same tag. Deploy with :latest and you lose the ability to say what is actually running, your builds stop being reproducible, and "it worked yesterday" becomes impossible to debug. Always pin a specific version tag (nginx:1.25.3) in anything real.
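
The defaulting rules for image references can be written out as a few lines of plain shell. This is a sketch, not Docker's own parser: the function name parse_image_ref and its output format are invented for illustration, it deliberately rejects digest references, and it assumes the commonly documented rule that the first path component counts as a registry only when it contains a dot or a colon or is exactly localhost.

```shell
# Hypothetical helper: expand a short image reference into registry, path and tag.
parse_image_ref() {
  ref=$1
  registry=docker.io   # Docker Hub is assumed when no registry is given
  tag=latest           # :latest is assumed when no tag is given

  case $ref in
    *@*) echo "digest references are not handled in this sketch" >&2; return 2 ;;
  esac

  # A ':' after the last '/' separates the tag; an earlier ':' is a registry port.
  last=${ref##*/}
  case $last in
    *:*) tag=${last##*:}; ref=${ref%:*} ;;
  esac

  # The first component is a registry only if it looks like a host name.
  case $ref in
    */*)
      first=${ref%%/*}
      case $first in
        *.*|*:*|localhost) registry=$first; ref=${ref#*/} ;;
      esac
      ;;
  esac

  # Official images on Docker Hub live under the implicit "library" namespace.
  case $ref in
    */*) ;;
    *) if [ "$registry" = docker.io ]; then ref=library/$ref; fi ;;
  esac

  printf 'registry=%s path=%s tag=%s\n' "$registry" "$ref" "$tag"
}

parse_image_ref nginx
parse_image_ref registry.example.com/myteam/api:2.3.1
```

Feeding it a bare name such as nginx makes the hidden defaults explicit, including the :latest tag you did not type.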

For true immutability, go one level deeper and use a digest. A digest is a content hash of the exact image, and it can never point at anything else:

docker pull nginx@sha256:a1b2c3...   # this exact image, forever

Tags are convenient and movable; digests are exact and permanent. Use pinned tags for readability in day-to-day work, and digests when you need a guarantee that the bytes cannot change under you.
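
If you have pulled an image by tag and want the digest it resolved to, Docker records it locally. The helper below is a sketch with a made-up name (digest_of); it uses docker image inspect and the RepoDigests field, and it assumes the image came from a registry, since an image that was only built locally has no repo digest and the template will fail on it.

```shell
# Hypothetical helper: print the repo@sha256:... reference recorded for a local image.
digest_of() {
  # RepoDigests is filled in once an image has been pulled from or pushed to a registry.
  # "index ... 0" takes the first entry and errors out if the list is empty.
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Usage (after `docker pull nginx:1.25`):
#   digest_of nginx:1.25
```

The printed reference can be used anywhere a tag can, for example in docker pull or docker run, when you need the exact bytes pinned.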

The working command set

This is the core loop you will run thousands of times. Learn these with their intent, not as a list.

Starting containers. docker run creates and starts a container from an image. The flags carry the meaning:

docker run -d --name web -p 8080:80 nginx:1.25

  • -d (detached) runs it in the background and hands you back the prompt. Without it, the container holds your terminal and its logs stream to your screen.
  • --name web gives the container a stable name so you can refer to it later. Skip it and Docker assigns a random name like nostalgic_babbage.
  • -p 8080:80 publishes a port (covered in the next section).
  • -e KEY=value sets an environment variable inside the container - the standard way to pass config.
  • -v mounts a volume or bind mount (covered under persistence).
  • --rm deletes the container automatically when it exits. Great for one-off and interactive runs so you do not accumulate stopped containers.

Seeing what is running. docker ps lists running containers; docker ps -a includes stopped ones, which matters because stopped containers stick around until removed.

docker ps        # running containers
docker ps -a     # ALL containers, including stopped/exited

Reading logs. A container's logs are just whatever its main process wrote to stdout and stderr. -f follows them live, the same way tail -f does.

docker logs web           # everything the container has printed
docker logs -f web        # follow live
docker logs --tail=100 web   # only the last 100 lines

Getting inside. docker exec runs a new command in an already-running container; -it (interactive + TTY) gives you a shell.

docker exec -it web /bin/sh    # a shell inside the running container
docker exec web env            # run one command and print its output

Note the distinction from docker run: run starts a new container from an image; exec runs a command inside an existing one. Reaching for docker run when you meant to poke at a running container is a common early mistake.

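Stopping and removing. docker stop sends SIGTERM so the process can shut down cleanly, then SIGKILL if it has not exited after a grace period (10 seconds by default). docker rm deletes a stopped container. You cannot remove a running container without -f, which kills it outright rather than stopping it gracefully.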

docker stop web
docker rm web           # remove the stopped container
docker rm -f web        # force: kill (SIGKILL) AND remove in one go

Managing images. docker images lists the images on disk; docker pull fetches one; docker rmi removes one.

docker images           # images stored locally
docker pull redis:7     # fetch an image without running it
docker rmi nginx:1.25   # remove an image

The full lifecycle, start to finish: docker pull an image (or run pulls it for you) -> docker run a container from it -> docker ps to see it and docker logs to watch it -> docker exec to get inside when you need to poke around -> docker stop then docker rm to tear it down. That is the whole rhythm.

Ports: the host/container boundary

A container gets its own network namespace, which means its own network interfaces and a private IP on an internal Docker network. A process listening on port 80 inside the container is listening on the container's network, not yours: localhost:80 on your host does not reach it, and neither can other machines. To reach it, you publish the port with -p host:container.

docker run -d -p 8080:80 nginx:1.25

Read -p 8080:80 as "map port 8080 on the host to port 80 in the container." Now a request to http://localhost:8080 on your machine is forwarded into the container's port 80. The left number is yours to choose; the right number is whatever the app inside actually listens on. They do not have to match, and you use that freedom to run several containers of the same image side by side:

docker run -d -p 8080:80 nginx:1.25
docker run -d -p 8081:80 nginx:1.25    # second instance, different host port

Both containers listen on 80 internally with no conflict, because that 80 lives in each container's own namespace; only the host-side ports (8080, 8081) must be unique on your machine. This host/container boundary is the mental model for all Docker networking: inside the container is a separate network, and you deliberately poke holes through it with -p.
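
The side-by-side pattern generalizes to any number of instances: keep the container port fixed and hand each instance the next host port. The loop below is a sketch under stated assumptions; the function name run_instances and the web-N container names are arbitrary, and it does not check whether a host port is already taken.

```shell
# Hypothetical helper: start COUNT containers of IMAGE, each on its own host port.
# Arguments: image, count, first host port, container port.
run_instances() {
  image=$1
  count=$2
  base_port=$3
  container_port=$4
  i=0
  while [ "$i" -lt "$count" ]; do
    # The host side changes per instance; the container side is always the same.
    host_port=$((base_port + i))
    docker run -d --name "web-$i" -p "$host_port:$container_port" "$image" || return 1
    i=$((i + 1))
  done
}

# Usage (starts web-0 on host port 8080 and web-1 on 8081, both serving container port 80):
#   run_instances nginx:1.25 2 8080 80
```

Afterwards, docker port web-0 shows the mapping Docker set up for that container.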

Data persistence: containers are ephemeral

Remember the writable container layer from earlier? When you docker rm a container, that layer is deleted with it. Everything the container wrote to its own filesystem is gone. That is by design - containers are meant to be disposable - but it means any data that must outlive the container has to live somewhere else. Docker gives you two mechanisms.

Named volumes are storage Docker creates and manages for you, living in Docker's own area on the host. This is the right default for data that must survive - databases, uploads, anything stateful.

docker volume create pgdata
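# POSTGRES_PASSWORD is required by the official postgres image; change-me is a placeholder
docker run -d --name db -e POSTGRES_PASSWORD=change-me -v pgdata:/var/lib/postgresql/data postgres:16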

Now Postgres writes its data into the pgdata volume. Destroy and recreate the container from the same image and re-attach the volume, and the data is still there, because it never lived in the container's writable layer. Volumes are portable, easy to back up, and the option to reach for in production.

Bind mounts map a specific directory on your host straight into the container. You control the exact host path.

docker run -d -v /home/me/site:/usr/share/nginx/html:ro nginx:1.25

The container now serves files directly from /home/me/site on your host, and :ro makes it read-only. Bind mounts shine in development - mount your source code in so edits on your machine show up instantly in the container without rebuilding.

The difference in one line: a named volume is storage Docker owns and manages; a bind mount is a host directory you point at directly. Use named volumes for real data you want Docker to manage and keep portable; use bind mounts for local development and for injecting host files or config. Both survive docker rm, because both live outside the container's writable layer.
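
A quick way to convince yourself that volume data outlives containers is to write from one throwaway container and read from another. The function below is a sketch: volume_roundtrip, the demo-data volume name, the /data mount path, and the alpine:3.20 image are all choices made for this example, and any small image with a shell would work as well.

```shell
# Hypothetical demo: write into a named volume from one container, read it back from another.
volume_roundtrip() {
  volume=${1:-demo-data}     # volume name is an arbitrary choice for this sketch
  image=${2:-alpine:3.20}    # assumed image; any small image with sh and cat will do

  docker volume create "$volume" >/dev/null || return 1
  # First container writes a file into the volume, then exits and is removed (--rm).
  docker run --rm -v "$volume:/data" "$image" sh -c 'echo hello > /data/greeting' || return 1
  # A brand-new container mounts the same volume and still sees the file.
  docker run --rm -v "$volume:/data" "$image" cat /data/greeting
}

# Usage:
#   volume_roundtrip
#   docker volume rm demo-data    # clean up the demo volume afterwards
```

With a real daemon the second container should print the greeting the first one wrote, even though the first container no longer exists.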

Housekeeping: why your disk fills up

Docker is a pack rat by default, and this surprises people when their disk hits 100 percent. Every image you pull, every container you start (even stopped ones), every build's intermediate layers, and every unused volume sits on disk until you remove it explicitly.

A few things accumulate quietly:

  • Stopped containers. They do not vanish when they exit - docker ps -a shows the pile, and each keeps its writable layer around. Using --rm on throwaway runs avoids this.
  • Dangling images. When you rebuild an image with the same tag, the old image loses its tag but stays on disk as a <none>:<none> entry. Rebuild often and these stack up fast.
  • Unused volumes. Remove a container and its named volumes stay behind (which is the point - they hold your data), but volumes you truly no longer need linger.
  • Build cache. Layer caching is what makes rebuilds fast, but the cache itself grows and can reach many gigabytes.

The cleanup commands, from gentle to aggressive:

docker image prune            # remove dangling (untagged) images
docker container prune        # remove all stopped containers
docker volume prune           # remove unused anonymous volumes (Docker 23.0+; add -a for named ones)
docker system prune           # stopped containers + unused networks + dangling images + build cache
docker system prune -a --volumes   # also remove ALL unused images and unused anonymous volumes

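Two cautions. docker system prune -a removes every image not currently backing a container, so you will re-download them next time you need them. And --volumes deletes volume data for good: on Docker Engine 23.0 and later it prunes only unused anonymous volumes, and removing unused named volumes takes an explicit docker volume prune -a, while older versions removed unused named volumes too. Either way, be certain nothing you care about is sitting in one, because that is real data. Run the plain docker system prune regularly and it keeps disk usage sane; reach for the -a --volumes sledgehammer only when you know what you are clearing.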
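
Before pruning anything, it helps to see what is actually taking the space. The report below is a sketch: docker_disk_report is a made-up name, and it combines docker system df with the status and dangling filters on the listing commands. It only reads state and removes nothing.

```shell
# Hypothetical helper: summarize what is taking space before pruning anything.
docker_disk_report() {
  # Built-in summary of space used by images, containers, local volumes and build cache.
  docker system df
  # Counts of the usual suspects; -q prints only IDs or names, one per line.
  printf 'exited containers: %s\n' "$(docker ps -aq --filter status=exited | wc -l | tr -d ' ')"
  printf 'dangling images:   %s\n' "$(docker images -q --filter dangling=true | wc -l | tr -d ' ')"
  printf 'dangling volumes:  %s\n' "$(docker volume ls -q --filter dangling=true | wc -l | tr -d ' ')"
}

# Usage:
#   docker_disk_report
```

If the counts are large, the matching prune command from the list above is the one to reach for first.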

Where this goes next

You now have the model that everything else builds on: images are read-only layered templates, containers are disposable isolated processes started from them, data that must survive lives in volumes, and networking is a boundary you poke through with published ports. Running other people's images and driving them with the command set above will carry you a long way.

The next two steps turn you from someone who runs containers into someone who ships them. First, writing your own images with a Dockerfile - the build instructions, how layer caching rewards good instruction ordering, and how to keep images small and secure. That is the Docker images and Dockerfiles guide. Second, running multiple containers together - a web app, its database, and a cache wired up with one declarative file instead of a fistful of docker run commands. That is Docker Compose. Both are direct extensions of the fundamentals here: a Dockerfile builds the image, Compose orchestrates the containers, and everything you learned about layers, volumes, and ports carries straight over.