Kubernetes Networking: Services, Ingress, and DNS

How a packet actually reaches your pod: the flat pod network, Services and kube-proxy, cluster DNS, Ingress vs Gateway API, NetworkPolicies, and how to debug it all.


Kubernetes networking feels like a black box until you can trace one request end to end: a client types a name, that name resolves to an IP, that IP gets rewritten to a pod IP, and the packet lands on a container running on some node you never picked. Every piece of Kubernetes networking - Services, DNS, Ingress, NetworkPolicies - is a layer in that path. This guide follows a packet from the outside world down to your pod, so that when the path breaks (and it will, usually as "the Service returns nothing but the pods are fine") you know exactly which layer to inspect. If you have not read the fundamentals guide yet, the pods-are-disposable and labels-and-selectors ideas from there are the foundation for everything here.

The pod network model: one flat network, unstable IPs

The base assumption Kubernetes makes about networking is deliberately simple, and it is the thing most people miss: every pod gets its own IP address, and every pod can reach every other pod at that IP without any NAT. One flat address space, cluster-wide. A pod on node A talks to a pod on node C by its IP as if they were on the same LAN, even though they are on different machines. There are no per-pod port mappings, no "which host port did this container get." Each pod is a first-class network citizen.

Kubernetes itself does not implement this. It defines the contract and hands the job to a CNI plugin (Calico, Cilium, the cloud provider's own, and so on) that programs the actual routing - overlay networks, VPC routes, eBPF, whatever the plugin chooses. From your side as an application or an operator, the model is the same regardless of plugin: flat network, one IP per pod, everyone can reach everyone (until you add NetworkPolicies, later).

So far so good. Here is the catch that makes the rest of this guide necessary: pod IPs are unstable and useless to target directly. A pod is disposable. When it dies, its replacement is a brand-new pod with a brand-new IP. Roll out a new version, scale up, drain a node - the set of pod IPs behind your app churns constantly. If you hardcode a pod IP anywhere (a config file, another service, a DNS record you manage yourself) it goes stale the next time that pod is replaced. You need something stable in front of the pods that tracks this churn for you. That something is a Service.

kubectl get pods -o wide     # see each pod's IP and which node it is on

Run that twice across a rollout and you will watch the IPs change under you. That instability is not a bug to fix; it is the whole reason the abstractions below exist.

Services: a stable virtual IP over churning pods

A Service is a stable virtual IP (the ClusterIP) and a DNS name that load-balance across whatever pods currently match its selector. The pods behind it churn; the Service IP never does. This is the indirection that makes the disposable-pod model usable, and understanding how it actually works is the difference between guessing at network bugs and diagnosing them.

How a Service tracks its pods: the selector and Endpoints

A Service does not know about pods by name or by IP. It knows a label selector, exactly like a Deployment does. Kubernetes continuously evaluates that selector and maintains a separate object listing the current set of matching, ready pod IPs. Historically that object was Endpoints; on any modern cluster it is EndpointSlices (Endpoints is still synthesized for backward compatibility, so both kubectl get commands work).

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # THIS is what binds the Service to pods
  ports:
    - port: 80         # the port clients hit on the Service
      targetPort: 8080 # the port on the pod/container

The chain is: selector app: web -> the EndpointSlice controller finds every ready pod labelled app: web -> it writes their IPs into an EndpointSlice -> that slice is the actual list of backends. When a pod becomes ready, its IP is added; when it dies or fails its readiness probe, its IP is removed. Watch it live:

kubectl get endpoints web            # legacy view: Service -> pod IP:port list
kubectl get endpointslices -l kubernetes.io/service-name=web   # the modern object
kubectl describe endpointslice <name>

Internalize this: the Service is the selector plus the resulting endpoint list. If that list is empty, the Service is a stable IP pointing at nothing, and every request to it hangs or is refused. This is the single most common Kubernetes networking failure, and we will return to it in the debugging section - the cause is almost always a selector that does not match any pod's labels, or pods that never become Ready.
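
To make the selector-to-endpoints chain concrete, here is a minimal Python sketch of the matching logic. It is a toy model, not the controller's actual code: the Pod class, the endpoints_for function, and the sample IPs are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict
    ready: bool


def endpoints_for(selector: dict, pods: list, target_port: int) -> list:
    """Return "ip:port" for every Ready pod whose labels contain the selector.

    A Service with no selector at all is a different case (its endpoints are
    managed by hand) and is not modeled here.
    """
    return [
        f"{pod.ip}:{target_port}"
        for pod in pods
        if pod.ready and all(pod.labels.get(k) == v for k, v in selector.items())
    ]


pods = [
    Pod("web-1", "10.0.1.4", {"app": "web"}, ready=True),
    Pod("web-2", "10.0.2.7", {"app": "web"}, ready=False),     # failing readiness probe
    Pod("api-1", "10.0.1.9", {"app": "web-api"}, ready=True),  # label does not match
]

print(endpoints_for({"app": "web"}, pods, 8080))   # only web-1 qualifies
print(endpoints_for({"app": "webb"}, pods, 8080))  # typo in the selector: no backends
```

The second call is the failure mode described above in miniature: the pods are healthy, but a one-character selector mismatch leaves the backend list empty.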

How the ClusterIP actually works: kube-proxy, iptables, and IPVS

Here is the part that surprises people: the ClusterIP is not a real interface anywhere. No process listens on it. No pod has it. You cannot ping it in a way that hits a server. It is a virtual IP that exists only as a set of packet-rewriting rules on every node.

The component that programs those rules is kube-proxy, which runs on every node and watches Services and EndpointSlices through the API server. When it sees a Service, it programs the node's kernel so that any packet destined for the ClusterIP:port gets its destination rewritten (DNAT) to one of the backing pod IPs, chosen roughly at random. That rewrite happens in the kernel, on the node where the traffic originates, before the packet ever leaves. So load-balancing is not done by a central proxy - it is done independently on every node, in the data path.

Kube-proxy has two main modes for this:

  • iptables mode (long the default) - it writes iptables rules. For each Service it adds a rule that matches the ClusterIP and, using probability-based rules, DNATs to one of the endpoint IPs. Simple and robust, but the rule set grows linearly with the number of Services and endpoints, and rule evaluation is a sequential chain, so very large clusters (tens of thousands of Services) can see latency and slow rule-sync.
  • IPVS mode - it uses the kernel's IP Virtual Server, a hash-table-based load balancer built for exactly this. It scales to far more Services with roughly constant lookup cost and offers real balancing algorithms (round-robin, least-connection, and others). On large clusters, IPVS has long been the usual answer; recent Kubernetes releases also ship an nftables mode, so check what your distribution recommends.

You rarely configure this by hand, but knowing it explains real behavior. It is why, in iptables mode, you cannot ping a ClusterIP meaningfully (nothing answers ICMP for it, only the DNAT rules exist; IPVS mode binds the address to a local dummy interface, so a ping there may be answered by the node itself and still proves nothing about the backends). It is why balancing is per-connection, not per-request (so a long-lived HTTP/2 or gRPC connection sticks to one pod - a classic "why is one replica getting all the traffic" surprise). And it is why, when kube-proxy is unhealthy on a node, Services work fine from every other node but not that one.
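
The "probability-based rules" of iptables mode are easier to trust once the arithmetic is written out. The sketch below assumes the commonly described scheme: the rules for a Service are tried in order, rule i of n matches with probability 1/(n - i), and the last rule always matches. The function names are illustrative, and this models only the math, not kube-proxy itself.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Per-rule match probability for a chain of n endpoint rules, tried in order."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n: int) -> list:
    """Overall chance each endpoint is picked, accounting for earlier rules."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)  # the packet reached this rule AND matched it
        remaining *= 1 - p            # otherwise it falls through to the next rule
    return shares


print(rule_probabilities(3))  # 1/3, then 1/2, then 1 (the last rule always matches)
print(effective_share(3))     # every endpoint ends up with an equal 1/3 share
```

The per-rule probabilities look lopsided, but the fall-through makes the overall split even. The choice is made once per new connection, which is why a single long-lived connection never rebalances.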

NodePort and LoadBalancer: getting traffic in from outside

ClusterIP is internal only. To reach a Service from outside the cluster you layer on top of it:

  • NodePort - opens the same high-numbered port (default range 30000-32767) on every node's IP, and traffic to nodeIP:nodePort is forwarded (by the same kube-proxy rules) to the Service and on to a pod. It works from anywhere that can reach a node, which makes it useful for bare-metal or dev, but it is crude: ugly ports, you must know node IPs, and you usually want something in front of it.
  • LoadBalancer - a superset of NodePort. On a managed cluster (EKS, GKE, AKS) creating a type: LoadBalancer Service tells the cloud-controller-manager to provision a real cloud load balancer with an external IP or hostname, pointing at the NodePorts under the hood. This is how you put a single service on the public internet.

The important design point: you do not give every microservice its own LoadBalancer. Each cloud load balancer costs money and gives you a bare L4 IP with no HTTP smarts. For HTTP traffic you provision one load balancer, point it at an Ingress or Gateway, and route everything through that single entry point. More on that below.

Service DNS: how names become IPs

Nobody hardcodes a ClusterIP either - the ClusterIP is stable, but you still want to refer to services by name. Kubernetes runs an in-cluster DNS server (CoreDNS, as a Deployment in kube-system), and the kubelet configures every pod's /etc/resolv.conf to use it. So inside any pod, a Service name resolves to that Service's ClusterIP automatically.

The naming follows a strict, predictable pattern. A Service named web in namespace shop is reachable as:

  • web - from a pod in the same namespace (shop). The search domains in resolv.conf fill in the rest.
  • web.shop - from any namespace, when you want to be explicit about which namespace.
  • web.shop.svc.cluster.local - the fully qualified domain name (FQDN). This is the canonical form: <service>.<namespace>.svc.cluster.local. Everything else is shorthand that the search-domain rules expand into this.

# from inside a pod
nslookup web                              # same namespace, short name
nslookup web.shop.svc.cluster.local       # any namespace, FQDN
cat /etc/resolv.conf                       # see the nameserver + search domains

The svc piece matters: it distinguishes Service DNS from pod DNS (a per-pod record of the form <pod-ip-with-dashes>.<namespace>.pod.cluster.local also exists but is rarely used). The practical rule: use the short name for same-namespace calls, and the FQDN (or at least service.namespace) for cross-namespace calls. A subtle bug source is the search domains: a short name is expanded against the calling pod's own namespace first, so web resolves to shop's Service from a pod in shop but to a different Service (or to nothing) from a pod in another namespace. When in doubt, spell out the FQDN.
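
The search-domain expansion can be sketched in a few lines of Python. This assumes the typical pod resolv.conf (search list of <namespace>.svc.cluster.local, svc.cluster.local, cluster.local, with options ndots:5) and a cluster domain of cluster.local; real pods may inherit extra search domains from the node, and candidate_names is a hypothetical helper, not a library function.

```python
def candidate_names(name: str, search: list, ndots: int = 5) -> list:
    """Order in which a stub resolver would try names under resolv.conf rules."""
    if name.endswith("."):
        return [name]                    # trailing dot: already absolute, no expansion
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded         # "enough" dots: try the name as written first
    return expanded + [name]             # otherwise walk the search list first


shop_search = ["shop.svc.cluster.local", "svc.cluster.local", "cluster.local"]
billing_search = ["billing.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(candidate_names("web", shop_search))       # first try: web.shop.svc.cluster.local
print(candidate_names("web", billing_search))    # first try: web.billing.svc.cluster.local
print(candidate_names("web.shop", billing_search))
```

The same short name web leads to a different first candidate depending on the caller's namespace, while web.shop reaches web.shop.svc.cluster.local on the second attempt from anywhere.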

Headless services: when you do not want a virtual IP

Sometimes the ClusterIP indirection is exactly what you do NOT want. If you set clusterIP: None, you get a headless Service: no virtual IP, no kube-proxy load-balancing. Instead, a DNS query for the Service name returns the A/AAAA records of the individual pod IPs directly, one record per ready endpoint.

apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None      # headless
  selector:
    app: cassandra
  ports:
    - port: 9042

You want this when the client needs to see and address the individual pods rather than a random one behind a VIP. The main cases:

  • StatefulSets, where each pod has a stable identity. A headless Service gives every pod a stable per-pod DNS name (cassandra-0.cassandra.shop.svc.cluster.local), so peers can find each other by ordinal. Databases and clustered systems need this.
  • Client-side load balancing or service discovery, where the client wants the full list of backend IPs and will do its own connection management (common with gRPC clients, or systems that maintain a connection pool to every peer).

If you resolve a normal Service you get one IP (the ClusterIP). Resolve a headless Service and you get the whole set of pod IPs. That difference is the entire point.
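
From a client's point of view the difference is just how many addresses come back from one lookup. A minimal sketch using Python's standard socket.getaddrinfo follows; resolve_all is an illustrative helper, and the in-cluster hostnames in the comments assume the web and cassandra Services from this guide exist in the shop namespace.

```python
import socket


def resolve_all(host: str, port: int) -> list:
    """Return every distinct address the resolver reports for host, sorted."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


# Inside a cluster (assumed names, not run here):
#   resolve_all("web.shop.svc.cluster.local", 80)          # normal Service: the ClusterIP
#   resolve_all("cassandra.shop.svc.cluster.local", 9042)  # headless: one address per ready pod
print(resolve_all("127.0.0.1", 9042))  # an IP literal needs no DNS, so this runs anywhere
```

A client doing its own load balancing would call something like this periodically and open a connection to each address returned.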

Ingress and the Gateway API: L7 routing at the edge

Services get you L4 connectivity - IP and port. But real HTTP traffic needs L7 decisions: route by hostname, route by URL path, terminate TLS, rewrite headers. That is what Ingress and the newer Gateway API are for. The core idea is the one from the LoadBalancer section: put one thing at the edge that fans out to many backend Services, instead of exposing each service separately.

Ingress

An Ingress is a set of HTTP routing rules: for this host and this path prefix, send traffic to this Service. It is the config; the thing that enforces it is an ingress controller (ingress-nginx, Traefik, HAProxy, or a cloud one like the AWS Load Balancer Controller). The controller is itself a pod (or set of pods) running a real reverse proxy, typically sitting behind a single LoadBalancer Service. Crucially, an Ingress resource does nothing without a controller installed - the object just sits in etcd. This trips people up constantly: they kubectl apply an Ingress and nothing happens, because no controller is watching.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: store
spec:
  ingressClassName: nginx        # which controller handles this
  tls:
    - hosts: [shop.example.com]
      secretName: shop-tls        # TLS cert/key, terminated here
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api
                port: { number: 80 }
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port: { number: 80 }

That single Ingress, behind one cloud load balancer, routes shop.example.com/api/* to the api Service and everything else to web, and terminates HTTPS using the cert in the shop-tls Secret. Add ten more services and you add ten more path rules, not ten more load balancers. That is the whole value proposition: one public entry point, one TLS termination point, host- and path-based fan-out to many internal ClusterIP Services.
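
How pathType: Prefix chooses a backend can be sketched as follows. This models the documented behavior (matching is done per path element, and the longest matching path wins) in plain Python; prefix_matches and pick_backend are hypothetical names, and real controllers may differ in details such as tie-breaking.

```python
def path_elements(path: str) -> list:
    """Split a URL path into its non-empty elements."""
    return [part for part in path.split("/") if part]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Element-wise prefix match: /api matches /api and /api/v1, but not /apiary."""
    rule = path_elements(rule_path)
    return path_elements(request_path)[: len(rule)] == rule


def pick_backend(rules: dict, request_path: str):
    """Return the Service for the longest matching rule path, or None."""
    matching = [path for path in rules if prefix_matches(path, request_path)]
    if not matching:
        return None
    return rules[max(matching, key=lambda p: len(path_elements(p)))]


rules = {"/api": "api", "/": "web"}        # the two paths from the store Ingress
print(pick_backend(rules, "/api/orders"))  # api
print(pick_backend(rules, "/apiary"))      # web
print(pick_backend(rules, "/cart"))        # web
```

Note that /apiary falls through to web: a Prefix rule matches whole path elements, not raw string prefixes.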

Gateway API: the successor

Ingress carried the ecosystem for years but has real limits: it is HTTP-centric, and every controller invented its own annotations for anything beyond basic routing (rewrites, timeouts, auth), so Ingress manifests became non-portable annotation soup. The Gateway API is the newer, now-graduated replacement. It splits the one Ingress object into a few role-oriented resources:

  • GatewayClass - the type of gateway (the counterpart of IngressClass, cluster-scoped).
  • Gateway - an actual listener: ports, protocols, TLS. Owned by cluster/infra operators.
  • HTTPRoute (and TCPRoute, GRPCRoute, and so on) - the routing rules, attached to a Gateway. Owned by app teams.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: store-gateway
  hostnames: ["shop.example.com"]
  rules:
    - matches:
        - path: { type: PathPrefix, value: /api }
      backendRefs:
        - name: api
          port: 80

The wins over Ingress: it is protocol-aware beyond HTTP, advanced features (header matching, traffic splitting for canaries, rewrites) are first-class fields instead of vendor annotations, and the role split lets platform teams own Gateways while app teams own their own routes without editing a shared object. For a new cluster today, reach for the Gateway API if your controller supports it; Ingress is still everywhere and perfectly fine for straightforward host/path routing.

NetworkPolicies: closing the default-open network

Remember the base model: flat network, every pod can reach every other pod. That default is open - by design, out of the box, any pod can open a connection to any other pod in the cluster, across namespaces. That is convenient and completely unacceptable for anything past a toy cluster. A compromised frontend pod should not be able to talk straight to the payments database.

A NetworkPolicy restricts that. The mechanics have one rule you must internalize: NetworkPolicies are additive allow-lists, and the moment a pod is selected by any policy, it flips from allow-all to deny-all for the direction(s) that policy specifies. A pod with no policy selecting it is wide open; a pod selected by even one ingress policy now allows only the ingress that policy permits. There is no explicit "deny" - you deny by selecting and then not allowing.

The canonical starting move is a default-deny for a namespace, then open specific paths on top:

# Deny all ingress to every pod in this namespace...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: shop
spec:
  podSelector: {}          # {} selects every pod in the namespace
  policyTypes: [Ingress]
---
# ...then allow ONLY the api pods to reach the db pods on 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-api
  namespace: shop
spec:
  podSelector:
    matchLabels: { app: db }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: api }
      ports:
        - protocol: TCP
          port: 5432

ingress controls who may connect to the selected pods; egress controls where the selected pods may connect out to. You select traffic sources/destinations by podSelector (pods in the same namespace), namespaceSelector (whole namespaces, the usual way to isolate one namespace from another), or ipBlock (CIDR ranges, for off-cluster endpoints). One thing that bites people: DNS is egress too. If you write a default-deny-egress policy and forget to allow UDP/TCP 53 to CoreDNS, every name lookup in that pod fails and the app looks broken in a way that has nothing to do with the app.

One caveat worth stating plainly: NetworkPolicies are enforced by the CNI plugin, not by Kubernetes itself. Calico and Cilium enforce them; some minimal CNIs silently ignore them, so your policy applies cleanly and enforces nothing. Confirm your CNI supports NetworkPolicy before you rely on it as a security control.
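
The additive allow-list rule is compact enough to model directly. The sketch below is a simplification under stated assumptions: a single namespace, ingress only, and sources selected by podSelector labels alone. IngressPolicy and ingress_allowed are illustrative names, not part of any Kubernetes client library.

```python
from dataclasses import dataclass, field


@dataclass
class IngressPolicy:
    selects: dict                                   # podSelector labels; {} selects every pod
    allow_from: list = field(default_factory=list)  # (source labels, port) pairs


def matches(selector: dict, labels: dict) -> bool:
    """True when labels contain every key/value in selector."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, src_labels, dst_labels, port) -> bool:
    selecting = [p for p in policies if matches(p.selects, dst_labels)]
    if not selecting:
        return True  # no policy selects the destination pod: default-open
    # Selected by at least one policy: allowed only if SOME policy permits it.
    return any(
        matches(src_selector, src_labels) and port == allowed_port
        for policy in selecting
        for src_selector, allowed_port in policy.allow_from
    )


policies = [
    IngressPolicy(selects={}),  # default-deny-ingress: selects all, allows nothing
    IngressPolicy(selects={"app": "db"}, allow_from=[({"app": "api"}, 5432)]),  # db-allow-api
]
print(ingress_allowed(policies, {"app": "api"}, {"app": "db"}, 5432))  # True
print(ingress_allowed(policies, {"app": "web"}, {"app": "db"}, 5432))  # False
print(ingress_allowed([], {"app": "web"}, {"app": "db"}, 5432))        # True
```

There is no deny branch anywhere in the function: traffic is refused only because a policy selected the pod and no rule allowed the connection.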

Debugging: following the packet down the layers

Almost every Kubernetes networking incident is one layer of the path being broken while the others are fine. The discipline that resolves them fast is to stop guessing and test each layer top-down, isolating where the request actually dies. Here is the method, and the classic bug it catches.

The classic: "the Service returns nothing but the pods are healthy"

The pods are Running, 1/1 Ready, logs look clean, and when you kubectl exec into a pod the app responds on its port - yet hitting the Service hangs or gets connection refused. This is the single most common networking bug, and it is almost always an empty endpoint list: the Service selector does not match the pods, or the pods are not Ready, so the Service is a valid IP pointing at nobody.

The one command that diagnoses it instantly:

kubectl get endpoints web
# NAME   ENDPOINTS                        AGE
# web    <none>                           7m      <- the smoking gun

<none> means zero backends. Now find out why:

kubectl get endpointslices -l kubernetes.io/service-name=web   # confirm, modern object
kubectl describe svc web | grep -i selector                    # what does the Service select?
kubectl get pods --show-labels                                 # what labels do pods actually have?

Line up the Service's selector against the pods' labels. Nine times out of ten it is a mismatch - app: web on the Service versus app: web-api on the pods, or a tier: frontend the Service demands and the pods lack. Fix the labels or the selector so they agree, and the endpoint list populates immediately. The other cause of an empty list is pods that are not Ready (a failing readiness probe pulls them out of endpoints), which kubectl get pods shows as 0/1.

The layered method for everything else

When it is not an empty endpoint list, walk the path from the top and isolate the failing layer with port-forward, which lets you skip layers deliberately:

  1. Is the app itself even serving? Bypass Kubernetes networking entirely and forward straight to a pod. If this fails, it is not a networking problem, it is the app or its port.

    kubectl port-forward pod/<pod-name> 8080:8080
    curl localhost:8080/healthz
    
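  2. Does the Service resolve to the pod? Forward to the Service instead of the pod. kubectl picks a pod through the Service's selector and maps the Service port to its targetPort, so this exercises the selector and port mapping but not the ClusterIP or kube-proxy rules (the in-cluster curl in step 3 covers those). If step 1 worked but this does not, the break is in the Service definition - re-check endpoints and selector as above, then the targetPort.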

    kubectl port-forward svc/web 8080:80
    curl localhost:8080/healthz
    
  3. Does DNS resolve? From inside another pod, check that the name resolves to the Service's ClusterIP. If resolution fails or returns the wrong IP, it is a DNS problem (wrong namespace, CoreDNS unhealthy, or a NetworkPolicy blocking port 53), not a routing problem.

    kubectl run tmp --rm -it --image=nicolaka/netshoot -- /bin/bash
    # then inside:
    nslookup web.shop.svc.cluster.local
    curl http://web.shop.svc.cluster.local/healthz
    
  4. Is a NetworkPolicy blocking it? If DNS resolves and the pod is healthy but the connection still fails from a specific source pod, suspect policy. List policies touching the target and check whether the source pod is actually allowed.

    kubectl get networkpolicy -n shop
    kubectl describe networkpolicy <name> -n shop
    
  5. Is it the edge (Ingress/Gateway)? If in-cluster access works (steps 1-3 pass) but external access does not, the problem is the Ingress or Gateway layer: check the controller pod's logs, confirm the ingressClassName matches an installed controller, and verify the TLS Secret exists.

    kubectl describe ingress store
    kubectl logs -n ingress-nginx deploy/ingress-nginx-controller
    

The nicolaka/netshoot image in step 3 is worth committing to memory - it is a throwaway pod packed with dig, curl, nslookup, tcpdump, and many other network tools, so you can test from inside the cluster's network exactly as a real client pod would.

The whole method is one habit: do not debate which layer is broken, test them from the top and let one fail. Endpoints empty -> selector/readiness. Pod-forward works but Service-forward does not -> Service selector or targetPort. Name will not resolve -> DNS. Name resolves but the ClusterIP fails from one node only -> kube-proxy on that node. Everything resolves but one source is refused -> NetworkPolicy. In-cluster fine but outside broken -> Ingress/Gateway. Each layer has one command that proves or clears it, and the packet only travels one path.
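
The habit can be written down as a tiny decision procedure. This is only a sketch of the ordering: the check names are hypothetical labels for the manual kubectl steps above, and you would fill in the results by hand.

```python
def failing_layer(checks: dict) -> str:
    """Map the first failed check, in top-down order, to the layer to inspect.

    Checks that were not recorded are treated as passed.
    """
    order = [
        ("endpoints_populated", "selector/readiness"),
        ("pod_forward_ok", "app or its port"),
        ("service_forward_ok", "Service"),
        ("dns_resolves", "DNS"),
        ("in_cluster_connect_ok", "NetworkPolicy"),
        ("external_ok", "Ingress/Gateway"),
    ]
    for check, layer in order:
        if not checks.get(check, True):
            return layer
    return "no failing layer found"


print(failing_layer({"endpoints_populated": False}))                  # selector/readiness
print(failing_layer({"pod_forward_ok": True, "dns_resolves": False})) # DNS
print(failing_layer({"external_ok": False}))                          # Ingress/Gateway
```

The fixed order is the point: a failure higher in the list makes the results of every later check uninformative, so the first failing check names the layer to inspect.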

If you want to pressure-test this on a realistic, nasty version of the problem, the intermittent Service DNS challenge drops you into exactly the kind of "it works most of the time" networking failure where the layered method earns its keep.

The shape of it

A request reaches your pod by descending a stack of layers, and every Kubernetes networking object is one of those layers. The flat pod network gives every pod an IP but makes those IPs disposable, so a Service puts a stable virtual IP in front, tracking the live pods through its selector and the resulting EndpointSlices, with kube-proxy programming iptables or IPVS on every node to rewrite ClusterIP traffic to a real pod. Cluster DNS turns names into those ClusterIPs on the predictable service.namespace.svc.cluster.local pattern, or hands back raw pod IPs when the Service is headless. At the edge, Ingress or the Gateway API gives you one L7 entry point that fans HTTP traffic out to many Services and terminates TLS, instead of a load balancer per service. NetworkPolicies close the default-open network back down to an allow-list. And when any of it breaks, the move is never to guess: test the layers top-down - endpoints, pod-forward, service-forward, DNS, policy, edge - and let exactly one of them fail.