Linux Networking for DevOps

Read interface config, check what is listening, follow a route, resolve DNS, and find where a connection breaks. ip and ss, routing, DNS, ports, firewalls, and a layered way to debug 'it cannot connect' - the network skills DevOps actually uses.

"It cannot reach the database." "The health check times out." "DNS is broken." Half of all production incidents are network problems wearing a different costume, and the engineer who can methodically find where a connection dies - is it the interface, the route, DNS, the firewall, or the service itself - fixes them while everyone else guesses.

This is the working set: the commands to inspect the network, and a layered method to locate the break.

The modern toolkit: `ip` and `ss`

First, drop the old commands. ifconfig, netstat, and route are deprecated and often not installed on modern distros. The replacements live in the iproute2 package and are what you should reach for:

Old (deprecated)	Use instead
`ifconfig`	`ip addr`, `ip link`
`route`	`ip route`
`netstat`	`ss`
`arp`	`ip neigh`

Interfaces and addresses: `ip`

ip addr                      # every interface and its IP addresses (most used)
ip -br addr                  # brief: one tidy line per interface
ip link                      # interfaces and their state (UP/DOWN), MAC, MTU
ip addr show eth0            # just one interface

Reading ip addr: each interface shows its state (UP/DOWN), its inet (IPv4) and inet6 (IPv6) addresses with the CIDR prefix (/24), and lo is the loopback (127.0.0.1). If a service should be reachable but is not, the first check is whether the interface is UP and has the address you expect.
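That first check can be scripted. The following is a minimal sketch, not a standard tool: iface_problems is a hypothetical helper that reads ip -br addr output and flags any non-loopback interface that is not UP or has no IPv4 address. The sample text is made up so the sketch runs anywhere.

```shell
# iface_problems: hypothetical helper. Reads `ip -br addr` output on stdin
# and prints one line per non-loopback interface that looks wrong.
iface_problems() {
  awk '
    $1 == "lo" { next }                          # loopback normally reports state UNKNOWN
    $2 != "UP" { print $1 ": state " $2; next }  # DOWN, UNKNOWN, LOWERLAYERDOWN, ...
    {
      has_v4 = 0
      for (i = 3; i <= NF; i++)
        if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) has_v4 = 1
      if (!has_v4) print $1 ": UP but no IPv4 address"
    }
  '
}

# Made-up sample input in the shape `ip -br addr` prints.
sample='lo               UNKNOWN        127.0.0.1/8 ::1/128
eth0             UP             10.0.0.5/24 fe80::5054:ff:fe12:3456/64
eth1             DOWN
eth2             UP             fe80::5054:ff:fe65:4321/64'

printf '%s\n' "$sample" | iface_problems
```

On a real host the equivalent call would be ip -br addr | iface_problems. An empty result only means nothing looked obviously wrong; it does not prove the address is the one you expect.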

Bring an interface up or assign an address temporarily (lost on reboot):

sudo ip link set eth0 up
sudo ip addr add 10.0.0.5/24 dev eth0

Permanent config lives in distro-specific places: Netplan (/etc/netplan/*.yaml) on modern Ubuntu, NetworkManager on RHEL/Fedora, or /etc/network/interfaces on older Debian. On a cloud VM you rarely touch these - the provider hands out addresses via DHCP - but you need to know where they are when static config is required.

Routing: how packets leave the box

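A packet to an address the machine does not own is sent out according to the routing table: straight to the destination if it sits on a directly connected network, otherwise to a router (the gateway):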

ip route                     # the routing table
ip route get 8.8.8.8         # show exactly which route a packet would take

The line that starts with default is the gateway - where anything not on the local network goes. ip route get is the underused gem: it tells you the exact interface and source IP a packet to a given destination would use, which instantly answers "why can't this box reach that network?" If there is no route, that is your answer.
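To make the "find the default line" step concrete, here is a small sketch that pulls the gateway and interface out of ip route output. default_gateway is a hypothetical helper name, and the sample routing table is invented for illustration.

```shell
# default_gateway: hypothetical helper. Reads `ip route` output on stdin and
# prints the gateway and interface of the first default route it finds.
default_gateway() {
  awk '
    $1 == "default" {
      gw = "(none, on-link)"; dev = "?"
      for (i = 2; i < NF; i++) {
        if ($i == "via") gw  = $(i + 1)
        if ($i == "dev") dev = $(i + 1)
      }
      print "gateway=" gw " dev=" dev
      found = 1
      exit
    }
    END { if (!found) { print "no default route"; exit 1 } }
  '
}

# Made-up sample in the shape `ip route` prints.
routes='default via 10.0.0.1 dev eth0 proto dhcp src 10.0.0.5 metric 100
10.0.0.0/24 dev eth0 proto kernel scope link src 10.0.0.5'

printf '%s\n' "$routes" | default_gateway
```

On a live machine you would run ip route | default_gateway. A "no default route" result explains why the box can talk to its own subnet but nothing beyond it.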

DNS resolution

A name like db.internal is useless until it resolves to an IP. The resolution path:

/etc/hosts - static overrides are checked first. A stale entry here is a classic "it resolves to the wrong IP" bug.
The resolver config - /etc/resolv.conf lists the nameserver IPs to query. On systems using systemd-resolved, this is usually a symlink to a stub file that lists only 127.0.0.53 (the local stub resolver), and the real upstream servers are shown by resolvectl status.

Query DNS directly to separate name resolution from everything else:

dig db.internal              # full DNS query and answer (install dnsutils/bind-utils)
dig +short db.internal       # just the resolved IP
dig @8.8.8.8 example.com     # query a SPECIFIC server, bypassing local config
resolvectl status            # what systemd-resolved is actually using
getent hosts db.internal     # resolve the way the SYSTEM does (honors /etc/hosts + nsswitch)

The most useful distinction: dig asks DNS directly, while getent hosts follows the full system resolution path including /etc/hosts and /etc/nsswitch.conf. If dig works but your app does not resolve a name, the difference is usually /etc/hosts or nsswitch - check getent.
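When dig and getent disagree, a quick way to confirm a stale override is to look the name up in the hosts file directly. This is a rough sketch: hosts_lookup is a hypothetical helper, and the fixture file below stands in for /etc/hosts so nothing on your system is touched.

```shell
# hosts_lookup NAME [FILE]: hypothetical helper. Prints every address that a
# hosts-format file maps NAME to. FILE defaults to /etc/hosts.
hosts_lookup() {
  awk -v name="$1" '
    { sub(/#.*/, "") }                           # strip comments
    {
      for (i = 2; i <= NF; i++)                  # field 1 is the IP, the rest are names
        if (tolower($i) == tolower(name)) print $1
    }
  ' "${2:-/etc/hosts}"
}

# Made-up fixture standing in for /etc/hosts.
fixture=$(mktemp)
cat > "$fixture" <<'EOF'
127.0.0.1   localhost
10.0.0.7    db.internal db    # old address, never cleaned up
EOF

hosts_lookup db.internal "$fixture"
rm -f "$fixture"
```

If hosts_lookup db.internal prints an address that differs from dig +short db.internal, the hosts file is overriding DNS, and that is the address your app will normally get.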

What is listening: ports and sockets with `ss`

When a service will not accept connections, find out if it is even listening, and on which address:

ss -tlnp                     # TCP, listening, numeric, with process (the everyday command)
ss -tlnp | grep :5432        # who is on the Postgres port
ss -tunp                     # connected TCP and UDP sockets with processes (add -a to include listening)
ss -tn state established     # current established TCP connections

The -tlnp flags: TCP, listening, numeric (no slow name lookups), process. Process names appear only for sockets you own unless you run ss with sudo. The address column is what matters most: a service bound to 127.0.0.1:8080 is reachable only from the same machine, while 0.0.0.0:8080 (or *:8080) is reachable from the network. A huge share of "the service is up but I cannot connect" incidents come down to a process bound to localhost when it needed to bind to all interfaces.
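Reading the address column can be reduced to a three-way classification. The sketch below assumes the value has the address:port shape that ss prints in its local address column; bind_scope is a hypothetical name, not part of ss.

```shell
# bind_scope ADDR:PORT: hypothetical helper. Classifies a local address value
# taken from `ss -tln` output.
bind_scope() {
  addr=${1%:*}                                   # strip the trailing :port
  case "$addr" in
    127.*|"[::1]")        echo "loopback-only" ;;     # same machine only
    0.0.0.0|"*"|"[::]")   echo "all-interfaces" ;;    # reachable from the network
    *)                    echo "specific-address" ;;  # reachable only via that IP
  esac
}

bind_scope 127.0.0.1:6379
bind_scope 0.0.0.0:8080
bind_scope '[::]:443'

# Possible use against live output (column 4 is the local address with -H -tln):
#   ss -Htln | awk '{ print $4 }' | while read -r a; do echo "$a $(bind_scope "$a")"; done
```

"all-interfaces" still says nothing about firewalls; it only tells you the process itself is not the thing restricting access.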

Firewalls: nftables, iptables, and the friendly front-ends

Linux packet filtering is nftables now (it replaced the older iptables), but you will meet all of these:

nftables (nft) - the current kernel firewall. sudo nft list ruleset dumps every rule.
iptables - the legacy interface. Still everywhere; on modern systems it is often a compatibility shim over nftables. sudo iptables -L -n -v lists rules.
ufw (Ubuntu) and firewalld (RHEL/Fedora) - friendly front-ends most people actually use day to day.

sudo nft list ruleset        # the real, full firewall state
sudo iptables -L -n -v       # legacy view (numeric, with counters)
sudo ufw status verbose      # Ubuntu's simple firewall
sudo firewall-cmd --list-all # firewalld's active zone and rules

Whether a connection is refused or times out is a strong diagnostic clue, and the firewall is a prime suspect either way. A REJECT rule fails the connection immediately (usually "connection refused", or "no route to host" with some reject types), while a DROP rule makes the connection hang and time out because the packet vanishes silently. Also remember the cloud layer: on AWS, GCP, or Azure, a Security Group or network ACL is a firewall too, and it is the one people forget.
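The fast-failure versus hang distinction is easy to capture in a script. This is a sketch under stated assumptions: probe and classify_connect are hypothetical helpers, the connect uses bash's /dev/tcp pseudo-device, and 124 is the exit status GNU coreutils timeout returns when the time limit expires.

```shell
# classify_connect EXIT_CODE: hypothetical helper. Maps the exit status of a
# timed connect attempt to the most likely situation.
classify_connect() {
  case "$1" in
    0)   echo "open" ;;                  # something accepted the connection
    124) echo "timeout" ;;               # hung until `timeout` expired: think DROP or a security group
    *)   echo "refused-or-rejected" ;;   # failed fast: nothing listening, or a REJECT rule
  esac
}

# probe HOST PORT [SECONDS]: hypothetical helper. Tries a TCP connect and
# classifies the result. A name that does not resolve also fails fast, so
# check DNS separately before trusting this.
probe() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
  classify_connect "$?"
}

# Example call (not executed here):
#   probe db.internal 5432
```

Treat the output as a hint about where to look, not a verdict: a "timeout" points at packet filtering or routing, while "refused-or-rejected" points at the target host or an explicit reject.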

Connectivity troubleshooting: a layered method

The skill that matters is not memorizing commands - it is checking the path in order, from the bottom up, so you isolate the break instead of guessing. The tools:

ping 10.0.0.5                # is the host reachable at the IP layer at all?
traceroute db.internal       # every hop on the way - where does it stop?
mtr db.internal              # ping + traceroute combined, live (the best of both)
curl -v https://api:8080/health   # full HTTP request with TLS and timing detail
nc -zv db.internal 5432      # can I open a TCP connection to that port? (no app needed)

The method, bottom of the stack to the top:

1. DNS - does the name resolve, and to the right IP? dig +short name and getent hosts name. If wrong, stop here.
2. Reachability - can you reach the IP at all? ping IP. (Note: ICMP is sometimes blocked, so a failed ping is not always conclusive - move to the next step.)
3. The port - is the specific TCP port open from where you are standing? nc -zv host port. This is the single most useful test: it proves the path is clear up to the service, without needing the application protocol.
4. The service - is the application actually responding correctly? curl -v for HTTP. Now you are past the network and into the app.

Run those four in order and you will pinpoint nearly any "cannot connect" to a specific layer: the name, the route, the firewall/port, or the service. The mistake everyone makes is jumping straight to step 4, restarting the app, when the break was a stale /etc/hosts entry or a closed Security Group at step 1 or 3.
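The four checks can be chained so the first failing layer is reported and nothing after it runs. This is one possible sketch, not a definitive tool: check_path is a hypothetical helper, it assumes getent, ping, nc, and curl are installed, and the timeouts are arbitrary choices.

```shell
# check_path HOST PORT [URL]: hypothetical helper. Runs the layered checks in
# order and stops at the first layer that fails.
check_path() {
  host=$1 port=$2 url=${3:-}

  # 1. DNS: resolve the way the system does (honors /etc/hosts and nsswitch).
  addr=$(getent hosts "$host" | awk 'NR == 1 { print $1 }')
  [ -n "$addr" ] || { echo "FAIL dns: $host does not resolve"; return 1; }
  echo "ok   dns: $host -> $addr"

  # 2. Reachability: informational only, because ICMP is sometimes blocked.
  if ping -c 1 -W 2 "$addr" >/dev/null 2>&1; then
    echo "ok   ping: $addr replies"
  else
    echo "warn ping: no reply from $addr (ICMP may be filtered)"
  fi

  # 3. Port: plain TCP connect, no application protocol involved.
  if ! nc -z -w 3 "$addr" "$port" >/dev/null 2>&1; then
    echo "FAIL port: cannot open $addr:$port"
    return 3
  fi
  echo "ok   port: $addr:$port accepts connections"

  # 4. Service: only attempted when a URL is given.
  if [ -n "$url" ] && ! curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "FAIL service: $url did not return a successful response"
    return 4
  fi
  echo "ok   all requested checks passed"
}

# Example calls (not executed here):
#   check_path cache 6379
#   check_path api 8080 https://api:8080/health
```

The ping step deliberately never aborts the run, matching the caveat in step 2. A failure at the port step does not say whether the cause is a firewall or a process that is not listening; that still needs ss -tlnp on the target host.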

A worked example

The app logs "connection refused" when talking to cache:6379. Walk it:

getent hosts cache           # 1. resolves? -> 10.0.0.9. Good, the name is fine.
ping 10.0.0.9                # 2. reachable? -> replies. The host is up and routable.
nc -zv 10.0.0.9 6379         # 3. port open? -> "connection refused"

"Connection refused" at step 3 means the host is reachable but nothing is listening on 6379 (or a firewall is actively rejecting). SSH to that host and check: ss -tlnp | grep 6379. If Redis is bound to 127.0.0.1:6379, there is your bug - it needs to listen on 0.0.0.0 or the internal IP. The whole incident is a five-minute trace once you go in order, and a two-hour guess if you do not.

What to actually remember

You do not need every flag. You need: ip addr and ip route to see the machine's network, ss -tlnp to see what is listening and on which address, dig/getent to separate DNS from everything else, the right firewall command for your distro, and the four-step layered check (DNS, reachability, port, service) to find where a connection dies. That set handles the overwhelming majority of real network incidents.
