Linux and DevOps Command Cheat Sheet
The Linux commands DevOps engineers actually run, in one scannable reference: files, permissions, processes, systemd, networking, packages, performance, disk, and logs. The commands you reach for at 2am, with links to the deep dives.
A working reference, not a textbook - the commands that come up over and over operating real systems. Skim it, keep it open, and follow the links to the full guides when you want the why behind a command.
Navigation and files
pwd # where am I
ls -lah # list, long format, all files, human sizes
cd - # jump back to the previous directory
find . -name "*.log" -mtime -1 # files matching a name, modified in the last day
find /var -type f -size +100M # files over 100MB
grep -rn "TODO" src/ # recursive search, with line numbers
grep -C 3 "error" app.log # match plus 3 lines of context
tree -L 2 # directory tree, 2 levels deep
du -sh * # size of each item in the current dir
df -h # disk space per filesystem
tar -czf out.tgz dir/ # create a gzipped archive
tar -xzf out.tgz # extract it
See the full walkthrough in Linux for DevOps: The Fundamentals.
Permissions and ownership
ls -l file # read the permission string
chmod 644 config.yml # rw-r--r-- normal file
chmod 755 script.sh # rwxr-xr-x executable/dir
chmod 600 ~/.ssh/id_ed25519 # rw------- private key
chmod u+x script.sh # add execute for the owner only
chown user:group file # change owner and group
chown -R appuser /opt/app # recursive
stat -c '%a' file # show permissions as octal (e.g. 644)
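The octal modes above map digit by digit onto the permission string that ls -l prints: one digit each for owner, group, and others, with 4 = read, 2 = write, 1 = execute. A minimal sketch of that mapping in POSIX shell; octal_to_rwx is a hypothetical helper name, not a standard tool:

```shell
# Translate an octal mode such as 644 into its rwx string.
# octal_to_rwx is a hypothetical helper for illustration only.
octal_to_rwx() {
  mode=$1
  out=""
  while [ -n "$mode" ]; do
    rest=${mode#?}            # everything after the first character
    digit=${mode%"$rest"}     # the first character on its own
    mode=$rest
    case $digit in
      0) out="${out}---" ;;
      1) out="${out}--x" ;;
      2) out="${out}-w-" ;;
      3) out="${out}-wx" ;;
      4) out="${out}r--" ;;
      5) out="${out}r-x" ;;
      6) out="${out}rw-" ;;
      7) out="${out}rwx" ;;
      *) echo "not an octal digit: $digit" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

octal_to_rwx 644   # rw-r--r--
octal_to_rwx 755   # rwxr-xr-x
```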
Full mental model: Linux File Permissions.
Processes and jobs
ps aux | grep nginx # find a process
pgrep -af python # PIDs (and command) matching a name
top # live process view (P = sort by CPU, M = by memory)
kill 1234 # SIGTERM - ask it to stop (try this first)
kill -9 1234 # SIGKILL - force, last resort
pkill -f "worker.py" # kill by full command line
command & # run in the background
nohup command & # background, survives logout
ss -tlnp | grep :8080 # what process holds a port
nice -n 10 ./heavy.sh # start a job at lower priority
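The "SIGTERM first, SIGKILL as a last resort" order can be wrapped in a small function. This is a sketch under assumptions, not a standard tool: stop_gracefully and its default 10-second grace period are illustrative choices.

```shell
# Ask a process to stop, wait up to N seconds, then force it.
# stop_gracefully and the 10-second default are hypothetical choices.
stop_gracefully() {
  pid=$1
  grace=${2:-10}
  kill -TERM "$pid" 2>/dev/null || return 0    # no such process: nothing to do
  while [ "$grace" -gt 0 ]; do
    # kill -0 sends no signal; it only tests whether the PID still exists
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    grace=$((grace - 1))
  done
  kill -KILL "$pid" 2>/dev/null || true        # grace period used up: force it
}
```

Usage: stop_gracefully 1234 30 gives PID 1234 thirty seconds to shut down cleanly before SIGKILL is sent.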
Signals, job control, zombies: Process and Job Management.
Services with systemd
systemctl status nginx # is it running, recent logs, PID
systemctl start|stop|restart nginx
systemctl reload nginx # re-read config without dropping connections
systemctl enable --now nginx # start now AND on every boot
systemctl disable --now nginx
systemctl is-enabled nginx # will it start on boot?
systemctl daemon-reload # after editing a unit file - do NOT forget this
journalctl -u nginx -e # the service's logs, jump to the end
systemctl list-units --failed # everything that failed
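After editing a unit file, the commands above usually run in a fixed order: reload systemd, restart the service, confirm it is active, and read the logs if it is not. A hedged sketch of that sequence; apply_unit_change is a hypothetical helper name, and the 20-line log tail is an arbitrary choice:

```shell
# Reload systemd, restart one unit, and report whether it came back up.
# apply_unit_change is a hypothetical helper for illustration only.
apply_unit_change() {
  unit=$1
  sudo systemctl daemon-reload || return 1     # pick up the edited unit file
  sudo systemctl restart "$unit" || {
    journalctl -u "$unit" -n 20 --no-pager     # show why the restart failed
    return 1
  }
  systemctl is-active --quiet "$unit"          # exit status 0 only if running
}
```

Usage: apply_unit_change nginx && echo "nginx is up".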
Writing units, restart policies, debugging: Service Management With systemd.
Networking
ip -br addr # interfaces and their IPs, one line each
ip route # the routing table
ip route get 8.8.8.8 # which route a packet would actually take
ss -tlnp # listening TCP ports, with the owning process
dig +short db.internal # resolve a name via DNS
getent hosts db.internal # resolve the way the SYSTEM does (honors /etc/hosts)
ping 10.0.0.5 # is the host reachable
nc -zv host 5432 # is a TCP port open from here (no app needed)
curl -v https://api/health # full HTTP exchange: request/response headers and TLS detail
mtr db.internal # live traceroute + ping
sudo nft list ruleset # the firewall rules
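Several of the checks above chain naturally into one pass that stops at the first failing layer: name resolution, then routing, then the TCP port. A minimal sketch, assuming getent, ip, nc, and awk are installed; check_connectivity, its messages, and the 3-second timeout are hypothetical choices:

```shell
# Walk the layers in order and report the first one that fails.
# check_connectivity is a hypothetical helper for illustration only.
check_connectivity() {
  host=$1
  port=$2
  # Resolve the way the system does (honors /etc/hosts); keep the first address
  ip_addr=$(getent hosts "$host" | awk '{ print $1; exit }')
  [ -n "$ip_addr" ] || { echo "DNS: cannot resolve $host"; return 1; }
  ip route get "$ip_addr" >/dev/null 2>&1 ||
    { echo "ROUTE: no route to $ip_addr"; return 2; }
  nc -z -w 3 "$ip_addr" "$port" >/dev/null 2>&1 ||
    { echo "PORT: $ip_addr:$port closed or filtered"; return 3; }
  echo "OK: $host ($ip_addr) port $port reachable"
}
```

Usage: check_connectivity db.internal 5432. The return code (1, 2, or 3) identifies the layer that failed, which is handy in scripts.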
The layered "cannot connect" method: Linux Networking for DevOps.
Packages
# Debian / Ubuntu
sudo apt update && sudo apt install -y nginx
sudo apt install nginx=1.24.0-1ubuntu1 # pin a version (reproducible)
sudo apt-mark hold nginx # freeze it from upgrades
dpkg -S /usr/sbin/nginx # which package owns this file
# RHEL / Fedora / Amazon Linux
sudo dnf install -y nginx
dnf --showduplicates list nginx # versions available to pin
rpm -qf /usr/sbin/nginx # which package owns this file
apt vs dnf, repos, pinning, source builds: Linux Package Management.
Performance and troubleshooting
uptime # load average (1, 5, 15 min) - read the trend
top # CPU line: us=CPU work, wa=waiting on IO
free -h # memory - watch `available`, not `free`
vmstat 1 # si/so columns = swapping (out of real RAM)
iostat -xz 1 # per-disk IO; %util near 100 = saturated
iotop # which process is doing the disk IO
df -h # full filesystem - check this early, always
strace -c -p 1234 # what syscalls a process spends time in
lsof -i :8080 # which process uses a port
lsof -nP | grep deleted # deleted-but-open files (disk not freeing up)
sudo dmesg | grep -i oom # did the OOM killer kill something (dmesg needs root on many distros)
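The `available` column in free -h comes from the MemAvailable line in /proc/meminfo, the kernel's estimate of how much memory new work can claim without swapping. A small sketch that turns it into a percentage; mem_available_pct is a hypothetical helper, and the optional file argument exists only so it can be pointed at a saved copy of /proc/meminfo:

```shell
# Print available memory as a whole-number percentage of total memory.
# mem_available_pct is a hypothetical helper for illustration only.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0) printf "%d\n", avail * 100 / total
      else exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```

Usage: mem_available_pct prints a single number; a low value that keeps falling is a stronger warning sign than a small `free` figure.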
The systematic CPU/memory/IO sweep: Performance Monitoring and Troubleshooting.
Logs
journalctl -u app -f # follow a service's logs live
journalctl -u app --since "10 min ago"
journalctl -u app -p err # errors and worse only
journalctl -k # kernel messages
tail -f /var/log/nginx/error.log # follow a classic log file
ls -lt /var/log # most recently written logs (active ones)
grep -ri "error string" /var/log/ # search everything
zgrep error /var/log/syslog.1.gz # search rotated, gzipped logs
tail -f app.log | jq . # pretty-print streaming JSON logs
sudo journalctl --vacuum-time=7d # trim the journal to 7 days
journald, /var/log, rotation, incident reading: Linux Logging.
Disk and the "out of space" drill
df -h # which filesystem is full (start here)
du -sh /var/* | sort -h # biggest directories under /var
du -ah /var/log | sort -h | tail # the biggest files and directories under /var/log
lsof -nP | grep deleted # space held by deleted-but-open files
ncdu / # interactive disk usage explorer (if installed)
A full disk causes a dozen confusing symptoms; this drill finds the cause in a minute.
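The first step of the drill is easy to script for a cron job or a quick check. A minimal sketch that lists filesystems at or above a usage threshold, parsing the portable df -P format; full_filesystems and the default of 90 percent are hypothetical choices, and mount points containing spaces are not handled:

```shell
# List mount points whose usage is at or above a percentage threshold.
# full_filesystems and the 90% default are hypothetical choices.
full_filesystems() {
  threshold=${1:-90}
  # df -P columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
  df -P | awk -v limit="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)                       # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5  # mount point, then usage
    }'
}
```

Usage: full_filesystems 80 prints one line per filesystem at 80% or more, and prints nothing when all are below the threshold.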
Handy one-liners
# Most CPU-hungry processes, top 5
ps -eo pid,pcpu,comm --sort=-pcpu | head -6
# Most memory-hungry processes, top 5
ps -eo pid,pmem,comm --sort=-pmem | head -6
# Count established TCP connections, refreshed every 2 seconds (-H drops the header line)
watch -n 2 'ss -Htn state established | wc -l'
# Tail the last 50 lines of the 3 most recently changed logs in /var/log
tail -n 50 $(ls -1t /var/log/*.log | head -3)
# A process's working directory and open file descriptors
ls -l /proc/1234/cwd; ls -l /proc/1234/fd
Use it
Keep this open while you work; the muscle memory comes from running the commands, not reading about them. When a command here raises a "but why" - why SIGTERM before SIGKILL, why available and not free, why pin a package version - the linked guide has the full answer. Get fluent with this set and most day-to-day operations, and most "the server is broken" pages, become routine.