Linux and DevOps Command Cheat Sheet

The Linux commands DevOps engineers actually run, in one scannable reference: files, permissions, processes, systemd, networking, packages, performance, disk, and logs. The commands you reach for at 2am, with links to the deep dives.


This is a working reference, not a textbook: the commands that come up over and over when operating real systems. Skim it, keep it open, and follow the links to the full guides when you want the why behind a command.

Navigation and files

pwd                          # where am I
ls -lah                      # list, long format, all files, human sizes
cd -                         # jump back to the previous directory
find . -name "*.log" -mtime -1     # files matching a name, modified in last day
find /var -type f -size +100M      # files over 100MB
grep -rn "TODO" src/         # recursive search, with line numbers
grep -C 3 "error" app.log    # match plus 3 lines of context
tree -L 2                    # directory tree, 2 levels deep
du -sh *                     # size of each item in the current dir
df -h                        # disk space per filesystem
tar -czf out.tgz dir/        # create a gzipped archive
tar -xzf out.tgz             # extract it

See the full walkthrough in Linux for DevOps: The Fundamentals.
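
As a minimal sketch of how `find` and `tar` from the list above combine, the following archives only the logs modified in the last day. The scratch directory and file names are invented for illustration, and `touch -d` and `tar --null -T -` assume GNU tools.

```shell
set -eu

# Hypothetical scratch area so the example never touches real data.
work=$(mktemp -d)
mkdir -p "$work/logs"
echo "fresh" > "$work/logs/app.log"
echo "stale" > "$work/logs/old.log"
touch -d "3 days ago" "$work/logs/old.log"   # backdate one file (GNU touch)

cd "$work"

# -mtime -1 keeps only files modified within the last 24 hours.
recent=$(find logs -name "*.log" -mtime -1)
echo "recent: $recent"

# Feed the same selection to tar; -print0 / --null keep odd filenames intact.
find logs -name "*.log" -mtime -1 -print0 | tar -czf recent.tgz --null -T -
archived=$(tar -tzf recent.tgz)
echo "archived: $archived"
```

Listing the archive with `tar -tzf` before shipping or deleting anything is a cheap way to confirm the selection matched what you expected.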

Permissions and ownership

ls -l file                   # read the permission string
chmod 644 config.yml         # rw-r--r--  normal file
chmod 755 script.sh          # rwxr-xr-x  executable/dir
chmod 600 ~/.ssh/id_ed25519  # rw-------  private key
chmod u+x script.sh          # add execute for the owner only
chown user:group file        # change owner and group
chown -R appuser /opt/app    # recursive
stat -c '%a' file            # show permissions as octal (e.g. 644)

Full mental model: Linux File Permissions.
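
To see how the octal and symbolic forms above relate, here is a small sketch that changes the mode of a throwaway file and reads it back with `stat`. The temporary file is an assumption for the demo, and `stat -c` is the GNU form.

```shell
set -eu

f=$(mktemp)                                        # throwaway file for the demo

chmod 644 "$f"; mode_config=$(stat -c '%a' "$f")   # rw-r--r--
chmod 600 "$f"; mode_key=$(stat -c '%a' "$f")      # rw-------
chmod u+x "$f"; mode_ux=$(stat -c '%a' "$f")       # owner gains x: 600 becomes 700
symbolic=$(stat -c '%A' "$f")                      # the same bits as a permission string

echo "$mode_config $mode_key $mode_ux $symbolic"
rm -f "$f"
```

Each octal digit is the sum of read (4), write (2), and execute (1) for owner, group, and others in that order, which is why `u+x` on a 600 file lands on 700.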

Processes and jobs

ps aux | grep nginx          # find a process (grep's own line shows up too)
pgrep -af python             # PIDs (and command) matching a name
top                          # live process view (P = sort by CPU, M = by memory)
kill 1234                    # SIGTERM - ask it to stop (try this first)
kill -9 1234                 # SIGKILL - force, last resort
pkill -f "worker.py"         # kill by full command line
command &                    # run in the background
nohup command &              # background, survives logout
ss -tlnp | grep :8080        # what process holds a port
nice -n 10 ./heavy.sh        # start a job at lower priority

Signals, job control, zombies: Process and Job Management.
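
The "SIGTERM first, SIGKILL as a last resort" order above can be wrapped in a small helper. This is a sketch under assumptions: the function name `stop_gracefully` and the 5-second default are hypothetical, and a real service may need a longer grace period.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT seconds, then falls back to SIGKILL.
stop_gracefully() {
  pid=$1
  timeout=${2:-5}                            # assumed default, tune per service
  kill "$pid" 2>/dev/null || return 0        # SIGTERM; nothing to do if already gone
  i=0
  while [ "$i" -lt "$timeout" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only tests that the PID exists
    sleep 1
    i=$((i + 1))
  done
  kill -9 "$pid" 2>/dev/null || true         # last resort: cannot be caught or ignored
}
```

Usage would look like `stop_gracefully 1234 10`. Giving the process time to handle SIGTERM lets it flush buffers and close connections, which SIGKILL never allows.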

Services with systemd

systemctl status nginx       # is it running, recent logs, PID
systemctl start|stop|restart nginx
systemctl reload nginx       # re-read config without dropping connections
systemctl enable --now nginx # start now AND on every boot
systemctl disable --now nginx
systemctl is-enabled nginx   # will it start on boot?
systemctl daemon-reload      # after editing a unit file - do NOT forget this
journalctl -u nginx -e       # the service's logs, jump to the end
systemctl list-units --failed  # everything that failed

Writing units, restart policies, debugging: Service Management With systemd.
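
The order that matters after editing a unit file can be captured in a helper like the one below. It is only a sketch: `run`, `apply_unit_change`, and the `DRY_RUN` variable are hypothetical names, and the dry-run mode exists so the sequence can be previewed on a machine without systemd.

```shell
# run CMD...: executes the command, or only prints it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# apply_unit_change UNIT: the sequence to follow after editing a unit file.
apply_unit_change() {
  unit=$1
  run sudo systemctl daemon-reload      # make systemd re-read unit files
  run sudo systemctl restart "$unit"    # start the service with the new definition
  run systemctl is-active "$unit"       # prints the state, non-zero exit if not active
}
```

`DRY_RUN=1 apply_unit_change nginx` prints the three commands without running them. Skipping `daemon-reload` is the usual reason an edited unit appears to have no effect.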

Networking

ip -br addr                  # interfaces and their IPs, one line each
ip route                     # the routing table
ip route get 8.8.8.8         # which route a packet would actually take
ss -tlnp                     # listening TCP ports, with the owning process (sudo to see processes you don't own)
dig +short db.internal       # resolve a name via DNS
getent hosts db.internal     # resolve the way the SYSTEM does (honors /etc/hosts)
ping 10.0.0.5                # is the host reachable
nc -zv host 5432             # is a TCP port open from here (no app needed)
curl -v https://api/health   # verbose HTTP request: headers and TLS handshake detail
mtr db.internal              # live traceroute + ping
sudo nft list ruleset        # the firewall rules

The layered "cannot connect" method: Linux Networking for DevOps.
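
One way to read a layered check is "resolve the name, then try the port, and stop at the first layer that fails". The sketch below is an illustration of that idea, not the method from the linked guide. The function name `check_layers` is hypothetical, and it uses bash's `/dev/tcp` with `timeout` so that `nc` is not required.

```shell
# check_layers HOST PORT
# Walks the layers in order and stops at the first one that fails.
check_layers() {
  host=$1
  port=$2

  # 1. Name resolution, the way the system does it (honors /etc/hosts).
  if getent hosts "$host" >/dev/null; then
    echo "dns: ok"
  else
    echo "dns: FAIL"
    return 1
  fi

  # 2. TCP reachability: try to open a connection, give up after 3 seconds.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "tcp: ok"
  else
    echo "tcp: FAIL"
    return 1
  fi
}
```

Called as `check_layers db.internal 5432`, it tells you whether to look at DNS or at the network path and firewall before you start debugging the application itself.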

Packages

# Debian / Ubuntu
sudo apt update && sudo apt install -y nginx
sudo apt install nginx=1.24.0-1ubuntu1     # install an exact version (reproducible)
sudo apt-mark hold nginx                    # freeze it from upgrades
dpkg -S /usr/sbin/nginx                      # which package owns this file

# RHEL / Fedora / Amazon Linux
sudo dnf install -y nginx
dnf --showduplicates list nginx              # versions available to pin
rpm -qf /usr/sbin/nginx                       # which package owns this file

apt vs dnf, repos, pinning, source builds: Linux Package Management.
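
Because the two families answer the same questions with different tools, scripts that run on both often start by detecting which one is present. A minimal sketch, with hypothetical function names `pkg_family` and `owner_of`:

```shell
# pkg_family: prints debian, rhel, or unknown based on which tools exist.
pkg_family() {
  if command -v dpkg >/dev/null 2>&1; then
    echo debian
  elif command -v rpm >/dev/null 2>&1; then
    echo rhel
  else
    echo unknown
  fi
}

# owner_of FILE: which package owns this file, on either family.
owner_of() {
  case "$(pkg_family)" in
    debian) dpkg -S "$1" ;;
    rhel)   rpm -qf "$1" ;;
    *)      echo "no supported package manager found" >&2; return 1 ;;
  esac
}
```

`owner_of /usr/sbin/nginx` then works unchanged on Ubuntu and on Fedora. Checking for the low-level tool (`dpkg`, `rpm`) rather than the front end is a design choice here, since those are the commands the helper actually calls.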

Performance and troubleshooting

uptime                       # load average (1, 5, 15 min) - read the trend
top                          # CPU line: us=user-space CPU time, wa=waiting on IO
free -h                      # memory - watch `available`, not `free`
vmstat 1                     # non-zero si/so columns = actively swapping (short on RAM)
iostat -xz 1                 # per-disk IO; %util near 100 = saturated
iotop                        # which process is doing the disk IO
df -h                        # full filesystem - check this early, always
strace -c -p 1234            # what syscalls a process spends time in
lsof -i :8080                # which process uses a port
lsof -nP | grep deleted      # deleted-but-open files (disk not freeing up)
dmesg | grep -i oom          # did the OOM killer kill something

The systematic CPU/memory/IO sweep: Performance Monitoring and Troubleshooting.
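
The "watch `available`, not `free`" note can be made concrete by reading the same kernel counters that `free` uses. This sketch assumes a Linux `/proc/meminfo` with a `MemAvailable` line. The function name `mem_available_pct` is hypothetical, and the optional file argument exists only so it can be tried against sample data.

```shell
# mem_available_pct [FILE]
# Percentage of RAM the kernel estimates can be used without swapping.
mem_available_pct() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", avail * 100 / total }
  ' "${1:-/proc/meminfo}"
}
```

A box can show very little `MemFree` and still be healthy, because page cache counts as used but is reclaimable. `MemAvailable` includes that reclaimable memory, so it is the number to alert on.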

Logs

journalctl -u app -f                 # follow a service's logs live
journalctl -u app --since "10 min ago"
journalctl -u app -p err             # errors and worse only
journalctl -k                        # kernel messages
tail -f /var/log/nginx/error.log     # follow a classic log file
ls -lt /var/log                      # most recently written logs (active ones)
grep -ri "error string" /var/log/    # search everything
zgrep error /var/log/syslog.1.gz     # search rotated, gzipped logs
tail -f app.log | jq .               # pretty-print streaming JSON logs
sudo journalctl --vacuum-time=7d     # delete archived journal entries older than 7 days

journald, /var/log, rotation, incident reading: Linux Logging.
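
During an incident the error you want is often split between the live log and a rotated, gzipped copy. The sketch below builds a hypothetical pair of files (`app.log` and `app.log.1.gz`, both invented for the demo) and searches them in one pass, relying on `zgrep` reading plain and gzipped files alike.

```shell
set -eu

# Hypothetical log directory with one live file and one rotated, gzipped file.
logdir=$(mktemp -d)
printf 'ok\nerror: disk full\n' > "$logdir/app.log"
printf 'error: timeout\nok\nerror: refused\n' | gzip > "$logdir/app.log.1.gz"

# One glob covers both files; -h drops the filename prefix from each match.
matches=$(zgrep -h error "$logdir"/app.log*)
hits=$(printf '%s\n' "$matches" | wc -l)

echo "$matches"
echo "total matches: $hits"
```

Dropping `-h` keeps the filename on each line, which is usually what you want when you need to know which rotation an error came from.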

Disk and the "out of space" drill

df -h                        # which filesystem is full (start here)
du -sh /var/* | sort -h      # biggest directories under /var
du -ah /var/log | sort -h | tail   # the biggest files and directories under /var/log
lsof -nP | grep deleted      # space held by deleted-but-open files
ncdu /                       # interactive disk usage explorer (if installed)

A full disk causes a dozen confusing symptoms; this drill finds the cause in a minute.
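
The `du | sort` step of the drill is worth keeping as a function so it can be pointed at any directory while you walk down the tree. A minimal sketch, assuming GNU `sort -h`. The name `biggest_under` is hypothetical.

```shell
# biggest_under DIR [N]: the N largest entries directly under DIR, largest last.
biggest_under() {
  du -sh "$1"/* 2>/dev/null | sort -h | tail -n "${2:-5}"
}
```

A typical descent is `biggest_under /var`, then `biggest_under /var/log`, and so on, following the last line each time until you reach the file that is actually growing. If `df` says the disk is full but this finds nothing large, that points to deleted-but-open files, which is what the `lsof` line above catches.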

Handy one-liners

# Most CPU-hungry processes, top 5
ps -eo pid,pcpu,comm --sort=-pcpu | head -6

# Most memory-hungry processes, top 5
ps -eo pid,pmem,comm --sort=-pmem | head -6

# Count established TCP connections, refreshed every 2 seconds (-H drops the header line)
watch -n 2 'ss -Htn state established | wc -l'

# Tail the last 50 lines of the 3 most recently changed logs in /var/log
tail -n 50 $(ls -1t /var/log/*.log | head -3)

# What is this process's working directory and open files
ls -l /proc/1234/cwd; ls -l /proc/1234/fd

Use it

Keep this open while you work; the muscle memory comes from running the commands, not reading about them. When a command here raises a "but why" - why SIGTERM before SIGKILL, why available and not free, why pin a package version - the linked guide has the full answer. Get fluent with this set and most day-to-day operations, and most "the server is broken" pages, become routine.