Linux Logging: journald, syslog, and Log Files
Find the right log fast and read it well during an incident. This guide covers journalctl and the systemd journal, the classic /var/log files, log rotation with logrotate, and the grep/tail/jq techniques for finding the one line that explains the outage.
During an incident, the answer is almost always in a log - you just have to find the right one and read it well under pressure. The engineers who resolve outages quickly are not smarter; they know where logs live, how to filter to the relevant window, and how to follow a failure as it happens. This guide is that skill.
Modern Linux has two logging worlds living side by side: the systemd journal and the traditional text files in /var/log. You need both.
The systemd journal: journalctl
On any systemd machine, services' stdout and stderr are captured into the binary journal, queried with journalctl. This is usually where you start because it is structured, indexed, and unified - no guessing at filenames.
journalctl -u nginx # all logs for the nginx unit
journalctl -u nginx -e # jump to the end (most recent)
journalctl -u nginx -f # follow live, like tail -f
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx --since "09:00" --until "09:15"
journalctl -u nginx -p err # only error priority and worse
journalctl -u nginx -b # only this boot
journalctl -u nginx -b -1 # the PREVIOUS boot (did it log before a crash?)
journalctl -k # kernel messages (dmesg, but persistent)
The combination you will run most in an incident: journalctl -u service -e to see how it died, then journalctl -u service -f while you try to start it again so you watch the failure live. The --since/--until time filters are what let you zoom straight to the five minutes that matter instead of scrolling through a day.
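These flags compose, so the filters you reach for most can live in one small wrapper. The sketch below is one possible shape, not a standard tool: unit_errors and the JOURNALCTL override are hypothetical names, and the override exists only so you can dry-run the command with echo.

```shell
# Hypothetical helper: show a unit's error-level entries since a given time, without a pager.
# JOURNALCTL is an assumed override hook (e.g. JOURNALCTL=echo) for dry runs.
unit_errors() (
    unit=$1
    since=${2:-1 hour ago}
    "${JOURNALCTL:-journalctl}" -u "$unit" -p err --since "$since" --no-pager -o short-iso
)

# Usage on a systemd host:
#   unit_errors nginx            # errors from the last hour
#   unit_errors nginx "09:00"    # errors since 09:00 today
```

The -o short-iso output format prints ISO-style timestamps, which makes the lines easier to line up against other logs later.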
Priority levels (the -p flag) follow the classic syslog severities, from most to least serious: emerg, alert, crit, err, warning, notice, info, debug. -p err shows err and everything worse, which is how you skip the noise and see only what broke.
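To make the "this level and everything worse" rule concrete, here is a minimal sketch that prints which severities a given threshold includes. The function name levels_at_or_above is made up for illustration; it only models the rule and does not call journalctl.

```shell
# Print the syslog severities that `-p LEVEL` includes: LEVEL and everything more serious.
levels_at_or_above() (
    threshold=$1
    for name in emerg alert crit err warning notice info debug; do
        printf '%s\n' "$name"
        if [ "$name" = "$threshold" ]; then
            break
        fi
    done
)

levels_at_or_above err
```

journalctl also accepts the numeric form (0 for emerg through 7 for debug) and ranges such as -p warning..err when you want a band rather than a threshold.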
One catch: on many distros the journal is volatile by default. With the default Storage=auto, journald writes to disk only if /var/log/journal exists; otherwise it keeps the journal under /run/log/journal, which is lost on reboot. To investigate a crash after the fact you need a persistent journal: create the directory (sudo mkdir -p /var/log/journal && sudo systemctl restart systemd-journald), or set Storage=persistent in /etc/systemd/journald.conf and restart systemd-journald. Without it, journalctl -b -1 has nothing to show.
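A quick way to check which situation a machine is in, before you need the previous boot's logs, is to test for the directory. This is a rough sketch that assumes the default Storage=auto behaviour; journal_storage and its optional root argument are hypothetical, the latter added only so the check can be pointed at a mounted image or a scratch tree.

```shell
# Report where journald would keep logs when Storage=auto (the default) is in effect.
# The optional argument is an assumed alternate root directory; empty means the live system.
journal_storage() (
    root=${1:-}
    if [ -d "$root/var/log/journal" ]; then
        echo persistent
    else
        echo volatile
    fi
)

journal_storage
```

On a live host, journalctl --list-boots is the complementary check: if only the current boot is listed, earlier boots were not retained.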
The classic files: /var/log
Plenty of software (and all older systems) logs to plain text files via syslog/rsyslog. Know the map so you go straight to the right one:
| File | What is in it |
|---|---|
| /var/log/syslog (Debian) / /var/log/messages (RHEL) | the general catch-all system log |
| /var/log/auth.log (Debian) / /var/log/secure (RHEL) | logins, sudo, SSH - the security log |
| /var/log/kern.log | kernel messages |
| /var/log/dmesg | boot-time kernel ring buffer |
| /var/log/nginx/, /var/log/postgresql/, etc. | per-application logs (apps often log here, not the journal) |
| /var/log/cron (RHEL) | records of scheduled job runs (Debian writes these to syslog by default) |
When you do not know where something logs, two moves find it fast: ls -lt /var/log shows the most recently written files at the top (the active ones during your incident), and sudo grep -ri "error string" /var/log/ searches everything. Applications vary - some log to the journal, some to /var/log/app/, some to a path in their own config - so checking both worlds is the habit.
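When /var/log has many subdirectories, ls -lt on the top level can miss the file that is actually being written. A possible complement, sketched here with a hypothetical recent_logs function, is to let find walk the tree and keep only files modified in the last few minutes.

```shell
# List files under a log directory that were modified within the last N minutes, newest first.
# Defaults (/var/log, 10 minutes) are illustrative choices.
recent_logs() (
    dir=${1:-/var/log}
    minutes=${2:-10}
    find "$dir" -type f -mmin "-$minutes" -exec ls -t {} +
)

# Usage (sudo avoids permission errors on restricted logs):
#   sudo sh -c '. ./recent_logs.sh; recent_logs /var/log 5'
```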
Reading logs effectively under pressure
Finding the log is half the job; reading it well is the other half. The core toolkit:
tail -f /var/log/nginx/error.log # follow new lines live
tail -n 200 /var/log/nginx/error.log # the last 200 lines
less +G /var/log/syslog # open at the end; / to search, n for next
grep -i error /var/log/syslog # case-insensitive match
grep -C 3 "Out of memory" /var/log/syslog # match plus 3 lines of CONTEXT around it
grep -i error app.log | tail -20 # the most recent errors only
zgrep error /var/log/syslog.1.gz # search ROTATED, gzipped logs without unzipping
Two techniques do most of the work. grep -C 3 (context) is far more useful than a bare match - the error line rarely explains itself; the lines around it (the request, the stack trace, the follow-on failure) do. And zgrep searches rotated .gz logs in place, which matters because the log entry you need is often from before the most recent rotation.
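Both techniques are easy to try safely on throwaway data. The sketch below builds a made-up five-line log and a gzipped "rotated" copy in a temp directory (all log contents are invented for the demo), then runs the two commands against them.

```shell
# Build a tiny sample log plus a gzipped "rotated" file in a scratch directory.
demo=$(mktemp -d)
cat > "$demo/app.log" <<'EOF'
12:00:01 request GET /checkout
12:00:02 allocating buffer
12:00:03 ERROR Out of memory
12:00:04 worker 7 killed
12:00:05 request GET /health
EOF
printf '11:00:00 ERROR disk latency high\n' | gzip > "$demo/app.log.1.gz"

# One line of context on each side: what led up to the error and what followed it.
grep -C 1 "Out of memory" "$demo/app.log"

# zgrep reads the compressed rotation in place, no gunzip step needed.
zgrep "ERROR" "$demo/app.log.1.gz"
```

The grep -C 1 call prints the matching line together with the allocation before it and the killed worker after it, which is the part a bare match would hide.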
For JSON logs (increasingly common), pipe through jq to make them readable and filterable:
tail -f app.log | jq . # pretty-print streaming JSON logs
jq 'select(.level=="error")' app.log # only the error entries
jq -r 'select(.status>=500) | .path' access.log # paths that 500'd, as plain text
Correlating across logs
Real incidents span services, and the skill is lining them up by time. The trick is simple: filter every relevant log to the same tight time window and read them together.
journalctl --since "14:05" --until "14:10" # everything system-wide in that window
journalctl -u app --since "14:05" --until "14:10" # just the app, same window
grep ":14:0[5-9]:" /var/log/nginx/access.log # the proxy's view of the same window (colons pin the match to hour:minute, not minute:second)
When the app logged an error at 14:07, what did nginx see at 14:07? What did the kernel log (OOM? disk error?) at 14:07? Aligning the timestamps turns three confusing logs into one story. This is why consistent timestamps (UTC, ideally) across your stack are worth insisting on.
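If your logs do share a sortable timestamp format, the lining-up can be mechanical. The following is a minimal sketch, assuming every line starts with an ISO-8601 UTC timestamp followed by a single space; merge_window is a hypothetical helper, and logs in other formats would need their own parsing.

```shell
# Merge logs whose lines begin with an ISO-8601 UTC timestamp, keeping only [start, end).
# Each output line is tagged with its source file so the interleaved story stays readable.
merge_window() (
    start=$1
    end=$2
    shift 2
    for f in "$@"; do
        awk -v s="$start" -v e="$end" -v src="$(basename "$f")" \
            '$1 >= s && $1 < e { print $1, "[" src "]", substr($0, length($1) + 2) }' "$f"
    done | LC_ALL=C sort -k1,1
)

# Usage (file names are placeholders):
#   merge_window 2024-05-01T14:05:00Z 2024-05-01T14:10:00Z app.log proxy.log
```

This works because ISO-8601 timestamps in a single timezone sort correctly as plain strings, which is one more reason to standardize on them.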
Log rotation: why logs do not eat the disk
Logs grow forever unless something trims them, and a full /var/log is a classic cause of a server falling over. Two mechanisms handle this:
- logrotate rotates the /var/log text files - renaming, compressing, and deleting old ones on a schedule (config in /etc/logrotate.conf and /etc/logrotate.d/). A typical policy keeps a few weeks, gzips the old ones (hence .log.1.gz), and deletes the rest.
- The journal is self-limiting by size/time, configured in /etc/systemd/journald.conf (SystemMaxUse=). Trim it on demand with journalctl --vacuum-time=7d or --vacuum-size=500M.
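For a sense of what a "keep a few weeks, gzip the old ones" policy looks like, the sketch below writes one to a scratch file for review. The path /var/log/myapp/*.log is a placeholder for an imaginary application, and the retention values are illustrative rather than recommended.

```shell
# Write a hypothetical logrotate policy to a scratch file so it can be reviewed first.
# /var/log/myapp/*.log is a placeholder path for an imaginary app.
policy=$(mktemp)
cat > "$policy" <<'EOF'
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF

cat "$policy"
```

Here weekly with rotate 4 keeps about four weeks of history, compress gzips rotated files, and missingok and notifempty stop logrotate from complaining about absent or empty logs. A file like this would normally live under /etc/logrotate.d/.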
sudo logrotate -d /etc/logrotate.conf # DRY RUN - show what rotation would do
sudo journalctl --vacuum-time=7d # keep only the last 7 days of journal (needs root to delete journal files)
du -sh /var/log/* # what is actually eating /var/log right now
When a disk fills with logs, du -sh /var/log/* | sort -h finds the offender. The lasting fix is a tighter rotation policy, or repairing an application that is logging in a loop; just deleting files is not enough, because a runaway app will refill them in minutes. Remember the deleted-but-open-file trap too: if you rm a huge log that a process still has open, the space is not freed until the process closes the file, which usually means restarting it (lsof | grep deleted confirms it).
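Both halves of that advice fit in a few lines. This is a sketch with two hypothetical helpers: biggest_in wraps the du | sort -h idiom, and truncate_log empties a file in place, a common way to sidestep the deleted-but-open trap when you cannot restart the writer yet.

```shell
# Print the largest entry directly under a directory, using the du | sort -h idiom.
biggest_in() (
    dir=${1:-/var/log}
    du -sh "$dir"/* 2>/dev/null | sort -h | tail -n 1
)

# Empty a log in place instead of deleting it. The file keeps its inode, so the
# space is released right away even if a process still has the file open.
truncate_log() {
    : > "$1"
}

# Usage (the path is a placeholder):
#   sudo sh -c '. ./logs.sh; biggest_in /var/log; truncate_log /var/log/myapp/huge.log'
```

Truncating buys time but does not fix the cause; the rotation policy or the noisy application still needs attention.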
The shape of it
Start in the journal with journalctl -u service -e and -f, narrowed by --since/--until; fall back to /var/log for apps that log there (use ls -lt and grep -ri to find which file). Read with context (grep -C 3), search rotated logs with zgrep, and use jq for structured logs. Correlate incidents by filtering every log to the same time window. And keep rotation healthy so logs never fill the disk. Find the right log, read the right window - that is the whole game during an outage.