System Administrators (SAs) have a tough job: dealing with users and user accounts, security, patching, updates, upgrades, disk space, performance and other miscellaneous tasks often known as "other duties as assigned." For some SAs, the day never ends. Despite the challenges, pitfalls and the occasional irate user, system administration is a fulfilling job with intangible rewards like no other position in IT. To assist those weary SAs in their quest to conquer their Linux systems, I've devised this list of 12 native Linux system monitoring tools that are always at my fingertips.
Any user may issue these commands, if they exist and haven't been restricted by the SA. They are harmless, read-only commands. The only problem with them is that ordinary users might inform the SA of a performance problem before the SA knows about it, and that can irritate an overworked SA's nervous system.
1. top - It's only fitting that you'd see 'top' at the top of this list. Top is both a diagnostic tool and a real-time monitoring tool. Execute this command to see a running list of the processes consuming the most system resources. Try it for yourself by typing top <ENTER> at the command prompt. To quit top, press the 'q' key.
top - 14:55:04 up 3 days, 20:49, 2 users, load average: 0.07, 0.05, 0.06
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.8%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2832420k total, 2578360k used, 254060k free, 277288k buffers
Swap: 1540088k total, 0k used, 1540088k free, 1914544k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17686 root 5 -10 690m 549m 535m S 2 19.9 16:31.18 vmware-vmx
21487 khess 15 0 12584 1060 788 R 0 0.0 0:00.07 top
1 root 18 0 10316 684 568 S 0 0.0 0:01.54 init
2 root RT 0 0 0 0 S 0 0.0 0:00.18 migration/0
3 root 34 19 0 0 0 S 0 0.0 0:00.01 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/0
5 root RT 0 0 0 0 S 0 0.0 0:00.18 migration/1
6 root 34 19 0 0 0 S 0 0.0 0:30.78 ksoftirqd/1
7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0
9 root 10 -5 0 0 0 S 0 0.0 0:00.07 events/1
10 root 10 -5 0 0 0 S 0 0.0 0:00.00 khelper
33 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
38 root 10 -5 0 0 0 S 0 0.0 0:00.00 kblockd/0
2. uptime - The uptime command is simple. It gives you a quick snapshot of system performance and the amount of time the system has been live since the last reboot. Type uptime <ENTER> at a prompt to see your uptime stats. An example of uptime is shown below:
14:57:56 up 3 days, 20:52, 2 users, load average: 0.04, 0.04, 0.05
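The three load average figures cover the last 1, 5 and 15 minutes. If you want to read them from a script, a minimal Python sketch could parse the uptime line as follows. The function name parse_load_average is made up for illustration, and the sample string is the output shown above.

```python
import re

def parse_load_average(uptime_line):
    """Return the 1-, 5- and 15-minute load averages as a tuple of floats."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

# Sample line copied from the uptime example in this article.
sample = "14:57:56 up 3 days, 20:52, 2 users, load average: 0.04, 0.04, 0.05"
one, five, fifteen = parse_load_average(sample)
print(one, five, fifteen)
```

On a live system you would feed the function the text returned by the uptime command rather than a hard-coded string.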
3. vmstat - The vmstat (virtual memory statistics) command has nothing to do with virtualization; it reports on the health of your system from a memory and swap space point of view. Typically, a user issues vmstat with a delay and a count, here five reports taken five seconds apart:
$ vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 253564 277376 1914556 0 0 3 12 23 11 0 0 99 0 0
0 0 0 253564 277380 1914556 0 0 0 23 1064 832 0 0 100 0 0
0 0 0 253564 277380 1914556 0 0 0 205 1114 884 0 0 99 0 0
1 0 0 253440 277380 1914556 0 0 0 7 1060 811 0 0 100 0 0
0 0 0 253812 277380 1914560 0 0 0 16 1089 903 38 3 59 0 0
From the vmstat man page:
vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity.
The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case.
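Because the first report is an average since the last reboot, a script that cares about current activity would usually skip it. The sketch below, which assumes the 17-column layout shown in the sample above, parses the text into rows and picks the least idle interval. The names parse_vmstat and busiest are illustrative only.

```python
def parse_vmstat(output):
    """Turn vmstat text into a list of dicts, one per report row."""
    lines = [line.split() for line in output.strip().splitlines()]
    # The second line holds the column names (r, b, swpd, free, ...).
    columns = lines[1]
    return [dict(zip(columns, map(int, row))) for row in lines[2:]]

# Sample copied from the `vmstat 5 5` output in this article.
sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 253564 277376 1914556 0 0 3 12 23 11 0 0 99 0 0
0 0 0 253564 277380 1914556 0 0 0 23 1064 832 0 0 100 0 0
0 0 0 253564 277380 1914556 0 0 0 205 1114 884 0 0 99 0 0
1 0 0 253440 277380 1914556 0 0 0 7 1060 811 0 0 100 0 0
0 0 0 253812 277380 1914560 0 0 0 16 1089 903 38 3 59 0 0
"""

rows = parse_vmstat(sample)
interval_rows = rows[1:]  # drop the first report: averages since boot
busiest = min(interval_rows, key=lambda row: row["id"])
print(busiest["us"], busiest["sy"], busiest["id"])
```

In the sample, the last report stands out: user CPU time jumps to 38% while idle time drops to 59%.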
4. free - Free displays the amount of free physical memory (RAM) in a system, the used physical memory, free and used swap memory and buffers used by the kernel.
$ free
total used free shared buffers cached
Mem: 2832420 2578732 253688 0 277416 1914556
-/+ buffers/cache: 386760 2445660
Swap: 1540088 0 1540088
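The "-/+ buffers/cache" line is simple arithmetic on the Mem: line: buffers and cache are subtracted from used memory and added to free memory, since the kernel can reclaim them when applications need the space. A short Python sketch reproduces the figures from the sample output; the function name adjust_for_cache is hypothetical.

```python
def adjust_for_cache(used, free, buffers, cached):
    """Return (used, free) in KB with buffers and cache counted as free."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable

# Values from the Mem: line of the sample output, in kilobytes.
app_used, app_free = adjust_for_cache(
    used=2578732, free=253688, buffers=277416, cached=1914556
)
print(app_used, app_free)  # 386760 2445660
```

The result matches the second line of the free output, which is the better figure to watch when you want to know how much memory applications are really using.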
5. ps - The ps command shows you a snapshot of currently running processes. It has several possible switches (options) but the most common is the ps -ef (See every process in full format) command. Any user may issue the ps command.
A partial ps listing is given below:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr24 ? 00:00:01 init [3]
root 2 1 0 Apr24 ? 00:00:00 [migration/0]
root 3 1 0 Apr24 ? 00:00:00 [ksoftirqd/0]
root 4 1 0 Apr24 ? 00:00:00 [watchdog/0]
root 5 1 0 Apr24 ? 00:00:00 [migration/1]
root 6 1 0 Apr24 ? 00:00:30 [ksoftirqd/1]
root 7 1 0 Apr24 ? 00:00:00 [watchdog/1]
root 8 1 0 Apr24 ? 00:00:00 [events/0]
root 9 1 0 Apr24 ? 00:00:00 [events/1]
root 10 1 0 Apr24 ? 00:00:00 [khelper]
root 33 1 0 Apr24 ? 00:00:00 [kthread]
root 38 33 0 Apr24 ? 00:00:00 [kblockd/0]
root 39 33 0 Apr24 ? 00:00:00 [kblockd/1]
root 40 33 0 Apr24 ? 00:00:00 [kacpid]
root 180 33 0 Apr24 ? 00:00:00 [cqueue/0]
root 181 33 0 Apr24 ? 00:00:00 [cqueue/1]
root 184 33 0 Apr24 ? 00:00:00 [khubd]
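The PID and PPID columns tie each process to its parent. As a rough sketch, assuming the eight-column ps -ef layout shown above with no spaces in the STIME column, the following Python groups commands by parent PID. The function name children_by_parent is made up for this example.

```python
from collections import defaultdict

def children_by_parent(ps_output):
    """Map each parent PID to the commands of its child processes."""
    tree = defaultdict(list)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        # CMD is the eighth column and may contain spaces,
        # so split at most seven times.
        uid, pid, ppid, c, stime, tty, cpu_time, cmd = line.split(None, 7)
        tree[int(ppid)].append(cmd)
    return dict(tree)

# A few rows copied from the partial ps listing in this article.
sample = """\
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Apr24 ? 00:00:01 init [3]
root 2 1 0 Apr24 ? 00:00:00 [migration/0]
root 33 1 0 Apr24 ? 00:00:00 [kthread]
root 38 33 0 Apr24 ? 00:00:00 [kblockd/0]
root 40 33 0 Apr24 ? 00:00:00 [kacpid]
"""

tree = children_by_parent(sample)
print(tree[33])
```

Here the kernel threads kblockd/0 and kacpid show up as children of kthread (PID 33), which is itself a child of init.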
6. iostat - Iostat reports CPU, disk and partition (I/O) statistics. The iostat command has several switches for specific output. It is part of the sysstat package.
An example of CPU iostat is given below:
$ iostat -c
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010
avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.00 0.43 0.11 0.00 99.28
7. w - The w (what) command is better than the who command for seeing who's logged on and what they're doing.
$ w
15:28:59 up 3 days, 21:23, 2 users, load average: 0.00, 0.03, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
khess pts/0 megamachine 12:26 1:38m 0.04s 0.04s -bash
khess pts/1 megamachine 12:30 0.00s 0.09s 0.01s w
8. sar - The sar (System Activity Reporter) command is part of the sysstat package. It should be installed by any SA who wants to keep up with extensive system performance measurements. The default setting is to take a system snapshot every ten minutes, providing the SA with a 24-hour historic view of performance. It's a valuable tool when trying to find bottlenecks and failures over a one-day period.
The sar command has more than three dozen switches associated with it. To see an extensive list of its capabilities, use man sar.
$ sar
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.49 0.00 0.52 0.05 0.00 98.94
12:20:01 AM all 0.13 0.00 0.51 0.08 0.00 99.28
12:30:01 AM all 0.12 0.00 0.53 0.05 0.00 99.29
12:40:01 AM all 0.12 0.00 0.52 0.05 0.00 99.31
12:50:01 AM all 0.13 0.00 0.55 0.07 0.00 99.25
01:00:01 AM all 0.13 0.00 0.65 0.06 0.00 99.16
01:10:01 AM all 0.54 0.00 0.50 0.08 0.00 98.88
01:20:01 AM all 0.13 0.00 0.51 0.08 0.00 99.28
01:30:01 AM all 0.12 0.00 0.52 0.08 0.00 99.28
01:40:01 AM all 0.13 0.00 0.50 0.07 0.00 99.30
9. mpstat - The mpstat command provides you with multiprocessor, CPU-related statistics. It is part of the sysstat package.
$ mpstat 5 5
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010
03:44:58 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:45:03 PM all 0.30 0.00 8.81 0.00 0.00 0.00 0.00 90.89 1072.80
03:45:08 PM all 0.10 0.00 0.40 1.10 0.00 0.10 0.00 98.30 1109.42
03:45:13 PM all 0.10 0.00 0.40 0.00 0.00 0.00 0.00 99.50 1063.15
03:45:18 PM all 0.20 0.00 3.70 0.00 0.00 0.00 0.00 96.10 1084.57
03:45:23 PM all 0.10 0.00 0.30 0.00 0.00 0.10 0.00 99.50 1067.07
Average: all 0.16 0.00 2.72 0.22 0.00 0.04 0.00 96.86 1079.37
or
$ mpstat -P ALL
Linux 2.6.18-53.el5 (system.domain.com) 04/28/2010
03:50:59 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
03:50:59 PM all 0.18 0.00 0.41 0.11 0.01 0.02 0.00 99.28 1071.77
03:50:59 PM 0 0.24 0.00 0.13 0.02 0.00 0.00 0.00 99.61 1000.70
03:50:59 PM 1 0.12 0.00 0.68 0.19 0.03 0.03 0.00 98.95 71.07
10. netstat - The netstat command, replete with options and switches, provides you with diagnostic information about your network, including interface statistics, routing tables, network connections and more. A wise SA uses netstat to diagnose network problems, detect attacks and see a list of services and connections. An example is shown below.
$ netstat -a |grep LISTEN
tcp 0 0 localhost.localdomain:2208 *:* LISTEN
tcp 0 0 *:vmware-authd *:* LISTEN
tcp 0 0 *:mysql *:* LISTEN
tcp 0 0 *:netbios-ssn *:* LISTEN
tcp 0 0 *:sunrpc *:* LISTEN
tcp 0 0 *:ndmp *:* LISTEN
tcp 0 0 localhost.localdo:findviatv *:* LISTEN
tcp 0 0 localhost.localdomain:ipp *:* LISTEN
tcp 0 0 *:con *:* LISTEN
tcp 0 0 localhost.localdomain:smtp *:* LISTEN
tcp 0 0 localhost.lo:x11-ssh-offset *:* LISTEN
tcp 0 0 localhost.localdomain:6011 *:* LISTEN
tcp 0 0 *:microsoft-ds *:* LISTEN
tcp 0 0 *:ms-wbt-server *:* LISTEN
tcp 0 0 localhost.localdomain:2207 *:* LISTEN
tcp 0 0 *:http *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost6.l:x11-ssh-offset *:* LISTEN
tcp 0 0 localhost6.localdomain:6011 *:* LISTEN
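When reviewing a list like this, one useful split is between services listening on every interface (a local address starting with *:) and those bound only to localhost. The Python sketch below makes that split, assuming the name-resolved six-column format shown above; numeric output from netstat -n would need a different host check. The function name split_listeners is illustrative.

```python
def split_listeners(netstat_output):
    """Split LISTEN lines into (all_interfaces, local_only) service lists."""
    exposed, local_only = [], []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected layout: proto recv-q send-q local-address foreign-address state
        if len(fields) < 6 or fields[-1] != "LISTEN":
            continue
        host, _, service = fields[3].rpartition(":")
        (exposed if host == "*" else local_only).append(service)
    return exposed, local_only

# A few lines copied from the netstat example in this article.
sample = """\
tcp 0 0 localhost.localdomain:2208 *:* LISTEN
tcp 0 0 *:mysql *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost.localdomain:smtp *:* LISTEN
"""

exposed, local_only = split_listeners(sample)
print("all interfaces:", exposed)
print("local only:", local_only)
```

In the sample, mysql and ssh accept connections on all interfaces, while smtp and port 2208 are reachable only from the machine itself.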
11. du - The du command reports on disk usage. You can use it to look at an entire filesystem or a single directory. If you use du without options, prepare yourself for a long list of files, directories and their sizes. It's better to filter the information so that you just see a snapshot of how much space a particular directory or filesystem is using. Issue the du command and request a human-readable (megabytes, gigabytes) summary report of the /opt directory.
$ du -sh /opt
929M /opt
12. df - The df command reports the amount of used vs. free space you have on your filesystems. To see how this output differs from the du command, see the example below. The example shown uses the (-h) or human readable format that many SAs prefer.
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
360G 274G 68G 81% /
/dev/sda1 99M 30M 65M 32% /boot
tmpfs 1.4G 0 1.4G 0% /dev/shm
/dev/hdb1 230G 164G 55G 75% /backups
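Note that df wraps a long device name onto its own line, as with the LogVol00 entry above, which trips up naive line-by-line parsing. The following Python sketch rejoins wrapped entries and flags filesystems above a usage threshold. The names parse_df and THRESHOLD, and the 80% value, are assumptions chosen for illustration, and the parser assumes mount points contain no spaces.

```python
def parse_df(output):
    """Parse `df -h` text into (filesystem, use_percent, mount_point) tuples."""
    entries, pending = [], None
    for line in output.strip().splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if len(tokens) == 1:
            # A long device name sits alone; its figures are on the next line.
            pending = tokens[0]
            continue
        if pending is not None:
            tokens = [pending] + tokens
            pending = None
        filesystem, use, mount = tokens[0], tokens[4], tokens[5]
        entries.append((filesystem, int(use.rstrip("%")), mount))
    return entries

# Sample copied from the `df -h` output in this article.
sample = """\
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
360G 274G 68G 81% /
/dev/sda1 99M 30M 65M 32% /boot
tmpfs 1.4G 0 1.4G 0% /dev/shm
/dev/hdb1 230G 164G 55G 75% /backups
"""

THRESHOLD = 80  # hypothetical alert level, in percent
entries = parse_df(sample)
nearly_full = [mount for _, use, mount in entries if use > THRESHOLD]
print(nearly_full)
```

With the sample data, only the root filesystem at 81% crosses the threshold.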
There you have the top 12 native Linux monitoring tools at your disposal. The real beauty of these commands is that they don't require any web services or third-party products to make them work. Their main shortcoming is that they are not predictive and, apart from sar's one-day history, they have no historical data associated with them. Most of these tools are snapshot utilities that tell you what's going on right now with your system.
In a future post, I'll cover some predictive and historical monitoring tools.