
Quick *nix (mostly Linux) Health Checks #1

posted Apr 12, 2013, 8:24 AM by Daniel Gomes   [ updated Apr 12, 2013, 8:39 AM ]
Once upon a time, I spent a whole day trying to convince a junior system administrator to run lsof and tell me the size of the largest open file, because my app was performing really badly and the file system kept filling up no matter what.

Since he had no idea what lsof does or why an insanely large file ruins your performance, he just ignored me until things got so ugly that the issue was escalated and his boss gave me the lsof results. (Yeah, I could not just become root and do it myself. Large corporations have these separation-of-duties rules.)

I identified the evil process and killed it, restoring the app's performance. Kudos!

So, kids, run lsof whenever your application goes weird. It may be looping through a file descriptor and writing huge amounts of data to the disk!

It must be run as root, otherwise you will not see other users' open files:

# 1 - Check the top 30 processes by number of open regular files:
sudo lsof | awk '$5 == "REG" {freq[$2]++ ; names[$2] = $1 ;} END {for (pid in freq) print freq[pid], names[pid], pid ; }' | sort -n -r -k 1,1 | head -n 30

Get used to your app's normal number of open files and watch out for any unusually large number!
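If lsof is not installed, you can get a similar (but not identical) picture straight from /proc. This is a minimal sketch, not the one-liner above: it counts every entry in /proc/PID/fd (regular files, sockets, pipes...), so the numbers will not match lsof's REG count, and it does not see memory-mapped files. The names fd_count and top_fd_counts are made up for this example; run it as root to see other users' processes.

```shell
# Hypothetical helpers: count open file descriptors per process via /proc.
# Counts ALL descriptors (files, sockets, pipes), not just REG entries,
# and does not see memory-mapped files, so numbers differ from lsof's.

# fd_count PID: number of entries in /proc/PID/fd (0 if unreadable)
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# top_fd_counts [N]: print "count pid command" for the N processes
# with the most open descriptors (default 30)
top_fd_counts() {
    n=${1:-30}
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # Skip processes that exited while we were looking.
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        echo "$(fd_count "$pid") $pid $comm"
    done | sort -n -r -k 1,1 | head -n "$n"
}
```

Source it in a root shell and run `top_fd_counts 30`; as with lsof, what matters is how the numbers compare with your app's usual ones.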

# 2 - Check open file sizes, ordered by size, descending:

sudo lsof -s | awk '$5 == "REG"' | sort -n -r -k 7,7 | head -n 30

Get used to your app's normal open file sizes. Of course this varies a LOT from app to app, but after taking care of an app for some time you usually get a feel for its business-as-usual (BAU) numbers. Also, if you see a 5,000 GB file and your app is not launching any nuclear missiles, hey, c'mon...
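To turn that "c'mon..." feeling into something you can alert on, you could filter the same `lsof -s` output against a size threshold. A minimal sketch, assuming the column layout the one-liner above relies on (TYPE in column 5, SIZE in column 7); newer lsof versions may print extra columns, so check the header on your box first. The function name big_open_files and the 10 GiB default are illustrative choices, not recommendations.

```shell
# Hypothetical helper: flag open regular files bigger than MIN_BYTES,
# reading `lsof -s` output on stdin.
# Assumes TYPE is column 5 and SIZE is column 7 (check your lsof header).
# NAME is taken as the last field, so file names with spaces get cut.

# big_open_files [MIN_BYTES]   (default: 10 GiB, an arbitrary example)
big_open_files() {
    min=${1:-10737418240}
    awk -v min="$min" '
        # Only regular files; the header and other types are skipped.
        $5 == "REG" && $7 + 0 > min + 0 {
            printf "%.1f GiB\t%s (pid %s)\t%s\n",
                   $7 / 1073741824, $1, $2, $NF
        }'
}
```

For example, `sudo lsof -s | big_open_files 5368709120` lists every open regular file above 5 GiB.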

(One-liners stolen from http://thegoogleof.blogspot.com/2011/11/lsof-sort.html - I could have written them myself if I wanted to! Anytime! :P)


Now let's talk memory. The free command is your friend, a complicated friend but still a friend.

[root@xyz ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          7596       5475       2121          0         21        532
-/+ buffers/cache:       4921       2675
Swap:         8191       1138       7053

The -m flag displays the values in megabytes.

So, I have 2,121 MB free, right? Wrong. I have 2,675 MB. And the system is really using only 4,921 MB, not 5,475, because the 5,475 figure includes buffers and disk cache (21 + 532 MB here), which the kernel frees whenever your apps need RAM.
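The -/+ buffers/cache arithmetic is simple enough to redo yourself from /proc/meminfo, which is where free reads its numbers. A minimal sketch for the old-style free output shown above (app_memory_mb is a made-up name); meminfo reports kB, and the integer division means you may be 1 MB off from what free prints.

```shell
# Hypothetical helper: redo old free's "-/+ buffers/cache" line
# from /proc/meminfo (values there are in kB).
# Optional argument: a meminfo-style file to read instead.
app_memory_mb() {
    awk '
        /^MemTotal:/ { total   = $2 }
        /^MemFree:/  { free    = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached  = $2 }   # anchored, so SwapCached is ignored
        END {
            used = total - free          # the used figure from the Mem: line
            printf "-/+ buffers/cache: used %d MB, free %d MB\n",
                   (used - buffers - cached) / 1024,
                   (free + buffers + cached) / 1024
        }' "${1:-/proc/meminfo}"
}
```

Run it with no argument to read the live /proc/meminfo.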

Of course it is a bit more complicated than that: the kernel's reclaimable slab memory can also be freed on demand, but the free command ignores it. For a really good post on memory, read this:

http://stackoverflow.com/questions/3784974/want-to-know-whether-enough-memory-is-free-on-a-linux-machine-to-deploy-a-new-ap/4417121#4417121
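Following that logic, a rough estimate that also counts reclaimable slab (the SReclaimable line in /proc/meminfo) could look like the sketch below. It assumes all of the cache and reclaimable slab can really be freed, which is optimistic; kernels from 3.14 on export their own, more careful MemAvailable estimate. estimate_available_mb is a hypothetical name.

```shell
# Hypothetical helper: optimistic estimate of the memory apps could still get,
# counting free RAM + buffers + page cache + reclaimable slab, in MB.
# Not all of that is really reclaimable, so treat the result as optimistic.
estimate_available_mb() {
    awk '
        /^MemFree:/      { free = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }
        /^SReclaimable:/ { slab = $2 }    # missing on very old kernels: 0
        END { printf "%d\n", (free + buffers + cached + slab) / 1024 }
    ' "${1:-/proc/meminfo}"
}
```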

If you need details on a specific process (Linux), first get its PID, for example:
[root@xyz ~]# ps aux | grep java | grep -v grep
joe  22209  3.3  1.8 954764 143336 ?       Sl   07:56   8:39 /opt/ibm/lotus/notes
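The grep -v grep trick works, but you can also ask /proc directly. A minimal sketch (pids_matching is a hypothetical helper) that does roughly what pgrep -f does, using plain substring matching instead of regular expressions; it can still list the shell that called it if that shell's own command line contains the pattern.

```shell
# Hypothetical helper: print PIDs whose full command line contains
# PATTERN (plain substring match), reading /proc/PID/cmdline directly.
pids_matching() {
    for dir in /proc/[0-9]*; do
        # cmdline is NUL-separated; turn NULs into spaces. Processes that
        # exit mid-scan are skipped silently.
        cmd=$( { tr '\0' ' ' < "$dir/cmdline"; } 2>/dev/null ) || continue
        case $cmd in
            *"$1"*) echo "${dir#/proc/}" ;;
        esac
    done
}
```

For example, `pids_matching java` prints one PID per line, ready to feed into the /proc lookups that follow.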


Then cat its info from /proc:

[root@xyz ~]# cat /proc/22209/status
Name:    notes2
State:    S (sleeping)
Tgid:    22209
Pid:    22209
PPid:    1
TracerPid:    0
Uid:    500    500    500    500
Gid:    500    500    500    500
Utrace:    0
FDSize:    1024
Groups:    18 498 500 502
VmPeak:     1028952 kB
VmSize:      954764 kB
VmLck:           0 kB
VmHWM:      228884 kB
VmRSS:      143336 kB
VmData:      601076 kB
VmStk:         100 kB
VmExe:          16 kB
VmLib:      176108 kB
VmPTE:         868 kB
VmSwap:       52096 kB
Threads:    103
SigQ:    1/60589
SigPnd:    0000000000000000
ShdPnd:    0000000000000000
SigBlk:    0000000000300000
SigIgn:    0000000000301000
SigCgt:    20000001c20864ff
CapInh:    0000000000000000
CapPrm:    0000000000000000
CapEff:    0000000000000000
CapBnd:    ffffffffffffffff
Cpus_allowed:    ff
Cpus_allowed_list:    0-7
Mems_allowed:    00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:    0
voluntary_ctxt_switches:    441126
nonvoluntary_ctxt_switches:    26452


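Of course, there's a lot of info there you don't need. VmPeak is the peak virtual memory size of your app, which includes address space it reserved but may never have touched; VmHWM (the "high water mark") is the peak resident set size, i.e. the most physical RAM it has actually used.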
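If you only care about the memory lines, a one-line awk filter does the trick. A minimal sketch (mem_summary is a made-up name) that keeps the peak and current virtual sizes (VmPeak, VmSize), the peak and current resident sizes (VmHWM, VmRSS) and VmSwap; kernel threads have no Vm* lines, so it prints nothing for them.

```shell
# Hypothetical helper: keep only the memory lines worth watching from
# /proc/PID/status: VmPeak/VmSize (virtual), VmHWM/VmRSS (resident)
# and VmSwap (swapped out, where the kernel reports it).
mem_summary() {
    awk '/^(VmPeak|VmSize|VmHWM|VmRSS|VmSwap):/ { print $1, $2, $3 }' \
        "/proc/$1/status"
}
```

Running `mem_summary 22209` against the process above would print just those five lines.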

For a detailed description of these fields:
http://www.lindevdoc.org/wiki//proc/pid/status