Friday, January 16, 2009

Shell Script Performance

In UNIX, creating a separate process is considered very expensive. To prove this point, I created an empty shell script and ran truss on it to see how many system calls are needed just to start a bare-bones shell.

$ cat empty.sh
#! /bin/sh

$ truss -c ./empty.sh

syscall               seconds   calls  errors
_exit                    .000       1
read                     .000       2
open                     .000       8       1
close                    .000       9
time                     .000       1
brk                      .001      17
getpid                   .000       4
mount                    .000       1       1
getuid                   .000       2
getgid                   .000       2
sysi86                   .000       1
ioctl                    .000       5       1
execve                   .000       1
umask                    .000       2
fcntl                    .000       2
fcntl                    .000       2
readlink                 .000       1       1
sigaction                .000       1
getcontext               .000       1
setustack                .000       1
mmap                     .003      34
munmap                   .000      10
xstat                    .001      11       3
getrlimit                .000       1
memcntl                  .000       7
sysconfig                .000      10
lwp_sigmask              .000       1
lwp_private              .000       1
llseek                   .000       3
schedctl                 .000       1
resolvepath              .000       9
stat64                   .000       3
fstat64                  .000      11
open64                   .000       1
                     --------  ------   ----
sys totals:              .016     167      7
usr time:                .006
elapsed:                 .130

Although it took only 0.13 seconds to run, a total of 167 system calls were invoked. Imagine you need to 'daisy-chain' a few commands together and run that chain 100 times in a loop. The total run time becomes quite substantial.
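
To get a feel for this cost on your own machine, here is a minimal sketch that runs the same loop twice: once spawning a child shell on every iteration, and once using only the ':' builtin. The function name loop_cost and the iteration count are my own choices for illustration, and the script assumes a POSIX shell (the $(( )) arithmetic is not available in the original Bourne shell).

```shell
#!/bin/sh
# loop_cost MODE COUNT: run COUNT iterations, either spawning a child shell
# each time ("external") or using only the ':' builtin ("builtin").
# loop_cost is a hypothetical helper, not part of any standard tool.
loop_cost() {
    mode=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        if [ "$mode" = external ]; then
            /bin/sh -c ':'     # fork + exec a new shell, much like empty.sh
        else
            :                  # builtin: no new process is created
        fi
        i=$((i + 1))
    done
    echo "$mode: $i iterations"
}

loop_cost external 100
loop_cost builtin 100
```

To compare the two, keep only one of the last two lines, save the file, and run it under time (or truss -c) once for each mode.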

The example below illustrates this point. Suppose you apply a few invert-match (-v) greps to the output of ps -ef and run that 100 times. Here are 3 different solutions:

  1. Daisy chain a number of grep commands
  2. Daisy chain a number of fgrep (fast grep - Interpret PATTERN as a list of fixed strings) commands
  3. Use egrep (Interpret PATTERN as an extended regular expression) command

$ cat ex-grep.sh; time ./ex-grep.sh
#! /bin/sh

for i in `perl -e '$,=" ";print 1..100'`
do
 ps -ef | grep -v root | grep -v daemon | grep -v nothing | grep -v oralce | grep -v weblogic > /dev/null 2>&1
done

real 0m14.696s
user 0m4.081s
sys 0m8.142s

$ cat ex-fgrep.sh; time ./ex-fgrep.sh
#! /bin/sh

for i in `perl -e '$,=" ";print 1..100'`
do
 ps -ef | fgrep -v root | fgrep -v daemon | fgrep -v nothing | fgrep -v oralce | fgrep -v weblogic > /dev/null 2>&1
done

real 0m14.392s
user 0m4.085s
sys 0m7.936s

$ cat ex-egrep.sh; time ./ex-egrep.sh
#! /bin/sh

for i in `perl -e '$,=" ";print 1..100'`
do
 ps -ef | egrep -v 'root|daemon|nothing|oralce|weblogic' > /dev/null 2>&1
done

real 0m8.527s
user 0m2.705s
sys 0m3.884s

As you can see, fgrep gives only a slight improvement even though it is based on fixed-string matching. This is because the "ex-fgrep.sh" script still forks as many processes as "ex-grep.sh"; in other words, they have similar overhead. The "ex-egrep.sh" script, however, has the least overhead because the regular expression groups all the OR cases together in one command and therefore avoids all the unnecessary process forking.
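
Fixed-string matching and a single process are not mutually exclusive. As a sketch, and assuming a POSIX-style grep that accepts -F together with repeated -e options (on older Solaris systems this may mean the XPG4 grep rather than /usr/bin/grep), the five filters can be collapsed into one command. The function name filter_users and the sample lines are made up for illustration; the strings are the same five used in the scripts above. I have not benchmarked this variant.

```shell
#!/bin/sh
# filter_users: drop every line containing any of the listed fixed strings,
# using one grep process instead of five.
# -F treats each -e pattern as a fixed string; -v inverts the match.
filter_users() {
    grep -F -v -e root -e daemon -e nothing -e oralce -e weblogic
}

# Made-up sample lines standing in for real `ps -ef` output.
printf '%s\n' \
    'root     1     0  init' \
    'daemon   200   1  lpsched' \
    'alice    311   1  vi notes.txt' \
    'weblogic 402   1  java' \
    | filter_users
```

With the sample input only the alice line survives. In the loop from the scripts above you would write ps -ef | filter_users > /dev/null 2>&1 instead.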

$ truss -c ./ex-grep.sh 2>&1 | tail -5
open64                   .013     107       5
                     --------  ------   ----
sys totals:             1.125   11216   1273
usr time:                .366
elapsed:               19.250

$ truss -c ./ex-fgrep.sh 2>&1 | tail -5
open64                   .014     107       5
                     --------  ------   ----
sys totals:             1.128   11241   1279
usr time:                .369
elapsed:               19.340

$ truss -c ./ex-egrep.sh 2>&1 | tail -5
open64                   .013     107       5
                     --------  ------   ----
sys totals:              .593    6819    869
usr time:                .193
elapsed:               11.490
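
To put the figures above side by side, here is a small sketch that computes the grep-to-egrep ratios from the time and truss output. The helper name ratio is my own; the numbers are copied from the runs shown above.

```shell
#!/bin/sh
# ratio A B: print A divided by B with two decimal places.
# ratio is a hypothetical helper for this example only.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Figures taken from the ex-grep.sh and ex-egrep.sh runs above.
echo "real time, grep vs egrep:    $(ratio 14.696 8.527)"
echo "system calls, grep vs egrep: $(ratio 11216 6819)"
```

Each iteration of ex-grep.sh starts five filter processes where ex-egrep.sh starts one, but ps -ef and the loop itself run in both versions, so the measured ratios are smaller than five to one.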

To achieve better performance in UNIX shell scripting, we should use the most appropriate tool for the job and fully explore all the available options. BTW, do you know what the UNIX philosophy is?

Write programs that do one thing and do it well.

If you run the above on a multi-CPU box, you may not see much difference because the forked processes run on different CPUs. I tested this on my office's SunFire X4600 with 16 CPUs and there wasn't any significant difference.

FYI, the above benchmark results were obtained on OpenSolaris 2008.11 under VirtualBox with 1 CPU assigned, on my Intel Centrino Duo notebook.
