Wednesday, August 29, 2007

Better Internet Bandwidth Utilisation

My office Internet connection is not fast, so it is not efficient to download anything biggish during normal office hours. Also, I don't want to leave my notebook unattended outside office hours.

I wrote a simple proof-of-concept web frontend to help me download files at a scheduled time slot. A Tcl CGI script dynamically talks to at (execute commands at a later time) to achieve this.

Imagine you want to download the Rocks Cluster DVD ISO, which is 4,045,514,752 bytes in size. All you have to do is provide the web frontend with the URL and the time (how many hours/minutes later) to download this file. Email notification (currently not implemented) can easily be incorporated at the end of the download if needed. Below shows the web page:

The skeleton of the CGI program is given below:

#! /usr/sfw/bin/tclsh

package require cgi

# grab the form fields: the delay, its unit (hours/minutes) and the URL to fetch
cgi_input
set when [cgi_import when]
set unit [cgi_import unit]
set url  [cgi_import url]

# download area under the Apache document root
set dir /var/apache/htdocs/your-dl

# hand the job over to at(1); -m mails the job owner when the job completes
set fp [open "|at -m now + $when $unit" w]
fconfigure $fp -buffering line
set filename [file tail $url]   ;# not used yet, handy if email notification is added
puts $fp "cd $dir"
puts $fp "/usr/local/bin/curl --location --silent --remote-name $url"
close $fp

# tell the user when the download will kick off
set now  [clock format [clock seconds]]
set then [clock format [clock scan "now + $when $unit"]]
puts "Content-type: text/plain\n"
puts "Now is $now"
puts "Scheduled to download at $then"
puts ""
puts "Check download area here"

Curl and cgi.tcl (a Tcl package for CGI) are used in this setup, and the whole thing runs on Solaris 10 with the built-in Apache 2. I believe this little demo site can offload some of the daytime Internet traffic to off-peak hours.
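
To make it concrete, the job the CGI script feeds into at looks roughly like this (a sketch only; the URL is a placeholder, and the mailx line at the end is the email notification mentioned above, which the demo does not yet implement):

cd /var/apache/htdocs/your-dl
/usr/local/bin/curl --location --silent --remote-name http://mirror.example.com/big-file.iso
# optional notification once the download has finished (not implemented in the demo)
echo "big-file.iso downloaded" | mailx -s "download done" someone@example.com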


Sun Grid Engine Accounting Users Summary

In Sun Grid Engine (SGE), all job accounting information is written to $SGE_ROOT/$SGE_CELL/common/accounting file. Each record of the accounting information consists of 43 fields separated by colon (':') signs.

For details of all the fields, read the accounting(5) man page.

If you want to find out the distribution of jobs and their corresponding elapsed time per user, you will be interested in the 4th and 14th fields of the accounting record.

$ man accounting


N1 Grid Engine File Formats                         ACCOUNTING(5)

NAME
     accounting - N1 Grid Engine accounting file format

DESCRIPTION
     An accounting record  is  written  to  the  N1  Grid  Engine
     accounting file for each job having finished. The accounting
     file is processed by qacct(1) to derive  accounting  statis-
     tics.

FORMAT
     Each job is represented by a line in  the  accounting  file.
     Empty  lines  and  lines which contain one character or less
     are ignored.  Accounting record  entries  are  separated  by
     colon  (':')  signs.  The  entries  denote in their order of
     appearance:

     qname
          Name of the cluster queue in which the job has run.

     hostname
          Name of the execution host.

     group
          The effective group id of the job owner when  executing
          the job.

     owner
          Owner of the N1 Grid Engine job.

     job_name
          Job name.

     job_number
          Job identifier - job number.

     account
          An account  string  as  specified  by  the  qsub(1)  or
          qalter(1) -A option.

     priority
          Priority value assigned to the job corresponding to the
          priority  parameter  in  the  queue  configuration (see
          queue_conf(5)).

     submission_time
          Submission time in seconds (since epoch format).

     start_time
          Start time in seconds (since epoch format).

     end_time
          End time in seconds (since epoch format).

     failed
          Indicates the problem which  occurred  in  case  a  job
          could  not  be  started  on  the  execution  host (e.g.
          because the owner of the  job  did  not  have  a  valid
          account  on  that  machine). If N1 Grid Engine tries to
          start a job multiple times, this may lead  to  multiple
          entries  in  the  accounting  file corresponding to the
          same job ID.

     exit_status
          Exit status of  the  job  script  (or  N1  Grid  Engine
          specific status in case of certain error conditions).

     ru_wallclock
          Difference between end_time and start_time (see above).
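
Before writing a full summary script, a quick peek at just those two fields confirms the record format. A one-liner sketch (it assumes SGE_ROOT and SGE_CELL are set in your environment):

awk -F: 'NF==43 {print $4, $14}' $SGE_ROOT/$SGE_CELL/common/accounting | head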

Below is an AWK script to summarise the accounting information, together with its corresponding output. FYI, the usernames are fictitious.

$ cat sge-summary.sh
#! /bin/sh

awk -F":" '
NF==43 {
        # $4 - owner
        # $14 - wallclock
        ++jsum[$4]
        ++jcnt
        tsum[$4]+=$14
        tcnt+=$14
}
END {
        printf("User\tJob\tRun Time\n")
        for(i in jsum) {
                printf("%-10s\t%.2f%%\t%.2f%%\n", i, jsum[i]*100/jcnt, tsum[i]*100/tcnt)
        }
}' $SGE_ROOT/$SGE_CELL/common/accounting


$ ./sge-summary.sh
User            Job     Run Time
alan            0.02%    0.00%
bob             2.43%    0.00%
carl            0.02%    0.00%
daryl           0.84%    0.00%
edwin           0.20%    0.00%
francis         0.01%    0.00%
george          0.06%    0.00%
harry           0.02%    0.00%
irene           0.02%    0.00%
jeffrey         0.71%    99.36%
karen           0.05%    0.00%
leo             0.04%    0.00%
mark            0.04%    0.00%
nelson          95.32%   0.64%
oliver          0.22%    0.00%


Sunday, August 26, 2007

NFS Logging

Further to my last blog entry, in which I encountered issues with nfslog, I managed to resolve it, thanks to some of the Sun folks I talked to.

I always thought that doing 'ls -lR' on the NFS mount point would generate a lot of NFS traffic and hence produce output in the nfslog. Instead, the fhtable ndbm database grows but not the nfslog. As described in Sameer Smth's blog:

The subset of these activities is to store information about the files/links/directories. nfslogd does not use flat files to log these activities as the searching of data will become very inefficient. Instead nfslogd uses Solaris native database ndbm to log all these records. This makes searching/deleting/inserting the records very efficient. nfslogd stores two set of records for each file/link/directory. These records are primary & secondary.

To test out nfslogd, I shared out /usr as read-only and made sure the client side mounted it with NFS version 3. Below shows the process:

server# share -F nfs -o log=global,ro /usr

client# mount -o vers=3,ro server:/usr /mnt
client# cd /mnt/include
client# cat *.h


server# cat /var/nfs/nfslog
Fri Aug 24 14:19:45 2007 0 client 2780 /usr/include/apptrace.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1358 /usr/include/apptrace_impl.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1916 /usr/include/ar.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 7495 /usr/include/archives.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 21174 /usr/include/aspell.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1553 /usr/include/assert.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 239 /usr/include/atomic.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 23476 /usr/include/audiofile.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1633 /usr/include/aupvlist.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 2390 /usr/include/auth_attr.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 684 /usr/include/auto_ef.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 7833 /usr/include/bzlib.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 3829 /usr/include/complex.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 6543 /usr/include/config_admin.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1025 /usr/include/cpio.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1633 /usr/include/crypt.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 2240 /usr/include/ctype.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 51186 /usr/include/curses.h b _ read r 60001 nfs3-tcp 0 *
Fri Aug 24 14:19:45 2007 0 client 1873 /usr/include/deflt.h b _ read r 60001 nfs3-tcp 0 *
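
Because the basic log format is one line per operation, it is easy to summarise with the usual tools. For example, a minimal sketch (assuming the field layout shown above, where field 7 is the client host and field 8 the byte count) to total the bytes read per client:

awk '{ bytes[$7] += $8 } END { for (h in bytes) print h, bytes[h] }' /var/nfs/nfslog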

Now the question is whether I should use nfslogd to capture all the NFS traffic; the problem is that it does not support NFSv4, so I would have to ensure all my NFS clients talk v3. The BSM way or the NFS log way for logging? That depends on whether you are logging for audit purposes or just logging.


Thursday, August 23, 2007

File Logging in Solaris, the NFS way (almost) and the BSM way

One of my potential customers wanted to implement grid computing with their existing servers. There are two issues they would like us to address:
  1. Each design team can only access their own design files.
  2. All file accesses from the central storage have to be captured, to ensure designs are kept within the team.

Item 1 is pretty easy to address with the standard ACL in Solaris, or any other flavour of UNIX.

Item 2 has two possible solutions. The first approach is NFS logging. I know Solaris NFS comes with the capability of logging all NFS access; all we need is to share out the directory path with logging enabled (share -o log=global /some/directory). The nfslog.conf file tells where the log files will go, /var/nfs by default:

# more /etc/nfs/nfslog.conf

#ident  "@(#)nfslog.conf        1.5     99/02/21 SMI"
#
# Copyright (c) 1999 by Sun Microsystems, Inc.
# All rights reserved.
#
# NFS server log configuration file.
#
# <tag> [ defaultdir=<dir_path> ] \
#       [ log=<logfile_path> ] [ fhtable=<table_path> ] \
#       [ buffer=<bufferfile_path> ] [ logformat=basic|extended ]
#

global  defaultdir=/var/nfs \
    log=nfslog fhtable=fhtable buffer=nfslog_workbuffer

There is a catch. The nfslogd man page says that NFS server logging is not supported for NFS version 4.

# man nfslogd

System Administration Commands                        nfslogd(1M)

NAME
 nfslogd - nfs logging daemon

SYNOPSIS
 /usr/lib/nfs/nfslogd

DESCRIPTION
 The nfslogd  daemon  provides  operational  logging  to  the
 Solaris  NFS  server. It is the nfslogd daemon's job to gen-
 erate the activity log by analyzing the RPC operations  pro-
 cessed  by  the  NFS server.  The log will only be generated
 for file systems exported  with  logging  enabled.  This  is
 specified  at  file  system  export  time  by  means  of the
 share_nfs(1M) command.

 NFS server logging is not supported on Solaris machines that
 are using NFS Version 4.

...
OK, I can still fall back to NFS version 3. However, when I enable the NFS server (/etc/init.d/nfs.server or svcadm enable svc:/network/nfs/server:default) and have another server mount it using NFSv3 (mount -o vers=3 ...), I can see files in the /var/nfs directory growing, but most of them are binary files, not ASCII:
# ls -l /var/nfs
total 13502
-rw-r-----   1 root     root           0 Aug 23 11:25 fhtable.0000000000000000.dir
-rw-r-----   1 root     root        1024 Aug 23 11:25 fhtable.0000000000000000.pag
-rw-r-----   1 root     root        4096 Aug 23 11:57 fhtable.0154002000000002.dir
-rw-r-----   1 root     root     8368128 Aug 23 17:31 fhtable.0154002000000002.pag
-rw-r-----   1 root     root           0 Aug 23 11:25 nfslog
-rw-------   1 root     root     2309148 Aug 23 17:44 nfslog_workbuffer_log_in_process
drwxr-xr-x   2 daemon   daemon       512 Mar 21 10:21 v4_oldstate
drwxr-xr-x   2 daemon   daemon       512 Aug 23 11:30 v4_state

# file /var/nfs/*
/var/nfs/fhtable.0000000000000000.dir:  empty file
/var/nfs/fhtable.0000000000000000.pag:  data
/var/nfs/fhtable.0154002000000002.dir:  data
/var/nfs/fhtable.0154002000000002.pag:  data
/var/nfs/nfslog:        empty file
/var/nfs/nfslog_workbuffer_log_in_process:      data
/var/nfs/v4_oldstate:   directory
/var/nfs/v4_state:      directory
All these files seem to be binary in nature, and very likely they are in Berkeley DB format. I tried to use Perl's dbmopen to read them and unpack the key-value pairs, but all I got was garbage. I suppose I need to fully understand the struct of each entry. To my surprise, there isn't any utility in the OS that allows me to view the content of these files. Also, I had no luck at all finding a solution on the Internet. Anyway, I am hitting a dead end and all I can do is to post it to the Sun folks for a solution.

The second approach I wanted to try is the Basic Security Module (BSM). Although this is meant for conforming to the US C2 security audit requirements, it does log access to files (any files, including shared libraries) in the system. Anyway, I just want to run it to see what it can offer.

cd /etc/security
./bsmconv
Once bsmconv has run, you need to reboot the server in order for the BSM audit kernel module to be loaded at the next start up. The log files are located in the /var/audit directory by default:
$ ls  /var/audit
20070821022314.20070821022315.chihung  20070821023545.20070821023546.chihung
20070821022315.20070821023344.chihung  20070821023546.20070821024602.chihung
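
Which events get recorded is controlled by the audit classes in /etc/security/audit_control; the fm/fa/fc/fd classes used below were listed there before the reboot. A sketch of the relevant lines only (not my complete file; see audit_control(4)):

# excerpt of /etc/security/audit_control -- not the complete file
dir:/var/audit
flags:fm,fa,fc,fd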

BSM comes with a utility (praudit) to view the content. If you read the man page, there are a couple of flags for this command; in particular, the '-l' flag and the '-x' flag are the ones I want to talk about. '-l' gives you one line per record so that you can use any of the standard Solaris utilities like sed/awk/cut to filter out the information you want.

This is a sample output of praudit -l with the fm/fa/fc/fd (file modify/access/create/delete) audit classes turned on. See /etc/security/audit_class for the full list.

file,2007-08-21 10:35:46.111 +08:00,/var/audit/20070821023545.20070821023546.chihung
header,44,2,system booted,na,2007-08-21 10:34:49.704 +08:00,text,booting kernel
header,135,2,stat(2),,chihung,2007-08-21 10:39:22.700 +08:00,path,/usr/lib/pt_chmod,attribute,104511,root,bin,85,623,0,subject,chihung,root,staff,chihung,staff,704,2477344358,756 65558 ftpl_2_207,return,success,0
header,126,2,stat(2),fe,chihung,2007-08-21 10:39:22.727 +08:00,path,/platform/SUNW,UltraSPARC-IIi-cEngine/lib,subject,chihung,root,staff,chihung,staff,704,2477344358,756 65558 ftpl_2_207,return,failure: No such file or directory,-1
header,149,2,access(2),,chihung,2007-08-21 10:39:22.737 +08:00,path,/dev/pts/devices/pseudo/pts@0:1,attribute,20620,root,tty,335,12582918,103079215105,subject,chihung,root,staff,chihung,staff,704,2477344358,756 65558 ftpl_2_207,return,success,0
header,149,2,pathconf(2),,chihung,2007-08-21 10:39:22.737 +08:00,path,/dev/pts/devices/pseudo/pts@0:1,attribute,20620,root,root,335,12582918,103079215105,subject,chihung,root,staff,chihung,staff,704,2477344358,756 65558 ftpl_2_207,return,success,1
header,149,2,access(2),,chihung,2007-08-21 10:39:22.740 +08:00,path,/dev/pts/devices/pseudo/pts@0:1,attribute,20620,chihung,tty,335,12582918,103079215105,subject,chihung,chihung,staff,chihung,staff,703,2477344358,756
...
If you analyse the number of fields (NF) per record, you realise that they are not consistent: NF ranges across 3, 8, 19, 21, 22, 28, 29, 30, 31 and 32. Extracting the right field for analysis is going to be a nightmare.
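
To see the spread for yourself, count the comma-separated fields per record; a quick sketch (audit.txt is assumed to hold the saved output of praudit -l, using the audit file from the earlier listing):

praudit -l /var/audit/20070821023545.20070821023546.chihung > audit.txt
awk -F, '{ print NF }' audit.txt | sort -n | uniq -c     # histogram of field counts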

How about the 'praudit -x' flag? Basically it outputs the information in XML format. This is cool, and XML is definitely my friend. Let's see the output:

<?xml version='1.0' encoding='UTF-8' ?>
<?xml-stylesheet type='text/xsl' href='file:///usr/share/lib/xml/style/adt_record.xsl.1' ?>

<!DOCTYPE audit PUBLIC '-//Sun Microsystems, Inc.//DTD Audit V1//EN' 'file:///usr/share/lib/xml/dtd/adt_record.dtd.1'>

<audit>
<file iso8601="2007-08-21 10:35:46.111 +08:00">/var/audit/20070821023545.20070821023546.chihung</file>
<record version="2" event="system booted" modifier="na" iso8601="2007-08-21 10:34:49.704 +08:00">
<text>booting kernel</text>
</record>
<record version="2" event="stat(2)" host="chihung" iso8601="2007-08-21 10:39:22.700 +08:00">
<path>/usr/lib/pt_chmod</path>
<attribute mode="104511" uid="root" gid="bin" fsid="85" nodeid="623" device="0"/>
<subject audit-uid="chihung" uid="root" gid="staff" ruid="chihung" rgid="staff" pid="704" sid="2477344358" tid="756 65558 ftpl_2_207"/>
<return errval="success" retval="0"/>
</record>
<record version="2" event="stat(2)" modifier="fe" host="chihung" iso8601="2007-08-21 10:39:22.727 +08:00">
<path>/platform/SUNW,UltraSPARC-IIi-cEngine/lib</path>
<subject audit-uid="chihung" uid="root" gid="staff" ruid="chihung" rgid="staff" pid="704" sid="2477344358" tid="756 65558 ftpl_2_207"/>
<return errval="failure: No such file or directory" retval="-1"/>
</record>
<record version="2" event="access(2)" host="chihung" iso8601="2007-08-21 10:39:22.737 +08:00">
<path>/dev/pts/devices/pseudo/pts@0:1</path>
<attribute mode="20620" uid="root" gid="tty" fsid="335" nodeid="12582918" device="103079215105"/>
<subject audit-uid="chihung" uid="root" gid="staff" ruid="chihung" rgid="staff" pid="704" sid="2477344358" tid="756 65558 ftpl_2_207"/>
<return errval="success" retval="0"/>
It even comes with an XML stylesheet that we can apply. Here is the screen dump of the HTML after running (xsltproc adt_record.xsl.1 audit.xml > audit-xml.html):

With XML, I can pick out a lot of interesting things from the data. Say I want to find out who accesses files that do not belong to them. With the DOM implementation in Tcl (tDOM), I can script it like this:

package require tdom

# parse the praudit -x output and grab the document root
set doc  [dom parse [tDOM::xmlReadFile audit.xml]]
set root [$doc documentElement]

# pick every attribute node whose file owner differs from the
# effective uid of the subject, and print the corresponding path
foreach i [$root selectNodes {//attribute[@uid != string(../subject/@uid)]}] {
    puts [[$i selectNodes ../path/text()] nodeValue]
}
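
For completeness, audit.xml above is simply the praudit XML output saved to a file. Generating it and running the snippet looks something like this (whofiles.tcl is a hypothetical name for the tDOM script above; the audit file is the one from the earlier listing):

praudit -x /var/audit/20070821023545.20070821023546.chihung > audit.xml
tclsh whofiles.tcl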

Until I can resolve the NFS logging, it seems BSM is going to be the answer. Bear in mind that BSM generates tonnes of data. If I were to implement this approach, I would definitely allocate a lot of disk space (100+ GB) and make sure it sits on a separate disk to avoid too much I/O activity on the OS disk (/var).


Saturday, August 18, 2007

Which Alphabet Occurs Most in English Words?

The other day my son asked me a question about a Chinese character, and I posed another question back to him to think about. At that time I had no clue what the answer was. The question is this: which letter of the English alphabet has the highest frequency of occurrence in English words?

At the back of my mind I knew I could write a script to find out, and my initial guess was the character 'c' or the character 's'. Wanna guess too?

The answer against /usr/dict/words in Solaris 10 (25143 words, with a total of 181519 characters):

e 20079
a 16403
i 13954
r 13410
t 12778
o 12692
n 12055
s 10161
l 10023
c 8207
u 6465
m 5815
d 5758
p 5507
h 5172
g 4119
b 4108
y 3618
f 2657
w 1946
k 1922
v 1883
x 613
z 429
j 426
q 375

I tested it against all the words with at least 2 letters. While doing this, I realised that the Solaris version of awk/nawk's "split" cannot work with a null FS (field separator). I had to use the "substr" function to split each word into individual characters. Below is the script:

#! /bin/sh


nawk '
/^[a-zA-Z][a-zA-Z]+$/ {
        word=tolower($0)

        # in Solaris, it does not support null as FS
        # n=split(word,a,"")

        n=length(word)
        for(i=1;i<=n;++i) {
                char=substr(word,i,1)
                ++stat[char]
        }
}
END {
        for(i in stat) {
                print i, stat[i]
        }
}' /usr/dict/words | sort -n -r -k 2

Now I know the letter 'e' occurs 20079 times in 25143 words, so what is the percentage of occurrence? Simple, just modify the above script to keep track of the total character count and print out the percentages at the end.

#! /bin/sh


nawk '
/^[a-zA-Z][a-zA-Z]+$/ {
        word=tolower($0)

        # in Solaris, it does not support null as FS
        # n=split(word,a,"")

        n=length(word)
        sum+=n
        for(i=1;i<=n;++i) {
                char=substr(word,i,1)
                ++stat[char]
        }
}
END {
        for(i in stat) {
                printf("%s %i %.2f%%\n",i,stat[i],100*stat[i]/sum)
        }
}' /usr/dict/words | sort -n -r -k 2

Result is:

e 20097 11.11%
a 16426 9.08%
i 13976 7.73%
r 13420 7.42%
t 12793 7.07%
o 12711 7.03%
n 12073 6.68%
s 10173 5.63%
l 10030 5.55%
c 8217 4.54%
u 6476 3.58%
m 5832 3.22%
d 5770 3.19%
p 5515 3.05%
h 5180 2.86%
g 4124 2.28%
b 4112 2.27%
y 3624 2.00%
f 2664 1.47%
w 1955 1.08%
k 1928 1.07%
v 1890 1.05%
x 619 0.34%
z 431 0.24%
j 429 0.24%
q 376 0.21%

The letter 'e' accounts for 11.11% of the characters in English words.

I just rebooted to try the same script (changing nawk to awk) on my Fedora Core 5 box with 479625 words and 4471395 characters (/usr/share/dict/words). The answer is still the letter 'e'. Surprisingly, 'q' occurs even less often than 'x':

e 421846 10.75%
i 346205 8.82%
a 341938 8.71%
n 283216 7.22%
s 279915 7.13%
o 278915 7.11%
r 276519 7.05%
t 252848 6.44%
l 222301 5.67%
c 168539 4.30%
u 145006 3.70%
d 126998 3.24%
p 122902 3.13%
m 119398 3.04%
h 106289 2.71%
g 92384 2.35%
y 77498 1.98%
b 74046 1.89%
f 44212 1.13%
v 38780 0.99%
k 33997 0.87%
w 27340 0.70%
z 17233 0.44%
x 11464 0.29%
j 7287 0.19%
q 6541 0.17%


Tuesday, August 14, 2007

For Every User, Do ...

If you want to run a certain command for every user (including system users), you can do this:
for i in `cut -d: -f1 /etc/passwd`
do
    /run/this/command $i
done

If you want to do it the AWK way and loop through every user, you need to construct the command before handing it to 'system' within AWK. It is definitely a lot more involved than using the 'for' loop in the shell. Anyway, this is the AWK way:

awk -F":" '
{
    cmd=sprintf("/run/this/command %s",$1)
    system(cmd)
}' /etc/passwd
Remember, if you are running the above on Solaris, change awk to nawk because the default awk does not support the 'system' function.

How about running the command only for non-system users? Here AWK has an edge over the shell. On most UNIX systems, system users normally have a user id smaller than 100. You can modify the above AWK command to achieve that:

awk -F":" '
$3 >= 100 {
    cmd=sprintf("/run/this/command %s",$1)
    system(cmd)
}' /etc/passwd
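
For comparison, the same uid filter in plain Bourne shell is noticeably wordier, which is exactly where AWK has the edge (a sketch; /run/this/command is the same placeholder as above):

# split /etc/passwd on ':' and skip accounts with a uid below 100
OIFS=$IFS
IFS=:
while read user pass uid gid gcos home shell
do
    if [ "$uid" -ge 100 ]; then
        /run/this/command "$user"
    fi
done < /etc/passwd
IFS=$OIFS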

IMHO, awk is definitely worth your time to learn.


Friday, August 03, 2007

Sort Files Appended with "DD-Mth-YYYY"

If you make copies of files by appending the date to the name, you may want to use a date format like yyyy-mm-dd (2007-07-07) rather than dd-Mth-yyyy (7-Jul-2007). Imagine you have tonnes of these files and you want to sort them based on the date in the filename, not in alphabetical order. You are better off appending the date in "yyyy-mm-dd" format than in "dd-Mth-yyyy".

OK, so how can I sort these "dd-Mth-yyyy" files based on the date rather than in alphabetical order?

Let's create lots of files with this notation. You can do this in the Bash shell:

$ touch index.php.{1,23,17,29,12,4,6,8}-{jul,aug,jun,apr,may,oct}-2007

$ ls
index.php.1-apr-2007   index.php.17-may-2007  index.php.4-jul-2007
index.php.1-aug-2007   index.php.17-oct-2007  index.php.4-jun-2007
index.php.1-jul-2007   index.php.23-apr-2007  index.php.4-may-2007
index.php.1-jun-2007   index.php.23-aug-2007  index.php.4-oct-2007
index.php.1-may-2007   index.php.23-jul-2007  index.php.6-apr-2007
index.php.1-oct-2007   index.php.23-jun-2007  index.php.6-aug-2007
index.php.12-apr-2007  index.php.23-may-2007  index.php.6-jul-2007
index.php.12-aug-2007  index.php.23-oct-2007  index.php.6-jun-2007
index.php.12-jul-2007  index.php.29-apr-2007  index.php.6-may-2007
index.php.12-jun-2007  index.php.29-aug-2007  index.php.6-oct-2007
index.php.12-may-2007  index.php.29-jul-2007  index.php.8-apr-2007
index.php.12-oct-2007  index.php.29-jun-2007  index.php.8-aug-2007
index.php.17-apr-2007  index.php.29-may-2007  index.php.8-jul-2007
index.php.17-aug-2007  index.php.29-oct-2007  index.php.8-jun-2007
index.php.17-jul-2007  index.php.4-apr-2007   index.php.8-may-2007
index.php.17-jun-2007  index.php.4-aug-2007   index.php.8-oct-2007

$ echo $SHELL
/bin/bash

Let's use the normal sorting tool to see what we get:

$ ls -1 | sort
index.php.1-apr-2007
index.php.1-aug-2007
index.php.1-jul-2007
index.php.1-jun-2007
index.php.1-may-2007
index.php.1-oct-2007
index.php.12-apr-2007
index.php.12-aug-2007
index.php.12-jul-2007
index.php.12-jun-2007
index.php.12-may-2007
index.php.12-oct-2007
index.php.17-apr-2007
index.php.17-aug-2007
index.php.17-jul-2007
index.php.17-jun-2007
index.php.17-may-2007
index.php.17-oct-2007
index.php.23-apr-2007
index.php.23-aug-2007
index.php.23-jul-2007
index.php.23-jun-2007
index.php.23-may-2007
index.php.23-oct-2007
index.php.29-apr-2007
index.php.29-aug-2007
index.php.29-jul-2007
index.php.29-jun-2007
index.php.29-may-2007
index.php.29-oct-2007
index.php.4-apr-2007
index.php.4-aug-2007
index.php.4-jul-2007
index.php.4-jun-2007
index.php.4-may-2007
index.php.4-oct-2007
index.php.6-apr-2007
index.php.6-aug-2007
index.php.6-jul-2007
index.php.6-jun-2007
index.php.6-may-2007
index.php.6-oct-2007
index.php.8-apr-2007
index.php.8-aug-2007
index.php.8-jul-2007
index.php.8-jun-2007
index.php.8-may-2007
index.php.8-oct-2007

Hey, that's not what I want. I know. The script below will do the sort properly. What it does is create an associative array in AWK with "yyyy-mm-dd" as the index and the original file name as the value. At the end of the AWK, it prints out both the index and the value. Sort the result based on the index and pipe the sorted result to AWK to print the second field, which is the original file name. The result is the file names sorted by date.

$ cat sort2.sh
#! /bin/sh


if [ $# -ne 1 ]; then
      echo "Usage: $0 <prefix>"
      exit 1
fi
prefix=$1


PATH=/usr/bin:/bin
export PATH


ls -1 | awk '
BEGIN {
      map["jan"]="01"; map["feb"]="02"; map["mar"]="03";
      map["apr"]="04"; map["may"]="05"; map["jun"]="06";
      map["jul"]="07"; map["aug"]="08"; map["sep"]="09";
      map["oct"]="10"; map["nov"]="11"; map["dec"]="12";
}
/^'$prefix'/ {
      filename=$0
      sub("^'$prefix'","",filename)
      split(filename,a,"-")
      mth=tolower(a[2])
      ind=sprintf("%s-%s-%02d",a[3],map[mth],a[1])
      sort[ind]=$0
}
END {
      for (i in sort) {
              print i, sort[i]
      }
}' | sort -k 1,1 | awk '{print $2}'

$ ./sort2.sh index.php.
index.php.1-apr-2007
index.php.4-apr-2007
index.php.6-apr-2007
index.php.8-apr-2007
index.php.12-apr-2007
index.php.17-apr-2007
index.php.23-apr-2007
index.php.29-apr-2007
index.php.1-aug-2007
index.php.4-aug-2007
index.php.6-aug-2007
index.php.8-aug-2007
index.php.12-aug-2007
index.php.17-aug-2007
index.php.23-aug-2007
index.php.29-aug-2007
index.php.1-jul-2007
index.php.4-jul-2007
index.php.6-jul-2007
index.php.8-jul-2007
index.php.12-jul-2007
index.php.17-jul-2007
index.php.23-jul-2007
index.php.29-jul-2007
index.php.1-jun-2007
index.php.4-jun-2007
index.php.6-jun-2007
index.php.8-jun-2007
index.php.12-jun-2007
index.php.17-jun-2007
index.php.23-jun-2007
index.php.29-jun-2007
index.php.1-may-2007
index.php.4-may-2007
index.php.6-may-2007
index.php.8-may-2007
index.php.12-may-2007
index.php.17-may-2007
index.php.23-may-2007
index.php.29-may-2007
index.php.1-oct-2007
index.php.4-oct-2007
index.php.6-oct-2007
index.php.8-oct-2007
index.php.12-oct-2007
index.php.17-oct-2007
index.php.23-oct-2007
index.php.29-oct-2007

BTW, all the above was tested on my Cygwin. In case you are not familiar with AWK, do not use "index" as a variable name; AWK will complain without telling you that "index" is actually a built-in AWK function. Initially I used "index", which has now been replaced with "ind". This is what the original version produced:

$ ./sort.sh index.php.
awk: cmd. line:18:      index=sprintf("%s-%s-%02d",a[3],mth,a[1])
awk: cmd. line:18:           ^ syntax error
awk: cmd. line:19:      sort[index]=$0
awk: cmd. line:19:                ^ syntax error
awk: cmd. line:19: fatal: invalid subscript expression
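
As an aside, since all this was tested on Cygwin, GNU sort can also do the job directly with its month-name key. A quick sketch that hard-codes the 10-character "index.php." prefix (and is therefore less general than the script above):

# year (numeric), then month name (-M), then day (numeric, starting at character 11 of field 1)
ls -1 index.php.* | sort -t- -k3,3n -k2,2M -k1.11,1n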


Wednesday, August 01, 2007

Shell Programming

With regard to yesterday's blog, you may want to put that into a script so that you do not have to re-type the commands all the time.

So your first cut may be like this:

egrep '^ch' /etc/passwd | cut -f7 -d: | grep -c /bin/ksh

However, this script is not very flexible. First, you did not tell the system what shell the script should run with. Second, you assumed the user's search path is the same as your environment's. Thirdly, the search string is hard-coded. OK, lots of room for improvement. Let's give this script a make-over then; we call this script "fshell.sh":

#! /bin/sh

if [ $# -ne 2 ]; then
   echo "Usage: $0 <pattern> <shell>"
   exit 1
fi
pattern=$1
shell=$2

# set up search path
PATH=/usr/bin
export PATH

egrep "^$pattern" /etc/passwd | cut -f7 -d: | grep -c "$shell"

Now you have a very generic script that can handle different patterns and shells. Let's run it:

$ ./fshell.sh
Usage: ./fshell.sh <pattern> <shell>

$ ./fshell.sh ch /bin/ksh
4

$ ./fshell.sh ch /bin/sh
1

Let me provide some explanation:
  Line 1: tells the system that this script (fshell.sh) is written in Bourne shell (/bin/sh) syntax.
  Lines 3-8: check whether you supplied 2 arguments, namely the pattern and the shell. If not, exit with status 1; otherwise set the variables pattern and shell accordingly.
  Lines 10-12: set up the search path environment variable. We do not want to use the user's search path.
  Line 14: the command pipeline, which returns the number of matches.

Not too difficult to digest, is it? Next time you want to write a simple script, do follow this approach.
