Thursday, November 19, 2009

The AWK Way

Today I was given the task of converting a few hundred files (743 to be exact) into CSV format. Each filename consists of a hostname followed by a fixed suffix, and each file lists all the local user names on that host, one per line. The task is to put them in rows, with the hostname in the first column and the usernames from the second column onwards. One more requirement is to exclude a few users from the output.

My initial solution was very much Unix shell script-based. Although this is a one-off 'throw-away' solution, it was pretty inefficient because of all the process creation inside a for loop: it took 1 min 39.453 sec. After some thought, I reckoned it could be done efficiently in AWK alone. With the help of built-in variables such as FILENAME, NR and FNR, we can process all the input files in a single AWK script. The code below works in Cygwin. The AWK version runs in 2.797 sec, which is about 35 times faster!
$ ls *txt
host1_root.txt  host2_root.txt  host3_root.txt  host4_root.txt

$ paste *txt
usera   usere   userm   userx
userb   userx   userx   userw
userc   userf   usern   usery
userd   userg   usero   userz
userdx  usery   userp
userdy  userh   userx
        userz   userq
        useri   userqx
        userj   userr
        userk   userz
        userl   users
        userx   usert
                usery

$ cat a.awk
#! /usr/bin/awk -f


BEGIN {
        suffix="_root.txt"
        len=length(suffix)
}
#
# print a newline before the first line of each input file, except the first file
FNR==1 && NR>1 {
        printf("\n")
}
#
# print hostname
FNR==1 {
        # substr() positions start at 1 in AWK; a start of 0 is not portable
        host=substr(FILENAME, 1, length(FILENAME)-len)
        printf("%s", host)
}
#
# print users, but exclude certain users
$0 !~ /^(userx|usery|userz)$/ {
        printf(",%s", $0)
}
#
# terminate the last row with a newline
END {
        if (NR > 0) printf("\n")
}


$ ./a.awk *.txt
host1,usera,userb,userc,userd,userdx,userdy
host2,usere,userf,userg,userh,useri,userj,userk,userl
host3,userm,usern,usero,userp,userq,userqx,userr,users,usert
host4,userw
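
To see why the FNR==1 tests work, it helps to print the built-in variables side by side. The snippet below is only a minimal sketch: the function name demo_nr_fnr and the two throw-away files are made up for illustration, and it assumes mktemp -d is available (it is on Cygwin and Linux).

```shell
# Hypothetical demo: two throw-away input files in a temporary directory.
demo_nr_fnr() {
    dir=$(mktemp -d) || return 1
    printf 'usera\nuserb\n' > "$dir/host1_root.txt"
    printf 'userc\n'        > "$dir/host2_root.txt"
    # NR keeps counting across all files; FNR restarts at 1 for each file.
    ( cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' host1_root.txt host2_root.txt )
    rm -rf "$dir"
}

demo_nr_fnr
# host1_root.txt 1 1 usera
# host1_root.txt 2 2 userb
# host2_root.txt 3 1 userc
```

FNR==1 marks the start of every file, while FNR==1 && NR>1 marks the start of every file except the first, which is what a.awk relies on to decide when to end the previous row.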


Saturday, November 07, 2009

Finding Newer Files

Although find provides flags to locate files modified within a certain number of days (e.g. -mtime -2), that is not fine-grained enough to let the user specify an arbitrary timestamp. The script below touches a temporary file with the given timestamp, and find then uses that file as a reference via the "-newer" flag. FYI, this has been tested on Solaris.

#! /bin/ksh

if [ $# -ne 2 ]; then
        echo "Usage: $0 directory time"
        echo "       time - a decimal number of the form: [[CC]YY]MMDDhhmm[.SS]"
        exit 1
fi
directory=$1
timestamp=$2


if [ ! -d "$directory" ]; then
        echo "Error. Directory $directory does not exist"
        exit 2
fi


tmpfile="/tmp/${0##*/}-$$"
touch -t "$timestamp" "$tmpfile" > /dev/null 2>&1
if [ $? -ne 0 ]; then
        echo "Error. Incorrect time format [[CC]YY]MMDDhhmm[.SS]"
        exit 2
fi


trap 'rm -f "$tmpfile"; exit 0' EXIT


find "$directory" -newer "$tmpfile" -local -mount -type f
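
The reference-file trick can be tried in isolation. The snippet below is a rough sketch rather than part of the script above: the function name demo_find_newer, the file names and the timestamps are all invented for the demo, and it assumes mktemp -d is available. It leaves out the Solaris-specific -local flag so that it runs with other find implementations too.

```shell
# Hypothetical demo of the reference-file technique; all names are made up.
demo_find_newer() {
    dir=$(mktemp -d) || return 1
    mkdir "$dir/data"
    touch -t 200911010000 "$dir/data/old.log"   # older than the cut-off
    touch -t 200911070900 "$dir/data/new.log"   # newer than the cut-off
    touch -t 200911050000 "$dir/ref"            # user-defined cut-off time
    # -newer matches files modified more recently than the reference file.
    ( cd "$dir/data" && find . -type f -newer "$dir/ref" )
    rm -rf "$dir"
}

demo_find_newer
# ./new.log
```

Only new.log is listed, because its modification time is later than that of the reference file.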
