Thursday, March 25, 2010

File Uploading, A Perl+Python Approach

I created a download server some time ago and have always wanted something similar for uploading.

My initial approach avoided the whole issue by running the download server from the opposite end, but this is not very flexible because the user needs Python installed on their desktop.

My second approach was a CGI program (upload.cgi) built on Perl's well-established CGI.pm module. The script below runs under an Apache web server: a GET request shows the upload form, and the POST submitted by that form saves the file to /export/software/upload. The drawback is that it depends on having Apache installed and running.

#! /usr/bin/perl

use strict;
use warnings;
use CGI;
use CGI::Carp qw( fatalsToBrowser );
use File::Basename;

if ( $ENV{"REQUEST_METHOD"} eq "GET" ) {
        # GET: show the upload form, which posts back to this same script
        print "Content-type: text/html\n\n";
        print "<html><head><title>Upload</title></head><body>";
        printf("<form action=%s method=POST ENCTYPE=multipart/form-data>", $ENV{'SCRIPT_NAME'});
        print "<input type=file name=upfile> <input type=submit value=Upload></form>";
        print "</body></html>";
} else {
        # POST: save the uploaded file
        my $safe_filename_characters = "a-zA-Z0-9_.-";
        my $upload_dir = "/export/software/upload";

        my $query = CGI->new;
        my $fname = $query->param("upfile");
        die "No file was uploaded" unless defined $fname && length $fname;

        # Windows browsers may send the full client path: convert backslashes,
        # keep only the base name and strip unsafe characters
        $fname =~ s/\\/\//g;
        my ( $name, $path, $extension ) = fileparse( $fname, '\..*' );
        my $filename = $name . $extension;
        $filename =~ tr/ /_/;
        $filename =~ s/[^$safe_filename_characters]//g;
        die "Invalid file name" unless length $filename;

        my $upload_filehandle = $query->upload("upfile");

        open( my $out, ">", "$upload_dir/$filename" ) or die "Cannot write $upload_dir/$filename: $!";
        binmode $out;
        binmode $upload_filehandle;

        print "Content-type: text/html\n\n";
        print "<html><head><title>Upload</title></head><body>";

        # copy in 64 KB binary chunks, printing a progress marker per chunk
        my $buffer;
        while ( read( $upload_filehandle, $buffer, 65536 ) ) {
                print "...";
                print $out $buffer;
        }

        close $out;

        printf("Your file is in %s/%s (%s)", $upload_dir, $filename, $query->escapeHTML($fname));
        print "</body></html>";
}
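The filename clean-up is the security-relevant part of the script: it keeps a client from writing outside the upload directory. As a hedged illustration, the same rules can be mirrored in Python 3 for quick testing; safe_filename is a hypothetical helper, not part of CGI.pm, and it reproduces only the steps shown above (backslashes to slashes, base name only, spaces to underscores, drop anything outside a-zA-Z0-9_.-).

```python
import re

# Same whitelist as $safe_filename_characters in the Perl script
SAFE_CHARS = "a-zA-Z0-9_.-"


def safe_filename(client_name):
    """Hypothetical helper mirroring the Perl filename sanitisation."""
    # Browsers on Windows may send "C:\Users\me\file.txt": normalise separators
    name = client_name.replace("\\", "/")
    # Keep only the last path component (fileparse's name + extension)
    name = name.rsplit("/", 1)[-1]
    # tr/ /_/ : spaces become underscores
    name = name.replace(" ", "_")
    # s/[^...]//g : drop every character outside the whitelist
    return re.sub("[^" + SAFE_CHARS + "]", "", name)
```

For example, a Windows path such as C:\Users\me\My Report (v2).txt ends up as My_Report_v2.txt, and ../../etc/passwd is reduced to passwd.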

My third approach was to combine Perl and Python in a single Python script. Python comes with a simple, workable CGIHTTPServer module that lets me run a self-contained web server capable of serving CGI scripts. The script below (uploadserver) creates a randomly named CGI directory containing the Perl upload.cgi script, plus an index.html that automatically redirects the browser to that upload.cgi. With this script, I can launch the upload server anywhere and let users upload files through a web browser.

#! /usr/local/bin/python


import BaseHTTPServer
import CGIHTTPServer
import tempfile
import os
import random
import platform


#
# Standalone Perl CGI script for uploading.
# This is a normal (non-raw) Python string, so every backslash is doubled.
#
perlscript="""#! /usr/bin/perl

use strict;
use warnings;
use CGI;
use CGI::Carp qw( fatalsToBrowser );
use File::Basename;


if ( $ENV{"REQUEST_METHOD"} eq "GET" ) {
        print "Content-type: text/html\\n\\n";
        print "<html><head><title>Upload</title></head><body>";
        printf("<form action=%s method=POST ENCTYPE=multipart/form-data>", $ENV{'SCRIPT_NAME'});
        print "<input type=file name=upfile> <input type=submit value=Upload></form>";
        print "</body></html>";
} else {
        my $safe_filename_characters = "a-zA-Z0-9_.-";
        my $upload_dir = ".";
        my $query = CGI->new;
        my $fname = $query->param("upfile");
        die "No file was uploaded" unless defined $fname && length $fname;

        #
        # convert backslashes to forward slashes, keep only the base name,
        # and remove unsafe characters from the filename
        #
        $fname =~ s/\\\\/\\//g;
        my ( $name, $path, $extension ) = fileparse( $fname, '\\..*' );
        my $filename = $name . $extension;
        $filename =~ tr/ /_/;
        $filename =~ s/[^$safe_filename_characters]//g;
        die "Invalid file name" unless length $filename;

        # copy the upload in fixed-size binary chunks
        my $upload_filehandle = $query->upload("upfile");
        open( my $out, ">", "$upload_dir/$filename" ) or die "Cannot write $filename: $!";
        binmode $out;
        binmode $upload_filehandle;
        my $buffer;
        while ( read( $upload_filehandle, $buffer, 65536 ) ) {
                print $out $buffer;
        }
        close $out;

        print "Content-type: text/html\\n\\n";
        print "<html><head><title>Upload</title></head><body>";
        printf("Your file '%s' has been successfully uploaded", $filename);
        print "</body></html>";
}
"""


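if not os.path.isfile('index.html'):
        # create a randomly named directory (e.g. tmpXXXXXX) under the
        # current directory to hold the CGI script
        tempdir = os.path.basename(tempfile.mkdtemp(dir='.'))
        os.chmod(tempdir, 0755)
        uploadcgi = tempdir + '/upload.cgi'
        # cgi_directories holds URL paths of directories, so register the
        # directory ('/tmpXXXXXX'), not the script path itself
        cgidir = '/' + tempdir

        try:
                # write index.html, which redirects the browser to the CGI script
                fp = open('index.html', 'w')
                refresh = '<html><head><meta http-equiv="REFRESH" content="0;url=/%s"></head><body></body></html>' % uploadcgi
                fp.write(refresh)
                fp.close()

                # create the CGI script and make it executable
                fp = open(uploadcgi, 'w')
                fp.write(perlscript)
                fp.close()
                os.chmod(uploadcgi, 0755)

                # change the internal settings
                CGIHTTPServer.CGIHTTPRequestHandler.cgi_directories = [cgidir]
                # run the CGI child as the current user rather than 'nobody'
                # (nobody_uid() treats 0 as unset, so under root the child
                # still switches to 'nobody')
                CGIHTTPServer.nobody = os.getuid()

                port = random.randint(50000, 60000)
                url = "http://%s:%d/" % (platform.node(), port)
                httpd = BaseHTTPServer.HTTPServer(('', port), CGIHTTPServer.CGIHTTPRequestHandler)
                print "Ask user to visit this URL:\n\t%s" % url
                httpd.serve_forever()

        except KeyboardInterrupt:
                # Ctrl-C stops the server
                pass

        finally:
                # clean up; uploaded files are left in place
                for f in ('index.html', uploadcgi):
                        if os.path.exists(f):
                                os.remove(f)
                os.rmdir(tempdir)

else:
        print 'Error. index.html exists'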

# ./uploadserver
Ask user to visit this URL:
        http://user-PC:52233/

When you point your web browser to the above URL, it automatically redirects to the upload form.
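The staging step that uploadserver performs before it starts serving can be sketched in Python 3 as below. stage_upload_cgi is a hypothetical name, and the sketch only prepares the files rather than starting a server. The detail worth isolating is that cgi_directories on CGIHTTPRequestHandler (http.server in Python 3, CGIHTTPServer in Python 2) lists URL paths of directories such as '/tmpab12cd', so the function returns the directory path separately from the script URL that index.html points to.

```python
import os
import tempfile

# Same meta-refresh page the uploadserver script writes; %s is the script's URL path
REDIRECT = ('<html><head><meta http-equiv="REFRESH" content="0;url=/%s">'
            '</head><body></body></html>')


def stage_upload_cgi(docroot, script_text):
    """Create docroot/tmpXXXXXX/upload.cgi and a redirecting docroot/index.html.

    Returns (cgi_dir, script_url): cgi_dir (e.g. '/tmpab12cd') is the value for
    the handler's cgi_directories list; script_url is where index.html points.
    """
    index = os.path.join(docroot, "index.html")
    if os.path.exists(index):
        raise FileExistsError(index)
    tempdir = tempfile.mkdtemp(dir=docroot)   # randomly named, created with mode 0700
    os.chmod(tempdir, 0o755)
    dirname = os.path.basename(tempdir)
    script_path = os.path.join(tempdir, "upload.cgi")
    with open(script_path, "w") as fp:
        fp.write(script_text)
    os.chmod(script_path, 0o755)              # the CGI handler needs an executable script
    script_url = dirname + "/upload.cgi"
    with open(index, "w") as fp:
        fp.write(REDIRECT % script_url)
    return "/" + dirname, "/" + script_url
```

Like the original script, it refuses to run when an index.html already exists, so an existing page is never overwritten.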


Tuesday, March 23, 2010

Older == Not Newer

Quite often we need to clean up old files in a directory, but the usual criteria are find's -ctime/-atime/-mtime tests, which count in days. In certain situations that is not fine-grained enough to pick out the right files. find's -newer flag locates files modified more recently than a reference file, but find has no -older. What you can do is negate -newer, so that 'not newer' stands for 'older':
\( ! -newer reference \)
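To make the semantics concrete, here is a minimal Python 3 sketch of the same test; is_not_newer is a hypothetical helper, not part of find. Because -newer is a strict "modified after" comparison, its negation also matches a file with exactly the same modification time, including the reference file itself.

```python
import os
import stat


def is_not_newer(path, reference):
    """Return True if path is a regular file modified no later than reference.

    Mirrors find's  ! -newer reference -type f : -newer is a strict
    "modified after" test, so its negation also matches files with exactly
    the same mtime, including the reference file itself.
    """
    st = os.lstat(path)  # do not follow symlinks (find's default behaviour)
    ref_mtime = os.stat(reference).st_mtime_ns
    return stat.S_ISREG(st.st_mode) and st.st_mtime_ns <= ref_mtime
```

Given a file older than the reference, a file with the same mtime, and a newer one, only the newer file (and any directory) is rejected.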

Here I created 4320 files, each named after its own timestamp, and used this trick to delete everything not newer than the reference (log-200912301450). Note that the reference itself is deleted too, since a file is never newer than itself. See my previous post for how I use the -newer flag.



# yr=2009

# for mth in `seq -w 7 12`
do
    for day in `seq -w 1 30`
    do
        for hr in `seq -w 2 6 24`
        do
                for min in `seq -w 0 10 50`
                do
                        touch -t $yr$mth$day$hr$min log-$yr$mth$day$hr$min
                done
        done
    done
done

# ls | wc -l
4320

# ls -lrt | head
total 0
-rw-r--r-- 1 root root 0 2009-07-01 02:00 log-200907010200
-rw-r--r-- 1 root root 0 2009-07-01 02:10 log-200907010210
-rw-r--r-- 1 root root 0 2009-07-01 02:20 log-200907010220
-rw-r--r-- 1 root root 0 2009-07-01 02:30 log-200907010230
-rw-r--r-- 1 root root 0 2009-07-01 02:40 log-200907010240
-rw-r--r-- 1 root root 0 2009-07-01 02:50 log-200907010250
-rw-r--r-- 1 root root 0 2009-07-01 08:00 log-200907010800
-rw-r--r-- 1 root root 0 2009-07-01 08:10 log-200907010810
-rw-r--r-- 1 root root 0 2009-07-01 08:20 log-200907010820

# ls -lrt | tail
-rw-r--r-- 1 root root 0 2009-12-30 14:20 log-200912301420
-rw-r--r-- 1 root root 0 2009-12-30 14:30 log-200912301430
-rw-r--r-- 1 root root 0 2009-12-30 14:40 log-200912301440
-rw-r--r-- 1 root root 0 2009-12-30 14:50 log-200912301450
-rw-r--r-- 1 root root 0 2009-12-30 20:00 log-200912302000
-rw-r--r-- 1 root root 0 2009-12-30 20:10 log-200912302010
-rw-r--r-- 1 root root 0 2009-12-30 20:20 log-200912302020
-rw-r--r-- 1 root root 0 2009-12-30 20:30 log-200912302030
-rw-r--r-- 1 root root 0 2009-12-30 20:40 log-200912302040
-rw-r--r-- 1 root root 0 2009-12-30 20:50 log-200912302050

# find . \( ! -newer log-200912301450 \) -type f -exec rm -f {} \;

# ls -lrt
total 0
-rw-r--r-- 1 root root 0 2009-12-30 20:00 log-200912302000
-rw-r--r-- 1 root root 0 2009-12-30 20:10 log-200912302010
-rw-r--r-- 1 root root 0 2009-12-30 20:20 log-200912302020
-rw-r--r-- 1 root root 0 2009-12-30 20:30 log-200912302030
-rw-r--r-- 1 root root 0 2009-12-30 20:40 log-200912302040
-rw-r--r-- 1 root root 0 2009-12-30 20:50 log-200912302050
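To double-check the transcript, here is a hedged Python 3 reproduction: make_logs mirrors the seq/touch loop (6 months × 30 days × 4 hours × 6 minutes = 4320 files) and remove_not_newer stands in for the find command. Both names are hypothetical; the sketch uses UTC instead of local time and scans only the top level of the directory, whereas find also recurses.

```python
import os
from datetime import datetime, timezone


def make_logs(directory):
    """Create empty log-YYYYMMDDhhmm files whose mtime matches the name.

    Mirrors the nested seq/touch loop. Times are taken as UTC for
    simplicity, whereas touch -t uses local time; only the ordering matters.
    """
    count = 0
    for mth in range(7, 13):                  # seq -w 7 12
        for day in range(1, 31):              # seq -w 1 30
            for hr in range(2, 25, 6):        # seq -w 2 6 24 -> 02 08 14 20
                for mn in range(0, 51, 10):   # seq -w 0 10 50
                    stamp = datetime(2009, mth, day, hr, mn, tzinfo=timezone.utc)
                    path = os.path.join(directory, stamp.strftime("log-%Y%m%d%H%M"))
                    open(path, "w").close()
                    t = int(stamp.timestamp())
                    os.utime(path, (t, t))
                    count += 1
    return count


def remove_not_newer(directory, reference):
    """Delete regular files in directory whose mtime is <= reference's mtime.

    Only the top level is scanned, unlike find, which also recurses.
    Returns the number of files removed.
    """
    ref_ns = os.stat(reference).st_mtime_ns
    with os.scandir(directory) as it:
        entries = list(it)                    # snapshot before deleting anything
    removed = 0
    for entry in entries:
        st = entry.stat(follow_symlinks=False)
        if entry.is_file(follow_symlinks=False) and st.st_mtime_ns <= ref_ns:
            os.remove(entry.path)
            removed += 1
    return removed
```

Run against a scratch directory, it removes 4314 files and leaves the same six, log-200912302000 to log-200912302050, that ls -lrt shows above.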
