Monday, October 22, 2012

Remove Hundreds of Thousands of Files, take 3

After blogged about using find to remove files, I realised that there are thousands of error in execve system calls. By simulating the same scenario with just 2 files, I understand that the search PATH is the culprit.
[pid 29481] execve("/usr/lib/lightdm/lightdm/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/usr/local/sbin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/usr/local/bin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/usr/sbin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/usr/bin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/sbin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = -1 ENOENT (No such file or directory)
[pid 29481] execve("/bin/rm", ["rm", "-f", "./somefiles-a"], [/* 50 vars */]) = 0

The amount of system calls is reduced (especially errors) if I specify the correct full path of 'rm'.

$ touch somefiles-{a..z}{a..z}{a..z}

$strace -cf find . -name "somefiles-*" -exec /bin/rm -f {} \;
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.59   36.418146        2072     17576           waitpid
  2.50    0.940861          54     17576           clone
  0.72    0.271665          15     17576           unlinkat
  0.07    0.025220           0    105467           close
  0.03    0.012440         541        23           getdents64
  0.03    0.011857           1     17577           fstatat64
  0.02    0.009425           1     17576     17576 _llseek
  0.01    0.005058           0     52737           open
  0.01    0.004589           0    140626           mmap2
  0.01    0.003785           0     17577           ioctl
  0.00    0.001053           0     52757           brk
  0.00    0.001028           0     52735           fstat64
  0.00    0.000388           0     17577           munmap
  0.00    0.000245           0     70311           mprotect
  0.00    0.000000           0     17580           read
  0.00    0.000000           0     17577           execve
  0.00    0.000000           0     52734     52734 access
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         2           uname
  0.00    0.000000           0     17581           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0     17577           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0     17577           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   37.705760                738330     70311 total

Another way to further reduce the amount of system calls as well as run time is to take advantage of 'rm' ability to take more than one file as argument. Getting 'find' output to pipe to 'xargs -L 10 /bin/rm -f', we are able to ask 'rm' to remove 10 files at a time. You can see the mass reduction in system calls and run time.

$ cat rm.sh
#! /bin/bash

find . -name "somefiles-*" | xargs -L 10 /bin/rm -f


$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf ./rm.sh
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.63    9.880623        5611      1761         1 waitpid
  0.98    0.097799           6     17576           unlinkat
  0.18    0.018443          10      1760           clone
  0.12    0.011728         510        23           getdents64
  0.08    0.008373           0     17577           fstatat64
  0.00    0.000331           0      7058         4 open
  0.00    0.000312           0      1842           read
  0.00    0.000289           0     14108           mmap2
  0.00    0.000041           0      1761      1760 ioctl
  0.00    0.000039           0     12342         2 close
  0.00    0.000000           0        69           write
  0.00    0.000000           0      1761           execve
  0.00    0.000000           0         1           time
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0      5296      5288 access
  0.00    0.000000           0         1           pipe
  0.00    0.000000           0      5323           brk
  0.00    0.000000           0         3           dup2
  0.00    0.000000           0         1           getppid
  0.00    0.000000           0         1           getpgrp
  0.00    0.000000           0         2           gettimeofday
  0.00    0.000000           0      1765           munmap
  0.00    0.000000           0         1           sigreturn
  0.00    0.000000           0         3           uname
  0.00    0.000000           0      7049           mprotect
  0.00    0.000000           0         5           fchdir
  0.00    0.000000           0      1761           _llseek
  0.00    0.000000           0        25           rt_sigaction
  0.00    0.000000           0        21           rt_sigprocmask
  0.00    0.000000           0         5           getrlimit
  0.00    0.000000           0        24         8 stat64
  0.00    0.000000           0      5295           fstat64
  0.00    0.000000           0         9           getuid32
  0.00    0.000000           0         9           getgid32
  0.00    0.000000           0         9           geteuid32
  0.00    0.000000           0         9           getegid32
  0.00    0.000000           0         3         1 fcntl64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0      1761           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   10.017978                106026      7065 total

Remove Hundreds of Thousands of Files, take 2

An alternte way to efficiently remove hundreds of thousands of files with find in Linux.

Normally the boilerplate for removing files in 'find' is find some-dir -name "*pattern*" -exec rm -f {} \;. This is very inefficent because it has to fork as many process as the number of files. As we all know, forking takes time to create process. If fork takes 0.01s to create a process, it will take 1,000s (16+ min) just to create those 'rm' processes for 100,000 files to be removed.

Below is the summary of strace system calls for the 3 solutions (python way, traditional find way with -exec, and find -delete) to delete 17576 files (26*26*26). Definitely 'find -delete' is the winner. See for yourself.

  • Python way - 18647 system calls, 0.0896s run time
  • find -exec rm - 843786 system calls, 42.801s run time
  • find -delete - 17711 system calls, 0.0793s run time

Python way:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf ./rm.py somefiles
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81.86    0.073372           4     17576           unlink
 17.78    0.015938         590        27           getdents64
  0.09    0.000080           1        89           close
  0.08    0.000071           0       153           read
  0.08    0.000070           1       135        74 stat64
  0.07    0.000059           0       268       182 open
  0.04    0.000036           0       137           fstat64
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         9         9 access
  0.00    0.000000           0        12           brk
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0        50           munmap
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         3           _llseek
  0.00    0.000000           0        68           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getcwd
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0        74           mmap2
  0.00    0.000000           0         9           lstat64
  0.00    0.000000           0         1           getuid32
  0.00    0.000000           0         1           getgid32
  0.00    0.000000           0         1           geteuid32
  0.00    0.000000           0         1           getegid32
  0.00    0.000000           0         1         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         3           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.089626                 18647       269 total

Traditional find -exec rm -f {} \;:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf find . -name "somefiles-*" -exec rm -f {} \;
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.91   42.883595        2440     17576           waitpid
  1.30    0.571413          33     17576           clone
  0.55    0.241115          14     17576           unlinkat
  0.07    0.030407           0    105467           close
  0.04    0.017349           1     17577           fstatat64
  0.03    0.014306           1     17576     17576 _llseek
  0.03    0.012770           0     52737           open
  0.02    0.008407           0    140626           mmap2
  0.02    0.006971           0     17577           ioctl
  0.01    0.004180         182        23           getdents64
  0.01    0.004000           0    123033    105456 execve
  0.01    0.003189           0     52757           brk
  0.00    0.001418           0     17577           munmap
  0.00    0.001373           0     52735           fstat64
  0.00    0.000519           0     70311           mprotect
  0.00    0.000000           0     17580           read
  0.00    0.000000           0     52734     52734 access
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         2           uname
  0.00    0.000000           0     17581           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0     17577           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0     17577           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   43.801012                843786    175767 total

find -delete way:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf find . -name "somefiles-*" -delete
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.20    0.069193           4     17576           unlinkat
 12.69    0.010070         438        23           getdents64
  0.10    0.000083           5        17           mmap2
  0.00    0.000000           0         4           read
  0.00    0.000000           0         9           open
  0.00    0.000000           0        11           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         6         6 access
  0.00    0.000000           0        29           brk
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         5           fchdir
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0         1           fstatat64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.079346                 17711         7 total

cron Has No Limit in Linux

cron in Solaris limits the number of cron jobs it can run by the configuration defined in /etc/cron.d/queuedefs.

In Ubuntu (or Linux), it does not have this limitation. I downloaded (apt-get source cron) the source and could not find any hard limit coded in the program. Also, I tried to launch some long running job (sleep 36000000) every minute and managed to pump 1500+ jobs until my little netnook (with 1GB ram) started swapping.

Although the hardware is the limit for cron, it would be nice to monitor how many cron jobs are running and how long each cron job takes to run. According to the 'man cron', you can provide a '-L' flag to capture more info of a job

-L loglevel
               Tell  cron what to log about jobs (errors are logged regardless
               of this value) as the sum of the following values:

                   1      will log the start of all cron jobs

                   2      will log the end of all cron jobs

                   4      will log all failed jobs (exit status != 0)

                   8      will log the process number of all cron jobs

For troubleshooting of cron, this link (Reasons Why Crontab Does Not Work) is realy useful.

Saturday, October 20, 2012

RTFM - Read The Fine Manual

After I resolved my user's issue regarding crontab file format, I asked him to refer to the man page of crontab (man -s 5 crontab).

My colleague thinks that I am being sarcastic :-). I may be cheeky sometimes, but not this round.

The truth is the default section of crontab is section 1, which is man page for user command (see "man Intro" or "man -s 1 Intro").

If you read the "See Also" section of "man crontab", they want you to refer to crontab(5) which is section 5 of crontab

CRONTAB(1)                                                          CRONTAB(1)

NAME
       crontab - maintain crontab files for individual users (Vixie Cron)

SYNOPSIS
       crontab [ -u user ] file
       crontab [ -u user ] [ -i ] { -e | -l | -r }

DESCRIPTION
       crontab  is  the  program used to install, deinstall or list the tables
       used to drive the cron(8) daemon in Vixie Cron.   Each  user  can  have
       their    own    crontab,    and    though    these    are    files   in
       /var/spool/cron/crontabs, they are not intended to be edited directly.

...
SEE ALSO
       crontab(5), cron(8)

FILES
       /etc/cron.allow
       /etc/cron.deny
       /var/spool/cron/crontabs
...

It is only section 5 of the man page that will give you the details of the configuration. So "man -s 5 crontab"

CRONTAB(5)                                                          CRONTAB(5)

NAME
       crontab - tables for driving cron

DESCRIPTION
       A  crontab file contains instructions to the cron(8) daemon of the gen‐
       eral form: ``run this command at this time on this date''.   Each  user
       has  their  own crontab, and commands in any given crontab will be exe‐
       cuted as the user who owns the crontab.  Uucp  and  News  will  usually
       have  their  own  crontabs, eliminating the need for explicitly running
       su(1) as part of a cron command.
...
       Commands are executed by cron(8) when the minute, hour,  and  month  of
       year  fields  match  the current time, and when at least one of the two
       day fields (day of month, or day of week) match the current  time  (see
       ``Note'' below).  cron(8) examines cron entries once every minute.  The
       time and date fields are:

              field          allowed values
              -----          --------------
              minute         0-59
              hour           0-23
              day of month   1-31
              month          1-12 (or names, see below)
              day of week    0-7 (0 or 7 is Sun, or use names)

       A field may be an asterisk (*), which always stands for ``first-last''.
...

For more details of each section, simply do a "man -s SECTION# Intro" to find out more. Here is an overview of all the sections on my ubuntu:

  • Section 1 - user commands
  • Section 2 - system calls
  • Section 3 - library functions
  • Section 4 - special files
  • Section 5 - file formats
  • Section 6 - games
  • Section 7 - overview, conventions, and miscellany section
  • Section 8 - administration and privileged commands