Monday, October 22, 2012

Remove Hundreds of Thousands of Files, take 2

An alternate way to efficiently remove hundreds of thousands of files on Linux using find.

The usual idiom for removing files with 'find' is find some-dir -name "*pattern*" -exec rm -f {} \;. This is very inefficient because find starts a separate 'rm' process (a fork followed by an exec) for every file it matches. Creating a process is far from free: if it took 0.01s per process, just starting the 'rm' processes for 100,000 files would take 1,000s (more than 16 minutes).
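To get a feel for the per-process cost on your own machine, here is a rough Python sketch (not from the original post; all function names are made up for illustration). It compares starting one 'rm' process per file, which is what -exec ... \; does, with calling unlink() directly in a single process.

```python
import os
import subprocess
import tempfile
import time


def make_files(directory, count):
    """Create `count` empty files in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"file-{i:06d}")
        open(path, "w").close()
        paths.append(path)
    return paths


def remove_one_process_per_file(paths):
    # Like `find ... -exec rm -f {} \;`: every file costs creating a new
    # process, exec'ing rm in it, and waiting for it to exit.
    for path in paths:
        subprocess.run(["rm", "-f", path], check=True)


def remove_in_process(paths):
    # Like `find ... -delete`: one unlink system call per file, no new processes.
    for path in paths:
        os.unlink(path)


def time_removal(remover, count):
    """Create `count` files in a fresh temp dir; return the seconds `remover` takes to delete them."""
    with tempfile.TemporaryDirectory() as directory:
        paths = make_files(directory, count)
        start = time.perf_counter()
        remover(paths)
        return time.perf_counter() - start
```

Calling time_removal(remove_one_process_per_file, 1000) and time_removal(remove_in_process, 1000), then dividing each result by 1000, gives a rough per-file cost for each approach. The absolute numbers depend heavily on the machine, but the gap is typically orders of magnitude.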

Below are the strace -c summaries of the system calls made by three solutions (a Python script, the traditional find with -exec, and find -delete) to delete 17576 files (26*26*26). The times are the totals strace attributes to system calls. 'find -delete' is clearly the winner. See for yourself.

  • Python way - 18647 system calls, 0.0896s syscall time
  • find -exec rm - 843786 system calls, 43.801s syscall time
  • find -delete - 17711 system calls, 0.0793s syscall time
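Each run below starts from the same 17576 empty files, somefiles-aaa through somefiles-zzz, created with bash brace expansion. Where brace expansion is not available, a Python sketch like the following (the helper name touch_fixture is hypothetical) builds the same set of names.

```python
import itertools
import os
import string


def touch_fixture(directory, prefix="somefiles-"):
    """Create somefiles-aaa ... somefiles-zzz (26**3 = 17576 names) in `directory`,
    the same set bash creates from  touch somefiles-{a..z}{a..z}{a..z}."""
    count = 0
    for letters in itertools.product(string.ascii_lowercase, repeat=3):
        path = os.path.join(directory, prefix + "".join(letters))
        # Like touch: create the file if it is missing, then set its
        # access/modification times to now.
        open(path, "a").close()
        os.utime(path)
        count += 1
    return count
```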

Python way:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf ./rm.py somefiles
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81.86    0.073372           4     17576           unlink
 17.78    0.015938         590        27           getdents64
  0.09    0.000080           1        89           close
  0.08    0.000071           0       153           read
  0.08    0.000070           1       135        74 stat64
  0.07    0.000059           0       268       182 open
  0.04    0.000036           0       137           fstat64
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         9         9 access
  0.00    0.000000           0        12           brk
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0        50           munmap
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         3           _llseek
  0.00    0.000000           0        68           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getcwd
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0        74           mmap2
  0.00    0.000000           0         9           lstat64
  0.00    0.000000           0         1           getuid32
  0.00    0.000000           0         1           getgid32
  0.00    0.000000           0         1           geteuid32
  0.00    0.000000           0         1           getegid32
  0.00    0.000000           0         1         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         3           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.089626                 18647       269 total
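The post does not show rm.py itself. Below is a hypothetical sketch of what such a script might do, assuming its argument (somefiles above) is a filename prefix matched in the current directory; remove_with_prefix is a made-up name. Like the trace above, it reads the directory with getdents64 and then issues one unlink per file, all within a single process.

```python
import os


def remove_with_prefix(directory, prefix):
    """Delete regular files in `directory` whose names start with `prefix`; return how many."""
    # os.scandir() reads entries via getdents64; is_file(follow_symlinks=False)
    # normally uses the d_type it returns, so no per-file stat() is needed.
    with os.scandir(directory) as it:
        names = [e.name for e in it
                 if e.name.startswith(prefix) and e.is_file(follow_symlinks=False)]
    # Unlink only after the listing is complete, so the directory is not
    # modified while it is still being read.
    for name in names:
        os.unlink(os.path.join(directory, name))
    return len(names)
```

From the directory holding the files, remove_with_prefix(".", "somefiles") would remove all 17576 of them.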

Traditional find -exec rm -f {} \;:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf find . -name "somefiles-*" -exec rm -f {} \;
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.91   42.883595        2440     17576           waitpid
  1.30    0.571413          33     17576           clone
  0.55    0.241115          14     17576           unlinkat
  0.07    0.030407           0    105467           close
  0.04    0.017349           1     17577           fstatat64
  0.03    0.014306           1     17576     17576 _llseek
  0.03    0.012770           0     52737           open
  0.02    0.008407           0    140626           mmap2
  0.02    0.006971           0     17577           ioctl
  0.01    0.004180         182        23           getdents64
  0.01    0.004000           0    123033    105456 execve
  0.01    0.003189           0     52757           brk
  0.00    0.001418           0     17577           munmap
  0.00    0.001373           0     52735           fstat64
  0.00    0.000519           0     70311           mprotect
  0.00    0.000000           0     17580           read
  0.00    0.000000           0     52734     52734 access
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         2           uname
  0.00    0.000000           0     17581           fchdir
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           getrlimit
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0     17577           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0     17577           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   43.801012                843786    175767 total
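To compare the tables per file rather than in total, the counts can be pulled out of strace -c output with a small parser. This is a sketch assuming the column layout shown here (newer strace versions may print additional columns); parse_strace_summary is a hypothetical helper name.

```python
import re

# One data row of `strace -c` output: % time, seconds, usecs/call, calls,
# an optional errors column, then the syscall name.
ROW = re.compile(r"^\s*[\d.]+\s+[\d.]+\s+\d+\s+(\d+)\s+(?:(\d+)\s+)?(\w+)\s*$")


def parse_strace_summary(text):
    """Return {syscall: (calls, errors)} from an strace -c table, skipping the total line."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m and m.group(3) != "total":
            calls, errors, name = m.groups()
            counts[name] = (int(calls), int(errors or 0))
    return counts


def syscalls_per_file(counts, n_files):
    """Average number of system calls per deleted file."""
    return sum(calls for calls, _ in counts.values()) / n_files
```

Dividing the totals by 17576 files gives about 48 system calls per file for -exec (843786/17576) versus about 1 for -delete (17711/17576). The -exec table also shows exactly one clone and one waitpid per file, and 105456 failed execve calls, exactly 6 per file, which is consistent with find trying several $PATH directories before it finds rm.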

find -delete way:

$ touch somefiles-{a..z}{a..z}{a..z}

$ strace -cf find . -name "somefiles-*" -delete
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.20    0.069193           4     17576           unlinkat
 12.69    0.010070         438        23           getdents64
  0.10    0.000083           5        17           mmap2
  0.00    0.000000           0         4           read
  0.00    0.000000           0         9           open
  0.00    0.000000           0        11           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         6         6 access
  0.00    0.000000           0        29           brk
  0.00    0.000000           0         1           ioctl
  0.00    0.000000           0         1           gettimeofday
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           uname
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         5           fchdir
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         7           fstat64
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         1           set_thread_area
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           openat
  0.00    0.000000           0         1           fstatat64
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.079346                 17711         7 total
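The find -delete trace shows why it wins: one process issues one unlinkat() per file and little else. The same syscall pattern can be reproduced in Python with os.fwalk(), which yields an open descriptor for each directory, and os.unlink(name, dir_fd=...), which Python implements with unlinkat(). This is a Unix-only sketch of the idea, not how GNU find works internally; find_delete is a made-up name, and unlike find -delete it never removes directories.

```python
import fnmatch
import os


def find_delete(top, pattern):
    """Rough analogue of  find TOP -name PATTERN -delete  for non-directory entries.

    Returns the number of entries removed."""
    removed = 0
    # os.fwalk() lists each directory completely before yielding it, together
    # with an open file descriptor for that directory.
    for _dirpath, _dirnames, filenames, dirfd in os.fwalk(top):
        for name in fnmatch.filter(filenames, pattern):
            # unlinkat(dirfd, name, 0): no path lookup from the root each time.
            os.unlink(name, dir_fd=dirfd)
            removed += 1
    return removed
```

For the run above, find_delete(".", "somefiles-*") would remove the same 17576 files.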
