Unix – Better Find Command with Parallel Processing


The Unix find(1) utility is very useful, allowing me to perform an action on many files that match certain specifications, e.g.

find /dump -type f -name '*.xml' -exec java -jar ProcessFile.jar {} \;

The above might run a script or tool over every XML file in a particular directory.

Let's say my script/program takes a lot of CPU time and I have 8 processors. It would be nice to process up to 8 files at a time.

GNU make allows for parallel job processing with the -j flag but find does not appear to have such functionality. Is there an alternative generic job-scheduling method of approaching this?

Best Answer

Use xargs with the -P option (maximum number of parallel processes). Say I wanted to compress all the logfiles in a directory on a 4-CPU machine:

find . -name '*.log' -mtime +3 -print0 | xargs -0 -P 4 bzip2

You can also pass -n <number> for the maximum number of arguments (work units) handed to each invocation. So say I had 2500 files and I ran:

find . -name '*.log' -mtime +3 -print0 | xargs -0 -n 500 -P 4 bzip2

This would start four bzip2 processes, each given 500 files; when the first one finished, another would be started for the last 500 files.
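The same approach maps directly onto the question's original Java example. Here is a hedged, self-contained sketch: /dump and ProcessFile.jar are the asker's hypothetical paths, so a temp directory and gzip stand in for them to make the pipeline runnable anywhere. The -print0/-0 pairing keeps filenames containing spaces or newlines intact.

```shell
# Stand-in for the asker's /dump directory of XML files.
dir=$(mktemp -d)
touch "$dir/a.xml" "$dir/b.xml" "$dir/c.xml"

# -n 1 gives each worker one file at a time; -P 8 caps concurrency at
# 8 processes, matching the 8-CPU machine in the question. Substitute
# `java -jar ProcessFile.jar` for `gzip` in the real case.
find "$dir" -type f -name '*.xml' -print0 \
  | xargs -0 -n 1 -P 8 gzip

ls "$dir"   # each .xml has been replaced by a .xml.gz
```

With -n 1, xargs keeps all eight slots busy: as soon as one gzip exits, the next pending file is dispatched, which is exactly the generic job scheduling the question asks for.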

I'm not sure why the previous answer combines xargs with make; that gives you two parallel engines!