To solve this problem I wrote parallel, a bash script which takes an optional integer argument telling it how many worker processes to spawn (five if the argument is omitted). It reads a series of commands from standard input, which it then distributes to its subprocesses. There must be no ordering dependency between these commands, since the order of their execution is not known in advance. The subprocesses execute these commands until no work is left.
The script creates a temporary directory to keep track of the work. Under that directory is a subdirectory named in, where we keep the commands waiting to be executed. The subdirectory out is where we keep the output from the individual commands, and the subdirectory work is where we keep track of the work currently being done. Here is the code:
:
# usage: parallel [n]  -- commands are read from standard input
n=$1
if [ -z "$n" ]; then
    n=5
else
    shift
fi
t=${TMPDIR:-/tmp}/parallel.$$
mkdir -p "$t/in"
mkdir -p "$t/work"
mkdir -p "$t/out"
# split standard input into one command file per line
x=0
while read cmd; do
    echo "$cmd" > "$t/in/$x"
    x=`expr $x + 1`
done
# spawn the workers, each with a unique integer ID
while [ "$n" -gt 0 ]; do
    parallel.1proc $n "$t" &
    n=`expr $n - 1`
done
wait
cat "$t"/out/*
rm -r "$t"
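The queue-filling step can be exercised on its own. This is a minimal sketch, not part of the script itself; it uses mktemp -d for its scratch directory rather than the script's $TMPDIR/parallel.$$ convention, and the command lines are made up for illustration:

```shell
# Build a tiny in queue by splitting standard input into numbered files
t=`mktemp -d`
mkdir -p "$t/in"
x=0
printf '%s\n' 'echo one' 'sleep 1; echo two' 'echo three' |
while read cmd; do
    echo "$cmd" > "$t/in/$x"
    x=`expr $x + 1`
done
# the queue now holds one file per command, named 0, 1, 2, ...
queue=`ls "$t/in"`
echo "$queue"
rm -r "$t"
```

Each file's name is just its sequence number, so the workers never need to parse anything: any file present under in is a unit of pending work.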
A second script, parallel.1proc, does the actual work for each process. It expects an integer ID and the path of the temporary directory. It loops, repeatedly grabbing work from the in directory, recording what it is doing in the work directory, and putting its output into the out directory.
The in directory is logically the work queue. There is a deliberate race between the multiple copies of parallel.1proc as they attempt to move individual command files from the in directory to the work directory, and the scheme hinges on each process being able to tell whether it won that race. When a process attempts to move a command file, it constructs a destination filename incorporating its unique ID; if the destination file containing, for example, the ID 4 exists after the move attempt, then subprocess 4 knows it was the lucky winner who succeeded in grabbing the work and can go ahead and execute it. Any other process that was trying to grab the same command will find that it failed, since its own destination file, named by current_cmd_fn, will not exist: the winner had already moved the source away, so the loser's mv had nothing to move. A losing subprocess simply retries until it either succeeds in grabbing a command or the supply of commands runs out, at which point it exits. Here is the code:
:
# usage: parallel.1proc id tmpdir
id=$1
t=$2
echo parallel.1proc $id $t starting...
x=0
while :; do
    current_cmd_fn=$t/work/$id.$x
    # race the other workers for the next command file; mv is an
    # atomic rename, so only one worker can claim each file
    while [ ! -f "$current_cmd_fn" ]; do
        # start at a different offset in the listing per worker,
        # to cut down on collisions
        next_cmd_base=`ls "$t/in" | tail -n "$id" | head -n 1`
        if [ -z "$next_cmd_base" ]; then
            echo parallel.1proc $id done
            exit 0
        fi
        next_cmd_fn=$t/in/$next_cmd_base
        mv "$next_cmd_fn" "$current_cmd_fn" 2>/dev/null
    done
    (
        date
        cat "$current_cmd_fn"
        eval `cat "$current_cmd_fn"`
        date
    ) > "$t/out/$id.$x"
    x=`expr $x + 1`
done
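The mv-as-lock trick at the heart of parallel.1proc can be demonstrated in isolation. In this sketch (standalone, not part of the scripts above; the worker IDs and file names are invented for illustration) three subshells fight over a single command file, and because rename is atomic exactly one of them ends up owning it:

```shell
# Three racers try to claim the same queued file via mv;
# each uses a destination name carrying its own ID
t=`mktemp -d`
mkdir -p "$t/in" "$t/work"
echo 'echo hello' > "$t/in/0"
for id in 1 2 3; do
    (
        mv "$t/in/0" "$t/work/$id.0" 2>/dev/null
        # a racer won only if its own destination name now exists
        if [ -f "$t/work/$id.0" ]; then
            echo "worker $id grabbed the command"
        fi
    ) &
done
wait
# exactly one destination file exists, no matter who won
winners=`ls "$t/work" | wc -l | tr -d ' '`
echo "winners: $winners"
rm -r "$t"
```

The losers' mv calls fail because the source file has already vanished, and since their destination names were never created, the existence test tells them unambiguously that they lost.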