In cases where the resources of a single host are not enough for the work at hand, splitting that work across multiple processes on that one host will no longer improve performance; instead, the work must be distributed across multiple hosts. Below are changes to the parallel-execution script framework I discussed earlier, now supporting execution across machines.
:
# usage: parallel [N] [HOST1 HOST2 HOST3 ...]
# execute in N parallel processes (default 5)
# if HOST1 HOST2 HOST3 ... specified,
# execute in N processes per specified host
n=$1
if [ -z "$n" ]; then
    n=5
    hosts=''
else
    case "$1" in
        [0-9]*)
            n=$1
            shift
            hosts=$*    # any remaining args are hosts
            ;;
        *)
            n=5         # no count given, only hosts
            hosts=$*
            ;;
    esac
fi
if [ -z "$hosts" ]; then
hosts='localhost'
fi
t=${TMP:-/tmp}/parallel.$$
mkdir -p $t/in $t/work $t/out $t/hosts
x=0
while read -r cmd; do
    echo "$cmd" > $t/in/$x
    x=`expr $x + 1`
done
id=1
for host in $hosts; do
    n2=$n
    while [ $n2 != 0 ]; do
        echo `expand_host_name $host` > $t/hosts/$id
        parallel.1proc $id $t &
        id=`expr $id + 1`
        n2=`expr $n2 - 1`
    done
done
wait
cat $t/out/*
rm -r $t
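The launcher above calls a helper, expand_host_name, whose definition isn't shown and is inherently site-specific. A minimal sketch, assuming all it does is append a site domain suffix to short host names; the suffix example.com is a placeholder, and in the real framework this would presumably be a standalone command on the PATH rather than a shell function:

```shell
# Hypothetical sketch of the expand_host_name helper used above.
# Assumption: short names get a site domain suffix appended, while
# names that already contain a dot pass through unchanged.
# "example.com" is a placeholder, not the real suffix.
expand_host_name() {
    case "$1" in
        *.*) echo "$1" ;;              # already fully qualified
        *)   echo "$1.example.com" ;;  # append assumed site suffix
    esac
}
```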
The individual processes still run in a separate script
parallel.1proc with the following code:
:
id=$1
t=$2
host_fn=`ls.nth $id $t/hosts`
if [ -z "$host_fn" ]; then
    host=localhost
else
    host=`cat $t/hosts/$host_fn`
fi
echo "parallel.1proc $id $t starting (will direct work to $host)..."
x=0
while :; do
    current_cmd_fn=$t/work/$id.$x
    while [ ! -f "$current_cmd_fn" ]; do
        next_cmd_base=`ls.nth $id $t/in`
        if [ -z "$next_cmd_base" ]; then
            echo parallel.1proc $id done
            exit 0
        fi
        next_cmd_fn=$t/in/$next_cmd_base
        # mv is atomic: if another process grabs the same input file
        # first, this mv fails quietly and we loop to try the next one
        mv $next_cmd_fn $current_cmd_fn 2> /dev/null
    done
    (
        cmd=`cat $current_cmd_fn`
        echo "==============================================================================="
        echo "`date` parallel proc $id executing $cmd on $host"
        if [ "$host" = "localhost" ]; then
            eval "$cmd"
        else
            # make sure no X11 display leaks into remote jobs
            unset DISPLAY
            export DISPLAY
            # -n keeps ssh from swallowing this loop's stdin
            ssh -n -o StrictHostKeyChecking=no $host "$cmd"
        fi
        date
    ) > $t/out/$id.$x
    x=`expr $x + 1`
done
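Both scripts also rely on a helper, ls.nth, that isn't shown. A minimal sketch, assuming it prints the name of the Nth entry (1-based) of a directory listing, and nothing at all when the directory has fewer than N entries; it is written here as a function ls_nth, since the real version is presumably a standalone ls.nth command:

```shell
# Hypothetical sketch of the ls.nth helper: print the name of the
# Nth entry (1-based) in a directory, or nothing if there are fewer
# than N entries.
ls_nth() {
    ls "$2" 2> /dev/null | sed -n "${1}p"
}
```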
Of course, all of this assumes that each of the hosts can access the disks needed to execute the work at hand. The approach works nicely in an environment where some set of NFS-mounted disks is visible from every host you want working on the task.
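The reason many worker processes can safely pull jobs from the same in directory is that mv within one filesystem is an atomic rename: when two workers race for the same input file, exactly one mv succeeds and the loser simply moves on. A small self-contained demonstration of that claiming pattern (the directory layout and worker names here are made up for the demo):

```shell
#!/bin/sh
# Demonstrate the mv-based job claiming used by parallel.1proc:
# three workers race over ten job files; each file is claimed by
# exactly one worker because rename is atomic.
t=${TMPDIR:-/tmp}/claim.$$
mkdir -p $t/in $t/work
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "job $i" > $t/in/$i
done
worker() {
    for f in `ls $t/in`; do
        # only one racing worker's mv can succeed for a given file
        if mv $t/in/$f $t/work/$f.$1 2> /dev/null; then
            : # this worker now owns the job and would execute it here
        fi
    done
}
worker A & worker B & worker C &
wait
n=`ls $t/work | wc -l | tr -d ' '`
echo claimed: $n
rm -r $t
```

Every job file ends up claimed exactly once, no matter how the three workers interleave.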