Sunday, July 22, 2012

Simple bash scripts to achieve parallel processing of shell tasks

We've all had the experience of scripting commands which we know will take a long time to finish, and wishing for an easy way to run them in parallel. On occasion I have written special-purpose scripts to manage parallel processes for a long-running task, e.g., to copy large numbers of files across machines. But this is such a general problem that it makes more sense to have a general solution.

To solve this problem I wrote parallel, a bash script which takes a single integer argument telling it how many processes it should spawn to do work (5 if no argument is given). It reads a series of commands from standard input, one per line, and distributes them to its subprocesses. There must be no ordering dependency between these commands since the order of their execution will not be known in advance. The subprocesses execute these commands until no work is left.
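As a usage sketch, a command list can come from any loop that prints one command per line. The file names and the gzip command below are invented for illustration, and the final pipe is left commented out because it assumes both parallel and parallel.1proc are on the PATH (and that parallel resolves to this script rather than some other tool of the same name).

```shell
# Hypothetical example: emit one independent command per line
make_cmds() {
        for f in a.log b.log c.log; do
                echo "gzip $f"
        done
}

# Show what would be fed to the script
make_cmds

# With the scripts installed, run the commands three at a time:
# make_cmds | parallel 3
```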

Here is the code.

:
n=$1

if [ -z "$n" ]; then
        n=5
else
        shift
fi

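# Fall back to /tmp if TMP is not set, so we never write under /
t=${TMP:-/tmp}/parallel.$$

mkdir -p "$t/in"
mkdir -p "$t/work"
mkdir -p "$t/out"
x=0

# Store each input line verbatim as one numbered command file
while IFS= read -r cmd; do
        printf '%s\n' "$cmd" > "$t/in/$x"
        x=`expr $x + 1`
done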

while [ $n != 0 ]; do
        parallel.1proc $n $t &
        n=`expr $n - 1`
done
wait
cat "$t"/out/*

rm -r "$t"

The script creates a temporary directory to keep track of the work. Under that directory is a subdirectory named in, which holds the commands waiting to be executed. The subdirectory out holds the output from the individual commands, and the subdirectory work records the work which is currently being done.
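
As a rough illustration of that layout, the following sketch builds the same three subdirectories under a scratch directory and queues two commands the way the read loop does. It uses mktemp -d in place of the script's own temporary path, and the two queued commands are made up for the example; both are assumptions of this sketch rather than part of the script.

```shell
# Sketch only: mktemp -d stands in for the script's temporary directory
t=$(mktemp -d)
mkdir -p "$t/in" "$t/work" "$t/out"

# Queue two made-up commands, one numbered file per input line
x=0
while IFS= read -r cmd; do
        printf '%s\n' "$cmd" > "$t/in/$x"
        x=$((x + 1))
done <<'EOF'
echo hello
sleep 1; echo world
EOF

# Inspect the queue: how many files, and what the first one holds
queued=$(ls "$t/in" | wc -l | tr -d ' ')
first=$(cat "$t/in/0")
echo "queued=$queued first=$first"
rm -r "$t"
```

Each line of input ends up as its own file in the in directory, so a whole command, including any semicolons, travels as a single unit of work.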

A second script, parallel.1proc, does the actual work in each process. It expects an integer ID and the path of the temporary directory. It loops, repeatedly grabbing work from the in directory, recording what it is doing in the work directory, and putting the output into the out directory. Here is the code:

:
id=$1
t=$2

echo parallel.1proc $id $t starting...
x=0
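while true; do
        current_cmd_fn=$t/work/$id.$x
        # Keep trying until we win the race for some command file
        while [ ! -f "$current_cmd_fn" ]; do
                next_cmd_base=`ls "$t/in" | tail -n "$id" | head -n 1`
                if [ -z "$next_cmd_base" ]; then
                        echo parallel.1proc $id done
                        exit 0
                fi
                next_cmd_fn=$t/in/$next_cmd_base
                # Within one filesystem mv is a rename, so only one
                # process can succeed; silence the losers' errors
                mv "$next_cmd_fn" "$current_cmd_fn" 2>/dev/null
        done
        (
        date
        cat "$current_cmd_fn"
        eval "$(cat "$current_cmd_fn")"
        date
        ) > "$t/out/$id.$x"

        x=`expr $x + 1`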
done

The in directory is logically the work queue. The multiple copies of parallel.1proc race one another as they attempt to move individual command files from the in directory to the work directory. The scheme hinges on the ability of each process to tell whether it won the race. When a process attempts to move a command file, it uses a destination filename which incorporates its unique ID, so if the destination file including, for example, the ID 4 exists after the move attempt, then subprocess 4 knows that it was the lucky winner and can go ahead and execute the command. Any other process which was trying to grab the same command will realize that it failed since the file named by its variable current_cmd_fn will not exist. A losing subprocess simply retries until it either succeeds in grabbing a command or the supply of commands runs out, at which point the subprocess exits.
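
The race can be seen in isolation with a small experiment. In this sketch, which is not part of the scripts above, four background subshells with made-up IDs 1 to 4 all try to move the same queued file to their own destination name. It assumes the in and work directories are on the same filesystem, so that mv is a plain rename.

```shell
# Sketch only: four subshells race to claim one queued file
t=$(mktemp -d)
mkdir -p "$t/in" "$t/work"
echo 'echo some work' > "$t/in/0"

for id in 1 2 3 4; do
        (
        # Only one rename of in/0 can succeed; the other mv calls
        # fail because the source file no longer exists
        mv "$t/in/0" "$t/work/$id.0" 2>/dev/null
        [ -f "$t/work/$id.0" ] && echo "process $id won"
        ) &
done
wait

# Count claimed files and anything left in the queue
winners=$(ls "$t/work" | wc -l | tr -d ' ')
left=$(ls "$t/in" | wc -l | tr -d ' ')
echo "winners=$winners left=$left"
rm -r "$t"
```

Which ID wins varies from run to run, but only one destination file should ever appear in the work directory, and that is the property the retry loop relies on.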

Tuesday, July 10, 2012

Generalizing visibility control of an HTML span depending on form field state

Often a form user's answer to a question determines whether parts of a form are relevant, e.g., is the billing address the same as the shipping address? Earlier I wrote about a span attribute which leads to the automatic presentation of a checkbox to control a span's visibility. But in other circumstances it can be nice to have the ability to control a span's visibility depending on some arbitrary form field elsewhere in the page. As usual I am hoping to be able to implement this linkage in markup without a rat's nest of onchange JavaScript peeking around to see what the form looks like.

Using jQuery and a small amount of code it is easy to do this. In my implementation described below, I support an attribute visibility_tied_to_field on spans; the attribute's value is the DOM ID of a checkbox form field whose state will determine the span's visibility. So, for example, the billing address versus shipping address situation could be handled like this:

Billing address different from shipping address? <input type=checkbox id='xyz'>
<span visibility_tied_to_field='xyz'>

... billing address form fields...

</span>

I also support an inverse link between a checkbox and a span's visibility with an analogous span attribute, visibility_inversely_tied_to_field. When this latter attribute is used, the span is visible if the checkbox is not checked, and invisible if the checkbox is checked (the logical inverse of visibility_tied_to_field):

Billing address same as shipping address? <input type=checkbox id='xyz'>
<span visibility_inversely_tied_to_field='xyz'>

... billing address form fields...

</span>

I implement this functionality using just the following code:

function visibility_ties_init()
{
    var init_onchange_show_if_checked = function(index, span)
    {
        init_onchange(true, span, "visibility_tied_to_field")
    }
    
    var init_onchange_show_if_NOT_checked = function(index, span)
    {
        init_onchange(false, span, "visibility_inversely_tied_to_field")
    }
    
    var init_onchange = function(show_if_checked, span, span_attr_that_points_to_field)
    {
        var checkbox_id_attr = span.attributes.getNamedItem(span_attr_that_points_to_field)
        if (checkbox_id_attr != null)
        {
            var checkbox_id = checkbox_id_attr.value
            var checkbox = $('#' + checkbox_id)
            // Show the span when the checkbox state matches the wanted state
            var show_it = (checkbox.is(':checked') === show_if_checked)
            span.hidden = !show_it

            checkbox.change(function()
            {
                span.hidden = (show_if_checked ? !this.checked : this.checked)
            })
        }
    }
    
    $("span[visibility_tied_to_field]").each(          init_onchange_show_if_checked)
    $("span[visibility_inversely_tied_to_field]").each(init_onchange_show_if_NOT_checked)
}

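The code iterates across spans with the visibility_tied_to_field or visibility_inversely_tied_to_field attributes, looking up the checkbox each one points to. For each span it first sets the initial visibility from the checkbox's current state, then attaches a change handler which adjusts the span's visibility whenever the checkbox state changes. Since the function looks up elements by ID, visibility_ties_init needs to be called once the page's elements exist, e.g., from jQuery's document-ready handler.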