It occurred to me a bit later that memoization could serve a broader purpose here. One of the least attractive aspects of multivcs_query is that testing it depends on the existence of a variety of source control management systems, and on particular code lines which will surely not exist on most people's servers. I had considered writing some sort of bootstrap code to establish simple code lines with enough content and history to support the tests I need to run, but of course that would be a significant amount of work, and it would still carry the unfortunate assumption that the source control systems involved are even installed on the local system -- a good bet with git, but probably not with any of the others. But if multivcs_query uses memoizing wrappers for its interactions with every source control system, then all I need to do is seed the memoization cache with appropriate data, and my tests will work even if none of the source control systems exist on the local server. For each request, the memoization layer sees a hit in the cache and immediately returns valid results, never even attempting to run the (possibly non-existent) version control system theoretically involved. And that means I can pursue development work on multivcs_query on a laptop which naturally has none of the server-side source control software installed.
To make this beautiful vision a reality, several pieces of work were required.
- I had to change cache.pl so that it no longer generates its cached result filenames by simply concatenating the inputs -- I was immediately exceeding the maximum filename length with calls referring to multiple source code files. Instead, I follow the old method to generate a (sometimes very long) string key, and then run cksum on that key to make a short unique ID.
- Then, to avoid the cache contents becoming unmanageably opaque, I also save a companion file with a .cmd suffix recording the actual command that was run.
- Now that I want all test cases with these version control system dependencies to use the cache, I frequently find myself looking at the current cache and extracting the files that need to be saved away for future successful test runs. Especially now that the cache files have names based on cksum values, it is no longer a simple matter to look at the cache directory and understand what is what. To make the situation transparent, I implemented a simple utility, cache.ls, to list the contents of the cache and propose commands for copying the relevant files to a new location (presumably the folder holding the cache contents needed for successful test runs). Here is the code for cache.ls:
#!/bin/bash
search_args=$*
if [ -z "$search_args" ]; then
    search_args=.
fi
for cf in `ls $TMP/cache* | grep -v 'cmd$'`; do
    if cat $cf.cmd | grepm $search_args; then
        cat $cf
        echo EOD
        echo "cp -p $cf* ."
        echo '----------------------------------------------------------------------------------------------'
    fi
done
- Finally, it is important to initialize the cache with the appropriate data when running on a host for the first time. In my test wrapper, I added the following code to perform this seeding:
if [ ! -f $TMP/CACHE_SEEDED_FOR_TESTS ]; then
    echo "Initializing cache data for test runs on this host:"
    echo "cp -p test/cache_seed/* $TMP..."
    if ! cp -p test/cache_seed/* $TMP; then
        echo "$0: cp -p test/cache_seed/* $TMP failed, exiting..." 1>&2
        exit 1
    fi
    if ! touch $TMP/CACHE_SEEDED_FOR_TESTS; then
        echo "$0: touch $TMP/CACHE_SEEDED_FOR_TESTS failed, exiting..." 1>&2
        exit 1
    fi
fi
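To make the cksum naming scheme from the first item concrete, here is a hypothetical illustration (the key string and repository paths are invented for the example) of how a long key collapses into a short cache ID:

```shell
# Hypothetical key -- the sort of long string that used to overflow
# the maximum filename length when used directly as a filename.
key='svn log -v /repo/branches/rel_2/src/alpha.c /repo/branches/rel_2/src/beta.c'
# cksum prints "checksum length"; keep only the checksum field.
id=$(echo "$key" | cksum | cut -d' ' -f1)
echo "cache.$id"
```

The same key always produces the same checksum, so repeated calls with identical arguments land on the same cache file.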
With those pieces in place, here is the updated cache.pl:

use strict;
use IO::File;
my $__trace = 0;
sub get_cached_output_path
{
    my($extra_key, $s) = @_;
    my $key = $extra_key . $s;
    # Collapse the (sometimes very long) key into a short unique ID via cksum,
    # so the cache filename stays within filesystem limits.
    my $fn_base = `echo $key | cksum`;
    chomp $fn_base;
    $fn_base =~ s/ .*//;  # keep only the checksum field
    my $fn = "$ENV{'TMP'}/cache." . $fn_base;
    # Record the full key in a companion .cmd file so the cache stays legible.
    my $f = IO::File->new("$fn.cmd", "w");
    $f->write($key);
    $f->close();
    return $fn;
}
my @argv = @ARGV;
# An optional extra key lets callers distinguish otherwise identical commands.
my $extra_key = $ENV{"CACHE_EXTRA_ARG"};
$extra_key = "" if !defined $extra_key;
if (defined $argv[1] && $argv[1] eq "-cache-clear")
{
    my $cached_output_stem = get_cached_output_path($extra_key, $argv[0]);
    die "empty output stem" unless $cached_output_stem;
    my $cmd = "rm -f $cached_output_stem* 2> /dev/null";
    print "$cmd\n" if $__trace;
    print `$cmd`;
    exit(0);
}
# Quote every argument, then strip the quotes back off plain words and paths,
# so only arguments containing shell-significant characters stay quoted.
my $cmd = join('" "', @argv);
$cmd =~ s/(" ")*$//g;  # drop trailing empty arguments
$cmd = '"' . $cmd . '"';
$cmd =~ s/"([\w_#,\.\/]+)"/$1/g;
print "cmd=$cmd\n" if $__trace;
my $cached_output = get_cached_output_path($extra_key, $cmd);
if (-f $cached_output)
{
    # Cache hit: return the stored result without running the command.
    print "using existing $cached_output\n" if $__trace;
}
else
{
    # Cache miss: run the command, caching stdout and stderr separately.
    my $cmd_with_redirects = "$cmd > $cached_output 2> $cached_output.err";
    `$cmd_with_redirects`;
if ($__trace)
{
print "Executed $cmd_with_redirects\n";
}
if ( `cat $cached_output.err` eq '' )
{
if ($__trace)
{
print "No error output, so deleting $cached_output.err\n";
}
unlink "$cached_output.err";
}
}
print `cat $cached_output`;
if (-f "$cached_output.err" )
{
print STDERR `cat $cached_output.err`;
# assume trouble if there was output to stderr, and remove the cached output:
unlink "$cached_output.err";
unlink $cached_output;
}
* It really is strange -- svn in particular has about a 2 second overhead for me no matter how simple my call. I'm guessing this is some sort of pathological misconfiguration of the local subversion server, but I don't control it and it is tangential enough to the central purpose of multivcs_query that I can't justify launching a campaign to improve it. But thanks to memoization, I don't have to care too much.