Pre-caching lapply for radmind

A problem some people run into with radmind is that lapply is not atomic: it can fail with the file system only partially changed. The consequences range from harmless to severe. If lapply fails during a major OS update (like 10.3.x to 10.4.x), you can be left with an unbootable volume.

There are many reasons lapply might fail during its run. A common one is a problem with your transcripts: a bad sort order, a wrong checksum, or a missing dependency. A missing dependency arises when a transcript references files in a directory that it does not list itself; instead, that directory is listed in another transcript. If you later remove the transcript containing the enclosing directory from the command file, lapply will fail because it removes the directory (since nothing references it any longer) but then tries to add the files. The fix is to manually add the enclosing directory to the transcript that contains the child files.
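One way to hunt for missing enclosing directories is to scan a transcript for paths whose ancestor directories are not listed in that same transcript. The function below is only a rough sketch: missing_parents is a hypothetical helper, not part of radmind, and it assumes each transcript line has the form "type path ...". Directories supplied by a transcript you never remove (a base OS load, for instance) will also show up, so treat the output as a list of candidates to review.

```shell
# Hypothetical helper (not part of radmind): print every ancestor directory
# that is referenced by a path in the transcript but not listed in it as a
# "d" entry. Assumes lines of the form "<type> <path> ...".
missing_parents() {
    awk '
    # Remember every path that is listed as a directory.
    $1 == "d" { dirs[$2] = 1 }
    # Keep all paths so they can be checked once the whole file is read.
    NF >= 2 { paths[++n] = $2 }
    END {
        for (i = 1; i <= n; i++) {
            parent = paths[i]
            # Strip one trailing path component per iteration.
            while (sub("/[^/]*$", "", parent) && parent != "" && parent != ".") {
                if (!(parent in dirs) && !(parent in seen)) {
                    seen[parent] = 1
                    print parent
                }
            }
        }
    }' "$1"
}
```

Running missing_parents /path/to/some-transcript.T prints one directory per line; an empty result means every enclosing directory is listed in the transcript itself.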

Another reason lapply might fail is that the files it needs are not available. Perhaps the radmind server went down or the network cable got pulled. lapply downloads files on an “as needed” basis: it begins its file system modifications and retrieves each needed file from the server as it reaches that point in the applicable transcript. If the radmind server becomes unavailable after modifications have started, you are left with a partially modified filesystem.

This last failure could be avoided if lapply could be instructed to retrieve all needed files before starting the filesystem modifications. If we can’t get all the needed files before we start, we don’t perform any modifications, so we won’t have a partially modified filesystem.

lapply by itself has no ability to prefetch needed files. But with some clever scripting, we can do it. Credit goes to Wout Mertens of Cisco for the original version of this script – I took his ideas and built upon them. The basic idea is this:

  • Run fsdiff as normal.
  • Pass the applicable transcript to this script.
  • Parse the applicable transcript, finding all the files that need to be downloaded.
  • Create a new transcript that contains these files-to-be-downloaded and any needed enclosing directories. Make it a “relative” transcript.
  • Pass this newly-created transcript to lapply, downloading the files to a temporary cache space.
  • If we succeed in retrieving all the needed files, then we get really clever:
  • We create a local config file, and symlink the needed transcript file directories to our temporary cache space.
  • Then we start up a local radmind server and run lapply against that server. Since the files are on the local box, we will retrieve them from the temporary cache space instead of over the network.
  • Finally, clean up all our local caching.
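To get a feel for the parsing step, here is a small sketch that counts how many files each transcript contributes to the download. It assumes, as the full script does, that the applicable transcript marks the originating transcript with a line such as "base.T:" and prefixes each file to be downloaded with "+". The function name downloads_per_transcript is made up for this example.

```shell
# Hypothetical helper: count the "+" (download) lines in an applicable
# transcript, grouped by the transcript header line that precedes them.
downloads_per_transcript() {
    awk '
    # A header such as "base.T:" names the transcript for the lines below it.
    /^[^ ]+:$/ { current = substr($0, 1, length($0) - 1); next }
    # Lines starting with "+" are files that must be fetched from the server.
    $1 == "+" { count[current]++ }
    END { for (t in count) print t, count[t] }
    ' "$1" | sort
}
```

Called as downloads_per_transcript /path/to/applicable-transcript.T, it prints one "name count" pair per transcript, which is a quick way to see where a large download is coming from.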

The downsides of this approach are:

  • You need lots more free space on the local drive – you need to be able to cache all the needed files locally before moving them into place. Fortunately, you can calculate the required space before you start downloading and abort if you don’t have enough free space.
  • This approach is up to twice as slow as “non-pre-cached” lapply, since you are, in effect, retrieving each file twice: once from the “real” server to your local cache space, and again from the local server to its final place in the filesystem.
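The space calculation can be tried on its own before committing to a run. The sketch below assumes the file size is the eighth whitespace-separated field of each "+" line, doubles the total because each file briefly exists both in the cache and at its destination, and adds a leeway in MB. The name required_mb and the adjustable leeway argument are my own additions for illustration.

```shell
# Hypothetical helper: estimate the MB of free space needed to pre-cache an
# applicable transcript. Usage: required_mb <transcript> [leeway-in-MB]
required_mb() {
    awk -v leeway="${2:-300}" '
        # Sum the size field (8th column) of every line to be downloaded.
        $1 == "+" { bytes += $8 }
        # Double it (cache copy plus final copy), round to MB, add leeway.
        END { print int((bytes * 2) / 1024 / 1024 + 0.5) + leeway }
    ' "$1"
}
```

For example, a transcript that adds one 10 MB file and one 5 MB file needs 2 × 15 + 300 = 330 MB with the default leeway. Compare that figure with the free space reported by df for the volume holding the cache.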

Enough with the talk! Where is the script? It follows below.

It’s meant to be a “drop-in” replacement for lapply in your scripts, for example:
/path/to/pclapply -h radmindserver.mydomain -% /path/to/applicable-transcript.T
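In a typical update script, that call comes right after fsdiff has written the applicable transcript. The wrapper below is a sketch only: the server name, the transcript location, the sha1 checksum choice, and the helper names run and update_client are all placeholders of mine. Check the fsdiff flags shown (-A, -c, -o) against the man page for your radmind version.

```shell
# Sketch of an update wrapper; every name and path here is a placeholder.
RADMIND_SERVER="radmindserver.mydomain"
APPLICABLE="/var/tmp/applicable.T"      # hypothetical location
PCLAPPLY="/path/to/pclapply"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

update_client() {
    # Build the applicable transcript, then hand it to pclapply.
    run fsdiff -A -c sha1 -o "$APPLICABLE" / || return 1
    run "$PCLAPPLY" -c sha1 -h "$RADMIND_SERVER" -% "$APPLICABLE" || return 1
}
```

Setting DRY_RUN=1 before calling update_client prints the two commands instead of running them, which is handy for checking paths before touching a real machine.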


#!/bin/bash
# Pre-cached lapply script
# greg.neagle@disney.com
# inspired by and much code copied from wmertens@cisco.com
#
# Usage: pclapply <lapply options> <applicable-transcript>
#
# Warning: Not tested much!
#
# TODO
# - A retry mode, where breaking the network connection is ok.
# - more error checking
# - more testing
# - Rewrite in perl?
 
VERSION=0.1
CACHEDIR=/var/radmind/cache
CACHEFILE="$CACHEDIR/file"
CACHETRANS="$CACHEDIR/transcript"
CACHET="cache.$$.T"
TLIST="transcript_list"
CURRENTDIR=`pwd`
 
cleanup() {
      cd "$CURRENTDIR"
      killall radmind 2>/dev/null
      rm -rf "$CACHEDIR"
}
 
die() {
      [ -n "$*" ] && echo "ERROR: $*" >&2
      cleanup
      exit 1
}
 
usage() {
   echo "Usage:      $0 [ -FiV% ] [ -c checksum ] [ -h host ] [ -w auth-level ] [ -x ca-pem-file ] [ -y cert-pem-file ] [ -z key-pem-file ] applicable-transcript " >&2
   exit 1
}
 
while getopts %c:Fh:iVw:x:y:z: opt; do
   case $opt in
 
   %) PROGRESS="-%"
      ;;
 
   c) CHECKSUM="-c $OPTARG"
      ;;
 
   F) FLAGS="-F"
      ;;
 
   h) SERVER="-h $OPTARG"
      ;;
 
   i) OUTPUTBUFFERING="-i"
      ;;
 
   V) echo ${VERSION}
      exit 0
      ;;
 
   w) TLSLEVEL="-w $OPTARG"
      ;;
 
   x) CA_PEM_FILE="-x $OPTARG"
      ;;
 
   y) CERT_PEM_FILE="-y $OPTARG"
      ;;
 
   z) KEY_PEM_FILE="-z $OPTARG"
      ;;
 
   *) usage
      ;;
   esac
done
shift $((OPTIND - 1))
 
if [ $# -ne 1 ]; then
      usage
fi
 
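TRAN="$1"
# Make the path absolute: we cd into the cache directory later on,
# which would otherwise break a relative transcript path
case "$TRAN" in
      /*) ;;
      *) TRAN="$CURRENTDIR/$TRAN" ;;
esac
shift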
 
# Trap meaningful signals
# (use die, not just cleanup, so the script stops instead of carrying on)
trap 'die "Interrupted by signal"' HUP INT PIPE QUIT TERM TRAP XCPU XFSZ
 
[ -r "$TRAN" ] || die "Cannot read transcript $TRAN!"
 
#make our cache directories
cleanup
mkdir -p "$CACHEDIR" "$CACHEFILE" || die "Cannot create cache directories"
 
echo Checking disk space
# Check free size (add up size of + lines, multiply by 2) (+ 300 MB leeway)
SIZE=`awk '/^\+/{s+=$8}END{print ( int( (s*2)/1024/1024 + 0.5 ) + 300 ) }' "$TRAN"`
FREE=`df -m / | awk 'NR==2{print $4}'`
[ $SIZE -gt $FREE ] \
      && die "Not enough free space: ${SIZE}MB needed, ${FREE}MB available"
 
echo Processing transcript
# Get the downloadable part, create directories where needed, relativize
perl -nle '
if (/^[^ ]+:/) {
      $transcript = $_;
}
if (/^\+/) {
      @parts = split;
      if ( $parts[2] =~ /^\// ) {
            s/\//.\//;
      }
      $parts[2] =~ s/^\.//;
      $parts[2] =~ s/^\///;
      @now = split "/", $parts[2];
      $i=0;
      while ($i < $#last && $i < $#now) {
            last if $last[$i] ne $now[$i];
            $i++;
      }
      while ($i < $#now) {
            print "d ./".join("/",@now[0 .. $i])." 700 0 0";
            $i++;
      }
      print $transcript if $transcript ne $lasttrans;
      $lasttrans = $transcript;
      $tline = $_;
      $tline =~ s/^\+ a/\+ f/;
      print $tline;
      @last=@now;
}' "$TRAN" > "$CACHEDIR/$CACHET"
 
# Download files. This is the only bit that needs the network
echo Downloading files
mkdir -m 700 "$CACHEFILE/$CACHET"
cd "$CACHEFILE/$CACHET"
OPTIONS="${PROGRESS} ${CHECKSUM} ${FLAGS} ${SERVER} ${OUTPUTBUFFERING} ${TLSLEVEL} ${CA_PEM_FILE} ${CERT_PEM_FILE} ${KEY_PEM_FILE}"
( /usr/local/bin/lapply ${OPTIONS} "$CACHEDIR/$CACHET" ) || die "file download failed"
 
# Get a list of needed transcripts
perl -nle '
if (/^[^ ]+:/) {
      $transcript = $_;
      chop $transcript;
      print $transcript if $transcript ne $lasttrans;
      $lasttrans = $transcript;
}' "$TRAN" > "$CACHEDIR/$TLIST"
 
# symlink transcript dirs referenced in original applicable transcript
# to our cache dir
while read -r T
do
      ln -s "$CACHEFILE/$CACHET" "$CACHEFILE/$T"
done < "$CACHEDIR/$TLIST"
 
# Make a config file for the local radmind server
echo "localhost command.K" > $CACHEDIR/config
 
# symlink command and transcript dirs for use by the local server
# this gets us a command file and transcripts for "free"
ln -sf /var/radmind/client "$CACHEDIR/command"
ln -sf /var/radmind/client "$CACHEDIR/transcript"
 
# start a local radmind server
echo Starting local radmind server
( /usr/local/sbin/radmind -D ${CACHEDIR} ) || die "local radmind server did not start"
 
echo Modifying local filesystem
OPTIONS="${FLAGS} ${OUTPUTBUFFERING}"
( /usr/local/bin/lapply ${OPTIONS} -h localhost "$TRAN" ) || die "error modifying filesystem"
 
cleanup
exit 0
