Installing pCache and LSM at AGLT2

We are interested in setting up both a Local Site Mover (LSM) and pCache on our worker nodes. The goals are:

  • Reduce the I/O required for accessing commonly used files (pCache)
  • Reduce local worker-node temp space usage by keeping only one real copy of each file (pCache)
  • Allow easier monitoring and management of worker node file transfers (LSM)

There is some information on LSM in its SVN repo at http://repo.mwt2.org/viewvc/lsm/ and NET2 has documented some of their local customization at http://rc.fas.harvard.edu/atlas/lsmthrottling

The pCache SVN repository is available at http://repo.mwt2.org/viewvc/pcache/ and there is an ATLAS web page about it at https://twiki.cern.ch/twiki/bin/view/Atlas/Pcache. The UK cloud has also looked at pCache (see their pCache study), and Graeme Stewart has a presentation, pCache.ppt.

pCache Installation

For AGLT2 we initially tested pCache by leveraging the OSGWN install location in our AFS area. We installed two files, dccp.sh and pcache.py, into the dccp .../bin area for OSGWN. The contents of dccp.sh are:
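#!/bin/bash
#
#  Shell for 'dccp' command.  Invokes "real" binary dccp.bin from pcache.
#
###########################

# Options passed to pcache.py; left unquoted below so they split into separate words.
pargs="-s /tmp -m 45%"
# "$@" (rather than $*) preserves arguments that contain spaces.
pcache.py $pargs dccp.bin "$@"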

This allows us to move the OSGWN dccp binary to dccp.bin and then copy dccp.sh to dccp. Then, any use of dccp will use pCache.
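
The rename-and-copy step above can be sketched as a small shell function. This is only an illustration, not the procedure actually used at AGLT2: the function name install_pcache_wrapper is made up for this example, and both arguments are placeholders (the first stands for the OSGWN dccp .../bin area, the second for wherever dccp.sh is kept).

```bash
# Sketch only. The arguments are placeholders: $1 stands for the OSGWN
# dccp .../bin area and $2 for the location of dccp.sh.
install_pcache_wrapper() {
    # Refuse to run twice; a second run would overwrite the real binary
    # saved as dccp.bin with the wrapper.
    if [ -e "$1/dccp.bin" ]; then
        echo "$1/dccp.bin already exists; nothing done" >&2
        return 1
    fi
    # Keep the real binary under a new name.
    mv "$1/dccp" "$1/dccp.bin" || return 1
    # The wrapper now answers to 'dccp'.
    cp "$2" "$1/dccp" || return 1
    chmod 755 "$1/dccp"
}
```

The guard on dccp.bin is a design choice for this sketch: without it, repeating the step would move the wrapper over the saved binary and leave no real dccp behind.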

As of May 2011 we are revisiting pCache with Charles' most recent version, 3.9. Our original testing showed a small fraction of production job failures, while analysis jobs were fine.

In revisiting this we want to test pCache before enabling it cluster-wide. The worker nodes all source the OSGWN setup file, which places the AFS-located bin areas first in the path. That makes it difficult to test "locally", since regular jobs arriving at a node will use the common location. Letting one production node try pCache requires getting the special version of dccp in front of the common version.
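
One possible way to get a node-local dccp ahead of the common one is to prepend a local directory to PATH after the OSGWN setup file has been sourced. The following is a minimal sketch under that assumption. The function name prepend_to_path is hypothetical, and where it would be called on the test node (for example a node-specific profile script) is not something these notes specify.

```bash
# Sketch only: the directory passed in is a hypothetical node-local bin
# area holding the pCache-enabled dccp; it is not an actual AGLT2 path.
# Call this after the OSGWN setup file has been sourced, so that the
# local directory is searched before the AFS-located bin areas.
prepend_to_path() {
    case "$PATH" in
        # Already first in PATH: leave it alone.
        "$1"|"$1":*) ;;
        # Otherwise put it in front of everything else.
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}
```

After calling it with the local directory, `command -v dccp` shows which copy a job on that node would pick up.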

The AGLT2 queue configuration is available at http://panda.cern.ch:25980/server/pandamon/query?tp=queue&id=AGLT2-condor and information about modifying the SchedConfig is at https://twiki.cern.ch/twiki/bin/view/Atlas/SchedConfigNewController

There is already an SVN checkout in ~smckee/SchedConfig/pandaconf. This can be used to update things via SVN over ssh.

The relevant parts of the SchedConfig for pCache and LSM are:
copyprefix = ^srm://head01.aglt2.org
copyprefixin = None
copysetup = None
copysetupin = None
copytool = lcg-cp2
copytoolin = dccp
pcache = None

FYI, direct access is documented at https://twiki.cern.ch/twiki/bin/view/Atlas/PandaPilot#Direct_access_vs_stage_in_mode

-- ShawnMcKee - 03 May 2011
Topic revision: r2 - 03 May 2011 - 23:12:31 - ShawnMcKee
 
