Installing pCache and LSM at AGLT2
We are interested in setting up both a Local Site Mover (LSM) and pCache on our worker nodes. The goals are:
- Reduce the I/O required for accessing commonly used files (pCache)
- Reduce the local worker-node temp space used by keeping only one real copy of a file (pCache)
- Allow easier monitoring and management of worker-node file transfers (LSM)
There is some information on LSM in its SVN repo at http://repo.mwt2.org/viewvc/lsm/. In addition, NET2 documented some of their local customization at http://rc.fas.harvard.edu/atlas/lsmthrottling.
The pCache SVN repository is available at http://repo.mwt2.org/viewvc/pcache/ and there is an ATLAS web page about it at https://twiki.cern.ch/twiki/bin/view/Atlas/Pcache. The UK cloud has also looked at pCache (pCache study), and Graeme Stewart has a presentation on it (pCache.ppt).
pCache Installation
For AGLT2 we initially tested pCache by leveraging the OSGWN install location in our AFS area. We installed two files, dccp.sh and pcache.py, into the dccp .../bin area for OSGWN. The contents of dccp.sh:
#!/bin/bash
#
# Shell for 'dccp' command. Invokes "real" binary dccp.bin from pcache.
#
###########################
pargs="-s /tmp -m 45%"
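# $pargs is left unquoted so it splits into separate options;
# "$@" passes each caller argument through intact (unlike $*).
pcache.py $pargs dccp.bin "$@"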
This allows us to move the OSGWN dccp binary to dccp.bin and then copy dccp.sh to dccp. Then, any use of dccp will use pCache.
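The swap described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used at AGLT2: the function name install_wrapper and its arguments bin_dir and wrapper_src are hypothetical, and the guard against running twice is an addition that the original steps do not mention.

```python
import os
import shutil


def install_wrapper(bin_dir, wrapper_src, tool="dccp"):
    """Swap `tool` for a wrapper script, keeping the real binary as `tool`.bin.

    bin_dir and wrapper_src are hypothetical paths supplied by the caller.
    Returns True if the swap was performed, False if it was already done.
    """
    real = os.path.join(bin_dir, tool)
    saved = os.path.join(bin_dir, tool + ".bin")

    if os.path.exists(saved):
        # Already installed: renaming again would overwrite the real
        # binary with the wrapper, so leave everything untouched.
        return False

    os.rename(real, saved)              # dccp    -> dccp.bin
    shutil.copyfile(wrapper_src, real)  # dccp.sh -> dccp
    os.chmod(real, 0o755)               # the wrapper must be executable
    return True
```

The existence check matters because the two steps are not idempotent: a second run without it would move the wrapper itself onto dccp.bin and the wrapper would then call itself.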
In May 2011 we are revisiting pCache with Charles' most recent version, 3.9. Our original testing showed a small fraction of production job failures, while analysis jobs were fine.
In revisiting this we want to test pCache before enabling it cluster-wide. The worker nodes all source the OSGWN setup file, which places the AFS-located bin areas first in the path. That makes it difficult to test "locally", since regular jobs coming to a node will use the common location. Allowing one production node to try pCache requires that we get the special version of dccp in front of the common version.
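The path-ordering problem can be checked with a small Python sketch. It is only an illustration: the helper names prepend_to_path and resolve are hypothetical, and it assumes the test node has a node-local directory holding the wrapped dccp. It relies on shutil.which, which searches a given path string in order, the same way the shell does.

```python
import os
import shutil


def prepend_to_path(path_value, directory):
    """Return path_value with directory moved (or added) to the front."""
    parts = [p for p in path_value.split(os.pathsep)
             if p and p != directory]
    return os.pathsep.join([directory] + parts)


def resolve(command, path_value):
    """Return the full path that would be picked for command, or None."""
    return shutil.which(command, path=path_value)
```

On a test node one could compare resolve("dccp", os.environ["PATH"]) before and after applying prepend_to_path with the local directory. Note that the prepend has to happen after the OSGWN setup file is sourced, since that file puts the AFS bin areas first.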
The AGLT2 queue configuration is available at:
http://panda.cern.ch:25980/server/pandamon/query?tp=queue&id=AGLT2-condor
There is information about modifying the SchedConfig at
https://twiki.cern.ch/twiki/bin/view/Atlas/SchedConfigNewController
There is already an SVN checkout in ~smckee/SchedConfig/pandaconf. This can be used to update things via SVN over ssh.
The relevant parts of the SchedConfig for pCache and LSM are:
copyprefix = ^srm://head01.aglt2.org
copyprefixin = None
copysetup = None
copysetupin = None
copytool = lcg-cp2
copytoolin = dccp
pcache = None
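For scripted checks, the listing above can be read into a dictionary. The following is a minimal sketch that assumes the settings are available as plain "key = value" text exactly as shown; the function name parse_schedconfig is hypothetical and this is not how the pilot or the SchedConfig tools read these values.

```python
SCHEDCONFIG_TEXT = """\
copyprefix = ^srm://head01.aglt2.org
copyprefixin = None
copysetup = None
copysetupin = None
copytool = lcg-cp2
copytoolin = dccp
pcache = None
"""


def parse_schedconfig(text):
    """Parse 'key = value' lines; the literal string 'None' becomes None."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")  # split on the first '=' only
        value = value.strip()
        fields[key.strip()] = None if value == "None" else value
    return fields


fields = parse_schedconfig(SCHEDCONFIG_TEXT)
unset = sorted(k for k, v in fields.items() if v is None)
```

With the values above, fields["copytoolin"] is "dccp", the same command the wrapper script replaces, and unset lists the fields that are currently None, including pcache.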
FYI, direct access is documented at
https://twiki.cern.ch/twiki/bin/view/Atlas/PandaPilot#Direct_access_vs_stage_in_mode
-- ShawnMcKee - 03 May 2011