Storage Element

An SRM/dCache instance is added to the site as a grid-accessible Storage Element. dCache is a very flexible package for combining multiple filesystems into one namespace, which allows us to use many (commodity) disks and RAID arrays as one system. SRM provides a standard grid interface to the storage. The VDT project produces an automated install package for these programs.

References:

Install Process

The install process has these general steps:

  • Learning about dCache --- see above references
  • Design and Planning
    • Deciding what hardware to use, what network connections will exist, mount points, etc.
  • Node Preparation
    • Host certificates, disk areas, etc
  • Install Configuration
    • Run config-node.pl script to make site-info.def. Edit site-info.def as needed.
  • Actual Install
    • Install nodes in proper order
  • Services Start
  • Final Configuration
    • Directory tags in pnfs
    • Poolmanager config
  • Testing

Design

Services / Servers

We have two 2U servers (dual dual-core Xeon E5320, 8GB RAM each) to use in this install. msu2 has an internal 4 x 750GB drive array. msu4 has an MD1000 shelf with 15 x 750GB drives. They are installed with SLC45 64-bit via ROCKS. The RAID array is formatted with XFS and mounted at /exports/pool.

MSU2 will be used as the admin node. Services will include PostgreSQL and PNFS; SRM will also run here.

MSU4 will serve as a door / pool node with pool, dcap, and gridftp services. These will not require postgres databases. The pool will be the roughly 4TB RAID array.

Nodes have two 1G Ethernet connections. The networks are .local (private) and .aglt2.org (public).

Local worker nodes will not mount pnfs and will use dcap or gridftp to transfer files, not local protocols.

List of External Services

These are available on .local and .aglt2.org networks (services bound to all IPs).

Service | Accessible from | URL | Comments
SRM | World | srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org | GSI-authenticated SRM
DCAP | Should be local only, but not sure | dcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | Unauthenticated DCAP
GSIDCAP | World | gsidcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | GSI-authenticated DCAP. Should this be supported?
GridFTP | World | gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org | GSI-authenticated GridFTP

List of targets for DZero:

Service | Accessible from | URL | Comments
SRM | World | srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/dzero/cache | "SAM cache" for MC production

Planned upgrades

A 10G Ethernet interface will be added to msu4. Probably this will have a .aglt2.org address, but we might try putting multiple VLANs on the interface and having it on .local as well. The PERC5 card on msu4 should be upgraded to a PERC6 (the shelf spare will be used temporarily). The card should be moved to an x8 PCIe slot.

Add pools on Dell compute nodes. These will be managed as a "read pool" for compute nodes to get their job and minbias input files for DZero VO Monte Carlo generation. We are considering changing the disk partitioning so that the dCache pool area takes up a whole disk. It is undecided whether these pools will be exposed (for reading) to remote sites.

Node Preparation

The nodes are installed with SLC45 64 bit.

Certs

Both machines need host certificates placed in /etc/grid-security

(This is a local script used for putting the certs onto the ROCKS nodes.)

perl /home/install/extras/install_hostcert.pl msu2
perl /home/install/extras/install_hostcert.pl msu4

Install Configuration

Get the VDT install tarball. Here I'm using vdt-dcache-SL4_64-2.1.6.tar.gz from http://vdt.cs.wisc.edu/software/dcache/server/

Unpack the tarball in a temporary working area. Change into the install directory and run config-node.pl:

root@msu4 /tmp/vdt-dcache-SL4_64-2.1.6/install# ./config-node.pl 
Found java at /usr/bin/java. Version is 1.6.0_03.
Installed java version matches bundled java version.
How many admin nodes (non-pool and non-door nodes) do you have? 1
The recommended services for node 1 are:
  lmDomain poolManager pnfsManager dirDomain adminDoor httpDomain utilityDomain gplazmaService infoProvider srm replicaManager 

Enter the FQDN for the node 1: msu2.aglt2.org
Which services do you wish to run on node msu2.aglt2.org (Enter for defaults)? 
Do you wish to use the SRM Watch? [y or n]: y

How many door nodes do you have? 1
Enter the FQDN of door number 0: msu4.aglt2.org
Enter the private network that the pools are in.
If this does not apply, just press enter to skip: 
Enter the number of dcap doors to run on each door node [default 1]: 
Enter a pool FQDN name(Press Enter when all are done): msu4.aglt2.org
Enter the first storage location (Press Enter when all are done)): /exports/pool0
Enter another storage location (Press Enter when all are done)): 
Enter another pool FQDN name(Press Enter when all are done): 
Created site-info.def file.

Changes to site-info.def

Want to make the following changes:

  • change MY_DOMAIN away from default of aglt2.org. That is already in use on the cluster and I want to avoid baking in a name conflict.
  • force install of Java, needed since not all nodes will have a proper Java (the config-node.pl above was run on a host with a previous VDT install of Java...)
  • put logs in /var/log/dcache; dCache makes multiple log files and we want to be able to find them all, otherwise they go into /var/log.
  • change the RESET_DCACHE vars to "yes". This does a reset to clean out old state (what?)

Changed these vars:

MY_DOMAIN="msu-t3.aglt2.org"

JAVA_LOCATION="/opt/d-cache/jdk1.6.0_03/bin/java"
INSTALL_JDK=1
JDK_RELOCATION=/opt/d-cache
JDK_FILENAME=jdk-6u3-linux-amd64.rpm

DCACHE_LOG_DIR=/var/log/dcache

RESET_DCACHE_CONFIGURATION=yes
RESET_DCACHE_PNFS=yes
RESET_DCACHE_RDBMS=yes

Dryrun

Run the installer with the --dryrun option and it will list the actions it will take. The -s option specifies the site-info file to use:

# ./install.sh --dryrun -s site-info.def.msu-t3

Actual Installs

To perform an install, copy the VDT tarball and site-info file to the node, unpack the tarball, and run the install.sh script with the site-info config file (a sketch follows below). Refer to the VDT documentation for the order in which to install nodes. For this simple setup, the admin node just needs to be installed first.
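A minimal sketch of those steps, assuming the tarball and the edited site-info file have been copied to /tmp on the target node (file names are the ones used above):

# run as root on each node, admin node (msu2) first
cd /tmp
tar xzf vdt-dcache-SL4_64-2.1.6.tar.gz
cp site-info.def.msu-t3 vdt-dcache-SL4_64-2.1.6/install/
cd vdt-dcache-SL4_64-2.1.6/install
./install.sh -s site-info.def.msu-t3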

Starting Services

After the install has been done on all nodes, start the services.

Admin node

Seems that postgresql and pnfs don't need to be started separately???

root@msu2# /opt/d-cache/bin/dcache-core start
Starting dcache services: 
Starting lmDomain  Done (pid=17437)
Starting dCacheDomain  Done (pid=17487)
Starting pnfsDomain  Done (pid=17542)
Starting dirDomain  Done (pid=17597)
Starting adminDoorDomain  Done (pid=17645)
Starting httpdDomain  Done (pid=17709)
Starting utilityDomain  Done (pid=17763)
Starting gPlazma-msu2Domain  Done (pid=17880)
Starting infoProviderDomain  Done (pid=17989)
Starting replicaDomain  Done (pid=18061)
Using CATALINA_BASE:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_HOME:   /opt/d-cache/libexec/apache-tomcat-5.5.20
Using CATALINA_TMPDIR: /opt/d-cache/libexec/apache-tomcat-5.5.20/temp
Using JRE_HOME:       /opt/d-cache/jdk1.6.0_03

Pinging srm server to wake it up, will take few seconds ...
Done

Change admin password

Log in and change the default password.

[rockwell@cap ~]$ ssh -l admin msu2.local -p 22223 -c blowfish
admin@msu2.local's password: 

    dCache Admin (VII) (user=admin)


[msu2.aglt2.org] (local) admin > cd acm
[msu2.aglt2.org] (acm) admin > create user admin
[msu2.aglt2.org] (acm) admin > set passwd newpass newpass
[msu2.aglt2.org] (acm) admin > ..
[msu2.aglt2.org] (local) admin > logoff
dmg.util.CommandExitException: (0) Done
[msu2.aglt2.org] (local) admin > Connection to msu2.local closed.

Issues

  • Needed to open the firewall for remote connections to pnfs (this config is using the .aglt2.org network only). This was noticed after the pool node couldn't mount /pnfs/msu-t3.aglt2.org.
  • replicaDomain.log grew rapidly with an error message that "Group Resilient Pools is empty". This is probably due to having replication enabled but not configured. Going to disable the replica service for now (we have only one pool at the moment anyway). Turn it off in /opt/d-cache/config/dCacheSetup on the admin node and restart services (see the snippet below).
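For reference, the change on msu2 was roughly the following; the variable name and the stop action of the script are assumptions that should be checked against the comments in dCacheSetup for this dCache version:

# /opt/d-cache/config/dCacheSetup on msu2 (variable name assumed)
replicaManager=no

# then restart the services
/opt/d-cache/bin/dcache-core stop
/opt/d-cache/bin/dcache-core start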

Pool / Door

Start dcache-core and then dcache-pool.

root@msu4# /opt/d-cache/bin/dcache-core start
/pnfs/msu-t3.aglt2.org/ not mounted - going to mount it now ... 
Starting dcache services: 
Starting dcap-msu4Domain  Done (pid=559844)
Starting gridftp-msu4Domain  Done (pid=559911)
Starting gsidcap-msu4Domain  Done (pid=559980)

root@msu4# /opt/d-cache/bin/dcache-pool start
start dcache pool: Starting msu4Domain  Done (pid=560141)

root@msu4# mount ...
msu2.aglt2.org:/pnfsdoors on /pnfs/msu-t3.aglt2.org type nfs (rw,intr,noac,hard,nfsvers=2,addr=192.41.231.12)

Tests

See https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/ValidatingDcache

Install client rpms (not included in VDT install tarball).

rpm -i /home/install/contrib/4.3/x86_64/RPMS/dcache-*

Try with dccp on msu4:

root@msu4# /opt/d-cache/dcap/bin/dccp /tmp/vdt-dcache-SL4_64-2.1.6.tar.gz /pnfs/msu-t3.aglt2.org/data/

If you get an error about "Can't open destination file", make sure that the pool is actually set up right (the paths are correct and a proper link exists in the PoolManager). Actual permissions on the pool probably aren't the issue, as this test is being done as root. Here is an example of this error condition:

root@msu2 /pnfs/msu-t3.aglt2.org# /opt/d-cache/dcap/bin/dccp /tmp/test.txt testy/
Command failed!
Server error message for [1]: "No write pool available for <teststore:testgroup@osm>" (errno 20).
Failed open file in the dCache.
Can't open destination file : "No write pool available for <teststore:testgroup@osm>"
System error: Input/output error
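
One way to check the pool setup is from the admin console; these PoolManager commands are a sketch based on the standard dCache admin shell and may differ by version:

(local) admin > cd PoolManager
(PoolManager) admin > psu ls pool
(PoolManager) admin > psu ls link -x

The first listing should show the msu4 pool, and the second should show a link to a pool group containing it that permits writes.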

Try dccp from a user account (permissions are wrong in both pnfs and the pool for this):

rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt /pnfs/msu-t3.aglt2.org/testy/
Failed create entry in pNFS.
Can't open destination file : Can not create entry in pNfs
System error: Operation not permitted

Now with permissions in pnfs fixed (chmod 777), surprisingly, it worked.

rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt /pnfs/msu-t3.aglt2.org/testy/
8 bytes in 0 seconds

rockwell@msu2 ~$ ls -l /pnfs/msu-t3.aglt2.org/testy/
total 0
-rw-r--r--  1 root     root    32 Apr  9 14:50 modprobe.conf.rocks
-rw-r--r--  1 rockwell umatlas  8 Apr 10 21:45 test.txt

The data file on the pool is owned by root --- ok, this is how it should be. Above also works from msu4 (which has pnfs mounted).

Using dcap://. A dcap door is running on msu4 (need to disable it...).

root@msu2 ~# /opt/d-cache/dcap/bin/dccp dcap://msu4/pnfs/msu-t3.aglt2.org/testy/test.txt .
8 bytes in 0 seconds
root@msu2 ~# cat test.txt 
yo baby

To the gsidcap door with a grid proxy:

rockwell@msu2 ~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/OSG/globus/lib

rockwell@msu2 ~$ /OSG/globus/bin/grid-proxy-init 
Your identity: /DC=org/DC=doegrids/OU=People/CN=Thomas D. Rockwell 611410
Enter GRID pass phrase for this identity:

rockwell@msu2 ~$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/d-cache/dcap/lib
rockwell@msu2 ~$ /opt/d-cache/dcap/bin/dccp /tmp/test.txt gsidcap://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/testnew.txt
Error ( POLLIN POLLERR POLLHUP) (with data) on control line [3]
Failed to create a control line
Error ( POLLIN POLLERR POLLHUP) (with data) on control line [5]
Failed to create a control line
Failed open file in the dCache.
Can't open destination file : Server rejected "hello"
System error: Input/output error
rockwell@msu2 ~$ echo $?
255

That was actually a successful transfer; have to ignore the error messages, I guess.

gridftp

rockwell@msu2 ~$ time /OSG/globus/bin/globus-url-copy file:/tmp/gsiftp.txt gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/

real    0m1.230s
user    0m0.060s
sys     0m0.000s

srmls with grid proxy as above

rockwell@msu2 ~$ time /opt/d-cache/srm/bin/srmls srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/                  
  512 /pnfs/msu-t3.aglt2.org//
      512 /pnfs/msu-t3.aglt2.org//testy/
      512 /pnfs/msu-t3.aglt2.org//data/


real    0m4.029s
user    0m3.170s
sys     0m0.110s

Issues / Debugging

The error messages from failed transfers are often not so helpful, so a systematic test method is useful.

SRM version 2 probably gives the best error messages.

Consider this problem, caused by a misconfiguration of the paths in storage-authzdb:

rockwell@msu2 ~$ /OSG/globus/bin/globus-url-copy file:///tmp/afile.txt gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/dzero/cache/

error: globus_ftp_client: the server responded with an error
550 File not found


rockwell@msu2 /tmp$ time /opt/d-cache/srm/bin/srmcp -2 file:////tmp/afile.txt srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt
Mon Apr 21 11:50:30 EDT 2008: srmPrepareToPut update failed, status : SRM_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed :  at Mon Apr 21 11:50:29 EDT 2008 state Pending : created
RequestFileStatus#-2147481647 failed with error:[  at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root]

Mon Apr 21 11:50:30 EDT 2008: PutFileRequest[srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt] status=SRM_AUTHORIZATION_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root
Mon Apr 21 11:50:30 EDT 2008: java.io.IOException: srmPrepareToPut update failed, status : SRM_FAILURE explanation= at Mon Apr 21 11:50:30 EDT 2008 state Failed :  at Mon Apr 21 11:50:29 EDT 2008 state Pending : created
RequestFileStatus#-2147481647 failed with error:[  at Mon Apr 21 11:50:30 EDT 2008 state Failed : user`s path ///pnfs/msu-t3.aglt2.org/dzero/cache/srm-copy.txt is not subpath of the user`s root]

srm client error:  stopped 
java.lang.Exception:  stopped 
        at gov.fnal.srm.util.Copier.run(Copier.java:287)
        at java.lang.Thread.run(Thread.java:619)

real    0m4.001s
user    0m3.863s
sys     0m0.130s

The second error message points to just what the problem is...

Test progression: note that srmls mainly tests authentication, srmmkdir will show whether a write can be done to /pnfs, and an actual copy will exercise the pool (a sketch follows below).
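A sketch of that progression, reusing the client paths and proxy setup from above (the directory and file names are just examples, and srmmkdir is assumed to be present alongside srmls and srmcp in /opt/d-cache/srm/bin):

/opt/d-cache/srm/bin/srmls srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/
/opt/d-cache/srm/bin/srmmkdir srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/testy/srmtest
/opt/d-cache/srm/bin/srmcp -2 file:////tmp/test.txt srm://msu2.aglt2.org:8443/srm/managerv2?SFN=/pnfs/msu-t3.aglt2.org/testy/srmtest/test.txt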

Auth and Auth

Authentication and authorization (AA) use the gPlazma cell, which is built with a plug-in architecture and provides multiple types of AA to the other dCache cells.

dcache.kpwd

This is a simple mode. The file /opt/d-cache/etc/dcache.kpwd is set up on the node running the gPlazma cell (other nodes don't need this file installed). The file has lines mapping grid subjects to local usernames, and lines mapping usernames to access:uid:gid:homedir:rootdir:root tuples.
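A hedged illustration of the two kinds of lines, following the description above (the DN and numbers are made up, and the exact keywords and field order should be checked against the template dcache.kpwd that ships with dCache):

# map a grid subject (DN) to a local username
mapping "/DC=org/DC=doegrids/OU=People/CN=Some User 123456" someuser

# map the username to access:uid:gid:homedir:rootdir:root
login someuser read-write 10001 10001 / /pnfs/msu-t3.aglt2.org /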

[rockwell@cap ~]$ setup vdt

[rockwell@cap ~]$ globus-url-copy gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/test.txt file:///tmp/test.txt 
error: the server sent an error response: 530 530 Authorization Service failed: diskCacheV111.services.authorization.AuthorizationServiceException: authRequestID 693270851 caught exception 
Exception thrown by diskCacheV111.services.authorization.KPWDAuthorizationPlugin: dcache.kpwd Authorization Plugin: Authorization denied for user rockwell with Subject DN /DC=org/DC=doegrids/OU=People/CN=Thomas D. Rockwell 611410

Fixed dcache.kpwd file and now it works:

[rockwell@cap ~]$ globus-url-copy gsiftp://msu4.aglt2.org/pnfs/msu-t3.aglt2.org/testy/test.txt file:///tmp/test.txt 

[rockwell@cap ~]$ cat /tmp/test.txt 
yo baby

Scheme for DZero

To support DZero MC processing, which relies on a small and stable set of subjects and just one local user, we will set up gPlazma to use a grid-mapfile. We will also have a (higher-priority) check of dcache.kpwd so that local and test users can be supported. This will allow, for instance, my rockwell subject to be manually remapped to something other than what is in the gridmap for testing. We will migrate to a full check against a GUMS server in the future.

In /etc/grid-security on msu2, copy in the grid-mapfile from msu-osg. Edit the file /etc/grid-security/storage-authzdb to look like this:

version 2.1
authorize samgrid read-write 825664 55673 / /pnfs/msu-t3.aglt2.org/dzero/samgrid /

Set up the file /opt/d-cache/etc/dcache.kpwd (see the example sketched above).

Set up gPlazma to use these two mechanisms, with higher priority for dcache.kpwd; on msu2 edit /opt/d-cache/etc/dcachesrm-gplazma.policy:

# Switches
saml-vo-mapping="OFF"
kpwd="ON"
grid-mapfile="ON"
gplazmalite-vorole-mapping="OFF"

# Priorities (lower priority is tried first, if the method is enabled above)
saml-vo-mapping-priority="1"
kpwd-priority="3"
grid-mapfile-priority="4"
gplazmalite-vorole-mapping-priority="2"

Note: to get this change to the .policy file applied, a restart of gPlazma was needed. (I just restarted all of dcache-core, but there should be a way to do it in the admin console as well.) The mapping files are checked on each authentication, so changes there are picked up automatically.

SRM

The Storage Resource Manager is a new interface to dCache that provides additional functionality to grid users. SRM is also standardized so that other storage systems may provide the same interface, though dCache is currently the main production-ready implementation.

SRM features (see the dcache book...):

  • space management
    • including space reservations/tokens
  • data transfers
    • pre-transfer srmPrepareToPut and srmPrepareToGet
  • request status
  • directory functions
  • permissions functions

Install and config

SRM is installed with everything else by the VDT installer. Its config is in /opt/d-cache/config/dCacheSetup and it is enabled per node in /opt/d-cache/etc/node_config (a sketch follows below).
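As a rough illustration of the per-node switch (the variable name is an assumption; check the comments in node_config for the installed version):

# /opt/d-cache/etc/node_config on msu2 (variable name assumed)
SRM=yes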

Errors

Had lots of jobs starting at once after dCache had been running for a while without trouble. Saw this in the gridftp log on msu4:

05 May 2009 14:53:57 Socket OPEN (ACCEPT) remote = /192.41.231.46:52023 local = /192.41.231.14:2811
java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:01 Cell(GFTP-msu4@gridftp-msu4Domain) : Thread : listen got : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:05 Cell(GFTP-msu4@gridftp-msu4Domain) : Thread : ClinetThread-/192.41.231.46:52023 got : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:07 Cell(GFTP-msu4@gridftp-msu4Domain) : java.lang.OutOfMemoryError: GC overhead limit exceeded
05/05 14:54:07 Cell(GFTP-msu4@gridftp-msu4Domain) : java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

The java processes don't have much memory in use; maybe there is a memory limit parameter that can be increased? A sketch of where to look follows below.
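If it is a JVM heap limit, the place to try raising it is presumably the Java options used to launch the domains; this is a sketch assuming a java_options variable in /opt/d-cache/config/dCacheSetup, which should be verified in the file before editing:

# /opt/d-cache/config/dCacheSetup on msu4 (variable name and flags assumed;
# keep whatever other options are already present and just raise -Xmx)
java_options="-server -Xmx1024m"

After a change like this the doors on msu4 would need to be restarted.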

Restart stuff on msu4.

-- TomRockwell - 26 Mar 2008