Setting up Two OSG Gatekeepers for a Single Condor Cluster

We need to load balance access to our Condor cluster because of possible time-outs we are seeing in Nagios probes and the long time it takes to run simple test scripts against our single gatekeeper.

Two gatekeepers are already set up using OSG 1.8.1i on:
  • gate01.aglt2.org (original primary gatekeeper)
  • gate02.grid.umich.edu (secondary gatekeeper, used for testing)

We want to use these two gatekeepers to load balance by VO. The steps we took are described below.

First Step: Configure GUMS to Map VO/User by Gatekeeper

What we want to do is separate the VOs that use our cluster onto one gatekeeper or the other. Since ATLAS (USATLAS) is our primary VO, we assign it to gate01.aglt2.org, along with the following test-like VOs: mis, gridex, osg and gums-test. All other supported OSG VOs are assigned to gate02.grid.umich.edu.

To do this we configure our GUMS (V1.2.15) servers linat02.grid.umich.edu, linat03.grid.umich.edu and linat04.grid.umich.edu to have the correct host-to-group mapping for each gatekeeper. This results in the following:

  • gate01.aglt2.org has
 Group To Account Mappings: mis, gridex, osg, newUsatlasProd, newUsatlasSoft, newUsatlas, newAtlas, gums-test

  • gate02.grid.umich.edu has
 Group To Account Mappings: cdf, fermilab, gadu, grase, ivdgl, mis, fmri, sdss, star, uscmsuser, cmsuser, uscmst2admin, uscmssoft, uscmsprod, uscmsphedex, uscmsfrontier, cmsuser-null, LIGO, dzerouser, dzeroana, dosar, des, glow, grow, gridex, nanohub, geant4, i2u2, mariachi, osg, osgedu, minos, usminos, ukminos, minossoft, nwicg, ops, miniboone, des-production, gugrid, gpn, compbiogrid, engage, ilc, nysgrid, sbgrid, cigi, gums-test, Suragrid

Note: Still to do later is running gums-host-cron to generate the correct reverse VO map on each gatekeeper. See below.
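
A quick way to sanity-check the two mappings is to compare the group lists directly. The sketch below is not a GUMS tool; it is a small shell helper (the function name shared_groups is hypothetical) that prints the groups mapped on both gatekeepers, using the two lists shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of GUMS: print the groups that appear in
# both space-separated lists, one per line.
shared_groups() {
    # De-duplicate each list on its own, then keep names seen in both.
    {
        printf '%s\n' $1 | LC_ALL=C sort -u
        printf '%s\n' $2 | LC_ALL=C sort -u
    } | LC_ALL=C sort | uniq -d
}

# Group lists copied from the GUMS host-to-group mappings above.
gate01_groups="mis gridex osg newUsatlasProd newUsatlasSoft newUsatlas newAtlas gums-test"
gate02_groups="cdf fermilab gadu grase ivdgl mis fmri sdss star uscmsuser cmsuser
uscmst2admin uscmssoft uscmsprod uscmsphedex uscmsfrontier cmsuser-null LIGO
dzerouser dzeroana dosar des glow grow gridex nanohub geant4 i2u2 mariachi osg
osgedu minos usminos ukminos minossoft nwicg ops miniboone des-production gugrid
gpn compbiogrid engage ilc nysgrid sbgrid cigi gums-test Suragrid"

shared_groups "$gate01_groups" "$gate02_groups"
```

With the lists above this prints gridex, gums-test, mis and osg, so the test-like VOs are mapped on both gatekeepers while the USATLAS groups appear only on gate01.aglt2.org.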

Configuration on Gatekeepers

I am starting on gate02.grid.umich.edu, and the first step is to make sure the OSG install is up to date:
  • cd /opt/OSG080
  • source setup.sh
  • pacman -update

This gave:
[gate02:OSG080]# pacman -update
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:VDT-Version-Info] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:JDK-1.5] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:PPDG-Cert-Scripts] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:GUMS-Client] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:SRM-V1-Client] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:Configure-SRM] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:SRM-V2-Client] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:Syslog-ng] found...
Updating [VDT-Version-Info]...
Downloading [vdt-version-info-1.8.1-30.tar.gz] from [http://vdt.cs.wisc.edu/software//vdt-version-info/1.8.1]...
Updating [JDK-1.5]...
Downloading [jdk-1.5.0_14.x86_64_rhas_3.tar.gz] from [http://vdt.cs.wisc.edu/software//jdk/1.5.0_14]...
Updating [PPDG-Cert-Scripts]...
Downloading [cert-scripts-2.6.tar.gz] from [http://vdt.cs.wisc.edu/software//ppdg-cert-scripts/2.6]...
Updating [GUMS-Client]...
Downloading [gums-client-1.2.15.tar.gz] from [http://vdt.cs.wisc.edu/software//gums/1.2.15]...
WARNING: Uninstall shell command [vdt/sbin/vdt-uninstall Configure-SRM] has failed [vdt-uninstall failed: package 'Configure-SRM' does not seem to be installed (no filelist)].  Ignoring...
Can't uninstall [Configure-SRM].  Not updated...
Updating [SRM-V1-Client]...
Downloading [srm-v1-client-1.8.0-4.tar.gz] from [http://vdt.cs.wisc.edu/software//srm-v1-client/1.8.0-4]...
Updating [SRM-V2-Client]...
Downloading [srmclient2-2.2.0.8.tar.gz] from [http://vdt.cs.wisc.edu/software//srm-v2-client/2.2.0.8]...
Updating [Syslog-ng]...
Downloading [syslog-ng-2.0.7-x86_64_rhas_4.tar.gz] from [http://vdt.cs.wisc.edu/software//syslog-ng/2.0.7]...
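
Long transcripts like this make it easy to miss a package that was found but never updated. The following is a minimal sketch, assuming the pacman output was saved to a file; the function name updates_not_applied is hypothetical, and the patterns are based only on the message formats visible in the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: list packages for which the transcript contains
# "Update of [...:<name>] found..." but no matching "Updating [<name>]..." line.
# $1 is the path to a saved transcript of "pacman -update".
updates_not_applied() {
    log=$1
    # Extract the package name after the last colon inside the brackets.
    sed -n 's/^Update of \[.*:\([^]:]*\)\] found.*/\1/p' "$log" |
    LC_ALL=C sort -u |
    while read -r pkg; do
        # Report the package only if pacman never started updating it.
        grep -qF -- "Updating [$pkg]..." "$log" || printf '%s\n' "$pkg"
    done
}
```

One way to capture the transcript is `pacman -update 2>&1 | tee pacman-update.log` (the log file name is arbitrary), followed by `updates_not_applied pacman-update.log`. For the gate02 output above, this would report only Configure-SRM.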

There was a problem on gate01.aglt2.org when trying the same thing:

  • cd /opt/OSG080
  • source setup.sh
  • pacman -update
Reading database...
   0    10   20   30   40   50   60   70   80   90   100
   +----+----+----+----+----+----+----+----+----+----+  
   ###################################################    
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:VDT-Version-Info] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:PPDG-Cert-Scripts] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:GUMS-Client] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:SRM-V1-Client] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:Configure-SRM] found...
Update of [/opt/OSG080:http://vdt.cs.wisc.edu/vdt_181_cache:SRM-V2-Client] found...
Can't find [http://vdt.cs.wisc.edu/vdt_181_cache:Apache] in [/opt/OSG080].
Can't find [http://vdt.cs.wisc.edu/vdt_181_cache:Apache] in [/opt/OSG080].
Can't find [http://vdt.cs.wisc.edu/vdt_181_cache:Apache] in [/opt/OSG080].

The upgrade didn't work. I emailed Saul Youssef about this; more later.

Configuring Each System

Now we just need to configure the OSG setup on each gatekeeper by running:

  • cd /opt/OSG080
  • source setup.sh
  • monitoring/configure-osg.sh

This results in a long dialogue. The important parts are to use the SAME site name (AGLT2 in our case) on both gatekeepers and to divide up the resources according to the policy involved. For AGLT2 we are allocating 64 job slots (2 dual-processor dual-core Opteron systems and 7 dual-processor quad-core Intel systems) to VOs other than USATLAS. USATLAS will use the gate01.aglt2.org gatekeeper (which also allows the "test" VOs access), while gate02.grid.umich.edu will be the gatekeeper for all our other VOs.
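
The 64-slot figure can be checked with a little shell arithmetic. This is only a worked example, assuming one job slot per core and two processors per system; the function name cores is hypothetical.

```shell
#!/bin/sh
# Cores in a group of identical machines:
# machines * processors per machine * cores per processor.
cores() {
    echo $(( $1 * $2 * $3 ))
}

opteron=$(cores 2 2 2)   # 2 dual-processor, dual-core Opteron systems
intel=$(cores 7 2 4)     # 7 dual-processor, quad-core Intel systems

# Assumption: one Condor job slot per core.
echo "Job slots for other VOs: $(( opteron + intel ))"
```

Under those assumptions the Opteron systems contribute 8 slots and the Intel systems 56, which gives the 64 slots quoted above.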

Also, I only run MonALISA services on gate01.aglt2.org.

-- ShawnMcKee - 14 Mar 2008
Topic revision: r8 - 16 Oct 2009 - 20:14:40 - TomRockwell
 
