Install or Upgrade OSG at AGLT2

The main difference between these instructions and the usual documentation is that we use worker node and wlcg-client installations in AFS as well as certificates in AFS which are kept up to date by gate02.

For full information on how to install OSG, please refer to the OSGCE page.

For a short tutorial see:
  • Most of our config should come over when you do extract_config in an upgrade (more below)
  • Ignore the parts of this tutorial regarding CA setup. Make symlinks as noted below.
  • authorization_method in config.ini is "prima"
  • Also ignore parts about configuring RSV and RSV certs on gate02 at least.
  • Needed host certs are pushed automatically into /etc/grid-security from umopt1


The following are the commands I used to install OSG100 on gate02. There are some site-specific issues:
Updated May 12, 2009, for OSG101 install -- B.Ball

Updated June 13, 2009, for OSG104 -- B.Ball
No changes required to fundamental procedure outlined below.

August 8 2009 - bmeekhof
Renamed topic, edited according to experience upgrading OSG 1.0.4 to OSG 1.2.0 on gate02 following tutorial linked.

August 23 2009 - bmeekhof
Updated after installation on gate01. Additional info about updating AFS installations of Pacman, OSGWN and opt/WLCG-client and setting CA locations.

November 2, 2010 - Bob Ball
Upgrade OSG 1.2.6 to 1.2.15

January 17, 2011 - Bob Ball
Upgrade OSG 1.2.15 to 1.2.16

April 5, 2011 - Bob Ball
Upgrade OSG 1.2.16 to 1.2.19

October 21, 2011 - Bob Ball
Upgrade OSG 1.2.19 to 1.2.23

October 21, 2011 -- Bob Ball
Install OSGWN 1.2.23

November 7, 2011 -- Bob Ball
Upgrade OSG 1.2.23 to 1.2.24

November 15, 2011 -- Bob Ball
Upgrade OSG 1.2.24 to 1.2.25 on gate02, and apply 1.2.25 gratia security fix on gate01

March 8, 2012 -- Bob Ball
Upgrade OSG to 1.2.28 on both gate01 and on gate02.

February 27, 2016 -- Directions used most recently for an OSG 3.3 upgrade

Prepare for install

turn off the existing OSG services

Source the existing OSG install.

source /opt/OSG104/
vdt-control --off

Log out to clear the exported env variables, or log in to a new shell.

Set up the env variables.

This is important: if you forget it, you will be re-installing. Setting OLD_VDT_LOCATION ensures your old configuration gets pulled in, but we will still have to run "extract_config" later to set up config.ini.

export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config
export OLD_VDT_LOCATION=/opt/OSG104/
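
Since forgetting these variables means re-installing, a small guard can be run just before the pacman steps. This is only a minimal sketch and was not part of the procedure as performed; the function name check_upgrade_env is hypothetical, and it checks nothing more than that both variables are set and non-empty.

```shell
# Hypothetical helper: refuse to continue unless both upgrade variables are set.
check_upgrade_env() {
    status=0
    if [ -z "${VDTSETUP_CONDOR_CONFIG:-}" ]; then
        echo "ERROR: VDTSETUP_CONDOR_CONFIG is not set" >&2
        status=1
    fi
    if [ -z "${OLD_VDT_LOCATION:-}" ]; then
        echo "ERROR: OLD_VDT_LOCATION is not set" >&2
        status=1
    fi
    return "$status"
}
```

Typical use would be something like: check_upgrade_env && echo "environment OK"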

Install the software

Install pyOpenSSL

"We have identified a reporting bug in OSG 1.2 that could impact accounting (for WLCG) and monitoring since it impacts the ability to publish RSV records to the GOC RSV database and WLCG SAM. The current monitoring system shows that all the Tier-2s running 1.2 have either fixed this problem or are aware of it. A VDT update will be available early next week.

The bug stems from a newly introduced dependency in the RSV Gratia probe on pyOpenSSL. If your site is already running pyOpenSSL, it should not be affected. If you are not running pyOpenSSL, this means that your site is not be reporting Gratia accounting data. The work around is to install pyOpenSSL. Alternatively, as noted above, this will be available in the a soon to be released VDT update. "

(message dated Friday Aug 14 2009)

You'll need admin AFS tokens to do this: "kinit admin" and "aklog". Note that some AFS paths use the dotted cell name (.atlas.umich.edu) because we need the RW volume.

Install latest Pacman

Install pacman (AFS):
cd /afs/
tar -xzvf pacman-latest.tar.gz
rm pacman    # remove old symlink
ln -s pacman-x.xx pacman
rm pacman-latest.tar.gz
vos release opt

cd /afs/

(first pacman source wants you to be in local dir)

Update AFS installations of OSG Worker Node and OSG WLCG client

Source /afs/ if you have not already.

Please read LocalDQ2Tools#The_Installation_Procedure for information about updating this and what you have to do to remount the /opt volume as read-write in AFS.

UPDATE: Or...use /afs/ to use RW volume as documented below and fix the paths in files.

It is probably wise to save a copy of the current installation first, and then delete the existing files.

Then install worker node and wlcg using pacman (note the "." in afs path to use RW volume, and note that we fix the paths up to use the RO volume in usage):
OSGWN updated May 25, 2010 to osg 1.2.9 version

 cd /afs/ 
pacman -allow trust-all-caches -get

# Quote the expression so the shell does not strip the backslashes before sed sees them
sed -i 's/\.atlas\.umich\.edu/atlas.umich.edu/g' `grep -RIl "\.atlas\.umich\.edu" *`

### NOTE: for the 10/21/2011 update to OSGWN, the OSGWN volume was 
###           remounted rw, all files were moved to the directory old_OSGWN,
###           and the pacman command was run on an "empty" directory.
###           The content of the new dccp/bin directory contained ONLY dccp,
###           so all the old lsm files were copied from the old_OSGWN tree to 
###           the new location

ln -s /afs/ globus/share/certificates
ln -s /afs/ globus/TRUSTED_CA

cd /afs/
pacman -get

sed -i 's/\.atlas\.umich\.edu/atlas.umich.edu/g' `grep -RIl "\.atlas\.umich\.edu" *`

ln -s /afs/ globus/share/certificates
ln -s /afs/ globus/TRUSTED_CA
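
The RW-to-RO path fix-up done by the sed commands above can be tried on a scratch copy first. The following is a minimal sketch under some assumptions: GNU grep and sed (for -R and -i), a hypothetical function name fix_rw_paths, and a match anchored on the /afs/ prefix, which is an extra precaution not present in the commands as run. The expression is quoted so the dots are matched literally.

```shell
# Hypothetical helper: rewrite RW-volume cell paths (/afs/.atlas.umich.edu)
# to the RO form (/afs/atlas.umich.edu) in every text file under a directory.
fix_rw_paths() {
    dir=$1
    # -R recurse, -I skip binary files, -l print only the names of matching files
    grep -RIl '/afs/\.atlas\.umich\.edu' "$dir" | while IFS= read -r file; do
        sed -i 's|/afs/\.atlas\.umich\.edu|/afs/atlas.umich.edu|g' "$file"
    done
}
```

Paths that already use the RO form are left untouched, so the function can be re-run safely.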

Check/fix the openssl path so it is as below (don't do install on host with /opt/globus so it picks up the right path):
/afs/ -> /usr/bin/openssl
/afs/ -> /usr/bin/openssl

Be sure to release the volumes:
vos release opt
vos release OSGWN

Install OSG

Install OSG in /opt on Compute Elements (gate01,gate02):
mkdir /opt/osg-1.2 ; cd /opt/osg-1.2
pacman -allow trust-all-caches -get

Install managedfork

Install into /opt/osg-1.2:
cd /opt/osg-1.2
pacman -allow trust-all-caches -get

These instructions were not performed when upgrading to osg-1.2; it is unclear whether they are still needed or are now handled by the upgrade:
$VDT_LOCATION/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y

Install Job Manager for condor

Install in /opt/osg-1.2:
cd /opt/osg-1.2
pacman -allow trust-all-caches -get

These instructions were not performed when upgrading to osg-1.2; it is unclear whether they are still needed or are now handled by the upgrade:
##uncomment this line in the
vi $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/  
#    $requirements .= " && Arch == \"" . $description->condor_arch() . "\" ";  

Do post-install

source /opt/osg-1.2/

Configure the software

Configure certificates for OSG CE

gate02 is the machine which updates our AFS certs. It may be necessary to do the setupca command below if not upgrading. There is no longer a to run (reference to it removed below).

See the notes in the post-install/README file on CA-Certificates. Edit the value of cacerts_url in the configuration file at /opt/osg-1.2/vdt/etc/vdt-update-certs.conf
cacerts_url =

cd /opt/osg-1.2
source /opt/osg-1.2/
vdt-ca-manage setupca --location local --url osg

At AGLT2 -- point the installation at our AFS certificates:

rm /opt/osg-1.2/globus/share/certificates
rm /opt/osg-1.2/globus/TRUSTED_CA

gate02 (updates certificates, RW):
ln -s /afs/ /opt/osg-1.2/globus/share/certificates
ln -s /afs/ /opt/osg-1.2/globus/TRUSTED_CA

gate01 (RO):
ln -s /afs/ /opt/osg-1.2/globus/share/certificates
ln -s /afs/ /opt/osg-1.2/globus/TRUSTED_CA
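
The remove-and-relink steps above can be wrapped so that both links always end up pointing at the same place. This is a minimal sketch only; the function name link_certs is hypothetical, and the certificates directory is passed in as an argument rather than spelled out here. It refuses to touch a real directory, on the assumption that only a symlink should ever sit at these two locations.

```shell
# Hypothetical helper: point both certificate locations of an install at one
# shared certificates directory, replacing the existing symlinks.
link_certs() {
    install_dir=$1   # for example /opt/osg-1.2
    cert_dir=$2      # the AFS certificates directory (RW on gate02, RO on gate01)
    for link in "$install_dir/globus/share/certificates" "$install_dir/globus/TRUSTED_CA"; do
        # Only a symlink or plain file is removed; a real directory is an error.
        if [ -d "$link" ] && [ ! -L "$link" ]; then
            echo "ERROR: $link is a real directory, move it aside first" >&2
            return 1
        fi
        rm -f "$link"
        ln -s "$cert_dir" "$link"
    done
}
```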

Configure authentication

Copy auth files from post-install. The files will have the correct values as long as you set OLD_VDT_LOCATION before the installation.

cp /opt/osg-1.2/post-install/gsi-authz.conf /etc/grid-security/
cp /opt/osg-1.2/post-install/prima-authz.conf /etc/grid-security/
vi /etc/grid-security/prima-authz.conf
   logLevel    info

Setup config.ini

For an upgrade (be sure you set the env vars before you started), you will first need to source the setup and run extract_config, which writes extracted-config.ini:
source /opt/osg-1.2/ (if not sourced already)

Copy extracted-config.ini to /opt/osg-1.2/osg/etc/config.ini and check it over. Then check that it verifies and then apply the config:

configure-osg -v
configure-osg -c
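
To make sure the config is never applied when verification fails, the two commands can be chained. This is a minimal sketch, assuming configure-osg exits non-zero when verification fails (this page does not confirm that); the function name verify_then_apply is hypothetical, and the command name is a parameter only so the logic can be dry-run with a stub.

```shell
# Hypothetical helper: run "<cmd> -v" and only run "<cmd> -c" if it succeeded.
verify_then_apply() {
    cmd=${1:-configure-osg}
    if "$cmd" -v; then
        "$cmd" -c
    else
        echo "verification failed, config.ini NOT applied" >&2
        return 1
    fi
}
```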

Modify your sudo file

Runas_Alias GLOBUSUSERS = ALL, !root

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: /opt/osg-1.2/globus/libexec/globus-gridmap-and-execute \
     -g /etc/grid-security/grid-mapfile \
     /opt/osg-1.2/globus/libexec/ * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: /opt/osg-1.2/globus/libexec/globus-gridmap-and-execute \
     -g /etc/grid-security/grid-mapfile \
     /opt/osg-1.2/globus/libexec/globus-gram-local-proxy-tool * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: \
     /opt/osg-1.2/globus/libexec/ * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: \
     /opt/osg-1.2/globus/libexec/globus-gram-local-proxy-tool * 

Check perms on containercert/key

Make sure that, under /etc/grid-security, both containercert.pem and containerkey.pem belong to the same user, globus.
[gate02:monitoring]# ls -l  /etc/grid-security/container*|grep -v old
-r--r--r--  1 globus osg 1302 Jul  9 11:43 /etc/grid-security/containercert.pem
-r--------  1 globus osg  887 Jul  9 11:43 /etc/grid-security/containerkey.pem
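
The ownership check can also be scripted rather than read off the ls output. A minimal sketch, assuming GNU stat (the -c '%U' format prints the owner's user name); the function name same_owner is hypothetical.

```shell
# Hypothetical helper: succeed only if both files exist and have the same owner.
same_owner() {
    owner_a=$(stat -c '%U' "$1") || return 1
    owner_b=$(stat -c '%U' "$2") || return 1
    [ "$owner_a" = "$owner_b" ]
}
```

For example: same_owner /etc/grid-security/containercert.pem /etc/grid-security/containerkey.pem || echo "owner mismatch"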

Check your services, enable the ones you want with vdt-control --enable

Turn off condor, turn on anything needed. Gate02 needs to run the cert and crl update services. I didn't need to do the vdt-register-service in an upgrade. This is for gate02:
vdt-control --enable fetch-crl
vdt-control --enable vdt-update-certs
vdt-control --disable condor-cron
vdt-register-service --name condor-cron --disable

NOTE: gate01 is the opposite. Enable condor-cron, disable fetch-crl and vdt-update-certs

Double check that it's all good. Our two gatekeepers are different in requirements. Gate02 needs these:
vdt-control --list
[gate02:osg-1.2]# vdt-control --list
Service                 | Type   | Desired State
fetch-crl               | cron   | enable
vdt-rotate-logs         | cron   | enable
vdt-update-certs        | cron   | enable
globus-gatekeeper       | inetd  | enable
gsiftp                  | inetd  | enable
mysql5                  | init   | enable
globus-ws               | init   | enable
gums-host-cron          | cron   | enable
MLD                     | init   | do not enable
condor-cron             | init   | do not enable
apache                  | init   | enable
tomcat-55               | init   | enable
gratia-condor           | cron   | enable
edg-mkgridmap           | cron   | do not enable

Gate01 needs these:
[gate01:afs]# vdt-control --list
Service                 | Type   | Desired State
fetch-crl               | cron   | do not enable
vdt-rotate-logs         | cron   | enable
vdt-update-certs        | cron   | do not enable
globus-gatekeeper       | inetd  | enable
gsiftp                  | inetd  | enable
mysql5                  | init   | enable
globus-ws               | init   | do not enable
gums-host-cron          | cron   | enable
MLD                     | init   | enable
condor-cron             | init   | enable
apache                  | init   | enable
tomcat-55               | init   | enable
gratia-condor           | cron   | enable
edg-mkgridmap           | cron   | do not enable
osg-rsv                 | init   | enable

Make sure mysql is started up before globus-ws

This was not necessary in the upgrade to osg-1.2. It appears to be fixed in the distribution: services started up in the correct order without the modifications below. Init files from the distribution setup put mysql at order 90 and tomcat-55, apache and globus-ws at order 99. The init file is now named mysql5.

sed '/^# chkconfig:/c # chkconfig: 345 97 09' --in-place=.ORI /etc/rc.d/init.d/mysql
sed '/^# chkconfig:/c # chkconfig: 345 98 04' --in-place=.ORI /etc/rc.d/init.d/globus-ws
chkconfig mysql reset
chkconfig globus-ws reset

Start the services

vdt-control --on

Modify crontab for root on gate02 (vdt-control should put these in but you will need to adjust timing)

This applies to gate02 only.
  • fetch-crl.cron should run at 8 minutes after every hour
  • vdt-update-certs-wrapper should run at 12 minutes after every hour
8 * * * * /opt/osg-1.2/fetch-crl/share/doc/fetch-crl-2.6.6/fetch-crl.cron
12 * * * * /opt/osg-1.2/vdt/sbin/vdt-update-certs-wrapper --vdt-install /opt/osg-1.2 --called-from-cron

Make a symlink for OSG104 RSV probes from gate01

It won't find this binary if you don't do the below:
ln -s /opt/osg-1.2/osg/bin/osg-version /opt/osg-1.2/osg-version

Update various other scripts

I did not do 2) when updating to OSG 1.2.0.

1) Following directions here
Add this on a one-time only basis to /etc/security/limits.conf
globus hard nofile 16384

2) Still following those directions, add to GLOBUS_OPTIONS in /opt/OSG104/
This directory is created with 777 permissions

3) Bring these startups in line
sed -i s/OSG104/osg-1.2/g /etc/init.d/gsisshd
sed -i s/OSG104/osg-1.2/g /etc/init.d/gsi_sshd
sed -i s/OSG104/osg-1.2/g /etc/syslog-ng/syslog-ng.conf

Note that for the first two, the file /etc/sysconfig/vdt.conf is now defined; it specifies the location of the VDT, like so:
export VDT_CURRENT=/opt/osg-1.2
The gsisshd and gsi_sshd startups now source this file, and then branch accordingly.
syslog-ng.conf cannot do this, and so must be modified by hand.
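
The way a startup script can pick up the VDT location from this file is sketched below. It is an illustration only, not the actual gsisshd or gsi_sshd init script: the function name vdt_location and its fallback argument are hypothetical, and only /etc/sysconfig/vdt.conf and VDT_CURRENT come from this page.

```shell
# Hypothetical sketch: resolve the VDT location from a config file containing
# a line such as "export VDT_CURRENT=/opt/osg-1.2".
vdt_location() {
    conf=${1:-/etc/sysconfig/vdt.conf}
    fallback=${2:-}
    # Source in a subshell so the caller's environment is left untouched.
    (
        unset VDT_CURRENT
        if [ -r "$conf" ]; then
            . "$conf"
        fi
        echo "${VDT_CURRENT:-$fallback}"
    )
}
```

If the file is missing or does not set VDT_CURRENT, the fallback value is printed instead.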

gate01 now employs the same setup.

Verify the site

Do these as a normal user with your grid cert.

source /opt/osg-1.2/
cd /opt/osg-1.2/verify

Some commands to verify the services:

##verify managedfork
time globus-job-run /bin/hostname 
##verify jobmanager-condor
time globus-job-run /bin/hostname 
##verify globus-ws
globusrun-ws -submit -F -S -s -c /bin/bash -c 'export CONDOR_CONFIG=/opt/condor/etc/condor_config; condor_q'

Example of setting up RSV Probes

vdt-control --off osg-rsv

perl osg-rsv/bin/misc/ --reset

./osg-rsv/setup/configure_osg_rsv --user rsvuser --init --server y --ce-probes \
--ce-uri ""  --srm-probes --srm-uri "" \
--srm-dir /pnfs/  --srm-webservice-path "srm/managerv2" --gratia --grid-type "OSG" \
--consumers --verbose --setup-for-apache --proxy /tmp/x509up_u55625

vdt-control --on osg-rsv

Upgrade OSG 1.2.6 to OSG 1.2.15

This upgrade was performed on November 2, 2010, and went very smoothly. Instructions were followed from this URL. This particular URL is linked from this master URL.

Pre-upgrade steps

# Save some files:
cd /root
mkdir osg1.2.15_up
crontab -l > osg1.2.15_up/crontab_l
vdt-control --list > osg1.2.15_up/vdt-control-list.txt
cp -p /opt/osg-1.2.6/osg/etc/config.ini osg1.2.15_up/
# Check some links so we can ensure they are correctly set at the end
[gate02:~]# ll /opt/osg/globus|grep TRUST
lrwxrwxrwx  1 root root    30 Apr 30 17:26 TRUSTED_CA -> /opt/certificates/certificates
[gate02:~]# ll /opt/osg/globus/share|grep cert
lrwxrwxrwx  1 root root    30 Apr 30 17:27 certificates -> /opt/certificates/certificates
[gate01:~]# ll /opt/osg/globus|grep TRUST
lrwxrwxrwx  1 root    50 Sep  2 12:32 TRUSTED_CA -> /afs/
[gate01:~]# ll /opt/osg/globus/share|grep cert
lrwxrwxrwx  1 root    50 Sep  2 12:32 certificates -> /afs/
#  Make sure that condor is cleaned.  Auto-pilots were previously stopped as this is
#    a scheduled outage.
condor_q -constr 'jobstatus==1'|grep " I "|awk '{print $1}'|xargs -n 1 condor_hold
condor_q -constr 'jobstatus==2'|grep " R "|awk '{print $1}'|xargs -n 1 condor_rm

service condor stop

export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config

Actual upgrade steps

This is a summary of the steps explained in the URL above.

vdt-control --off

# Get the latest version of the vdt-updater script:
pacman -update VDT-Updater

# Note: If you do not yet have the updater script (look for $VDT_LOCATION/vdt/update/vdt-updater), 
#   then fetch it with this command:

pacman -get



cp osg/etc/config.ini /tmp/config.ini-backup

pacman -update osg-version
pacman -update osg-config

cp  /tmp/config.ini-backup osg/etc/config.ini

# After updating, re-source the file to load any changes in the environment:



On a CE, you will also need to reconfigure your system

configure-osg -v
configure-osg -c

# Get rid of the gratia probes for gate02 running from gate01
cd /opt/osg/osg-rsv/submissions/probes
mv gate02*gratia* /root/osg1.2.15_up

# Note that the srmcp-srm-probe is also different, having been modified to use
# a dCache token-controlled area.  Compare to /root/srmcp-srm-probe
# Directory is /opt/osg/osg-rsv/bin/probes

vdt-control --on

service condor start

Upgrade OSG 1.2.15 to OSG 1.2.16

Smooth upgrade. Also added in Rack 110 and 119 workers, and bl-5 workers, as sub-clusters 7-9.

This was a small step in versions. Instructions were therefore followed from this URL instead of the path followed for the 1.2.15 upgrade.

Upgrade OSG 1.2.16 to OSG 1.2.19

Smooth upgrade following directions. Two complications and one change.
  • print_local_time = TRUE (or anything) is no longer supported for rsv times in config.ini
  • The max value of SI00 is 5000, whereas we had 6700 for the sub-cluster where it was needed, so that is now reset to 5000
  • org.osg.gratia.condor and org.osg.gratia.metric probes were disabled for rsv on gate02. This is made possible by the new rsv-control command documented here.
    • rsv-control --disable --host org.osg.gratia.condor org.osg.gratia.metric
    • This was followed by a gate01 reboot that actually turned off these probes.

Upgrade OSG 1.2.19 to OSG 1.2.23

Pre-upgrade note:

Directions here look straightforward. However, must be modified as I understand it is changed in this release.

Post-upgrade note:

Modified to not invoke the new . This was the only real change to in this update, on both gate01 and gate02.

gate02 updated smoothly and by the book.

gate01 updated with one modification to the procedure. Before the last step, "vdt-control --on", a check of the rsv probes shows the same two probes as in the 1.2.19 update were once again enabled. Disabled them.
  • rsv-control --disable --host org.osg.gratia.condor org.osg.gratia.metric

Upgrade OSG 1.2.23 to OSG 1.2.24

Upgrade went smoothly on both gatekeepers.

On gate01 the rsv metrics were again disabled. In addition, the global timeout was changed from 1200 to 720 seconds, and the srmcp-readwrite condor-cron interval was changed from "28 *" to "13,28,43,58 *". The following two files were edited to achieve this.
  • /opt/osg/osg-rsv/etc/rsv.conf (timeout)
  • /opt/osg/osg-rsv/meta/metrics/org.osg.srm.srmcp-readwrite.meta (periodicity)
Both gate01 and gate02 were rebooted following the updates. rsv probes that failed during the downtime were run and the report was fully green.

Upgrade OSG 1.2.24 to OSG 1.2.25

Upgrade only gate02 following directions. Total outage was approximately 20 minutes.

On gate01, perform only the gratia fix outlined at

Upgrade to OSG 1.2.28

Upgrade following directions. No changes in or in config.ini.
The srmcp-readwrite rsv probe required a second change, that perhaps should have been there all along. The change is shown in this output from the diff command:
[gate01:probes]# diff srmcp-srm-probe srmcp-srm-probe.orig
<     my $srmcp_cmd = "$o{'srmcpCmd'} -space_token=5904816 -streams_num=1 -srm_protocol_version=".
>     my $srmcp_cmd = "$o{'srmcpCmd'} -streams_num=1 -srm_protocol_version=".
<     $srmcp_cmd = "$o{'srmcpCmd'} -space_token=5904816 -streams_num=1 -srm_protocol_version=".
>     $srmcp_cmd = "$o{'srmcpCmd'} -streams_num=1 -srm_protocol_version=".

The metric interval changes made in the upgrade to version 1.2.24 were retained in this update and did not require re-implementation.

The rsv probe disable for gate02 made in the upgrade to version 1.2.23 was again performed.

Upgrade OSG 3.3

The HTCondor repo is installed, but not active, on aglbatch. From there we can see the URL for the repo is so browse to there and download the needed rpms.


The test case is on gate03, but all gatekeepers are treated identically following confirmation of success on gate03.

# Stop cfengine
service cfengine3 stop

# Stop Condor without terminating the shadow/WN processing
condor_off -fast

# Update condor
cd /atlas/data08/ball/admin/condor_rpms_8.4.11
yum localupdate condor-8.4.11-1.el6.x86_64.rpm condor-classads-8.4.11-1.el6.x86_64.rpm \
condor-external-libs-8.4.11-1.el6.x86_64.rpm condor-procd-8.4.11-1.el6.x86_64.rpm \
condor-python-8.4.11-1.el6.x86_64.rpm condor-cream-gahp-8.4.11-1.el6.x86_64.rpm

# Update osg
yum --enablerepo=osg update

# Now, here, watch the yum output for .rpmnew files, check each one thoroughly to understand
# it, and make any needed cf3 changes to config files.  When all is happy....

# Run cf-agent to re-establish anything needing it
cf-agent -Kf; cf-agent -K

# Verify the osg configuration is clean....
osg-configure -v

# Then apply it
osg-configure -c

# And then reboot.
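
For the .rpmnew review step above, the leftover files can be listed next to the config file each would replace. A minimal sketch; the function name list_rpmnew is hypothetical and /etc is only an assumed default search root.

```shell
# Hypothetical helper: list the *.rpmnew files under a directory, each followed
# by the live config file it would replace.
list_rpmnew() {
    dir=${1:-/etc}
    find "$dir" -type f -name '*.rpmnew' | sort | while IFS= read -r new; do
        echo "$new -> ${new%.rpmnew}"
    done
}
```

Each pair printed can then be compared with diff -u before deciding on any cf3 changes.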

Clean install of OSG 3.3

gate02 choked, so a new VM was cloned from the old one and built with Cobbler, and the following steps were then undertaken to do a full OSG 3.3 install. This resulted in OSG 3.3.23. For now, this is just a "history" dump. This left condor and condor-ce stopped.

yum install yum-plugin-priorities
rpm -Uvh
yum --enablerepo=osg-empty install empty-ca-certs
yum --enablerepo=osg install condor
yum --enablerepo=osg install osg-ce-condor
mkdir /root/saves
cp -ar /etc/condor/config.d /root/saves/condor_config.d
cp -ar /etc/condor-ce/config.d /root/saves/condor-ce_config.d
cp -ar /etc/osg/config.d /root/saves/osg_config.d
yum --enablerepo=osg install rsv
cf-agent -Kf; cf-agent -K
service cfengine3 stop
cf-agent -Kf; cf-agent -K
service autofs start
osg-configure -v
reboot
exit

yum install ruby
yum install rubygems
yum install rubygem-json.x86_64 rubygem-pg
yum install rubygem-activesupport.noarch
gem install activerecord -v 2.3.18
gem list

Manually add a GUMS server entry in /etc/lcmaps.db

[root@gate02 osg]# chkconfig gums-client-cron on
[root@gate02 osg]# service gums-client-cron start
Enabling periodic gums-host-cron:                          [  OK  ]
#   Run it once manually
[root@gate02 osg]# [[ ! -f /var/lock/subsys/gums-host-cron ]] || /usr/bin/gums-host-cron

yum install tomcat6
chkconfig tomcat6 on
cf-agent run
Check that the http certs are owned by tomcat. They were not, so:
chown tomcat.tomcat /etc/grid-security/http/*.pem
service tomcat6 start

chkconfig --add gratia-probes-cron
chkconfig gratia-probes-cron on
service gratia-probes-cron start

-- WenjingWu - 09 Jul 2008

Topic attachments
  • gate01-config.ini (4.6 K, 23 Aug 2009 - 23:06, BenMeekhof): gate01:/opt/osg-1.2/osg/etc/config.ini for reference
  • gate02-config.ini (3.9 K, 23 Aug 2009 - 23:02, BenMeekhof): gate02:/opt/osg-1.2/etc/config.ini for reference
Topic revision: r38 - 01 Jun 2017 - 19:15:16 - BobBall
