Install or Upgrade OSG at AGLT2

The main difference between these instructions and the usual documentation is that we use worker node and wlcg-client installations in AFS as well as certificates in AFS which are kept up to date by gate02.

For full information on how to install OSG, please refer to the OSGCE page.

For a short tutorial see: https://twiki.grid.iu.edu/bin/view/ReleaseDocumentation/ComputingElementHandsOn
  • Most of our config should come over when you do extract_config in an upgrade (more below)
  • Ignore the parts of this tutorial regarding CA setup. Make symlinks as noted below.
  • authorization_method in config.ini is "prima"
  • Also ignore parts about configuring RSV and RSV certs on gate02 at least.
  • Needed host certs pushed automatically into /etc/grid-security from umopt1

Changelog

The following are the commands I used to install OSG100 on gate02; there are some site-specific issues:
Updated May 12, 2009, for OSG101 install -- B.Ball

Updated June 13, 2009, for OSG104 -- B.Ball
No changes required to fundamental procedure outlined below.

August 8 2009 - bmeekhof
Renamed topic, edited according to experience upgrading OSG 1.0.4 to OSG 1.2.0 on gate02 following tutorial linked.

August 23 2009 - bmeekhof
Updated after installation on gate01. Additional info about updating AFS installations of Pacman, OSGWN and opt/WLCG-client and setting CA locations.

November 2, 2010 - Bob Ball
Upgrade OSG 1.2.6 to 1.2.15

January 17, 2011 - Bob Ball
Upgrade OSG 1.2.15 to 1.2.16

April 5, 2011 - Bob Ball
Upgrade OSG 1.2.16 to 1.2.19

October 21, 2011 - Bob Ball
Upgrade OSG 1.2.19 to 1.2.23

October 21, 2011 -- Bob Ball
Install OSGWN 1.2.23

November 7, 2011 -- Bob Ball
Upgrade OSG 1.2.23 to 1.2.24

November 15, 2011 -- Bob Ball
Upgrade OSG 1.2.24 to 1.2.25 on gate02, and apply 1.2.25 gratia security fix on gate01

March 8, 2012 -- Bob Ball
Upgrade OSG to 1.2.28 on both gate01 and on gate02.

February 27, 2016 -- Directions used most recently for an OSG 3.3 upgrade

Prepare for install

Turn off the existing OSG services

Source the existing OSG install.

source /opt/OSG104/setup.sh
vdt-control --off

Log out to clear the exported env variables, or log in to a new shell.

Set up the env variables.

This is important, don't forget it or you'll be re-installing. Setting OLD_VDT_LOCATION ensures your old configuration gets pulled in, but also we will have to run "extract_config" later to setup config.ini.

export VDTSETUP_CONDOR_LOCATION=/opt/condor
export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config
export VDT_GUMS_HOST=linat04.grid.umich.edu
export OLD_VDT_LOCATION=/opt/OSG104/
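Because a forgotten variable means a re-install, a quick pre-flight check before running pacman may be worthwhile. The following is only a minimal sketch and not part of the original procedure: the function name check_vdt_env is hypothetical, and it tests nothing more than that the four variables above are set and non-empty.

```shell
# Hypothetical helper: report any required variable that is unset or empty.
# Returns 0 if all four are set, 1 otherwise.
check_vdt_env() {
    missing=0
    for name in VDTSETUP_CONDOR_LOCATION VDTSETUP_CONDOR_CONFIG \
                VDT_GUMS_HOST OLD_VDT_LOCATION; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Running "check_vdt_env && echo ready" just before the pacman commands would print "ready" only when nothing is missing.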

Install the software

Install pyOpenSSL

"We have identified a reporting bug in OSG 1.2 that could impact accounting (for WLCG) and monitoring since it impacts the ability to publish RSV records to the GOC RSV database and WLCG SAM. The current monitoring system shows that all the Tier-2s running 1.2 have either fixed this problem or are aware of it. A VDT update will be available early next week.

The bug stems from a newly introduced dependency in the RSV Gratia probe on pyOpenSSL. If your site is already running pyOpenSSL, it should not be affected. If you are not running pyOpenSSL, this means that your site is not be reporting Gratia accounting data. The work around is to install pyOpenSSL. Alternatively, as noted above, this will be available in the a soon to be released VDT update. "

(message dated Friday Aug 14 2009)

You'll need admin AFS tokens to do this. "kinit admin" and "aklog". Note that sometimes afs paths are .atlas.umich.edu when we need the RW volume.

Install latest Pacman

Install pacman (AFS):
cd /afs/.atlas.umich.edu/opt
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-latest.tar.gz
tar -xzvf pacman-latest.tar.gz
rm pacman   # remove old symlink
ln -s pacman-x.xx pacman
rm pacman-latest.tar.gz
vos release opt

cd /afs/atlas.umich.edu/opt/pacman/

source setup.sh
(first pacman source wants you to be in local dir)

Update AFS installations of OSG Worker Node and OSG WLCG client

Source /afs/atlas.umich.edu/opt/pacman/setup.sh if you have not already.

Please read LocalDQ2Tools#The_Installation_Procedure for information about updating this and what you have to do to remount the /opt volume as read-write in AFS.

UPDATE: Or...use /afs/.atlas.umich.edu to use RW volume as documented below and fix the paths in files.

It is probably wise to save a copy of the current installation and then delete the existing files.

Then install worker node and wlcg using pacman (note the "." in afs path to use RW volume, and note that we fix the paths up to use the RO volume in usage):
OSGWN updated May 25, 2010 to osg 1.2.9 version

cd /afs/.atlas.umich.edu/OSGWN
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:wn-client

sed -i 's/\.atlas\.umich\.edu/atlas.umich.edu/g' `grep -RIl "\.atlas\.umich\.edu" *`

### NOTE: for the 10/21/2011 update to OSGWN, the OSGWN volume was 
###           remounted rw, all files were moved to the directory old_OSGWN,
###           and the pacman command was run on an "empty" directory.
###
###           The content of the new dccp/bin directory contained ONLY dccp,
###           so all the old lsm files were copied from the old_OSGWN tree to 
###           the new location

ln -s /afs/atlas.umich.edu/OSG_certificates/certificates globus/share/certificates
ln -s /afs/atlas.umich.edu/OSG_certificates/certificates globus/TRUSTED_CA

cd /afs/.atlas.umich.edu/opt/WLCG-client
pacman -get http://www.mwt2.org/caches/osg-1.2:wlcg-client

sed -i 's/\.atlas\.umich\.edu/atlas.umich.edu/g' `grep -RIl "\.atlas\.umich\.edu" *`

ln -s /afs/atlas.umich.edu/OSG_certificates/certificates globus/share/certificates
ln -s /afs/atlas.umich.edu/OSG_certificates/certificates globus/TRUSTED_CA
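The RW-to-RO path fix-up used for both installs can also be written so that it only touches strings starting with /afs/.atlas.umich.edu and copes with file names containing spaces. This is a sketch under assumptions, not the command that was actually run: fix_afs_paths is a hypothetical name, and it relies on GNU grep, xargs and sed.

```shell
# Hypothetical helper: rewrite RW-volume paths to RO-volume paths under a directory.
# Assumes GNU grep (-Z), GNU xargs (-0 -r) and GNU sed (-i).
fix_afs_paths() {
    dir=${1:-.}
    # -F fixed string, -I skip binary files, -l list names only, -Z NUL-terminate names
    grep -RIlZF '/afs/.atlas.umich.edu' "$dir" |
        xargs -0 -r sed -i 's|/afs/\.atlas\.umich\.edu|/afs/atlas.umich.edu|g'
}
```

Because the pattern is anchored on /afs/, host names ending in .atlas.umich.edu are left alone, and a second run finds nothing to change.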

Check/fix the openssl path so it is as below (don't do install on host with /opt/globus so it picks up the right path):
/afs/atlas.umich.edu/OSGWN/globus/bin/openssl -> /usr/bin/openssl
/afs/atlas.umich.edu/WLCG-client/globus/bin/openssl -> /usr/bin/openssl
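A small check like the following could confirm both links before the volumes are released. It is a sketch only; check_link is a hypothetical helper, not part of the OSG tooling.

```shell
# Hypothetical helper: succeed only if LINK is a symlink pointing at TARGET.
check_link() {
    link=$1
    target=$2
    actual=$(readlink "$link") || { echo "not a symlink: $link"; return 1; }
    if [ "$actual" != "$target" ]; then
        echo "wrong target: $link -> $actual (expected $target)"
        return 1
    fi
}

# Usage with the paths listed above:
# check_link /afs/atlas.umich.edu/OSGWN/globus/bin/openssl /usr/bin/openssl
# check_link /afs/atlas.umich.edu/WLCG-client/globus/bin/openssl /usr/bin/openssl
```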

Be sure to release the volumes:
vos release opt
vos release OSGWN

Install OSG

Install OSG in /opt on Compute Elements (gate01,gate02):
mkdir /opt/osg-1.2 ; cd /opt/osg-1.2
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:ce

Install managedfork

Install into /opt/osg-1.2:
cd /opt/osg-1.2
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:ManagedFork

These steps were not performed when upgrading to osg-1.2; it is unclear whether they are still needed or are handled by the upgrade:
source $VDT_LOCATION/setup.sh
$VDT_LOCATION/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y

Install Job Manager for condor

Install in /opt/osg-1.2:
cd /opt/osg-1.2
pacman -allow trust-all-caches -get http://software.grid.iu.edu/osg-1.2:Globus-Condor-Setup

These steps were not performed when upgrading to osg-1.2; it is unclear whether they are still needed or are handled by the upgrade:
##uncomment this line in the condor.pm
vi $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condor.pm  
#    $requirements .= " && Arch == \"" . $description->condor_arch() . "\" ";  

Do post-install

source /opt/osg-1.2/setup.sh
vdt-post-install

Configure the software

Configure certificates for OSG CE

gate02 is the machine which updates our AFS certs. It may be necessary to do the setupca command below if not upgrading. There is no longer a vdt-questions.sh to run (reference to it removed below).

See the notes in the post-install/README file on CA-Certificates. Edit the value of cacerts_url in the configuration file at /opt/osg-1.2/vdt/etc/vdt-update-certs.conf
cacerts_url = http://software.grid.iu.edu/pacman/cadist/ca-certs-version

cd /opt/osg-1.2
source /opt/osg-1.2/setup.sh
vdt-ca-manage setupca --location local --url osg

At AGLT2 -- point the installation at our AFS certificates:

rm /opt/osg-1.2/globus/share/certificates
rm /opt/osg-1.2/globus/TRUSTED_CA

gate02 (updates certificates, RW):
ln -s /afs/atlas.umich.edu/Certficates/certificates /opt/osg-1.2/globus/share/certificates
ln -s /afs/atlas.umich.edu/Certficates/certificates /opt/osg-1.2/globus/TRUSTED_CA

gate01 (RO):
ln -s /afs/atlas.umich.edu/OSG_certificates/certificates /opt/osg-1.2/globus/share/certificates
ln -s /afs/atlas.umich.edu/OSG_certificates/certificates /opt/osg-1.2/globus/TRUSTED_CA
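The rm-then-ln pairs above can be collapsed into one repeatable step. This is a sketch with a hypothetical helper name, link_certs. It assumes the existing entries are symlinks; a real directory at either location would still have to be moved aside by hand first.

```shell
# Hypothetical helper: point both certificate locations of an install at one directory.
# ln -sfn replaces an existing symlink in place instead of following it.
link_certs() {
    install_dir=$1
    certs_dir=$2
    for name in globus/share/certificates globus/TRUSTED_CA; do
        ln -sfn "$certs_dir" "$install_dir/$name" || return 1
    done
}

# e.g. on gate01:
# link_certs /opt/osg-1.2 /afs/atlas.umich.edu/OSG_certificates/certificates
```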

Configure authentication

Copy auth files from post-install. The files will have the correct values as long as you set OLD_VDT_LOCATION before the installation.

cp /opt/osg-1.2/post-install/gsi-authz.conf /etc/grid-security/
cp /opt/osg-1.2/post-install/prima-authz.conf /etc/grid-security/
vi /etc/grid-security/prima-authz.conf
   logLevel    info

Setup config.ini

For an upgrade (be sure you set env vars before you started) you will need to run first:
source /opt/osg-1.2/setup.sh   # if not sourced already
extract-config

Copy extracted-config.ini to /opt/osg-1.2/osg/etc/config.ini and check it over. Then check that it verifies and then apply the config:

configure-osg -v
configure-osg -c

Modify your sudo file


Runas_Alias GLOBUSUSERS = ALL, !root

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: /opt/osg-1.2/globus/libexec/globus-gridmap-and-execute \
     -g /etc/grid-security/grid-mapfile \
     /opt/osg-1.2/globus/libexec/globus-job-manager-script.pl * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: /opt/osg-1.2/globus/libexec/globus-gridmap-and-execute \
     -g /etc/grid-security/grid-mapfile \
     /opt/osg-1.2/globus/libexec/globus-gram-local-proxy-tool * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: \
     /opt/osg-1.2/globus/libexec/globus-job-manager-script.pl * 

globus   ALL=(GLOBUSUSERS) \
     NOPASSWD: \
     /opt/osg-1.2/globus/libexec/globus-gram-local-proxy-tool * 

Check perms on containercert/key

Make sure that, under /etc/grid-security, both containercert.pem and containerkey.pem are owned by the same user, globus.
[gate02:monitoring]# ls -l  /etc/grid-security/container*|grep -v old
-r--r--r--  1 globus osg 1302 Jul  9 11:43 /etc/grid-security/containercert.pem
-r--------  1 globus osg  887 Jul  9 11:43 /etc/grid-security/containerkey.pem
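The same check can be scripted rather than read off the ls output. A minimal sketch, assuming GNU stat; check_owner_mode is a hypothetical name, and the modes 444 and 400 are simply the octal form of the permissions shown in the listing above.

```shell
# Hypothetical helper: check the owner and octal mode of a file (GNU stat syntax).
check_owner_mode() {
    file=$1
    want_owner=$2
    want_mode=$3
    actual=$(stat -c '%U %a' "$file") || return 1
    if [ "$actual" != "$want_owner $want_mode" ]; then
        echo "$file: is '$actual', expected '$want_owner $want_mode'"
        return 1
    fi
}

# check_owner_mode /etc/grid-security/containercert.pem globus 444
# check_owner_mode /etc/grid-security/containerkey.pem  globus 400
```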

Check your services, enable the ones you want with vdt-control --enable

Turn off condor, turn on anything needed. Gate02 needs to run the cert and crl update services. I didn't need to do the vdt-register-service in an upgrade. This is for gate02:
vdt-control --enable fetch-crl
vdt-control --enable vdt-update-certs
vdt-control --disable condor-cron
vdt-register-service --name condor-cron --disable

NOTE: gate01 is the opposite. Enable condor-cron, disable fetch-crl and vdt-update-certs

Double check that it's all good. Our two gatekeepers are different in requirements. Gate02 needs these:
vdt-control --list
[gate02:osg-1.2]# vdt-control --list
Service                 | Type   | Desired State
------------------------+--------+--------------
fetch-crl               | cron   | enable
vdt-rotate-logs         | cron   | enable
vdt-update-certs        | cron   | enable
globus-gatekeeper       | inetd  | enable
gsiftp                  | inetd  | enable
mysql5                  | init   | enable
globus-ws               | init   | enable
gums-host-cron          | cron   | enable
MLD                     | init   | do not enable
condor-cron             | init   | do not enable
apache                  | init   | enable
tomcat-55               | init   | enable
gratia-condor           | cron   | enable
edg-mkgridmap           | cron   | do not enable

Gate01 needs these:
[gate01:afs]# vdt-control --list
Service                 | Type   | Desired State
------------------------+--------+--------------
fetch-crl               | cron   | do not enable
vdt-rotate-logs         | cron   | enable
vdt-update-certs        | cron   | do not enable
globus-gatekeeper       | inetd  | enable
gsiftp                  | inetd  | enable
mysql5                  | init   | enable
globus-ws               | init   | do not enable
gums-host-cron          | cron   | enable
MLD                     | init   | enable
condor-cron             | init   | enable
apache                  | init   | enable
tomcat-55               | init   | enable
gratia-condor           | cron   | enable
edg-mkgridmap           | cron   | do not enable
osg-rsv                 | init   | enable
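Since the two gatekeepers differ only in a handful of rows, it can help to reduce the listing to the names of the enabled services and diff that against a saved copy. The filter below is a sketch: enabled_services is a hypothetical name, and it assumes the three-column, pipe-separated layout shown in the tables above.

```shell
# Hypothetical helper: read "vdt-control --list" output on stdin and print the
# names of services whose desired state is exactly "enable", one per line.
enabled_services() {
    awk -F'|' '
        NF == 3 {
            name = $1; state = $3
            # trim surrounding blanks from both fields
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (state == "enable") print name
        }'
}

# Usage: vdt-control --list | enabled_services | sort > /root/enabled-now.txt
```

The header row and the dashed separator are skipped automatically, because neither has "enable" as its third field.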

Make sure mysql is started up before globus-ws

This was not necessary in upgrade to osg-1.2. It appears to be fixed in the distribution - services started up in the correct order without modifications below. Init files from dist setup put mysql at 90 and tomcat-55,apache,globus-ws at order 99. Init file is named mysql5 now.

sed '/^# chkconfig:/c # chkconfig: 345 97 09' --in-place=.ORI /etc/rc.d/init.d/mysql
sed '/^# chkconfig:/c # chkconfig: 345 98 04' --in-place=.ORI /etc/rc.d/init.d/globus-ws
chkconfig mysql reset
chkconfig globus-ws reset

Start the services

vdt-control --on

Modify crontab for root on gate02 (vdt-control should put these in but you will need to adjust timing)

This applies to gate02 only.
  • fetch-crl.cron should run at 8 minutes after every hour
  • vdt-update-certs-wrapper should run at 12 minutes after every hour
8 * * * * /opt/osg-1.2/fetch-crl/share/doc/fetch-crl-2.6.6/fetch-crl.cron
12 * * * * /opt/osg-1.2/vdt/sbin/vdt-update-certs-wrapper --vdt-install /opt/osg-1.2 --called-from-cron

The osg-version binary will not be found unless you create this symlink:
ln -s /opt/osg-1.2/osg/bin/osg-version /opt/osg-1.2/osg-version

Update various other scripts

I did not do 2) when updating to OSG 1.2.0.

1) Following directions here
Add this on a one-time only basis to /etc/security/limits.conf
globus hard nofile 16384

2) Still following those directions, add to GLOBUS_OPTIONS in /opt/OSG104/setup.sh
-Dorg.globus.wsrf.container.persistence.dir=/home/GRAM4_metadata
This directory is created with 777 permissions

3) Bring these startups in line
sed -i s/OSG104/osg-1.2/g /etc/init.d/gsisshd
sed -i s/OSG104/osg-1.2/g /etc/init.d/gsi_sshd
sed -i s/OSG104/osg-1.2/g /etc/syslog-ng/syslog-ng.conf

Note that for the first two, the file /etc/sysconfig/vdt.conf now specifies the location of the VDT, like so:
export VDT_CURRENT=/opt/osg-1.2
The gsisshd and gsi_sshd startups now source this file, and then branch accordingly.
syslog-ng.conf cannot do this, and so must be modified by hand.

gate01 now employs the same setup.

Verify the site

Do these as a normal user with your grid cert.

source /opt/osg-1.2/setup.sh
grid-proxy-init
cd /opt/osg-1.2/verify
./site_verify.pl

Some commands to verify the services:

grid-proxy-init
##verify managedfork
time globus-job-run gate02.grid.umich.edu:2119/jobmanager-managedfork /bin/hostname 
##verify jobmanager-condor
time globus-job-run gate02.grid.umich.edu:2119/jobmanager-condor /bin/hostname 
##verify globus-ws
globusrun-ws -submit -F gate01.aglt2.org:9443 -S -s -c /bin/bash -c 'export CONDOR_CONFIG=/opt/condor/etc/condor_config; condor_q'

Example of setting up RSV Probes

vdt-control --off osg-rsv

perl osg-rsv/bin/misc/cleanup-rsv.pl --reset

./osg-rsv/setup/configure_osg_rsv --user rsvuser --init --server y --ce-probes \
--ce-uri "gate01.aglt2.org gate02.grid.umich.edu"  --srm-probes --srm-uri "head01.aglt2.org" \
--srm-dir /pnfs/aglt2.org/dq2  --srm-webservice-path "srm/managerv2" --gratia --grid-type "OSG" \
--consumers --verbose --setup-for-apache --proxy /tmp/x509up_u55625

vdt-control --on osg-rsv

Upgrade OSG 1.2.6 to OSG 1.2.15

This upgrade was performed on November 2, 2010, and went very smoothly. Instructions were followed from this URL. This particular URL is linked from this master URL.

Pre-upgrade steps

# Save some files:
cd /root
mkdir osg1.2.15_up
crontab -l > osg1.2.15_up/crontab_l
vdt-control --list > osg1.2.15_up/vdt-control-list.txt
cp -p /opt/osg-1.2.6/osg/etc/config.ini osg1.2.15_up/
#
# Check some links so we can ensure they are correctly set at the end
[gate02:~]# ll /opt/osg/globus|grep TRUST
lrwxrwxrwx  1 root root    30 Apr 30 17:26 TRUSTED_CA -> /opt/certificates/certificates
[gate02:~]# ll /opt/osg/globus/share|grep cert
lrwxrwxrwx  1 root root    30 Apr 30 17:27 certificates -> /opt/certificates/certificates
#
[gate01:~]# ll /opt/osg/globus|grep TRUST
lrwxrwxrwx  1 root    50 Sep  2 12:32 TRUSTED_CA -> /afs/atlas.umich.edu/OSG_certificates/certificates/
[gate01:~]# ll /opt/osg/globus/share|grep cert
lrwxrwxrwx  1 root    50 Sep  2 12:32 certificates -> /afs/atlas.umich.edu/OSG_certificates/certificates/
#
#  Make sure that condor is cleaned.  Auto-pilots were previously stopped as this is
#    a scheduled outage.
condor_q -constr 'jobstatus==1'|grep " I "|awk '{print $1}'|xargs -n 1 condor_hold
condor_q -constr 'jobstatus==2'|grep " R "|awk '{print $1}'|xargs -n 1 condor_rm

service condor stop

export VDTSETUP_CONDOR_LOCATION=/opt/condor
export VDTSETUP_CONDOR_CONFIG=/opt/condor/etc/condor_config

Actual upgrade steps

This is a summary of the steps explained in the URL above.

cd VDT_LOCATION
source setup.sh
vdt-control --off
cp -a $VDT_LOCATION BACKUP_LOCATION

# Get the latest version of the vdt-updater script:
pacman -update VDT-Updater

# Note: If you do not yet have the updater script (look for $VDT_LOCATION/vdt/update/vdt-updater), 
#   then fetch it with this command:

pacman -get http://vdt.cs.wisc.edu/vdt_200_cache:VDT-Updater

cp -a $VDT_LOCATION NEW_BACKUP_LOCATION

vdt/update/vdt-updater

cp osg/etc/config.ini /tmp/config.ini-backup

pacman -update osg-version
pacman -update osg-config

cp  /tmp/config.ini-backup osg/etc/config.ini

# After updating, re-source the setup.sh file to load any changes in the environment:

source setup.sh

vdt-post-install

# On a CE, you will also need to reconfigure your system:

configure-osg -v
configure-osg -c

# Get rid of the gratia probes for gate02 running from gate01
cd /opt/osg/osg-rsv/submissions/probes
mv gate02*gratia* /root/osg1.2.15_up

# Note that the srmcp-srm-probe is also different, having been modified to use
# a dCache token-controlled area.  Compare to /root/srmcp-srm-probe
# Directory is /opt/osg/osg-rsv/bin/probes

vdt-control --on

service condor start
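The config.ini save-and-restore around the pacman updates above uses a fixed name in /tmp. A timestamped copy kept with the other pre-upgrade files is one alternative. This is a sketch only; backup_file is a hypothetical helper and the destination directory is whatever was created in the pre-upgrade steps.

```shell
# Hypothetical helper: copy a file into DEST_DIR with a timestamp suffix and
# print the path of the copy. cp -p preserves mode and modification time.
backup_file() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage, from $VDT_LOCATION:
# saved=$(backup_file osg/etc/config.ini /root/osg1.2.15_up)
# ... pacman updates ...
# cmp -s "$saved" osg/etc/config.ini || cp -p "$saved" osg/etc/config.ini
```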

Upgrade OSG 1.2.15 to OSG 1.2.16

Smooth upgrade. Also added in Rack 110 and 119 workers, and bl-5 workers, as sub-clusters 7-9.

This was a small step in versions. Instructions were therefore followed from this URL instead of the path followed for the 1.2.15 upgrade.

Upgrade OSG 1.2.16 to OSG 1.2.19

Smooth upgrade following directions. Two complications and one change.
  • print_local_time = TRUE (or anything) is no longer supported for rsv times in config.ini
  • The max value of SI00 is 5000, whereas we had 6700 for the sub-cluster where it was needed, so that is now reset to 5000
  • org.osg.gratia.condor and org.osg.gratia.metric probes were disabled for rsv on gate02. This is made possible by the new rsv-control command documented here.
    • rsv-control --disable --host gate02.grid.umich.edu org.osg.gratia.condor org.osg.gratia.metric
    • This was followed by a gate01 reboot that actually turned off these probes.

Upgrade OSG 1.2.19 to OSG 1.2.23

Pre-upgrade note:

Directions here look straightforward. However, condor.pm must be modified as I understand it is changed in this release.

Post-upgrade note:

Modified condor.pm to not invoke the new condor_account_groups.pm . This was the only real change to condor.pm in this update, on both gate01 and gate02.

gate02 updates smooth and by the book

gate01 updated with one modification to the procedure. Before the last step, "vdt-control --on", a check of the rsv probes shows the same two probes as in the 1.2.19 update were once again enabled. Disabled them.
  • rsv-control --disable --host gate02.grid.umich.edu org.osg.gratia.condor org.osg.gratia.metric

Upgrade OSG 1.2.23 to OSG 1.2.24

Upgrade went smoothly on both gatekeepers.

On gate01 the rsv metrics were again disabled. In addition, the global timeout was changed from 1200 to 720 seconds, and the srmcp-readwrite condor-cron interval was changed from "28 *" to "13,28,43,58 *". The following two files were edited to achieve this.
  • /opt/osg/osg-rsv/etc/rsv.conf (timeout)
  • /opt/osg/osg-rsv/meta/metrics/org.osg.srm.srmcp-readwrite.meta (periodicity)
Both gate01 and gate02 were rebooted following the updates. rsv probes that failed during the downtime were run and the report was fully green.

Upgrade OSG 1.2.24 to OSG 1.2.25

Upgrade only gate02 following directions. Total outage was approximately 20 minutes.

On gate01, perform only the gratia fix outlined at https://ticket.grid.iu.edu/goc/viewer?id=11248

Upgrade to OSG 1.2.28

Upgrade following directions. No changes in condor.pm or in config.ini.
The srmcp-readwrite rsv probe required a second change, that perhaps should have been there all along. The change is shown in this output from the diff command:
[gate01:probes]# diff srmcp-srm-probe srmcp-srm-probe.orig
103c103
<     my $srmcp_cmd = "$o{'srmcpCmd'} -space_token=5904816 -streams_num=1 -srm_protocol_version=".
---
>     my $srmcp_cmd = "$o{'srmcpCmd'} -streams_num=1 -srm_protocol_version=".
109c109
<     $srmcp_cmd = "$o{'srmcpCmd'} -space_token=5904816 -streams_num=1 -srm_protocol_version=".
---
>     $srmcp_cmd = "$o{'srmcpCmd'} -streams_num=1 -srm_protocol_version=".

The metric interval changes made in the upgrade to version 1.2.24 were retained in this update and did not require re-implementation.

The rsv probe disable for gate02 made in the upgrade to version 1.2.23 was again performed.

Upgrade OSG 3.3

The HTCondor repo is installed, but not active, on aglbatch. From there we can see the URL for the repo is http://research.cs.wisc.edu/htcondor/yum/stable/rhel6/ so browse to there and download the needed rpms.

condor-8.4.11-1.el6.x86_64.rpm
condor-classads-8.4.11-1.el6.x86_64.rpm
condor-cream-gahp-8.4.11-1.el6.x86_64.rpm
condor-external-libs-8.4.11-1.el6.x86_64.rpm
condor-procd-8.4.11-1.el6.x86_64.rpm
condor-python-8.4.11-1.el6.x86_64.rpm

The test case is on gate03, but all gatekeepers are treated identically following confirmation of success on gate03.

# Stop cfengine
service cfengine3 stop

# Stop Condor without terminating the shadow/WN processing
condor_off -fast

# Update condor
cd /atlas/data08/ball/admin/condor_rpms_8.4.11
yum localupdate condor-8.4.11-1.el6.x86_64.rpm condor-classads-8.4.11-1.el6.x86_64.rpm \
condor-external-libs-8.4.11-1.el6.x86_64.rpm condor-procd-8.4.11-1.el6.x86_64.rpm \
condor-python-8.4.11-1.el6.x86_64.rpm condor-cream-gahp-8.4.11-1.el6.x86_64.rpm

# Update osg
yum --enablerepo=osg update

# Now, here, watch the yum output for .rpmnew files, check each one thoroughly to understand
# it, and make any needed cf3 changes to config files.  When all is happy....

# Run cf-agent to re-establish anything needing it
cf-agent -Kf failsafe.cf; cf-agent -K

# Verify the osg configuration is clean....
osg-configure -v

# Then apply it
osg-configure -c

# And then reboot.
reboot
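For the .rpmnew review step, something along these lines can list each new file next to its live counterpart. It is a sketch under the assumption that the config files of interest live under /etc; review_rpmnew is a hypothetical name.

```shell
# Hypothetical helper: for every *.rpmnew under a directory, print a header and a
# unified diff against the live file of the same name.
review_rpmnew() {
    root=${1:-/etc}
    find "$root" -type f -name '*.rpmnew' | sort | while IFS= read -r new; do
        live=${new%.rpmnew}
        echo "== $live vs $new"
        # diff exits 1 when the files differ; that is the expected case here
        diff -u "$live" "$new" || true
    done
}

# Usage: review_rpmnew /etc | less
```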

Clean install of OSG 3.3

gate02 choked. So, a new VM was cloned from the old, it was Cobbler built, and then the following steps were undertaken to do a full osg 3.3 install. This resulted in OSG 3.3.23. For now, this is just a "history" dump. This left condor and condor-ce stopped.

yum install yum-plugin-priorities
rpm -Uvh https://repo.grid.iu.edu/osg/3.3/osg-3.3-el6-release-latest.rpm
yum --enablerepo=osg-empty install empty-ca-certs
yum --enablerepo=osg install condor
yum --enablerepo=osg install osg-ce-condor
mkdir /root/saves
cp -ar /etc/condor/config.d /root/saves/condor_config.d
cp -ar /etc/condor-ce/config.d /root/saves/condor-ce_config.d
cp -ar /etc/osg/config.d /root/saves/osg_config.d
yum --enablerepo=osg install rsv
cf-agent -Kf failsafe.cf; cf-agent -K
service cfengine3 stop
cf-agent -Kf failsafe.cf; cf-agent -K
service autofs start
osg-configure -v
reboot
exit

yum install ruby
yum install rubygems
yum install rubygem-json.x86_64 rubygem-pg
yum install rubygem-activesupport.noarch
gem install activerecord -v 2.3.18
gem list

Manually add a GUMS server entry to /etc/lcmaps.db

[root@gate02 osg]# chkconfig gums-client-cron on
[root@gate02 osg]# service gums-client-cron start
Enabling periodic gums-host-cron:                          [  OK  ]
#   Run it once manually
[root@gate02 osg]# [[ ! -f /var/lock/subsys/gums-host-cron ]] || /usr/bin/gums-host-cron

yum install tomcat6
chkconfig tomcat6 on
cf-agent run
# Check that the http certs are owned by tomcat. They were not, so:
chown tomcat.tomcat /etc/grid-security/http/*.pem
service tomcat6 start

chkconfig --add gratia-probes-cron
chkconfig gratia-probes-cron on
service gratia-probes-cron start


Clean install of OSG 3.4

In December 2017, an SL7.4 gatekeeper, gate01.aglt2.org, was built from Cobbler with a full install from cfengine3, including all OSG repos. It was set to use OSG 3.4: the generic rpm osg-3.4-el7-release-latest.rpm resolved to osg-release-3.4-2.osg34.el7. In short, run Cobbler to build the gatekeeper, then run cfengine3 repeatedly until it returns no errors to configure it.

Currently (12/21/2017) this is running with only a small sub-cluster in 30-gip.ini (/etc/osg/config.d) and has about 40 WN slots backing it. The OIM Resource AGLT2_PROD is defined, but AGIS is broken and so no PandaQueue can be cloned until January.

Following multiple initial cf-agent runs, the directions at this OSG URL were followed to set the machine going. This seems to boil down to "yum install osg-ce-condor". Upon install, condor-cron is enabled to run, but rsv, condor-ce and condor are not. We found the following manual actions were required to get this gatekeeper going:

  • systemctl enable rsv
  • systemctl enable condor-ce
  • systemctl enable condor
  • osg-configure -v
  • osg-configure -c
  • systemctl start rsv
  • systemctl start condor
  • systemctl start condor-ce
  • rsv-control --run --all-enabled
  • chkconfig --add gratia-probes-cron
  • chkconfig gratia-probes-cron on
  • service gratia-probes-cron start
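The manual actions listed above can be collected into one script that keeps the same order. This is a sketch, not a tested site procedure: run and bring_up_ce are hypothetical names, and by default it only prints the commands (DRY_RUN=1) so the sequence can be reviewed before anything is executed.

```shell
# Sketch of the manual bring-up steps listed above, in the same order.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up_ce() {
    for svc in rsv condor-ce condor; do
        run systemctl enable "$svc" || return 1
    done
    run osg-configure -v || return 1    # verify first; stop here if it fails
    run osg-configure -c || return 1
    for svc in rsv condor condor-ce; do
        run systemctl start "$svc" || return 1
    done
    run rsv-control --run --all-enabled
    run chkconfig --add gratia-probes-cron
    run chkconfig gratia-probes-cron on
    run service gratia-probes-cron start
}
```

Calling bring_up_ce with DRY_RUN unset prints the twelve commands. Stopping when "osg-configure -v" fails is a design choice of this sketch, so that a configuration that does not verify is never applied.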

Various submit tests were successfully performed.

Notes on installing osg 3.4 on gate02, April, 2018

The repo would not install from cfe, and was manually installed

AFTER THE FACT NOTE; CFENGINE WAS NOT PROPERLY CONFIGURED TO SAY THAT GATE02 WAS SL7, HENCE THIS ERROR IN SETTING UP THE REPO

Had to manually "yum install osg-version"

The rsv service does not need to run, but, we see that the "org.osg.general.vo-supported" probe is now deprecated (both gate01 and gate02)

Remembered to dump from old gate02, and re-copy to new gate02, the condor job plot files from /var/www/html/Monitoring
  • count_MP8_logs directory
  • count_logs directory
  • MPmon_count.log
  • count_jobs.log

From /opt/condor/scan, compile the .c program, and restore scanHistoryTime

The ruby install documented here does not work, ruby crashes

REMEMBER, run the rsv user cron once, manually, to generate a grid proxy.

Manually mount /pnfs after cfe adds it to /etc/fstab

CFEngine issues on initial gatekeeper build

The lines of policy in osg_ce_condor.cf that modify the cron intervals for 2 probes do not work with the out-of-the-box rpms, because they do not take into account that the intervals are commented out. This should be fixed at some point.

-- WenjingWu - 09 Jul 2008

Topic attachments
  • gate01-config.ini (4 K, 23 Aug 2009 - 23:06, BenMeekhof): gate01:/opt/osg-1.2/osg/etc/config.ini for reference
  • gate02-config.ini (3 K, 23 Aug 2009 - 23:02, BenMeekhof): gate02:/opt/osg-1.2/etc/config.ini for reference
Topic revision: r42 - 17 May 2018, BobBall