Fill the umatlas repository

Download the rpms

Download the relevant rpms from the HTCondor stable yum repo (http://research.cs.wisc.edu/htcondor/yum/stable) into the umatlas repo.

Because the central server aglbatch runs SL6, make sure to download both the el6 and the el7 packages (x86_64).

Taking el6 as an example (the el7 directory should contain the equivalent rpms):

-bash-4.2$ hostname
sysprov02.aglt2.org
-bash-4.2$ pwd
/home/packagers/wuwj/umatlas_test/packages/el6/x86_64
-bash-4.2$ ls condor*
condor-8.6.13-1.el6.x86_64.rpm      condor-classads-8.6.13-1.el6.x86_64.rpm        condor-external-libs-8.6.13-1.el6.x86_64.rpm  condor-kbdd-8.6.13-1.el6.x86_64.rpm   condor-python-8.6.13-1.el6.x86_64.rpm
condor-all-8.6.13-1.el6.x86_64.rpm  condor-classads-devel-8.6.13-1.el6.x86_64.rpm  condor-externals-8.6.13-1.el6.x86_64.rpm      condor-procd-8.6.13-1.el6.x86_64.rpm

Please note:

Use "rpm -qa | grep condor" on a node to find out which condor rpms to download, and update the umatlas repo from sysprov02.
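As a minimal sketch, assuming a bash shell and that the rpms follow the name-version-release.el.arch.rpm pattern shown in the listing above, the installed condor package names can be turned into the list of files to fetch. The function name condor_rpm_names is hypothetical, and 8.6.13 / 1 are only the example values from that listing.

```shell
# Hypothetical helper: read bare condor package names on stdin and print the
# rpm file names for a given version, release and EL tag, e.g.
#   condor-python -> condor-python-8.6.13-1.el6.x86_64.rpm
# Usage: condor_rpm_names VERSION RELEASE EL [ARCH]   (ARCH defaults to x86_64)
condor_rpm_names() {
  local version=$1 release=$2 el=$3 arch=${4:-x86_64}
  # de-duplicate names (C locale for a stable order), skip blank lines
  LC_ALL=C sort -u | while read -r name; do
    [ -n "$name" ] || continue
    printf '%s-%s-%s.%s.%s.rpm\n' "$name" "$version" "$release" "$el" "$arch"
  done
}
```

For example, on an el7 node: rpm -qa --qf '%{NAME}\n' | grep '^condor' | condor_rpm_names 8.6.13 1 el7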

Because the osg repo also provides condor rpms, make sure condor is excluded in osg.repo (or osg-el6.repo), so that yum does not pull condor from the osg repo:
-bash-4.2$ more /etc/yum.repos.d/osg.repo
[osg]
exclude=*condor*
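The exclude can be checked from a script. This is a minimal sketch: repo_excludes_condor is a hypothetical helper, it assumes the exclude= entry sits on a single line inside the [osg] section (pass a different section name if osg-el6.repo uses another one), and it only checks that the line mentions condor.

```shell
# Hypothetical helper: succeed if the given .repo file has an exclude= line
# mentioning condor inside the named section (default: osg).
# Usage: repo_excludes_condor FILE [SECTION]
repo_excludes_condor() {
  local file=$1 section=${2:-osg}
  awk -v sec="[$section]" '
    /^\[/ { in_sec = ($0 == sec); next }     # track which section we are in
    in_sec && /^exclude[[:space:]]*=/ && /condor/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

For example: repo_excludes_condor /etc/yum.repos.d/osg.repo || echo "condor is NOT excluded in osg.repo"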

Update condor via yum

#yum clean all; yum --enablerepo=umatlas_test update 'condor*'
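After the update it may be worth confirming that every condor package ended up at the same version, since a partial update would leave a mix. A minimal sketch, assuming the package list comes from rpm's query format; condor_versions_consistent is a hypothetical helper name.

```shell
# Hypothetical helper: read "NAME VERSION-RELEASE" lines on stdin, print the
# distinct version-release strings of condor* packages, and succeed only if
# there is exactly one of them.
condor_versions_consistent() {
  local versions
  versions=$(awk '$1 ~ /^condor/ { print $2 }' | LC_ALL=C sort -u)
  printf '%s\n' "$versions"
  [ -n "$versions" ] && [ "$(printf '%s\n' "$versions" | wc -l)" -eq 1 ]
}
```

For example: rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' | condor_versions_consistent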

The procedure varies by node type.

Worker nodes

  • Retire condor on the worker node first and wait until all jobs have finished. Otherwise the update stops the condor services and the jobs running on the node fail.
  • Update condor via yum
#yum --enablerepo=umatlas_test update 'condor*'
  • Reconfigure condor
#cf-agent -Kf failsafe.cf;cf-agent -K -b condor_t2
  • Make sure the worker node passes the sanity checks; if it does, condor is started again.

#sh /root/tools/health_check
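The worker-node steps can be chained into one script. This is a sketch only: the retire step is an assumption (condor_off -peaceful -startd, which lets running jobs finish before the startd exits), so substitute the local retire procedure. count_running_jobs, wait_until_idle and update_worker_node are hypothetical helper names, and the 300 s polling interval is arbitrary.

```shell
# Hypothetical: count claimed slots on this node (roughly, running jobs).
count_running_jobs() {
  condor_status -startd "$(hostname -f)" -constraint 'State == "Claimed"' -af Name | wc -l
}

# Poll a counting command until it prints 0.
# Usage: wait_until_idle SECONDS COMMAND [ARGS...]
# Returns 1 if the counting command itself fails (so we never update blindly).
wait_until_idle() {
  local interval=$1 n
  shift
  while :; do
    n=$("$@") || return 1
    [ "$n" -eq 0 ] && return 0
    echo "$n job(s) still running; sleeping ${interval}s"
    sleep "$interval"
  done
}

# Hypothetical wrapper for the steps listed above (run as root).
update_worker_node() {
  condor_off -peaceful -startd || return 1        # ASSUMED retire step
  wait_until_idle 300 count_running_jobs || return 1
  yum --enablerepo=umatlas_test update 'condor*' || return 1
  cf-agent -Kf failsafe.cf && cf-agent -K -b condor_t2 || return 1
  sh /root/tools/health_check                     # condor restarts only if the node passes
}
```

Stopping at the first failure (the || return 1 checks) is a deliberate choice here: a node that did not drain or did not update cleanly should not go on to the health check and start accepting jobs again.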

Interactive machines (umt3int0X)

  • Update condor via yum

#yum clean all; yum --enablerepo=umatlas_test update 'condor*'
  • Reconfigure condor

#cf-agent -Kf failsafe.cf;cf-agent -K -b condor_t2;cf-agent -K -b umt3int
  • Check if condor is running

#systemctl status condor

The update shuts down the condor services, but jobs that have already been submitted are not lost.

Central server (aglbatch.aglt2.org, running SLC6)

  • Update condor via yum

#yum clean all; yum --enablerepo=umatlas_test update 'condor*'
  • Reconfigure condor
#cf-agent -Kf failsafe.cf;cf-agent -K -b condor_t2
  • Check if condor is running
#service condor status

Gatekeepers (gate01/02/03)

On the gatekeepers, condor is installed from osg.repo, so it is brought up to date whenever we upgrade the OSG software.

-- WenjingWu - 07 Feb 2020
Topic revision: r1 - 07 Feb 2020 - 20:30:38 - WenjingWu