Setting up gate02.grid.umich.edu as our AGL-Tier2 Gatekeeper

There are a number of steps we followed to get our new gatekeeper running.

Hardware

We had an Intel SE7520AF2 motherboard from VENUS.ultralight.org (at CERN, bldg 513) which we replaced. We used this motherboard as the basis for the new gatekeeper. We purchased an Intel chassis SC5300BASE to house it. Initial installs failed (problem with motherboard) so we RMA'ed the board. Received the replacement on September 6th. Processors are dual Intel Xeon P4 3.6 GHz with 2MB cache. We have 4 GB of DDR2 400 RAM.

Completed install of SLC V4.3 on Thursday, September 6th.

Software

After installing and upgrading (via YUM) SLC V4.3 x86_64, we tried to install Intel Server Management software V8.4. The install was mostly successful except for the IMB driver build, where the newest version of gcc (V3.4.6) complains:

gcc -O2 -DLINUX -D__KERNEL__ -DMODULE -DMODULES -I. -I/lib/modules/2.6.9-42.0.2.EL.cernsmp/build/include -DCONFIG_IA64_GENERIC -D__SMP__   -c -o imb_lin.o imb_lin.c
imb_lin.c: In function `IMBmmap':
imb_lin.c:891: error: dereferencing pointer to incomplete type
imb_lin.c:910: error: dereferencing pointer to incomplete type
imb_lin.c:910: error: dereferencing pointer to incomplete type
imb_lin.c:910: error: dereferencing pointer to incomplete type
imb_lin.c:910: error: dereferencing pointer to incomplete type
make: *** [imb_lin.o] Error 1

Older version (V3.4.5) doesn't have this problem.

Need to install/setup an alternate gcc on gate02 to allow us to compile.

September 13: No, the problem was that we were apparently using an older version of ipmidrvr.rpm. I copied the code from gate01 and things worked.

Setup Authentication using AFS/NIS/Kerberos

We needed to install our AFS software and configure NIS/Kerberos:

  • We first copied the /etc/yum.conf and /etc/yum.repos.d from gate01 to gate02
  • We installed openafs V1.4.1-1.4 from linat05 (our repository) after removing the CERN openafs
  • We copied the /etc/krb5.conf from gate01
  • We copied the /etc/nsswitch.conf from gate01
  • We copied the /etc/pam.d/system-auth from gate01

At this point logins worked and users would get Kerberos TGTs but not AFS tokens. We needed to copy the /etc/ssh/sshd_config from gate01 and restart sshd. After this users get both Kerberos TGTs and AFS tokens.
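
The file copies described in this section could be scripted roughly as follows. This is only a sketch: it assumes root SSH access from gate02 to gate01, and the helper name print_auth_copy_cmds is hypothetical. It prints the commands instead of running them, so they can be reviewed before anything on gate02 is overwritten.

```shell
#!/bin/sh
# Sketch: print (not run) the scp commands that would pull the
# authentication-related config files from another host, followed by
# the sshd restart. print_auth_copy_cmds is a hypothetical helper name.
print_auth_copy_cmds() {
    src_host="$1"
    for f in /etc/krb5.conf /etc/nsswitch.conf /etc/pam.d/system-auth /etc/ssh/sshd_config; do
        echo "scp -p root@${src_host}:${f} ${f}"
    done
    echo "service sshd restart"
}

print_auth_copy_cmds gate01
```

Piping the output to sh would execute the steps; keeping the print and run stages separate makes it easier to spot a wrong path first.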

Michigan node configuration

We have a number of tasks to get the node working as part of our cluster:
  • Set up yum correctly
  • Set up net-snmp
  • Install correct smartd and configure to monitor local disks
  • Add node to Cacti
  • Set up iptables

Setup OSG software

We had already installed the OSG software on gate01.grid.umich.edu. Our plan is to use this installation for gate02 as well. The software was installed in AFS at /afs/atlas.umich.edu/OSG. Details:

[root@gate02 src]# fs lsmount /afs/atlas.umich.edu/OSG64
'/afs/atlas.umich.edu/OSG64' is a mount point for volume '#OSG_041'
[root@gate02 src]# fs lsmount /afs/atlas.umich.edu/OSG32
'/afs/atlas.umich.edu/OSG32' is a mount point for volume '#OSG32_041'
[root@gate02 src]# fs lsmount /afs/atlas.umich.edu/OSG
'/afs/atlas.umich.edu/OSG' is a mount point for volume '%OSG32_041'
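
In the fs lsmount output, the character in front of the volume name shows the mount point type: '#' is a regular mount point and '%' is a read-write mount point. A minimal sketch for classifying such lines follows; the helper name describe_mount is hypothetical and the parsing assumes the exact output wording shown above.

```shell
#!/bin/sh
# Sketch: classify one line of 'fs lsmount' output by its volume prefix.
# '#' marks a regular mount point, '%' a read-write mount point.
# describe_mount is a hypothetical helper name.
describe_mount() {
    # Extract the quoted volume name that ends the line.
    vol=$(printf '%s\n' "$1" | sed -n "s/.*volume '\(.*\)'\$/\1/p")
    case "$vol" in
        '#'*) echo "regular ${vol#?}" ;;
        '%'*) echo "read-write ${vol#?}" ;;
        *)    echo "unknown $vol" ;;
    esac
}

describe_mount "'/afs/atlas.umich.edu/OSG' is a mount point for volume '%OSG32_041'"
```

For the three mount points above, this reports OSG64 and OSG32 as regular mounts and OSG as a read-write mount of the OSG32_041 volume.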

I copied a number of services from gate01 to gate02:
  • MLD
  • edg-crl-upgraded
  • gris
  • globus-ws
  • mysql

I also copied gsiftp and globus-gatekeeper from gate01:/etc/xinetd.d to gate02.

Last change was to add the above services to /etc/services:
gsiftp  2811/tcp        # Added by the VDT
globus-gatekeeper       2119/tcp        # Added by the VDT
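
Adding these entries by hand is easy to repeat by accident when a node is rebuilt. The sketch below appends an entry only if the service name is not already present. The helper name add_service is hypothetical, and the demonstration writes to a scratch file, not to /etc/services.

```shell
#!/bin/sh
# Sketch: append a service entry only when that service name is missing.
# add_service is a hypothetical helper; the file path is an argument so
# the function can be tried on a scratch copy first.
add_service() {
    file="$1"; name="$2"; port_proto="$3"; comment="$4"
    # Match the service name at the start of a line, followed by whitespace.
    if grep -q "^${name}[[:space:]]" "$file"; then
        return 0
    fi
    printf '%s\t%s\t# %s\n' "$name" "$port_proto" "$comment" >> "$file"
}

scratch=$(mktemp)
add_service "$scratch" gsiftp 2811/tcp "Added by the VDT"
add_service "$scratch" globus-gatekeeper 2119/tcp "Added by the VDT"
add_service "$scratch" gsiftp 2811/tcp "Added by the VDT"   # no-op: already present
cat "$scratch"
rm -f "$scratch"
```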

Reinstallation of OSG software

The OSG software needed an update and running 'pacman -update' left it in a broken state. I decided to reinstall the OSG software on September 19, 2006 into /afs/atlas.umich.edu/OSG. I first moved the existing install:
cd /afs/atlas.umich.edu/OSG
mkdir OSG_Old_Install
mv * OSG_Old_Install
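
With these commands the glob also matches OSG_Old_Install itself, so mv will typically complain that it cannot move the directory into a subdirectory of itself while still moving everything else. A slightly more careful sketch is shown below; the helper name archive_contents is hypothetical. Like the plain glob, it does not pick up dotfiles.

```shell
#!/bin/sh
# Sketch: move everything in a directory into a new subdirectory,
# skipping the subdirectory itself. archive_contents is a hypothetical name.
archive_contents() {
    dir="$1"; dest="$2"
    mkdir -p "$dir/$dest" || return 1
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue               # nothing matched the glob
        [ "$entry" = "$dir/$dest" ] && continue   # do not move the target into itself
        mv "$entry" "$dir/$dest/" || return 1
    done
    return 0
}

# Demonstration on a scratch directory, not on the real AFS path.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b"
archive_contents "$demo" OSG_Old_Install
ls "$demo"
rm -rf "$demo"
```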

I then got Pacman and unpacked it into /afs/atlas.umich.edu/OSG/pacman-3-18.5

I ran 'script install_OSG_gate02.log' (to log the session) and then:
export VDT_PRETEND_32=1
pacman -allow save-setup
pacman -get OSG:ce

Started at 10:31 AM. Answered some trust questions and one about installing on this platform.

Details of setup and config are on OSGInstallGate02

Finished pacman install at 10:47 AM.

Configuration of Accounts/Permission/Authorization

Testing of access to gatekeeper

Installation of DQ2 Software

Setup Torque client software

We need to set up the Torque (OpenPBS) client for gate02. Part of a message from Andy Caird (CAC) follows:
You can download Torque from http://www.clusterresources.com/downloads/torque/ - we're running 2.1.2 on nyx.  I've opened up the Torque ports on nyx to 141.211.43.122, so when you get it compiled, you should be able to type a command like:
   qstat @nyx
and get some output.

The subnet that the nyx nodes will come from is 141.212.30.0/28 (141.212.30.0-141.212.31.255).

Let us know if you have any questions about Torque.

--andy
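
The prefix length and the address range in the quoted message do not describe the same block, and a short calculation shows which prefix matches the range. The sketch below uses only POSIX shell arithmetic (assuming 64-bit shell integers); the helper names are hypothetical. A /28 spans 16 addresses, whereas 141.212.30.0-141.212.31.255 spans 512 addresses, which is a /23, so it is worth confirming the intended subnet with CAC before writing any firewall rule for the nyx nodes.

```shell
#!/bin/sh
# Sketch: print the first and last address of an IPv4 CIDR block.
# ip_to_int, int_to_ip and cidr_range are hypothetical helper names.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                      # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}
cidr_range() {
    addr=${1%/*}; bits=${1#*/}
    size=$(( 1 << (32 - bits) ))
    # Align the address down to the start of its block.
    start=$(( $(ip_to_int "$addr") / size * size ))
    echo "$(int_to_ip "$start")-$(int_to_ip $(( start + size - 1 )))"
}

cidr_range 141.212.30.0/28   # prints 141.212.30.0-141.212.30.15
cidr_range 141.212.30.0/23   # prints 141.212.30.0-141.212.31.255
```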

I downloaded the source to gate02:/root/ and unpacked it in /usr/local/src:

[root@gate02 ~]# wget http://www.clusterresources.com/downloads/torque/torque-2.1.2.tar.gz
[root@gate02 ~]# cd /usr/local/src
[root@gate02 src]# tar -zxvf /root/torque-2.1.2.tar.gz 
torque-2.1.2/
torque-2.1.2/contrib/
...
<verbatim>

I then built/installed the 32bit version:

#  Similarly, 32bit builds on an x86_64 platform:
./configure CC="gcc -m32"
make
NOTE: this failed with the messages below:
<verbatim>
gcc -m32 -DHAVE_CONFIG_H -I. -I. -I../../src/include  -I../../src/include  -I/usr/X11R6/include -DPBS_SERVER_HOME=\"/var/spool/torque\" -DPBSPD=\"/usr/local/bin/pbspd\" -g -O2 -D_LARGEFILE64_SOURCE -c `test -f 'qstat.c' || echo './'`qstat.c
qstat.c:107:23: tclExtend.h: No such file or directory
qstat.c: In function `attrlist':
qstat.c:1576: warning: passing arg 2 of `Tcl_Merge' from incompatible pointer type
qstat.c:1581: warning: passing arg 2 of `Tcl_Merge' from incompatible pointer type
qstat.c: In function `tcl_stat':
qstat.c:1673: warning: passing arg 2 of `Tcl_Merge' from incompatible pointer type
qstat.c:1678: warning: passing arg 2 of `Tcl_Merge' from incompatible pointer type
qstat.c:1682: warning: passing arg 2 of `Tcl_Merge' from incompatible pointer type
qstat.c: In function `tcl_run':
qstat.c:1709: warning: assignment discards qualifiers from pointer target type
make[2]: *** [qstat.o] Error 1
make[2]: Leaving directory `/usr/local/src/torque-2.1.2/src/cmds'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/local/src/torque-2.1.2/src'
make: *** [all-recursive] Error 1
</verbatim>
The fatal error is the missing tclExtend.h header, so I retried the configure step with Tcl support disabled:
./configure CC="gcc -m32" --without-tcl
This worked.  Then:
make install_clients

After this qstat works:
<verbatim>
[gate02:torque-2.1.2]# /usr/local/bin/qstat @nyx.engin.umich.edu
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
14436.nyx           insds3           fiedler         473:11:2 R violi          
14898.nyx           BGO4W            wangyi          123:36:4 R long           
14899.nyx           BGO4W            wangyi          121:37:2 R long           
...
</verbatim>
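
Once qstat answers, a quick sanity check is to summarize what it returns. The sketch below counts running jobs per queue; it assumes the six-column default layout shown above (Job id, Name, User, Time Use, S, Queue), and the helper name count_running is hypothetical. The sample input repeats the three jobs listed above.

```shell
#!/bin/sh
# Sketch: count running jobs per queue from default 'qstat' output on stdin.
# Assumes six whitespace-separated columns per job line, with the state
# in column 5 and the queue in column 6. count_running is a hypothetical name.
count_running() {
    awk 'NR > 2 && NF >= 6 && $5 == "R" { n[$6]++ }
         END { for (q in n) print q, n[q] }' | sort
}

count_running <<'EOF'
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
14436.nyx           insds3           fiedler         473:11:2 R violi
14898.nyx           BGO4W            wangyi          123:36:4 R long
14899.nyx           BGO4W            wangyi          121:37:2 R long
EOF
```

Against the live server the same function would be used as: /usr/local/bin/qstat @nyx.engin.umich.edu | count_running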

Testing NYX (Opteron, North Campus) Access

End-to-End Testing of PANDA Submission


-- Main.ShawnMcKee - 11 Sep 2006
</verbatim>

Topic revision: 19 Sep 2006, ShawnMcKee