Procedures for building a Rocks 6.1 Frontend

Background

Please also refer to Build Rocks 5.5 Frontend

The Rocks 6.1 jumbo roll comes with CentOS built in. However, it was non-trivial to figure out how to install with SL6.4. Investigating, I found an email from Philip Papadopoulos of SDSC dated 9/27/2012, in which he described how to create an SL roll that Rocks 6.1 could use:

rocks create mirror http://ftp.scientificlinux.org/linux/scientific/6.3/x86_64/os/Packages/ rollname=SL version=6.3

The current SL version is 6.4, and we have a nightly mirror of this repo, so I used the following command:

rocks create mirror http://linat05.grid.umich.edu/pub/SL/6x/x86_64/os/Packages/ rollname=SL version=6.4

This created the ISO in the directory from which the command was run, and using it I successfully built umrocks6.aglt2.org. See below for complete details and file locations.

VM setup

Create a VM with:

  • 2 CPUs
  • 4GB memory
  • a 40GB volume for / (34GB), /var/cache/openafs (4GB), and swap (2GB)
  • a 200GB volume for /export
  • a 20GB volume for /var
  • 2 NICs, one on local and one on public AGLT2 VLANs
    • The VMXNET 3 NIC driver is recognized by this version of Rocks

Create an SL6 iso useful for the Rocks6 frontend build

The standard SL6 ISO DVDs are not useful when installing a Rocks6 frontend; unlike with Rocks 5.5, they are not recognized. Instead, create one using the "rocks create mirror" command, pointed at the local SL6 mirror at UM, which is updated nightly. Run this on the current Rocks 5.5 headnode, umrocks.

Copy the iso to a useful location
  • cp SL-6.4-0.x86_64.disk1.iso /atlas/data08/ball/admin/rocks

Making an OS update roll

For doing an SL update, the same procedure can be used on the new umrocks6 head node after it is built. That will give a local copy of all the SL security rpms, which can then be used for the update process. Make the repo from http://linat05.grid.umich.edu/pub/SL/6x/x86_64/updates/security/
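
By analogy with the SL 6.4 command above, the invocation would look something like the following; the rollname and version here are illustrative, and the actual agl-update-sl6 roll is maintained in svn:

  rocks create mirror http://linat05.grid.umich.edu/pub/SL/6x/x86_64/updates/security/ rollname=agl-update-sl6 version=0.01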

Using that repo, the procedure described here will work. The roll in this case is called agl-update-sl6 and is checked into svn.

ROCKS frontend kickstart

The ROCKS docs for the frontend build (screen shots, etc.) are here.

Fetch the ROCKS 6.1 jumbo DVD; as of May 15, 2013, this includes the current 6.1 service pack. Place it and the SL6.4 ISO on the MD3600i volume. Attach the jumbo DVD to the CD device of the VM, set it to mount at power-on, and make sure the VM boots first from that device. In the console of the machine, enter the following command to start the process:

build ksdevice=eth1 asknetwork

As usual, eth1 is the public network. Configure networking using the manual configuration option (IPv4 only), setting the address info for the public interface and a DNS server visible from the public NIC. The installer will then switch to graphical mode.

Select these 5 rolls from the ROCKS Jumbo DVD:

  • base
  • ganglia
  • kernel
  • web-server
  • service pack
Then provide the SL6.4 ISO and select the roll name presented.

Select manual partitioning and create the ext4 partitions as indicated above. Switch the attached iso files as requested.

Boot into Rocks 6

Configure various Rocks attributes

Various Rocks attributes, such as the latitude and longitude of the cluster, are configured at build time above. To see what these are currently set to, do "rocks list attr" on the Rocks 5.5 headnode of the cluster. However, others may need setting, and the current appliance and host definitions are not (yet) transferred. To get the current set from umrocks, I used the shell script at /export/rocks/svn-trunk/hostconfigs/msurxx.aglt2.org/tools/rocks-db-dump.sh, but made one modification:

[root@umrocks tools]# diff rocks-db-dump.sh /root/tools/rocks-db-dump.sh
5a6
> mkdir -p $outdir

With that change on umrocks, I then produced the dumps and proceeded to make a few shell scripts based upon them. The shell history below shows what was done, as near as I can reproduce it a few weeks after the fact.

  701  /root/tools/rocks-db-dump.sh
  702  cd /var/svn/umrocks.aglt2.org/rocks-db
  719  cp rocks-dump.out rocks-dump.sh
  725  sed -i /rocks\ add\ appliance\ attr/d rocks-dump.sh
  729  sed -i /add\ host/d rocks-dump.sh

Using your favorite editor, delete any lines that are no longer needed or are duplicates, for example, the built-in appliance configurations (nas, frontend, compute, ...).
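
For the built-in appliances, that cleanup can also be scripted; a minimal sketch, assuming the dump lines have the form shown in rocks-dump.sh below:

  sed -i -e '/add appliance compute /d' \
         -e '/add appliance frontend /d' \
         -e '/add appliance nas /d' rocks-dump.sh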

Delete any lines whose content was already added during the initial Rocks6 build above.

I found the PrivateDNSServers were incorrect, so I set the attribute the way I wanted it, i.e., UM private first, then MSU private:

rocks set attr Kickstart_PrivateDNSServers 10.10.1.195,10.10.128.8

The file I ended up with is shown below. Some lines are commented out, as I did not know whether I would need them later.

  753  grep "add\ host" rocks-dump.out > rocks-dump-add-host.sh
  756  sed -i -e 'sZ\"add\ host\"Zadd\ hostZ' rocks-dump-add-host.sh

Delete any host information where the host no longer exists.

  761  sed /sysdevice/d rocks-dump-hostinfo.out >rocks-dump-hostinfo.sh
  762  vi rocks-dump-hostinfo.sh
  763  sed -i /condor_conf/d rocks-dump-hostinfo.sh
  764  sed -i /gmond_conf/d rocks-dump-hostinfo.sh

Again, delete any host information where the host no longer exists.
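
A one-liner sketch of such a deletion, using a hypothetical decommissioned host name:

  sed -i '/bl-old-1/d' rocks-dump-add-host.sh rocks-dump-hostinfo.sh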

[root@umrocks6 create_db]# cat rocks-dump.sh
/opt/rocks/bin/rocks set attr Kickstart_PrivateDNSServers 10.10.1.195,10.10.128.8
/opt/rocks/bin/rocks add appliance bl graph=default node=bl membership=Dell-Blade public=yes
/opt/rocks/bin/rocks add appliance bl-mdisk graph=default node=bl-mdisk membership=Dell-Blade-JBOD public=yes
/opt/rocks/bin/rocks add appliance ci graph=default node=ci membership=Interactive public=yes
/opt/rocks/bin/rocks add appliance dc2 graph=default node=dc2 membership=Dell-Compute-JBOD public=yes
/opt/rocks/bin/rocks add appliance dc graph=default node=dc membership=Dell-compute public=yes
/opt/rocks/bin/rocks add appliance kickstart membership=Kickstart public=yes
/opt/rocks/bin/rocks add appliance sx graph=default node=sx membership=Sun-X4600 public=yes
/opt/rocks/bin/rocks add attr agl_site UM
/opt/rocks/bin/rocks add attr cfe_policy_host 10.10.2.51
/opt/rocks/bin/rocks add attr cfe_policy_path /var/cfengine/masterfiles-rocks
/opt/rocks/bin/rocks add attr primary_net public
/opt/rocks/bin/rocks add attr primary_net_ssh private
# /opt/rocks/bin/rocks add attr ssh_use_dns true
/opt/rocks/bin/rocks add network physics98 subnet=141.213.133.192 netmask=255.255.255.224 mtu=1500
# /opt/rocks/bin/rocks add os attr linux HttpConf /etc/httpd/conf
# /opt/rocks/bin/rocks add os attr linux HttpConfigDirExt /etc/httpd/conf.d
# /opt/rocks/bin/rocks add os attr linux HttpRoot /var/www/html
# /opt/rocks/bin/rocks add os attr linux RootDir /root


Execute rocks-dump.sh on the Rocks6 host. Retain the other two shell scripts until they are needed.

Check the network configuration

The network routing tables may not be correct, either for the frontend or for the Rocks build parameters, so fix them. At UM, this consisted of:

------------------- Rocks6 begin --------------------
[root@umrocks ~]# vi /etc/sysconfig/static-routes

At this point on Rocks 6.1, /etc/sysconfig/static-routes was changed, commenting everything out and inserting these 3 lines:

any net 224.0.0.0 netmask 240.0.0.0 dev eth0
any net 10.10.128.0 netmask 255.255.240.0 gw 10.10.1.2
any net 10.1.0.0 netmask 255.255.254.0 gw 10.10.1.2

Restart the network now:
service network restart
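
As an optional sanity check (not part of the original procedure), confirm the kernel routing table picked up the static routes:

  route -n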

Now, fix up the Rocks DB entries to which these correspond
------------------- Rocks6 end --------------------
[root@umrocks ~]# rocks list route
NETWORK          NETMASK         GATEWAY
224.0.0.0:       255.255.255.0   private
255.255.255.255: 255.255.255.255 private
0.0.0.0:         0.0.0.0         10.10.1.42
192.41.230.42:   255.255.255.255 10.10.1.42

------------------- Rocks6 and Rocks 5 start  --------------------
[root@umrocks ~]# rocks remove route 0.0.0.0
[root@umrocks ~]# rocks remove route 255.255.255.255
[root@umrocks ~]# rocks remove route 192.41.230.42
[root@umrocks ~]# rocks remove route 224.0.0.0
[root@umrocks ~]# rocks add route 0.0.0.0 192.41.230.1 netmask=0.0.0.0
[root@umrocks ~]# rocks add route 10.1.0.0 10.10.1.2 netmask=255.255.252.0
[root@umrocks ~]# rocks add route 10.10.128.0 10.10.1.2 netmask=255.255.240.0
[root@umrocks ~]# rocks add route 224.0.0.0 eth0 netmask=240.0.0.0
------------------- Rocks6 and Rocks 5 end  --------------------

[root@umrocks ~]# rocks list route
NETWORK        NETMASK         GATEWAY
0.0.0.0:       0.0.0.0         192.41.230.1
10.1.0.0:      255.255.252.0   10.10.1.2
10.10.128.0:   255.255.240.0   10.10.1.2
224.0.0.0:     240.0.0.0       eth0

[root@umrocks ~]# rocks sync config

For MSU, the following settings should be implemented:

/etc/sysconfig/static-routes should be:
any net 224.0.0.0 netmask 240.0.0.0 dev eth0
any net 10.10.0.0 netmask 255.255.240.0 gw 10.10.128.1
any net 10.1.0.0 netmask 255.255.254.0 gw 10.10.128.1

[root@msurx6 ~]# rocks list route
NETWORK    NETMASK       GATEWAY
0.0.0.0:   0.0.0.0       192.41.236.1
10.1.0.0:  255.255.252.0 10.10.128.1
10.10.0.0: 255.255.240.0 10.10.128.1
224.0.0.0: 240.0.0.0     eth0
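
By analogy with the UM transcript above, the Rocks DB entries behind that table can be set as follows; this is a sketch derived from the listed output (the default routes would first be removed, as was done at UM):

[root@msurx6 ~]# rocks add route 0.0.0.0 192.41.236.1 netmask=0.0.0.0
[root@msurx6 ~]# rocks add route 10.1.0.0 10.10.128.1 netmask=255.255.252.0
[root@msurx6 ~]# rocks add route 10.10.0.0 10.10.128.1 netmask=255.255.240.0
[root@msurx6 ~]# rocks add route 224.0.0.0 eth0 netmask=240.0.0.0
[root@msurx6 ~]# rocks sync config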

Set the local domain name at MSU

SSH to the frontend as root and ...

Make sure that the ROCKS DB has the correct name for the private network ("msulocal") and that the /etc/hosts file includes the correct private entry for the host.

$ rocks set network zone private msulocal
$ rocks set attr Kickstart_PrivateDNSDomain msulocal
$ rocks sync config

$ cat /etc/hosts | grep msurx
10.10.128.11    msurx.msulocal  msurx
192.41.236.11   msurx.aglt2.org

Following this, the /etc/hosts file on any WN built should also have a .msulocal entry for itself.
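
For example, a worker node's own entry would look like this (hostname and address here are hypothetical):

10.10.128.51    c-101-1.msulocal  c-101-1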

Set up the cvmfs account entries

Add the following two lines to /etc/group
  • fuse:x:301:cvmfs
  • cvmfs:x:302:

Add this line to /etc/passwd
  • cvmfs:x:302:302:CernVM-FS service account:/var/cache/cvmfs2:/sbin/nologin
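
Equivalently, the accounts can be created with the standard tools instead of editing the files directly; a sketch matching the UIDs/GIDs above:

  groupadd -g 301 fuse
  groupadd -g 302 cvmfs
  # -M: do not create a home directory for the service account
  useradd -u 302 -g cvmfs -G fuse -M -c "CernVM-FS service account" -d /var/cache/cvmfs2 -s /sbin/nologin cvmfs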

Configure using cfengine3

Grab the tar file cf3_bootstrap.tar from /atlas/data08/ball/admin/fs_saves_dir, and unpack it in, say, /root:

  • tar xf cf3_bootstrap.tar
  • cd cf3_bootstrap
  • ./bootstrap_cf3.sh
The tar file and the created directory can now be deleted. CF3 is pointed at masterfiles-T2 on the site-specific cf3 server. Now cf3 can be run:
  • source /etc/profile.d/cfengine3.sh
  • cf-agent -f failsafe.cf; cf-agent
Run it a second time, for grins. Add the -K argument, though, to ignore cf-agent's locks, as the second run is otherwise too close in time to the first.
  • cf-agent -K

SVN and Bind Mounts

Only a portion of the Rocks tree is needed/checked out, so do the following:

  • cd /export/rocks
  • mkdir svn-trunk
  • cd svn-trunk
  • set up your ssh key for accessing svn
  • svn co svn+ssh://ndt.aglt2.org/repos/rocks/trunk/tools
  • svn co svn+ssh://ndt.aglt2.org/repos/rocks/trunk/rolls-src
  • mkdir /export/rocks/install/tools
  • Add the bind mount to /etc/fstab, adding this line
    • /export/rocks/svn-trunk/tools /export/rocks/install/tools none bind 0 0
  • Mount the bind mount
    • mount /export/rocks/install/tools
  • Add the following alias to /root/.bashrc
    • alias build_distro="/root/tools/build_distro.sh"
    • The build_distro.sh script is distributed via svn. I use it to take the assembled rolls and create the Rocks build profiles; see the sketch after this list.
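
The script itself is not reproduced here; in stock Rocks, the rebuild it wraps reduces to something like this (a sketch, not the actual script contents):

  cd /export/rocks/install
  rocks create distro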

Put the CFE client keys in place

To do this, follow the equivalent Rocks 5.5 directions.

Build the Rocks distribution

On umrocks6, the following rolls are now active.

[root@umrocks6 ~]# rocks list roll
NAME            VERSION ARCH   ENABLED
ganglia:        6.1     x86_64 yes
web-server:     6.1     x86_64 yes
service-pack:   6.1     x86_64 yes
kernel:         6.1     x86_64 yes
base:           6.1     x86_64 yes
SL:             6.4     x86_64 yes
agl-osg3:       0.07    x86_64 yes
agl-lustre:     0.05    x86_64 yes
agl-umt3:       0.03    x86_64 yes
agl-condor:     0.08    x86_64 yes
agl-dell-7.0:   0.08    x86_64 yes
agl-cfengine:   0.18    x86_64 yes
agl-update-sl6: 0.01    x86_64 yes
agl-base:       0.27    x86_64 yes
agl-cvmfs:      0.06    x86_64 yes

This list was assembled as follows, adding agl-lustre and agl-umt3 to the lists below on the UM cluster:

$ cd rolls-src
$ for aroll in agl-base agl-cfengine agl-condor agl-cvmfs agl-dell-7.0 agl-osg3 agl-update-sl6; do cd $aroll; make clean; make roll; cd ..; done

$ for aroll in agl-base agl-cfengine agl-condor agl-cvmfs agl-dell-7.0 agl-osg3 agl-update-sl6; do cd $aroll; rocks add roll *.iso; cd ..; done

$ for aroll in agl-base agl-cfengine agl-condor agl-cvmfs agl-dell-7.0 agl-osg3 agl-update-sl6; do rocks enable roll $aroll; done

Then, make the distribution using the alias defined above
  • build_distro

Yum update the head node against itself

The agl-update-sl6 roll contains late security updates, etc. The following should work; I do not recall whether any other repo had to be enabled to accomplish this update, but I believe the answer is "no".
  • yum clean all
  • yum update
  • reboot
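
To check which repos are enabled before the update (an optional check, not in the original procedure):

  yum repolist enabled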

Workaround for public_net ssh issue

As with Rocks 5.5, the net used for ssh was changed. At line 365 of /opt/rocks/lib/python2.6/site-packages/rocks/commands/run/host/__init__.py, change "primary_net" to "primary_net_ssh", and add that attribute to the db. (This attribute addition was already made above.)
  • rocks add attr primary_net_ssh private
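
A one-line way to make the file edit, assuming "primary_net" occurs only once on line 365:

  sed -i '365s/primary_net/primary_net_ssh/' /opt/rocks/lib/python2.6/site-packages/rocks/commands/run/host/__init__.py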

Problems

  • Something is wrong with Ganglia: the rrd partition fills, and errors are logged accessing 127.0.0.1....

Example: Extract information on a single host for addition to the Rocks DB

During testing, it was desirable to extract the information for a single host from the rocks dump information above, so that the host could be PXE booted into SL6. The procedure below exemplifies how to do this for the host bl-11-1.aglt2.org (bl-11-1.local).

[root@umrocks6]# cd /root/create_db

[root@umrocks6 create_db]# ll
total 676
-rw-r--r-- 1 root root   1067 May 21 13:37 bl-11-1.sh
-rw-r--r-- 1 root root   1067 May 22 15:38 bl-11-2.sh
-rw-r--r-- 1 root root   1067 May 27 15:41 bl-11-3.sh
-rw-r--r-- 1 root root   1065 May 29 16:20 dc-1-28.sh
-rw-r--r-- 1 root root  20365 May 19 20:30 rocks-dump-add-host.sh
-rw-r--r-- 1 root root 208276 May 19 20:30 rocks-dump-hostinfo.sh
-rw-r--r-- 1 root root   1420 May 23 20:57 rocks-dump.sh

[root@umrocks6 create_db]# grep "bl\-11\-1\ " rocks-dump-add-host.sh > temp.sh
[root@umrocks6 create_db]# grep "bl\-11\-1\ " rocks-dump-hostinfo.sh >> temp.sh

[root@umrocks6 create_db]# diff bl-11-1.sh temp.sh
16,17d15
< /opt/rocks/bin/rocks set host boot bl-11-1 action=install
<

The last line, action=install, was added by hand to ensure the machine will build. It is necessary to do a "shoot node" of the SL5 machine (from the new frontend) before the SL6 head node can successfully be used as a PXE host, as the partitions are all rebuilt as ext4. This is equivalent to doing "/bin/rm -f /.rocks-release; /boot/kickstart/cluster-kickstart-pxe" on the WN.

-- BobBall - 15 May 2013