Hardware Transition Planning from head01 (old R610) to head01-temp (new R630)

We purchased a new Dell R630 to act as replacement hardware for our existing head01 system (on R610 hardware). This document describes the necessary steps to switch to the new hardware with minimal downtime.

The R630 has been installed with SL6.6 64-bit and named head01-temp (10.10.1.43; 192.41.230.43). It currently has the same version of Postgresql installed as is on the current head01.

We have configured it to be a hot-standby streaming replica of the current head01 Postgresql DBs.

There are three main areas that need some detailed planning:
  1. dCache
  2. Postgresql
  3. Host configuration (IP/name; crontabs; RPMS)

dCache

Since dCache is not running on head01-temp we can copy the complete configuration over from the current head01 system to make sure it is ready to start and identically configured. After installing the right dCache RPMS, the files listed below must be copied over from the existing head01.

Note that some of these files are straight from the dCache rpm, some are modified when GUMS needs to change servers, and some must be constructed one way or the other. Below I will mark those straight from the rpm (*) and those that are constructed (#) and those that are modified (m) starting from the rpm content.

As of Oct 23, 2015, all but the * files, and those marked obsolete, are correctly synced by cfengine.

/root/
   .pgpass 
/etc/dcache/
    dcache.conf
    gplazma.conf   m  (GUMS modified)
    logback.xml   *
    dcachesrm-gplazma.policy   m  (GUMS modified)
    hostcert.p12    #
    dcache.kpwd    
    certificates.jks    #
    tc-config.xml   *
    info-provider.xml   m
    httpd.conf
/etc/dcache/admin
    authorized_keys2
    server_key.pub
    server_key
    host_key.pub
    host_key
    ssh_host_dsa_key.pub
    ssh_host_dsa_key
    authorized_keys  (obsolete)
/etc/dcache/layouts
    head01.conf  
/var/lib/dcache/config
    poolmanager.conf   
    LinkGroupAuthorization.conf   
    passwd  
/etc/grid-security
    hostkey.pem
    hostcert.pem
    monit.pem
    storage-authzdb
    gsi-authz.conf   (obsolete)
    grid-vorolemap
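
A quick way to confirm the copy is complete is to check each path from the list above on head01-temp. The following is only a sketch: the function name list_missing_files is made up for illustration, and the paths in the usage comment are a subset of the list, not all of it.

```shell
# Print every path from the list (read on stdin) that is missing under ROOT.
# ROOT is "/" on the real host; a scratch directory works for a dry run.
# list_missing_files is a hypothetical helper name, not part of dCache.
list_missing_files() (
    root=${1%/}
    while IFS= read -r path; do
        [ -n "$path" ] || continue                  # skip blank lines
        [ -e "$root$path" ] || printf '%s\n' "$path"
    done
)

# Possible usage on head01-temp (extend the list with the remaining files):
# list_missing_files / <<'EOF'
# /root/.pgpass
# /etc/dcache/dcache.conf
# /etc/dcache/layouts/head01.conf
# /var/lib/dcache/config/poolmanager.conf
# /etc/grid-security/hostcert.pem
# EOF
```

No output means every listed path exists. This only checks presence, not that the content matches the old head01.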

Make sure the dcache-server service is chkconfig'ed off.

The /etc/grid-security/vomsdir and /etc/grid-security/certificates directories must be set up and configured as well. There is an /etc/cron.d/rsync-certificates.cron which needs setting up. See Host-config below.

dCache can be upgraded on this host anytime before the transition.

Once these are in place we should be ready to run 'dcache start' as soon as the old host is shut down, postgresql is updated and running, and the host is reconfigured as head01.

Postgresql Details

We have set up the same version of postgresql-9.3 on the new head01 system. It is currently configured to be a hot-standby streaming replication server. The primary task is to ensure that all changes from the current head01 are propagated to this host before the original head01 is shut down. Steps are detailed below on how to do the transition. The idea is that we make sure the hot-standby postgresql on the new head01 is current, then shut it down, move the postgresql configuration files from the current (old) host into place and restart.
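
One way to check that the standby is current is to compare WAL locations on the two hosts. The sketch below assumes the PostgreSQL 9.3 functions pg_current_xlog_location() (on the master) and pg_last_xlog_replay_location() (on the standby); the helper names and the psql connection options in the usage comment are assumptions and will need adjusting to our setup.

```shell
# Convert a PostgreSQL 9.x WAL location such as "16/B374D848" to a byte offset.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
lsn_to_int() (
    hi=${1%%/*}
    lo=${1##*/}
    echo $(( (0x$hi << 32) + 0x$lo ))
)

# Succeed when the standby has replayed at least up to the master's location.
# Arguments: master location, standby location.
standby_caught_up() (
    master=$(lsn_to_int "$1")
    standby=$(lsn_to_int "$2")
    [ "$standby" -ge "$master" ]
)

# Possible usage after 'dcache stop' on the old head01 (user and hosts assumed):
# m=$(psql -h head01 -U postgres -Atc "SELECT pg_current_xlog_location()")
# s=$(psql -h head01-temp -U postgres -Atc "SELECT pg_last_xlog_replay_location()")
# standby_caught_up "$m" "$s" && echo "standby is current"
```

The comparison is only meaningful once nothing is writing to the master any more, which is why it belongs after dCache has been stopped.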

The following files will need updating when we transition from hot-standby mode to master:

In /var/lib/pgsql are scripts used to "seed" hot-standby hosts from head01. These should be copied to the new host.

[root@head01 pgsql]# pwd
/var/lib/pgsql
[root@head01 pgsql]# ls
9.3             pg_hba.conf~        reseed_hot_standby-9.3.sh       reseed_hot_standby-o-head01.sh   tmp.sh
lost+found      pg_ident.conf       reseed_hot_standby-n-head01.sh  reseed_hot_standby.sh
make_backup.sh  postgresql-9.2-nfs  reseed_hot_standby-nhead01.sh   reseed_hot_standby.sh.07Jun2013
pg_hba.conf     postgresql.conf     reseed_hot_standby-nhead01.sh~  setup_ivukotic.psql

The important configuration files are stored in /var/lib/pgsql/9.3/data : pg_hba.conf pg_ident.conf postgresql.conf

These just need to be copied into place on the new system once it is up to date and shut down. I have set up /var/lib/pgsql/head01 and /var/lib/pgsql/head01-temp to host these files for the new master and hot-standby setups respectively.
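
The copy step could be wrapped so that it refuses to run with an incomplete staging directory and keeps a backup of what it replaces. This is a sketch only; install_pg_conf is a made-up helper name, and it should be run while postgresql-9.3 is stopped.

```shell
# Copy the three PostgreSQL config files from a staging directory into the
# data directory, keeping a timestamped backup of each file that is replaced.
# install_pg_conf is a hypothetical helper name.
install_pg_conf() (
    src=$1
    data=$2
    stamp=$(date +%Y%m%d%H%M%S)
    # Check everything is staged before touching the data directory.
    for f in pg_hba.conf pg_ident.conf postgresql.conf; do
        [ -f "$src/$f" ] || { echo "missing $src/$f" >&2; exit 1; }
    done
    for f in pg_hba.conf pg_ident.conf postgresql.conf; do
        [ -f "$data/$f" ] && cp -p "$data/$f" "$data/$f.$stamp"
        cp -p "$src/$f" "$data/$f" || exit 1
    done
)

# Possible usage on the new host when switching it to master:
# install_pg_conf /var/lib/pgsql/head01 /var/lib/pgsql/9.3/data
```

Passing /var/lib/pgsql/head01-temp as the first argument would put the hot-standby files back.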

Host-config

To check RPMS, get a sorted list from each host.

On head01-temp:
rpm -qa --queryformat='%{NAME}\n'| sort > head01-temp-rpms.txt

On head01:
rpm -qa --queryformat='%{NAME}\n' | sort > head01-rpms.txt

Generate lists:
comm -2 -3 head01-rpms.txt head01-temp-rpms.txt > rpms-only-on-head01.txt
comm -1 -3 head01-rpms.txt head01-temp-rpms.txt > rpms-only-on-head01-temp.txt
comm -1 -2 head01-rpms.txt head01-temp-rpms.txt > rpms-on-both.txt
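
comm only gives correct results when both inputs are sorted in the same collation, and rpm -qa can list a package name more than once (for example kernel). A slightly more defensive variant of the comparison, as a sketch with a made-up function name, forces the C locale and removes duplicates first:

```shell
# Split two package lists into "only on head01", "only on head01-temp" and
# "on both". comm needs both inputs sorted with the same collation, so the
# lists are re-sorted under LC_ALL=C with duplicates removed.
# compare_rpm_lists is a hypothetical helper name.
compare_rpm_lists() (
    old=$1
    new=$2
    out=$3
    LC_ALL=C
    export LC_ALL
    sort -u "$old" > "$out/old.sorted"
    sort -u "$new" > "$out/new.sorted"
    comm -23 "$out/old.sorted" "$out/new.sorted" > "$out/rpms-only-on-head01.txt"
    comm -13 "$out/old.sorted" "$out/new.sorted" > "$out/rpms-only-on-head01-temp.txt"
    comm -12 "$out/old.sorted" "$out/new.sorted" > "$out/rpms-on-both.txt"
)

# Possible usage, with both list files already copied to one host:
# compare_rpm_lists head01-rpms.txt head01-temp-rpms.txt .
```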

I then find the packages that are installed on head01 but missing on head01-temp (rpms-only-on-head01.txt) and install them on head01-temp.

Will need GPG keys for RPMS.
scp /etc/pki/rpm-gpg/* root@10.10.1.43:/etc/pki/rpm-gpg/

Will need to temporarily enable the OSG and EPEL repos to get vo-client, voms and voms-clients installed.

Copy the cron jobs (including rsync-certificates.cron) to the new host:
scp /etc/cron.d/* root@10.10.1.43:/etc/cron.d/

Make sure AFS is chkconfig'ed on.

Need to add /pnfs mount to /etc/fstab:
head02.aglt2.org:/pnfs /pnfs         nfs  rw,hard,nfsvers=3 0 0

NOTE: We don't add the LABEL=pgsql-head01 mount since it is controlled by ZFS on new head01.
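
To avoid adding the /pnfs entry twice if the step is repeated, the edit can be made idempotent. A minimal sketch, assuming a made-up helper name ensure_fstab_entry that looks for an existing uncommented entry with the same mount point:

```shell
# Append ENTRY to FSTAB unless an uncommented entry for MOUNTPOINT exists.
# ensure_fstab_entry is a hypothetical helper name.
ensure_fstab_entry() (
    fstab=$1
    mountpoint=$2
    entry=$3
    if awk -v m="$mountpoint" \
        '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"
    then
        echo "$mountpoint already present in $fstab"
    else
        printf '%s\n' "$entry" >> "$fstab"
    fi
)

# Possible usage on head01-temp:
# ensure_fstab_entry /etc/fstab /pnfs \
#     'head02.aglt2.org:/pnfs /pnfs         nfs  rw,hard,nfsvers=3 0 0'
```

Matching on the mount point rather than the whole line means a differently spaced copy of the same entry is not appended again.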

Networking reconfig

The following files under /etc/sysconfig/network-scripts need updating to move head01-temp to head01 on the network: ifcfg-em1, ifcfg-em2 and ifcfg-em3.

The em1 and em2 interfaces (both 10G) participate in an LACP bonded configuration. The em3 interface (1G) is currently running as head01-temp (10.10.1.43).

To prepare for the network change I made two subdirectories under /etc/sysconfig/network-scripts: head01-temp and head01.

I then copy all the ifcfg-* files into both. Then I edit the head01 values to match the network information for head01.local and head01.aglt2.org.
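
The staging step amounts to the following sketch (stage_ifcfg is a made-up helper name). It only creates the two directories and copies the current files; the copies under head01 still have to be edited by hand afterwards.

```shell
# Create the head01-temp and head01 staging directories and copy the current
# ifcfg-* files into both. BASE defaults to the SL6 network-scripts directory.
# stage_ifcfg is a hypothetical helper name.
stage_ifcfg() (
    base=${1:-/etc/sysconfig/network-scripts}
    for profile in head01-temp head01; do
        mkdir -p "$base/$profile" || exit 1
        cp -p "$base"/ifcfg-* "$base/$profile/" || exit 1
    done
)

# Possible usage on head01-temp:
# stage_ifcfg
```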

We also need to update /etc/sysconfig/network and change the hostname to:
HOSTNAME=head01.aglt2.org

The /etc/zfs/zpool.cache may have something coded in it as well, so we might need to do:
zpool set cachefile='/etc/zfs/zpool.cache' pgsql

I created a script to move to the new network config:
[root@head01-temp network-scripts]# cat mv-net-to-head01.sh
#!/bin/bash
#
# Move network setup from current to head01.aglt2.org
#######################################################
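# On SL6 the init script is called 'network'
/sbin/service network stop
# Call /bin/cp directly to bypass any 'cp -i' alias; 'unalias cp' fails
# in a script because aliases are not defined there
/bin/cp -f /etc/sysconfig/network-scripts/head01/* /etc/sysconfig/network-scripts/
/bin/cp -f /etc/sysconfig/network.head01 /etc/sysconfig/network
hostname head01.aglt2.org
/sbin/service network start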
echo " Restarted network as head01.aglt2.org"
exit
#######################################################
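
Because the script stops the network before copying, a missing staged file would leave the host unreachable. A pre-flight check along these lines could be run first. It is a sketch with a made-up function name, and it assumes the staged files are exactly the three ifcfg files named in this section plus /etc/sysconfig/network.head01.

```shell
# Verify that everything mv-net-to-head01.sh copies is actually staged.
# Exits non-zero and lists the problems if anything is missing.
# preflight_net_switch is a hypothetical helper name.
preflight_net_switch() (
    scripts=${1:-/etc/sysconfig/network-scripts}
    netfile=${2:-/etc/sysconfig/network.head01}
    status=0
    for f in ifcfg-em1 ifcfg-em2 ifcfg-em3; do
        [ -s "$scripts/head01/$f" ] || {
            echo "missing or empty: $scripts/head01/$f" >&2
            status=1
        }
    done
    grep -q '^HOSTNAME=head01\.aglt2\.org$' "$netfile" 2>/dev/null || {
        echo "HOSTNAME=head01.aglt2.org not found in $netfile" >&2
        status=1
    }
    exit "$status"
)

# Possible usage just before the switch:
# preflight_net_switch && ./mv-net-to-head01.sh
```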

Sequence to transition

On head01(old)                                   On head01(new)
chkconfig postgresql-9.3 and dcache-server off
dcache stop
                                                 Verify postgresql-9.3 running; wait ~2 minutes
service postgresql-9.3 stop
                                                 service postgresql-9.3 stop
Reconfigure host/network as head01-temp
shutdown -h now
                                                 Once head01(old) down, run mv-net-to-head01.sh
                                                 Run cf-agent
                                                 Move postgresql configuration for head01 in place
                                                 reboot
                                                 Verify ZFS properly started pgsql location (/nvme*)
                                                 service postgresql-9.3 start; verify proper startup
                                                 dcache start; check logfiles; verify proper startup
                                                 chkconfig postgresql-9.3 and dcache-server to 'on'
--

ShawnMcKee - 08 Jun 2015
Topic revision: r2 - 23 Oct 2015, BobBall