
Updating and Upgrading dCache Headnodes for AGLT2 June 2013

The dCache headnodes head01.aglt2.org and head02.aglt2.org were transitioned to VMware VMs in 2012. As of the beginning of June they are running:

dCache 2.2.8
Scientific Linux 5.8 64-bit
Postgresql 9.0.10

We wish to transition these instances to:

dCache 2.2.12 --- Most recent version with minor fixes
Scientific Linux 6.4 64-bit --- We are moving to all SL6 systems
Postgresql 9.2.4 --- Supports concurrent reindexing and is needed to minimize downtime when upgrading dCache to 2.6.x

To help with the process we have set up the original physical headnodes with new SSD disk areas and cleanly installed them with Scientific Linux 6.4, Postgresql 9.2 and dCache 2.2.12.

Plan for transitioning to updated version

We have installed Slony on both the original VM headnodes and the new physical nodes. See https://www.aglt2.org/wiki/bin/view/AGLT2/SlonyReplicationdCache for details. Using Slony allows us to replicate between different Postgresql and OS versions. With Slony installed, all tables needed for dCache on both head01 and head02 are now being replicated to n-head01 and n-head02 (the physical systems with SSDs). Here is the plan for transitioning the headnodes with minimal disruption:

Create needed scripts to help with the transition.
- On head01 create /root/dcache-local-scripts and move existing head01 scripts (perl, bash) into this area
- Create 'save-dcache-config.sh' to capture all the needed configuration details for:
  - dCache
  - OS (any grid-security, crontabs or customizations in the OS areas that need preservation)
- Create 'save-postgresql-config.sh' to capture all the needed Postgresql and Slony config info
- Create 'restore-dcache-config.sh' to reverse the process, renaming existing files suitably before overwriting.
- Create 'restore-postgresql-config.sh' to reverse the process for 'save-postgresql-config.sh', renaming files suitably before overwriting.
- Create 'rename-node.sh' to migrate the node OS configuration from oldname and oldIP to newname and newIP
- On Head02 create /root/dcache-local-scripts and move existing head02 scripts (perl, bash) into this area
- Copy the 'save-dcache-config.sh' created for head01 to head02 and update appropriately for head02
- Copy the 'restore-dcache-config.sh' created for head01 to head02 and update appropriately for head02
- Copy the 'save-postgresql-config.sh' created for head01 to head02 and update appropriately for head02
- Copy the 'restore-postgresql-config.sh' created for head01 to head02 and update appropriately for head02
- Copy the 'rename-node.sh' created for head01/n-head01 to head02 and update appropriately for head02
Prepare new VMs from existing n-head01 and n-head02
- Use VMware to P2V from existing n-head01/02 before proceeding further. To save time and space, do not copy the database during the P2V process.
Stop postgresql hot-standby replication to t-head01 and t-head02 and upgrade those nodes
- Update the Postgresql installation to 9.2.4
- Clear the existing database
- This removes a copy of the database but will allow us to much more quickly restore a replica after the transition.
Verify Slony replication is running on head01, n-head01, head02 and n-head02
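
The save and restore scripts themselves are not reproduced on this page. As a rough illustration only, the pair could be built around two helpers like the ones below. The function names `save_config` and `restore_config` and the `.pre-restore` suffix are assumptions made for this sketch, not the contents of the real 'save-dcache-config.sh' or 'restore-dcache-config.sh'.

```shell
# Hypothetical helpers sketching the save/restore idea; not the real scripts.

# save_config TARBALL PATH...
# Archive the listed files and directories, keeping absolute paths.
# Fails loudly if any listed path is missing, so nothing is silently skipped.
save_config() {
    local tarball="$1" p
    shift
    for p in "$@"; do
        [ -e "$p" ] || { echo "missing: $p" >&2; return 1; }
    done
    # -a picks the compression from the tarball suffix (.bz2, .gz, ...);
    # -P stores absolute names so files restore to their original locations.
    tar -caPf "$tarball" "$@"
}

# restore_config TARBALL
# Rename each existing regular file the tarball would overwrite to
# <name>.pre-restore, then unpack the tarball in place.
restore_config() {
    local tarball="$1" f
    tar -tPf "$tarball" | while IFS= read -r f; do
        if [ -f "$f" ]; then
            mv -f "$f" "$f.pre-restore"
        fi
    done
    tar -xPf "$tarball"
}
```

A call such as `save_config /tmp/head01-dcache.bz2 /etc/dcache /etc/grid-security` would produce one of the tarballs named in the plan; the directory list in that call is only an example.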

At this point we have the tools in place to migrate from the VM to the new physical node.

Start with head01 and capture all relevant configuration files for dCache, Postgresql and the needed local system files and crons using the 'save-dcache-config.sh' script in /root/dcache-local-scripts.
- Running this saves files in /tmp/dcache-conf/..
- Output is two tarballs: /tmp/head01-dcache.bz2 and /root/billing.bz2
Copy the saved tarballs to n-head01
Unpack the head01-dcache.bz2 into the local /tmp area
Verify Slony is properly replicating the 'dcache' and 'billing' DBs
In OIM put an "at-risk" notice for the AGLT2_SE noting that the head01.aglt2.org services will be transitioning to a new host
On head01 (VM), stop dCache: dcache stop
Check Slony to verify all records in the billing and dcache DBs are completely replicated to n-head01
Configure network to NOT start if rebooted
Shutdown head01
Move to n-head01 for the following steps
Stop postgresql
Run 'rename-node.sh' to readdress the node from n-head01 to head01
- Verify host is renamed and reachable on head01.aglt2.org and head01.local
Run 'restore-dcache-config.sh' to put the correct dCache, postgresql and OS configs in place
Start postgresql and verify it starts OK
Start dcache and verify proper operation
If services are working OK, remove OIM at-risk. We have completed head01 update
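
The "check Slony" steps above can be made concrete by querying Slony's `sl_status` view on the origin node after dCache has been stopped. The sketch below assumes the cluster names `rep_dcache` and `rep_billing` (inferred from the `_rep_dcache` and `_rep_billing` schemas mentioned later on this page) and assumes `psql` can connect locally as the current user. The helper names are hypothetical.

```shell
# Hypothetical helpers for checking Slony replication lag from the origin node.

# slony_lag_sql CLUSTER
# Print a query that lists per-subscriber lag for the given Slony cluster.
slony_lag_sql() {
    printf 'SELECT st_received, st_lag_num_events, st_lag_time FROM "_%s".sl_status;\n' "$1"
}

# slony_caught_up DB CLUSTER
# Succeed only when no subscriber has outstanding events.
# Returns 2 if psql itself fails.
slony_caught_up() {
    local db="$1" cluster="$2" pending
    pending=$(psql -At -d "$db" -c \
        "SELECT count(*) FROM \"_${cluster}\".sl_status WHERE st_lag_num_events > 0;") || return 2
    [ "$pending" = "0" ]
}
```

For example, `slony_caught_up dcache rep_dcache && slony_caught_up billing rep_billing` could gate the shutdown of head01. Slony keeps generating SYNC events even on an idle database, so a lag of one event may appear briefly; rerunning the check a few seconds later is a reasonable response.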

Next we can repeat the same basic steps above for head02 migration:

On head02, capture all relevant configuration files for dCache, Postgresql and the needed local system files and crons using the 'save-dcache-config.sh' script in /root/dcache-local-scripts.
- Running this saves files in /tmp/dcache-conf/..
- Output is two tarballs: /tmp/head02-dcache.bz2 and /root/billing.bz2
Copy the saved tarballs to n-head02
Unpack the head02-dcache.bz2 into the local /tmp area
Verify Slony is properly replicating the 'dcache' and 'billing' DBs
In OIM put an "at-risk" notice for the AGLT2_SE noting that the head02.aglt2.org services will be transitioning to a new host
On head02 (VM), stop dCache: dcache stop
Check Slony to verify all records in the billing and dcache DBs are completely replicated to n-head02
Configure network to NOT start if rebooted
Shutdown head02
Move to n-head02 for the following steps
Stop postgresql
Run 'rename-node.sh' to readdress the node from n-head02 to head02
- Verify host is renamed and reachable on head02.aglt2.org and head02.local
Run 'restore-dcache-config.sh' to put the correct dCache, postgresql and OS configs in place
Start postgresql and verify it starts OK
Start dcache and verify proper operation
If services are working OK, remove OIM at-risk. We have completed head02 update
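
The 'rename-node.sh' step used in both migrations is not shown on this page. A minimal sketch of the core idea, rewriting the old hostname and IP in a given set of configuration files, might look like the following. The function name `rename_node` and the `.bak` backup suffix are assumptions, the list of files to rewrite is left to the caller, and GNU sed is assumed (as shipped with Scientific Linux).

```shell
# Hypothetical sketch of the rename step; not the real rename-node.sh.

# rename_node OLDNAME NEWNAME OLDIP NEWIP FILE...
# Replace whole-word occurrences of OLDNAME and OLDIP in each FILE,
# keeping a copy of the original as FILE.bak.
rename_node() {
    local oldname="$1" newname="$2" oldip="$3" newip="$4" f name_re ip_re
    shift 4
    # Escape dots so they match literally in the sed patterns.
    name_re=$(printf '%s' "$oldname" | sed 's/\./\\./g')
    ip_re=$(printf '%s' "$oldip" | sed 's/\./\\./g')
    for f in "$@"; do
        [ -f "$f" ] || continue
        cp -p "$f" "$f.bak"
        # \b (a GNU sed extension) stops 10.0.0.5 from matching inside 10.0.0.50.
        sed -i -e "s/\b${name_re}\b/${newname}/g" \
               -e "s/\b${ip_re}\b/${newip}/g" "$f"
    done
}
```

On Scientific Linux 6 the files passed in would typically include /etc/sysconfig/network, the relevant /etc/sysconfig/network-scripts/ifcfg-* file and /etc/hosts. The network restart that follows is best run from the console rather than over ssh.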

At this point we should have two physical nodes running the head01 and head02 dCache services. They should be able to replicate via Postgresql to t-head01 and t-head02 respectively once those nodes are updated. Until t-head01 and t-head02 are updated to Postgresql 9.2.4 we have no replica of the dCache DB and are at risk!

Restoring Hot-Standby or Replication

Two choices:

We can create two new VMs called n-head01 and n-head02, built just like the new physical nodes (see the steps above). We can then set up Slony replication to these nodes right after head01 and head02 transition to their new hosts.
We can update t-head01 and t-head02 to the same version of Postgresql and restart the hot-standby process. In fact this can be done before the migration.
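
For the second choice, a Postgresql 9.2 standby is driven by a recovery.conf file in its data directory. The sketch below writes a minimal one. It assumes the standby can read the WAL archive directory used by the archive_command shown later on this page, and the replication user name is a placeholder. The exact setup used for t-head01 and t-head02 is not documented here, so treat this as an illustration of the mechanism rather than the site configuration.

```shell
# Hypothetical helper; the replication user and layout are assumptions.

# write_recovery_conf DATADIR PRIMARY_HOST REPL_USER ARCHIVE_DIR
# Write a minimal Postgresql 9.2 recovery.conf for a standby server.
write_recovery_conf() {
    local datadir="$1" primary="$2" user="$3" archive="$4"
    cat > "$datadir/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$primary user=$user'
restore_command = 'cp $archive/%f %p'
EOF
}
```

For example, `write_recovery_conf /var/lib/pgsql/9.2/data head01.aglt2.org replicator /atlas/data08/postgres/archive` on t-head01, where `replicator` is a made-up role name. The standby also needs `hot_standby = on` in its own configuration to accept read-only queries, and its data directory must first be seeded from a base backup of the primary.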

Things I Forgot

On head01 I forgot:

The /root/.ssh area
The /etc/ssh files (need to especially keep the keys)
The /var/lib/pgsql/9.2/data/postgresql.conf must NOT simply be copied from the 9.0 original instance. Instead edit the 9.2 file and set

archive_mode=on

archive_command = 'cp -i %p /atlas/data08/postgres/archive/%f </dev/null'

vacuum_defer_cleanup_age = 1000

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

The /etc/fstab needs a mount for /pnfs

head02.aglt2.org:/pnfs /pnfs nfs rw,hard,nfsvers=3 0 0

Make sure /pnfs exists
Make sure the /etc/grid-security area has the certificates directory and it is being updated by an /etc/cron.d/rsync-certificates.cron entry
Make sure the /etc/grid-security/hostkey.pem is owned by dcache
Copy the /var/lib/dcache/.pgpass file and make sure it is mode 600 and owned by dcache
The network restart in rename-node.sh hangs when run via ssh. It needs to be run out-of-band
Make sure Slony is stopped and the Slony schema is removed on the new headnode before activating dCache. Otherwise the tables are locked.

su - postgres
cd /usr/psql-92/bin
./slon_kill --config /etc/slon_tools-dcache.conf
./slon_kill --config /etc/slon_tools-billing.conf
psql
\c dcache
\dnS+
drop schema _rep_dcache cascade;
\c billing
\dnS+
drop schema _rep_billing cascade;
\q

Need to suitably update monit on nodes
Make sure /etc/security/limits.d/90-nproc.conf is empty and made immutable with chattr +i
Make sure /etc/security/limits.conf has the following settings:

* soft nofile 32000

* hard nofile 42000

* soft nproc 62000

* hard nproc 64000
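
Several items in the head01 list are simple to verify mechanically after a restore. The following sketch collects a few of them into one check function. All helper names are hypothetical, GNU `stat` is assumed, and the paths are taken from the list above (with the Postgresql configuration file assumed to be postgresql.conf in the 9.2 data directory).

```shell
# Hypothetical post-restore checks for head01; helper names are made up.

# check_owner FILE OWNER: succeed when FILE exists and is owned by OWNER.
check_owner() { [ "$(stat -c '%U' "$1" 2>/dev/null)" = "$2" ]; }

# check_mode FILE MODE: succeed when FILE has exactly the octal MODE.
check_mode() { [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]; }

# conf_has FILE KEY VALUE: succeed when FILE has an uncommented
# "KEY = VALUE" line. VALUE is used as a regular expression.
conf_has() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*=[[:space:]]*'?$3'?[[:space:]]*(#.*)?\$" "$1"
}

# limit_has FILE TYPE ITEM VALUE: succeed when FILE has "* TYPE ITEM VALUE".
limit_has() {
    grep -Eq "^\*[[:space:]]+$2[[:space:]]+$3[[:space:]]+$4([[:space:]]|\$)" "$1"
}

# report LABEL COMMAND...: run COMMAND and print OK or FAIL with the label.
report() {
    local label="$1"
    shift
    if "$@"; then echo "OK   $label"; else echo "FAIL $label"; return 1; fi
}

# run_head01_checks: run every check and return non-zero if any failed.
run_head01_checks() {
    local rc=0 pgconf=/var/lib/pgsql/9.2/data/postgresql.conf
    report ".pgpass owner"   check_owner /var/lib/dcache/.pgpass dcache || rc=1
    report ".pgpass mode"    check_mode  /var/lib/dcache/.pgpass 600    || rc=1
    report "hostkey owner"   check_owner /etc/grid-security/hostkey.pem dcache || rc=1
    report "/pnfs exists"    test -d /pnfs                              || rc=1
    report "archive_mode"    conf_has "$pgconf" archive_mode on         || rc=1
    report "vacuum_defer"    conf_has "$pgconf" vacuum_defer_cleanup_age 1000 || rc=1
    report "soft nofile"     limit_has /etc/security/limits.conf soft nofile 32000 || rc=1
    report "hard nofile"     limit_has /etc/security/limits.conf hard nofile 42000 || rc=1
    return $rc
}
```

Running `run_head01_checks` as root on the new head01 before starting dCache prints one OK or FAIL line per item, which gives a quick view of anything the restore missed.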

For head02 a similar list:

The /root/.ssh area
The /etc/ssh files (need to especially keep the keys)
The /etc/exports file (needed for dCache NFS)
Make sure new head02 has Java (jdk) 1.7 installed
The /var/lib/pgsql/9.2/data/postgresql.conf must NOT simply be copied from the 9.0 original instance. Instead edit the 9.2 file and set

archive_mode=on

archive_command = 'cp -i %p /atlas/data08/postgres/archive/%f </dev/null'

vacuum_defer_cleanup_age = 1000

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'

Make sure the /etc/grid-security area has the certificates directory and it is being updated by an /etc/cron.d/rsync-certificates.cron entry
Make sure the /etc/grid-security/hostkey.pem is owned by dcache
The network restart in rename-node.sh hangs when run via ssh. It needs to be run out-of-band
Had a problem with the default gateway not being set up, which caused red health issues. Check for a default route.
Make sure Slony is stopped and the Slony schema is removed on the new headnode before activating dCache. Otherwise the tables are locked.

su - postgres
cd /usr/psql-92/bin
./slon_kill --config /etc/slon_tools-chimera.conf
./slon_kill --config /etc/slon_tools-rephot.conf
psql
\c chimera
\dnS+
drop schema _rep_chimera cascade;
\c rephot
\dnS+
drop schema _rep_rephot cascade;
\q

Need to suitably update monit on nodes (as appropriate)
Make sure /etc/security/limits.d/90-nproc.conf is empty and made immutable with chattr +i
Make sure /etc/security/limits.conf has the following settings:

* soft nofile 32000

* hard nofile 42000

* soft nproc 62000

* hard nproc 64000

Make sure proper iptables is in place
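
The missing default gateway on head02 is easy to catch with a one-line check right after the rename. The sketch below uses `ip route show default`, which prints nothing when no default route is configured. The function name is hypothetical.

```shell
# Hypothetical helper: succeed when a default route is configured.
has_default_route() {
    [ -n "$(ip route show default 2>/dev/null)" ]
}
```

A line such as `has_default_route || echo "WARNING: no default route on $(hostname)"` could be run from the console once the network has been restarted under the new name.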

-- ShawnMcKee - 06 Jun 2013

Topic revision: r8 - 09 Jun 2013, ShawnMcKee
