Upgrading Postgresql on CentOS/RHEL/SL with Hot-standby Systems

This Wiki topic covers upgrading our existing PostgreSQL version 9.3.11 on Scientific Linux 6.7 64-bit to 9.5.1. These are instances supporting dCache 2.10.42 which we intend to upgrade to 2.13.x later this week.

The relevant systems:
  • head01 --- hosts primary dCache DBs: dcache, pinmgr, spacemanager, billing
  • head02 --- hosts dCache namespace DBs: chimera and custom 'rephot'
  • t-head01 --- Hot-standy server for head01
  • t-head02 --- Hot-standy server for head02
We should note that we have two Intel NVMe P3600 cards in a zfs RAID-1 configuration host the database on head01. The head02 disk is a hardware RAID-1 of two Samsung 850 Pro 480GB SSDs.

Source for Postgresql

Repos exist for all current versions of postgresql at http://yum.postgresql.org/repopackages.php. Here is the SL6 repo RPM to use
rpm -Uvh http://download.postgresql.org/pub/repos/yum/9.5/redhat/rhel-6-x86_64/pgdg-sl95-9.5-2.noarch.rpm

You can survey what is available at the repopackages.php link above to get exactly what you want. You can install more than one version of Postgresql at a time however you need to be aware that installing a newer version will set it as the default in alternatives. Commands like psql will start using the 9.5 version once you install the new RPMS

You can see what rpms are available via

yum list postgresql95*

Planning the Upgrade

We wish to minimize the downtime for the dCache system which depends upon the various Postgresql databases we wil be updating. The idea is to upgrade the head02 and t-head02 set of servers first, then do head01 and t-head01. The upgrade steps are documented at http://www.postgresql.org/docs/9.5/static/pgupgrade.html. We plan to deploy 9.5 in parallel with the existing 9.3 version on each system and use the --link option of pg_upgrade to speed up the process.
  1. Install the new binaries
  2. Initialize the 9.5 DB: service postgresql-9.5 initdb
  3. Adjust authentication to support 9.3 to 9.5 communication
    • In 9.5 pg_hba.conf have a line like: local all all trust
  4. Stop the 9.3 and 9.5 instances on the master node and note time so we can track how long this upgrade takes
    • Disable any configuration management systems that may override you system changes (turn off Puppet or CFEngine for example)
    • date > /root/psql-update.start
    • dcache stop
    • service postgresql-9.3 stop
    • service postgresql-9.5 stop
  5. Verify hot-standby server is caught up
    • On master and slave run the following command and verify locations are the same:
    • /usr/pgsql-9.3/bin/pg_controldata /var/lib/pgsql/9.3/data/ | grep "Latest checkpoint location:"
    • Initial run on head02 and t-head02 showed t-head02 was behind. Had to restart both, let them resync (with dCache off) and then stop them.
  6. Run pg_upgrade on the master node
    • This must be run as the postgres user
    • See the attached script run_pg_upgrade.sh
    • On head02 this took 8.58 seconds to run
    • On head01 we had to pause and restart services on 9.3. Problem: /var/lib/pgsql/9.3 is a zfs mount point which puts /var/lib/pgqsl/9.5 on another volume. The --link won't work in this case. Working on a fix.
    • Solution for zfs issue was to mount the pgsql/9.3 volume as /var/lib/pgsql, moving all the former files in /var/lib/pgsql there. Then the update can work as it did on head02.
    • On head01 the upgrade too 9.19 seconds to run
  7. While upgrade is running install the new binaries on the hot-standby server (see step 1 above)
    1. Ensure data directories do not exist or are empty for this new version on hot-standby server
    2. Stop the standby server: service postgresql-9.3 stop; service postgresql-9.5 stop
    3. Copy the standby 9.3 server config files: pg_hba.conf, postgres.conf, recovery.conf to /root
    4. On the new master cluster (9.5 on master), change wal_level to hot_standby in the postgresql.conf file and then start and stop the cluster.
  8. Once the master is upgraded, you must rsync its directories to the slave server
    • On master cd /var/lib/pgsql
    • /usr/bin/time -o /root/rsync-timing.log rsync --archive --delete --hard-links --size-only 9.3 9.5 t-head02.local:/var/lib/pgsql/.
    • On head02 this took about 4 seconds to run!!
    • The head01 rsync got screwed up. Had to revert to reseeding the hot-standby after getting head01 running on 9.5
  9. Restore or update all configuration files on master and slave
    • You need to check/edit the files for updates. For example the recovery.conf needs to be updated for the new postgresql version and paths.
  10. Start master 9.5 instance, then start slave 9.5 instance. Note time so we can track how long this upgrade took.
    • On master: service postgresl-9.5 start
    • On slave: service postgresql-9.5 start
    • dcache start
    • date > /root/psql-update.end
    • Stats: for head02 and t-head02 the total timing was start at Sun Feb 21 12:36:04 EST 2016 and finish at Sun Feb 21 12:58:28 EST 2016 . It only took this long because of some minor things that needed fixing/updating in the plan.
  11. Verify replication (see below)
  12. Follow post-upgrade steps from pgupgrade.html page
    • As user postgres run
    • analyze_new_cluster.sh (took about 10 minutes for head02; cannot run this on the slave t-head02)
    • delete_old_cluster.sh (don't forget to clean up on the slave as well)
  13. Set the host to use postgresql-9.5 instead of 9.3 ON THE MASTER and SLAVE servers!
    • chkconfig postgresql-9.3 off
    • chkconfig postgresql-9.5 on
    • Consider removing the postgresql93* rpms.
Verifying Replication

You can check that status of the replication in the following ways.

On the master do the following:
pgsql -U postgres

postgres=# \x
Expanded display is on.
postgres=# table pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 3082333
usesysid         | 16385
usename          | repuser
application_name | walreceiver
client_addr      | 10.10.4.205
client_hostname  |
client_port      | 53332
backend_start    | 2015-06-16 14:12:09.019447-04
state            | streaming
sent_location    | 21C/A2BA7230
write_location   | 21C/A2BA7230
flush_location   | 21C/A2BA7230
replay_location  | 21C/A2BA64C8
sync_priority    | 0
sync_state       | async

Or on the slave try:

psql -U postgres

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

The above command returns 't' (for "true") or 'f' (for "false"). You want to see 't'.
postgres=# select now() - pg_last_xact_replay_timestamp() AS replication_delay;
 replication_delay
-------------------
 00:00:00.025955
(1 row)

The above shows the "delay" in getting the updates from the master .

Some URLs for further reference

See the following -- ShawnMcKee - 21 Feb 2016

  • run_pg_upgrade.sh: pg_upgrade procedure for postgresql upgrade from 9.3 to 9.5 at AGLT2
Topic revision: r4 - 22 Feb 2016 - 01:05:13 - ShawnMcKee
 

This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback