Update Lustre on a testbed from 2.10.4 (SL7.6, zfs-0.7.9) to 2.12.3 (SL7.7, zfs-0.7.13)

We are upgrading lustre-server from 2.10.4 (SL7.6, zfs-0.7.9) to 2.12.3 (SL7.7, zfs-0.7.13); see the compatibility matrix here.

Before updating, unmount Lustre on all clients.

Sync the Lustre repo from Whamcloud to a local repository, following the instructions here

Create the lustre.repo file

For details, see /atlas/data08/manage/lustre/

#cat /atlas/data08/manage/lustre/lustre.repo
[lustre-server2123]
name=lustre-server2123
baseurl=http://umt3int05.aglt2.org/repo/lustre-server

...

Copy this file to /etc/yum.repos.d/

# cp /atlas/data08/manage/lustre/lustre.repo /etc/yum.repos.d/
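The later yum commands select the repository with a glob such as `*lustre*2123*`, so it can help to confirm that the copied file really defines an id the glob matches. The following is only a sketch; the helper names `repo_ids` and `repo_glob_matches` are made up for illustration and are not part of yum.

```shell
# Print the repository ids (the names inside [brackets]) defined in a .repo file.
repo_ids() {
    sed -n 's/^\[\(.*\)][[:space:]]*$/\1/p' "$1"
}

# Succeed if at least one id in the file matches the given shell glob.
# $1: repo file, $2: glob such as '*lustre*2123*' (quote it when calling).
repo_glob_matches() {
    for id in $(repo_ids "$1"); do
        case "$id" in
            $2) return 0 ;;   # unquoted on purpose so it is treated as a pattern
        esac
    done
    return 1
}
```

For example, `repo_glob_matches /atlas/data08/manage/lustre/lustre.repo '*lustre*2123*' && echo ok` should print `ok` if the file contains the `[lustre-server2123]` section shown above.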

Upgrade the OS to SL7.7 from earlier SL7 versions

#yum update --skip-broken

Update the kernel from the lustre repository

#yum --enablerepo=*lustre*2123* update kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel

Reboot all lustre servers to the new kernel
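After the reboot it is worth confirming that each server is actually running the new kernel before touching the Lustre packages. A minimal sketch, assuming the new kernel release string contains the build number (1062 on the production system described further down); `kernel_is` is a hypothetical helper name.

```shell
# Succeed if the running kernel release contains the expected build string.
# $1: expected substring (for example 1062)
# $2: release string to test; defaults to the output of `uname -r`
kernel_is() {
    release=${2:-$(uname -r)}
    case "$release" in
        *"$1"*) return 0 ;;
        *) echo "running kernel $release does not match $1" >&2
           return 1 ;;
    esac
}
```

Running `kernel_is 1062` on a server returns success only when `uname -r` reports a release containing `1062`.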

Update the lustre RPMs on all lustre servers

#yum --enablerepo=*lustre*2123* update lustre-dkms lustre-osd-zfs-mount lustre lustre-resource-agents zfs
#modprobe -v lustre

Restore zpools

#modprobe -v zfs
#zpool status

If the zpool is not imported, try to import it

#zpool import ost-001
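The "import only if not already imported" step can be scripted so that it is safe to rerun. This is a sketch rather than the procedure actually used here; the function names are illustrative, and it relies only on `zpool list -H -o name` and `zpool import`.

```shell
# Succeed if the named pool already appears in `zpool list`.
pool_imported() {
    zpool list -H -o name 2>/dev/null | grep -qxF -- "$1"
}

# Import the pool only when it is not imported yet.
import_if_missing() {
    if pool_imported "$1"; then
        echo "$1 already imported"
    else
        zpool import "$1"
    fi
}
```

For the pool used above this would be called as `import_if_missing ost-001`.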

List the available lustre file systems

#zfs list 

The above procedure can be run in parallel on all mdtmgs and OSS servers. Then mount the targets in the following order.

On the mdtmgs server

#mount -t lustre ost-001/mdtmgs /lustre/mgt/

After mdtmgs is up, on OSS (OST) server,

#mount -t lustre ost-001/ost0001 /mnt/ost-001/

After all OSS are mounted, on the client

#mount -t lustre 10.10.2.120@tcp:/t3test /lustre/t3test/

Update Lustre on production system from 2.10.4 to 2.12.3, zfs from 0.7.9 to 0.7.13, kernel from 862 to 1062

Kernel update

On all lustre servers/clients:

Copy the lustre repository file

# cp /atlas/data08/manage/lustre/lustre.repo /etc/yum.repos.d/

Update to SL 7.7

#yum update --skip-broken 

Install the most recent kernel from the lustre repository (Lustre 2.12.3 works on the most recent kernel)

#yum --enablerepo=*2123* update kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel

Unmount clients

On all lustre clients (wn, umint03XX, lustre-nfs)

#systemctl stop lustre_umt3 

Or

#umount -l /lustre/umt3
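Before rebooting the servers it can be useful to confirm that no client still lists a Lustre mount. A small sketch that reads the kernel mount table; `lustre_mounts` is an illustrative name. Note that a lazy unmount (`umount -l`) removes the entry from the mount table immediately, so this check cannot tell whether processes are still holding files open on a lazily unmounted file system.

```shell
# Print the mount points of all file systems of type "lustre".
# $1: mounts table to read; defaults to /proc/mounts
lustre_mounts() {
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}
```

An empty result from `lustre_mounts` means the client no longer has Lustre mounted; any path printed (for example /lustre/umt3) still needs to be unmounted.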

Lustre and ZFS Software update

On all lustre servers

#reboot

Check that all servers have booted into the new kernel. If so, install the zfs and lustre RPMs. Note: when going from SL6 to SL7, updating zfs in place does not work; remove the previous zfs RPMs and then reinstall them.

#screen -dm bash -c "yum --enablerepo=sl -y install asciidoc audit-libs-devel automake bc binutils-devel bison device-mapper-devel elfutils-devel elfutils-libelf-devel expect flex gcc gcc-c++ git glib2 glib2-devel hmaccalc keyutils-libs-devel krb5-devel ksh libattr-devel libblkid-devel libselinux-devel libtool libuuid-devel libyaml-devel lsscsi make ncurses-devel net-snmp-devel net-tools newt-devel numactl-devel parted patchutils pciutils-devel perl-ExtUtils-Embed pesign python-devel redhat-rpm-config rpm-build systemd-devel;yum -y remove kmod-spl;yum -y --enablerepo=*2123* install lustre-dkms lustre-osd-zfs-mount lustre lustre-resource-agents zfs"
 

Verify that all the new lustre and zfs RPMs are installed
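One way to do this check is to compare the installed versions against the targets named on this page (2.12.3 for the lustre packages, 0.7.13 for zfs). The sketch below assumes the input is one "name version" pair per line; `check_versions` is a hypothetical helper, not an rpm feature.

```shell
# Read "name version" lines on stdin and report any package whose version
# is not the expected one: 2.12.3 for lustre*, 0.7.13 for zfs.
# Exits non-zero if anything unexpected was seen.
check_versions() {
    awk '
        NF == 0                            { next }
        $1 ~ /^lustre/ && $2 ~ /^2\.12\.3/ { next }
        $1 == "zfs"    && $2 ~ /^0\.7\.13/ { next }
        { print "unexpected: " $0; bad = 1 }
        END { exit bad + 0 }
    '
}
```

The input can be produced with `rpm -q --qf '%{NAME} %{VERSION}\n' lustre lustre-dkms lustre-osd-zfs-mount lustre-resource-agents zfs | check_versions`. A package that is not installed makes rpm print a "package ... is not installed" line, which the function also reports as unexpected.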

mdtmgs server:

Start the lustre service

#mount /mnt/mdtmgs

Check that the service is running; lctl dl should list the devices

#lctl dl

OSS:

Import the zpools

(repeat this for all zpools listed in /etc/fstab)

#zpool import ost-001 
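Repeating the import for every pool in /etc/fstab can be automated by deriving the pool names from the fstab entries. This is a sketch under the assumption that the OSS fstab entries look like `ost-001/ost0001 /mnt/ost-001 lustre ...`; the exact fstab layout on these servers is not shown on this page, and `pools_from_fstab` is an illustrative name.

```shell
# Print the unique zpool names used by Lustre entries in an fstab-style file.
# Assumes the first field is "pool/dataset" and the third field is "lustre".
# Client-style entries (containing "@", e.g. a NID) and comments are skipped.
# $1: file to read; defaults to /etc/fstab
pools_from_fstab() {
    awk '$1 !~ /^#/ && $1 !~ /@/ && $3 == "lustre" {
             split($1, parts, "/"); print parts[1]
         }' "${1:-/etc/fstab}" | sort -u
}
```

The import loop then becomes `for pool in $(pools_from_fstab); do zpool import "$pool"; done`.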

Check that all zfs datasets are present

#zfs list  

Start all OSTs

#mount /mnt/ost-001

Check if all the OSTs on this OSS are up

#lctl dl | grep obdf
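Instead of reading the grep output by eye, the number of running OSTs can be counted and compared with the number expected on the OSS. The sketch assumes the usual `lctl dl` column layout (index, state, type, name, UUID, refcount); `count_up_osts` is a made-up helper name.

```shell
# Count devices of type obdfilter in state UP from `lctl dl` output on stdin.
count_up_osts() {
    awk '$2 == "UP" && $3 == "obdfilter" { n++ } END { print n + 0 }'
}
```

Used as `lctl dl | count_up_osts`, the result should equal the number of OSTs mounted on this OSS.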

Lustre clients

Update the kernel

#yum -y --enablerepo=*2123* install kernel kernel-devel kernel-headers kernel-abi-whitelists kernel-tools kernel-tools-libs kernel-tools-libs-devel

Reboot to the new kernel

#reboot

Update the lustre-client software. I also tried lustre-client-dkms, but it did not work, so I had to use kmod-lustre-client instead.

#yum -y remove lustre-client kmod-lustre-client;yum -y --enablerepo=*2123* install lustre-client kmod-lustre-client
#systemctl start lustre_umt3

Only umt3int02/03/04/05 have been updated so far. umt3int01 is SL6, and all the worker nodes still need to be updated and rebooted into the new kernel in batches.

-- WenjingWu - 07 Feb 2020
Topic revision: r1 - 07 Feb 2020, WenjingWu