Lustre Reinstall Notes

After testing Lustre "in production" (mostly tests by Tiesheng), we have decided to go ahead with our plan to use Lustre to replace the set of AGL UM Tier-3 NFS servers. The rough plan is the following:

  • Reinstall Lustre to use "failout" instead of "failover", clean up the missing OSTs, and rename the filesystem from "aglt2" to "umt3"
    • Use UMFS05 and UMFS16 as the initial OSSes
  • Migrate existing NFS servers into Lustre (starting from the largest in size)
    • Content of existing mount areas is moved into Lustre as follows: /atlas/dataxx -> /lustre/dataxx (see the copy sketch after this list)
    • Once content is migrated off NFS server, reconfigure server to become Lustre OSS
    • Rebalance data to include the new node
    • When sufficient space exists in Lustre to copy the contents of the "next" NFS server (NOT including UMFS16), stop new writes to that server.
  • Once all servers are migrated, decommission UMFS16 from Lustre
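
A minimal sketch of the per-area copy step in the plan above, assuming the NFS areas are visible under /atlas and Lustre is mounted at /lustre on the host doing the copy. The rsync options and the two-pass approach are illustrative, not the exact procedure used:

    # Copy one NFS area into Lustre, preserving ownership, permissions and times.
    # "dataxx" is a placeholder for the actual area name (data08, data12, ...).
    AREA=dataxx
    rsync -aH --numeric-ids /atlas/${AREA}/ /lustre/${AREA}/

    # Repeat the pass until it shows no changes, stop writes to the NFS export,
    # then do one final pass before retiring the server.
    rsync -aH --numeric-ids --delete /atlas/${AREA}/ /lustre/${AREA}/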

Reinstalling Lustre

We first needed to migrate all existing data worth saving (about 7 TB) from /lustre to /data17[c|d]. This took only about 12 hours; all data was copied by around 12:30 AM on April 20, 2010.

  • All Lustre clients were unmounted.
  • Next we shut down the MDT systems (HA setup; used the Luci web interface to 'disable' the service).
  • Unmounted /mnt/mgs on mgs.aglt2.org.
  • Unmounted all OSSes (UMFS05/UMFS16); see the command sketch after this list.
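
For reference, a command-level sketch of the shutdown sequence above. The OST mount points and the HA service name are assumptions (the MDT service was actually disabled through the Luci web interface; clusvcadm is the command-line equivalent on a Red Hat Cluster Suite setup):

    # On every Lustre client: unmount the filesystem
    umount /lustre

    # On the MDT HA pair (LMD01/LMD02): disable the HA service
    # ("lustre-mdt" is a hypothetical service name)
    clusvcadm -d lustre-mdt

    # On mgs.aglt2.org: unmount the MGS
    umount /mnt/mgs

    # On each OSS (UMFS05, UMFS16): unmount every OST
    # (/mnt/ost* is an assumed mount-point pattern)
    umount /mnt/ost*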

At this point we have stopped all existing Lustre activity in the cluster. Now we can "recreate" Lustre with the new settings.

  • On 'MGS' node, reformat with the new filesystem name
    • mkfs.lustre --fsname umt3 --mgs --reformat /dev/sdb
    • /lib/udev/vol_id /dev/sdb
      (To get new UUID for mount point)
    • Update /etc/fstab to use new UUID
  • Next, redo the MDT setup on LMD01
    •  mkfs.lustre --fsname umt3 --mgsnode=10.10.1.140@tcp0 --mgsnode=192.41.230.140@tcp1
       --reformat --mdt --failnode=10.10.1.49@tcp0,192.41.230.49@tcp1 /dev/sdd
      (NOTE: needed to restart the 'iscsi' service to be able to update the disk; it temporarily appeared as /dev/sdf)
    • Get UUID (as above) and update cluster config and /etc/fstab on BOTH LMD01 and LMD02
  • Last, set up the OSS/OSTs by reformatting
    • UMFS05 smaller OSTs:
       mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 524288"
       --mountfsoptions="stripe=160" /dev/sdb
    • UMFS16 smaller OSTs:
       mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 1048576"
       --mountfsoptions="stripe=160" /dev/sdb
    • UMFS05 larger OSTs:
       mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 629149"
       --mountfsoptions="stripe=192" /dev/sdc
    • UMFS16 larger OSTs:
       mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3  --reformat --mkfsoptions="-i 1258298"
       --mountfsoptions="stripe=192" /dev/sdc
    • Next, use tunefs.lustre to put the OSTs into failout mode (run on each OST device, e.g. /dev/sdb and /dev/sdc)
      tunefs.lustre --param failover.mode=failout /dev/sdb
    • After all OSTs are formatted, redo the /etc/fstab entries. (Use make_lustre_fstab.sh; see the fstab sketch after this list.)
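
As a sketch, the resulting /etc/fstab entries would look roughly like the lines below, assuming UUID-based mounts as on the old setup. The UUID values, the MDT/OST mount points, and the mount options are made up for illustration; make_lustre_fstab.sh presumably generates the OST lines from the UUID reported by /lib/udev/vol_id for each device:

    # mgs.aglt2.org
    UUID=11111111-2222-3333-4444-555555555555  /mnt/mgs   lustre  defaults,_netdev  0 0

    # LMD01/LMD02 (the cluster config references the same device)
    UUID=22222222-3333-4444-5555-666666666666  /mnt/mdt   lustre  defaults,_netdev  0 0

    # UMFS05/UMFS16, one line per OST device (/dev/sdb, /dev/sdc, ...)
    UUID=33333333-4444-5555-6666-777777777777  /mnt/ost0  lustre  defaults,_netdev  0 0
    UUID=44444444-5555-6666-7777-888888888888  /mnt/ost1  lustre  defaults,_netdev  0 0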

That was it. There were some strange problems with the LMD01/LMD02 nodes until I discovered that a Lustre client from the old setup still had a stale mount. Once that mount was stopped, things were OK.
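
If you run into the same thing, a quick way to spot and clear a leftover client mount (the /lustre mount point is the one used here; adjust as needed):

    # On a suspect node: list any Lustre mounts, then unmount the old one
    mount -t lustre
    umount /lustre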

-- ShawnMcKee - 20 Apr 2010