LustreReinstall < AGLT2


Lustre Reinstall Notes

After testing Lustre "in production" (mostly tests by Tiesheng), we have decided to go ahead with our plans to use Lustre to replace the set of AGL UM Tier-3 NFS servers. The rough plan is the following:

- Reinstall Lustre to use "failout" instead of "failover", clean up the missing OSTs, and rename the filesystem to "umt3" instead of "aglt2".
  - Use UMFS05 and UMFS16 as the initial OSSes.
- Migrate the existing NFS servers into Lustre (starting from the largest in size).
  - Move the contents of the existing mount areas into Lustre as follows: /atlas/dataxx -> /lustre/dataxx
  - Once the content is migrated off an NFS server, reconfigure that server as a Lustre OSS.
  - Rebalance data to include the new node.
  - When sufficient space exists in Lustre (not counting UMFS16) to copy the "next" NFS server, remove it from being written to.
- Once all servers are migrated, decommission UMFS16 from Lustre.

Reinstalling Lustre

We first migrated all existing data that needed saving (about 7 TB) from /lustre to /data17[c|d]. This took only about 12 hours; all data was copied by around 12:30 AM on April 20, 2010.
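As a rough planning figure for the later NFS-to-Lustre migrations, the average copy rate implied by these numbers can be worked out. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and that the whole 12 hours was spent copying:

```shell
#!/bin/sh
# Estimate average copy throughput in MB/s (10^6 bytes per second).
# ASSUMPTION: decimal units, 1 TB = 10^12 bytes; integer arithmetic.
copy_rate_mbs() {
    tb=$1      # data copied, in TB
    hours=$2   # elapsed wall-clock time, in hours
    echo $(( tb * 1000000000000 / (hours * 3600) / 1000000 ))
}

copy_rate_mbs 7 12   # prints 162
```

That is roughly 160 MB/s, or about 1.7 hours per TB if later copies run at the same rate.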

- All Lustre clients were dismounted.
- Next, we shut down the MDT systems (HA setup; use the Luci web interface to 'disable' the service).
- Dismount /mnt/mgs on mgs.aglt2.org.
- Dismount the OSTs on all OSSes (UMFS05/UMFS16).
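The shutdown order above (clients, then the MDT service, then the MGS, then the OSTs) could be scripted so that it is repeatable. This is a sketch only: the host names `lmd01`, `umfs05` and `umfs16`, the rgmanager service name `lustre-mdt`, and the OST mount points `/mnt/ost0` and `/mnt/ost1` are hypothetical, and `clusvcadm -d` is assumed to be the command-line counterpart of disabling the service in Luci. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the Lustre shutdown order. HYPOTHETICAL: host names, the
# MDT_SERVICE name and the /mnt/ost* mount points. DRY_RUN=1 only prints.
DRY_RUN=${DRY_RUN:-1}
CLIENTS=${CLIENTS:-}                    # space-separated client host names
MDT_NODE=${MDT_NODE:-lmd01}             # hypothetical HA cluster member
MDT_SERVICE=${MDT_SERVICE:-lustre-mdt}  # hypothetical rgmanager service
OSSES=${OSSES:-"umfs05 umfs16"}         # hypothetical OSS host names

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

shutdown_lustre() {
    # 1. Unmount every client so nothing holds the filesystem open.
    for c in $CLIENTS; do run ssh "$c" umount /lustre || return 1; done
    # 2. Stop the MDT by disabling its HA service.
    run ssh "$MDT_NODE" clusvcadm -d "$MDT_SERVICE" || return 1
    # 3. Stop the MGS.
    run ssh mgs.aglt2.org umount /mnt/mgs || return 1
    # 4. Stop the OSTs on each OSS.
    for oss in $OSSES; do
        run ssh "$oss" umount /mnt/ost0 /mnt/ost1 || return 1
    done
}
```

Run it with `DRY_RUN=0` only after the printed commands have been reviewed against the real host and service names.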

At this point we have stopped all existing Lustre activity in the cluster. Now we can "recreate" Lustre with the new settings.

On the MGS node:
- Reformat with the new filesystem name:
```
mkfs.lustre --fsname umt3 --mgs --reformat /dev/sdb
```
- Get the new UUID for the mount point:
```
/lib/udev/vol_id /dev/sdb
```
- Update /etc/fstab to use the new UUID.
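The UUID lookup and the fstab update can be combined into one step. A minimal sketch, assuming the default `vol_id` output of `KEY=VALUE` lines (including `ID_FS_UUID=`); the function name `fstab_line_from_volid` is hypothetical, and the mount options default to `defaults` and should be adjusted to match the old entry.

```shell
#!/bin/sh
# Turn /lib/udev/vol_id output (read on stdin) into an /etc/fstab line.
# $1 = mount point, $2 = mount options (ASSUMPTION: "defaults" if omitted).
fstab_line_from_volid() {
    mountpoint=$1
    opts=${2:-defaults}
    # Keep only the value of ID_FS_UUID=; ID_FS_UUID_ENC= does not match.
    uuid=$(sed -n 's/^ID_FS_UUID=//p')
    if [ -z "$uuid" ]; then
        echo "no ID_FS_UUID in vol_id output" >&2
        return 1
    fi
    printf 'UUID=%s %s lustre %s 0 0\n' "$uuid" "$mountpoint" "$opts"
}
```

On the MGS node this would be used as `/lib/udev/vol_id /dev/sdb | fstab_line_from_volid /mnt/mgs`, with the output replacing the old /mnt/mgs line in /etc/fstab.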
Next, redo the MDT setup on LMD01:
- Reformat the MDT:
```
mkfs.lustre --fsname umt3 --mgsnode=10.10.1.140@tcp0 --mgsnode=192.41.230.140@tcp1 \
  --reformat --mdt --failnode=10.10.1.49@tcp0,192.41.230.49@tcp1 /dev/sdd
```
  (NOTE: we needed to restart 'iscsi' to be able to update the disk; it temporarily became /dev/sdf.)
- Get the UUID (as above) and update the cluster config and /etc/fstab on BOTH LMD01 and LMD02.
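Because the MDT can run on either LMD01 or LMD02, a stale UUID on one node would only show up at failover time. A small consistency check, sketched under the assumption that copies of both nodes' /etc/fstab have been fetched locally and that the MDT mount point is `/mnt/mdt` (hypothetical). It compares fstab only; the cluster configuration would need the same comparison.

```shell
#!/bin/sh
# Print the UUID used for mount point $2 in fstab file $1 (if any).
fstab_uuid() {
    awk -v mp="$2" '$2 == mp && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Succeed only if both fstab files mount $3 from the same, non-empty UUID.
same_mdt_uuid() {
    a=$(fstab_uuid "$1" "$3")
    b=$(fstab_uuid "$2" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, after `scp lmd01:/etc/fstab fstab.lmd01` and `scp lmd02:/etc/fstab fstab.lmd02`, run `same_mdt_uuid fstab.lmd01 fstab.lmd02 /mnt/mdt && echo consistent`.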

Last, set up the OSTs on each OSS by reformatting them:

UMFS05 smaller OSTs:

```
mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 524288" \
  --mountfsoptions="stripe=160" /dev/sdb
```

UMFS16 smaller OSTs:

```
mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 1048576" \
  --mountfsoptions="stripe=160" /dev/sdb
```

UMFS05 larger OSTs:

```
mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 629149" \
  --mountfsoptions="stripe=192" /dev/sdc
```

UMFS16 larger OSTs:

```
mkfs.lustre --ost --mgsnode=10.10.1.140@tcp0,192.41.230.140@tcp1 --fsname=umt3 --reformat --mkfsoptions="-i 1258298" \
  --mountfsoptions="stripe=192" /dev/sdc
```
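The two tuning options differ between the OST classes, and their effect is easy to estimate. `-i` in `--mkfsoptions` is the bytes-per-inode ratio passed to mkfs, so it fixes how many inodes (and therefore Lustre objects, one inode each) an OST can hold; `stripe=` is the ldiskfs/ext4 mount option giving the RAID stripe width in filesystem blocks (data disks times chunk size). A sketch, assuming 4 KiB blocks; the 10 TiB OST size and the 10 or 12 data disks with 64 KiB chunks are hypothetical layouts that happen to reproduce 160 and 192, not the actual UMFS05/UMFS16 arrays.

```shell
#!/bin/sh
# Back-of-the-envelope numbers for the OST format options above.
# ASSUMPTIONS: 4 KiB ldiskfs blocks; the OST size and RAID geometry in
# the examples are hypothetical, not taken from UMFS05/UMFS16.

# Inodes created by mkfs for an OST of size_bytes at "-i bytes_per_inode".
# Each Lustre object stored on the OST consumes one of these inodes.
ost_inodes() {
    size_bytes=$1; bytes_per_inode=$2
    echo $(( size_bytes / bytes_per_inode ))
}

# ldiskfs "stripe=" value: RAID data disks * chunk size, in 4 KiB blocks.
stripe_blocks() {
    data_disks=$1; chunk_kib=$2
    echo $(( data_disks * chunk_kib / 4 ))
}

TIB=1099511627776                  # 2^40 bytes
ost_inodes $((10 * TIB)) 524288    # -> 20971520
ost_inodes $((10 * TIB)) 1048576   # -> 10485760
stripe_blocks 10 64                # -> 160
stripe_blocks 12 64                # -> 192
```

Doubling `-i`, as on UMFS16 relative to UMFS05, halves the number of objects an OST of a given size can hold.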

Next, use tunefs.lustre to put the OSTs in failout mode:

```
tunefs.lustre --param failover.mode=failout /dev/sdb
```
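The command above names only /dev/sdb. Assuming the parameter is meant for every OST formatted above (/dev/sdb and /dev/sdc on both UMFS05 and UMFS16) and that the OSTs are unmounted, a per-OSS loop might look like this; it prints the commands unless `DRY_RUN=0`.

```shell
#!/bin/sh
# Set failover.mode=failout on every OST device of this OSS.
# ASSUMPTION: the OSTs are /dev/sdb and /dev/sdc (as formatted above) and
# are not mounted. With DRY_RUN=1 (the default) commands are only printed.
DRY_RUN=${DRY_RUN:-1}
OST_DEVICES=${OST_DEVICES:-"/dev/sdb /dev/sdc"}

set_failout() {
    for dev in $OST_DEVICES; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "tunefs.lustre --param failover.mode=failout $dev"
        else
            tunefs.lustre --param failover.mode=failout "$dev" || return 1
        fi
    done
}
```

Afterwards, `tunefs.lustre --dryrun /dev/sdb` should print the stored parameters, including `failover.mode=failout`, without modifying the disk.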

After all OSTs are formatted, redo the /etc/fstab entries (using make_lustre_fstab.sh).
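make_lustre_fstab.sh itself is not reproduced on this page. As a hypothetical stand-in, assuming the OSTs are mounted by UUID at mount points such as `/mnt/umt3-sdb` (names invented for illustration) and reusing the `vol_id` lookup from the MGS step:

```shell
#!/bin/sh
# HYPOTHETICAL stand-in for make_lustre_fstab.sh: one fstab line per OST
# device, mounted by UUID. Mount-point names and options are assumptions.
FSNAME=${FSNAME:-umt3}
OST_DEVICES=${OST_DEVICES:-"/dev/sdb /dev/sdc"}

# Print the filesystem UUID of a device from vol_id's KEY=VALUE output.
dev_uuid() {
    /lib/udev/vol_id "$1" | sed -n 's/^ID_FS_UUID=//p'
}

make_ost_fstab() {
    for dev in $OST_DEVICES; do
        uuid=$(dev_uuid "$dev")
        if [ -z "$uuid" ]; then
            echo "no UUID found for $dev" >&2
            return 1
        fi
        # ${dev##*/} strips the directory part: /dev/sdb -> sdb
        printf 'UUID=%s /mnt/%s-%s lustre defaults 0 0\n' \
            "$uuid" "$FSNAME" "${dev##*/}"
    done
}
```

Review the output, create the mount points, and then replace the old OST lines in /etc/fstab with it.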

That was it. There were some strange problems with the LMD01/02 nodes until I discovered that a Lustre client from the old setup still had the old filesystem mounted. Once I removed that mount, things were OK.
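One way to find such leftover clients is to look for Lustre mounts of the old filesystem name. Lustre client mounts appear in /proc/mounts with a device of the form `<MGS NID(s)>:/<fsname>` and filesystem type `lustre`; a sketch, with the function name `stale_lustre_mounts` being hypothetical:

```shell
#!/bin/sh
# Read /proc/mounts-format lines on stdin and print the mount points of
# Lustre client mounts whose device ends in ":/<old fsname>".
stale_lustre_mounts() {
    old_fs=${1:-aglt2}
    awk -v name="$old_fs" '$3 == "lustre" && $1 ~ (":/" name "$") { print $2 }'
}
```

On each client, `stale_lustre_mounts aglt2 < /proc/mounts` prints any mount points still attached to the old filesystem; unmount those. Server-side target mounts (device /dev/...) are not matched.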

-- ShawnMcKee - 20 Apr 2010

Topic revision: r4 - 21 Apr 2010, ShawnMcKee


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.