<fencedevice agent="fence_drac5" ipaddr="x.x.x.x" login="xxx" name="lmd01_drac" passwd="xxx" secure="1"/>
<fencedevice agent="fence_drac5" ipaddr="x.x.x.x" login="xxx" name="lmd02_drac" passwd="xxx" secure="1"/>

Only make this edit on one host, and increment the "config_version" attribute at the top of the file. Then propagate the change:
ccs_tool update /etc/cluster/cluster.conf

The next time you look in the web interface you will see that it no longer displays the DRAC info. It does display a "remove this device" button, but it apparently does not know how to display the device details. I recommend testing that fencing works before continuing. There is a menu of tasks at the top right of the node configuration screen; one of them is "Fence this node". Try this while watching log messages on the node that you are not fencing.
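You can also trigger a fence from the command line rather than the web interface. A minimal sketch, assuming the fence_node utility from the cman/fence packages is installed (node names must match the clusternode names in cluster.conf):

# run on the surviving node (here lmd01); swap names to test the other direction
tail -f /var/log/messages &    # watch for fence_drac5 success/failure messages
fence_node lmd02.local         # fences the named node using its configured methods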
APC Fencing

This is straightforward. Once again, on the node configuration page choose to add a Backup Fencing Method and select APC power strip from the choices. Fill in the information for one strip; the outlet number goes in the "Port" field. Then choose "Add an instance" and add the other power strip (if you have a 2-supply machine). Finally, click "Update fence device properties". No further manual editing should be needed.
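Before relying on the backup method, it is worth confirming that each power strip is reachable and that the outlet numbers are correct. A sketch using the fence_apc agent directly (substitute the real IP, login, and password; check fence_apc -h on your version for the exact option set):

fence_apc -a x.x.x.x -l xxx -p xxx -n 16 -o status    # lmd01 outlet on each strip is 16
fence_apc -a x.x.x.x -l xxx -p xxx -n 19 -o status    # lmd02 outlet on each strip is 19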
This should also be tested by disabling network access to the DRAC card and trying again to fence each node.

In order to fail over a service, the cluster has to have a way to check the status of the service and its dependencies.
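For Lustre, the obvious status source is /proc/fs/lustre/health_check, which is exactly what the modified fs.sh and the init script below consult. On a healthy server:

cat /proc/fs/lustre/health_check
healthy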
TO-DO FIRST

Perhaps you can do the final configuration before making these customizations, but I think it will go much more smoothly if you take care of some customizations needed by Lustre first.

NOTE: It is probably redundant to have both a service and a filesystem resource, given the modifications made to the filesystem resource script. They perform essentially the same checks, though a managed filesystem resource will also verify that the mount options are right when it mounts a filesystem; so could the init script. I recommend staying away from modifying Red Hat packaged scripts, since they are likely to be overwritten in updates. In that case, skip using a filesystem resource altogether and simply write checks into the init script which return "1" under any failure.

Here is the change made to /usr/share/cluster/fs.sh to teach it about Lustre:
root@lmd01 /usr/share/cluster# diff -urN fs.sh.28Mar2010 fs.sh
--- fs.sh.28Mar2010 2010-03-28 10:37:32.000000000 -0400
+++ fs.sh 2010-03-28 13:38:34.000000000 -0400
@@ -384,7 +384,7 @@
[ -z "$OCF_RESKEY_fstype" ] && return 0
case $OCF_RESKEY_fstype in
- ext2|ext3|jfs|xfs|reiserfs|vfat|tmpfs|vxfs)
+ ext2|ext3|jfs|xfs|reiserfs|vfat|tmpfs|vxfs|lustre)
return 0
;;
*)
@@ -505,6 +505,23 @@
;;
esac
;;
+ lustre)
+ case $o in
+ flock|localflock|noflock|user_xattr|nouser_xattr)
+ continue
+ ;;
+ acl|noacl|nosvc|nomgs|exclude=*|abort_recov)
+ continue
+ ;;
+ md_stripe_cache_size|recovery_time_soft=*)
+ continue
+ ;;
+ recovery_time_hard=*)
+ continue
+ ;;
+ esac
+ ;;
+
esac
echo Option $o not supported for $OCF_RESKEY_fstype
@@ -643,7 +660,19 @@
fi
[ $OCF_CHECK_LEVEL -lt 10 ] && return $YES
-
+
+#+SPM March 28, 2010 Add Lustre check/test
+ fsmnt=`grep $mount_point /proc/mounts | awk '{print $3}'`
+ if [ $fsmnt = "lustre" ]; then
+ ocf_log debug "fs (isAlive): Found Lustre filesystem"
+ lstatus=`cat /proc/fs/lustre/health_check`
+ if [ $lstatus = "healthy" ]; then
+ return $YES
+ else
+ return $NO
+ fi
+ fi
+
# depth 10 test (read test)
ls $mount_point > /dev/null 2> /dev/null
if [ $? -ne 0 ]; then
@@ -999,6 +1028,7 @@
case "$fstype" in
reiserfs) typeset fsck_needed="" ;;
ext3) typeset fsck_needed="" ;;
+ lustre) typeset fsck_needed="" ;;
jfs) typeset fsck_needed="" ;;
xfs) typeset fsck_needed="" ;;
ext2) typeset fsck_needed=yes ;;
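To exercise the patched agent by hand before wiring it into a service, you can rely on the fact that rgmanager resource agents read their parameters from OCF_RESKEY_* environment variables and take the operation as their first argument. A rough sketch only (parameter names assumed from the stock fs.sh; device and mountpoint taken from the final configuration below):

export OCF_RESKEY_name=LMD
export OCF_RESKEY_device=UUID=977bdf9c-0645-465e-9e72-1a1d1e5f93ca
export OCF_RESKEY_mountpoint=/mnt/mdt
export OCF_RESKEY_fstype=lustre
export OCF_CHECK_LEVEL=10      # depth >= 10 triggers the health_check test added above
/usr/share/cluster/fs.sh status ; echo "exit code: $?"

With the resource agent squared away, the init script below (adapted from the GFS2 init script) handles mounting and unmounting the MDT. It is installed as /etc/init.d/MountMDT and must exist on both nodes for relocation to work: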
#!/bin/bash
#
# chkconfig: - 26 74
# description: mount/unmount lustre MDT filesystem in /etc/fstab
# copied from GFS2 init script
#
### BEGIN INIT INFO
# Provides:
### END INIT INFO

. /etc/init.d/functions
[ -f /etc/sysconfig/cluster ] && . /etc/sysconfig/cluster

#
# This script's behavior is modeled closely after the netfs script.
#
# lustre filesystems configured in /etc/fstab, and those currently mounted
LUSTREFSTAB=$(LC_ALL=C awk '!/^#/ && $3 == "lustre" { print $2 }' /etc/fstab)
LUSTREMTAB=$(LC_ALL=C awk '!/^#/ && $3 == "lustre" && $2 != "/" { print $2 }' /proc/mounts)
# non-empty if any Lustre target on this node reports NOT HEALTHY
NOTHEALTHY=`grep -o "NOT HEALTHY" /proc/fs/lustre/health_check 2>/dev/null`

# mount the MDT if an fstab entry exists (the mount point is hard-coded to /mnt/mdt)
function mount_lustre {
	if [ -n "$LUSTREFSTAB" ]
	then
		action $"Mounting LUSTRE filesystem: " mount /mnt/mdt
	fi
}

# See how we were called.
case "$1" in
start)
	mount_lustre
	if [ $? -ne 0 ]
	then
		echo "Mount failed, restarting iscsi service and retrying"
		/sbin/service iscsi restart
		sleep 10
		mount_lustre
	fi
	touch /var/lock/subsys/lustre
	;;
stop)
	if [ -n "$LUSTREMTAB" ]
	then
		sig=
		retry=6
		remaining=`LC_ALL=C awk '!/^#/ && $3 == "lustre" && $2 != "/" {print $2}' /proc/mounts`
		while [ -n "$remaining" -a "$retry" -gt 0 ]
		do
			action $"Unmounting LUSTRE filesystems: " umount /mnt/mdt
			# on the final attempt, fall back to a lazy unmount
			if [ $retry -eq 1 ]
			then
				action $"Unmounting lustre filesystems (lazy): " umount -l /mnt/mdt
				break
			fi
			sleep 2
			remaining=`LC_ALL=C awk '!/^#/ && $3 == "lustre" && $2 != "/" {print $2}' /proc/mounts`
			[ -z "$remaining" ] && break
			# kill anything still using the filesystem; SIGKILL on later passes
			/sbin/fuser -k -m $sig $remaining &> /dev/null
			sleep 10
			retry=$(($retry - 1))
			sig=-9
		done
	fi
	rm -f /var/lock/subsys/lustre
	;;
status)
	if [ -f /proc/mounts ]
	then
		[ -n "$LUSTREFSTAB" ] && {
			echo $"Configured lustre mountpoints: "
			for fs in $LUSTREFSTAB; do echo $fs ; done
		}
		[ -n "$LUSTREMTAB" ] && {
			echo $"Active lustre mountpoints: "
			for fs in $LUSTREMTAB; do echo $fs ; done
		}
		# report failure to rgmanager if Lustre says it is unhealthy
		if [ "$NOTHEALTHY" == "NOT HEALTHY" ]
		then
			echo "Filesystems show 'not healthy'"
			exit 1
		fi
	else
		echo "/proc filesystem unavailable"
	fi
	;;
restart)
	$0 stop
	$0 start
	;;
reload)
	$0 start
	;;
*)
	echo $"Usage: $0 {start|stop|restart|reload|status}"
	exit 1
esac
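Install the script on both nodes and give it a quick manual run before handing it over to the cluster. It should not be enabled with chkconfig, since rgmanager starts and stops it through the MountMDT service defined below:

chmod 755 /etc/init.d/MountMDT
service MountMDT start
service MountMDT status ; echo "exit: $?"
service MountMDT stop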
Final Configuration:
/etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="LMD" config_version="45" name="LMD">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="lmd02.local" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="lmd02_drac"/>
</method>
<method name="2">
<device name="lmd02_socket1" option="off" port="19"/>
<device name="lmd02_socket2" option="off" port="19"/>
<device name="lmd02_socket1" option="on" port="19"/>
<device name="lmd02_socket2" option="on" port="19"/>
</method>
</fence>
</clusternode>
<clusternode name="lmd01.local" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="lmd01_drac"/>
</method>
<method name="2">
<device name="lmd01_socket1" option="off" port="16"/>
<device name="lmd01_socket2" option="off" port="16"/>
<device name="lmd01_socket1" option="on" port="16"/>
<device name="lmd01_socket2" option="on" port="16"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<rm>
<failoverdomains/>
<resources/>
<service autostart="1" exclusive="1" name="MountMDT" recovery="relocate">
<script file="/etc/init.d/MountMDT" name="MountMDT-init">
<fs device="UUID=977bdf9c-0645-465e-9e72-1a1d1e5f93ca" force_fsck="0" force_unmount="1" fsid="9659" fstype="lustre" mountpoint="/mnt/mdt" name="LMD" self_fence="1"/>
</script>
</service>
</rm>
<fencedevices>
<fencedevice agent="fence_drac5" ipaddr="x.x.x.x" login="" name="lmd01_drac" passwd="" secure="1"/>
<fencedevice agent="fence_drac5" ipaddr="x.x.x.x" login="" name="lmd02_drac" passwd="" secure="1"/>
<fencedevice agent="fence_apc" ipaddr="x.x.x.x" login="" name="lmd01_socket1" passwd=""/>
<fencedevice agent="fence_apc" ipaddr="x.x.x.x" login="" name="lmd01_socket2" passwd=""/>
<fencedevice agent="fence_apc" ipaddr="x.x.x.x" login="" name="lmd02_socket1" passwd=""/>
<fencedevice agent="fence_apc" ipaddr="x.x.x.x" login="" name="lmd02_socket2" passwd=""/>
</fencedevices>
</cluster>
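After any edit, remember to bump config_version and push the file out with ccs_tool as above. The usual rgmanager tools can then confirm that the service is healthy and that failover works (names as configured above):

ccs_tool update /etc/cluster/cluster.conf   # propagate the new config to the other node
cman_tool version                           # both nodes should report the same config version
clustat                                     # the MountMDT service should show as started
clusvcadm -r MountMDT                       # relocate the service to the other node to test failover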
-- BenMeekhof - 06 Apr 2010