Operational notes

Setup zfs filesystems and quotas for users

Mkdir as usual:
mkdir /atlas/data19/bmeekhof
chown bmeekhof:umatlas /atlas/data19/bmeekhof
Set quotas:
 zfs set userquota@bmeekhof=1T pool1/data19
View quotas:
[root@thor01 /atlas/data19]# zfs userspace pool1/data19
TYPE        NAME       USED  QUOTA
POSIX User  bmeekhof  2.98K     1T
POSIX User  diehl     2.98K     1T
POSIX User  hsong     2.98K   100G
POSIX User  root      5.97K   none
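
The three steps above can be wrapped in a small helper. This is only a sketch: the function names add_user_dir and run, and the DRY_RUN, FS, MNT and GROUP variables, are made up for illustration and are not part of the setup on thor01. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create a user directory on the shared filesystem and set a
# ZFS user quota. Hypothetical helper, not an existing script.
# DRY_RUN defaults to 1, so commands are printed; set DRY_RUN=0 to run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

add_user_dir() {
    user=$1
    quota=$2
    # Defaults are the values used in the notes above.
    fs=${FS:-pool1/data19}
    mnt=${MNT:-/atlas/data19}
    group=${GROUP:-umatlas}

    run mkdir "$mnt/$user" &&
    run chown "$user:$group" "$mnt/$user" &&
    run zfs set "userquota@$user=$quota" "$fs"
}
```

Calling "add_user_dir hsong 100G" prints the mkdir, chown and zfs set commands for that user, so they can be reviewed before running with DRY_RUN=0.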

Just a note, not procedure: since we generally create a directory for each user, it might also seem to make sense to set up a new volume per user. This is easy:
 zfs create -o quota=1T -o reservation=1T pool1/data19/bmeekhof
The problem is that permissions set locally on that volume do not propagate to what we see when we NFS-mount /atlas/data19; as root, one has to set permissions via the NFS mount too. This might have something to do with ZFS ACL inheritance or with how the NFS volume is exported. We don't export each ZFS filesystem individually. ZFS could do this, but the FreeBSD export syntax makes it impossible to set multiple networks in the "sharenfs='options'" directive. Normally one would apply a sharenfs option to the top level of the pool, and every ZFS filesystem created below it would inherit that option and be shared individually. Instead we have to do exports with /etc/exports and sharenfs='off' on the ZFS filesystem. All this was way too much trouble, so we went with directories and per-user quotas on the volume. That's pretty much how our Linux NFS systems work too.

Setup Notes

  • All startup configuration is in /etc/rc.conf, and all possible directives are listed in /etc/defaults/rc.conf.
  • Non-core packages added with pkg_add or compiled from /usr/ports are based in /usr/local. Pretty much everything not part of the kernel or system userland is non-core. Use pkg_info to see packages.
  • Though extra packages install init scripts in /usr/local/etc/rc.d/ instead of /etc/rc.d/, the procedure to enable them at startup is the same: put 'scriptname_enable="YES"' in /etc/rc.conf, followed by 'scriptname_flags=' to set flags. When in doubt, check /etc/defaults/rc.conf.
  • The system is running on a custom kernel with support only for the things we need. New hardware may require building a new kernel.
  • Handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook

Considerations for CF based OS drive

The OS is FreeBSD Release 9, installed on the flash drive. As we did with the previous Solaris installation, we make small volumes in the ZFS pool for areas with heavy, regular writing (/var) and a tmpfs filesystem for /tmp.

A 2G tmpfs is defined in /etc/fstab with the entry below. The kernel issues this warning, so if things go terribly wrong, make /tmp a filesystem in the ZFS pool instead: "WARNING: TMPFS is considered to be a highly experimental feature in FreeBSD".

tmpfs           /tmp            tmpfs   rw,size=2147483648              2       2

***** NOTE: Things went terribly wrong one day during an iozone benchmark which was using a lot of memory (> 2G was still available). The /tmp filesystem suddenly showed as 100% full, and nothing larger than the empty file that "touch" creates could be written there. A /tmp ZFS mount was created instead and the above entry commented out of fstab.

Create a 5G volume for /var (after pool setup, see below). I didn't bother doing the move in single-user mode. One empty directory named "empty" didn't move.
 zfs create -o mountpoint=/newvar pool1/var
 mv /var/* /newvar/
 zfs set mountpoint=/var pool1/var
 zfs set reservation=5G pool1/var
 zfs set quota=5G pool1/var

Build new kernel

This is a simple process documented at http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-building.html

At a minimum, "device mxge" and "device firmware" must be added for the Myricom card, which is not supported by default (kldload mxge didn't do it either). I also commented out everything else not needed. However, I left in the drivers for Intel X520 (ixgbe) and MegaRAID SAS (mfi) in case we ever put one in there.

Synopsis:
  1. Copy /usr/src/sys/amd64/conf/GENERIC to some name (I used THOR01)
  2. Edit configfile /usr/src/sys/amd64/conf/THOR01
  3. cd /usr/src; make buildkernel KERNCONF=THOR01
  4. make installkernel KERNCONF=THOR01
  5. reboot
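
Before step 3 it may be worth confirming that the edited config still contains the required device lines, since commenting out unneeded drivers makes it easy to lose one. A minimal sketch follows; check_kernconf is a hypothetical helper name, not a FreeBSD tool.

```shell
#!/bin/sh
# Sketch: report required "device" lines missing from a kernel config file.
# check_kernconf is a made-up helper for illustration.
# Usage: check_kernconf CONFIGFILE DEVICE...
# Returns 0 when every device is present and uncommented, 1 otherwise.

check_kernconf() {
    conf=$1
    shift
    missing=0
    for dev in "$@"; do
        # Match an uncommented "device <name>" line; tabs or spaces allowed,
        # optionally followed by whitespace or a trailing comment.
        if ! grep -Eq "^[[:space:]]*device[[:space:]]+${dev}([[:space:]]|#|\$)" "$conf"; then
            echo "missing: device $dev"
            missing=1
        fi
    done
    return "$missing"
}
```

For the config described above this would be called as "check_kernconf /usr/src/sys/amd64/conf/THOR01 mxge firmware".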

Network configuration

These lines in /etc/rc.conf create an aggregate interface natively in vlan 4001 also carrying vlan 4010. There is also a static route defined so ganglia multicast goes out on the proper interface.

ifconfig_nfe0="mtu 9000 up"
ifconfig_mxge0="mtu 9000 up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto failover laggport mxge0 laggport nfe0"
vlans_lagg0="4010"
ipv4_addrs_lagg0="192.41.230.181/23"
ipv4_addrs_lagg0_4010="10.10.1.181/22"
defaultrouter="192.41.230.1"
static_routes="gmond"  
route_gmond="-net 224.0.0.0/4 -interface lagg0.4010"

Resulting in:
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 00:60:dd:47:7d:64
        inet 192.41.230.181 netmask 0xfffffe00 broadcast 192.41.231.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: nfe0 flags=0<>
        laggport: mxge0 flags=5<MASTER,ACTIVE>
lagg0.4010: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=103<RXCSUM,TXCSUM,TSO4>
        ether 00:60:dd:47:7d:64
        inet 10.10.1.181 netmask 0xfffffc00 broadcast 10.10.3.255
        inet6 fe80::260:ddff:fe47:7d64%lagg0.4010 prefixlen 64 scopeid 0xa 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 4010 parent interface: lagg0
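
ifconfig prints netmasks in hex while rc.conf uses prefix lengths, so a quick conversion helps when comparing the two. The sketch below simply counts the set bits in the mask; mask_to_prefix is a made-up helper name, and it does not check that the bits are contiguous.

```shell
#!/bin/sh
# Sketch: convert a hex netmask as printed by ifconfig (e.g. 0xfffffe00)
# to a prefix length by counting the set bits.
# Hypothetical helper; assumes a valid, contiguous IPv4 netmask.

mask_to_prefix() {
    mask=$(( $1 ))      # shell arithmetic accepts 0x... hex constants
    bits=0
    while [ "$mask" -ne 0 ]; do
        bits=$(( bits + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$bits"
}

mask_to_prefix 0xfffffe00   # lagg0:      prints 23, matching 192.41.230.181/23
mask_to_prefix 0xfffffc00   # lagg0.4010: prints 22, matching 10.10.1.181/22
```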

Add some packages

The pkg_add command will automatically download packages if given the -r flag. Set http_proxy to cache4.local:3128 to get to most download sites (should be set by default at login). Besides those noted below with additional directions I added bash, nano and whatever else came up that I wanted for convenience.

Ganglia

There is a ganglia-monitor-core package, but I chose to "make install" in /usr/ports/sysutils/ganglia-monitor-core because it gave the option to not include gmetad and thus not install a variety of unnecessary dependencies. The default config is in /usr/local/etc/gmond.conf. Modified cfengine accordingly.

Cfengine

pkg_add -r cfengine
Add cfexecd_enable="YES" and cfservd_enable="YES" to /etc/rc.conf.
I manually ran /usr/local/sbin/cfkey prior to the first run, and manually did "ln -s /usr/local/sbin/cf* /var/cfengine/bin".

Kerberos and OpenAFS

I didn't see precompiled krb5 or openafs packages available via pkg_add. Compiling from /usr/ports still results in a package managed by the pkg_ tools. KRB5_HOME must be set to /usr so that the existing libs in /usr/lib are overwritten by the package; otherwise the resulting binaries pick up some stock libs in /usr/lib and don't work. Procedure:
export KRB5_HOME=/usr
cd /usr/ports/security/krb5
make install

cd /usr/ports/net/openafs
make install
You'll have to answer a few questions. Defaults are fine. I did not build openafs with fuse support but that might be worth trying someday.

Net-snmp

FreeBSD's bsnmpd is nothing like net-snmp, and its config is not compatible at all. I installed net-snmp from /usr/ports/net-mgmt/net-snmp because the precompiled net-snmp installed with pkg_add -r had some perl library issues. I am pretty sure this is because perl had been compiled and installed out of ports as well, while the net-snmp package was probably linked against the precompiled perl package. Possibly it all could have been avoided by installing perl with pkg_add prior to compiling krb5 instead of letting the ports system compile one for me. I'm assuming the package is not as new as what the ports system downloads at compile time.

ZFS pool setup

This isn't really the recommended setup, but it has worked for us and is a fair compromise between the storage devoted to parity and overall performance. The recommendation is raidz groups of 3-9 disks, as in the alternative example below. If performance of this system becomes an issue, switch to the alternative configuration.

zpool create pool1 \
raidz2 /dev/da0 /dev/da8 /dev/da16 /dev/da24 /dev/da32 /dev/da40 /dev/da1 /dev/da9 /dev/da17 /dev/da25 /dev/da33 /dev/da41 \
raidz2 /dev/da2 /dev/da10 /dev/da18 /dev/da26 /dev/da34 /dev/da42 /dev/da3 /dev/da11 /dev/da19 /dev/da27 /dev/da35 /dev/da43 \
raidz2 /dev/da4 /dev/da12 /dev/da20 /dev/da28 /dev/da36 /dev/da44 /dev/da5 /dev/da13 /dev/da21 /dev/da29 /dev/da37 /dev/da45 \
raidz2 /dev/da6 /dev/da14 /dev/da22 /dev/da30 /dev/da38 /dev/da46 /dev/da7 /dev/da15 /dev/da23 /dev/da31 /dev/da39 /dev/da47
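
The disk numbering in the raidz2 command follows a regular pattern, so the member lists can be regenerated rather than typed by hand. The sketch below reproduces that pattern only; the reason for the ordering is not recorded in these notes, and the function names are made up. It prints the command for review and does not run zpool.

```shell
#!/bin/sh
# Sketch: regenerate the raidz2 member lists used in the zpool create above.
# Assumes 48 disks named da0..da47. vdev N (0-3) takes disk numbers
# 2N and 2N+1, each stepped by 8 up to +40, giving 12 disks per vdev.

vdev_disks() {
    list=""
    for base in $(( $1 * 2 )) $(( $1 * 2 + 1 )); do
        off=0
        while [ "$off" -le 40 ]; do
            list="$list /dev/da$(( base + off ))"
            off=$(( off + 8 ))
        done
    done
    echo "raidz2$list"
}

# Print (not run) the full command, one vdev per continuation line.
print_zpool_create() {
    printf 'zpool create pool1'
    for n in 0 1 2 3; do
        printf ' \\\n%s' "$(vdev_disks "$n")"
    done
    printf '\n'
}

print_zpool_create
```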

Alternative configuration (two of the groups have only five disks because da42 and da5 are held back as spares):
zpool create pool1 \
raidz /dev/da0 /dev/da8 /dev/da16 /dev/da24 /dev/da32 /dev/da40 \
raidz /dev/da1 /dev/da9 /dev/da17 /dev/da25 /dev/da33 /dev/da41 \
raidz /dev/da2 /dev/da10 /dev/da18 /dev/da26 /dev/da34  \
raidz /dev/da3 /dev/da11 /dev/da19 /dev/da27 /dev/da35 /dev/da43 \
raidz /dev/da4 /dev/da12 /dev/da20 /dev/da28 /dev/da36 /dev/da44  \
raidz /dev/da13 /dev/da21 /dev/da29 /dev/da37 /dev/da45 \
raidz /dev/da6 /dev/da14 /dev/da22 /dev/da30 /dev/da38 /dev/da46  \
raidz /dev/da7 /dev/da15 /dev/da23 /dev/da31 /dev/da39 /dev/da47
zpool add pool1 spare /dev/da42 /dev/da5

NFS setup

Create volume:
zfs create -o mountpoint=/atlas/data19 pool1/data19
The exports format is a little different. No host wildcards are allowed, and you can't mix -network with hostnames on the same line. There might be a way to do this better with /etc/netgroup.

/atlas/data19 -maproot=root -network 141.211.101.0/24
/atlas/data19 -maproot=root -network 141.211.43.96/27
/atlas/data19  -maproot=root -network 10.0.0.0/8
/atlas/data19  -maproot=root  -network 192.41.230.0/23
/atlas/data19  -maproot=root  -network 192.41.236.0/23
/atlas/data19  -maproot=root  physttd0nt04.physics.lsa.umich.edu \
                physd0.physics.lsa.umich.edu \
                venus.ultralight.org \
                atums1.cern.ch atums2.cern.ch atums3.cern.ch \
                mars01.cern.ch \
                mars02.cern.ch \
                mars03.cern.ch \
                mars04.cern.ch \
                mars05.cern.ch \
                avoda.physics.lsa.umich.edu \
                lxbing.physics.lsa.umich.edu \
                bzoffice.physics.lsa.umich.edu \
                jwc.physics.lsa.umich.edu \
                ganesh.physics.lsa.umich.edu
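
Because each network needs its own line, the network entries are easy to generate from a list. A minimal sketch, where export_lines is a hypothetical helper name; it only prints lines in the same form as the ones above and does not touch /etc/exports.

```shell
#!/bin/sh
# Sketch: print one exports line per network for a given path,
# in the same form as the entries above. Hypothetical helper.
# Usage: export_lines PATH NETWORK/PREFIX...

export_lines() {
    path=$1
    shift
    for net in "$@"; do
        echo "$path -maproot=root -network $net"
    done
}

export_lines /atlas/data19 10.0.0.0/8 192.41.230.0/23 192.41.236.0/23
```

After the output is reviewed and added to /etc/exports, mountd has to re-read the file, for example with "service mountd reload".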

Add appropriate lines to /etc/rc.conf copied from /etc/defaults/rc.conf and set to "YES" as needed.

### Network daemon (NFS): All need rpcbind_enable="YES" ###
amd_enable="YES"                 # Run amd service with $amd_flags (or NO).
amd_program="/usr/sbin/amd"     # path to amd, if you want a different one.
amd_flags="-a /.amd_mnt -l syslog /host /etc/amd.map /net /etc/amd.map"
amd_map_program="NO"            # Can be set to "ypcat -k amd.master"
nfs_client_enable="NO"          # This host is an NFS client (or NO).
nfs_access_cache="60"           # Client cache timeout in seconds
nfs_server_enable="YES"          # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 4"   # Flags to nfsd (if enabled).
mountd_enable="NO"              # Run mountd (or NO).
mountd_flags="-r"               # Flags to mountd (if NFS server enabled).
weak_mountd_authentication="NO" # Allow non-root mount requests to be served.
nfs_reserved_port_only="YES"     # Provide NFS only on secure port (or NO).
nfs_bufpackets=""               # bufspace (in packets) for client
rpc_lockd_enable="YES"           # Run NFS rpc.lockd needed for client/server.
rpc_lockd_flags=""              # Flags to rpc.lockd (if enabled).
rpc_statd_enable="YES"           # Run NFS rpc.statd needed for client/server.
rpc_statd_flags=""              # Flags to rpc.statd (if enabled).
rpcbind_enable="YES"             # Run the portmapper service (YES/NO).
rpcbind_program="/usr/sbin/rpcbind"     # path to rpcbind, if you want a different one.
rpcbind_flags=""                # Flags to rpcbind (if enabled).
nfsv4_server_enable="YES"        # Enable support for NFSv4
nfscbd_enable="NO"              # NFSv4 client side callback daemon
nfscbd_flags=""                 # Flags for nfscbd
nfsuserd_enable="NO"            # NFSv4 user/group name mapping daemon
nfsuserd_flags=""               # Flags for nfsuserd

Firewall

A simple firewall setup: it allows anybody around here to get in but blocks most of the outside world.
#################################################################
# No restrictions on Inside LAN Interface for private network
# Not needed unless you have LAN
#################################################################

pass out quick on lagg0.4010 all
pass in quick on lagg0.4010 all

#################################################################
# No restrictions on Loopback Interface
#################################################################
pass in quick on lo0 all
pass out quick on lo0 all

#################################################################
# Interface facing Public Internet (Outbound Section)
#################################################################
# noooooo....let's not do that
#block out log first quick on dc0 all
pass out quick on lagg0 all keep state

#################################################################
# Interface facing Public Internet (Inbound Section)
#################################################################

# grid.umich.edu 
pass in quick on lagg0 from 141.211.43.96/27 to any

# most of physics.lsa.umich.edu (?)
pass in quick on lagg0 from 141.211.97.0/22 to any
pass in quick on lagg0 from 141.211.101.0/24 to any

# the rest
pass in quick on lagg0 from 192.41.230.0/23 to any
pass in quick on lagg0 from 192.41.236.0/23 to any
pass in quick on lagg0 from 141.213.133.192/27 to any
pass in quick on lagg0 from 141.213.154.32/27 to any
pass in quick on lagg0 from 192.91.245.64/28 to any

# ignore/no log for igmp multicast
block in quick on lagg0 proto igmp all
# allow pings and such
pass in quick on lagg0 proto icmp all 

block in log first quick on lagg0 all
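
The notes do not say how these rules are loaded. The rule syntax looks like IPFilter (ipf), so, assuming that and assuming the rules are saved as /etc/ipf.rules (a path not stated above), the /etc/rc.conf entries would look roughly like this sketch:

```shell
# /etc/rc.conf additions (sketch).
# Assumptions: the firewall is IPFilter and the rules above live in
# /etc/ipf.rules. Neither is confirmed by these notes.
ipfilter_enable="YES"              # load the rules at boot
ipfilter_rules="/etc/ipf.rules"    # rules file to load
ipmon_enable="YES"                 # log daemon for rules that use "log"
ipmon_flags="-Ds"                  # run as a daemon and log to syslog
```

Under the same assumptions, the rules can be reloaded by hand with "ipf -Fa -f /etc/ipf.rules" after editing.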

-- BenMeekhof - 25 Apr 2012
Topic revision: r13 - 18 May 2012, BenMeekhof