
Using ZFS on Linux for AGLT2 AFS Fileservers

ZFS on Linux recently became available. ZFS has many attractive features, including copy-on-write (COW) semantics, data integrity via checksums, and inexpensive snapshots.

The AFS file servers (linat06/07/08.grid.umich.edu) need to be upgraded to OpenAFS 1.6.2 and Scientific Linux 6.4. As part of the upgrade we will try to use ZFS as the back-end storage for AFS volumes.

Creating Initial AFS Server VM

Our AFS cell aglt2.org has been virtualized in VMware vSphere 5.x for more than one year. To migrate to a new OS and AFS version, we intend to create a temporary AFS server (atback1.grid.umich.edu) to host AFS volumes from linat06/07/08, while we recreate each VM.

The initial atback1.grid.umich.edu VM (linat06n) was created using Ben's Cobbler setup (see https://www.aglt2.org/wiki/bin/view/AGLT2/CobblerInfrastructure ), which was used to build the VM with Scientific Linux 6.4, 64-bit.

After the base host was built, a new ~1TB iSCSI LUN from the UMFS15 (Oracle NAS) system was attached to the VM. This will become /viceph once ZFS is installed.
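Before installing ZFS it is worth confirming that the new LUN is actually visible inside the VM. A quick check (the /dev/sdb name is simply what the disk showed up as here; yours may differ):

# Confirm the new ~1TB disk is visible inside the VM (it appeared as /dev/sdb here)
cat /proc/partitions
fdisk -l /dev/sdb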

The following OpenAFS RPMs were installed (a sample install command follows the list):
openafs-plumbing-tools-1.6.2-0.144.sl6.x86_64 
openafs-1.6.2-0.144.sl6.x86_64 
kmod-openafs-358-1.6.2-0.144.sl6.358.0.1.x86_64 
openafs-krb5-1.6.2-0.144.sl6.x86_64 
openafs-server-1.6.2-0.144.sl6.x86_64 
openafs-client-1.6.2-0.144.sl6.x86_64 
kmod-openafs-1.6.2-5.SL64.el6.noarch 
openafs-kpasswd-1.6.2-0.144.sl6.x86_64 
openafs-authlibs-1.6.2-0.144.sl6.x86_64 
openafs-module-tools-1.6.2-0.144.sl6.x86_64 
openafs-compat-1.6.2-0.144.sl6.x86_64 
openafs-devel-1.6.2-0.144.sl6.x86_64
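For reference, a representative install command is shown below; the package names are those listed above, and the repository providing them is assumed to already be configured on the host:

# Pull in the OpenAFS server, client and matching kernel module packages
yum -y install openafs openafs-server openafs-client openafs-krb5 kmod-openafs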

Install ZFS

First we set up the ZFS repo file /etc/yum.repos.d/zfs.repo:
[zfs]
name=ZFS on Linux for EL 6
baseurl=http://archive.zfsonlinux.org/epel/6/$basearch/
enabled=1
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

[zfs-source]
name=ZFS on Linux for EL 6 - Source
baseurl=http://archive.zfsonlinux.org/epel/6/SRPMS/
enabled=0
metadata_expire=7d
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

This repo file is installed via =rpm -ivh /afs/atlas.umich.edu/home/smckee/public/zfs-release-1-2.el6.noarch.rpm=, or alternatively directly from zfsonlinux.org via =yum localinstall --nogpgcheck http://archive.zfsonlinux.org/epel/zfs-release.el6.noarch.rpm=.

Once the repo is in place you can do:

yum -y install zfs

You can create a new zpool on a suitable device (here the 1TB iSCSI LUN) via zpool create <poolname> </dev/sdX>. I had to force the creation using:
zpool create -f zfs /dev/sdb

My iSCSI device appeared on the VM as /dev/sdb, so this creates a zpool named zfs. I then created a new ZFS filesystem (dataset) called zfs/viceph in that pool via:
zfs create zfs/viceph

The next important step is to set the right mountpoint so that zfs/viceph shows up as /viceph:
zfs set mountpoint=/viceph zfs/viceph

Now when ZFS starts, the zfs/viceph dataset is mounted at /viceph, where AFS can find it.
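To double-check the layout before handing it to AFS, something like the following can be used (pool and dataset names as created above):

# Confirm pool health, the dataset, and its mountpoint
zpool status zfs
zfs list
zfs get mountpoint zfs/viceph
df -h /viceph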

We need to make sure zfs starts automatically via chkconfig --add zfs; chkconfig zfs on

NOTE: mounting of ZFS filesystems is not controlled by mount but by zfs mount, which takes the dataset name: zfs mount zfs/viceph

The last step is to disable atime updates and turn on lz4 compression (see the tuning notes below).

Snapshots on ZFS

The zfs filesystem supports snapshots. There are some nice cron-based utilities that implement an "auto-snapshot" capability. I installed the one from https://github.com/zfsonlinux/zfs-auto-snapshot. Just unzip the package (a copy is in ~smckee/public/zfs-auto-snapshot-master.zip), run make, then make install.
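Roughly, the install looks like this (the extracted directory name is assumed from the zip contents):

# Unpack and install zfs-auto-snapshot; "make install" drops cron entries under /etc/cron*
unzip zfs-auto-snapshot-master.zip
cd zfs-auto-snapshot-master
make
make install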

You can see the resulting cron entries in /etc/cron*. The snapshots show up under each ZFS mountpoint in <MOUNTPOINT>/.zfs (NOTE: this directory is hidden from ls, but you can cd into it and list its contents).
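For example, to see what snapshots exist:

# List all ZFS snapshots; each one also appears as a directory under <MOUNTPOINT>/.zfs/snapshot
zfs list -t snapshot
ls /viceph/.zfs/snapshot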

Enable AFS on New Server

We first needed to copy over the /usr/afs/local directory from linat06.grid.umich.edu. Once it was on atback1, we removed any sysid* files and edited the NetInfo and NetRestrict files to match the atback1 IP addresses.

Next, copy over the /usr/afs/etc contents from linat06, which include the KeyFile.
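A rough sketch of these two copy steps (hostnames as above; whether rsync or scp is used is a matter of taste, and NetInfo/NetRestrict still need hand-editing afterwards):

# Copy the server-local and cell-wide AFS configuration from linat06, then drop the old identity files
rsync -a root@linat06.grid.umich.edu:/usr/afs/local/ /usr/afs/local/
rsync -a root@linat06.grid.umich.edu:/usr/afs/etc/   /usr/afs/etc/
rm -f /usr/afs/local/sysid*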

Make sure zfs is started: service zfs start

Verify /viceph is mounted on atback1: df

Start AFS server: service afs-server start

Verify AFS server is running:
[root@atback1 ~]# bos status atback1.grid.umich.edu
bos: running unauthenticated
Instance fs, currently running normally.
    Auxiliary status is: file server running.
Instance dafs, disabled, currently shutdown.
    Auxiliary status is: file server shut down.

At this point we have the new temporary server running. We can now proceed to move all the RW volumes from linat06 to this new file server (first test a few relatively unused volumes and verify access continues to work). Once all RW volumes have been moved off linat06, we can remove its RO replicas (vos remove linat06.grid.umich.edu /vicepe <VOLUME>). When /vicepe is empty we can build a new linat06 VM (renaming the old one) and attach the iSCSI volume which hosted /vicepe to it. We then install linat06 with OpenAFS and ZFS (as we did above for atback1), format /vicepe as ZFS, mount it, and enable AFS. Finally we migrate everything back to linat06 from atback1.
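The individual volume operations look roughly like this (admin tokens or -localauth assumed; <VOLUME> is a placeholder as above):

# Move a lightly used RW volume to the temporary server as a test, then confirm it is served from there
vos move <VOLUME> linat06.grid.umich.edu /vicepe atback1.grid.umich.edu /viceph -verbose
vos listvol atback1.grid.umich.edu /viceph | grep <VOLUME>
# Remove a read-only replica still sitting on linat06 (the .readonly suffix selects the RO copy)
vos remove linat06.grid.umich.edu /vicepe <VOLUME>.readonly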

Then we do the same process for linat07 and then linat08.

Firewall/iptables Setup Information

An AFS file server needs certain UDP ports open to function. One example of the changes to /etc/sysconfig/iptables is below. First, near the top, add:
:AFS-INPUT - [0:0]

Later in a suitable location put these lines:
-A INPUT -j AFS-INPUT
-A AFS-INPUT -p udp -m udp --dport 7000:7010 -j ACCEPT
-A AFS-INPUT -p udp -m udp --sport 7000:7010 -j ACCEPT

Test access from a client. If the client has problems, try fs checkservers and fs checkvolumes.
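For example (the first command on the file server, the rest on a client):

# Reload the firewall rules on the server, then verify from a client that the server is reachable
service iptables restart
fs checkservers
fs checkvolumes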

Notes on Memory Issues

ZFS on Linux has a problem where the ARC cache uses about twice as much memory as it should. The ZFS developers are aware of this problem (found by CERN) and will have a fix in a future version (beyond 0.6.1). Meanwhile the recommendation from CERN is to create an /etc/modprobe.d/zfs.conf that restricts the maximum ARC memory usage to about 25% of physical memory:

# Set max ARC to 25% of physical memory (8G on this VM, so about 2G)
options zfs zfs_arc_max=2147483648
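After rebooting (or reloading the zfs module) the effective limit can be checked; both paths below are provided by the ZFS on Linux module:

# c_max in arcstats should match the zfs_arc_max value (in bytes)
cat /sys/module/zfs/parameters/zfs_arc_max
grep c_max /proc/spl/kstat/zfs/arcstats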

Notes on DKMS and ZFS

Sometimes, after a new kernel is installed, DKMS has problems properly building the needed spl and zfs modules. The following sequence should force a rebuild:

dkms uninstall -m zfs -v 0.6.1 -k 2.6.32-358.11.1.el6.x86_64 #(Put in the correct versions as needed)
dkms uninstall -m spl -v 0.6.1 -k 2.6.32-358.11.1.el6.x86_64
dkms build -m spl -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms install --force -m spl -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms build -m zfs -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms install --force -m zfs -v 0.6.1 -k 2.6.32-358.14.1.el6.x86_64
dkms status

Notes on Tuning ZFS for AFS (and vice-versa)

There was a thread on the OpenAFS list that had some good suggestions. One involves a future version of AFS:
>    - if running OpenAFS 1.6 disable the sync thread (the one which
>    syncs data every 10s) It is pointless on ZFS (and most other
>    file systems) and all it usually does is it negatively impacts
>    your performance; ZFS will sync all data every 5s anyway
>    There is a patch to OpenAFS 1.6.x to make this tunable. Don't
>    remember in which release it is in.

This is the -sync runtime option, which will be in the release after
1.6.2 (it is present in the 1.6.3pre* releases). To get the behavior
that Robert describes here, give '-sync none' or '-sync onclose'. Or
technically, '-sync always' will also get rid of the sync thread, but
will probably give you noticeably worse performance, not better.

For our setup I edited the /usr/afs/local/BosConfig and changed the fileserver lines to include '-sync onclose':

parm /usr/afs/bin/fileserver -sync onclose -pctspare 10 -L -vc 800 -cb 96000

...

parm /usr/afs/bin/dafileserver -sync onclose -pctspare 10 -L -vc 800 -cb 96000

The thread is http://lists.openafs.org/pipermail/openafs-info/2013-June/039633.html
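After editing BosConfig the fileserver instance has to be restarted to pick up the new option. A sketch (this briefly interrupts service on that server; substitute the correct hostname and instance name, fs or dafs, for your setup):

# Restart the fileserver bnode so the new -sync setting takes effect, then confirm it is back up
bos restart atback1.grid.umich.edu fs -localauth
bos status atback1.grid.umich.edu -long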

Another tuning applied on linat06 is to disable access time updates. OpenAFS doesn't rely on them, so this saves some unnecessary I/O. You can disable atime for the entire pool, and by default all filesystems within the pool will inherit the setting:

        zfs set atime=off zfs/vicepe

Compression was also turned on:

[root@linat06 ~]# zfs set compression=lz4 zfs/vicepe
[root@linat06 ~]# zfs get all zfs/vicepe | grep compress
zfs/vicepe  compressratio  1.00x
zfs/vicepe  compression    lz4    -

Compression only affects files created after it is set.
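The ratio can be rechecked later, once new data (e.g. volumes moved in after compression was enabled) has been written:

# compressratio will grow above 1.00x only as compressible data is written after enabling compression
zfs get compression,compressratio zfs/vicepe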

-- ShawnMcKee - 21 Jul 2013