Configuration of the UM CERN Computing Cluster in BAT 188

In November 2014, the UM CERN Computing Cluster was upgraded to SLC6. Some old hardware was retired, and new hardware was installed and configured. The details of how the new systems were configured are shown in later sections. Since then, an install script has been written to incorporate these actions. New systems should use the install script if possible, and new actions should be incorporated into that script on an ongoing basis.

System Install Script

The script atint01:/export/scripts/setup/um-cern-setup.sh can take care of most basic setup. From the README in the same directory:
The um-cern-setup.sh script will do some basic setup for systems at CERN.

Including but not limited to:
- add UM/USTC users to passwd (Kerberos does auth). See the addusers.sh script for the list of users.
- Modify /etc/passwd to point to /net/s3_data_home/ if the user has a directory there. Also checks /net/ustc_home.
        * Modify EXCEPTIONS in the addusers.sh script to skip this for particular users
- DOES NOT run /net/share/scripts/create_local_home_users.sh to create
  symlinks in /home to /net homes (if the user has an NFS home, the full path will be put into passwd).
- NFS automounts
- add B188 cluster root key to root authorized_keys
- install our krb5.conf with ATLAS realm in addition to CERN
- install UMATLAS repo
- install FusionInventory Agent (remove CERN OcsInventory)
- install some basic core packages (emacs, vi, nano, gcc, screen, etc.). Also installs CERN libs/packages documented in the following sections.
- if hostname matches *pcatum* install a bunch of additional useful packages
- Install iptables with openings for condor
- Install/configure cvmfs (runs /net/share/cvmfs/install.sh)
- Install/configure condor, starts condor service.  Installs RPM from /net/share/condor-8.2.10-345812.rhel6.7.x86_64.rpm, runs install scripts:
        * /net/share/condor_config_files/nodeinfo/config_condor_prio.sh
        * /net/share/condor_config_files/set_condor_dirs_and_files.sh
- Add sysctls as documented in following sections
- Run addprinters.sh to add printers for the building as mapped by hostname in the file 'locationmap'.
  Info from https://network.cern.ch/sc/fcgi/sc.fcgi?Action=SearchForDisplay&DeviceName=pcatum*


To use, copy the directory contents to the target system and run um-cern-setup.sh <primary user>.
The primary user will be given sudo access; this argument is optional.

To automate this, you can try 'setuphost.sh <host> <primary user>' (run from atintXX as root).
The script will copy everything and run the setup script over ssh. 
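
For example, to set up a hypothetical node pcatum42 for a hypothetical primary user jsmith (both names are placeholders):

# on the target system, after copying /export/scripts/setup there
./um-cern-setup.sh jsmith

# or from atint01 as root, pushing and running everything over ssh
./setuphost.sh pcatum42 jsmith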

Backups

Important directories on atint01 are backed up nightly via the following entries in /etc/cron.d/backup. A one-time 'fallback' backup of all three was also taken at the time of this writing.
# backup useful filesystems nightly
0 1 * * * root tar -cf /net/s3_datad/atint01/atint01-export-nightly.tgz /export
3 1 * * * root tar -cf /net/s3_datad/atint01/atint01-root-nightly.tgz /root
4 1 * * * root tar -cf /net/s3_datad/atint01/atint01-etc-nightly.tgz /etc
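
Note that 'tar -cf' does not compress, so despite the .tgz extension these archives are plain tarballs (add -z to the cron entries if compression is wanted). A restore sketch; the target directory /tmp/restore is just an example:

# extract into a scratch area first rather than directly over /
mkdir -p /tmp/restore
tar -xf /net/s3_datad/atint01/atint01-etc-nightly.tgz -C /tmp/restore
# tar strips the leading / on create, so files land under /tmp/restore/etc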

Cookbook steps

  • Re-establish the /root/.ssh directory (mode 700) and its saved authorized_keys content (mode 600)
  • Set up /etc/auto.master and /etc/auto.net from saved configurations
    • ustc-lin01 disks are no longer mounted
  • Configure iptables (see below)
    • service iptables restart
  • On atint01 only, restore /export directory from saved locations
    • Modify /etc/exports and restart nfs service
  • useraddcern roball diehl mckee bmeekhof qianj zzhao
    • These use default afs directories, so leave them that way
  • Install what should be the set of ATLAS computing rpms
  • On interactive machines only
    • yum -y install screen
  • Pre-set the condor and cvmfs account information. Note that the initial set of machines installed used group 492 for cvmfs, but later machines have the sfcb group at that group ID, so these directions are for future machines. ALWAYS check first which 49X group IDs are really in use before inserting these definitions.
[root@atint01 ~]# grep -e fuse -e cvmfs -e condor /etc/passwd
cvmfs:x:496:490:CernVM-FS service account:/var/lib/cvmfs:/sbin/nologin
condor:x:495:491:Owner of Condor Daemons:/var/lib/condor:/sbin/nologin
[root@atint01 ~]# grep -e fuse -e cvmfs -e condor /etc/group
fuse:x:493:cvmfs
cvmfs:x:490:
condor:x:491:
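A sketch of pre-creating those accounts with the IDs shown above, assuming those IDs are free on the target machine (the fuse group usually exists already from the fuse package):

# verify that the 49X UIDs/GIDs above are actually free on this machine
getent passwd | grep ':49[0-9]:'
getent group | grep ':49[0-9]:'
# pre-create the groups and service accounts with fixed IDs
groupadd -g 490 cvmfs
groupadd -g 491 condor
useradd -u 496 -g 490 -M -d /var/lib/cvmfs -s /sbin/nologin -c "CernVM-FS service account" cvmfs
useradd -u 495 -g 491 -M -d /var/lib/condor -s /sbin/nologin -c "Owner of Condor Daemons" condor
usermod -a -G fuse cvmfs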
  • Install cvmfs using the default partition name for its cache
    • rpm -Uvh http://cvmrepo.web.cern.ch/cvmrepo/yum/cvmfs/EL/6/`uname -i`/cvmfs-release-2-4.el6.noarch.rpm
      • Edit cernvm.repo and modify [cernvm] to enabled=0 so it must be explicitly enabled for use.
      • sed -i s/enabled=1/enabled=0/ /etc/yum.repos.d/cernvm.repo
    • yum -y --enablerepo=cernvm install cvmfs-2.1.19-1.el6 cvmfs-auto-setup-1.5-1 cvmfs-init-scripts-1.0.20-1 cvmfs-keys-1.5-1
    • Create /etc/cvmfs/default.local with the content below
    • Create /etc/security/limits.d/cvmfs.conf with the content below
    • service autofs restart
    • Create the cvmfs setup files for ATLAS software
      • mkdir -p /usr/local/bin/setup
      • cp -p cvmfs_atlas.sh and cvmfs_atlas.csh from the /net/share/cvmfs directory (into /usr/local/bin/setup).
  • Test cvmfs in various ways
    • cvmfs_config chksetup (does not like cernvmfs.gridpp.rl.ac.uk, but this is a CERN issue)
    • cvmfs_config probe
    • cvmfs_config status
  • Locally install the cern libs
    • cd /net/share; tar cf - cern|(cd /usr/libexec; tar xf -); mv /usr/libexec/cern /usr/libexec/CERNLIB
    • ln -s /usr/libexec/CERNLIB /cern
  • Create all the local user accounts based upon the maintained list
    • /net/share/scripts/create_local_home_users.sh
    • This uses the list in /net/share/scripts/user_list_local_home.txt
  • Install and configure Condor
    • yum -y localinstall /net/share/condor-8.2.3-274619.rhel6.5.x86_64.rpm
    • /net/share/condor_config_files/nodeinfo/config_condor_prio.sh
      • Run this any time the Condor configuration changes
    • /net/share/condor_config_files/set_condor_dirs_and_files.sh
      • This should only be run once, but it is non-destructive to do it again
  • Edit /etc/sysctl.conf on the interactive machine with the lines below, and manually update the changed parameters
    • echo 1000 > /proc/sys/net/core/somaxconn
    • echo 4194303 > /proc/sys/kernel/pid_max
# Increase the PID max from the default 32768
kernel.pid_max = 4194303

# Increase the connection backlog from the default 128
net.core.somaxconn = 1000
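
Alternatively, after editing /etc/sysctl.conf the new values can be loaded in one step instead of echoing into /proc:

sysctl -p
# confirm the values took effect
sysctl net.core.somaxconn kernel.pid_max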

Pushing commands to the cluster machines

A "push_cmd.sh" script, closely emulating the cluster_control suite at UM (but modified to not use the cluster_control DB), is in place on atint01 in /root/tools. Many of the commands below utilize this script, and its list of machines held in /root/tools/acct_machines.

Making a new user account

New user accounts on the BAT188 cluster rely upon the existence of that user's CERN account. Credentials are copied via "useraddcern", and then the home directory is modified in the passwd file to place the user's home directly on the cluster instead of in afs space, although the latter remains available to the user. The script "/export/scripts/mk_nfs_homed_account.sh" on atint01 is used for this purpose.

This script also utilizes the "push_cmd.sh" script in /root/tools, and the subdirectory there that contains lists of active machines.
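
The net effect on /etc/passwd is a home-directory change like the following (the user name and IDs are placeholders):

# before: home in AFS space
#   jsmith:x:12345:1399::/afs/cern.ch/user/j/jsmith:/bin/bash
# after: home on the cluster NFS space
#   jsmith:x:12345:1399::/net/s3_data_home/jsmith:/bin/bash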

[root@atint01 scripts]# ./mk_nfs_homed_account.sh
  Usage: ./mk_nfs_homed_account.sh <account_name> <institute> [<bulk disk>]
  Must supply the account name as first argument
  Legal institutes for second arg are: um ustc
  Third (optional) argument is the bulk disk directory to use
    This defaults to datac (/net/s3_datac) for um if not present
    And it defaults to data1 (/net/ustc_data1) for ustc if not present
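
For example, to create an account for a hypothetical UM user jsmith on the default bulk disk (datac):

[root@atint01 scripts]# ./mk_nfs_homed_account.sh jsmith um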

File content as referenced above

iptables content

=========================================
iptables on atint01 needs the following additions:
# Accept UMATLAS muon cluster
-A INPUT -s 137.138.94.64/26 -j ACCEPT
# Also accept ustclin0N machines
-A INPUT -s 137.138.100.0/24 -j ACCEPT
#
# For Condor
#
-A INPUT -m udp -p udp --dport 9600:9700 -j ACCEPT
-A INPUT -m tcp -p tcp --dport 9600:9700 -j ACCEPT
-A INPUT -m udp -p udp --dport 33000:35000 -j ACCEPT
-A INPUT -m tcp -p tcp --dport 33000:35000 -j ACCEPT
#
# Open up the NFS ports needed to mount all the volumes
# NFS ports
-A INPUT -m state --state NEW -m udp -p udp --dport 875 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 875 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 32769 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 32803 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 892 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 892 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 2049 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2049 -j ACCEPT

=======================================
iptables on atums1 needs just this set:
# Accept UMATLAS muon cluster
-A INPUT -s 137.138.94.64/26 -j ACCEPT
# Open up the NFS ports needed to mount all the volumes

# NFS ports
-A INPUT -m state --state NEW -m udp -p udp --dport 875 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 875 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 32769 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 32803 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 892 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 892 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 2049 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 2049 -j ACCEPT

=========================================================
Any other worker node (WN) needs just this set:

#
# Accept UMATLAS muon cluster
-A INPUT -s 137.138.94.64/26 -j ACCEPT
# Also accept ustclin0N machines
-A INPUT -s 137.138.100.0/24 -j ACCEPT
#
# For Condor
#
-A INPUT -m udp -p udp --dport 33000:35000 -j ACCEPT
-A INPUT -m tcp -p tcp --dport 33000:35000 -j ACCEPT
#
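
These rules are order-sensitive; a sketch of applying them, assuming the stock /etc/sysconfig/iptables layout where new openings must precede the final REJECT rule:

# add the rules to /etc/sysconfig/iptables ahead of the final REJECT line,
# then reload and verify
service iptables restart
iptables -L INPUT -n | grep -e 2049 -e 33000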

====================================

Content of /etc/cvmfs/default.local

======================================
# This is /etc/cvmfs/default.local

# this file overrides and extends the values contained
# within the default.conf file.

# Use 0.875*partition size for the quota limit
CVMFS_QUOTA_LIMIT='28450'
CVMFS_HTTP_PROXY="http://ca-proxy.cern.ch:3128;http://ca-proxy1.cern.ch:3128|http://ca-proxy2.cern.ch:3128|http://ca-proxy3.cern.ch:3128|http://ca-proxy4.cern.ch:3128|http://ca-proxy5.cern.ch:3128"
CVMFS_CACHE_BASE='/var/lib/cvmfs'

# the repos available
CVMFS_REPOSITORIES="\
atlas.cern.ch,\
atlas-condb.cern.ch,\
atlas-nightlies.cern.ch,\
sft.cern.ch"

===============================

The content of /etc/security/limits.d/cvmfs.conf

cvmfs soft nofile 32768
cvmfs hard nofile 32768

-- BobBall - 19 Nov 2014