The AGIS System and Changing SchedConfig Parameters at AGLT2 Why AGIS Information about changeover to AGIS came in this Email of 2/28/2013 Ladies and gentleme...
Initial install of ATLAS 12.0.1 software kit (Bob, Ed, Shawn 18 Jul 06). Follow instructions on InstallingAtlasSoftware, and DraftNewInstallForWB (which is more r...
IO with EOS command lines Users can access (list and read) files in CERN EOS from non lxplus nodes without being authenticated. In order to get full permission (wr...
Adding New OSS to Lustre Below are the steps needed to add a new OSS (storage server) to Lustre * Install or re purpose an SL5.5 node * Update all BIOS/Firm...
Monitoring: AGLT2 Compute Summary Page The initial idea was to simply extract and rearrange some lines of the HTML Ganglia page for the MSU site, in order to reor...
AFS Tape Backups with Amanda Amanda Commands For operations with amanda, you should be the amanda user on bambi: "su amanda". The exception is "amrecover". Her...
Intro A simple example of how to get Condor to run Athena jobs. Submitting a Simple Fastsim Job Transform Make a file called test.cmd. This sets up the basic co...
Controlling the ATLAS Queues, and the pilot rate Much of the basic command structure is documented in this document. There is also a newer document about setting...
ATLAS Software PLEASE READ FIRST: This is a guide to installing and using Atlas software of the Tier3. It is for reference for Administrators only. If you need a ...
Auto Test Programs over AGLT2 Cluster Related PNFS mount point test Purpose Make sure every computer node has "/pnfs/aglt2.org" mounted, and every gridftp door ...
Bi weekly AGLT2 site meeting notes Thursday, June 26, 2008 See SecurityPlanning Security Notes Need to check syslog ng configuration for all hosts and base on mo...
Building Lustre RPMs for a new kernel These are very old (version 1.8) directions When we move to a new kernel on a machine where lustre must also be mounted, ne...
11/13/2007 Replaced Dell cables with Gore cables on the following machines after seeing physical link counters increase: umfs07,umfs09,umfs10,umfs11 1/25/2008...
SVN repository: insert repository here How to use and usage pointers: insert things here Notes on CalibDataClass and structure of script: notes here Creating dev...
Configuration of the UM CERN Computing Cluster in BAT 188 In November 2014, the UM CERN Computing Cluster was upgraded to SLC6. Some old hardware was retired, new...
Grid Certificate Distribution at AGLT2 The certificates in /etc/grid-security/certificates are used by the OSG authentication stack. It is a regularly updated, st...
Transition from CFEngine v2 to v3, and Build dCache Pool Servers Introduction As documented elsewhere in this Wiki, cfengine2 is currently (Oct 2012) in use to c...
Configuring a simple routing change with cfengine 2 for linux Unfortunately, this is one of those "not very portable" things that other config management tool adv...
Checking out and editing CFEngine policy Some general notes and information Policies are exported from /var/cfengine/policy on umcfe or msucfe. Any directory un...
Cleaning Up the srmspacefile Table (SRM Space token Allocations) We recently found out that our srmspacefile table in dcache was inconsistent with our actual spac...
MSU 2008 May BNX2 DKMS Ganglia Found that the existing bnx2 network driver was the cause of the large spikes in the ganglia network plots. It intermittently pu...
Cluster Control This is the main page for information about the Cluster Monitoring and Control tools. At this time, the state of two conditions is maintained an m...
Manuals Cobbler manual: http://www.cobblerd.org/manuals/ For information on the Cheetah template language used in kickstart templates: http://www.cheetahtemplate....
AGLT2 Compute Node Health Assessment Utilities General Goals These compute node health assessment utilities were designed to assist in managing the AGLT2 compute...
Setting up Condor CE Condor CE is a replacement for globus on our gatekeepers. Condor G can still be used to submit jobs to the gatekeeper, but then the JobRoute...
Job Queuing at Michigan State NOTE: THIS PAGE IS PRETTY MUCH OUT OF DATE The queuing system at Michigan State has not yet been established. Job Queuing at the Unive...
Condor Batch System This is the main page for administrative info about the Condor batch system(s) in use at AGLT2. User info is at CondorUser. A description of t...
Planning Condor Configuration Updates for AGLT2 Now that AGLT2 is running on an SL6.4 OS we can plan on implementing some new features in Condor that will take ad...
What To Do after losing the dcache partitions of a Node? Due to the rocks rebuilding failures, dcache partitions could be wiped during the rebuilding, thus we have ...
MSU Raritan sh /usr/local/Raritan/Raritan MPC/5.0.3.5.36/start.sh The first time running it, you will need to tell this client where the KVM is: create "new prof...
Shutdown/Startup procedures for AGLT2 Clusters Procedures to cleanly bring All AGLT2 activity to a halt * service cfengine3 stop * This prevents changes...
If a line starts with $ it is a command to be run as a normal user, if it starts with # it is a command to be run as root. * $ cd /net/data07/tests/stress_test/ ...
SingleTopDPDMaker This page presents a tool that produces D3PDs (TTrees) and D2PDs for analyses within the Athena framework. The SingleTopDPDMaker tool...
dCache Config Overview dCache from Database to Filesystem Below is a view of the chain of configuration for the dCache system starting with the PostgreSQL databa...
Dcache on AGLT2 service distribution: all the dcache services are physically distributed to 37 nodes. Two of them are head nodes (head01.aglt2.org and head02.aglt...
(Re)Configuring the log4j.properties in dCache to output to Syslog At AGLT2 we have set up a central loghost running syslog-ng which also has a php-syslog-ng web i...
This page is OBSOLETE All site services now run at BNL. AGLT2 DQ2/DDM Verification and Debugging One of the central tasks for our Tier2 is to support a DQ2 ser...
Just starting this out... Tom, May 21 Data Storage Locations What locations are in use and how they are used. We have a number of storage locations for AGLT2: ...
Error: E1000 Failsafe E1229 CPU2 VCORE Replaced motherboard and CPU 2 from Dell support. System not showing error. Now stops and asks for F1 or F2(setup) to co...
Dcache TroubleShooting Case 1: node c 104 2 has some broken files, which means the md5sum of the local file doesn't match the md5sum of the source file. We decided to rec...
Cacti Setup for Dell Nodes The Dell PE1950 and PE2950 nodes have a large number of fans and temperature probes which are not exposed via SNMP. This presents a pro...
AGLT2/Dell.OmreportOmconfig There is a Dell ROCKS Roll, do we want that? AGLT2/Dell.DellOrderStatusMSU check status of a Dell order for MSU AGLT2/Dell.DellService...
HS06 Measurements Performed at the Dell Innovations Lab in August/September, 2017 32 bit Results, Summary Machine/Model ChipSet Speed BIOS Settings RA...
PC6248 Reformat Note that using the command show dir seems to be a good stress test. Even switches that pass the check disk commands below can fail running the show...
Power Connect SNMP Lots of info is available via SNMP from the PowerConnects. References * http://wiki.xdroop.com/space/snmp/Switching Tables * http://for...
Dell Poweredge x950 Hardware Notes Information about the Dell 1950 and 2950 nodes. Dell Docs BIOS The Fall '07 order arrived with BIOS v1.5.1. This was release...
Provisioning the Dell 2950/MD1000 Storage Servers Recipe for getting a new PE2950 and MD1000 combination going as an fss nfs appliance in our ROCKS cluster. Refer...
Details on Configuring the /etc/multipath.conf for our Nexsan Setup The multipath -v2 -d command with our configuration gives: root@umfs03 ~ # multipath -v2 -d create: m...
Setting up Two OSG Gatekeepers for a Single Condor Cluster We need to load balance access to our Condor cluster because of a possible time out issue we are seeing...
Dark Sector Event Generation This section describes getting Itay Yavin's dark sector event generation files into a format usable for ATLAS simulation. Itay's cod...
Extend Compute Need to update extend compute for use at MSU and for changes in ROCKS 4.3. Actions in version from early Nov 2007 * install libgfortran with an...
Extending LVM Disks on VMware VMs We sometimes have partitions fill during operations and when those partitions are on VMs and using LVM we can easily extend them...
FTS Channel Management Instructions The FTS channels for AGLT2 can be managed using the glite software. You will need an ATLAS VOMS production role (voms proxy i...
Fixing OMD Setup for AGLT2 Use cases There are a number of issues we are hitting as we try to set up OMD for AGLT2 use. Below is the current list of issues. When ...
Puppet Infrastructure Setup from our repo: To bootstrap foreman and puppetmaster from our svn code: Initial setup notes Installed from puppetlabs repo: http://y...
Monitor Disk Activity with iostat and Ganglia The iostat utility from the sysstat package provides information about disk operations and throughput. It works in ...
Setting up gate02.grid.umich.edu as our AGL Tier2 Gatekeeper There are a number of steps we followed to get our new gatekeeper running. Hardware We had an Int...
We have a number of scripts which rely on the space token information; therefore, I define 2 hash based subroutines in the dcache perl library. Every time you add/r...
Getting a Grid Certificate for ATLAS Use To get a new certificate or renew an existing one, you can use the CERN CA to request the grid certificate: https://ca....
Testing Glusterfs For testing purposes only we used the Redhat Storage Appliance demo which has gluster tools pre installed. Docs are here: http://docs.redhat.com...
Setup of GRAM Auditing for AGLT2 (OSG 0.8.1) The current OSG installation (0.8.1) has Globus 4.0.5 which supports a new "auditing" feature. You can request that ...
A Plan for the ROCKS Graph ROCKS Graphs ROCKS uses the Redhat Anaconda installer to do installs. Using the Anaconda installer provides many advantages: * It ...
High Available Lustre MDS failover nodes with Redhat cluster tools Background: The Lustre Meta Data Server is integral to using Lustre. If it is not available...
Hardware Deployment This page will discuss the steps to deploy new hardware in the MSU server room. Preparing for a purchase Determine resource usage * Where...
HS06 Measurements Performed at AGLT2 We have made a variety of measurements at AGLT2 during September of 2009 in preparation for the upcoming purchase cycle. We p...
How to copy files from the dcache system to your local machines? Our dcache system now supports three protocols to access files in dcache: Dcap, SRM and GSIFTP proto...
How To Get Data From Dcache directly by Root When using Root, you can specify the root files in 2 different ways: 1 download these files from the dcache director...
How To Add New Pools to Dcache This page is obsolete for dCache/Chimera. See HowToAddNewPoolsToChimera instead. On our existing dcache system, I add some New ...
How to Add New Storage to dCache We will look at the example of umfs16, where 12 new pools were added. Following the xfs file system creation, all disks were mou...
Directions on Draining then Removing a Pool Set pool readonly. Can start drain right away, but will likely miss a few files ssh to admin domain \c PoolManager \c...
Easy Move dcache data from pool to pool Use case: we want to retire the non resilient pool umfs07_2 from dcache, which holds 1.7TB of data, so we need to move this data t...
How to convert a RO volume to a RW volume in AFS 1) Get tokens as 'admin' via 'kinit admin' followed by 'aklog' 2) Check the mount: 'fs lsmount /afs/atlas.umi...
How To Extend PoolView Purpose Allows to group pools on the web pages either according to the PoolManager groups or to customized groups. How to ? Uncomment th...
How to Move data from pool to pool (how to drain a pool) *usecase*: we want to retire the node c 2 33 from the pool nodes, so we need to move the data from po...
Here is from an email: In dCacheSetup (in Aglt2, it is poolSetup)you need to define: metaDataRepositoryImport=org.dcache.pool.repository.meta.file.FileMetaDataRe...
Setup Srm Space reservation background details about why space reservation is needed, refer to srm space reservation dcache book. steps to set up space reservat...
Procedure for Installing or Upgrading dCache Servers Procedure for installing dCache servers. This is tested for use on the dCache storage nodes / gridftp doors....
Upgrading dcache storage hardware This is a procedure to follow when you don't wish to create a new pool but instead copy an old one to new hardware and have it o...
IO performance test with tuning (reset readahead) Here are some IO performance tests (we focus on the two IO patterns: read and write) with tuning of the readahead ...
Implementing Network QoS at AGLT2 Recently we have seen periods where our LANs have been congested and packets are dropped. This has resulted in some of the monit...
This document describes how to install amanda on Centos7, and also connect it to a new tape library EMC ML3. About EMC ML3 It has 2 drivers, It has 32 usabl...
Installation of OSG 0.6.0 on gate01.aglt2.org The installation procedure for OSG 0.6.0 on gate01.aglt2.org is below. It was installed on April 2nd, 2007. Please...
Install or Upgrade OSG at AGLT2 The main difference between these instructions and the usual documentation is that we use worker node and wlcg client installation...
KVM at MSU While most of the VM infrastructure resides within vSphere, we want to keep a backup windows install with the standalone client to debug problems with ...
LFC SQL Queries Below are some potentially useful SQL queries to check the status of the LFC. These are my test queries and I don't guarantee they are correct ...
Resizing LVM Partitions Some CERN systems were built with little space in /, with the bulk of the space in /home. However, this means HTCondor, that wants at lea...
Install Problem (really libaio architecture issue) During the "Configuration Assistants" startup I have an error. The "Oracle Net Configuration Assistant" succee...
The desire was to set up a local installation of the dq2 tools, eg, dq2_get and dq2_ls. Previous setups used by the UM group did not work for a variety of reasons...
Lustre 2.10 with ZFS 0.7.1 from standard repo This page documents building the Lustre 2.10 RPMs on CentOS 7.3 using the default yum install of ZFS 0.7.1. The ste...
Lustre At Aglt2 Lustre Deployment MDS (metadata server): we have a failover pair of metadata servers, lmd01 and lmd02; both servers can access the same device (/...
Lustre Backup Following: http://wiki.lustre.org/manual/LustreManual18_HTML/BackupAndRestore.html Snapshots On umfs15 there are regularly scheduled hourly and dai...
General steps to follow on Migrating data from one OST to another 1. Set the source OST in read only status from mds server, if you do not want the files to be mi...
Lustre Configuration and Setup for AGLT2 In March 2010 we revisited our exploration of Lustre for use at AGLT2. This was motivated in part by the release of Lus...
Lustre Reinstall Notes After testing Lustre "in production" (mostly tests by Tiesheng) we have decided to go ahead with our plans to utilize Lustre to replace the...
Test results comparing zfs to ldiskfs The tests below run a test Lustre system (mgs umdist10) through its paces, starting with a zfs 0.6.4.2 straight up install...
Update Lustre on a testbed from 2.10.4 (SL7.6, zfs 0.7.9) to 2.12.3 (SL7.7, zfs 0.7.13) We are trying to upgrade lustre server from 2.10.4 (SL7.6, zfs 0.7.9) to 2...
Notes on upgrade of Lustre to 2.1.6 from 1.8.4 With the implementation of SL6.4 everywhere it became necessary to also upgrade Lustre from 1.8.4, which was not re...
Notes on setting up and configuring Lustre version 2.7 Index of Sections Source rpms We have chosen to use the kernel distributed with the rpms from the Lustre ...
MultiCore Condor Set UP Introduction AGLT2 implements a mix of static and dynamic job slots for MultiCore jobs. At the time of this writing, we use 10 static sl...
Installation and Configuration of Dell MD3460 Storage Basic Hardware This page refers specifically to hardware purchased in August 2016 using RBD 2016 funds. A s...
See also: * MSUDZeroOsgSE about the storage element * MSUDZeroOsgStartup Restarting the system * MSUDZeroOsgTests Testing the OSG site * MSUDZeroOsgJo...
Monitoring D0 Jobs Samgrid monitoring is at http://samgrid.fnal.gov:8080/ The list of recent jobs for the samgrid scheduler that is used for MSU jobs is here. Jo...
Storage Element An SRM/dCache instance is added to the site as a grid accessible Storage Element. dCache is a very flexible package for combining multiple filesy...
Restarting the MSU OSG Grid How to restart the system after an outage. Bring Up and Check Services Cluster Services General cluster services are required, for in...
These instructions use scripts and files found on senna at /home/koll/ * indicates step contains drill specific instructions Drill Preparation* * Check that ...
This page is obsolete Hardware maintenance is now logged at http://glpi.aglt2.org/ MSU Hardware Repairs Until we have a better system, I'm recording hardware rep...
Room/Site Infrastructure Monitoring at MSU Liebert Air Handlers The two Liebert System/3 Air Handler units have Intellislot Web / 485 cards. See: * http://ww...
Lustre Basics The Lustre file system is made of three types of servers: the management server (MGS), meta data servers (MDS), and object storage servers (OSS). Ea...
Submitting Jobs Often you will find yourself with the need to run batch jobs on Condor. This should be done entirely on the tier3's work/fast disks instead of on g...
Running a single command on condor runcommand.sh is a script for submitting a single line to be executed on the MSU tier 3 without having to mess around with cond...
Setting up the Bypass Queue The users requested a queue that would bypass the timed queues, i.e., a queue with no limits on it. The agreed upon way to denote such...
Setting up a new login/submit node Hardware 1 Pick a machine that will host the new login node if one has not already been picked. (Discuss with Philippe) ...
Setting up the timed queues The users requested several timed queues that would hold a job after it had exceeded a certain amount of runtime. These queues each h...
MSU Tier 2 Administration MSU's computing resources make up approximately half of AGLT2. These machines are jointly administrated by MSU and UM. This page will br...
User Info for MSU Tier3 Regulations Your usage of the cluster must conform with MSU's acceptable use statement http://www.msu.edu/au/ Privacy The cluster is a m...
Workflows for modifying HTCondor configuration When modifying condor there are two broad phases any steps taken can be put into: the testing phase and the impleme...
Overview of MSU's Tier3 HTCondor Setup Intro Video for Admins Types of Machines on HTCondor HTCondor consists of three types of machines: submit nodes, worker n...
How to update the tier3 rack spreadsheet The spreadsheet that holds all of the rack information is located here. In order to edit this, you will need a google acc...
How to update visio 1 Log into senna. 1 Open up a terminal and run the command "rdesktop -g 1280x1000 hepwin.pa.msu.edu" 1 Log into this machine with the...
pe2950 Utility Node Install Have a pe2950 with 2x 250GB drive and 4x 750 GB drives. Want to set it up to support a variety of cluster services including running ...
Backing up and moving VMs If the VM is running you need to pull a snapshot and backup, otherwise the .vmdk may not be consistent. Spaces in VM names for some back...
Management of Dcache main services to maintain head01 : dcache core head02: postgresql pnfs dcache core pool nodes: dcache core dcache pool main configurati...
Manual Replication of Hot Files in dCache Particularly for the Health Check, we need multiple copies of the source file All work is performed in either a browser ...
MDTChambers MDT status application Application public link (CERN login required): https://atlasop.cern.ch/atlas point1/muon/MDTchambers/ Following are some i...
Merging Existing Space Tokens When we setup space tokens for AGLT2 we assumed we needed a space token for each VO/Role that needed to be able to write to a space ...
Hardware Transition Planning from head01 (old R610) to head01 temp (new R630) We purchased a new Dell R630 to act as replacement hardware for our existing head01 ...
Migrating VMs into ESXI Once VMs are in ESXI, sloshing them between hosts is easy. But moving existing hardware and VMs into ESXI can be a little tricky. VMware p...
Migrating files to newly added OST so as to balance content It is desirable to distribute access over as many Lustre OSS as possible, so when a new OSS umdist04.a...
Local AGLT2 Monitors There are many monitors we've implemented. These include both AGLT2 and general USATLAS pages. Summaries * AGL Compute Summary page of Ph...
Installation of NDT on ndt.aglt2.org See also Patrick McGuigan's page at NDTInstallation. Installation overview (more details below) 1. Applied the web100 ker...
Potentially Useful Network Equipment Info about hardware we are considering using. Dell Powerconnect 6248 This is one of a new (Fall 2006) fixed switches that s...
Network Issues at AGLT2 This page is intended to capture the network related issues at AGLT2 Network Issues after UltraLight Router at Starlight (R04CHI) was Ret...
Planning for the production network. NetworkHardwareInfo Near term To Do List Here is a list of network related items that need doing as of February 4, 2011: ...
Network Testing and Debugging for AGLT2 During the last year we have seen many indications that all is not right with our network connections to BNL (and perhaps ...
Network Tuning and Testing On September 18, 2007 Dimitri Katramatos, Kunal Shroff and Shawn McKee tried to test and tune the following machines at BNL and Michiga...
Procedures followed to bring gate03 online as a test gate keeper NOTE: This page is changing as the procedures and tests evolve. This note will be removed once te...
Creation of New dCache Headnodes (Dell R610) in January 2011 As part of our Fall 2010 procurements we purchased 2 Dell R610 nodes to host the dCache services (he...
Procedure to Migrate dCache headnodes (head01/head02.aglt2.org) to new hardware and operating system. During fall 2009 and winter 2010, AGLT2 is migrating all lin...
Evaluation and testing of Nexsan SATABeast with B60E expansion Unpacking and Installation See photos and some comments here: https://picasaweb.google.com/ben.mee...
MSU Each rack has 2 PDUs named PDU RACKNUM N.msulocal where N is 1 or 2. For racks with UPSs, the 1 PDU is on the UPS. You can connect to the web interface usin...
Numpy and Scipy at AGLT2 The numpy and scipy software packages are in common use at AGLT2, but, the installed versions are somewhat old, having to do with the dea...
OSG CE 0.4.1 Install Instructions Introduction This document is intended for administrators responsible for installing and configuring: OSG Compute Element (CE) ...
The content below was copied from the OSG install Twiki page on June 5, 2006. This was done to allow us to use this Twiki to record install details for our OSG i...
I removed all of the packages from the Dag repository on linat05. To get the list of packages and remove them I used these commands: rpm -qia | grep -B1 -A1 "Ven...
OpenAFS and Kerberos on Windows Software prerequisites Kerberos for windows. The current release of OpenAFS 1.7.4 recommends the Heimdal Kerberos implementation. ...
Setting up Oracle on Linux The following documents the installation and setup of Oracle at the University of Michigan for use by the ATLAS Muon Calibration and Al...
Installing Updated Muon Calibration Schema New schema was made available in early February 2008. Since the changes were significant I totally removed the origin...
Some info on Oracle setup at AGLT2 * Oracle Installation on linux for the ATLAS Muon Calibration/Alignment centers. * Oracle MuonDB updated (new) schema Feb...
Oracle Upgrade from 10.2.0.2 to 10.2.0.3 Prior to installing the Rome muon calibration DB for replication we needed to update our Oracle installation. I received...
Installing pCache and LSM at AGLT2 We are interested in setting up both a Local Site Mover (LSM) and pCache on our worker nodes. The goals are: * Reduce the I...
Useful PNFS/Chimera SQL Queries NOTE: This page assumes you are running Chimera/PNFS rather than the older PNFS from dCache 1.8.x or earlier. First query: Fix PN...
The PanDA Auto Exclusion process for ANALY_AGLT2 Introduction Procedures here were documented by D. van der Ster in this talk. To see this you will need a CERN ...
Client tools for Panda Analysis jobs Intro The panda client package contains following tools to submit/manage analysis jobs on PanDA. The following instructions...
Install Postgresql on CentOS/RHEL/SL with Replication for Esmond This Wiki topic covers installing Postgresql with replication to support the Esmond DB. You will ...
Upgrading Postgresql on CentOS/RHEL/SL with Hot standby Systems This Wiki topic covers upgrading our existing PostgreSQL version 9.3.11 on Scientific Linux 6.7 64...
This section is already implemented for new users when the account is set up Protecting SSH Keys (or X509 Certificates) on AFS This section applies to users with...
Proof on demand Quick start This section briefly describes how to setup a proof on demand system. Follow these steps: * Setup a root version or skip this step...
Overview See these URLs for an overview: https://twiki.cern.ch/twiki/bin/view/Atlas/PandaRun https://twiki.cern.ch/twiki/bin/view/Atlas/PandaTools Setup procedur...
Index of other pages ForemanPuppetInitialSetup unorganized notes from initial setup. Mostly you won't need these. HOWTO: Build new host with foreman Mostly se...
How msurxx was setup Create config files in SVN In the ROCKS SVN repo, below hostconfigs, copy msurxii.aglt2.org to msurxx.aglt2.org. Checkout (nominal location...
Introduction A prototype build of a file server in Rocks 5 is described, along with caveats and difficulties. In this instance, the file server is destined to be...
Configuring the Frontends Record of configuration done to frontends. Updated Dec 16th for msurx build. Connecting You can ssh to the frontend as root to perform...
Installing the Frontends The frontends are installed on VMware clients. Note that you must have a valid resolvable IP address and name or the install will fail. ...
Frontend Config in SVN Have a scheme to track frontend config in SVN. A directory structure is created at /var/svn, below here modified configurations are copied....
VMware Hosting of Frontend Wish to run the frontends in VMware ESXi. The primary benefit is server consolidation. This also provides a good way to make a full b...
How to empty all OST on an OSS, then re create the underlying Lustre file systems Motivation The underlying striping for a Lustre OST, as seen in the mail list, ...
Procedure for rebuilding a compute node In general, compute node rebuilding is fairly easy and the ROCKS should be maintained so that compute nodes can be rebuild...
(Re)Configuration of gPlazma on AGLT2 Due to issues with SRM failing that were traced to probable issues in gPlazma we are planning to implement some changes to g...
MSU Hummm... Normal procedure is to plug keyboard/monitor into node and see if there are any kernel messages on screen. On Dells also note errors from LCD. U M ...
Background Info: Unexpected Power Loss on file servers During a backup generator test on 12 May 09 at the MSU BPS bldg, most UPSs received an errant EPO (Emergenc...
Recovering from a Lost Pool When we lose a pool we need to do a number of things to recover. Once we determine we have really lost the pool we will need to find t...
Reinstallation of Oracle for the Muon Calibration Center On May 2nd, 2009 our primary Oracle server (umors.grid.umich.edu) was compromised because we had forgott...
* SwitchAccess including how to find where a node is on the network * NodeConsoleAccess Including via KVM and IPMI/DRAC * NodePowerControl including PDUs an...
Removing PNFS (Chimera) Ghosts There is the possibility that the chimera DB can become out of sync with the actual files stored on disk. The t_dirs table holds t...
Reworking AGLT2's Logging Setup In upgrading atgrid we have an opportunity to migrate from syslog ng and php syslog ng to something new. The ELK stack (Elasticse...
To build the rocks distribution (in /home/install/ or /export/rocks/install/): rocks create distro To have a node reinstall: rocks set host boot hostname action...
Well, I haven't actually done it, but here's the directions. Just reverse the architectures since we're running a x86_64 cluster and are going to be kickstarting ...
Manually Adding Nodes to Database The insert ethers command used to support manually adding nodes, it no longer does, however this can be performed using the rock...
Intro Here are listed releases or "tags" of the ROCKS installation. Issues with each can be added here. Summer 2011 making an attempt to maintain this page go...
lighttpd service is running on client during install. it matches URLs that have HOST == 127.0.0.1. Then does a redirect of "/install/(. )$" = rocks by.py?filena...
ROCKS Node Info A feature that is weak or missing from ROCKS is a way to add user defined parameters on a per node basis. There is a mechanism for adding user de...
Cross Kickstarting in Rocks 4.3 So you want to cross Kickstart nodes that aren't the same architecture as your front end? Don't worry, rocks can do that, or it's ...
Building a ROCKS client worker node Build Server Status The software revision for ROCKS and CFEngine that are active on the ROCKS frontend are shown here, the ne...
DNS in ROCKS References: * http://rscott.org/dns/ DNS Oversimplified This page written for ROCKS 4.3 with the update given below. ROCKS will manage the config ...
ROCKS5 has a whole new scheme for user customization of partitioning. The annoyances with getting custom partitioning done in ROCKS4 seem to be gone; we no longer ne...
ROCKS FAQs Introduction FAQs are divided into groups... Database management using ROCKS command Add an appliance This will add new entries in the membership an...
First begin by installing your Frontend by following the directions in the Rocks users guide, make sure you include the service pack roll, as it's required and fi...
To manually add a node in Rocks v5.2 you only really need two commands provided that the appliance type already exists, you could check that it's in the list when...
ROCKS 4.3 Install From Scratch Install log of ROCKS 4.3 and SLC45 on a Dell Poweredge 2950 server and 1950 client. The client will be installed over the network ...
Generic Kickstart Want to perform a network install of a node that won't be a ROCKS client but using the ROCKS frontend as the kickstart server. Have tried and f...
ROCKS This is the main local page for the ROCKS cluster software. Subpages: * BuildingRocksRolls * RocksAglReleases Notes on configs used in production ...
ROCKS MySQL Database ROCKS stores configuration information in a MySQL database. Normal operations on configuration are performed with the rocks command, but thi...
Configuring to PXE boot Servers from the Rocks 5.5 HeadNodes From Ben on 10/10/2012, the following procedure can be used to PXEboot a machine into SL6 via Rocks 5...
Using RCS in ROCKS and How ROCKS Uses RCS ROCKS uses RCS on files that are written or appended with the file tag in kickstart xml. This provides some possibili...
ROCKS Site Sync We have two ROCKS clusters and wish to keep their configurations synchronized. This page will describe how to do that. Note that a closely relat...
Managing the ROCKS Installer with Subversion See local subversion pages at Subversion Creating a Branch or Tag SVN root@msurox /home/install # svn copy m "c...
Update Installer Kernel Warning this is a kludge. Darn seems to work fine with the r610 hardware, but on the existing pe1950s, hard disk doesn't get mounted for rei...
Setup and Running ATLAS Software (from Ed Diehl email) I have found in the past that the validation scripts have errors themselves, or there are other obscure pro...
SEC.pl (System Event Correlator) There is a nice two part article on SEC which describes how it works and what it provides. I encourage you to look it over. Th...
Creation of gate04.aglt2.org, the SL6 gatekeeper Core dump style... Not bothering for now to make this pretty, just recording the actions taken 2 cd /etc/yu...
Issues Fixed in CFEngine for SL7.3 Upgrade Known Issues Need to limit yum output on overnight updates so that so many Emails are not sent. The update_dell_firmwa...
AGLT2 SRM Hangs Starting in late April 2009 AGLT2 was having more and more dCache/SRM issues. One problem that significantly increased in frequency was SRM faili...
Security Planning Config Changes to Tighten Security Ideas from June 26 meeting: * Firewall changes, see SystemInstallChecklist * See below...implement...
Tier2 Services at UM Services for Tier2 job submission and remote monitoring are distributed across several physical machines at UM. Below is a breakdown of what ...
Setting up ATLAS Area HOTDISK for AGLT2 As of mid September 2009 we need to provide a new space token area in Tiers of ATLAS called 'HOTDISK'. There are a few com...
Athena can be tricky to set up and run under your user account. These are some minimal directions to follow. The ATLAS Computing Workbook is chock full of helpfu...
In order to setup your GRID Certificate, you need to have already completed the initial steps of requesting the certificate, registering for membership in the ATL...
Setup and Configuration of the AGLT2 MD3820i This details our installation and configuration of our new MD3820i (UMVMSTOR03). We received both units on August 4th...
Installing gssklog/gssklogd on our cluster We have user home spaces (including grid "group" accounts) in our AFS cell (atlas.umich.edu). Currently any user tryi...
Setup and Configuration of the AGLT2 MD3600i July 25, 2011 This details our installation and configuration of our new MD3600i (UMVMSTOR02) plus MD1200 shelf. W...
Setting Up OMD on AGLT2 Systems Monitoring for AGLT2 has used lots of different software: Ganglia, Syslog ng, Cacti, Nagios, Shinken, Rancid, Monit/MMonit, OpenMa...
Setting up SSH Keys for AGLT2 SSH is able to use a variety of methods for authenticating users. Each method has security strengths and weaknesses. The normal user...
Notes on Building ShawnGenerator Below are the series of steps I used to create my "ShawnGenerator". This generator is based upon source code from Loek Hooft va...
HowTo Shutdown a Pool Node While In Production We sometimes need to restart/reboot pool nodes and would like to make this as least disruptive to the production sy...
Using Slony to Replicate dCache Postgresql DBs We have been running postgresql 9.0.x on our dCache headnodes for almost two years. Currently we have the follo...
Running ATLAS software on SLC43 x86_64 We had installed the mars01.cern.ch mars05.cern.ch systems with SLC43 and an experimental kernel. Though we could install ...
Solaris Installation notes A lot of this was compiled and figured out via extensive reading of this forum. There are a number of confusions and misleading direct...
Startup and Shutdown This page will describe the shutdown and start up procedures for the cluster Unplanned power problems This will be a discussion of how to gr...
How to submit test jobs in panda from umt3 interactive nodes 1 get permission to access BNL CVS BNL CVS is a mirror to Cern CVS, it is a readonly CVS, you can us...
Subversion Subversion is a software revision control system designed to be an improvement on CVS. It generally replicates the features of CVS. References * T...
Proxy server This system has no access to outside world, and in general systems can't talk to non edu locations. To be able to download things, use proxy server. ...
Hardware notes here...little to say about setup. As the manual also says, hit appropriate keys during boot and configure the management/IPMI card using an exter...
Operational notes Setup zfs filesystems and quotas for users Mkdir as usual: mkdir /atlas/data19/bmeekhof chown bmeekhof:umatlas /atlas/data19/bmeekhof Set quota...
This is now obsolete thor01 now runs FreeBSD. See SunX4540ConfigFreeBSD. Zpool configuration Destroy existing pools (pool1 through pool4): zpool destroy pool1 T...
Info about actually using the 4540 is at SunX4540ConfigSolaris Migrating pre installed X4540 Solaris 10 to boot from flash drive NOTE: There's probably no need ...
Testing of New dCache Storage Node Build Here is the testing result for a new storage node (summary: it works). After running 'install_dcache.sh' on UMFS16...
Getting the MSU site up in ROCKS Had some initial difficulty getting the compute nodes installed. Restarted from scratch with the following plan: * Clean up /...
Installation Details for VDT v1.6.0 We are trying the VDT v1.6.0 installation on gate01.grid.umich.edu following instructions at the VDT 1.6.0 release note page. ...
This document helps the UM Tier3 users to diagnose their condor job problems. Submitting Machines Tier 3 users can submit their condor jobs from the following ma...
0. Useful webpages This webpage is a mix between a tutorial and reference. If you are just interested in a quick overview of useful condor commands, just google "...
Setup and Configuration of USATLAS Tier 3 Queue Background To assist Fred Luehring in testing remote, Tier 3 job queues for USATLAS, we have set up a test PanDA ...
Using TortoiseSVN with Putty/Pageant for SVN access to AGLT2 SVN server Prerequisites Install TortoiseSVN from http://tortoisesvn.net/downloads.html Setup putty a...
Track Transfers to dCache Just some quick notes on tracking specific transfers to dCache. * First, make sure the SRM logging is verbose enough (going to catali...
Trouble Atlas Atlas Analysis Job mishandled OSG APP Paul, Bob, OSG_APP should be "/atlas/data08/OSG/APP". "atlas_app/atlas_rel" are subdirectory created when i...
Athena Code Checkout Overview At Michigan we have Athena "kits" installed which have only Athena binaries located at /afs/atlas.umich.edu/atlas/software/kits . ...
Athena Software Setup at Michigan Overview We have installed an Atlas software "mirror" and the Athena "Multi" installation of all 12.X.Y Athena versions on the ...
Athena Kit Installation Technical Details Overview This document gives the details about how the Michigan Athena kit installation is installed and how to update ...
Michigan Computers Overview The Michigan computer cluster consists of several interactive machines, and 2 condor batch queue clusters. Here is the current list ...
Condor Setup at Michigan Overview Condor is a University of Wisconsin system to run batch jobs on CPU farms and/or random groups of desktop machines. Condor jobs a...
Copying files to/from CERN AFS copying from AFS with 'cp' If you get AFS tokens (with klog cell CERN.CH) you can copy files directly from CERN AFS space with sp...
Pacman Notes Overview pacman is a "package manager" which is used to install Athena Kit Distributions. This document gives some additional notes about how to us...
SSH password free login Introduction SSH offers the possibility to login without using passwords using shared keys. Not only is this more convenient than using ...
Installing and Using X2Go for UM T3 Users This page describes how to get the X2Go client software, install it on Windows (explicitly, it will likely also go on a ...
Michigan Test Page This page is to test the responsiveness of the wiki. It has no other purpose. * My text goes here * Let us DO another bullet A. Item 1 A. ...
Tier3 for Users For information on using ATLAS software please see this section of our index page: WebHome#AGLT2_User_Information Information here includes how to...
UMATLAS yum repository NOTE: As of January, 2018, sysprov02, an SL7 VM, has replaced sysprov01, and sysprov01 has been shut down. All refs to sysprov01 below have...
Update Kerberos on our Servers The kerberos servers were installed long ago when DES was the primary encryption. We need to change to using newer more secure algo...
How to update OSG and condor ce on gatekeepers The following steps should be first tested on gate03, if it works, then do it on gate01/02 Please note: the gate ke...
Updating LFC for AGLT2 The LFC host for AGLT2 is lfc.aglt2.org. This is a VMware VM (SL5.2/x86_64). As of September 13, 2009 the LFC software was installed in /...
Updating umrocksi.aglt2.org to the most recent SL5.4 rpm set Following is a description of the process followed to ready a Rocks 5.3 client node built with the SL...
On December 23, 2005 I "upgraded" umfs01.grid.umich.edu from the i686 (32 bit) version of Scientific Linux 4.1 to the x86_64 (64 bit) version of Scientific Linux ...
Updating and Upgrading dCache Headnodes for AGLT2 June 2013 The dCache headnodes head01.aglt2.org and head02.aglt2.org were transitioned to VMware VMs in 2012. As...
Upgrade Planning for AGLT2 SL5 Systems Introduction We need to upgrade our remaining SL5 systems to SL6 soon. We should use this page to track which systems stil...
Reconfigure dCacheConfig to use both Cell and Module As noted before, the preferred gPlazma configuration uses the cell method but it has been pointed out that th...
Useful Links This page will link to many pages useful for day to day administration Monitoring * Ganglia Monitoring * PerfSonar (latency) (bandwidth) * ...
Intro This is a guide for using Athena on the Tier 3 that all users should follow. Useful webpages: * ATLAS computing workbook * nice introduction, a bit ol...
Using CVMFS at AGLT2 General Information CVMFS is a new method of distributing ATLAS software that relies on using central repositories of software on servers lo...
Using "Monit" for Monitoring and Repairing AGLT2 Services NOTE: THIS PAGE IS NOW MOSTLY OBSOLETE, WITH MONIT INSTALLED VIA CFENGINE The monit application monitors...
Using OMD and GLPI for AGLT2 We have some nice tools installed to monitor our systems and software (OMD/Check_MK) and track the resolution of problems (GLPI). It ...
Want to do test installs of nodes in a VMWare ESXi guest. Expect that more things can be made to work similarly to an install on a physical host, but expect that th...
VMWare Setup and Updates This page should keep track of VMware related setup/updates and information. Update to vSphere 5.1 This section will document the detail...
VMWare vSphere Upgrade at AGLT2 In September 2012 VMware released a significant upgrade to vSphere: 5.1. There are a number of nice features that we want to benefit...
Video Conferencing Help Asking for help or suggestions Email: aglt2 umich@umich.edu 348 West Hall Howto Guides Set outputs for each screen On the "HDMI ...
Virtuozzo Information and HowTo We have been testing Virtuozzo on our new virtualization hardware. Virtuozzo runs multiple "servers" on a host system, sharing t...
WLCG Accounting for Tier 2 Sites This page contains some plots showing WLCG Tier 2 accounting results for Tier 2's worldwide. Currently the plots are only availa...
Atlas Great Lakes Tier2 Web How to contact us * For problems, contact us through our signatures here: * Main.WenjingWu AGLT2 Manager and University of M...
Installing and Using X2Go We are dropping the Remote Desktop machine aglt2rd, and replacing it with a Linux machine set, starting with bridge um at the UM site, a...
Tests of ZFS on SunX4540 running FreeBSD 9.0 For setup notes see SunX4540ConfigFreeBSD For nearest comparison of similar system running Solaris see BenchmarkOnX4...
Introduction xCAT is a cluster management tool originally developed at IBM and now Open Source. xCAT v1 was rewritten with much of the same functionality but a n...
Installing Follow the Install chapter of "Top Document" xCAT2top.pdf xCAT is installed with the command: yum install xCAT (There's some stuff to do before and aft...
yum cron Configuration in SL7 Unmodified yum cron ALWAYS sends emails upon completion. This is an overwhelming flood given the number of systems we have. We th...
Using ZFS on Linux for AGLT2 AFS Fileservers Recently ZFS on Linux became available. ZFS has lots of nice features including Copy On Write (COW), data integrity v...