Auto Test Programs over AGLT2

Cluster Related

PNFS mount point test

Purpose Make sure every compute node has "/pnfs/" mounted, and that every GridFTP door node both has "/pnfs/" mounted and has "/pnfs/ftpBase", a soft link to "/pnfs/aglt2".

Frequency every 4 hours

Alert Sends emails to

cron service (check all compute nodes at UM)
0 0-23/4 * * * /usr/bin/perl /home/install/wuwj_extras/bin/ 2> /dev/null 1> /dev/null

cron service (check all compute nodes and MSU fs servers at MSU)
0 0-23/4 * * * /usr/bin/perl /home/install/extras/bin/ 2> /dev/null 1> /dev/null

cron service, run as wuwj, whose AFS token must be renewed every 30 days (check fs servers at UM, because UM fs servers do not allow passwordless root ssh)
0 0-23/4 * * * /usr/bin/perl /home/install/wuwj_extras/bin/ 2> /dev/null 1> /dev/null
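A minimal Python sketch of the per-node check, assuming it only needs to confirm that /pnfs is a mount point and, on GridFTP door nodes, that /pnfs/ftpBase is a symlink to /pnfs/aglt2. The function name check_pnfs and its parameters are hypothetical; the actual Perl scripts are not shown on this page, and sending the alert email is left out.

```python
import os


def check_pnfs(pnfs_root="/pnfs", door=False,
               ftp_base="/pnfs/ftpBase", target="/pnfs/aglt2"):
    """Return a list of problems found on this node (empty list means OK).

    Hypothetical sketch: the defaults are the paths named on this page,
    but they can be overridden, e.g. for testing.
    """
    problems = []
    # Every node: /pnfs must be an actual mount point, not just a directory.
    if not os.path.ismount(pnfs_root):
        problems.append(f"{pnfs_root} is not mounted")
    if door:
        # GridFTP doors additionally need ftpBase to be a symlink to the VO area.
        if not os.path.islink(ftp_base):
            problems.append(f"{ftp_base} is missing or not a symlink")
        else:
            link = os.readlink(ftp_base)
            # A relative link target is resolved against the link's own directory.
            resolved = os.path.normpath(
                os.path.join(os.path.dirname(ftp_base), link))
            if resolved != os.path.normpath(target):
                problems.append(f"{ftp_base} points to {link}, not {target}")
    return problems
```

On a door node one would call check_pnfs(door=True) and mail any returned messages; an empty list means the node passed.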

Host Cert expiration check

Purpose Check the expiration dates of all host certs stored on

Frequency every day

Alert Sends emails about certs that expire in less than a month to
Also displays the expiration dates of all certs on the web page monitor_cert

cron service (check all certs for UM and MSU nodes)
0 5 * * * /usr/bin/perl /home/install/extras/bin/
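The expiry test can be sketched in Python as follows, assuming the notAfter date is read with `openssl x509 -noout -enddate -in <cert>` and that "less than a month" means fewer than 30 days. The helper names days_left and expiring_soon are hypothetical; only ssl.cert_time_to_seconds comes from the Python standard library.

```python
import ssl
import time

# Assumption: "less than a month" is taken to mean fewer than 30 days.
WARN_DAYS = 30


def days_left(not_after, now=None):
    """Days until expiry, given a notAfter date such as the line printed by
    `openssl x509 -noout -enddate` ("notAfter=Jan 11 00:00:00 2030 GMT")."""
    value = not_after.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    # cert_time_to_seconds expects the "%b %d %H:%M:%S %Y GMT" form.
    expires = ssl.cert_time_to_seconds(value)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def expiring_soon(not_after, now=None):
    """True if the cert expires within WARN_DAYS (or has already expired)."""
    return days_left(not_after, now) < WARN_DAYS
```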

dCache Related

Check for dead dCache pools

Purpose Check whether any pool has status "dead"

Frequency every 5 minutes

Alert Sends email to Wenjing and Shawn if any fs pool becomes dead.

*/5 * * * * cd /root/dcache_adm_script/dCache/check_poolstate;perl
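A sketch of the alerting logic, assuming pool statuses have already been collected into a dict of pool name to status string (how they are read from dCache is not covered here). Alerting only on pools that newly turn dead, rather than on every 5-minute run, is an assumption suggested by "becomes dead"; the function names are hypothetical.

```python
def newly_dead(previous, current):
    """Pools that are dead now but were not dead at the previous check.

    previous/current: dicts mapping pool name -> status string.
    """
    return sorted(
        pool for pool, status in current.items()
        if status == "dead" and previous.get(pool) != "dead"
    )


def alert_text(pools):
    """Hypothetical email body; returns None when there is nothing to report."""
    if not pools:
        return None
    return "dCache pools became dead: " + ", ".join(pools)
```

Keeping the previous run's statuses (for example in a small state file) is what makes the transition check possible; without it, a pool that stays dead would trigger an email every 5 minutes.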


Purpose Clean stale entries from the SRM database. A stale entry left by a failed write prevents a user from writing a file with the same name to dCache again.

Frequency every 10 minutes

Alert None

*/10 * * * * cd /root/dcache_adm_script/dCache/clean_sp_db/;/usr/bin/perl
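To illustrate the cleanup, here is a sketch against an in-memory SQLite table. The schema, the 'FAILED' state value and the 10-minute age cutoff are all hypothetical stand-ins, since the real SRM database layout is not described on this page.

```python
import sqlite3

# Hypothetical schema standing in for the real SRM database: one row per
# request, keyed by the target file path.
SCHEMA = """
CREATE TABLE requests (
    path    TEXT NOT NULL,
    state   TEXT NOT NULL,     -- e.g. 'FAILED', 'DONE' (hypothetical values)
    created REAL NOT NULL      -- Unix time of the request
)
"""


def clean_stale(conn, now, max_age_s=600):
    """Delete failed requests older than max_age_s seconds so that the same
    path can be written again. Returns the number of rows deleted."""
    cur = conn.execute(
        "DELETE FROM requests WHERE state = 'FAILED' AND created < ?",
        (now - max_age_s,),
    )
    conn.commit()
    return cur.rowcount
```

The age cutoff keeps the cleanup from racing with requests that have only just failed; whether the real script uses one is not stated here.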

SRM put/get report and statistics

Compute the success and failure rates of SRM PUT/GET requests within each space token area
Classify error messages
Rotate the SRM requests DB (delete entries older than 4 hours)

Frequency every 4 hours

Alert Sends email to Wenjing, Shawn and Bob if there are any unusual (fatal) failures.

0 0-23/4 * * * cd /root/dcache_adm_script/dCache/srm_err_report; perl
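A sketch of the aggregation step, assuming each request is available as a (space token, operation, succeeded, error message) tuple. The error classes and their regular expressions are hypothetical examples, not the classification the real script uses; DB rotation is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical error classes; the real classification rules are not given here.
ERROR_CLASSES = [
    ("timeout", re.compile(r"timed? ?out", re.I)),
    ("no space", re.compile(r"no (free )?space", re.I)),
    ("permission", re.compile(r"permission denied", re.I)),
]


def classify(message):
    """Return the first matching error class name, or "other"."""
    for name, pattern in ERROR_CLASSES:
        if pattern.search(message):
            return name
    return "other"


def summarize(requests):
    """requests: iterable of (space_token, op, succeeded, error_message).

    Returns {(token, op): (n_total, success_rate)} and a Counter of error classes.
    """
    totals = defaultdict(lambda: [0, 0])  # (token, op) -> [ok, failed]
    errors = Counter()
    for token, op, ok, msg in requests:
        totals[(token, op)][0 if ok else 1] += 1
        if not ok:
            errors[classify(msg)] += 1
    rates = {key: (ok + bad, ok / (ok + bad)) for key, (ok, bad) in totals.items()}
    return rates, errors
```

An "unusual" failure alert could then be raised when a token's success rate drops below some threshold or when a class such as "other" appears; the actual criteria are not documented on this page.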


Purpose Compare the number of files in each pool cell with the number registered in the PNFS DB, to find pools whose files failed to register in PNFS.

Frequency every day

Alert Displays the file counts of each pool cell and of the PNFS DB on the monitor page FileNO_Stat

0 8 * * * cd /root/dcache_adm_script/dCache/stat_fileno_inpool; perl poollist
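The comparison reduces to a per-pool diff of two counts. This sketch assumes both sides have already been collected into dicts keyed by pool name; mismatched_pools is a hypothetical name.

```python
def mismatched_pools(pool_counts, pnfs_counts):
    """Compare per-pool file counts reported by each pool cell with the counts
    registered in the PNFS DB.

    Returns {pool: (in_pool, in_pnfs)} for every pool where they differ;
    a pool missing on one side counts as 0 files there.
    """
    result = {}
    for pool in sorted(set(pool_counts) | set(pnfs_counts)):
        in_pool = pool_counts.get(pool, 0)
        in_pnfs = pnfs_counts.get(pool, 0)
        if in_pool != in_pnfs:
            result[pool] = (in_pool, in_pnfs)
    return result
```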

Usage Statistics of Typical Pools

For each space token, list all affiliated pools and their usage.
For each fs node, list all its pools with their pool groups and usage.

Frequency every 8 hours

Alert Displays the statistics on the web page Typical Pool Usage

0 */8 * * * cd /root/dcache_admin_script/stat_dcache_pools;perl
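A sketch of the two groupings, assuming each pool maps to exactly one space token and one fs node and that used and total sizes are known in bytes. The Pool record and its field names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Pool(NamedTuple):
    name: str
    node: str     # fs node hosting the pool
    group: str    # pool group
    token: str    # space token the pool serves (assumed one per pool)
    used: int     # bytes used
    total: int    # bytes capacity


def by_token(pools):
    """space token -> list of (pool name, used fraction)."""
    out = defaultdict(list)
    for p in pools:
        out[p.token].append((p.name, p.used / p.total))
    return dict(out)


def by_node(pools):
    """fs node -> list of (pool name, pool group, used fraction)."""
    out = defaultdict(list)
    for p in pools:
        out[p.node].append((p.name, p.group, p.used / p.total))
    return dict(out)
```

If a pool can serve more than one space token, the token field would become a list and by_token would append the pool under each of them.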





-- WenjingWu - 11 Dec 2008
Topic revision: r3 - 16 Oct 2009 - 20:14:37 - TomRockwell
