Ganglia

See:

MSU Plan

ROCKS 4

Wish to separate machines at MSU into different groups by function.

| Group | IP | Description |
| MSU OSG | 239.2.11.65 | DZero OSG compute node at MSU |
| MSU Server | 239.2.11.69 | Servers at MSU that don't fall in other categories |
| AGLT2 at MSU | 225.31.5.18 | This is the default from ROCKS |
| MSU T2 | 239.2.11.61 | Tier2 compute nodes at MSU |
| MSU T2 Storage | 239.2.11.63 | Tier2 storage nodes at MSU |
| MSU T3 | 239.2.11.67 | Tier3 compute nodes at MSU |

A gmetad is running on msurox; the groups don't run their own gmetad, just gmonds, which have the setting "deaf = no" so that they listen to (and repeat) their peers.
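For reference, the relevant fragment of such a gmond's config looks something like this (a sketch in Ganglia 3.x syntax; the actual files for these groups are not reproduced here):

/* sketch: an aggregating gmond must not be deaf, so it
   collects metrics from its peers and can answer gmetad */
globals {
  deaf = no
}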

Gmetad

Gmetad on msurox needs to query a gmond in each group to fetch data (the gmond on msurox itself is not in all the groups). The config also lets us specify multiple gmonds per group --- useful if a node is down.

=== root@msurox /home/install/extras/ganglia > more /etc/gmetad.conf
# gmetad.conf for msurox

# our grid
gridname "USATLAS"

# remote clusters
data_source "AGLT2 at UM" 10.10.1.3:8649
data_source "UM ATLAS Servers" 10.10.1.8:8649
data_source "UM ATLAS Storage" 10.10.1.27:8649 10.10.1.28:8649

# local clusters
#no nodes should be in the default (ROCKS) group AGLT2 at MSU
#data_source "AGLT2 at MSU" localhost:8649
data_source "MSU OSG Compute" cc-112-32.local:8649 cc-113-33:8649
data_source "MSU T2 Compute" cc-102-1.local:8649 cc-104-1.local:8649 cc-106-1.local:8649
data_source "MSU T2 Storage" msufs01.local:8649 msufs02.local:8649
data_source "MSU T3 Compute" cc-113-1.local:8649 cc-113-2.local:8649
data_source "MSU Server" msu2.local:8649

ROCKS 5

Still not ideal

Really want to have multiple clusters. There needs to be one (or more) gmonds per cluster that aggregate info (they need to not be "deaf") and get queried by gmetad. It seems that a single gmond can only aggregate the information for one cluster --- the one it is in. If other nodes talk on the same multicast channel or send unicast to that gmond, their info will be included, even if they are nominally in different clusters.

ROCKS 5 has put unicast sending within the cluster into the default config.

So, need multiple listening gmonds. They can be placed on different nodes, one per cluster, or perhaps all on the frontend, given different multicast addresses and different unicast ports. For now, am using the first option; see the table below.
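Had the all-on-the-frontend option been chosen, it would amount to running one gmond process per cluster, each with its own config file; a hypothetical sketch (the -c flag is standard Ganglia, the paths and ports are made up):

# one gmond per cluster on the frontend, each config carrying a
# unique cluster name, multicast address, and TCP/UDP port
gmond -c /etc/ganglia/gmond.conf-msut2    # e.g. 239.2.12.61, port 8650
gmond -c /etc/ganglia/gmond.conf-msut3    # e.g. 239.2.12.67, port 8651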

Gmond

Will have the ROCKS 4 frontend running plus msurx (production) and msurxi (test) ROCKS 5 frontends. The plan is that the same group names will be used in ROCKS 5, but the groups will be in different (new) multicast ranges. The MSU T3 and MSU OSG groups will be combined into just MSU T3.

| Group | Multicast IP | Unicast IP | gmond.conf filename | Description |
| AGLT2 at MSU ROCKS 5 | 224.0.0.4 | 10.10.128.11 | None, from ROCKS | This is the default from ROCKS on msurx |
| AGLT2 at MSU ROCKS 5 Test | 224.0.0.4 | 10.10.128.12 | None, from ROCKS | This is the default from ROCKS on msurxi |
| MSU T2 | 239.2.12.61 | cc-117-1.msulocal | gmond.conf-msut2 | Tier2 compute nodes at MSU |
| MSU T2 dCache Pool | 239.2.12.63 | msufs01.msulocal | gmond.conf-msut2pool | Tier2 dCache pool nodes at MSU |
| MSU T3 | 239.2.12.67 | cc-115-1.msulocal | gmond.conf-msut3 | Tier3 compute nodes at MSU |
| MSU Server | 239.2.12.69 | msurx.msulocal | gmond.conf-msuserv | Servers at MSU that don't fall in other categories |
| MSU Test | 239.2.12.71 | msurxi.msulocal | gmond.conf-test | Systems installed from msurxi |

The ROCKS 5 implementation of the above sets an attribute in the database named "gmond_conf" that holds the gmond.conf filename.
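The attribute can presumably be inspected and set with the standard Rocks attribute commands, along these lines (a sketch; the exact wiring to the appliance definitions may differ):

# hypothetical example: point a T2 compute node at its group's config
rocks set host attr cc-117-1 gmond_conf gmond.conf-msut2
rocks list host attr cc-117-1 | grep gmond_conf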

Gmetad

Gmetad on msurx needs to query a gmond in each group to fetch data (the gmond on msurx itself is not in all the groups). As before, multiple gmonds can be listed per group --- useful if a node is down.


# gmetad.conf for msurx

# our grid
gridname "USATLAS"

# remote clusters
data_source "AGLT2 at UM" 10.10.1.42:8649
data_source "UM ATLAS Servers" 10.10.1.8:8649
data_source "UM ATLAS Storage" 10.10.1.27:8649 10.10.1.28:8649

# local clusters
#no nodes should be in the default (ROCKS) group AGLT2 at MSU
#data_source "AGLT2 at MSU" localhost:8649
data_source "MSU T2 Compute" cc-102-1.local:8649 cc-104-1.local:8649 cc-106-1.local:8649
data_source "MSU T2 Storage" msufs01.local:8649 msufs02.local:8649
data_source "MSU T3 Compute" cc-113-1.local:8649 cc-113-2.local:8649
data_source "MSU Server" msu2.local:8649

On msurxi have this data source:

data_source "AGLT2 at MSU ROCKS 5 Test" localhost:8649

Hmm, not sure how this will work on msurxi. There will be the issue that we don't know a gmond to query --- nodes will come and go. Probably need to move the gmond on msurxi into the multicast group for "MSU Test".

udp_send_channel

In ROCKS 4, the default gmond setup had:

/* UDP Channels for Send and Recv */

udp_recv_channel {
        mcast_join = 225.31.5.18
        port = 8649
        mcast_if = eth0
}

udp_send_channel {
        mcast_join = 225.31.5.18
        port = 8649
        mcast_if = eth0
}

In ROCKS 5 this is altered to:

/* UDP Channels for Send and Recv */

udp_recv_channel {
        mcast_join = 224.0.0.4
        port = 8649
}

udp_send_channel {
        host = 10.10.128.12
        port = 8649
}
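Combining this with the table above, a per-group config such as gmond.conf-msut2 presumably looks roughly like the following reconstruction (a sketch, not the actual file):

/* sketch of gmond.conf-msut2, reconstructed from the table above */
cluster {
  name = "MSU T2"
}

udp_send_channel {
  host = cc-117-1.msulocal   /* unicast to the group's aggregator */
  port = 8649
}

udp_recv_channel {
  mcast_join = 239.2.12.61   /* the group's multicast channel */
  port = 8649
}

tcp_accept_channel {
  port = 8649                /* lets gmetad query this gmond */
}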

Removing Spikes

Had an issue with the Broadcom NIC driver giving impossibly large numbers from its counters; this resulted in spikes in the Ganglia (rrdtool) graphs. Try googling "rrdtool remove spikes". There is a Perl script on the rrdtool website that automates removing spikes from the rrdtool databases. See this blog: http://acktomic.com/?p=6

Anyways, tried this by hand: rrdtool dump, then editing the XML by hand, then rrdtool restore. Did it work???
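For the record, the by-hand procedure is roughly this (hypothetical filename):

# dump to XML, edit the absurd values out (e.g. replace with NaN),
# restore to a new file, then swap it into place
rrdtool dump bytes_in.rrd > bytes_in.xml
vi bytes_in.xml
rrdtool restore bytes_in.xml bytes_in.rrd.fixed
mv bytes_in.rrd.fixed bytes_in.rrd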

Have removespikes.pl from http://oss.oetiker.ch/rrdtool/pub/contrib/. Needed to modify it to handle filenames with spaces. Using it with a threshold level instead of the percentage heuristic. The modified copy is in the tools directory.

Alternate and More Permanent Solution

Another solution for the network rate spikes is to set the min and max parameters in the rrd database files (Ganglia doesn't set values for these when it creates the rrd files). Then you can dump and restore the data. The restore process respects the max values. This will also prevent spikes in the future.

Example of doing it

[root@msurxi ~]# rrdtool tune /var/lib/ganglia/rrds/AGLT2\ at\ MSU\ ROCKS\ 5\ Test/cc-117-1.msulocal/bytes_in.rrd --maximum sum:1000000000

[root@msurxi ~]# rrdtool info /var/lib/ganglia/rrds/AGLT2\ at\ MSU\ ROCKS\ 5\ Test/cc-117-1.msulocal/bytes_in.rrd 

 filename = "/var/lib/ganglia/rrds/AGLT2 at MSU ROCKS 5 
 Test/cc-117-1.msulocal/bytes_in.rrd"
 rrd_version = "0003"
 step = 15
 last_update = 1259094985
 ds[sum].type = "GAUGE"
 ds[sum].minimal_heartbeat = 120
 ds[sum].min = NaN
 ds[sum].max = 1.0000000000e+09
 ds[sum].last_ds = "13513.02"
 ds[sum].value = 1.3513020000e+05
 ds[sum].unknown_sec = 0
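To apply the same cap across all the network-byte databases, a loop like this should work (a sketch; it assumes every such file has its data source named "sum", as in the output above, and that 1e9 is a sane ceiling for these nodes):

# -print0 / read -d '' cope with the spaces in the cluster dir names
find /var/lib/ganglia/rrds -name 'bytes_*.rrd' -print0 | \
  while IFS= read -r -d '' f; do
    rrdtool tune "$f" --maximum sum:1000000000
  done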

Seems fixed in ROCKS 5.2.2 - Nope

Have seen spikes in ROCKS 5.2.2; the code mentioned below must not be enabled.

There is code in the gmond from Ganglia 3.1.2 that rejects network counts above certain thresholds; it was put in to work around this issue. It seemed that the gmond in ROCKS 5.2 service pack 5.2.2 had this code enabled, so we shouldn't have seen these anymore --- but see above.

Private Network

The options in gmond.conf don't seem to be effective in making gmond use eth0. Adding a route to the system for the multicast addresses on eth0 is effective.

# route
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
224.0.0.0       *               240.0.0.0       U     0      0        0 eth0
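The route shown can be added with the standard net-tools command (and made persistent through the distro's static-routes mechanism):

# send all multicast traffic (224.0.0.0/4) out eth0
route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0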

Setting for bonds

Nodes dropping in MSUROX

Have a continuing problem with nodes not being included in collection.

Have switched to using unicast for metric sending; this seems to resolve the issue.

Note that the problem may be iptables/IGMP/kernel-version related; have not investigated this, though.

-- TomRockwell - 01 May 2008