Setting Up OMD on AGLT2 Systems
Monitoring for AGLT2 has used lots of different software: Ganglia, Syslog-ng, Cacti, Nagios, Shinken, Rancid, Monit/MMonit, OpenManage and various home-made scripts and applications. Recently I became aware of the Open Monitoring Distribution (OMD), an open-source project that bundles a number of monitoring components (based upon Nagios) into a single, easy-to-install distribution. Details are at http://omdistro.org/start
Components are summarized (from a 'Check_MK' perspective) in the following diagram. Each box describes the component's primary functions:
We have used Nagios for DYNES monitoring and have experimented with Shinken/PNP4Nagios as a possible replacement for Cacti. To try OMD out at AGLT2, I decided to repurpose the VM shinken-ha.aglt2.org for a test install.
On December 3, 2013 I renamed shinken-ha.aglt2.org to omd.aglt2.org. I requested the forward/reverse DNS changes via the Merit portal and uninstalled shinken from the VM.
To install OMD:
I just needed to copy the current RPM, omd-1.00-rh61-30.x86_64.rpm, from http://files.omdistro.org/releases/centos_rhel/
I also did some work on the VM (updated the VM tools and the VM hardware version, and ran 'yum update').
Then I ran 'yum install --nogpgcheck omd-1.00-rh61-30.x86_64.rpm'.
I found a problem with OMD on CentOS 6.4, documented here: http://blog.christian-stankowic.de/?p=5312&lang=en
After patching the binary, things worked fine.
OMD supports more than one site per installation. I wanted to set up a new site named 'aglt2', BUT the setup tries to create a new user and group named after the site, and the group aglt2 already existed. So I set up the site 'atlas' via 'omd create atlas'.
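Because 'omd create' makes a user and a group named after the site, it can help to check a candidate name before creating the site. A minimal sketch using Python's standard pwd and grp modules (these go through the system's name services, so NIS/LDAP accounts are seen too); the function name is our own, not part of OMD:

```python
import grp
import pwd


def site_name_conflicts(name: str) -> list[str]:
    """Return reasons why 'omd create <name>' would collide with existing accounts.

    OMD creates a user and a group named after the site, so an existing
    user or group with that name blocks site creation.
    """
    conflicts = []
    try:
        pwd.getpwnam(name)  # raises KeyError if no such user
        conflicts.append(f"user '{name}' already exists")
    except KeyError:
        pass
    try:
        grp.getgrnam(name)  # raises KeyError if no such group
        conflicts.append(f"group '{name}' already exists")
    except KeyError:
        pass
    return conflicts


if __name__ == "__main__":
    import sys

    # Check the names given on the command line, or the two from this page.
    for candidate in sys.argv[1:] or ["aglt2", "atlas"]:
        problems = site_name_conflicts(candidate)
        print(f"{candidate}: {'; '.join(problems) if problems else 'free'}")
```

On our systems, running it for 'aglt2' before the install would have reported the existing group.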
The default login on OMD is 'omdadmin' with password 'admin'. I changed this to one of our admin passwords ('S').
OMD can use a different "core" processing element depending on how you configure it (see the 'Check_MK' diagram below). The diagram also gives a pretty good overview of the components in OMD.
I reconfigured the 'atlas' site to use Shinken instead of Nagios: 'su - atlas; omd stop; omd config; (respond to the prompts/menu options) omd start'
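The same switch can probably be scripted with OMD's non-interactive 'omd config SITE set VARIABLE VALUE' form instead of the menu. The sketch below assumes that form and the variable name CORE; verify both against the interactive menu on the installed OMD version before relying on it. The helper names are hypothetical, and nothing is executed unless dry_run=False:

```python
import subprocess


def core_switch_commands(site: str, core: str) -> list[list[str]]:
    """Build the omd commands (run as root) that switch a site's monitoring core.

    ASSUMPTION: 'omd config SITE set CORE <core>' is the non-interactive
    equivalent of choosing the core in the 'omd config' menu.
    """
    return [
        ["omd", "stop", site],
        ["omd", "config", site, "set", "CORE", core],
        ["omd", "start", site],
    ]


def switch_core(site: str, core: str, dry_run: bool = True) -> None:
    for cmd in core_switch_commands(site, core):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    switch_core("atlas", "shinken")  # dry run: only prints the commands
```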
Lots of configuration information is at http://mathias-kettner.com/checkmk.html
I installed the 'check_openmanage' plugin on omd.aglt2.org (see http://folk.uio.no/trondham/software/check_openmanage.html).
Our site restricts cron to the users listed in /etc/cron.allow, so any new OMD 'site' user must be added there. Since we created a site called 'atlas', we need to add 'atlas' to /etc/cron.allow. This is now handled in CFEngine 3.
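CFEngine 3 now manages this, but the edit itself is simple and should be idempotent. A minimal Python sketch of the same idea, for hosts not yet under CFEngine; the function name is hypothetical, and the path is a parameter so it can be tried on a scratch file first:

```python
from pathlib import Path


def ensure_cron_allowed(user: str, path: str = "/etc/cron.allow") -> bool:
    """Append `user` to the cron.allow file unless it is already listed.

    Returns True if the file was changed. Running it twice is harmless,
    like an idempotent CFEngine edit.
    """
    allow = Path(path)
    text = allow.read_text() if allow.exists() else ""
    if user in (line.strip() for line in text.splitlines()):
        return False
    with allow.open("a") as fh:
        # Start on a fresh line if the file lacks a trailing newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(user + "\n")
    return True
```

For the site on this page the call would be ensure_cron_allowed("atlas"), run as root.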
Two ports need to be opened for remote access: 443 (https) and 57767 (Shinken web page). These were also added to CFEngine 3.
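A quick way to confirm from outside that both ports are reachable after the firewall change is a plain TCP connect. A minimal sketch (hypothetical helper name); it only checks that something accepts the connection, not that it is the right service:

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, filtered (timed out), or name lookup failure
        return False
```

For example, calling port_open("omd.aglt2.org", 443) and port_open("omd.aglt2.org", 57767) from a machine outside the cluster should both return True; False means the connection was refused, filtered, or timed out.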
Basic configuration steps, starting as 'root':
- su - atlas (become the site user)
- cd /omd/sites/atlas (this is the 'root' of the site)
- cd etc/check_mk (this is the location of the configuration for check_mk)
- Files for check_mk configuration end in .mk. See /omd/sites/atlas/etc/check_mk/cron.d for examples.
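As an illustration of the .mk format: these files are plain Python, and a minimal host list in etc/check_mk/main.mk might look like the sketch below. The host names, tags, and address are hypothetical placeholders; all_hosts and ipaddresses are the classic Check_MK variable names as we understand them, so check them against the Check_MK documentation for the installed version:

```python
# etc/check_mk/main.mk (sketch). Check_MK reads this file as Python.
# HYPOTHETICAL host names and tags; replace with real AGLT2 nodes.

# Each entry is "hostname|tag|tag...". Tags can later be used in rules
# to apply checks or parameters to a subset of hosts.
all_hosts = [
    "omd.aglt2.org|linux|vm",
    "example-dcache01.aglt2.org|linux|storage",
    "example-wn001.aglt2.org|linux|worker",
]

# Optional: fixed IP addresses for hosts whose DNS lookup should be skipped.
ipaddresses = {
    "example-wn001.aglt2.org": "192.0.2.10",  # documentation-range address
}
```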
The command to inventory a host is 'check_mk -I <hostname>', but <hostname> must already be defined in the check_mk config files.
The command to reload (after config changes) is 'check_mk -O'.
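Because every host must be defined before it can be inventoried, a typical cycle after adding hosts is: inventory each new host, then reload once. A minimal sketch wrapping the two commands above (hypothetical helper name; run it as the site user; it only prints the commands unless dry_run=False):

```python
import subprocess


def inventory_and_reload(hosts: list[str], dry_run: bool = True) -> list[list[str]]:
    """Inventory each host with 'check_mk -I', then reload once with 'check_mk -O'.

    Every host must already be listed in the Check_MK configuration.
    Returns the commands in the order they were (or would be) run.
    """
    commands = [["check_mk", "-I", host] for host in hosts]
    commands.append(["check_mk", "-O"])  # a single reload after all inventories
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

Reloading once at the end, rather than after each host, avoids reloading the core repeatedly.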
Agent Installation on AGLT2 Systems
The OMD configuration benefits greatly from Check_MK. The Check_MK component has a set of rules and agents that can inventory and set up monitoring tests for a number of different systems and applications. To benefit from this, we need to install the check_mk agent RPMs on our systems. As of December 4th we have the following two RPMs installed cluster-wide:
We may need additional "local" agents installed to properly monitor various databases and other applications. See next section.
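Check_MK calls such extensions "local checks": executables in the agent's local-check directory (commonly /usr/lib/check_mk_agent/local on Linux, but confirm this for our agent RPMs), where each output line becomes one service. The sketch below assumes that convention, a Python 3 interpreter on the node, and the agent's usual TCP port 6556; the service name, directory, and thresholds are hypothetical. The last two helpers are meant to run from omd.aglt2.org: they read an agent's output and pull out the lines under its <<<local>>> header, to confirm the local check is being reported. Both halves share one file here only to keep the sketch compact:

```python
#!/usr/bin/env python3
"""Sketch of a Check_MK local check plus helpers to confirm the agent reports it."""
import shutil
import socket

WARN_PCT = 80.0  # HYPOTHETICAL thresholds
CRIT_PCT = 90.0


def local_check_line(service: str, used_pct: float,
                     warn: float = WARN_PCT, crit: float = CRIT_PCT) -> str:
    """Format one local-check line: '<status> <service> <perfdata> <text>'.

    status is 0=OK, 1=WARN, 2=CRIT (3=UNKNOWN); service must not contain spaces.
    """
    if used_pct >= crit:
        status, word = 2, "CRIT"
    elif used_pct >= warn:
        status, word = 1, "WARN"
    else:
        status, word = 0, "OK"
    perfdata = f"used_pct={used_pct:.1f};{warn:g};{crit:g}"
    return f"{status} {service} {perfdata} {word} - {used_pct:.1f}% used"


def fetch_agent_output(host: str, port: int = 6556, timeout: float = 10.0) -> str:
    """Read the plain-text dump the agent sends when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the agent closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def section_lines(agent_output: str, name: str) -> list[str]:
    """Return the lines under the '<<<name>>>' header(s) of agent output."""
    lines, inside = [], False
    for line in agent_output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            # Some headers carry options after a colon; compare the bare name.
            inside = line[3:-3].split(":", 1)[0] == name
        elif inside:
            lines.append(line)
    return lines


if __name__ == "__main__":
    path = "/var/lib/mysql"  # HYPOTHETICAL database directory
    try:
        usage = shutil.disk_usage(path)
        print(local_check_line("MySQL_datadir", 100.0 * usage.used / usage.total))
    except OSError as exc:
        print(f"3 MySQL_datadir - UNKNOWN - cannot stat {path}: {exc}")
```

Once the script is installed and executable on a node, section_lines(fetch_agent_output("HOSTNAME"), "local"), with HOSTNAME a placeholder for that node, should include its MySQL_datadir line.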
Issues for AGLT2 (Mis-configuration, False Problem Detection, Missing Functionality)
You can log in to the AGLT2 'atlas' site at http://omd.aglt2.org/atlas
This is a front page that lets you select which Web interface to use. There is also a Shinken page at http://omd.aglt2.org:57767/
As we began to add hosts and tests, we found a few issues that were NOT problems with the hosts or services at AGLT2. Some tests make wrong assumptions or use poor default settings that flag problems where none exist. We need to tune our site setup to fix these false positives. Please see the FixOMDFalsePositives
page for the current list of issues, and, when known, their solutions.
- 09 Dec 2013
- Check_mk components and their functions: