
Using "Monit" for Monitoring and Repairing AGLT2 Services


The monit application monitors and "repairs" host and service problems. It is easy to deploy and configure. If you have the DAG repository set up in yum, simply run:
yum install monit

Monit has many built-in options for testing resources and services. Once it is installed, see 'man monit' for an overview.

"Monit" Configuration

The default package install sets up 'monit' as a regular init service (chkconfig'ed off by default), but it is more robust to remove this service and run monit from inittab, where init respawns it if it ever exits:

chkconfig --del monit

Then edit /etc/inittab and append something like:

#+SPM monit daemon 09Apr2009
mo:2345:respawn:/usr/bin/monit -Ic /etc/monit.conf
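If this change has to be made on many nodes, it helps to script it so that repeating it is harmless. The shell sketch below is an illustration, not an AGLT2 tool: the function name add_monit_inittab and its optional file argument are hypothetical. It only appends the 'mo' respawn line when no monit respawn entry is already present, and it does not run chkconfig --del monit or telinit q; those steps stay manual as described on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of monit or AGLT2 tooling): append the monit
# respawn entry to an inittab file unless one is already there.
# Argument 1 is the inittab path; it defaults to /etc/inittab.
add_monit_inittab() {
    inittab=${1:-/etc/inittab}
    # An existing "mo:<runlevels>:respawn:/usr/bin/monit" line means nothing to do.
    if grep -q '^mo:[0-9]*:respawn:/usr/bin/monit' "$inittab" 2>/dev/null; then
        return 0
    fi
    # -I keeps monit in the foreground so init can supervise it; -c names the config.
    printf '%s\n' '# monit daemon, respawned by init' \
        'mo:2345:respawn:/usr/bin/monit -Ic /etc/monit.conf' >> "$inittab"
}
```

Running it twice leaves a single entry, so it can be pushed out repeatedly by a configuration-management tool.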

Before "starting" this we need to fix the default configuration.

The config file is /etc/monit.conf. Make sure the following lines are present (suitably customized for your install):
set daemon  60
set logfile syslog facility log_daemon
set mailserver <smtp_server_1>, <smtp_server_2>, <smtp_server_3>, localhost
set eventqueue basedir /var/monit slots 100
set alert <email_address>
set httpd port 2812 and use address <this_host_address>
    allow admin:<pw_removed>
    ssl enable
    pemfile /etc/grid-security/monit.pem
check system <hostname>
    if loadavg (5min) > 4 then alert
include /etc/monit.d/*

Some quick comments on the options in the monit.conf file above. First, set
set mailserver <Your_smtp_server>
to a mail server that is appropriate for, and reachable from, this host. As the example shows, you may provide a list of servers; monit tries them in the order given. The
set alert <email_address>
line should name an appropriate destination for alert email.

The set httpd line needs to be set up for this host. Put in your own password for the 'admin' user. NOTE: protect this file so only 'root' can read it! You can also specify the hosts/subnets that are allowed to connect. To enable SSL, add ssl enable, but NOTE that this requires a pemfile line (as shown). If you already have host certificates, you can create the 'monit.pem' file as follows:
  • Copy the host key: cp /etc/grid-security/hostkey.pem /etc/grid-security/monit.pem (NOTE: doing this gives the monit.pem file the right protection.)
  • Append the host certificate: cat /etc/grid-security/hostcert.pem >> /etc/grid-security/monit.pem
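The two steps above can be wrapped in a small function so the append is never accidentally written as a plain cat (which only prints the files). This is a sketch under assumptions: make_monit_pem and its directory argument are hypothetical names, and the explicit chmod 600 is an added precaution reflecting the "only root can read it" intent, not a documented monit requirement.

```shell
#!/bin/sh
# Hypothetical helper: build monit.pem from the host key and certificate,
# following the two steps above (copy the key, then append the certificate).
# Argument 1 is the directory holding the PEM files (default /etc/grid-security).
make_monit_pem() {
    dir=${1:-/etc/grid-security}
    # Start from a copy of the private key; cp overwrites any old monit.pem.
    cp "$dir/hostkey.pem" "$dir/monit.pem" || return 1
    # Append (>>) the certificate; plain "cat a b" would only print both files.
    cat "$dir/hostcert.pem" >> "$dir/monit.pem" || return 1
    # Make owner-only permissions explicit (assumption: 600 is what is wanted).
    chmod 600 "$dir/monit.pem"
}
```

On a real host you would run make_monit_pem as root, and then chmod 600 /etc/monit.conf so that only root can read the admin password.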

The check system line also needs to be customized with this host's name. The last line "includes" any other configurations you want applied to this host. This creates a convenient "plugin" environment: common service, device or resource configurations can be dropped into /etc/monit.d/ and easily shared between the monit configurations of different hosts.
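A plugin is just a file of monit checks placed in /etc/monit.d/. The sketch below installs a generic root-filesystem plugin like the filesystem.conf described further down. It is an illustration, not the AGLT2 file: the function name write_filesystem_plugin and the check name 'rootfs' are hypothetical, and the two tests simply mirror the behaviour described for that plugin (alert when the filesystem flags change, alert above 98% space usage).

```shell
#!/bin/sh
# Hypothetical helper: write a generic root-filesystem plugin into monit's
# include directory. Argument 1 defaults to /etc/monit.d, the directory
# pulled in by "include /etc/monit.d/*".
write_filesystem_plugin() {
    monit_d=${1:-/etc/monit.d}
    mkdir -p "$monit_d" || return 1
    # Quoted 'EOF' keeps the shell from expanding anything inside the stanza.
    cat > "$monit_d/filesystem.conf" <<'EOF'
check filesystem rootfs with path /
    if changed fsflags then alert
    if space usage > 98% then alert
EOF
}
```

Because the stanza has no host-specific values, the same file can be copied unchanged to every node that includes /etc/monit.d/*.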

To start monit via inittab, simply run:
telinit q

Managing Monit on AGLT2 Nodes

The 'monit' service is very persistent: if you stop a service it is monitoring, monit will quickly restart it (and/or alert on that fact). If you need to change the state of a service manually, first tell monit to stop monitoring that service. You can do this via the web interface (see the list below) or via the monit command line interface.

Here are some useful monit commands:
monit -t               # Test the configuration file syntax for validity
monit status           # Show monit's status (what it is monitoring and the state of each item)
monit unmonitor <x>    # Turn off monitoring for <x>
monit reload           # Reinitialize monit so it rereads the (updated) configuration
monit -h               # List the available commands and options
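The unmonitor step can be wrapped around any manual change so that monit does not undo it halfway through. This is a minimal sketch under stated assumptions: monit_maintenance is a hypothetical helper name, the MONIT variable exists only so the monit binary can be swapped for a stub, and it relies on 'monit monitor <x>' (the counterpart of unmonitor) to turn monitoring back on when the command finishes. Note that <x> is the monit check name (e.g. the name after "check process"), which may differ from the init script name.

```shell
#!/bin/sh
# Hypothetical helper: stop monit watching one check, run a manual command,
# then re-enable monitoring even if the command failed.
# Usage: monit_maintenance <check_name> <command> [args...]
# MONIT defaults to "monit"; set it to use a different binary or a test stub.
monit_maintenance() {
    svc=$1
    shift
    ${MONIT:-monit} unmonitor "$svc" || return 1
    "$@"
    rc=$?
    # Turn monitoring back on regardless of how the manual step went.
    ${MONIT:-monit} monitor "$svc"
    # Report the manual command's exit status to the caller.
    return "$rc"
}
```

Since monitoring resumes as soon as the command returns, the wrapped command should leave the service running again (for example, a script that stops mysqld, does the work and starts it); otherwise monit will restart it right away.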

Also, if you update or change services that 'monit' is watching, you may need to update the corresponding configuration in /etc/monit.d/. If you don't, AND something about the service configuration is different after your change, 'monit' may complain or fail to handle this service properly until you fix the config.
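Since a broken file in /etc/monit.d/ can leave monit unable to handle a service, it is worth validating before reloading. A minimal sketch using only the monit -t and monit reload commands listed above, assuming monit -t exits non-zero when it finds an error; the function name monit_check_and_reload and the MONIT override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: validate the monit configuration and reload the running
# daemon only if validation succeeds (assumes "monit -t" exits non-zero on errors).
# MONIT defaults to "monit"; set it to use a different binary or a test stub.
monit_check_and_reload() {
    if ! ${MONIT:-monit} -t; then
        echo "monit -t reported a problem; not reloading" >&2
        return 1
    fi
    ${MONIT:-monit} reload
}
```

Keeping the running daemon on its old configuration when the check fails means a typo in a plugin never reaches the live monit.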

Current "Monit" Service/Resource Configurations

For AGLT2 we are primarily monitoring the following services and resources:

  • MySQL via a mysqld.conf configuration. Needs customization for the PID file, MySQL port and MySQL socket. Will restart the 'mysql' or 'mysqld' service as required if the service fails.
  • ntpd via an ntpd.conf configuration. This one is fairly generic and shouldn't require customization. It also checks the NTP service directly on UDP port 123. Will (re)start ntpd as required if it is not running or fails.
  • Root filesystem via a filesystem.conf configuration. This is also generic and shouldn't require customization. Monitors the '/' filesystem and alerts if its flags change (e.g. it becomes read-only, RDONLY) or if disk usage goes over 98%.
  • LFC via an lfcdaemon.lfc configuration. Monitors the lfcdaemon process and the LFC log file. Can restart lfcdaemon if either its CPU usage is > 80% or the log file has not been updated in 60 minutes. Alerts are sent if CPU usage is > 60% or the log file hasn't changed in 5 minutes.
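As an illustration of what such plugin files can contain, the shell sketch below writes an ntpd.conf and a mysqld.conf into the include directory. It is not the actual AGLT2 configuration: the function names are hypothetical, and the ntpd PID file, the /etc/init.d script paths, the 'mysqld' service name and the MySQL defaults (PID file /var/run/mysqld/mysqld.pid, port 3306, socket /var/lib/mysql/mysql.sock) are assumed RHEL-style values to replace with this host's. The checks use standard monit control-file tests: a UDP probe of port 123 for ntpd, and TCP plus Unix-socket probes with the mysql protocol test for MySQL.

```shell
#!/bin/sh
# Hypothetical helpers: write ntpd and mysqld plugins into monit's include
# directory (argument 1, default /etc/monit.d). Init-script paths and PID
# files are assumed RHEL-style defaults, not AGLT2's actual values.
write_ntpd_plugin() {
    monit_d=${1:-/etc/monit.d}
    mkdir -p "$monit_d" || return 1
    # Quoted 'EOF': the stanza is written exactly as shown.
    cat > "$monit_d/ntpd.conf" <<'EOF'
check process ntpd with pidfile /var/run/ntpd.pid
    start program = "/etc/init.d/ntpd start"
    stop program  = "/etc/init.d/ntpd stop"
    # Probe the NTP service itself on udp/123, not just the process.
    if failed host 127.0.0.1 port 123 type udp then restart
EOF
}

# The mysqld plugin needs site values, so they come in as arguments:
# directory, PID file, TCP port, and socket path (assumed defaults below).
write_mysqld_plugin() {
    monit_d=${1:-/etc/monit.d}
    pidfile=${2:-/var/run/mysqld/mysqld.pid}
    port=${3:-3306}
    socket=${4:-/var/lib/mysql/mysql.sock}
    mkdir -p "$monit_d" || return 1
    # Unquoted EOF: $pidfile, $port and $socket are expanded into the file.
    cat > "$monit_d/mysqld.conf" <<EOF
check process mysqld with pidfile $pidfile
    start program = "/etc/init.d/mysqld start"
    stop program  = "/etc/init.d/mysqld stop"
    if failed host 127.0.0.1 port $port protocol mysql then restart
    if failed unixsocket $socket protocol mysql then restart
EOF
}
```

For example, write_mysqld_plugin /etc/monit.d /var/run/mysqld/mysqld.pid 3306 /var/lib/mysql/mysql.sock spells out every site-specific value explicitly.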

We need to create additional configurations for the following:

  • httpd and/or apache
  • globus-gatekeeper
  • tomcat-55
  • dCache services --- there is a large list of possibilities here

List of "Monit" URLs for AGLT2

NOTE: These are only accessible from AGLT2 IPs!

  • linat02 monit services (AFS DB server, GUMS server, NIS/KRB5 server)
  • linat03 monit services (AFS DB server, GUMS server, NIS/KRB5 server)
  • linat04 monit services (AFS DB server, GUMS server, NIS/KRB5 server)
  • linat05 monit services (Web Server, CFengine server)
  • linat06 monit services (AFS File server)
  • linat07 monit services (AFS File server)
  • linat08 monit services (AFS File server)
  • gate02 monit services (Globus Gatekeeper)
  • gate01 monit services (Globus Gatekeeper)
  • lfc monit services (AGLT2 LFC server)
  • dq2 monit services (AGLT2 DQ2 server)

-- ShawnMcKee - 09 Apr 2009
Topic revision: r5 - 10 Dec 2013, BobBall