Using OMD and GLPI for AGLT2

We have some nice tools installed to monitor our systems and software (OMD/Check_MK) and to track the resolution of problems (GLPI). It seems worthwhile to document how we can use these tools in a coordinated way. The OMD installation includes Check_MK, which provides some very nice methods to monitor for software and hardware problems in our systems. A nice feature is the ability to add comments, set up downtimes and acknowledge problems using the Web interface in Check_MK. I propose that we use this to "link" to known tickets in GLPI by putting the ticket URL in the comment field in Check_MK.

In general I think we should try to address ALL CRIT level problems in one of the following ways:
  1. Fix the problem (if possible to do this right away; e.g., clear up disk space, change config, etc.)
  2. Assign a GLPI ticket and then "acknowledge" the problem, inserting the GLPI URL in the acknowledgement comment field (detailed below)
  3. Change the OMD/Check_MK test so this event is not CRIT(ical)
See current OMD status at http://omd.aglt2.org/atlas/check_mk/

Example for Acknowledging Problems

The initial Check_MK screen from omd.aglt2.org is shown below. Notice that in the Tactical Overview at the upper left, host and service problems are called out. Let's click on the Service Problems:

check_mk_initial-screen.png

Now we get to the ordered list of Service Problems. The default display originally showed all problems in order of severity, with CRIT(ical) at the top. I have modified this view to sort first by "Service problem acknowledged", which puts all acknowledged problems at the bottom of the screen.

check_mk-service_problems.png

So how do we "acknowledge" a service problem? Or add a comment or put in a downtime? See the "Hammer" icon just under the "Service Problems" title (See graphic below). We need to click that.

check_mk-select_hammer.png

Once we click the hammer, it brings up the page below.

check_mk-acknowledgement.png

There are a few things to note on this page. In the "Acknowledge" section we can add a comment and we also have a few options to select from. Let me describe the options (details from http://serverfault.com/questions/519900/what-is-a-check-mk-sticky-comment-when-acknowledging-a-host-service )
  • sticky: Normally (without the sticky option), Nagios will notify you on each status change:
    1. So if your service becomes "WARN", you get a notification.
    2. You acknowledge the service now, and will not get another (i.e. periodic) notification as long as the service stays in the "WARN" state.
    3. If it traverses to "CRIT", you get a notification.
    4. If it goes back to "WARN", you get a notification.
    5. If it then goes to "OK", you get a recovery notification.
    6. After that, the acknowledgement has expired, since the service is "OK" again.
    In the sticky scenario, there will be no notifications about traversals between problem states:
    1. So if your service becomes "WARN", you get a notification.
    2. You acknowledge the service now with the sticky option set.
    3. If it traverses to "CRIT", you get no notification.
    4. If it goes back to "WARN", you get no notification.
    5. If it then goes to "OK", you get a recovery notification.
    6. After that, the sticky setting is gone as well: it is a property of the acknowledgement, which expires once the service is "OK".
  • persistent: If the "persistent" option is set, the comment associated with the acknowledgement will survive across restarts of the Nagios process. If not, the comment will be deleted the next time Nagios restarts.
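The two notification scenarios above can be sketched as a small simulation. This is only a toy model of the behaviour described in the lists, not Nagios code: the function name `notified_states` and its parameters are made up for illustration, and periodic re-notifications are not modelled.

```python
def notified_states(states, ack_at, sticky):
    """Return the states for which a notification would be sent.

    states: sequence of service states seen in order; the service is
            assumed to start out "OK" before the first entry
    ack_at: index in `states` at which the operator acknowledges
    sticky: True if the acknowledgement is made with the sticky option
    """
    notified = []
    previous = "OK"
    acknowledged = False
    for i, state in enumerate(states):
        if state != previous:
            if state == "OK":
                # A recovery always notifies and ends the acknowledgement.
                notified.append(state)
                acknowledged = False
            elif acknowledged and sticky:
                # Sticky: moves between problem states stay silent.
                pass
            else:
                # Not sticky (or not acknowledged): the change notifies,
                # and a non-sticky acknowledgement is dropped.
                notified.append(state)
                acknowledged = False
        if i == ack_at and state != "OK":
            acknowledged = True
        previous = state
    return notified


sequence = ["WARN", "CRIT", "WARN", "OK"]
print(notified_states(sequence, ack_at=0, sticky=False))
print(notified_states(sequence, ack_at=0, sticky=True))
```

With the sequence WARN, CRIT, WARN, OK and an acknowledgement at the first WARN, the non-sticky run notifies on every change, while the sticky run notifies only for the first WARN and the final recovery.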
In general we may want to persist comments. The sticky option is at the operator's discretion. So what does acknowledging do for us?
  1. Acknowledged events move to the bottom of the service problems display, preventing clutter which may cause us to miss new problems
  2. We can use the comment field to tie the acknowledgement to a specific GLPI ticket. See the example screenshot below.

check_mk-example_acknowledgement.png

I would like to see us use the acknowledgements to tie to specific GLPI ticket URLs.
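For anyone who wants to script this rather than use the Web interface, the same acknowledgement can be expressed as a Nagios external command (`ACKNOWLEDGE_SVC_PROBLEM`). The sketch below only builds the command string. The helper name `build_ack_command`, the host, service, author and ticket URL are all hypothetical, and how the command gets delivered (Nagios command pipe or Livestatus) depends on the site setup and is not shown here.

```python
import time


def build_ack_command(host, service, glpi_url, author,
                      sticky=True, notify=True, persistent=True, now=None):
    """Build a Nagios ACKNOWLEDGE_SVC_PROBLEM external command line.

    The GLPI ticket URL goes into the comment field, which is the last
    field of the command.
    """
    for value in (host, service, author):
        # Semicolons separate the fields, so they cannot appear in these.
        if ";" in value or "\n" in value:
            raise ValueError(f"invalid character in field: {value!r}")
    timestamp = int(time.time() if now is None else now)
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        "2" if sticky else "1",   # Nagios treats 2 as "sticky"
        str(int(notify)),         # 1 = send an acknowledgement notification
        str(int(persistent)),     # 1 = comment survives a Nagios restart
        author,
        f"GLPI: {glpi_url}",
    ]
    return f"[{timestamp}] " + ";".join(fields)


# Hypothetical host, service and ticket URL, for illustration only.
print(build_ack_command(
    "host1.example.org",
    "Disk /",
    "https://glpi.example.org/front/ticket.form.php?id=1234",
    "operator",
))
```

Keeping the ticket URL in a fixed "GLPI: ..." form makes the comments easy to search for later, but any consistent convention would do.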

Configuring the Service Problems Screen

As mentioned above, the default Service Problems screen was modified to put Acknowledged problems at the bottom. How to do this? From the service problems screen:

You need to click on the "Edit View" icon near the top. This brings up the following screen:

check_mk-edit_view.png

On this screen we are concerned with two items: Grouping and Sorting. First scroll down to "Grouping" (see screenshot below).

check_mk-service_problems-grouping.png

By default there is only one group, "Service State". We need to add a second group using "Service problem acknowledged". Next we need to set up the sort order (see below).

check_mk-service_problems-sorting.png

On this screen we need to make sure we sort first on "Service problem acknowledged" in the "Ascending" direction. That's it. Click "Save" at the bottom and now we are set up with the new (default) view for Service Problems.
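To see why an ascending sort on "Service problem acknowledged" pushes acknowledged problems to the bottom, here is a minimal sketch of the same ordering outside Check_MK. The sample records and the severity ranking (CRIT before UNKNOWN before WARN) are assumptions for illustration, not data taken from our OMD instance.

```python
# Assumed severity ranking: lower number is listed first.
SEVERITY_RANK = {"CRIT": 0, "UNKNOWN": 1, "WARN": 2}


def order_problems(problems):
    """Sort unacknowledged problems first, then by severity.

    False sorts before True, so an ascending sort on the
    "acknowledged" flag puts acknowledged problems last.
    """
    return sorted(
        problems,
        key=lambda p: (p["acknowledged"], SEVERITY_RANK[p["state"]]),
    )


# Hypothetical service problems, for illustration only.
problems = [
    {"service": "Memory used", "state": "WARN", "acknowledged": False},
    {"service": "Disk /", "state": "CRIT", "acknowledged": True},
    {"service": "NTP Time", "state": "CRIT", "acknowledged": False},
]

for problem in order_problems(problems):
    print(problem["acknowledged"], problem["state"], problem["service"])
```

The acknowledged CRIT entry ends up below the unacknowledged WARN entry, which is the effect we want on the Service Problems screen.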
Topic revision: r1 - 12 Jan 2015 - 18:08:24 - ShawnMcKee
 
