Simple Neverfail monitoring with Zabbix part 1

Background

This is the first of a couple of posts on how I’ve cobbled together some basic monitoring of Neverfail Heartbeat, Neverfail’s H/A software, which is now also the basis for VMware’s vCenter Server Heartbeat. Since Neverfail seems to consider their command lines privileged information, I will only cover how to do some simple monitoring using the registry. When I started on this effort internally, I was mainly interested in finding a quick and simple way to get the info I needed, and less in how to get it into a monitoring system.

I’ve been working with another team where I work to look at Zabbix as an alternative for some of the monitoring we do in our environment. We use Microsoft Operations Manager 2005 (MOM) but haven’t fully cut over from our previous monitoring solution. I had looked at Zabbix earlier as a potential solution for monitoring a bunch of VMware ESX boxes, but another team ended up getting tasked with that particular duty. So I had some experience with Zabbix but hadn’t done much with it since.

One of the things that’d been rattling around in my brain is using zabbix_sender to monitor some of the other components we can’t easily get into MOM. zabbix_sender is a utility that ships with Zabbix and lets you “send” information to a Zabbix server. In my case it was appealing because we’re already running two different monitoring agents on the Exchange servers where we have Neverfail installed. Since I only wanted to use Zabbix to monitor a small set of data related specifically to Neverfail, zabbix_sender lets me do that without having to run the full-blown zabbix_agent as a service on the boxes.

Getting the Data

Neverfail (at least the versions we have installed) doesn’t obviously expose performance data. However, if you look in the registry on each Neverfail server you will find some values (see HKLM\Software\Neverfail\R2\Performance) that are updated frequently and correspond to data presented in the Neverfail GUI. Because of the way Neverfail works, some of this data (Unsafe Queue info) is available on the Active node and some of it (Safe Queue info) is in the registry on the Passive node. This presents a couple of issues when trying to put together the solution (at least in my environment).

The first of these is finding a single consistent way to get the data out of the registry, especially since all the counters involved are of the REG_DWORD_BIG_ENDIAN variety (you can see a previous entry related to BIG_ENDIAN here). I ended up settling on the reg.exe utility available in Windows. This utility lets you manipulate the registry locally and remotely. While it doesn’t deal happily with REG_DWORD_BIG_ENDIAN (RDBE) entries, it is able to extract the data, which we can then manipulate to get the correct values.

As an example, suppose I have the following two values in the registry, as shown by RegEdit:

reg_example_01

When I run reg.exe I get the following output…

reg_query_rdword_rdbe

So while Dword_example and DWORD_BE_Example nominally have the same value, reg.exe doesn’t get the data out correctly for the latter. However, as I said earlier, once we have the data out we can do some magic to get the right value.
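As a rough illustration of that conversion (in Python here, not the VBScript used for the actual script), the sketch below reverses the byte order of a 32-bit value. It assumes that reg.exe prints the four stored bytes of a REG_DWORD_BIG_ENDIAN value as if they were an ordinary little-endian DWORD; that assumption, and the function names, are illustrative and should be checked against the output of your own reg.exe version.

```python
def swap_dword_bytes(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    # Lay the value out as 4 little-endian bytes, then read them back big-endian.
    return int.from_bytes(value.to_bytes(4, byteorder="little"), byteorder="big")


def big_endian_dword_from_reg(hex_text: str) -> int:
    """Convert hex text such as '0x10000000' into the intended decimal value.

    Assumption: the text is the byte-swapped rendering of a
    REG_DWORD_BIG_ENDIAN value, as described in the lead-in.
    """
    raw = int(hex_text, 16)  # base 16 accepts an optional '0x' prefix
    return swap_dword_bytes(raw)
```

Under that assumption, a value stored as 16 and displayed as 0x10000000 converts back to 16.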

We can also use reg.exe to get values on a remote machine (i.e. our Passive Neverfail node) by prepending the host to the registry path in the query. So in this case, to reach the passive secondary node over the private channel at 10.0.0.2, I can do something like reg.exe Query \\10.0.0.2\HKLM\Software\CRT_CORP\Performance. Testing this out leads us to a second issue: an “Access is denied” error.

reg_error_access_denied

Since my passive Neverfail node is essentially off-net but still thinks the network cable is live, I can’t use a domain-based account to run the reg.exe command, because the node can’t contact a domain controller to authenticate my domain account. However, if I use the local Administrator account, which has a common password on both nodes, I can get this to work just fine. (It may be possible to use an account other than the local Administrator, but in my case, where I also run some Neverfail command lines, I need an account that’s authorized in Neverfail.)

reg_remote_as_admin
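The remote query described above can be assembled programmatically. This is only a sketch: the helper name build_reg_query is made up for illustration, and it only builds the argument list. It relies on the standard reg.exe query switches /v (a single value) and /s (recurse), and on the \\host\key form for remote machines; actually running it still requires Windows and a session under an account the remote node can authenticate.

```python
def build_reg_query(key_path, value_name=None, host=None, recurse=False):
    """Build a reg.exe 'query' argument list (hypothetical helper).

    key_path   -- full key, e.g. r"HKLM\\Software\\CRT_CORP\\Performance"
    value_name -- optional single value to read (reg.exe /v switch)
    host       -- optional remote machine; prefixed as \\\\host\\key
    recurse    -- add /s to list subkeys and values recursively
    """
    # Remote form: two backslashes, the host, one backslash, then the key.
    target = key_path if host is None else "\\\\" + host + "\\" + key_path
    command = ["reg.exe", "query", target]
    if value_name is not None:
        command += ["/v", value_name]
    if recurse:
        command.append("/s")
    return command


# Example: the passive node over the private channel from the text above.
passive_query = build_reg_query(
    r"HKLM\Software\CRT_CORP\Performance",
    value_name="DWORD_BE_Example",
    host="10.0.0.2",
)
```

The resulting list can be handed to something like subprocess.run on a Windows machine; keeping it as a list avoids quoting problems with the backslashes.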

Given this info I was able to put together a VBScript that takes two arguments, a registry path and a value name, and returns the data value to the console, converting REG_DWORD and REG_DWORD_BIG_ENDIAN to the correct decimal value. Using this script it’s then possible to get any of the counters we’re interested in on either the active or passive node. So, based on the example above where I ran reg.exe hklm\software\CRT_CORP\Performance /s, we can run the script for each of the values and see that we do in fact get the right decimal value for each one.

getregvalue_example_01
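The parsing half of such a script might look roughly like the sketch below. It is not the GetRegValue.vbs script itself: the function names are hypothetical, and it assumes each value line of reg.exe output is laid out as name, type, data separated by whitespace, with a value name that contains no spaces. Check that layout against your own reg.exe version before relying on it.

```python
def parse_reg_value_line(line):
    """Split one reg.exe output line into (name, type, data), or return None.

    Assumed layout (not verified for every reg.exe version):
        <name> <REG_TYPE> <data>
    Header, banner, and blank lines do not match and yield None.
    """
    parts = line.split()
    if len(parts) < 3 or not parts[1].startswith("REG_"):
        return None
    return parts[0], parts[1], " ".join(parts[2:])


def find_value(output, value_name):
    """Return (type, data) for value_name in reg.exe output, or None."""
    for line in output.splitlines():
        parsed = parse_reg_value_line(line)
        # Registry value names are case-insensitive, so compare in lower case.
        if parsed and parsed[0].lower() == value_name.lower():
            return parsed[1], parsed[2]
    return None
```

Once the type and data are separated, the type tells the caller whether the hex data can be read directly (REG_DWORD) or needs its byte order corrected first (REG_DWORD_BIG_ENDIAN).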

So now the trick is to figure out which of the registry-based perf values we want to use and which host we need to draw them from. Each of the Neverfail nodes has the same set of values present, even though they’re not all populated the same way. That is to say, the counters related to the Safe Queue are not updated on the Active node, since the Safe Queue exists on the Passive node, and the converse is true of the Unsafe Queue counters. As I was mostly interested in alerting on an issue we have occasionally, I really wanted the Safe Queue and Unsafe Queue related counters (OldestSafeUpdateQueueEntry, SafeUpdateQueueSize, etc.). But since the other counters are equally easy to get, I decided to include several more. The image below shows the available values.

nf_perf_reg_values
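The rule for which node to query can be written down as a small lookup. This is a sketch under assumptions: it guesses the queue from the counter name (names containing "Unsafe" are read from the Active node, other names containing "Safe" from the Passive node), and it defaults everything else to the Active node. The default is an arbitrary choice made for illustration, and UnsafeUpdateQueueSize in the usage comment is a hypothetical name, not one confirmed by the text.

```python
ACTIVE = "active"
PASSIVE = "passive"


def node_for_counter(counter_name):
    """Pick the node whose registry holds a live copy of the counter.

    Assumption: counter names embed 'Unsafe' or 'Safe' to indicate the
    queue they describe. 'Unsafe' must be tested first because it
    contains 'safe' as a substring.
    """
    lowered = counter_name.lower()
    if "unsafe" in lowered:
        return ACTIVE   # Unsafe Queue counters are updated on the Active node
    if "safe" in lowered:
        return PASSIVE  # Safe Queue counters are updated on the Passive node
    return ACTIVE       # arbitrary default for counters tied to neither queue


# Usage: node_for_counter("SafeUpdateQueueSize") selects the Passive node,
# while a name such as "UnsafeUpdateQueueSize" (hypothetical) selects the Active one.
```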

So now that I have a simple way of getting the information I want, I can focus on how to get it into whatever monitoring system I choose, whether that’s Zabbix (now) or System Center Operations Manager 2007 (later). In the next article(s) I’ll talk about setting up the Zabbix part of this monitoring.

Acknowledgement: The hex to decimal routine in the GetRegValue.vbs script is lifted directly from http://www.sonofsofaman.com/hobbies/code/hextodec.asp Thanks to Joel for keeping me from having to reinvent the wheel. -crt

Addendum: While traipsing through the registry figuring this stuff out, I also discovered that there’s a bunch of configuration information stored in a whole different key under HKLM\Software\Javasoft\Prefs\neverfail\current\*. It’s also possible to watch a few entries here to help monitor the file and registry synchronization status, even though the information is not as granular, descriptive, or timely as what can be obtained from the command line.

The two items I’ve found that might be of interest are the /Registry/State/Manager\/Status key and the /Value entry

reg_java_prefs_reg

and the /New/File/State/Mgr\/Synchronization/Status key and /Tag entry

reg_java_prefs_file
