Nagios Monitoring of ESX 3i on our PowerEdge

In earlier posts I mentioned using IPMI, but I don’t think I ever circled back to where we stood with it. While we can get some information about the status of the hardware via IPMI, I haven’t been able to figure out how to get at the array controller to read its status. I’ve looked at sites like Nagios Exchange and some others, but none of them had what I was looking for. I found one script that came close, but it would only tell me when the RAID set was actually rebuilding, not when it was degraded. With RAID 5 that might have been useful, but we’re talking about going forward with RAID 6 to minimize our window of vulnerability on the large disks we’re using, so it isn’t enough for us. That’s why I went back and looked at the VMware SDK and the VI Perl Toolkit.

Now that I have a script that works from the command line, I need to make it work with Nagios, since that’s what we’ll probably end up using to monitor these servers once they’re deployed. (We’ll look at Dell’s IT Assistant again, since we’re a dot version or two behind, but I’m not holding out high hopes.)

While the simple version of this script will tell us when a disk is missing (i.e. we pulled one out to test it), it won’t tell us which one is missing. Looking at the VI client can provide a clue, though. In the screenshot from the earlier post we can see Disks 0-9, except that Disk 7 is missing from the list. That’s the one we pulled. If the disk were bad, my assumption (yet to be tested or seen) is that the drive would show up with a status of Red. It would be possible to work out the missing disk in the script, but I would need to keep configuration data for each class of server (or even each individual server). I’m more interested in a general-purpose script that alerts us and then leaves it up to the responsible admin to figure out what’s going on.


One nice thing that does happen, though, is that when the disk is put back in and the RAID set is rebuilding, we do see the newly inserted disk, because it also starts in a Yellow state until the volume is rebuilt.
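To make the alerting idea concrete, here is a minimal Python sketch of how per-element health colors could be folded into a single Nagios result. It is not the script from this post (that one uses the VI Perl Toolkit). The `summarize` function, the color-to-state mapping, and the message format are all assumptions made for illustration; only the exit codes 0 to 3 follow the standard Nagios plugin convention. Note that a pulled disk simply disappears from the list, so a check like this would only notice it if something else, such as the volume, reports a non-green status.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from the health colors shown in the VI client to Nagios states.
COLOR_TO_STATE = {"green": OK, "yellow": WARNING, "red": CRITICAL}
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity order used to pick the overall result, least to most severe.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def summarize(elements):
    """Reduce (name, color) pairs to one (exit_code, message) tuple.

    The input shape is hypothetical; the real script would fill it
    from whatever the host reports.
    """
    if not elements:
        return UNKNOWN, "STORAGE UNKNOWN - no storage elements reported"

    worst = OK
    problems = []
    for name, color in elements:
        # Any color we do not recognise is treated as UNKNOWN.
        state = COLOR_TO_STATE.get(color.lower(), UNKNOWN)
        if state != OK:
            problems.append(f"{name} is {color}")
        if SEVERITY.index(state) > SEVERITY.index(worst):
            worst = state

    detail = ", ".join(problems) if problems else f"{len(elements)} elements green"
    return worst, f"STORAGE {STATE_NAMES[worst]} - {detail}"


if __name__ == "__main__":
    import sys

    code, message = summarize([("Disk 0", "Green"), ("Disk 8", "Yellow")])
    print(message)
    sys.exit(code)
```

The sketch deliberately reports only what it sees and leaves the diagnosis to the admin, which matches the general-purpose approach described above.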

Figure 1 – The Nagios View

Figure 2 – The view from the VI client

Anyhow, without further ado, here’s the script. Again, it comes with the usual caveats: it’s rough, it mostly works, use at your own risk, etc.

Script arguments are the same as before:

check_3i_storage --password <passwd> --username <user> --server <ip addr/hostname> --datacenter ha-datacenter

I’ve also included the preliminary service and command definitions I’ve used in my test environment for reference.

define service{
    use                             generic-service
    host_name                       svr-esx-test-01.company.com
    service_description             ESX 3i Storage Status
    check_command                   check_3i_storage!root!mypassword!
}

define command{
    command_name    check_3i_storage
    command_line    $USER1$/check_3i_storage --server $HOSTADDRESS$ --username $ARG1$ --password $ARG2$ --datacenter ha-datacenter
}
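To show how the two definitions fit together, the following Python sketch mimics the way Nagios fills in the command line: the `check_command` value is split on `!`, and the pieces become `$ARG1$`, `$ARG2$`, and so on. This is an illustration only, not Nagios code. The `expand_check_command` helper, the plugin directory used for `$USER1$`, and the host address are assumptions for the example.

```python
def expand_check_command(check_command, command_line, macros):
    """Split check_command on '!' and substitute $NAME$ macros in command_line.

    Returns (command_name, expanded_command_line).
    """
    name, *args = check_command.split("!")

    values = dict(macros)
    for position, arg in enumerate(args, start=1):
        values[f"ARG{position}"] = arg

    expanded = command_line
    for key, value in values.items():
        expanded = expanded.replace(f"${key}$", value)
    return name, expanded


if __name__ == "__main__":
    command_line = (
        "$USER1$/check_3i_storage --server $HOSTADDRESS$ "
        "--username $ARG1$ --password $ARG2$ --datacenter ha-datacenter"
    )
    # Assumed values: $USER1$ normally comes from resource.cfg and
    # $HOSTADDRESS$ from the host definition.
    macros = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.10"}

    name, expanded = expand_check_command(
        "check_3i_storage!root!mypassword!", command_line, macros
    )
    print(name)
    print(expanded)
```

In this sketch the trailing `!` in the service definition just produces an empty third argument, which the command line never references.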

My next steps are to clean up the code and see if I can speed it up a little by starting with the host object directly rather than the “datacenter” managed object. It would also be nice if the script checked that the target is actually an ESX 3i box and not ESX 3.5.
