<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Carlos&#039; Corner &#187; Work</title>
	<atom:link href="http://cars.lostroncos.org/category/work/feed/" rel="self" type="application/rss+xml" />
	<link>http://cars.lostroncos.org</link>
	<description>The tired geek-dad in the corner</description>
	<lastBuildDate>Wed, 12 May 2010 19:46:13 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Simple Neverfail Monitoring with Zabbix part 2</title>
		<link>http://cars.lostroncos.org/2010/02/23/simple-neverfail-monitoring-with-zabbix-part-2/</link>
		<comments>http://cars.lostroncos.org/2010/02/23/simple-neverfail-monitoring-with-zabbix-part-2/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 17:53:36 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[neverfail]]></category>
		<category><![CDATA[zabbix]]></category>
		<category><![CDATA[neverfail for Exchange]]></category>
		<category><![CDATA[neverfail heartbeat]]></category>
		<category><![CDATA[reg_dword_big_endian]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/?p=363</guid>
		<description><![CDATA[Recap
<p>So in the previous post I put together a simple script for getting the data out of a specified registry entry that handled the REG_DWORD_BIG_ENDIAN data type.  In this one I&#8217;ll go over the general process of getting the registry-based perf data into Zabbix and setting up alerting based on it.</p>
Setting [...]]]></description>
			<content:encoded><![CDATA[<h2>Recap</h2>
<p>So in the <a href="http://cars.lostroncos.org/2009/05/31/simple-monitoring-of-neverfail-with-zabbix-part-1/">previous post</a> I put together a simple script for getting the data out of a specified registry entry that handled the REG_DWORD_BIG_ENDIAN data type.  In this one I&#8217;ll go over the general process of getting the registry-based perf data into Zabbix and setting up alerting based on it.</p>
<h2>Setting up Zabbix</h2>
<p>I won&#8217;t cover the actual installation of Zabbix here, but before we can put data into Zabbix we need to add the counters/items that I will be populating in the future. The first thing I need to do is determine exactly what those counters are and which of the nodes they need to come from.</p>
<table border="1">
<tbody>
<tr>
<th>Registry Path/Value</th>
<th>Node</th>
<th>Description</th>
</tr>
<tr>
<td>\Neverfail\R2\Performance\CurrentThroughput</td>
<td>Active</td>
<td>Nominally the throughput  of data between the two nodes</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\MegaBytessent</td>
<td>Active</td>
<td># of Megabytes sent</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\MegabytesReceived</td>
<td>Active</td>
<td># of MB received</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\OldestUnsafeupdatequeueentry</td>
<td>Active</td>
<td>Age of the oldest item in the Unsafe Queue</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\UnsafeUpdateQueueSize</td>
<td>Active</td>
<td>How much data is in the Unsafe Queue waiting to be passed to the passive node</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\UnsafeUpdateQueueSize (dup)</td>
<td>Active</td>
<td>Same as above but I want to measure the rate of growth as a possible factor in alerting</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\KBDispatchedFromUnsafeQueue</td>
<td>Active</td>
<td>How much total data has been sent from the unsafe queue</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\Oldestsafeupdatequeueentry</td>
<td>Passive</td>
<td>The age of the oldest item in the safe queue</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\safeUpdateQueueSize</td>
<td>Passive</td>
<td>Size of the Safe Queue</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\safeUpdateQueueSize(dup)</td>
<td>Passive</td>
<td>Same as above but I want to measure the rate of growth as a possible factor in alerting</td>
</tr>
<tr>
<td>\Neverfail\R2\Performance\KBDispatchedFromsafeQueue</td>
<td>Passive</td>
<td>How much total data has been written from the Safe Queue</td>
</tr>
<tr>
<td>\JavaSoft\Prefs\Neverfail\current\/Registry/State/Manager\/Status\/Value</td>
<td>Active</td>
<td>Current status of the registry synchronization.</td>
</tr>
<tr>
<td>\JavaSoft\Prefs\Neverfail\current\/New/File/State/Mgr\/Synchronization/Status\/Tag</td>
<td>Active</td>
<td>Current file synchronization status.</td>
</tr>
<tr>
<td>\JavaSoft\Prefs\Neverfail\current\/Controller\/Is/Primary/Server</td>
<td>Active</td>
<td>Is the active server the primary or not. From this I can tell which node is active.</td>
</tr>
</tbody>
</table>
<p>Because I have multiple Neverfail clusters in my environment I will create a template in Zabbix that has all the necessary counters associated with it that I can then apply to the hosts rather than adding them manually to each host.  Since a host can have multiple templates assigned to it I&#8217;ll also include a new &#8220;application&#8221; called Neverfail to help with separating Neverfail counters from any other counters that might be associated with a host (ex: Exchange counters).</p>
<p>To help with some of the drudgery associated with manually creating all the items, I&#8217;ve provided  <a href="http://cars.home.lostroncos.org/wp-uploads/2010/02/zbx_Template_NeverfailCluster.xml">a version of the template</a> that can simply be imported into Zabbix. The template includes all of the counters from above as well as a couple of basic triggers for alerting.</p>
<p>Here are a couple of short videos that walk through manually creating a template, and importing the one I&#8217;ve provided.</p>
<table border="0" width="550">
<tbody>
<tr>
<td><a href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_create_template.swf" target="_blank"><img class="alignnone size-full wp-image-493" title="Creating a template" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/create_video.png" alt="Creating a template" /><br />
Creating a template</a></td>
<td><a href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_import_and_create.swf" target="_blank"><img class="alignnone size-full wp-image-493" title="Importing a template" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/import_video.png" alt="Importing a template" /><br />
Importing the Neverfail Template into Zabbix</a></td>
</tr>
</tbody>
</table>
<p>Sharp eyes might notice that I&#8217;m capturing both UnsafeUpdateQueueSize and safeUpdateQueueSize twice.  The two copies of each value are treated differently. The first is a simple measurement of how much data is in the queue. The second (marked &#8220;dup&#8221; in the table above) is there so I can measure the queue&#8217;s rate of growth.</p>
<h2>About Zabbix_sender</h2>
<p>Now turning our attention to how we get the info into Zabbix, let&#8217;s look at Zabbix_sender.  It&#8217;s available as a pre-compiled binary for Windows from <a href="http://www.zabbix.com/download.php">Zabbix&#8217;s website</a>. Getting it ready is as simple as unzipping the download and putting the executable somewhere. By running <em>zabbix_sender -h</em> we can see it can take a number of options.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;height:300px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">C:\Temp&gt;zabbix_sender -h<br />
ZABBIX send v1.6.2 (16 January 2009)<br />
<br />
usage: zabbix_sender [-Vhv] {[-zpsI] -ko | [-zpI] -i &lt;file&gt;} [-c &lt;file&gt;]<br />
<br />
Options:<br />
-c Specify configuration file<br />
-z Hostname or IP address of ZABBIX Server.<br />
-p Specify port number of server trapper running on the server. Default is 10051.<br />
-s Specify hostname or IP address of a host.<br />
-I Specify source IP address<br />
-k Specify metric name (key) we want to send.<br />
-o Specify value of the key.<br />
-i &lt;input file&gt; Load values from input file.<br />
Each line of file contains:<br />
.<br />
-v Verbose mode<br />
Other options:<br />
-h Give this help.<br />
-V Display version number.</div></div>
<p>The ones I use  are -s, -z, -k and -o.  So a typical command line for me would look something like:</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">C:\temp\Zabbix_sender -z zabbix.crtcorp.com -s neverfail01 -k&quot;nf_cluster[file_sync_status]&quot; -o &quot;/Synchronized&quot;</div></div>
<p>Breaking down the command line:</p>
<ul>
<li><em><strong>zabbix.crtcorp.com</strong></em> is the Zabbix server we&#8217;re sending this data to</li>
<li><strong><em>neverfail01</em></strong> is the Neverfail node we&#8217;re sending information about</li>
<li><strong><em>nf_cluster[file_sync_status]</em></strong> is the key for the Zabbix item (i.e. counter) we want the information associated with</li>
<li>&#8220;<strong><em>/Synchronized</em></strong>&#8221; is the value we want in the key</li>
</ul>
<p>In the example the value we&#8217;re putting into Zabbix is a string rather than a numerical value. Here&#8217;s an example with a numeric value being put into Zabbix:</p>
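<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">C:\temp\Zabbix_sender -z zabbix.crtcorp.com -s neverfail01 -k&quot;nf_cluster[throughput]&quot; -o &quot;103453&quot;</div></div>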
<p>Here we&#8217;re specifying the item with key <strong><em>nf_cluster[throughput]</em></strong> and giving it a value of <strong><em>103453</em></strong>.</p>
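<p>For readers who would rather script this outside of a batch file, here is a minimal sketch in Python of assembling the same command line. It only uses the -z, -s, -k and -o options described above. The function names and the path to the executable are hypothetical, and this is an illustration rather than the script used in this post.</p>

```python
import subprocess


def build_sender_args(sender_path, server, host, key, value):
    """Return the argument list for one zabbix_sender call.

    Passing a list avoids hand-quoting keys that contain brackets or commas.
    """
    return [sender_path, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_value(sender_path, server, host, key, value):
    """Run zabbix_sender once and return its exit code (hypothetical helper)."""
    args = build_sender_args(sender_path, server, host, key, value)
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode


if __name__ == "__main__":
    # The executable path below is an assumption; adjust it to your install.
    print(build_sender_args(r"C:\temp\zabbix_sender.exe", "zabbix.crtcorp.com",
                            "neverfail01", "nf_cluster[throughput]", 103453))
```

<p>Numeric values are converted to strings before being handed to the command line, so the same helper covers both the string and the numeric example above.</p>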
<h3>Adding Zabbix_Sender</h3>
<p>Now what I needed to do was combine the script I wrote earlier with zabbix_sender to actually put the registry data into Zabbix. So I added a new function to the GetRegValue.vbs script to execute the actual zabbix_sender call. It is pretty straightforward: it builds a formulaic command line and then executes it. You&#8217;ll notice there is no error checking.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">'###################################################################################<br />
Function Zabbix_Send(ZabbixKey,Value)<br />
Dim WshShell, oExec, CommandLine<br />
Set WshShell = CreateObject(&quot;WScript.Shell&quot;)<br />
'Build Our Command line so we can also echo it to console<br />
'Ex zbx send cmd line = C:\temp\Zabbix_sender -z&quot;wv-zabbix-01&quot; -s&quot;neverfail01&quot; -k&quot;nf_cluster[file_sync_status]&quot; -o &quot;/Synchronized&quot;<br />
CommandLine = ZBXSend &amp; &quot; -v -z&quot;&quot;&quot; &amp; ZBXServer &amp; &quot;&quot;&quot; -s&quot;&quot;&quot; &amp; ZBXClient &amp; &quot;&quot;&quot; -k&quot;&quot;&quot; &amp; ZabbixKey &amp; &quot;&quot;&quot; -o &quot;&quot;&quot; &amp; Value &amp; &quot;&quot;&quot;&quot;<br />
WScript.Echo &quot;Commandline is [&quot; &amp; CommandLine &amp; &quot;]&quot;<br />
'Execute our command line<br />
Set oExec = WshShell.Exec(CommandLine)<br />
End Function</div></div>
<p>The next step is to modify the main body of the original GetRegValue script to turn it into a function. I then changed the WScript.Echos so that we were returning the registry value rather than simply writing it to the console.  (WScript.Echo HexToDec(HexValue) -&gt; GetRegValue = HexToDec(HexValue) , Wscript.Echo strValue -&gt; GetRegValue=strValue, and so on).  At the end we have this script which is good for reading <strong><em>one</em></strong> specified registry value and then inserting it into Zabbix.</p>
<p>Since there are a number of values we want to put into Zabbix, we need to think about how to approach this given that the script only handles one value at a time.  What I settled on was a batch file that uses a <strong><em>for</em></strong> loop to go through a file with a list of registry-based perf counters related to Neverfail.  The script as it now stands needs three arguments passed to it: the ZabbixKey, the registry key path, and the registry value.  For values I want to get from the passive node, the registry path needs to include the private IP address of the passive node (e.g. \\10.0.0.2\HKLM\Software\Neverfail\R2\Performance) so that reg.exe knows where to get them from.  The script can then query the registry using the path and value combination to get the data, which it then sends to Zabbix using the key specified on the command line.  So having the list of registry values from the part 1 post, I&#8217;m able to put together my file.</p>
<p>Because  I need to specify a delimiter to the <strong><em>for</em></strong> command and I use commas &#8216;,&#8217; in the Zabbix keys that I&#8217;ve defined, I need to use something else as a delimiter for my input file, so I&#8217;ve settled on using a pipe symbol as shown below.</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">nf_cluster[throughput]|HKLM\Software\Neverfail\R2\Performance|CurrentThroughput<br />
nf_cluster[MB_sent]|HKLM\Software\Neverfail\R2\Performance|MegaBytessent<br />
nf_cluster[MB_recvd]|HKLM\Software\Neverfail\R2\Performance|MegabytesReceived<br />
nf_q[unsafe,age]|HKLM\Software\Neverfail\R2\Performance|OldestUnsafeupdatequeueentry<br />
nf_q[unsafe,size]|HKLM\Software\Neverfail\R2\Performance|UnsafeUpdateQueueSize<br />
nf_q[unsafe,rate]|HKLM\Software\Neverfail\R2\Performance|UnsafeUpdateQueueSize<br />
nf_q[unsafe,total_kb_sent]|HKLM\Software\Neverfail\R2\Performance|KBDispatchedFromUnsafeQueue<br />
nf_q[safe,age]|\\10.0.0.2\HKLM\Software\Neverfail\R2\Performance|Oldestsafeupdatequeueentry<br />
nf_q[safe,size]|\\10.0.0.2\HKLM\Software\Neverfail\R2\Performance|safeUpdateQueueSize<br />
nf_q[safe,rate]|\\10.0.0.2\HKLM\Software\Neverfail\R2\Performance|SafeUpdateQueueSize<br />
nf_q[safe,total_kb_sent]|\\10.0.0.2\HKLM\Software\Neverfail\R2\Performance|KBDispatchedFromsafeQueue<br />
nf_cluster[reg_sync_status]|HKLM\Software\JavaSoft\Prefs\Neverfail\current\/Registry/State/Manager\/Status|/Value<br />
nf_cluster[file_sync_status]|HKLM\Software\JavaSoft\Prefs\Neverfail\current\/New/File/State/Mgr\/Synchronization/Status|/Tag<br />
nf_cluster[primary]|HKLM\Software\JavaSoft\Prefs\Neverfail\current\/Controller|/Is/Primary/Server</div></div>
<p>While my batch file  is about 35 lines, it really boils down to one line which does all the real work:</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">for /F &quot;tokens=1-3 delims=|&quot; %%I in (%ZBXKEYS%) do cscript %SENDVALUES% &quot;%%I&quot; &quot;%%J&quot; &quot;%%K&quot;</div></div>
<p>With the environment variables expanded it would look more like:</p>
<div class="codecolorer-container text blackboard" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">for /F &quot;tokens=1-3 delims=|&quot; %%I in (zbx_keys_to_send.txt ) do cscript SENDVALUES.vbs &quot;%%I&quot; &quot;%%J&quot; &quot;%%K&quot;</div></div>
<p>This for loop reads each line of the text file zbx_keys_to_send.txt, splits it into its first three tokens/strings using the pipe symbol as a delimiter, and calls the SENDVALUES.vbs script with those three tokens/strings as arguments.  The script and input file worked fine when I ran them on the primary node, but not so well when I ran them while the secondary was active. After some troubleshooting I realized that one thing I hadn&#8217;t thought through at first was that I actually need two lists/input files. Since the private IP address I use to get data from the passive node&#8217;s registry changes depending on which node is active, I need one list for when the script is sending from the primary node (10.0.0.1) and another for when the secondary (10.0.0.2) is active. The lists should essentially be identical, with the only difference being the IP address specified for the passive node.</p>
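<p>As a rough illustration of the two ideas above (splitting each line on the pipe symbol, and keeping a second list that only differs in the passive node&#8217;s IP address), here is a small Python sketch. The function names are hypothetical, and generating the second list from the first is just one possible approach; I keep two hand-maintained files. Unlike the batch <em>for</em> loop, this version treats everything after the second pipe as the value name.</p>

```python
def parse_key_line(line):
    """Split one 'zabbix_key|registry_path|value_name' line into three fields."""
    zabbix_key, reg_path, value_name = line.strip().split("|", 2)
    return zabbix_key, reg_path, value_name


def retarget_passive(lines, old_ip, new_ip):
    """Return the same list with remote registry paths pointed at new_ip.

    Only paths that start with \\\\old_ip\\ are changed; local paths are kept.
    """
    old_prefix = "\\\\" + old_ip + "\\"
    new_prefix = "\\\\" + new_ip + "\\"
    result = []
    for line in lines:
        key, path, name = parse_key_line(line)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        result.append("|".join((key, path, name)))
    return result


if __name__ == "__main__":
    primary_active = [
        r"nf_cluster[throughput]|HKLM\Software\Neverfail\R2\Performance|CurrentThroughput",
        r"nf_q[safe,size]|\\10.0.0.2\HKLM\Software\Neverfail\R2\Performance|safeUpdateQueueSize",
    ]
    # When the secondary is active, the passive node is the primary (10.0.0.1).
    for entry in retarget_passive(primary_active, "10.0.0.2", "10.0.0.1"):
        print(entry)
```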
<p><img class="alignnone size-full wp-image-369" title="A generic Neverfail cluster" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/nf_cluster.png" alt="A generic Neverfail cluster" width="474" height="275" /></p>
<p>Now all that is left to do is copy the batch file, the VBScript file and the appropriate input file to each node in the cluster. Prior to setting up the scheduled task I like to manually run the batch file a few times to make sure the data is getting populated into Zabbix. To do this I need to use a local account that exists on both nodes (in my case I use the local Administrator account). This is so that the reg.exe util can seamlessly get values from the passive node (assuming the account has the same password on both nodes).</p>
<h3>A little troubleshooting hint.</h3>
<p>When running the script manually I can see each time the VBScript file calls zabbix_sender and whether or not that submission was successful. Mistyping the key when running zabbix_sender was not an uncommon issue when I was putting this together. Fortunately zabbix_sender lets me know what happened when I attempted to submit data.  As an example, below is the output I get when trying to submit a value for the nf_q[safe,size] key if I mistype the key as nfq[safe,size].</p>
<p><img class="alignnone size-full wp-image-390" title="zbx_send_failed" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_send_failed.png" alt="zbx_send_failed" width="739" height="86" /></p>
<p>I can see that it reports that I have 1 failed item, and no Processed items. When I run it without any typos (intentional or otherwise) I get:</p>
<p><img class="alignnone size-full wp-image-389" title="zbx_send_good" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_send_good.png" alt="zbx_send_good" width="718" height="85" /></p>
<p>Now I can see that I had 1 item processed successfully and no Failed ones.</p>
<h2>Setting up Alerting</h2>
<p>If you import the template I&#8217;ve provided it should have also created four triggers that can be used to generate actions within Zabbix.</p>
<p><img class="alignnone size-full wp-image-392" title="template_triggers" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/template_triggers.png" alt="template_triggers" width="851" height="151" /></p>
<p>These triggers are based on situations I&#8217;ve run into in my environment that I want to be aware of.  The first is when the size of either the Safe or Unsafe queue has been above 2GB for over an hour. Neverfail was great at letting me know the queue was full and that it was going to stop replicating, but not so good at warning me that the queue was filling up.  I wanted to know well before we reached the point where replication stopped, and these triggers are a way of warning me that something is going on.  The second situation is when data to be replicated has been sitting in one of the queues for more than a specified amount of time.  This is similar to watching the queue grow beyond a certain size as the first two triggers do, but is helpful in situations where there isn&#8217;t a whole lot of data changing on the active node (e.g. over weekends).</p>
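<p>To make the &#8220;above 2GB for over an hour&#8221; condition concrete, here is a minimal Python sketch of the logic. It is not the trigger expression from the template; it assumes samples arrive as (timestamp in seconds, size in bytes) pairs, and the function name and defaults are hypothetical.</p>

```python
TWO_GB = 2 * 1024 ** 3  # threshold in bytes (assumes the counter is stored in bytes)


def above_for_window(samples, now, threshold=TWO_GB, window_seconds=3600):
    """Return True when every sample in the trailing window exceeds the threshold.

    samples is an iterable of (timestamp, size) pairs. An empty window never
    fires, so a node that stopped reporting does not raise this alert.
    """
    recent = [size for ts, size in samples
              if window_seconds >= now - ts >= 0]
    return bool(recent) and min(recent) > threshold


if __name__ == "__main__":
    three_gb = 3 * 1024 ** 3
    history = [(0, three_gb), (1800, three_gb), (3600, three_gb)]
    print(above_for_window(history, now=3600))
```

<p>Checking the minimum over the window rather than the latest value is what keeps a single short spike from raising an alert. Note that this sketch will also fire when the window holds only one sample, so it relies on data arriving regularly.</p>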
<p>It is of course  possible to change these and set them to what fits for your environment and even to add other triggers. In later versions of this monitoring I&#8217;ve added some other counters/keys related to the task state using the nfcmd.exe command line tool. This allows me to see when a server is doing a full system check or even the dreaded &#8220;internal system task&#8221; as well as how much progress it&#8217;s made.  Some example screenshots are included below.</p>
<table border="0" width="100%">
<tbody>
<tr align="center">
<td><div id="attachment_394" class="wp-caption alignnone" style="width: 160px"><br />
<a rel="lightbox[nf]" href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_data.png"><img class="size-thumbnail wp-image-394" title="Sample Data for one cluster" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_data-150x150.png" alt="Sample Data for one cluster" width="150" height="150" /></a><p class="wp-caption-text">Sample Data for one cluster</p></div></td>
<td><div id="attachment_396" class="wp-caption alignnone" style="width: 160px"><a rel="lightbox[nf]" href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_landscape.png"><img class="size-thumbnail wp-image-396" title="Overview of all the clusters in my environment" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_landscape-150x150.png" alt="Overview of all the clusters in my environment" width="150" height="150" /></a><p class="wp-caption-text">Cluster Overview</p></div></td>
<td><div id="attachment_397" class="wp-caption alignnone" style="width: 160px"><a rel="lightbox[nf]" href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_safeq_size_graph.png"><img class="size-thumbnail wp-image-397 " title="Graph of the Safe Queue size" src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_safeq_size_graph-150x150.png" alt="Graph of the Safe Queue size" width="150" height="150" /></a><p class="wp-caption-text">Sample Graph of the Safe Queue size</p></div></td>
<td><div id="attachment_395" class="wp-caption alignnone" style="width: 160px"><a rel="lightbox[nf]" href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_fullcheck.png"><img class="size-thumbnail wp-image-395" title="Enhanced view showing a Full System Check that is 3% complete." src="http://cars.lostroncos.org/wp-content/uploads/2010/02/zbx_nf_fullcheck-150x150.png" alt="Enhanced view showing a Full System Check that is 3% complete." width="150" height="150" /></a><p class="wp-caption-text">Enhanced view</p></div></td>
</tr>
</tbody>
</table>
<p>The three files I use are included here:</p>
<ul>
<li><a href="http://cars.lostroncos.org/wp-content/uploads/2010/02/Do_Zabbix.cmd.txt">DO_Zabbix.cmd</a> &#8211; The batch file that reads the input file with reg values &amp; zabbix keys and calls SendRegValue.vbs</li>
<li><a href="http://cars.lostroncos.org/wp-content/uploads/2010/02/SendRegValue.vbs.txt">SendRegValue.vbs</a> &#8211; The vbscript file that actually reads the registry entry and does any necessary conversions to send the value to Zabbix</li>
<li><a href="http://cars.lostroncos.org/wp-content/uploads/2010/02/zabbix_keys_to_send.txt">zabbix_keys_to_send.txt</a> &#8211; the input file used by DO_Zabbix.cmd. This version is the one I run when the primary node is active. The IP addresses would need to be changed for use when the secondary node is active.</li>
</ul>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>
<p><em>A few additional notes:</em></p>
<p><em>Because Neverfail  is continually pushing the perf data to the registry it does happen on occasion that the script will catch spuriously large or odd values for some counters. </em></p>
<p><em>If I were to use the zabbix_agent on my Neverfail nodes, it would be possible to include all this same monitoring within the agent&#8217;s configuration so that the agent pushes the data rather than using zabbix_sender via a scheduled task. However, that&#8217;s a post for some other time&#8230;<br />
-crt</em></p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2010/02/23/simple-neverfail-monitoring-with-zabbix-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A less simple (but better) Replay Report</title>
		<link>http://cars.lostroncos.org/2009/10/12/a-less-simple-but-better-replay-report/</link>
		<comments>http://cars.lostroncos.org/2009/10/12/a-less-simple-but-better-replay-report/#comments</comments>
		<pubDate>Mon, 12 Oct 2009 08:23:41 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[powershell]]></category>
		<category><![CDATA[appassure]]></category>
		<category><![CDATA[charting]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google chart api]]></category>
		<category><![CDATA[recovery point]]></category>
		<category><![CDATA[Replay]]></category>
		<category><![CDATA[snapshot]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/?p=220</guid>
		<description><![CDATA[<p>A while back I posted about a Replay report that I wrote to help me monitor the multiple Replay servers we have deployed globally.  It was a good first effort and was useful, but having to engage my brain first thing in the morning to read (and more importantly actually comprehend) the emailed reports before [...]]]></description>
			<content:encoded><![CDATA[<p>A while back I posted about a <a href="http://cars.lostroncos.org/2009/04/30/a-simple-replay-report/" target="_blank">Replay report that I wrote</a> to help me monitor the multiple Replay servers we have deployed globally.  It was a good first effort and was useful, but having to engage my brain first thing in the morning to read (and more importantly actually comprehend) the emailed reports before my second cup of coffee was less than ideal.</p>
<p>The original idea behind generating the report was to have the info come to me rather than logging into multiple servers and firing up the console multiple times (what can I say, I&#8217;m lazy).</p>
<p>The report in the first version of the script was straightforward text. Recently I&#8217;ve been looking into and thinking about different ways to present the information in the report so I could just sort of glance at it and get the status. The disk-related portion of the report wasn&#8217;t initially where I was focusing my attention. I was more interested in being able to get a quick idea of where we stood with the # of Recovery Points we were expecting to have.  An example of one of the simple reports is below. From this we can see that we&#8217;re in pretty good shape with 100% valid RPs spanning about 24 days.</p>
<p style="padding-left: 60px;">Starting Script at 04/30/2009 23:20:12</p>
<p style="padding-left: 60px;">Replay Service is running</p>
<p style="padding-left: 60px;">Server <strong><em>mailserver.company.com</em></strong> snapshots are being stored on R: and currently using 818.54GB. This is 99.98% of the used space(818.68GB) on the volume which is 1,360.22GB</p>
<p style="padding-left: 60px;">The drive currently has 39.81% free space (e.g. 541.54GB)</p>
<p style="padding-left: 60px;">Number of reported Recovery Points is 395 of these 395 are valid, and 0 are invalid (100.00%).<br />
The valid RPs span 23.98 days</p>
<p style="padding-left: 60px;">The most recent valid RP was taken 1 Minutes ago</p>
<p>The issue becomes less clear when invalid RPs occur for whatever reason. If I have 395 RPs and only 250 of them are valid, is that a good or bad state? It&#8217;s not immediately clear, but one can log in to the Replay server and get a better idea of how things stand. It might be the case that there was a network issue during the day and instead of 96 RPs (that&#8217;s 4 RPs per hour * 24 hrs) for each of the last three days we&#8217;ve only gotten 40 RPs each of those days, which while less than ideal might still be an okay state. Or it could be that there are several days for which we don&#8217;t have RPs.</p>
<p><span id="more-220"></span></p>
<p>I was trying to think of a way to visualize this information. Because of the retention schedule some days we&#8217;d expect a large number of RPs (~90) and some other days we&#8217;d expect to have just one.  I looked into the possibility of using sparklines, even going so far as to download a <a href="http://ewbi.blogs.com/develops/2005/07/sparklines.html">C# based version</a> of a <a href="http://sparklines.bitworking.info/">PHP based sparkline web service</a> from Joe Gregorio.</p>
<p>I tried several different iterations of the script using sparklines, trying to use the data I had in different ways (e.g. percentages of expected RPs, diffs between expected and actual), but wasn&#8217;t able to find a good way to represent the state using those. In digging around I came across the Google Chart API and looked at different ways of using bar graphs to represent the info I wanted.  I could use either side-by-side bar graphs</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_sidebyside.png"><img class="alignright size-full wp-image-224" title="replay_report_rps_ex_bar_sidebyside" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_sidebyside.png" alt="replay_report_rps_ex_bar_sidebyside" width="634" height="275" /></a></p>
<p>or overlapping ones in green and red, where a lot of red would be a bad thing.</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_overlay.png"><img class="aligncenter size-full wp-image-223" title="replay_report_rps_ex_bar_overlay" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_overlay.png" alt="replay_report_rps_ex_bar_overlay" width="662" height="277" /></a></p>
<p>Either of these would have been an improvement over what I was getting in the text-based report.  While trying to refine the overlaid version I came across an example in the Google <a href="http://code.google.com/apis/chart/styles.html#line_styles">documentation on line styles</a> of this graph:</p>
<div class="wp-caption alignnone" style="width: 210px"><img title="Chart Data Line example from Google Chart API" src="http://chart.apis.google.com/chart?cht=bvg&amp;chbh=5,2&amp;chm=D,0033FF,1,0,5,1&amp;chbh=20&amp;chs=200x150&amp;chd=s1:1XQbnf4,43ksfg6&amp;chco=76A4FB" alt="Chart Data Line example from Google Chart API" width="200" height="150" /><p class="wp-caption-text">Chart Data Line example from Google Chart API</p></div>
<p>This caught my eye as a possible solution to my problem about how to present this data because of the ability to show both sets of data overlaid on each other while still managing to keep both of them visible.</p>
<p>In my particular scenario the retention schedule is:</p>
<ul>
<li>RPs every 15  minutes which are kept for 4 days</li>
<li>These roll up to hourly RPs which are kept for 5 days</li>
<li>Hourlies roll up to dailies which are kept for ~25 days</li>
</ul>
<p>Our goal is to keep about 30 <strong><em>consecutive</em></strong> days worth of RPs on hand. When plotting out the # of expected RPs per day we get a graph that looks like the one below.</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_solo.png"><img class="aligncenter size-full wp-image-228" title="replay_report_rps_ex_bar_solo" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_bar_solo.png" alt="replay_report_rps_ex_bar_solo" width="624" height="266" /></a></p>
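<p>The shape of that graph follows directly from the retention schedule. Here is a minimal Python sketch of the arithmetic, assuming the three tiers run back to back (4 days at 96 RPs per day, then 5 days at 24 per day, then about 25 days at 1 per day). The function names are hypothetical and this is not taken from the report script itself.</p>

```python
def expected_rps_per_day(tiers=((4, 96), (5, 24), (25, 1))):
    """Return the expected number of RPs for each day, newest day first.

    Each tier is (days_kept, rps_per_day): 15-minute RPs give 96 per day,
    hourly RPs give 24 per day and daily RPs give 1 per day.
    """
    counts = []
    for days, per_day in tiers:
        counts.extend([per_day] * days)
    return counts


def missing_per_day(expected, actual):
    """Return how many RPs each day is short of its expected count."""
    return [max(e - a, 0) for e, a in zip(expected, actual)]


if __name__ == "__main__":
    expected = expected_rps_per_day()
    print(len(expected), "days,", sum(expected), "RPs expected in total")
```

<p>Comparing the actual per-day counts against this list is what the bar and line overlay is meant to show visually.</p>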
<p>As one can see the number of Recovery Points per day decreases over time. When adding the line showing the number of actual RPs it can be hard to tell what the status is for the days where there&#8217;s only one RP per day. If things are going well the green bars will be obscured by the red line.</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_ideal.png"><br />
<img class="aligncenter size-full wp-image-218" title="replay_report_rps_ex_ideal" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_ideal.png" alt="replay_report_rps_ex_ideal" width="640" height="278" /></a></p>
<p>In the rare instance where we might be missing a few daily RPs the green bars do become somewhat visible as shown in the blue box below.</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily.png"><img class="aligncenter size-full wp-image-232" title="replay_report_rps_ex_missing_daily" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily.png" alt="replay_report_rps_ex_missing_daily" width="600" height="250" /></a></p>
<p>I experimented with a couple of different ways to make this more obvious, including altering the width and height of the chart (see below). Using the Google Chart API one is limited to an image of 300,000 pixels (e.g. 500&#215;600).</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily_tall.png"><img class="aligncenter size-full wp-image-233" title="replay_report_rps_ex_missing_daily_tall" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily_tall.png" alt="replay_report_rps_ex_missing_daily_tall" width="526" height="626" /></a></p>
<p>But for me personally, making the chart a lot bigger like this didn&#8217;t really make it much easier to see what was going on.  So I stuck with 600&#215;250 for the graph.</p>
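<p>For anyone building a similar chart by hand, here is a rough sketch of putting the URL together while respecting the size limit. It is not the code from my script. The function name and colours are invented, and the parameter meanings (cht, chs, chd, chm, chco) are my reading of the example URL from the Google documentation, so treat them as assumptions.</p>

```python
from urllib.parse import urlencode

MAX_PIXELS = 300000  # size limit mentioned above
CHART_BASE = "http://chart.apis.google.com/chart"


def chart_url(expected, actual, width=600, height=250):
    """Build a bar chart of expected RPs with the actual RPs overlaid as a line."""
    if width * height > MAX_PIXELS:
        raise ValueError("chart of %dx%d exceeds %d pixels" % (width, height, MAX_PIXELS))
    # Text encoding takes values from 0 to 100, so scale to the largest count.
    top = max(max(expected), max(actual), 1)

    def scale(values):
        return ",".join(str(round(100 * v / top)) for v in values)

    params = {
        "cht": "bvg",                       # vertical grouped bars
        "chs": "%dx%d" % (width, height),   # image size
        "chd": "t1:%s|%s" % (scale(expected), scale(actual)),  # only series 0 drawn as bars
        "chm": "D,FF0000,1,0,2,1",          # draw series 1 as a line on top
        "chco": "00AA00",                   # bar colour
    }
    return CHART_BASE + "?" + urlencode(params, safe=",:|")


print(chart_url([96, 24, 1], [96, 20, 1]))
```

<p>Asking for something like 600&#215;600 raises an error instead of producing a URL that the API would reject.</p>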
<p>It should also be noted that if you aren&#8217;t taking snapshots every 15 minutes but every 30 minutes, or maybe even every hour, it becomes easier to see missed daily RPs.  Here&#8217;s an example:</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily_50.png"><img class="aligncenter size-full wp-image-234" title="replay_report_rps_ex_missing_daily_50" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_rps_ex_missing_daily_50.png" alt="replay_report_rps_ex_missing_daily_50" width="627" height="276" /></a></p>
<p>After going through all of this with the RPs I went back, almost as an afterthought, and added the logic to graph the disk usage data as well. It shows the size of the Replay archive data, the free space and the other used space on the drive by generating something like this:</p>
<p><a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_disk_usage.png"><img class="aligncenter size-full wp-image-235" title="replay_report_disk_usage" src="http://cars.lostroncos.org/wp-content/uploads/2009/10/replay_report_disk_usage.png" alt="replay_report_disk_usage" width="366" height="217" /></a></p>
<p>Here&#8217;s a <a title="Example Replay Report" href="http://cars.lostroncos.org/wp-content/uploads/2009/10/example_replay_report.png" target="_blank">&#8220;real-life&#8221;  example of the whole report</a>.</p>
<p>The <a href="http://cars.lostroncos.org/wp-content/uploads/2009/10/Replay_Report_v2-01.txt" target="_blank">script is available here</a>. If it&#8217;s of any use to you please drop me a line and let me know.</p>
<p>To use it rename it to something like ReplayReport.ps1. You&#8217;ll need to modify the variables at the beginning of the file:</p>
<ul>
<li>$ReportRecipients &#8211; array of recipient email addresses.</li>
<li>$MailServer &#8211; The SMTP server to use to send the report out</li>
<li>$ReportSender &#8211; Address the email should appear to come from.</li>
<li>$replay_exe &#8211; Path to the Replayc.exe file. May differ on x64 vs x86 servers.</li>
<li>$ExpectedRPCount &#8211; array containing the number of expected RPs for the last X days. Used to generate the graph of expected vs present RPs. (See Note below)</li>
</ul>
<p>The script doesn&#8217;t take any arguments to run. I run it via a scheduled task on the replay server.</p>
<p>***************</p>
<p><em>In reference to $ExpectedRPCount the time of day that the report is run will affect how the first data point on the graph appears. RPs are tracked by the date that they were taken. If you run the script just before midnight there will obviously be a lot more RPs for &#8220;today&#8221; than if you run it at 1 am.</em></p>
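<p>One way to allow for this (not something the script does) would be to scale the expected count for &#8220;today&#8221; by how much of the day has gone by. The sketch below assumes snapshots are evenly spaced and counted from midnight; the function name is made up.</p>

```python
from datetime import datetime


def expected_today(now, interval_minutes=15):
    """Expected number of RPs taken so far today, given the snapshot interval."""
    minutes_since_midnight = now.hour * 60 + now.minute
    return minutes_since_midnight // interval_minutes


print(expected_today(datetime(2009, 10, 12, 1, 0)))    # report run at 1 am
print(expected_today(datetime(2009, 10, 12, 23, 45)))  # report run just before midnight
```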
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2009/10/12/a-less-simple-but-better-replay-report/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Simple Replay Report</title>
		<link>http://cars.lostroncos.org/2009/04/30/a-simple-replay-report/</link>
		<comments>http://cars.lostroncos.org/2009/04/30/a-simple-replay-report/#comments</comments>
		<pubDate>Fri, 01 May 2009 06:08:41 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[powershell]]></category>
		<category><![CDATA[Replay]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/?p=122</guid>
		<description><![CDATA[<p>Where I work we use AppAssure&#8217;s Replay product to back up some of our Exchange servers.  Because the servers in question are very geographically dispersed we have multiple servers running Replay.  Monitoring and keeping an eye on them to assure backups are happening properly was requiring more time than I wanted to spend because we [...]]]></description>
			<content:encoded><![CDATA[<p>Where I work we use AppAssure&#8217;s Replay product to back up some of our Exchange servers.  Because the servers in question are very geographically dispersed we have multiple servers running Replay.  Keeping an eye on them to ensure backups are happening properly was requiring more time than I wanted to spend because we had different versions of Replay running in the environment. I ended up having to RDP to multiple machines on a regular basis to ensure things were going smoothly.</p>
<p>In poking around the install directory I came across the <a href="https://support.appassure.com/ics/support/KBAnswer.asp?questionID=119" target="_blank">Replayc.exe command</a>. Replayc is a command line utility that offers information about the Replay server and a way to manually mount and dismount Recovery Points (RPs). After playing with it a little, and being the very lazy person that I am, I decided to write a PowerShell script to give me a high level status overview of my servers.  The script runs on each server at about the same time (relative to me here in Oregon) every day and emails me the output. So instead of having to muck around in the console I only have to spend a few seconds on each to make sure everything&#8217;s running properly.</p>
<p>The <a href="http://cars.lostroncos.org/?attachment_id=145">script is available here</a> and needs to be renamed appropriately.</p>
<p>When the script runs the email (HTML formatted)  I get is like the one below.  It tells me a number of things:</p>
<ul>
<li>The status of the Replay Server (running/not running)</li>
<li>The name of the server that&#8217;s being protected</li>
<li>How much disk space is available and being used for RPs for that protected server</li>
<li>The size of the disk where those RPs are being stored</li>
<li>The # of valid and invalid RPs</li>
<li>The timespan between first and last valid RP</li>
<li>Last time an RP occurred.</li>
</ul>
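<p>The last three items boil down to some simple date arithmetic. Here is a rough sketch of it. The actual script is PowerShell and works from Replayc output, so the data structure and names below are purely illustrative.</p>

```python
from datetime import datetime


def summarize(recovery_points, now):
    """Summarize a list of (timestamp, is_valid) tuples for the report."""
    valid = sorted(ts for ts, ok in recovery_points if ok)
    summary = {
        "total": len(recovery_points),
        "valid": len(valid),
        "invalid": len(recovery_points) - len(valid),
        "span_days": None,
        "minutes_since_last": None,
    }
    if valid:
        # Timespan between the first and last valid RP, and age of the newest one.
        summary["span_days"] = (valid[-1] - valid[0]).total_seconds() / 86400
        summary["minutes_since_last"] = (now - valid[-1]).total_seconds() / 60
    return summary


# Made-up sample data: two valid RPs and one invalid one.
points = [
    (datetime(2009, 4, 7, 0, 0), True),
    (datetime(2009, 4, 30, 23, 4), False),
    (datetime(2009, 4, 30, 23, 19), True),
]
report = summarize(points, now=datetime(2009, 4, 30, 23, 20))
print(report)
```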
<p>Example Email:</p>
<p style="padding-left: 60px;">Starting Script at 04/30/2009 23:20:12</p>
<p style="padding-left: 60px;">Replay Service is running</p>
<p style="padding-left: 60px;">Server <strong><em>mailserver.company.com</em></strong> snapshots are being stored on R: and currently using 818.54GB. This is 99.98% of the used space(818.68GB) on the volume which is 1,360.22GB</p>
<p style="padding-left: 60px;">The drive currently has 39.81% free space (e.g. 541.54GB)</p>
<p style="padding-left: 60px;">Number of reported Recovery Points is 395 of these 395 are valid, and 0 are invalid (100.00%).<br />
The valid RPs span 23.98 days</p>
<p style="padding-left: 60px;">The most recent valid RP was taken 1 Minutes ago</p>
<p style="padding-left: 60px;"> </p>
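<p>The disk percentages in the example email come from straightforward arithmetic on three numbers: the space used by RPs, the total used space and the volume size. A small sketch (again in Python rather than the PowerShell of the script, with invented names):</p>

```python
def disk_report(rp_gb, used_gb, volume_gb):
    """Return the disk usage figures shown in the example email."""
    free_gb = volume_gb - used_gb
    return {
        "rp_share_of_used": round(100 * rp_gb / used_gb, 2),
        "free_gb": round(free_gb, 2),
        "free_percent": round(100 * free_gb / volume_gb, 2),
    }


print(disk_report(818.54, 818.68, 1360.22))
```

<p>Fed the numbers from the email above, this gives the same 99.98%, 541.54GB and 39.81% figures.</p>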
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2009/04/30/a-simple-replay-report/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Creating Monitoring Items in Zabbix for Nagios plugins &#8211; part 1 (Log data)</title>
		<link>http://cars.lostroncos.org/2008/04/03/creating-monitoring-items-in-zabbix-for-nagios-plugins-part-1-log-data/</link>
		<comments>http://cars.lostroncos.org/2008/04/03/creating-monitoring-items-in-zabbix-for-nagios-plugins-part-1-log-data/#comments</comments>
		<pubDate>Fri, 04 Apr 2008 05:11:12 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[zabbix]]></category>
		<category><![CDATA[3i]]></category>
		<category><![CDATA[ESX]]></category>
		<category><![CDATA[external check]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/04/03/creating-monitoring-items-in-zabbix-for-nagios-plugins-part-1-log-data/</guid>
		<description><![CDATA[<p style="margin-left: 1pt">One of the things I wanted to check in looking at Zabbix was how hard it would be to use the Nagios plugins I wrote/modified for monitoring ESX 3i in Zabbix.</p>
<p style="margin-left: 1pt">It turns out that they are usable pretty much as is though there is a minor modification that needs to be [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-left: 1pt">One of the things I wanted to check in looking at Zabbix was how hard it would be to use the <a href="http://cars.lostroncos.org/2008/03/07/updated-esx-3i-scripts-for-nagios/">Nagios plugins</a> I wrote/modified for monitoring ESX 3i in Zabbix.</p>
<p style="margin-left: 1pt">It turns out that they are usable pretty much as is, though there is a minor modification that needs to be made to how they accept/expect parameters. There are however a couple of ways to approach setting them up. Zabbix supports several kinds of data for external checks (as well as in general). These include:</p>
<ul>
<li>
<p style="margin-left: 28pt">Float</p>
</li>
<li>
<p style="margin-left: 28pt">Integer</p>
</li>
<li>
<p style="margin-left: 28pt">Text</p>
</li>
<li>
<p style="margin-left: 28pt">Log</p>
</li>
<li>
<p style="margin-left: 28pt">Character</p>
</li>
</ul>
<p style="margin-left: 1pt">The Nagios plugins I&#8217;m interested in will probably work with either the Log type or Integer. The external check &#8220;Item&#8221; type is just that: a check. In and of itself it doesn&#8217;t make anything happen in terms of alerting or notifications. For that we need to set up &#8220;Triggers.&#8221; I&#8217;ll cover setting up an Item using Log type data in this post.</p>
<p style="margin-left: 1pt"><span id="more-76"></span></p>
<p style="margin-left: 1pt">I&#8217;ve created a template in Zabbix for my 3i boxes to which I&#8217;ll be attaching these &#8220;Items&#8221; so that they&#8217;re available for anything built off this template.</p>
<p style="margin-left: 1pt">To start we&#8217;ll need to log into the Zabbix web console, select &#8220;Configuration&#8221; and then &#8220;Items&#8221;. Then we need to narrow down the object we&#8217;re working on using the Group and Host dropdowns. Here I&#8217;ve used the group GO_ESX and the template Template_GO_ESX_3i. Next click &#8220;Create Item&#8221;.</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon1.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon2.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt">We need to fill in the &#8220;description&#8221; field with something meaningful to us. In this case we&#8217;re going to be using the script <strong><em>check_3i_sensors</em></strong>, which returns info about the various sensors in the machine (of particular interest are the power supplies and their redundancy), so we&#8217;ll use &#8220;Check 3i Sensors&#8221; as the description. The &#8220;Type&#8221; needs to be changed to External Check. The key in this case is actually the name of the script to run and any parameters. For external checks the format is:</p>
<p style="margin-left: 28pt">script[parameters]</p>
<p style="margin-left: 28pt">Where:</p>
<p style="margin-left: 55pt">script &#8211; is the name of the script</p>
<p style="margin-left: 55pt">parameters &#8211; is the list of command line parameters</p>
<p style="margin-left: 1pt">Zabbix will execute the script from the directory specified by the ExternalScripts option in zabbix_server.conf. (By default this is /etc/zabbix/externalscripts.) Zabbix will add the hostname as the first parameter and then append the list specified in the [parameters] piece of the key. As an example:</p>
<table border="0" style="border-collapse: collapse">
<tr>
<td style="border: #a3a3a3 1pt solid; padding: 5px"><span style="font-size: 10pt">Example 1: </span></td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 5px"><span style="font-size: 10pt">Execute script check_oracle.sh with parameters &#8220;-h 192.168.1.4&#8221;.<br />
</span><span style="font-size: 10pt">Host name &#8216;www1.company.com&#8217;.</span></td>
</tr>
</table>
<p style="margin-left: 28pt"><span style="font-size: 10pt; font-family: Courier New">check_oracle.sh[-h 192.168.1.4]<br />
</span></p>
<p style="margin-left: 28pt"><span style="font-size: 10pt">ZABBIX will execute:<br />
</span></p>
<p style="margin-left: 28pt"><span style="font-size: 10pt; font-family: Courier New">check_oracle.sh www1.company.com -h 192.168.1.4.<br />
</span></p>
<p style="margin-left: 28pt">&nbsp;</p>
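<p style="margin-left: 1pt">To make the argument order concrete, here is a small sketch that mimics what is described above: take a key in the script[parameters] form and a host name, and produce the argument list that ends up being run. This is only an illustration of the behaviour, not Zabbix code. The function name is made up, and the way parameters are split on whitespace is an assumption.</p>

```python
import re
import shlex


def external_check_argv(key, hostname):
    """Split an external check key into the argument list that gets run."""
    match = re.fullmatch(r"([^\[\]]+)\[(.*)\]", key)
    if match is None:
        raise ValueError("expected a key of the form script[parameters]")
    script, parameters = match.groups()
    # The host name goes first, followed by the parameters from the key.
    return [script, hostname] + shlex.split(parameters)


print(external_check_argv("check_oracle.sh[-h 192.168.1.4]", "www1.company.com"))
```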
<p style="margin-left: 1pt"><em>[ Here is where one of the drawbacks of Zabbix appears when compared to Nagios. I haven't yet found a way to alter the parameters that are sent to the script on a per host basis. It is of course possible to set up the checks on each host rather than using the template, but the template should in theory be used to save us some of that repetitive work. If we were to go forward with this it ought to be possible to automate the process. If we define our process to include creating a monitoring user on each ESX host with the same name and password then using the template becomes much more feasible ]<br />
</em></p>
<p style="margin-left: 1pt">So for the sensors script we would enter the following for the key: check_3i_sensors[--username zabbix --password zabbix]</p>
<p style="margin-left: 1pt">Since Zabbix already includes the hostname for us we don&#8217;t need to specify it. (But this is where the modification to the Nagios plugins becomes necessary, to properly handle the way the argument is passed.) On my test ESX 3i server I&#8217;ve created a local user called &#8216;zabbix&#8217; which has read-only privileges to use for connecting and running the external check.</p>
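<p style="margin-left: 1pt">The change to the plugins amounts to accepting the host name as the first, unnamed argument rather than through an option. The plugins themselves aren&#8217;t shown here, so the following is just a sketch of the idea in Python with a hypothetical host name. The option names match the key above.</p>

```python
import argparse


def parse_args(argv):
    """Parse arguments in the order an external check passes them."""
    parser = argparse.ArgumentParser(description="Hypothetical sensor check")
    # Zabbix supplies the host name as the first, positional argument.
    parser.add_argument("hostname")
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    return parser.parse_args(argv)


args = parse_args(["esx01.company.com", "--username", "zabbix", "--password", "zabbix"])
print(args.hostname, args.username)
```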
<p style="margin-left: 1pt">In the &#8220;Type of information&#8221; field select &#8220;Log&#8221; via the dropdown menu. This will cause the fields on the screen to change. Specify a value for the update interval (I usually use 60 seconds) and any flexible intervals if necessary, as well as the number of days to keep history and trend data (see image below). Then click the &#8220;Save&#8221; button.</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon3.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt">This needs to be repeated for any other plugins that are applicable to the template. As shown below I&#8217;ve created two external checks, one for Sensor data, one for Storage status.</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon4.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt">Next you&#8217;ll need to create a host based on your template. Here I&#8217;ve created a host called &#8220;Steve_ESX_2950&#8221; based on the template. If you wait a few minutes and then go to the &#8220;Monitoring&#8221; tab and select Overview (and if necessary narrow down the servers shown to see your new host) you should see something like the image below:</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon5.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt">Looking at this we can tell our external check has run, and in this particular case it appears the sensor status is &#8220;GREEN&#8221;. If you click on the field you can see the last couple of hundred values for that particular item. You&#8217;ll notice the &#8220;Severity&#8221; is not classified. What we need to do next (in the next post) is to set up a trigger based on the value of the Item (i.e. when it&#8217;s not &#8220;Green&#8221;).</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/04/040408-0510-creatingmon6.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/04/03/creating-monitoring-items-in-zabbix-for-nagios-plugins-part-1-log-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modifying Neverfail permissions to be able to run client utils remotely.</title>
		<link>http://cars.lostroncos.org/2008/03/12/modifying-neverfail-permissions-to-be-able-to-run-client-utils-remotely/</link>
		<comments>http://cars.lostroncos.org/2008/03/12/modifying-neverfail-permissions-to-be-able-to-run-client-utils-remotely/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 05:22:09 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[neverfail]]></category>
		<category><![CDATA[addtrustedclient]]></category>
		<category><![CDATA[neverfail for Exchange]]></category>
		<category><![CDATA[neverfail group]]></category>
		<category><![CDATA[neverfail heartbeat]]></category>
		<category><![CDATA[nfclient]]></category>
		<category><![CDATA[permissions]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/03/12/modifying-neverfail-permissions-to-be-able-to-run-client-utils-remotely/</guid>
		<description><![CDATA[<p>In early June I received a call at work from the Neverfail sales rep I had been working with on a recent purchase expressing concern about the Neverfail related content here. In contacting one of the PR folks at Neverfail I got the following response.</p>
<p>Glad to hear from you. I can shed some light on this [...]]]></description>
			<content:encoded><![CDATA[<p>In early June I received a call at work from the Neverfail sales rep I had been working with on a recent purchase expressing concern about the Neverfail related content here. In contacting one of the PR folks at Neverfail I got the following response.</p>
<blockquote><p><em>Glad to hear from you. I can shed some light on this concern. While your blog is not copying information from the KB&#8217;s, it is making direct reference to KB numbers and <strong><span style="color: #ff0000;">more importantly, giving out exact command syntax</span></strong> which is intended only for Neverfail customers/partners. <span style="color: #ff0000;">We don&#8217;t put this information into our technical documentation as it is competitive/proprietary to our intellectual property. </span>We post this information on the extranet which is only available to registered users via password for this reason. We&#8217;re happy to work with you, as we do many bloggers, to provide information that you can post. Please let me know if you have further questions and I&#8217;ll be happy to help.</em></p></blockquote>
<p>Given that I&#8217;m trying to run a technical blog for which posting &#8220;exact command syntax&#8221; might be useful I&#8217;ve decided to pull the content.</p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/03/12/modifying-neverfail-permissions-to-be-able-to-run-client-utils-remotely/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Checking storage on a dell poweredge 2900 running ESX 3i</title>
		<link>http://cars.lostroncos.org/2008/02/21/checking-storage-on-a-dell-poweredge-2900-running-esx-3i/</link>
		<comments>http://cars.lostroncos.org/2008/02/21/checking-storage-on-a-dell-poweredge-2900-running-esx-3i/#comments</comments>
		<pubDate>Fri, 22 Feb 2008 06:03:46 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/02/21/checking-storage-on-a-dell-poweredge-2900-running-esx-3i/</guid>
		<description><![CDATA[<p style="margin-left: 1pt">As I mentioned in an earlier post one of the issues we&#8217;ve had with the idea of deploying ESX 3i vs 3.5 is the ability to monitor the hardware since neither the DRAC card nor the BMC via IPMI seem to be able to give us all the info we need. I had [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-left: 1pt">As I mentioned in an <a href="http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge/">earlier post</a> one of the issues we&#8217;ve had with the idea of deploying ESX 3i vs 3.5 is the ability to monitor the hardware since neither the DRAC card nor the BMC via IPMI seem to be able to give us all the info we need. I had looked briefly at the VI-Perl toolkit and the VI SDK but not spent a lot of time on it.</p>
<p style="margin-left: 1pt">I installed 3i on a new PE 2900 today to take a look at this again. I had previously pulled one of the disks in the server so that I could be certain something was &#8220;wrong&#8221; so I had something to test against. Below is the &#8220;Health Status&#8221; as shown via the VI client. As you can see &#8220;Storage&#8221; shows up as being in a warning state since RAID 6 Virtual Disk shows as being in a &#8220;Warning&#8221; state. It&#8217;s worth noting that since I pulled a hard drive Physical Disk 7 does not show in the list of items under Storage. I&#8217;m assuming that if the drive was actually bad it&#8217;d show up as failed. But I don&#8217;t know that I want to damage a perfectly good drive to find out.</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/022208-0803-checkingsto1.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt"><span id="more-46"></span> </p>
<p style="margin-left: 1pt">You can also see this same info via the &#8220;Browse Objects managed by this host&#8221; option on the local web page for the host. (<a href="http://%3cesx_ip_address%3e/mob">http://&lt;esx_ip_address&gt;/mob</a>).</p>
<p><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/022208-0803-checkingsto2.png" /></p>
<p>As you browse through the managed objects you&#8217;ll notice each screen has a Properties and a Methods section. The Properties section has three columns: Name, Type and Value. The path to get to the storage info object is given below in a Name(Value) format. [ex: the row named "content" on the screen below is written as content(content)] If you don&#8217;t want to navigate manually you can try the following URL:</p>
<p><a href="https://%3cesx_server_ip%3e/mob/?moid=healthStatusSystem&amp;doPath=runtime%2ehardwareStatusInfo%2estorageStatusInfo">https://&lt;esx_server_ip&gt;/mob/?moid=healthStatusSystem&amp;doPath=runtime%2ehardwareStatusInfo%2estorageStatusInfo</a></p>
<p><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/022208-0803-checkingsto3.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p>Home MOB Page -&gt; content(content) -&gt; rootFolder(ha-folder-root) -&gt; childEntity(ha-datacenter) -&gt; hostFolder(ha-folder-host) -&gt; childEntity(ha-compute-res) -&gt; host(ha-host) -&gt; configManager(configManager) -&gt; healthStatusSystem(healthStatusSystem) -&gt; runtime(runtime) -&gt; hardwareStatusInfo(hardwareStatusInfo) -&gt; storageStatusInfo(storageStatusInfo)</p>
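<p>The shortcut URL is just the object id (moid) plus the remaining property names joined into a doPath. If you want to generate these for other properties, a sketch along these lines should do it. The function name and the host value are made up, and it only covers the moid/doPath form shown in this post.</p>

```python
def mob_url(host, moid, property_path):
    """Build a Managed Object Browser URL for a property of a managed object."""
    # Dots between property names are written as %2e, as in the URL above.
    do_path = "%2e".join(property_path)
    return "https://%s/mob/?moid=%s&doPath=%s" % (host, moid, do_path)


print(mob_url(
    "esx-host.example",  # hypothetical host name or IP address
    "healthStatusSystem",
    ["runtime", "hardwareStatusInfo", "storageStatusInfo"],
))
```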
<p>Once there you&#8217;ll (hopefully) see that we have an array of HostStorageElementInfo objects. Here we&#8217;re interested mainly in the name and status-&gt;key values.</p>
<table border="0" style="border-collapse: collapse">
<tr>
<td style="border: #a3a3a3 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">[11]</span></p>
</td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">HostStorageElementInfo</span></p>
</td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px">
<table border="0" style="border-collapse: collapse">
<tr>
<td style="border: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Name</strong></span></td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Type</strong></span></td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Value</strong></span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">dynamicProperty</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">DynamicProperty[]</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">Unset</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">dynamicType</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">Unset</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">name</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">&#8220;RAID 6 Virtual Disk 0 of Controller 0&#8243;</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">operationalInfo</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">HostStorageOperationalInfo[]</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">Unset</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">status</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">ElementDescription</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<table border="0" style="border-collapse: collapse">
<tr>
<td style="border: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Name</strong></span></td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Type</strong></span></td>
<td style="border-right: #a3a3a3 1pt solid; border-top: #a3a3a3 1pt solid; border-left: medium none; border-bottom: #a3a3a3 1pt solid; padding: 7px"><span style="font-size: 7pt; font-family: Verdana"><strong>Value</strong></span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">dynamicProperty</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">DynamicProperty[]</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">Unset</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">dynamicType</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">Unset</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">key</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana; background-color: yellow">&#8220;Yellow&#8221;</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">label</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">&#8220;Yellow&#8221;</span></td>
</tr>
<tr>
<td style="border-right: 1pt solid; border-top: medium none; border-left: 1pt solid; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">summary</span></td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px">
<p style="text-align: right"><span style="font-size: 9pt; color: black; font-family: Verdana">string</span></p>
</td>
<td style="border-right: 1pt solid; border-top: medium none; border-left: medium none; border-bottom: 1pt solid; padding: 7px"><span style="font-size: 9pt; color: black; font-family: Verdana">&#8220;All functionality is available but some might be degraded&#8221;</span></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
<p style="margin-left: 5pt">All of this info should be accessible via the VI SDK. I had earlier tried to download the VI Perl Toolkit for both Windows and Linux but was having issues with the Windows version. I also had some issues with the Linux version, but those were solved by downloading and building some modules that apparently aren&#8217;t part of the standard Perl install. Rather than mucking with that too long, I downloaded and installed the VI Perl Toolkit Virtual Appliance. Unfortunately this virtual appliance is in Open Virtual Machine Format (OVF), which meant I had to download another tool from the VMware website to convert it for use with ESX. Once I got it up and running I was able to run some of the sample scripts to get a feel for what they did and how they worked.</p>
<p style="margin-left: 5pt">In a little while I was able to take one of them, datacenterlisting.pl, and modify it to show me the same storage info I was able to see via the VI client. This is where the Managed Object Browser came in handy, since it provided the info I needed to modify the script to get what I wanted. This first version doesn&#8217;t do anything fancy; it simply lists each of the storage elements and its status as Red, Yellow, or Green. If we have to run a script to regularly monitor the servers we&#8217;re deploying, we&#8217;ll probably use our existing internal Nagios server to monitor these ESX boxes. That will require a slightly less verbose version of the script that only really outputs info when there&#8217;s an issue.</p>
<p style="margin-left: 5pt">Here&#8217;s the <a target="_blank" href="http://cars.lostroncos.org/wp-content/uploads/2008/02/check_3i_storage_simple.txt" title="Check ESX 3i Storage Status script">simple version of the script</a>. There&#8217;s a sample command line below; you need to specify a user, password and IP address/hostname for the server you&#8217;re checking. You should also be able to point this script at Virtual Center, but I don&#8217;t know what it will do if it encounters 3.5 hosts vs 3i hosts. I added a dummy user for nagios on the host with read-only permissions since that&#8217;s our ultimate target.</p>
<p style="margin-left: 5pt"> check_3i_storage_simple --password &lt;passwd&gt; --username &lt;user&gt; --server &lt;ip addr/hostname&gt; --datacenter ha-datacenter</p>
<p style="margin-left: 5pt">The setting &#8216;ha-datacenter&#8217; is the value at the root of the managed object hierarchy that we use to find the host info. (<em>This is true even on a standalone host</em>).</p>
<p style="margin-left: 5pt"><strong>visdk@vaos:~$ ./check_3i_storage_simple --password badpassword --username root --server 11.22.33.44 --datacenter ha-datacenter<br />
</strong></p>
<pre>Datacenter = ha-datacenter
Hosts found:
1: w35d154.company.com
Boot time 2008-02-21T12:36:34.19314Z
Controller 0 (PERC 6/i Integrated) Has status of Green
Battery of Controller 0 Has status of Green
Physical Disk 0/E32 of Controller 0 Has status of Green
Physical Disk 1/E32 of Controller 0 Has status of Green
Physical Disk 2/E32 of Controller 0 Has status of Green
Physical Disk 3/E32 of Controller 0 Has status of Green
Physical Disk 4/E32 of Controller 0 Has status of Green
Physical Disk 5/E32 of Controller 0 Has status of Green
Physical Disk 6/E32 of Controller 0 Has status of Green
Physical Disk 8/E32 of Controller 0 Has status of Green
Physical Disk 9/E32 of Controller 0 Has status of Green
RAID 6 Virtual Disk 0 of Controller 0 Has status of Yellow
Port 0 of Controller 0 Has status of Green
Port 1 of Controller 0 Has status of Green
Port 2 of Controller 0 Has status of Green
Port 3 of Controller 0 Has status of Green
Port 4 of Controller 0 Has status of Green
Port 5 of Controller 0 Has status of Green
Port 6 of Controller 0 Has status of Green
Port 7 of Controller 0 Has status of Green</pre>
<p style="margin-left: 5pt"><strong>visdk@vaos:~$<br />
</strong></p>
<p style="margin-left: 5pt">Reference stuff:</p>
<p style="margin-left: 5pt">The VI Perl Toolkit in its various incarnations, as well as the VMware Infrastructure SDK, are available from VMware at <a href="http://www.vmware.com/support/pubs/sdk_pubs.html">http://www.vmware.com/support/pubs/sdk_pubs.html</a>. To actually download components you have to have an account on their site.</p>
<p>The original script that I started with is in the samples/discovery folder and called datacenterlisting.pl.</p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/02/21/checking-storage-on-a-dell-poweredge-2900-running-esx-3i/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Working with Neverfail for Exchange &#8211; command line utils</title>
		<link>http://cars.lostroncos.org/2008/02/18/working-with-neverfail-for-exchange-command-line-utils/</link>
		<comments>http://cars.lostroncos.org/2008/02/18/working-with-neverfail-for-exchange-command-line-utils/#comments</comments>
		<pubDate>Mon, 18 Feb 2008 21:25:47 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[neverfail]]></category>
		<category><![CDATA[addtrustedclient]]></category>
		<category><![CDATA[neverfail for Exchange]]></category>
		<category><![CDATA[neverfail group]]></category>
		<category><![CDATA[neverfail heartbeat]]></category>
		<category><![CDATA[nfclient]]></category>
		<category><![CDATA[permissions]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/02/18/working-with-neverfail-for-exchange-command-line-utils/</guid>
		<description><![CDATA[<p>In early June I received a call at work from the Neverfail sales rep I had been working with on a recent purchase expressing concern about the Neverfail-related content here. In contacting one of the PR folks at Neverfail I got the following response.</p>
<p>Glad to hear from you. I can shed some light on this [...]]]></description>
			<content:encoded><![CDATA[<p>In early June I received a call at work from the Neverfail sales rep I had been working with on a recent purchase expressing concern about the Neverfail-related content here. In contacting one of the PR folks at Neverfail I got the following response.</p>
<blockquote><p><em>Glad to hear from you. I can shed some light on this concern. While your blog is not copying information from the KB&#8217;s, it is making direct reference to KB numbers and <strong><span style="color: #ff0000;">more importantly, giving out exact command syntax</span></strong> which is intended only for Neverfail customers/partners. <span style="color: #ff0000;">We don&#8217;t put this information into our technical documentation as it is competitive/proprietary to our intellectual property. </span>We post this information on the extranet which is only available to registered users via password for this reason. We&#8217;re happy to work with you, as we do many bloggers, to provide information that you can post. Please let me know if you have further questions and I&#8217;ll be happy to help.</em></p></blockquote>
<p>Given that I&#8217;m trying to run a technical blog for which posting &#8220;exact command syntax&#8221; might be useful, I&#8217;ve decided to pull the content.</p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/02/18/working-with-neverfail-for-exchange-command-line-utils/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IPMI and the Dell PowerEdge &#8211; Part the Third</title>
		<link>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-third/</link>
		<comments>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-third/#comments</comments>
		<pubDate>Sun, 17 Feb 2008 06:40:13 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-third/</guid>
		<description><![CDATA[<p style="margin-left: 19pt">Okay now that we have a user that&#8217;s set up for access to IPMI what can we find out about our server from a monitoring perspective?</p>
<p style="margin-left: 19pt">If we run ipmitool -h we get a list of commands we can run.</p>
<p style="margin-left: 19pt"></p>
<p style="margin-left: 19pt">Several of these &#8220;commands&#8221; have sub commands. For example the [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-left: 19pt">Okay now that we have a user that&#8217;s set up for access to IPMI what can we find out about our server from a monitoring perspective?</p>
<p style="margin-left: 19pt">If we run ipmitool -h we get a list of commands we can run.</p>
<p style="margin-left: 19pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0640-ipmiandthed1.png" /></p>
<p style="margin-left: 19pt">Several of these &#8220;commands&#8221; have sub-commands. For example, the &#8216;chassis&#8217; command has sub-commands of: status, power, identify, policy, restart_cause, poh, bootdev, selftest.</p>
<p style="margin-left: 19pt"><span id="more-35"></span>So let&#8217;s get started by running ipmitool -h again.</p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ctronco@orw-ctronco-vm-01:~$ ipmitool -h<br />
</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ipmitool version 1.8.7<br />
</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">usage: ipmitool [options...] &lt;command&gt;<br />
</span><span style="font-size: 9pt; font-family: Courier New">-h This help<br />
</span><span style="font-size: 9pt; font-family: Courier New">-V Show version information<br />
</span><span style="font-size: 9pt; font-family: Courier New">-v Verbose (can use multiple times)<br />
</span><span style="font-size: 9pt; font-family: Courier New">-c Display output in comma separated format<br />
</span><span style="font-size: 9pt; font-family: Courier New">-I intf Interface to use<br />
</span><span style="font-size: 9pt; font-family: Courier New">-H hostname Remote host name for LAN interface<br />
</span><span style="font-size: 9pt; font-family: Courier New">-p port Remote RMCP port [default=623]<br />
</span><span style="font-size: 9pt; font-family: Courier New">-U username Remote session username<br />
</span><span style="font-size: 9pt; font-family: Courier New">-f file Read remote session password from file<br />
</span><span style="font-size: 9pt; font-family: Courier New">-S sdr Use local file for remote SDR cache<br />
</span><span style="font-size: 9pt; font-family: Courier New">-a Prompt for remote password<br />
</span><span style="font-size: 9pt; font-family: Courier New">-e char Set SOL escape character<br />
</span><span style="font-size: 9pt; font-family: Courier New">-C ciphersuite Cipher suite to be used by lanplus interface<br />
</span><span style="font-size: 9pt; font-family: Courier New">-k key Use Kg key for IPMIv2 authentication<br />
</span><span style="font-size: 9pt; font-family: Courier New">-L level Remote session privilege level [default=ADMINISTRATOR]<br />
</span><span style="font-size: 9pt; font-family: Courier New">-A authtype Force use of auth type NONE, PASSWORD, MD2, MD5 or OEM<br />
</span><span style="font-size: 9pt; font-family: Courier New">-P password Remote session password<br />
</span><span style="font-size: 9pt; font-family: Courier New">-E Read password from IPMI_PASSWORD environment variable<br />
</span><span style="font-size: 9pt; font-family: Courier New">-m address Set local IPMB address<br />
</span><span style="font-size: 9pt; font-family: Courier New">-b channel Set destination channel for bridged request<br />
</span><span style="font-size: 9pt; font-family: Courier New">-l lun Set destination lun for raw commands<br />
</span><span style="font-size: 9pt; font-family: Courier New">-t address Bridge request to remote target address<br />
</span><span style="font-size: 9pt; font-family: Courier New">-o oemtype Setup for OEM (use &#8216;list&#8217; to see available OEM types)<br />
</span><span style="font-size: 9pt; font-family: Courier New">-O seloem Use file for OEM SEL event descriptions<br />
</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">Interfaces:</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New"></span><span style="font-size: 9pt; font-family: Courier New">open Linux OpenIPMI Interface [default]<br />
</span><span style="font-size: 9pt; font-family: Courier New">imb Intel IMB Interface<br />
</span><span style="font-size: 9pt; font-family: Courier New">lan IPMI v1.5 LAN Interface<br />
</span><span style="font-size: 9pt; font-family: Courier New">lanplus IPMI v2.0 RMCP+ LAN Interface<br />
</span><span style="font-size: 9pt; font-family: Courier New"> </span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">Commands: <span style="color: red"><strong>&lt;we&#8217;ve already seen these&gt;</strong></span><br />
</span></p>
<p style="margin-left: 19pt">So we&#8217;ll need at least a host, a user, a password, and a command (let&#8217;s pick the fru command for fun). Using the user ipmi with a password of &#8216;password&#8217; let&#8217;s see what happens:</p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ctronco@orw-ctronco-vm-01:~$ ipmitool -H 147.34.14.5 -U ipmi -P password fru<br />
</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : Builtin FRU Device (ID 0)</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New"></span><span style="font-size: 9pt; font-family: Courier New">Activate Session error: Requested privilege level exceeds limit<br />
</span><span style="font-size: 9pt; font-family: Courier New">Error: Unable to establish LAN session<br />
</span><span style="font-size: 9pt; font-family: Courier New">ipmi_lan_send_cmd failed to open intf<br />
</span><span style="font-size: 9pt; font-family: Courier New">Device not present (No Response)<br />
</span><span style="font-size: 9pt; font-family: Courier New">ipmi_lan_send_cmd failed to open intf<br />
</span><span style="font-size: 9pt; font-family: Courier New">Get Device ID command failed<br />
</span><span style="font-size: 9pt; font-family: Courier New"> </span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ctronco@orw-ctronco-vm-01:~$<br />
</span></p>
<p style="margin-left: 19pt">Okay, looks like we have a couple of issues. The first is the &#8220;Requested privilege level exceeds limit&#8221; error. As I mentioned in the earlier post, ipmitool by default wants to connect with Administrator privileges. We can override this by using the -L option, so our modified command line to deal with this would be:</p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ipmitool -H 147.34.14.5 -U ipmi -P password -L USER fru<br />
</span></p>
<p style="margin-left: 19pt">But we also have another error: Unable to establish LAN session. In the earlier post I mentioned that we weren&#8217;t using the Encryption key because of difficulties in entering it on the command line for ipmitool. So we need to use the -I (that&#8217;s uppercase i) option to specify the &#8216;lan&#8217; interface type (IPMI v1.5 LAN Interface). So our new command line incorporating this is:</p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ipmitool -H 147.34.14.5 -U ipmi -P password -L USER -I lan fru<br />
</span></p>
<p style="margin-left: 19pt">Running this in our Linux term:</p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">ctronco@orw-ctronco-vm-01:~$ ipmitool -H 147.34.14.5 -U ipmi -P password -L USER -I lan fru<br />
</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : Builtin FRU Device (ID 0)<br />
</span><span style="font-size: 9pt; font-family: Courier New">Chassis Type : Unknown<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Mfg : DELL<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Product : FRU16K,DELL P/N<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Serial : CN1374071J01LO<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Part Number : 0DT021A00<br />
</span><span style="font-size: 9pt; font-family: Courier New">Product Manufacturer : DELL<br />
</span><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : CPU1 (ID 176)<br />
</span><span style="font-size: 9pt; font-family: Courier New">Device not present (Parameter out of range)<br />
</span><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : CPU2 (ID 176)<br />
</span><span style="font-size: 9pt; font-family: Courier New">Device not present (Parameter out of range)<br />
</span><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : Storage (ID 2)</span></p>
<p style="margin-left: 19pt"><span style="font-size: 9pt; font-family: Courier New"></span><span style="font-size: 9pt; font-family: Courier New">Board Mfg : DELL<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Product : FRU256,DELL P/N 03K345A00<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Serial : CN1374073M00CV<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Part Number : 0FT781A01<br />
</span><span style="font-size: 9pt; font-family: Courier New">FRU Device Description : PS 1 (ID 3)<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Mfg : DELL<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Product : PWR SPLY,750W,RDNT<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Serial : CN1797272D28JI<br />
</span><span style="font-size: 9pt; font-family: Courier New">Board Part Number : 0Y8132A05<br />
</span><span style="color: red"><strong>&lt;and so on&gt;</strong></span></p>
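<p style="margin-left: 19pt">For monitoring it helps to turn that text into something structured. Below is a minimal sketch in Python that groups the &#8220;key : value&#8221; lines under each &#8220;FRU Device Description&#8221; header. The function name <code>parse_fru</code> and the dictionary layout are made up for illustration, and it assumes the output keeps the line format shown above.</p>

```python
def parse_fru(output):
    """Group 'key : value' lines from 'ipmitool fru' output by FRU device."""
    devices = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key == "FRU Device Description":
            # A new device section starts here.
            current = {"description": value, "fields": {}, "notes": []}
            devices.append(current)
        elif current is None:
            continue  # ignore anything printed before the first device
        elif sep:
            current["fields"][key] = value
        else:
            # Lines without a colon, e.g. "Device not present (...)".
            current["notes"].append(line)
    return devices
```

<p style="margin-left: 19pt">With the output above, the first entry would carry the Dell board details in its fields, while the CPU1 and CPU2 entries would have no fields and a &#8220;Device not present&#8221; note.</p>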
<p style="margin-left: 19pt">
Some other interesting commands from a monitoring standpoint are:</p>
<ul>
<li>sdr &#8211; reports on various hardware sensors: temp, fan speed etc.</li>
<li>sel &#8211; System Event Log</li>
</ul>
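<p style="margin-left: 19pt">To call these commands from a monitoring script, the options worked out above (-H, -U, -L USER, -I lan) can be wrapped up once. Here is a hedged sketch in Python: the helper names <code>build_ipmitool_args</code> and <code>run_ipmitool</code> are made up for illustration, and it uses the -E option from the help output so the password comes from the IPMI_PASSWORD environment variable instead of sitting on the command line.</p>

```python
import os
import subprocess


def build_ipmitool_args(host, user, command, privilege="USER", interface="lan"):
    """Build the argument list for the ipmitool invocation worked out above."""
    return [
        "ipmitool",
        "-I", interface,   # 'lan' is the IPMI v1.5 LAN interface
        "-H", host,
        "-U", user,
        "-L", privilege,   # avoid requesting the default ADMINISTRATOR level
        "-E",              # read the password from IPMI_PASSWORD
    ] + list(command)


def run_ipmitool(host, user, password, command):
    """Run ipmitool and return its standard output as text."""
    # Pass the password through the environment rather than with -P.
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        build_ipmitool_args(host, user, command),
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

<p style="margin-left: 19pt">Something like <code>run_ipmitool("147.34.14.5", "ipmi", "password", ["sdr"])</code> would then return the sensor listing as text, assuming ipmitool is installed and the host is reachable.</p>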
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-third/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>IPMI and the Dell PowerEdge &#8211; Part the Second</title>
		<link>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-second/</link>
		<comments>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-second/#comments</comments>
		<pubDate>Sun, 17 Feb 2008 06:39:25 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-second/</guid>
		<description><![CDATA[<p style="margin-left: 23pt">Setting up IPMI via the DRAC </p>
<p style="margin-left: 23pt">We&#8217;ll walk through the steps to set up the server for monitoring via IPMI using the DRAC. It is supposedly possible to do this without the DRAC, but I haven&#8217;t had a reason to try to do that yet.  </p>

Log in the DRAC (ex: https://&#60;RAC IP [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-left: 23pt"><strong>Setting up IPMI via the DRAC </strong></p>
<p style="margin-left: 23pt">We&#8217;ll walk through the steps to set up the server for monitoring via IPMI using the DRAC. It is supposedly possible to do this without the DRAC, but I haven&#8217;t had a reason to try to do that yet.  </p>
<ol>
<li>Log in to the DRAC (ex: https://&lt;RAC IP Address&gt;/ )</li>
<li>Once logged in (see below) choose the &#8220;Remote Access&#8221; option<img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed1.png" /><br />
<span id="more-33"></span> </li>
<li>Choose the &#8220;Configuration&#8221; tab<br />
 <img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed2.png" /></li>
<li>This should take you to the &#8220;Network Configuration&#8221; page.<img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed3.png" /></li>
<li>Scroll down to the IPMI LAN Settings section and ensure the &#8220;Enable IPMI over LAN&#8221; option is checked. If it is not, check it and click the &#8220;Apply Changes&#8221; button at the bottom of the page.<img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed4.png" />
<p><em>You can also change the Privilege Level Limit setting, which restricts the kinds of actions that can be executed (options are Administrator, Operator and User). I&#8217;ve left this one at Administrator because I will be limiting the user account we set up. However, you also need to be aware that ipmitool by default wants to connect with Administrator privileges; we can override this on the command line. </em></p>
<p><em>You can also set the encryption key but we won&#8217;t be using this with ipmitool because there appears to be an issue with entering it in an easy fashion. </em></li>
<li>Back up at the top (the &#8220;Configuration&#8221; tab) choose &#8220;Users&#8221;<img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed5.png" /></li>
<li>Click on the number representing one of the &#8220;Disabled&#8221; user accounts. We&#8217;ll pick #4.<img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed6.png" />
<p>Check the box to enable this user. Enter the user name and password. Under the User Privileges section set the &#8220;Maximum LAN User Privilege Granted&#8221; to &#8220;User&#8221;. Under the DRAC settings, configure privileges if you want this account to be able to manage the DRAC, or just leave them empty. The completed form looks like:</p>
<p><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0639-ipmiandthed7.png" /></li>
<li>Click the &#8220;Apply Changes&#8221; button at the bottom of the screen.<br />
That&#8217;s it. Our user is now set up for IPMI access. Using ipmitool will follow in another post.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge-part-the-second/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>IPMI and the Dell PowerEdge</title>
		<link>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge/</link>
		<comments>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge/#comments</comments>
		<pubDate>Sun, 17 Feb 2008 06:34:21 +0000</pubDate>
		<dc:creator>cars</dc:creator>
				<category><![CDATA[Work]]></category>
		<category><![CDATA[hardware]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[virtualization]]></category>

		<guid isPermaLink="false">http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge/</guid>
		<description><![CDATA[<p style="margin-left: 1pt">In one of my projects at work, we&#8217;ve been debating whether to use ESX 3i (installable) or ESX 3.5 on a large number of Dell servers we&#8217;re getting ready to deploy. The advantage of 3i is we can treat the host as more of an appliance (i.e. hopefully fewer patches/maintenance). Downside is monitoring. Since [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-left: 1pt">In one of my projects at work, we&#8217;ve been debating whether to use ESX 3i (installable) or ESX 3.5 on a large number of Dell servers we&#8217;re getting ready to deploy. The advantage of 3i is we can treat the host as more of an appliance (i.e. hopefully fewer patches/maintenance). The downside is monitoring. Since there&#8217;s no service console we can&#8217;t run the Dell OpenManage agents. Since many of these will end up at sites without on-site staff, and we don&#8217;t want to require someone to log into VirtualCenter(s) and manually check the status of each server, we need a way to proactively monitor the hardware. The DRAC can send us some alerts, but not at the level we&#8217;d like (ex: it will alert if a power supply is disconnected/unavailable, but not if a drive is removed/lost from a RAID array).</p>
<p style="margin-left: 1pt">When installed, 3i <strong>can</strong> show the status of the hardware it&#8217;s running on. The shot below is from a new Dell PE 2950. However, some of the HP blades I&#8217;ve seen with 3i installed on them show only limited information. My assumption is that this requires integration between VMware and the hardware vendor. Since the 2950 is the first (and so far only) system that is certified to run 3i, I expect this to show up on the HP blades at some point in the future.</p>
<p style="margin-left: 1pt"><img src="http://cars.lostroncos.org/wp-content/uploads/2008/02/021708-0634-ipmiandthed1.png" /><span style="font-size: 12pt; font-family: Times New Roman"><br />
</span></p>
<p style="margin-left: 1pt">Unfortunately what it doesn&#8217;t allow us to do is alert when a component has issues (i.e. a non-&#8220;Normal&#8221; status). In various discussions about the issue we&#8217;d talked about looking at IPMI and what it can tell us.</p>
<p style="margin-left: 1pt">I&#8217;ve set up Nagios on a VM running Ubuntu and installed ipmitool to poke around. I&#8217;ll follow up with a couple of posts on setting up the DRAC card in the Dell to allow access via IPMI, as well as what it can/does show us.</p>
]]></content:encoded>
			<wfw:commentRss>http://cars.lostroncos.org/2008/02/16/ipmi-and-the-dell-poweredge/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
