Discussion:
Configuring zenping ping retries / failures?
James Pearce
2011-11-24 01:12:31 UTC
Permalink
James Pearce [http://community.zenoss.org/people/SirTechnology] created the discussion

"Configuring zenping ping retries / failures?"

To view the discussion, visit: http://community.zenoss.org/message/62805#62805

--------------------------------------------------------------
By default, Zenoss fails a device (marks it as 'down') after 2 consecutive ping failures. I would like to increase this to 3.

I found the setting 'ping retries' in the collector settings (Advanced > Collectors > Localhost > Edit), which was set to 2. I couldn't find any documentation on it, but it seems logical that this variable controls the number of ping attempts before marking a device as 'down'. After changing it, though, the zenping debug output still shows devices failing after 2 consecutive ping failures. I tried restarting zenping to get it to reload the config, without any change. I also tried restarting Zenoss completely, with no change. I see 2 possibilities:

1) I am adjusting the wrong variable (quite possible given the lack of documentation)
2) I am adjusting the variable incorrectly (I checked for hard-coded values in the devices, but I can't find any ping-related entries other than 'ignore ping: T/F')

Can anyone shed any light on how to adjust the zenping failure threshold?
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/62805#62805]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
dpetzel
2011-11-24 03:09:51 UTC
Permalink
dpetzel [http://community.zenoss.org/people/dpetzel] created the discussion

"Re: Configuring zenping ping retries / failures?"

To view the discussion, visit: http://community.zenoss.org/message/62807#62807

--------------------------------------------------------------
I don't have a direct answer to your question, but I did uncover some interesting stuff while looking into this. It would appear that "*Ping Tries*" is not exactly what it sounds like. I would have guessed it did exactly what you expected, but after a little debugging it turns out it should read more like "How many ICMP packets to send before considering it a failed ping". That is, it sends 2 ICMPs before marking the first failure, then 2 more ICMPs before marking the second failure. I adjusted my value to 5, and as you can see from the debug log below, it sends 5 ICMPs before logging the timeout and failure.

Note the last number of this line *+zen.ZenPing: Failed 192.168.1.1 2+*. The trailing 2 is the number of failures. This is reflected in the UI on the events page in the count column.

So... that raises the question: what is the property that controls the failure count? Near as I can figure from looking at the source (http://dev.zenoss.com/trac/browser/trunk/Products/ZenStatus/zenping.py), it's not configurable. (I'm not a great developer, so I may well be misreading this; take it with a grain of salt.) It seems each failure is stored in *pj.status* (this number always matches the count column in the UI).

It appears that if the count is exactly 1, it will retry without logging an event; however, if it is NOT 1, you get an event. I don't really see any configuration option around this.

If you take anything away from this response, I think this will be the most useful (albeit not as detailed) portion:
*+Now, all that aside... I think you might be able to accomplish what you want using an event transform. Basically, you could have an event transform that checks the failure count and, if it's less than 3, drops the event... It's not exactly what you asked, but it might accomplish the goal.+*
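
A rough sketch of what such a transform might look like, written as a plain Python function so it can be tried outside Zenoss. It leans on several assumptions that should be checked against your Zenoss version: that the failure count is exposed to the transform as `evt.count`, that setting `evt._action = "drop"` discards the event, and that clear events carry severity 0. The function name, the threshold constant, and the stand-in event objects are hypothetical.

```python
from types import SimpleNamespace

FAILURE_THRESHOLD = 3  # hypothetical: failures required before the event is kept


def drop_early_ping_failures(evt, threshold=FAILURE_THRESHOLD):
    """Mark a /Status/Ping event for dropping until `threshold` failures are seen.

    Assumes the failure count is available as `evt.count`; verify this on
    your Zenoss version before relying on it.
    """
    if getattr(evt, "eventClass", "") != "/Status/Ping":
        return evt  # not a ping event, leave it alone
    if getattr(evt, "severity", 0) == 0:
        return evt  # assumed to be a clear event, never drop those
    if getattr(evt, "count", 1) < threshold:
        evt._action = "drop"  # assumed convention for discarding an event
    return evt


# Stand-in event objects for exercising the logic outside Zenoss.
second_failure = SimpleNamespace(eventClass="/Status/Ping", severity=5, count=2)
third_failure = SimpleNamespace(eventClass="/Status/Ping", severity=5, count=3)
clear_event = SimpleNamespace(eventClass="/Status/Ping", severity=0, count=1)
for stub in (second_failure, third_failure, clear_event):
    drop_early_ping_failures(stub)
```

Inside an actual transform only the body of the function would be needed, since Zenoss supplies `evt` itself.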


The key information here appears to be in the doPingFailed method, starting on line 342. Near as I can tell, on line 348 it checks whether the error count is 1, and if it is, it retries. If it is not 1, it fires the event (after a few other tasks) on line 366.
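
To make that flow concrete, here is a small model of the behavior as I read it. This is not the zenping source: `handle_failures` is a hypothetical helper, and the only thing it encodes is "count of 1 means retry silently, anything else means an event".

```python
def handle_failures(consecutive_failures):
    """Return the action taken at each failure count, per the reading above."""
    actions = []
    for status in range(1, consecutive_failures + 1):
        if status == 1:
            # First failure: ping again without raising an event.
            actions.append((status, "retry"))
        else:
            # Any later failure: the 'down' event is queued.
            actions.append((status, "event"))
    return actions


print(handle_failures(3))
# [(1, 'retry'), (2, 'event'), (3, 'event')]
```

This lines up with the log, where failure 1 is logged only as a first failure and failures 2 and 3 each queue an event.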

Here is my log output showing the 5 ICMPs as a result of changing *Ping Tries*. I've highlighted each check in a different color in hopes it's easier to read. You can basically see 3 batches of 5 ICMPs, two of which sent events...
2011-11-23 21:38:26,111 DEBUG zen.ZenPing: starting 192.168.1.1
2011-11-23 21:38:26,111 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:27,613 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:29,114 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:30,615 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:32,118 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:33,619 DEBUG zen.Ping: pj timeout for 192.168.1.1
2011-11-23 21:38:33,619 DEBUG zen.Ping: pj fail for 192.168.1.1
2011-11-23 21:38:33,619 DEBUG zen.ZenPing: Failed 192.168.1.1 1
2011-11-23 21:38:33,619 DEBUG zen.ZenPing: first failure '192.168.1.1'
2011-11-23 21:38:33,620 DEBUG zen.ZenPing: starting 192.168.1.1
2011-11-23 21:38:33,620 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:35,120 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:36,621 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:38,123 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:39,624 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:41,125 DEBUG zen.Ping: pj timeout for 192.168.1.1
2011-11-23 21:38:41,126 DEBUG zen.Ping: pj fail for 192.168.1.1
2011-11-23 21:38:41,126 DEBUG zen.ZenPing: Failed 192.168.1.1 2
2011-11-23 21:38:41,126 WARNING zen.ZenPing: ip 192.168.1.1 is down
2011-11-23 21:38:41,126 DEBUG zen.ZenPing: Queueing event {'severity': 5, 'component': '', 'agent': 'zenping', 'summary': 'ip 192.168.1.1 is down', 'manager': 'zenoss-dev.local', 'eventGroup': 'Ping', 'eventState': 0, 'device': '192.168.1.1', 'eventClass': '/Status/Ping', 'ipAddress': '192.168.1.1', 'monitor': 'localhost'}
2011-11-23 21:39:26,112 DEBUG zen.ZenPing: starting 192.168.1.1
2011-11-23 21:39:26,112 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:39:27,615 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:39:29,116 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:39:30,617 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:39:32,118 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:39:33,619 DEBUG zen.Ping: pj timeout for 192.168.1.1
2011-11-23 21:39:33,619 DEBUG zen.Ping: pj fail for 192.168.1.1
2011-11-23 21:39:33,619 DEBUG zen.ZenPing: Failed 192.168.1.1 3
2011-11-23 21:39:33,619 WARNING zen.ZenPing: ip 192.168.1.1 is down
2011-11-23 21:39:33,619 DEBUG zen.ZenPing: Queueing event {'severity': 5, 'component': '', 'agent': 'zenping', 'summary': 'ip 192.168.1.1 is down', 'manager': 'zenoss-dev.local', 'eventGroup': 'Ping', 'eventState': 0, 'device': '192.168.1.1', 'eventClass': '/Status/Ping', 'ipAddress': '192.168.1.1', 'monitor': 'localhost'}
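
If you want to check the ICMP count per batch on your own log without counting by eye, a small script along these lines might help. It is only a sketch: `summarize` is a hypothetical helper, it assumes the debug lines have exactly the format shown above, and the sample is just the first batch copied from this log.

```python
import re

SAMPLE = """\
2011-11-23 21:38:26,111 DEBUG zen.ZenPing: starting 192.168.1.1
2011-11-23 21:38:26,111 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:27,613 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:29,114 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:30,615 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:32,118 DEBUG zen.Ping: send icmp to '192.168.1.1'
2011-11-23 21:38:33,619 DEBUG zen.Ping: pj timeout for 192.168.1.1
2011-11-23 21:38:33,619 DEBUG zen.Ping: pj fail for 192.168.1.1
2011-11-23 21:38:33,619 DEBUG zen.ZenPing: Failed 192.168.1.1 1
"""

FAILED_RE = re.compile(r"zen\.ZenPing: Failed (\S+) (\d+)$")


def summarize(log_text):
    """Return (host, failure_count, icmps_sent) for each failed batch."""
    batches = []
    sends = 0
    for line in log_text.splitlines():
        if "zen.Ping: send icmp to" in line:
            sends += 1
        match = FAILED_RE.search(line)
        if match:
            batches.append((match.group(1), int(match.group(2)), sends))
            sends = 0  # start counting the next batch
    return batches


print(summarize(SAMPLE))
# [('192.168.1.1', 1, 5)]
```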
--------------------------------------------------------------

James Pearce
2011-11-25 13:43:31 UTC
Permalink
James Pearce [http://community.zenoss.org/people/SirTechnology] created the discussion

"Re: Configuring zenping ping retries / failures?"

To view the discussion, visit: http://community.zenoss.org/message/62812#62812

--------------------------------------------------------------
Good debugging, that really helped. So we now have the following documentation then:
    Ping Tries: How many simultaneous ICMP ping requests to send to a host in each cycle
I did some more debugging after yours and also found:
    Max Ping Failures: How many ping failures before ceasing to send pings anymore

In terms of your 'answer' to do an event transform for this, I wasn't really keen on that. Instead, I took your research and hacked the zenping code to do what I think it SHOULD do. I now have, in my code, the following definition for Ping Tries:
How many simultaneous ICMP ping requests to send to a host in each cycle. If this number of cycles passes without a successful ping (ICMP reply), the host will be marked as 'Down'.
            self.log.debug("%s failure '%s' of maximum allowed %s", pj.hostname, pj.status, self.pingTries)
I can now change the number of ping tries before marking a host as down by changing the ping tries option in the collector. This alteration also gives me genuinely useful debugging information by telling me what it thinks the maximum ping tries is before failing. Note that on the final ping failure, this line will not print, as that is handled in the next conditional block. I didn't alter it there because the less code I have to alter, the better.
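
For anyone wanting to reason about the change before touching zenping, the behavior I describe can be modeled roughly as follows. This is a standalone sketch, not my actual patch: `on_ping_failure` is a hypothetical function, and the `status < ping_tries` comparison is an assumption about how the check is written.

```python
import logging

log = logging.getLogger("zen.ZenPing.sketch")  # hypothetical logger name


def on_ping_failure(hostname, status, ping_tries):
    """Return "retry" while status is below ping_tries, otherwise "down"."""
    if status < ping_tries:
        # Same debug message as the line shown above.
        log.debug("%s failure '%s' of maximum allowed %s",
                  hostname, status, ping_tries)
        return "retry"
    # Final failure: handled by the next block, so no debug line here.
    return "down"


results = [on_ping_failure("192.168.1.1", n, 3) for n in (1, 2, 3)]
print(results)
# ['retry', 'retry', 'down']
```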

Thanks for the help!
--------------------------------------------------------------
