Discussion:
How to Alert on NaNs..
joanypony
2013-03-08 15:58:24 UTC
joanypony [http://community.zenoss.org/people/joanypony] created the discussion

"How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72327#72327

--------------------------------------------------------------
Hi,

Does anyone know if it's possible to get Zenoss to alert when there are NaNs in a graph? I want them to cause an alert so I can detect wonky ZenPacks before too much time passes. We have a very large environment with 10 Zenoss servers, and it's easy to miss a broken graph.

Thanks!

Joan
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/72327#72327]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
Shane Scott
2013-03-08 21:07:49 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72320#72320

--------------------------------------------------------------
Joan:

Interesting question. It's something so obvious and I don't think it's ever been addressed. Off the top of my head I think a custom threshold plugin would need to be written for this. Let me think about it a bit.

Best,
--Shane Scott (Hackman238)
--------------------------------------------------------------

Andrew Kirch
2013-03-08 21:17:33 UTC
Andrew Kirch [http://community.zenoss.org/people/akirch] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72330#72330

--------------------------------------------------------------
Shane,

I haven't given this much thought, but since these graphs are stored on the local system, why not just set up a ZenCommand template that runs 'rrdtool fetch' (rrdfetch) and greps the output for NaN?
http://oss.oetiker.ch/rrdtool/doc/rrdfetch.en.html
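
For illustration only, here is a rough sketch of the "fetch and look for NaN" idea in Python rather than grep. It assumes the usual text layout of 'rrdtool fetch' output (a header row of data source names, then "timestamp: value value ..." rows). The function names, the sample data, and the one-hour window are made up for the example and are not part of Zenoss.

```python
import math
import subprocess


def fetch_text(rrd_path, start="-1h"):
    """Run `rrdtool fetch` on one RRD file and return its text output.

    Not exercised below; it needs rrdtool installed and a real RRD file.
    """
    return subprocess.check_output(
        ["rrdtool", "fetch", rrd_path, "AVERAGE", "--start", start],
        text=True,
    )


def count_nans(fetch_output):
    """Return {ds_name: (nan_rows, total_rows)} for `rrdtool fetch` output."""
    lines = [ln for ln in fetch_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    names = lines[0].split()  # header row lists the data source names
    counts = {name: [0, 0] for name in names}
    for line in lines[1:]:
        _timestamp, sep, rest = line.partition(":")
        if not sep:
            continue  # not a data row
        for name, raw in zip(names, rest.split()):
            counts[name][1] += 1
            # float() accepts "nan" and "-nan", both of which rrdtool may print
            if math.isnan(float(raw)):
                counts[name][0] += 1
    return {name: tuple(c) for name, c in counts.items()}


# Hypothetical sample output, used only to show the parsing.
SAMPLE = """\
                          ds0                 ds1

1362758400: 1.5000000000e+00 nan
1362758700: nan nan
1362759000: 2.0000000000e+00 -nan
"""

result = count_nans(SAMPLE)
```

A ZenCommand wrapper could then exit non-zero when a data source has too many NaN rows; where to draw that line is a local policy decision.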
--------------------------------------------------------------

Shane Scott
2013-03-11 19:40:21 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72344#72344

--------------------------------------------------------------
Andrew:

This would work on a case-by-case basis, but it wouldn't scale for everything. It's a tricky problem, and I haven't thought of a good solution yet short of updating how datasources work. :)

Best,

--Shane Scott (Hackman238)
--------------------------------------------------------------

Shane Scott
2013-03-15 19:38:25 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72426#72426

--------------------------------------------------------------
I've given this a lot of thought. I think the best solution would be a custom threshold; that's probably the only scalable option.

--Shane Scott (Hackman238)
--------------------------------------------------------------

joanypony
2013-04-09 12:52:40 UTC
joanypony [http://community.zenoss.org/people/joanypony] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72750#72750

--------------------------------------------------------------
Hi Shane,
Thanks for that. Unfortunately, I can't see how to make a non-integer threshold anywhere. Any ideas?

Regards,
Joan
--------------------------------------------------------------

Shane Scott
2013-04-09 14:02:03 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72761#72761

--------------------------------------------------------------
Joan:

A solution for this is currently in development.

Best,
--Shane Scott (Hackman238)
--------------------------------------------------------------

TitoOrtega
2013-04-09 18:22:57 UTC
TitoOrtega [http://community.zenoss.org/people/TitoOrtega] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72752#72752

--------------------------------------------------------------
Let me find the ZenPack for this... it's somewhere.
--------------------------------------------------------------

TitoOrtega
2013-04-09 19:58:15 UTC
TitoOrtega [http://community.zenoss.org/people/TitoOrtega] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/72762#72762

--------------------------------------------------------------
PM me for the ZenPack used to detect NaNs in collection.
--------------------------------------------------------------

joanypony
2013-05-30 11:13:39 UTC
joanypony [http://community.zenoss.org/people/joanypony] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73359#73359

--------------------------------------------------------------
Hi Tito,
Any update on this?

Regards,
Joan
--------------------------------------------------------------

Doug Syer
2013-05-30 14:38:57 UTC
Doug Syer [http://community.zenoss.org/people/dsyer%40nwnit.com] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73392#73392

--------------------------------------------------------------
The practical challenge is that getting a few NaNs is normal, and you will also get NaNs if the device becomes unpingable (polling stops). The NaNs also roll up, so depending on how you are trying to get at the RRD data you may not see them. I think you will go crazy trying to attack it from an RRD perspective unless you have a very loose SLA to tackle.

I don't think it's practical to attack the problem like that. For each daemon there is (or should be) a status event showing that you aren't getting data, either for the device itself or for the datapoint.

For example, with Windows (I run Enterprise, but I'd assume Core is the same) you get /status/wmi alerts for timeouts, failed logins, etc. for all WMI collection on the device. You also get /status/wmi if you try to pull from a WMI datasource that doesn't exist, so you can get what you want there. I put count transforms on the device-level /status/wmi alerts to filter out the noise, plus a few other transform tricks. In general I find that for most things like /status/wmi, once you reach a certain count of alerts, the chances of it fixing itself drop exponentially. But the number of false positives is high unless you use a counting transform.
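
To make the counting idea concrete, here is a minimal sketch in plain Python. It is not the Zenoss transform API; the class name, the key of (device, event class), and the threshold of 3 are all assumptions chosen for the example.

```python
from collections import defaultdict


class CountingFilter:
    """Suppress a repeated status event until it has been seen min_count times.

    Hypothetical helper for illustration; a real Zenoss transform would work
    on the event object it is handed instead of keeping its own counters.
    """

    def __init__(self, min_count):
        self.min_count = min_count
        self.counts = defaultdict(int)

    def observe(self, device, event_class):
        """Record one occurrence; return True once the count reaches min_count."""
        key = (device, event_class)
        self.counts[key] += 1
        return self.counts[key] >= self.min_count

    def clear(self, device, event_class):
        """Forget the count, e.g. when a clear event arrives."""
        self.counts.pop((device, event_class), None)


noise_filter = CountingFilter(min_count=3)
decisions = [noise_filter.observe("host1", "/status/wmi") for _ in range(4)]
```

The first two occurrences are held back and later ones pass through, which is the false-positive filtering described above.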

If you don't get SNMP at all for a device you will get a /status/snmp alert (or /status/ping). For individual data points on SNMP you can see in the zenperfsnmp daemon's log which OIDs are being skipped. There is a lot of good stuff in the logs, but really the daemon should be sending an event if something isn't polling, and if it isn't, it's probably a bug or a problem with your system or your device. As you probably know, there are all kinds of monitoring protocol bugs and inconsistent implementations across vendor hardware and software, so don't assume it's Zenoss unless you have ruled out the device, OS, hardware, etc.

If you just want to know which data points aren't populating, you can write a dmd script to dump them all out.
--------------------------------------------------------------

Shane Scott
2013-05-30 14:40:25 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73407#73407

--------------------------------------------------------------
All,

Pack for this to be released next week.

--Shane Scott (Hackman238)
--------------------------------------------------------------

joanypony
2013-05-30 15:23:40 UTC
joanypony [http://community.zenoss.org/people/joanypony] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73408#73408

--------------------------------------------------------------
Thank you Shane!

Regards,
Joan
--------------------------------------------------------------

joanypony
2013-05-30 15:29:23 UTC
joanypony [http://community.zenoss.org/people/joanypony] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73409#73409

--------------------------------------------------------------
Thanks Doug,
The purpose isn't to catch every NaN, but to catch graphs that aren't populating at all, but that are not causing any alerts. It's more for housekeeping than for escalating. We have several thousand hosts and this would help to catch issues when they happen rather than waiting for human eyes to notice a graph has no data.
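
As a rough sketch of that distinction (a graph with no data at all versus one with the occasional gap), something like the following would do. The function name, the labels, and the 12-sample window are assumptions for the example, and it takes the samples as a plain list however they were obtained.

```python
import math


def classify_series(values, window=12):
    """Label the most recent `window` samples of one datapoint.

    Returns "empty" if every recent sample is missing or NaN (nothing is
    populating), "gappy" if only some are, and "ok" otherwise.
    """
    recent = values[-window:]
    if not recent:
        return "empty"
    nan_count = sum(1 for v in recent if v is None or math.isnan(v))
    if nan_count == len(recent):
        return "empty"
    return "gappy" if nan_count else "ok"


nan = float("nan")
dead_graph = classify_series([nan] * 12)
flaky_graph = classify_series([1.0, nan, 2.0])
healthy_graph = classify_series([1.0, 2.0, 3.0])
```

Only the "empty" case would raise a housekeeping event; "gappy" series are the normal noise Doug describes.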

Thanks,
Joan
--------------------------------------------------------------

Shane Scott
2013-05-30 15:31:44 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73410#73410

--------------------------------------------------------------
Joan,

That's exactly right.

Doug,

If you paged on NaNs then you'd get a ton of pages if a collector daemon went down. LOL

--Shane Scott (Hackman238)
--------------------------------------------------------------

Shane Scott
2013-06-17 11:02:15 UTC
Shane Scott [http://community.zenoss.org/people/hackman238] created the discussion

"Re: How to Alert on NaNs.."

To view the discussion, visit: http://community.zenoss.org/message/73633#73633

--------------------------------------------------------------
All,

NaN Monitoring / Paging:

https://github.com/Hackman238/ZenPacks.community.NanThreshold

I'll be building docs for it soon. It works by adding a new datasource (the nanMonitor datasource) to a template, then giving that datasource a comma-separated list of datapoints to check (e.g. 'sysUpTime, laLoadInt5,'). When a datapoint reads NaN, a critical event is created with information on the datapoint in question. The solution scales by implementing a new daemon, zennanthresh, which batches the tests.
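
For readers who want a feel for the check itself, here is a small sketch of the logic described above. It is not the ZenPack's code: the function names are hypothetical, the latest readings are passed in as a plain dict, and treating a missing reading the same as NaN is an assumption.

```python
import math


def parse_datapoints(spec):
    """Split a comma-separated datapoint list, tolerating a trailing comma."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def nan_datapoints(spec, latest):
    """Return the listed datapoints whose latest reading is NaN or missing.

    `latest` maps datapoint name -> most recent value (float or None).
    """
    bad = []
    for name in parse_datapoints(spec):
        value = latest.get(name)
        if value is None or math.isnan(value):
            bad.append(name)
    return bad


spec = "sysUpTime, laLoadInt5,"
names = parse_datapoints(spec)
flagged = nan_datapoints(spec, {"sysUpTime": 1234.0, "laLoadInt5": float("nan")})
```

Each name in the returned list would correspond to one critical event; running the check over many devices in one pass is the batching that the daemon handles.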

Best,
--Shane Scott (Hackman238)
http://shanewilliamscott.com
http://linkedin.com/in/shanewilliamscott
--------------------------------------------------------------
