Discussion:
localhost zenperfsnmp heartbeat failure
jshank
2011-11-08 20:53:27 UTC
Permalink
jshank [http://community.zenoss.org/people/jshank] created the discussion

"Re: localhost zenperfsnmp heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/62519#62519

--------------------------------------------------------------
After months of troubleshooting zenperfsnmp errors, I finally got fed up and found a workaround: I added the following to my crontab to silence the heartbeat errors that were driving me and my team mad.

* * * * * mysql -uzenoss -ppassword -Devents -e "update heartbeat set lasttime = now() where component = 'zenperfsnmp';"
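
For readers wondering why this silences the events, here is a minimal sketch of the staleness check a heartbeat monitor presumably performs. It is not Zenoss's actual code: the function name, the 900-second timeout, and the timestamps are all assumptions made for illustration. Only the `lasttime` column and the once-a-minute cron update come from the post above.

```python
from datetime import datetime, timedelta


def heartbeat_is_stale(last_time, now, timeout_seconds):
    """Return True when the last heartbeat is older than the allowed timeout.

    Hypothetical helper: the real check lives inside Zenoss and may differ.
    """
    return now - last_time > timedelta(seconds=timeout_seconds)


# Assumed values, chosen only to illustrate the comparison.
now = datetime(2011, 11, 8, 20, 53, 0)
timeout = 900  # hypothetical timeout in seconds

# Daemon last reported 20 minutes ago: the heartbeat looks stale.
stalled = now - timedelta(minutes=20)
stalled_result = heartbeat_is_stale(stalled, now, timeout)

# Cron job rewrote lasttime 45 seconds ago: the heartbeat looks fresh,
# whether or not the daemon itself is healthy.
refreshed = now - timedelta(seconds=45)
refreshed_result = heartbeat_is_stale(refreshed, now, timeout)

print(stalled_result, refreshed_result)
```

If the check works roughly like this, the cron job keeps `lasttime` fresh no matter what the daemon is doing, so a zenperfsnmp process that has stopped entirely would probably go unreported as well.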

I worked through countless device timeouts, tuned performance, added more memory, cleared heartbeats, extended cycle times, and even moved 50% of my devices into an ignored SNMP state. I believe it all comes down to scalability. I have maxqueuelen set to 20000 and still get "56 devices still queued at end of cycle and did not get queried." We are monitoring over 360 network devices, each with dozens of Ethernet interfaces.

Here is a good cycle:
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: ******** Cycle completed ********
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Sent 95401 OID requests
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Queried 143 devices
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp:   0 in queue still unqueried
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp:   Successes: 143  Failures: 0  Not reporting: 0
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Waited on 0 queries from previous cycles.
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Cycle lasted 131.72 seconds
2011-10-27 07:55:09,444 INFO zen.zenperfsnmp: *********************************
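
To put numbers on a cycle like this one, the summary lines can be parsed and turned into a rough throughput figure. The following is a sketch only: the regular expressions are written against the log lines quoted above, and the function names and the `queue_drained` field are made up for this example.

```python
import re

# Summary lines copied from the cycle log above.
CYCLE_LOG = """\
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Sent 95401 OID requests
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Queried 143 devices
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp:   0 in queue still unqueried
2011-10-27 07:55:09,443 INFO zen.zenperfsnmp: Cycle lasted 131.72 seconds
"""

# One pattern per figure of interest; the keys are illustrative names.
PATTERNS = {
    "oids": re.compile(r"Sent (\d+) OID requests"),
    "devices": re.compile(r"Queried (\d+) devices"),
    "unqueried": re.compile(r"(\d+) in queue still unqueried"),
    "seconds": re.compile(r"Cycle lasted ([\d.]+) seconds"),
}


def parse_cycle(text):
    """Extract the numeric fields from a zenperfsnmp cycle summary."""
    stats = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                stats[key] = float(match.group(1))
    return stats


def summarize(stats):
    """Derive throughput figures from the parsed cycle statistics."""
    return {
        "oids_per_second": stats["oids"] / stats["seconds"],
        "oids_per_device": stats["oids"] / stats["devices"],
        "queue_drained": stats["unqueried"] == 0,
    }


stats = parse_cycle(CYCLE_LOG)
summary = summarize(stats)
print(summary)
```

For this cycle that works out to roughly 724 OID requests per second and about 667 OIDs per device. Running the same calculation on a bad cycle would show whether throughput drops or the OID count simply outgrows the cycle time.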

Sometimes devices don't respond, though, and I can't figure out any other way to get rid of the heartbeat events. I even modified the alerting rules to ignore heartbeats, and they still come through. My team had reached the point of creating email rules to send alerts from Zenoss straight to the trash.

| Zenoss (http://www.zenoss.com/) | Zenoss 3.2.1 |
| OS (http://www.tldp.org/) | Linux (x86_64) 2.6.18 (Linux its-netmon.x.org 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64) |
| Zope (http://www.zope.org/) | Zope 2.12.1 |
| Python (http://www.python.org/) | Python 2.6.2 |
| Database (http://www.mysql.com/) | MySQL 5.0.77 (Ver 5.0.77) |
| RRD (http://oss.oetiker.ch/rrdtool) | RRDtool 1.3.9 |
| Twisted (http://twistedmatrix.com/trac) | Twisted 8.1.0 |
| NetSnmp (http://net-snmp.sourceforge.net/) | NetSnmp 5.3.2 |
| PyNetSnmp (http://www.zenoss.com/) | PyNetSnmp 0.29.13 |
| WMI (http://www.zenoss.com/) | Wmi 1.3.13 |
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/62519#62519]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
mwcotton
2011-11-09 01:22:13 UTC
Permalink
mwcotton [http://community.zenoss.org/people/mwcotton] created the discussion

"Re: localhost zenperfsnmp heartbeat failure"

To view the discussion, visit: http://community.zenoss.org/message/62531#62531

--------------------------------------------------------------
I once experienced a memory shortage: zenperfsnmp memory usage grew over time, and when the box started to swap out processes, my SNMP failures would go up and I would also see the "waiting on " message.
--------------------------------------------------------------