Discussion:
zenstatus and zenperfsnmp confusion
chitambira
2012-05-09 19:08:42 UTC
chitambira [http://community.zenoss.org/people/chitambira] created the discussion

"zenstatus and zenperfsnmp confusion"

To view the discussion, visit: http://community.zenoss.org/message/66278#66278

--------------------------------------------------------------
I have a rather weird problem with a Zenoss install.

I changed the network config of my Zenoss server.
The config did not work, so it left the server unreachable and unable to reach its monitored devices.
I corrected the config to a working one and manually cleared the "network unreachable" events that had been generated.
I then realised that only a handful of devices were being monitored.
Most devices continue to be marked as "down" on their status page.
I restarted Zenoss, but these devices are still not being monitored: no SNMP graphs, and the RRDs are not updated.
I rebooted the Zenoss machine and the problem still persists.

Running zenhub run -v10 ends with this error:

2012-05-09 19:41:30,548 DEBUG zen.Plugins: Loading collector plugins from: /opt/zenoss/ZenPacks/ZenPacks.zenoss.ZenossVirtualHostMonitor-2.3.0-py2.4.egg/ZenPacks/zenoss/ZenossVirtualHostMonitor/modeler/plugins
Traceback (most recent call last):
  File "/opt/zenoss/Products/ZenHub/zenhub.py", line 613, in ?
    z = ZenHub()
  File "/opt/zenoss/Products/ZenHub/zenhub.py", line 269, in __init__
    reactor.listenTCP(self.options.pbport, pb.PBServerFactory(pt))
  File "/opt/zenoss/lib/python/twisted/internet/posixbase.py", line 328, in listenTCP
    p.startListening()
  File "/opt/zenoss/lib/python/twisted/internet/tcp.py", line 739, in startListening
    raise CannotListenError, (self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on any:8789: (98, 'Address already in use').


In zenhub.log everything seems OK, except that I see:

INFO zen.ZenHub: Worker reports 2012-03-09 10:45:34,277 WARNING zen.ZenStatus: device '+device_name+' network '192.168.1.0/24' not in topology



If I run zenperfsnmp run -v10 -d +device_name+

I get the following:

.....
.....
2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Finished fetching configs for 1 devices
2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Gathering performance data for device1.mydomain.com
2012-03-09 10:02:53,874 INFO zen.zenperfsnmp: Configured 1 of 1 devices
2012-03-09 10:02:53,874 DEBUG zen.zenperfsnmp: Getting device ping issues
2012-03-09 10:02:55,003 DEBUG zen.thresholds: Checking value 0 on Daemons/localhost/zenperfsnmp_eventQueueLength
2012-03-09 10:02:55,004 DEBUG zen.MinMaxCheck: Checking zenperfsnmp_eventQueueLength 0 against min None and max 1000
2012-03-09 10:02:55,004 DEBUG zen.zenperfsnmp: Queueing event {'manager': 'zenoss.domain.com', 'eventKey': 'high event queue', 'device': 'localhost', 'eventClass': '/Perf', 'summary': 'threshold of high event queue restored: current value 0.00', 'component': '', 'monitor': 'localhost', 'agent': 'zenperfsnmp', 'severity': 0}
2012-03-09 10:02:55,004 DEBUG zen.zenperfsnmp: Total of 1 queued events
2012-03-09 10:02:56,086 DEBUG zen.zenperfsnmp: unresponsive devices: [['server0053', 2, 64946], ['server0061, 1, 1], ......
...
...

...and so on, listing all the servers that are not working properly.


Any ideas?
--------------------------------------------------------------

Reply to this message by replying to this email -or- go to the discussion on Zenoss Community
[http://community.zenoss.org/message/66278#66278]

Start a new discussion in zenoss-users by email
[discussions-community-forums-zenoss--***@community.zenoss.org] -or- at Zenoss Community
[http://community.zenoss.org/choose-container!input.jspa?contentType=1&containerType=14&container=2003]
jcurry
2012-05-09 19:21:11 UTC
jcurry [http://community.zenoss.org/people/jcurry] created the discussion

"Re: zenstatus and zenperfsnmp confusion"

To view the discussion, visit: http://community.zenoss.org/message/66288#66288

--------------------------------------------------------------
So can your Zenoss server actually ping the devices? Both from the command line and with a few sample tests from the Command menu?

Did you put your Zenoss server back exactly as it was? Has anything else changed: DNS, firewalls, network topology?

Have you remodeled your Zenoss server, and is it in a consistent state, especially with respect to its network cards?

Do you have any heartbeat events?

Are all the daemons running (zenoss status)?

Cheers,
Jane
--------------------------------------------------------------

chitambira
2012-05-09 19:49:51 UTC
chitambira [http://community.zenoss.org/people/chitambira] created the discussion

"Re: zenstatus and zenperfsnmp confusion"

To view the discussion, visit: http://community.zenoss.org/message/66298#66298

--------------------------------------------------------------
Yes, I can ping all the devices, and I can also snmpwalk them.
The server is as it was; there are no DNS/firewall issues and the network topology hasn't changed.
I tried remodelling the affected servers, with no luck.
All daemons are running, and the only errors I can see are the ones I posted above.
It's weird because some devices are monitored OK, but that's only about 5% of the total.

zenperfsnmp when the server was ok: (showing 391 devices)

2012-03-07 05:13:59,575 INFO zen.zenperfsnmp: ******** Cycle completed ********
2012-03-07 05:13:59,575 INFO zen.zenperfsnmp: Sent 38309 OID requests
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Queried 391 devices
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   0 in queue still unqueried
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 384  Failures: 7  Not reporting: 0
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Waited on 0 queries from previous cycles.
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Cycle lasted 166.34 seconds
2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: *********************************



Now showing 57 devices only

2012-03-09 19:03:07,285 INFO zen.zenperfsnmp: ******** Cycle completed ********
2012-03-09 19:03:07,285 INFO zen.zenperfsnmp: Sent 5920 OID requests
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Queried 57 devices
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   0 in queue still unqueried
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 54  Failures: 3  Not reporting: 0
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Waited on 0 queries from previous cycles.
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Cycle lasted 12.01 seconds
2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: *********************************
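
To compare the two cycle summaries above without reading them by eye, a small parser can help. This is only a sketch, assuming a separate Python 3 interpreter (not the Python 2.4 bundled with this Zenoss version) and assuming the log lines keep the format shown above. The function name summarize_cycle is made up for illustration and is not part of Zenoss.

```python
import re

# Patterns taken from the zenperfsnmp summary lines quoted above.
QUERIED = re.compile(r"Queried (\d+) devices")
RESULTS = re.compile(r"Successes: (\d+)\s+Failures: (\d+)\s+Not reporting: (\d+)")


def summarize_cycle(lines):
    """Return queried/success/failure counts from one cycle summary block."""
    summary = {}
    for line in lines:
        match = QUERIED.search(line)
        if match:
            summary["queried"] = int(match.group(1))
            continue
        match = RESULTS.search(line)
        # Keep only the first Successes line. The second one follows the
        # "Waited on ... queries from previous cycles" line and appears to
        # describe those carried-over queries, so it is skipped here.
        if match and "successes" not in summary:
            summary["successes"] = int(match.group(1))
            summary["failures"] = int(match.group(2))
            summary["not_reporting"] = int(match.group(3))
    return summary


before = summarize_cycle([
    "2012-03-07 05:13:59,576 INFO zen.zenperfsnmp: Queried 391 devices",
    "2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 384  Failures: 7  Not reporting: 0",
    "2012-03-07 05:13:59,576 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0",
])
after = summarize_cycle([
    "2012-03-09 19:03:07,286 INFO zen.zenperfsnmp: Queried 57 devices",
    "2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 54  Failures: 3  Not reporting: 0",
    "2012-03-09 19:03:07,286 INFO zen.zenperfsnmp:   Successes: 0  Failures: 0  Not reporting: 0",
])
print("devices no longer queried:", before["queried"] - after["queried"])
```

With the two summaries above, the difference is 391 - 57 = 334 devices that zenperfsnmp no longer queries.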
--------------------------------------------------------------

chitambira
2012-05-09 20:03:12 UTC
chitambira [http://community.zenoss.org/people/chitambira] created the discussion

"Re: zenstatus and zenperfsnmp confusion"

To view the discussion, visit: http://community.zenoss.org/message/66300#66300

--------------------------------------------------------------
Also, zenping run -v10 shows this error:

Unhandled error in Deferred:
Traceback (most recent call last):
  File "/opt/zenoss/Products/ZenUtils/ZenDaemon.py", line 232, in sigTerm
    if callable(stop): stop()
  File "/opt/zenoss/Products/ZenHub/PBDaemon.py", line 298, in stop
    drive(self.pushEvents).addBoth(stopNow)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 214, in addBoth
    callbackKeywords=kw, errbackKeywords=kw)
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 186, in addCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/opt/zenoss/lib/python/twisted/internet/defer.py", line 328, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
  File "/opt/zenoss/Products/ZenHub/PBDaemon.py", line 288, in stopNow
    reactor.stop()
  File "/opt/zenoss/lib/python/twisted/internet/base.py", line 494, in stop
    raise error.ReactorNotRunning(
twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.
2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Sent a 'stop' event
2012-03-09 21:00:13,889 INFO zen.ZenPing: Daemon ZenPing shutting down
2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Removing service EventService
2012-03-09 21:00:13,889 DEBUG zen.ZenPing: Removing service PingConfig
--------------------------------------------------------------

Chet Luther
2012-05-09 21:20:18 UTC
Chet Luther [http://community.zenoss.org/people/cluther] created the discussion

"Re: zenstatus and zenperfsnmp confusion"

To view the discussion, visit: http://community.zenoss.org/message/66289#66289

--------------------------------------------------------------
The "Couldn't listen on any:8789" error you get when running zenhub in the foreground is normal. The zenhub daemon binds two ports: 8789 and 8081. So you can't run two copies at the same time. If you want to run zenhub in the foreground you must first stop the daemon. This is a long way of saying that I don't think the error is related to your problem.
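
For anyone who wants to see the "Address already in use" condition for themselves, here is a minimal sketch that tries to bind a port and reports whether something else already holds it. It assumes a separate Python 3 interpreter on a Linux host (not the Python 2.4 shipped with this Zenoss version); the helper name port_in_use is hypothetical and not part of Zenoss or Twisted.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR keeps sockets lingering in TIME_WAIT from being reported
    # as busy; on Linux a listening socket still blocks the bind.
    probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # some other problem, e.g. permission denied
    finally:
        probe.close()
    return False


if __name__ == "__main__":
    for port in (8789, 8081):  # the two ports named above
        state = "in use" if port_in_use(port) else "free"
        print("port %d: %s" % (port, state))
```

If both ports show as in use while the zenhub daemon is running, that matches the explanation above: a second foreground copy has nothing left to bind.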

Don't worry about the "not in topology" warnings. They're completely benign and not related to your problem.

The reason zenperfsnmp is showing so many "unresponsive devices" is almost certainly because those devices have active critical /Status/Ping events. The zenperfsnmp daemon won't attempt to collect from devices that Zenoss thinks are ping unreachable. Can you confirm or deny this by looking at your event console?
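
One way to make that comparison easier is to pull the device names out of the "unresponsive devices" debug line and check them against the event console by hand. The sketch below assumes Python 3 and the line format quoted earlier in this thread. The function name unresponsive_names is made up for illustration, and the numbers after each name are ignored because their meaning is not stated here.

```python
import re

# Matches the first element of each inner list, e.g. ['server0053', 2, 64946].
# The name stops at a quote or comma, so an entry with a missing closing
# quote (as in the pasted log) is still picked up.
NAME = re.compile(r"\['([^',\[\]]+)")


def unresponsive_names(log_line):
    """Return the device names listed after 'unresponsive devices:'."""
    _, found, tail = log_line.partition("unresponsive devices:")
    if not found:
        return []
    return NAME.findall(tail)


sample = ("2012-03-09 10:02:56,086 DEBUG zen.zenperfsnmp: unresponsive devices: "
          "[['server0053', 2, 64946], ['server0061, 1, 1], ......")
for name in unresponsive_names(sample):
    print(name)
```

The resulting names can then be checked one by one for active /Status/Ping events.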

Depending on what version of Zenoss you're running, it may be normal to see that "Can't stop reactor that isn't running." error when running zenping in the foreground without the --cycle parameter. It'll do one pass then terminate in that ugly way. I don't think it's related to your problem.

All of that being said, the "unexpected pkt" could potentially be related to the problem since the root of the problem seems to be that Zenoss thinks devices are ping unreachable when they're not.

Would you try clearing all /Status/Ping events from your event console and restarting zenping?
--------------------------------------------------------------
